Suppose I want to create a robots.txt file for my website, and I want all my webpages (galleries, forums, blogs, etc.) to be crawled.
Can anyone help me with this? I mean, is this two-line code enough:
User-agent: *
Disallow: /cgi-bin/
robots.txt
#2
Posted 12 February 2012 - 07:19 PM
soumikbhat, on 07 February 2012 - 04:30 AM, said:
Suppose I want to create a robots.txt file for my website, and I want all my webpages (galleries, forums, blogs, etc.) to be crawled.
Can anyone help me with this? I mean, is this two-line code enough:
User-agent: *
Disallow: /cgi-bin/
Well, it depends.
If /cgi-bin/ is the only directory you want kept out of the crawl, that is enough: all other directories will be crawled except /cgi-bin/.
If you want other directories excluded as well, add one Disallow line for each of them below the /cgi-bin/ line.
E.g.
Note: this is only an example. It excludes the three directories listed, so well-behaved bots will not crawl them unless you remove the lines.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
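If you want to check rules like these before uploading them, here is a minimal sketch using Python's standard-library robots.txt parser. The helper name `is_allowed` and the host `example.com` are placeholders for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The three-directory example from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path is matched against the rules.
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ["/forums/topic-1", "/blogs/", "/cgi-bin/search.pl", "/tmp/a.html", "/junk/"]:
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Under these rules the forum and blog paths come back as allowed, and anything under /cgi-bin/, /tmp/ or /junk/ comes back as blocked.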
#3
Posted 24 February 2012 - 12:23 PM
You can generate one of these on the Google Webmaster Tools website.
#4
Posted 24 February 2012 - 02:57 PM
If you want everything to be crawled, you don't need to worry about robots.txt. It is only used to disallow crawling of certain pages.
#5
Posted 28 February 2012 - 09:49 PM
I want to ask whether this text should be removed. It is at the top of the file. Can you help me?
# HOW TO USE THIS FILE:
# 1) Edit this file to change "/forum/" to the correct relative path from your base URL, for example if your forum was at "domain.com/sites/community", then you'd use "/sites/community/"
# 2) Rename the file to 'robots.txt' and move it to your web root (public_html, www, or htdocs)
# 3) Edit the file to remove this comment (anything above the dashed line, including the dashed line
#
# NOTES:
# Even though wild cards and pattern matching are not part of the robots.txt specification, many search bots understand and make use of them
#------------------------ REMOVE THIS LINE AND EVERYTHING ABOVE SO THAT User-agent: * IS THE FIRST LINE ------------------------------------------
#6
Posted 29 February 2012 - 04:49 AM
Yes, you can remove it. It is just a comment explaining what the file is and how to use it.
#8
Posted 23 July 2012 - 06:25 AM
The robots.txt file is often overlooked, yet it is an important factor in getting webpages indexed properly, and it is very easy to set up. Robots.txt is a file used to exclude content from the crawling process of search engine spiders/bots. The convention it follows is called the Robots Exclusion Protocol.
We prefer that our webpages are indexed by the search engines, but there may be some content that we don't want crawled and indexed: a personal images folder, the website administration folder, a web developer's test folder for a customer, folders with no search value such as cgi-bin, and many more. The main idea is that we don't want them to be indexed.
User-agent: this parameter defines which bots the rules that follow apply to. * is a wildcard that means all bots; a specific name such as Googlebot targets only that crawler (here, Google's).
Disallow: defines which folders or files will be excluded. An empty value means nothing is excluded, / means everything is excluded, and /foldername/ or /filename excludes specific paths. Values are matched as prefixes of the URL path. A folder name between slashes, like /foldername/, excludes everything inside that folder. A single leading slash, like /foldername, excludes the folder's contents and also any other path that starts with the same characters, such as /foldername.html.
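The effect of different Disallow values can be tried out with Python's standard-library parser. This is only a sketch: the helper `blocked`, the paths, and the host `example.com` are made up for illustration, and this parser does plain prefix matching, whereas some search engines add wildcard extensions on top of that.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow_value: str, path: str) -> bool:
    """True if a single 'Disallow: <value>' rule for all bots blocks `path`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_value}"])
    # example.com is a placeholder host; only the path part is checked.
    return not parser.can_fetch("*", "http://example.com" + path)


if __name__ == "__main__":
    checks = [
        ("", "/anything.html"),               # empty value: nothing excluded
        ("/", "/anything.html"),              # single slash: everything excluded
        ("/private/", "/private/page.html"),  # inside the folder
        ("/private/", "/private.html"),       # same prefix, but not in the folder
        ("/private", "/private.html"),        # no trailing slash: prefix match
    ]
    for value, path in checks:
        state = "blocked" if blocked(value, path) else "allowed"
        print(f"Disallow: {value!r:12} {path:22} {state}")
```

The trailing slash is the part that matters: "/private/" blocks only paths under that folder, while "/private" also blocks "/private.html".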
#9
Posted 14 August 2012 - 07:00 AM
You can also generate this code in Google Webmaster Tools, so register yourself there.
#10
Posted 26 December 2012 - 06:52 AM
You may add this code to your robots.txt file:
User-agent: *
Allow: /galleries/
Allow: /forums/
Allow: /blogs/
This code tells crawlers that they may crawl these pages, so that they can be indexed by Google.
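As a quick check of what these Allow lines do, here is a small sketch with Python's standard-library parser. The helper `can_crawl`, the test paths, and the host `example.com` are assumptions for illustration. It suggests the Allow lines are harmless but do not restrict anything: paths that are not listed stay crawlable too, which matches the earlier reply that robots.txt is mainly for disallowing.

```python
from urllib.robotparser import RobotFileParser

# The Allow-only rules from the post above.
ALLOW_ONLY = """\
User-agent: *
Allow: /galleries/
Allow: /forums/
Allow: /blogs/
"""


def can_crawl(robots_txt: str, path: str) -> bool:
    """Return True if any bot ('*') may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path is matched.
    return parser.can_fetch("*", "http://example.com" + path)


if __name__ == "__main__":
    # /cgi-bin/ is not listed anywhere, yet it is still reported as crawlable.
    for path in ["/galleries/summer/", "/forums/topic-1", "/cgi-bin/search.pl"]:
        print(path, "allowed" if can_crawl(ALLOW_ONLY, path) else "blocked")
```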