SixServe - Free No Ads cPanel Web Hosting: robots.txt

robots.txt

#1
soumikbhat

  • Advanced Member
  • Group: Members
  • Posts: 244
  • Joined: 25-December 11
Suppose I want to create a robots.txt file for my website and I want all my webpages like galleries, forums, blogs etc to be crawled....

Can anyone help me with this - I mean is this two-line code enough:

User-agent: *
Disallow: /cgi-bin/

#2
Christian (Geonode)

  • Advanced Member
  • Group: Staff
  • Posts: 146
  • Joined: 19-May 11
  • Location: Jacksonville, FL

soumikbhat, on 07 February 2012 - 04:30 AM, said:

Suppose I want to create a robots.txt file for my website and I want all my webpages like galleries, forums, blogs etc to be crawled....

Can anyone help me with this - I mean is this two-line code enough:

User-agent: *
Disallow: /cgi-bin/


Well, it depends.

With those two lines, only /cgi-bin/ is blocked. Every other directory will still be crawled.
If you want other directories kept out as well, add a Disallow line for each one underneath the /cgi-bin/ line.

E.g.
Note: This is only an example. It excludes the three directories listed, so well-behaved bots will not crawl them unless you remove the lines.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
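
A quick way to check a rule set like this before uploading it is Python's standard urllib.robotparser module. This is only a sketch: the example.com URLs and the page paths are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The three-directory example from this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, used only to exercise the rules.
for url in (
    "http://example.com/cgi-bin/form.pl",
    "http://example.com/tmp/cache.html",
    "http://example.com/forums/index.php",
    "http://example.com/galleries/",
):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

The first two URLs fall under a Disallow prefix, so they should be reported as blocked; the forum and gallery URLs match no rule and stay crawlable.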


#3
Dinhostork

  • Dinho Stork
  • Group: Members
  • Posts: 212
  • Joined: 01-December 11
  • Location: Brazil
You can generate one with Google Webmaster Tools.

#4
JoshuaDeacon

  • SixServe Staff
  • Group: Staff
  • Posts: 124
  • Joined: 31-December 11
  • Location: North Carolina
If you want everything to be crawled, you don't need to worry about robots.txt. It is used to disallow crawlers from certain pages.
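
If you still want a file in place, an empty Disallow value is the usual way to say that nothing is blocked. The sketch below checks that reading with Python's standard urllib.robotparser; the example.com host and the paths are placeholders, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value blocks nothing, so every path may be crawled.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical paths standing in for the galleries, forums and blogs.
paths = ["/galleries/", "/forums/topic/1", "/blogs/entry.html", "/cgi-bin/x"]
results = {path: parser.can_fetch("*", "http://example.com" + path) for path in paths}
print(results)
```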

#5
SixPriests

  • Advanced Member
  • Group: Members
  • Posts: 142
  • Joined: 20-December 11
  • Location: Jakarta City
I want to ask whether this text should be removed:

# HOW TO USE THIS FILE:
# 1) Edit this file to change "/forum/" to the correct relative path from your base URL, for example if your forum was at "domain.com/sites/community", then you'd use "/sites/community/"
# 2) Rename the file to 'robots.txt' and move it to your web root (public_html, www, or htdocs)
# 3) Edit the file to remove this comment (anything above the dashed line, including the dashed line
#
# NOTES:
# Even though wild cards and pattern matching are not part of the robots.txt specification, many search bots understand and make use of them
#------------------------ REMOVE THIS LINE AND EVERYTHING ABOVE SO THAT User-agent: * IS THE FIRST LINE ------------------------------------------


It is at the top of the file. Can you help me? :(


#6
JoshuaDeacon

  • SixServe Staff
  • Group: Staff
  • Posts: 124
  • Joined: 31-December 11
  • Location: North Carolina
Yes, you can remove it. It is just a comment that tells you what the file is and how to use it.
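
If you would rather not delete the lines by hand, the step can be sketched in Python. Note that strip_header is a made-up helper name, and the template below is a shortened stand-in for the real file, with a hypothetical Disallow path.

```python
def strip_header(text: str) -> str:
    """Drop everything up to and including the dashed '#---' marker line."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("#---"):
            remaining = lines[index + 1:]
            break
    else:
        # No marker found: leave the content untouched.
        remaining = lines
    # Skip blank lines so that User-agent becomes the first line.
    while remaining and not remaining[0].strip():
        remaining.pop(0)
    return "\n".join(remaining) + "\n"


# Shortened, hypothetical stand-in for the template quoted above.
template = """\
# HOW TO USE THIS FILE:
# (instructions shortened for this example)
#------------------------ REMOVE THIS LINE AND EVERYTHING ABOVE ------------------------

User-agent: *
Disallow: /forum/admin/
"""

print(strip_header(template), end="")
```

Leaving the comment in place should also be harmless, since lines that start with # are ignored by crawlers.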

#7
soumikbhat

  • Advanced Member
  • Group: Members
  • Posts: 244
  • Joined: 25-December 11
thanks...

#8
petersmith

  • Member
  • Group: Members
  • Posts: 10
  • Joined: 23-July 12
The robots.txt file is often overlooked, yet it is easy to set up and it matters for getting your webpages indexed properly. Robots.txt is a file used to exclude content from the crawling process of search engine spiders / bots. The convention behind it is also called the Robots Exclusion Protocol.
We usually prefer that our webpages are indexed by the search engines. But there may be some content that we don't want crawled and indexed, such as a personal images folder, the website administration folder, a web developer's test folder for a customer, folders with no search value like cgi-bin, and many more. The main idea is that we don't want them to be indexed.

User-agent: names the bot that the rules below it apply to. * is a wildcard that matches all bots, while a specific name such as Googlebot targets only that bot (Google's crawler).
Disallow: defines which folders or files are excluded. An empty value means nothing is excluded, / means everything is excluded, and /foldername/ or /filename excludes that folder or file. The value is matched as a prefix of the URL path: /foldername/ (with a trailing slash) excludes everything inside that folder, while /foldername (without it) excludes every path that begins with those characters, so it also catches /foldername.html and /foldername-old/.
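
The difference a trailing slash makes can be seen with Python's standard urllib.robotparser. This is a small sketch only: build is a made-up helper, and /private and the example.com paths are hypothetical names picked to show where the two rules part ways.

```python
from urllib.robotparser import RobotFileParser


def build(rule: str) -> RobotFileParser:
    """Return a parser for a one-rule file that applies to all bots."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule])
    return parser


with_slash = build("/private/")    # trailing slash
without_slash = build("/private")  # no trailing slash

# Hypothetical paths chosen to show where the two rules differ.
for path in ("/private/photo.jpg", "/private.html", "/public/index.html"):
    url = "http://example.com" + path
    print(path, with_slash.can_fetch("*", url), without_slash.can_fetch("*", url))
```

Both rules block /private/photo.jpg and both leave /public/index.html alone. They should differ only on /private.html, which the rule without the trailing slash also blocks.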



#9
Raman Mahajan

  • Newbie
  • Group: Members
  • Posts: 4
  • Joined: 19-May 12
You can also generate this file with Google Webmaster Tools, so register your site there.

#10
sanath123

  • Newbie
  • Group: Members
  • Posts: 3
  • Joined: 26-December 12
You can add this to your robots.txt file:

User-agent: *
Allow: /galleries/
Allow: /forums/
Allow: /blogs/

This tells crawlers that they may crawl these pages, so that Google can index them.
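
One caveat worth checking: crawling is permitted by default, so a file that contains only Allow lines behaves the same as a file with no rules at all. The sketch below uses Python's standard urllib.robotparser to show this; example.com and the file names are placeholders.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ONLY = """\
User-agent: *
Allow: /galleries/
Allow: /forums/
Allow: /blogs/
"""

parser = RobotFileParser()
parser.parse(ALLOW_ONLY.splitlines())

# /cgi-bin/ is not listed, yet it is still crawlable: nothing disallows it.
listed = parser.can_fetch("*", "http://example.com/forums/index.php")
unlisted = parser.can_fetch("*", "http://example.com/cgi-bin/form.pl")
print(listed, unlisted)
```

Allow lines become useful mainly when paired with a broader Disallow rule, to carve out an exception to it.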
