Protect Your Website Using robots.txt or Nofollow Meta Tags
A robots.txt file tells search engine robots which pages, images, and other files on your website they may or may not crawl and add to their index. It keeps well-behaved search engine bots away from your private data, information, and files.
The official Google blog, in the post "Controlling how search engines access and index your website", describes it this way:
"The key is a simple file called robots.txt that has been an industry standard for many years. It lets a site owner control how search engines access their web site. With robots.txt you can control access at multiple levels -- the entire site, through individual directories, pages of a specific type, down to individual pages. Effective use of robots.txt gives you a lot of control over how your site is searched, but its not always obvious how to achieve exactly what you want. This is the first of a series of posts on how to use robots.txt to control access to your content."
You may be wondering whether your site really needs a robots.txt file. If your site contains private data or private pages, such as news pages that only registered users may access, then it needs a robots.txt file so that search engine robots do not crawl those pages and add them to the search engine's database.
If you have decided to create a robots.txt file, you can do it yourself by hand; you do not need any tool or software. First, look at all the pages, folders, images, and other files on your website.
Note which pages need to be kept away from search engine bots.
After deciding which pages, folders, and files need protecting, list them in a simple format like this:
----------------------------------------
User-Agent: *
Disallow: /filename.html
Disallow: /foldername/
Disallow: /foldername/subfoldername/
Disallow: /foldername/filename.html
Disallow: /*.gif$
----------------------------------------
User-Agent: the robot the following rule applies to
Disallow: the pages you want to block
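The listing step above can also be scripted. The following is a minimal Python sketch, not an official tool: the function name build_robots_txt and its parameters are made up for illustration, and it only produces the User-Agent and Disallow lines shown above.

```python
def build_robots_txt(user_agent, disallowed_paths):
    """Return robots.txt text with one Disallow line per path (hypothetical helper)."""
    lines = [f"User-Agent: {user_agent}"]
    for path in disallowed_paths:
        # Each rule takes exactly one path, written from the site root.
        if not path.startswith("/"):
            path = "/" + path
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example paths taken from the sample rules above.
    print(build_robots_txt("*", ["/filename.html", "/foldername/", "foldername/subfoldername/"]))
```

The output can be saved as robots.txt without further editing.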
Google has added flexibility to the robots.txt file standard through the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name. To block all files of a specific file type (for example, to include .jpg but not .gif images), use a pattern such as Disallow: /*.gif$.
Each line should contain a single path, and you can include as many lines as you want. Write these rules using your own file and folder names in a plain text file (for example in Notepad) and save it as robots.txt. After this step, upload the file to the root of the domain. For example, if your website domain is http://www.yourwebsitename.com/, the valid path for the robots file is http://www.yourwebsitename.com/robots.txt.
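To see how "*" and "$" behave, here is a rough Python sketch that turns a Disallow pattern into a regular expression. It only illustrates the matching rules described above and is not Google's actual implementation; the names pattern_to_regex and is_blocked are invented for this example.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Disallow pattern with '*' and a trailing '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn each escaped '*' back into "any characters".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """True if the path matches the pattern, starting from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ["/images/photo.gif", "/images/photo.jpg", "/images/photo.gif?size=2"]:
        print(path, is_blocked("/*.gif$", path))
```

Without the trailing "$", a pattern acts as a prefix, so /foldername/ also covers every file inside that folder.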
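Because the file must sit at the root of the domain, its address can be worked out from any page URL. A small Python sketch using the standard urllib.parse module (the function name robots_url is an assumption made for this example):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.yourwebsitename.com/foldername/filename.html"))
```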
If you want to block the entire site, use a single Disallow line:
----------------------------------------
User-Agent: *
Disallow: /
----------------------------------------
URLs are case-sensitive. For example, Disallow: /private_file.html would block http://www.yourwebsitename.com/private_file.html, but would allow http://www.yourwebsitename.com/Private_File.html.
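You can check the case-sensitivity example yourself with Python's standard urllib.robotparser module. This is only a sketch: the rules are fed in as text instead of being downloaded, and the variable names are chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above, supplied as text instead of fetched.
rules = """\
User-Agent: *
Disallow: /private_file.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "http://www.yourwebsitename.com"
lower_allowed = parser.can_fetch("*", base + "/private_file.html")
mixed_allowed = parser.can_fetch("*", base + "/Private_File.html")
print("private_file.html allowed:", lower_allowed)
print("Private_File.html allowed:", mixed_allowed)
```

Note that this standard-library parser does plain prefix matching, so it is not suitable for testing "*" or "$" patterns.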
If you want to block a particular search engine, name its bot in the User-Agent line, for example Googlebot:
----------------------------------------
User-Agent: Googlebot
Disallow: /
----------------------------------------
Google uses several user-agents, such as Googlebot, Googlebot-Mobile, Googlebot-Image, and Mediapartners-Google. You can name any one of these bots, or all of them, in the robots.txt file. You can also allow your whole website for a particular search engine bot. Simply write:
----------------------------------------
User-Agent: Googlebot
Disallow:
----------------------------------------
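The two Googlebot examples above can be compared with a short Python sketch built on the standard urllib.robotparser module. The helper name can_crawl and the bot name SomeOtherBot are made up for this illustration, and real search engines may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_text, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


# "Disallow: /" blocks everything; an empty "Disallow:" blocks nothing.
block_googlebot = "User-Agent: Googlebot\nDisallow: /\n"
allow_googlebot = "User-Agent: Googlebot\nDisallow:\n"
url = "http://www.yourwebsitename.com/foldername/filename.html"

if __name__ == "__main__":
    print(can_crawl(block_googlebot, "Googlebot", url))
    print(can_crawl(block_googlebot, "SomeOtherBot", url))
    print(can_crawl(allow_googlebot, "Googlebot", url))
```

A bot that is not named in the file and has no "*" section is treated as allowed.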
You can use both Disallow and Allow rules within a single file; you do not need a separate file for each. Never create a second robots file, under the same or a different name, because it will only confuse search engines. You can block access to all subdirectories that begin with the same characters. Use an asterisk (*) to match a sequence of characters. For example:
----------------------------------------
User-Agent: Googlebot
Disallow: /private*/
----------------------------------------
If you do not want to use a robots file, you can use a noindex, nofollow meta tag to protect your website pages and data. To do this, add a META tag to the HTML file within the <head> section, like this:
<meta name="robots" content="noindex, nofollow">
This stops search engine bots from following the links in this file and from indexing it. META tags are particularly useful if you have permission to edit the individual files. robots.txt, by contrast, applies control on a site-wide basis. You can find out more about robots.txt at http://www.robotstxt.org.
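To confirm that a page really carries the robots META tag, you can scan its HTML. Below is a minimal Python sketch using the standard html.parser module; the class name RobotsMetaParser and the function robots_directives are invented for this example, and the sketch reads only the name and content attributes.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex, nofollow" into separate lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```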