While we usually want search engines to crawl and index our webpages, sometimes the requirement is exactly the opposite. I came across this problem when I first set up the development environment for Studytonight.
Generally, the test environment for a web application is a sub-domain of the main domain name. For example, if your main website is available on www.example.com and your test or development environment is on test.example.com, then unless you inform the search engines that this is a test environment, they will crawl and index it, and your test environment might start appearing in search results on search engines like Google.
And who knows, over a period of time, your test environment's URL may gain more authority than your main domain, which is something you would never want.
Because this is your test environment, you would never want users to reach it and use it.
There are some "maybe" solutions for this and some "for sure" solutions. The "maybe" solutions may or may not fully work, but the "for sure" solutions will reliably keep search engines from crawling/indexing your website.
You can inform a search engine not to index your website's webpages by sending the following response header in your HTTP response:
Header set X-Robots-Tag "noindex, nofollow"
This is a server-level change, which you can make by adding the above line to your Apache server's configuration file (other web servers have an equivalent setting). With Apache, we can also specify this in the .htaccess file. Note that the Header directive requires the mod_headers module to be enabled.
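To confirm that the header is actually being sent, it can help to inspect a response from the test environment. The following is a minimal sketch in Python, not a definitive tool: the function names `blocks_indexing` and `check_url` are my own, and any URL you pass in (for example the test.example.com host used above) is a placeholder for your own environment.

```python
from urllib.request import urlopen


def blocks_indexing(x_robots_tag):
    """Return True if an X-Robots-Tag value asks search engines not to index.

    Simplification: only plain comma-separated directives are handled,
    not values prefixed with a specific crawler name.
    """
    if not x_robots_tag:
        return False
    directives = {part.strip().lower() for part in x_robots_tag.split(",")}
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


def check_url(url):
    """Fetch a URL and report whether its X-Robots-Tag header blocks indexing."""
    with urlopen(url) as response:
        # headers.get returns None when the header is missing.
        return blocks_indexing(response.headers.get("X-Robots-Tag"))
```

Calling `check_url` on a page of your test environment should return True once the server change has taken effect.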
We can also password protect access to any directory or the complete website using the .htaccess file. Here is the complete step-by-step guide for it: Using .htaccess file to Password Protect any Web page or directory.
If you are using cPanel on your hosting, then it's super simple to password protect your website, which will completely hide your test environment from internet traffic and from search engines. Here is the complete step-by-step guide for it: How to Password Protect any Web Page or Directory on cPanel
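Once password protection is in place, an unauthenticated request should be rejected with a 401 status and a WWW-Authenticate header. Here is a rough sketch of how that could be checked from Python; the helper names `looks_password_protected` and `fetch_status` are hypothetical, and the check assumes standard HTTP authentication as set up through .htaccess or cPanel.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def looks_password_protected(status_code, headers):
    """Return True for a 401 response that advertises an authentication scheme."""
    header_names = {name.lower() for name in headers}
    return status_code == 401 and "www-authenticate" in header_names


def fetch_status(url):
    """Request a URL without credentials and return (status code, headers dict)."""
    try:
        with urlopen(url) as response:
            return response.status, dict(response.headers.items())
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the error still carries
        # the status code and the response headers.
        return err.code, dict(err.headers.items())
```

For a protected test site, `looks_password_protected(*fetch_status(url))` should be True, which also means crawlers cannot read the pages.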
The robots.txt file can be used to inform search engines to not crawl certain webpages, any particular directory or the complete website.
The robots.txt file should be kept in the root directory of your website (the document root). For example, if your domain name is example.com, then the robots.txt file should be available at example.com/robots.txt.
To disallow all the search engines from crawling your website, create a robots.txt file and add the following code to it:
User-agent: *
Disallow: /
Now, I am categorizing this as a "maybe" solution because this will prevent crawling of webpages, but it might not stop the search engine from indexing your webpages. For example, a blocked URL can still end up in the index if other pages link to it.
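To see how a well-behaved crawler would interpret these rules, Python's standard library ships a robots.txt parser. The sketch below is only an illustration: the helper `is_crawl_allowed`, the `DISALLOW_ALL` constant and the test.example.com URL are names I chose for the example.

```python
from urllib.robotparser import RobotFileParser

# The disallow-all rules, one directive per line.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# With the disallow-all rules, no path on the site may be crawled.
print(is_crawl_allowed(DISALLOW_ALL, "https://test.example.com/any/page", "Googlebot"))
```

This only answers whether a compliant crawler may fetch the page; it says nothing about whether the URL could still be indexed.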
We can add a "noindex" meta tag to all our webpages. This must be added in the head section of every webpage which we do not want to be indexed.
<meta name="robots" content="noindex, nofollow" />
Although it's difficult to add a meta tag to all the webpages of your development environment, especially when you have a lot of webpages, this is a better solution as compared to the robots.txt file. Whenever Google or any other search engine finds this tag on a webpage, it will not index that particular webpage. Keep in mind that the crawler has to fetch the page to see the tag, so the page should not also be blocked in robots.txt.
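Since the tag has to be present on every page, a small script can help spot pages where it is missing. The following is a sketch using Python's built-in HTML parser; `RobotsMetaParser` and `has_noindex` are hypothetical names, and the check looks for the noindex directive, which is the one that asks search engines not to index a page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None, so normalise them to strings.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running `has_noindex` over the HTML of each page of the development site gives a quick list of the pages that still need the tag.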
You may use any of the above techniques to hide your testing or development environment from search engines, but we would highly recommend following the "for sure" ones, as the other options might not work and you may not realize it until it's too late.
Also, neglecting to hide your testing environment from search engines can prove harmful for your main website, as your very own testing website can compete with your main website in Google's or other search engines' search results.