A robots.txt file is a simple set of instructions for search engine bots that visit your website. It tells them which pages they are allowed to look at and which ones they should ignore.
As a key part of technical Search Engine Optimisation (SEO), this small file helps you manage how search engines crawl your site.
For any business in Singapore aiming to improve its online visibility, understanding robots.txt is a fundamental first step. This guide will explain everything a beginner needs to know, in simple terms.
Understanding the Role of a Robots.txt File
Before diving into the technical details, let’s first get a clear picture of what a robots.txt file is and the job it does for your website. Think of it as a friendly gatekeeper for search engine crawlers.
What Exactly is a Robots.txt File?
A robots.txt file is a plain text file that sits in the main folder of your website. Its name is always robots.txt, all in lowercase. It is part of a wider standard known as the Robots Exclusion Protocol, which is a set of rules used to communicate with web crawlers, also known as bots or spiders. These bots, like Googlebot, are sent by search engines to discover and read the pages on your site. The robots.txt file provides them with directions, guiding them on what they can and cannot access.
How Does Robots.txt Actually Work?
When a search engine bot arrives at your website, the very first thing it does is look for a robots.txt file in your site’s root directory (for example, yourwebsite.com.sg/robots.txt). If it finds the file, it will read the instructions inside before it starts crawling any of your pages. For example, you can tell the bot, “You can look at my blog posts, but please don’t look at my admin login page.”
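Put into actual directives, that instruction might look like the sketch below (the /admin/ path is an illustrative placeholder, not a folder your site necessarily has):

# Let bots crawl everything except the admin area
User-agent: *
Disallow: /admin/

With no other Disallow rules, everything else, including your blog posts, remains open to crawling.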
It is important to remember that these instructions are guidelines, not unbreakable laws. Reputable search engines like Google and Bing will always respect your robots.txt file.
However, malicious bots, such as those trying to find security vulnerabilities or scrape email addresses, will likely ignore it completely. Therefore, you should never use a robots.txt file to hide private information or for security purposes.
Why is Robots.txt Important for Your Website’s SEO?
You might wonder why you would ever want to stop a search engine from looking at your pages. A well-configured robots.txt file is a powerful tool for your SEO strategy, helping your site perform better in search results.
Managing Your Crawl Budget
Search engines don’t have unlimited resources. They assign a “crawl budget” to every website, which is the amount of time and resources a bot will spend crawling your site. For a small website, this might not seem important. But for larger sites with thousands of pages, the crawl budget is crucial.
Your robots.txt file lets you optimise this budget. By telling bots to ignore unimportant pages, such as internal search result pages, filtered pages, or thank-you pages, you save your crawl budget for the pages that truly matter: your main service pages, blog articles, and product pages. This ensures that your most valuable content is crawled and indexed efficiently.
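As a simple sketch, a file that protects crawl budget this way might look like the following (the paths are placeholders; substitute the ones your own site actually generates):

User-agent: *
Disallow: /search/
Disallow: /thank-you/
Disallow: /*?filter=

The last line uses the * wildcard, which major search engines such as Google support, to match any URL containing a filter parameter.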
Preventing Duplicate Content Issues
Duplicate content occurs when the same or very similar content appears on multiple pages of your website. Search engines can find this confusing and may struggle to decide which version to show in search results, which can dilute your ranking potential. While a canonical tag is often the preferred solution, a robots.txt file can also help. For instance, you might have a print-friendly version of a page that you don’t want search engines to index separately. You can use your robots.txt file to block crawlers from accessing these duplicate versions.
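For instance, if your print-friendly versions all lived under a /print/ folder (an assumed structure for illustration), two lines would keep crawlers out of them:

User-agent: *
Disallow: /print/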
Keeping Certain Sections Private
Every website has sections that are not meant for public viewing. This could include a staging area where you test new designs, internal login pages, or files like PDFs that you don’t want to appear in search results. The robots.txt file is the simplest way to instruct search engines to stay away from these private or irrelevant directories, ensuring that unfinished or internal content does not accidentally get indexed and shown to the public.
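A minimal sketch, assuming a hypothetical /staging/ area and an /internal/ documents folder, might read:

User-agent: *
Disallow: /staging/
Disallow: /internal/

Remember from the earlier caveat, though, that this only keeps polite crawlers out; genuinely sensitive areas still need proper password protection.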
A Step-by-Step Guide to Creating a Robots.txt File

Creating a robots.txt file is surprisingly easy. You don’t need any special software, just a simple text editor.
Step 1: Create a New Text File
Open a basic text editor like Notepad on a Windows computer or TextEdit on a Mac. It is very important to use a plain text editor. Do not use a word processor like Microsoft Word or Google Docs, as these programs add hidden formatting that will make the file unreadable for search engine bots.
Step 2: Learn the Basic Syntax
A robots.txt file is made up of simple rules or “directives.” The most common directives you will use are User-agent, which identifies the bot you are giving instructions to, and Disallow, which tells the bot what it should not crawl.
Step 3: Add Your Rules (Directives)
Start writing your rules. Each rule should be on a new line. For example, to block all bots from a folder named /private/, you would write:
User-agent: *
Disallow: /private/
Step 4: Save the File
Save the file with the exact name robots.txt. Make sure the name is all lowercase and that it is saved as a plain text file (with a .txt extension).
Understanding Robots.txt Syntax and Directives
The language of a robots.txt file is simple, but it needs to be precise. Let’s look at the key components you will use to write your rules.
The User-agent Directive
The User-agent line specifies which crawler the following rules apply to. You can give different instructions to different bots. The asterisk (*) is a wildcard that stands for all bots. So, User-agent: * means the rule applies to every crawler. If you wanted to give instructions only to Google’s main bot, you would use User-agent: Googlebot.
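To illustrate, the sketch below gives Googlebot one set of rules and all other bots a stricter set (the folder names are placeholders):

User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /drafts/
Disallow: /experiments/

A bot follows the most specific group that matches it, so Googlebot here would obey only the first group and ignore the second.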
The Disallow Directive
The Disallow directive tells a bot which pages or directories it is not allowed to crawl. The path you list after Disallow: is relative to your root domain. For example, to block a folder called /images/, you would write Disallow: /images/. To block the entire website, you would write Disallow: /. Leaving the Disallow directive blank means you are not disallowing anything.
The Allow Directive
The Allow directive is less common but can be very useful. It lets you override a Disallow rule. For instance, imagine you want to block an entire folder but allow bots to access one specific page inside it. You could write:
User-agent: *
Disallow: /media/
Allow: /media/public-image.jpg
This tells all bots not to crawl the /media/ folder, except for the single file public-image.jpg.
Including Your Sitemap
It is a great practice to include the location of your XML sitemap in your robots.txt file. A sitemap is a file that lists all the important pages on your website. Adding it to your robots.txt file makes it easy for search engines to find it and discover all your content. You simply add a line at the beginning or end of your file, like this:
Sitemap: https://www.yourwebsite.com.sg/sitemap.xml
Robots.txt Examples for Common Scenarios
Let’s look at a few practical examples of what a complete robots.txt file might look like.
Example 1: Allow All Bots to Crawl Everything
This is the simplest configuration. It is often used if you don’t want to block any part of your site.
User-agent: *
Disallow:
Example 2: Disallow All Bots from Crawling Anything
You might use this if your website is under development and you don’t want search engines to index it yet.
User-agent: *
Disallow: /
Example 3: Blocking Specific Folders
This is a very common setup, especially for websites built on platforms like WordPress. It blocks access to backend folders and other non-public areas.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cgi-bin/
Example 4: Blocking a Specific File Type
If you want to stop search engines from crawling all the PDF files on your site, you can use a wildcard. The dollar sign ($) indicates the end of the URL.
User-agent: *
Disallow: /*.pdf$
How to Correctly Place and Test Your Robots.txt File
Creating the file is only half the battle. You also need to make sure it’s in the right place and that it works as intended.
File Location is Crucial
For search engines to find and follow your instructions, the robots.txt file must be placed in the root (or top-level) directory of your website. It will not work if it is placed in a subfolder. You should be able to see your file by typing your domain followed by /robots.txt into your web browser (e.g., https://www.yourwebsite.com.sg/robots.txt).
Using a Robots.txt Tester
Before you upload your file, and after you make any changes, you should always test it. A small mistake in your robots.txt file could accidentally block your entire website from search engines. You can check your file with the robots.txt report in Google Search Console (which replaced the older robots.txt Tester tool); it shows whether Google can fetch your file and flags parsing problems. To confirm whether a specific URL is blocked by your rules, use the URL Inspection tool in the same console.
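If you would like to sanity-check your rules locally before uploading, Python’s built-in urllib.robotparser module offers a rough approximation. Its matching logic is simpler than Google’s, so treat it only as a first check; the domain and paths below are placeholders:

from urllib.robotparser import RobotFileParser

# Load the live robots.txt file (placeholder domain)
rp = RobotFileParser()
rp.set_url("https://www.yourwebsite.com.sg/robots.txt")
rp.read()

# Ask whether any bot ("*") may fetch a given URL
print(rp.can_fetch("*", "https://www.yourwebsite.com.sg/private/page.html"))
print(rp.can_fetch("*", "https://www.yourwebsite.com.sg/blog/post.html"))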
Common Robots.txt Mistakes and How to Avoid Them

While robots.txt is simple, a few common mistakes can cause serious SEO problems.
Using Disallow to Hide Pages (vs. Noindex)
This is the most critical mistake to avoid. Using Disallow in your robots.txt file only tells a search engine not to crawl a page. It does not stop the page from being indexed. If another website links to your disallowed page, Google may still index it and show it in search results, often with a message like “No information is available for this page.”
If you truly want to prevent a page from appearing in search results, you must use a noindex meta tag in the HTML of that page. The key difference is that a bot must be able to crawl the page to see the noindex tag. Therefore, you should never disallow a page in robots.txt that also has a noindex tag.
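For reference, the noindex instruction is a single tag placed inside the head section of the page’s HTML:

<meta name="robots" content="noindex">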
Blocking CSS and JavaScript Files
In the past, SEO experts would sometimes block bots from crawling CSS and JavaScript files to save crawl budget. This is now a bad practice. Google needs to access these files to properly “see” and understand your page just as a human visitor would. Blocking these resources can prevent Google from rendering your page correctly, which can seriously harm your rankings.
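In practice, that means avoiding rules like the following (a what-not-to-do sketch with placeholder paths):

# Do not do this: it can stop Google from rendering your pages properly
User-agent: *
Disallow: /css/
Disallow: /js/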
Incorrect Syntax and Typos
The syntax of a robots.txt file must be perfect. A simple typo, like writing Disalow: instead of Disallow:, or using a backslash (\) instead of a forward slash (/), can cause a rule to be ignored or the entire file to be misinterpreted. Always double-check your spelling and syntax, and use a tester tool.
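For example, only the second of these two lines would actually be obeyed (/private/ is a placeholder path):

Disalow: /private/
Disallow: /private/

The first line is silently skipped because crawlers do not recognise the misspelled directive.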
Forgetting That Each Subdomain Needs Its Own File
A robots.txt file only applies to the subdomain it is located on. This means that yourwebsite.com.sg, blog.yourwebsite.com.sg, and shop.yourwebsite.com.sg are all treated as separate sites. Each one needs its own unique robots.txt file placed in its own root directory.
Managing Robots.txt on a WordPress Website
If you have a website built on WordPress, managing your robots.txt file is often even easier. Many popular SEO plugins, such as Yoast SEO or Rank Math, automatically generate a virtual robots.txt file for you. You can typically edit this file directly from your WordPress dashboard, usually under a “Tools” or “File Editor” section within the plugin’s settings.
A good basic robots.txt file for a WordPress site might look something like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.yourwebsite.com.sg/sitemap_index.xml
This blocks the main administrative area but still allows admin-ajax.php, a file that many WordPress plugins and themes rely on for front-end features to work correctly.
Conclusion On What Robots.txt Is
The robots.txt file is a simple but powerful tool for SEO beginners. It gives you direct control over how search engines interact with your website. By correctly using it, you can guide crawlers to your most important content, manage your crawl budget effectively, and improve your site’s overall SEO performance.
It is a fundamental part of a good technical SEO strategy. Remember to always test your file after making changes to avoid unintended consequences. If you have any questions, please contact us.
To connect with SEO professionals and learn more, visit BestSEO Singapore.
Frequently Asked Questions About What Robots.txt Is
What Is Robots.txt Used For?
It’s used to manage web crawler traffic to your site. It tells search engine bots which pages or files the crawler can or can’t request from your site, helping to avoid overloading your server and focusing the crawl on your most important content.
How Do I Create a Robots.txt File?
You can create one using any plain text editor, like Notepad (Windows) or TextEdit (Mac). You then write your directives (like User-agent: and Disallow:), save the file as robots.txt, and upload it to your website’s root directory.
Where Is the Robots.txt File Located?
The robots.txt file must be located in the root directory of your website. You should be able to access it by going to https://www.yourdomain.com/robots.txt. It will not be found by crawlers if placed in a subfolder.
What Is the Difference Between Robots.txt and Noindex?
The robots.txt file tells a search engine not to crawl a page, but the page could still be indexed and appear in search results if it’s linked to from other places. A noindex tag tells a search engine it can crawl the page but should not include it in its search results, which is a much more direct way to keep a page out of Google.
What Happens If I Don’t Have a Robots.txt File?
If you don’t have a robots.txt file, search engine crawlers will assume they are allowed to crawl every page on your entire website. For most small websites, this is perfectly fine. However, having one gives you more control.
How Do I Know If a Site Has a Robots.txt File?
The easiest way is to type the website’s root domain into your browser and add /robots.txt at the end (for example, https://www.example.com.sg/robots.txt). If a file appears, the site has one. If you see a “404 Not Found” error, it doesn’t.