SEO Is Not That Hard
Are you eager to boost your website's performance on search engines like Google but unsure where to start or what truly makes a difference in SEO?
Then "SEO Is Not That Hard" hosted by Edd Dawson, a seasoned expert with over 20 years of experience in building and successfully ranking websites, is for you.
Edd shares actionable tips, proven strategies, and valuable insights to help you improve your Google rankings and create better websites for your users.
Whether you're a beginner or a seasoned SEO professional, this podcast offers something for everyone. Join us as we simplify SEO and give you the knowledge and skills to achieve your online goals with confidence.
Brought to you by keywordspeopleuse.com
Robots.txt - upsides and pitfalls
The episode explores why understanding robots.txt is essential for effective SEO. We discuss its role in managing how search engines crawl and index a website, common mistakes to avoid, and best practices for configuration.
• Definition and function of robots.txt
• Importance of crawl budget and efficiency
• Common pitfalls and testing recommendations
Keep tuning in for insights on SEO strategies and tools to enhance your website's performance!
Learn more about robots.txt at https://robotstoolkit.com/
SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com
Help feed the algorithm and leave a review at ratethispodcast.com/seo
You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips
To get a personal no-obligation demo of how KeywordsPeopleUse could help you boost your SEO, and to get a 7 day FREE trial of our Standard Plan, book a demo with me now
See Edd's personal site at edddawson.com
Ask me a question and get on the show: Click here to record a question
Find Edd on LinkedIn, Bluesky & Twitter
Find KeywordsPeopleUse on Twitter @kwds_ppl_use
"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/
Hello and welcome to SEO Is Not That Hard. I'm your host, Edd Dawson, the founder of the SEO intelligence platform keywordspeopleuse.com, where we help you discover the questions people ask online and learn how to optimise your content for traffic and authority. I've been in SEO and online marketing for over 20 years and I'm here to share the wealth of knowledge, hints and tips I've amassed over that time. Hello, welcome back to SEO Is Not That Hard. It's me here, Edd Dawson, as usual, and today I'm going to be dedicating an entire episode to something I'm quite surprised I haven't covered already. I looked back through the archives and realised that, yes, I might have mentioned it before, but I've never done a whole episode on it, and that's the robots.txt file. If you spend any time whatsoever researching SEO, at some point you're going to come across the term robots.txt and you might wonder what it is. So this is the episode for you.
Speaker 1:If you don't know, or even if you do know, what a robots.txt is, maybe there'll be something here for you to learn. So, first of all, what is robots.txt? At its most basic level, a robots.txt file is a plain text file that lives in the root directory of your website, meaning the very top-level directory, and its primary purpose is to tell web crawlers, or robots, which pages or sections of your site they're allowed to crawl and index, and which ones they should ignore. You can think of robots.txt, and it's not the greatest analogy, as a bit like a bouncer at a club: it stands at the door and decides who gets in and who doesn't, although we'll talk later about why it's not a very good bouncer. The file is essential for controlling the behaviour of search engine bots like Googlebot, Bingbot and the other bots that will come to your site wanting to crawl content. So why does robots.txt matter for SEO? Why should you be telling robots what they can and can't do with your content? The answer is that it's really important if you want to control how search engines interact with your site.
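To make that concrete, here is a minimal sketch of a robots.txt file. It would be served from the root of the site, e.g. https://example.com/robots.txt (example.com and the /admin/ path are illustrative placeholders):

```text
# Rules for all crawlers
User-agent: *
# Keep bots out of the admin area
Disallow: /admin/
```

Each group starts with a User-agent line naming which crawler the rules apply to, followed by the Disallow (or Allow) rules for that crawler.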
Speaker 1:There are a few reasons. First, crawl budget efficiency. By instructing bots on which areas of your site to skip, you help them focus on the parts that matter most to you, and this is particularly important for large websites, where you don't want bots wasting their crawl budget on duplicate or irrelevant pages. If crawl budget isn't a term you've heard before, it refers to the fact that Google and other search engines will only decide to crawl a certain number of pages on your site. If your site isn't very well known or authoritative, they're not necessarily going to crawl every page. The bigger and more authoritative your site, the more they will crawl, but essentially there's a limit that Google will place on how much of your site it crawls. So if you've got more pages than your crawl budget covers, you want to focus Google on the pages that are important to you and keep it away from the pages that aren't, and robots.txt is something you can use to do that. Secondly, you can prevent the indexing of low-value content. You can block search engines from crawling pages that aren't meant for public consumption, like admin panels, staging areas or duplicate content, and this helps keep your search results clean and focused on your best pages. Again, it ties into the crawl budget idea of focusing only on your most important pages. Thirdly, it can mitigate server load. If you allow bots to crawl every last page, you can end up with bots spending more time on your site than actual, real people, which can put strain on your server. You can use robots.txt to make sure they only go to the right places and don't spend too much time getting stuck in loops and that kind of thing.
Speaker 1:And fourth, security considerations. Robots.txt isn't a security mechanism, but it can deter casual snooping by stopping well-behaved web crawlers from indexing sensitive directories you don't want appearing in search engines. However, you must remember that robots.txt is publicly accessible, so anyone can read it. You don't want to rely on it for actual security, and we'll get to the pitfall around this shortly.
Speaker 1:So let's look at those common mistakes and pitfalls. First, if you don't get it right, you can block important pages. One common mistake you quite often see is where people get their robots.txt wrong and actually block crawlers from the pages they do want crawled. So if you have a blog directory, say a /blog/ directory, and you mistakenly include a disallow directive on it, none of your blog posts will appear in search results. That's not what you want. Secondly, using robots.txt as a security measure is a common mistake, because you can't rely on web crawlers actually following the rules you set in your robots.txt. The big, serious, well-known ones like Google and Bing should, and most of the time do, follow robots.txt directives, because they'd get in a lot of trouble and a lot of flak if they didn't.
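As an illustration of that first pitfall, here is a sketch of a misconfigured file (the /blog/ path is the example used above):

```text
User-agent: *
# Mistake: this blocks every URL under /blog/, so the blog
# posts you want ranking can drop out of search results
Disallow: /blog/
```

One stray line like this is all it takes, which is why testing changes (covered below) matters so much.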
Speaker 1:So you can rely on the big services to follow the directives in your robots.txt, but not everyone will. Some people will come and crawl your site, ignore your robots.txt and just crawl everything. These might be people wanting to take your content and repurpose it, or with other reasons for ignoring the file, because there's no law that says you have to follow robots.txt, and many, many won't. So if you really want to hide something, you need another mechanism that will either actually hide it or enforce hiding that content from whoever you don't want to see it, because people will try to hack you. It's just a way of life.
Speaker 1:Get in the file. You want to keep your robotstxt simple. They can get very big. Complex rules can lead to unintended consequences where you know you say one thing in one rule and then another rule that contradicts the original rule. You can get yourself in a bit of a twist there. So you want to keep it as simple as you possibly can. The file should be clear and easy for both humans and robots to understand.
Speaker 1:And fourth, not testing changes. Any change you make to robots.txt should be tested, because a simple misconfiguration can lead to your entire site being de-indexed, and I've seen it happen. There are tools for this. In fact, we've got one ourselves, one of the first SEO tools we created, called robotstoolkit.com. You can give it the URL of your robots.txt file, or paste a sample file in, and it will run through the file, check the rules, and then (a) give you a full rundown in plain human language of what the file is doing, and (b) flag any errors it finds. So that's robotstoolkit.com, and I'll put a link in the show notes. So, say you're going to make a robots.txt file.
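Alongside an online checker like robotstoolkit.com, you can also sanity-check rules locally. A minimal sketch using Python's standard-library urllib.robotparser (the rules and example.com URLs here are illustrative, not from the episode):

```python
from urllib.robotparser import RobotFileParser

# Rules given as text; you could instead use set_url(...) and
# read() to fetch a live robots.txt over HTTP
rules = """
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Check what a generic crawler is allowed to fetch
print(parser.can_fetch("*", "https://example.com/blog/post-1"))   # True
print(parser.can_fetch("*", "https://example.com/admin/login"))   # False
```

A quick script like this makes it easy to assert, before deploying, that the pages you care about are still crawlable.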
Speaker 1:What are the best practices for creating a robots.txt? First of all, like I said before, keep it simple and keep it clear: write straightforward rules. Now, I'm going to read some rules out here, which probably isn't the best way of doing it given this is a podcast, but, for example, a very simple one would be "User-agent: *", which addresses all user agents. Then you have the rule "Disallow: /admin/", which says don't go to the admin directory, because you don't want people crawling that. The next line would say "Disallow: /staging/", meaning don't go to my staging directory, I don't want you looking at the staging content. And the next line would say "Allow: /", which basically allows everything else. A crawler will work through these rules: I'm not allowed here, I'm not allowed there, and everywhere else I'm allowed.
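Written out as a file, the rules read aloud above would look like this:

```text
User-agent: *
Disallow: /admin/
Disallow: /staging/
Allow: /
```

Four lines, clear to both humans and robots, which is exactly the kind of simplicity the episode recommends.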
Speaker 1:You want to update regularly as your website grows and evolves. Keep your robots text file up to date, you know. Review it periodically to ensure it still aligns with the current site structure you've got and where you do and don't want things to go. A great thing to put in there probably going to be the subject of another podcast is putting in sitemap directives. You might have heard of a sitemapxml. That's like a list of all the pages on your site in an xml file in your robots touch takes is a place to put a link to that sitemap so that a robot can come along and you'll see a sitemap link to the XML file that is a sitemap for your site. So that's a really handy place to put that so that that crawler can come along and go. All right, here's where the sitemap is. I'll go and grab that because that'll tell me what everything on the site that I should index is.
Speaker 1:Fourth, be specific when necessary. If only certain bots should follow a specific rule, you can specify that directly. So, for instance, if you want to allow Googlebot across the whole site but block another crawler, you can do that with separate rule groups. You can say "User-agent: Googlebot" followed by "Allow: /", which allows everything, and then "User-agent: BadBot", some kind of bad bot you don't want, followed by "Disallow: /". BadBot, if it follows the robots.txt rules as you set them, will then not crawl your site at all. But remember, some bots will ignore this. And then, finally, test your configuration. As I say, you can use robotstoolkit.com, and there are other robots.txt checkers out there as well. Always check your robots.txt with one of these tools to make sure it's not doing something you don't want.
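The per-bot example described above, written out as a file (BadBot is a hypothetical crawler name):

```text
# Googlebot may crawl everything
User-agent: Googlebot
Allow: /

# A hypothetical unwanted crawler is blocked entirely
User-agent: BadBot
Disallow: /
```

A crawler picks the most specific User-agent group that matches its own name, so Googlebot follows the first group and ignores the second.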
Speaker 1:So hopefully, if you've never heard of robots.txt before, you've now got an idea of why it's an important thing to have on your site. It's essentially a little gatekeeper that says: yes, you can go here, you can go there, but I don't want you going over there. As I say, calling it a nightclub bouncer isn't the greatest analogy, because a bouncer will physically stop people they don't want coming in, whereas robots.txt is more like a concierge who simply tells people where to go and where not to go, trusting that everyone will comply. Not everyone will.
Speaker 1:So if there are things you really want hidden, use another method, like password-protecting areas with proper authentication, if you really don't want a bot in there. That's the safest approach. But in general, the bots that matter to you, like Googlebot and Bingbot, the ones from the more genuine corporations, will follow the rules, so it's good practice to set them. And, as I say, the place to go if you want to check your robots.txt is robotstoolkit.com, where you'll find information about robots.txt itself, where and when you should use it, and how to test it. So go and take a look; there's a link in the show notes. That's it for today. Until next time, keep optimising, stay curious, and remember: SEO is not that hard when you understand the basics. Thanks for listening. It means a lot to me.
Speaker 1:This is where I get to remind you where you can connect with me and my SEO tools and services. You can find all the links I mentioned here in the show notes. Just remember, in all these places where I use my name, Edd is spelled with two Ds. You can find me on LinkedIn and Bluesky; just search for Edd Dawson on both.
Speaker 1:You can record a voice question to get answered on the podcast; the link is in the show notes. You can try our SEO intelligence platform, Keywords People Use, at keywordspeopleuse.com, where we can help you discover the questions and keywords people are asking online, sort those questions and keywords into related groups so you know what content you need to build topical authority, and, finally, connect your Google Search Console account for your sites so we can crawl and understand your actual content, find out what keywords you rank for, and then help you optimise and continually refine your content with targeted, personalised advice to keep your traffic growing. If you're interested in learning more about me personally, or you're looking for dedicated consulting advice, then visit edddawson.com. Bye for now, and see you in the next episode of SEO Is Not That Hard.