SEO Is Not That Hard

Cosine Similarity and why it matters to SEO

Edd Dawson Season 1 Episode 187

Send us a text

Discover the secret sauce behind optimizing your content for search engines and master the art of aligning with search intent. Unravel the mystery of cosine similarity, a powerful mathematical tool, and learn how it transforms text into vectors to measure the relationship between your content and user queries. Join me, Edd Dawson, as I guide you through a library analogy to shed light on how this technique helps search engines like Google determine relevance, giving you the edge to boost your search rankings. With a focus on vector space models, you'll gain insights into how search engines evaluate web pages and how you can leverage this knowledge to enhance your SEO strategies.

Transform your content creation approach by leveraging cosine similarity. Explore practical strategies that help refine keyword relevance and align with user intent, ensuring your content hits the mark every time. From content gap analysis to strategic clustering, learn how to identify missing topics, reduce topic cannibalization, and optimize site architecture to maximize unique page value. Embrace the power of semantic richness and regular audits to maintain high-quality content that avoids the pitfalls of keyword stuffing and duplicate content. Whether you’re looking to enrich the value of a smoothie recipe or improve overall SEO performance, these actionable insights will empower your content-query alignment like never before.

SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com

You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips

To get a personal no-obligation demo of how KeywordsPeopleUse could help you boost your SEO and get a 7 day FREE trial of our Standard Plan book a demo with me now

See Edd's personal site at edddawson.com

Ask me a question and get on the show Click here to record a question

Find Edd on Linkedin, Bluesky & Twitter

Find KeywordsPeopleUse on Twitter @kwds_ppl_use

"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/

Speaker 1:

Hello and welcome to SEO Is Not That Hard. I'm your host, Edd Dawson, the founder of KeywordsPeopleUse.com, the place to find and organise the questions people ask online. I'm an SEO, developer, affiliate marketer and entrepreneur. I've been building and monetising websites for over 20 years and I've bought and sold a few along the way. I'm here to share with you the SEO knowledge, hints and tips I've built up over the years.

Speaker 1:

Hello and welcome back to another episode of SEO Is Not That Hard. It's me here, Edd Dawson, hosting as usual, and today I'm going to talk about a topic that is becoming of increasing importance in SEO, and that is cosine similarity. Now, before you hit skip thinking we're going to dive into heavy maths or anything here, just stick with me. We're going to keep it straightforward. I'm not going to try and explain a lot of complex maths on a podcast, because that would be really tricky, and you don't actually need to know the maths behind it to understand why cosine similarity is important. So we'll go through it and hopefully it'll make some sense. Let's see how we go.

Speaker 1:

So, first of all, what exactly is cosine similarity? Well, at its core, cosine similarity is simply a measure used to determine how similar two pieces of text are, regardless of the size of each piece of text. You can think of it as a way to quantify the similarity between two documents or web pages by converting them into a thing called a mathematical vector and then calculating the cosine of the angle between these vectors. Now, that's as complex as the maths is going to get. I'll break it down into a simple analogy. Imagine you're standing in a library that's filled with books. Each of those books represents a web page, and the words inside the books are the content. We now want to find books which are similar in content to a particular book we're interested in. Cosine similarity helps us do that by comparing the frequency and occurrence of words in each book, effectively telling us how closely related they are. So in the SEO world, this is incredibly useful.

Speaker 1:

I mean, search engines like Google use similar concepts to understand the relevance of web pages to a user's query. Essentially, they take a query and turn it into a vector, an embedding, which is what we talked about in previous episodes. They convert the query into this embedding, this vector, and then they look at the page content, which they've also converted into an embedding, and they use cosine similarity to see how closely related these two pieces of content are. Obviously, a query tends to be a very, very small piece of text, while a web page normally has a lot of text in it. So we can use cosine similarity to see how similar and how related the query is to the document, the web page, and vice versa. And knowing and understanding this, we can then use it to optimise our content to better match the search intent of queries and improve our rankings.
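
To make that query-versus-page comparison concrete, here's a minimal Python sketch. The four-dimensional vectors are made-up numbers purely for illustration; real embeddings come from an embedding model and typically have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (invented for illustration): a short query and a long
# page can still be compared, because only the vectors' directions matter.
query_vec = [0.9, 0.1, 0.0, 0.3]
page_vec  = [0.8, 0.2, 0.1, 0.4]

print(cosine(query_vec, page_vec))  # a score close to 1: well aligned
```

Because cosine similarity only measures the angle between the vectors, not their lengths, a ten-word query and a two-thousand-word page can be compared on equal footing.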

Speaker 1:

So how does it work under the bonnet? Let's look into it a little bit deeper. Well, not too deep, we don't want to get too mathsy, but a bit more on how it works. So let's talk about these vectors again. If you've listened to my podcast on embeddings, this will make a bit more sense; if you haven't listened to that one yet, go back and give it a listen, but let's talk about it here.

Speaker 1:

So, in simple terms, a vector is an array of numbers that represent certain characteristics, in our case the words in a document. When we convert text into these vectors, we're essentially transforming words into numerical values based on their frequency and their importance within that text. So here's how it works. Suppose we have two documents, document A and document B. We compile a list of all the unique words in both documents. Each document is then represented as a vector, where each element corresponds to the frequency of a word in that document. So, for example, if both documents contain the word SEO frequently, the value for SEO in each vector will be high, indicating the term is significant in both documents. Cosine similarity then calculates the cosine of the angle between these two vectors. If the angle is small, meaning the cosine value is close to 1, the documents are very similar, and if the angle is large, with a cosine value closer to 0, the documents are quite different. This mathematical approach allows us to quantify textual similarity in a way that's meaningful and that we can work with. And because it's mathematical, we get the same result each time if we put the same input in, which means we can reliably use it and work with it every single time. Okay, so that's the basics covered.
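
The word-count version described above can be sketched in a few lines of Python. This is a simplified bag-of-words model for illustration, not how any search engine actually implements it:

```python
from collections import Counter
import math

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between two texts using raw word counts."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = set(a) | set(b)                    # all unique words in both documents
    dot = sum(a[w] * b[w] for w in vocab)      # dot product of the count vectors
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Identical texts score 1, texts with no shared words score 0,
# partial overlap lands somewhere in between.
print(cosine_similarity("healthy smoothie recipes", "easy smoothie recipes"))
```

Note that because the same input always produces the same score, you can rerun the comparison after editing a page and see exactly how the number moved.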

Speaker 1:

So let's think about how cosine similarity applies to SEO and, even more specifically, how Google incorporates it into their algorithms. Search engines aim to provide the most relevant results to a user's query, and to do this effectively they need to understand the content of web pages and how they relate to each other and to the search terms that people are using. Cosine similarity plays a role in this process by helping the search engines measure the similarity between the query and the documents in their index, as I was saying earlier. For example, Google has been known to use vector space models, which is what this is all about, in their information retrieval systems for a long time. In these models, both queries and documents are represented as vectors, as embeddings, in a multidimensional space. The relevance of documents to a query is then determined by the similarity between the vectors, often calculated using cosine similarity.

Speaker 1:

If it's not exactly cosine similarity, it's something very, very similar. Another area where Google uses a very similar concept is their use of word embeddings in Word2Vec, which is a technology that Google developed. Word2Vec represents words as vectors in a high-dimensional space, and it captures the semantic relationships between words. So by computing the cosine similarity between vectors, Google can understand context and synonyms, which improves their search results. And we know that Google has used these concepts in algorithm updates such as Hummingbird and RankBrain, which emphasised semantic search. Doing that semantic search using cosine similarity, so that they can work out synonyms and understand user intent, is really, really important in these. So these models use concepts that are very similar to cosine similarity. Now that we understand Google uses these methods, it helps us realise quite how important it is to create content that's not just keyword rich but also semantically rich and relevant to the queries that we're targeting.
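
A tiny sketch of the synonym idea: the word vectors below are invented three-dimensional numbers for illustration only (real Word2Vec vectors typically have 100 to 300 dimensions learned from large text corpora), but they show how near-synonyms end up pointing in similar directions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Invented toy word vectors; in a trained model these directions
# emerge automatically from the contexts the words appear in.
car        = [0.90, 0.80, 0.10]
automobile = [0.85, 0.75, 0.15]
banana     = [0.10, 0.20, 0.90]

print(cosine(car, automobile))  # high score: near-synonyms point the same way
print(cosine(car, banana))      # low score: unrelated words point apart
```

This is how a search engine can treat "car" and "automobile" as the same intent even though the strings share no characters.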

Speaker 1:

So in terms of practicalities, how can we, as SEOs or content creators, leverage cosine similarity to enhance our strategies? First, we have to consider keyword relevance. When we're creating content, we want to ensure it's closely aligned with the topics and queries that our audiences are interested in. By analysing the cosine similarity between our content and popular search queries, we can adjust the content to better match that user intent. So, for example, if we're writing an article about healthy smoothie recipes, we'd want to include terms like nutritious ingredients, easy to make, vitamin rich, that kind of thing. This not only boosts the cosine similarity score between our content and any potential queries that people are going to put in around healthy smoothies, it also enriches the content's overall value. Secondly, cosine similarity can aid in content gap analysis. By comparing our content with that of competitors, we can start to identify areas where we might be missing entities and content topics related to the broader context of the topic we're trying to create content about.

Speaker 1:

Thirdly, if we understand that Google uses these vector-based models to gauge and assess content relevance, then by using semantically related terms we can improve how our content is perceived by Google and its algorithms. And lastly, it can help in content clustering and site architecture. By grouping together content pieces that are similar, we can create better organised websites with user-friendly structures. And we can use it for the topic cannibalisation and keyword cannibalisation I've talked a lot about recently: if we've got two pages with very, very similar cosine similarities, we can ask whether those pages are topic cannibalising each other because they're so similar. So we can use this to help improve user experience and improve our crawl efficiency, making sure we've only got the pages we need and we're not cannibalising ourselves.
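
That cannibalisation check can be sketched as a simple pairwise comparison. Everything here is illustrative: the page URLs and texts are hypothetical, the threshold is a made-up cut-off you would tune for your own site, and a real audit would use proper embeddings rather than raw word counts.

```python
from collections import Counter
from itertools import combinations
import math

def text_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two page texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in set(ca) | set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

pages = {  # hypothetical pages on a site
    "/smoothies": "healthy smoothie recipes with vitamin rich ingredients",
    "/green-smoothies": "easy healthy smoothie recipes with fresh ingredients",
    "/seo-basics": "beginner guide to seo and search rankings",
}

THRESHOLD = 0.5  # illustrative cut-off; tune for your own content

# Compare every pair of pages and flag the ones that look too similar
for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
    sim = text_cosine(text_a, text_b)
    if sim >= THRESHOLD:
        print(f"possible cannibalisation: {url_a} vs {url_b} ({sim:.2f})")
```

Here the two smoothie pages would be flagged for review, while the SEO page, sharing no vocabulary with them, would not.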

Speaker 1:

This method of having a reliable and consistent way of comparing pages and different types of content to each other, and seeing how similar they are, is really, really powerful. Now, I know we've talked a lot about maths, cosine similarity, all these kinds of things, vectors, embeddings, all sorts, and it sounds like you'd need to be a data scientist to use cosine similarity. The great news is, no, you don't. There's loads of tools and software out there that can help you calculate cosine similarity without you having to do any of the maths or number crunching yourself, one of them being our Site Content Optimizer in KeywordsPeopleUse, which integrates with Google Search Console. We've got cosine similarity baked in, so you just connect your site to KeywordsPeopleUse.

Speaker 1:

What we will do is go and download all your content and work out the embedding for it, the vectors for it. We'll then also download all the queries that you're ranking for in Google from Google Search Console, work out the embedding, the vector representation, of all those queries, and then calculate the cosine similarity between every query and the content that you are ranking for. This means that you can see, for every query and every page on your site, how well aligned they are, how similar they are, how closely connected they are, and then you can use this information to improve your content. If you find a query where you think, actually, I'm not ranking well, my similarity isn't good enough on that, you've then got something to score against. You can make a change to the content, and then we can recalculate the embeddings and tell you how well you've performed. But it's worth understanding that the concept of this is more important than the calculation itself. You just need to know that search engines like Google are assessing the semantic similarity of your content, and using tools like this that can work out that similarity for you can help guide you in creating more targeted and more effective content.

Speaker 1:

So let's think about the best practices we need to keep in mind when we're going to use cosine similarity in our SEO. First, always prioritise quality over quantity. While it's important to include those relevant keywords and phrases, stuffing your content with keywords can hurt readability. We don't want to do that. It's not about keyword and content stuffing, and we don't want to be penalised for doing this. We want to aim for natural language that provides value to our readers.

Speaker 1:

Secondly, we want to focus on semantic richness. Use synonyms, related terms and varied language to cover the topic comprehensively. This not only improves your cosine similarity with various queries, it also aligns with how Google's algorithms assess content relevance. So that semantic richness is important. Thirdly, we need to regularly audit your content. Over time, the relevance of certain types of keywords can change, so by reviewing your content against your popular queries, you can update and refresh it to maintain its effectiveness. Additionally, just be mindful of how Google uses vector models and embeddings; incorporating relevant entities and topics that are semantically connected can really enhance your content's performance in search results.

Speaker 1:

And lastly, just be cautious, going back to cannibalisation and duplicate content. Too much similarity between our own pages can be detrimental, so we want to make sure that each page has its unique value and targets its specific keywords and topics. Going back to examples from previous podcasts, we don't want to topic cannibalise; we want to try and keep topics separate to avoid this issue of topic and keyword cannibalisation. So hopefully this has been useful.

Speaker 1:

Hopefully you can see how cosine similarity is a valuable concept. It helps us measure the similarity between pieces of text by converting them to those vectors, those embeddings, and calculating the cosine of the angle between them, and this is used by Google when it's looking at what pages, what content, to rank for certain queries. And hopefully now you understand how you can use cosine similarity: you can use tools like KeywordsPeopleUse and others to work out and see what the cosine similarity is between the queries you're targeting and the content that you have on your page. Hopefully it's not been too over-technical for you; it's the concept that's important. You haven't got to worry about how to actually calculate these similarities yourselves. As I say, there are tools out there, our tool and others, that can do it for you, but hopefully you can understand how and why it works, and why having that similarity score can help you work out how to improve your content.

Speaker 1:

So that's it for today and until next time. Keep optimizing, stay curious and remember SEO is not that hard when you understand the basics. See you later. Before I go, I just want to let you know that if you'd like a personal demo of our tools at KeywordsPeopleUse, you can book a free, no-obligation one-on-one video call with me where I show you how we can help you level up your content by finding and answering the questions your audience actually have. You can also ask me any SEO questions you have. You just need to go to keywordspeopleuse.com/demo where you can pick a time and date that suits you for us to catch up. Once again, that's keywordspeopleuse.com/demo, and you can also find that link in the show notes of today's episode. Hope to chat with you soon.

Speaker 1:

Thanks for being a listener. I really appreciate it. Please subscribe and share, it really helps. SEO Is Not That Hard is brought to you by KeywordsPeopleUse.com, the place to find and organise the questions people ask online. See why thousands of people use us every day. Try it today for free at keywordspeopleuse.com. To get an instant hit of more SEO tips, find the link to download a free copy of my 101 Quick SEO Tips in the show notes of today's episode. If you want to get in touch or have any questions, I'd love to hear from you. I'm at Channel 5 on Twitter, and you can email me at podcast@keywordspeopleuse.com. Bye for now and see you in the next episode of SEO Is Not That Hard.

People on this episode