SEO Is Not That Hard

Entities Part 5: How ChatGPT Really Thinks About Your Brand

Edd Dawson Season 1 Episode 325


Forget neat rows of facts—your brand lives inside AI as a point on a vast map of meaning. We unpack how large language models like ChatGPT convert words into vectors, arrange them in a multi‑dimensional latent space, and “reason” by navigating probabilistic paths rather than retrieving certified entries from a knowledge graph. That shift explains both the astonishing creativity of LLMs and the stubborn problem of hallucinations, and it reveals why your content choices directly influence how machines see you.

We start by separating Google’s Knowledge Graph—built on labelled, verifiable relationships—from the statistical engine that powers LLMs. From there, we walk through tokens, embeddings, and the geometry of meaning: why “king” sits near “queen,” how “bank” splits by context, and how directions in vector space encode relationships like gender or capital cities. Then we explore probabilistic reasoning and chain‑of‑thought prompting, showing how stepwise guidance can reduce errors by constraining the model’s path through its internal map.

The practical payoff is clear: you can shape your brand’s coordinates. Consistent naming, precise definitions, structured internal linking, authoritative citations, and schema markup help AIs place you in the right neighbourhood of concepts. Pillar pages and topical clusters reinforce the connections that matter, while concise fact sheets and retrieval‑ready content give models the anchors they need to avoid plausible-but-wrong continuations. Think of every page as another vector pull toward accuracy; over time, your credibility becomes the shortest path the model can take.
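As a rough illustration of the schema markup and fact-sheet idea above, here is a minimal sketch in Python that assembles a schema.org Organization snippet as JSON-LD. The brand name, URLs, founder and profile links are hypothetical placeholders rather than a prescribed template; the point is simply that consistent, machine-readable statements like these give both search engines and LLM-backed tools unambiguous anchors for your entity.

```python
import json

# Minimal sketch of a schema.org Organization fact sheet as JSON-LD.
# Every value below is a hypothetical placeholder for illustration only.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand Ltd",
    "url": "https://www.example.com",
    "description": "Example Brand Ltd makes project-management software for small agencies.",
    "sameAs": [
        "https://www.linkedin.com/company/example-brand",
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder Wikidata ID
    ],
    "founder": {"@type": "Person", "name": "Jane Doe"},
}

# Paste the output into a <script type="application/ld+json"> tag on the page.
print(json.dumps(organization, indent=2))
```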

If this helped you see how AI really “thinks” about your brand, follow the show, share it with a colleague, and leave a quick review. Got a question you want answered on air? Send a voice message via the link in the show notes and tell us where you want your brand’s coordinates to land.

SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com

Help feed the algorithm and leave a review at ratethispodcast.com/seo

You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips

To get a personal, no-obligation demo of how KeywordsPeopleUse could help you boost your SEO, and a 7-day FREE trial of our Standard Plan, book a demo with me now

See Edd's personal site at edddawson.com

Ask me a question and get on the show Click here to record a question

Find Edd on LinkedIn, Bluesky & Twitter

Find KeywordsPeopleUse on Twitter @kwds_ppl_use

"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/

Edd Dawson:

Hello and welcome to SEO Is Not That Hard. I'm your host, Edd Dawson, the founder of the SEO intelligence platform KeywordsPeopleUse.com, where we help you discover the questions people ask online and then how to optimise your content to build traffic and authority. I've been in SEO and online marketing for over 20 years and I'm here to share the wealth of knowledge, hints and tips I've amassed over that time.

Hello and welcome back to SEO Is Not That Hard. It's me here, Edd Dawson, as always. And today we're on to the next episode in our special series about entities, and that is part five: how ChatGPT really thinks about your brand. So far we've focused on understanding Google when it comes to entities. We started by defining entities as real-world things, not strings, and looked at how search engines now prioritise those things. We looked at how Google reads our websites to identify those entities and then files that information away in its gigantic database, the Knowledge Graph. We learned that the Knowledge Graph is like a really well-organised library, with catalogue cards that detail the exact relationships between billions of facts. It knows Apple Inc. is an organisation and Steve Jobs is a person, and it knows the relationship between them is "founder".

But in the last couple of years there's a new player on the scene, and it's really changed the conversation in SEO. I'm talking, of course, about large language models, the technology behind tools like ChatGPT, Claude, Gemini and Perplexity. Those AI systems aren't just another feature of Google; they represent a fundamentally different way of processing and understanding information. And this brings up a really important question: if an LLM doesn't have a neat, labelled knowledge graph like Google does, then how does an AI that essentially only understands maths actually represent a complex entity like your brand? So today we're going to look at that question. We're leaving the organised library of the Knowledge Graph behind to venture into the fascinating but very different mathematical mind of an LLM.

The first and most important thing to understand is that an LLM is not a database, and this is a really crucial distinction. Google's Knowledge Graph is designed to be a database. It stores discrete, verifiable facts: the capital of France is Paris; Steve Jobs was the founder of Apple. When you ask Google a factual question like that, it goes into the Knowledge Graph, finds the right library book, the right part of the database, and reads you back a fact it is confident in. An LLM doesn't work that way. It's not retrieving stored facts. Fundamentally, an LLM is a giant, incredibly sophisticated statistical prediction engine. It's been trained on a colossal amount of text from the internet, from books, from articles, from all over, and its primary function is to do one thing: predict the next most probable word in any given sequence. So when you ask ChatGPT what the capital of France is, it doesn't know the answer in the way that a database does. Instead, it has analysed countless documents where that question appeared, and it has learned that the most statistically probable word to follow the sequence "the capital of France is" is going to be the word Paris. It's completing a pattern, it's not retrieving a fact.
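To make that "predict the next word" idea concrete, here is a minimal sketch, assuming the Hugging Face transformers and PyTorch libraries are installed, that loads the small open GPT-2 model and prints the tokens it considers most probable after the prompt "The capital of France is". A reasonably trained model will typically rank " Paris" very highly, not because it looked the fact up anywhere, but because that continuation is statistically the most likely.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small open model; the same principle applies to much larger LLMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Turn the scores for the *next* position into a probability distribution
# over the whole vocabulary, then show the five most probable continuations.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}  p={prob.item():.3f}")
```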
And it might sound like a small difference, but it really is fundamental to the whole difference between LLMs and everything that came before. It's what gives LLMs their most incredible power, but it's also what gives them their most dangerous flaw. So to get our heads around this, we need to understand how an LLM represents meaning itself. How does a machine that only understands numbers learn the meaning of our entities? It does this by creating a universal map of meaning, and this is where we connect the dots together.

The process starts by breaking our language down into a format the machine can use: it takes our text and splits it into smaller units called tokens. Then each token, whether it's a word, a phrase or a whole entity, is converted into a long list of numbers. This numerical representation is called a vector embedding. And this is the answer to our question: the vector embedding is the LLM's internal representation of an entity. It's the semantic core, the backbone that transforms a real-world concept into a string of numbers that the model can actually process.

So think of it like this: imagine you wanted to create a map of every concept in the world. To place a concept on that map, you need a set of coordinates, and a vector embedding is essentially a set of highly complex coordinates for an entity. But instead of just two coordinates, the X and the Y on a normal graph, or longitude and latitude on a map, these vectors can have hundreds or even thousands of coordinates, and these are the dimensions the entity is represented in. Each one represents a different feature or attribute of the entity's meaning. Once every entity has its coordinates, the LLM can place it as a point on a vast multi-dimensional map. This map is called a latent space, or a vector space, and the way this map is organised is what allows the AI to understand our world.

The model's training process organises the space so that entities with similar meanings, entities that frequently appear in similar contexts, are positioned close to one another. So the point on the map for the entity "king" will always be very close to the point for "queen". The point for "dog" will be near "puppy" and "canine". And crucially, for disambiguation of similarly named entities, the point for "bank" in a financial context, the kind of bank where you can get cash out or open an account, will sit in a neighbourhood with entities like "loan" and "account". The point for "bank" when we're talking about a river bank will be in a completely different region of the map, close to terms like "stream" and "shore". By doing this, the model learns to create different vector embeddings for the same word based on the other entities surrounding it, and that's what provides the semantic context.

But it gets even more incredible. The model doesn't just learn the proximity of terms, it learns relationships. The distance and direction, literal vector maths, between "king" and "queen" will be remarkably similar to the mathematical relationship between "man" and "woman". So it's learned the concept of gender as a direction on its map. Likewise, the vector relationship between the entity France and the entity Paris will be very similar to that between Japan and Tokyo, or the UK and London.
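To make that geometry a bit more concrete, here is a toy sketch using plain NumPy. The four-dimensional vectors are invented purely for illustration; real embeddings have hundreds or thousands of learned dimensions, but the same arithmetic shows how "closeness" and "direction" carry meaning.

```python
import numpy as np

# Hand-made, four-dimensional "embeddings" purely for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "queen": np.array([0.9, 0.1, 0.8, 0.7]),
    "man":   np.array([0.1, 0.8, 0.1, 0.2]),
    "woman": np.array([0.1, 0.1, 0.8, 0.2]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Closeness on the map of meaning: 1.0 means pointing the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Proximity: "king" sits much closer to "queen" than "man" does.
print(cosine_similarity(vectors["king"], vectors["queen"]))   # ~0.75
print(cosine_similarity(vectors["man"], vectors["queen"]))    # ~0.33

# Direction as relationship: king - man + woman lands (here, exactly) on queen.
analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(cosine_similarity(analogy, vectors["queen"]))           # 1.0
```

The last line is the classic analogy trick: subtracting the "man" direction from "king" and adding the "woman" direction lands you on "queen", which is exactly the kind of relationship the episode describes.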
What the model has done there is learn the concept of a capital city as a direction on its map. And this geometric arrangement of entities is how the model captures deep context, nuance and semantic relationships without needing a formal, human-built knowledge graph. It's built its own map of meaning based purely on the statistical patterns in the data it was trained on.

So if an LLM is a prediction engine using a giant map of meaning, how does it reason? Its reasoning process is completely different from the logical deduction of a traditional computer. It's more like reasoning by analogy and probability. When you give an LLM a prompt, it converts your words and the entities within it into a series of vectors to find its starting location on its map. Then, based on all the patterns it has learned, it calculates the most probable path to take from that point. It predicts the most likely next vector, which it translates back into a word, then the next, and then the next, generating a response token by token.

Now, this is what allows the model to make incredible, really creative leaps that a human might not. It can find abstract mathematical similarities between the vector clusters for seemingly unrelated topics. For example, it might find a structural pattern in its map that is shared between protein folding algorithms and urban traffic flow, and that will allow it to generate a novel insight about how one field could inform the other. It's not performing any kind of formal logic, it's just identifying and extending a really deep mathematical pattern.

This is the source of the LLM's greatest strength, but, as we mentioned earlier, it's also its most significant weakness. Because the model is always just completing a pattern rather than retrieving verified facts, it can generate highly plausible but entirely false information with absolute confidence: what we know as hallucinations. To address this, researchers have developed, and are continuing to develop, techniques to guide this probabilistic process and get better reasoning out of it. You might have heard of chain-of-thought prompting, or CoT. This is a technique where you explicitly instruct the model to think step by step before giving a final answer, and it's what most of the LLMs now do out of the box. It forces the model to break complex problems down into smaller, intermediate steps, much in the way a human might work out how to perform a task that requires any kind of logic. It's a way of putting guardrails onto that pattern-matching LLM brain to steer it towards hopefully more logical and more accurate conclusions.
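Here is a minimal sketch of what that looks like in practice, assuming the official openai Python client is installed and an OPENAI_API_KEY is set in the environment. The model name is just an example, and the only "technique" involved is the explicit step-by-step instruction added to the prompt.

```python
from openai import OpenAI

# Assumes the openai Python package is installed and OPENAI_API_KEY is set.
client = OpenAI()

question = (
    "A site publishes 3 articles a week and each article targets 4 related "
    "keywords. How many keywords does it target over 6 weeks?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; use whichever model you have access to
    messages=[
        {
            "role": "user",
            # The "think step by step" instruction is the guardrail: it pushes
            # the model through intermediate steps before it commits to an answer.
            "content": f"{question}\n\nThink step by step, then give the final answer.",
        }
    ],
)

print(response.choices[0].message.content)
```

Spelling out the intermediate steps constrains the path the model takes through its map, which tends to reduce, though not eliminate, plausible-but-wrong answers.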
So, bringing all that together, what we've covered today is how LLMs like ChatGPT have a completely different architecture for knowledge than Google's Knowledge Graph. They're not databases of facts, they're probabilistic prediction engines. They don't understand entities directly, they convert them into numerical coordinates called vector embeddings. They organise those coordinates on a giant multi-dimensional map of meaning called a latent space, where proximity and direction represent semantic relationships. And they reason by navigating this map and finding the most likely path forward, which allows for incredible creativity but also opens the door to those factual errors, those hallucinations.

So what's your takeaway from this today? Your goal is always to provide the clearest, most interconnected and factually accurate information you can about your niche, your topic. The reason why is that every piece of high-quality content you create helps those LLMs build a better, more accurate map of your corner of the world. When your website consistently provides clear definitions, logical structures and verifiable facts, you're helping the AI place the entities related to your brand in the right neighbourhood on its map, with the right connections to other authoritative concepts and ideas.

But this probabilistic nature, this tendency to complete a pattern even when it doesn't have the facts, does lead to the single biggest problem with this whole LLM technology: those AI hallucinations. So in our next episode we're going to tackle that problem head on. We'll look at why hallucinations happen and, more importantly, we'll discuss the new, groundbreaking strategies that allow you to position your website as the source of truth that helps ground those AIs in reality.

So that's it for today. Until next time, keep optimising, stay curious, and remember: SEO is not that hard when you understand the basics. Thanks for listening, it means a lot to me. This is where I get to remind you where you can connect with me and my SEO tools and services. You can find all the links I mention here in the show notes. Just remember, with all these places where I use my name, the Edd is spelled with two Ds. You can find me on LinkedIn and Bluesky, just search for Edd Dawson on both. You can record a voice question to get answered on the podcast, the link is in the show notes. You can try my SEO intelligence platform Keywords People Use at keywordspeopleuse.com, where we can help you discover the questions and keywords people are asking online, sort those questions and keywords into related groups so you know what content you need to build topical authority, and, finally, connect your Google Search Console account for your sites so we can crawl and understand your actual content, find what keywords you rank for, and then help you optimise and continually refine your content with targeted, personalised advice to keep your traffic growing. If you're interested in learning more about me personally or you're looking for dedicated consulting advice, then visit edddawson.com. Bye for now, and see you in the next episode of SEO Is Not That Hard.
