SEO Is Not That Hard

EMERGENCY PODCAST - Google Algorithm Document Leak

Edd Dawson

Breaking News: Google Algorithm API documents leaked!

Link to Rand Fishkin's Tweet: https://x.com/randfish/status/1795282226038624418
Link to Rand Fishkin's Article: https://sparktoro.com/blog/an-anonymous-source-shared-thousands-of-leaked-google-search-api-documents-with-me-everyone-in-seo-should-see-them/
Link to Mike King's Article: https://ipullrank.com/google-algo-leak
Link to leaked documents: https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-reference.html

SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com

You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips

To get a personal no-obligation demo of how KeywordsPeopleUse could help you boost your SEO, and a 7 day FREE trial of our Standard Plan, book a demo with me now.

See Edd's personal site at edddawson.com

Ask me a question and get on the show Click here to record a question

Find Edd on Linkedin, Bluesky & Twitter

Find KeywordsPeopleUse on Twitter @kwds_ppl_use

"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/

Speaker 0:

Hello, it's Edd here. Welcome to this breaking news edition of SEO Is Not That Hard. This is an out-of-sequence podcast today because there's been some breaking news, and that is that a whole load of internal Google Search API documentation has been leaked onto the internet. Now this is rather interesting because Google has always been a black box. We've never been able to see anything on the inside of it. We can only ever work things out from the information that Google let out themselves, and from the experience of our own SEO efforts as to what does and doesn't work. We've always had to trust what Google tell us, and plenty of times there's been some misinformation. We all know that. Now this document leak was put out on Twitter today by Rand Fishkin, and he also published a write-up of what he found in it. So did a guy called Mike King from iPullRank. Rand had shared these documents with Mike over the weekend, and Mike has also put out a write-up, along with links to the actual API documents themselves. Now, this is really early days. I've only spent maybe a couple of hours looking at this myself, so there's still loads more to learn from this, but I wanted to break the news that these documents are now available. I will put links in the show notes to Rand's write-up, to Mike's write-up and also to where you can actually find the docs.

Speaker 0:

There are some headline takeaways that have come out of it, though, which are quite interesting. The first one is click data. There are lots of references to how Google use data on click interactions from the search results, for example whether people are pogo-sticking, that is, jumping from the search results to web pages and then back to the search results, and how that can have an effect on rankings. There's lots of data about Chrome clickstreams, and that's how Google takes actual, real-time user data from how you use Chrome. Now, billions of people worldwide use Chrome to surf the web, and this covers what they're actually clicking on, how long they're spending on pages, and things like that. There's lots in there about that. That's very interesting. Then there's how Google are using click data for link weighting. Now this is a really, really interesting one, apparently.

Speaker 0:

Let me just check here. Yeah, so Google classify their link indexes into three buckets, three tiers: low, medium and high quality. They use click data to work out which tier a link should be placed within. So, for example, if a link is getting no clicks in the clickstream data Google is picking up from places like Chrome, then Google's going to consider that a low quality link and ignore it. But if there's a high volume of clicks from verifiable devices, so, like I said, Chrome data, then that link would go into the high quality index and it's going to pass ranking signals like PageRank. So there's lots to think about there in terms of the kind of links that people build, especially if you're buying links. I've made my thoughts clear in previous episodes on whether that's a good idea or not, but if you are doing things like that, links from sites and pages that are never going to get seen or clicked by anybody are clearly going to get ignored. So that's some really interesting stuff there.
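The tiering idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Link` class, the `link_tier` function and both click thresholds are hypothetical, because the leaked documentation as discussed here gives no numbers or code, and what qualifies as the medium tier is not described in the episode.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
# The episode says "no clicks" means low and "a high volume" means high;
# the actual cut-offs (if any exist in this form) are unknown.
MEDIUM_MIN_CLICKS = 1
HIGH_MIN_CLICKS = 100


@dataclass
class Link:
    source_url: str
    target_url: str
    clicks: int  # clicks observed on this link in some clickstream sample


def link_tier(link: Link) -> str:
    """Place a link into a low / medium / high quality tier by click volume."""
    if link.clicks >= HIGH_MIN_CLICKS:
        return "high"
    if link.clicks >= MEDIUM_MIN_CLICKS:
        return "medium"
    return "low"


def links_that_count(links: list[Link]) -> list[Link]:
    """Drop low-tier links, mirroring the claim that unclicked links are ignored."""
    return [link for link in links if link_tier(link) != "low"]


sample_links = [
    Link("https://example.com/popular-post", "https://mysite.example/", 250),
    Link("https://example.org/resources", "https://mysite.example/", 12),
    Link("https://example.net/buried-page", "https://mysite.example/", 0),
]
tiers = [link_tier(link) for link in sample_links]
kept = links_that_count(sample_links)
```

In this toy sample the link from the page nobody clicks falls into the low tier and is filtered out, which is the practical point being made about links on pages that are never seen.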

Speaker 0:

There are some other interesting attributes that I came across. First, a site authority attribute. Google for years have said there's no such thing as a site weighting, so domain rating, that kind of thing. But there is mention of a site authority attribute within the documentation that goes site-wide and can be applied to web pages. Next, we've got an attribute called host age, and that is used, it says in the document specifically, to "sandbox fresh spam in serving time". So this is an attribute where they obviously look at how old a website is, the age of the host, and then they can apply it to essentially sandbox the site. I know the sandbox is one of those topics that people have argued for and against a lot over time, but clearly there is some kind of sandboxing in there. Now, when it applies and for what length of time, we don't know. How they decide if they're going to apply it, we don't know. There might be ways out of it. For example, if a site gets a good number of links from authoritative sources when it's brand new, it might bypass this sandbox filter. But there definitely is something in there called host age. Now, two that I found particularly interesting.

Speaker 0:

I've got a big interest in topical authority. Building topical authority is what I've tried to do for years with websites, I think reasonably successfully, and there are two attributes in there. The first one is called site focus score, and this is how much a site is focused on one topic. So obviously Google is looking at sites and trying to work out: is this a general site? Is it on one topic? What range of topics is it on? So they've got that site focus score, and to go with that they've also got a site radius attribute, which is a measure of how far away from the main topic of a site a page strays. So, for example, if you put a new page up and it's not on topic with the current site, then the radius score is going to be high because the page is further away from the core topic of the site. So that's quite an interesting one. There's a lot more research to do on this, because it might show some interesting ways of working out how tightly a site's pages sit around its core topic. When trying to decide which new pages to add to a site, it might be interesting to try and work out with embeddings how close to, or how far away from, the center of that topic area pages are, and it might be helpful for identifying which core pages need to be worked on. So there's something really interesting to think about there.
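The embeddings idea mentioned above could be sketched roughly as follows. This is an illustration under loud assumptions, not Google's method: the toy three-dimensional vectors stand in for real page embeddings from whatever embedding model you use, and `page_radii` and `focus_score` are hypothetical names and formulas (cosine distance from the site's mean embedding, and one minus the average of those distances).

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Mean vector of all page embeddings, used as the site's 'topic center'."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, larger means further apart."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def page_radii(page_embeddings: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical 'radius' per page: distance from the site's topic center."""
    center = centroid(list(page_embeddings.values()))
    return {url: cosine_distance(vec, center) for url, vec in page_embeddings.items()}


def focus_score(radii: dict[str, float]) -> float:
    """Hypothetical 'focus' for the site: 1 minus the average page radius."""
    return 1.0 - sum(radii.values()) / len(radii)


# Toy embeddings (assumption): three on-topic SEO pages and one off-topic page.
pages = {
    "/seo-basics": [0.9, 0.1, 0.0],
    "/keyword-research": [0.8, 0.2, 0.0],
    "/link-building": [0.85, 0.15, 0.05],
    "/banana-bread-recipe": [0.0, 0.1, 0.95],
}
radii = page_radii(pages)
furthest_page = max(radii, key=radii.get)
site_focus = focus_score(radii)
```

With these toy vectors the recipe page comes out with the largest radius, which is the kind of signal you could use to spot pages that stray from a site's core topic before deciding what to publish or rework.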

Speaker 0:

This leak covers two and a half thousand modules, with over 14,000 attributes across these modules. There's a huge amount of data here. The caveats are that we obviously don't know which, if any, of these modules and attributes might be deprecated, as in no longer being used by Google. Date-wise, it looks like these docs are somewhere between a month or so old and up to a year old, so there is maybe a time gap in there where some things might be out of date.

Speaker 0:

In terms of its authenticity, Rand specifically mentions in his post how he's approached various ex-Googlers to see if they think it looks genuine, and the kind of feedback he's getting is that yes, it does look genuine. If it is a hoax, someone has spent an awful lot of time creating this, even if it was done with AI. And if someone had used AI to try and create this, I think the two guys who've looked at it so far, Rand and Mike, probably would have picked up on the AI and the inconsistencies it would leave. So I think it looks genuine, and ex-Googlers think it looks genuine. But, yeah, obviously the caveats are that we don't know what isn't there and we don't know what's deprecated.

Speaker 0:

Again, with all of these things, there is no further detail on the relative importance of different things. So, while it definitely clears up some questions that people have had for many years over what has or hasn't been included, and it definitely gives lots of information there, I suspect there's going to be a lot of research and analysis done by lots of people on this. There's going to be loads more coming out, so there's going to be lots to watch here, but I just wanted to get it out quickly and get it shared so you can take a look yourself. I'd love to hear everyone's thoughts and any interesting things you find. Just get in touch in the usual ways and, yeah, I hope it's useful, and good luck reading into it.