SEO Is Not That Hard

The Actual Surfer PageRank Model?

June 10, 2024 Edd Dawson Season 1 Episode 118


Could the next major shift in SEO be hidden within a recent Google API documentation leak? Find out as we uncover what could be the "actual surfer model," a potential new approach to PageRank that promises to change how we understand link quality and web page ranking. Join host Edd Dawson in this episode of "SEO Is Not That Hard," where we trace the evolution of Google's PageRank from the original "random surfer" model, designed to identify the most significant pages by simulating random clicks, to the more refined "reasonable surfer" model, which predicts the most likely user behaviour.

In this discussion, Edd dissects the leaked documentation, revealing how Google categorises links into high, medium and low-quality tiers. Learn about the implications of these tiers for your SEO strategy as Edd explains the nuances of high-quality base documents, medium-quality supplemental documents and low-quality black hole documents. The episode offers practical takeaways for both seasoned SEO professionals and newcomers eager to stay ahead in the ever-evolving landscape of search engine optimisation.

SEO Is Not That Hard is hosted by Edd Dawson and brought to you by KeywordsPeopleUse.com

You can get your free copy of my 101 Quick SEO Tips at: https://seotips.edddawson.com/101-quick-seo-tips

To get a personal no-obligation demo of how KeywordsPeopleUse could help you boost your SEO then book an appointment with me now

See Edd's personal site at edddawson.com

Ask me a question and get on the show Click here to record a question

Find Edd on Twitter @channel5

Find KeywordsPeopleUse on Twitter @kwds_ppl_use

"Werq" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/

Transcript

Speaker 1:

Hello and welcome to SEO Is Not That Hard. I'm your host, Edd Dawson, the founder of keywordspeopleuse.com, the place to find and organise the questions people ask online. I'm an SEO, developer, affiliate marketer and entrepreneur. I've been building and monetising websites for over 20 years, and I've bought and sold a few along the way. I'm here to share with you the SEO knowledge, hints and tips I've built up over the years.

Speaker 1:

Hello and welcome to this latest episode of SEO Is Not That Hard. It's Edd here, and today I'm going to be talking about some findings from the Google API documentation leak, specifically around what I'm calling a potential new PageRank model: the "actual surfer" model. If you've listened to any of the previous episodes about PageRank, or know anything about it, the original PageRank was based on a "random surfer" model. Google allocated PageRank by modelling a surfer (a surfer being any kind of web user) randomly clicking on links on web pages, again and again, recursively, and working out from that which were the most significant pages on the link graph, assigning PageRank accordingly. That was the very earliest model, and it was very susceptible to link manipulation. When people realised they could just add links from any web page pointing to their own page and raise their PageRank, it was relatively easy to spam that link model. Then in the early-to-mid 2000s, Google updated the model to what they called the "reasonable surfer" model. Rather than treating every link on a page as equal, they tried to work out which links were most likely to be clicked, because fundamentally people are less likely to click footer links than a link in the main navigation or within the context of the content on a page. Google was then asking: it's not a random click, so which links is someone most reasonably likely to click on? And again, this changed how linking worked and how they valued different links.
At the same time they also introduced the nofollow attribute, telling publishers: if you're using sponsored links, or links you don't want to flow PageRank through, then use the nofollow attribute on those links. Again, this was Google trying to combat link spam. There were a few other things too, like seed sites: more trusted sites that would flow a high level of PageRank in the first instance, so the closer you were to a seed site on the link graph, the less likely you were to be spam. And that's where we were.
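To make the "random surfer" idea above concrete, here is a minimal sketch of the classic PageRank iteration from the original papers: with probability d the surfer clicks a random outlink, otherwise jumps to a random page. This is the textbook formulation, not Google's actual implementation; the toy graph and damping factor are illustrative.

```python
def pagerank(links, d=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # (1 - d) is the chance the surfer jumps to a random page
        new = {p: (1 - d) / n for p in pages}
        for p, outlinks in links.items():
            if outlinks:
                # random surfer: every outlink is treated equally
                share = rank[p] / len(outlinks)
                for q in outlinks:
                    new[q] += d * share
            else:
                # dangling page: spread its rank evenly across all pages
                for q in pages:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
```

In this toy graph "c" is linked from both "a" and "b", so it ends up with the highest score. The "reasonable surfer" refinement would simply replace the equal `share` per outlink with per-link weights reflecting how likely each link is to be clicked.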

Speaker 1:

Now, the Google leak has some interesting things in it about links, and one of them is that every link in the link graph is marked with a source type attribute, set to either type high quality, type medium quality or type low quality. That gives them three buckets in the link graph. In the descriptions and comments on the API documentation, they say: anchors marked type high quality are from base documents; anchors marked type medium quality are from documents of medium quality, roughly but not exactly supplemental tier documents; and anchors marked type low quality are from documents of low quality, roughly but not exactly black hole documents. So you can see there are three core tiers of link quality, or really of the quality of the document a link is coming from. The high quality ones are from what they call base documents, which is probably their top tier of the link graph. Then there are the medium quality ones, which they call the supplemental tier, and anyone who's been in SEO a very long time will remember that back in the early days, I think prior to 2010, there was a thing known as the supplemental index, and you used to see results on Google.

Speaker 1:

If you put in a really long-tail query, you would get results marked as being in the supplemental index. That was an index Google would fall back to if its main index didn't have any results; it would then search the supplemental index. People used to spend a lot of time trying to get content out of the supplemental index and into the main index, because you were much more likely to be shown from the main index. So it's interesting to see that sort of supplemental tier of documents still mentioned in this API leak. And then there's type low quality, which they essentially call black hole documents. That's the really, really low quality stuff.

Speaker 1:

Now, the interesting thing is how Google decides whether a source document, the source web page, is type high quality, medium quality or low quality. In Rand Fishkin's original blog post on SparkToro, which broke the news of the Google API leak, he mentions that in conversations with the person who leaked the information, they said Google has these three buckets for classifying the link indexes: the low, medium and high quality tiers we've just covered. And they say click data is used to determine which link graph index tier a document belongs to, held in a total clicks attribute attached to the document record. So what they're saying is that if a page gets no clicks, it goes into the low quality index and its links are essentially ignored for ranking purposes. But if a page has a high volume of clicks from verifiable devices (this is from Chrome data), it goes into the high quality index and its links pass ranking signals. And then obviously there's the medium tier as well, which is potentially where Google isn't yet sure: the page isn't getting enough traffic to warrant passing link equity, but it isn't getting zero either, so it sits in a grey area, and we don't know exactly what happens in that medium tier.
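The bucketing described above could be sketched as a simple threshold function on the total clicks attribute. To be clear, this is a hypothetical illustration: the leak names the three tiers, but the actual thresholds (and the `high_threshold` value here) are invented for the sake of the example.

```python
def link_index_tier(total_clicks, high_threshold=1000):
    """Bucket a source document by click volume, per the leaked tier names.
    The numeric threshold is an assumption, not from the leak."""
    if total_clicks == 0:
        # "black hole" tier: links from here are ignored for ranking
        return "low_quality"
    if total_clicks >= high_threshold:
        # "base document" tier: links from here pass ranking signals
        return "high_quality"
    # "supplemental" tier: grey area, treatment unknown
    return "medium_quality"
```

The key point the function captures is the hard cut-off at zero clicks: a page with no verified traffic contributes nothing to the link graph, whatever its other characteristics.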

Speaker 1:

Now, this is interesting because Google is getting this click data from Chrome, from literally billions of users worldwide. We now know from these leaks that Google is watching that click data, so it knows what pages are being loaded and can therefore see how much traffic individual pages are getting. And it's not just the traffic Google itself is sending (it would obviously know about people clicking on its own search results to go to a page); it also now knows how people navigate sites, independently of how they got there. This means Google can attribute actual traffic to a page. So if people are creating web pages purely to put links on, and no one is really visiting those pages, Google can now easily determine that a page is getting no traffic. There's no actual surfer on that page, so no actual surfer is clicking that link, and it can be excluded from the link graph. And on pages getting high volumes of traffic, Google knows there are plenty of visitors, so actual surfers are likely to be clicking those links.

Speaker 1:

Now, it doesn't mention anywhere that Google records actual clicks on individual links. That would be a very interesting bit of information if they did, and I can't see technically why they couldn't do that with Chrome, but maybe it's just too much data for them and they're only going on how many people are on the page. They may then apply the reasonable surfer model, looking at where links sit on the page, to decide how to flow PageRank through the different links, while knowing there are lots of actual surfers on that page. If you think about it, this is a really good way of identifying link spam and ignoring it, because clearly there's no point passing PageRank through any link on a page that doesn't get any traffic.
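The combination just described could look something like this: gate a page's outlinks on whether the page gets real traffic (the "actual surfer" part), then split whatever PageRank it passes using per-link weights (the "reasonable surfer" part). This is speculation sketched in code; the function name, weights and numbers are all illustrative assumptions, not anything from the leak.

```python
def passed_rank(page_rank, page_clicks, link_weights):
    """Distribute a page's passable PageRank across its outlinks.

    link_weights: dict of target -> relative likelihood of a click
    (e.g. an in-content link weighted above a footer link).
    """
    if page_clicks == 0:
        # no actual surfers on the page: no link passes anything
        return {target: 0.0 for target in link_weights}
    # reasonable-surfer split: share rank in proportion to click likelihood
    total = sum(link_weights.values())
    return {target: page_rank * w / total for target, w in link_weights.items()}

flows = passed_rank(0.4, page_clicks=250,
                    link_weights={"in_content": 3.0, "sidebar": 1.0, "footer": 0.2})
```

Here the in-content link receives the largest share of the 0.4 being passed, the footer link the smallest, and a zero-traffic page would pass nothing at all.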

Speaker 1:

Now, could people fake clicks? Yes, you could fake clicks.

Speaker 1:

But I think Google is doing some very specific things to decide whether an individual Chrome user is a real user. They obviously have things like reCAPTCHA, so they know how people use web browsers: how the mouse moves around, what kinds of actions people take. Do they go and watch YouTube videos at any point? Do they do all sorts of other things? If a Chrome user only ever visits a small set of sites and performs a narrow set of actions, Google could easily discount them. So I think, while not impossible, it would be very hard to fake clicks, certainly at the scale that would move the needle here. And that's why I'm thinking of this as a kind of "actual surfer" model.

Speaker 1:

So it's interesting to see how Google might use this to combat link spam, and I think it shows that if you're buying very low quality links, they are probably being completely devalued, of no use to you whatsoever, and probably a complete waste of time. On the flip side, it suggests that digital PR links, where people get links placed on high traffic sites like high traffic news sites, could be more valuable than they first seemed, even if, for example, they are nofollowed. Lots of newspaper websites automatically nofollow all their links, and a few years ago Google changed how it described its treatment of nofollow links: from "we will not pass PageRank under any circumstances from a nofollow link" to "we will take the nofollow as a hint" when it comes to flowing PageRank. In other words, they're saying they might pass PageRank through a nofollow link. And the question was always: how would they decide whether to flow any PageRank through a nofollow link? What circumstances might make them decide to pass PageRank through nofollow links?

Speaker 1:

It could well be that if a nofollow link is on a highly trafficked web page, Google may look at it and say: lots of people are coming to this page, it's a really popular page, and even though it's a nofollow link we're going to pass PageRank through it, or a certain amount of PageRank. It may also come down to other things, like where on the page the link is placed. If it's in context, surrounded by lots of text that is contextually relevant to the page being linked to, Google might decide it looks more editorial, even though it's been flagged nofollow. They might say: this site nofollows everything, so rather than treating it as a blanket nofollow, let's try to work out which of these nofollow links are actually editorial. And this could be a way of doing it: based on traffic, and based on the context of the link and where it sits on the page. Under the reasonable surfer model, they might conclude that a link is being clicked by reasonable surfers who are following contextual links to suitably contextual landing pages, and then decide to pass PageRank through it.

Speaker 1:

So I think, if anything, this information makes me think that digital PR nofollow links might have a lot more value than I originally thought, and it's a clear and obvious way that the really spammy, low quality links are probably being completely discounted. You know, PBN links, that kind of thing, where the PBNs get no traffic because people are creating sites just to put links on: they're completely worthless and not worth bothering with. Anyway, I just wanted to share these thoughts. It's only a theory, only a hypothesis, so there may be more to learn here, but I thought it was worth sharing, and I'd be interested if anyone's got any thoughts on this. Do get in touch. Until next time, I'll see you later.

Speaker 1:

Before I go, I just wanted to let you know that if you'd like a personal demo of our tools at Keywords People Use, you can book a free, no-obligation, one-on-one video call with me, where I'll show you how we can help you level up your content by finding and answering the questions your audience actually have. You can also ask me any SEO questions you have. You just need to go to keywordspeopleuse.com/demo, where you can pick a time and date that suits you for us to catch up. Once again, that's keywordspeopleuse.com/demo, and you can also find that link in the show notes of today's episode. Hope to chat with you soon. Thanks for being a listener, I really appreciate it. Please subscribe and share, it really helps.

Speaker 1:

SEO Is Not That Hard is brought to you by keywordspeopleuse.com, the place to find and organise the questions people ask online. See why thousands of people use us every day. Try it today for free at keywordspeopleuse.com. To get an instant hit of more SEO tips, find the link to download a free copy of my 101 Quick SEO Tips in the show notes of today's episode. If you want to get in touch or have any questions, I'd love to hear from you. I'm @channel5 on Twitter, or you can email me at podcast@keywordspeopleuse.com. Bye for now, and see you in the next episode of SEO Is Not That Hard.