Google’s Site Quality Algorithm Patent

A Google patent describes a method of classifying sites as low quality by ranking the links. The algorithm patent is called, Classifying Sites as Low Quality Sites.  The patent names specific factors for identifying low quality sites.

It’s worthwhile to learn these factors and consider them. There’s no way to know if they are in use. But the factors themselves can help improve SEO practices, regardless if Google is using the algorithm or not.


An Obscure Link Algorithm

This patent dates from 2012 to 2015. It corresponds to the time that Penguin was first released.

There have only been a few discussions of this algorithm. It has, in my opinion, not been discussed in the detail offered below. As a consequence, it seems that many people may not be aware of it.

I believe this is an important algorithm to understand. If any parts of it are in use, then it could impact the SEO process.


Just Because it’s Patented…

What must be noted in any discussions of patents or research papers is that just because it’s patented does not mean it’s in use. I would also like to point out that this patent dates from 2012 to 2015. This corresponds to the time period of the Penguin Algorithm.

There is no evidence that this is a part of the Penguin Algorithm. But it is interesting because it is one of the few link ranking algorithms we know about from Google. Not a site ranking algorithm, a link ranking algorithm. That quality makes this particular algorithm especially interesting.

Although this algorithm may or may not be use, I believe that it is worthwhile to understand what is possible. Knowing what is possible can help you better understand what is not possible or likely. And once you know that you are better able to spot bad SEO information.


How the Algorithm Ranks Links

The algorithm is called Classifying Sites as Low Quality. It works by ranking links, not the content itself. The underlying principle can be said to be that if the links to a site are low quality then the site itself must be low quality.

This algorithm may be resistant to spammy scraper links because it only comes into play after the ranking algorithm has done it’s work. It’s the ranking algorithm that includes Penguin and other link related algorithms. So once the ranking engine has ranked sites, the link data that this algorithm uses will likely be filtered and represent a reduced link graph. A reduced link graph is a map of the links to and from sites that have had all the spam connections removed.

The algorithm ranks the links according to three ranking scores. The patent calls these scores, “quality groups.”

The scores are named Vital, Good, and Bad.

Obviously, the Vital score is the highest, Good is medium and Bad is not good (so to speak!).

The algorithm will then take all the scores and compute a total score. If this score falls below a certain threshold then the site or page itself is deemed low quality.

That’s my plain English translation of the patent.

Here is how the the patent itself describes itself:

“The system assigns the resources to resource quality groups (310). Each resource quality group is defined by a range of resource quality scores. The ranges can be non-overlapping. The system assigns each resource to the resource quality group defined by the range encompassing the resource quality score for the resource. In some implementations, the system assigns each resource to one of three groups, vital, good, and bad. Vital resources have the highest resource quality scores, good resource have medium resource quality scores, and bad resources have the lowest resource quality scores.”

Implied Links

The patent also describes something called an Implied Link. The concept of implied links must be explained before we proceed further.

There is an idea in the SEO community that Implied Links are unlinked citations. An unlinked citation is a URL that is not a link, a URL that cannot be clicked to visit the site. However, there are other definitions of an Implied Link.

A non-Google researcher named Ryan Rossi describes a Latent Link as a sort of virtual link. Latent means something that is hidden or can’t be readily seen. The paper is called, Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam in Adversarial Information Retrieval

A latent link happens when site A links to Site B, and Site C links to Site A. So you have this: Site A > Site B > Site C. The implied link exists between Site A and Site C.

Leave a Reply