Google Search is often considered the gateway to the Internet: it is the first step in most people’s journey to online information. However, Google doesn’t say much about how it organizes the Internet, making search a giant black box that dictates what we know and what we don’t know. This week, a 2,500-page leak, first reported by search engine optimization (SEO) veteran Rand Fishkin, gave the world a glimpse into the 26-year-old mystery of Google Search.
“I think the most important thing to remember is that what Google’s public representatives say and what the Google search engine does are two different things,” Fishkin said in an emailed statement to Gizmodo.
These documents provide a more detailed view of how Google Search controls the information we consume. Getting the right web page to your computer is not a passive task since thousands of editorial decisions are made on your behalf by a secret group of Googlers. For SEO, an industry that lives and dies by Google algorithms, the leaked documents are an earthquake. It’s like NFL referees rewrote the rules of football in the middle of the season, and you find out while playing in the Super Bowl.
Several SEO experts told Gizmodo that the leak lists 14,000 ranking features that, at the very least, define how Google organizes everything on the web. Some of these factors include Google’s determination of a website’s authority on a given topic, the size of the website, or the number of clicks a web page receives. Google has previously denied using some of these ranking features in search, but the company has confirmed that these documents are real, although imperfect in its eyes.
“We caution against making inaccurate assumptions about search based on out-of-context, outdated, or incomplete information,” a Google spokesperson said in an email to Gizmodo. “We have shared a lot of information about how Search works and the types of factors our systems take into account, while striving to protect the integrity of our results from manipulation.”
Despite urging caution, the company will not confirm what is and isn’t accurate in these documents. Google says it’s incorrect to assume this is complete search information and tells Gizmodo that disclosing too much information could empower bad actors. Ultimately, we don’t know what goes into determining these factors, or how much weight Google Search gives to each, if any.
“We’re just looking at different variables that they take into account,” Mike King, an SEO expert who was one of the first to analyze the leak, said in an interview with Gizmodo. “It’s the granularity with which (Google) examines websites.”
This leak was first noticed by Erfan Azimi, an SEO specialist who found the API documentation publicly available on GitHub. It’s unclear whether these documents were actually “leaked” or somehow published by Google in a quiet corner of the web, perhaps by accident. Azimi sought to make these documents public and brought them last week to Fishkin, who asked King for help in understanding them.
King notes that a ranking feature called “homepagePagerankNs” suggests that the prominence of a website’s homepage could boost everything the site publishes. Fishkin writes that the leak refers to a system called NavBoost, first mentioned by Google search vice president Pandu Nayak in his Justice Department testimony, which supposedly measures clicks to improve Google search rankings. Many in the SEO industry view these documents as confirmation of what the industry has long suspected: A website deemed popular by Google may receive a higher search ranking for a query, even though a lesser-known site may contain better information.
In recent months, several small publishers have seen their Google search traffic disappear. When The Verge’s Nilay Patel asked Google CEO Sundar Pichai about it last week, Pichai said he was unclear “whether this is a uniform trend.” A ranking feature mentioned by King appears to categorize these small sites uniformly.
“They have a feature called ‘smallPersonalSite,’ and we don’t know how it’s used of course, but it indicates that (Google) is looking to understand if these are smaller sites,” King said. “Since so many of these smaller sites are being crushed right now, it just shows that (Google) is not doing anything to offset the signals from these big brands.”
Notably, Pichai later mentioned in that interview with The Verge that at other times Google had driven more traffic to smaller sites. These ranking features could indicate what levers Google can pull. As more national media organizations allow their content to be displayed on ChatGPT, Google Search also appears to be shifting toward larger publishers. Generally speaking, this could have a crushing effect, narrowing what most people hear to mainstream media alone.
The ripple effects of these Google document leaks have been widely felt. Kristen Ruby, CEO of Ruby Media Group, who has worked in digital PR and SEO for over 15 years, tells Gizmodo that she received a disturbing text Monday evening: “Tomorrow is shit with Google.”
Ruby quickly found the leak and noted two ranking features that stood out to her: “isElectionAuthority” and “isCovidLocalAuthority.” These features appear to be Google’s way of ranking a web page’s credibility in terms of providing appropriate information about elections and COVID-19, respectively. In 2019, Ruby wrote extensively about how Google’s measure of trustworthy web pages (which Google calls E-E-A-T, meaning Experience, Expertise, Authoritativeness, and Trustworthiness) is inherently political. She notes that Google’s measurement of these factors tends to skew along political lines.
“I have a problem with Google not providing any context on critical pieces of data like ‘isElectionAuthority’ or ‘isCovidLocalAuthority.’ How does Google define authority in these critical areas?” Ruby said in an emailed statement. “I shouldn’t have to guess what the answer is. Google should be available and tell me what the answer is.”
Even though Google is a company with a right to private information, Ruby argues that Google has an obligation to answer questions about these ranking features that shape the world around us. King and Fishkin also noted “isCovidLocalAuthority” and “isElectionAuthority” in their articles about the leak, both emphasizing the importance of search engines in improving the quality of information.
“I think it’s really important that they provide that kind of insight into information, because whether we like it or not, Google is indeed a public utility,” King said. “They’d probably push back on me for saying that, but we consider it the leading news source on the web.”
The way Google categorizes information in these examples is a microcosm of the entire search ecosystem. Every day, millions of questions arise about what information to amplify and what to silence. While Google and several tech companies have long tried to portray themselves as opinionless algorithms, these ranking features show that’s not quite the case. There are many more examples of ranking features revealed in the 2,500-page leak.
Searching for answers using the Google algorithm
Since Google won’t expand on these documents, telling Gizmodo that disclosing too much information could empower bad actors, SEO experts need to make sense of this on behalf of everyone who uses Google Search. Several of those 14,000 ranking features identified last week are things that Google has explicitly said it hasn’t used over the years.
In a 2016 video, a Google Search representative said, “We don’t have a website authority score.” In a 2015 interview, another Googler said, “Using clicks directly in ranking would be a mistake.” It’s difficult to make sense of these comments today in light of the leaked documents and Google’s response.
“This response is a perfect example of why people don’t like or trust Google,” Fishkin said. “This is a non-statement that fails to address the leak, adds no value, and could very well have been written by an AI trained on the most soulless corporate email of the last decade.”
In the age of AI answers, Ruby notes that how Google ranks web pages is more important than ever. Instead of a series of links to various perspectives, you might get a single answer through Google’s new AI Overviews. However, we’ve seen 10-year-old Reddit posts gain strange levels of authority, with some users being told to put glue on their pizza. How Google chooses authority is increasingly important, as the top result may be the only one that now has a voice.
“We’re shifting gears. We’re moving from one search system to another,” Ruby said. “AI has a profound impact on search results.”
Ultimately, it’s hard to say what Google is actually doing with these ranking features. What is clear is that Google created these classifiers, and potentially has even more, to rank websites on the Internet. These rankings clearly require judgments, adding more evidence that Google Search is not an objective experience, but rather a series of editorial choices made by people within Google.