Have you ever noticed that Google can handle almost any question you throw at it these days?
Just look at the result for this query:
Despite not mentioning Yoda by name, Google understood who we were talking about, and what we wanted to know about him.
This wouldn’t be possible without semantic search.
Semantic search is an information retrieval process used by modern search engines to return the most relevant search results. It focuses on the meaning behind search queries rather than on traditional keyword matching.
The terminology comes from a branch of linguistics called semantics, which is concerned with the study of meaning.
Although there are countless variables at play, the principles of semantic search, why it’s needed, and how it’s influenced are easy to understand.
Users often don’t use the same language as the desired content
Even worse, we sometimes don’t even know how to articulate a search query properly.
Let’s say that you heard an unfamiliar song on the radio. You liked it and started Googling random lyrics until you finally found it.
To add another layer of complexity, compare what you type into Google with what you say to Siri, Alexa, or the Google Assistant. Keywords now become conversations.
There are just so many ways to express the same idea, and search engines need to deal with all of them. They need to be able to match the content in their index with your search query based on the meaning of both.
However challenging this may sound already, it’s just the beginning.
Many searches are unintentionally ambiguous
Around 40% of English words are polysemous: they have two or more meanings. Polysemy is arguably the most significant challenge that semantic search is trying to solve.
For example, the keyword “python” has 533,000 monthly searches in the US alone:
If I were to ever search for “python,” I’d most likely be referring to the programming language. But anyone outside of the tech industry would likely expect the actual snake or the legendary British comedy troupe.
The problem here is that words rarely have a definitive meaning without context. On top of the polysemous words, you have countless nouns that can also be adjectives, verbs, or both. And we’re still in the scope of literal meanings. It gets even more interesting if we delve into inferred meanings (think sarcasm).
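To see why context matters, here is a minimal sketch of how surrounding words can hint at the intended meaning of “python.” The sense lists are invented for illustration, and real search engines use far more sophisticated models than simple word overlap.

```python
# Toy sense inventory: each sense of "python" is described by a few context words.
# These word lists are invented for illustration only.
SENSES = {
    "programming language": {"code", "install", "library", "script", "tutorial"},
    "snake": {"bite", "habitat", "length", "pet", "venom"},
    "comedy troupe": {"monty", "sketch", "film", "cast", "circus"},
}


def guess_sense(query: str, ambiguous_word: str = "python") -> str:
    """Pick the sense whose context words overlap most with the rest of the query."""
    context = {w for w in query.lower().split() if w != ambiguous_word}
    scores = {sense: len(context & words) for sense, words in SENSES.items()}
    best_sense, best_score = max(scores.items(), key=lambda item: item[1])
    # With no overlapping context, the query stays ambiguous.
    return best_sense if best_score > 0 else "ambiguous"


print(guess_sense("python install library"))  # programming language
print(guess_sense("python bite venom"))       # snake
print(guess_sense("python"))                  # ambiguous
```

Notice that the bare query “python” cannot be resolved at all. That is exactly the situation where a search engine has to fall back on other signals.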
Context is everything in semantics, and it brings us to the remaining two points.
The need to understand lexical hierarchy and entity relationships
Let’s take a look at the following search query and the top search result:
That’s truly impressive. Here’s what Google has to do to understand this query:
- Know that “partner” means wife/girlfriend/husband/boyfriend/spouse.
- Understand that Obi-Wan appeared in multiple movies and series played by different actors.
- Make the connections.
- Display search results in a way that reflects the ambiguity of “obi wan.”
I can’t even imagine what kind of search results I’d get if I did that search in 2010 or earlier.
Now, let’s take a step back to explain the concepts.
Lexical hierarchy illustrates the relationship between words. The word partner is superordinate (hypernym) to wife, boyfriend, spouse, and others.
As mentioned earlier, our queries often don’t match the exact wording of the desired content. Knowing that “affordable” can mean anything from cheap to mid-range to reasonably priced is crucial.
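As a rough illustration of how a lexical hierarchy helps with matching, the sketch below expands a query by swapping a broad term for its narrower or related terms. The `RELATED_TERMS` mapping and the `expand_query` helper are hypothetical and only cover the examples from this section.

```python
# Toy lexical hierarchy: broader term -> narrower or related terms.
# The entries are illustrative only.
RELATED_TERMS = {
    "partner": ["wife", "husband", "girlfriend", "boyfriend", "spouse"],
    "affordable": ["cheap", "mid-range", "reasonably priced"],
}


def expand_query(query: str) -> list[str]:
    """Return the query plus variants where a broad term is swapped for each related term."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for broad_term, narrower_terms in RELATED_TERMS.items():
        if broad_term in tokens:
            for term in narrower_terms:
                variants.append(" ".join(term if t == broad_term else t for t in tokens))
    return variants


for variant in expand_query("obi wan actors partner"):
    print(variant)
```

A page that only ever says “wife” can now be matched to a query that says “partner,” which is the kind of gap keyword matching alone would miss.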
Entities, in this example, are movie and series characters (Obi-Wan), people with a specific job (actor), and people who are associated with them (partners). In general, entities are objects or concepts that can be distinctly identified—often people, places, and things.
And as if all the language intricacies weren’t enough, we must go even beyond that.
The need to reflect personal interests and trends
Let’s go back to the “python” example. If I search for this, I do indeed get all results related to the programming language.
No matter how much we dislike all the ways our personal data is used, it’s at least useful for search engines. Google uses limited data together with your search history to deliver more accurate and personalized search results.
We’re all aware of this. Just enter any kind of service into your search bar and you’ll get localized results:
But what’s more fascinating is Google’s ability to temporarily adjust search results based on dynamically changing search intent.
For example, coronavirus is not a new term. It has always been the name of a group of viruses. But as we all know, the search intent changed rapidly at the beginning of 2020. People started looking for information about a particular strain of coronavirus (SARS-CoV‑2), and the SERP had to be adjusted accordingly.
As you can see in the SERP position history for “coronavirus” above, none of the current top five search results ranked before 2020.
You see the same thing in the ecommerce industry during big sales events like Christmas or Black Friday. The search intent during that time is highly transactional, whereas people might ordinarily prefer to see comparisons or reviews.
Google continuously pushes out algorithm updates and technologies that further improve its capabilities of understanding natural language and search intent.
There are four important milestones that make semantic search what it is in 2020.
Knowledge Graph
Google’s Knowledge Graph, released in 2012, is a knowledge base of entities and the relationships between them.
You can imagine it looking something like this—but with five billion entities instead:
In short, it’s a technology that kickstarted and enabled the shift from keyword matching to semantic matching.
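To make the idea of entities and relationships more concrete, here is a toy sketch of a knowledge graph stored as subject-relation-object triples. It is not how Google’s Knowledge Graph is implemented. The relation names and helper functions are assumptions, and the partner names are placeholders rather than real data.

```python
# A tiny knowledge graph stored as (subject, relation, object) triples.
# Partner names are placeholders, not real data.
TRIPLES = [
    ("Obi-Wan Kenobi", "played_by", "Ewan McGregor"),
    ("Obi-Wan Kenobi", "played_by", "Alec Guinness"),
    ("Ewan McGregor", "partner", "Partner A"),
    ("Alec Guinness", "partner", "Partner B"),
]


def objects(subject: str, relation: str) -> list[str]:
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def partners_of_actors(character: str) -> dict[str, list[str]]:
    """Two-hop lookup: character -> actors -> partners."""
    return {actor: objects(actor, "partner") for actor in objects(character, "played_by")}


print(partners_of_actors("Obi-Wan Kenobi"))
# {'Ewan McGregor': ['Partner A'], 'Alec Guinness': ['Partner B']}
```

Answering the Obi-Wan query from earlier boils down to following two relationships in a row, which is only possible once the entities and their connections are stored explicitly.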
There are two main methods of feeding the Knowledge Graph:
- Structured data (more on that later)
- Entity extraction from text
For the second point, the search engine needs to understand natural language. That’s where the three algorithmic updates below come into play.
Hummingbird
Back in 2013, Google launched a search algorithm called Hummingbird to return better search results. It was especially helpful for complex search queries.
Hummingbird was the first colossal update that emphasized the meaning of search queries over individual keywords. It was the much-needed catalyst for writing about topics, not keywords.
RankBrain
If you’ve ever encountered the phrase Latent Semantic Indexing or LSI keywords, forget that. Google solves the problem that LSI was created to solve with an algorithm called RankBrain.
And we already discussed the problem earlier. It was about the mismatch between the language used in search queries and the desired content.
Google’s RankBrain is powered by technologies that are way superior to LSI. In layperson’s terms, RankBrain understands the meaning of even unfamiliar words and phrases by using sophisticated machine learning algorithms.
And that’s huge considering that 15% of all search queries are new.
We can consider RankBrain an upgrade to Hummingbird, not a standalone search algorithm. It’s one of the strongest ranking signals, but the only thing you can proactively do to optimize for it is to satisfy search intent.
BERT
Bidirectional Encoder Representations from Transformers (BERT) is the newest huge upgrade to how semantic search works. It has affected approximately 10% of all queries since the end of 2019.
Don’t worry; it also took me quite some time to even remember what BERT stands for.
All you need to know is that BERT improves understanding of long and complex sentences and queries. It’s a solution for dealing with ambiguity and nuances because it strives to understand the context of words better.
And while you can’t do anything to optimize for BERT per se, it’s good to know what it means and what it does in a nutshell.
I’ve already sprinkled some hints and tips throughout the article. Now let’s get truly actionable.
- Target topics, not keywords
- Assess search intent
- Use semantic HTML
- Use schema markup
- Build your brand to become a Knowledge Graph entity
- Build relevancy through links
1. Target topics, not keywords
In the old days of SEO, you could have ranked high with separate pieces of content about the same topic, but targeting slightly different keywords like:
- open graph tags
- open graph meta tags
- og meta tags
- open graph tag
- what is open graph
- facebook open graph tags
That’s no longer the case. Google now understands that all these searches mean much the same thing, and ranks mostly the same pages for them all.
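One rough way to check whether two keywords belong to the same topic is to compare the pages that rank for them. The sketch below does this with made-up URLs. The `TOP_RESULTS` data and the `serp_overlap` helper are hypothetical, and in practice the URLs would come from an SEO tool or from checking the search results yourself.

```python
# Hypothetical top-ranking URLs per keyword (not real search results).
TOP_RESULTS = {
    "open graph tags": ["example.com/og-guide", "example.org/open-graph", "example.net/meta-tags"],
    "og meta tags": ["example.com/og-guide", "example.org/open-graph", "example.io/og-tags"],
    "growing asparagus": ["example.com/asparagus", "example.org/garden"],
}


def serp_overlap(keyword_a: str, keyword_b: str) -> float:
    """Jaccard similarity of the top-ranking URLs (0 = nothing shared, 1 = identical)."""
    a, b = set(TOP_RESULTS[keyword_a]), set(TOP_RESULTS[keyword_b])
    return len(a & b) / len(a | b)


print(serp_overlap("open graph tags", "og meta tags"))       # 0.5
print(serp_overlap("open graph tags", "growing asparagus"))  # 0.0
```

A high overlap suggests that one in-depth page can target both keywords, while a low overlap suggests they are separate topics.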
Keep this in mind when creating content. No longer is the aim to rank for just one keyword but to cover a topic in-depth so that Google ranks your page for lots of similar and long-tail keywords.
For example, our article about Open Graph meta tags ranks well for hundreds of keywords. Many of these are other ways of searching for the same thing, but some are subtopics like “og:title,” “og url,” and “og:image.”
We’re able to rank for all of these keywords because we wrote an in-depth article about the topic, not just about a single keyword.
Looking at the Organic Keywords report for a top-ranking page about the topic is a good way to understand what subtopics to write about. For instance, say you wanted to write a post about growing asparagus. If you plug the top-ranking page for “growing asparagus” into Ahrefs’ Site Explorer and check that report, you’ll see that it’s ranking for these keywords, among others:
- how deep to plant asparagus
- asparagus growing conditions
- when to plant asparagus
- best place to plant asparagus
- how to harvest asparagus
- how to care for asparagus plants
These are all things you’d want to mention to create an in-depth post that gets as much organic traffic as possible.
A word of caution, though. Targeting a particular topic doesn’t mean that you should cover absolutely everything related to that topic or go too in-depth.
Take this article as an example. I could have spent tens of hours researching natural language processing and going deep into the technicalities of semantic search. I didn’t do that because most people don’t care about it.
Which brings us to the next point.
2. Assess search intent
You can still publish content around a certain topic that doesn’t align with the search intent.
Let’s say that you’re a marketing data geek, and you see an opportunity to target the topic, “SEO report.” Naturally, you want to share everything that’s needed to create the best SEO report. So you come up with something like “Use the Power of QUERY to Create the Best SEO Report.”
It may indeed be the piece of content that ultimately leads to the best SEO report. But most people searching for this topic won’t be familiar with many Google Sheets functions. They just want something that can do the job for them:
So, before you start outlining a new piece of content, look at the top-ranking pages to infer the search intent.
Recommended reading: Searcher Intent: The Overlooked ‘Ranking Factor’ You Should Be Optimizing For
3. Use semantic HTML
Before we were able to progress to semantic search, we had to start making the shift towards a semantic Web. The original concept of the WWW could be interpreted as standardized interlinked documents with no explicit meaning. By now, it should be clear that we need meaning.
And it all starts with your basic HTML.
Compare the following HTML elements:
Semantic HTML adds meaning to the code so machines can recognize navigation blocks, headers, footers, tables, or videos.
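Here is a small sketch of what that means from a machine’s point of view. It parses two hypothetical snippets with Python’s built-in `html.parser` and lists the semantic elements it can recognize in each. The snippets and the `SEMANTIC_TAGS` subset are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# A subset of HTML5 elements that carry structural meaning.
SEMANTIC_TAGS = {"header", "nav", "main", "article", "section", "aside", "footer"}

# Two hypothetical snippets with the same content and different markup.
GENERIC = '<div class="header"><div class="nav">Menu</div></div><div class="footer">Contact</div>'
SEMANTIC = "<header><nav>Menu</nav></header><footer>Contact</footer>"


class TagCollector(HTMLParser):
    """Records which semantic elements appear in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            self.found.append(tag)


def semantic_tags(html: str) -> list[str]:
    collector = TagCollector()
    collector.feed(html)
    return collector.found


print(semantic_tags(GENERIC))   # []
print(semantic_tags(SEMANTIC))  # ['header', 'nav', 'footer']
```

Both snippets look the same to a visitor, but only the second one tells a parser which block is the navigation and which is the footer without guessing from class names.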
HTML5 provides the most semantic elements, which most modern CMS themes already use. If yours doesn’t, there’s usually a plugin you can use to add them.
But semantic HTML is still quite limited. While it says, “this is a table, this is a footer,” it doesn’t convey the meaning of the actual content. That’s why we have schema markup.
4. Use schema markup
Schema markup is an additional way of marking up your pages. It’s also referred to as structured data, which can be described as a common semantic framework for the Web.
Schema.org vocabulary contains hundreds of types that are associated with properties. You can use these to mark up your content in a way that’s easy for Google to understand without complex algorithms.
For instance, it would be easier for Google to extract meaning from structured content like this:
cooking time: 20 minutes
calories: 80
… than from natural language like this:
It will take 20 minutes to make the pancakes. Even better, these are low-calorie pancakes—around 80 per serving.
So when a user wants to know how long it takes to cook a pancake, or how many calories it has, Google can serve the information in the best way.
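In practice, schema markup is often added to a page as a JSON-LD script. The sketch below generates one for the pancake example using the schema.org `Recipe` type with its `cookTime` and `nutrition` properties. The recipe name is made up, and a real recipe page would typically include more properties than this.

```python
import json

# Recipe markup using schema.org vocabulary. The recipe name is hypothetical;
# the cooking time and calories come from the pancake example.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Low-calorie pancakes",
    "cookTime": "PT20M",  # ISO 8601 duration: 20 minutes
    "nutrition": {
        "@type": "NutritionInformation",
        "calories": "80 calories",
    },
}

json_ld = json.dumps(recipe, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

The printed `<script>` block would go into the page’s HTML, giving search engines the cooking time and calories as explicit fields rather than sentences they need to interpret.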
5. Build your brand to become a Knowledge Graph entity
The heading is pretty much self-explanatory because I already talked about entities, so I’ll just point you to our article about getting into the Knowledge Graph.
Among all of the tips on adjusting your SEO to semantic search, this one is the most difficult to turn into reality. It’s a long-term consequence of brand building and applying the rest of these tips.
6. Build relevancy through links
Links were historically one of the first indicators of relevancy. If document A linked to document B, they could have been seen as related.
Both internal and external links from relevant pages using natural anchor text help Google figure out what your content might be about—even before processing it.
Final thoughts
Semantic search has changed the whole content ecosystem. Users get more relevant and valuable content, and that motivates publishers to produce such content.
While there are sophisticated technologies and algorithms involved, the principles of semantic search are easy to understand. You should now be ready to make any changes necessary and to future-proof your SEO.
Do you have any questions or comments regarding semantic search? Ping me on Twitter.
If you want to learn more about the technicalities around semantic search, follow Dawn Anderson and check out her presentations.