An Interview with Seznam's Search Division Director

Google is not the only search engine in use around the world; in select countries it faces competition from the likes of Yandex and Baidu.

There is, however, another search engine competing with Google, one that most SEO professionals never encounter because it is limited to the Czech market: Seznam.

Seznam was founded in 1996 by Ivo Lukačovič and reached a peak around 2010, when it was estimated that the entire population of the Czech Republic (then 10.5 million) was visiting the search engine at least monthly. Since 2010, however, Google has gained ground and is now the dominant player.

Unlike in other markets, though, the local engine has held on: Seznam and its various products still hold a substantial market share (roughly 30% of the search market, processing 15 million queries a day).

Optimizing for the engine still requires consideration when entering Czechia.

Understanding how any search engine competing against Google approaches that battle can also be useful for your own critical thinking as an SEO.

Tomáš Pergler, Seznam’s Search Division Director

To gain insight into Seznam’s approach, I was fortunate enough to ask Tomáš Pergler, Seznam’s Search Division Director, a few questions about how Seznam approaches modern search and how its search engine processes modern JavaScript-powered websites.

Many external sources attribute an 11% share of the Czech search market to Seznam, down from approximately 30% in January 2010. Where does Seznam place itself within the market in terms of share? Dusan Janovsky has previously quoted 25%; is this still accurate?

Tomáš Pergler (TP): It is hard to say, as Google does not publish its numbers in the Czech Republic.

We estimate that we have around a 30% share on desktops, but we lose on mobile devices. It is very hard for us to compete with Google, as its applications are preinstalled on all Android devices.

At the beginning of this year, advertising agency Evisions published its case study about how its clients use Google and Seznam. It is only in Czech but I am sure you will be able to understand the graphs.

When we (as SEO professionals) talk about optimizing a website, we tend to focus on three core areas: backlinks, technical, and content. Without giving away the secrets of Seznam’s algorithm, could you enlighten us on how Seznam weighs these factors and establishes its ranking system?

TP: At some level of abstraction, we break the problem of relevance and ranking down into three essential dimensions: Accuracy, Usefulness and Quality.

Accuracy deals with what the user query is about, what the document or website is about and how well they correspond with each other.
Usefulness means how useful the given result could be for most users. Some results can be accurate but useless – out of stock products, outdated news, missing body text etc.
Quality deals with usability, navigation, information structure, and credibility. Some results can be accurate and useful but low quality.

Basically, we classify each document or query-document pair in all three dimensions independently.

Accuracy and usefulness are query-document based, so they need to be evaluated at query time.

Quality can be evaluated right after the document is downloaded as it’s a document-based classification only.

Each classifier utilizes specific features:

Link graph features are vital for usefulness and quality.
Body text size is one of the features important for usefulness. In most cases, empty body text is useless.
HTTPS has some importance for quality.
User feedback features are strong for predicting quality.
Dictionary-based features are strong for usefulness and quality.

Results visible in the SERP are supposed to be accurate and useful already, so we rank them mainly according to quality.
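The flow described above can be sketched roughly as follows. This is only an illustration, assuming each candidate already carries three independent classifier scores between 0 and 1; the `Candidate` type, the `rank` function, and the 0.5 thresholds are hypothetical and not taken from Seznam.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    accuracy: float    # query-document score, evaluated at query time
    usefulness: float  # query-document score, evaluated at query time
    quality: float     # document-only score, evaluated after download


def rank(candidates, min_accuracy=0.5, min_usefulness=0.5):
    """Keep results that are accurate and useful, then order them by quality."""
    eligible = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.usefulness >= min_usefulness
    ]
    return sorted(eligible, key=lambda c: c.quality, reverse=True)


results = rank([
    Candidate("https://example.cz/a", accuracy=0.9, usefulness=0.8, quality=0.4),
    Candidate("https://example.cz/b", accuracy=0.8, usefulness=0.9, quality=0.7),
    # Accurate and high quality, but useless (e.g. an out-of-stock product).
    Candidate("https://example.cz/out-of-stock", accuracy=0.9, usefulness=0.1, quality=0.9),
])
ranked_urls = [c.url for c in results]
print(ranked_urls)
```

In this toy example the out-of-stock page never reaches the results despite its high quality score, which mirrors the "accurate but useless" case mentioned earlier.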

Backlinks have an overall importance for us. Each dimension utilizes them in a different way. When predicting accuracy, we need to find out what the page or site is about.

Backlinks hold the information about how the site is remembered by users. The information from anchor text on a highly visited page is valuable, because a lot of people use it for navigation.

That’s why we need backlinks for navigational queries. However, more backlinks don’t mean more accuracy.

Usefulness is a different task entirely. The link graph features are strong here, which is why links work in SEO. But it should not be just about more backlinks = better rankings.

Historically, we have used some mechanisms to predict which links are good or natural and which are bad, but it’s a very complicated task and there might be better ways.

Direct and indirect links from highly credible websites can positively affect the quality score of the target. Highly credible websites don’t often sell backlinks; hence they provide trust, which propagates through the link graph.
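To make the idea of trust propagating through a link graph more concrete, here is a small sketch using a TrustRank-style iteration (a seed-biased variant of PageRank). This is a swapped-in textbook technique chosen for illustration; the interview does not say which algorithm Seznam uses, and the function name, domains, and damping value are all assumptions.

```python
def propagate_trust(links, seeds, damping=0.85, iterations=20):
    """Spread trust from seed pages along outgoing links.

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages assumed to be highly credible.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        # Each round, seeds regain a fixed share of trust ...
        nxt = {p: (1 - damping) * seed_score[p] for p in pages}
        # ... and every page passes the rest of its trust to its link targets.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


links = {
    "trusted.example": ["shop.example"],   # direct link from a credible site
    "shop.example": ["partner.example"],   # indirect link, one hop further
    "spam.example": ["spam2.example"],     # not connected to any seed
    "partner.example": [],
}
trust = propagate_trust(links, seeds={"trusted.example"})
print(sorted(trust.items(), key=lambda kv: kv[1], reverse=True))
```

In this sketch the directly linked page ends up with more trust than the indirectly linked one, and pages reachable only from untrusted sources receive none.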

When dealing with accuracy, the content itself is the most easily available source of information. It basically tells us what the page is about.

The page title and advanced body text extraction are essential. We use BM25-like scoring functions, and we focus on Czech.
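For readers unfamiliar with BM25, the following is a minimal sketch of the textbook Okapi BM25 formula. It is not Seznam's scoring function (the interview only says "BM25-like"); the whitespace tokenizer, the sample documents, and the `k1` and `b` values are assumptions chosen for illustration, and a real system focused on Czech would need proper morphological handling.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more hits to score well.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["praha hotel levně", "hotel brno centrum hotel", "recept na svíčkovou"]
scores = bm25_scores("hotel praha", docs)
print(scores)
```

The first document matches both query terms and scores highest; the second matches only the more common term, and the third matches nothing and scores zero.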

How capable is Seznam in terms of processing JavaScript websites?

TP: We have been experimenting with webpage rendering for some time now – several years ago every snippet in the SERP contained a page thumbnail (it was just a “design” feature).

Nowadays we use webpage rendering to better understand the content and layout of the webpage.

Currently it is still done on only a low volume of crawls, mostly for news articles. From a rendered page we can extract its main text, main image or article release date very accurately.

In the future, we want to increase the share of rendered crawls while balancing it against the crawl budget the crawler consumes.

Currently, an average rendered crawl of one webpage consumes tens of GET requests, compared with one GET request for a non-rendered crawl.
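The trade-off can be illustrated with some back-of-the-envelope arithmetic. The figure of 30 GET requests per rendered page below is a hypothetical stand-in for "tens of GET requests", and the function name and budget are likewise assumptions.

```python
def pages_per_budget(get_budget, rendered_share,
                     gets_per_rendered=30, gets_per_plain=1):
    """Estimate how many pages fit into a fixed budget of GET requests.

    rendered_share is the fraction of pages crawled with full rendering.
    gets_per_rendered=30 is an assumed value for "tens of GET requests".
    """
    avg_cost = (rendered_share * gets_per_rendered
                + (1 - rendered_share) * gets_per_plain)
    return get_budget / avg_cost


for share in (0.0, 0.1, 0.5, 1.0):
    print(f"rendered share {share:.0%}: "
          f"{pages_per_budget(3000, share):.0f} pages per 3000 GETs")
```

Under these assumptions, rendering every page would cut the number of pages covered by the same request budget by a factor of 30, which is one way to read why rendering is applied selectively.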

We use the latest stable Chrome version (currently 73.x) for page rendering, which means that our crawler gets the same results as a real user does.

How big of a factor is mobile usability in Seznam’s ranking determinations?

TP: Our current quality classifier doesn’t distinguish between desktop and mobile, but we’re preparing a new quality rating.

Our raters will rate the quality directly on their cell phones. Based on this kind of data, we’ll probably find some new features. We’re planning to incorporate them into our ranking model by the end of Q3/2019.

How often does Seznam update its ranking criteria/algorithms?

TP: Last year we rebuilt our main relevance model completely. Once. Then there were some minor tweaks throughout the year.

Our team has grown, and this year we want to go faster.

How does Seznam handle an international website (i.e., one with multiple language versions, Czech included)?

TP: The search engine crawler SeznamBot is focused on the pages that our users can possibly search for. That means it digs through the web for pages in the Czech language first and foremost.

SeznamBot also crawls other pages to allow the search engine to answer “global” queries (for example, navigation to international sites, global companies, programming, videos, social, etc.), so it also crawls the international web.

If some website is serving content in multiple languages on the same URLs, the crawler gets only one language version – preferably Czech.

If the language versions are accessible through different URLs, then SeznamBot may crawl several language versions of the pages – for example the same content on Wikipedia in Czech, Slovak, and English.

Does SeznamBot face any issues with crawling non-Czech websites?

TP: Unfortunately, we experience serious access problems when crawling the international web. An increasing number of websites tend to block all traffic except for Googlebot.

For example, we recently contacted ProjectHoneyPot.org to have our crawler’s IPs whitelisted, because SeznamBot is a standard search engine crawler and blocking it doesn’t bring any benefit.

It would be great if this article helped encourage webmasters to allow SeznamBot on their sites, so that it can bring them some visits from users in the Czech Republic.

Or at least show them that it is helpful to include a technical contact in the comments of their site’s robots.txt file.
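As a rough illustration of both suggestions, the sketch below uses Python's standard `urllib.robotparser` to check a sample robots.txt that explicitly allows SeznamBot and carries a technical contact in a comment. The file contents, the contact address, and the URLs are hypothetical examples, not recommendations from Seznam.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the contact line is an ordinary comment that
# crawlers ignore, but that a crawler's operators can read.
ROBOTS_TXT = """\
# Technical contact: webmaster@example.com
User-agent: SeznamBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

seznam_allowed = parser.can_fetch("SeznamBot", "https://example.com/private/page")
other_allowed = parser.can_fetch("OtherBot", "https://example.com/private/page")
print("SeznamBot allowed:", seznam_allowed)
print("OtherBot allowed:", other_allowed)
```

A check like this only covers robots.txt rules; blocking at the firewall or IP-reputation level, as in the ProjectHoneyPot.org case, has to be verified separately.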

Moving away from search for a moment, how popular are Seznam’s other assets (such as Novinky, Sreality, Mapy)?

TP: Seznam.cz reaches 95% of the Czech Internet population. More than 3.5 million people visit our home page every day, and within two years we managed to get almost a million visitors a day on SeznamZpravy.cz (our own news service).

Some highlights from our wider products include:

Email.cz – The email service facilitates 71.5 million emails daily.
Firmy.cz – Firmy is a structured catalog of businesses with contact information and reviews, listing more than 670,000 companies.
Kupi.cz – An online discount/coupon-code directory, with more than 10,000 discounts and 300 advertising flyers daily.
Mapy.cz – A maps provider that can reach up to 1 million users daily during the tourist season.
Novinky.cz – The most popular online Czech news website, with 57% of the online news reading market share.
Sport.cz – The most visited sports website in the Czech Republic, with more than 1,600 articles and 160 online streams daily.
Stream.cz – An online TV streaming service, with more than 35 million views per month.

Image Credits

Tomas Pergler image from ihned.cz, April 2019

Also a thank you to Seznam, Tomáš Pergler, and Aneta Kapuciánová for facilitating the interview.
