Website security is what we eat, sleep, and breathe. It’s what we do best because we deal with hacked websites every single day, thousands of them. Attack scenarios come in many types and keep evolving, but one has remained the same for all these years: spam infections.
A spam infection could be a serious problem for online businesses when it remains on the website long enough for Google, Bing, or other website blacklist authorities to spot it and block site access.
Having a blacklisted website can lead to revenue loss and brand damage, and it can take a long time to recover from. No matter the size of your website, it is like a big, juicy steak for hackers. In this article, we’ll explain why.
What Do Spam Infections Look Like?
Spam content comes in many forms, but in general it consists of random directories or files stored among legitimate files. From the visitor’s side, spam content appears in Google search results as if it were your own website content.
Here is a sample of spam directories within a legit WordPress structure (blue lines):
And here is the content of the 8chk directory:
Here’s a different sample of spam displayed in Google Search results:
One symptom of a spam infection is a very large number of 404 error messages in Google Analytics, Google Search Console, or other site monitoring tools.
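If you have access to your server’s access log, you can look for the same pattern yourself. The following is a minimal sketch, assuming a log in the common Apache/Nginx combined format; the function names are our own and purely illustrative. It counts 404 responses per path and then groups them by top-level directory, so a cluster of misses under an unfamiliar directory stands out.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common/Combined Log Format line,
# for example: "GET /some/path HTTP/1.1" 404 512
LOG_PATTERN = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Return a Counter mapping request paths to their number of 404 responses."""
    misses = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "404":
            # Drop the query string so variants of one URL are counted together.
            path = match.group("path").split("?", 1)[0]
            misses[path] += 1
    return misses


def top_level_dirs(misses):
    """Aggregate 404 counts by the first path segment to reveal clusters."""
    dirs = Counter()
    for path, hits in misses.items():
        segment = path.lstrip("/").split("/", 1)[0]
        dirs["/" + segment] += hits
    return dirs
```

Passing the lines of a log file to `count_404s` and then calling `top_level_dirs(...).most_common(10)` gives a quick view of where the misses concentrate. A high count alone does not prove an infection, since broken links and crawlers also cause 404s.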
Google Search Console Malicious Ownership Requests
Once a website is up and running as it should, it is uncommon to browse files via SSH/FTP/cPanel. Fortunately, there are other symptoms we can check that don’t depend on technical skills.
These common symptoms can be indicators that your site has been infected by spam content. A tool called Google Search Console (formerly known as Google Webmaster Tools) can help you check for them.
Google Search Console is extremely useful for website owners because it allows them to check various aspects of the site, such as:
- indexing status,
- viewing how Google views a website,
- verifying whether there are any errors to correct,
- checking for Google improvement recommendations,
- looking for security issues such as website blacklists,
- checking keyword performance,
- determining which searches bring traffic to the site,
- inspecting internal and external links,
- sending a sitemap.
Among the various features mentioned above, one in particular is useful for both the site owner and hackers—sending a sitemap to speed up the indexing process.
Malicious Sitemaps Indexing Spam Pages
To link a site to Google Search Console, a verification procedure must be completed so Google can recognize the requester’s authority over the site in question. Hackers use the simplest method available: they upload an HTML file, provided by Google, to the affected site and ask Google to check for the file there.
If Google finds the HTML file there, it concludes that the requester has control over the site in question and therefore assumes that the malicious user is the owner. From there on, the hacker can use any tool offered by Google Search Console to determine how the hacked site will relate to Google’s search results. One misstep and the invaded website’s SEO will suffer the consequences.
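Because this verification method leaves a file in the site root, you can check for verification files you did not put there. The sketch below is a minimal example under two assumptions: that the files follow the `google<hex token>.html` naming used by the HTML file method, and that you keep a list of the verification files you created yourself (the `known` argument, which is hypothetical).

```python
import re
from pathlib import Path

# Assumption: verification files are named "google<hex token>.html".
VERIFICATION_NAME = re.compile(r"^google[0-9a-f]+\.html$", re.IGNORECASE)


def unknown_verification_files(filenames, known):
    """Return verification-style file names that are not in the known list."""
    known = set(known)
    return sorted(
        name
        for name in filenames
        if VERIFICATION_NAME.match(name) and name not in known
    )


def scan_web_root(web_root, known):
    """List the top level of web_root and report unknown verification files."""
    names = [p.name for p in Path(web_root).iterdir() if p.is_file()]
    return unknown_verification_files(names, known)
```

Any name it reports is worth comparing against the list of owners shown in Google Search Console.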
Once the hacker has completed property verification and has full access to Google Search Console, they submit a malicious sitemap.xml file that references the thousands of spam files stored on the hacked site.
Now Google knows about the spam pages and will index them. It will not take long for spam to appear in the site’s search results.
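A sitemap is plain XML, so it can be audited with a few lines of code. The following is a rough sketch, assuming a standard sitemap that uses the sitemaps.org namespace; the host name and the list of allowed path prefixes are placeholders you would replace with your own site’s values.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def unexpected_urls(urls, expected_host, allowed_prefixes):
    """Return URLs on another host or outside the allowed path prefixes."""
    prefixes = tuple(allowed_prefixes)
    flagged = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.netloc != expected_host or not parsed.path.startswith(prefixes):
            flagged.append(url)
    return flagged
```

URLs flagged this way are only candidates for review. A legitimate section that is missing from the allowed prefixes will be reported too.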
If the website is already linked to Google Search Console, a message will be sent to the original owner reporting that a new owner has been added to the account. This is your big red flag that something is wrong. It is your call to action!
If you miss this warning, or your website is not linked to Google Search Console, then you will see spam content in the search results related to your website.
You can read this article for more complete coverage of malicious Google Search Console verifications.
Understanding PageRank
Google accounts for a large share of the Internet traffic via organic search. Because of this, a complex algorithm needed to be created to give its users the best results first.
From the point of view of a machine, how would it be possible to distinguish a large content portal from a small site? How can it distinguish relevant content from poor-quality content? These questions are addressed by the PageRank algorithm.
At a high level, PageRank looks at keywords on the page and any internal or external links pointing to that page. Google does not provide an official explanation for it; however, some specialized SEO portals publish metrics that estimate the variables taken into account when determining which search results appear and in what position:
- Domain Authority (DA),
- Page Authority (PA),
- Spam Score.
There are also other metrics such as Trust Flow (TF) and Citation Flow (CF) that qualify the links that point to the site and the traffic that passes through these links.
The higher the Domain Authority (DA) value, the better positioned the website is in the rankings. Sometimes a specific article or set of articles is so good that many sources on the internet reference it. The Page Authority value measures this.
On the other hand, the higher the Spam Score is, the worse for the website. Spam Score is a metric used to penalize a website on PageRank and, consequently, in Google search results.
Another very important aspect of Google’s PageRank is age: Google knows when a website has been recently created, and such a website is not yet that relevant for PageRank purposes. A website starts to become relevant a few months after its creation. This is one of the reasons why hackers tend to hijack existing websites rather than just create new malicious ones.
Why Do Hackers Hack Websites?
If you are wondering why hackers hack small websites, there are two primary reasons, and either of them can be the real one:
- To promote some websites and raise their respective PageRank,
- To hurt a website’s PageRank to indirectly promote a competitor.
Your website has a good enough DA value and a low Spam Score, so the goal is to take advantage of it to fool Google’s PageRank mechanism and promote the hacker’s website. The better your DA value is, the better the hacker’s website will appear to be in Google’s eyes.
This type of hack never happens on its own, as we explained in this article about Search Console SEO errors. Hacked websites often cross-reference other hacked websites along with the hacker’s final target.
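To see why a link from an established site is so valuable, here is a toy version of the link-analysis idea behind PageRank. It is a simplified sketch of the textbook formulation, not Google’s actual ranking system, which is not public. The page names and the damping value of 0.85 are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute a simplified PageRank for a dict of page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three sites link to "victim"; "spam" has no inbound links.
before = {"a": ["victim"], "b": ["victim"], "c": ["victim"], "victim": ["a"], "spam": []}
# After the hack, the victim site also links to the spam site.
after = dict(before, victim=["a", "spam"])
```

Comparing `pagerank(before)["spam"]` with `pagerank(after)["spam"]` shows the spam page’s score rising once the well-linked victim site points to it, which is the effect the hacker is after.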
On the other hand, if Google realizes that something is spam content, it triggers a number of harmful consequences:
- The hacked site will be blacklisted by Google and an alert message will appear in search results for all visitors.
- The PageRank of the site will also be affected, and if nothing is done for a considerable period of time, the Spam Score increases and the site as a whole can be misread as spam (yes, the whole site) by Google.
- If nothing is done even after that, the site can be banned from Google and no longer appear in the search results for good. It’s in the realm of possibilities, but do not worry. This is an extreme case and it takes a very long time for it to happen.
How Can I Detect the Spam Infection on a Website?
We have explained common symptoms of website spam infections in this article, but let’s make it simple. Here is a checklist on how to detect if there is spam infection on a website:
- Suspicious Google Search Console request for ownership.
- Weird files and/or directories created on a website.
- A hack warning displayed on Google search results right below the website name.
- A really large number of 404 errors, which could be related to a spam infection as well.
- Unauthorized/bogus content displayed as your website content on Google search results.
- Bogus content on the sitemap.xml (http://yoursite.com/sitemap.xml).
- Recent and suspicious modifications to the .htaccess file (sometimes related to legit plugins, but it is worth checking).
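Two items on this checklist, weird files or directories and recent .htaccess changes, can be partly checked by listing what changed on disk lately. Here is a minimal sketch, assuming you can run Python against a copy of the web root; the function name and the seven-day window are our own choices. Keep in mind that attackers can forge modification times, so an empty result is not proof of a clean site.

```python
import os
import time


def recently_modified(web_root, days=7, now=None):
    """Return paths under web_root modified within the last `days`, newest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if mtime >= cutoff:
                hits.append((mtime, path))
    return [path for _mtime, path in sorted(hits, reverse=True)]
```

Files you did not change yourself, especially .htaccess or anything inside an unfamiliar directory, deserve a closer look.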
How to Avoid Spam Infections
Now that you have a checklist on what to look for when searching for a spam infection, you might be wondering how to prevent it from happening in the first place.
Think of spam infections like the flu. You know what it is, you know what the consequences of it are, and now you need to know exactly what to do in order to avoid it.
Our recommendations are very simple:
1- Keep your CMS (Joomla, Drupal, WordPress, others) up to date.
Whenever you see an update available for your content management system (CMS), reach out to your developer, make a proper backup, and apply the update. According to our Website Hack Trend Report, outdated software is the leading cause of website malware infections.
2- Change your passwords regularly and choose strong ones.
In order to have a strong password, think of the password’s entropy, a measure of how unpredictable the password is.
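Entropy can be estimated with simple arithmetic. The sketch below is a rough illustration, assuming every character (or word) is picked uniformly at random; the function names are hypothetical. Human-chosen passwords are far less random than that, so the character-based figure is an upper bound and overstates the strength of predictable passwords.

```python
import math
import string


def charset_size(password):
    """Estimate the size of the character pool the password draws from."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    return size


def random_password_bits(password):
    """Upper-bound entropy in bits, assuming each character is chosen at random."""
    pool = charset_size(password)
    return len(password) * math.log2(pool) if pool else 0.0


def passphrase_bits(word_count, wordlist_size):
    """Entropy in bits of a passphrase of words drawn at random from a list."""
    return word_count * math.log2(wordlist_size)
```

For example, `passphrase_bits(4, 2048)` evaluates four random words from a 2048-word list, each word contributing 11 bits. The word list size here is an assumption chosen for the example.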
A password like g0*d,Pas_w8rd!, you*vvl11*n3v9rG|_|e55TH!s! will definitely have high entropy. The problem is that it’s so hard to remember that you start to miss that good old (and breakable) P@ssword123. XKCD has a great comic illustrating this pretty well:
We highly recommend a good password manager, such as LastPass.
If you believe your website may have been infected with spam, we’re always happy to help!