Black Hat SEO

From Digital Marketing Wiki
Black Hat SEO is named after the black hats worn by typical western movie antagonists.

Black hat SEO, or spamdexing, encompasses an array of Search Engine Optimization (SEO) strategies that are discouraged by most search engines.

Although black hat SEO strategies can be more efficient in the short term at increasing organic rankings and traffic, practicing black hat SEO carries a higher risk of penalty. Penalties can be manual or automatic, the former often being more severe than the latter.

Often described as a "cat and mouse game"[1], SEO professionals constantly attempt to reverse-engineer search engine algorithms in order to game them. While this in itself is not black hat, some professionals attempt to use as much black hat SEO as they can without getting caught and penalized.

Webmasters with sites that need to maintain a trustworthy public image, such as those of big brands, should be discouraged from heavy engagement in black hat SEO activities because they stand to lose much more in terms of PR and domain authority than the average blog owner.

Small blog owners should attempt black hat SEO strategies only in moderation, and should be warned that future search engine algorithm updates could result in traffic losses.

No successful long-term SEO strategy can rely solely upon black hat SEO. If your site lacks content of any quality or originality, it will almost certainly rank poorly or be penalized.

History

ALIWEB, generally considered the first web search engine, was launched in May 1994. It ranked pages based on signals like keyword density and meta tags, which made it easy to manipulate with black hat SEO.

The first instances of black hat SEO can be traced to the infancy of the consumer internet and the first search engines. ALIWEB and other early search engines relied on rudimentary signals such as keyword density and meta tags for determining a page's relevance and ranking.

Simple algorithms like these, based almost entirely on trust, are open to abuse. Webmasters would stuff entire dictionaries into their HTML and rank pages for all kinds of irrelevant keywords. This in turn degraded the quality and relevance of search results and caused users to move on to better search engines.

Black hat SEO is one reason why Google became the dominant search engine; others lacked the algorithmic sophistication necessary to avoid manipulation and thus a decline in result quality and relevance. Google's initial algorithm, PageRank, made it superior to its competitors because it relied on factors that were more difficult to manipulate than just meta tags and keywords.[2]

Contemporary search engines have since evolved far more sophisticated algorithms that are capable of automatically detecting and punishing black hat SEO activity. However, even Google lacks the resources to detect every single instance of black hat SEO, and many professionals continue to practice black hat methods undetected.

Definitions

Google

Google's Webmaster Guidelines outline the following basic principles:[3]

  • Make pages primarily for users, not for search engines
  • Don't deceive your users
  • Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you, or to a Google employee. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"
  • Think about what makes your website unique, valuable, or engaging. Make your website stand out from others in your field

Google then advises webmasters to avoid the following:

  • Automatically generated content
  • Participating in link schemes
  • Creating pages with little or no original content
  • Cloaking
  • Sneaky redirects
  • Hidden text or links
  • Doorway pages
  • Scraped content
  • Participating in affiliate programs without adding sufficient value
  • Loading pages with irrelevant keywords
  • Creating pages with malicious behavior, such as phishing or installing viruses, trojans, or other badware
  • Abusing structured data markup
  • Sending automated queries to Google

Google then advises webmasters to follow good practices like these:

  • Monitoring your site for hacking and removing hacked content as soon as it appears
  • Preventing and removing user-generated spam on your site

If your site violates one or more of these guidelines, then Google may take manual action against it. Once you have remedied the problem, you can submit your site for reconsideration.

Bing

Bing's Webmaster Guidelines[4] are similar to Google's.

Methods

Article Spinning

Article spinning is the practice of rewriting text using software that employs algorithms, often based on Markov chains[5], to mimic human writing.

While rewriting itself is not black hat, article spinning often produces poor imitations of the original text that typically contain grammatical errors and nonsensical synonyms. Some article spinning services are free; others are paid and of higher quality, but still fall short of the work a professional human writer can produce.

Search engine algorithms are sophisticated enough to detect rewritten content of low quality or effort. Rewriting other people's work is still a viable SEO strategy that is not black hat but requires sufficient effort to ensure it's superior to the original version.

Article spinning violates Google's Automatically Generated Content policy[6] and is best avoided, as it provides little value to human users or search engines.

Cloaking

Cloaking encompasses a number of 'bait-and-switch' methods that are some of the highest risk practices of black hat SEO. Source[24].

Cloaking encompasses a number of 'bait-and-switch' methods that involve showing different content to the user and to search engine crawlers on the same page.

A user may click on a search result that seems relevant because of its meta description only to be redirected to a page that has no relevance to the original query.

According to Google's Webmaster Guidelines, "Cloaking refers to the practice of presenting different content or URLs to human users and search engines. Cloaking is considered a violation of Google’s Webmaster Guidelines because it provides our users with different results than they expected.

Some examples of cloaking include:

  • Serving a page of HTML text to search engines, while showing a page of images or Flash to users
  • Inserting text or keywords into a page only when the User-agent requesting the page is a search engine, not a human visitor"[7]

Cloaking is also one of the highest risk black hat SEO strategies because it can result in a manual penalty (a permanent ban of your website) from search engines.

A typical method of cloaking on Apache servers uses "mod_rewrite" rules in a site's .htaccess file. With a database of known search engine crawler IPs, the rewrite rules detect whether a visitor is a human or a search engine crawler and serve the corresponding content. Typically the content shown to crawlers is keyword-dense but spun text. Human users will find themselves directed to a monetized page with little or no relevance to their original query.[8]

Other cloaking practices include keyword stuffing with invisible or hidden text, abusing Flash, and high HTML density.[8]

Practices that show different content to users and search engines but are not considered cloaking include dynamic geographic content serving and link cloaking (e.g. hiding affiliate tracking IDs on links).[8]
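For site owners auditing their own pages (for example after a hack), the gap between what crawlers and users receive can be measured. The following is a minimal sketch rather than a description of how any search engine detects cloaking. It assumes the two HTML responses have already been fetched separately, once with a browser User-agent and once with a crawler User-agent, and the names visible_words and content_similarity are illustrative.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the words a visitor would read, ignoring script and style blocks."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.lower().split())


def visible_words(html):
    """Return the lowercased visible words of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def content_similarity(html_for_users, html_for_crawlers):
    """Return a ratio from 0.0 (no shared text) to 1.0 (identical visible text)."""
    return SequenceMatcher(
        None, visible_words(html_for_users), visible_words(html_for_crawlers)
    ).ratio()
```

A low ratio is only a prompt for manual review, since legitimate practices such as geographic content serving also produce differences.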

In 2006 Google banned BMW for abusing the hidden text method of cloaking.[9]

Doorway Pages

A doorway page is a web page designed to rank highly in organic search while serving only as an intermediary, typically thin on content, that funnels users to other pages.

Google's definition of doorway pages is as follows:

"Doorways are sites or pages created to rank highly for specific search queries. They are bad for users because they can lead to multiple similar pages in user search results, where each result ends up taking the user to essentially the same destination. They can also lead users to intermediate pages that are not as useful as the final destination.

Here are some examples of doorways:

  • Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page
  • Pages generated to funnel visitors into the actual usable or relevant portion of your site(s)
  • Substantially similar pages that are closer to search results than a clearly defined, browseable hierarchy"[10]

Successfully implemented doorway pages will occupy multiple results for a search query yet lead users to the same destination.

When combined with cloaking, doorway pages are particularly difficult for search engines to detect.[11]

Duplicate Content and Plagiarism

Duplicate content is unoriginal content posted on a website. If the original source is not credited, it is plagiarism and could be subject to legal repercussions.

Google accepts that most duplicate content is not "deceptive in origin" or intended to be plagiarism[12], and as such, claims to be generally tolerant of duplicate content.[13]

Google claims that while publishing duplicate content carries no penalty unless it's "spammy", Google will always attempt to determine the original source of the content before deciding which page ranks higher[14].

If a site contains multiple similar pages, it's recommended that the webmaster indicate which page should take priority in Google's search results. The page that takes priority over similar pages is known as the canonical URL, and the method is known as canonicalization[15].
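One common way to declare a canonical URL is a link element with rel="canonical" in the page's head. As a minimal sketch, the following reads that declaration back out of a page so it can be checked. It uses only the standard library, and find_canonical is an illustrative name.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

Running this over every near-duplicate page on a site and confirming they all point at the same URL is a quick consistency check.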

Guest Posting Networks

Hidden Text and Links

Hidden text was a viable SEO strategy before search engine algorithms became more sophisticated. Source[25]

Hidden text is text on a web page that is not intended to be visible to human users and is present only to be read by search engine crawlers. It is one of the oldest examples of black hat SEO, practiced since the first search engines emerged in the mid-1990s, when inserting entire dictionaries of irrelevant keywords into pages was common practice[16].

Example methods of creating hidden text include:

  • Inserting white text on a white background, or any colored text on the same color background.
  • Hiding text behind an image.
  • Positioning text off screen.
  • Using a font size of zero.
  • Hiding links by linking punctuation rather than words.

Due to the age and simplicity of these techniques, most search engines will detect and punish them automatically[16].

Google has patented a system for detecting all of the above methods of hiding text and links[17][18].
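To illustrate why these techniques are easy to flag, here is a minimal sketch of a heuristic scan over inline style attributes. It is not Google's patented method. It ignores external stylesheets, text hidden behind images and punctuation links, and the 1000px off-screen threshold and the function names are assumptions made for this example.

```python
import re
from html.parser import HTMLParser


def parse_style(style):
    """Turn 'a: b; c: d' into {'a': 'b', 'c': 'd'}, lowercased and stripped."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def hidden_reasons(style):
    """Return the reasons an inline style looks like it hides text."""
    props = parse_style(style)
    reasons = []
    if re.fullmatch(r"0+(\.0+)?(px|pt|em|rem|%)?", props.get("font-size", "")):
        reasons.append("font size of zero")
    color = props.get("color")
    if color and color == props.get("background-color"):
        reasons.append("text color matches background")
    for prop in ("left", "text-indent"):
        match = re.fullmatch(r"-(\d+)px", props.get(prop, ""))
        if match and int(match.group(1)) >= 1000:  # assumed threshold
            reasons.append("positioned off screen")
            break
    return reasons


class HiddenTextScanner(HTMLParser):
    """Collects (tag, reason) pairs for elements with suspicious inline styles."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        for reason in hidden_reasons(style):
            self.findings.append((tag, reason))


def scan(html):
    scanner = HiddenTextScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

Only literal matches are compared, so color: #fff against background-color: white would not be flagged; a fuller check would need to resolve computed styles.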

Keyword Stuffing

Keyword stuffing was a common black hat SEO tactic in the 90s[16].

Keyword stuffing is the practice of inserting many redundant instances of keywords into a page's visible text, HTML or meta tags. Since the meta keywords tag no longer influences rankings in modern search engines, only keyword stuffing in text and HTML is likely to affect rankings, whether positively or negatively.

Keyword stuffing can occur in any of these page elements:

  • Meta descriptions
  • Meta keywords
  • HTML headers
  • HTML bodies
  • HTML comments
  • Image alt attributes
  • Visible or hidden text

Studies have shown that keyword densities of ~2% are sufficient to rank in the top 10 Google search results[19].
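Keyword density can be measured in more than one way. As a minimal sketch, the function below uses one common definition: the words belonging to occurrences of the keyword phrase, as a percentage of all words in the text. The tokenization rule and the name keyword_density are assumptions made for this example.

```python
import re


def keyword_density(text, keyword):
    """Return the percentage of words in `text` that belong to occurrences of `keyword`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = re.findall(r"[a-z0-9']+", keyword.lower())
    if not words or not phrase:
        return 0.0
    size = len(phrase)
    # Count every position where the full phrase appears in order.
    hits = sum(
        1 for i in range(len(words) - size + 1) if words[i:i + size] == phrase
    )
    return 100.0 * hits * size / len(words)
```

For example, in the eight-word text "cheap shoes for sale buy cheap shoes today" the phrase "cheap shoes" appears twice and accounts for four words, giving a density of 50%, far above the ~2% figure cited above.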

Parasite Hosting

Parasite hosting is when a page is created by an external user on a high authority domain for the purpose of 'piggybacking' off of the domain's high rankings.[20]

Common sites leveraged for parasite hosting include Medium, Amazon S3, GitHub, LinkedIn, Quora, YouTube, Glassdoor and Facebook. It is not easy for search engines to detect whether a Medium article, for example, has been published with the intent of being a parasite. However, if irrelevant content exists on niche sites like Glassdoor, then it should be clear to search engines that it is a parasite.

Parasite hosting can be used for negative SEO as well as positive SEO. Source[27]

Rich Snippet Markup Spam

Rich snippet markup spam is the misuse of rich snippets, a type of structured data that's used to instruct search engines on how to present the information on a site, in an attempt to manipulate search engines.

Google provides guidelines on how to avoid penalty for misusing rich snippets:

  • Don't mark up content that is not visible to readers of the page. For example, if the JSON-LD markup describes a performer, the HTML body should describe that same performer.
  • Don't mark up irrelevant or misleading content, such as fake reviews or content unrelated to the focus of a page.
  • Don't use structured data to deceive or mislead users. Don't impersonate any person or organization, or misrepresent your ownership, affiliation, or primary purpose.[21]
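The first guideline above can be checked mechanically, at least in part. As a minimal sketch, the following extracts JSON-LD blocks from a page and reports any "name" values that never appear in the page's visible text. It is not Google's validation logic. It checks only the "name" property and uses a simple substring match, and the class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdAudit(HTMLParser):
    """Separates JSON-LD script blocks from the text a reader can see."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.visible = []
        self._buffer = None   # collects the current JSON-LD block, if any
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            script_type = (dict(attrs).get("type") or "").lower()
            if tag == "script" and script_type == "application/ld+json":
                self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._buffer is not None:
                self.jsonld_blocks.append("".join(self._buffer))
                self._buffer = None
            self._skip_depth = max(0, self._skip_depth - 1)

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)
        elif not self._skip_depth:
            self.visible.append(data)


def iter_names(node):
    """Yield every string stored under a "name" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "name" and isinstance(value, str):
                yield value
            else:
                yield from iter_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_names(item)


def names_missing_from_page(html):
    """Return marked-up names that do not occur in the page's visible text."""
    audit = JsonLdAudit()
    audit.feed(html)
    audit.close()
    text = " ".join(" ".join(audit.visible).lower().split())
    missing = []
    for block in audit.jsonld_blocks:
        for name in iter_names(json.loads(block)):
            if name.lower() not in text:
                missing.append(name)
    return missing
```

An empty result does not prove the markup is compliant; it only rules out the simplest kind of mismatch between markup and visible content.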

Google now penalizes websites suspected of rich snippet markup spam with manual penalties, its most severe punishment.[22]

Link Schemes

A link scheme is any strategy in which a webmaster deliberately arranges the placement of links to their own website in order to manipulate search rankings.

Google defines each of the following as being examples of link schemes:

  • Buying or selling links that pass PageRank. This includes exchanging money for links, or posts that contain links; exchanging goods or services for links; or sending someone a “free” product in exchange for them writing about it and including a link
  • Excessive link exchanges ("Link to me and I'll link to you") or partner pages exclusively for the sake of cross-linking
  • Large-scale article marketing or guest posting campaigns with keyword-rich anchor text links
  • Using automated programs or services to create links to your site
  • Requiring a link as part of a Terms of Service, contract, or similar arrangement without allowing a third-party content owner the choice of qualifying the outbound link, should they wish.[23]

Negative SEO

Phishing and Malware

Private Blog Networks (PBNs)

Query Automation

Sneaky Redirects

Notes

  1. What is Black Hat SEO and Why You Must Avoid It[1]
  2. Why Google Succeeded Where Other Search Engines Failed?[2]
  3. Google's Webmaster Guidelines[3]
  4. Bing Webmaster Guidelines[4]
  5. Markov Generator: Contextual 'Auto-Generated' Nested-Spintax[5]
  6. Google's Automatically Generated Content policy[6]
  7. Cloaking[7]
  8. What is Cloaking in SEO & Should You Do Cloaking?[8]
  9. BMW given Google 'death penalty'[9]
  10. Doorway pages[10]
  11. What Are Doorway Pages?[11]
  12. Duplicate content[12]
  13. Matt Cutts Explains How Duplicate Content Affects Rankings[13]
  14. Google’s Matt Cutts: Duplicate Content Won’t Hurt You, Unless It Is Spammy[14]
  15. Consolidate duplicate URLs[15]
  16. An Illustrated History of Blackhat SEO[16]
  17. [17]
  18. Google Patent on Hidden Text and Links[18]
  19. Keyword Density Tutorial[19]
  20. 4 Techniques to Identify if Your Site is Abused by Parasite Hosting SEO Doorways[20]
  21. Follow the structured data guidelines[21]
  22. Google’s New Manual Penalty Targets Rich Snippet Spam[22]
  23. Link Schemes[23]