
SEO - Content

What is on-site SEO?

On-site SEO (also known as on-page SEO) is the practice of optimizing elements on a website (as opposed to links elsewhere on the Internet and other external signals collectively known as "off-site SEO") in order to rank higher and earn more relevant traffic from search engines. On-site SEO refers to optimizing both the content and HTML source code of a page.

Beyond helping search engines interpret page content, proper on-site SEO also helps users quickly and clearly understand what a page is about and whether it addresses their search query. In essence, good on-site SEO helps search engines understand what a human would see (and what value they would get) if they visited a page, so that search engines can reliably serve up what human visitors would consider high-quality content about a particular search query (keyword).

The ultimate goal of on-site SEO can be thought of as attempting to make it as easy as possible for both search engines and users to:

  • Understand what a webpage is about;
  • Identify that page as relevant to a search query or queries (i.e. a particular keyword or set of keywords);
  • Find that page useful and worthy of ranking well on a search engine results page (SERP).

Keywords, content, and on-site SEO

In the past, on-site SEO has been synonymous with keyword use — and specifically, including a high-value keyword in several key locations on a website.

To understand why keywords are no longer at the center of on-site SEO, it's important to remember what those terms actually are: content topics. Historically, whether or not a page ranked for a given term hinged on using the right keywords in certain, expected places on a website in order for search engines to find and understand what that webpage's content was about. User experience was secondary; simply making sure search engines found keywords and ranked a site as relevant for those terms was at the heart of on-site SEO practices.

Today, though, search engines have grown exponentially more sophisticated. They can extract a page's meaning from the use of synonyms, the context in which content appears, or even just by paying attention to the frequency with which specific word combinations are mentioned. While keyword use still matters, prescriptive methods like using an exact-match keyword in specific locations a requisite number of times are no longer a tenet of on-page SEO. What is important is relevance. For each of your pages, ask yourself how relevant the content is to the user intent behind search queries (based on your keyword usage both on the page and in its HTML).

In this way, on-site SEO is less about keyword repetition or placement and more about understanding who your users are, what they're looking for, and what topics (keywords) you can create content about that best fulfills that need. Pages that meet these criteria have content that is:

  • In-depth. "Thin" content was one of Google Panda's specific targets; today it's more or less assumed that content must be sufficiently thorough in order to stand a good chance at ranking.
  • User-friendly. Is the content readable? Is it organized on your site in such a way that it's easily navigable? Is it generally clean, or littered with ads and affiliate links?
  • Unique. If not properly addressed, content duplicated from elsewhere on your site (or elsewhere on the Internet) may impact a site's ability to rank on SERPs.
  • Authoritative and trustworthy. Does your content stand on its own as a reliable resource for information on a particular topic?
  • Aligned with user search intent. Part of creating and optimizing for quality content is also delivering on searcher expectations. Content topics should align with the search queries for which they rank.

Non-keyword-related on-site SEO

Beyond the keywords (topics) used in content on a webpage and how they're discussed, there are several "keyword-agnostic" elements that can influence a page's on-site optimization.

These keyword-agnostic elements all tie back to the same basic idea: creating a good user experience. The more usable a page is (from both a technical and non-technical perspective), the better that page's on-site optimization.

How do you optimize a page?

Fully optimizing a page on your website requires both text- and HTML-based changes. The sections below cover the on-site factors that contribute to ranking, and how you can improve your own website pages.

On-Page Ranking Factors

The way your page is optimized can have a huge impact on its ability to rank.

What are On-Page Ranking Factors for SEO?

On-page ranking factors can have a big impact on your page's ability to rank if optimized properly. The biggest on-page factors that affect search engine rankings are:

Content of Page

The content of a page is what makes it worthy of a search result position. It is what the user came to see and is thus extremely important to the search engines. As such, it is important to create good content. So what is good content? From an SEO perspective, all good content has two attributes. Good content must supply a demand and must be linkable.

Good content supplies a demand:

Just like the world’s markets, information is affected by supply and demand. The best content is that which does the best job of supplying the largest demand. It might take the form of an XKCD comic that is supplying nerd jokes to a large group of technologists or it might be a Wikipedia article that explains to the world the definition of Web 2.0. It can be a video, an image, a sound, or text, but it must supply a demand in order to be considered good content.

Good content is linkable:

From an SEO perspective, there is no difference between the best and worst content on the Internet if it is not linkable. If people can’t link to it, search engines will be very unlikely to rank it, and as a result the content won’t drive traffic to the given website. Unfortunately, this happens a lot more often than one might think. A few examples of this include: AJAX-powered image slide shows, content only accessible after logging in, and content that can't be reproduced or shared. Content that doesn't supply a demand or is not linkable is bad in the eyes of the search engines—and most likely some people, too.

From: A Visual Guide to Keyword Targeting & On-Page SEO

Title Tag

Title tags are the second most important on-page factor for SEO, after content. A title tag should state the page's subject concisely and accurately.
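As an illustrative sketch (the page topic, site name, and wording here are hypothetical), a descriptive title tag for a page about video game history might look like this:

  <head>
    <!-- Hypothetical example: the title states the page's subject up front -->
    <title>Video Game History | Example.org Games</title>
  </head>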

URL

Along with smart internal linking, SEOs should make sure that the category hierarchy of the given website is reflected in URLs.

The following is a good example of URL structure:

  • http://www.example.org/games/video-game-history

This URL clearly shows the hierarchy of the information on the page (history as it pertains to video games in the context of games in general). Search engines use this information to determine the relevancy of a given web page. Because of the hierarchy, the engines can deduce that the page likely doesn't pertain to history in general but rather to the history of video games. This makes it an ideal candidate for search results related to video game history. All of this information can be inferred without even needing to process the content on the page.

The following is a bad example of URL structure:

  • http://www.imdb.com/title/tt0468569

Unlike the first example, this URL does not reflect the information hierarchy of the website. Search engines can see that the given page relates to titles (/title/) and is on the IMDB domain, but cannot determine what the page is about. The reference to "tt0468569" does not directly convey anything that a web surfer is likely to search for. This means that the information provided by the URL is of very little value to search engines.

URL structure is important because it helps the search engines to understand relative importance and adds a helpful relevancy metric to the given page. It is also helpful from an anchor text perspective because people are more likely to link with the relevant word or phrase if the keywords are included in the URL.

SEO Best Practice

Content pages are the meat of websites and are almost always the reason visitors come to a site. Ideal content pages should be very specific to a given topic—usually a product or an object—and be hyper-relevant.

The purpose of the given web page should be directly stated in all of the following areas:

  • Title tag
  • URL
  • Content of page
  • Image alt text

Consider, as an example of a well-laid-out and search engine–friendly web page, the Wikipedia article on Super Mario World: all of its on-page factors are optimized.

That content page is considered good for several reasons. First, the content itself is unique on the Internet (which makes it worthwhile for search engines to rank it well) and covers a specific bit of information in a lot of depth. If a searcher had a question about Super Mario World, there is a good chance that this page would answer their query.

Aside from content, this page is laid out well. The topic of the page is stated in the title tag (Super Mario World – Wikipedia, the free encyclopedia), URL (http://en.wikipedia.org/wiki/Super_Mario_World), the page's content (the page heading, "Super Mario World"), and within the alt text of every image on the page.

Now compare this with a poorly optimized web page, and notice how it differs from the first example.

The second example is a less search engine–friendly content page targeting the term "Super Mario World." While the subject of the page is present in some of the important elements of the web page (title tag and images), the content is less robust than the Wikipedia example, and the relevant copy on the page is less helpful to a reader.

Notice that the description of the game is suspiciously similar to copy written by a marketing department. “Mario’s off on his biggest adventure ever, and this time he has brought a friend.” That is not the language that searchers write queries in, and it is not the type of message that is likely to answer a searcher's query. Compare this to the first sentence of the Wikipedia example: “Super Mario World is a platform game developed and published by Nintendo as a pack-in launch title for the Super Nintendo Entertainment System.” In the poorly optimized example, all that is established by the first sentence is that someone or something called Mario is on an adventure that is bigger than his or her previous adventure (how do you quantify that?) and he or she is accompanied by an unnamed friend.

The Wikipedia example tells the reader that Super Mario World is a game developed and published by Nintendo for the Super Nintendo Entertainment System; the other example does not. Search results in both Bing and Google show the better optimized page ranking higher.

An Ideally Optimized Web Page

An ideal web page should do all of the following:

  • Be hyper-relevant to a specific topic (usually a product or single object)
    • Include subject in title tag
    • Include subject in URL
    • Include subject in image alt text
    • Specify subject several times throughout text content
  • Provide unique content about a given subject
  • Link back to its category page
  • Link back to its subcategory page (if applicable)
  • Link back to its homepage (normally accomplished with an image link showing the website logo on the top left of a page)
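As a rough sketch of those last three points (the paths, file names, and link text are hypothetical), the linking on such a page might look like this:

  <!-- Hypothetical logo link back to the homepage, usually at the top left of the page -->
  <a href="/"><img src="/images/logo.png" alt="Example.org Games logo"></a>

  <!-- Hypothetical links back to the category and subcategory pages -->
  <a href="/games/">Games</a> &gt;
  <a href="/games/video-games/">Video Games</a> &gt;
  Video Game History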

Duplicate Content

What is duplicate content?

Duplicate content is content that appears on the Internet in more than one place. That “one place” is defined as a location with a unique website address (URL) - so, if the same content appears at more than one web address, you’ve got duplicate content.

While not technically a penalty, duplicate content can still sometimes impact search engine rankings. When there are multiple pieces of, as Google calls it, "appreciably similar" content in more than one location on the Internet, it can be difficult for search engines to decide which version is more relevant to a given search query.

Why does duplicate content matter?

For search engines

Duplicate content can present three main issues for search engines:

  1. They don't know which version(s) to include/exclude from their indices.
  2. They don't know whether to direct the link metrics (trust, authority, anchor text, link equity, etc.) to one page, or keep them separated between multiple versions.
  3. They don't know which version(s) to rank for query results.

For site owners

When duplicate content is present, site owners can suffer rankings and traffic losses. These losses often stem from two main problems:

  1. To provide the best search experience, search engines will rarely show multiple versions of the same content, and thus are forced to choose which version is most likely to be the best result. This dilutes the visibility of each of the duplicates.
  2. Link equity can be further diluted because other sites have to choose between the duplicates as well. Instead of all inbound links pointing to one piece of content, they link to multiple pieces, spreading the link equity among the duplicates. Because inbound links are a ranking factor, this can then impact the search visibility of a piece of content.

The net result? A piece of content doesn't achieve the search visibility it otherwise would.

How do duplicate content issues happen?

In the vast majority of cases, website owners don't intentionally create duplicate content. But that doesn't mean it's not out there. In fact, by some estimates, up to 29% of the web is actually duplicate content!

Let's take a look at some of the most common ways duplicate content is unintentionally created:

1. URL variations

URL parameters, such as click tracking and some analytics code, can cause duplicate content issues. This can be a problem caused not only by the parameters themselves, but also the order in which those parameters appear in the URL itself.

For example:

  • http://www.widgets.com/blue-widgets?color=blue
  • is a duplicate of
  • http://www.widgets.com/blue-widgets
  • is a duplicate of
  • http://www.widgets.com/blue-widgets?cat=3&color=blue

Similarly, session IDs are a common duplicate content creator. This occurs when each user that visits a website is assigned a different session ID that is stored in the URL.

Printer-friendly versions of content can also cause duplicate content issues when multiple versions of the pages get indexed.

One lesson here is that when possible, it's often beneficial to avoid adding URL parameters or alternate versions of URLs (the information those contain can usually be passed through scripts).

2. HTTP vs. HTTPS or WWW vs. non-WWW pages

If your site has separate versions at "www.site.com" and "site.com" (with and without the "www" prefix), and the same content lives at both versions, you've effectively created duplicates of each of those pages. The same applies to sites that maintain versions at both http:// and https://. If both versions of a page are live and visible to search engines, you may run into a duplicate content issue.

3. Scraped or copied content

Content includes not only blog posts or editorial content, but also product information pages. Scrapers republishing your blog content on their own sites may be a more familiar source of duplicate content, but there's a common problem for e-commerce sites, as well: product information. If many different websites sell the same items, and they all use the manufacturer's descriptions of those items, identical content winds up in multiple locations across the web.

How to fix duplicate content issues

Fixing duplicate content issues all comes down to the same central idea: specifying which of the duplicates is the "correct" one.

Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines. Let's go over the three main ways to do this: using a 301 redirect to the correct URL, using the rel=canonical attribute, or using the parameter handling tool in Google Search Console.

301 redirect

In many cases, the best way to combat duplicate content is to set up a 301 redirect from the "duplicate" page to the original content page.

When multiple pages with the potential to rank well are combined into a single page, they not only stop competing with one another; they also create a stronger relevancy and popularity signal overall. This will positively impact the "correct" page's ability to rank well.

Rel="canonical"

Another option for dealing with duplicate content is to use the rel=canonical attribute. This tells search engines that a given page should be treated as though it were a copy of a specified URL, and all of the links, content metrics, and "ranking power" that search engines apply to this page should actually be credited to the specified URL.

The rel="canonical" attribute is part of the HTML head of a web page and looks like this:

General format:

  ...[other code that might be in your document's HTML head]...
  <link rel="canonical" href="URL OF ORIGINAL PAGE">
  ...[other code that might be in your document's HTML head]...

The rel=canonical attribute should be added to the HTML head of each duplicate version of a page, with the "URL OF ORIGINAL PAGE" portion above replaced by a link to the original (canonical) page. (Make sure you keep the quotation marks.) The attribute passes roughly the same amount of link equity (ranking power) as a 301 redirect, and, because it's implemented at the page (instead of server) level, often takes less development time to implement.

Below is an example of what a canonical attribute looks like in action:

(Figure: Using MozBar to identify canonical attributes.)

Here, we can see BuzzFeed is using the rel=canonical attribute to accommodate their use of URL parameters (in this case, click tracking). Although this page is accessible by two URLs, the rel=canonical attribute ensures that all link equity and content metrics are awarded to the original page (/no-one-does-this-anymore).
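As a hedged sketch of that setup (the full URLs and the specific tracking parameter below are assumptions, not taken from BuzzFeed's actual markup), the parameterized version of the page would carry a canonical link in its head pointing back to the clean URL:

  <!-- Page reached via an assumed tracking URL such as
       https://www.buzzfeed.com/no-one-does-this-anymore?utm_source=twitter -->
  <link rel="canonical" href="https://www.buzzfeed.com/no-one-does-this-anymore">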

Meta Robots Noindex

One meta tag that can be particularly useful in dealing with duplicate content is meta robots, when used with the values "noindex, follow." Commonly called Meta Noindex,Follow, and technically written as content="noindex,follow", this meta robots tag can be added to the HTML head of each individual page that should be excluded from a search engine's index.

General format:

  ...[other code that might be in your document's HTML head]...
  <meta name="robots" content="noindex,follow">
  ...[other code that might be in your document's HTML head]...

The meta robots tag with "noindex,follow" allows search engines to crawl the links on a page while keeping the page itself out of their indices. It's important that the duplicate page can still be crawled, even though you're telling Google not to index it, because Google explicitly cautions against restricting crawl access to duplicate content on your website. (Search engines like to be able to see everything in case you've made an error in your code. It allows them to make a [likely automated] "judgment call" in otherwise ambiguous situations.)

Using meta robots is a particularly good solution for duplicate content issues related to pagination.

Preferred domain and parameter handling in Google Search Console

Google Search Console allows you to set the preferred domain of your site (i.e. http://yoursite.com instead of http://www.yoursite.com) and specify whether Googlebot should crawl various URL parameters differently (parameter handling).

Depending on your URL structure and the cause of your duplicate content issues, setting up either your preferred domain or parameter handling (or both!) may provide a solution.

The main drawback to using parameter handling as your primary method for dealing with duplicate content is that the changes you make only work for Google. Any rules put in place using Google Search Console will not affect how Bing or any other search engine's crawlers interpret your site; you'll need to use the webmaster tools for other search engines in addition to adjusting the settings in Search Console.

Additional methods for dealing with duplicate content

  1. Maintain consistency when linking internally throughout a website. For example, if a webmaster determines that the canonical version of a domain is www.example.com/, then all internal links should go to http://www.example.com... rather than http://example.com/pa... (notice the absence of www).
  2. When syndicating content, make sure the syndicating website adds a link back to the original content and not a variation on the URL. (Moz's Whiteboard Friday episode on dealing with duplicate content covers this in more detail.)
  3. To add an extra safeguard against content scrapers stealing SEO credit for your content, it's wise to add a self-referential rel=canonical link to your existing pages. This is a canonical attribute that points to the URL it's already on, the point being to thwart the efforts of some scrapers (see the sketch after this list). In a self-referential rel=canonical link, the URL specified in the rel=canonical tag is the same as the current page URL.
    While not all scrapers will port over the full HTML code of their source material, some will. For those that do, the self-referential rel=canonical tag will ensure your site's version gets credit as the "original" piece of content.
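For illustration (the URL is a made-up placeholder), a self-referential canonical on a page served at http://www.example.com/widgets would simply point back at that same URL in the page's own HTML head:

  <!-- On http://www.example.com/widgets itself (hypothetical URL) -->
  <link rel="canonical" href="http://www.example.com/widgets">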