A guide to finding and fixing duplicate content on your website
Oct 22nd, 2024
While some duplicate content on your website isn’t going to result in a Google penalty, it’s quite a common issue that could have an impact on your site, from wasting resources and crawl budget to delaying the pick-up of new content. Generally, we recommend finding it and fixing it rather than letting it be. That’s why in this guide, we explain how you can do exactly that.
We’ll start by discussing the various methods you can use to identify duplicate content, from manual inspections to automated tools. Then, we’ll explore how you can compare two different websites for duplicate content (this may be your website, plus a competitor’s).
Finally, and arguably most importantly, we’ll tell you how you can fix the duplicate content that exists on your site, from consolidating duplicate pages using 301 redirects to implementing canonical tags to indicate the preferred version of a page. Additionally, we’ll provide guidance on rewriting duplicate content to create unique value and improve your website’s search engine rankings.
How to check for duplicate content
Before you can go ahead and fix your duplicate content, you need to find all the instances where it occurs on your site first. There are a few ways you can do this, including using online tools and searching manually. Below, you can find some foolproof methods for both.
Online tools
One of the easiest ways to find duplicate content on your site is with tools that can do the job for you. The best part is some of these tools are completely free to use, so you don’t have to pay a fortune or sign up for an ongoing subscription.
- Google Search Console
Any website owner should sign up for a free Google Search Console account because it provides so much useful information about how your site is performing in the Google search results. As well as telling you how many people are clicking through to your site and what keywords they’re using to find you, it can also identify duplicate content issues and offer some suggestions to fix them.
If you navigate to the “Pages” report within the “Indexing” section of the sidebar on the left, it will tell you why some pages aren’t indexed, with reasons such as “Duplicate – Google chose different canonical than user”.
- Copyscape
Copyscape is a plagiarism checker that searches the web to find where content on your website has been copied elsewhere. While you do have to pay for aspects of the tool, you can add as many or few credits to your account as you’d like, so you aren’t tied into a monthly subscription. A batch search allows you to look up multiple (or all of the) URLs on your site to quickly find duplicate content that can be replaced.
- SEO tools
Several paid SEO tools, such as Ahrefs, SEMrush and Moz, have features specifically designed to detect duplicate content. They can scan your website, identify duplicates and provide detailed reports. These tools often offer advanced features like content similarity analysis and duplicate content detection at scale, which could save you time.
Manual techniques
It’s great to make use of automated tools that will find your site’s duplicate content, but doing some manual checks may be a good idea too. You might find specific instances that the tools haven’t picked up, or you can do away with the tools altogether and only use manual methods. This may take more time but will likely pick up many of the same issues.
- Site search
There are a number of ways you can search the content on your site yourself to either find duplicate content or pages that are on very similar topics.
First, you can use your website’s search function and input some keywords or phrases to find similar pages internally. This likely won’t identify exact duplicate content as such, as it’s unlikely you’ve copied text word for word from your own site, but it could help you find multiple pages on the same topic that could be combined. This can be particularly helpful for websites that have a lot of product pages with similar product descriptions.
You can also do a site search in Google. Before searching for any text, add the search operator “site:[your website URL]” to only search content within your website.
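If you have already copied the text of a few pages out of your site search results, a rough similarity score can help you decide which ones are close enough to combine. The sketch below is one possible approach using Python’s standard difflib module; the page paths, the sample descriptions and the 0.8 threshold are all hypothetical and should be adjusted to your own content.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def find_similar_pages(pages: dict[str, str], threshold: float = 0.8):
    """Return (url_a, url_b, score) for every pair at or above the threshold."""
    matches = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            matches.append((url_a, url_b, round(score, 2)))
    return matches


# Hypothetical product descriptions keyed by URL path (illustration only).
pages = {
    "/products/red-mug": "A sturdy ceramic mug in red, dishwasher safe and holds 350ml.",
    "/products/blue-mug": "A sturdy ceramic mug in blue, dishwasher safe and holds 350ml.",
    "/about": "We are a small family business based in Leeds.",
}

for url_a, url_b, score in find_similar_pages(pages):
    print(url_a, url_b, score)
```

Comparing every pair of pages gets slow on large sites, so this is better suited to a shortlist of suspects than to a full crawl.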
- URL structure
You can use a sitemap to take a look at your URL structure and find instances of multiple URLs that point to the same content. For example, a page accessible through both www.yourwebsite.com/blog/article and www.yourwebsite.com/blog/123456 might cause issues. This can occur because of dynamic URL generation, pagination or other technical factors. Once these issues have been identified, you can use redirects to fix them.
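One way to spot several URLs serving the same page is to fingerprint the content behind each URL and group the matches. The following is a minimal sketch, assuming you have already downloaded the HTML for each URL in your sitemap into a dictionary; the function names and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace, so spacing differences are ignored."""
    normalised = " ".join(html.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_duplicate_urls(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose content is identical."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    # Only groups with more than one URL indicate duplication.
    return [urls for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results (illustration only).
crawled = {
    "www.yourwebsite.com/blog/article": "<h1>My article</h1>\n<p>Body text.</p>",
    "www.yourwebsite.com/blog/123456": "<h1>My article</h1> <p>Body text.</p>",
    "www.yourwebsite.com/contact": "<h1>Contact us</h1>",
}

for group in group_duplicate_urls(crawled):
    print("Same content:", ", ".join(group))
```

An exact hash only catches identical pages. Pages that differ by a timestamp or a tracking parameter in the markup would need a looser comparison.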
- Canonical tags
A canonical tag is an HTML element that specifies the preferred version of a page. By adding a canonical tag, you can tell Google which page you’d prefer it to index, which is useful when implemented correctly. If canonical tags are missing or incorrect, it can lead to duplicate content issues, so you should take the time to check that any page that needs one has one.
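Checking canonical tags by hand gets tedious on larger sites, so a small script can help. This sketch uses Python’s built-in html.parser to report whether a page’s HTML contains a canonical link element; the class name, function name and return labels ("missing", "conflicting") are our own illustrative choices, not a standard.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def audit_canonical(html: str) -> str:
    """Return the canonical URL, or "missing" / "conflicting" if there is a problem."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "missing"
    if len(set(finder.canonicals)) > 1:
        return "conflicting"
    return finder.canonicals[0]


# Hypothetical page source (illustration only).
sample = '<html><head><link rel="canonical" href="https://www.yourwebsite.com/blog/article"></head></html>'
print(audit_canonical(sample))
```

The script only confirms that a tag exists and is unambiguous. Whether it points at the right page is still a judgement you need to make yourself.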
Additional checks
There are a few other checks you can do to ensure you’ve caught as many instances of duplication as possible:
- Consider content syndication: If you syndicate your content on other websites, ensure that search engines can identify the original source by using canonical tags.
- Check for print versions: If you have print versions of your content that are also available online, make sure the online versions are unique enough to avoid duplication. This goes for downloadable PDFs, too.
- Pay attention to pagination: If your website uses pagination for long articles, ensure that each page has unique content. Consider adding a “next page” link instead of duplicating the entire article on each page.
- Review archived content: Regularly review your archived content to ensure it doesn’t conflict with current content. If outdated content is no longer relevant, consider removing or redirecting it.
Rather than just picking one or two of the above methods, the most successful strategy would be to combine all of them to ensure you’ve covered all your bases.
How to fix duplicate content
Once you’ve identified where there is duplicate content on your site, it’s time to fix it. However, we wouldn’t recommend jumping straight in and removing potentially good-quality or high-performing content, as you risk impacting your site’s performance.
Before making big changes, you should use tools like Google Search Console or Google Analytics to check the traffic that comes into a page with duplicate content. If the traffic is minimal, then you can absolutely consider a redirect or rewrite, but pages that are already performing well organically may not need any changes.
Below, we’ve covered some of the ways you can fix the duplicate content on your website to ensure it’s unique.
301 redirects
A 301 redirect uses the HTTP status code 301 (“Moved Permanently”) to tell search engines and browsers that a page has permanently moved, sending visitors from one URL to another. This is important because it helps prevent search engines from indexing both the original and redirected pages, which can lead to duplicate content issues.
To use 301 redirects, first identify the preferred page that you want to keep and the duplicate versions that should be redirected. Then, create a 301 redirect using your web server’s configuration files (e.g., .htaccess for Apache, web.config for IIS). The exact syntax may vary depending on your server, but it typically involves creating a redirect rule that specifies the old URL and the new URL. If you’re unsure, you can hire a web developer to implement this for you.
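If you have a long list of duplicates, you may find it easier to generate the redirect rules than to type them out. The sketch below assumes an Apache server with mod_alias enabled and produces Redirect 301 lines that could be pasted into an .htaccess file; the function name and the example mapping are hypothetical, and other servers such as IIS use a different syntax.

```python
def build_htaccess_redirects(redirects: dict[str, str]) -> str:
    """Turn {old_path: new_url} into Apache mod_alias 'Redirect 301' lines."""
    lines = []
    for old_path, new_url in redirects.items():
        # mod_alias expects the old location as a URL path beginning with a slash.
        if not old_path.startswith("/"):
            raise ValueError(f"Old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return "\n".join(lines)


# Hypothetical mapping of duplicate paths to the preferred URL (illustration only).
rules = build_htaccess_redirects({
    "/blog/123456": "https://www.yourwebsite.com/blog/article",
})
print(rules)
```

Bear in mind that the Redirect directive matches by path prefix, so a rule for /blog/123456 would also apply to any URL beneath that path. It is worth reviewing the generated rules before deploying them.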
After creating the redirect, test it to ensure it’s working correctly. Visit the old URL in your browser and verify that it redirects to the new URL.
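Browsers follow redirects silently, so it isn’t always obvious whether the server returned a 301 or a temporary 302. As a rough alternative to checking by hand, this sketch uses Python’s standard http.client, which does not follow redirects, to confirm both the status code and the destination. The function name is hypothetical, and it assumes your server answers HEAD requests the same way it answers normal page requests.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def check_redirect(old_url: str, expected_url: str, timeout: float = 10.0) -> bool:
    """Return True if old_url answers with a 301 pointing at expected_url."""
    parts = urlsplit(old_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects, so we see the first response as-is.
        conn.request("HEAD", path)
        response = conn.getresponse()
        location = response.getheader("Location")
        # A Location header may be relative, so resolve it against the old URL.
        resolved = urljoin(old_url, location) if location else None
        return response.status == 301 and resolved == expected_url
    finally:
        conn.close()
```

For example, calling check_redirect with the old URL and the preferred URL should return True once the redirect is live. A False result means the status code or the destination is not what you expected.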
By using 301 redirects, you can consolidate duplicate pages, improve your website’s search engine rankings and provide a better user experience.
Canonical tags
Canonical tags are HTML elements that specify the preferred version of a page, helping search engines understand which one to index. By using canonical tags, you can avoid duplicate content issues and ensure that search engines index the correct version of your content.
To implement canonical tags, first identify the preferred version of the content you want to be considered the canonical version. This is often the most comprehensive or up-to-date version. Then, insert a canonical tag into the <head> section of the HTML code for each page that you want to consolidate. The canonical tag should point to the URL of the preferred version, for example: <link rel="canonical" href="https://www.yourwebsite.com/blog/article" />
After adding the canonical tags, it’s a good idea to test your implementation by using a tool like Google Search Console to check for any errors or warnings.
Content rewriting
If you have duplicate content on your website that cannot be consolidated or redirected, then you might want to consider rewriting it to create unique value. This can help improve your website’s search engine rankings and provide a better user experience.
One effective strategy is to add new information to the original content. This could involve expanding on existing topics, incorporating current events or including expert insights. By adding fresh content, you can make the page more informative and engaging for your audience.
Another approach is to change the format of the content. For example, if you have a blog post, you could convert it into an infographic, video or podcast. This can make the content more visually appealing and easier to consume.
Additionally, you can update the content to ensure it is accurate and up to date. You might want to remove any irrelevant information and add new perspectives or angles on the topic. By keeping your content fresh and relevant, you can improve its value to your audience.
Remember to optimise the rewritten content for search engines as well. Use relevant keywords, write compelling meta descriptions, and ensure the content is mobile-friendly. By following these strategies, you can effectively rewrite duplicate content and create unique value for your website.