Michael Gray of Graywolf has some great WordPress SEO information, some of which may go against what you’ve been told. Watch this video to learn what I’m talking about.
The big takeaways from this are:
- Posts should really only be in one category
- Chronological archives should be blocked from being crawled in the robots.txt file
Now, you would think that Google and other major search engines would know better than to count blog archives as duplicate content, but Michael Gray says we should make things as easy as possible for search indexing robots.
As I was watching this, a question occurred to me: if a post is no longer on the home page, but it can’t be spidered, how, exactly, is that good for SEO? Well, I contacted Michael and asked him. He replied that your archived posts are accessible via your WordPress categories (thanks, Michael!). So, this actually ties in with why you want, if possible, only one category assigned to a post. I’ve taken this advice to heart and have been assigning my posts to only one category. I will implement the robots.txt suggestions as well and block spidering of the chronological archives.
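To make the idea concrete, here is a minimal sketch of what blocking the chronological archives while leaving categories crawlable might look like. It is not taken from Michael Gray's video. The /archives/ and /category/ paths and the example.com domain are assumptions for illustration, and your own permalink structure may differ. The Python part only uses the standard library's robots.txt parser to check which URLs the rules would block, so you can test a rule before you rely on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes date-based archives live under /archives/
# and category pages under /category/. Adjust the paths to your own permalinks.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a real blog.
    for url in (
        "http://example.com/",
        "http://example.com/category/seo/",
        "http://example.com/archives/2006/03/",
    ):
        status = "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {status}")
```

Keep in mind that a Disallow rule matches on the start of the URL path. If your individual posts also live under the blocked path, they would be blocked along with the archive listings, so check a few real URLs from your own site first.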
UPDATE: You’ll find a clearly written basic article at Daily Blog Tips on what code to put in your robots.txt file, and here are some others:
- Create a robots.txt File (another one from Daily Blog Tips)
- Creating the Ultimate WordPress Robots.txt File
- Google explains Googlebot
- WordPress robots.txt File Optimized for SEO and Google
I can’t vouch for the accuracy of any of the information in the links above, so be careful, and for goodness’ sake do not just copy and paste code that you don’t understand. A robots.txt file is generally simple enough to understand, so take the time to examine the code before you use it.
Not all the examples in the links above are doing the things Michael Gray describes in his video. I’ve seen some that block categories and not the chronological archives. I think Michael’s onto something with keeping the categories over the chronological archives. People search using keywords that might also be your categories, but not chronological archives. Lastly, a big thank-you goes out to Greg Balanko-Dickson for initiating the addition of these resources to the original post with his questions in the comments!
24 Comments
Great post. Just discovered your blog. The video is pretty descriptive and gets the point across well.
Thanks very much, Leonid! I’m glad you appreciate it.
Michael, thanks for this and thought I would ask you what code you would need to add to block the indexing of the Archive?
Greg, try this post at Daily Blog Tips.
Hi Michael, thanks again.
I noticed a link at Daily Blog tips in the comments to another post Create a robots.txt file. I used those instructions.
Thanks for that suggestion, Greg! I’ve put it in the updated post.
He’s crazy. Removing your archives is a bad solution to a problem that comes from bad management. Why not just avoid creating content that can be considered duplicate in the first place?
@ SEO Ranter: Nobody said to remove archives, only to prevent them from being spidered–and not all archives, just the chronological archives. Category archives are encouraged to be crawled. Category archives are still archives, therefore archives are not being “removed”.
I’d argue that adding a URL for general exclusion in robots.txt is tantamount to removing it from search engine indices. That’s what I meant by “removing”.
@ SEO Ranter: Thank you for clarifying that. I think the point is that category archives and chronological archives are seen as duplicate content. So archives/2006/03/15/blog-post is seen as separate URL from category/blog-category-name/blog-post. Preventing one from being crawled shouldn’t really hinder the other one from being crawled. Is there any reason that you know why this is a bad idea?
Michael nice one, why don’t you put your archive in pdf and make it as “Michael’s SEO knowledgebase”? It would be great.
Keep it great!
@ Adriatic: Thanks for the compliments! At some point in the future, I plan to collect some of my posts into a more linear form, like an ebook.
Well, it’s always good to maintain a thorough internal link strategy, and using only one category essentially removes the functionality of the category feature. An interim solution would be to 301 redirect post links from category summary pages to the /archive/ URL, thus retaining category functionality, and allowing an elegant and working link structure. This could be done via a line or two in .htaccess. A complete solution would be to adapt WordPress’s category summary to link directly to your archive page. Alternatively, the archives could link to pages under /category/; the choice is yours.
Remedying issues with an application by updating robots.txt sounds like a really last-minute solution; great for stopping interactive parts of a site getting crawled, bad as a substitute for a linking strategy.
@ SEO Ranter: Thanks for that insightful information. I agree with you that modifying .htaccess would be a better longterm solution than modifying robots.txt.
I have read elsewhere that “siloing” your content is beneficial (but could never make up for weak content). WordPress categories can be used for this, but, as you say, it sort of takes away one of the main aspects of categories, which is the ability to apply more than one to a post. Good food for thought, SEO Ranter. Thanks for contributing to the conversation in a meaningful way!
Guys, I got to wonder if we are over thinking and over tweaking this entire issue. I have been writing daily online since 1998, and blogging since 2005 and have maintained first page position on Google for years doing nothing but writing.
Now I do not know if using the robots.txt file was a mistake. This is what is frustrating about reading any blog relating to internet marketing etc.
You all talk over our heads and carry on conversations about technical topics that most of us know diddly squat about.
A little information about a topic that requires a deeper understanding of the issues is dangerous in the hands of a noob like me.
Damn, now I feel stupid.
Sorry Michael, I am feeling frustrated.
Why don’t you two (Michael & SEO Ranter) get together and write a manifesto on this topic?
Something anyone could read, be more well informed, and feel like they got some great value.
It would help you both and help the average blogger that wants to blog without becoming a technical whiz and SEO expert.
Help!?
@ Greg:
Don’t sweat it. Using the robots.txt isn’t a mistake. It also isn’t the only way to accomplish the goal of avoiding duplicate content. And, yes, none of this is a cure for bad writing. SEO Rant had an excellent additional suggestion which would also work. I think it might work better in the long run, but that’s my opinion. SEO Rant’s statements about the effectiveness of modifying .htaccess vs. robots.txt are also opinion. Michael Gray’s statements in the video are his opinions. In all these cases, the opinions are backed up by facts and/or experience.
Your situation, Greg, might be one of “if it ain’t broke, don’t fix it”. You could remove or revert your robots.txt file back to its original code without consequence.
Search engine bots don’t come around every single day, but they come around often enough so that if you change it back it would be like it never happened. Again, nothing to worry about, really.
The reason I posted the video is because its advice is something a beginner could do. There’s better, and then there’s better, and at some point you’re splitting hairs.
Generally speaking, you’re much better off spending your time on producing quality content. SEO efforts are meant to give that quality content an edge in search results. Sometimes, it’s not much of an edge, so people obsess over every possible detail.
Don’t feel bad about changing the robots.txt file. It will not hurt you. You will not know for sure if it helped until after some time has passed (probably a couple months). But if you were doing well to begin with, perhaps it wasn’t necessary.
@Michael, Greg - agreed, you needn’t agonize over this hack, and the search engines’ duplicate content filter will again help out a lot.
I would just add one caveat though: if you exclude content with robots.txt, and then use Google’s URL removal tool, you will have to wait at least six months before re-inclusion. So, don’t do that.
I’d also say there is a concrete risk to changing the robots.txt file - any incoming links to excluded pages on your site will be effectively lost, as these excluded pages can’t pass the benefit on. You may want to examine your incoming links in any engine you can find, as well as landing pages and referral logs, to assess the risk here.
Greg - personally, I’d say the safest option is to carry on as you are, if things are working; as the saying goes, if it ain’t broke, don’t fix it! You could even try this one out for a while and see how well it works. SEO’s a continual cycle of evaluation and implementation.
SEO Rant said:
Right, but not immediately. There is a very high probability he could change it back with no penalty if he did it very soon.
@Michael - Yes, of course
Thanks, guys. I guess I am a little sensitive when it comes to duplicate content. When I first switched from a website to a blog, I hadn’t realized that I had mistakenly left a path open for the search engines and effectively duplicated my content. On top of that, the Google 302 hack was stealing my content and making my site appear to be the duplicate. I was dropped from the Google index for 9 months.
I think I will change things back. Thanks for the responses.
We just released a new kind of plugin that also kills duplicate content!
It is still in French, v0.1, released today.
It lets you give your page a 301 + URL, 302 + URL, or 404 status code from the “edit post” page.
Translation needed! (Our email can be found on our site if you can help.)
http://www.wordpress-seo.com/seo-http-error-manager.php
How it works:
Download, copy to the plugin directory, activate.
When editing, you can choose Leave, 301, 302, or 404 (and a URL for 301 and 302).
Licence: “can be stolen”
and we still don’t care about anyone who wants to put a licence on “nothing” like this!
Comments to make it better are welcome.
Sorry for my English.
++
Personally, I agree with the point that you need to do the —Break— on your main page for the search engines, but having the full articles is more user friendly.
Also, robots.txt completely stops crawling, whereas adding the proper robots meta tag allows crawling and link following while still preventing the duplicate entries.
I wrote a post on my blog at http://www.wmtoolbox.com/code-snippets/blogging-and-duplicate-content-rank-better/ on how I do my robots
I’m commenting to give you heck. Do you realize how long I’m going to spend at your blog now? I’ve only been here a few minutes and I’ve found several things I want to read. Sheesh. Shame on you.
@fracas - I’ll put some coffee on for you.
4 Trackbacks
[...] Michael Martini writes about how to avoid duplicate content while using Wordpress. He has good link resources there too. [...]
[...] 17: WordPress SEO Techniques to Avoid Duplicate Content, in which I posted a video originally recorded by Michael Gray of Graywolf (I found it on YouTube, [...]
[...] WordPress SEO Techniques to Avoid Duplicate Content [...]
[...] be missed. As usual Vista is the Clown Pants of Computing is worthy. I found the post wordpress seo techniques to avoid duplicate content exceptional. I found the piece another apology [...]