Useful snippets to protect your WordPress blog against scrapers

by Jean.

If you run a blog, you have probably already faced the problem of content scraping: some people steal your content to display it on their own blog, usually alongside lots of AdSense ads. Here are a few useful code snippets to help protect your blog against scrapers.

Force your WordPress blog to break out of frames

Some scrapers display your blog in a frame to take advantage of your content, and show their ads in another frame to try to make a few bucks. This code forces your blog to break out of the frames, so the visitor only sees your blog, not the scraper site.

Just paste the code below into your functions.php file, save it, and you’re done.

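// Break Out of Frames for WordPress
function break_out_of_frames() {
	// Skip post previews
	if (!is_preview()) {
		echo "\n<script type=\"text/javascript\">";
		echo "\n<!--";
		// Compare against the top window so nested frames are escaped too
		echo "\nif (window.top !== window.self) { window.top.location.href = window.self.location.href; }";
		echo "\n-->";
		echo "\n</script>\n\n";
	}
}
// Print the script in the <head> of every front-end page
add_action('wp_head', 'break_out_of_frames');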

Source: http://wp-mix.com/break-out-of-frames-wordpress/
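
The script above only helps when the visitor's browser runs JavaScript. As a possible complement, you could also send an X-Frame-Options: SAMEORIGIN header, which asks browsers not to render your pages inside frames on other domains. The snippet below is only a sketch under that assumption: the myblog_ function names are hypothetical, and the hook is registered only when WordPress is loaded, so the file can also be run on its own.

```php
<?php
// Hypothetical helper: returns the header line so it can be inspected or reused.
function myblog_frame_options_header() {
	return 'X-Frame-Options: SAMEORIGIN';
}

// Hypothetical callback: sends the header if output has not started yet.
function myblog_send_frame_options_header() {
	if (!headers_sent()) {
		header(myblog_frame_options_header());
	}
}

// Register the callback only inside WordPress, where add_action() exists.
if (function_exists('add_action')) {
	add_action('send_headers', 'myblog_send_frame_options_header');
}
```

Like the JavaScript version, this would go in your functions.php file. It does not replace the script: it only covers browsers that honor the header.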

Protect your blog against image hotlinking

Most scrapers simply take your RSS feed and display it on their site, which means that they also use your original images and consume your server bandwidth for their own websites. You can turn this to your advantage: serve hotlinked requests a replacement image that tells readers the article was stolen from another blog.

First, create a small image saying something like “This article has been stolen from www.yoursite.com” and upload it to your blog server. Then edit your .htaccess file (located in your WordPress blog root directory) and append this code to it:

RewriteEngine On
# Replace mysite\.com with your blog domain (keep the backslash before the dot)
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?mysite\.com/ [NC]
# Let requests with an empty referer through
RewriteCond %{HTTP_REFERER} !^$
# Replace /images/nohotlink.jpg with your "don't hotlink" image URL
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /images/nohotlink.jpg [L]

Source: http://www.wprecipes.com/how-to-protect-your-wordpress-blog-from-hotlinking
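
If you want to sanity-check the referer logic before touching .htaccess, the two conditions can be mirrored in a few lines of plain PHP. This is only an illustrative sketch: is_hotlink_request is a hypothetical helper (not part of WordPress or Apache), mysite.com is the placeholder domain from the rules above, and the sketch assumes that both http and https referers from your own domain should be accepted.

```php
<?php
// Hypothetical helper: returns true when a request would be treated as hotlinking.
// $referer is the raw Referer header ('' when absent); $domain is your blog domain.
function is_hotlink_request($referer, $domain) {
	// An empty referer is let through, like the second RewriteCond.
	if ($referer === '') {
		return false;
	}
	// Accept the domain itself and any subdomain, case-insensitively.
	$pattern = '#^https?://(.+\.)?' . preg_quote($domain, '#') . '/#i';
	return preg_match($pattern, $referer) !== 1;
}

var_dump(is_hotlink_request('http://www.mysite.com/a-post/', 'mysite.com'));  // bool(false)
var_dump(is_hotlink_request('http://scraper.example/stolen/', 'mysite.com')); // bool(true)
var_dump(is_hotlink_request('', 'mysite.com'));                               // bool(false)
```

Only requests for which the helper returns true would receive the replacement image.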

Automatically add a link to your post title

As the majority of content thieves use automatic scraping tools, they’ll scrape all of your content, including the post title. A good way to discourage scrapers is to put a link on your post titles, so each stolen post automatically links back to your original post.

To do so in WordPress, simply open your single.php file and locate where the title is displayed. Then, replace that code with the following:

<h1>
  <a href="<?php the_permalink(); ?>"><?php the_title(); ?></a>
</h1>

Source: http://www.catswhoblog.com/how-to-protect-your-blog-from-content-thieves

Automatically add a link to your original posts using RSS feed

Another useful way to fight back against content theft is to automatically insert a copyright notice with a backlink to the original post in each RSS item. That way, scrapers who automatically republish your RSS feed on their own sites will also publish your copyright notice and backlink!

Simply add the code below to your functions.php file. The copyright notice can be customized on line 4.

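// add custom feed content: a copyright notice with a backlink to the original post
function add_feed_content($content) {
	if(is_feed()) {
		$content .= '<p>This article is copyright &copy; '.date('Y').'&nbsp;'.get_bloginfo('name').'. Original post: <a href="'.esc_url(get_permalink()).'">'.esc_html(get_the_title()).'</a></p>';
	}
	// get_bloginfo() returns the blog name; bloginfo() would echo it instead of appending it
	return $content;
}
add_filter('the_excerpt_rss', 'add_feed_content');
add_filter('the_content', 'add_feed_content');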

Source: http://digwp.com/2012/10/customizing-wordpress-feeds/

Create a custom RSS feed

While the technique above is good, it only displays a small notice at the bottom of your posts. You might want a more in-depth solution, one which allows you to limit the number of characters appearing in each RSS feed item.

Here is a ready-to-use WordPress page template that you can easily customize to fit your specific needs.

<?php
/*
Template Name: Custom Feed
*/

$numposts = 5;

function yoast_rss_date( $timestamp = null ) {
  $timestamp = ($timestamp==null) ? time() : $timestamp;
  echo date(DATE_RSS, $timestamp);
}

function yoast_rss_text_limit($string, $length, $replacer = '...') { 
  $string = strip_tags($string);
  if(strlen($string) > $length) 
    return (preg_match('/^(.*)\W.*$/', substr($string, 0, $length+1), $matches) ? $matches[1] : substr($string, 0, $length)) . $replacer;   
  return $string; 
}

$posts = query_posts('showposts='.$numposts);

$lastpost = $numposts - 1;

header("Content-Type: application/rss+xml; charset=UTF-8");
echo '<?xml version="1.0"?>';
?><rss version="2.0">
<channel>
  <title>Yoast E-mail Update</title>
  <link>http://yoast.com/</link>
  <description>The latest blog posts from Yoast.com.</description>
  <language>en-us</language>
  <pubDate><?php yoast_rss_date( strtotime($posts[$lastpost]->post_date_gmt) ); ?></pubDate>
  <lastBuildDate><?php yoast_rss_date( strtotime($posts[$lastpost]->post_date_gmt) ); ?></lastBuildDate>
  <managingEditor>joost@yoast.com</managingEditor>
<?php foreach ($posts as $post) { ?>
  <item>
    <title><?php echo get_the_title($post->ID); ?></title>
    <link><?php echo get_permalink($post->ID); ?></link>
    <description><?php echo '<![CDATA['.yoast_rss_text_limit($post->post_content, 500).'<br/><br/>Keep on reading: <a href="'.get_permalink($post->ID).'">'.get_the_title($post->ID).'</a>'.']]>';  ?></description>
    <pubDate><?php yoast_rss_date( strtotime($post->post_date_gmt) ); ?></pubDate>
    <guid><?php echo get_permalink($post->ID); ?></guid>
  </item>
<?php } ?>
</channel>
</rss>

Source: http://yoast.com/custom-rss-feeds-wordpress/
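
To see how the character limit behaves, the truncation logic can be tried outside WordPress. The sketch below is a standalone copy of the template's yoast_rss_text_limit function under the hypothetical name demo_text_limit. The sample strings are made up for illustration.

```php
<?php
// Standalone copy of the template's truncation logic.
// demo_text_limit is a hypothetical name used for this demonstration only.
function demo_text_limit($string, $length, $replacer = '...') {
	// Remove HTML tags so only plain text is counted.
	$string = strip_tags($string);
	if (strlen($string) > $length) {
		// Take one extra character, then cut back to the last non-word character.
		$cut = substr($string, 0, $length + 1);
		if (preg_match('/^(.*)\W.*$/', $cut, $matches)) {
			return $matches[1] . $replacer;
		}
		// No non-word character found: fall back to a hard cut.
		return substr($string, 0, $length) . $replacer;
	}
	return $string;
}

echo demo_text_limit('<p>The quick brown fox jumps over the lazy dog</p>', 20), "\n";
// prints: The quick brown fox...
echo demo_text_limit('<p>Short post</p>', 20), "\n";
// prints: Short post
```

Note that strlen and substr count bytes, so text containing multibyte characters may be cut in the middle of a character. If that matters for your blog and the mbstring extension is available, mb_strlen and mb_substr are the usual alternative.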

Comments (7)

  1. Seb F. said:

    Dear Jean-Baptiste,
    Many thanks for those snipets that will be useful for me.
    Just a silly question though : will the “custom rss” snipet work even with those “free” services that “force” display the entire blog entry instead of the excerpt ?
    If not, would you have any tip on dealing with those ?

  2. Adam said:

    For arguments sake, is it not a good thing to have your content scraped?

    If you’ve got an authoritative website and you’re posting nice big articles with internal links referencing more of your resources, scraper sites may copy the content, but they’ll copy the links as well, earning your website a load of low quality links. Meanwhile your site should still rank for the content, because it’s big/trusted/powerful/heavily linked to.

    Or am I missing something?

    • Jonas said:

      Having your content scraped is really not awesome. If there’s a picture in, uploaded on your server, the load is on your server every time that image is being shown. So basically you lose bandwidth on visitors on another dudes website. That’s bad. Also stolen content may give duplicate content on Google, which hurts both yours and the thiefs website. The only time I find scraping okay, is when you scrape for new prices on products, shown in a feed or something. Like various affiliate websites or Pricerunner.

  3. Christober Lee said:

    Really it helps me. Because, My blog contents were copied content on copyscape website. My website is not getting rank on Google because of copied content. I will use this coding to protect my website from scrapers. Thank you Jean-Baptiste.

  4. navin said:

    Thanks Jean-Baptiste, Thanks for the tips. but browse.feedreader.com has published many posts in iframe. due to this rank of my website is affected badly. Athough i have added some scripts and these scripts return the feedreader page to my my own post. but i am bearing punishment of other’s misdeed.

  5. Paul said:

    I always add a link back to my blog at the bottom of the RSS feed, and make sure I link to other posts in my article. If they are going to scrap my content at least I get a link back.
