November 21, 2007

Site Hijacking (part 2)

Just a couple of posts back I discussed the advantages of using modular design when building your websites. Then yesterday I told you about how a hacker tried to use my site to boost his own PageRank, or else tried to sabotage my site by inviting a duplicate content penalty. I had no response to my complaint to his domain host after 24 hours, and being the impatient sort, I decided to take things into my own hands.

After looking up a couple of PHP commands, I discovered that all I had to do was put two simple lines of code in the header file to effectively negate his attack. In fact, I turned his attempted theft into a benefit. Now if someone goes to his site, they will be automatically redirected to my site. Better yet, the redirect is a ‘301 - permanent’ type, so the search engines will see it as a correction, and not penalize my site for duplicate content. Until the hacker notices the change and stops stealing my code, I will get all his traffic for that site. Because the site is modularly designed, I had to add the code to just one file to have it take effect on every page of the site.

Should you find yourself in a similar situation, here is the code:

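<?php
// Host name the visitor's browser asked for
$chk = $_SERVER["HTTP_HOST"];
// Any host that is not mine gets a permanent redirect; exit so no page content follows it
if (!stristr($chk, "mysite")) { header("Location: http://www.mysite.com/", TRUE, 301); exit; }
?>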

This simply checks that the host name in the request contains mysite, and if not, redirects to mysite. It is of course possible for the hacker to spoof the host before stealing the site, but then he will also have to serve different pages to the search engines than to regular browsers if he expects to benefit from the theft. Doing that is not as effective as it used to be, because the search engines occasionally spoof the user-agent string to look like regular browsers and compare what they receive with what their crawlers were served.
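
One weakness of a substring test is that a host such as mysite.hissite.com would also pass it. A slightly stricter variant could compare the host against an explicit list instead. This is only a sketch: the MY_HOSTS list and the is_my_host() helper are made-up names for illustration, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical list of the host names this site legitimately answers to.
const MY_HOSTS = ["www.mysite.com", "mysite.com"];

// True only when the host (minus any ":port" suffix) is exactly one of mine.
function is_my_host(string $host): bool
{
    $name = strtolower(preg_replace('/:\d+$/', '', trim($host)));
    return in_array($name, MY_HOSTS, true);
}

// Redirect only when running under a web server and a foreign host was requested.
// A request with no Host header at all is let through, to avoid a redirect loop.
$host = $_SERVER["HTTP_HOST"] ?? "";
if (PHP_SAPI !== "cli" && $host !== "" && !is_my_host($host)) {
    header("Location: http://www.mysite.com/", true, 301);
    exit;
}
```

Like the original, this has to run before any output is sent, so it belongs at the very top of the shared header file.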

Another solution available to me, should the hacker escalate his attack, is to have PHP write all the relative links out, so the browser receives absolute links. He could then still steal the home page, but all the links would go back to my site, which defeats his purpose.
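
A rough sketch of that idea follows. The names BASE_URL, absolute_url() and absolutize_links() are invented for this example, and the code makes a simplifying assumption: every relative link is treated as relative to the site root, so "../" paths and links relative to a sub-directory are not resolved properly.

```php
<?php
// Hypothetical base URL; substitute your own domain.
const BASE_URL = "http://www.mysite.com/";

// Turn one relative link into an absolute one.
function absolute_url(string $link, string $base = BASE_URL): string
{
    // Leave alone: empty links, anything with a scheme (http:, mailto:, ...),
    // protocol-relative links (//host/...) and in-page fragments (#top).
    if ($link === "" || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $link)) {
        return $link;
    }
    return rtrim($base, "/") . "/" . ltrim($link, "/");
}

// Rewrite every quoted href="..." and src="..." value in a block of HTML.
function absolutize_links(string $html, string $base = BASE_URL): string
{
    return preg_replace_callback(
        '/\b(href|src)=(["\'])(.*?)\2/i',
        function (array $m) use ($base): string {
            return $m[1] . "=" . $m[2] . absolute_url($m[3], $base) . $m[2];
        },
        $html
    );
}
```

On a modular site, one way to apply it would be to start output buffering in the shared header file, for example ob_start(function ($html) { return absolutize_links($html); });, so every page passes through the rewrite before it reaches the browser.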

While I’ve never heard of anyone else having this type of site theft, it is very common for hackers to copy your website and put it on their own server. Such theft is of very temporary benefit, but it can be automated, so the lazy thief can just replace it with another site when the search engines penalize him (and perhaps you as well).

To catch such outright theft, you need to check your websites in a search engine occasionally. Select a long phrase of 50 or 60 characters from your site and put it in quotes in the search box. The search should return just your page. If there is a note that ‘very similar’ results have been left out, click on the ‘repeat search with omitted results’ link. If you have a blog, or the page is indexed both with and without the ‘www’ in the URL, you may get multiple results from your own site, but if you see someone else’s site in the results, your page has been copied.
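
If you have many pages to check, the phrase picking can be scripted. The sketch below is only an illustration: pick_phrase() and exact_search_url() are made-up helper names, and the search URL assumes Google's usual q= query parameter. You still have to open the URL and read the results yourself.

```php
<?php
// Take the opening words of a page's text until the phrase is about 50 to 60 characters long.
function pick_phrase(string $text, int $min = 50, int $max = 60): string
{
    $words = preg_split('/\s+/', trim(strip_tags($text)));
    $phrase = "";
    foreach ($words as $word) {
        $candidate = $phrase === "" ? $word : $phrase . " " . $word;
        if (strlen($candidate) > $max) {
            break; // adding this word would overshoot the maximum
        }
        $phrase = $candidate;
        if (strlen($phrase) >= $min) {
            break; // long enough
        }
    }
    return $phrase;
}

// Build a search URL for the phrase wrapped in double quotes (an exact-phrase search).
function exact_search_url(string $phrase): string
{
    return "https://www.google.com/search?q=" . urlencode('"' . $phrase . '"');
}
```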

If it is only one page, it may be a case of innocent infringement (i.e. stupidity) on the part of the other site. Write the webmaster and ask them to remove your material from their site. If they don’t respond, use whois to find the hosting site, and write them. In 99% of such cases you can get the material removed with little difficulty.

November 20, 2007

A Different Kind of Site Hijacking

I fell victim to an interesting blackhat SEO technique that I just discovered today. Someone has pointed another domain name at one of my sites. Go to their domain and you see my site, just the same as if you typed in the correct domain name. They have not cracked the site; they have no access to the files or databases. So why? How do they benefit?

Well, I assume it works like this. Suppose mysite.com is a long-established PR5 site with great content. Blackhat points hissite.com at the same IP address, using a different domain name server. Voilà, Blackhat now has great content on hissite.com, and is soon ranked PR5 too. Now, according to all the duplicate content information we read, this is not supposed to happen. His duplicate site should be penalized. In the real case of my site being used, though, the home page has the same rank as the original site. Sub-pages are not ranked at all, and only about 25% of the pages are in the Google index, while all of the original site’s pages have been indexed and have rank. The rank is based on content alone: the site has only one incoming link shown in Yahoo, and that is disguised as an offer to sell the site (oops, ‘sold’ already, surprise, surprise). Google shows two links to the site, neither with any page rank.
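
If you suspect another domain has been pointed at your server, one quick check is whether both names resolve to the same address. The sketch below uses PHP's gethostbynamel(), which returns a list of IPv4 addresses or false. The function name points_at_same_server() is invented here, and the resolver is passed in as a parameter purely so the logic can be tried without a network connection. Bear in mind that on shared hosting many unrelated sites share one address, so a match is a hint, not proof.

```php
<?php
// True when the two host names have at least one IPv4 address in common.
function points_at_same_server(string $hostA, string $hostB, ?callable $resolve = null): bool
{
    // Default to a real DNS lookup; a substitute resolver can be supplied for testing.
    $resolve = $resolve ?? 'gethostbynamel';
    $ipsA = $resolve($hostA) ?: []; // a failed lookup (false) becomes an empty list
    $ipsB = $resolve($hostB) ?: [];
    return count(array_intersect($ipsA, $ipsB)) > 0;
}
```

Called as points_at_same_server("www.mysite.com", "www.hissite.com"), it performs live lookups for both names.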

I’m assuming it takes a while before any penalty for pointing two domains at one site kicks in, no doubt to allow for site transfers to be completed without penalties. Now, so long as Blackhat points the domain at his own server before Google discovers the site is a duplicate, he will have (for a while) instant PR5 on a new site. I’m guessing that is the goal, since I don’t have any real competition for the subject covered by that site. If it were a competitive subject, another possibility might be that Blackhat just wants to get my site penalized by the search engines to lower its trust ranking.

Hopefully, we will never know for sure because the site will go away. I contacted the domain registrar and owner of the DNS server, and hope they will act on my complaint. If not, I will have to contact Google in writing to make a copyright complaint (they don’t accept those by email), which takes too long but is the only other solution that comes to mind.

This kind of minor hassle is part of the downside of having your own Web Empire. There are several others, like spam, hosting service problems, etc., but they are all part of doing business on the Internet.