November 21, 2007

Site Hijacking (part 2)

Just a couple posts back I discussed the advantages of using modular design when building your websites. Then yesterday I told you about how a hacker tried to use my site to boost his own page-rank — or else tried to sabotage my site by inviting a duplicate content penalty. I had no response to my complaint to his domain host after 24 hours, and being the impatient sort decided to take things into my own hands.

After looking up a couple PHP commands, I discovered that all I had to do was post two simple lines of code in the header file to effectively negate his attack. In fact, I turned his attempted theft into a benefit. Now if someone goes to his site, they will be automatically redirected to my site. Better yet, the redirect is a ‘301 - permanent’ type, so the search engines will see it as a correction, and not penalize my site for duplicate content. Until the hacker notices the change and stops stealing my code, I will get all his traffic for that site. Because the site is modularly designed, I had to add the code to just one file to have it effective on every page on the site.

Should you find yourself in a similar situation, here is the code:

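<?php
// Redirect any request that arrives under a host name other than mine.
$chk = $_SERVER["HTTP_HOST"];
if (!stristr($chk, "mysite")) {
    header("Location: http://www.mysite.com/", TRUE, 301);
    exit; // stop here so the page body is not sent along with the redirect
}
?>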

This simply checks that the host name contains mysite, and if not, redirects to mysite. It is of course possible for the hacker to spoof the host before stealing the site, but then he will also have to serve different pages to the search engine than to regular browsers, if he expects to benefit from the theft. Doing that is not as effective as it used to be, because the search engines occasionally spoof the user-agent string to look like regular browsers and compare what they receive with the pages served to their crawlers.
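The substring test above would also accept a host such as mysite.example.net, so here is a slightly stricter sketch that compares against an exact list of host names. The function name is_own_host and the two host names in the list are my own assumptions for illustration, not part of the original two-line fix, and the command-line guard is there only so the sketch can be tried outside a web server.

```php
<?php
// Hypothetical helper: true only when the host is one we actually own.
function is_own_host($host, array $allowed)
{
    // Drop an optional port (e.g. "www.mysite.com:8080") and normalize case.
    $host = strtolower(preg_replace('/:\d+$/', '', trim($host)));
    return in_array($host, $allowed, true);
}

// Assumed host names; replace with your own.
$allowed = array('mysite.com', 'www.mysite.com');
$host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';

// Skip the redirect when run from the command line, where there is no host.
if (PHP_SAPI !== 'cli' && !is_own_host($host, $allowed)) {
    header('Location: http://www.mysite.com/', true, 301);
    exit;
}
```

As with the original version, this has to run before any output is sent, so it belongs at the very top of the header file.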

Another solution available to me, should the hacker escalate his attack, is to have PHP write all the relative links out, so the browser receives absolute links. He could then still steal the home page, but all the links would go back to my site, which defeats his purpose.
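I have not needed this yet, but a rough sketch of the idea might look like the following. The function name absolutize_links is made up for illustration, and the sketch assumes every page lives in the site root or uses root-relative links (it does not resolve ../ paths). It also uses an anonymous function, which needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: prefix relative href/src values with the site's base URL.
// Assumes all pages live in the site root or use root-relative links.
function absolutize_links($html, $base)
{
    $base = rtrim($base, '/');
    return preg_replace_callback(
        '/\b(href|src)=(["\'])(.*?)\2/i',
        function ($m) use ($base) {
            $url = $m[3];
            // Leave full URLs, protocol-relative URLs, and #anchors untouched.
            if ($url === '' || preg_match('/^([a-z][a-z0-9+.-]*:|\/\/|#)/i', $url)) {
                return $m[0];
            }
            // Rebuild the attribute with the same quote character it had.
            return $m[1] . '=' . $m[2] . $base . '/' . ltrim($url, '/') . $m[2];
        },
        $html
    );
}
```

One way to apply it to a whole page would be to start output buffering in the header file with ob_start() and a callback that passes the buffered HTML through this function, so the change still lives in a single file.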

While I’ve never heard of anyone else having this type of site theft, it is very common for hackers to copy your website and put it on their own server. Such theft is of very temporary benefit, but it can be automated, so the lazy thief can just replace it with another site when the search engines penalize him (and perhaps you as well).

To avoid such outright theft, you need to check your websites in a search engine occasionally. Select a long phrase of 50 or 60 characters from your site, put it in quotes in the search box. The search should return just your page. If there is a note that ‘very similar’ results have been left out, click on the ‘repeat search with omitted results’ link. If you have a blog, or the page is indexed both with and without the ‘www’ in the URL, you may get multiple results from your own site, but if you see someone else’s site in the results — your page has been copied.

If it is only one page, it may be a case of innocent infringement (i.e. stupidity) on the part of the other site. Write the webmaster and ask them to remove your material from their site. If they don’t respond, use whois to find the hosting site, and write them. In 99% of such cases you can get the material removed with little difficulty.

November 20, 2007

A Different Kind of Site Hijacking

I fell victim to an interesting blackhat SEO technique that I just discovered today. Someone has pointed another domain name at one of my sites. Go to their domain and you see my site, just the same as if you had typed in the correct domain name. They have not cracked the site, and they have no access to the files or databases. So why? How do they benefit?

Well, I assume it works like this. Suppose mysite.com is a long established, PR5 site with great content. Blackhat points hissite.com at the same IP address, using a different domain name server. Voilà, Blackhat now has great content on hissite.com, and soon is ranked PR5 too. Now, according to all the duplicate content information we read, this is not supposed to happen. His duplicate site should be penalized, but in the real case of my site being used, the home page has the same rank as the original site. Sub-pages are not ranked at all, and only about 25% of the pages are in the Google index, while all of the original site pages have been indexed and have rank. The rank is based on content alone: the site has only one incoming link shown in Yahoo, and that is disguised as an offer to sell the site (oops! ‘sold’ already, surprise, surprise). Google shows two links to the site, neither with any page rank.

I’m assuming it takes a while before any penalty for pointing two domains at one site kicks in, no doubt to allow for site transfers to be completed without penalties. Now, so long as Blackhat points the domain at his own server before Google discovers the site is a duplicate, he will have (for a while) instant PR5 on a new site. I’m guessing that is the goal, since I don’t have any real competition for the subject covered by that site. If it were a competitive subject, another possibility might be that Blackhat just wants to get my site penalized by the search engines to lower its trust ranking.

Hopefully, we will never know for sure because the site will go away. I contacted the domain registrar and owner of the DNS server, and hope they will act on my complaint. If not, I will have to contact Google in writing to make a copyright complaint (they don’t accept those by email), which takes too long but is the only other solution that comes to mind.

This kind of minor hassle is part of the downside of having your own Web Empire. There are several others, like spam, hosting service problems, etc., but they are all part of doing business on the Internet.

November 19, 2007

Duplicate Layout Penalty

There is a lot of talk on the forums dedicated to SEO and Internet business issues about something called a ‘duplicate content penalty’. Several people maintain that there is no such thing, or that it should be referred to as a duplicate content filter.

Well, there is very definitely a duplicate content penalty. And there is a duplicate content filter. I have discovered that there is even a duplicate layout penalty. I googled ‘duplicate layout penalty’ and got zero results, so nobody is talking about that, but I will.

First, there are several types of duplicate content, and various penalties and filters apply. And the criteria by which the penalties or filters are applied are neither clear nor consistent. The worst-case example of duplicate content is a mirror site. Make a second copy of your site, and you will find one or both removed from the search engine listings. But sites with a valid need for mirror sites, like sourceforge.net or php.net, do not get penalized. Different mirrors often have different page-rank, but that may be due to differences in incoming links, server response time, and other ranking factors that have nothing to do with content.

The next level of duplicate content is the reprint article site. If all of the pages on your site consist of free reprint articles, you will likely be penalized with low pagerank. But some sites with nothing but reprint articles fare well — articlecity.com has pr6 rank for their home page.

So clearly, duplicate content is a FACTOR in page rank, but not the only consideration. I looked at some one-year-old free reprint articles. In a typical case, one was listed in Google results, though if you clicked the link to see filtered-out results, there were three listings. Yahoo, on the other hand, showed 14 copies of the same article. Interestingly, the sites listed by Yahoo all were indexed by Google, usually with thousands of pages. So even though a site was indexed, the results might not appear in a Google search, even when duplicate listings were requested. One home page for that sample article had a pr3 rank, but most were pr0 or un-ranked.

Perhaps I’ll take the time to do a more detailed study of those factors in the future, but for now, let’s just say the situation is complex enough that most people are just guessing when they talk about the duplicate content penalty. So when we move on to the topic of this post, duplicate layout penalty, we are on even shakier ground, but I’m personally convinced it exists.

Here is the scenario. I made a topic-specific website for a particular U.S. locality. It got a lot of traffic, and better yet, made a lot of money through Adsense. There is demand for that kind of information, and it pays off, so I created another website for the same topic, but a different locality. All the data was different, but being lazy I used the exact same page layout, and just a different database. This continued through about five more locations, each on different ISP servers. When they had aged a bit, so I could expect them to ‘catch up’ with the first site in traffic and earnings, I noticed they were not doing so well. More specifically, the 2nd and 3rd sites were rated one pagerank lower than the first, and the 4th and 5th were yet another pagerank lower.

Lower pagerank = lower traffic. Google has over half of the web-search market, so their pagerank is important. Here, I made sites, all equally useful, all with similar incoming links, and of six sites, one ranks pr4, two pr3, two pr2 and one pr0.

Well, maybe it was coincidence. But I remembered another disappointing set of sites that started out well, then didn’t live up to the first example — and I found the same pattern. It wasn’t just that the older sites ranked better, but at one year old, the first site had a big advantage, and the order in which sites were added predicted relative rank, with later sites ranking lower.

So why, I can hear skeptics ask, don’t newer blogs using WordPress rank poorly? There are tens of thousands (or 100’s of thousands?) of sites using the same basic layout. My only guess is that it is the same reason that 1,000s of news sites do not get hit with duplicate content penalties/filtering. Either the folks at Google are so brilliant they developed a ranking program that can detect actual value, or (more likely) they programmed in certain exceptions to their ‘rules’.

November 16, 2007

Modular Design

An important factor to consider when designing a family of websites for your Internet business is the ease of maintenance. You should always make your websites modular, so that you can make across-site changes by changing one file.

What is the best way to do that? Break the page down into sections, and put each in a separate file. You can use SSI includes, or PHP. If you don’t know PHP you really don’t need to learn much about it to use this technique. In fact, I’ll explain it all here.

This assumes you are using a hosting service that includes PHP, which they almost all do automatically. Now, if you want to use the .htm or .html suffix for your web pages, you need to have SSI turned on so you can use includes:
<!--#include virtual="myfile.php" -->

Or you can add one line to your .htaccess file if you are using a Linux server:
AddType application/x-httpd-php .htm .html

Or you can just name your file with a .php extension, and then your includes will work without any further modification.

Now, suppose one section of your page that repeats is the footer. You put the footer HTML code in a file and save it as footer.php — then you can include it with:
<?php include("footer.php");?>

Or, if you have SSI turned on:
<!--#include virtual="footer.php" -->

Either way, you will need to include the relative path if the file is not in the same directory as your HTML page.

Things get a little more complicated when you want to put the header section of your page in its own module. The header usually includes your keywords, page title, and other meta tags that change from page to page. There are two solutions to this problem.

The easiest approach is to put the values in a database, then look them up in the include file and assign them to variables. Variables can be inserted in your HTML just like the include file itself was:
<title><?php echo($title);?></title>
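To make the lookup step concrete, here is a minimal sketch in which a plain PHP array stands in for the database table. The page names, titles, keywords, and the page_meta function are all invented for illustration; in practice the array would be replaced by a query against whatever database you use.

```php
<?php
// Stand-in for the database table: page name => header values (all made up).
$pages = array(
    'index'   => array('title' => 'Widgets for Sale', 'keywords' => 'widgets, buy widgets'),
    'history' => array('title' => 'A History of Widgets', 'keywords' => 'widget history'),
);

// Hypothetical helper: fetch the header values for a page, with a fallback
// so an unknown page still gets a sensible title.
function page_meta(array $pages, $page)
{
    $default = array('title' => 'My Site', 'keywords' => '');
    return isset($pages[$page]) ? $pages[$page] : $default;
}

// Work out which page is being served, then assign the variables
// that the header HTML will echo.
$self     = isset($_SERVER['PHP_SELF']) ? $_SERVER['PHP_SELF'] : 'index.htm';
$meta     = page_meta($pages, basename($self, '.htm'));
$title    = htmlspecialchars($meta['title']);
$keywords = htmlspecialchars($meta['keywords']);
```

Running the values through htmlspecialchars() before echoing them keeps a stray quote or angle bracket in the data from breaking the header markup.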

A less satisfactory solution is to have a variable assignment file for each HTML file, that initializes those values that change from page to page. Changing these is just as much trouble as changing each HTML file on its own — but if you want to change other parts of your header, you still benefit from having separate modules. You can have the include file detect the correct variable file by giving it the same name as the HTML page it applies to, but with a different suffix. So you might use .inc files, for example. This bit of PHP code included in your header.php file will find the correct .inc file to match the .htm or .php file it is included in (assuming .htm as the suffix on the main HTML file):

<?php
$file=basename($_SERVER['PHP_SELF'],'.htm');
include($file.'.inc');
?>

If you put the main content for your page in another include file, and use a consistent prefix and suffix, the same method will find the correct file, so long as it is in the same directory as the main HTML file:

<?php
$file=basename($_SERVER['PHP_SELF'],'.htm');
include('x'.$file.'.php');
?>

You can use anything you like in place of ‘x’.
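If a page is ever missing its companion file, the include will fail. A slightly more defensive sketch of the same idea checks that the file exists first. The function name companion_file and the fallback message are my own inventions for illustration.

```php
<?php
// Hypothetical helper: build the name of the companion file for a page.
function companion_file($script, $prefix, $suffix)
{
    // basename() strips any directory part and the .htm suffix,
    // so only a plain file name in the current directory is produced.
    return $prefix . basename($script, '.htm') . $suffix;
}

$file = companion_file($_SERVER['PHP_SELF'], 'x', '.php');
if (is_file($file)) {
    include($file);
} else {
    // Assumed fallback text; show whatever suits your site.
    echo '<p>Content not found.</p>';
}
```

Passing an empty prefix and '.inc' as the suffix gives the variable-file lookup from the earlier snippet.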

To avoid getting annoying error messages that might reveal details of your page structure to hackers, your initial header file should include this code to turn off PHP errors:

<?php error_reporting(0);?>

That is the number 0, not the letter O, inside the parentheses. When troubleshooting your pages, you will want to replace the above line with:

<?php error_reporting(E_ALL);?>

Whatever scheme you use to modularize your websites, be consistent from one site to the next, and you will find it much easier to maintain your Web Empire. Being consistent in method does NOT mean having the same page design elements, links, file structure, etc., for your Internet business sites, as we will see in our next post on the hitherto unmentioned Duplicate Layout Penalty.

November 15, 2007

Multiple Websites

Why have more than one website for your Internet business? Well, there are several advantages, even if you have only a single product or service. The more different streams of income you have, the more advantageous the multiple-website strategy becomes.

Consider the simplest case first. Suppose I just sell widgets. Surely, my widget site has everything anyone needs to know about my widgets and how to buy them. Why would I need any other sites? Well first, if your single site doesn’t come up #1 in the search engines for the term ‘widget’ you are probably missing sales. Multiple sites help you achieve that #1 spot, and if you really work at it, perhaps you can get the #2 and #3 spots with your other widget sites.

Of course it does nobody any good if you just copy the information from your existing site to a new one. That is duplicate content, and it won’t show in the search results at all. What you need is another site that compares the different types of widgets. And another site on the history of widgets. And another site with pictures of all the different types of widgets ever made. Etc., etc.

Some people try to put all that on one website, thinking they will then have the best widget website, so everyone will find it when looking for widgets. And to some extent, that works. But once on your site, what do people do? Oh look at the interesting widget history. Ah, I never thought there were so many types of widgets. And look at these pretty widget pictures … They get distracted by all the information and never get around to actually buying your widgets.

On the other hand, suppose you have multiple widget sites. Only people already interested in the history of widgets will go to your widget-history site, and there they find your sales site recommended for when they want to actually buy widgets. The same for each of your other sub-topic sites — they all funnel traffic (and web-link ‘credit’ for search engine ranking) into your sales site. Your sales site does not link back to your sub-topic sites. When someone finds your sales site, all they should see is information relevant to making the sale — no distractions!

Now suppose you have multiple streams of income — selling advertising space, affiliate links, your own products. A network of sites provides more ‘property’ for placing ads and affiliate links, while still funneling those persons interested in buying your product to your sales site or sites (one site per product usually works best, unless you have a product-line of related items).

In future posts we will look at how you produce, manage and optimize multiple web sites, so that your Internet business grows into your own Web Empire.