January 15, 2008

Building Your Web Empire: The Plan

There are two ways to grow your Web Empire — make a plan and follow it, or just build it up a site at a time, without careful planning. I built mine the second way, because I didn’t know better. With hindsight, I’m sure my business would be much more profitable had I created a plan and followed it. But in the beginning I didn’t have enough knowledge or experience to make a reasonable plan.

Here, I am going to give you the benefit of my accumulated experience, and together we will lay out a plan for building a successful Internet Empire. In the end, success is measured in dollars, though for specific sites and specific goals we may have other measures along the way.

As we discussed briefly in the post First Websites, your first step is to choose the topic your sites will be built around. Start with a general topic that interests you: sports, automobiles, health, technology, and so on. Be sure it is something people will pay money for. Search for sites using your main subject as the keyword, and see how many of them carry advertising and how wide-ranging that advertising is.

If you chose dinosaurs, for example, you might find that most of the sites about dinosaurs have little advertising. The ads that do appear are mostly for fossils or for educational material about dinosaurs, such as books and videos. This suggests that the field would be very difficult to earn money from. How much do people spend on learning about dinosaurs? If it is a passion you have to follow, then go for it; you can become the leading vendor of dino-related material. But if your main goal is to make money, choose something people spend more money on.

For our sample plan, however, let’s take the difficult example and see how we can build a plan around dinosaur sites. We will be building up our empire one neighborhood at a time, so we need to pick a more specific topic, within the broad subject of dinosaurs, to be our first neighborhood. Let’s look at some of the possibilities:

  • dinosaur ecology — how they lived, what their environment was like
  • dinosaur fossils — what we learn from fossils, how they formed, what they look like
  • digging dinosaurs — paleontologists and how they work
  • dinosaur evolution — how they developed and changed over time
  • dinosaur species — characteristics of each species, when they lived, how they looked

That is enough to get us started, though a closer examination of the topic might yield more sub-topics. There is a point at which they begin to overlap too much, since they are all inter-related.

Let’s put those subjects in the order we want to cover them. Our plan will only cover the first topic, but once that neighborhood is built up, we will have the other topics waiting in the wings. Thus, as we research our first subject, we can gather information on the other subjects for future reference.

  1. dinosaur species — characteristics of each species, when they lived, how they looked
  2. dinosaur fossils — what we learn from fossils, how they formed, what they look like
  3. digging dinosaurs — paleontologists and how they work
  4. dinosaur ecology — how they lived, what their environment was like
  5. dinosaur evolution — how they developed and changed over time

This is a fairly logical progression, though you might have chosen another; it makes no difference in the long run. Our first decision is made: we will build a plan for websites exploring the different species of dinosaurs.

In our next post, we will begin designing The Plan.

January 14, 2008

Best Source for Public Domain Material

Your Web Empire should include at least one ‘anchor’ site with really great content you create yourself. But you just will not have time to write content for all of your sites, especially when you have several dozen of them. Sometimes you really need content that will not require much work on your part.

Public domain material fits that bill. I don’t know how many websites consist of material from the Gutenberg collection of texts, but I am sure it is substantial, as that is one of the oldest and best-known collections of public domain texts. It is not a great source for that very reason — any content from there is certain to be categorized as duplicate content when you use it.

There have been a couple of much-publicized efforts to scan books, most notably those by Microsoft and Google. These books are available as PDF image scans, but they are also run through OCR software, so the texts are available as well. Depending on the quality of the image obtained, some of these OCR scans are surprisingly accurate. The best way to access these books, as well as some from other projects (including Gutenberg), is to use the Internet Archive text collection.

This collection is huge enough, and growing fast enough, that even Google has not fully indexed it. Use the search function to find texts in the subject area relevant to your website, then download a few in the text format, and see if the OCR scan is accurate. If it is, choose a line of 40 or 50 characters from the text, and search it (within quotes) on Google. Oftentimes, you will get ‘no results found’ — indicating the text is not in the Google database. Run it through a spell-check to catch any OCR errors. Add it to your site, and when Google indexes it, yours will be the authority site for that text.
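The "choose a line of 40 or 50 characters" step can be sketched in a few lines of Python. This is only an illustration: the function name `pick_search_phrase` and the choice to start from the middle of the text are my own assumptions, not part of any tool mentioned here. It returns a run of whole words already wrapped in quotes, ready to paste into the search box.

```python
def pick_search_phrase(text, min_len=40, max_len=50):
    """Return a quoted run of whole words, min_len to max_len characters
    long, taken from around the middle of the text (or None if no such
    run exists). Hypothetical helper, for illustration only."""
    words = text.split()
    start = len(words) // 2  # skip title pages and front matter
    for i in range(start, len(words)):
        phrase = ""
        for word in words[i:]:
            candidate = (phrase + " " + word).strip()
            if len(candidate) > max_len:
                break  # too long; try again from the next word
            phrase = candidate
            if len(phrase) >= min_len:
                return '"' + phrase + '"'
    return None


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 10
    print(pick_search_phrase(sample))
```

Starting from the middle of the file is a judgment call: the opening lines of a scanned book are often title pages and library stamps, which OCR handles badly.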

[Image: “Playing At Work”, from PDPhotoblog.com]

There are also images, sound recordings, and video clips in the Internet Archive database. Just use the drop-down menu to select other formats, and search for sound or images to supplement your textual material. You can build an entire site around public domain material that has not been indexed in the search engines!

January 10, 2008

Interpreting SEO Data

Now that I have a minute to spare, I’d like to elaborate on my previous post, where I warned newbies not to accept all of Jonathan Leger’s conclusions. The particular post that made me question his ability to reason correctly was the one called Does PageRank Really Matter for Ranking in Google? (I have only read three or four posts on his blog so far, so one bad example out of four is a fairly poor average — maybe it is the only bad post though, I’ll have to read more to weigh in on that question.) His own data clearly shows PageRank does matter, yet he concludes it doesn’t. How can that be?

His data shows these average PageRank values for each of the top ten result positions, averaged over 500 keywords:

1. 6.722
2. 6.866
3. 6.292
4. 6.234
5. 5.968
6. 5.88
7. 5.73
8. 5.662
9. 5.656
10. 5.604

So how would that look if PageRank really didn’t matter? It would be a random distribution, so the overall averages would tend to be about the same for each position. I also suspect the numbers would be much lower, since the average PageRank for a random selection of pages would include far fewer high-ranked pages. I think an average around 2 or 3 would be much more probable, but regardless of what the actual number would be, it would be about the same for all positions, just varying slightly at random around the mean.

So how did Jonathan arrive at the brilliant conclusion that PageRank does not matter? He found that about one-third of the time a higher-ranked page followed one of lower value, and that 14% of the time the difference was substantial (3 points of PageRank or more).

So what does that tell us? It clearly demonstrates that PageRank is not the only factor that determines the order in which result pages are listed, but that is a far cry from saying PageRank doesn’t matter at all! In fact, it shows that most of the time (about two-thirds of the time) two adjacent first-page results will be in PageRank order, with the higher-ranked page on top. His own data refutes his conclusion.
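To make the trend in those ten averages concrete, here is a small Python sketch that uses only the numbers listed above. It counts how many adjacent positions are in descending PageRank order and computes a plain Pearson correlation between position and average PageRank. The variable and function names are mine, and this is a quick check on the averages, not a reconstruction of Jonathan’s own analysis.

```python
from math import sqrt

# Average PageRank at result positions 1 through 10 (figures quoted above).
avg_pagerank = [6.722, 6.866, 6.292, 6.234, 5.968,
                5.88, 5.73, 5.662, 5.656, 5.604]


def descending_pairs(values):
    """Count adjacent pairs where the earlier value is the larger one."""
    return sum(1 for a, b in zip(values, values[1:]) if a > b)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


positions = list(range(1, len(avg_pagerank) + 1))
pairs_in_order = descending_pairs(avg_pagerank)
r = pearson(positions, avg_pagerank)

if __name__ == "__main__":
    print(pairs_in_order, "of", len(avg_pagerank) - 1, "adjacent pairs descend")
    print("correlation between position and average PageRank:", round(r, 3))
```

Eight of the nine adjacent pairs descend, and the correlation is strongly negative. If PageRank had no influence, the averages would wander around a common mean instead of falling steadily from position to position.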

Still, I like the fact that Jonathan actually tests things to see what effect they have. That is a step up from most Internet marketing advice, which is based on the consensus of guesses found on the forums.

January 9, 2008

Persuasive Writing

I don’t have time today to write a meaningful post, but I thought I’d point you at a blog post by someone who, to my way of thinking, makes sense. Too many fail that basic test. I suggest you read this. It begins with persuasive writing, which is (admittedly) one of my weak spots, and goes on to talk about evaluating offers. Since I mostly agree with what Jonathan has to say, I’m recommending his post:

Cutting through the fluff.

A thoughtful post … check it out.

Oops … continued looking at that site. While the post I cited has good information, I cannot wholeheartedly recommend the author … some of the other material is crap. If you have the experience to know which is which, check out everything there … if you are a newbie to internet marketing, just read the post I recommended, and leave the rest for another time…

January 8, 2008

Another Look

Today I took another look at the results of the ‘quicky’ study I described yesterday, after reading an interesting article by Bart Kosko on the superiority of the median as a descriptive measure of data that varies from the typical bell curve distribution.

While figuring out the statistics yesterday, I noticed that of the 20 cases for which I measured both mean and median (ten keywords, page one and page ten results for each), 19 had a median lower than the mean (or average). That suggests to me that the distribution of site size does not follow the typical bell curve, so this may be a good situation in which to consider medians rather than averages.
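Here is a small Python illustration of why a median below the mean hints at a lopsided distribution. The ten page sizes are invented for the example (they are not my study data): most are modest, and one is very large.

```python
import statistics

# Hypothetical page sizes, made up for illustration only.
# One very large page drags the mean upward but barely moves the median.
page_sizes = [12, 15, 18, 20, 22, 25, 30, 35, 60, 240]

mean_size = statistics.mean(page_sizes)
median_size = statistics.median(page_sizes)

if __name__ == "__main__":
    print("mean:  ", mean_size)
    print("median:", median_size)
```

With these made-up numbers the mean comes out at about twice the median, even though nine of the ten pages are smaller than the mean. In a true bell curve the two measures would sit close together.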

One measure of how likely measured results are to be real, rather than random variation, is how many of the measured observations agree with the averaged-out conclusion. In this study we have ten keywords — for how many of those was it true that the average page one results were larger than those for page 10? I figured that out, and found 70% of the keywords had an average page size greater on page one results.

Doing the same comparison with median values rather than means, I found 80% of those were larger for the page one results, further emphasizing the probable correctness of the results. The average median value for page one results is 29.1, while the average median for page ten results is 25.8. So, while the averages suggested page ten results were six percent smaller than page one results, the medians show page ten results were 12% smaller.
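One hedged way to put a number on "how many of the observations agree" is a simple sign test, which I did not run in the original study. It asks: if page one and page ten results were really no different, so that each keyword was a coin flip, how often would 7 or 8 out of 10 keywords favor page one by luck alone? The function name below is my own.

```python
from math import comb


def chance_of_at_least(k, n):
    """Probability of k or more heads in n fair coin flips."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


p_means = chance_of_at_least(7, 10)    # 70% of keywords agreed (means)
p_medians = chance_of_at_least(8, 10)  # 80% of keywords agreed (medians)

if __name__ == "__main__":
    print("7 or more of 10 by chance:", p_means)
    print("8 or more of 10 by chance:", p_medians)
```

The chances work out to roughly 17% and 5.5%. With only ten keywords, even an 8-out-of-10 result is suggestive rather than conclusive, which is one more reason to repeat the study with a larger set.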

One biasing factor I forgot to mention originally is that half of the keywords I used were from a single industry, finance: things like ‘home improvement loan’ and ‘low interest credit cards’. As I said, the original list of keywords came from those increasing in price for AdSense, so they reflect the current condition of the American economy, where monetary concerns dominate. I need to repeat this study with a more random selection of keywords…