Today I took another look at the results of the ‘quicky’ study I described yesterday, after reading an interesting article by Bart Kosko on the superiority of the median as a descriptive measure for data that departs from the typical bell curve distribution.

While working out the statistics yesterday, I noticed that of the 20 cases I measured mean and median for (ten keywords, page one and page ten results each), 19 had a median lower than the mean (or average). That suggests to me that the distribution of site size is skewed toward large values rather than following the typical bell curve, so this may be a good situation in which to consider medians rather than averages.
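To illustrate why a median below the mean hints at a skewed distribution, here is a small sketch in Python. The `sizes` list is made up purely for illustration and is not data from the study; it just shows how a couple of very large sites pull the mean up while leaving the median almost untouched.

```python
import statistics

# Hypothetical site sizes (NOT data from the study): mostly small sites
# plus a couple of very large ones, i.e. a right-skewed sample.
sizes = [5, 8, 10, 12, 15, 20, 30, 45, 120, 400]

mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)

# In a right-skewed sample the few large values drag the mean upward,
# so the median ends up below the mean.
print(f"mean = {mean_size}, median = {median_size}")
```

With a symmetric, bell-shaped sample the two numbers would land close together, which is why seeing the median fall below the mean in 19 of 20 cases stands out.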

One measure of how likely measured results are to be real, rather than random variation, is how many of the individual observations agree with the averaged-out conclusion. In this study we have ten keywords, so for how many of those was the average size of page one results larger than that of page ten results? I worked that out and found that 70% of the keywords had a greater average page size in the page one results.
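One way to put a number on this kind of "how many agree" count is a simple sign test, which asks how often pure chance (a coin flip per keyword) would produce at least that many agreements. This is my own illustration rather than something done in the original study, and the function name `sign_test_tail` is just a label I picked for the sketch.

```python
from math import comb

def sign_test_tail(agree: int, total: int) -> float:
    """One-sided probability of seeing at least `agree` agreements out of
    `total` if each keyword were a fair coin flip (no real effect)."""
    return sum(comb(total, k) for k in range(agree, total + 1)) / 2 ** total

# 70% of ten keywords agreed with the averaged-out conclusion.
p_seven_of_ten = sign_test_tail(7, 10)
print(f"chance of 7 or more out of 10 by luck: {p_seven_of_ten:.3f}")
```

The smaller that probability, the harder it is to write the agreement off as random variation. With only ten keywords, even a fairly lopsided count leaves a noticeable chance of being luck.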

Doing the same comparison with median values rather than means, I found that 80% of the keywords had larger values for the page one results, further emphasizing the probable correctness of the results. The average of the median values for page one results is 29.1, while the average median for page ten search results is 25.8. So, while the averages suggested page ten results were 6% smaller than page one results, the medians show page ten results were 12% smaller.
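For anyone who wants to check the percentage, the arithmetic is a relative difference taken against the page one value. The sketch below uses the two rounded medians quoted above; the helper name `pct_smaller` is hypothetical. From the rounded figures it comes out a little under the 12% quoted, which may simply reflect rounding in the published values.

```python
def pct_smaller(page_one: float, page_ten: float) -> float:
    """How much smaller page_ten is than page_one, as a percentage of page_one."""
    return (page_one - page_ten) / page_one * 100

# Average medians quoted in the post (already rounded to one decimal place).
median_gap = pct_smaller(29.1, 25.8)
print(f"page ten medians are about {median_gap:.1f}% smaller")
```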

One biasing factor I forgot to mention originally is that half of the keywords I used were from a single industry, finance: things like ‘home improvement loan’ and ‘low interest credit cards’. As I said, the original list of keywords came from those increasing in price for AdSense, so they reflect the current condition of the American economy, where monetary concerns dominate. I need to repeat this study with a more random selection of keywords…