Thursday, May 21, 2009

when numbers go bad

Old story, old post. But a good example of how numbers can lie.

First there were the numbers. A while ago, in a story about the growth of blogging, Mark Penn, writing in The Wall Street Journal, reported that "there are almost as many people making their living as bloggers as there are lawyers. Already more Americans are making their primary income from posting their opinions than Americans working as computer programmers or firefighters."

Then there were more numbers. From the story:

Demographically, bloggers are extremely well educated: three out of every four are college graduates. Most are white males reporting above-average incomes. One out of three young people reports blogging, but bloggers who do it for a living successfully are 2% of bloggers overall. It takes about 100,000 unique visitors a month to generate an income of $75,000 a year. Bloggers can get $75 to $200 for a good post, and some even serve as "spokesbloggers" -- paid by advertisers to blog about products. As a job with zero commuting, blogging could be one of the most environmentally friendly jobs around -- but it can also be quite profitable. For sites at the top, the returns can be substantial. At some point the value of the Huffington Post will no doubt pass the value of the Washington Post.

All of which should make a smart person scratch her head.

Then there was this post on Econsultancy by Patricio Robles, who not only called out the WSJ on faulty data but also provided a good lesson in how numbers go bad. He writes:

The first glaring problem: he uses a hodgepodge of sources to come up with his argument. He assumes there are 20m bloggers (based on data from eMarketer), assumes 1.7m of them profit from their blogging (based on information promoted by BlogWorldExpo) and assumes that 2% of the bloggers out there can earn a 'living' from their blogs (based on Technorati's State of the Blogosphere Report).

But the biggest problem here is not just the hodgepodge of data. It's that the basis for many of his claims is Technorati's State of the Blogosphere Report, which was sent to a random sample of Technorati users and which was based on less than 1,300 self-completed responses.

Assuming that 2% of the approximately 20m people who are estimated to have 'blogged' at some point in the US equates to 452,000 professional bloggers simply because 2% of the 1,300 bloggers who responded to Technorati's survey can reportedly earn a living blogging is the definition of fuzzy math.
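
For anyone who wants to check the arithmetic Robles is objecting to, here is a rough Python sketch. It assumes the survey had exactly 1,300 usable responses (the post says fewer) and uses a textbook normal-approximation confidence interval for a proportion. Note that 2% of 20 million is 400,000; the 452,000 figure would need a base of about 22.6 million. And even the interval only measures random sampling error: it assumes the respondents are a random sample of all 20 million bloggers, which a self-selected group of Technorati users is not.

```python
import math

def extrapolate(share: float, population: int) -> int:
    """Naively scale a sample proportion up to a whole population."""
    return round(share * population)

def normal_ci(share: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (normal/Wald approximation).

    Captures random sampling error only; assumes a simple random sample,
    which a self-selected survey is not.
    """
    se = math.sqrt(share * (1 - share) / n)
    return share - z * se, share + z * se

SHARE = 0.02             # 2% of respondents said they earn a living blogging
SAMPLE_SIZE = 1_300      # assumption: upper bound on Technorati responses
POPULATION = 20_000_000  # the ~20m US bloggers figure cited in the story

point = extrapolate(SHARE, POPULATION)
low, high = normal_ci(SHARE, SAMPLE_SIZE)

print(f"Point estimate: {point:,}")
print(f"Base implied by 452,000: {452_000 / SHARE:,.0f}")
print(f"95% range: {extrapolate(low, POPULATION):,} to {extrapolate(high, POPULATION):,}")
```

Even on these generous assumptions, the "2%" from a survey of about 1,300 people stretches to a range of roughly 250,000 to 550,000, before accounting for who chose to answer.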

He also points out how using means rather than medians can skew the facts:

The difference between mean (average) and median revenues is huge; the median figures are far more likely to be realistic. If the median revenue reported by the 550 US bloggers who were actually active enough to respond to Technorati's survey was $200, what does that tell us about the 20m Americans who have supposedly blogged at some point?
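
To see why the median is the safer figure, here is a small Python sketch using made-up revenues for ten hypothetical bloggers (these are NOT Technorati's numbers). One big earner is enough to drag the mean far above what the typical respondent makes, while the median barely notices.

```python
import statistics

# Hypothetical annual blog revenues in dollars for ten respondents.
# Illustrative only; not taken from any real survey.
revenues = [0, 0, 50, 100, 200, 200, 300, 500, 1_000, 75_000]

mean_rev = statistics.mean(revenues)      # pulled up by the single $75,000 outlier
median_rev = statistics.median(revenues)  # what the "typical" respondent earns
print(f"Mean:   ${mean_rev:,.0f}")
print(f"Median: ${median_rev:,.0f}")

# Dropping the one big earner shows how fragile the mean is.
without_top = revenues[:-1]
print(f"Mean without top earner:   ${statistics.mean(without_top):,.0f}")
print(f"Median without top earner: ${statistics.median(without_top):,.0f}")
```

Here the mean comes out at $7,735 and the median at $200; remove the one $75,000 blogger and the mean collapses to about $261 while the median stays at $200.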

Finally, he writes that a small data box accompanying the WSJ story was misleading:

More troubling: that the 452,000 blogger figure is included in a table that cites the Bureau of Labor Statistics as its source, giving the impression that the Bureau of Labor Statistics confirms that there are more professional bloggers in the US than firefighters, CEOs, computer programmers or bartenders. It doesn't say any such thing; the figures for all the other professions were provided by the Bureau of Labor Statistics, the blogger figure was inserted by Penn.

Apparently, Robles wasn't the only one to complain. At about 4:30 that afternoon, Penn updated his story, suggesting that his critics should do the math. I did. His methods still don't pan out. bk

1 comment:

tk said...

You can mathematically prove that one equals two. If you haven't seen me do it, just ask :). The main problem with numbers in stories is that if the math is faulty, then the whole story will be disbelieved. Numbers matter. Figures lie, and liars figure.