22 August 2006

On blogging, once again

One more piece intended to enrich our knowledge of blogging.
Blogs Count
Jeffrey Henning

5/26/05 - 3:01 AM - The Wall Street Journal tackles the difficult issue of blog statistics: Measuring the Impact of Blogs Requires More Than Counting. Carl Bialik makes many good points, specifically as to why people are obsessed with counting blogs but not web pages:

First, let's step back and consider why we're counting blogs at all. You no longer see articles that attempt to demonstrate the legitimacy of the Web by stating how many Web pages there are. But blogs are still in the process of entering mainstream consciousness, so numerical credibility is important; bloggers themselves cite the statistics a lot.

It's a daily struggle for "numerical credibility", and Mr. Bialik has a great track record in his column, The Numbers Guy, yet in this case I'm not certain he's right:

No one has sole control of the definition of blog, but it seems to me that for the sake of counting, Technorati and BlogPulse are right to exclude the private blogs. That puts their estimates below those from some other analysts, but the companies are focusing on what they can directly count, and relying less on estimates.

I told Mr. Bialik on the phone that I didn't have a firm estimate of private blogs, which I defined as blogs that are visible only to the author's friends, but that if I had to guess, they were less than 5% of the blog population we were reporting. Since I told him that on the record, I'm certain he defines "private blogs" differently than I do: not only does no one have sole control of the definition of "private blog", it's not even a common distinction.

Perhaps Mr. Bialik is defining "private blogs" as blogs with few readers. If that is the case, Technorati and BlogPulse don't exclude such blogs at all; they both index many LiveJournal accounts, even though LJ accounts have a median of five defined friends. Hardly "public" blogs. Since the indexing services include these, that's probably not his definition.

More likely, as I think about it, he is defining "private blogs" as "blogs that don't ping anybody". Since the blog-indexing services, and let's include IceRocket here, each report 10 to 11 million blogs, that appears to be a solid estimate of the number of "pinging blogs". As Mr. Bialik explains:

It turns out that counting blogs isn't as hard as counting Web pages. When writers who use common blogging software want their blogs to be publicized, they choose to automatically "ping" computer servers for companies like Technorati Inc. (www.technorati.com) and Intelliseek's BlogPulse (www.blogpulse.com), whose goal is to measure and index blogs.

Oh, wait. We've just defined blogs as blogs that tell index services they exist. Convenient for the indexing services! But web pages don't have to tell anyone they exist to be considered web pages.

OK, I honestly have no idea yet what Mr. Bialik means by "public blog" vs. "private blog". I respect Mr. Bialik, and anyone who tries to explain statistics, as he does in his regular column, and I look forward to his response to better understand how he defines blogs and private blogs.

Quick aside: I find that many disagreements come from failure to define terms, as when I attended BlogNashville and heard Dave Winer say the economy is bad and heard Stan of Two Minute Offense say the economy is good and neither could believe the other. I believed them both; over dinner that night, Dave defined the economy to us as foreign exchange rates, stock market performance and the rise of offshoring, while Stan later posted defining the economy as current unemployment, inflation, and interest rates. It's easy to instinctively dismiss comments you disagree with. As a survey researcher, I was trained to assume respondents are telling the truth and figure out how what they are saying is true from their perspective. I've found it a useful exercise outside surveys.

After first talking with Mr. Bialik, I was feeling rather blue about the direction our conversation went, and on a lark I grabbed sixty blogs we had found that had the word "blue" in them. Here's what I e-mailed him:

Out of 60 blogs in the subsample we checked, here are 47 that aren't in Technorati:


To check, go to Technorati and paste in the blog's URL. If Technorati displays the page title and the time last updated, the blog is in its database.

Now, presumably Technorati uses search submissions to expand its directory, so some of these may end up in the index in the coming week.

I think Technorati, BlogPulse and IceRocket have done a great job at indexing the blogs most frequently linked to, but they are missing the long tail.
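To put a number on the spot check above: 47 of 60 blogs missing is about 78%. The Python sketch below computes that share together with a Wilson score interval, a standard interval for a proportion estimated from a small sample. It assumes the 60 blogs behave like a random sample, which the "blue" subsample was not, so the result is illustrative only. The function and variable names are hypothetical.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Figures from the e-mail quoted above: 47 of the 60 checked blogs were not found.
missing, checked = 47, 60
share_missing = missing / checked
low, high = wilson_interval(missing, checked)

print(f"share not indexed: {share_missing:.1%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

With only 60 blogs checked, the interval is wide, which is one reason a spot check like this suggests a gap in coverage without measuring it precisely.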

I have a lot of admiration for all three indexing services and use them regularly. That said, I know they aren't counting all the blogs, especially old blogs and one-post wonders. Indexing firms are of course not trying to prepare an estimate of blog population: they are just trying to find, count and index as many recently updated blogs as they can, and do it faster than their competitors.

Google, for its part, tells you how many pages it has indexed: 8,058,044,651 web pages, at the moment. It doesn't tell you how many web pages exist, because it follows a whole protocol of excluding web pages you don't want indexed. If Technorati is to blogs as Google is to web pages, then Technorati's count shouldn't be used as an estimate of the worldwide blog population.

Nor should Perseus numbers be used to estimate the worldwide population of blogs. I originally did a random sample of hosted blogs because I noticed nobody else had, and I wanted to have something interesting to talk about at Dave Winer's first Bloggercon. Further, I've specifically stated again and again that we are not trying to count all the blogs in the world, only those blogs on hosting services.

Why limit ourselves? Because we can apply a form of random sampling to those blogs by randomly generating blog URLs. Random sampling is not foolproof--no research method is, and random sampling of blog account names is less reliable than random-digit dialing--but random sampling is a lot more reliable than choosing a self-selected audience, as George Gallup definitively demonstrated in 1936. (Blogs that ping are a self-selecting audience.)
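As a minimal sketch of what "randomly generating blog URLs" could look like, the Python below draws account names uniformly at random and formats them into URLs. It is not the procedure actually used for the survey: the URL template, the alphabet, the fixed name length, and the function names are all assumptions made for illustration, and the host shown is a placeholder that does not exist.

```python
import random
import string

# Assumed character set for account names; real hosting services differ.
ALPHABET = string.ascii_lowercase + string.digits

def random_account_name(rng, length):
    """Draw one name uniformly from all names of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def sample_blog_urls(url_template, n, length=6, seed=None):
    """Generate n candidate blog URLs from randomly drawn account names.

    Every name of the given length is equally likely, so the candidates
    that turn out to exist form a random sample of accounts of that length.
    """
    rng = random.Random(seed)
    return [url_template.format(name=random_account_name(rng, length))
            for _ in range(n)]

# Hypothetical hosting service; ".invalid" is a reserved, non-resolving domain.
urls = sample_blog_urls("http://{name}.bloghost.invalid/", 5, seed=42)
for url in urls:
    print(url)
```

Most generated names would not correspond to a real account. Only the candidates that resolve to an existing blog enter the sample, and because the names were drawn at random rather than volunteered, the sample avoids the self-selection problem of blogs that ping.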

What, gentle reader? You don't trust samples? OK, next time you're at the doctor's office, tell them you don't trust samples and that you'd like them to take all your blood instead!

How many blogs are there worldwide? Certainly more than the 10-11 million that have been indexed, and more than the 31.6 million hosted blogs we've estimated exist. Maybe as many as BlogHerald's estimate of 60 million. Whatever the number is today, the key point is that it is growing rapidly.

So, finally, let's step back and consider why we are counting blogs at all.

Because millions of people finding a new way to communicate and connect is exciting and empowering!
