And it's not over yet.
I thought I'd just wrap up a few things this morning with the program and then spend the rest of the day writing. As it turns out, the steps to analyze the data are actually non-trivial and require some thought and more programming. Who'd have thought it?
The good news is that I'm done collecting all the results I want for this draft. I have a big spreadsheet made of sites going in one direction and topics going in the other, and each cell has a "weight" given to that topic by a site. That way I can compare the weights and see if there are any interesting patterns.
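For the curious, the table can be pictured roughly like the sketch below. This is a minimal illustration in Python, assuming a nested dictionary, and not the actual program: the site names, topic names, and numbers are placeholders rather than real results, and the helpers `rank_topics` and `top_site_for` are made up for this example.

```python
# Placeholder data, NOT real results: weights[site][topic] is the
# "weight" that a site gives to a topic (one cell of the spreadsheet).
weights = {
    "site_a": {"topic_x": 0.5, "topic_y": 0.3, "topic_z": 0.2},
    "site_b": {"topic_x": 0.1, "topic_y": 0.6, "topic_z": 0.3},
}


def rank_topics(weights, site):
    """Return one site's topics, ordered from highest to lowest weight."""
    row = weights[site]
    return sorted(row, key=row.get, reverse=True)


def top_site_for(weights, topic):
    """Return the site that gives the topic its largest weight."""
    # A site with no entry for the topic is treated as weight 0.0.
    return max(weights, key=lambda site: weights[site].get(topic, 0.0))


if __name__ == "__main__":
    for site in weights:
        print(site, rank_topics(weights, site))
    print("topic_y is weighted highest by", top_site_for(weights, "topic_y"))
```

Reading along a row ranks one site's topics; reading down a column compares sites on a single topic, which is the kind of comparison I'm after.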
To be honest, nothing about the table is as interesting as the results I commented on from Digg. For example, USA Today, which has sort of a "Newspaper of the common idiot" vibe about it, does indeed have an excessively high number of stories on Britney Spears and Paris Hilton. But the point I was making on Digg is that if Digg users accurately represent ordinary newspaper readers (which, you might reasonably argue, they don't), then taking "entertainment" and calling it "news" is not really an appropriate strategy.
A few other unusual results I found: the New York Times also gives a relatively high amount of attention to Paris Hilton. Not in their top few stories, but distinctly in the top half.
Also, Fox News gives a surprising amount of coverage to serious news. When I included "Blackwater" in the results, I found that Fox News is the only site that has recently given that topic higher priority than all the others. However, when I looked deeper into the individual stories they reported, I noticed that mostly they weren't by Fox reporters: they were stories that originated with the Associated Press, and then were just relabeled as "Fox News" and pasted on their site. I don't think anyone even filters them. In fact, Fox News has a much higher presence on Google News than more serious news organizations do, and it seems to be because they just automatically repost anything that comes their way.
I dunno, maybe I shouldn't stretch too hard to look for excuses to bash Fox. If I throw out theories like this then the paper won't seem very objective.
The paper today stands at 26 pages, of which 12 are actual substantive text. I have a long way to go: my eventual target is around 50 pages (including the padding), but I'll be happy with 30 or so pages for this draft. Luckily, I have a lot of tables and graphs to paste in; I have lots of material to steal from this blog; I have some philosophical discussion of user taxonomies that I can borrow from my term paper this summer; and I can always throw in code samples when I'm describing the program. I think I'll make it, but it's going to be a long night and another long day.
And after that, I get to start studying for my two midterms! Yay!