Sunday, September 30, 2007

Thesis saga continues: It's working

As per my last post, I'm now sitting here in Schlotzsky's Deli, which has free wi-fi, and my data mining program is just blazing along. I have my flag set to yell at me immediately if I get the "Are you human?" warning.

Maybe it's just because I've been listening to Harry Potter and the Deathly Hallows on tape, but my current frame of mind is that this is kind of exciting. Sort of like I'm sneaking around to my safe houses in order to avoid being apprehended by the authorities. It's the nerdiest cloak-and-dagger story you've ever heard, I bet. And by comical coincidence, I just checked my progress and it's looking at a story about Harry Potter from March '06 right now.

In a weird kind of way, this has actually helped me refocus on how to attack the problem. Previously I was just indiscriminately grabbing all kinds of data, without regard to whether it was useful. Now that I know my time is limited and I could be "captcha'd" at any moment, I've tightened my focus in a way that makes a lot of sense: I'm collecting stories only within a specific time range, and only bothering to look at clusters of approximately average size. This way, even if I'm interrupted in the middle and can't collect any more data at all, I'll still have plenty of information to work with.
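For the curious, here is a rough sketch of what that kind of filter might look like. Everything in it is made up for illustration and is not the actual thesis code: the `Cluster` record, the `select_clusters` function, and especially the `tolerance` cutoff, which is just one possible way to pin down "approximately average."

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Cluster:
    """Hypothetical record for one cluster of related news stories."""
    first_seen: date   # date of the earliest story in the cluster
    story_count: int   # number of stories in the cluster


def select_clusters(clusters, start, end, tolerance=0.5):
    """Keep clusters inside [start, end] whose size is near the average.

    'Near' is an assumption: within +/- tolerance (as a fraction)
    of the mean size of the clusters that fall in the date range.
    """
    in_range = [c for c in clusters if start <= c.first_seen <= end]
    if not in_range:
        return []
    avg = mean(c.story_count for c in in_range)
    low, high = avg * (1 - tolerance), avg * (1 + tolerance)
    return [c for c in in_range if low <= c.story_count <= high]


if __name__ == "__main__":
    sample = [
        Cluster(date(2006, 3, 1), 10),
        Cluster(date(2006, 3, 15), 12),
        Cluster(date(2006, 3, 20), 50),   # unusually large, gets dropped
        Cluster(date(2006, 3, 25), 8),    # a bit too small, gets dropped
        Cluster(date(2007, 1, 1), 11),    # outside the date range
    ]
    kept = select_clusters(sample, date(2006, 3, 1), date(2006, 3, 31))
    print([c.story_count for c in kept])
```

The nice property is that the filter runs before any expensive fetching, so every request that does get made goes toward data that will actually be used.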

This has also given me some new ideas on how to interpret the data, and I'm looking forward to analyzing it later. Eventually I won't need to worry about what Google thinks of me, because I can just read their stuff from my own private database.
