Wednesday, March 21, 2007

Operation: Help Me With My Thesis - episode 1

Well, here we go. In about nine months, assuming everything goes well, I will be the proud bearer of one Master's Degree in software engineering. And it's time to start thinking about... (cue the sinister music) the Master's Thesis. It's not due until November, but I've seen hollow-eyed fellow students rushing to get it done in their last few months while simultaneously studying for finals and doing class projects. Based on the stress levels I've already experienced, this is not for me, so I need a topic and a start ASAP.

Here's my plan. I really liked my course in data mining, so much that I've been planning for a while to ask Dr. Ghosh to be my adviser. He says he's very busy through the summer, but we can meet in May and get me started. So basically, that's how long I have to really start fleshing out an idea for a project that involves data mining... something.

As I've mentioned before, I'm very interested in the whole Web 2.0 paradigm. People-powered encyclopedias. People-powered politics. People-powered news. People organizing the internet. And oh yeah, blogs. All those blogs.

All those people are generating literally tons of data, which I'm sure needs to be mined in some new way that hasn't been tried before, to figure out some new and surprising bit of internet psychology. I don't know what that is yet. My idea right now goes something like this.

Step 1: Web 2.0
Step 2: Data mining
Step 4: A completed master's thesis

I think I may be missing a step, so help me out! What could be more fitting than to ask for a people-powered topic? Post a comment, leave a suggestion. If you know people who do work in web 2.0 or mining or are even interested in those topics, please mail them a link to this post. The future of the free world may depend on it!

Well, not really. But I'd sure like to graduate.


  1. Nephlm 10:22 AM

    Never got past my BS in CS, so I'm not sure what constitutes a Master's Thesis, but I'll assume it is merely an interesting project.

    I would look into whether information can be gleaned across sites. Can you correlate (though certainly not prove) the Kazim at one place with a Kazim at another? If you can, is it possible to draw a cluster map of what sites are grouped together based on shared membership?

    That's my pie in the sky; that would be a neat project. You'd need to be able to vacuum a number of sites and come up with some probabilities of entity equivalence.
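
As a rough illustration of the "grouped together based on shared membership" idea, here is a minimal sketch. It assumes you already have a set of usernames per site (the site names and usernames below are made up), and it treats a matching username as weak evidence of the same person, which is exactly the part that can be correlated but not proven. The overlap measure is plain Jaccard similarity, which is my choice for the example, not something the comment specifies.

```python
from itertools import combinations

# Hypothetical membership data: site name -> set of usernames seen there.
# A shared username is only weak evidence of a shared person.
members = {
    "site_a": {"kazim", "nephlm", "mike"},
    "site_b": {"kazim", "nephlm", "dana"},
    "site_c": {"zed", "dana"},
}

def jaccard(a, b):
    """Fraction of the combined usernames that two sites have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def site_similarities(members):
    """Score every pair of sites by how much their usernames overlap."""
    return {
        (s, t): jaccard(members[s], members[t])
        for s, t in combinations(sorted(members), 2)
    }

sims = site_similarities(members)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs with high scores would be candidates for sitting in the same cluster; a real project would need something better than exact username matching to estimate entity equivalence.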

    Another interesting question is determining skew in products/ideologies. People only produce content for web 2.0 sites if they are emotionally involved in the matter in some way (positive/negative). This would seem to produce skew in the produced content. Investigating that would seem interesting.

    Just some thoughts.

  2. Anonymous 4:48 PM

    how about paradigm shifts in the technology, and/or RTS?
    The hardware technology is moving toward data storage on higher capacity media drives while getting physically smaller, and faster. Or maybe you could do something on IM, and ways to keep out the bad guys (for a while).
    Probably oversimple but it could be a minor step in the overall thesis.

  3. Anonymous 3:51 PM

    This may be so far from what you're asking for that it makes a roadtrip to Lubbock seem quick, but here goes...

    The nonprophets show for today just ended, and I was struck by the idea of researching what people really want. I've had a frequent thought along those lines: the vast majority of people seem to hate the fluffy, flippant, superficial way news, national and local, is broadcast across the country, but every network copies all the others in a frantic attempt to gain market share.

    I would love to see a study done into what do people really want from their televised news broadcasts. And if such a broadcast existed, would enough people actually watch it to make it profitable and eventually improve the industry.

    If this is too far off from what you wanted, then I apologize for wasting your time.

    Keep doing a great show.

    Mike Ellis
    Dallas - the city of large blonde hair

  4. I would love to see a study done into what do people really want from their televised news broadcasts. And if such a broadcast existed, would enough people actually watch it to make it profitable and eventually improve the industry.

    I really like that idea -- not just for televised broadcasts but for news in general.

    Here's an idea. Digg and other sites give a sort of overall picture of what people like the most in their news stories, because users are able to post stories and rate them. Suppose I took the average rating of a news story on a given subject and compared it to the number of times that the story appeared in the news, across all news sites.

    The first would tell us what the audience believes is a worthwhile story, while the second would tell us what the major news outlets believe are worthwhile stories. Stories might be categorized as being about celebrities, law, politics, etc, and could also (potentially) be identified as of interest to liberals or conservatives. I wouldn't want to do all that work, but I could also browse user profiles and see how they identify themselves, and therefore identify what kind of demographic they're in.

    So say you have a story like the Scooter Libby trial. I would expect clustering to show that liberal users are more interested in the story than conservatives, so it might be identified as a liberal story about law and politics. But it would also have a certain level of general interest to all users, which could reveal whether that is considered a "good" story overall or not.

    This could be an interesting starting point. Anybody know what other sites besides Digg might yield a rating system for news?
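
To make the comparison concrete, here is a minimal sketch of the bookkeeping, assuming the stories have already been collected and labeled by category. The records, ratings, and outlet counts are invented for illustration; nothing here reflects real data or any site's actual API.

```python
from collections import defaultdict

# Hypothetical records: (category, average user rating, outlets that ran it).
stories = [
    ("politics", 4.0, 10),
    ("politics", 3.0, 6),
    ("celebrity", 1.0, 20),
    ("law", 5.0, 4),
]

def interest_vs_coverage(stories):
    """Per category: (mean audience rating, share of total outlet coverage)."""
    ratings = defaultdict(list)
    coverage = defaultdict(int)
    for category, rating, outlets in stories:
        ratings[category].append(rating)
        coverage[category] += outlets
    total = sum(coverage.values())
    return {
        category: (sum(r) / len(r), coverage[category] / total)
        for category, r in ratings.items()
    }

result = interest_vs_coverage(stories)
for category, (avg_rating, share) in result.items():
    print(f"{category}: rating {avg_rating:.1f}, coverage share {share:.0%}")
```

A category with a low rating but a large coverage share would be one the outlets value more than the audience does, and vice versa.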

  5. I just saw a video in my podcasts today before I got to the Non-prophets Radio podcast and it seemed very much on this topic. Hopefully something here will inspire you (despite being kind of arty).

    And the website of the person who is the subject of the video...

  6. (didn't look like my comment went through the first time)

    Just make sure you write your thesis before the Web 2.0 bubble bursts. Regardless, here is an interesting topic I was thinking of working on as well: Web 2.0 and data mining the news, using diagrams à la Edward Tufte to represent the data.

    couple of resources:

    They visually represent disasters on a map (they supply feeds as well):

    Flight 404:
    He does some interesting data visualization with Processing

    Keep up the good fight on the Atheist front

  7. Mark Davis 12:34 AM


    Have you considered how all this data generated by users can contribute to the emerging "long-tail" markets in internet commerce?

    In case you are not aware, the long tail phenomenon is a term applied to the growing ability of niche markets to be profitably tapped, thanks to the internet so drastically lowering the barriers to finding the specific goods and services you are interested in, no matter how far off the beaten path they are. and Netflix are 2 examples of businesses thriving on the long tail markets for obscure books and DVDs that might not otherwise be available to consumers interested in them.

    I'm sure there has to be utility in discovering better ways to use the data generated from web 2.0 sites to feed into the development and marketing of niche products and services. Just an off the top of my head idea that might (or might not) be applicable to what you are doing. Good luck to you on your thesis.