Showing posts with label grad school. Show all posts

Thursday, December 06, 2007

Zero hour

Fall 2007 semester (Class weekend Schedule)
August 17-18, 2007
September 14-15, 2007
October 12-13, 2007
November 9-10, 2007
December 6-7, 2007
---> *** Graduation December 7 *** <---

Two final exams tomorrow.

(Cracks knuckles)

(Cracks neck)

Let's rock.

Tuesday, November 06, 2007

Two quick thesis updates

I finished the second draft of my thesis last Friday and submitted it to my supervisor. It's 64 pages, including 17 pages of "padding" in the form of indexes, glossaries, title pages, etc, which are required in the official template. I think it turned out pretty well, though I'm still waiting on feedback from Dr. Ghosh sometime this month.

If you want to see for yourself, you can take a look at the draft here. Also, you can play around with the data I collected (in a very limited way) by visiting the web interface here. The main point of interest is the graph on page 48 (though it's actually page 36 if you go by the numbers at the bottom of the page). This graph shows the emphasis given to celebrities by news sources, compared directly against the interest shown in the same topics by Digg readers. Not completely surprisingly, people are not as into sensational news as TV and print news seems to think they are, at least not according to the way I interpreted my data.

Yesterday I went to visit a journalism professor at UT, a guy named Maxwell McCombs, who invented the "agenda setting theory" of journalism that I referenced early in my paper. I explained the subject of the thesis and he seemed downright enthusiastic about it. He said "I certainly hope you're planning to publish this!" I said that I don't know how the publishing process works, not being particularly involved in academia. He gave me the names of some journals that might be interested, and then asked me to send the working draft and he would do some reading on the subject and get back to me. So, that's neat... nice to have your work validated. And if I actually get this published, maybe that will open some doors for me.

Saturday, October 13, 2007

Cresting that hill

Now I'm mostly ignoring a lecture in my Software Engineering class. This weekend is the midpoint of my final semester. I've done one midterm, and I have one more scheduled for the afternoon. My report draft was finished earlier this month. After tomorrow, all I'll have left is one or two homework assignments, finalizing my thesis, and the finals. I feel like I'm getting over the top of a very long, slow rollercoaster, seeing the track ahead, and getting ready for the downward ride.

My graduation ceremony is December 8. I don't expect anyone to come except my family, but you can email me if you want to be there.

Monday, October 08, 2007

First draft completed

Yesterday I brought my thesis report up from 23 pages to 45 pages before calling it a night at midnight.

It's not the most spectacular writing I've ever done; it'll need lots of proofing and major details are still missing. But a friend of mine told me "It's better to have a thesis report that's DONE than one that's GOOD." So, I'll reread it a bit tonight and then send it to the appropriate people.

Saturday, October 06, 2007

Longest Saturday ever...

And it's not over yet.

I thought I'd just wrap up a few things this morning with the program and then spend the rest of the day writing. As it turns out, the steps to analyze the data are actually non-trivial and require some thought and more programming. Who'd have thought it?

The good news is that I'm done collecting all the results I want for this draft. I have a big spreadsheet made of sites going in one direction and topics going in the other, and each cell has a "weight" given to that topic by a site. That way I can compare the weights and see if there are any interesting patterns.
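For the programmers in the audience, here is a minimal sketch of how a table like that could be built in Java. It is not the code from the thesis program. The `WeightTable` and `Story` names are made up for illustration, and the "weight" is assumed to be the share of a site's collected stories that fall under a topic, which is only one of several reasonable definitions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: builds a site-by-topic table of weights.
// Assumption: weight = (stories on topic from site) / (all stories from site).
final class WeightTable {

    // One collected story: the site it came from and the topic it matched.
    static final class Story {
        final String site;
        final String topic;

        Story(String site, String topic) {
            this.site = site;
            this.topic = topic;
        }
    }

    // Returns site -> (topic -> weight), keeping first-seen order for both.
    static Map<String, Map<String, Double>> build(List<Story> stories) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Story s : stories) {
            counts.computeIfAbsent(s.site, k -> new LinkedHashMap<>())
                  .merge(s.topic, 1, Integer::sum);
            totals.merge(s.site, 1, Integer::sum);
        }

        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> siteEntry : counts.entrySet()) {
            double total = totals.get(siteEntry.getKey());
            Map<String, Double> row = new LinkedHashMap<>();
            for (Map.Entry<String, Integer> cell : siteEntry.getValue().entrySet()) {
                row.put(cell.getKey(), cell.getValue() / total);
            }
            weights.put(siteEntry.getKey(), row);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<Story> sample = List.of(
            new Story("www.usatoday.com", "Britney Spears"),
            new Story("www.usatoday.com", "Blackwater"),
            new Story("www.nytimes.com", "Blackwater"));
        System.out.println(build(sample));
    }
}
```

Dividing by each site's own total keeps a site with thousands of stories from swamping a site with a few dozen, so the rows can be compared with each other.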

To be honest, nothing about the table is as interesting as the results I commented on from Digg. For example, USA Today, which has sort of a "Newspaper of the common idiot" vibe about it, does indeed have an excessively high number of stories on Britney Spears and Paris Hilton. But the point I was making on Digg is that if Digg users accurately represent ordinary newspaper readers (which, you might reasonably argue, they don't) then taking "entertainment" and calling it "news" is not really an appropriate strategy.

A few other unusual results I found: New York Times gives a relatively high amount of attention to Paris Hilton also. Not in their top few stories, but distinctly in the top half.

Also, Fox News gives a surprising amount of coverage to serious news. When I included "Blackwater" in the results, I found that Fox News is the only source that has recently given that topic higher priority than all the others. However, when I looked deeper into the individual stories they reported, I noticed that mostly they weren't by Fox reporters: they were stories that originated with the Associated Press, and then were just relabeled as "Fox News" and pasted on their site. I don't think anyone even filters it. In fact, Fox News has a much higher presence on Google News than more serious news organizations do, and it seems to be because they just automatically repost anything that comes their way.

I dunno, maybe I shouldn't stretch too hard to look for excuses to bash Fox. If I throw out theories like this then the paper won't seem very objective.

The paper today stands at 26 pages, of which 12 are actual substantive text. I have a long way to go - my eventual target is around 50 pages (including the padding), but I'll be happy with 30 or so pages for this draft. Luckily, I have a lot of tables and graphs to paste in; I have lots of material to steal from this blog; some philosophical discussion of user taxonomies that I can borrow from my term paper this summer; and I can always throw in code samples when I'm describing the program. I think I'll make it, but it's going to be a long night and another long day.

And after that, I get to start studying for my two midterms! Yay!

Sunday, September 30, 2007

Paper abstract

For anyone who's interested. I want to take this opportunity to repeat my thanks to those people who suggested directions to go in when I asked for help earlier this year.

In recent years, major news corporations seem to dedicate an increasing amount of time and space to "fluff," reporting on celebrities, entertainment and crime stories, rather than more essential national and international news. As such news content is increasingly gathered online, it has become feasible to aggregate large amounts of data from a wide range of sites. This report proposes a model for collecting information from news agencies, then applying the techniques of Data Mining to organize this reporting in a way that identifies the priorities of individual organizations.

In addition, the rise of user-based taxonomies has made it possible to broadly evaluate the interests of people who actively read and recommend news. In the final analysis, data collected from users of Digg.com are compared with data collected from media sites. This provides a benchmark for determining whether the delivery of "fluff" news is a fair response to popular demand, or whether typical news readers are dissatisfied with the level of serious event coverage found in the media.

Saturday, September 29, 2007

An unexpected hazard of mining other people's websites for information:

Sorry for the deluge of long computer sciency posts. The thing is, it's helping me to blog about my thesis. Earlier this week when I posted some comments about my research, I pasted the whole post into my paper and got another three pages out of it. Awesome! It needs some editing, but there's plenty of solid material in there. So, let's see if I can get away with writing the whole paper just by blogging. You, dear readers, will just have to decide whether to suffer through these posts or skip them. Unfortunately, tonight's commentary is about a big setback.

My web skimming program has been having a field day with the Google news archive. I'm currently pulling stories from back to a year and a half ago. Before dinner tonight, I picked up 2000 new Google clusters on "John Edwards." I was pretty cheered by this progress.

When I got home, I fired up the program again and started searching the year for clusters of "Anna Nicole Smith"... and got nothing. Not a single hit.

This was kind of bewildering to me. I tried a few more times, digging through it with the debugger. Nothing. So finally I pulled out the URL of the search page my program was looking at, and pasted it into my browser. I got this message:

403 forbidden

We're sorry...

... but your query looks similar to automated requests from a computer
virus or spyware application. To protect our users, we can't process
your request right now.

We'll restore your access as quickly as possible, so try again soon.
In the meantime, if you suspect that your computer or network has been
infected, you might want to run a virus checker or spyware remover to
make sure that your systems are free of viruses and other spurious
software.

We apologize for the inconvenience, and hope we'll see you again on
Google.
To continue searching, please type the characters you see below:
[Typical captcha text returned]

Uh-oh. I experienced a bit of temporary jumpiness as I realized that Google noticed I've been hitting their server really hard and really fast. I typed in the confirmation text, of course, and it let me view the page. But I tried the program again, and it still didn't work.

I did some research, winding up at this post. I don't really get the details, but it sounds like Google has been targeted by malicious spyware programs in the past, which do tons of web searches that somehow uncover target servers that are vulnerable to attack. Then they install copies of themselves on those target servers, which in turn do more malicious searches on Google's site.

So, yeah, that's pretty neato that they catch bad guys. Unfortunately, they also catch me. That's bad. I have a thesis that needs finishing.

I decided to wait a few hours, and in the meantime I put in some code that makes it pause for five seconds before it gets a web page. I don't want to annoy them.
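A pause like that can be sketched in a few lines of Java. This is an illustration and not the actual thesis code. The `PoliteThrottle` class and its method names are hypothetical, and it assumes a single-threaded fetcher that calls `await()` right before each request.

```java
// Sketch only: enforces a minimum gap between consecutive page fetches.
// Assumption: one thread does all the fetching.
final class PoliteThrottle {
    private final long minGapMillis;
    private long lastRequestAt;
    private boolean started = false;

    PoliteThrottle(long minGapMillis) {
        this.minGapMillis = minGapMillis;
    }

    // How long the caller should still wait if the clock reads nowMillis.
    long millisToWait(long nowMillis) {
        if (!started) {
            return 0; // the very first request never waits
        }
        long elapsed = nowMillis - lastRequestAt;
        return Math.max(0, minGapMillis - elapsed);
    }

    // Records that a request is being sent at nowMillis.
    void markRequest(long nowMillis) {
        lastRequestAt = nowMillis;
        started = true;
    }

    // Sleeps until the gap has passed, then records the request time.
    void await() throws InterruptedException {
        long wait = millisToWait(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
        markRequest(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        PoliteThrottle throttle = new PoliteThrottle(5_000); // five seconds
        throttle.markRequest(0);
        System.out.println(throttle.millisToWait(2_000));
    }
}
```

Keeping the delay as a constructor argument means moving from five seconds to thirty is a one-number change.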

A few hours later, the spam catcher stopped harassing me. I let the program run for a while longer, and it managed to walk through a couple thousand more clusters, all from the month of March. But then it stopped again, with the same message. This time I had a break in there to kill the program before it started failing a bunch more challenges.
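The "break" idea can be sketched as a loop that gives up at the first 403 instead of hammering the server with requests that are bound to fail. This is a hedged illustration only. The `StopOnBlock` class is hypothetical, and the `fetchStatus` function stands in for whatever code actually downloads a page and reports its HTTP status.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Sketch only: stop crawling as soon as the server answers 403 Forbidden.
final class StopOnBlock {
    static final int FORBIDDEN = 403;

    // Visits URLs in order and returns those handled before the first 403.
    // For brevity, every status other than 403 is treated as a success.
    static List<String> crawl(List<String> urls, ToIntFunction<String> fetchStatus) {
        List<String> done = new ArrayList<>();
        for (String url : urls) {
            int status = fetchStatus.applyAsInt(url);
            if (status == FORBIDDEN) {
                break; // blocked: stop here rather than fail more challenges
            }
            done.add(url);
        }
        return done;
    }

    public static void main(String[] args) {
        // Fake fetcher for demonstration: the third page is "blocked".
        List<String> urls = List.of("page1", "page2", "page3", "page4");
        List<String> fetched = crawl(urls, u -> u.equals("page3") ? FORBIDDEN : 200);
        System.out.println(fetched);
    }
}
```

Since the function returns the list of pages it finished, a later run can pick up where the blocked one left off.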

This is going to be a slow process. I want my data. Now. I might consider bumping the delay up to thirty seconds in the morning.

Also, I suspect that Google is making a note of my ISP to determine that I am an evildoer. If that's the case, then maybe I can get around it by wandering around town with my laptop. I'll go from one wireless hotspot to the next, grabbing a few thousand entries here and there, until I've got the whole year's worth of material.

Monday, September 24, 2007

Data mining the news (ongoing work)

My thesis is about using data mining to analyze the relative emphasis that traditional media outlets give to various types of stories. Then I'll be comparing this data to the emphasis that actual news consumers who inhabit Digg.com give to the same stories. My point is to discover which types of stories are overplayed or underplayed, and come to some sort of conclusions about which types of news sources best reflect the public interest.

To that end, I've written a big Java program around an online MySQL database. In the last few days I've cataloged about 22,000 news pages, although only a small number of them will ultimately turn out to be important to the study. I've labeled roughly a dozen web sites and a dozen news topics as "interesting." The sites are:
  1. www.washingtonpost.com
  2. www.nytimes.com
  3. www.foxnews.com
  4. www.guardian.co.uk
  5. online.wsj.com
  6. www.usatoday.com
  7. www.cnn.com
  8. www.townhall.com
  9. www.washingtontimes.com
The stories are:
  1. Rudolph Giuliani
  2. Anna Nicole Smith
  3. Harry Potter
  4. Tiger Woods
  5. Rupert Murdoch
  6. Barack Obama
  7. Gulf Coast
  8. Mitt Romney
  9. New Orleans
  10. Hillary Clinton
  11. Britney Spears
  12. Blackwater
  13. Ron Paul
Crazy lists, aren't they? There is some method to this madness. With the stories, I tried to get a reasonable sample of popular topics, some of which are serious and some of which are decidedly unserious. I have a lot of presidential candidates in there since I'll be especially interested to compare who's being covered vs. who people WANT to be covered. For instance, my hunch is that Ron Paul is a topic of interest much more for Digg readers than for media outlets. Ron Paul seems to have some kind of word of mouth campaign going on where libertarian fans of his call shows like C-Span and post on blogs all over the place, whereas the news seems to be largely ignoring him. I'm not a Paul supporter, except to the extent that I think he's clearly the least evil Republican in the race.

With the web sites, the idea is to have a variety of media sources. Some are considered serious news sites; some are "fluff" news (I picked USA Today specifically for that reason, and it's possible that CNN will tend to fall in that category as well); and several are explicitly right wing rags. To be fair, I really would like to have included left wing rags, but the only ones I can identify are blogs, which are not treated much as news sources. The news is all pulled off of news.google.com. I search for the topics of interest, then read the resulting stories more or less indiscriminately and identify which site each one comes from.

Based on this, I have a total of nearly 2000 "news" sources, ordered by the number of stories found in searches since I started collecting data. In the stories I've pulled so far, after about three days of serious searches on the 13 topics, the New York Times and the Washington Post (my main "serious news" sites) each account for 104 stories. But dailykos.com has shown up zero times, so I guess there's a master list that they're clearly not on. TPM Muckraker and TPM Cafe both show up, and those are both explicitly liberal sites, but there are only 8 stories from them. "The Nation": 9 stories. So, liberal sites = small sample size. No use.
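Identifying the site for each story and ranking sites by story count can be sketched roughly like this in Java. It is an illustration under assumptions and not the thesis program. The `SourceCounter` class is hypothetical, the input is assumed to be a list of well-formed story URLs (a malformed one would make `URI.create` throw), and the host name is used as-is as the site's identity.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: count stories per source host and rank hosts by count.
final class SourceCounter {

    // Extracts the lower-cased host from a story URL ("" if there is none).
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase(Locale.ROOT);
    }

    // Returns (host, story count) pairs, most stories first; ties by name.
    static List<Map.Entry<String, Integer>> rank(List<String> storyUrls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : storyUrls) {
            counts.merge(hostOf(url), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up example URLs, not real collected data.
        List<String> urls = List.of(
            "http://www.townhall.com/story-one",
            "http://www.nytimes.com/story-two",
            "http://www.townhall.com/story-three");
        for (Map.Entry<String, Integer> e : rank(urls)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```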

By contrast, townhall.com, whose "about" page proudly announces that they were founded as a "conservative web community," accounts for 123 stories. Yes, you read that right: for the topics I picked, townhall is treated as "news" more often than either the New York Times or the Washington Post. So, bottom line, I get to pick on right wing news sources more than left wing news sources, simply because left wing news isn't "news."

Almost time for the Daily Show now, so I've managed to procrastinate this long. Go me!

If anyone would like to make further contributions, feel free to suggest other story topics that are in the news. Anna Nicole Smith and Harry Potter aren't actually generating very many headlines these days, so I need more unserious topics that the media uses as padding these days. Suggestions? And if you have more right-wing, left-wing, or "mainstream" news sources that I should be looking at, make some suggestions. I'll check my database and see if there are enough stories represented to get something useful out of them.

This is it. I'm officially in grad school hell.

Bless me father, for I have sinned. It has been two weeks since my last blog post.

So -- ha ha -- did I think that semesters like this one and this one were tough? Bugger that, this one takes the cake. The first draft of my 50-ish page Master's Report is supposed to be done in early October, so I've been focused on that for the week since my last class. Meanwhile, in my next class weekend I have one homework and two midterm exams.

I spent an entire weekend working non-stop on my thesis, then I got to enjoy going back to work fresh on Monday. My boss gave me Friday afternoon off, which was a nice gesture, except of course for the fact that I used it to do schoolwork.

I spent most of Saturday at a coffee shop on campus. Actually driving to campus was a stupid plan, because apparently there was this little football game going on that I wasn't thinking about. I was originally planning to go to the library and renew my TexShare card, but parking turned out to be impossible. So, coffee shop. Nice thing about UT is that it's so wired you can actually get wireless internet from everywhere, including some parking lots.

My work's really taking shape now. I've filled out the 14-page template for my report, which feels like I've accomplished some real work even though only two pages of actual double spaced text are written.

I meant to start working on the homework tonight; however, I've been so brain-fried that I mostly just ran the data collection program, stared at the news for a while, and did a whole lot of nothin' else. Blogging is just another form of procrastination, which I think I will continue to do until the Daily Show starts, at which point I will concede defeat for the evening. There's always tomorrow.

I was going to write more about my thesis in this post, but I'd rather keep this one strictly a post wherein I bitch about the trials of being a grad student, and cleanly separate the stuff about what kind of work I'm doing into a separate post. I think blogging will help me overcome writer's block in adding more detail to the report, so humor me, dear readers. See you in the next post.

Thursday, August 30, 2007

Beautiful sentiments about programming

Wrapped up in grad school as I am, it's easy to lose sight of the big picture, and why I got involved in this career path in the first place.

For my classes in Software Engineering and Management, I have to read The Mythical Man-Month by Frederick Brooks. I know the book by reputation; as it was first published in the 70's, I presume that the material is very old news to many people who share my interest in programming. Even so, this is new to me, so I wanted to share a passage from the book that I personally found very inspiring.

"The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men."

Oh yeah.

Friday, August 17, 2007

Happy class day

It's here already: the first day of the last semester. I have roughly three months to finish my master's thesis -- about which I'll probably write more pretty soon.

I'm taking "Introduction to Software Engineering" and "System Engineering Program Management and Evaluation." This is the only semester when I've taken two classes that are both "concepty" rather than "mathy" or "programmy." However, since my thesis is both mathy and programmy, that more than fills this particular void in my life.

Anyway, here's my term paper from summer. The professor mailed me an evaluation, writing simply:

The paper describes issues in web tagging with several examples. The paper
is in the form that it can be submitted to a computer magazine with
little effort.

That sounds pretty complimentary. Anyone out there who works for a computer magazine? :)

Thursday, May 17, 2007

Sweet dreams are not made of these

This is another one of those school posts, so you can skip it if you don't read those...

I had another one of those dreams last night. It's the last week of school; I'm almost ready for finals. Then somebody asks me how I did on my homework in another class, and I realize this is a class that I was originally taking at the beginning of the semester, but I have forgotten to attend for the last month or two. There are two such classes -- I thought I was taking just two classes for the semester, but I suddenly remember that it used to be four. There has already been a homework that I have missed in each class, and I'm woefully unprepared for both finals.

The really funny part is that in my dream, I'm thinking: "Oh no, this is just like one of those dreams I'm always having! Only this time it's real!" And then I woke up, and it still took me a few more minutes to realize it wasn't.

Also at another point in my dream, I was using my laptop on a stove because there were no other convenient surfaces to work on. I just had a shallow frying pan sitting on the stove, and the laptop was resting inside it, and I had a chair pulled up to the counter. So I'm working for a while when suddenly I realize that (of course!) the burner's been on. I think "Well, maybe I caught it in time." But when I turn the computer over and look, the bottom is all melted off and there's a big mess of singed wires and stuff underneath.

By the way, as for my ACTUAL finals, they went just fine. One of them was fairly easy and straightforward, and I feel pretty sure of an A in the class. The other one was hard, almost unfairly so. But the entire class, out in the hall afterwards, ALL looked miserable and we all had a good bitch session about how unfairly hard it was. That's good news for the curve, and this professor has been generous with some grading in the past, so I feel reasonably optimistic on the whole.

I mostly have these nightmares after school is over and I don't have as many real things to worry about. Although I did have another dream during finals week, where my high school teacher Mr. Laeser showed up and told me that I was going to have to work on another large project for HIM during the last six months while I try to get my thesis ready.

UPDATE:

This just in: I got an A in Real-Time Systems, the class with the brutal final. I got a 54 out of 70 on the final; the class average was 44. Yes, I AM that guy who ruins the curve for everyone. :)

Party time!

One thing I have to say about Dr. Mok, he gives really bad assignments and tests, but he makes up for it by being ridiculously generous with the grading. I had no clue what I was doing on half those questions, and there is no way I really deserved a 54. But hey, I'm not complaining. Seriously.


Thursday, May 10, 2007

Spring semester home stretch

I'm completely done with one homework, 90% done with the other! I'm also about 75% prepared for both of my tests this Saturday. It feels pretty good and reasonably non-panicky, compared with the ends of other semesters.

Then, this Sunday: Six Flags! (Ben is over 42" tall now, which means he gets to ride on the not-totally-sucky rides.)

Saturday after that: Performing Schubert and Bach! (I just hope I can get through the show without totally screwing up my fellow tenors. I had to skip two rehearsals this week on account of exams. I DID warn the director ahead of time, and asked him if he wanted me to sit out this performance as a result. He said "Stay in... you seem to be pretty solid on your part right now." Okay, who am I to argue?)

Coming next: Summer class on web servers! I get serious about my Master's Thesis!

Sunday, April 22, 2007

Operation: Help Me With My Thesis - episode 2

Thanks to everyone who responded to my request for Master's Thesis ideas. As I mentioned in the comments section, I'm planning to do some news analysis using sites like Digg.com, reddit.com, and perhaps del.icio.us.

I like to say that this topic is partly inspired by Anna Nicole Smith, since around the time I thought of it, Smith died and for some reason completely monopolized cable news for several weeks. I kept wondering: Why in the world do they think people care about her? People die all the time. As celebrities go, she wasn't particularly interesting. Do people actually read this stuff?

Web 2.0 can give sort of a handle on answering this question. At Digg.com and similar sites, people actually rate the news by voting it up or down. A given news item will get an overall "score" for how many people voted for and against it.

Now suppose you take the average rating of a news story on a given subject -- let's stick with Anna Nicole Smith as the example -- and compare it to the number of times that subject appeared in the news, across all news sites. The first number would tell you what people want to read about. The second number would tell you what is being presented most often as news. We could probably normalize this by what section of the newspaper it appears in -- for example, a story that appears on the front page is considered more important than one that doesn't; a long story may be more important than a short one.
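To make the two numbers concrete, here is a rough Java sketch. Nothing in it is built yet, so everything is an assumption: the `InterestVsCoverage` class is hypothetical, each story is assumed to carry a single integer score (votes for minus votes against), and a headline is counted as covering a subject if it contains the subject's name. The section and length weighting mentioned above is left out.

```java
import java.util.List;
import java.util.Locale;

// Sketch only: the two numbers to compare for one subject.
final class InterestVsCoverage {

    // Reader interest: average score of the rated stories on the subject.
    // Returns 0.0 when there are no rated stories.
    static double averageScore(List<Integer> scores) {
        if (scores.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (int score : scores) {
            sum += score;
        }
        return (double) sum / scores.size();
    }

    // Media coverage: how many headlines mention the subject, ignoring case.
    static int appearances(List<String> headlines, String subject) {
        String needle = subject.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String headline : headlines) {
            if (headline.toLowerCase(Locale.ROOT).contains(needle)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up numbers and headlines, purely for demonstration.
        List<Integer> scores = List.of(12, 3, 30);
        List<String> headlines = List.of(
            "Anna Nicole Smith hearing continues",
            "Senate debates budget",
            "New details on Anna Nicole Smith");
        System.out.println(averageScore(scores));
        System.out.println(appearances(headlines, "Anna Nicole Smith"));
    }
}
```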

So the question at hand is: how successful are news sources at generating information that people want? Are readers really treating their news as entertainment, or do they recommend hard hitting investigative reporters much more heavily? And what about media bias, either liberal or conservative?

In theory, it may be possible to quickly identify stories as leaning towards a liberal or conservative position, perhaps by cross-referencing them with the people who recommend them. Then what? Well, suppose it turns out that there are more liberal stories than conservative ones in the media... but suppose also that the liberal stories tend to be rated higher and read by more people than the conservative ones. That might indicate that, for instance, the idea of what "liberal" means is out of sync with the political center. Of course, it could go either way, and I'll be interested to try to come up with a measurement that doesn't bias the results.

There are tons of flaws with this topic, and I'll acknowledge some of them up front. For starters, those who subscribe to Digg almost certainly do not constitute a representative sample of all people in the country who read the news. So there's no way I can think of to justify any claims about all people nationwide. However, just investigating this cross section of people, and seeing what they like, could be useful and interesting in various ways that I haven't thought of yet.

When I talked about this topic with Dr. Ghosh, who will be my adviser, he said I shouldn't get sidetracked by that kind of problem, because it's not unusual for a research paper to be limited in scope. In fact, he recommended that I deliberately limit the scope to around five news sources, so that I have interesting things to say about just articles from those sites. I was thinking of picking three somewhat "mainstream" media sites (for example, NY Times, Washington Post, and CNN); then pick a liberal feed (perhaps Daily Kos) and a conservative feed (Fox News? Washington Times? WorldNet Daily?) to compare against.

Saturday, April 14, 2007

Sleep GOOD! Coffee BAD!

What's on the agenda tonight:
  1. Go out to dinner with Ginny and Ben.
  2. Come home.
  3. Bed at around, oh, 7:30.
Yargh, Saturday afternoons in class are the worst. All homework is turned in, all the recent lack of sleep is catching up with me, creeping inexorably through the haze of Coke and coffee, and even information that may well prove critical on the final exam next month seems utterly useless and trivial at the moment.

Sandy, sitting in front of me, is browsing shoe sales online. So I'm not the only one who is using the internet to escape paying attention to detailed explanations of the syntax of RTCTL. I'll pick it up later in the lecture slides and future study groups, at least I hope I will.

Next month, worse than two months ago, I am responsible for TWO homeworks and TWO tests. The good news is that unlike two months ago, I have four weeks to prepare instead of three; the Requirements homework is meant to be easy, and the Real-Time Systems homework involves playing with a computer program, something I'm pretty good at.

I haven't been excited about my classes this semester, but I seem to be doing well in them based on a slew of returned assignments where I beat the class average. I may get some more A's under my belt. Next semester I'll be taking a summer topic on Web Server programming. There's something I should have learned a long time ago.

In other news, I spoke with Dr. Ghosh (my old data mining professor) today, and he likes my idea for a Master's Thesis. I will soon post an update to Operation: Help me with my thesis. I want to thank everybody who contributed ideas in the comments; your feedback was very valuable and helped me come up with the germ of a topic. It needs a lot of fleshing out still, but Ghosh is sufficiently interested to be my adviser, and he told me he'd put me in touch with some of his former students who work at Yahoo and know how to do the kind of text-spidering that I'm going to need to start doing in the coming few months. More details later. In any case, it can't hurt to have contacts at Yahoo, since this is a topic of interest to me.

Funny story about lunch today. We get an hour between classes for lunch time. A group of people decided to head for a new California Pizza Kitchen that had just opened. Well, that was a mistake. The place was packed and slow. We didn't manage to leave until ten minutes after class had started. We got our pizzas to go, but one person (not in my class) grabbed the bag and took all the pizzas. I met him during the first break, but it was after 2:00 before I got to enjoy my barbecue chicken pizza, at which point it was lukewarm. Still pretty good though.

Yawn. Still going to be a long two hours. Okay, Dr. Mok says that the rank of several nodes in this graph is infinity, because you rank it by the maximum path length and you have the option of going into an infinite loop. Yeah, whatever.

Thursday, April 12, 2007

Glug glug

I hate to say it, but I'm becoming quite addicted to bottled vanilla Frappuccino. During the weeks when I've been regularly staying up late on homework, such as this week, it has been my caffeinated beverage of choice. My dad has been brewing his own coffee every morning since I was a kid, so he's kind of a connoisseur, and I bet he'd be disappointed in me. I have simpler tastes, though: you buy the bottle and you drink it.

It's not a very frugal choice compared to, say, Mountain Dew. But it is both cheaper and easier than actually going to Starbucks or Seattle's Best down the street and buying something from them.

Of course, as everyone knows, Starbucks is evil. I guess I should start feeling guilty now.

Wednesday, March 21, 2007

Operation: Help Me With My Thesis - episode 1

Well, here we go. In about nine months, assuming everything goes well, I will be the proud bearer of one Master's Degree in software engineering. And it's time to start thinking about... (cue the sinister music) the Master's Thesis. It's not due until November, but I've seen hollow-eyed fellow students rushing to get it done in their last few months while simultaneously studying for finals and doing class projects. Based on the stress levels I've already experienced, this is not for me, so I need a topic and a start ASAP.

Here's my plan. I really liked my course in data mining, so much that I've been planning for a while to ask Dr. Ghosh to be my adviser. He says he's very busy through the summer, but we can meet in May and get me started. So basically, that's how long I have to really start fleshing out an idea for a project that involves data mining... something.

As I've mentioned before, I'm very interested in the whole Web 2.0 paradigm. People-powered encyclopedias. People-powered politics. People-powered news. People organizing the internet. And oh yeah, blogs. All those blogs.

All those people are generating literally tons of data, which I'm sure needs to be mined in some new way that hasn't been tried before, to figure out some new and surprising bit of internet psychology. I don't know what that is yet. My idea right now goes something like this.

Step 1: Web 2.0
Step 2: Data mining
...
Step 4: A completed master's thesis

I think I may be missing a step, so help me out! What could be more fitting than to ask for a people-powered topic? Post a comment, leave a suggestion. If you know people who do work in Web 2.0 or data mining or are even interested in those topics, please mail them a link to this post. The future of the free world may depend on it!

Well, not really. But I'd sure like to graduate.

Monday, December 18, 2006

Worst part of being back in school? The nightmares

For many years after I got my Bachelor's degree from UCSD, I had nightmares about being back in school. But I haven't had them for a while... until this weekend. Now they seem to be back with a vengeance. Oh joy.

So I'm in class taking a final exam. The final exam has a very weird format: there are two questions, and you get ten minutes for each of them. Not twenty minutes for the test, but you actually are given one question, then you turn it in at the ten minute mark, then you are given the other question. Furthermore, the questions themselves are pretty ugly. You have to write code, on your paper, without a computer, and it has to compile and run correctly when the professor types it in later. For you non-coders, I should mention that writing code that runs perfectly with no testing is not a skill many normal people have, even very experienced programmers. It is often largely a matter of luck.

A few minutes into the test, I have written one line, and suddenly I lose a contact lens. I go to the bathroom, and for some reason I cannot get it back in for a long time. When I get back, the test is over.

I plead with the professor: come on! This was beyond my control! I need more time to finish! The professor finally says, "All right, you can have four more minutes to finish both questions."

One minute in, I wake up. I immediately panic: No! I can't leave the classroom! I have to go back to sleep and finish the test! It takes me several more minutes to calm myself down and convince myself that the test was not, in fact, real.

Friday, December 08, 2006

Pre-post-mortem on Fall 2006 semester

I'm sitting in Mobile Computing with one hour to go before my first year of grad school officially ends. I realize I am being unkind to my fellow students by blogging while they give their extremely important presentations that they worked on so hard. Well, sorry students. I want to go home, spend time with my family, watch some movies, and then maybe gird my loins for some early Christmas/Hanukkah/Solstice/whatever shopping.

In my humble estimation, my class projects both turned out pretty well. As in Spring, I'll post the term papers on my web site in a few days or so. One of my projects was about writing a distributed program to calculate whether very large numbers are prime. (For more information, the basis of our project is www.seventeenorbust.com.) I wrote an entire peer-to-peer application from the ground up in Java, which is a very cool thing to know how to do. My other project was a neat little graphics program -- I often miss writing graphics -- which simulated a network of sensors that can detect when a car drives past them. Since the sensor network is fun to play with, I would like to turn it into a Java applet and post it on my project page, but that will take a little work to convert.
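
For the curious, here is a minimal sketch of the kind of primality check a project like this revolves around. Seventeen or Bust searches for primes of the form k*2^n + 1, and Proth's theorem is one standard test for such numbers: if k is odd, k < 2^n, and some a satisfies a^((N-1)/2) ≡ -1 (mod N), then N is prime. This is an illustration in Python, not the project's Java code, and it is not a claim about which test the project used. The function name `proth_test` and the list of witnesses are assumptions made up for the example.

```python
def proth_test(k, n, witnesses=(2, 3, 5, 7, 11, 13, 17, 19, 23)):
    """Test N = k * 2**n + 1 using Proth's theorem.

    Returns True if N is proven prime, False if N is proven composite,
    and None if none of the (arbitrarily chosen) witnesses settles it.
    """
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("Proth's theorem needs k odd and k < 2**n")
    N = k * 2 ** n + 1
    for a in witnesses:
        if a % N == 0:
            continue  # this witness tells us nothing about N
        r = pow(a, (N - 1) // 2, N)  # fast modular exponentiation
        if r == N - 1:
            return True   # a^((N-1)/2) = -1 (mod N): prime by Proth's theorem
        if r != 1:
            return False  # a prime N would always give +1 or -1 here
    return None

# In a distributed setup, each (k, n) pair could be handed to a different peer.
for k, n in [(3, 2), (1, 3), (1, 4), (3, 3)]:
    print(k, n, k * 2 ** n + 1, proth_test(k, n))
```

Python's built-in three-argument `pow` keeps the intermediate numbers small, which matters once n gets large.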

I was quite proud of my 4.0 average through the summer, but I predicted that it wouldn't last and I think I'm ready for my prediction to come true now. I won't be completely shocked if I pull an A in either of these classes, but if I do then it will be by the skin of my teeth. Distributed systems was HARD, and while I studied for the final as much as I could, I know there was one question that I completely botched, and a few others that I struggled with. As for Mobile Computing, about 40% of my grade hangs on my performance in three quizzes. I screwed up the first one badly, did well on the second, and was mediocre on the third. So I think my performance there is a bit below average. I'm going to guess that I'm getting B's in both, and I'll be happy with it. I've honestly never been a straight A student, and I think I'm just satisfied with the fact that I got in here and am lasting.

Next year will be tougher, because I have to write a Master's Thesis while still taking the same full course load that I did this year. Fortunately, there are two classes which I've deliberately lined up, one per semester, which people tell me are easy.

The last day of class is always excruciating, because I've finished a grueling month of work and I frankly don't care that much about other people's projects. Unfair, maybe, but they probably don't care about mine. Some of them are somewhat interesting as explorations of side topics we covered in class, but the problem is that they're explained by computer science grad students who, as a whole, are not known for their public speaking abilities.

There are a few happy exceptions, and I like to believe that I am one. I try to begin or end on a good joke and scatter in some pop culture references, and I often throw in some wacky things in my slides just to keep people awake. I know they'd rather not be there, but I try to make it as painless as possible. Video game references are often a winner in this crowd.

Oh, while I'm on the subject of slides, let me say a few words about PowerPoint presentations. I'm pretty much a PowerPoint novice, but in the last year I've worked on four presentations and observed way too many presentations by others. Here are my words of wisdom, limited in experience as they are:
  1. Please oh please don't include large amounts of text on your slides. I don't want to hear you recite things straight off the slide. The bullet points in your slides need to be short, punchy, and highlight what you are saying, rather than repeat it. See, the thing is, I am not reading your slides. I am glancing at them to see if they say anything I need to know beyond what you are telling me.
  2. Give me pictures! We're writing computer programs; if you can't show me how your program works, I want to see screenshots. Or diagrams. A picture is worth a thousand words, you know, and if I can visualize what you're talking about then I might be more eager to know how you did it.
  3. If your slides get your point across enough, you don't have to switch slides every 30 seconds. If you're going to be talking about one major theme for three minutes, one slide that captures the central issue and hits the big topics can sit on screen for three minutes. Unless you want to break it up with a picture. Did I mention pictures are good?

Anyway, I have no plans tonight except to go home and relax. I have about six weeks till my next class starts. Yay! Six weeks of NOT thinking "Can't relax... must do homework..." Going to the office is going to be a piece of cake without school hanging over me.

By the way, this last guy who is talking is doing everything right. He has pictures, he's explaining what they're for, he includes minimal terminology on screen to identify the important development issues, and he even made a silly analogy to explain the issue he tackled.

Ten minutes now! Freedoooooooommmmmmm!