Thursday, July 07, 2005

I (heart) Statistics

I just love 'em. People have a basic distrust of "statistics" because they think the term means graphs and charts. The truth is that "real" statistics are matrices of data, and it's up to an enterprising individual to figure out their relationship. Graphs can easily be made deceptive, but raw data less so. And I love data.

Electoral-Vote was a godsend during the election, because there was so much data! So I crunched the numbers on my own. As early as August, it was clear that Bush was going to win the popular vote (though possibly not the electoral vote). Why? Well, further analysis revealed that red states gained population more than blue states, so there was a red-state advantage in raw voter population.

There was also my short-lived jaunt into selling for money on eBay. I sold cards from a new Collectible Card Game called "Call of Cthulhu," because I was lucky enough to find a wholesaler willing to sell to individuals (most sell only to retail shops), allowing me low enough prices to make a profit. But I couldn't be sure until I ran the numbers on eBay. So I grabbed the data on what cards were selling for, and used a properly weighted random number generator to determine how many cards of various types I would get buying X boxes, and put in my various expenses, and got a positive number! Plus, by keeping up this bookkeeping, I was able to anticipate when I wasn't going to be able to make a profit anymore (because statistical trends let me see the writing on the wall before I actually started to lose money), and got out while the getting was good. All thanks to math!
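For the curious, here is a rough sketch in Python of what that kind of weighted-random-number bookkeeping might look like. It is not the spreadsheet I actually used; the function names, the rarity odds, the prices, and the fees are all made-up placeholders you would replace with real numbers.

```python
import random

def simulate_profit(boxes, cards_per_box, box_cost, fee_per_card, weights, prices, rng):
    """Simulate opening `boxes` boxes and selling every card; return net profit.

    `weights` maps rarity -> chance of pulling it, `prices` maps rarity -> average
    sale price. Both are hypothetical inputs, not real card data.
    """
    rarities = list(weights)
    # Weighted random draw: one rarity per card pulled.
    pulls = rng.choices(rarities,
                        weights=[weights[r] for r in rarities],
                        k=boxes * cards_per_box)
    revenue = sum(prices[r] - fee_per_card for r in pulls)
    return revenue - boxes * box_cost

def average_profit(trials, seed=0, **kwargs):
    """Repeat the simulation `trials` times and average the net profit."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    total = sum(simulate_profit(rng=rng, **kwargs) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    # Placeholder numbers, purely for illustration.
    estimate = average_profit(
        trials=1000,
        boxes=3,
        cards_per_box=360,
        box_cost=60.0,
        fee_per_card=0.05,
        weights={"common": 0.70, "uncommon": 0.25, "rare": 0.05},
        prices={"common": 0.10, "uncommon": 0.50, "rare": 4.00},
    )
    print(f"Average profit over 1000 simulated purchases: {estimate:.2f}")
```

If the average comes out positive across many trials, buying the boxes is probably worth it; if it drifts toward zero as sale prices fall, that is the writing on the wall.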

What I'd like to do is figure out a way to grab data from the Internet Movie Database, and use it to anticipate career trends. Who, for example, automatically guarantees a movie will suck? We all know certain major indicators of suckage (Rob Schneider, who is the epitome of tastelessness, and Charlie Sheen, who seems cursed to be in movies that suck despite having no glaringly negative qualities himself), but it's surely more interesting than that. So the idea is to grab the list of movies a person's been in, grab the IMDB user rating (accumulated from oodles of votes, giving a populist sense of its quality), and graph the results weighted by the person's casting prominence (i.e., if the movie sucked, but the actor had a cameo, it shouldn't count against them as much as if they had top billing). My guess is that, for example, Robert De Niro would show a marked dropoff in recent years, as compared to the glory days of Scorsese films. But I might be wrong. You could also, to a certain extent, rate the movies based on their score at Metacritic or Rotten Tomatoes, and get the "critics' choice" sense of it as well.
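The weighting part of that idea is easy to sketch in Python, even without the data-grabbing part. This is only one possible scheme: I'm assuming each credit counts as 1 divided by the actor's billing position, so top billing counts fully and fifth billing counts a fifth as much. The function name and that weighting rule are my own inventions, and the ratings would have to be typed in by hand.

```python
def weighted_rating(credits):
    """Average user rating across an actor's movies, weighted by billing.

    `credits` is a list of (user_rating, billing_position) pairs, where
    billing_position 1 means top billing. The 1/position weight is an
    assumption chosen for illustration, not an established formula.
    """
    if not credits:
        raise ValueError("need at least one credit")
    weights = [1.0 / position for _, position in credits]
    total = sum(w * rating for w, (rating, _) in zip(weights, credits))
    return total / sum(weights)

if __name__ == "__main__":
    # Hypothetical filmography: (rating, billing position). Not real data.
    early_years = [(8.5, 1), (8.0, 1), (7.5, 2)]
    recent_years = [(5.5, 1), (6.0, 1), (8.0, 6)]
    print(f"early:  {weighted_rating(early_years):.2f}")
    print(f"recent: {weighted_rating(recent_years):.2f}")
```

Run it once per year (or per decade) of a career and you have the points for the graph.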

Why go to all this effort? Because I love statistics, and this is unanalyzed data! It's like a mountain waiting to be climbed! Which brings me to the annoying part about statistics: getting the data. I can't stress how important getting the data is, but it's usually boring gruntwork. I guess I'd say that collecting the data is like climbing the mountain, while analyzing it is like skiing down it. Only on skis made of math!

So lacking a webcrawling doohickey capable of fetching data for me, I'm left analysing statistics at my job. For example, I once worked in a call center, and like most soulless environments, we were compelled to compete for no reason, and were given a list of everyone's performance on a variety of factors. Out of sheer boredom (did I mention the soullessness?), I figured out, with the rudimentary tools on hand (i.e. the Windows calculator), that there were certain subtle but significant correlations between a person's average call duration, overall break time taken, overtime, and the likelihood that that person would be evaluated by management. This seems like a trivial bunch of numbers, but people lived and died by whether they were evaluated. Some people badly wanted to be evaluated (so they could jockey for the tiny number of slots you could be promoted into), while others wanted to be evaluated as rarely as possible (because they had total disregard for management and wanted to slack off as much as possible). When I told my coworkers how to increase/decrease their chances of being evaluated, they looked at me like I was clairvoyant. The truth is that I knew (a) how to get an average from a list of numbers, (b) how to plot dots on a piece of grid paper, and (c) how to connect dots. Not rocket science.
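With something better than the Windows calculator, the same sort of check takes a few lines. The sketch below computes a Pearson correlation coefficient, which is a more formal stand-in for the average-and-grid-paper method described above, not what I actually did at the time. The numbers are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    # Made-up numbers: average call duration (minutes) and evaluations per month.
    call_minutes = [4.2, 5.1, 6.3, 7.0, 8.4]
    evaluations = [1, 1, 2, 3, 3]
    print(f"correlation: {pearson(call_minutes, evaluations):.2f}")
```

A result near 1 means the two columns rise together, near -1 means one falls as the other rises, and near 0 means there is no straight-line relationship worth telling your coworkers about.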

The moral of the story is: take a statistics class while you're in school. If your instructor isn't a moron, and you can learn to use the free, powerful, and somewhat intimidating stat app called R, you, too, can amaze your friends, outwit your enemies, and waste huge amounts of time learning the subtle relationships between things!

Not all math is fun. But statistics is, because it gives you knowledge, and knowledge is power.

