So I haven’t died (surprise!). Instead I seem to have enrolled on yet another online course, which is already making me laugh my ass off. The course is on Data Science and the first assignment is to analyse the sentiment of Twitter messages. Or in plain English, have a look at a shitload of tweets and try and work out what kind of mood each person was in when they wrote theirs. Luckily the course gives you some nice things to use, like dictionaries of emotive words and stuff, so a lot of it just boils down to programming.
Which is where my lols come in. So far I’ve managed to follow the assignment and get some sample data from Twitter (by leaving a pre-made Python script provided by the course running on a virtual machine), and then sort through the data. At the moment I’ve got as far as writing a script that looks at each line of the sample data file and decides two things. First, whether the line is even a tweet: there’s a lot of garbage in the file that ISN’T tweets, and if I left it in then the rest of the program would error and go “nope, fuck that shit”. Second, whether the tweet is in English, because the assignment is going to be graded by the course admin running their own pre-trimmed sample data with the garbage and any non-English tweets removed (non-English tweets and dictionaries of emotive English words do not play nice). So when my program does find an English-language tweet, it prints out just the contents of the tweet and not a shit tonne of metadata like who posted it, when, what language, blah blah blah (yep, all of that metadata is ALSO in my sample data).
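For the curious, the filtering step looks roughly like this. It's a rough sketch of the idea, not the actual assignment code, and it leans on a few assumptions: that each line of the sample file is one JSON object, that real tweets carry a `text` field and a `lang` field set to `"en"` for English, and that anything else counts as garbage. The function name `english_tweet_texts` is made up for this example.

```python
import json
import sys


def english_tweet_texts(lines):
    """Yield the text of every English-language tweet in an iterable of raw lines.

    Assumes one JSON object per line, with tweets carrying "text" and "lang"
    keys. Anything that doesn't fit that shape is silently skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are not tweets

        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON, so definitely garbage

        if not isinstance(record, dict):
            continue  # valid JSON, but not an object (e.g. a bare list)

        text = record.get("text")
        if not isinstance(text, str):
            continue  # no tweet body, so probably a non-tweet message

        if record.get("lang") != "en":
            continue  # the emotive-word dictionaries are English only

        yield text


if __name__ == "__main__":
    # Usage (the file name is whatever you saved your sample data as):
    #   python this_script.py sample_data.txt
    with open(sys.argv[1], encoding="utf-8") as sample_file:
        for tweet in english_tweet_texts(sample_file):
            print(tweet)
```

Skipping bad lines quietly, rather than letting them blow up, is the whole point: the later sentiment-scoring step only ever sees the tweet text and none of the metadata.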
So what set me off on a laughing fit? My sample data is an odd mix of stuff that is actually quite important:
"Thousands are without power in Washington County, but it’s mostly Beaverton area, no Tigard or Tualatin…
"Oriya Organics, LLC Voluntarily Recalls Oriya Organics Superfood Protein Medly Containing…
And stuff that isn’t even remotely important (in the grand scheme of things):
"im a fucking zombie"
"why did I wake up so fucking early"
And the eternal mystery of…
"Who put 20p in the grunion machine"
And before anyone asks, no, this does not mean I work for the NSA, CIA or one of the other alphabet agencies. This shit was coming from Twitter’s public stream and anyone could have seen these tweets by sitting around watching their monitor. Right, think I’ll leave the assignment for today. Can work on sentiment scores tomorrow.