Monday, December 30, 2013

Which radio station really has the best variety of songs?


Listening to the radio


Years ago, I listened to good ol' FM radio, and I pretty much had something playing 24x7 (as long as nobody complained about the music). For some reason, though, my "preferred" stations would always end up being acquired by another entity/competitor/whatever - like WLOL and The Edge.


Eventually, I grew tired of the seemingly "limited music variety" offered by the usual radio stations, so I now usually stream something (still filled with commercials) from Spotify, Pandora, and other sources, where I have a little more control over the variety (although even these providers seem limited in their breadth of variety).

Since it's completely unfair to blatantly claim that the local terrestrial FM radio stations have a limited/repetitive song catalog, I figured I'd "be scientific" and actually collect data on what the stations are playing, and then visualize/graph the information to see what would show up. 

There are probably better ways to spot trends within data, but I prefer to view (and explain) things using visual means, so my approach is to simply present things, and "just see what happens".




Accumulating the data


I figured it'd be easy to gather the data, since most of the local stations' websites provide the "currently playing" song data, and if sites like tunein.com and radiosearchengine.com can provide the information, I should be able to figure it out too.

I started with 93X (KXXR), which seems to be the only "current rock-music station" in the area (yeah, pretty sad). I went to their website, and then fired up the developer tools to identify the means used to retrieve the "currently playing" song information. At the time (Dec 2013), the site used a jQuery JSONP call to retrieve the information, which is a decent workaround if the data provider doesn't support CORS.
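For anyone who hasn't poked at JSONP before: the response is just JSON wrapped in a function call, so a collector has to peel off the wrapper before parsing. Here's a minimal sketch of that step. The callback name and the `artist`/`title` fields are made up for illustration; they are not the station's actual feed format.

```javascript
// Unwrap a JSONP body such as: someCallback({"artist":"...","title":"..."});
// Returns the parsed payload, or null if the text isn't JSONP-shaped.
function parseJsonp(body) {
  const match = /^\s*[\w$.]+\s*\(([\s\S]*)\)\s*;?\s*$/.exec(body);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    return null; // the wrapper was there, but the payload wasn't valid JSON
  }
}

// Hypothetical response body, for illustration only.
const sampleBody = 'jQuery123_456({"artist":"Some Band","title":"Some Song"});';
console.log(parseJsonp(sampleBody));
```

Returning null for anything unexpected keeps the polling loop simple: a bad response is just skipped until the next poll.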

All I had to do was visit the station's website, clear the "Network" tab after everything had finished loading, and then wait for the next song to start. Eventually, entries would appear as the web page would poll the currently playing song, and then update things when needed.

I coded up a simple bit of JavaScript using Node's http get routine and configured it to poll the website every so often. After a little while, my data collector stopped working. I pointed a real web browser at the station's website to see what the problem was, and I was presented with a "You've been blacklisted" type of message from the station's hosting provider! I thought it was a tad ironic that a station that presents itself with a "bad-assed barely legal" veneer had declared that I was too deviant to be allowed access to their website's information.
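Polling "every so often" means the same song shows up in many consecutive responses, so the collector should only record an entry when the song actually changes. This is a rough sketch of that idea, not the project's real code; the `artist`/`title` field names and the sample entries are assumptions.

```javascript
// Returns a function that reports whether a polled entry is a new song,
// by remembering the last artist/title pair it was shown.
function createChangeDetector() {
  let lastKey = null;
  return function isNewSong(entry) {
    const key = `${entry.artist}|${entry.title}`;
    if (key === lastKey) return false;
    lastKey = key;
    return true;
  };
}

const isNewSong = createChangeDetector();
// Hypothetical poll results, in arrival order.
const polls = [
  { artist: 'Band A', title: 'Song 1' },
  { artist: 'Band A', title: 'Song 1' }, // same song, still playing
  { artist: 'Band B', title: 'Song 2' },
];
const recorded = polls.filter(isNewSong);
console.log(recorded.length); // 2
```

In a real collector, the detector would be called from whatever timer does the polling (for example, a `setInterval` callback), and only entries that pass would be written to the database.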

My first thought was that they were trying to pretend that they could restrict access to that data in the same way that Major League Baseball tried to "copyright" facts and stats in order to deter the Fantasy Baseball leagues. Eventually, I figured out that it was a simple DDoS type of filter, and that all I had to do was make my data requests look "more like a real web browser".

One of the (many) great features in Chrome's Dev Tools is the option "Copy as cURL". After a few days, the blacklisting automatically cleared (or maybe my outgoing IP address changed), and I was able to access the station's site again. I opened Chrome's Dev Tools once again, and this time I copied everything that the original request used.

As I expected, the request was full of header settings, referer (yeah, it's misspelled), and cookies. I slam-dunked most of the pieces within that request into my "defined stations" database/collection, then hooked up the wonderful node-curl npm module, and unleashed my data-collection code upon the information. After a few hours, the data collection seemed stable, and hadn't been blacklisted. Mission accomplished!
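To make the "look like a real browser" idea concrete, here's a sketch that turns a stored station record into request options with browser-style headers. It uses the options shape accepted by Node's built-in `http.get()` rather than the node-curl module mentioned above, and the station record (host, path, cookie value, and so on) is entirely hypothetical.

```javascript
// Build an options object for Node's http.get() from a stored station record.
// The record's shape (host, path, referer, cookie, userAgent) is hypothetical.
function buildRequestOptions(station) {
  const headers = {
    'User-Agent': station.userAgent,
    'Accept': '*/*',
    'Referer': station.referer, // the HTTP header really is spelled this way
  };
  if (station.cookie) headers['Cookie'] = station.cookie;
  return { hostname: station.host, path: station.path, headers };
}

// Made-up station entry, for illustration only.
const exampleStation = {
  host: 'station.example.com',
  path: '/nowplaying?callback=cb',
  referer: 'http://station.example.com/',
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0 Safari/537.36',
  cookie: 'session=abc123',
};
console.log(buildRequestOptions(exampleStation));
// To send it: require('http').get(buildRequestOptions(exampleStation), onResponse)
```

Keeping the headers in the station record (instead of hard-coding them) means each station can carry whatever its own site expects.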


Deploying the app


As difficult as all of that coding seems, deploying an app is actually the most difficult part (I keep planning on writing up my thoughts, explanations, and experiences about that topic, but it hasn't happened yet). 

When designing/developing an app, you need to have the end-goal in mind from the beginning, or it's going to be a complete train wreck. As a result, when I started imagining up this experiment, I specifically had NodeJS and MongoDB in mind so that I could deploy/use OpenShift (RedHat) and MongoLab.

The team over at OpenShift has a guide to help get started with a new NodeJS app, although I chose not to use their MongoDB cartridge, and preferred to manage the database myself. I eventually discovered that there's a quickstart guide for using OpenShift and MongoLab together, but I already had things working - figures.

I personally love the PaaS, which is basically a "more evolved" form of a web-app container (I've been doing Java Apps since 1997). When I started deploying Java apps "to the cloud" years ago, I was disappointed in the amount of "wakeup time" that was required after hitting an app that had "gone to sleep"  (If you're wondering, Google's App Engine seems to have the fastest/best "wakeup time" for Java-based apps, but maybe that will be another "research project" someday).  Anyway, from my experience, NodeJS apps seem to have an extremely quick wakeup/response time (because humans should never have to wait for a computer).

Getting the app deployed was a breeze (due to the straightforward git-based interface used by OpenShift). I eventually coded up a simple REST-based interface so that I could peek on the collected data without having to dive into the MongoDB shell/console.

After a few days, I discovered that there were times where the data stream would contain entries like "Song information not available", which I obviously didn't want to include within my collected data (along with other "glitches" in the data stream). So, it took a few iterations of refinement and data scrubbing/cleansing in order to get a "clean" set of song information. 
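The scrubbing step can be as simple as a filter that rejects placeholder text and empty fields before anything is stored. A minimal sketch, assuming entries have `artist` and `title` fields (my naming, not necessarily the project's); the only placeholder listed is the one quoted above, and other glitches would need their own patterns.

```javascript
// Patterns for known junk entries; extend as new glitches turn up.
const PLACEHOLDER_PATTERNS = [/song information not available/i];

// Returns a trimmed { artist, title } object, or null if the entry is junk.
function cleanEntry(entry) {
  if (!entry) return null;
  const artist = String(entry.artist || '').trim();
  const title = String(entry.title || '').trim();
  if (!artist || !title) return null;
  const isPlaceholder = PLACEHOLDER_PATTERNS.some(
    (re) => re.test(artist) || re.test(title)
  );
  return isPlaceholder ? null : { artist, title };
}

console.log(cleanEntry({ artist: ' Some Band ', title: 'Some Song' }));
console.log(cleanEntry({ artist: 'Some Band', title: 'Song information not available' }));
```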


Looking at the data from a different view


After a few days of collecting data, and improving the process (which is still ongoing), I wanted to get a better look at what I had accumulated. I've been wanting to do "more sophisticated" graphs with the D3 library, because everything I've done with D3 thus far was akin to printing simple drinking straws with a Stratasys printer.

I had an idea of how I wanted to "see the data", and after lots of searching, I discovered a graph called a "Cluster Dendrogram" with an example, which appears to use a data-protocol similar to Flare.  I updated my NodeJS code to provide a data-feed similar to what the graph expected/used, and had things working pretty easily (which is freakin' amazing, because that almost never happens in real life). 
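The Flare-style data that the dendrogram example consumes is a nested tree of `name`/`children` nodes, with a `size` on the leaves. Here's a sketch of how a flat list of plays could be reshaped into that tree (station, then artist, then song). The input shape and the choice of play count as `size` are my assumptions, not necessarily what the deployed app does.

```javascript
// Convert a flat list of { artist, title } plays into a Flare-like tree:
// station -> artists -> songs, where each song's size is its play count.
function toFlareTree(stationName, plays) {
  const byArtist = new Map();
  for (const { artist, title } of plays) {
    if (!byArtist.has(artist)) byArtist.set(artist, new Map());
    const songs = byArtist.get(artist);
    songs.set(title, (songs.get(title) || 0) + 1);
  }
  return {
    name: stationName,
    children: [...byArtist].map(([artist, songs]) => ({
      name: artist,
      children: [...songs].map(([title, count]) => ({ name: title, size: count })),
    })),
  };
}

// Hypothetical plays, for illustration only.
const demoPlays = [
  { artist: 'Band A', title: 'Song 1' },
  { artist: 'Band A', title: 'Song 1' },
  { artist: 'Band B', title: 'Song 2' },
];
console.log(JSON.stringify(toFlareTree('Example FM', demoPlays), null, 2));
```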


After a few more iterations (and deployments) to my OpenShift instance, I was able to generate a graph of the songs played in the last 48 hours by the radio station - and as I had guessed, there is a small number of artists/bands that compose a "significant portion" of their song catalog.


The graph seemed interesting enough, but data isn't very meaningful unless you have a baseline or some other means of comparing things. So I used my existing JSON/cURL based collection engine on the websites for KQRS (which is a "classic rock" sister station of KXXR, and thus was pretty easy to set up), and the station KDWB (yeah, I know), which was a bit more challenging due to the way their JSON data feed/response is structured.

After a few more late-evening sessions (after the kids were in bed) of coding, tweaking, and deploying, I was able to obtain more graphs of the song selections for these stations.  

The station KQRS shows a "wide variety" of songs, because each song in the graph is pretty much played just once. 



The station KDWB is on the opposite end of the spectrum (being a more Top 40 type of format), and thus shows a very small number of artists/songs, each with numerous broadcasts within the time period.



If you want to see the entire graphs in real-time, you can find them here:
http://noderadio-panurgy.rhcloud.com/graph.html

March 4, 2014 - deployment update - I've noticed that the RedHat/OpenShift instance tends to go to "sleep" after an unspecified period, which kills the radio station polling. I've also had ongoing issues with the radio station's ISP blocking my app due to suspected "DDoS Activity". So, I tweaked the code and made it work on OpenShift and CloudFoundry, and then deployed the app to a few other PaaS providers:

AppFog (CloudFoundry, using AWS): http://noderadio.aws.af.cm
Updated July 1, 2014 - AppFog no longer supports instances on the HP Cloud, moved to AppFog on AWS/East.

IBM BlueMix (CloudFoundry @ SoftLayer): http://noderadio.mybluemix.net/
Updated July 1, 2014 - IBM BlueMix domain names have changed, now that they've officially launched the service.

Yet another update - Nov 27, 2014: Tried out Heroku's integration with DropBox and deployed an instance to their service, which increases the resiliency of the application now that it's running on yet another (free) PaaS:  http://noderadio.herokuapp.com

A nice bonus about this is that it's super easy to "scale wide" and provide capacity (and redundancy) by deploying the same app/code to multiple services.

Conclusion or consensus ?


After watching the graphs evolve, it definitely appears that stations whose format is biased towards the newer music tend to repeat songs more frequently, and thus "have less variety". On the other end of that spectrum are the stations whose format is biased towards classic (or older) songs, and thus can draw upon a wider variety of songs (and hence, less repetition).
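If you'd rather have a number than a picture, "variety" can be boiled down to the ratio of unique songs to total plays, plus the play count of the most-repeated song. This is a back-of-the-envelope sketch; the metric and the sample data are mine, not something measured from the stations above.

```javascript
// Summarize how repetitive a list of { artist, title } plays is.
function varietyStats(plays) {
  const counts = new Map();
  for (const { artist, title } of plays) {
    const key = `${artist} - ${title}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const total = plays.length;
  const unique = counts.size;
  return {
    total,
    unique,
    uniqueRatio: total === 0 ? 0 : unique / total, // 1 means no repeats at all
    maxRepeats: total === 0 ? 0 : Math.max(...counts.values()),
  };
}

// Hypothetical plays: one song played three times, another once.
const samplePlays = [
  { artist: 'Band A', title: 'Hit' },
  { artist: 'Band A', title: 'Hit' },
  { artist: 'Band A', title: 'Hit' },
  { artist: 'Band B', title: 'Deep Cut' },
];
console.log(varietyStats(samplePlays));
```

A classic-rock style playlist would land near a ratio of 1, while a Top 40 style playlist would score much lower over the same time window.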

So yeah, I basically spent two weeks of evenings and weekends to "prove" something that was pretty much "commonly known". Fortunately, my actual goal was to become more familiar with NodeJS, OpenShift, MongoDB (and its "schemaless nature" - which does not mean that the data isn't  organized), MongoLab, and D3 - and in that regard, the experiment was a huge success.

I guess I'll stick with Spotify, and songs like BT - Skylarking, which is great background music for coding (and I love night-time long-exposure photography). 

Add a comment below if you happen to have a favorite song/artist/station for coding music!

P.S. I finally pushed the code to a GitHub repo: panurgy/noderadio. Check back for updates, or follow me on Twitter.

Wednesday, July 10, 2013

Anticipating Adventurrito!

This is probably old news by now, but if you haven't already heard of the Chipotle Adventurrito contest, they're trying really hard to keep things "under wraps" (groan).

When I visited their website, my unyielding inner JavaScript geek had to know what kinds of frameworks and libraries they were using on the site (jQuery and SWF - yawn). As I was examining things using Chrome's Developer Tools, I noticed that there was a line of JavaScript that effectively gives away all of the names of the 20 daily challenges.



If you're interested, here's the list of puzzles:

    "id":1,"name":"Pig-spiration Point"
    "id":2,"name":"Ye Olde Mappe Shoppe"
    "id":3,"name":"The Dine-in Hall"
    "id":4,"name":"CMG-TV"
    "id":5,"name":"The F.Y.I. Freeway"
    "id":6,"name":"The Live-stock Exchange"
    "id":7,"name":"A Quiet Mountain Town"
    "id":8,"name":"Open Pasture"
    "id":9,"name":"CHIP Radio Tower"
    "id":10,"name":"The Shady Internet Café"
    "id":11,"name":"Cow Country"
    "id":12,"name":"Free Range Press"
    "id":13,"name":"Culinary Secret HQ"
    "id":14,"name":"Museum of Naturally Raised History"
    "id":15,"name":"Carnitas Call Center"
    "id":16,"name":"Farm and Charm School"
    "id":17,"name":"E-potle"
    "id":18,"name":"E-potle"
    "id":19,"name":"Chipotle Studios"
    "id":20,"name":"The Final Countdown"

The next question is whether they do a good job of securing the puzzles, or whether their site will hand out unexpected information when the game begins. Wouldn't be the first time I've seen that happen - ask me about the DQ "virtual scratch-off game" (years ago) that was written in SWF and sent "select * from ...." statements to the database via HTTP (not HTTPS). Just because you can't see it doesn't mean it's not there.

Stay tuned!

Thursday, April 25, 2013

Node.js on a Raspberry Pi and Tropo.com


This evening I tried out some of the Phone/Voice/SMS features available from the site Tropo.com, and I'm definitely interested in going "Back to the Future" and composing a "User Interface" based on an Interactive Voice Response and the Plain Old Telephone Service.  The APIs at Tropo.com give you two options of building your app:
  • You can compose a script or JSON file that they'll "host" on their server. This means that any/all incoming requests are processed using these rules, which means that your "app" can't do anything dynamic.
  • You provide the URL for your own web server that communicates via JSON (using your favorite server-side programming language) and it returns dynamic responses for the current conversation. This is my "preferred" route -  the only catch is that it requires a dedicated and "constantly connected" Internet web server that can be reached by the Tropo.com servers.

I had the opportunity to see this kind of technology in action earlier this month at a MinneBar session, presented by Kevin Whinnery at twilio.com, where he made it look super easy to hook up code with the phone system.  Since I finally had a few hours of "free time", I had to try it out for myself.


Lately, I've been converting my web-app development from Java/Spring/Tomcat over to Node.js/Express.  The folks at Tropo.com have provided a Node.js library, and a few samples, which makes it easy to get up and running in no time.  In a matter of minutes, I had a demo app running, and I was able to dial into my app and "talk" with it.

The most difficult part was finding a way to host it with an "always on" Internet connection. My initial thought was a Node.js PaaS provider (Joyent, Heroku, etc), but all of the companies that I could find only offered "limited duration" samples that would eventually expire after a few months, or the servers would go to sleep after a period of inactivity, which causes latency (I hate waiting for computers).



Since my Tropo/Node app needs a simple (low-bandwidth) text/JSON interface, I figured I could host it from my own home Internet connection. The obvious catch is that I would need a "server" within my house to always be on, and connected to the Internet. I have plenty of Linux boxes that are "always on"; the problem is that they're all safely hidden behind a firewall - for good reason.




The "safest" box I have to throw at the Internet is a Raspberry Pi. The next catch is that Node.js doesn't provide an ARM based binary - you have to manually compile it. This brought back memories of the Linux servers I used in the late 90's, which required me to custom compile the kernel to support the SCSI controllers that I used. Anyway, there's no shortage of helpful blog posts that provide the simple steps needed to get Node.js running on the RasPi. Helpful note: you have plenty of time to go watch a movie while the compile/build runs.

For now, I just have the app report the current time, but I have some thoughts on providing "useful information" similar to the old Moviefone service (yeah, before cell phones were ubiquitous). We'll see if I can string together enough "free nights" to make that happen!
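A "report the current time" response could be sketched roughly like this. It builds the JSON by hand instead of using Tropo's Node.js library, and the payload shape (a `tropo` array holding `say` and `hangup` actions) is my recollection of Tropo's WebAPI format, so treat it as an assumption and check it against their docs before relying on it.

```javascript
// Build a JSON-able reply that speaks the given time and then hangs up.
// The { tropo: [ { say }, { hangup } ] } shape is an assumption (see above).
function buildTimeResponse(now) {
  const hours = now.getHours();
  const minutes = String(now.getMinutes()).padStart(2, '0');
  const text = `The current time is ${hours}:${minutes}.`;
  return { tropo: [{ say: { value: text } }, { hangup: null }] };
}

console.log(JSON.stringify(buildTimeResponse(new Date())));
```

The web server's request handler would send this object back as the JSON body each time Tropo posts a new call session to it.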