tl;dr
This is the second generation of my (super minimalistic) graphing/visualization project, which makes it easy to see which FM radio stations have lots of variety (KQRS, KZJK, KQQL) and which don't (KDWB).
Check it out: https://codepen.io/panurgy/full/YzpGpap?s=kqrs
Introduction
About eight years ago, I built a data-collection system that accumulated and graphed information about the song variety (or lack thereof) across several FM radio stations around Minneapolis, MN. A few years after I set that up, things changed at the various hosting providers, and the entire system fell apart (Heroku Cedar was deprecated, mLab was acquired and shut down, etc.).
A few months ago, my curiosity about song variety was rekindled, so I revisited this project and updated it using a newer generation of hosting options. Some of the code remains the same (and still mentions "use strict" within the functions!), but other parts were rebuilt from scratch.
Getting the data
The first step was figuring out the data acquisition. All of the radio stations have new websites, which broke my song-info collection code (it essentially relied upon "screen scraping" the info from each station's website).
Rather than writing and deploying server-side code, I used RunKit to create an assortment of individual endpoints, each of which obtains and returns the "now playing" song information for a specific station. Every station provides its info a bit differently, but they fall into three general categories:
- HTML data - the station's website sends an HTML string, which contains DOM elements, and the song info is buried within it. The npm package cheerio works great at parsing out the info.
- JSON data - the station's website returns a JSON string, which is super easy to parse/use. Most of the stations use this format.
- WebSocket - the station's site opens a WebSocket, and uses an "ask/reply" protocol that responds with a JSON string.
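As a minimal sketch of the JSON category, the function below turns a raw response body into a uniform record. The payload shape (`nowPlaying.artist`, `nowPlaying.title`) and the function name `parseNowPlaying` are assumptions made for illustration; each real station uses its own field names, and this is not the code running on the RunKit endpoints.

```javascript
// Hypothetical payload shape; each real station uses its own field names.
const sampleBody = '{"nowPlaying": {"artist": " Tom Petty ", "title": "Refugee"}}';

// Turn a station's raw JSON response into a uniform { station, artist, title } record.
function parseNowPlaying(station, body) {
  let payload;
  try {
    payload = JSON.parse(body);
  } catch (err) {
    // Malformed response: report "nothing playing" rather than throwing.
    return { station, artist: null, title: null };
  }
  const song = (payload && payload.nowPlaying) || {};
  // Trim strings, and treat anything empty or non-string as missing.
  const clean = (value) =>
    typeof value === "string" && value.trim() !== "" ? value.trim() : null;
  return { station, artist: clean(song.artist), title: clean(song.title) };
}

console.log(parseNowPlaying("kqrs", sampleBody));
```

Returning the same record shape for every station keeps the downstream steps (filtering, storing, graphing) independent of how a given website happens to format its data.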
Here's the list of currently supported stations:
- KQRS - KQ92 (JSON): https://runkit.com/panurgy/kqrs-now-playing
- KTIS - 98.5 (WebSocket): https://runkit.com/panurgy/ktis-now-playing
- KDWB - 101.3 (JSON): https://runkit.com/panurgy/kdwb-now-playing
- KZJK - Jack 104.1 (HTML): https://runkit.com/panurgy/kzjk-now-playing
- KQQL - Kool 108 (JSON): https://runkit.com/panurgy/kqql-now-playing
Storing the data
Running the data collection
- The trigger uses Zapier's Webhook integration, which calls one of the RunKit endpoints that I created
- The filter step discards any data that's missing the song/artist info, which indicates the station is playing advertisements
- The final step saves the data into the correct collection
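The rule behind the filter step can be written as a small predicate. This is plain JavaScript for illustration only, not Zapier's own filter configuration, and the record shape and the name `hasSongInfo` are assumptions.

```javascript
// Returns true only when both the artist and the title are non-empty strings.
function hasSongInfo(record) {
  const present = (value) => typeof value === "string" && value.trim().length > 0;
  return Boolean(record) && present(record.artist) && present(record.title);
}

// Hypothetical samples: one real song and two "nothing playing" responses.
const samples = [
  { station: "kqrs", artist: "Tom Petty", title: "Refugee" },
  { station: "kdwb", artist: "", title: "" }, // e.g. an ad break
  { station: "kzjk", artist: null, title: null },
];

console.log(samples.filter(hasSongInfo).map((s) => s.station)); // [ 'kqrs' ]
```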
Rebuilding the front-end
- Converting the query - in MongoDB, querying is pretty easy: the database call passes over a "fairly simple" JSON object that contains the search's settings. In Cloud Firestore, the JSON object is a "bit more complex" and requires a Structured Query.
- Reading the data - in MongoDB, the query returns an array which contains the documents/objects from the database (thus the objects received match the objects in the database). In Cloud Firestore, the documents aren't "simple JSON", but rather a more detailed format which contains lots of meta-information about each of the document's fields/data-types. Fortunately, Stack Overflow had the answer I needed to convert those document objects into "plain objects".
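Both halves of that change can be sketched against the Firestore REST API. The first function builds a `runQuery` request body, and the rest unwraps the typed field values of a returned document. The collection name, the `playedAt` field, and all function names are hypothetical, and the converter is one straightforward way to do it, not necessarily the Stack Overflow answer mentioned above.

```javascript
// Build a request body for Firestore's REST `runQuery` method.
// The collection id and the "playedAt" field are hypothetical names.
function buildStructuredQuery(collectionId, sinceIso, limit) {
  return {
    structuredQuery: {
      from: [{ collectionId }],
      where: {
        fieldFilter: {
          field: { fieldPath: "playedAt" },
          op: "GREATER_THAN_OR_EQUAL",
          value: { timestampValue: sinceIso },
        },
      },
      orderBy: [{ field: { fieldPath: "playedAt" }, direction: "ASCENDING" }],
      limit,
    },
  };
}

// Unwrap one typed Firestore value, e.g. { stringValue: "x" } becomes "x".
function fromFirestoreValue(value) {
  if ("nullValue" in value) return null;
  if ("booleanValue" in value) return value.booleanValue;
  // integerValue arrives as a string; Number() loses precision beyond 2^53.
  if ("integerValue" in value) return Number(value.integerValue);
  if ("doubleValue" in value) return value.doubleValue;
  if ("stringValue" in value) return value.stringValue;
  if ("timestampValue" in value) return new Date(value.timestampValue);
  if ("arrayValue" in value) return (value.arrayValue.values || []).map(fromFirestoreValue);
  if ("mapValue" in value) return fromFirestoreFields(value.mapValue.fields);
  // referenceValue, geoPointValue, bytesValue: hand back the raw payload.
  return Object.values(value)[0];
}

// Unwrap every field in a { name: typedValue } map.
function fromFirestoreFields(fields = {}) {
  const plain = {};
  for (const [key, value] of Object.entries(fields)) {
    plain[key] = fromFirestoreValue(value);
  }
  return plain;
}

// Convert a whole REST document into a plain object.
function toPlainObject(doc) {
  return fromFirestoreFields(doc.fields);
}

// Hypothetical document in the REST wire format.
const sampleDoc = {
  name: "projects/demo/databases/(default)/documents/kqrs/abc123",
  fields: {
    artist: { stringValue: "Tom Petty" },
    title: { stringValue: "Refugee" },
    plays: { integerValue: "3" },
    tags: { arrayValue: { values: [{ stringValue: "rock" }] } },
  },
};

console.log(JSON.stringify(buildStructuredQuery("kqrs", "2024-01-01T00:00:00Z", 100)));
console.log(toPlainObject(sampleDoc));
```

A `runQuery` response is an array whose entries wrap each result in a `document` property, so in practice the converter would be mapped over the entries that actually contain one.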
Viewing the results
Next Steps
- Switching from Zaps to Cloudflare, for faster sampling intervals
- Set up some metrics with Datadog to monitor the data-collection workers
- Possibly set up Sentry.io error logging for when things break/fail
- Update and clean up the code, and rearrange it into something more polished
Conclusion
Check it out here: https://codepen.io/panurgy/full/YzpGpap?s=kqrs