Sunday, September 12, 2021

Music Listened To By Year Written

I have been scrobbling my listening to Last.fm for the past decade, and now I want to do some analysis. I can see simple results, such as the most popular tracks, albums and artists over recent time frames. However, that is not enough.

I want to see what type of music I'm listening to. I typically discover an artist I like and then binge on their entire back catalog, so I'd like to know which release "year" I listen to most often.

The first question I'd like to answer:

1. In what year was the music I'm listening to released?

The first approach would be to identify the year each song was written, then aggregate everything into a simple histogram showing the number of songs for each year. A pie chart could also show the percentage of scrobbles by year written.
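The aggregation step itself is simple once every scrobble carries a year. Below is a minimal sketch, assuming each scrobble object has already been tagged with a hypothetical `year` field (Last.fm exports do not include one); `scrobblesByYear` and `printHistogram` are illustrative names, not part of any Last.fm tool.

```javascript
// Count scrobbles per release year and compute each year's share.
// Assumes a hypothetical `year` field on each scrobble; scrobbles whose
// year is still unknown are skipped, so percentages are of known years only.
function scrobblesByYear(scrobbles) {
  const counts = new Map();
  for (const { year } of scrobbles) {
    if (year == null) continue; // skip tracks with no year yet
    counts.set(year, (counts.get(year) ?? 0) + 1);
  }
  const total = [...counts.values()].reduce((a, b) => a + b, 0);
  return [...counts.entries()]
    .sort(([a], [b]) => a - b) // oldest year first
    .map(([year, count]) => ({ year, count, percent: (100 * count) / total }));
}

// Print a text histogram; bars are scaled so 100% is 50 characters wide.
function printHistogram(rows) {
  for (const { year, count, percent } of rows) {
    const bar = "#".repeat(Math.round(percent / 2));
    console.log(`${year} ${bar} ${count} (${percent.toFixed(1)}%)`);
  }
}
```

The same `rows` array (year and percent) is exactly what a pie chart library would need as input, so the histogram and the pie chart can share one aggregation.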

The first challenge is getting the year written, because Last.fm doesn't include it in the scrobble data. I saw a reference on Reddit to one approach at https://zvum21s15hqhnszxvzy48w-on.drv.tw/Personal/last.fm/ It takes the most-listened-to albums, looks up each album's MusicBrainz ID to find its year, and then does a simple aggregation by year. It is somewhat limited by the rate limiting of the MusicBrainz API, and it often displays the date of a remaster rather than the original release. This is not quite what I wanted, but it did provide some inspiration.
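One simple heuristic for the remaster problem is to collect every date found for the same album (original pressing, reissues, remasters) and keep the earliest year. A minimal sketch, assuming the dates have already been fetched; `yearOf` and `earliestYear` are hypothetical helpers, not part of any MusicBrainz client. MusicBrainz dates can be partial ("YYYY", "YYYY-MM" or "YYYY-MM-DD") or missing, so only the leading year is compared.

```javascript
// Extract the year from a possibly partial date string, or null if absent.
function yearOf(date) {
  const match = /^(\d{4})/.exec(date ?? "");
  return match ? Number(match[1]) : null;
}

// Keep the earliest year among all dates seen for one album,
// treating it as the best guess at when the music was written.
function earliestYear(dates) {
  const years = dates.map(yearOf).filter((y) => y !== null);
  return years.length ? Math.min(...years) : null;
}
```

For example, `earliestYear(["2009-10-26", "1984-10-01", ""])` returns `1984`. This is only a heuristic: an album first released on a date MusicBrainz doesn't know about will still come out too late.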

I also looked at ListenBrainz, which offers an alternative to Last.fm scrobbling. I uploaded my data there, but it doesn't seem to merge.

I tried some manual MusicBrainz lookups by name. The catch is that a search can often return pages of results.

MusicBrainz does make all of its data public, with full instructions on how to set up a local PostgreSQL database for it. However, this looked like a fairly daunting task.

MusicBrainz also offers the data as a JSON dump (http://ftp.musicbrainz.org/pub/musicbrainz/data/json-dumps/). This seemed like an easier approach. However, the download was 10GB compressed, and uncompressed it is a single 177GB file. It is also not technically a valid JSON file: instead, each release is a single line of valid JSON. Grepping through it for information is possible, but painfully slow.

The files had a lot of information that I did not need. I tried some shell scripting to reduce them, but that ran into encoding errors. As an alternative, I wrote a simple JavaScript program to read each line and extract just what I wanted: album name, artist name, original release date, track names and track release dates. This reduced the data to only 2.1GB of JSON.

This gives something that is greppable with:

cat smallerBrainz.json | grep "Unforgettable Fire" | grep U2

This sort of works. However, this one command produced 323 results, and it doesn't distinguish between song and album titles.

The next step is to make an "album only" version.
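Matching on parsed fields instead of raw text is one way such an "album only" search could work, since an album title and a song title with the same name end up in different fields. A minimal sketch, assuming each line of smallerBrainz.json has been parsed into an object with hypothetical `album`, `artist` and `tracks` fields; `findAlbum` and `findTrack` are illustrative names.

```javascript
// Normalize for comparison: case-insensitive, surrounding whitespace ignored.
const norm = (s) => (s ?? "").trim().toLowerCase();

// Match on the album and artist fields only, so a song that merely shares
// the album's name is not counted as the album itself.
function findAlbum(records, album, artist) {
  return records.filter(
    (r) => norm(r.album) === norm(album) && norm(r.artist) === norm(artist)
  );
}

// Match track titles instead, reporting which album each match came from.
function findTrack(records, title, artist) {
  return records.flatMap((r) =>
    norm(r.artist) !== norm(artist)
      ? []
      : (r.tracks ?? [])
          .filter((t) => norm(t.title) === norm(title))
          .map((t) => ({ album: r.album, ...t }))
  );
}
```

A call like `findAlbum(records, "The Unforgettable Fire", "U2")` would then skip compilations that only contain the title song. Reissues of the same album will still match, so the results may still need to be collapsed to a single earliest date.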

To be continued.
