Make A Lifetime Spotify Wrapped With Python
Published: December 17th, 2022 | Updated: March 4th, 2024
Intro
As a data and music enthusiast, seeing my Spotify Wrapped is one of my favorite times of the year. The major downside is that this tradition happens once a year and only tells the story of the past year. What if you could create your own Spotify Wrapped for your entire listening history, one that is not limited to five artists and tracks and that shows how many times you have streamed a specific album, artist, or track?
In this article, I will walk you through the steps to create your own Lifetime Spotify Wrapped! You will learn to:
1. Download your extended Spotify data
2. Transform your lifetime streaming history into a streaming log
3. Use the Spotify API to get album and artist data
4. Write an algorithm to count your top artists, albums, and tracks
Once you complete these steps you will be able to take a deep dive into your Lifetime Spotify Wrapped across several Excel files.
If you just want to see how I did it, the entire code can be found here.
Download Your Spotify Data
Downloading your Spotify data is easy but it takes time to receive it. Spotify says it takes up to 30 days, but in my case, I got my data in about two weeks.
First, go to your Spotify Account Overview. Next, click Privacy Settings on the left. Then, scroll down and you should see the following:
Select Extended streaming history and click Request data. Hopefully, within the next two weeks, you will receive an email like this:
Click download and the file will download as a zip file with the name my_spotify_data.zip. Once you unzip the file, you should see a folder, MyData, and in that folder, you should see many JSON files. I only used the JSON files that have Audio in the file name because they contain the streaming data I need. I moved the other files, one with Video in the file name and a PDF, to another folder.
This wraps up retrieving your Spotify data.
Creating a Spotify App
I used the Spotify API to retrieve URIs for artists. URI stands for Uniform Resource Identifier. Every track, album, artist, playlist, etc. on Spotify has one. I used URIs to count everything in this project, and I used the URI from a track to get the URI for its artist. In addition, I retrieved the image URL for every artist and album I have streamed. I used these URLs in my Spotify Grid project which you can read about here.
The first step is to create an app. To start, navigate to the Spotify for Developers dashboard. You can log in with your Spotify account. I have three apps in my account.
You can use what I have below or change it up. Just make sure to keep the redirect URL the same. Scroll down to accept the terms and save your app.
Next, you need to get your Client ID and Client secret. Click settings on the homepage for your app to navigate to where the keys are.
Click View Client Secret to view your secret key. Copy these values and store them for later. You can rotate your client secret if needed.
You might need to authenticate your app the first time you use it. This will open a new URL and all you have to do is click "Accept" and you are good to go!
Accessing The Spotify API
To access the Spotify API, you need to use your Client ID and Client Secret to request an access token. This token will enable you to make requests to various endpoints within the Spotify API.
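As a minimal sketch of the token request, the code below assumes the Client Credentials flow, which is sufficient for public track and artist lookups, and uses only the standard library. The helper names `build_token_request` and `fetch_access_token` are hypothetical, and the author's own code may use a different flow or a wrapper library.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the POST request for the Client Credentials flow."""
    # Spotify expects "client_id:client_secret", base64-encoded, in a Basic auth header.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the bearer token (requires network access)."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"]
```

The returned token is then sent as an `Authorization: Bearer <token>` header on each API request.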
ETL
To transform data in the JSON files into usable data, I will walk you through a simple ETL algorithm to copy the data needed into a DataFrame for later use.
Here I am concatenating all the JSON files from the folder I had you create earlier. Line 13 removes streams that are shorter than 30 seconds, because I count a true stream as one that lasted at least 30 seconds. You can change this threshold to whatever you like; it is expressed in milliseconds (seconds · 1000). Line 15 removes any streams that are podcasts. Podcasts do not have a track URI, so any stream without a URI can be dropped.
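The loading and filtering step might look roughly like the plain-Python sketch below. It uses lists of dictionaries rather than the pandas DataFrame from the article, and it assumes the field names `ms_played` and `spotify_track_uri` and the `*Audio*.json` file pattern. The function names are hypothetical, and the line numbers mentioned above refer to the author's original listing, not to this sketch.

```python
import json
from pathlib import Path

MIN_MS_PLAYED = 30 * 1000  # 30 seconds, expressed in milliseconds


def load_streams(folder: str) -> list[dict]:
    """Concatenate the records from every Audio JSON file in the folder."""
    records = []
    for path in sorted(Path(folder).glob("*Audio*.json")):
        with path.open(encoding="utf-8") as handle:
            records.extend(json.load(handle))
    return records


def keep_true_streams(records: list[dict], min_ms: int = MIN_MS_PLAYED) -> list[dict]:
    """Drop short plays and plays without a track URI (podcast episodes)."""
    return [
        row
        for row in records
        if row.get("ms_played", 0) >= min_ms and row.get("spotify_track_uri")
    ]
```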
Here I changed the timestamps from UTC to EST. Change US/Eastern in line 1 to your timezone if you like. Line 10 will allow you to filter to a date while line 14 will allow you to filter from a date. Once the timestamps are changed, your Spotify Streaming Log will be exported as an Excel file. Your total minutes streamed, as a number, is also printed. This is done by summing your total milliseconds played.
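A rough sketch of the timestamp conversion, date filtering, and minutes total follows. It assumes each record carries a UTC timestamp string in a `ts` field shaped like `2022-12-17T03:15:00Z` and a play length in `ms_played`. The function names are hypothetical, the Excel export is left out, and the line numbers mentioned above refer to the author's original listing.

```python
from datetime import date, datetime, timezone, tzinfo
from typing import Optional


def to_local(ts: str, tz: tzinfo) -> datetime:
    """Parse a UTC timestamp such as '2022-12-17T03:15:00Z' and convert it to tz."""
    utc_time = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc_time.astimezone(tz)


def in_date_range(moment: datetime, start: Optional[date] = None,
                  end: Optional[date] = None) -> bool:
    """True when the local date lies between start and end (inclusive, both optional)."""
    day = moment.date()
    if start is not None and day < start:
        return False
    if end is not None and day > end:
        return False
    return True


def total_minutes(records: list[dict]) -> float:
    """Sum ms_played over all streams and convert milliseconds to minutes."""
    return sum(row["ms_played"] for row in records) / 60_000
```

To convert to Eastern time, pass `zoneinfo.ZoneInfo("America/New_York")` as `tz`; this is the IANA name for the zone the article refers to as US/Eastern.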
Here I looped through all the URIs (tracks) I streamed and counted the URI with the most consecutive streams. This URI corresponds to the track that I streamed the most times in a row.
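One way to sketch the streak count is with `itertools.groupby`, which groups only adjacent equal items. This assumes the URIs are already sorted by timestamp. The function name `longest_streak` is hypothetical and this is not necessarily how the author's loop is written.

```python
from itertools import groupby
from typing import Optional


def longest_streak(uris: list[str]) -> tuple[Optional[str], int]:
    """Return the URI with the longest run of back-to-back plays and the run length."""
    best_uri, best_len = None, 0
    # groupby starts a new group whenever the value changes, so each group is one streak.
    for uri, run in groupby(uris):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_uri, best_len = uri, run_len
    return best_uri, best_len
```

When two streaks tie, this version keeps the earlier one.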
Here I created a new DataFrame that will hold the data for each stream's track name, album name, artist name, and track URI. The URI column in this DataFrame will be used to get album and artist URIs and then calculate top tracks, albums, and artists. I also checked each track name, album name, artist name, and URI for None and dropped the stream if any of them was missing. Finally, I removed the 'spotify:track:' prefix from each Spotify URI.
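The cleaning step could be sketched in plain Python as below. The `master_metadata_*` and `spotify_track_uri` field names are assumptions about the export format, the short output keys are hypothetical, and the article itself builds a pandas DataFrame rather than a list of dictionaries.

```python
# Assumed mapping from short column names to fields in the export.
FIELDS = {
    "track": "master_metadata_track_name",
    "album": "master_metadata_album_album_name",
    "artist": "master_metadata_album_artist_name",
    "uri": "spotify_track_uri",
}
PREFIX = "spotify:track:"


def clean_streams(records: list[dict]) -> list[dict]:
    """Keep four columns, drop rows with a missing value, strip the URI prefix."""
    cleaned = []
    for row in records:
        values = {name: row.get(key) for name, key in FIELDS.items()}
        if any(value is None for value in values.values()):
            continue  # a stream with any missing field is dropped
        values["uri"] = values["uri"].replace(PREFIX, "")
        cleaned.append(values)
    return cleaned
```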
The Spotify API
Now that you have your lifetime Spotify data, you can start to analyze it and get some insights. The first step is finding your top artists. To do this, I counted the URI for each artist rather than the artist name, because some artists share the same name and I wanted to be as accurate as possible. You might ask how I could do this without already having the URI for every artist. I used the Spotify API to get the artist URI for each track from its track URI. I am going to walk through that process now.
I needed a unique list of the tracks I streamed. With this list, I made calls to the Spotify API for each track to get its corresponding artist and album URI, as well as the URL for the album's cover image. I added the track URI, artist URI, album URI, and album cover art URL to a new DataFrame. I pulled the track URI because that is the column I later merged this DataFrame on to calculate my top tracks, albums, and artists. I used the album cover art URLs in another project of mine, Spotify Grid, to make a collage of my top streamed albums and artists. I also added a try-except block for tracks from Various Artists albums, which may not have an artist URI and would otherwise raise an index error when the code reads a nonexistent index in the list. With the try-except block, these tracks are skipped and the code keeps running.
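As an illustration of pulling these fields out of a bulk tracks response, here is a sketch that assumes the response is a dictionary with a `tracks` list whose items carry `uri`, `artists`, and `album` keys. The function name and the output column names are hypothetical.

```python
def extract_track_rows(payload: dict) -> list[dict]:
    """Turn one bulk tracks response into rows of track, artist, and album data."""
    rows = []
    for track in payload.get("tracks", []):
        if track is None:  # assumption: unknown IDs come back as null entries
            continue
        try:
            rows.append({
                "track_uri": track["uri"],
                "artist_uri": track["artists"][0]["uri"],
                "album_uri": track["album"]["uri"],
                "album_image_url": track["album"]["images"][0]["url"],
            })
        except (IndexError, KeyError):
            # Skip tracks with an empty artist or image list instead of stopping.
            continue
    return rows
```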
Now with a unique list of the tracks I streamed, I could make requests to the Spotify API with track URIs and retrieve the information I discussed. In my case, I had 6326 unique track URIs, yet the Spotify API will only take a max of 20 at a time using a bulk request. To handle this, I created sublists with 20 URIs each and made calls to the API for 20 track URIs at a time. Another constraint was rate limits. The API will only allow 30 seconds of requests before it times out. To handle this, I set a timer. Whenever the timer hit 27 seconds (a three-second buffer to be safe), I paused and waited 32 seconds (again with a two-second buffer). During testing, I did not do this and some of my API keys were locked for hours on end due to exceeding the rate limit, so I added the buffers. This process took about four minutes, but it all depends on your internet speed and how many unique track URIs you have.
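The batching and pausing described above can be sketched as follows. The names `chunked` and `run_paced` are hypothetical, `fetch` stands in for whatever function performs the real API call, and the 27-second and 32-second values are the ones from the text. With 6326 URIs and batches of 20, this yields 317 requests.

```python
import time


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive sublists of at most `size` elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]


def run_paced(batches, fetch, window_s=27.0, pause_s=32.0,
              clock=time.monotonic, sleep=time.sleep):
    """Call fetch(batch) for each batch, pausing once a request window is used up."""
    results = []
    window_start = clock()
    for batch in batches:
        if clock() - window_start >= window_s:
            sleep(pause_s)          # wait out the rate-limit window
            window_start = clock()  # then start timing a fresh window
        results.append(fetch(batch))
    return results
```

Passing `clock` and `sleep` as parameters is a design choice for this sketch: it lets the pacing logic be checked with fake timers instead of waiting in real time.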
I repeated this process over again but now I passed in lists of artists to get the URL for their Spotify profile image. I used these images in my Spotify Grid project. This time, however, my sublists had 50 unique artist URIs because this endpoint's max was 50 unique artist URIs per request. I also made a dictionary to store the artist URIs and artist image URLs to map back to the DataFrame.
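A sketch of building that dictionary from one bulk artists response is shown below. It assumes the response holds an `artists` list whose items have a `uri` and an `images` list of objects with a `url`, and that taking the first image is acceptable. The function name is hypothetical.

```python
from typing import Optional


def artist_image_map(payload: dict) -> dict[str, Optional[str]]:
    """Map each artist URI to its first image URL, or None when there is no image."""
    images = {}
    for artist in payload.get("artists", []):
        if artist is None:  # assumption: unknown IDs come back as null entries
            continue
        pictures = artist.get("images") or []
        images[artist["uri"]] = pictures[0]["url"] if pictures else None
    return images
```

Calling `dict.update` with the result of each 50-artist batch accumulates one dictionary for the whole listening history.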
Now that you have all the data you need, you can map the artist image URLs using the artist URI to the track_artist_album_df, the DataFrame you made in the first request to the Spotify API, and then merge the track_artist_album_df with the main DataFrame, cleaned_df to create merged_track_artist_album_url_df.
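To make the shape of this step concrete, here is a plain-Python sketch of the same map-then-merge idea using dictionaries in place of DataFrames. In pandas the equivalent calls would be `Series.map` and `DataFrame.merge`. The function name and the row keys are hypothetical.

```python
from typing import Optional


def merge_streams(streams: list[dict], lookup_rows: list[dict],
                  artist_images: dict[str, Optional[str]]) -> list[dict]:
    """Attach artist and album data, plus the artist image URL, to every stream."""
    by_track = {row["track_uri"]: row for row in lookup_rows}
    merged = []
    for stream in streams:
        extra = by_track.get(stream["uri"])
        if extra is None:
            continue  # inner-join behaviour: streams without API data are left out
        merged.append({
            **stream,
            **extra,
            "artist_image_url": artist_images.get(extra["artist_uri"]),
        })
    return merged
```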
Analyze Your Data
The method I used to compute top tracks, artists, and albums was to count the URIs for all three entities. The function get_top_100() will count each entity and make a new DataFrame for each entity.
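The counting idea can be sketched with `collections.Counter`. The helper `top_n` below is a hypothetical stand-in that works on a list of dictionaries; it is not the author's get_top_100(), which returns DataFrames.

```python
from collections import Counter


def top_n(rows: list[dict], key: str, n: int = 100) -> list[tuple[str, int]]:
    """Count how often each value of `key` occurs and return the n most common."""
    return Counter(row[key] for row in rows).most_common(n)
```

Calling it once each with the track, artist, and album URI keys gives the three rankings.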
There is one caveat. The same track might have more than one URI. How is this possible? Let's say a track is released as a single and then included in an album sometime later. Even though the tracks are the same, each instance will have a different URI. This needs to be accounted for.
To account for this, I used a Pandas groupby, which works much like a pivot table in Excel, to make DataFrames for each unique track. Before this, I replaced "feat" with "Feat". I noticed that some track names had been updated from "feat" to "Feat" to denote a feature, so this string replacement makes track names with features consistent. Once these DataFrames were created, I looped through them, found the URI that appeared most often, max_uri, and set all URIs to max_uri. Now it is possible to count the number of times a URI appears and know that every unique track has a single URI.
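A plain-Python sketch of this URI unification follows. It assumes a track is identified by its normalized name together with its artist name, which is one reading of "each unique track", and it uses a whole-word regular expression instead of a simple string replacement so that words such as "defeat" are left alone. The function names and row keys are hypothetical.

```python
import re
from collections import Counter, defaultdict


def normalize_title(title: str) -> str:
    """Rewrite the standalone word 'feat' as 'Feat' so both spellings match."""
    return re.sub(r"\bfeat\b", "Feat", title)


def unify_uris(rows: list[dict]) -> list[dict]:
    """Give every stream of the same track the URI that appears most often."""
    counts = defaultdict(Counter)
    for row in rows:
        key = (normalize_title(row["track"]), row["artist"])
        counts[key][row["uri"]] += 1
    unified = []
    for row in rows:
        key = (normalize_title(row["track"]), row["artist"])
        max_uri = counts[key].most_common(1)[0][0]
        unified.append({**row, "track": key[0], "uri": max_uri})
    return unified
```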
Now you can call the function get_top_100() and export three Excel sheets with your top tracks, artists, and albums.
Final Thoughts
I had been looking for an online site that could provide me with my stream counts for specific tracks for a very long time. Once I accepted it didn't exist, I looked online for any custom methods. I found none. I took it upon myself to create this project for my love of music and programming.
I created another project that makes a grid of your top 49 most streamed album covers and artists. It uses the DataFrames you made in this project to create a unique grid. Check it out here.