From the (Undergrad) Vault: How Spotify’s Use of Deep Learning Algorithms Allows for a Personalised User Experience

NB: This post is unedited from its initial submission aside from splitting the paragraphs into smaller ones. I did not voluntarily go into debt for this essay just for it to be deleted from my laptop. At least the paywalled sources the university paid for will be put to good use! I would also recommend reading up on deep learning algorithms, because it’s a very interesting topic and incredibly relevant to today’s media climate.

Spotify is an on-demand music streaming platform that boasts over 75 million active listeners and tens of millions of songs in its catalogue (Jacobson 373). To maintain its status as the largest music streaming platform, Spotify aims to personalise users’ experiences (Jacobson 373). Personalisation is an important aspect of digital platforms, as it has set a ‘standard’ for the way users engage with music on platforms such as Spotify (Chen 1).

As a result, Spotify invests heavily in deep learning, tuning the platform’s algorithms to continually improve personalisation for users (Chen 1). It is therefore important to understand how personalisation came to exist on Spotify and how it can influence user experience.

Gulmatico argues that digital platforms can alter music consumption patterns by recommending new songs and promoting favourite performers (1). This is done via a feature known as ‘automatic playlist recommendations’ (Gulmatico 1). To personalise user experience, Spotify uses deep learning, a type of machine learning in which algorithms interpret data, learn from collected data, and then learn to make decisions or predictions in order to complete a task (Gulmatico 2).

A common machine learning task is ‘classification,’ which refers to sorting data and making distinctions (Carah 2022). In a classification, algorithms make a judgement based on a range of collected attributes of the data, which are often called features or variables (Carah 2022). A model is first ‘trained’ on a ‘training data set’ and then ‘tested’ on previously unseen ‘test data’ (Briot 983).
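The train-then-test routine described above can be sketched in a few lines of Python with scikit-learn. This is a minimal illustration under our own assumptions, not Spotify’s system: the two features (danceability and energy), the dance/ambient labels, and the choice of a k-nearest-neighbours classifier are hypothetical stand-ins picked for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: each row is a song described by two features
# (danceability, energy); the label is 1 for "dance" and 0 for "ambient".
rng = np.random.default_rng(0)
dance = rng.normal(loc=[0.8, 0.8], scale=0.05, size=(50, 2))
ambient = rng.normal(loc=[0.2, 0.2], scale=0.05, size=(50, 2))
X = np.vstack([dance, ambient])
y = np.array([1] * 50 + [0] * 50)

# Hold back 25% of the songs as previously unseen test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# "Train" the classifier on the training set only.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# "Test" it on songs it has never seen.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because the test songs are held back during training, the accuracy score estimates how well the classifier handles data it has never seen.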

One way in which AI systems are ‘trained’ involves what’s called ‘deep learning’ or ‘deep neural networks’ (Crawford & Paglen 22). Deep learning has come to dominate the training of AI systems such as Spotify’s, driven by increases in available data and computer processing power (Crawford & Paglen 22).

Deep learning approaches can be supervised or unsupervised (Carah). Both approaches are used for classification tasks in which the relevant variables are too hard to describe by hand (Carah). Using these approaches, digital platforms can build ‘layers’ that add more distinctions for an algorithm to learn (Crawford & Paglen 8).
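The difference between the two approaches can be illustrated with a toy example. This sketch rests on its own assumptions: the data is synthetic, scikit-learn’s `MLPClassifier` with two hidden layers plays the supervised, layered network, and k-means (which is not itself a deep network) stands in for the unsupervised side because it groups data without ever seeing labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Hypothetical songs from two well-separated styles, two features each.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([0.8, 0.8], 0.05, size=(40, 2)),
    rng.normal([0.2, 0.2], 0.05, size=(40, 2)),
])
y = np.array([1] * 40 + [0] * 40)  # labels are used only in the supervised case

# Supervised: a small network with two hidden layers of 8 units each
# learns from labelled examples ("lbfgs" suits tiny datasets).
supervised = MLPClassifier(hidden_layer_sizes=(8, 8), solver="lbfgs",
                           max_iter=2000, random_state=0)
supervised.fit(X, y)
print("layers (input, hidden, hidden, output):", supervised.n_layers_)

# Unsupervised: k-means is never shown the labels; it only groups similar points.
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = unsupervised.fit_predict(X)
```

Each extra hidden layer gives the network another stage at which it can draw distinctions, which is the sense of ‘layers’ in the paragraph above.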

Spotify uses the unsupervised approach, especially for its recommendations and curated playlists (Gulmatico 1-2). This means that Spotify’s algorithm is not only performing tasks such as classification but is also using deep learning to personalise user experience (Gulmatico 1-2). These processes are shown in Figures 1-3, which illustrate how songs are pre-processed, processed, and recommended, how features are extracted, and how those features are viewed by the deep learning algorithms.

Figure 1: Jakheliya, Bhumil, et al. “System Architecture.” Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation. 2020. https://doi.org/10.1007/978-3-030-38040-3_71.

Figure 2: Mounika, K. S., et al. “Diversity of Filters Within Layers.” Music Genre Classification Using Deep Learning. 2021. https://doi.org/10.1109/ICAECA52838.2021.9675685.

Figure 3: Mounika, K. S., et al. “CRNNs for Music Classification.” Music Genre Classification Using Deep Learning. 2021. https://doi.org/10.1109/ICAECA52838.2021.9675685.

Briot argues that this process succeeds when three conditions are met: (1) continuous technical progress that reduces inefficiency in how ‘training data’ is used to optimise deep learning, (2) the AI is given multiple data sets to tune on, and (3) computing power becomes more efficient (981). With these three aspects in place on a platform such as Spotify, the time-consuming manual creation of playlists for users is no longer needed, which enhances user experience (Jakheliya 626).

Spotify therefore offers its users ‘automatic playlist generation’, in which playlists are generated by algorithms that consider users’ engagement with the platform (Jakheliya 626). Jakheliya explains that digital platforms build such systems using two main approaches: the collaborative approach and the content-based approach (626).

Systems built on the collaborative approach require user data to generate accurate results, whereas the accuracy of playlists generated by content-based systems relies instead on features categorised from the platform’s dataset (Jakheliya 627). Spotify, therefore, prefers to use the content-based approach (Jakheliya 626).
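The contrast between the two approaches can be sketched with NumPy. Everything here is hypothetical: the tiny like-matrix, the three audio features, the helper names, and the use of cosine similarity are illustrative choices, not details taken from Jakheliya or from Spotify.

```python
import numpy as np

# Hypothetical listening matrix: rows are users, columns are songs,
# 1 means the user has liked the song.
plays = np.array([
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
])

# Hypothetical audio features per song: (danceability, energy, acousticness).
features = np.array([
    [0.90, 0.80, 0.10],  # song 0
    [0.85, 0.75, 0.15],  # song 1
    [0.20, 0.30, 0.90],  # song 2
    [0.15, 0.25, 0.95],  # song 3
])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def collaborative(user, plays):
    """Collaborative: recommend an unheard song liked by the most similar other user."""
    sims = [cosine(plays[user], plays[other]) if other != user else -1.0
            for other in range(len(plays))]
    neighbour = int(np.argmax(sims))
    candidates = np.where((plays[neighbour] == 1) & (plays[user] == 0))[0]
    return int(candidates[0]) if len(candidates) else None


def content_based(song, features):
    """Content-based: recommend the song whose audio features are closest to a given song."""
    sims = [cosine(features[song], features[other]) if other != song else -1.0
            for other in range(len(features))]
    return int(np.argmax(sims))


print(collaborative(0, plays))     # → 2 (liked by user 1, user 0's nearest neighbour)
print(content_based(0, features))  # → 1 (song 1 sounds most like song 0)
```

The collaborative function needs other users’ data to work at all, while the content-based one only needs the song features, which mirrors the distinction Jakheliya draws.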

The next question, then, is: how does Spotify’s algorithm determine the features of a song in a content-based system so that user experience can be personalised (Matera 1)?

Figure 4 showcases the deep learning model that the following paragraphs discuss in relation to Spotify (Jakheliya 630).

Figure 4: Gulmatico, Joshua S., et al. “System Architecture.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.

Matera explains that Spotify collects the data needed for training sets through its API, which gives deep learning algorithms access to the metadata and musical content Spotify holds (as shown in Figure 5), so that features can be extracted and turned into classifications (1).

Figure 5: Jakheliya, Bhumil, et al. “Histograms of Features Representing Distribution of Data.” Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation. 2020. https://doi.org/10.1007/978-3-030-38040-3_71.

Metadata is used to link users’ likes, dislikes, and media habits in order to recommend similar user profiles and songs that the algorithm learns users may like, while musical content is classified by labelling genres, artists, and more (Anderson 561).

Spotify’s algorithm labels clusters of artists and determines the acoustic properties of songs (Anderson 561). This is illustrated in Figures 6 and 7: the function of the deep learning model is to “perform content-based filtering on Spotify Song Dataset,” which, Jakheliya explains, is implemented with the help of two sub-models, an Autoencoder Model and a Clustering Model (630).

Figure 6: Jakheliya, Bhumil, et al. “Autoencoder Neural Network.” Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation. 2020. https://doi.org/10.1007/978-3-030-38040-3_71.

Figure 7: Jakheliya, Bhumil, et al. “Deep Learning Model (Content-Based).” Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation. 2020. https://doi.org/10.1007/978-3-030-38040-3_71.
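A rough sketch of the two sub-models, assuming scikit-learn is available: an autoencoder is a network trained to reproduce its own input through a narrow middle layer, and the compressed codes from that layer are then clustered. Here `MLPRegressor` fitted with the input as its own target stands in for Jakheliya’s autoencoder, `encode` is a hypothetical helper that reads out the middle layer, and the song features are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
# Hypothetical songs with four audio features, drawn from two "styles".
X = np.vstack([
    rng.normal([0.8, 0.7, 0.1, 0.6], 0.05, size=(60, 4)),
    rng.normal([0.2, 0.3, 0.9, 0.1], 0.05, size=(60, 4)),
])

# Autoencoder sub-model: learn to reproduce the input (target = input)
# through a narrow 2-unit tanh hidden layer, the compressed "code".
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                           solver="lbfgs", max_iter=5000, random_state=0)
autoencoder.fit(X, X)


def encode(model, data):
    """Hypothetical helper: compute the hidden-layer (bottleneck) activations."""
    return np.tanh(data @ model.coefs_[0] + model.intercepts_[0])


codes = encode(autoencoder, X)

# Clustering sub-model: group songs by their compressed codes.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codes)
```

Clustering the two-number codes rather than the raw features is the design point: the autoencoder has already squeezed out redundancy, so similar-sounding songs land close together.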

Spotify then uses Gracenote, a third-party music metadata service, to determine the ‘mood’ of a song by processing audio signals into features such as harmony and rhythm; this stage is known as data extraction and pre-processing (Anderson 561; De Quirós 354). The deep learning algorithm then uses a song’s features to determine which genres, moods, and so on it should recommend; this stage is known as feature extraction (Anderson 561; De Quirós 353-358).
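Gracenote’s actual signal processing is proprietary and not described in the sources, so the following is only a generic illustration of turning raw audio samples into numeric features. The three descriptors (RMS loudness, zero-crossing rate, and spectral centroid) and the `extract_features` helper are our own assumptions, applied here to a synthetic one-second tone.

```python
import numpy as np


def extract_features(signal, sample_rate):
    """Turn a mono audio signal into a few generic descriptors (illustrative only)."""
    # Root-mean-square amplitude: a simple loudness proxy.
    rms = float(np.sqrt(np.mean(signal ** 2)))
    # Fraction of consecutive samples whose sign flips: a noisiness/brightness proxy.
    signs = np.signbit(signal)
    zcr = float(np.mean(signs[1:] != signs[:-1]))
    # Spectral centroid: the magnitude-weighted mean frequency, in Hz.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = float(np.sum(freqs * spectrum) / np.sum(spectrum))
    return {"rms": rms, "zero_crossing_rate": zcr, "spectral_centroid_hz": centroid}


sr = 22050
t = np.arange(sr) / sr                    # one second of sample times
tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # a pure 440 Hz tone (A4)
feats = extract_features(tone, sr)
print(feats)
```

For a pure tone the spectral centroid sits at the tone’s own frequency, which makes the output easy to sanity-check; real music would produce a spread of values that a model can then learn from.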

Mounika explains that transfer learning is used to “insert features of specified audios and labels from various data into an allocated space with linear transformations,” as seen in Figure 8 (5). This figure shows how transfer learning can classify features in order to recommend songs to users as part of Spotify’s deep learning process (Mounika 5).

Figure 8: Gulmatico, Joshua S., et al. “Linear Correlated Features of Popularity.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.
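The idea of projecting features into a shared space with linear transformations can be sketched as follows. This is a hedged toy version: the ‘pre-trained’ embeddings are random stand-ins, the label space is a simple one-hot encoding, and fitting the linear map by least squares with `numpy.linalg.lstsq` is our choice rather than the method used by Mounika. The point it shows is that the pre-trained embeddings stay frozen and only the linear map is learned.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for embeddings from a network pre-trained on another
# task: 16-dimensional vectors for 3 genres, 30 songs each.
genre_centres = rng.normal(size=(3, 16))
audio_emb = np.vstack([c + 0.1 * rng.normal(size=(30, 16)) for c in genre_centres])
labels = np.repeat(np.arange(3), 30)

# Target "allocated space": one vector per label (here simply one-hot).
label_space = np.eye(3)
targets = label_space[labels]

# Transfer step: keep the embeddings frozen and learn only a linear map W
# (16 x 3) that sends them into the label space, fitted by least squares.
W, *_ = np.linalg.lstsq(audio_emb, targets, rcond=None)


def predict(emb):
    """Project embeddings with W and pick the nearest label vector."""
    projected = emb @ W
    dists = np.linalg.norm(projected[:, None, :] - label_space[None, :, :], axis=2)
    return dists.argmin(axis=1)


accuracy = float(np.mean(predict(audio_emb) == labels))
print(f"training accuracy: {accuracy:.2f}")
```

Learning a single small matrix is far cheaper than retraining the whole network, which is the usual motivation for transfer learning.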

Gulmatico also describes the different categories that data can be sorted into, explaining how over 170,000 songs were collected from Spotify’s API and sorted into four classifications (primary, numerical, dummy, and categorical), as shown in Figures 9-11 (3).

Figure 9: Gulmatico, Joshua S., et al. “Features Under the Dummy Category.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.

Figure 10: Gulmatico, Joshua S., et al. “Features Under the Numerical Category.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.

Figure 11: Gulmatico, Joshua S., et al. “Features Under the Categorical Category.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.
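The sketch below shows what these categories might look like in a pandas table. Gulmatico’s exact definitions are not spelled out in the essay, so the mapping is an assumption: ‘primary’ is read as an identifier column, ‘numerical’ as continuous audio features, ‘dummy’ as 0/1 indicator columns, and ‘categorical’ as labels such as musical key that get expanded into dummy columns with `pandas.get_dummies`. The rows are invented.

```python
import pandas as pd

# Hypothetical rows in the shape of a Spotify song table.
songs = pd.DataFrame({
    "id": ["a1", "b2", "c3"],            # primary: identifies each song
    "danceability": [0.81, 0.35, 0.62],  # numerical
    "explicit": [1, 0, 0],               # dummy: already a 0/1 indicator
    "key": ["C", "G", "C"],              # categorical
})

# Expand the categorical column into one 0/1 dummy column per distinct value.
key_dummies = pd.get_dummies(songs["key"], prefix="key", dtype=int)

# Reassemble a table in which every non-identifier column is numeric.
model_ready = pd.concat(
    [songs[["id", "danceability", "explicit"]], key_dummies], axis=1
)
print(model_ready)
```

Models such as k-means or linear regression only accept numbers, which is why categorical labels are typically converted into dummy columns before training.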

Once the data has been collected, it can be used to train the algorithm. Figures 12 and 13 showcase the results of the process Gulmatico explains:

“In this stage, the researcher conducts some pre-processing to get and normalize the data that is needed for K-means when clustering a certain dataset. The development utilizes the Standard Scaler, a module by SK-learn under pre-processing, to normalize certain values on columns.

“The range of values for danceability and instrumentality is only roughly 0 to 1. On the other hand, the range for duration and popularity can be in the millions. The data was scaled so that it was homogenous and clusterable. To find the dataset’s real major components, we’ll need to utilize a decomposition method” (3).

Figure 12: Jakheliya, Bhumil, et al. “Box Plot of Tempo Feature.” Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation. 2020. https://doi.org/10.1007/978-3-030-38040-3_71.

Figure 13: Jakheliya, Bhumil, et al. “Box Plot of Loudness Feature.” Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation. 2020. https://doi.org/10.1007/978-3-030-38040-3_71.
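The normalisation step in the quotation can be reproduced with scikit-learn’s `StandardScaler`, which rescales each column to mean 0 and unit variance. The values below are invented for illustration: danceability sits between 0 and 1 while duration (assumed here to be in milliseconds) runs into the millions, which is exactly the mismatch scaling is meant to remove.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns on very different scales:
# danceability (0 to 1) and duration in milliseconds.
X = np.array([
    [0.80, 210_000.0],
    [0.35, 180_000.0],
    [0.62, 240_000.0],
    [0.50, 3_600_000.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and unit variance
print(X_scaled.round(2))
```

Without this step, a distance-based method such as K-means would effectively ignore danceability, because differences of a fraction would be swamped by differences of hundreds of thousands.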

A Principal Component Analysis (PCA) module from scikit-learn (‘SK-learn’) is then used to complete the algorithm’s system, reducing the data to two principal components so that it can be clustered again for tuning (Gulmatico). The scatter plot in Figure 14 shows how the clusters led to their classification into different features.
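A minimal sketch of the decomposition-then-clustering step, using scikit-learn’s `PCA` and `KMeans` on synthetic data. The six features, the two-style structure of the data, and the choice of two clusters are assumptions made for illustration, not values from Gulmatico’s dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Hypothetical: 100 songs x 6 audio features from two broad styles that
# differ mainly in the first two features.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 6)) + np.array([3, 3, 0, 0, 0, 0]),
    rng.normal(0.0, 1.0, size=(50, 6)) + np.array([-3, -3, 0, 0, 0, 0]),
])

# Scale first, then keep the two directions of greatest variance.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
components = pca.fit_transform(X_std)  # the two "principal" variables

# Cluster the songs in the reduced two-dimensional space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(components)
print(pca.explained_variance_ratio_)
```

Plotting `components[:, 0]` against `components[:, 1]`, coloured by `labels`, gives the kind of two-component scatter plot the paragraph describes.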

Following the pre-processing stage of Spotify’s deep learning, the classifications are sorted into a hierarchy, shown in Figure 15, in which ‘important information’ such as artist and genre is weighted more heavily than harmony, rhythm, or the outputs of models such as Linear Regression (Gulmatico 4). Linear Regression predicts whether a song will be popular from its audio features (answering questions such as “is this song danceable?”) by calculating a total average per audio characteristic (Gulmatico 4).

Figure 14: Gulmatico, Joshua S., et al. “Scatter Plot of 2 Components.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.

Figure 15: Gulmatico, Joshua S., et al. “Clustering of Data Using Hierarchy Method.” SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features. 2022. https://doi.org/10.1109/ICPC2T53885.2022.9776765.
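For readers unfamiliar with Linear Regression, the sketch below fits scikit-learn’s `LinearRegression` to synthetic songs. Note what this model actually learns: one weight per audio feature plus an intercept, chosen by least squares so that the weighted sum best matches popularity. The features, the popularity formula, and the noise level are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 200

# Hypothetical audio features, each between 0 and 1.
danceability = rng.uniform(0, 1, n)
energy = rng.uniform(0, 1, n)
acousticness = rng.uniform(0, 1, n)
X = np.column_stack([danceability, energy, acousticness])

# Hypothetical popularity: driven by danceability and energy, plus noise;
# acousticness has no effect at all.
popularity = 40 * danceability + 20 * energy + 5 + rng.normal(0, 2, n)

reg = LinearRegression().fit(X, popularity)
print("weights:", reg.coef_.round(1), "intercept:", round(reg.intercept_, 1))
print("prediction for a danceable, energetic song:", reg.predict([[0.9, 0.7, 0.1]]))
```

The learned weights should come out close to the 40 and 20 used to generate the data, with a weight near zero for acousticness, which had no influence.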

Spotify’s recommendations, therefore, are based on users’ preferences for features such as genres and artists, and on the features of the songs they listen to most (De Quirós 353-358). This process also goes through ‘fine-tuning,’ in which optimisers are used to continuously improve the algorithm’s ability to learn and recommend, giving a better-personalised user experience (Mounika 3).
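‘Fine-tuning with an optimiser’ essentially means repeatedly nudging a model’s parameters in the direction that reduces its error. The sketch below shows plain gradient descent on a small linear model; the data, the learning rate of 0.5, and the 500 steps are hypothetical, and widely used optimisers such as Adam add refinements on top of this basic update.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical "ideal" parameters
y = X @ true_w

w = np.zeros(3)       # start from an untuned model
learning_rate = 0.5


def mse(w):
    """Mean squared error of the model's predictions."""
    return float(np.mean((X @ w - y) ** 2))


losses = [mse(w)]
for step in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w -= learning_rate * grad              # plain gradient-descent update
    losses.append(mse(w))

print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.2e}")
```

Each pass through the loop is one small tuning step; running more steps as new data arrives is what lets a deployed model keep improving.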

The five steps in improving ‘flow’ are: extracting data, pre-processing data, model development, evaluation, and prediction (Gulmatico 2). Using these five steps, the table below shows how four distinct models were applied to the datasets to evaluate the efficiency of the Spotify algorithm’s process (Gulmatico 2).

Gulmatico explains that the System Architecture shown above is a process in which extracted data passes through an efficient application and is placed into deep learning training sets and procedures, which train the algorithm until it performs optimally (2).
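The five steps can be laid out as a small scikit-learn workflow that compares four candidate models. The specific models below (linear regression, ridge regression, k-nearest neighbours, and a decision tree), the synthetic data, and the use of 5-fold cross-validation are our assumptions for illustration; they are not necessarily the four models Gulmatico evaluates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# 1. Extract data (hypothetical stand-in for rows pulled from an API).
rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(300, 4))
y = 50 * X[:, 0] + 30 * X[:, 1] + rng.normal(0, 3, 300)

# 2-3. Pre-process and develop models: each candidate gets the same scaling step.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# 4. Evaluate each model with 5-fold cross-validation (mean R^2 score).
scores = {
    name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

# 5. Predict with the best model, refitted on all the data.
best = make_pipeline(StandardScaler(), candidates[best_name]).fit(X, y)
prediction = best.predict(X[:1])
print(scores, "best:", best_name)
```

Because the synthetic target is a linear function of the features, the linear models should score highest here; on real song data the ranking could easily differ.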

Once Spotify’s algorithm has processed the metadata, it turns song features into song vectors (De Boom 385). These song vectors are used to predict users’ ‘tastes’, which are represented as ‘taste vectors’ (De Boom 385). A taste vector aggregates the song vectors from a user’s metadata to represent their musical habits. The taste vectors are then used to generate song recommendations by “querying a tree data structure for the nearby song vectors” (De Boom 385).
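The song-vector, taste-vector, and tree-query idea can be sketched with scikit-learn’s `KDTree`. This is a hedged illustration: the three-dimensional song vectors are invented, the taste vector is taken to be a simple average of the songs a user played, and a k-d tree stands in for whatever tree structure De Boom et al. actually use.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical 3-dimensional song vectors for a tiny catalogue.
song_vectors = np.array([
    [0.90, 0.10, 0.00],  # song 0
    [0.80, 0.20, 0.10],  # song 1
    [0.10, 0.90, 0.00],  # song 2
    [0.00, 0.80, 0.20],  # song 3
    [0.85, 0.15, 0.05],  # song 4
])
listening_history = [0, 1]  # indices of songs this user has played

# Taste vector: here simply the mean of the played songs' vectors.
taste = song_vectors[listening_history].mean(axis=0)

# Build a tree over the catalogue and query it for the nearby song vectors.
tree = KDTree(song_vectors)
dist, idx = tree.query(taste.reshape(1, -1), k=len(song_vectors))

# Recommend the two nearest songs the user has not already played.
recommendations = [int(i) for i in idx[0] if i not in listening_history][:2]
print(recommendations)  # → [4, 2]
```

Here the user’s taste lands closest to song 4, so it is recommended first, followed by the next-nearest song the user has not already played. A tree lets the search skip most of the catalogue instead of comparing the taste vector with every song.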

Briot explains that Spotify’s automatic playlist recommendations are based on a recurrent neural network that processes combinations of “arbitrary embeddings and features” supplied with song vectors to produce a “user taste vector” that personalises user experience (985).
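A recurrent neural network differs from simple averaging in that it reads a listening history in order, carrying a hidden state from one song to the next. The sketch below is a bare forward pass of a simple (Elman-style) RNN in NumPy; the weights are random and untrained, and the dimensions and the `taste_vector` helper are hypothetical, so it only illustrates the data flow from song vectors to a taste vector.

```python
import numpy as np

rng = np.random.default_rng(8)
dim, hidden = 3, 4  # hypothetical song-vector and hidden-state sizes

# Hypothetical, untrained weights; in a real system these would be learned.
W_in = rng.normal(scale=0.5, size=(hidden, dim))
W_rec = rng.normal(scale=0.5, size=(hidden, hidden))
W_out = rng.normal(scale=0.5, size=(dim, hidden))


def taste_vector(song_sequence):
    """Run a simple RNN over a listening sequence and return a taste vector."""
    h = np.zeros(hidden)
    for song in song_sequence:
        # Update the hidden state with the current song and the previous state.
        h = np.tanh(W_in @ song + W_rec @ h)
    # Project the final state back into song-vector space.
    return W_out @ h


history = np.array([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0]])
taste = taste_vector(history)
print(taste)
```

Because the hidden state is updated one song at a time, a new play could in principle be folded in by running one more step rather than reprocessing the whole history, and reversing the history generally produces a different taste vector.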

To maintain a positive personalised user experience, this feature is in a constant state of tuning: as each song’s metadata and each user’s metadata are processed, the deep learning algorithms need to be updated often so that recommendations immediately reflect users’ current habits, which the recurrent neural networks achieve by calculating new taste vectors to generate recommendations (Gulmatico 4).

As for the impact Spotify’s deep learning algorithms have on users, Gulmatico argues that, despite the many datasets that exist for music streaming platforms, there is currently no accurate way to determine the impact of Spotify’s deep learning algorithm on users’ musical habits (4). Striphas does, however, argue that there are three key aspects of current music habits that users consistently experience: information, crowd, and algorithm (395).

Striphas explains that culture and music are intertwined, much as digital platforms are intertwined with our daily lives, and therefore argues that an algorithmic culture is currently developing, in which new songs and musical popularity emerge not from publicness but from exposure and recommendations on digital platforms driven by deep learning algorithms (395).

Users therefore need to understand how algorithms function and influence their music streaming experience in order to see how algorithms ‘intervene’ in ‘connections’ (Van Der Nagel 83). An algorithm makes connections based on artists, mood, features, and more; algorithms take this metadata and are trained to recommend music based on users’ engagement and habits, so that users receive music rather than seeking it out themselves (Van Der Nagel 83).

Van Der Nagel states that because platforms are political (as they mediate between the decisions of owners, researchers, designers, and users that allow data to be collected and processed), people engaging with platforms such as Spotify should be aware of how their experience is being personalised by an algorithm that has been trained and is continuously optimising its processes (83; 86). This is especially important because algorithms on digital platforms are now part of everyday life for many people, for example when Spotify’s algorithm suggests music to them (Seaver 1).

Algorithms can ‘hook’ people by providing a personalised experience that encourages more engagement, which in turn provides more metadata to the platforms (Seaver 1). Van Der Nagel and Seaver encourage users to think critically about how algorithms provide an easy alternative to searching for playlists and songs, and to consider how deep learning algorithms use their data to keep them on the platform (86; 1).

Spotify aims to personalise its users’ experiences by using deep learning to tune the platform’s algorithms and continually improve personalisation (Chen 1). It is therefore important to understand how personalisation came to exist on Spotify and how it can influence user experience (Seaver 1).

Spotify uses deep learning, a type of machine learning that allows algorithms to interpret data, learn, and recommend songs and playlists to personalise user experience (Gulmatico 2). Recommendations are given by classifying features of songs’ metadata to judge which songs to recommend to users based on their listening habits (Gulmatico 2). This is possible because Spotify’s deep learning algorithms are ‘trained’ on a ‘training data set’ using an unsupervised approach to process data (Briot 983).

Works Cited
Anderson, Ian, et al. “‘Just the Way You Are’: Linking Music Listening on Spotify and Personality.” Social Psychological and Personality Science, vol. 12, no. 4, 2021, pp. 561–572, https://doi.org/10.1177/1948550620923228.

Briot, Jean-Pierre, and François Pachet. “Deep Learning for Music Generation: Challenges and Directions.” Neural Computing and Applications, vol. 32, 2020, pp. 981–993, https://doi.org/10.1007/s00521-018-3813-6.

Carah, Nicholas. COMU3110 Digital Platforms Seminars 1-6. 2022, University of Queensland, Saint Lucia. Class lecture.

Chen, Yu-Chia, et al. “Music Mood Classification System for Streaming Platform Analysis via Deep Learning Based Feature Extraction.” 2021 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), IEEE, 2021, pp. 1–2, https://doi.org/10.1109/ICCE-TW52618.2021.9603205.

Crawford, Kate, and Trevor Paglen. “Excavating AI: The Politics of Images in Machine Learning Training Sets.” Excavating AI, 19 Sept. 2019, https://excavating.ai/.

De Boom, Cedric, et al. “Large-Scale User Modelling with Recurrent Neural Networks for Music Discovery on Multiple Time Scales.” Multimedia Tools and Applications, vol. 77, no. 12, 2017, pp. 385–407, https://doi.org/10.1007/s11042-017-5121-z.

De Quirós, J. García, et al. “An Automatic Emotion Recognition System for Annotating Spotify’s Songs.” Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11877, Springer International Publishing, 2019, pp. 345–362, https://doi.org/10.1007/978-3-030-33246-4_23.

Gulmatico, Joshua S., et al. “SpotiPred: A Machine Learning Approach Prediction of Spotify Music Popularity by Audio Features.” 2022 Second International Conference on Power, Control and Computing Technologies (ICPC2T), IEEE, 2022, pp. 1–5, https://doi.org/10.1109/ICPC2T53885.2022.9776765.

Jacobson, Kurt, et al. “Music Personalization at Spotify.” Proceedings of the 10th ACM Conference on Recommender Systems (RecSys ’16), ACM, 2016, p. 373, https://doi.org/10.1145/2959100.2959120.

Jakheliya, Bhumil, et al. “Using Deep Autoencoders to Improve the Accuracy of Automatic Playlist Generation.” Innovative Data Communication Technologies and Application, vol. 46, Springer International Publishing, 2020, pp. 626–636, https://doi.org/10.1007/978-3-030-38040-3_71.

Matera, Matteo. The Music Industry in the Streaming Age: Predicting the Success of a Song on Spotify. ProQuest Dissertations Publishing, 2021.

Mounika, K. S., et al. “Music Genre Classification Using Deep Learning.” 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), IEEE, 2021, pp. 1–7, https://doi.org/10.1109/ICAECA52838.2021.9675685.

Seaver, Nick. “Captivating Algorithms: Recommender Systems as Traps.” Journal of Material Culture, vol. 24, no. 4, 2018, pp. 1–16, https://doi.org/10.1177/1359183518820366.

Striphas, Ted. “Algorithmic Culture.” European Journal of Cultural Studies, vol. 18, no. 4–5, 2015, pp. 395–412, https://doi.org/10.1177/1367549415577392.

Van Der Nagel, Emily. “Networks That Work Too Well: Intervening in Algorithmic Connections.” Media International Australia, vol. 168, no. 1, 2018, pp. 81–92, https://doi.org/10.1177/1329878X18783002.