Tuesday, January 2, 2018

Fun with lyrics

This post stems from a (very boring) casual thought I've had about a year ago: "Hmm... I wonder whether there is more rain in British songs?", which later generalized into "Is there any correlation between song lyrics and the weather in the country of origin of the artists?". I've spent an entire weekend writing code to scrap lyrics from the web, and then life got in the way and I've never finished this (uninteresting) project.

Since I already have a very large corpus of lyrics,1 I've figured why not combine two of my loves -- text analysis and music -- into one blog post? So in this post I will show you some fun analyses that people commonly do with lyrics.

Word Clouds
Word clouds provide a nice illustration to the frequency of word occurrences. Given a text, the word cloud contains the most common k words in the text, where more frequent words appear larger and in the center of the cloud. In this case, I chose an artist and created a word cloud from the lyrics of all the songs of that artist. I lowercased all the words, removed punctuation, stop words (very common function words like "and" and "the"), and the word "chorus". I used worditout to draw the word clouds. Here are a few examples (click on the links to enlarge):

Left to right: word clouds for the lyrics of Red Hot Chili PeppersMorrissey, and Eminem.
A few interesting, though expected, observations: Red Hot Chili Peppers often sing about love, Morrissey mostly moan. When he doesn't moan ("Oh"), he sings about serious topics such as war, the world and life. Eminem curses a lot. Funnily, since I kept the words in their inflected form, we get multiple variations of the F word in his word cloud. 

Now that we see which words are common in each artists' lyrics, we can take it a step forward and try to visualize the topics that they sing about. There are many ways to do that, and we'll do it simply by visualizing their word embeddings using t-SNE, a technique for projecting high-dimensional vectors to 2-dimensional space. The underlying assumption of word embeddings is that words with similar meanings or those that belong to the same topics would have similar vectors. This should also reflect in their 2-dimensional visualization.

To give the lyrics some context and demonstrate how they relate to all the possible topics in the world, I took the words from the lyrics and visualized their vectors along with the 2,500 most common words in English, highlighting words from the lyrics in red. Here is the result for Morrissey:

You'd have to scroll through the graph and look for clusters of red dots, then try to figure out what is their common theme. For example, I've found adjectives describing negative feelings (unhappy, sad, tired, weary, ...), words related to love (hearts, lonely, love, hug, kiss), body parts (body, arms, hands, head), and people (young, children, nephew, girl, boy, woman, ...).

And here is the result for Muse:

Here I see positive emotions (love, dream, fate), negative emotions (sorrow, shame, greed, apathy, bitterness), evil stuff (daemons, evil, exorcise, sins) and war-related words (war, struggle, fighting, revolt).

[Some technical details for my technical readers: I took the first 2500 words from this list of 10k most common words in English. For the lyrics, I considered the 500 most common words which are adjectives, nouns or verbs. I drew the t-SNE graph using this script, and used the pre-trained 50d GloVe word embeddings].

Generating New Songs
As the word clouds may suggest, each artist has a specific style which is reflected in the word choice and topic of their songs. We can train a model that captures this specific style, mimics this artist and generates new songs that would look like they've been written by this artist.

Unfortunately, for better-quality results, you need a large amount of training data, so forget about generating new songs of artists who tragically died after releasing only a few records (e.g. 123) or of your favorite indie bands that have relatively few songs (e.g. 123, 4, 5, 6, 7, 8). We'll stick with the more mainstream bands and try to generate new songs by Muse, Weezer, and Red Hot Chili Peppers.

For that purpose, we are going to learn an artist-specific language model. I've written an elaborate post about language models in the context of machine translation; in short, language models estimate the probability of a certain text in the language (e.g. English, or a more specific domain, like Twitter data or Muse lyrics). Each word in the text depends on the previous words, so in an English language model, for instance, the probability of "she doesn't" is larger than that of "she don't" (although, this may not be the case for English rap songs language models!). Language models can be used to compute the probability of an existing text, but they can also be used to generate new texts by sampling words from the distribution. We're going to use them for generation.

As opposed to the language models in my blog post, we will train a neural language model. These are explained very clearly in Andrej Karpathy's blog post "The Unreasonable Effectiveness of Recurrent Neural Networks". In short, a recurrent neural network (RNN) is a model that receives as input a sequence (e.g. of words / characters) and outputs vectors representing each subsequence (i.e. the first item, the first two items, ..., the entire sequence). These vectors can then be used by other machine learning models, e.g. for classification.

In the context of language models, the RNN learns to model the probability distribution of the next item in the sequence (e.g. the next word in the song). During training, the model goes over the entire text corpus (e.g. all the lyrics of a specific artist) and tries to predict the next item (word). If the predicted next item is incorrect, i.e. different from the actual next item, the model adjusts itself, until it is accurate enough. At test time, once the model parameters are settled, you can use it to generate new texts by sampling from the distribution of possible items (words) and constantly sampling new words conditioned on the already-sampled ones. The result should look similar to the original text corpus it was trained on. Very often, generated sequences will be actual texts from the corpus (and then you've just trained a parrot... Thanks Don Patrick for the great metaphor, I'm constantly quoting you on this!).

[Some technical details for my technical readers: I trained a word-level LSTM using DyNet, largely based on the char-level RNN example. My code is available here.]

So, let's take a look at the results! After training each model, I sampled a single song. I sampled each sentence separately, so subsequent sentences are not expected to be related to each other. I enforced the song structure by forcing a line break after every 5 lines. Here is the new "Weezer" song:

let me see the joy
holding on to what they give,
turn it, turn it,
i'd bury diamonds
you're just smile

excuse my manners if i make a scene
we're just visiting
i'm still afloat
and i'm lost without your love
why are all american girls so rough?

i'm a robot
and kick you out of sight
and if you're up all night
i cried for you, you were the blast

i'm just meant to be your latest tragedy
why are all american girls so rough?
you are.
how man is this in the world
i feel safe

oo-ee-oo i look just like buddy holly
wish that they would teach me when our critics
i don't want your love
chicks are trying to freak

Some sentences are clearly copied from existing songs ("oo-ee-oo I look just like Buddy Holly") but others are brand new. Overall it feels like a Weezer song to me!

Moving on to the new Muse song:

than you could ever give
and i want you now
i wish i could
and make a fuss
like the evil in your veins
you are

(your time is now)
our hopes and expectations
we don't belong here
i won't let you bury it
i wish i could

they will pull us down
in your world
now i just was to name
with who knows who
i'm growing tired of fighting

in my sleep
loneliness be over
vast human and material resources
you're unsustainable

is it enough
killed by drones
and our time is running out
you and i must fight to survive

This one is a bit disappointing, because the only reason it feels like a real Muse song is that it's a "summary of Muse songs" created by copying whole sentences from their songs. My intuition is that the amount of training data was too small, leading to "overfitting" (the training data is regenerated perfectly). This calls for an action by Muse to release more albums!

And the highlight is this new Red Hot Chili Peppers song:

when i find my peace of mind,
that i could find the fireflies
to close a right today
that i slept

you say the is least my love
start jumping and that sherri meet?
funky crime funky crime
just a mirror for the sun
i wrote a letter to you

i've been here before
stuck in the muck of the pond
to be afraid
play your hand and glory

well, i'm gonna ride a sabertooth horse
let's play
mother angel in your hand
take a star in a telegram
upon the places beyond

today loves smile for me
part of my scenery
i'll play all night
i am not wide

Wow... this looks nothing like every Red Hot Chili Peppers song ever. It doesn't even contain the word California! Maybe I should've trained the model for a few more iterations. It is pretty cool, though, that most sentences are new, and they make sense at least like the actual lyrics by RHCP make sense.

Now that we've got the data, we can finally answer the sleep-depriving question: "is there a correlation between the occurrence of rain-related words in lyrics and the country of origin of the artist?". For the lyrics that I scraped from the web I've also the kept the artists' countries. For these countries I've also looked up the annual precipitation statistics. I then looked for the occurrence of either of the following words in lyrics: rain, rainning, rained, rains, storm, stormy, cloud, cloudy, drizzle, flood. I computed the percentage of "rain" songs per country (out of all of the songs by artists in this country). The hypothesis was that artists from countries with a high average of annual precipitation are more likely to sing about it.

The percentage of songs mentioning rain in each country compared with average annual precipitation.
I was wrong. There was no correlation. It is also possible that this was a failed experiment because the number of songs for some countries was too small to draw any meaningful statistical conclusions.

Can we answer more interesting questions regarding lyrics? For example, this every Red Hot Chili Peppers song ever claims that all they ever sing about is California, but this wasn't reflected in the word cloud, nor in the generated song, meaning that this specific word was not very frequent in the corpus. However, if we only check which US states were mentioned in the songs, would California be more frequent?2  And if we make this question more general, do artists tend to sing more about the countries of origin, and do some places get more attention regardless of where the artists are originally from?

This time I focused on American artists, and took the lyrics of the first 200 artists from each state, checking for mentions of any states. I created a 51x51 table in which the columns represent the mentions and the rows represent the artists' state of origin. Rather than displaying this messy table, I plotted a heatmap where the lighter colors represent higher values (and 0 values are colored black).3
Mention of states in lyrics by artists' state of origin. Columns: states mentioned in lyrics. Rows: states of origin.
Here's how to interpret this heatmap: light values on the diagonal are pretty common, meaning that it's common for artists to sing about their states of origin. Two columns have light values across many rows: California and New York. Those are states which are common in lyrics, regardless of the artist origin.

Notice that the states are sorted alphabetically, so it's difficult to answer the question whether artists tend to sing about states in their proximity. A better visualization would be if we could place these statistics on a map. We can, and I used the Google Maps API to do so! Click on a state from the list and you'll see the states that sing about it visualized on a map.

I think I can see a pattern of states singing about their neighbors (this kind of visualization was helpful for someone like me who doesn't know much about US geography...).

Sentiment Analysis
Many words in Morrissey's word cloud are notably negative: killhate, die, leavegone, etc. This is of no surprise as anyone who's been listening to Morrissey or to the Smiths knows most of their songs are gloomy; according to this study, one of the gloomiest among UK artists.

This negativity can be "proved" computationally, using software for sentiment analysis. Sentiment analysis takes a text and determines its sentiment: either negative/positive, or a range of sentiments. Traditional models used to look at the words that appear in the text independently and score the sentence according to the individual words' sentiment, recognizing "good" and "bad" words. For example, "I am happy today" would be considered positive thanks to the positivity of the word happy (and the neutrality of the other words). Today's models are mostly based on neural networks, and sometimes they also take into account the structure of the sentence (which should be helpful in recognizing that "I am not happy today" is negative). The Stanford Sentiment Analysis system is an example for such a model.

I was planning to compute the sentiment of all the lyrics of Morrissey vs. all the lyrics of a presumably more cheerful artist (e.g. Queen, David Bowie), but I've found that most analyzers I've tried to use did pretty bad on recognizing the sentiment of lyrics. To be fair, they are usually trained on movie/restaurant reviews, and lyrics are often more sophisticated (As a proof: we've had a human disagreement on the sentiment of several Morrissey lines at home...). Here are some examples from the Stanford Sentiment Analysis demo:

A positive sentence from David Bowie. Sounds fun.

A negative sentence from Muse. A bit less fun.

Finally, this last example is a subtle insult (at least in my interpretation) from Morrissey: "you were good in your time", interpreted simply as a positive saying by the model. This was a difficult one!

1 In this post I use the lyrics I downloaded (315,357 songs) along with two lyrics corpora from Kaggle: from Sergey Kuznetsov (57,650 songs) and from Gyanendra Mishra (380,000 songs). I was planning to share the code for scraping the lyrics from the web, but when I finally started writing this post, I've found out that the website I've been using has changed and scraping lyrics with my code no longer works.  

2 It is very, very frequent in general, so the prior probability of the occurrence of California in songs is high, not just the conditional probability given that it's a RHCP song. I never realized how common it is until I came back from California last summer and tried to fill the void by creating and constantly listening to this America playlist (biased towards songs about California).  

3 One note about the statistics in this post: they are inaccurate. Some states have just a few artists, the number of mentions is counted equally if they are one from song or many songs, I didn't normalize the statistics by the size of each state, I didn't check for mentions of cities, etc.