Introduction to the chorrrds package

Music chords analysis!

Bruna Wundervald (Maynooth University), Julio Trecenti (Curso-R)


chorrrds is a package to retrieve and analyse music data. It scrapes the Cifraclub website to download and organize music chords.

The main motivation for creating chorrrds was my undergrad thesis. In it, I carried out an end-to-end analysis, exploring feature engineering techniques to describe and predict musical genres from the chords of songs.

chorrrds can be considered a package for MIR (Music Information Retrieval). MIR is a broad area of computational music that extracts and processes music data. That data ranges from unstructured sources, such as sound waves, to structured ones, such as sheet music or chords.

In this post we’ll describe chorrrds functions and show some examples. Stay tuned!


You can install chorrrds from your favourite CRAN mirror by simply running:

install.packages("chorrrds")

You can also install the latest version of chorrrds from the R-Music GitHub organization with:

# install.packages("devtools")
devtools::install_github("r-music/chorrrds")


The main function of the package is get_chords(), which extracts the chords of an artist's songs. Obtaining the data takes two steps:

  1. Extracting the URLs of all songs by an artist with get_songs().
  2. Extracting the chords from those URLs with get_chords().


library(dplyr)                              # Provides the %>% pipe used throughout

# Step 1: getting the song URLs for Janis Joplin
songs <- "janis-joplin" %>% 
  chorrrds::get_songs() %>% 
  dplyr::sample_n(5)        # Selecting a random sample of 5 songs

# Step 2: getting the chords for the selected songs
chords <- songs %>% 
  dplyr::pull(url) %>%                     
  purrr::map(chorrrds::get_chords) %>%     # Mapping the function over the 
                                           # selected URLs
  purrr::map_dfr(dplyr::mutate_if, is.factor, as.character) %>%  # Converting factors
                                           # to characters and row-binding the songs
  chorrrds::clean(message = FALSE)         # Cleaning the dataset: removing stray
                                           # elements, such as lyrics, that may
                                           # have been scraped as chords

chords %>% dplyr::slice(1:10) 
chord key music                      long_str
Em    E   janis joplin one good man  5
E     E   janis joplin one good man  5
A     E   janis joplin one good man  5
E     E   janis joplin one good man  5
B     E   janis joplin one good man  5
A     E   janis joplin one good man  5
E     E   janis joplin one good man  5
E     E   janis joplin one good man  5
A     E   janis joplin one good man  5
E     E   janis joplin one good man  5

The table above shows what the output of get_chords() looks like. The data is in long format. Each row holds one chord, and the rows follow the order in which the chords appear in the song, so the same chord can show up many times. The music column contains both the artist name and the song title. If you prefer, you can split it into two columns with:

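chords <- chords %>% 
  # Splitting at the space that follows "joplin"; extra = "merge" keeps
  # the rest of the string (the song title) in a single column
  tidyr::separate(music, c("artist", "music"), 
                  sep = "(?<=joplin) ", extra = "merge")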

chords %>% dplyr::slice(1:10) 
chord key artist       music        long_str
Em    E   janis joplin one good man 5
E     E   janis joplin one good man 5
A     E   janis joplin one good man 5
E     E   janis joplin one good man 5
B     E   janis joplin one good man 5
A     E   janis joplin one good man 5
E     E   janis joplin one good man 5
E     E   janis joplin one good man 5
A     E   janis joplin one good man 5
E     E   janis joplin one good man 5
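The sep argument given to tidyr::separate() above is a regular expression with a lookbehind. The pattern (?<=joplin) matches a space only when it is immediately preceded by “joplin”, so the spaces inside the song title are not treated as separators. The same idea can be sketched in base R. split_artist() below is a hypothetical helper written for this illustration (it is not part of chorrrds), and it assumes the artist name contains no regular-expression metacharacters:

```r
# Hypothetical helper (not part of chorrrds): splits "artist title" strings
# at the first space that follows the given artist name.
split_artist <- function(music, artist) {
  # Lookbehind: a space immediately preceded by the artist name.
  # Assumes `artist` contains no regex metacharacters.
  pattern <- paste0("(?<=", artist, ") ")
  m <- regexpr(pattern, music, perl = TRUE)      # first match only
  pieces <- regmatches(music, m, invert = TRUE)  # text before and after the match
  data.frame(
    artist = vapply(pieces, `[`, character(1), 1),
    music  = vapply(pieces, `[`, character(1), 2),  # NA when the artist is not found
    stringsAsFactors = FALSE
  )
}

split_artist(c("janis joplin one good man", "janis joplin piece of my heart"),
             "janis joplin")
```

Passing the full artist name, rather than only “joplin”, makes the pattern reusable for other artists whose data has the same format.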


Several datasets come with the package. They were used in the undergrad thesis, so we chose to keep them in the package. They contain the chords of songs by several Brazilian artists. You can list the available datasets with the code below, which won’t be run here because the output is long:

data(package = "chorrrds")


Use case

Returning to the data we collected before, let’s explore it!

The first thing we can look at is the most common chords in each song. Which chords are most common in Janis Joplin’s songs? Are their proportions similar across songs?

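chords %>% 
  dplyr::group_by(music) %>% 
  dplyr::count(chord) %>%                          # Chord counts per song
  dplyr::top_n(n, n = 3) %>%                       # 3 most frequent chords per song (ties kept)
  dplyr::mutate(prop = scales::percent(n/sum(n)))  # Formatting n / sum(n) as a percentage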
music                   chord n   prop
combination of the two  A     8   26.7%
combination of the two  Abm   4   13.3%
combination of the two  B     4   13.3%
combination of the two  E     10  33.3%
combination of the two  F#m   4   13.3%
move over               A     2   33.3%
move over               E     2   33.3%
move over               G     2   33.3%
one good man            A     8   33.3%
one good man            B     4   16.7%
one good man            E     12  50.0%
piece of my heart       A     54  37.5%
piece of my heart       B     67  46.5%
piece of my heart       E     23  16.0%
st james infirmary      Am    4   22.2%
st james infirmary      B7    5   27.8%
st james infirmary      Em    9   50.0%

Even this small dataset already gives us some interesting information. Note that prop is the share of each chord among the top chords of its song, not among all of the song’s chords. In some songs, such as the first two, the most common chords appear in fairly close proportions. In the others, both the proportions and the absolute counts of the chords vary more. This suggests that her songs don’t follow a fixed structure, which may be a sign of how creative the artist was.
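top_n() keeps ties, which is why “combination of the two” has five rows instead of three: three of its chords tie at 4 occurrences each. Below is a minimal base R sketch of the same tie-keeping logic, run on made-up data. Both the toy data frame and top_chords() are illustrative assumptions, not part of chorrrds:

```r
# Made-up chord sequences, for illustration only
toy <- data.frame(
  music = c(rep("song a", 6), rep("song b", 7)),
  chord = c("E", "A", "E", "B", "A", "E",
            "Am", "Em", "B7", "Am", "Em", "B7", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the k most frequent chords per song, keeping ties
top_chords <- function(df, k = 3) {
  per_song <- split(df$chord, df$music)
  rows <- lapply(names(per_song), function(song) {
    counts <- sort(table(per_song[[song]]), decreasing = TRUE)
    # Every chord reaching the k-th largest count is kept, so ties can add rows
    threshold <- counts[[min(k, length(counts))]]
    top <- counts[counts >= threshold]
    data.frame(music = song,
               chord = names(top),
               n = as.integer(top),
               prop = as.numeric(top) / sum(top),  # share among the kept chords only
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

top_chords(toy, k = 2)
```

With k = 2, “song b” keeps three chords, because Am, B7 and Em each occur twice.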

We can also look at “chord bigrams”: pairs of chords that occur one right after the other within a song. Counting them shows which transitions are most frequent in each song.

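chords %>%
  dplyr::group_by(music) %>%                               # Bigrams are formed within each song
  tidytext::unnest_tokens(bigram, chord, to_lower = FALSE, # Keeping chord case (Am, not am)
                          token = "ngrams", n = 2) %>% 
  dplyr::count(bigram) %>% 
  dplyr::top_n(n, n = 2)                                   # 2 most frequent bigrams per song (ties kept)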
music                   bigram  n
combination of the two  B A     4
combination of the two  F m     4
move over               A A     1
move over               A C     1
move over               B A     1
move over               C B     1
move over               E G     2
move over               G A     1
move over               G E     1
one good man            A E     8
one good man            B A     4
one good man            E A     4
one good man            E B     4
piece of my heart       A B     24
piece of my heart       B B     35
st james infirmary      B7 Em   5
st james infirmary      Em Am   4
st james infirmary      Em B7   4

Some bigrams occur many times within a song, while in other songs even the most frequent bigrams occur only once or twice. In “Piece of my heart”, the chord “B” is often repeated, which shows up as the “B B” bigram, the most frequent one in that song.
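The bigrams above were built with a text tokenizer. That tokenizer most likely treats “#” as a word boundary, which would split “F#m” into “F” and “m” and would explain the “F m” bigram in “combination of the two”. An alternative that leaves chord names intact is to pair each chord directly with the next chord of the same song. Below is a minimal base R sketch, in which chord_bigrams() and the toy data are hypothetical:

```r
# Hypothetical helper: counts consecutive chord pairs within each song
chord_bigrams <- function(df) {
  per_song <- split(df$chord, df$music)
  rows <- lapply(names(per_song), function(song) {
    x <- per_song[[song]]
    if (length(x) < 2) return(NULL)  # a one-chord song has no pairs
    # Pairing chord i with chord i + 1; names such as "F#m" stay intact
    counts <- table(paste(x[-length(x)], x[-1]))
    data.frame(music = song, bigram = names(counts),
               n = as.integer(counts), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # NULL entries are ignored by rbind()
}

# Made-up chord sequences, for illustration only
toy <- data.frame(
  music = c(rep("song a", 5), rep("song b", 3)),
  chord = c("F#m", "E", "F#m", "E", "A", "Am", "E7", "Am"),
  stringsAsFactors = FALSE
)
chord_bigrams(toy)
```

Because the pairs are formed inside each song, the last chord of one song is never paired with the first chord of the next.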

Having explored the data a little, we can make it even more interesting by building a chord diagram. Here, the word “chord” does not refer to the musical chord. It refers to a graphic element that shows the strength of a connection. The connections are the chord transitions observed in the selected songs, and their strengths are the number of times each transition happened. In this configuration, the chord diagram makes the relationships between the chords explicit.

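library(dplyr)
# devtools::install_github("mattflor/chorddiag")

# Pairing each chord with the next one in the same song
comp <- chords %>% 
  dplyr::group_by(music) %>% 
  dplyr::mutate(seq = dplyr::lead(chord)) %>%  # Next chord; NA at the end of each song
  dplyr::ungroup() %>% 
  dplyr::filter(!is.na(seq), chord != seq)     # Dropping song ends and repeated chords

# Square transition matrix (rows: current chord, columns: next chord),
# using the same chord set on both margins, as chorddiag() expects
lv <- sort(unique(c(comp$chord, comp$seq)))
mm <- unclass(table(factor(comp$chord, levels = lv),
                    factor(comp$seq, levels = lv)))

# Building the chord diagram
chorddiag::chorddiag(mm, showTicks = FALSE,
                     palette = "Reds")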

Now we can clearly see how the transitions behave in the selected songs. There are strong connections between the chords A and E and between A and B, while the other connections are, in general, fragmented.
A nice feature is that the diagram is interactive, so we can check the strength of each transition by hovering the mouse cursor over it!
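Each ribbon in the diagram links two chords and reflects transitions in both directions (for example, from A to E and from E to A). One way to rank the strongest connections without the interactive view is to add the transition matrix to its transpose and then sort the chord pairs. Below is a minimal sketch. It assumes a square count matrix with matching row and column names, like the one passed to chorddiag(). Both strongest_connections() and the toy matrix are hypothetical:

```r
# Hypothetical helper: ranks chord pairs by total transitions in both directions
strongest_connections <- function(mm, k = 3) {
  stopifnot(nrow(mm) == ncol(mm), identical(rownames(mm), colnames(mm)))
  sym <- mm + t(mm)                              # A -> E plus E -> A, and so on
  idx <- which(upper.tri(sym), arr.ind = TRUE)   # each unordered pair once
  pairs <- data.frame(from = rownames(sym)[idx[, 1]],
                      to   = colnames(sym)[idx[, 2]],
                      n    = sym[idx],
                      stringsAsFactors = FALSE)
  pairs <- pairs[pairs$n > 0, ]
  pairs <- pairs[order(pairs$n, decreasing = TRUE), ]
  rownames(pairs) <- NULL
  head(pairs, k)
}

# Toy transition counts (rows: current chord, columns: next chord)
lv <- c("A", "B", "E")
mm <- matrix(c(0, 3, 5,
               2, 0, 1,
               6, 0, 0), nrow = 3, byrow = TRUE, dimnames = list(lv, lv))
strongest_connections(mm, k = 2)
```

In this toy matrix, A and E are linked by 5 + 6 = 11 transitions and A and B by 3 + 2 = 5, so these two pairs come out on top.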

Wrap up

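In this blog post, we installed chorrrds, extracted the chords of a sample of Janis Joplin songs, and explored them through chord frequencies, chord bigrams and a chord diagram.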

chorrrds is a new package, with a lot of potential and many possible applications to be explored. We hope this was useful, and that now you’re starting to be as enchanted by music information retrieval as we are!


For attribution, please cite this work as

Wundervald & Trecenti (2018, Aug. 19). R-Music: Introduction to the chorrrds package. Retrieved from

BibTeX citation

  author = {Wundervald, Bruna and Trecenti, Julio},
  title = {R-Music: Introduction to the chorrrds package},
  url = {},
  year = {2018}