
R & D

Using AutoHarp and a Character-Based RNN to Create MIDI Drum Loops

9/16/2015

This whitepaper details a method of generating new MIDI drum loops using a combination of AutoHarp — an algorithmic music-playing software package created by the author — and Andrej Karpathy's character-based recurrent neural network, char-rnn. It would be a herculean task to track all of the inputs and prior art that led to this experiment and informed its construction (the history of algorithmic music is, at this point, vast), but a few specific experiments led me to begin investigating char-rnn as a fruitful source of progress in this arena:
  • The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej's own article on his software package.
  • Recurrent Neural Shady, Chris Johnson's rap song created by feeding Eminem lyrics to char-rnn.
  • Generating Magic cards using deep, recurrent neural networks, Reed Morgan Milewicz's PhD research.
  • Deep learning for assisting the process of music composition, Bob L. Sturm's investigation into using ABC musical notation to generate new musical melodies.
AutoHarp is an algorithmic song generator. It differs from most other applications in the space in that its output consists of fully rendered, multi-track MIDI files with recognizable instruments and repeating sections, constructed as a human popular-song composer might construct his or her songs. The author uses these files as the basis for his music, as documented throughout this site. While the output of AutoHarp is impressively varied, and the construction of an individual song involves hundreds of thousands of discrete "decisions," it is also limited: the Markov chains that generate melodies, song structures, chord progressions, and improvisations by the members of the "band" are literally hard-coded into the program itself. To transcend this, to allow the machine to become more genuinely creative, deep learning is required.

This experiment focused on AutoHarp's drummer. In the open-source version of AutoHarp, you "seed" it with existing MIDI drum loops (a plethora of free and royalty-free MIDI loops are available online for the Googling); thereafter it plays in particular musical genres by selecting a loop or loops you have tagged in that genre during the seeding process and modifying them slightly as the music requires (e.g. repeating loops or truncating them, adding simple fills at the end of phrases, switching one type of drum to another to add variance).
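As a rough illustration of that selection-and-modification step, here is a minimal Python sketch. The names and data structures are hypothetical; AutoHarp itself implements this with its own MIDI utilities:

import random

# Hypothetical library built during seeding: genre tag -> list of loops,
# where each loop is a list of one-bar lists of note events.
LOOP_LIBRARY = {"rock": [], "funk": []}

def drum_part(genre, bars):
    """Pick a loop tagged with the given genre and repeat or truncate it to fit."""
    loop = random.choice(LOOP_LIBRARY[genre])
    part = []
    while len(part) < bars:
        part.extend(loop)        # repeat the loop until we have enough bars
    return part[:bars]           # truncate to the requested phrase length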

The very first attempt was simply to print out all of the drum loops in my AutoHarp library as text representations of their MIDI data. This was done without regard for differences in tempo, genre, or meter (a.k.a. time signature) among the loops themselves. Because it is easier to manage and manipulate in code, I did convert the paired MIDI note_on/note_off events, which use relative time, into a single quasi-MIDI "note" event that uses absolute time and a note duration. I also used the MIDI utilities of AutoHarp to break each loop into one-bar sections and re-zeroed the absolute time of each bar. A section of the input file appears below:
START LOOP
TEMPO 112
note,0,31,9,36,98
note,0,35,9,51,80
note,105,37,9,51,69
note,167,37,9,51,56
note,231,36,9,53,119
note,349,37,9,51,70
...
note,1434,6,9,51,85
END LOOP
START LOOP
TEMPO 123
note,0,30,9,36,100
note,0,31,9,51,85*
...

*The "note" quasi-event is of the format: 'note',absolute time,duration,channel,pitch,velocity. In the General MIDI percussion map, each pitch has a set associated drum; in my library, e.g., 35 == kick, 38 == snare, 42 == closed hi-hat, etc.
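As a point of reference, here is a minimal sketch of that conversion in Python. This is not AutoHarp's actual code; the sketch assumes the mido package and the field order described above:

import mido

def midi_to_note_events(path):
    """Pair note_on/note_off messages into absolute-time 'note' quasi-events."""
    mid = mido.MidiFile(path)
    events, open_notes, now = [], {}, 0
    for msg in mido.merge_tracks(mid.tracks):
        now += msg.time                                   # msg.time is a delta in ticks
        if msg.type == 'note_on' and msg.velocity > 0:
            open_notes[(msg.channel, msg.note)] = (now, msg.velocity)
        elif msg.type in ('note_off', 'note_on'):         # note_on with velocity 0 also ends a note
            start = open_notes.pop((msg.channel, msg.note), None)
            if start is not None:
                on_time, velocity = start
                events.append(('note', on_time, now - on_time, msg.channel, msg.note, velocity))
    return sorted(events, key=lambda e: e[1])

def split_into_bars(events, ticks_per_bar):
    """Break events into one-bar chunks and re-zero each bar's absolute times."""
    bars = {}
    for name, time, dur, chan, pitch, vel in events:
        i = time // ticks_per_bar
        bars.setdefault(i, []).append((name, time - i * ticks_per_bar, dur, chan, pitch, vel))
    return [bars[i] for i in sorted(bars)]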
This file was 1.2 MB — according to char-rnn's documentation, a small sample size. Nonetheless, the initial results — run with char-rnn's default settings — were somewhat promising. Example output from a mid-epoch checkpoint looked like:
START LOOP
TEMPO 93
note,0,43,9,43,111
note,143,30,9,42,84
note,718,24,9,36,94
note,455,30,9,48,115
note,138,30,9,36,116
note,248,30,9,36,92
...
note,9,36,9,38,114
note,156,30,9,42,117
note,482,41,9,42,73
...
note,486,30,9,38,115
END LOOP
START LOOP
TEMPO 150
note,0,30,9,49,111
...
note,395,42,9,51,95
END LOO
TART LOOP
TEMPO 80
note,0,30,9,38,112
note,148,30,9,36,68
note,727,30,9,36,101
...
END LOOP
START LOOP
TEMPO 100
note,0,34,9,36,101
note,0,30,9,9,38,104
note,607,30,9,42,108
note,48,30
note,844,30,9,38,128
...
note,948,41,9,44,85
END LLOPP
...

I used AutoHarp's built-in text utilities to harvest loops, keeping all valid lines between START LOOP and END LOOP markers. Converting those back to MIDI and looping the results over four bars produced output such as this:


Interesting — nothing to make Gene Krupa quake in his...um...grave — but interesting, and certainly indicative that the approach (of converting MIDI to text, building a text learning model, and then harvesting the resulting text and converting it back to MIDI) might be promising.
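The harvesting step is deliberately forgiving, since char-rnn's output contains garbled markers and malformed note lines, as in the sample above. A minimal sketch of it in Python (not AutoHarp's actual text utilities):

import re

NOTE_RE = re.compile(r'^note,\d+,\d+,\d+,\d+,\d+$')
TEMPO_RE = re.compile(r'^TEMPO (\d+)$')

def harvest_loops(sampled_text):
    """Collect well-formed note lines; any other line closes the current loop."""
    loops, notes, tempo = [], [], 120
    for raw in sampled_text.splitlines():
        line = raw.strip()
        m = TEMPO_RE.match(line)
        if m:
            tempo = int(m.group(1))
        elif NOTE_RE.match(line):
            notes.append(tuple(int(x) for x in line.split(',')[1:]))
        else:
            # START LOOP, END LOOP, a garbled marker, or a broken note line
            if notes:
                loops.append((tempo, notes))
            notes = []
    if notes:
        loops.append((tempo, notes))
    return loops

Converting a harvested loop back to MIDI is just the reverse of the earlier conversion: emit a note_on at the absolute time and a note_off at that time plus the duration.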
However, from there, it was initially difficult to make much forward progress. Using the same method but limiting the input to only 4/4 loops, I did succeed in getting examples of fairly rhythmic one-bar loops (here again looped to four bars):
Attempts to get the machine to string together its own, longer drum lines using this method were, however, unsuccessful. I tried a variety of changes to the inputs: in one iteration I used only the Rock loops (i.e. straight-ahead 4/4 beats with kicks on 1 and 3, snares on 2 and 4); in several I used only the 4-measure phrases as input; in others I fed the loops through AutoHarp first to simplify the beats where possible. All outputs in this set of iterations were thematically similar to this:
It sounds like a kid who just got his drum set: he can keep things going for a couple of beats, then falls apart, gets it back, and falls apart again.
It was at this point that I had a kind of pedagogical revelation: the drum loops I was using as input were too advanced. They were recorded on MIDI drum kits by professional human players who had, presumably, years of practice to develop groove and feel. Here again is one bar of a sample input loop (from a later iteration where I had changed the format — the data is still the same):
[note      0   40  9          Bass Drum 1 127]
[note      0   42  9         Pedal Hi-Hat 127]
[note      0   41  9       Crash Cymbal 1 127]
[note    102   41  9         Pedal Hi-Hat 127]
[note    106   42  9            Ride Bell 127]
[note    181   43  9       Acoustic Snare  98]
[note    183   44  9          Bass Drum 1 116]
[note    222   41  9         Pedal Hi-Hat 127]
[note    224   42  9        Ride Cymbal 1  69]
[note    349   42  9         Pedal Hi-Hat 127]
[note    353   41  9          Bass Drum 1 125]
[note    356   42  9            Ride Bell 127]
[note    361   41  9       Acoustic Snare 117]
[note    444   42  9       Acoustic Snare  24]
[note    472   42  9         Pedal Hi-Hat 127]
[note    479   42  9          Bass Drum 1 127]
[note    481   41  9        Ride Cymbal 1  66]
[note    555   43  9       Acoustic Snare  31]
[note    595   42  9         Pedal Hi-Hat 127]
[note    595   42  9            Ride Bell 127]
[note    666   43  9          Bass Drum 1 122]
[note    672   42  9       Acoustic Snare 120]
[note    709   42  9         Pedal Hi-Hat 127]
[note    711   41  9        Ride Cymbal 1  67]
[note    833   42  9         Pedal Hi-Hat 127]
[note    835   42  9          Bass Drum 1 127]
[note    836   42  9            Ride Bell 127]
[note    837   42  9       Acoustic Snare 119]
[note    914   42  9       Acoustic Snare  24]
[note    954   42  9         Pedal Hi-Hat 127]
[note    954   42  9        Ride Cymbal 1  66]
[note    955   42  9          Bass Drum 1 126]

AutoHarp uses 240 MIDI ticks per beat, so a drum loop that was rigidly on the beat would have time values like 120, 240, 480, etc. This loop, and indeed all of the input loops, isn't robotically on the beat. Human drummers, especially good ones, play in a groove: they slightly anticipate or lag behind some or all of the beats in a measure. Here's what the above loop sounds like, for reference:

From Groove Monkee's free MIDI drum loops package
This is like trying to teach someone the drums by transcribing a Stewart Copeland line down to a resolution of 960th notes and expecting her to learn an expert drummer's rhythm, groove, and feel from it.
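To put numbers on that, here is a tiny sketch that measures how far each hit lands from a 16th-note grid (at AutoHarp's 240 ticks per beat, a 16th note is 60 ticks); the example values are taken from the loop listed above:

TICKS_PER_BEAT = 240
TICKS_PER_16TH = TICKS_PER_BEAT // 4      # 60 ticks

def grid_deviation(ticks):
    """Signed distance, in ticks, from the nearest 16th-note grid line."""
    nearest = round(ticks / TICKS_PER_16TH) * TICKS_PER_16TH
    return ticks - nearest

# The hits at ticks 102, 181, and 222 in the loop above land at
# -18, +1, and -18 ticks relative to the grid, respectively.
for t in (102, 181, 222):
    print(t, grid_deviation(t))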
At this point, I switched file formats to a simple text drum notation that I'd created in AutoHarp to visualize drum loops. It looks like this:
36,|8.7.......8.....|8.8.......7.....|8.8.......78....|7.8.......7..7..|
38,|....9..3.4..9..3|....9..3.4..8..4|....9..3.4..9..3|....8.82........|
42,|....4.5.5.5.7.5.|6...7.5.6.6.6.5.|5...3.6.7.5.7.4.|7...7.6.5.......|
44,|....7...........|....6...........|....7...........|................|
46,|..7.............|..7.............|..7.............|..7.............|
49,|8...............|................|................|................|   

Each line represents a drum, and each character within the bars represents a 16th note. A dot means no drum hit; a number represents how hard the drum is hit, with 0 being the lightest and 9 the hardest. In addition to forcing the hits of each loop onto a grid, I also bootstrapped the dataset: I created a series of virtual song sections of four bars apiece, let AutoHarp's drummer play them (via the previously described method of altering existing MIDI loops), and ran that process 5,000 times. This created a 2 MB dataset, which I fed to the char-rnn training module. It took a few epochs before it got the hang of the format, but after that, well...here's a random sample of the results (now with a richer drum sound font):
The trailing off is one of the ways AutoHarp will end a phrase in order to bring the energy of the song down. It has learned well.

Straight ahead. Is that last little hi-hat flutter a screw-up, or did you mean to do that?

It's learned to add little variations at the ends of phrases — here, e.g., the open hi-hats.

If it weren't so perfectly on the beat, this would be indistinguishable from a human player.

Yeah, this one's a little wonky; it's also undeniably rhythmically interesting.
I didn't cherry-pick those. I have a script that runs through char-rnn's checkpoint files (on this run I was writing them every 25 iterations, so I have a lot of data for this single experiment), harvests the results, and writes them to MIDI files; I then sample those randomly, and the five above came out of that random sampling one right after another. There are certainly loops in the set that don't quite pass as drum grooves (that fifth one is probably among them), but there aren't very many of them. This seems to me a stunning result: I tried a very difficult pedagogy, and it more or less failed. I simplified the pedagogy, as you might when teaching a human to do something, and it succeeded.
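For reference, the grid-to-MIDI direction of that harvest is straightforward; a minimal Python sketch follows (the mapping from the digits 0-9 back to MIDI velocities 1-127 is my assumption, not necessarily AutoHarp's):

TICKS_PER_16TH = 60          # 240 ticks per beat, four 16ths per beat

def grid_to_events(grid_text):
    """Turn the text drum grid back into absolute-time note events."""
    events = []
    for line in grid_text.strip().splitlines():
        pitch_str, grid = line.split(',', 1)
        pitch = int(pitch_str)
        slots = grid.replace('|', '').rstrip()    # drop the bar lines
        for i, ch in enumerate(slots):
            if ch.isdigit():
                velocity = 14 * int(ch) + 1       # assumed mapping: 0 -> 1, 9 -> 127
                events.append(('note', i * TICKS_PER_16TH, TICKS_PER_16TH, 9, pitch, velocity))
    return sorted(events, key=lambda e: e[1])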
I've since written code into the experimental branch of AutoHarp that can use existing, human-played drum loops as a "groove template" to make the loops that come out of this model feel more natural — that is, it will move notes slightly off the beat and give them more nuanced velocities so as to give the loops groove and feel. I also hope that, as I develop the model, this will prove unnecessary. As when a human learns music, it seems that the basic rhythms have to be taught first. Actually "feeling" the music — making it musical — is a study that lasts a lifetime. Or that's how it is for a human, anyway — in the case of char-rnn, it might take just a few more epochs.
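A minimal sketch of that groove-template idea (the names and template format are hypothetical; the experimental AutoHarp code differs):

def apply_groove(events, template, ticks_per_16th=60):
    """template maps a 16th-note slot index to (tick_offset, human_velocity),
    e.g. {0: (0, 127), 1: (-6, 80), 2: (2, 104), 3: (-4, 90)} captured from a
    human-played bar."""
    grooved = []
    for name, time, dur, chan, pitch, vel in events:
        slot = (time // ticks_per_16th) % len(template)
        offset, human_vel = template.get(slot, (0, vel))
        new_vel = (vel + human_vel) // 2          # blend toward the human-played velocity
        grooved.append((name, max(0, time + offset), dur, chan, pitch, new_vel))
    return grooved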