Introduction to Using Neural Nets

Many of my recent posts have explored how machine learning techniques can be integrated into creative and analytical musical projects. Today, I continue to explore this area by introducing artificial neural networks as a tool for analyzing music. I should preface this by stating that I am by no means an expert in this area. However, I find the possibilities fun to work out and fascinating in their implications. Writing up my experiments also helps me clarify my own thinking and suggests avenues for improvement or further discovery.

I started out by doing a lot of research on machine learning online. Some of the best resources I’ve come across include Andrew Ng’s video series on machine learning and 3Blue1Brown’s YouTube channel. I also looked into previous research specifically using machine learning to synthesize and analyze music. I found the work of David Cope and Roger Dannenberg (here and here) especially helpful in framing and evaluating research questions pertaining specifically to music. This paper surveying AI methods in algorithmic composition has also been extremely helpful.

The most popular tools for implementing machine learning techniques tend to advertise their ease of use. This is a good thing in general, but for my own purposes—namely, learning—it was not ideal, since much of the detail of the actual operation is hidden away from me as a user (see “blackboxing”). Consequently, instead of starting with a well-known package like keras or scikit-learn, I found a terrific tutorial that walked through building a neural network from individual functions—in other words, almost from scratch.

After going through the tutorial myself and making sure everything worked as expected, I turned my attention to finding an appropriate musical question that a neural net would be suited to solve. This turned out to be more challenging than I expected, as many of the questions that first occurred to me were actually better suited to other methodologies. My (admittedly incomplete) understanding is that the “sweet spot” for ANNs lies in questions that are difficult to define using concrete rules, but that ultimately rely on consistent patterns.

Thinking in this way, I found that many of the fundamental patterns of music (as understood through Western music theory) are too well-defined to be appropriate for machine learning methods. For example, specific chords and set classes can be identified through reasonably simple rules and algorithms. When machine learning methods (such as ANNs) are applied to questions like these, they tend to “overfit” the training data—as it is sometimes put, they “memorize” the training data rather than “learning” the underlying trends. As a result, they tend to perform poorly on new data.

To illustrate what I mean, let’s imagine we want to use an artificial neural network to determine whether a given chord is a major chord. Our data set will comprise random chords in pitch space (I’ll use the MIDI note numbers 0-127 to refer to specific pitches). Major chords are extremely well-defined from a mathematical perspective: they comprise a set of three pitch classes separated by specific, unchanging intervals (major third, minor third, perfect fourth). If we perform a modulo 12 operation on all of the notes in each chord in our data set, we can measure the constituent intervals and determine the quality of the chord with certainty. If the chord follows the rules, it is a major chord; if it doesn’t, it’s not.
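To make this concrete, here is a minimal sketch of such a rule-based check in Python. The helper names are my own, purely for illustration; the idea is simply to reduce the chord to pitch classes and test for the major-triad interval pattern described above.

```python
def chord_intervals(midi_notes):
    # Reduce to pitch classes (modulo 12), drop duplicates, and measure the
    # intervals between adjacent pitch classes, wrapping around at the octave.
    pcs = sorted(set(n % 12 for n in midi_notes))
    return [(pcs[(i + 1) % len(pcs)] - pcs[i]) % 12 for i in range(len(pcs))]

def is_major_chord(midi_notes):
    # A major triad is any rotation of the interval pattern (4, 3, 5):
    # major third, minor third, perfect fourth.
    intervals = chord_intervals(midi_notes)
    if len(intervals) != 3:
        return False
    rotations = {tuple(intervals[i:] + intervals[:i]) for i in range(3)}
    return (4, 3, 5) in rotations

print(is_major_chord([60, 64, 67]))      # C4-E4-G4 -> True
print(is_major_chord([55, 62, 67, 71]))  # G major with a doubled root -> True
print(is_major_chord([60, 63, 67]))      # C minor -> False
```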

At first glance, we might imagine that an ANN would be able to infer these rules given enough training examples. However, ANNs work by assigning weights to the connections between nodes (i.e. data points), and in pitch space, there is no correlation between a particular MIDI note number and the quality of the chord. In other words, a C major chord might contain the note C4 (MIDI note number 60)—though not necessarily—and a D major chord will never contain C4. Therefore, it is unclear how a neural network would successfully weight this note (and by extension, every other note). Instead, it is likely to “memorize” the chords in the training set, rather than “learn” any consistent underlying principle, as is the objective in machine learning.
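To see why, consider one plausible input encoding (an assumption on my part, not taken from the tutorial): each chord becomes a 128-dimensional binary vector, with one input node per MIDI note number.

```python
import numpy as np

def encode_chord(midi_notes, n_pitches=128):
    # One input node per possible MIDI pitch; pitches present in the chord
    # are set to 1, all others remain 0.
    vec = np.zeros(n_pitches)
    vec[list(midi_notes)] = 1.0
    return vec

c_major = encode_chord([60, 64, 67])  # C4, E4, G4
d_major = encode_chord([62, 66, 69])  # D4, F#4, A4

# Both are major chords, yet they activate entirely different input nodes,
# so no individual weight can capture "major-ness" across the two examples.
print(np.dot(c_major, d_major))  # 0.0 -> no shared active inputs
```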

One possible solution is careful preprocessing of the data in order to ensure that the individual data points are meaningful to the neural network when used as input. Yet this ultimately reveals how well-defined major chords are and, consequently, how unnecessary machine learning would be in this scenario. For example, we might preprocess the data by applying a modulo 12 operation to each note. This would simplify the examples considerably: from 128 different possible data points (i.e. pitches) to 12 pitch classes. Yet we encounter the same problem as above: the note C (pitch class 0) will be present in some major chords but not others. We might expand the preprocessing stage to transpose and/or invert all chords so they are based on C (this is also known as normalizing, scaling, and/or centering the training data). However, at that point a simple comparison between the example chord and a model set of (0,4,7) would always produce correct results, rendering a machine learning approach completely superfluous.
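Here is a sketch of that preprocessing pipeline, again with helper names of my own choosing. Once the chord has been reduced to pitch classes and transposed so its root sits on C, the “classification” collapses to a single set comparison.

```python
MAJOR_TRIAD = (0, 4, 7)  # the model set: root, major third, perfect fifth

def transpositions_to_c(midi_notes):
    # Reduce to pitch classes, then try transposing each member down to C
    # as if it were the root.
    pcs = set(n % 12 for n in midi_notes)
    return [tuple(sorted((pc - root) % 12 for pc in pcs)) for root in pcs]

def is_major_chord(midi_notes):
    # No learning required: if any transposition matches the model set,
    # the chord is a major triad.
    return MAJOR_TRIAD in transpositions_to_c(midi_notes)

print(is_major_chord([64, 68, 71, 76]))  # E major -> True
print(is_major_chord([60, 63, 67]))      # C minor -> False
```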

The research question I finally decided on was determining whether a given major triad in pitch space was a C major triad. Intuitively, this seemed like an appropriate problem for an ANN because there is consistency with respect to the individual data points (each data point is either potentially part of a C major triad or not), but examples will vary greatly in terms of the octave and number of component pitches, making the problem (somewhat) difficult to describe with a small set of explicit rules. In the next post, I’ll jump into the code and implementation.
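To make the setup a little more concrete before then, here is a rough sketch of how labeled training examples might be generated. The names and details here are my own assumptions; the actual data generation in the next post may differ.

```python
import random

C_MAJOR_PCS = (0, 4, 7)   # C, E, G as pitch classes
D_MAJOR_PCS = (2, 6, 9)   # D, F#, A, used here as a negative example

def random_voicing(pitch_classes, max_doublings=3):
    # Guarantee that each pitch class appears at least once somewhere on the
    # keyboard, then optionally double some of them in other octaves, so that
    # examples vary in register and in number of component pitches.
    notes = [random.choice([n for n in range(128) if n % 12 == pc])
             for pc in pitch_classes]
    pool = [n for n in range(128)
            if n % 12 in pitch_classes and n not in notes]
    notes += random.sample(pool, random.randint(0, max_doublings))
    return sorted(set(notes))

print(random_voicing(C_MAJOR_PCS), 1)  # e.g. [16, 55, 67, 72, 88], label 1
print(random_voicing(D_MAJOR_PCS), 0)  # e.g. [21, 50, 66], label 0
```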