Music with Markov Chains

This is a quick tutorial showing how to make music using Markov chains in Python. (If you’re not familiar with Markov chains, this page gives a great, interactive introduction.) The general idea is that we can input melodies of a certain style, and our model will output similar melodies that reflect that style. We’ll use a sequence of integers representing notes in a melody to train our model (i.e. build a transition table), and then compare the output.

First, start up Python and import the following libraries:

import numpy as np
import pandas as pd
import random

Next, we need melodies to use as input to train our model. You can compose them yourself, transcribe them, or use a dedicated toolkit like music21 to extract them from digital score files. Here I’ll just use the first few notes of a couple of familiar melodies. The integers are MIDI note numbers (transposed to C major):

brother_john = [60, 62, 64, 60, 60, 62, 64, 60]
little_lamb = [64, 62, 60, 62, 64, 64, 64]

We want to keep the melodies in separate lists so that the join between the last note of one melody and the first note of the next doesn’t introduce a “false” pattern that never actually occurs in either melody. For example, the last note of the first melody is 60 and the first note of the second melody is 64. But the pattern 60 64 never occurs in either melody, so we keep them separate!
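To see the problem concretely, here is a small sketch that compares the note pairs found in the separate melodies with the pairs found after gluing the two lists together. The helper name note_pairs is just an illustration and isn’t used elsewhere in this tutorial.

```python
def note_pairs(melody):
    """Return the set of (current, next) note pairs that occur in a melody."""
    return set(zip(melody[:-1], melody[1:]))

brother_john = [60, 62, 64, 60, 60, 62, 64, 60]
little_lamb = [64, 62, 60, 62, 64, 64, 64]

# Pairs found when each melody is scanned on its own.
separate = note_pairs(brother_john) | note_pairs(little_lamb)
# Pairs found when the two lists are concatenated into one long melody.
joined = note_pairs(brother_john + little_lamb)

# Pairs that exist only because of the join.
false_pairs = joined - separate
print(false_pairs)  # → {(60, 64)}
```

The only extra pair is the one that straddles the join, which is exactly the pattern we want to keep out of the model.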

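A Markov chain can be represented through what’s known as a transition table. The transition table specifies how likely we are to move from one state to another. Here the table stores raw counts of how often each note follows another, and those counts later serve as sampling weights. This function will build our transition table: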

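def make_table(allSeq):
    # Size the table so that every MIDI note number seen is a valid index
    n = max([max(s) for s in allSeq]) + 1
    arr = np.zeros((n, n), dtype=int)
    for s in allSeq:
        # Pair each note (j, the column) with the note that follows it (i, the row)
        for i, j in zip(s[1:], s[:-1]):
            arr[i, j] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')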

Then we call the function with whatever melodies we’d like to include as items in a list (any number of melodies). This builds the transition table, essentially “training” the model:

transitions = make_table([brother_john, little_lamb])
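If you’d like to see the table as probabilities rather than counts, one option is to divide each column by its total. The sketch below rebuilds a compact version of the table restricted to the notes that actually occur, so that it runs on its own; the names counts and probs are mine, not part of the tutorial’s code.

```python
import pandas as pd

brother_john = [60, 62, 64, 60, 60, 62, 64, 60]
little_lamb = [64, 62, 60, 62, 64, 64, 64]

# Count transitions: rows are the next note, columns are the current note.
notes = sorted(set(brother_john + little_lamb))
counts = pd.DataFrame(0, index=notes, columns=notes)
for melody in (brother_john, little_lamb):
    for current, nxt in zip(melody[:-1], melody[1:]):
        counts.loc[nxt, current] += 1

# Divide each column by its total so that every column sums to 1.
probs = counts / counts.sum(axis=0)
print(probs.round(2))
```

Reading down the column for 60, for example, the model moves to 62 three times out of four and repeats 60 the rest of the time. Dividing the full table from make_table the same way would produce NaN in every column for a note that never occurs, since those columns sum to zero.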

The next step is to generate a new sequence based on the table. So we need a new function:

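def make_chain(t_m, start_term, n): # trans_table, start_state, num_steps
    chain = [start_term]
    # The start note counts as the first term, so draw n-1 more
    for i in range(n-1):
        # Look up the column for the current note and sample the next one
        chain.append(get_next_term(t_m[chain[-1]]))
    return chain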

At each step, make_chain calls the following helper function (make sure it is defined before you call make_chain):

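def get_next_term(t_s):
    # t_s is one column of the table: counts of each possible next note,
    # used as weights for a random draw
    return int(random.choices(t_s.index, weights=t_s)[0])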

And finally, we’re ready to create our chain by calling the function using three arguments (transition table name, starting value, and length of sequence):

make_chain(transitions, 60, 10)

And we get something like this:

>>> [60, 60, 62, 64, 60, 62, 64, 60, 60, 60]

Try it a few times to see what kind of results you get. Switch up the starting value and train it on more melodies! Have fun!
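One thing to watch for as you add melodies: a note that only ever appears as the last note of a melody has no recorded successor, so its column in the table is all zeros and there is nothing to sample from (as far as I know, recent versions of Python raise a ValueError when every weight is zero). Below is a minimal sketch of one way around this, using plain dictionaries instead of the table above. The names build_counts and make_chain_safe are hypothetical, and stopping early is only one of several reasonable policies.

```python
import random
from collections import Counter, defaultdict

def build_counts(melodies):
    """Map each note to a Counter of the notes that follow it."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for current, nxt in zip(melody[:-1], melody[1:]):
            counts[current][nxt] += 1
    return counts

def make_chain_safe(counts, start, n):
    """Generate up to n notes, stopping early at a note with no successors."""
    chain = [start]
    while len(chain) < n:
        followers = counts.get(chain[-1])
        if not followers:
            break  # dead end: this note was never followed by anything
        notes, weights = zip(*followers.items())
        chain.append(random.choices(notes, weights=weights)[0])
    return chain

# Hypothetical melody whose last note, 67, never appears anywhere else.
counts = build_counts([[60, 62, 64, 67]])
print(make_chain_safe(counts, 60, 10))  # → [60, 62, 64, 67]
```

Other options would be to restart from a random note or to wrap each melody around to its first note, though the latter reintroduces the kind of “false” pattern discussed earlier.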

Creative Gating

It’s a simple concept: set a gate at a certain threshold of volume, and only sounds above that volume will pass through. It’s a straightforward way to get rid of persistent, low-volume irritations like hums, ambient sound, and background noise. So straightforward, in fact, that many people think of gates more as “utility” effects than as musical ones. But just as compression (as in dynamic range compression) is now viewed as at least as much an art as a science, there is much to recommend the creative possibilities of the humble gate.

(For a quick primer, check out this two-part series from Sound on Sound.)

Even though gates are generally used to reduce the volume of undesirable parts of an audio signal such as background noise (in which case they are known as “noise gates”), they are actually part of a class of effects known as dynamic range expanders. (The term “dynamic” is often used to refer to volume in music and audio.) How does expanding the dynamic range reduce volume? Well, by making quiet sounds even quieter, expanders actually increase the distance between the loudest sounds (i.e. those above the threshold and therefore unaffected by the gate) and the softest.

Imagine standing next to a drummer performing in a large, empty acoustic space. They play a drum beat that contains both loud sounds (for example, played on the kick and snare), and soft sounds (played on a closed hi-hat). The dynamic range is defined as the range or distance between the loudest sound (let’s say the thwack of the snare) and the softest sound (the tap of the hi-hat).

To expand the dynamic range, we physically move the hi-hat farther away from where we stand, so it sounds even quieter than before (let’s assume our drummer has extraordinarily long arms). If the softest sound before was the sound of a hi-hat a few feet away, now the softest sound is that of a hi-hat, say, fifty feet away. The loudest sound is unchanged (the snare is still right next to us), but there is more of a difference–a wider dynamic range–between the snare and the distant hi-hat, than between the snare and the close hi-hat.

To extend the analogy, compressors work in just the opposite fashion: we would take the loudest sound–the snare drum–and move it away from where we stand until it was closer in volume to the hi-hat right next to us. This rearrangement reduces or compresses the dynamic range, meaning that there is less of a difference between the volume of the snare and the hi-hat than before.

The most extreme version of the compressor, called a limiter, prevents the signal from rising above a given threshold at all (normal compressors decrease the volume of sounds above the threshold but do not impose a hard limit). Similarly, the most extreme version of the expander is the gate: instead of just making soft sounds softer, it makes them completely inaudible. In our analogy, it would be like taking the hi-hat out through the doors at the opposite end of the space and down the block.
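To put rough numbers on the drummer analogy, here is a minimal sketch of the level arithmetic, working in decibels. The function names, threshold, ratios, and drum levels are all hypothetical values chosen for illustration, not settings taken from any particular device.

```python
def compressor_db(level_db, threshold_db, ratio):
    """Above the threshold, output rises only 1 dB per `ratio` dB of input."""
    over = max(level_db - threshold_db, 0.0)
    return level_db - over * (1.0 - 1.0 / ratio)

def expander_db(level_db, threshold_db, ratio):
    """Below the threshold, output falls `ratio` dB per 1 dB of input."""
    under = max(threshold_db - level_db, 0.0)
    return level_db - under * (ratio - 1.0)

# Hypothetical levels: a snare peaking at -6 dB and a hi-hat at -30 dB.
snare, hihat = -6.0, -30.0
threshold = -20.0

original_range = snare - hihat
compressed_range = (compressor_db(snare, threshold, 4.0)
                    - compressor_db(hihat, threshold, 4.0))
expanded_range = (expander_db(snare, threshold, 2.0)
                  - expander_db(hihat, threshold, 2.0))

print(original_range, compressed_range, expanded_range)  # → 24.0 13.5 34.0
```

With these made-up numbers the 24 dB gap between snare and hi-hat shrinks to 13.5 dB under compression and grows to 34 dB under expansion. Passing float('inf') as the compressor ratio gives the limiter case, where nothing comes out above the threshold; a gate behaves like an expander with a very large ratio.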

That said, gates–and by extension, all expanders–are about much more than just changes in volume and dynamic range. One of the most famous creative gating effects is gated reverb, an important part of the huge sound of 1980s-era snare drums on many records. A conventional reverberation effect simulates the gradual decay in volume of a sound in a resonant space. Gated reverb involves aggressively cutting off the reverb before it has naturally decayed, resulting in a larger-than-life burst of energy that fades out quickly. Check out the snare on almost any record from the 1980s by Prince, Bruce Springsteen, or Phil Collins.

The “gated” in gated reverb refers to the use of a gate to dramatically cut off the reverb after it passes below a certain threshold (but well before it would naturally fade out). However, even though the gate is affecting the reverberated version of the audio, it’s actually responding to (or detecting) the volume of the drum before reverb is applied. The much shorter duration of the dry signal causes the gate to close on the reverberated signal before it has itself decayed, giving us the characteristically abrupt cutoff.
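Here is a rough sketch of that signal flow using NumPy. Everything in it is a stand-in: the “snare” and the “reverb” are just decaying noise, the sample rate is kept tiny, and the envelope follower and its release value are assumptions of mine rather than a model of any real gate.

```python
import numpy as np

def envelope(signal, release):
    """Peak follower: jumps up instantly, decays by `release` per sample."""
    env = np.zeros(len(signal))
    level = 0.0
    for i, x in enumerate(np.abs(signal)):
        level = max(x, level * release)
        env[i] = level
    return env

sr = 1000  # hypothetical low sample rate to keep the example small
rng = np.random.default_rng(0)
t = np.arange(sr) / sr

# Dry "snare": a noise burst that dies away within a few hundredths of a second.
dry = rng.uniform(-1, 1, sr) * np.exp(-t / 0.01)
# Wet signal: noise with a much slower decay, standing in for a reverb tail.
wet = rng.uniform(-1, 1, sr) * np.exp(-t / 0.3)

# The gate listens to the dry hit but is applied to the reverb.
gate_open = envelope(dry, release=0.99) > 0.05
gated_wet = wet * gate_open

print("samples with the gate open:", int(gate_open.sum()), "of", sr)
```

The gate shuts while the reverb tail is still clearly audible, which is the abrupt cutoff described above. A real gate would also smooth the opening and closing to avoid clicks.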

The principle of using one signal to control another is at the heart of many more familiar techniques, such as modulation using an LFO or side-chaining. When gates are involved, the process is often described as “envelope following.” Envelope following means that changes in the volume of a signal (such as the attack and decay of individual notes or drum hits) are linked to the opening and closing of a gate. The gate is then applied to other tracks, essentially allowing one to apply the rhythmic pattern of one track to another.

For example, in “Upside Down” by Diana Ross, the rhythm of the strings is triggered (or “keyed”) by Nile Rodgers’s guitar.

Another great example is “Everybody Dance” by Chic. Listen to the solo section, beginning around 4:00.

In electronic music, this is often achieved by applying a gate to a sustained sound (such as a synth pad) and triggering it with a more rhythmic track, such as a drum track or step sequencer. This control layer may or may not be audible, just as kick sounds used for aggressive side-chain compression in a mix may be completely different from the actual kick track. Sometimes the gate is even inverted to produce complementary rhythms: whenever the first layer is playing, the second layer is not, and vice versa.
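As a sketch of the step-sequencer case, the snippet below turns a rhythm pattern into an open/closed control signal and applies it, both normally and inverted, to two sustained tones. The pattern, sample rate, and frequencies are hypothetical, and the sine waves are only placeholders for synth pads.

```python
import numpy as np

sr = 8000                  # hypothetical sample rate
step_len = sr // 8         # each sequencer step lasts 1/8 of a second
pattern = [1, 0, 0, 1, 0, 0, 1, 0]   # hypothetical rhythm for the control layer

# Stretch the step pattern into a per-sample open/closed signal.
key = np.repeat(np.array(pattern, dtype=bool), step_len)

# Two sustained layers standing in for synth pads.
t = np.arange(len(key)) / sr
pad_a = np.sin(2 * np.pi * 220 * t)
pad_b = np.sin(2 * np.pi * 330 * t)

gated_a = pad_a * key      # sounds only while the key is active
gated_b = pad_b * ~key     # inverted gate: sounds only while the key is silent
```

Because the second gate is the exact inverse of the first, the two layers never sound at the same time. Note that the control pattern itself is never heard in the output.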

And what about using a reversed gate on its own: how might one use a gate to pass only sounds below a certain volume threshold? This is not an application that comes up often, but it’s an interesting question. Let’s say one wanted to create a sustained pad-like background layer from a dynamic source (for instance, turning a drum beat into a wash of metallic cymbal sounds). One approach might be to use such a “reverse gate” to cut out the loudest attacks from the source, and then fill in the gaps with heavy reverb effects (and probably some compression as well). Although there exist specialized plug-ins that can do this, a bit of cleverness can easily flip the functionality of a standard gate.

Recall that two versions of the same signal with opposite phase cancel each other out when combined. We can use this principle to cancel out sounds above the threshold of the gate, leaving only those below the threshold. Simply copy the same audio to two tracks and invert the phase of one version. Then apply a gate to the inverted version. When the sound is above the threshold the gate will open, and the two out-of-phase versions of the track will play simultaneously, resulting in silence. When the audio is below the threshold the gate will remain closed, meaning that the only audio we hear is the original, in-phase audio without the gate applied. Unlike compression, this technique will introduce silences where the loud sounds once were, but the attacks will be removed completely.
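The trick is easy to check numerically. In this minimal sketch the gate works sample by sample, which is a simplification (a real gate follows the signal’s envelope), and the sample values and threshold are made up.

```python
import numpy as np

def gate(signal, threshold):
    """Hard gate: pass samples at or above the threshold, silence the rest."""
    return np.where(np.abs(signal) >= threshold, signal, 0.0)

def reverse_gate(signal, threshold):
    """Sum the original with a gated, phase-inverted copy of itself."""
    inverted = -signal
    return signal + gate(inverted, threshold)

# Hypothetical samples: two loud attacks among quieter material.
audio = np.array([0.1, 0.9, -0.2, 0.05, -0.8, 0.3])
print(reverse_gate(audio, threshold=0.5))
```

The two loud samples (0.9 and -0.8) come out as zero, while the quiet samples pass through untouched.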

Gates can even be used to toggle between different inputs. The most famous example is the lead vocal on David Bowie’s “Heroes,” using a technique known as multi-latch gating devised by producer Tony Visconti. On this track, Bowie’s vocals are captured simultaneously by three microphones at varying distances, but the volume at which Bowie sings determines which microphone captures the sound. Each microphone input passes through a gate whose threshold is proportional to its distance from the singer. When Bowie sings quietly, the closest microphone captures his voice, but as his voice rises in intensity, the gate on a more distant microphone is opened, and the gate on the closer microphone is closed.
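The switching logic described here can be sketched in a few lines. The microphone names and threshold values below are hypothetical, not Visconti’s actual settings, and the sketch follows the description above in letting only one microphone through at a time.

```python
# Hypothetical gate thresholds (in dB) for three microphones,
# ordered from nearest to farthest from the singer.
MICS = [("near", -40.0), ("middle", -20.0), ("far", -10.0)]

def active_mic(vocal_level_db):
    """Return the farthest microphone whose gate the vocal level opens."""
    chosen = None
    for name, threshold_db in MICS:
        if vocal_level_db >= threshold_db:
            chosen = name  # a farther gate opening closes the nearer one
    return chosen

for level in (-50.0, -30.0, -15.0, -5.0):
    print(level, active_mic(level))
```

At -50 dB no gate opens at all; as the level climbs through -30, -15, and -5 dB, the signal is handed from the near microphone to the middle one and finally to the far one.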

What ends up happening is that Bowie has to practically yell to be picked up by the farthest microphone, but the physical distance prevents the microphone from being overloaded while simultaneously adding more room ambience. In the context of the song, it allows for a powerful, emotional performance that is also somehow remote and alienated. It’s hard to describe, but it works–and at the center of it all is an elegant application of one of the simplest and perhaps most underrated tools.

Radiohead – Separator

There is certainly no shortage of deep dives into Radiohead tracks circulating on the internet. That said, if I had to pick one tune that hasn’t quite gotten the love it deserves, it would be “Separator,” the closer from 2011’s The King of Limbs.

Spencer Kornhaber describes the song in an article as “the happiest Radiohead song ever.” There’s not a lot of competition, but I’m still not quite sure I hear it. What I do find revealing in the article is Kornhaber’s interpretation of the song as a moment of clarity:

“Things appear the same—’exactly as I remember, every word, every gesture’—but something’s changed: A weight has been lifted, and the ‘sweetest flowers and fruits’ surround him.”

Kornhaber sees “Separator” as relieving and transfiguring the gloomy weight of the preceding tracks on the album. While I don’t disagree (even if the epiphany is only bittersweet), I want to focus on how this process takes place within the song itself–without reference to the rest of the album. And there’s plenty to work with here: at five minutes and twenty seconds, “Separator” is the longest track on the record. Rather than focus on the lyrics, I’m especially interested in how Radiohead–along with producer and longtime collaborator Nigel Godrich–use mixing and production techniques to build a narrative and shape the form of a song.

For example, one way of hearing “Separator” is as a process that builds slowly, gradually sprawling into space. The opening is spare, yet by the end individual melodies are overwhelmed by cascading waves of sound. A song that is and is about a transformation. But not everything changes. The drummer’s crisp and contained breakbeat hardly varies at all over the course of the song. Does this mean that the transformation is less complete? Or maybe the drums are an implicit point of comparison, helping us to hear how much things really have changed?

We might listen to that process again. Starting from the beginning, the track unfolds slowly. When the vocals enter around 0:25, the melody seems to rise and fall over a narrow range, circling the same few notes. The bass guitar accompaniment is a syncopated two-bar riff hovering around a single harmony, rather than a chord progression. The drums are relentless. The whole thing starts to feel, well, claustrophobic.

Another factor is the distinctive sound of the vocals. Thom Yorke’s (characteristically mumbled) lead vocals are heavily processed with a delay effect. The delay is at the eighth note: too long to feel natural, but short enough that successive syllables overlap and accumulate. The vocals are also masked by the drums–their roles in the mix seemingly reversed. As a vocal performance, it’s more landscape than protagonist. Without a protagonist, it’s hard to feel like we’re moving forwards.

The first sign of something new on the horizon is the backing vocals in the right channel at 1:39 (and again at 2:15). They are louder than we might expect, competing with the lead vocals for the foreground of the mix. While they are also running through a delay, the delay time is much longer and, at precisely five eighth notes, causes the repetition of the backing vocals to overlap irregularly with the regular beats of the pulse. It is as though the musical texture is slowly being pulled apart.
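A little arithmetic shows why a five-eighth-note delay feels so much less regular than a one-eighth-note delay. The sketch below assumes a 4/4 bar of eight eighth notes (the meter isn’t stated above, so treat that as my assumption) and tracks where each successive echo of a sound on the downbeat lands within the bar.

```python
# Positions are counted in eighth notes; a 4/4 bar holds eight of them.
EIGHTHS_PER_BAR = 8

def echo_positions(delay_in_eighths, repeats):
    """Where successive echoes of a sound on the downbeat fall within the bar."""
    return [(k * delay_in_eighths) % EIGHTHS_PER_BAR
            for k in range(1, repeats + 1)]

print(echo_positions(1, 8))  # → [1, 2, 3, 4, 5, 6, 7, 0]
print(echo_positions(5, 8))  # → [5, 2, 7, 4, 1, 6, 3, 0]
```

The short delay simply steps through the bar in order, while the longer one hops around it, landing on a different part of the bar each time and only returning to the downbeat after eight echoes.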

Not everything that changes, changes gradually though. It is in the wake of this potential unravelling that the lead guitar enters (2:32), ushering in the second half of the song (narratively, if not quite mathematically). As it turns out, a lot seems to hinge on this moment. The sound immediately feels “bigger,” the result of a wider stereo image in the mix and the presence of a longer, more cavernous reverb tail.

As Yorke sings, “If you think this is over, you’re wrong”–a new line and a new melody–the backing vocals return (3:14), rhythmically intertwined. The delay effect has been removed and, even more jarringly, the backing vocals have been panned hard to the left, instead of the right as before. Even though we have heard this melody before, the production alerts us that something, narratively speaking, has changed.

Around 3:30, a spacey wash of guitar swells behind the vocals and begins to gently pulse every quarter note. It’s not actually a delay, but it’s the same gesture: a sound reasserted as it gradually gives way. To me, it sounds like the guitars have picked up, absorbed, and transformed the delay effect that was applied to the lead vocal throughout.

Sure enough, when Yorke returns to the phrase “Wake me up” at 4:13, the eighth-note delay is completely gone. It’s a chilling moment, and, for me, the most compelling transformation of the whole song. Here amidst the fullness of the pulsing guitars Yorke’s voice sounds small and exposed for the first time. The sobriety of hindsight? Resigned acceptance? Something really has changed.

Of course, that something is up for debate, and that ambiguity is what makes multiple listenings of a song like “Separator” so rewarding. Are the pulsing guitars “remembering” the sound of the voice from the opening? Could one hear the moment at 4:13 as the backing vocals standing alone–the echo split from the source? Or, more baroquely, were they really the lead vocals all along, shifting from right to left in the mix in search of the foreground? Play it again, and you never really have to decide.

Music Without Repetition

I recently came across a TED Talk by a mathematician named Scott Rickard who set out to create the “ugliest” piece of music possible. It’s a fun clip, and worth viewing in its entirety.

The basic premise is that “beauty” in music comes from patterns and repetition, and so our perception of “ugliness” results from the lack of these things. And this is basically true. As Rickard points out, the music we love is full of repeated patterns, and as listeners–consciously or unconsciously–we try to predict how these patterns will unfold. This is not to say that good music is endlessly repetitive (though some gets close)–just like a good book or film, there is joy both in the fulfillment of something we saw coming, and also in the surprise plot twist that subverts our expectations. Of course, we can only really have expectations when we are able to recognize a pattern in the first place.

At the end of the video, Rickard shares an original piece of music for piano where he has applied mathematical principles to ensure that no musical material is repeated. Not only is each note on the piano played only once, but the distance (the pitch interval) between each pair of notes is never repeated. Even the amount of time between two notes is unique to each pair.
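One mathematical object with this kind of property is the Costas array, and one way to build it is the Welch construction. Treating this as the method behind Rickard’s piece is my assumption, since the description above doesn’t name it. The property the sketch checks is slightly narrower than “no interval is ever repeated”: no two pairs of notes share both the same spacing in the sequence and the same pitch interval.

```python
from itertools import combinations

def welch_costas(p, g):
    """Welch construction: the powers g^1 ... g^(p-1) modulo a prime p."""
    return [pow(g, i, p) for i in range(1, p)]

def has_unique_displacements(seq):
    """True if no two pairs of entries share both time gap and value gap."""
    seen = set()
    for i, j in combinations(range(len(seq)), 2):
        vector = (j - i, seq[j] - seq[i])
        if vector in seen:
            return False
        seen.add(vector)
    return True

# 89 is prime and 3 is a primitive root modulo 89, so the powers of 3
# give 88 distinct values, one for each key of the piano.
notes = welch_costas(89, 3)
print(notes[:6])  # → [3, 9, 27, 81, 65, 17]
print(sorted(notes) == list(range(1, 89)), has_unique_displacements(notes))  # → True True
```

By contrast, an ordinary scale such as [1, 2, 3] fails the check immediately, because the same step is repeated from one note to the next.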

You’d be hard-pressed to identify any patterns in this music, let alone predict which note was coming next (or when it would appear). It’s played for laughs in the moment, but it also raises an interesting question: would it be possible to use a lack of repetition to produce music that might be, in fact, beautiful?

Usually elements that are not pattern-based occur at an extremely local level in music, meaning that they are very brief in duration, or limited to certain instruments, but not others. For example, even though all popular music is built on top of a steady beat, there are (almost) always moments where the pattern is disrupted. Sometimes the same note of the beat is played on a different instrument for emphasis (for example, using the crash cymbal instead of the hi-hat). Or the rhythm may change to make a passage more exciting, as in a fill at the end of a musical phrase. In both cases, it’s a momentary variation that the listener can compare directly with the pattern preceding and following it–it is meaningful to us to the extent that we can recognize its difference.

In jazz, this concept is taken a step (or two) further. In a typical jazz performance, players take turns playing solos over the same sequence of chords. This means that each section has different melodies and textures–often wildly different–but they are heard in relation to a repeated pattern of chord changes and a melody (the “head”) played at the beginning and end of the performance. It’s another way of finding a balance between the predictable and the unpredictable–what gets repeated, and what does not.

In the 1950s and 1960s, jazz musicians began to explore the limits of this balance. Free jazz, pioneered by Ornette Coleman, John Coltrane, and Cecil Taylor, exemplified one extreme through the use of less repetitive chord changes, more complex melodies, and greater rhythmic flexibility. But even before the emergence of free jazz, Miles Davis’s “Flamenco Sketches” (1959) presented an innovative take on the balance between repetition and non-repetition.

Apart from the opening piano chords, this composition is based entirely on improvised material. There is no identifiable melody that gets repeated–the structure of the piece comes from a sequence of five chords that are looped. However, unlike other jazz compositions in which the rhythm of the chord changes stays the same throughout the piece, in “Flamenco Sketches” the length of time spent on each chord varies according to the preferences of each soloist. It is an example of the principle of non-repetition expanded beyond the local level to shape the overall structure of the piece.

Of course, what is unique about Rickard’s composition is that it is arrived at mathematically, rather than intuitively. In other words, the raw materials of music–pitches, intervals, and rhythms–are organized according to a formula or algorithm whose results can’t be predicted until the formula is actually calculated. As it turns out, there are entire genres of music that use processes like these, often described (quite logically) as “algorithmic music” or “process music.”

One well-known example of process music is Steve Reich’s tape piece “Come Out” (1966). In this piece, multiple copies of the same short recording are played simultaneously on a loop at slightly different speeds. In the first few minutes of the piece, the difference is heard as a change in the quality or timbre of the sound, but a distinct echo quickly becomes audible. Later on, the echoes become more abstract, overlapping with one another in what is often described as a “phasing” effect.

What’s especially interesting about this example is that the basis of the piece is obviously the repetition of a short recording of someone speaking, but because of the process applied, no two repetitions sound exactly the same. Repetition itself, it seems, can be a form of variation under the right circumstances. And likewise, variation can sometimes sound a lot like repetition, based on how it is used. Rickard chose to apply a non-repetitive series to the notes of the piano, but what if the notes stayed the same and it was the intensity of each attack that was varied eighty-eight times? Or the same note was played on eighty-eight different instruments? The perception of repetition or difference changes dramatically based on how it is applied to the music.

Brian Eno employs principles of non-repetition in his album Ambient 1: Music for Airports, often considered the progenitor of “ambient music.” Although the precise techniques vary from track to track, Eno makes extensive use of tape loops, in which a sound is recorded to a “loop” of magnetic tape that spools through the tape player over and over again. Instead of making the tape loop exactly as long as the sounds recorded, however, Eno adds a length of blank tape to each, creating repetitions that are interspersed with long silences.

A further layer of complexity emerges from the fact that Eno uses multiple tape loops at the same time and each tape loop has a different length. Additionally, the particular lengths that Eno chooses for the loops are not factors or multiples of one another. This means that the repetition of each loop is unsynchronized from the others–the beginning of each loop will always line up with a different part of every other loop. As Eno himself describes one track:

“There are sung notes, sung by three women and myself. One of the notes repeats every 23 1/2 seconds…The next lowest loop repeats every 25 7/8 seconds or something like that. The third one every 29 15/16 seconds or something. What I mean is they all repeat in cycles that are called incommensurable–they are not likely to come back into sync again.”
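Just for fun, we can estimate how long it would take three such loops to line up again. The sketch below takes the lengths in the quotation at face value as exact fractions of a second, which is an assumption on my part (Eno himself says “or something like that”), and finds the smallest time that is a whole number of repetitions of every loop.

```python
from fractions import Fraction
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def realignment_time(loop_lengths):
    """Smallest time that is a whole number of repetitions of every loop."""
    common_denominator = 1
    for length in loop_lengths:
        common_denominator = lcm(common_denominator, length.denominator)
    # Express every loop length as a whole number of equal ticks.
    ticks = [int(length * common_denominator) for length in loop_lengths]
    total = 1
    for tick_count in ticks:
        total = lcm(total, tick_count)
    return Fraction(total, common_denominator)

# 23 1/2, 25 7/8, and 29 15/16 seconds, as quoted.
loops = [Fraction(47, 2), Fraction(207, 8), Fraction(479, 16)]
seconds = realignment_time(loops)
print(float(seconds) / 86400, "days")
```

With these values the three loops would only return to their starting alignment after roughly 27 days of continuous playback. Strictly speaking, lengths that can be written as fractions are not incommensurable, but for a listener the difference is academic.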

Just like in “Come Out,” even though the musical material is repetitive–recorded to tape and therefore fixed–the way in which the material repeats leads to a constantly changing musical texture. There is repetition of individual sounds, but their arrangement in relation to one another is always different. It’s certainly not “ugly” music, but in this case, it’s not necessarily repetition that makes it beautiful.