Genre Graph: Comparing Musical Qualities as a Class

This blog post introduces “Genre Graph,” an activity I frequently use in my classes to visualize and examine the relationship between musical qualities and stylistic groupings such as genre. It’s also useful for starting a discussion around how significantly individuals’ perceptions of musical qualities can vary. From a pedagogical perspective, it is highly interactive and promotes independent critical inquiry through the creative use of technology. This post is primarily a description of the activity, and concludes with a simple version of the Python script I use.

The activity has two phases that are normally conducted in two successive class sessions, but which could theoretically be integrated into a single session. In the first phase, students listen to a selection of short musical excerpts and rank them according to musical qualities. In the second phase, the rankings are collated and visualized electronically using a simple Python script, leading to a class discussion of the results. There are many possible variations, which I will describe below. I will conclude this post with an example and explanation of the code I use.

The first step for instructors interested in using Genre Graph in class is to clarify the learning objectives in order to choose appropriate examples. For instance, I frequently use this activity in my introductory electronic music course to survey the wide range of styles, genres, and subgenres that can be considered “electronic music.” Consequently, I might choose examples from genres as disparate as hip hop, glitch music, EDM, musique concrete, disco, and microsound. In other contexts, it might be more appropriate to choose a narrower range of examples, such as in a course focusing on a particular time period or composer. I normally use ten thirty-second excerpts, but this can be easily customized as well.

Next, the students rank the examples according to musical qualities. These can be predetermined, but I find it best to involve the students in choosing the specific qualities. Qualities must be measurable on a continuum rather than as binary categories. For example, students might rate the sense of whether an example has a beat on a scale where 1 is no beat and 10 is a very clear beat. Example criteria that students have come up with in my classes include harmony, density, naturalness, and electronicness.

I like to have the students brainstorm many different possible qualities, then have a class discussion where we narrow it down to just a handful. I prefer three qualities since that can easily be graphed on three-dimensional axes. Usually I’ll have the students complete their rankings at home between class sessions, and in the next class, students compare their rankings with a partner—and revise them if desired. (I will often play a short excerpt of each example as a reminder to spur discussion.) Before jumping into the results, I also always like to see if there are major discrepancies in how different students interpreted different qualities, and discuss how they might have come about.

Finally, I’ll collect students’ rankings electronically, using Google Forms, Zoom, or some other tool. I input them into Excel to quickly get average rankings, and the average rankings are what I end up inputting into Python to plot.
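For anyone who would rather skip the spreadsheet, the averaging step could also be done in Python. This is only a minimal sketch: it assumes each student's response for one quality is a list with one rating per excerpt, and the `average_ratings` helper and the sample numbers are made up for illustration.

```python
from statistics import mean


def average_ratings(responses):
    """Average each excerpt's rating across students.

    `responses` holds one list per student, with one rating per excerpt.
    """
    # zip(*responses) regroups the ratings by excerpt instead of by student
    return [mean(per_excerpt) for per_excerpt in zip(*responses)]


# Hypothetical ratings from three students for three excerpts
rhythm_responses = [
    [2, 9, 5],
    [4, 8, 5],
    [3, 10, 8],
]
rhythm_averages = average_ratings(rhythm_responses)
print(rhythm_averages)
```

The same helper would be called once per quality, giving one list of averages for each axis of the plot.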

The idea behind visualizing the results is to spur a discussion along two lines: (1) how certain musical qualities can be shared between stylistically different examples, and (2) how perception of the qualities of a given example can vary significantly between individuals. I often begin by focusing on examples that are located close to one another in the space, which I refer to as clusters. I follow up by asking what accounts for the similarities that can be seen.

Other questions include asking whether there are any examples that are close together but don’t seem to sound similar, or examples that do sound similar (at least to some), but are far away in the space. If we find instances such as this, I ask students what might account for these seeming disparities. I also ask if any of the clusters we identify might correspond with familiar or recognized genres. And finally, I ask students how hypothetical changes to our criteria—or different criteria altogether—might modify the space.

Here is a simple version of the Python code I use:

from mpl_toolkits import mplot3d  # registers the 3D projection on older Matplotlib versions
import matplotlib.pyplot as plt

rhythm, harmony, density = [], [], []

# Enter the class average for each excerpt, one quality at a time.
# float() accepts decimal averages (e.g. 6.5) as well as whole numbers.
for i in range(10):
    prompt = 'Rhythm Value for Excerpt #' + str(i + 1) + ': '
    rhythm.append(float(input(prompt)))

for i in range(10):
    prompt = 'Harmony Value for Excerpt #' + str(i + 1) + ': '
    harmony.append(float(input(prompt)))

for i in range(10):
    prompt = 'Density Value for Excerpt #' + str(i + 1) + ': '
    density.append(float(input(prompt)))

# Set up a 3D plot with one axis per quality
fig = plt.figure()
ax = plt.axes(projection="3d")

ax.scatter3D(rhythm, harmony, density, color="green")
plt.title("Genre Graph Activity")
ax.set_xlabel('Rhythm', fontweight='bold')
ax.set_ylabel('Harmony', fontweight='bold')
ax.set_zlabel('Density', fontweight='bold')

# Label each point with its excerpt number
point_labels = ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10']

for n in range(10):
    ax.text(rhythm[n], harmony[n], density[n], point_labels[n], size=20, zorder=1)

plt.show()

You can run it in any IDE or through the Terminal on Mac and input the requested values, modifying the names of the parameters as you see fit. In the next post, I will go through the code in more detail, make it more robust and generalizable, and explore customizing the visualizations. I’ll also use a sample data set to illustrate the discussion.

Reinforcement Learning in Max, Part 3

In the last post, I shared my multi-armed bandit simulator and walked through the implementation of five different reward maximization algorithms: (1) uniform, (2) greedy, (3) epsilon-greedy, (4) epsilon-first-greedy, and (5) upper confidence bound. In this post, I will present the results I obtained from my simulations, and compare those results with the trials discussed in the Sutton/Barto.

I ran 1000 simulations of each algorithm, and each simulation comprised 1000 turns. Each of the ten bandits had a different, randomly-determined reward distribution, with the mean ranging from 0.01 to 4.78, and the variance ranging from 0.66 to 2.49. I employed a Gaussian distribution for all ten bandits, using the [randdist] object.

In addition to seeing which bandits were chosen in which order (and how many times)—and how their average estimates changed over time—I averaged several data points from across all of the simulations with each algorithm. The first data point was the average reward: on average, how much is gained per turn over the course of the simulation. A successful strategy would show an increase in the average reward over time through the simulation.

I also wanted to evaluate how frequently the “agent” engaged in optimal behavior. I devised two different ways to measure that. First, I kept track of how often the agent chose the bandit with the highest average estimate. This was not necessarily the bandit with the highest average over the long term; instead, it was simply a measure of how frequently the agent made the best (highest-paying) choice given the available data. The second way of evaluating this was to go by the mean of the distribution programmed into the bandit, rather than the estimate, and therefore to compare the agent’s behavior with what it might choose given perfect knowledge. I will refer to these two different measures as optimal (estimate) and optimal (actual).
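To make the difference between the two measures concrete, here is a small Python sketch (the simulator itself is a Max patch, so this is an illustration rather than the actual implementation). The `is_optimal` helper and all of the numbers are hypothetical.

```python
def is_optimal(choice, values):
    """True if `choice` is the index of the largest entry in `values`."""
    return values[choice] == max(values)


estimates = [1.2, 3.4, 2.9]   # hypothetical running averages at some turn
true_means = [1.0, 3.1, 3.3]  # hypothetical means programmed into the bandits
choice = 1                    # the agent picks the bandit at index 1

optimal_estimate = is_optimal(choice, estimates)   # best according to the data so far
optimal_actual = is_optimal(choice, true_means)    # best according to the real means
print(optimal_estimate, optimal_actual)
```

In this made-up case the choice counts as optimal (estimate) but not as optimal (actual), because the bandit at index 2 has the higher programmed mean.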

My first simulation used the uniform (random) algorithm. This algorithm can be considered a kind of “control,” since it behaves exactly the same throughout the entire simulation. Since it doesn’t have a mechanism for “learning,” we don’t expect it to improve over time, as the graph of the optimal (estimate) under the uniform algorithm below shows.

Given that there were ten bandits, if chosen at random we would expect the choice to be optimal about 10% of the time (1/10), and that is precisely what the graph shows. (As you might expect, the other graphs for the uniform algorithm are similarly static.)

By contrast, the greedy algorithm shows a marked improvement early on, but then quickly levels off. Because the greedy algorithm chooses the bandit with the highest average by definition, the optimal (estimate) graph will show optimal behavior 100% of the time (except for the brief initial exploratory period described in the previous post). More revealing is the optimal (actual) graph, given below.

What’s interesting here is that even though the agent’s behavior is considered optimal according to the running estimate, compared with perfect knowledge of the bandits’ reward distributions, the greedy algorithm finds the highest-paying bandit only a little over 50% of the time. This result emphasizes the importance of exploration: early, randomly high-paying results can strongly influence algorithms that don’t perform adequate exploration, and yield less payoff in the long run. (Compare with the charts on page 29 of the Sutton/Barto.)

The next two algorithms are closely related in that they introduce mandatory exploration in a fixed proportion relative to the number of turns taken. This proportion, known as epsilon or e, is usually relatively small (10% or less); in my simulations, I used e = 0.01 (i.e. 1%). The e-greedy algorithm distributes the exploration uniformly across the run, while the e-first-greedy algorithm performs all of the exploration first. The graph below shows the average reward obtained for each.

As the graph illustrates, both algorithms show clear improvement over time, albeit at significantly different speeds. The e-first approach reaches a relatively high reward quickly and plateaus over the remainder of the run, while the e-greedy approach increases the reward more gradually, eventually converging with the e-first approach. Yet the cumulative reward for the e-first approach will be much higher because the agent has significantly more time to exploit the higher reward.

The final algorithm I tried was the upper confidence bound (UCB) algorithm, which balances exploration and exploitation by also taking into account the number of times a given bandit has been tested. The more a bandit has been tested, the more “confident” the algorithm becomes about its estimate, and the less valuable it is regarded for exploration. The algorithm can be customized by modifying the value placed upon exploration with the constant c. (I used c = 2 in my simulation.)

This graph shows the percentage of runs that exhibit optimal (actual) behavior with the UCB algorithm over time. The graph shows rapid improvement initially, then levels off, similar to a logarithmic function. Yet this approach ultimately plateaus at a much higher level than any of the other algorithms, indicating that the UCB algorithm is effective at determining which bandit actually has the highest average payout. The graph below compares the prevalence of optimal (actual) behavior across all five algorithms.

Yet the true measure of success for the multi-armed bandit problem is the reward obtained. The following graph shows the average reward across all five algorithms.

The baseline average reward represented by the uniform approach is quickly overtaken by the greedy, e-first, and UCB approaches, and more slowly by the e-greedy approach. Over time, all four algorithms converge, though the UCB is most successful. However, as discussed above, this convergence is misleading. As the UCB approach reaches its plateau very quickly, an agent using this approach will accumulate much greater rewards early on, leading to an insurmountable lead in total reward gained, even if the rates at which the rewards subsequently increase are comparable across algorithms. The chart below demonstrates this fact by comparing the average cumulative reward between approaches.

Consequently, even though the different approaches appear to converge, following the UCB approach clearly yields the greatest reward over time. Yet as Sutton and Barto point out, this result holds only for a particular kind of problem, which is in reality a simplification of the more complex problems typically encountered in the real world. Indeed, the multi-armed bandit problem is only a starting point. In future installments of this series, I will cover more advanced reinforcement learning strategies and begin to apply these strategies to the processes of musical creation.

Reinforcement Learning in Max, Part 2

In the previous post, I described the first few steps of my journey exploring reinforcement learning. In this entry, I want to share my first patch: a multi-armed bandit problem simulator. As I mentioned before, this type of problem is akin to having the opportunity to choose between many different slot machines, each with different payout distributions, with the goal of maximizing your overall payout. If you try machines at random (explore), you risk wasting time on low-paying machines; if you stick with just one or two (exploit), you risk missing out on higher-paying machines. An optimal strategy finds a balance between these two extremes.

The second chapter of the Sutton/Barto describes a few different algorithms for balancing between exploration and exploitation. It’s useful to compare different algorithms to see how well they do under different conditions. Accordingly, I wanted to build a patch that would allow me to easily switch between different algorithms. I also wanted to be able to easily run algorithms multiple times to calculate, evaluate, and compare averages over many iterations, rather than just single runs.

I started by designing the bandit generator, using the [randdist] object from the CNMAT externals package as the core. The [randdist] object allows you to specify a distribution type and variance (a.k.a. sigma, or the width of the distribution); I add an offset so that the mean for each machine is slightly different as well. The screenshot below shows the subpatch, which takes the number of bandits to be generated as input and stores the bandits (as distinct distributions defined by type, variance, and mean) in a [coll] object (with indices counting from zero).
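For readers who think more easily in text-based code, the idea behind the bandit generator can be sketched in Python. This is not a translation of the subpatch: the `make_bandits` and `pull` names, the value ranges, and the choice to treat sigma as a standard deviation are all assumptions made for illustration.

```python
import random


def make_bandits(count, rng):
    """Return one dict per bandit, indexed from zero like the [coll]."""
    bandits = []
    for index in range(count):
        bandits.append({
            "index": index,
            "type": "gaussian",
            "mean": rng.uniform(0.0, 5.0),   # hypothetical offset range
            "sigma": rng.uniform(0.5, 2.5),  # hypothetical width range
        })
    return bandits


def pull(bandit, rng):
    """Draw one reward from the bandit's Gaussian distribution."""
    return rng.gauss(bandit["mean"], bandit["sigma"])


rng = random.Random(0)  # fixed seed so repeated runs give the same bandits
bandits = make_bandits(10, rng)
reward = pull(bandits[3], rng)
print(bandits[3], reward)
```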

Next, I built the simulation portion of the patch, which determines how many turns will be taken per simulation (how many “pulls” of the machines in total) and how many simulations will be run. Multiple simulations are important for obtaining averages characteristic of each algorithm (plus the examples in the Sutton/Barto use multiple runs and I wanted to be able to compare my simulations with theirs). This part is quite simple: essentially one [uzi] object nested in another to keep track of the number of turns in each simulation.

After that, I built the bandit-picking portion of the patch. This involves three sub-sections: (1) defining the algorithms (and choosing one); (2) choosing a bandit and determining the reward; and (3) revising the estimate for the chosen bandit by updating the average based on the latest reward.

For the first part, I ended up implementing five different algorithms: (1) uniform; (2) greedy; (3) epsilon-greedy; (4) epsilon-first-greedy; and (5) upper confidence bound. Uniform, or pure exploration, is random throughout and so was simple to implement with a single [random] object. The greedy method, in its “pure” form, requires that the user always choose the bandit with the highest average. However, depending on the starting conditions, this could result in a single bandit being chosen for the entire simulation (e.g. if the starting “estimate” for each is zero, the first bandit pulled will give a reward greater than zero, and from here it is likely that no other bandit will ever be tested). Therefore, I chose to pull each lever once before imposing the greedy rule.
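The modified greedy rule described above might look something like this in Python. It is a sketch only: the `greedy_choice` name is hypothetical, and trying the untested levers in index order is an assumption, since the order of those first pulls is not specified.

```python
def greedy_choice(estimates, counts):
    """Pick the next bandit: untried levers first, then the best estimate."""
    # Pull every lever once before applying the greedy rule
    for index, count in enumerate(counts):
        if count == 0:
            return index
    # After that, always take the bandit with the highest running average
    return max(range(len(estimates)), key=lambda i: estimates[i])


print(greedy_choice([0.0, 0.0, 0.0], [1, 0, 0]))  # lever 1 has not been tried yet
print(greedy_choice([0.4, 1.9, 1.1], [1, 1, 1]))  # all tried, so pick the best average
```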

For epsilon-greedy and epsilon-first-greedy, I allow the user to choose the value for epsilon. (In my trials, I used e = 0.01.) Epsilon (e) refers to the proportion of time spent exploring at random (1-e gives the proportion spent being “greedy”). The difference between these two approaches is that in typical e-greedy, the exploration is randomly interspersed throughout the simulation; in e-first-greedy, the exploration occurs only at the beginning. In other words, in the former case in a simulation of 10,000 pulls with e = 0.01, about 100 pulls scattered at random through the run will be exploration; in the latter case, the first 100 pulls will always be exploration. It’s the same proportion, but the latter should perform slightly better because you reach your best estimates sooner and therefore have more time to exploit the best-paying machine.
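The difference between the two epsilon strategies comes down to when the exploration happens, which a short Python sketch can show. The `choose_bandit` function and its parameter names are hypothetical and are not taken from the Max patch.

```python
import random


def choose_bandit(estimates, turn, total_turns, epsilon, mode, rng):
    """Pick a bandit index under e-greedy or e-first-greedy."""
    if mode == "e-greedy":
        # Explore on a random epsilon fraction of turns, anywhere in the run
        explore = rng.random() < epsilon
    elif mode == "e-first":
        # Explore only during the first epsilon * total_turns turns
        explore = turn < epsilon * total_turns
    else:
        raise ValueError("unknown mode: " + mode)
    if explore:
        return rng.randrange(len(estimates))
    # Exploit: take the bandit with the highest running average
    return max(range(len(estimates)), key=lambda i: estimates[i])


rng = random.Random(1)
estimates = [0.2, 1.5, 0.7]  # made-up running averages
late_choice = choose_bandit(estimates, 500, 10000, 0.01, "e-first", rng)
print(late_choice)
```

With 10,000 turns and e = 0.01, turn 500 is past the exploration window in "e-first" mode, so the call above exploits the best estimate.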

The most challenging algorithm to implement was the upper confidence bound (UCB) algorithm, described in section 2.7 of the Sutton/Barto. This algorithm takes into account the number of times a given machine’s lever has been pulled, and assigns greater “confidence” to estimates with more pulls. As Sutton and Barto write, it is “better to select among the non-greedy actions according to their potential for actually being optimal” (35). A lever with fewer pulls has greater potential for a change in the average, and therefore warrants more exploration than a lever that has already been pulled many times. Just as with the greedy algorithm, I started by pulling each lever once (this is especially important here to avoid a zero in the denominator of the formula). Then I calculated the UCB value for each machine on each turn, and chose the machine that yielded the highest value.
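As a rough illustration of the selection rule, here is a Python sketch based on the formula as I understand it from section 2.7: each machine's score is its current estimate plus c times the square root of ln(t) divided by its number of pulls. The `ucb_choice` name and the sample numbers are hypothetical, and the function assumes every machine has already been pulled at least once.

```python
import math


def ucb_choice(estimates, counts, turn, c=2.0):
    """Return the index maximizing estimate + c * sqrt(ln(turn) / count)."""
    # Assumes every count is greater than zero (each lever pulled once already)
    scores = [
        q + c * math.sqrt(math.log(turn) / n)
        for q, n in zip(estimates, counts)
    ]
    return scores.index(max(scores))


# Made-up state: machine 0 looks slightly better but has been pulled far more often
choice = ucb_choice([1.0, 0.9], [50, 2], turn=52)
print(choice)
```

Here the rarely pulled machine wins because its exploration bonus is much larger; setting c to 0 would reduce the rule to plain greedy selection.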

Once a bandit was chosen, the reward was determined by loading the bandit’s properties (distribution, mean, and sigma) from the appropriate [coll] into the [randdist] object. This reward, along with the bandit index number, was then passed into the averaging formula described in the previous post to update the average estimate for the last bandit based on the most recent reward.

From here, it is possible to collect many different kinds of information about the behavior of the system. In the next post, I will discuss the output section of the patch and compare the performance of the different algorithms according to a variety of measures.

Download patch

Reinforcement Learning in Max, Part 1

This is the first entry in a series of posts dedicated to implementing reinforcement learning algorithms in Max. Reinforcement learning is an area of machine learning I have not previously explored, so these posts will serve as both tutorials and as a way of tracking my own progress.

The idea of using a reinforcement learning strategy for a music-related machine learning project was first suggested to me by a colleague, Daniel Fox. Since it was a new area for me, I started out by doing some research. I found this library for Max, but I ultimately decided to implement the algorithms directly myself. A few especially helpful resources when getting started included:

I started out by looking at the first couple of chapters of the Sutton/Barto. I decided to begin by building a patch that could simulate a multi-armed bandit problem, since that is the problem that begins the book. A multi-armed bandit problem is a category of problem in which one is presented with a number of different options, each of which is associated with some kind of reward. The user (“the agent”) attempts to maximize the rewards gained by employing various strategies.

At the core of these problems is a tradeoff between exploration (trying unknown options) and exploitation (repeatedly making the same choice when a high reward is received). The term “multi-armed bandit” refers to a hypothetical instance of the problem in which one chooses between many different slot machines, each with different payout (reward) distributions. In this metaphor, if the agent explores too little, they may miss out on the highest-paying machine; if they explore too much, they lower their overall reward by wasting time on the low-paying machines.

As I began to read up on this problem, it became clear that I would have to lay groundwork in three major areas: (1) conceptual framing and goals; (2) mathematics; and (3) technical implementation in Max. Of the three, working out the math (and brushing up on my calculus) was definitely the most difficult.

At the core of the algorithms described in the second chapter of the Sutton/Barto was the need to keep a running tally of actions taken and rewards received. This is critical for maintaining a running average for each machine, which allows the agent to determine which machine has the best payouts at each moment.

Yet as the authors point out, over the long run this approach is not computationally efficient. Instead, they introduce a formula for incrementally calculating a running average (see section 2.4), where the average is continually adjusted with each new reward, and previous rewards do not need to be stored. I found their explanation a little confusing at first, but this blog post helped clarify the formula for me. My implementation is saved as inc_avg_demo.maxpat in the download bundle.
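The incremental update is easier to see in a few lines of Python than in a patch. In this sketch the new average equals the old average plus (reward - old average) / n, where n counts the rewards seen so far including the new one; the `update_average` name and the sample rewards are just for illustration.

```python
def update_average(average, count, reward):
    """Fold one new reward into a running average without storing history."""
    count += 1
    average += (reward - average) / count
    return average, count


average, count = 0.0, 0
for reward in [2.0, 4.0, 9.0]:
    average, count = update_average(average, count, reward)
print(average, count)
```

Only two numbers per bandit need to be kept (the current average and the count), no matter how many rewards have been received.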

My patch allows you to compare the incremental calculation with the results from the [mean] object. While similar in function, the [mean] object is not optimal for this application because I have to calculate an average for each of an arbitrary number of bandits. This would require an arbitrary number of [mean] objects or perpetual storage of all previous values to swap in and out (a data set that increases without limit is precisely the kind of thing we want to avoid!). My implementation can be used for multiple bandits with finite data/computation by saving the running average for each, labeling the bandit number, and using [histo], [counter] or similar to keep track of how often a given bandit has been used.

A second issue that arose was the need for random number generation following a particular distribution. While not strictly necessary, using a Gaussian (rather than uniform) distribution allowed me to more closely compare my results with those in the book. I used the [randdist] object from the CNMAT externals package (download via the Package Manager in Max).

The third issue—and the last I’ll discuss in this post—to arise in these early stages was more of a technical concern than a mathematical one. I needed to be able to receive a stream of separated data (of arbitrary length) and output it as a single list. There’s not a Max object that precisely does this: zl.group requires a number to be specified in advance, and zl.reg needs everything at once. (Incidentally, the bach.collect object does this quite nicely, but I didn’t want to invoke bach for a single operation.) I ended up creating an abstraction I call da.can (like a can, you fill it up and then dump it out) that assembles anything it receives into a list, outputs with bang, and resets with a clear message. It’s part of the collection I store here.
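The behavior of the abstraction can be mimicked with a small Python class. This is only an analogy, not how da.can is built: the `Can` class and its method names are hypothetical, and it assumes that a bang leaves the contents in place until a clear message arrives.

```python
class Can:
    """Collect incoming items and dump them as one list on request."""

    def __init__(self):
        self._items = []

    def add(self, item):
        # Any incoming value is appended to the pending list
        self._items.append(item)

    def bang(self):
        # Output a copy of everything collected so far; the contents are kept
        return list(self._items)

    def clear(self):
        # Empty the can so a new list can be assembled
        self._items = []


can = Can()
for value in [60, 64, 67]:
    can.add(value)
collected = can.bang()
print(collected)
can.clear()
```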

In my next post, I’ll share my multi-armed bandit simulator patch, which allows you to test and compare different algorithms for maximizing rewards.

Download patches

Max Tutorial #12: Finally, Polyphony!

In this tutorial, we’ll finally build our first genuinely polyphonic synthesizer in Max! This tutorial will draw on techniques developed in several of the previous tutorials—particularly Tutorial #11. The first thing to mention is that in order to play a polyphonic synthesizer, you need to be able to press more than one key at once. Therefore, this tutorial really requires the use of a MIDI controller to fully appreciate what we’re doing, since we can only click on one key at a time with the cursor.

As before, we’ll begin with a [kslider] taking input via MIDI from a [notein] object. We’ll also use [attrui] to set the [kslider] object’s display mode to “polyphonic.” We send the output of the [kslider] to a new object called [poly], to which we pass the arguments “8” and “1.” The [poly] object is one of several ways of managing polyphony in Max. It takes MIDI note messages and assigns them to a voice by outputting a voice number from the left outlet. When creating a [poly] object, the first argument specifies the maximum number of voices, and the second (optional) argument indicates whether excess notes (beyond the number of voices specified) should be ignored (“0” or no argument) or if they should “steal” voices (“1”). (If you prefer, you can omit the [kslider] and [attrui] and connect the left and center outlets of [notein] to the left and right inlets of [poly] directly.)
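To get a feel for what a voice allocator does, here is a simplified Python model. It is not how [poly] is implemented internally: the `VoiceAllocator` class is hypothetical, and both the round-robin assignment and the policy of stealing the oldest sounding voice are assumptions made for this sketch (a real allocator would also send a note off for a stolen voice).

```python
class VoiceAllocator:
    """Assign incoming notes to numbered voices, counting from 1."""

    def __init__(self, max_voices, steal=False):
        self.max_voices = max_voices
        self.steal = steal
        self.playing = {}    # voice number -> pitch, oldest note first
        self.next_voice = 1

    def note(self, pitch, velocity):
        """Return (voice, pitch, velocity), or None if the note is ignored."""
        if velocity == 0:
            # Note off: free the voice that is holding this pitch
            for voice, held in self.playing.items():
                if held == pitch:
                    del self.playing[voice]
                    return (voice, pitch, 0)
            return None
        if len(self.playing) == self.max_voices:
            if not self.steal:
                return None  # all voices busy, so the extra note is dropped
            # Assumption: take over the voice that has been sounding longest
            voice = next(iter(self.playing))
            del self.playing[voice]
        else:
            # Advance through the voice numbers until a free one is found
            voice = self.next_voice
            while voice in self.playing:
                voice = voice % self.max_voices + 1
        self.playing[voice] = pitch
        self.next_voice = voice % self.max_voices + 1
        return (voice, pitch, velocity)


allocator = VoiceAllocator(2, steal=True)
first = allocator.note(60, 100)
second = allocator.note(64, 90)
third = allocator.note(67, 80)   # both voices busy, so one is stolen
print(first, second, third)
```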

Instead of passing pitch and velocity data separately as we have done previously, we will bundle these values together with the voice number to make polyphonic routing easier. To bundle data into a list format, we can use an object called [pack]. The arguments for [pack] indicate the number of values and data types expected. In this case, we use “0 0 0” as placeholders to indicate three integers: the voice number, the pitch, and the velocity. If we test the output using a message box, when we press a key we can see that the pitch (second number) and velocity (third number) are prefixed by the voice number, which advances each time.

We can use the voice number prefix to route the pitch and velocity information to the appropriate synthesizer voice. The routing object is called [route], with arguments for each expected prefix. With eight voices specified for [poly], we will expect eight different prefixes, corresponding to the numbers 1-8. Once the data has been routed to the appropriate voice (via the outlet corresponding to a prefix), we can unpack the pitch and velocity data using [unpack 0 0]. Once these values are unpacked, we can connect them to our usual synthesizer objects as before.

If we test the synth at this stage, we note that we only hear sound every few notes. The reason is that [poly] routes notes through all eight voices, but we only have an actual synth voice connected to the outlet for the first voice. Consequently, we have to connect synth voices to each outlet of [route]. This is done much more easily if we encapsulate the synth voice. We’ll name it [p synth_voice] and then copy, paste, and connect one copy to each of the [route] object’s outlets. (Note that [route], like [sel] and other objects, has an extra outlet on the right for input values that don’t match any of the arguments. We can ignore this since our number of voices matches our number of arguments.)

Once everything is connected, if we play chords we can finally hear all of the notes at once! However, the sound still has a very rough shape: it starts and stops immediately, producing a choppy, unmusical sound. We’ll finish off this synth by adding a simple attack-release envelope. First, we have to modify the inside of each synth voice. We can do this by modifying one, and then copying and pasting. To open the first synth voice, lock the patch and double-click on the first [p synth_voice] subpatch. We don’t have to make any changes to the pitch chain (on the left); we’re just modifying how loudness changes over time, so we’re concerned with the velocity chain (on the right).

First, we’ll add a [routepass] object with an argument of “0.” The [routepass] object is very similar to [route], except that [route] removes the prefix as it passes a message through, whereas [routepass] keeps the prefix as part of the message. This object allows us to distinguish between note off messages (0) and note on messages (any non-zero value). Note off messages will pass through the left outlet, and note on messages will pass through the right. In order to create smooth attack and release ramps, we’ll use the [line~] object as in previous tutorials. The [line~] object takes two arguments to form a ramp: a destination value, and an amount of time to reach that destination. Therefore, we’ll need to pack two values together using [pack 0. 0].

The first value in each [pack] object is the destination (a decimal between 0 and 1), and the second is the duration to reach that destination, corresponding to our attack or release time. We’ll use the [s] and [r] objects discussed in the previous tutorial to circulate attack and release times through all of the voices of the synthesizer. (We’ll call them “attack” and “release.”) Once we’ve made the necessary changes in one voice, we’ll select that subpatch, copy it, and then select all of the other subpatches and click Edit -> Paste Replace. This is a quick way to replace many subpatches from a single template.
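The envelope logic described above (splitting note offs from note ons, then ramping to a destination over a set time) can be sketched in Python. The function names are hypothetical, and scaling the note-on destination by velocity / 127 is an assumption; the patch may simply ramp to a fixed level.

```python
def envelope_target(velocity, attack_ms, release_ms):
    """Return the (destination, ramp time) pair for a velocity value."""
    if velocity == 0:
        return (0.0, release_ms)          # note off: fade to silence
    return (velocity / 127.0, attack_ms)  # note on: rise to the scaled level


def linear_ramp(start, destination, steps):
    """List the values of a straight-line ramp, ending on the destination."""
    return [start + (destination - start) * i / steps for i in range(1, steps + 1)]


print(envelope_target(127, 50, 400))  # note on at full velocity
print(envelope_target(0, 50, 400))    # note off
print(linear_ramp(0.0, 1.0, 4))       # a four-step attack from 0 to 1
```

The ramp function is a coarse stand-in for what a signal-rate object does sample by sample.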

The final step is to build an interface for setting the attack and release times. We’ll use integer boxes connected to [s] objects. Now if we change these values, we can hear that the changes are automatically and immediately applied to all voices in the synthesizer, giving us a uniform sound. And just like any other polyphonic synth, not only can we hear all of the notes of a given chord, but with a long release time we can hear an overlap between notes played in sequence.

Max Tutorial #11: Modular Synthesis

This tutorial builds on Tutorial #8, in which we first used the keyboard as a synthesizer interface. (If you haven’t watched it yet, I encourage you to check it out first.) We’ll continue to improve our synthesizer by dividing it up into its component parts, or modules. This is the first step towards what is more commonly known as “modular” synthesis, where, instead of thinking of a synthesizer as a single, one-way chain of audio from source to output, we think of the different parts of the synthesizer as building blocks that can be arranged and connected in many different ways. We’ll also start to use MIDI velocity values to control the volume of our synth.

We’ll start off by using the [notein] and [kslider] objects as before. Plug in your MIDI controller if desired. This time, instead of only using the pitch output of [notein], we’ll also use the velocity output (middle outlet). As you’ll recall, MIDI velocity is often used to control volume. So we’ll have a modular system that is also velocity-sensitive: depending on how hard you press the keys, the volume will be louder or softer. (As before, if you do not have a MIDI keyboard, you can click directly on the [kslider] object. Clicking towards the top of each key results in a higher velocity value, and towards the bottom gives a lower velocity value.)

If you’re clicking on [kslider] with the cursor, you’ll notice that each click activates that key, but we don’t have a way of explicitly turning off notes (i.e. sending a note off message). We can enable this functionality by changing the [kslider] object’s display mode. In the past, we have changed attributes like this by using the Inspector. However, we can also use an object called [attrui] to change an object’s attributes. Simply connect [attrui] to [kslider], and the [attrui] menu will populate with the same attributes that appear in the Inspector. Lock the patch to browse the menu. We’ll switch the Display mode from “monophonic” to “touchscreen,” so that now when we unclick a key it sends out a note off message (velocity = 0), which will stop the note.

Now we can proceed with our synthesizer, using the same objects we’ve used before. However, this time we’ll multiply the output of our oscillator [saw~] by the velocity value so as to control its volume. Just as pitch is assigned to a range of 0-127 in the world of MIDI, velocity also ranges from 0-127 (where note off = 0). Next, just as we have to convert MIDI pitch into frequency recognizable by digital audio objects (such as [saw~]), we have to convert MIDI velocity into an amplitude value in the range from 0 to 1. Therefore, we divide by 127: a value of 0/127 gives us 0 (no sound), and a value of 127/127 gives us 1 (maximum volume). We’ll use the [/] (division) object with an argument of “127.” (we need the “.” to ensure that the output is a decimal, since all values will be between 0 and 1).
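The two conversions mentioned here are simple enough to write out in Python. This is a sketch with hypothetical function names; the pitch formula is the standard equal-temperament conversion with A4 (MIDI note 69) tuned to 440 Hz, which is an assumption about the tuning in use.

```python
def velocity_to_amplitude(velocity):
    """Scale a MIDI velocity (0-127) to an amplitude between 0 and 1."""
    return velocity / 127.0


def midi_to_frequency(pitch):
    """Convert a MIDI pitch number to Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


print(velocity_to_amplitude(127))  # full velocity
print(midi_to_frequency(69))       # the A above middle C
print(100 // 127)                  # integer division discards the fraction
```

The last line shows why the decimal point matters: dividing whole numbers with integer division gives 0 for every velocity below 127.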

Now that we have velocity integrated into our synthesizer, we can start to modularize it. One important set of tools for working in modular fashion in Max is the [send~] and [receive~] pair of objects. These allow you to take any audio signal (the output of any object ending in “~”) and route it wirelessly to any [receive~] object with the same argument. In this case, we create objects with the matching argument “synth_sound” and split our audio signal between the output of the synthesizer and the [gain~] object. Now we can move different parts of our synth around in the patch, but they remain connected. (It also helps us keep our patch tidy!)

We can also send control data around the patch wirelessly using the control-rate equivalents [send] and [receive]. We can use them to send our pitch values (we’ll use the argument “pitch”) as well as our velocity values (“velocity”). The argument names I’ve chosen are rather generic; you can name your [send] and [receive] pairs (almost) anything you like, as long as the name does not contain spaces or any restricted characters. We can also use the shortcuts [s] and [r] instead of typing out the whole words “send” and “receive” (just like using [t] for the [trigger] object).

At this point, you’ll probably notice that we have three completely separate elements in our patch: the keyboard, the synthesizer, and the output. They are connected wirelessly, which makes our patch look cleaner, and also allows us to rearrange things as we choose. Now that we’ve divided our patch by function, we can start to think about it from a modular perspective, and encapsulate specific parts of the patch into subpatches. Usually this is useful for groups of objects that aren’t part of the user-facing interface.

Since we need the keyboard to enter notes, and we need the output module to control the volume, let’s encapsulate the internal synthesizer bits ([saw~], [*~], etc.). Simply select the objects you’d like to encapsulate and then click Edit -> Encapsulate (or command+shift+E). You’ll see a new object simply called [p]. It is customary to name subpatches to describe their contents. We’ll call ours [p synth_voice]. Make sure to follow the same naming conventions as for the [send]/[receive] objects (no spaces), and be sure not to get rid of the “p”! The [saw~], [*~], and [receive] objects are no longer visible, but if we test our synth, it still works! Lock the patch and double-click on [p synth_voice] to view its contents. In the next tutorial, we’ll use these techniques to build a polyphonic synth!

Music with Markov Chains

This is a quick tutorial showing how to make music using Markov chains in Python. (If you’re not familiar with Markov chains, this page gives a great, interactive introduction.) The general idea is that we can input melodies of a certain style, and our model will output similar melodies that reflect that style. We’ll use a sequence of integers representing notes in a melody to train our model (i.e. build a transition table), and then compare the output.

First, start up Python and import the following libraries:

import numpy as np
import pandas as pd
import random

Next, we need melodies to use as input to train our model. You can compose them yourself, transcribe them, or use a dedicated toolkit like music21 to extract them from digital score files. Here I’ll just use the first few notes of a couple of familiar melodies. The integers are MIDI note numbers (transposed to C major):

brother_john = [60, 62, 64, 60, 60, 62, 64, 60]
little_lamb = [64, 62, 60, 62, 64, 64, 64]

We want to keep the melodies in separate lists to avoid introducing “false” patterns that never actually occur in the melody between the last note of one melody and the first note of the next. For example, the last note of the first melody is 60 and the first note of the second melody is 64. But the pattern 60 64 never occurs in either melody, so we keep them separate!
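
To see the “false” pattern concretely, here is a small sketch that compares the note pairs found when the melodies are kept separate with the pairs found when they are glued into one list. The helper name note_pairs is my own and is not used elsewhere in this tutorial.

```python
brother_john = [60, 62, 64, 60, 60, 62, 64, 60]
little_lamb = [64, 62, 60, 62, 64, 64, 64]


def note_pairs(melody):
    """Return the set of (current, next) note pairs found in one melody."""
    return set(zip(melody[:-1], melody[1:]))


separate = note_pairs(brother_john) | note_pairs(little_lamb)
joined = note_pairs(brother_john + little_lamb)

# Pairs that exist only because the two lists were glued together
print(joined - separate)  # {(60, 64)}
```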

A Markov chain can be represented through what’s known as a transition table. The transition table specifies how likely we are to move from one state to another. This function will build our transition table:

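def make_table(allSeq):
    # Size the table so that every note number can be a row and column index
    n = max([max(s) for s in allSeq]) + 1
    arr = np.zeros((n, n), dtype=int)
    for s in allSeq:
        # Pair each note (j, the current note) with the one after it (i, the next note)
        for i, j in zip(s[1:], s[:-1]):
            arr[i, j] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')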

Then we call the function with whatever melodies we’d like to include as items in a list (any number of melodies). This builds the transition table, essentially “training” the model:

transitions = make_table([brother_john, little_lamb])
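
If you’d like to check what the model has learned, the same counts can be tallied in plain Python and turned into probabilities. This is an alternative sketch for inspection only, not the function used above; the names counts and probabilities are my own.

```python
from collections import Counter, defaultdict

brother_john = [60, 62, 64, 60, 60, 62, 64, 60]
little_lamb = [64, 62, 60, 62, 64, 64, 64]

# counts[current][next] = how often `next` directly follows `current`
counts = defaultdict(Counter)
for melody in (brother_john, little_lamb):
    for current, nxt in zip(melody[:-1], melody[1:]):
        counts[current][nxt] += 1


def probabilities(current):
    """Turn the raw counts for one note into probabilities that sum to 1."""
    total = sum(counts[current].values())
    return {nxt: c / total for nxt, c in sorted(counts[current].items())}


for note in sorted(counts):
    print(note, probabilities(note))
# 60 {60: 0.25, 62: 0.75}
# 62 {60: 0.25, 64: 0.75}
# 64 {60: 0.4, 62: 0.2, 64: 0.4}
```

So after a 60, for example, the model moves to 62 three times out of four and repeats 60 the rest of the time.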

The next step is to generate a new sequence based on the table. So we need a new function:

def make_chain(t_m, start_term, n):  # trans_table, start_state, num_steps
    chain = [start_term]
    for i in range(n - 1):
        # Look up the column for the current note and sample the next note from it
        chain.append(get_next_term(t_m[chain[-1]]))
    return chain

Inside make_chain, each step calls the following helper function, which must be defined before make_chain is called:

def get_next_term(t_s):
    # Weighted random pick: each possible next note is weighted by its count
    return random.choices(t_s.index, t_s)[0]
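
To get a feel for what the weighted pick does, here is a stand-alone sketch using only the counts for what follows note 64 in our two melodies (60 twice, 62 once, 64 twice). The fixed seed and the number of draws are arbitrary choices of mine so that the sketch is repeatable.

```python
import random
from collections import Counter

# What follows note 64 in the two example melodies, and how often
next_notes = [60, 62, 64]
weights = [2, 1, 2]

random.seed(0)  # fixed seed so the sketch is repeatable
draws = random.choices(next_notes, weights, k=1000)

# Notes with a higher count come up proportionally more often
print(Counter(draws))
```

In the real transition table the column for 64 has a row for every note number, but all of the other rows have a weight of zero and so are never chosen.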

And finally, we’re ready to create our chain by calling the function using three arguments (transition table name, starting value, and length of sequence):

make_chain(transitions, 60, 10)

And we get something like this:

>>> [60, 60, 62, 64, 60, 62, 64, 60, 60, 60]

Try it a few times to see what kind of results you get. Switch up the starting value and train it on more melodies! Have fun!

Experimental Genre Associations

This post summarizes one part of a larger digital humanities project on the use of the term "experimental" to describe music. For more on this project--including the data and code--see my GitHub.

The term “experimental” is used in discussions of musical genre in two contradictory ways: (1) to describe music that does not fit into any existing category, and (2) as a qualifier to describe music that occupies an aesthetically marginal position within a category (similar to the term “avant-garde”). To better understand the latter usage, I designed this quantitative project to identify the genre associations of musicians considered to be “experimental” using data from Wikipedia. I found that experimental musicians were most likely to also be categorized under rock music.

Because “experimental” is not consistently recognized as a genre in and of itself, instead of using genre labels I used a list of 184 experimental musicians from this page.

I used BeautifulSoup to parse the list and obtain the web addresses for each musician’s Wikipedia page. By scraping each musician’s Wikipedia page, I generated a list of 554 genre labels comprising 159 unique entries. I removed 93 results of “None,” in addition to one entry that was an editorial indication rather than a genre (“Edit section: Genres”), resulting in a dataset of 460 labels, of which 157 were unique.

As a final step, I consolidated the 157 unique labels into a handful of larger genre categories using substring matching. I began with well-established genre labels including hip hop/rap, pop, classical, rock, jazz, electronic, and dance. For hip hop/rap I combined the results of the substrings “hip hop” and “rap”; for electronic I used the substring “electro” (rather than “electronic”) so as to encompass words like “electroacoustic.” Next, I analyzed the list to determine if other terms were especially prevalent, and therefore warranted consolidation so as to be compared with the larger categories. I found that the terms metal, punk, and industrial were especially prevalent, and added these as well for a total of ten categories for comparison.
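
The consolidation step can be sketched roughly as follows. The labels here are made up for illustration and are not the project’s data, only a few of the ten categories are shown, and the choice to let one label count toward more than one category is my assumption for this sketch rather than a description of the project’s code.

```python
from collections import Counter

# Hypothetical labels for illustration only; not the project's data
labels = ["Experimental rock", "Electroacoustic", "Hip hop", "Rap rock",
          "Post-punk", "Electronic", "Free jazz"]

# Category name -> substrings that count as a match
categories = {
    "hip hop/rap": ["hip hop", "rap"],
    "rock": ["rock"],
    "electronic": ["electro"],  # also catches "electroacoustic"
    "punk": ["punk"],
    "jazz": ["jazz"],
}

tally = Counter()
for label in labels:
    lowered = label.lower()
    for category, substrings in categories.items():
        # Assumption: a label may count toward several categories
        if any(s in lowered for s in substrings):
            tally[category] += 1

print(tally)
```

Note that substring matching is deliberately loose: “Rap rock” counts toward both hip hop/rap and rock here, which may or may not be what you want in your own analysis.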

Musicians considered to be experimental were most likely to be associated with rock music subgenres by a wide margin, followed by electronic, pop, and metal. Remarkably, the number of rock-related labels (88) exceeded even the number of instances of the label “experimental” (70).

Prevalence of Genre Labels

Of course, it bears mentioning that the sample size for this project is extremely small, and was drawn from a list that was generated manually, rather than automatically. Consequently, the list may be especially vulnerable to systemic biases of Wikipedia editors. Nevertheless, this brief study is a starting point for better understanding the application of an especially ambiguous term.

Max Tutorial #10: Randomness as Control

In the previous tutorial, we used [random] to generate a random stream of notes. As you’ll recall, the possible outcomes were a function of two elements working together: a range determined by an integer into the right inlet of [random], and an offset determined by the [+] object. Because the random numbers represent the actual pitches (via MIDI note numbers) of the output, we can say that we are using randomness to generate musical material. Consequently, we would characterize this approach as “generative.”

Another way of using randomness is to “control” other parts of the patch. Instead of generating material, we use randomness to point to specific values that are already stored in other parts of the patch. One of the ways that we have previously stored values is in the sequencer patch, using the [multislider] object. We’ll start off the video by recreating the simple sequencer that we built in an earlier tutorial: using [counter] to control an eight-step sequencer where the pitches are determined by the values of [multislider]. Note that although this patch is technically simple and very similar to previous tutorials, the switch to thinking in terms of “control” functions involves some conceptual explanation (hence the length of this write-up).

We’ll recall that [counter 1 8] will always count up from 1 to 8 and then loop back to 1 again. This pattern ensures that we always move through the sequencer in exactly the same way, from step 1 to step 8. By using randomness, however, we can jump between steps unpredictably, meaning that we play the same notes as before, but in a constantly changing order. We’ll start off with a [random 8] object in place of the [counter]. However, since [random 8] gives us values between 0 and 7, we need to add a [+ 1] offset for the correct range of 1 to 8 (this is the range that [multislider] is expecting).
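
For readers who think more easily in code, the same idea can be sketched in Python. This is an analogy rather than a translation of the patch: the list steps stands in for the values stored in [multislider], and the pitches in it are invented for the example.

```python
import random

# Eight stored pitches, standing in for the values held in [multislider]
# (these particular notes are made up for the example)
steps = [60, 62, 64, 65, 67, 69, 71, 72]


def random_step():
    """Mimic [random 8] followed by [+ 1]: a step number from 1 to 8."""
    return random.randrange(8) + 1


for _ in range(8):
    step = random_step()
    pitch = steps[step - 1]  # Python lists count from 0, so shift back down
    print(step, pitch)
```

The random number never becomes a pitch itself. It only selects which stored pitch to play, which is what makes this a “control” use of randomness.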

If we start up [metro] again, we hear a scrambled version of the original pattern that is constantly changing. Again, the notes are the same, but the order is different. We can take things a step further by customizing the “kind” of randomness we want. For example, when we think of something that sounds random, we often think of something that doesn’t repeat. The [random] object has no constraints on repetition: each random value is generated independently of the previous value, so it is perfectly possible to produce the same number twice (or, rarely, even more times) in a row. In musical terms, this means we hear the same note multiple times in a row—something we hear several times in this example.

If we want to specify random number generation without repetition, we can use a different object called [urn]. The [urn] object is similar to the [random] object, except that once it outputs a value it does not output that value again until it is reset. For [urn 8], that means that we’ll hear exactly 8 notes, and then silence. In order to continue to hear more notes, we must reset the object. Luckily, [urn] is built to make this easy.

When [urn] has gone through all of its notes, it sends a bang out its right outlet. This is a signal that [urn] needs to be reset or there will be no more output. To reset [urn], we have to send a message comprising the word “clear” to the left inlet. Therefore, to smoothly reset [urn] each time it runs through its entire range of numbers, we want to connect the right outlet with a clear message sent to the left inlet. This is one of the few exceptions to the rule that patch cables should never go “up” on screen. Here, the output of the [urn] object is actually feeding back into the input (albeit in a very specific way so as not to produce a feedback loop).

In making this feedback connection, we actually have to do two things: we have to reset the object with “clear,” but then we also have to send a bang if we want the rhythm to remain consistent. If we don’t send a bang here, we will have a silent beat each time we reset the object, since [urn] sends a bang out of the right outlet only after it has finished outputting all of its numbers from the left.

This is also a moment where the order of operations is very important. In the space of a single step of the sequencer, we must first clear the object, and then send the bang through. If the order were reversed, we would create a feedback loop in which the bang out the right outlet was continually passed into the left inlet to create a new bang out the right outlet, never actually resetting the object. For this reason, we use the trigger object [t] to force the order.

Recall that the [t] object executes from right to left: whatever we want to happen first must be to the right, and whatever last must be to the left. Therefore, we want our clear message on the right, and our bang on the left. One handy thing about the trigger object is that, in addition to passing data through (by using letters like “i” and “b”), the [t] object can also send one-word messages, like “clear.” Therefore, instead of connecting [t] to a separate message box, we can actually just type the word “clear” into [t] as shown.

The final object should be [t b clear], with both outlets connecting to the left inlet of [urn]. This ensures that each time [urn] runs through its range, it is automatically reset and a new sequence of eight random values begins again. Note that these values refer to the steps of the sequencer, meaning that they are “controlling” the sequencer, rather than generating musical material such as pitches. This will also prevent (almost) any possibility of repetition amongst the notes.
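
As a rough Python analogy for the self-resetting [urn], here is a small class that hands out each value once and refills itself when it runs dry. The class is my own sketch and is not how Max implements the object; like [random 8], it counts from 0.

```python
import random


class Urn:
    """Draw each value in range(size) once, then refill automatically."""

    def __init__(self, size):
        self.size = size
        self.remaining = []

    def draw(self):
        if not self.remaining:
            # Equivalent of sending "clear" and then passing the bang through
            self.remaining = list(range(self.size))
            random.shuffle(self.remaining)
        return self.remaining.pop()


urn = Urn(8)
first_cycle = [urn.draw() for _ in range(8)]
second_cycle = [urn.draw() for _ in range(8)]

print(sorted(first_cycle))                 # [0, 1, 2, 3, 4, 5, 6, 7]
print(first_cycle[-1] == second_cycle[0])  # occasionally True
```

The last line shows where the “almost” comes from: within a cycle nothing repeats, but the final value of one cycle can happen to match the first value of the next.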

Max Tutorial #9: Making Music with Randomness

This tutorial is a brief introduction to making music using randomness. First of all, it cannot be overstated that there are many different ways to use randomness to make music, and each method can yield incredibly different results. Randomness can be used to create a sense of chaos and unpredictability when applied in a certain way, but it can also be used to recreate the feel of human presence when employed as a balance against rhythmic quantization.

We’ll start with the [random] object, which is the main random number generator in Max. It has two inlets: on the right, it takes an integer to set the range, and on the left, it takes a bang to output a random number within that range. The range set by the integer has one important caveat: [random] begins counting from 0, not 1! This means that if we set the range to 10, as in the video, the possible numbers are 0 through 9, not 1 through 10. Likewise, by setting the range to 12 (as in the 12 notes of the chromatic scale), we get the numbers 0 through 11, and never 12 (though as we are counting from 0, there are still 12 possible results).

This is a little counter-intuitive at first, but it’s not actually a limitation. By using what’s called an “offset” we can generate random numbers in any range we like. We establish an offset by using the addition object, [+]. The [+] object adds two numbers together. The number that triggers the calculation goes in the left inlet, which in this case is our random number. It is added to another number that should already be loaded in, either by typing after the “+” (as I have done, preceded by a space), or by sending an integer into the right inlet. (Note that even if you have typed a number into the object itself (like “60”), if you later send a number into the right inlet, the new number will replace 60 in the calculation even though 60 doesn’t disappear.)

In the video, the range of [random] is set to 12 with an offset of [+ 60]. This means that the random numbers will be in the range of (0+60) to (11+60), or 60 to 71. In musical terms, this corresponds to the MIDI note numbers in the octave above middle C (MIDI note number 60). For all of the notes in the first two octaves above middle C, we can simply double the range of [random] to 24 without making any changes to the offset. By plugging this simple combination of objects into the synthesizer we have already built in previous tutorials, we can make a simple random pitch generator. Finally, we can add a [metro] to generate random numbers automatically instead of manually clicking on the button each time.
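
The arithmetic of range plus offset can be checked with a short Python sketch. The function name and its default values are my own, chosen to mirror [random 12] feeding into [+ 60].

```python
import random


def random_pitch(note_range=12, offset=60):
    """Mimic [random 12] into [+ 60]: a MIDI note from offset to offset + note_range - 1."""
    return random.randrange(note_range) + offset


one_octave = [random_pitch() for _ in range(1000)]
two_octaves = [random_pitch(note_range=24) for _ in range(1000)]

print(min(one_octave), max(one_octave))    # always within 60 to 71
print(min(two_octaves), max(two_octaves))  # always within 60 to 83
```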

As an aside, you may have noticed that as the patches get more complicated, I have started paying more attention to the layout of the objects on screen. In general, it is good practice to design patches that are easy to read, and that depict the flow of information as clearly as possible. Take these three principles into consideration as you move forward:

1. Straighten patch cables by clicking on them and pressing command+Y. You can continue to drag cables around after straightening them, but straight lines are generally visually easier to follow than curved ones. In general, short cables are easier to follow than long cables (though sometimes long cables are unavoidable).

2. Connect objects in sequence from top to bottom. When patching, think about how information flows: it’s usually from the outlets (on the bottom) of one object to the inlets (on the top) of another object. This means that for the cleanest layout, objects that receive information should be placed below the objects sending information. If we reflect on the patch in the video, we can see a clear cause-and-effect relationship from top to bottom: (1) the toggle turns on the metronome, (2) the metronome outputs a bang, (3) the bang generates a random number, (4) the random number has an offset applied, (5) the random number triggers an envelope and changes the frequency of the oscillator, (6) the oscillator and envelope are multiplied together, and (7) the resulting sound is sent to the speakers.

3. Finally, also consider expanding objects horizontally to make your patch more legible by shortening cable length. When the patch is unlocked, you can drag the corners of any object to make it wider or narrower. In the patch in the video, because the [function] object is large, I prefer it to be separated off to the right from other elements. By dragging the [t i b] object to be wider, I ensure that the patch cable going to the function object is straight and as short as possible. I expand the [*~] object below in complementary fashion as well.