Building a Neural Net

In the previous post, I described the process by which I reached the question I’d like to use an artificial neural network to answer: whether a given major triad in pitch space (i.e. any voicing) is a C major triad. In this post, I’ll describe the implementation.

As I mentioned in the previous post, I used this tutorial as the basis for creating my own neural network. In the tutorial’s example, the input layer has three nodes. Since I will be working in the pitch space defined by the MIDI note numbers 0-127, I need 128 nodes in my input layer. Each of these nodes has a value of 1 if that note is contained within the chord, and 0 if not. This is an example of what’s known as one-hot encoding. My output consists of a single node, whose value will approach 1 if the input is recognized as a C major triad and 0 if it is not.
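To make the encoding concrete: a close-position C major triad starting on middle C occupies MIDI notes 60, 64, and 67, so its input vector has 1s at those three indices and 0s everywhere else. A quick sketch:

import numpy as np

# one-hot encoding of a C major triad on middle C (MIDI 60, 64, 67)
chord = np.zeros(128)
chord[[60, 64, 67]] = 1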

While it’s easy to generate examples by hand when each sample has only three nodes, with 128 nodes I needed a way to automate example generation. I wrote the following function (using the numpy and random libraries) to generate a training set of arbitrary size containing major triads in a variety of voicings, inversions, and pitch cardinalities:

import random
import numpy as np

def make_training_set(set_size):
    arr = np.zeros((set_size, 128))
    for row in arr:
        transpose = random.randint(1, 12)  # pick a random root for this chord
        for i in range(len(row)):
            # keep each octave occurrence of the root, third, or fifth
            # with probability 1/4, so voicings and cardinalities vary
            if (i + transpose) % 12 in (0, 4, 7) and random.randint(1, 4) == 1:
                row[i] = 1
    return arr
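One consequence of the random inclusion is worth noting: each octave occurrence of a chord tone is kept independently with probability 1/4, so a typical row contains a few copies each of the root, third, and fifth, but a row can occasionally omit a chord tone entirely (or even be empty). The labeling function below simply classifies such rows as not C major.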

Each example in the training set must be accompanied by the correct answer so that the model can “learn” by shifting weights and minimizing the error. This is also easily automatable, along the same lines as described in the previous post. This function takes the training set as input and generates a list of outputs in the same format used in the tutorial (i.e. a numpy array comprising a single column of 0s and 1s):

def make_output_list(training_list):
    temp_list = []
    for each_row in training_list:
        possible_triad = []
        for note in range(len(each_row)):
            if each_row[note] == 1:
                possible_triad.append(note % 12)  # append pitch class
        possible_triad = sorted(set(possible_triad))  # remove duplicates and sort
        if possible_triad == [0, 4, 7]:  # if C major...
            temp_list.append(1)  # ...output 1
        else:
            temp_list.append(0)  # if not, output 0
    # stack the answers into a single-column numpy array of 0s and 1s
    return np.array(temp_list).reshape(-1, 1)

The code in the tutorial for the neural network itself is highly generalizable, and needed only one tweak in the __init__ function to adapt it to the new input layer size. This line, which initializes the weight for each node at 0.50:

self.weights = np.array([[.50], [.50], [.50]])

Becomes this:

self.weights = np.full((128, 1), 0.50)  # a 128 x 1 column of 0.50s

Again, because we are working with much larger samples (128 nodes vs. 3 nodes in the tutorial), it makes sense to automate the array of weights rather than specifying them manually. It’s worth pointing out that in many cases, weights are initialized randomly, rather than all at a particular value. I haven’t tested whether this makes a difference in the performance of the ANN, but it might be worth exploring in the future.
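If you’d like to try random initialization, a one-line variant using numpy’s random generator would swap the line above for something like:

self.weights = np.random.rand(128, 1)  # uniform random weights in [0, 1)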

I decided to organize the code as a script that would ask for the training set size from the user and then automatically generate the training set and train the model. The basic structure is given as follows (import statements and the definition of the NeuralNetwork class are omitted; a rough sketch of the class follows the script):

print('Enter size of desired training set: ')
training_set_size = input()

inputs = make_training_set(int(training_set_size))
outputs = make_output_list(inputs)

NN = NeuralNetwork(inputs, outputs)
NN.train()
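The class itself comes straight from the tutorial, so I won’t reproduce it verbatim here, but for readers without the tutorial handy, a minimal single-neuron version might look something like the sketch below. The sigmoid activation, the error-weighted update rule, and the default epoch count are my assumptions based on the tutorial’s general approach, not its exact code:

import numpy as np

class NeuralNetwork:
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs
        self.weights = np.full((128, 1), 0.50)  # one weight per input node

    def sigmoid(self, x, deriv=False):
        if deriv:
            return x * (1 - x)  # assumes x is already a sigmoid output
        return 1 / (1 + np.exp(-x))

    def train(self, epochs=25000):
        for _ in range(epochs):
            prediction = self.sigmoid(np.dot(self.inputs, self.weights))
            error = self.outputs - prediction
            # nudge each weight in proportion to its input and the error
            self.weights += np.dot(self.inputs.T,
                                   error * self.sigmoid(prediction, deriv=True))

    def predict(self, new_input):
        return self.sigmoid(np.dot(new_input, self.weights))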

The last thing I added to the script was a function that would make generating examples for the prediction (testing) phase easier. While one-hot encoded data is highly machine-readable, it is not very human-readable. Therefore, I created a function that allows a user to input the MIDI note numbers, and then automatically converts these into one-hot format when the example is passed to the neural network as input:

def create_example(note_list):
    ex_out = np.zeros((1, 128))
    for midi_nn in note_list:
        ex_out[0][midi_nn] = 1  # mark each listed MIDI note as present
    return ex_out

When you run the script cmaj_or_not.py (in the enclosed ZIP file), you will first be prompted to set the size of the desired training set. I started out with a training set of 1000 examples—a relatively small set as far as machine learning goes. Depending on the size of the set and your computer setup, running the script may take a few seconds to a few minutes.

Once it’s complete, you can generate examples to test using the neural net. Some examples are provided in the accompanying example_usage.py document. For instance, let’s try a closely spaced C major chord with the root on middle C (don’t forget to call the “create_example” function to correctly encode your input):

example_1 = create_example([60, 64, 67])

Then use the predict() method to see whether the neural net recognizes this example as a C major chord or not (remember, values closer to 1 mean more likely C major; values closer to 0 mean more likely not):

NN.predict(example_1)

We get a prediction value above 0.99, very close to 1, and therefore a very confident prediction that this chord is C major.

Now let’s try a chord that’s definitely not a triad of any kind—in fact, it’s a dissonant cluster:

example_2 = create_example([59, 36, 49])

NN.predict(example_2)

Our prediction value is 0.00008927—a value extremely close to zero—indicating confidence that this example is not a C major triad.

A third example, a D major triad, also produces a low prediction value of 0.000178:

example_3 = create_example([62, 66, 69])

NN.predict(example_3)

In other words, it appears that our neural net basically works. The next step will be optimizing various parameters, including the size of the training set and the number of epochs of training. We can also track the error over time to see how quickly the model learns; a quick sketch of that follows.
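Assuming a class along the lines of the sketch shown earlier, one way to do this would be a hypothetical helper that runs the same update loop as train() but records the mean absolute error at each epoch (matplotlib is assumed for the plot):

import matplotlib.pyplot as plt

# Hypothetical helper: same update rule as the train() sketch above,
# but records the mean absolute error after each epoch.
def train_with_history(nn, epochs=25000):
    history = []
    for _ in range(epochs):
        prediction = nn.sigmoid(np.dot(nn.inputs, nn.weights))
        error = nn.outputs - prediction
        nn.weights += np.dot(nn.inputs.T, error * nn.sigmoid(prediction, deriv=True))
        history.append(np.mean(np.abs(error)))
    return history

plt.plot(train_with_history(NN))  # note: this continues training NN
plt.xlabel('epoch')
plt.ylabel('mean absolute error')
plt.show()

Until the next post, enjoy exploring the code below: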

Download the code bundle here.