Early adventures in sonification: creating MIDI files from data

 

Turning data into sound

This is all Jamie Whyte’s fault. He published some R code which takes some input data and turns it into a sound file. He also demonstrated this at a webinar for Open Data Manchester which I attended. The session was recorded and is available to watch at your leisure.

Jamie’s script was based on a dataset of average global temperatures per year from 1880. As a plot it looks like this:

Average global temperatures per year


Jamie’s script takes each y value and then assigns it a frequency based on some simple arithmetic. He then creates an audio file of these frequencies which sounds like this:

That maps very clearly onto the chart, though it is not exactly beautiful. Jamie said he was interested in turning this into music, and so far it’s not very musical.
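To make the idea concrete, here is a minimal sketch of that kind of mapping. It is written in Python rather than R, and it is not Jamie's actual arithmetic: the linear rescaling, the 220 to 880 Hz range, the function name and the sample values are all assumptions made for illustration.

```python
def value_to_frequency(value, lo, hi, f_min=220.0, f_max=880.0):
    """Linearly rescale a value in [lo, hi] to a frequency in [f_min, f_max] Hz.

    The frequency range is an arbitrary choice for this sketch.
    """
    if hi == lo:
        # Flat data: avoid dividing by zero and sit on the lowest pitch.
        return f_min
    fraction = (value - lo) / (hi - lo)
    return f_min + fraction * (f_max - f_min)


# Made-up temperature values, only to show the shape of the calculation.
temps = [-0.2, -0.1, 0.0, 0.3, 0.9]
freqs = [value_to_frequency(t, min(temps), max(temps)) for t in temps]
```

Every y value gets its own frequency, which is why the result follows the chart so closely and also why it slides about rather than landing on notes.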

Lost in music

Western music doesn’t use the full range of frequencies. Instead we have a series of distinct frequencies with gaps between them (notes).

So the first thing to do with the data to make it more musical was to force the data into a set of notes. First of all I spread the data out along two octaves of the musical scale and assigned each value a note based on the nearest semitone (the smallest step in Western music). That sounds like this:

A little more musical but not THAT musical.
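The semitone step can be sketched like this. It is a Python illustration rather than the R script in my fork, and it assumes notes are identified by MIDI note numbers (where 60 is middle C and each semitone adds 1); the function name and defaults are hypothetical.

```python
def value_to_semitone(value, lo, hi, base_note=60, octaves=2):
    """Spread [lo, hi] across `octaves` octaves and round to the nearest semitone.

    Notes are MIDI note numbers; base_note=60 (middle C) is an assumption.
    """
    span = 12 * octaves  # 12 semitones per octave
    if hi == lo:
        return base_note
    fraction = (value - lo) / (hi - lo)
    return base_note + round(fraction * span)


# Lowest value lands on the bottom note, highest on the top note.
notes = [value_to_semitone(v, 0, 10) for v in (0, 5, 10)]
```

With two octaves there are 25 possible notes, so the tune still follows the data quite closely but now only ever plays pitches that exist on a keyboard.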

Because, actually, in Western music we don’t use all of the semitones at the same time. Instead we select a subset of semitones to form a musical scale.

Using basically the same script we can force the data to the nearest note in the C Major scale over 2 octaves.

That sounds like this:

Much more tuneful. Though not, exactly, musical.
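Forcing values onto a scale could look something like the following. Again this is a Python sketch under assumptions, not the code in my fork: it uses MIDI note numbers, starts at middle C (60), and snaps to whichever scale note is closest in pitch, with ties going to the lower note.

```python
# Semitone offsets of the major scale within one octave: C D E F G A B.
C_MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]


def c_major_notes(base_note=60, octaves=2):
    """List the C major notes from base_note up to and including the top C."""
    notes = [base_note + 12 * o + s
             for o in range(octaves)
             for s in C_MAJOR_STEPS]
    notes.append(base_note + 12 * octaves)
    return notes


def value_to_scale_note(value, lo, hi, scale):
    """Rescale value onto the scale's pitch range, then snap to the nearest note."""
    if hi == lo:
        return scale[0]
    target = scale[0] + (value - lo) / (hi - lo) * (scale[-1] - scale[0])
    # min() keeps the first of two equally close notes, i.e. the lower one.
    return min(scale, key=lambda n: abs(n - target))


scale = c_major_notes()
tune = [value_to_scale_note(v, 0, 10, scale) for v in (0, 5, 10)]
```

Swapping `C_MAJOR_STEPS` for another list of offsets would give a different scale with the same code.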

Now for the chorus

Forcing the data onto a scale makes it more tuneful but the actual sound is a bit brutal. Jamie’s script used a fairly simple sine wave to generate the sound. We could play about with that code to add different complexities to the sound (like playing about with synthesisers in the good old days). That might be fun but there should be a better way.

And there is.

MIDI is a way of telling (electronic) musical instruments when to play and which notes to play without encoding the sound directly. This means that a computer (for example) can take a MIDI file and have any instrument play the notes.

MIDI files are a bit of a pig to work with though and far beyond my skills.

Luckily John Walker has been here way ahead of me. He has produced a couple of utilities that allow you to convert MIDI files to CSV and back again.

Working with CSV files is very much more my speed. So I took the version above and output it as a MIDICSV file. The nice utility converted it to a MIDI file and GarageBand on my MacBook made it sound like this:

The opportunities to play with this are quite large. Before I get on with this I wanted to write up where I’ve got to.
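For anyone curious what the CSV looks like, here is a minimal Python sketch that writes a list of MIDI note numbers in the record layout the midicsv tools read. The function name, the one-note-per-beat timing, the tempo and the velocity are assumptions for illustration and not what my script necessarily does.

```python
def notes_to_midicsv(notes, ticks_per_beat=480, tempo_us=500000,
                     channel=0, velocity=80):
    """Return MIDICSV text that plays each note for one beat, back to back.

    Track 1 carries the tempo (microseconds per beat); track 2 carries the notes.
    """
    lines = [
        f"0, 0, Header, 1, 2, {ticks_per_beat}",  # format 1, two tracks
        "1, 0, Start_track",
        f"1, 0, Tempo, {tempo_us}",
        "1, 0, End_track",
        "2, 0, Start_track",
    ]
    time = 0
    for note in notes:
        lines.append(f"2, {time}, Note_on_c, {channel}, {note}, {velocity}")
        time += ticks_per_beat
        lines.append(f"2, {time}, Note_off_c, {channel}, {note}, 0")
    lines.append(f"2, {time}, End_track")
    lines.append("0, 0, End_of_file")
    return "\n".join(lines) + "\n"


csv_text = notes_to_midicsv([60, 62, 64])
```

Saved to a file such as tune.csv, this should convert with `csvmidi tune.csv tune.mid`, and the resulting MIDI file can be opened in GarageBand or any other MIDI player.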

What’s next

One thing that we can do with this approach is to bring in multiple tracks. We could play a drum every ten years to give the listener an idea of time. The data being sonified is actually smoothed. We could add another track playing the raw data, maybe a piccolo could play that…
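A drum track along those lines might be sketched as below. This is a hypothetical Python illustration, assuming one year per beat, the General MIDI convention that channel 10 (written as 9 in MIDICSV, which counts from zero) is percussion, and note 36 as a bass drum. The track count in the file's Header record would also need to go up by one for each track added.

```python
def drum_track_lines(n_years, track=3, every=10, ticks_per_beat=480,
                     drum_note=36, velocity=100):
    """Return MIDICSV records for a drum hit every `every` years.

    Assumes one year per beat, so year N starts at N * ticks_per_beat.
    """
    channel = 9  # General MIDI percussion channel, zero-based
    hit_length = ticks_per_beat // 2
    lines = [f"{track}, 0, Start_track"]
    for year in range(0, n_years, every):
        start = year * ticks_per_beat
        lines.append(f"{track}, {start}, Note_on_c, {channel}, {drum_note}, {velocity}")
        lines.append(f"{track}, {start + hit_length}, Note_off_c, {channel}, {drum_note}, 0")
    lines.append(f"{track}, {n_years * ticks_per_beat}, End_track")
    return lines


drum_lines = drum_track_lines(25)  # hits at years 0, 10 and 20
```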

Obviously other datasets with several variables could be represented with one track per variable.

That might sound a bit like a cacophony, and that brings me to the area I am least confident with. Forcing the data onto, for example, a C Major scale makes it sound pleasant and is quite accessible. But scales are non-linear (some of the jumps between notes are 2 semitones, others are 1 semitone). So are we fairly representing the change in the y values this way?

I’m also interested in playing about with music in slightly different ways. Musical notes can be thought of as a set of ratios, and using music to represent the ratios between different y values might be another interesting area to explore.
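A quick way to see the unevenness is to list the gaps between neighbouring notes. This is a small Python sketch using MIDI note numbers for one octave of C major, and the "worst case" figure assumes values are snapped to the nearest scale note in pitch.

```python
# One octave of C major as MIDI note numbers: C D E F G A B C.
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]

# Gap in semitones between each note and the next one up.
steps = [b - a for a, b in zip(C_MAJOR, C_MAJOR[1:])]

# A value falling exactly between two notes a whole tone apart
# gets moved by half that gap when snapped to the scale.
worst_case_shift = max(steps) / 2
```

Here `steps` comes out as 2, 2, 1, 2, 2, 2, 1, so the same change in the data can move the tune by one note in some places and not at all in others, and a value can be shifted by up to a whole semitone compared with half a semitone when every semitone is allowed.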

Anyway, for now Jamie’s script (which also contains some VERY clever use of text to speech which I have completely overlooked) is here: github.com/northernjamie/climate-sonification

My fork of this, with the script to force the data onto musical scales and then output MIDICSV, is here: github.com/likeaword/climate-sonification

I imagine I will play about with this stuff some more. If this inspires you to join in please do let me (and Jamie) know what you get up to.