If a layperson is asked what advanced sound analysis consists of, they will probably mention graphs that go up and down in a crazy manner.  This surprisingly insightful thought raises a question: why should we want to “see” a sound rather than “hear” a picture?  The answer probably lies in humans’ extraordinary visual abilities: there are many more fibers in our optic nerves than in, say, our auditory nerves, which transmit sensory information to the brain.  One can take in, at just a glance, a visual representation of several seconds of complex sound, or even a full sheet of music.
	But what does “seeing” a sound actually mean?  There are people who suffer from, or are blessed with, a condition called “synaesthesia”, in which they literally see sounds or hear colors or taste melodies.  These people’s sensory pathways seem to have been crossed, so that they receive an auditory response to a particular shade of purple, or a taste in their mouth from a trumpet solo.  Who knows if this is a blessing or a curse?
	The reason scientists represent sounds graphically, though, is that our vision is remarkably good at recognizing patterns, such as picking out the vague outline of a lion in a jungle.  Sound is composed of pressure waves moving through some sort of medium.  Our ears, or a microphone, detect the rise and fall of the pressure as these waves pass by.  Sounds, then, are graphed using an oscilloscope, which represents louder sounds as waves with larger amplitudes, and higher-pitched sounds as waves packed closer together, that is, waves with higher frequency and shorter wavelength.  A sample oscilloscope reading looks like this:

This sort of graph represents a “pure” tone, or a single pitch, and is called a sinusoidal wave.  However, sounds such as voices or musical instruments are made up of many pure tones and are much more complex, giving oscilloscope readings such as:

Fortunately, for most complex sounds, the waves are repetitive, or periodic.  For periodic waves, there is an excellent way of representing the sound that compresses all of the information into one graph.  Invented by the French mathematician Joseph Fourier, the process takes a complex periodic wave and represents it by the strength of each underlying pure tone.  For a complete representation of a sound, then, one only needs to graph how much of each frequency there is in the sound.  This Fourier spectrum of a sound looks like:

This complex sound is made up of a series of pure tones, which are all related to each other by some ratio.  For instance, if one tone is taken as the base frequency, every other tone is a whole-number multiple of that base frequency.  In this sense, the tone at twice the base frequency is called the 2nd harmonic.  Essentially, Fourier's method uses the properties that distinguish a musical sound from simple noise, and exploits the relationships between the elements of a musical sound to represent directly what matters when analyzing it.
	The full description of a sound includes how it changes with respect to time.  To include this aspect, the Fourier spectrum of a sound is sampled at some interval of time.  Each sampled spectrum is drawn with frequency along the vertical axis, and the successive spectra are placed side by side along the time axis.  Then, the strength of the sound at each frequency is represented by the darkening of the screen at the appropriate point on the frequency line, giving spectra such as:
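The idea of graphing "how much of each frequency there is" can be tried out numerically. The following is only a minimal sketch, not the method behind any figure in this essay: the base frequency of 100 Hz, the three harmonic amplitudes, the sample rate, and the function names are all assumptions chosen so that the arithmetic comes out cleanly. It builds a complex tone from three harmonics and then recovers their amplitudes with a direct discrete Fourier transform.

```python
import cmath
import math

# All numeric values below are assumptions chosen for illustration.
SAMPLE_RATE = 1600      # samples per second
NUM_SAMPLES = 160       # 0.1 s of sound, so spectrum bins fall every 10 Hz
BASE_FREQUENCY = 100.0  # Hz
HARMONIC_AMPLITUDES = {1: 1.0, 2: 0.5, 3: 0.25}  # harmonic number -> amplitude


def complex_tone(t):
    """Sum of pure tones at whole-number multiples of the base frequency."""
    return sum(
        amplitude * math.sin(2 * math.pi * harmonic * BASE_FREQUENCY * t)
        for harmonic, amplitude in HARMONIC_AMPLITUDES.items()
    )


def amplitude_spectrum(samples, sample_rate):
    """Map each frequency bin (in Hz) to the amplitude of the pure tone found there."""
    n = len(samples)
    spectrum = {}
    for k in range(1, n // 2):  # bin 0 (the constant offset) is skipped
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        spectrum[k * sample_rate / n] = 2 * abs(coefficient) / n
    return spectrum


samples = [complex_tone(i / SAMPLE_RATE) for i in range(NUM_SAMPLES)]
spectrum = amplitude_spectrum(samples, SAMPLE_RATE)

# Keep only the bins that hold a noticeable amount of sound.
peaks = {freq: round(amp, 3) for freq, amp in spectrum.items() if amp > 0.01}
print(peaks)
```

Only the bins at the base frequency and its 2nd and 3rd harmonics survive the threshold, which is the sense in which a handful of numbers can stand in for the whole wiggly waveform. The direct sum used here is slow for long recordings; a fast Fourier transform would normally be used instead.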
     or
the first of which is a pure tone, and the second a complex musical tone.  Neither tone, however, shows any variation in shading along the time axis, meaning that the tones do not become louder or softer with time.
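A picture of this kind can be approximated in a few lines. The sketch below is an illustration under assumed values (a 1600 Hz sample rate, frames of 80 samples, and a made-up test sound that jumps from 200 Hz to 400 Hz halfway through); none of these come from the figures above. It cuts the sound into short frames, computes one column of amplitudes per frame, and reports the strongest frequency in each column, which is where the darkest mark would be drawn.

```python
import cmath
import math

# Assumed values for illustration only.
SAMPLE_RATE = 1600  # samples per second
FRAME_SIZE = 80     # 0.05 s per frame, so bins fall every 20 Hz


def frame_amplitudes(frame):
    """Amplitude at each frequency bin for one short slice of the sound."""
    n = len(frame)
    return [
        2 * abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) / n
        for k in range(n // 2)
    ]


# A test sound: 200 Hz for the first two frames, then 400 Hz.
samples = [
    math.sin(2 * math.pi * (200.0 if i < 2 * FRAME_SIZE else 400.0) * i / SAMPLE_RATE)
    for i in range(4 * FRAME_SIZE)
]

frames = [samples[start:start + FRAME_SIZE]
          for start in range(0, len(samples), FRAME_SIZE)]

# One column of amplitudes per time step; plotting the columns side by side,
# with larger amplitudes drawn darker, would give the spectrogram.
spectrogram = [frame_amplitudes(frame) for frame in frames]

# The frequency (in Hz) of the darkest point in each column.
dominant = [
    max(range(len(column)), key=column.__getitem__) * SAMPLE_RATE / FRAME_SIZE
    for column in spectrogram
]
print(dominant)
```

The frame length sets a trade-off: shorter frames follow changes in time more closely but space the frequency bins further apart.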
The Advantages of the Fourier Spectrum of a Sound
Sound which is captured electronically arrives as a stream of readings in the time domain, so that distinctions in the sound are based on how those readings change over time.  However, Fourier discovered a way to represent a signal in the time domain as a signal in the frequency domain, a very useful discovery for sound analysis since the pitch of a sound is determined by frequency.  This discovery is especially useful to music editors, because a sound is completely conveyed by its frequency spectrum.  Sampled over time, the Fourier spectrum includes the frequency element as well as the time element as the sound changes.  Representing a sound by its Fourier spectrum allows music editors to directly edit the frequencies present in the signal, making processes such as filtering much easier.  In a sense, a spectrogram of the frequencies present in an audio signal is similar to a musical score, in that every element which contributes to the overall sound is accounted for in the spectrogram.
A second advantage of representing sound files as Fourier spectra is that the frequency domain keeps track of sounds that are perceived as the same because of phase deafness.  The human auditory system can detect pressure waves of the form:
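To make the remark about filtering concrete, here is a minimal sketch of editing a sound in the frequency domain. It is an illustration only: the two tones (100 Hz and 300 Hz), the 200 Hz cutoff, the sample rate, and the helper names `dft`, `inverse_dft`, and `low_pass` are all assumptions introduced for this example. The sound is transformed, every component above the cutoff is set to zero, and the result is transformed back.

```python
import cmath
import math

# Assumed values for illustration only.
SAMPLE_RATE = 1600  # samples per second
NUM_SAMPLES = 160   # 0.1 s of sound, so bins fall every 10 Hz
CUTOFF_HZ = 200.0   # everything above this frequency is removed


def dft(samples):
    """Time domain -> frequency domain (direct discrete Fourier transform)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def inverse_dft(coefficients):
    """Frequency domain -> time domain, keeping the real part of each sample."""
    n = len(coefficients)
    return [
        sum(c * cmath.exp(2j * math.pi * k * i / n)
            for k, c in enumerate(coefficients)).real / n
        for i in range(n)
    ]


def low_pass(samples, sample_rate, cutoff_hz):
    """Remove every frequency component above cutoff_hz."""
    n = len(samples)
    coefficients = dft(samples)
    for k in range(n):
        # Bins k and n - k describe the same physical frequency.
        frequency = min(k, n - k) * sample_rate / n
        if frequency > cutoff_hz:
            coefficients[k] = 0
    return inverse_dft(coefficients)


times = [i / SAMPLE_RATE for i in range(NUM_SAMPLES)]
low_tone = [math.sin(2 * math.pi * 100 * t) for t in times]
mixed = [low + 0.5 * math.sin(2 * math.pi * 300 * t)
         for low, t in zip(low_tone, times)]

filtered = low_pass(mixed, SAMPLE_RATE, CUTOFF_HZ)

# How far the filtered sound is from the low tone alone.
max_error = max(abs(f - low) for f, low in zip(filtered, low_tone))
print(max_error)
```

In the time domain the two tones are tangled together in every sample; in the frequency domain the unwanted tone sits in its own pair of bins and can simply be erased.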
ω(t) =  A sin (2πft)
where the amplitude A > 0 and the frequency f > 0 are suitably chosen.  More generally, though, the sinusoidal tone should be written as:
ω(t) =  A sin (2πft + φ)
where the phase parameter, φ, has no effect on what you hear.  The same is true when you listen to sums of such waves.  For example, the graphs of the following functions are obviously different:
ω1(t) =  sin (2π 400t) + sin (2π 500t) + sin (2π 600t)
and ω2(t) =  sin (2π 400t) + sin (2π 500t) – sin (2π 600t)
but your ears cannot detect a difference in the sounds these functions produce, because your ears are phase deaf, a phenomenon described by Ohm’s Law of Acoustics.  When a sound file is represented as a Fourier spectrum, however, these distinctions are not lost, and can thereby be preserved in the editing process.  Since a person can hear a tone only if its frequency f is between 20 Hz and 20000 Hz, and the musical interval between two tones with frequencies f1 and f2 (the perceived difference between them) is the ratio f2/f1, we can create a sound file which represents a p-periodic audio signal ω by the Fourier series:
ω(t) = A0 + Σ Ak sin (2πkft + φk)   (k is summed from 1 to infinity)
where f = 1/p, and the amplitudes Ak > 0 and phase parameters φk are real for k = 1, 2, 3, ….  Since we only hear the terms whose frequency kf lies between 20 Hz and 20000 Hz, we can restrict the sum to 20/f ≤ k ≤ 20000/f, with f measured in Hz.  This form of Fourier series can be used to represent a sound file with the entire frequency spectrum, allowing sounds which are the same up to a phase difference to be treated differently.  The individual frequency components can then be manipulated, using appropriate functions, to shape a person’s auditory perception of how the frequency of a sound changes over time.
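The claim about ω1 and ω2 can be checked numerically. The sketch below is an illustration under assumed values (a 4000 Hz sample rate, 0.1 s of sound, and a helper named `coefficient`, none of which come from the text). It samples both functions, compares the amplitude found at 400, 500, and 600 Hz, and then looks at the phase of the 600 Hz component, since –sin(x) is the same as sin(x + π).

```python
import cmath
import math

# Assumed values for illustration only.
SAMPLE_RATE = 4000  # samples per second
NUM_SAMPLES = 400   # 0.1 s of sound, so bins fall every 10 Hz
TONES_HZ = (400, 500, 600)


def omega1(t):
    return (math.sin(2 * math.pi * 400 * t) + math.sin(2 * math.pi * 500 * t)
            + math.sin(2 * math.pi * 600 * t))


def omega2(t):
    return (math.sin(2 * math.pi * 400 * t) + math.sin(2 * math.pi * 500 * t)
            - math.sin(2 * math.pi * 600 * t))


def coefficient(samples, frequency_hz):
    """Complex Fourier coefficient of the samples at one frequency."""
    n = len(samples)
    k = round(frequency_hz * n / SAMPLE_RATE)  # index of the matching bin
    return sum(x * cmath.exp(-2j * math.pi * k * i / n)
               for i, x in enumerate(samples))


times = [i / SAMPLE_RATE for i in range(NUM_SAMPLES)]
samples1 = [omega1(t) for t in times]
samples2 = [omega2(t) for t in times]

# Amplitude of each pure tone: identical for the two sounds.
amplitudes1 = [round(2 * abs(coefficient(samples1, f)) / NUM_SAMPLES, 6)
               for f in TONES_HZ]
amplitudes2 = [round(2 * abs(coefficient(samples2, f)) / NUM_SAMPLES, 6)
               for f in TONES_HZ]

# Phase of the 600 Hz tone in omega2 relative to omega1 (size of the shift).
phase_shift_600 = abs(cmath.phase(coefficient(samples2, 600)
                                  / coefficient(samples1, 600)))

# The waveforms themselves are far from identical.
largest_waveform_gap = max(abs(a - b) for a, b in zip(samples1, samples2))

print(amplitudes1, amplitudes2, phase_shift_600, largest_waveform_gap)
```

The two sounds share the same amplitudes Ak and differ only in one phase parameter φk, which is the information the ear discards but the Fourier representation keeps.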
Fourier Spectra and Risset’s Endless Glissando
	Musician and composer Jean-Claude Risset created a very interesting auditory anomaly for his composition Mutations I.  The sound is analogous to the visual phenomenon of a spinning barber shop pole: its pitch seems to rise forever, like the red spiral which seems to be constantly moving upward, and yet the sound is periodic, like the rotation of the pole itself.  He accomplished this by first defining a function:
y(t) = sin (2πf0 e^(αt)/α)
where f0 = 784 Hz and α = log (1.5)/(8 sec).  This definition gives the frequency, which increases by a factor of 1.5 (a musical fifth) during each eight second interval, as:
f(t) = f0 e^(αt),       (f > 0 by definition)
at time t, which is strictly increasing.  In music, a glissando is a continuous slide up the musical scale and can be represented by a function whose frequency strictly increases as time passes.  However, since there is a finite range of frequencies that humans can perceive, any ordinary glissando must eventually come to an end.
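The link between y(t) and f(t) is that the frequency of a tone sin(θ(t)) at a given instant is the rate of change of its phase θ(t), divided by 2π. The short sketch below checks this numerically for the values of f0 and α given above; the function names and the step size `h` are assumptions made for the example, and log is taken to be the natural logarithm.

```python
import math

F0 = 784.0                  # Hz, starting frequency from the text
ALPHA = math.log(1.5) / 8   # per second, so the frequency gains a fifth every 8 s


def phase(t):
    """The argument of the sine in y(t)."""
    return 2 * math.pi * F0 * math.exp(ALPHA * t) / ALPHA


def frequency(t):
    """The stated local frequency f(t) = f0 e^(alpha t)."""
    return F0 * math.exp(ALPHA * t)


def numerical_frequency(t, h=1e-4):
    """Rate of change of the phase, divided by 2 pi (central difference)."""
    return (phase(t + h) - phase(t - h)) / (2 * h) / (2 * math.pi)


for t in (0.0, 8.0, 16.0):
    print(t, frequency(t), numerical_frequency(t))

# Ratio of frequencies eight seconds apart: a musical fifth.
fifth = frequency(8.0) / frequency(0.0)
print(fifth)
```

Dividing by α inside the sine is what makes the derivative of the phase come out as exactly 2πf0 e^(αt).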
	After applying a gaussian amplitude envelope to the function y, we obtain:
g(t) = e^(-βt^2) sin (2πf0 e^(αt)/α)
with β = 4/(32^2), which localizes the tone to the interval -32 sec ≤ t ≤ 32 sec.  Now, the function is ready to be periodized.  If done with the appropriate period, this preserves the increasing nature of the local frequency of each successive note, using Ohm’s Law to create an illusion of strictly increasing frequencies.  The periodized function is given by:
w(t) = Σ g(t+8m)   (m is summed from negative to positive infinity)
This function consists of chords formed by notes given by the function g.  Because of the gaussian amplitude envelope, each translate of g lasts for 64 seconds, and because of the periodization, a new translate starts every 8 seconds.  Therefore, only 8 notes sound at any one time to form the chord.  The function is designed so that neighboring notes are separated by musical fifths, and the local frequency of every note is always increasing, creating a musically harmonious glissando.
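These properties of w can be checked with a short computation. The following is a sketch under stated assumptions, not Risset's own procedure: the infinite sum is cut off at |m| ≤ 20 (the dropped terms are smaller than e^(-90) because of the envelope), a note is counted as sounding when its shifted time lies within the 64-second interval, and the names `g`, `w`, and `active_notes` are chosen for this example.

```python
import math

F0 = 784.0                  # Hz
ALPHA = math.log(1.5) / 8   # per second
BETA = 4 / 32 ** 2          # per second squared
PERIOD = 8.0                # seconds between successive translates
MAX_SHIFT = 20              # assumed cut-off for the infinite sum


def g(t):
    """A single rising note under its gaussian amplitude envelope."""
    return math.exp(-BETA * t * t) * math.sin(
        2 * math.pi * F0 * math.exp(ALPHA * t) / ALPHA)


def w(t):
    """Truncated version of the periodized function."""
    return sum(g(t + PERIOD * m) for m in range(-MAX_SHIFT, MAX_SHIFT + 1))


def active_notes(t):
    """Local frequencies (Hz) of the translates sounding at time t."""
    return [
        F0 * math.exp(ALPHA * (t + PERIOD * m))
        for m in range(-MAX_SHIFT, MAX_SHIFT + 1)
        if abs(t + PERIOD * m) <= 32
    ]


t = 1.25  # an arbitrary moment, in seconds
notes = active_notes(t)
ratios = [higher / lower for lower, higher in zip(notes, notes[1:])]
periodicity_gap = abs(w(t + PERIOD) - w(t))

print(len(notes))       # how many notes form the chord
print(ratios)           # interval between neighboring notes
print(periodicity_gap)  # how far w is from repeating after 8 seconds
```

Every individual note keeps rising, yet eight seconds later each note has taken the place of its neighbor a fifth above, while the quietest note at the top has faded out and a new one has faded in at the bottom. The sound as a whole is therefore back where it started.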
Analysis and Conclusion
	We have seen that by applying Fourier’s methods to an appropriately increasing function of time that gives frequency, we can periodize the result, making the increasing sound endless.  The reason Fourier’s methods are able to accomplish this lies in the advantages listed earlier: a Fourier spectrum of a sound includes the frequency element as well as the time element as the sound changes, rather than simply an oscilloscope reading of amplitude over time.  This makes the representation of the sound more accessible for manipulation, allowing Risset to create an auditory illusion by playing with frequency and periodization in relation to the way sound is perceived by a person’s ear according to Ohm’s Law of Acoustics.