The Science of Cacophony and the Tonic

The modern Western chromatic scale, consisting of 12 distinct notes, began its development during the time of Pythagoras. It has evolved from simple harmonic relationships, worked out by listening and by experiment with simple machines, into an exact science of pure frequencies, made possible by the advent of mechanical timers capable of counting frequencies in the sonic (or audible) range.

This scale is fundamental to all Western music and is the building block of its written notation. It is fascinating to consider, however, that simple sinusoidal frequencies added together are also the building blocks of the perception of hearing itself.

The Composition of Sound

There are physical principles that mathematically govern the composition of sound as a whole, from its creation through its propagation to its perception. However, these principles could have been neither quantitatively nor qualitatively defined by man when he first began experimenting with sound to create pleasant tones with machine or voice. I must assume, then, that listening alone governed what was pleasing and what was contrary, and that man’s quest to master the techniques and tame the principles behind sound has been driven primarily by the art and science of music itself.

Now, sound itself is a longitudinal pressure wave, propagated in any dense medium, which is the stimulus of hearing. Specifically, it is the propagation of energy away from a point of disturbance in that medium by the compression and rarefaction of the medium itself, which is conveyed to a sensory organ capable of distinguishing the wave energy from some baseline. Due to the physical laws of nature, the propagation exists as wave energy and can be resolved into purely sinusoidal components owing to the visco-elastic properties of any dense medium, such as air, water or even a taut steel string. The illustration to the right shows one oscillation of a purely sinusoidal waveform.

Sinusoidal Wave Form

The Creation of Sound

Vibrating Stick

The disturbance itself is typically initiated by the undulation of a dissimilarly dense medium suspended within or adjacent to the propagation medium. An example of this might be a wooden stick. If the stick is struck so that it vibrates, its back-and-forth motion is transferred to the surrounding medium (air, for example), which immediately begins to vibrate. The motion of the air is analogous to that of the stick, but not perfectly so, since the air must fill the partial vacuum created when the stick moves away from the molecules in the medium (rarefaction) and conversely must be thrust outward when the stick moves toward them (compression). Since there isn’t (and can never be) perfect elasticity, some of the energy is absorbed, usually as heat but occasionally as light in the unusual case of sonoluminescence, while the rest is propagated away at a specific speed. The speed of sound in dry air at 60°F and average atmospheric pressure is roughly 1118 feet per second.

The predominant frequency at which this process occurs is said to be the fundamental, which musically is referred to as the tonic. Frequency itself is the number of discrete occurrences over a specified period of time. Here each discrete occurrence is an individual undulation, and frequency is measured in cycles per second, commonly referred to as Hertz (abbreviated Hz).

The Propagation of Sound

As was discussed earlier, the disturbance is carried away from the source by the elastic properties of the propagation medium. Its viscous properties are what serve to dampen, or attenuate, propagation. Since all dense media or physical materials exhibit some degree of both viscous and elastic properties, the interaction of these properties serves to distribute (propagate) and absorb the energy of the longitudinal pressure wave, as well as to determine the velocity and directionality of the wave. The velocity is essentially independent of the frequency; the directionality, however, is very dependent upon the frequency, i.e. the higher the frequency, the more directionally the wave behaves. These particular behaviors shall be neither proven nor further discussed, only referred to, as it is not the intent of this paper to go into physics that is far beyond the realm of this discussion. However, it is important to recognize these behaviors and accept them as facts in order to justify points that shall be taken further.

Though the propagation of sound is purely sinusoidal, the individual components of the propagation medium don’t actually move in the shape of a sine wave like the one shown two illustrations above. They actually compress toward each other (an increase in medium density) and then, to counteract, move away from each other (rarefy, or decrease in medium density), or vice versa, depending on whether the undulating source is headed toward or away from them. This creates the longitudinal pressure wave which propagates away from the source and which in reality looks more like bands that fade from dark to light and back again, as in this illustration. The wave continues pushing on, compressing the adjacent medium in an ongoing chain reaction. This phenomenon is called propagation.

Propagation: Compression/Rarefaction

When sound energy finally comes to a terminus, not through utter exhaustion by the viscous properties of the propagation medium but by arriving at the threshold of a new medium, the energy is similarly transferred to this new medium, with some nearly immeasurable loss of fidelity. The process continues until the energy is completely absorbed or morphed into a different type of energy, typically heat or physical motion.

The Perception of Sound

A particularly notable terminus for sound energy is its capture (via the motion of a membrane, also referred to as a diaphragm) by the auditory sensory organ of a living being, particularly the human ear, which through a complex and still not fully understood process serves to transform the sound energy into the sensation of sound. Sound energy in and of itself is entirely meaningless without some sensory device to pick up and interpret it, lending credence to the old adage that "if a tree falls in a forest and nobody is there to hear it, does it make a sound?"

The definitive answer is of course that it most likely does create the potential to initiate a longitudinal pressure wave, and as long as there is a suitable propagation medium that wave shall obey the laws of physics and nature, and begin rarefaction and compression away from the source. However, without a sensory organ to perceive the wave (or a device capable of capturing and measuring the sound energy), it could not by definition be considered sound, since sound is finalized with perception. It is relegated to merely sound energy that was fully attenuated before it could be birthed into perceived sound. Nevertheless, it is certainly the intent of this discussion to neither prove nor disprove old adages.

It is also not the intent of this discussion to go into the nature of hearing, to which sound is the stimulus, save to suggest that hearing is the biological part of sound that allows human beings to determine certain qualities of sound. The two major qualities under discussion here are cacophony and the tonic, which seem to be of particular importance to musicians and those who enjoy their craft, as discussed heavily throughout the Enjoyment of Music course. Those qualities are only discernible due to the magnificent sense of hearing, which is why it was brought up.

Nevertheless, before going any further, it’s important to consider the following facts: The normal, healthy human ear can perceive approximately 10 octaves of the sound spectrum. The thresholds are from about 20 to 20,000 Hertz.

The Spectrum and Strength of Sound: Logarithms

The spectrum of audible sound does not follow a linear progression, but rather a logarithmic progression. This means that every perceived step of equal size is actually a physical doubling. Therefore, to increase the frequency by one octave, the frequency must be doubled. Thus from 20Hz to 40Hz is one octave, but to get to the next octave one must go to 80Hz, then to 160Hz, 320Hz and 640Hz. But this represents merely 5 octaves. There are 5 more! One should easily see how it could get to 20,000 Hz so quickly.
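The octave count can be checked with a short sketch. It assumes the 20 Hz and 20,000 Hz thresholds quoted earlier; the function name `octaves_between` is made up for this illustration and is not part of any library.

```python
import math


def octaves_between(low_hz, high_hz):
    """Number of doublings needed to get from low_hz to high_hz."""
    return math.log2(high_hz / low_hz)


# Doubling 20 Hz ten times: 20, 40, 80, ... up to just past 20,000 Hz.
ladder = [20 * 2 ** n for n in range(11)]

# Span of the assumed hearing range, in octaves (a little under ten).
span = octaves_between(20, 20_000)

print(ladder)
print(round(span, 2))
```

The sixth entry of the ladder is the 640 Hz reached after five doublings; five more doublings carry it just past the upper threshold.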

Similarly, the perception of strength is also logarithmic. Sound strength, or loudness, seems to increase by one step with each doubling of sound energy, and the strength is expressed as a sound pressure level. The unit in which sound pressure level is described is called the decibel, and with each doubling of sound energy there is a net increase of about 3 decibels. Thus, if one horn player were producing a note at 85 decibels (considerably louder than normal conversation level), two horn players together would bring it to 88 decibels. It would then require an additional two horn players to bring it to 91 decibels, another four to bring it to 94 decibels, and another 8 to get it to a whopping 97 decibels. This is a total of 16 horn players. Continuing the pattern, where 1 horn player could produce 85 decibels, to increase it to 100 decibels would require the addition of 31 more horn players.

But at 110 decibels, it would begin to be rather painful; and in fact, at 120 decibels it would only take a moment for your sense of hearing to be damaged. Fortunately, to get to 110 you’d need more than 300 additional horn players. And the bells of all the horns would need to be in close proximity to one another for this to occur, which is also, fortunately, a physical impossibility, especially given the egotistical nature of most horn players, or of any musician for that matter!
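The horn-player arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every player is taken to be equally loud (85 decibels, as in the example), their sound energies are taken to simply add, and the function names are hypothetical.

```python
import math


def level_for_players(n_players, one_player_db=85.0):
    """Combined level, in decibels, of n identical players whose energies add."""
    return one_player_db + 10 * math.log10(n_players)


def players_for_level(target_db, one_player_db=85.0):
    """Smallest number of identical players whose combined level reaches target_db."""
    return math.ceil(10 ** ((target_db - one_player_db) / 10))


# Each doubling of the section adds roughly 3 decibels.
for n in (1, 2, 4, 8, 16):
    print(n, round(level_for_players(n), 1))

players_100 = players_for_level(100)  # total players needed for 100 decibels
players_110 = players_for_level(110)  # total players needed for 110 decibels
print(players_100, players_110)
```

Because each doubling adds slightly more than 3 decibels (about 3.01), the exact formula and the rule of thumb drift apart a little at large player counts, which is why the sketch uses the logarithm directly rather than counting doublings.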

I only bring these topics up in this discussion to make a very crucial point about music: it is based upon physical and mathematical principles, and these principles are what govern what we perceive in sound as cacophonous or pleasant.

The Tonic

The tonic will be discussed first, since cacophony involves many tonics. Essentially, as described several paragraphs above, the tonic is the fundamental frequency at which a tone resonates. Because of the visco-elastic nature of the components which initiate sound, they exhibit their own particular resonance. This resonance is further morphed by several factors, including the ambient acoustical environment as well as the physical nature of the environment (which impacts the sound with its own visco-elastic properties). Thus, a tonic is almost never a simple sinusoidal waveform, and consists of many harmonics. These harmonics are mathematical overtones, usually occurring at whole or ‘whole fractional’ intervals.

A whole interval is an integer spacing, such as twice the fundamental or three times the fundamental. A whole fractional interval is one such as one and a half times or one and a third times the fundamental, that is, a simple fraction. One and a half creates a harmonic fifth; one and a third creates a harmonic fourth. These are graphically represented in the two illustrations below.

Fundamental with Harmonic Fifth
Fundamental with Harmonic Fourth

One can see how vastly different they look, and this difference in appearance is exactly why they sound different. It actually takes more time for a fourth to resolve, or repeat, than it does for a fifth. This will be covered in greater detail further on, as it is the prevalent feature of cacophony or dissonance: the amount of time it takes for the waveform to resolve and thus be interpreted by the human brain through the sense of hearing.
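To make the difference in resolving time concrete, here is a minimal sketch. It assumes ideal sine waves at exactly one and a half and one and a third times the fundamental; the 1000 Hz fundamental and the function names are chosen only for illustration. When the ratio is written as a fraction in lowest terms, the combined waveform repeats after as many cycles of the fundamental as the fraction's denominator.

```python
from fractions import Fraction


def cycles_to_repeat(ratio):
    """Fundamental cycles that pass before fundamental + harmonic repeats."""
    # Fraction always stores the ratio in lowest terms, so the denominator
    # is the number of fundamental cycles in one full repetition.
    return ratio.denominator


def repeat_time(fundamental_hz, ratio):
    """Seconds before the combined waveform repeats."""
    return cycles_to_repeat(ratio) / fundamental_hz


fifth = Fraction(3, 2)   # one and a half times the fundamental
fourth = Fraction(4, 3)  # one and a third times the fundamental

print(cycles_to_repeat(fifth), repeat_time(1000, fifth))
print(cycles_to_repeat(fourth), repeat_time(1000, fourth))
```

Under these assumptions the fifth repeats every two cycles of the fundamental and the fourth every three, so the fourth takes half again as long to resolve.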

The Chromatic Scale

The chromatic scale conveniently consists of 12 notes due to the very nature of the fifth. A fifth is easily created by increasing the frequency to exactly one and a half times its value (this can be mechanically duplicated by shortening a plucked or hammered string under a given tension to two thirds of its length while keeping the tension the same). By following this progression upward, a total of twelve discrete frequencies evolve before the thirteenth, which ends up being the same note as the original. In reality, it isn’t exactly the same as the original but slightly sharper, though close enough to be treated as such. The opposite can be done by stepping downward by fifths (equivalently, stepping up a fourth and then dropping an octave), resulting in twelve discrete downward steps, the thirteenth again being the original, except this time somewhat flatter.

Thus we can create the following increasing pattern of fifths by multiplying a fundamental frequency by 1.5 at each step and then, where necessary, cutting the result in half to keep the frequencies within the same octave. A halved frequency is denoted with a downward arrow ‘↓’. Notes with a dual nature (sharp/flat) are noted in parentheses. Assume that a C-note is 1000Hz:

C-1000Hz
G-1500Hz
D-2250Hz ↓D-1125Hz
A-1687.5Hz
E-2531.25Hz ↓E (Fb)-1265.63Hz
B (Cb)-1898.44Hz
F#-2847.66Hz ↓F# (Gb)-1423.83Hz
C#-2135.74Hz ↓C# (Db)-1067.87Hz
G# (Ab)-1601.81Hz
D#-2402.71Hz ↓D# (Eb)-1201.36Hz
A# (Bb)-1802.03Hz
E#-2703.05Hz ↓E# (F)-1351.52Hz
C-2027.29Hz ↓C-1013.64Hz
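The progression of fifths can be reproduced with a short sketch. It assumes the same 1000 Hz starting C and an exact 1.5 ratio for every fifth; the function name `stack_fifths` is hypothetical.

```python
def stack_fifths(start_hz=1000.0, steps=12):
    """Raise by a fifth `steps` times, halving whenever the octave is exceeded."""
    freqs = [start_hz]
    for _ in range(steps):
        f = freqs[-1] * 1.5
        if f >= 2 * start_hz:
            f /= 2  # drop one octave to stay between start_hz and 2 * start_hz
        freqs.append(f)
    return freqs


freqs = stack_fifths()

# Ratio of the thirteenth note to the starting note: how sharp the
# "same" C has become after twelve pure fifths.
comma = freqs[-1] / freqs[0]

for f in freqs:
    print(round(f, 2))
print(round((comma - 1) * 100, 2), "percent sharp")
```

The last value printed is the discrepancy between the thirteenth note and the original, expressed as a percentage.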

Therefore, the table to the right (using only sharps since it is increasing) can be derived, which clearly shows the discrepancy: a difference of about 1.4%. However, by multiplying each successive frequency by 2^(1/12) (two to the power of one twelfth, approximately 1.0595), the twelve steps can be spaced evenly so that the octave comes out exact. Note that these are now mathematical intervals, and not mathematically harmonic overtones. This even spacing is known as equal temperament, and it is the basis on which pianos are tuned; the ‘stretch tuning’ applied to pianos is a further, smaller adjustment on top of it. Thus, by starting with C at 1000Hz and multiplying by circa 1.0595, a C#’s tuning results in 1059.46Hz, a D’s tuning results in 1122.46Hz and so on, until the C an octave above turns out to be precisely 2000Hz. In reality, with A at 440Hz (what is considered concert tuning), the accepted tuning of the C nearest to 1000Hz is 1046.5023Hz.
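As a rough sketch of the twelfth-root-of-two formula, the evenly spaced frequencies can be generated as follows. The 1000 Hz starting C comes from the example above; the note list, the function names, and the count of 15 semitones from the A at 440 Hz up to the C near 1000 Hz are assumptions made for this illustration.

```python
SEMITONE = 2 ** (1 / 12)  # approximately 1.0595

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#",
              "G", "G#", "A", "A#", "B", "C"]


def even_scale(start_hz=1000.0):
    """Thirteen notes, each one twelfth-root-of-two above the last."""
    return [(name, start_hz * SEMITONE ** n)
            for n, name in enumerate(NOTE_NAMES)]


def concert_pitch(semitones_from_a, a_hz=440.0):
    """Frequency a given number of semitones above (or below) the A at a_hz."""
    return a_hz * SEMITONE ** semitones_from_a


scale = even_scale()
for name, hz in scale:
    print(name, round(hz, 2))

# The C that lies 15 semitones above the A at 440 Hz.
concert_c = concert_pitch(15)
print(round(concert_c, 4))
```

Because every step uses the same multiplier, twelve steps multiply the starting frequency by exactly two (up to floating-point rounding), which is what removes the discrepancy left by stacking pure fifths.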


Cacophony (which is) Dissonance

Now, pleasant sounds are sounds which resolve in the ear quickly, whereas displeasing sounds take much longer to resolve. The proof of this is quite elementary: a diminished fifth played near the middle of a piano sounds far more dissonant than the same interval played several octaves up, while a simple fifth played near the middle of the piano sounds considerably less dissonant than the same two notes played at the bottom of the piano. This is entirely due to the time required to resolve, which the ear perceives as a warbling.

Warbling: Dissonance to Resolution

The illustration above represents the warbling that occurs when an A-440Hz is sounded together with a B-493.88Hz that resolves linearly into an A-440Hz over 3 seconds of time. The warbling begins very quickly and slows as the frequency of the second note (the ‘B’) approaches that of the first note (the ‘A’). The following illustration is the exact same linear progression, except that the duration shown has been reduced to 1/10th of a second:

Magnification: 0.1 sec

What is occurring at each ‘node peak’ and ‘node valley’ is a phenomenon known as phase cancellation. The tones begin in phase (the peak), but become out of phase (the valley) as time progresses. They then come back into phase and go out of phase again: the pattern repeats, each undulation taking longer and longer as the tones grow closer in frequency to one another. These longer periods are actually the pleasing part of the sound; it is initially entirely dissonant, and it becomes entirely consonant at the end.
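The warbling rate can be estimated with a minimal sketch. It assumes two steady sine tones of equal strength (the A-440Hz and B-493.88Hz of the illustration, before the second note begins to glide), and the function names are invented for this example. The sum of two such tones rises and falls inside a slow envelope whose peaks and valleys recur at the difference between the two frequencies.

```python
import math


def beat_frequency(f1_hz, f2_hz):
    """Peaks per second of the warbling between two steady tones."""
    return abs(f1_hz - f2_hz)


def summed_tones(f1_hz, f2_hz, t):
    """Instantaneous value of two equal-strength sine tones added together."""
    return math.sin(2 * math.pi * f1_hz * t) + math.sin(2 * math.pi * f2_hz * t)


def envelope(f1_hz, f2_hz, t):
    """Slowly varying outline, 2|cos(pi (f1 - f2) t)|, bounding the summed tones."""
    return 2 * abs(math.cos(math.pi * (f1_hz - f2_hz) * t))


beat = beat_frequency(440.0, 493.88)
t_valley = 1 / (2 * beat)  # half a beat period after the tones start in phase

print(round(beat, 2), "warbles per second")
print(envelope(440.0, 493.88, 0.0))                 # the peak: tones in phase
print(round(envelope(440.0, 493.88, t_valley), 6))  # the valley: tones cancel
```

As the second tone is brought closer to the first, `beat_frequency` shrinks toward zero, which is the lengthening of each undulation described above.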

This visible representation of ‘warbling’ is the exact analogy of what is heard when, for instance, a guitar is tuned. The further out of tune it is, the faster the warbling (and the less pleasing the sound), and as the tuning machines are turned to bring the instrument into tune, the length of time between warbles increases until finally the two strings are in tune with one another. This results in a pleasant, warble-less tone. This visible representation is also exactly what separates a pleasant tone or chord from a dissonant one. The last two illustrations, which follow, are the 1/10th of a second sinusoidal representations of a major triad and then a diminished triad (where the 3rd is minor and the fifth diminished; the root of both chords is at 1000Hz for this representation).

Major 3rd & Dominant 5th: Root = 1000Hz, time = 0.10s
Minor 3rd & Diminished 5th: Root = 1000Hz, time = 0.10s

As one can clearly see, the top chord goes through just over two complete iterations in 1/10th of a second, whereas the bottom chord isn’t even nearly resolved in the allotted duration. Incidentally, the average period of synapse in the human physiology is 1/100th of a second; thus about 5 synaptic occurrences are required to render the top chord, whereas several score are required to render the second. This requires much more work of the human auditory sense during the perception of sound, and the result is likewise less pleasant!

Thus, in conclusion, there is a purely scientific rationale behind cacophony which relies upon the amount of work the human brain must do to resolve a sound. This may very well be the source of emotional resolution when a chord is likewise resolved from dissonance: the brain goes from an excited state to one of less excitement, ergo less work. This may also explain the choice of simplicity in the musical preference of society over time. The simpler the times, the simpler the harmonic content. As society has sped up and emotions have flared, so too has what is considered acceptable musically. Is it any wonder that what is visually dissonant seems to have evolved with what is aurally dissonant?

The information that has gone into this research paper is entirely first-hand knowledge that I have gleaned over the past two decades. I have taught guitar and its theory, as well as sound reinforcement and recording, professionally, and I have likewise guest lectured on applications for a Calculus III course at SUNY Cobleskill, under the direction and supervision of Associate Professor of Mathematics Kurt E. Verderber, concerning the relationship between sonic waveforms and series functions. It is this knowledge and these suppositions, as well as what I have been exposed to in this class, that have brought this paper to fruition. All of the illustrations included herein are completely of my own creation, utilizing Windows Paintbrush, PSP 7.0, and the Cool 150 waveform editor.

Web Design ©2006 by ΔΠΣ for B2bB
Content ©2005 by ΔΠΣ