June 2026
It's About Time: What Makes Music Music
For a normal, healthy young person, the generally accepted frequency range of human hearing is said to be from 20 Hz to 20 kHz. Before the term "Hertz" (Hz) came into use to honor Heinrich Hertz (the man who proved Maxwell's wave theory correct), sound frequencies were expressed in "cycles" per second, with one cycle being the complete cycle of the air pressures making up a sound wave, from zero to positive peak, to zero, to negative peak, and back to zero again. What that means is that a person blessed with good hearing is capable, because one cycle includes two pressure peaks, of detecting 40,000 pressure changes per second or, to put it differently, of hearing pressure changes that take just ONE FORTY THOUSANDTH OF A SECOND to happen. Even old people like me, who may only be able to hear frequencies an octave or more below the stated human norm, can still clearly pick up sonic differences taking place in less than one ten thousandth of a second. That's why the time aspects of music, both live and recorded, are not just important, but crucial. It's also why we can hear in stereo. The generally accepted average distance between human ears is about five inches. When that is calculated against the standard speed of sound in air at sea level (about 1120 feet per second at 60 degrees Fahrenheit), we find that a sound coming from a listener's hard left will reach his left ear about 0.000372 seconds before it's heard by his right one. And, obviously, the same applies in reverse to sound coming from his hard right, while sound coming from a source closer to the center will reach his two ears even closer together in time. That time delay (plus phase differences) makes up almost 100% of what allows us to tell where the sounds we hear are coming from.
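For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. It uses only the figures quoted in the text (five inches between the ears, 1120 feet per second for sound) and treats the path between the ears as a straight line, which is a simplification, since real sound has to travel around the head. The function and constant names are made up for this illustration.

```python
# Figures quoted in the text; real heads and real air will vary.
EAR_SPACING_IN = 5.0          # assumed average distance between the ears, inches
SPEED_OF_SOUND_FT_S = 1120.0  # speed of sound at sea level, about 60 degrees F
INCHES_PER_FOOT = 12.0


def interaural_delay_s(ear_spacing_in=EAR_SPACING_IN,
                       speed_ft_s=SPEED_OF_SOUND_FT_S):
    """Delay between the two ears for a sound arriving from hard left or right.

    Straight-line approximation: time = distance / speed.
    """
    return ear_spacing_in / (speed_ft_s * INCHES_PER_FOOT)


def pressure_change_time_s(frequency_hz):
    """Time between adjacent pressure peaks (half of one cycle)."""
    return 1.0 / (2.0 * frequency_hz)


itd = interaural_delay_s()
print(f"Ear-to-ear delay: {itd:.6f} s")  # Ear-to-ear delay: 0.000372 s
print(f"Peak-to-peak time at 20 kHz: {pressure_change_time_s(20_000) * 1e6:.1f} microseconds")
# Peak-to-peak time at 20 kHz: 25.0 microseconds
```

Plugging in 10,000 for the frequency (an octave below 20 kHz) gives one twenty thousandth of a second, still comfortably under the one ten thousandth mentioned above.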
In hi-fi, as in high-fidelity home audio, that's what allows for imaging and soundstaging, and is a major part of how "real" our system can sound. It's not just in recorded music, though, that time makes a difference. Every piece of music is written with a "time signature" (3/4, 4/4, 13/8, etc.) that tells the basic rhythmic structure of the music. To keep it, a metronome (a device that marks time with a steady, regular beat) can be used, and the music can be played exactly as written. Unfortunately, though, while the basic time signature can describe the "bones" of the music, it's tiny (or even not-so-tiny) variations from strict time (rubato) that give music its soul. The slightest extra pause between notes or extra hold on a given musical note or phrase can make the difference between a technically perfect but boring performance and an exciting performance of exactly the same notes. One perfect example of this is the piano transcription of Rimsky-Korsakov's Flight of the Bumblebee, which one modern pianist was recorded playing in its entirety in just one minute and thirty seconds: absolutely perfectly, with never a missed note (coming at a rate of more than fifteen notes per second), but with absolutely no music at all. The lady performing it seems never to have heard the word "phrasing", and although (with no exaggeration at all) she may very well be the greatest high-speed-finger-twiddler or race-piano driver of all time, everything I've ever heard her play might very well have been played by an over-wound metronome.
The opposite phenomenon can be heard in practically any performance of Beethoven's Fifth Symphony. The first few notes (da dadadum) are so simple as to be mindless, but, by the way they're played and how their multiple repetitions are spaced in time and varied in intensity, they've become the most recognizable four notes in Western music. It's not just the notes, but the way they're placed in time that gives music its meaning. You can hear it everywhere, not just in classical works. Listen to Billie Holiday sing God Bless the Child and you'll hear her come in just a breath behind the beat, stretching phrases until they feel like a live experience instead of lyrics. Or take Bill Evans at the Village Vanguard: the opening of My Foolish Heart floats so freely that the bar lines seem more like suggestions than rules, and the whole thing takes on a hushed, intimate glow that no metronome could ever produce. Even in pop and rock, time is the secret ingredient. David Gilmour's acoustic intro to Wish You Were Here (Pink Floyd) wanders just enough to sound human and searching; play it strictly in time and it turns into a campfire strum. Frank Sinatra built an entire career on bending time to his will—leaning back, pushing forward, shaping phrases the way a sculptor shapes clay. And in the blues, B.B. King could make a single note ache simply by delaying its landing by the tiniest fraction of a second.
All of these are rubato: tiny, intentional distortions of written time that turn sound into expression. Without them, music is just a machine. With them, it becomes human. It's easy to understand the importance of time to music, but how about to the reproduction of music? In high-fidelity audio's earliest days, all hi-fi recordings were monophonic (single channel, non-stereo), and loudspeakers were built with all of their drivers (woofer, mid-range, and tweeter) stuck on the speaker's front panel in whatever arrangement the designer thought looked good or went together most easily. With mono sound, driver placement simply didn't matter. With stereo, though, it becomes crucial. We've already seen that our ears can pick up teeny-tiny little differences (1/40,000th of a second) in timing, and that the relative arrival times of a sound at our two ears are one of the most important factors in allowing us to determine where sound comes from and, consequently, the size and shape of the space it was made in. Natural sounds come from a single source or, as with an orchestra, from multiple single sources. Most speakers, though, split the sound into two or more frequency bands, to be reproduced by different drivers, in different locations on their enclosure. The result of this is that, instead of all of the sound, whether of a single instrument, a rock band, or the entire 1910 cast of Mahler's Symphony of a Thousand (Symphony No. 8 in E-flat major), getting to our ears at the same time, it's going to have at least two different arrival times to each ear, and be spatially "smeared" accordingly.
All speakers, other than those with single "full-range" drivers (which usually aren't really all that full-range), do it to at least some degree, even planars, which, because of their sheer size, can have sound arriving at your ears from different parts of their driver surface at different times. To fix that, most speaker manufacturers, ever since stereo replaced mono as normal, have placed their speakers' drivers in what is called a "time-aligned" array. The best way to do this is to use tall, narrow enclosures with a "stepped" front that places the woofer closest to the listener, the mid-range slightly farther away, and the (conventional cone or dome) tweeter farthest away, such that the voice coils of all of the drivers line up on a single vertical plane. Ribbon or planar tweeters must be placed so that it's the vertical center of their diaphragm that is in line with the other drivers' voice coils. This works, but because the different drivers (woofer, mid-range, and tweeter) will always be of different diameters, it can only be fully accurate for a listening position directly in front of the speakers. The farther you get off to either side of center, the less accurate the alignment may be.
When it works, though, the effect can be spectacular, and with properly placed speakers (a whole different thing we can talk about later, if you'd like), the sonic "image" can snap into perfect focus and, even more than just "flat" frequency response, give you the best possible musical experience. Getting the timing right is the key to both the music and its successful reproduction. It gives the music its soul and, for your system, makes sure you can hear it. Now, go to your system, put on some tunes, sit back, close your eyes and...