Time Alignment & Phase

A loudspeaker system that is perfectly laid out geometrically can still sound smeared, hollow, or spatially vague if the sound from each driver and each enclosure does not arrive at the listener as a coherent wavefront. Time alignment and phase management are the disciplines that turn a collection of correctly placed boxes into a single, believable spatial image. This chapter develops the topic from the physics of sound propagation up through the practical procedures used to align subwoofers to mains, drivers across crossovers, and surround speakers to the front stage.

The recurring theme of Part V applies here with full force: calibration is about making the physical system deliver the cues that the earlier parts of the guide assume. The precedence effect, summing localization, and the direct-to-reverberant balance that we treated as given in the psychoacoustics chapter and in the reverberation chapter only behave as predicted when arrivals are coherent in time and consistent in polarity. Time alignment is where the abstract becomes physical.

Why Alignment Matters Perceptually

Coherent arrival and the precedence effect

When two or more sources radiate the same signal, the auditory system fuses them into a single perceived event if their arrivals fall within a short integration window. The precedence effect (also called the law of the first wavefront, or the Haas effect for the level-trading regime) means that for delays roughly between 1 ms and 30 ms the perceived direction is dominated by the first arrival, while later arrivals add loudness and spaciousness without forming a separate image. This is the mechanism reviewed in detail in the psychoacoustics chapter.

Time alignment exploits and respects this mechanism. If we want two loudspeakers to form a stable phantom image between them — the basis of stereo and amplitude panning — their signals must arrive within a fraction of a millisecond of each other at the listening position. Summing localization, the process by which a centred phantom appears when left and right arrive simultaneously and at equal level, is destroyed by even modest interchannel delay: a delay of about 0.5 ms to 1 ms is enough to pull the image fully to the earlier side, overriding any level-based panning.

Misalignment smears imaging and timbre

Two failures follow from poor alignment. The first is spatial: a phantom centre collapses toward whichever speaker arrives first, panning laws no longer produce the intended angles, and the soundstage tilts toward the near side of the room. The second is timbral: when two arrivals of the same broadband signal are separated by a few milliseconds, they sum constructively at some frequencies and destructively at others, producing the comb filter we analyse later in this chapter. The ear hears this as a hollow, phasey, or metallic coloration rather than as an echo, because the delays are too short to be resolved as discrete events.

Key takeaway

Alignment is not cosmetic. The phantom imaging, panning accuracy, and timbral neutrality that every spatial technique in Techniques relies on are conditional on arrivals being coherent in time and consistent in polarity at the listening position.

Propagation Delay Basics

The fundamental relationship

Sound travels through air at a finite speed. The time taken to travel a distance $d$ is

t = \frac{d}{c},

where $c$ is the speed of sound. At a typical indoor temperature of $20\,^\circ\text{C}$ , $c \approx 343$ m/s. A convenient working figure is that sound travels roughly 1 metre in 2.9 ms, or equivalently about 0.34 m per millisecond. The inverse, 2.92 ms per metre, is the number to keep in your head on site.

The speed of sound depends on temperature. A useful linear approximation near room temperature is

c(\theta) \approx 331.3 + 0.606\,\theta \quad \text{(m/s, } \theta \text{ in } ^\circ\text{C)}.

So a hot venue at $30\,^\circ\text{C}$ gives $c \approx 349$ m/s, about 1.8% faster than at $20\,^\circ\text{C}$ . Over a 10 m throw that shifts the arrival time by roughly $0.5$ ms — usually negligible for a single throw, but worth remembering for large outdoor systems whose air temperature changes between soundcheck and showtime.

Converting distance to delay

The conversion is linear, so a small table covers most practical cases:

Distance difference $\Delta d$	Delay at $343$ m/s	Notes
$0.10$ m	$0.29$ ms	within a single enclosure (driver offset)
$0.34$ m	$1.0$ ms	onset of precedence-effect image pull
$1.0$ m	$2.92$ ms	typical front-row vs reference offset
$2.1$ m	$6.1$ ms	surround behind a nearer front speaker
$3.4$ m	$9.9$ ms	small-room sub-to-main path difference
$10$ m	$29.2$ ms	distributed delay-fill in a large venue

Worked example: a surround speaker 2.1 m closer

Consider a 5.1 layout per ITU-R BS.775. Suppose the left-front loudspeaker is $3.6$ m from the reference seat and the left-surround is $1.5$ m from it. The path difference is $\Delta d = 3.6 - 1.5 = 2.1$ m, so the surround's sound arrives earlier by

\Delta t = \frac{2.1}{343} \approx 6.1\ \text{ms}.

To make the front and surround wavefronts arrive together at the reference seat, we must delay the nearer speaker (the surround) by about 6.1 ms. Without this correction, a sound panned to arrive equally from front and surround would localize toward the surround (precedence) and would comb-filter in the overlap region. This single number — delay the near one to match the far one — is the heart of system time alignment.

Delaying the Near Speakers to the Far Ones

The alignment principle

The goal at the reference position is that all loudspeakers carrying a shared or correlated signal present their direct sound within the precedence window, ideally within a few tenths of a millisecond. Because we can only add delay electronically (we cannot make sound arrive earlier than physics allows), the rule is:

Find the loudspeaker with the longest acoustic path to the reference, treat its arrival as the time reference, and add delay to every other loudspeaker so that its arrival matches.

The required delay for speaker $i$ is

\delta_i = \frac{d_{\max} - d_i}{c},

where $d_{\max}$ is the longest path and $d_i$ is the path from speaker $i$ . The speaker at $d_{\max}$ gets $\delta = 0$ ; nearer speakers get progressively more delay.

Worked example: aligning a 5.0 array to one seat

Take these measured distances from the reference seat: left-front $3.6$ m, right-front $3.6$ m, centre $3.1$ m, left-surround $1.5$ m, right-surround $1.6$ m. The longest path is $3.6$ m (the fronts), so $d_{\max} = 3.6$ m.

Speaker	$d_i$ (m)	$d_{\max} - d_i$ (m)	Added delay $\delta_i$ (ms)
Left-front	$3.6$	$0.0$	$0.0$
Right-front	$3.6$	$0.0$	$0.0$
Centre	$3.1$	$0.5$	$1.5$
Left-surround	$1.5$	$2.1$	$6.1$
Right-surround	$1.6$	$2.0$	$5.8$

After applying these delays, a transient fed to all channels arrives simultaneously at the seat. The centre image stops being pulled forward by its shorter throw, and front-to-surround pans behave symmetrically. This is the electronic equivalent of moving every speaker onto a single arc centred on the listener — see speaker layouts and topologies for the geometric version.

The trade for off-centre listeners

Delay alignment is exact only at the reference point. Move the listener and the geometry changes: a delay that equalised arrivals at the centre seat now over- or under-compensates. This is unavoidable — there is one delay value per speaker but a continuum of seats.

In practice you choose the reference to serve the most important position (the mix seat in a studio, the centre of the audience in a venue) and accept graceful degradation elsewhere. A useful observation: the spatial extent of acceptable alignment is larger than people fear, because the precedence window is tens of milliseconds wide. A 0.5 ms misalignment at an off-centre seat shifts an image but rarely splits it. Cinema and immersive calibration (Dolby, SMPTE) deliberately align to a reference and verify that the coverage zone stays within tolerance rather than chasing per-seat perfection.

Reference-point discipline

Pick and physically mark one reference position before you touch a single delay. Every distance, delay, level, and EQ decision in this chapter and in equalization and room correction is relative to that point. Changing the reference midway through calibration is the fastest way to chase your own tail.

Polarity vs Phase vs Delay

These three terms are routinely conflated and they are not the same thing. Getting them straight prevents a large class of calibration errors.

Polarity

Polarity is the sign of the signal: a positive-going pressure becomes negative-going when polarity is inverted. It is frequency-independent — inverting polarity multiplies the entire signal by $-1$ , equivalent to a $180^\circ$ phase shift at every frequency simultaneously. In the time domain it flips the waveform about the zero axis. Polarity is binary: normal or inverted. It is caused by swapped speaker terminals, an inverting gain stage, or a "phase" switch (which almost always means polarity).

Phase

Phase is the position within a cycle, and it is frequency-dependent. A network or transducer can advance or retard different frequencies by different amounts. Phase is what a transfer-function measurement displays as a function of frequency. A pure delay produces a phase that grows linearly with frequency (more on this below); a crossover filter produces a frequency-dependent phase that is not simply a fixed time offset.

Delay

Delay (latency) is a pure time shift. A delay of $\tau$ seconds corresponds to a phase shift

\phi(f) = -2\pi f \tau,

linear in frequency. This is the crucial distinction from polarity: a $180^\circ$ phase shift from a delay occurs at only one frequency (and odd multiples of it), whereas a polarity inversion is $180^\circ$ at all frequencies. The frequency at which a delay $\tau$ produces a half-cycle is $f = 1/(2\tau)$ .

Property	Polarity	Phase	Delay
Frequency dependence	none (all frequencies)	varies with frequency	linear with frequency
Time-domain effect	waveform flips sign	waveshape changes	waveform shifts in time
Caused by	wiring, invert switch	filters, transducers, rooms	path length, DSP delay
Fixed by	polarity/invert switch	all-pass / FIR phase EQ	delay line
$180^\circ$ at...	every frequency	one or more specific freqs	$f = 1/(2\tau)$ and odd harmonics

Absolute polarity and its audibility

Absolute polarity is whether a positive electrical input ultimately produces an initial compression (outward cone motion, positive pressure) at the listener or a rarefaction. For most steady, symmetric musical signals it is inaudible because the ear is largely insensitive to the starting sign of a periodic waveform. It becomes audible for asymmetric waveforms — brass, certain plucked and struck transients, solo voice — where the peak excursion is markedly larger in one direction. The robust rule is consistency: every loudspeaker in the system must share the same absolute polarity, because relative polarity errors between channels are devastating (a phantom centre with one channel inverted has a deep midrange null and a vague, "outside-the-head" image), even when absolute polarity is a subtlety.

One inverted channel wrecks the centre

If left and right carry the same signal but one is polarity-inverted, the on-axis sum is a difference. Low and mid frequencies cancel, the phantom centre becomes diffuse and unlocalizable, and bass nearly disappears at the symmetry plane. Always verify relative polarity across all channels before trusting any imaging judgement. This is the single most common gross error in a freshly wired system.

Phase Coherence at Crossovers and Between Drivers

Why crossovers create phase problems

A multiway loudspeaker splits the band among drivers using crossover filters. Each filter imposes its own frequency-dependent phase. In the crossover overlap region both drivers contribute, and the summed response depends on their relative phase there. Even with perfectly flat individual magnitude responses, the acoustic sum can have a peak or a deep null at the crossover frequency depending on how the two driver outputs phase up.

A classic result: a textbook Linkwitz-Riley 4th-order crossover sums to flat magnitude when both drivers are wired in the same polarity, because each branch is $360^\circ$ shifted at the crossover and they recombine in phase. A Butterworth 2nd-order crossover, by contrast, has the two branches $180^\circ$ apart at the crossover and sums to a deep null unless one driver is polarity-inverted — after inversion it sums to a $+3$ dB bump. The point is that polarity and filter order interact, and "correct" wiring depends on the filter topology.

Minimum-phase vs non-minimum-phase behaviour

A system is minimum-phase when its phase response is uniquely determined by its magnitude response — the two are linked by the Hilbert transform. Most individual drivers and many electronic filters are minimum-phase. The practical consequence is enormous: for a minimum-phase response, correcting the magnitude with EQ also corrects the phase. A minimum-phase peak or dip fixed by a matching parametric filter restores both amplitude and time behaviour.

A response is non-minimum-phase when magnitude and phase are independent — typically because of a pure delay path, or because the response is the sum of multiple arrivals (driver plus reflection, or two speakers, or speaker plus room mode). Comb filtering from a reflection is the archetype: its dips cannot be removed by minimum-phase EQ, because boosting at a cancellation notch raises the level of both the direct and reflected paths and the cancellation persists. The proper tools for non-minimum-phase problems are physical (move the speaker, treat the reflection, retime the sources) or, for known fixed offsets, FIR / linear-phase correction. This distinction governs everything in equalization and room correction.

All-pass behaviour

An all-pass network passes all frequencies at equal magnitude but shifts phase progressively with frequency — it changes timing without changing the spectrum. Crossovers, ported-box high-pass roll-offs, and many room interactions contain all-pass-like sections. The audible effect of pure all-pass phase shift on broadband music is generally subtle and often inaudible for moderate amounts, which is why magnitude alignment usually takes priority over absolute phase flatness. But all-pass phase becomes very audible when it causes two summing sources to cancel: at a crossover, or between a sub and a main, the phase relationship determines whether you get reinforcement or a null, even though each source alone measures flat. That is the bridge to the next section.

Minimum-phase test

A quick field heuristic: if a measured dip moves in frequency as you move the microphone, it is an acoustic interference (non-minimum-phase) effect — do not try to EQ it. If a peak or dip stays put across nearby mic positions, it is more likely a minimum-phase property of the loudspeaker and is a legitimate EQ target. See measurement and calibration.

Sub-to-Main Alignment

Integrating subwoofers with the main loudspeakers is the most common and most consequential time-alignment task, because the crossover between them falls in a region (typically $60$ – $120$ Hz) where the ear is sensitive to level and where a misalignment produces a broad, obvious suckout or a boomy hump rather than a subtle coloration. Subwoofer integration has its own chapter — subwoofers and bass management — but the timing of that integration belongs here.

The summation problem at the crossover

Near the crossover frequency $f_x$ , both the sub and the main radiate significant energy. Their acoustic outputs add as vectors (phasors). If the two arrive in phase at the crossover frequency, they sum to as much as $+6$ dB (for equal levels); if they arrive $180^\circ$ out of phase, they cancel toward a null. Our goal is to choose the sub's delay and polarity so that at $f_x$ the two are in phase and summation is maximal.

The relative phase at the crossover has three contributors:

the acoustic path difference between sub and main to the reference (a delay),
the filter phase of the high-pass on the main and the low-pass on the sub,
any polarity difference and intrinsic transducer phase.

Worked example: aligning a sub at an 80 Hz crossover

Suppose at the reference the main loudspeaker's acoustic path is $4.0$ m and the subwoofer, placed against the front wall, has a path of $5.2$ m. The sub is farther, so its sound arrives later by

\Delta t = \frac{5.2 - 4.0}{343} \approx 3.5\ \text{ms}.

At the $f_x = 80$ Hz crossover, the period is $T = 1/80 = 12.5$ ms, so a half cycle ( $180^\circ$ ) is $6.25$ ms. A $3.5$ ms lateness corresponds to a phase shift of

\phi = 360^\circ \times \frac{\Delta t}{T} = 360^\circ \times \frac{3.5}{12.5} \approx 101^\circ.

That alone is more than enough to spoil summation (it is past the $90^\circ$ where vector sum starts dropping below the in-phase value). On top of this, the fourth-order low-pass and high-pass filters each add their own phase, often another $180^\circ$ to $360^\circ$ of combined relative shift at $f_x$ .

Because the sub arrives late, we cannot fix it by delaying the sub further — we must delay the main to bring the main's arrival back to match the sub, then trim with polarity. In this geometry, delaying the main by about $3.5$ ms removes the path-difference component; the remaining filter-induced phase is then resolved either by the polarity switch (if it lands near $180^\circ$ ) or by a small additional delay. In practice you do not compute the filter phase by hand — you measure the combined system and adjust delay (and try both polarities) to maximise the summed level through the crossover region.

The measured procedure

The reliable method, used in every modern workflow, is:

Measure the main alone through the crossover region (magnitude and phase), with its high-pass engaged.
Measure the sub alone through the same region, with its low-pass engaged.
Overlay the two phase traces. In the overlap around $f_x$ , adjust the sub's (or main's) delay until the two phase curves lie on top of each other — parallel and coincident at $f_x$ .
Try both polarities of the sub; keep whichever makes the phase traces overlap (rather than sit $180^\circ$ apart) at $f_x$ .
Measure the combined response and confirm a smooth, flat sum with no notch at $f_x$ .

When the two phase traces coincide at the crossover, the magnitudes add to the maximum the level setting allows. A residual null at $f_x$ after best alignment almost always means a polarity that needs flipping or an additional half-period of delay.

Worked example: the cost of getting polarity wrong

Suppose after delay alignment the sub and main are within $20^\circ$ of phase agreement at $80$ Hz with correct polarity, summing to roughly $+5.5$ dB over a single source. Flip the sub polarity and they are now about $160^\circ$ apart. The summed level relative to a single source is

20\log_{10}\!\left(2\left|\cos\tfrac{\Delta\phi}{2}\right|\right),

so at $\Delta\phi = 160^\circ$ the sum is $20\log_{10}(2\cos 80^\circ) \approx 20\log_{10}(0.347) \approx -9.2$ dB relative to a single source — a swing of nearly 15 dB at the crossover from a single switch. This is why polarity is never an afterthought in bass management.

Do not align subs by ear with pink noise alone

Sweeping a "phase" or delay control while listening to pink noise feels productive but is unreliable: the broadband level barely changes even as the narrow crossover region swings by 15 dB, and room modes mask the effect. Align with a measured phase overlay at the crossover, then verify the summed magnitude. Confirming by ear is fine; finding the setting by ear is not.

Comb Filtering from Misaligned or Duplicated Sources

The mechanism

When the same broadband signal reaches a point by two paths differing in length by $\Delta d$ (hence in time by $\Delta t = \Delta d / c$ ), the two copies reinforce at frequencies where the delay equals a whole number of periods and cancel where it equals an odd number of half-periods. The result is a series of regularly spaced peaks and nulls across frequency — a comb filter, named for the comb-like look of its magnitude response.

The cancellation (notch) frequencies are

f_n = \frac{(2n-1)\,c}{2\,\Delta d}, \qquad n = 1, 2, 3, \dots

and the reinforcement frequencies sit halfway between them, at $f_m = m\,c / \Delta d$ . The first null is at $f_1 = c/(2\,\Delta d)$ , and notches recur every $c/\Delta d$ Hz thereafter. The depth of the nulls depends on the relative level of the two copies: equal-level copies cancel completely (infinite-depth nulls), while a quieter second copy produces shallower ripple. The relative level of two paths differing by $\Delta L$ dB sets a ripple of roughly $\pm 20\log_{10}(1 \pm 10^{-\Delta L/20})$ around the mean.

Worked example: a 1 ms duplicate

Two arrivals separated by $\Delta t = 1$ ms (a path difference of $\Delta d \approx 0.34$ m) place the first null at

f_1 = \frac{1}{2\,\Delta t} = \frac{1}{2 \times 10^{-3}} = 500\ \text{Hz},

with further nulls at $1500$ Hz, $2500$ Hz, and so on, spaced $1/\Delta t = 1000$ Hz apart, and reinforcements at $1000$ Hz, $2000$ Hz, etc. A $1$ ms misalignment thus carves audible holes right through the vocal presence range — the hollow, "phasey" coloration we warned about at the start of the chapter. Halve the offset to $0.5$ ms and the comb stretches out: first null at $1000$ Hz, spacing $2000$ Hz. Increase it to $5$ ms and the comb gets dense — first null at $100$ Hz, spacing $200$ Hz — so closely packed below a few kHz that the ear integrates it into a diffuse coloration rather than distinct nulls.

$\Delta t$	$\Delta d$	First null $f_1$	Notch spacing
$0.1$ ms	$0.034$ m	$5000$ Hz	$10000$ Hz
$0.5$ ms	$0.17$ m	$1000$ Hz	$2000$ Hz
$1.0$ ms	$0.34$ m	$500$ Hz	$1000$ Hz
$5.0$ ms	$1.7$ m	$100$ Hz	$200$ Hz

Where comb filtering comes from in practice

The classic sources are: two loudspeakers reproducing correlated (mono) content reaching one seat by different paths; a direct sound plus a strong early reflection from a console, wall, or floor; and duplicated/overlapping coverage where two boxes cover the same area. The first of these is precisely why a hard-panned-centre mono signal sounds different from a true centre channel, and why correlated content behaves differently from decorrelated content — a theme developed in stereo is already spatial and in direct, diffuse and envelopment. Decorrelated signals (different content in each speaker) do not comb in the same way, because the cancellation requires the two arrivals to be coherent copies of one another.

Managing comb filtering

You cannot EQ a comb filter away (it is non-minimum-phase, and it changes with position), so you manage it physically:

Time-align the two intended sources so the comb moves below the audible region or its first null moves above the band of interest.
Reduce overlap: aim, splay, or level-offset speakers so any given seat is dominated by one source, pushing the second copy down in level and shallowing the ripple.
Treat reflections: absorb or diffuse the surfaces producing the strong second path (console bounce, side walls).
Accept decorrelation: where two speakers must cover the same area, decorrelated content combs far less than correlated content.

Mono check

Sum a stereo program to mono and listen at the reference seat. If the centre image hollows out or the level drops noticeably, you have comb filtering between your two front speakers — usually a sign that they are not equidistant from the seat or that one is polarity-inverted. A clean mono sum is a strong indicator of good front-stage alignment.

Measuring Delay

Tape measures get you close; they do not account for the electroacoustic delay inside DSP, converters, amplifiers, and the driver's own acoustic offset, nor for the filter-induced group delay around crossovers. For real alignment you measure the time of flight acoustically. This section previews the methods; the full treatment is in measurement and calibration.

Impulse response and time-of-flight

An impulse response (IR), obtained from a swept-sine or MLS measurement, shows sound pressure versus time at the microphone. The direct sound appears as the first large peak; its position on the time axis is the total propagation delay from the analyzer's reference (electrical zero) to the microphone. Measuring two speakers' IRs from the same reference and reading off the difference between their direct-sound peaks gives the relative delay directly, including all electronic and acoustic contributions. If speaker A peaks at $11.7$ ms and speaker B at $14.6$ ms, B arrives $2.9$ ms later; delay A by $2.9$ ms to align them. The IR also reveals reflections (later peaks), letting you see the second paths that cause combing.

Cross-correlation

Cross-correlation between the reference signal and the captured signal estimates the delay as the lag that maximises their similarity. The peak of the cross-correlation function marks the dominant time of arrival. Many analyzers use this internally to auto-set the measurement delay so that the impulse sits at time zero. It is robust against noise and is the engine behind the "delay finder" or "delay locator" found in measurement software.

Transfer-function phase

A dual-channel transfer function (FFT-based, as in Smaart or the equivalent in REW) compares the system output to its input and reports magnitude and phase versus frequency. A pure delay shows up as phase that decreases linearly with frequency at a slope of $-360^\circ$ per $1/\Delta t$ Hz. Software computes the group delay, $\tau_g(f) = -\frac{1}{360^\circ}\frac{d\phi}{df}$ , from this slope. Removing the bulk delay (so the trace is roughly flat) lets you see the residual frequency-dependent phase — exactly what you overlay between sub and main to align them at the crossover. Phase overlap at the crossover, not a tape measure, is the criterion for sub-to-main timing.

Three views of one phenomenon

The impulse response shows delay in the time domain (where is the peak), cross-correlation shows it as a lag (what shift maximises similarity), and the transfer-function phase shows it in the frequency domain (what is the slope). They are complementary views of the same arrival time; use the IR for gross time-of-flight and reflections, and phase for fine crossover alignment.

Step-by-Step Alignment Procedure

The following sequence assumes a measurement microphone at a marked reference and a dual-channel analyzer. It produces a coherently aligned multiway, multi-enclosure system.

Choose and mark the reference position. Set the microphone at ear height at the primary seat. Record the room temperature and compute $c$ .
Verify wiring and polarity first. Before any timing, confirm every loudspeaker has correct and consistent absolute polarity (positive input gives initial compression). Use a polarity test signal or read the IR's initial peak direction. Fix any inverted channels now — alignment built on a polarity error is worthless.
Measure each loudspeaker's impulse response from the same electrical reference. Read off the direct-sound arrival time for each. Note the longest path; this is your timing anchor ( $\delta = 0$ ).
Compute and apply per-speaker delay to bring every nearer speaker into agreement with the anchor, $\delta_i = (t_{\max} - t_i)$ . Re-measure to confirm the IR peaks now line up within a few tenths of a millisecond.
Align drivers within each enclosure (if you have access to the internal processing): overlay the woofer and tweeter phase across the crossover, adjust the inter-driver delay and verify the crossover sums flat with the correct polarity for the filter topology in use.
Integrate the subwoofers. Measure main-alone and sub-alone phase through the crossover, adjust delay (usually delaying the mains if the sub is farther, or delaying the sub if it is nearer and the filter phase calls for it), try both polarities, and choose the combination giving coincident phase and maximum summed magnitude at $f_x$ . Verify the combined trace is smooth with no notch.
Set levels so each loudspeaker contributes its intended SPL at the reference (this interacts with alignment because level offsets shallow comb ripple; see equalization and room correction).
Verify imaging by ear. Check the phantom centre, front-to-surround pans, and a mono sum. A locked, narrow centre and a clean mono sum confirm the front stage is coherent.
Check coverage off-axis. Move the mic to a few representative seats and confirm alignment degrades gracefully rather than collapsing. Re-choose the reference if the important zone is poorly served.
Document everything — distances, delays, polarities, temperature, and reference position — so the calibration is reproducible.

Common Mistakes and Pitfalls

Aligning by tape measure only. The acoustic origin of a loudspeaker is not its front grille, and DSP, crossover group delay, and converter latency add electronic delay a ruler cannot see. Tape measures get you to the right ballpark for setting up delay lines but must be confirmed by measurement. A multiway box can have a hundred microseconds to a couple of milliseconds of internal acoustic offset between drivers that no external measurement of cabinet distance reveals.

Ignoring polarity. As shown, a single inverted channel can swing the crossover sum by 15 dB and destroy the phantom centre. Always verify relative polarity before judging timing or imaging.

Over-delaying. Adding more delay than the path difference requires (for instance, "to be safe") throws the near speaker later than the far one, recreating the very misalignment you set out to fix and pulling images toward the previously far speaker. Delay is a precise correction, not a margin to pad.

Aligning at the wrong reference, or moving it. Delay alignment is exact at one point. Aligning to an empty mix chair when the mix engineer sits 0.4 m back, or changing the reference partway through, invalidates the work.

EQ-ing a comb filter. Boosting a position-dependent interference null wastes amplifier headroom and fixes nothing, because the underlying cancellation is non-minimum-phase. Treat the cause (timing, geometry, reflection), not the symptom.

Trusting a single mic position. A dip that is really an acoustic interference looks like an EQ target at one spot and vanishes at the next. Spatially average or compare positions before concluding a feature is a property of the loudspeaker rather than of the room.

Confusing polarity flip with phase alignment. The polarity switch is a $180^\circ$ -at-all-frequencies tool. It can rescue a crossover whose filter phase happens to land near $180^\circ$ , but it cannot fix an arbitrary delay offset. Use delay for time-of-flight and polarity for the coarse half-cycle, in that order.

Limits

Time alignment cannot do everything, and pretending otherwise leads to overconfidence in a calibration.

One reference, many seats. Delay alignment is geometrically exact only at the reference point. A large audience will always have positions where arrivals are imperfectly aligned. The mitigation is good loudspeaker aiming and the broad precedence window, not finer delay numbers.

Non-minimum-phase room behaviour cannot be delayed away. Room reflections and modal behaviour create position-dependent comb filtering and decay that no global delay corrects. These belong to acoustic treatment and bass management, not to delay alignment. Delay aligns direct arrivals; the reverberant field — central to reverberation and distance and air — is governed by the room.

Filter-induced group delay is finite. Steep crossovers improve band separation but add group delay around the crossover; you trade phase complexity for magnitude control. Linear-phase FIR filters flatten phase but cost latency and can introduce pre-ringing. There is no free lunch; choose the trade deliberately.

Air and temperature drift. The speed of sound changes with temperature and humidity, so an outdoor system aligned in a cool morning will be slightly mistimed in afternoon heat. Over short throws this is negligible; over long delay-fill distances it is real and may warrant re-checking before showtime.

Alignment serves coherence, not loudness. Perfect timing maximises constructive summation, which raises level at the crossover — but the purpose is coherent imaging and neutral timbre, the cues the rest of the guide depends on, not chasing the last decibel of output. Keep the goal in view: a system that delivers the first wavefront cleanly, sums smoothly through its crossovers, and presents an even, believable spatial image across the audience.

Where this fits in Part V

Time alignment is the timing half of system calibration. Its partner is the spectral half — see equalization and room correction — and both rest on the diagnostics in measurement and calibration. The geometry you are aligning comes from speaker layouts and topologies, and the bass-region timing connects directly to subwoofers and bass management.

References

Toole, F. E. Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, 3rd ed. Routledge, 2017. (Precedence effect, summing localization, minimum-phase behaviour, and why magnitude usually dominates phase audibility.)
Davis, D., Patronis, E., and Brown, P. Sound System Engineering, 4th ed. Focal Press, 2013. (Time-of-flight, delay setting, comb filtering, and electroacoustic measurement.)
McCarthy, B. Sound Systems: Design and Optimization, 3rd ed. Focal Press, 2016. (Summation, phase overlap at crossovers, sub-to-main alignment by measured phase, and the management of overlap-induced combing.)
Ahnert, W., and Steffen, F. Sound Reinforcement Engineering: Fundamentals and Practice. E and FN Spon, 1999. (Propagation, delay-line design, and array timing.)
ITU-R Recommendation BS.775, Multichannel stereophonic sound system with and without accompanying picture. International Telecommunication Union. (Reference layouts and per-channel alignment context.)
AES, Journal of the Audio Engineering Society — papers on loudspeaker phase, crossover summation, and minimum-phase EQ (e.g., Lipshitz, Pocock, and Vanderkooy on audibility of phase distortion). Audio Engineering Society.
Rational Acoustics, Smaart v9 User Guide, and ARTA / REW documentation. (Transfer-function phase, impulse response, group delay, and cross-correlation delay finding.)

← Back to Systems & Calibration

Why Alignment Matters Perceptually​

Coherent arrival and the precedence effect​

Misalignment smears imaging and timbre​

Propagation Delay Basics​

The fundamental relationship​

Converting distance to delay​

Worked example: a surround speaker 2.1 m closer​

Delaying the Near Speakers to the Far Ones​

The alignment principle​

Worked example: aligning a 5.0 array to one seat​

The trade for off-centre listeners​

Polarity vs Phase vs Delay​

Polarity​

Phase​

Delay​

Absolute polarity and its audibility​

Phase Coherence at Crossovers and Between Drivers​

Why crossovers create phase problems​

Minimum-phase vs non-minimum-phase behaviour​

All-pass behaviour​

Sub-to-Main Alignment​

The summation problem at the crossover​

Worked example: aligning a sub at an 80 Hz crossover​

The measured procedure​

Worked example: the cost of getting polarity wrong​

Comb Filtering from Misaligned or Duplicated Sources​

The mechanism​

Worked example: a 1 ms duplicate​

Where comb filtering comes from in practice​

Managing comb filtering​

Measuring Delay​

Impulse response and time-of-flight​

Cross-correlation​

Transfer-function phase​

Step-by-Step Alignment Procedure​

Common Mistakes and Pitfalls​

Limits​

References​