Speaker Layouts & Diffusion Topologies

A spatial-audio format is an abstraction. A 5.1 mix, a third-order Ambisonic scene, a Dolby Atmos object bed — none of them know anything about your room, your loudspeakers, or where your audience sits. The moment you commit to hardware, you are translating that abstraction into geometry: a set of points in three-dimensional space, each radiating into a particular volume of air, each at a particular distance from each listener. This chapter is about that translation. It is the first step of the whole calibration chain, and every later step — time alignment, equalisation, bass management, measurement — inherits the geometry you choose here. Get the layout wrong and no amount of DSP will rescue it.

The recurring theme of Part V is that calibration exists to make the physical system deliver the cues the earlier parts assumed. The precedence effect and summing localization only behave if arrivals are coherent in time; a controlled direct-to-reverberant balance only holds if speakers are at sane distances and aimed sensibly; an even response across the audience only happens if coverage is designed, not assumed. Layout is where all three of those start.

From a Format to a Physical Layout

Every spatial format falls into one of three paradigms, and each implies a different relationship between the deliverable and the loudspeakers. Understanding which paradigm you are in tells you how much freedom you have when you place speakers. The three are reviewed in detail in Formats; here we look at them strictly through the lens of physical deployment.

Channel-based: speakers are named destinations

In a channel-based format the signal is already baked for specific positions. "Left surround" is a track, and it expects a loudspeaker at a defined angle. The layout is therefore prescriptive: the format standard (ITU-R BS.775 for 5.1, BS.2051 for immersive beds) tells you the nominal azimuth and elevation of each channel, and your job is to put a speaker there, or as close as the room allows. Deviation degrades the spatial image directly, because there is no renderer between the track and the driver to compensate. If you place the left surround 20 degrees off, the phantom images that mix relied on land in the wrong place.

Object-based: a renderer mediates

In an object-based format (Dolby Atmos, MPEG-H, DTS:X) the deliverable is metadata — position coordinates and gains over time — and a renderer computes loudspeaker feeds at playback for your specific layout. The renderer is told where your speakers actually are (a configuration file or measured setup), and it pans objects accordingly, usually with some flavour of amplitude panning. This buys tolerance: an irregular room is acceptable because the renderer adapts. But it does not buy infinite tolerance. The renderer can only pan between speakers it has; large gaps become holes, and a coordinate that falls in a sparse region of the array localizes poorly. The format still publishes a recommended layout because the renderer was validated against it.

Scene-based: the array reconstructs a field

In a scene-based format (Ambisonics, and by extension WFS) the signal encodes the sound field — spherical-harmonic coefficients, or a recipe for a wavefront — independent of any speaker. A decoder then synthesises that field over whatever array you have. Here the layout requirement is the most physical of all: the array must sample space densely and regularly enough to reconstruct the field up to some spatial order or frequency. Too few speakers, or an uneven distribution, and the decode produces direction errors and timbral colouration. The format does not name speakers at all; it constrains their number and arrangement.

Key takeaway

The paradigm sets your degrees of freedom. Channel-based: place speakers where the standard says. Object-based: place them sensibly and tell the renderer the truth. Scene-based: satisfy a density and regularity constraint. Most real installs are hybrids — an Atmos room plays channel beds, objects, and can host an Ambisonic decode — so you design the layout to satisfy the most demanding paradigm you intend to use.

The invariants you always owe the listener

Whatever the paradigm, three physical invariants must hold for the cues to work. First, coherent arrival: signals meant to fuse into one image must arrive within the precedence window (roughly the first 1–5 ms for most sources) so the auditory system fuses them rather than hearing an echo. Second, matched level at the listener, not at the speaker — a distant speaker must be turned up to compensate for inverse-distance loss. Third, even coverage so that listeners off the sweet spot still hear a plausible balance. Layout is where you make these achievable; alignment and EQ are where you finish the job.

The Diffusion Topologies

Before standard layouts, it helps to classify families of layout by the spatial region they cover and the rendering they support. I call these diffusion topologies. They range from a single frontal cluster to a full enclosing sphere, and the choice is driven by content, audience size, and budget.

Frontal (PA and line-array)

All sound comes from in front of the audience. This is the topology of concert sound reinforcement: a left/right pair of line arrays, possibly with centre and outfills, throwing into a deep audience. Spatial intent is limited — stereo at best, often effectively mono for much of the audience — but coverage and SPL over distance are paramount. A line array exists precisely to fight the inverse-distance law over a long throw, which we return to below.

LCR

Left, Centre, Right across the front. The centre anchors dialogue and hard-panned-centre content so it does not collapse for off-axis listeners — the classic cinema problem that a phantom centre cannot solve for a wide audience. LCR is the backbone of every film format and the front of every immersive layout.

Extended frontal

LCR plus wide front speakers (the "front wide" or "Lw/Rw" positions at around ±60 degrees). Used in large cinemas and some immersive music rooms to keep panning smooth across a very wide screen, where the gap from centre to a ±30-degree front would otherwise be audible as a panning jump.

Horizontal surround

Front speakers plus surrounds and/or rear speakers in the horizontal plane: 5.1, 7.1. The listener is enveloped in the plane of their ears, which is where lateral envelopment cues live. No height.

Full 360 horizontal

A dense ring of equally spaced speakers — eight, sixteen, more — for horizontal-only VBAP or 2D Ambisonics. Common in planetariums, ring-based installations, and electronic-music venues. Regular spacing is the whole point: it makes panning uniform in all directions.

Dome / hemisphere

A ring (or several rings at rising elevations) plus an apex, covering the upper hemisphere. This is the topology for 3D Ambisonic decodes and immersive object rendering — Atmos 7.1.4 and 9.1.6 are sparse hemispheres; research domes (e.g. the ICST or IRCAM rigs) are dense ones with 20–64 speakers. Height channels add the elevation cues that horizontal-only systems cannot.

Full sphere

A dome plus speakers below the listener — only practical with a transparent floor or a raised mesh platform. Used for full periphonic Ambisonics where sources must appear below as well as above. Rare outside research and high-end planetaria.

Dense WFS lines

A continuous-seeming line (or frame) of closely spaced drivers — often 10–25 cm apart — that synthesises wavefronts by Huygens' principle. Not a "surround" in the panning sense; it reconstructs a field with true depth and a listening area rather than a sweet spot. The defining constraint is driver spacing versus aliasing frequency, covered below.

Comparison of topologies

Topology	Region covered	Typical speaker count	Primary rendering	Sweet spot	Typical use
Frontal / line array	Front only	2–24 (per array)	Stereo / mono	Wide but frontal	Live PA, concerts
LCR	Front only	3	Discrete + phantom	Wide horizontally	Cinema front, broadcast
Extended frontal	Wide front	5	VBAP front	Wide	Large cinema, wide screens
Horizontal surround	Ear-plane ring	6–8	Channel + VBAP 2D	Moderate	5.1 / 7.1 home, film
Full 360 horizontal	Horizontal ring	8–24	VBAP 2D / 2D Ambisonics	Centre-biased	Planetarium, club, install
Dome / hemisphere	Upper hemisphere	12–64	3D Ambisonics / object	Centre-biased	Immersive cinema, research
Full sphere	Whole sphere	24–128	Periphonic Ambisonics	Small, central	Research, high-end dome
Dense WFS line	Plane/frame	32–256+	Wave field synthesis	Whole area	Theatre, museum, lab

The trend down the table is from coverage-driven topologies (where the question is "does everyone hear it loud and clear?") to resolution-driven topologies (where the question is "is the field accurately reconstructed?"). Most commercial work lives in the middle rows; the bottom rows are where spatial fidelity is pushed hardest at the cost of speaker count.

Standard Reference Layouts and Their Geometry

Standards exist so that a mix made in one room translates to another. The numbers below are the canonical nominal angles; real rooms approximate them within tolerances the standards also specify.

Stereo

Two speakers and the listener form an equilateral triangle: each speaker at ±30 degrees azimuth from forward, both at ear height (roughly 1.2 m seated), at equal distance. The 60-degree included angle is the foundation of every stereo phantom image; widen it and the centre image weakens and may split, narrow it and the soundstage collapses. Equal radius is what makes a centre-panned signal arrive coherently at the listener — the precondition for summing localization.

5.1 per ITU-R BS.775

Five full-range speakers plus a low-frequency-effects channel:

Channel	Azimuth (deg)	Elevation (deg)
Centre (C)	0	0
Left / Right (L/R)	±30	0
Left/Right Surround (Ls/Rs)	±100 to ±120	0
LFE	— (non-directional)	0

The front three keep the stereo geometry and add a hard centre. The surrounds sit behind the lateral axis, between 100 and 120 degrees, a deliberately wide range because their job is envelopment, not precise imaging — the lateral reflections and decorrelated content they carry do not demand pinpoint placement. All five sit in the horizontal plane. The LFE is band-limited (typically below 120 Hz) and, being non-directional in that band, can go almost anywhere.

7.1

7.1 splits the surround field into side and rear pairs: L/R at ±30, C at 0, side surrounds (Lss/Rss) at about ±90 degrees, and rear surrounds (Lrs/Rrs) at about ±135 to ±150 degrees. The extra pair tightens rear imaging and smooths front-to-back pans. Note that the cinema "7.1" (Dolby Surround 7.1) and the BS.2051 System I (0+7+0) layout differ slightly in nominal angles; always design to the standard your content targets.

Immersive: 7.1.4 and 9.1.6

The ".4" and ".6" suffixes count height speakers. A 7.1.4 layout is a 7.1 horizontal bed plus four overhead/height speakers; 9.1.6 adds front wides and two more height speakers. Drawing on BS.2051 and the Dolby Atmos home and install recommendations:

Channel group	Azimuth (deg)	Elevation (deg)
Front L/C/R	±30 / 0 / ±30	0
Front wide (9.x only)	±60	0
Side surround	±90 to ±110	0
Rear surround	±135 to ±150	0
Front height (Ltf/Rtf)	±30 to ±45	+30 to +45
Rear height (Ltb/Rtb)	±110 to ±135	+30 to +45
Top / voice-of-god (.6)	±90 (or 0)	+45 to +90

The two crucial things about height speakers: they should sit at a real elevation (commonly +45 degrees, or "top front / top rear" overhead), and their azimuth should roughly align over the corresponding floor speakers so vertical pans read as vertical and not diagonal. We design a full 7.1.4 to these numbers in the worked example below.

Note

These angles are nominal. BS.2051 and the Atmos guides publish tolerance windows (often ±10 degrees on surrounds, tighter on the front) because no real room is a perfect circle. The renderer is told the true positions; the tolerances tell you how far you can stray before the design intent breaks. Stay inside them and document the as-built geometry — alignment and EQ depend on knowing it.

Coverage and Loudspeaker Directivity

A loudspeaker is not a point that radiates equally everywhere; it has a directivity pattern that concentrates output into a coverage angle and rolls off outside it. Coverage design is the art of arranging these patterns so every seat is inside one, with even level and consistent tone — and the physics is governed by two relationships: angular coverage and inverse-distance level loss.

Coverage angle and the −6 dB convention

A speaker's nominal coverage angle is, by convention, the included angle between the two off-axis directions where the level has dropped 6 dB relative to on-axis. A horn rated "90° × 50°" is 90 degrees wide and 50 degrees tall at the −6 dB points. To cover an audience you tile these cones: adjacent speakers are aimed so their −6 dB edges overlap, which (because each is −6 dB down at the seam, and two correlated −6 dB signals sum to roughly the on-axis level) yields approximately even direct-field coverage along the seam.
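The seam arithmetic above can be checked numerically. This is a minimal sketch, assuming two arrivals of equal level that are either fully in phase or fully uncorrelated; the function name `sum_levels_db` is illustrative, and real seams between physically separated speakers fall somewhere between the two cases.

```python
import math

def sum_levels_db(levels_db, coherent=True):
    """Combine several levels (dB relative to the same reference) at one point."""
    if coherent:
        # In-phase signals add as pressures (amplitudes).
        total = sum(10 ** (level / 20) for level in levels_db)
        return 20 * math.log10(total)
    # Uncorrelated signals add as powers.
    total = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total)

# Two speakers, each 6 dB down at the seam between them.
seam_coherent = sum_levels_db([-6.0, -6.0], coherent=True)
seam_incoherent = sum_levels_db([-6.0, -6.0], coherent=False)
print(f"coherent seam:   {seam_coherent:+.2f} dB re on-axis")
print(f"incoherent seam: {seam_incoherent:+.2f} dB re on-axis")
```

The coherent case lands within a few hundredths of a dB of the on-axis level, while the uncorrelated case sits about 3 dB below it, which is why the overlap rule is only approximate for decorrelated content.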

The directivity also concentrates energy: the directivity factor $Q$ is the ratio of on-axis intensity to the intensity of an omnidirectional source of the same total power. A rough geometric estimate from the coverage angles (in degrees) is

Q \approx \frac{180^\circ \times 180^\circ}{\theta_h \, \theta_v \cdot k}

where $\theta_h$ and $\theta_v$ are the horizontal and vertical coverage angles and $k$ is a fudge near unity; the directivity index is $DI = 10\log_{10} Q$ in dB. A higher $Q$ means more direct sound for the same power — which is exactly how you control the direct-to-reverberant ratio in a live room: a tighter pattern throws more direct energy to the back rows before the room's reverberant field takes over.
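As a quick illustration of the rough estimate above, the sketch below evaluates $Q$ and $DI$ for the 90° × 50° horn mentioned earlier. It assumes $k = 1$ purely for illustration; the function name `directivity_estimate` is hypothetical, and the result is a geometric approximation, not a measured directivity.

```python
import math

def directivity_estimate(theta_h_deg, theta_v_deg, k=1.0):
    """Rough geometric estimate of directivity factor Q and index DI (dB).

    theta_h_deg, theta_v_deg: nominal -6 dB coverage angles in degrees.
    k: the fudge factor from the text (assumed 1.0 here).
    """
    q = (180.0 * 180.0) / (theta_h_deg * theta_v_deg * k)
    di_db = 10 * math.log10(q)
    return q, di_db

q, di_db = directivity_estimate(90, 50)
print(f"Q ~ {q:.1f}, DI ~ {di_db:.1f} dB")
```

For the 90° × 50° horn this gives a $Q$ of 7.2 and a $DI$ of roughly 8.6 dB; narrowing either coverage angle raises both.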

Inverse-distance level variation across an audience

A single point source obeys the inverse-square law: sound intensity falls as $1/r^2$, so the sound pressure level falls 6 dB per doubling of distance:

\Delta L = 20 \log_{10}\!\left(\frac{r_2}{r_1}\right) \ \text{dB}

Worked example 1 — front-to-back loss from one speaker. A single full-range speaker covers an audience whose front row is at $r_1 = 3$ m and back row at $r_2 = 18$ m. The level difference is

\Delta L = 20\log_{10}\!\left(\frac{18}{3}\right) = 20\log_{10}(6) \approx 15.6 \ \text{dB}.

A 15.6 dB swing front-to-back is unacceptable for music — the front rows are blasted while the back strains. This is the problem every coverage design must beat.

How arrays fight the inverse law

Two strategies reduce that swing. Aiming/distance staggering: aim the speaker so its on-axis points at the far seats and the near seats sit progressively off-axis, where the pattern is down several dB. If the near row is 8 dB off-axis and the far row is on-axis, the net front-to-back variation drops from 15.6 dB toward 7–8 dB. Line arrays: a tall column of sources behaves, over a useful frequency and distance range, more like a cylindrical (line) source than a point source. A cylindrical wave spreads in only two dimensions, so its level falls only

\Delta L = 10\log_{10}\!\left(\frac{r_2}{r_1}\right) \ \text{dB}

— 3 dB per doubling, half the point-source loss. Over the same 3-to-18 m throw a perfect line source loses only about 7.8 dB. Real arrays transition from cylindrical to spherical behaviour beyond a critical distance and with frequency, which is why long-throw arrays are tall and carefully splayed (curved more at the bottom to reach near seats, straighter at the top for distant seats). The takeaway for spatial work: coverage uniformity is a geometry problem first; EQ cannot fix a level gradient across the audience that the layout created.
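The two spreading laws and the aiming trade can be compared side by side. This is a sketch under idealised assumptions (a perfect point source, a perfect line source, and a flat 8 dB off-axis attenuation at the near row taken from the example above); `distance_loss_db` is an illustrative name.

```python
import math

def distance_loss_db(r_near, r_far, source="point"):
    """Level drop from r_near to r_far for an idealised source.

    source="point": spherical spreading, 20*log10(r_far/r_near).
    source="line":  cylindrical spreading, 10*log10(r_far/r_near).
    """
    if r_near <= 0 or r_far <= 0:
        raise ValueError("distances must be positive")
    factor = {"point": 20.0, "line": 10.0}[source]
    return factor * math.log10(r_far / r_near)

point_loss = distance_loss_db(3.0, 18.0, "point")
line_loss = distance_loss_db(3.0, 18.0, "line")

# Aiming on-axis at the back row: the near row sits off-axis and is
# attenuated by the pattern (8 dB assumed, as in the text).
aimed_point_variation = point_loss - 8.0

print(f"point source:          {point_loss:.1f} dB front-to-back")
print(f"ideal line source:     {line_loss:.1f} dB front-to-back")
print(f"point source, re-aimed: {aimed_point_variation:.1f} dB front-to-back")
```

Both remedies land in the same 7 to 8 dB range for this throw, which is why deep rooms often combine them.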

Coverage rule of thumb

Aim each speaker's on-axis at the farthest seats it must serve, and let nearer seats fall off-axis where the pattern naturally attenuates. This trades a few dB of high-frequency dullness up close (off-axis rolloff) for a much flatter level across the audience — a trade almost always worth making in a deep room.

Speaker Count vs Spatial Resolution

For coverage you need enough speakers to fill the angles; for spatial resolution — accurate localization and field reconstruction — each renderer imposes its own minimum count and spacing. This is where scene- and object-based systems become demanding.

VBAP: pairs and triplets

Vector Base Amplitude Panning pans a source by distributing gain across the two (2D) or three (3D) nearest speakers surrounding its direction. The angular gap between adjacent speakers sets the worst-case localization blur: a source midway between two speakers 60 degrees apart is a phantom whose stability and width depend on that gap. As a practical guide, keep adjacent-speaker spacing at or below about 30–45 degrees in directions that matter for imaging. Halving the gap roughly halves the phantom's wander as a listener moves. There is no aliasing limit as in WFS — VBAP degrades gracefully — but holes appear wherever the gap is large.
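For the 2D case, the gain calculation can be sketched in a few lines. This assumes the coordinate convention used later in this chapter (forward = +y, right = +x, azimuth positive to the right) and constant-power normalisation; the function names are illustrative, and a production renderer would also select the active pair and handle sources outside it.

```python
import math

def unit_vector(az_deg):
    """Unit vector for an azimuth: forward = +y, right = +x."""
    a = math.radians(az_deg)
    return math.sin(a), math.cos(a)

def vbap_pair_gains(source_az, spk1_az, spk2_az):
    """Gains (g1, g2) that place a source between two speakers.

    Solves p = g1*l1 + g2*l2 for the source direction p, then scales
    the gains so that g1^2 + g2^2 = 1 (constant power).
    A negative gain means the source lies outside the pair.
    """
    px, py = unit_vector(source_az)
    ax, ay = unit_vector(spk1_az)
    bx, by = unit_vector(spk2_az)
    det = ax * by - bx * ay
    if abs(det) < 1e-9:
        raise ValueError("speakers are collinear with the listener")
    g1 = (px * by - bx * py) / det
    g2 = (ax * py - px * ay) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

print(vbap_pair_gains(0.0, -30.0, 30.0))   # centred source
print(vbap_pair_gains(15.0, -30.0, 30.0))  # halfway toward the right speaker
```

With speakers at ±30 degrees a centred source receives equal gains of about 0.707 each, and a source at the right speaker's azimuth receives all of its gain from that speaker.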

Ambisonics: order sets the count

An Ambisonic decode of order $N$ needs at least

(N+1)^2

loudspeakers for full 3D (periphonic) reconstruction, and $2N+1$ for 2D (horizontal-only). First order ($N=1$) needs at least 4 in 3D, the classic tetrahedral minimum; third order needs at least 16. For a robust, even decode you want comfortably more than the minimum and, crucially, a near-uniform angular distribution. The spatial resolution (the sharpness of a rendered direction) improves with order, but only if the array can support it. A 16-speaker dome can decode third order; feed third-order material to a sparse, irregular array and the higher harmonics fold into direction errors, exactly as undersampling folds high frequencies in time-domain DSP.

The reconstruction is also frequency-limited: an order-$N$ decode is accurate over a central region of radius

r \approx \frac{N \, c}{2\pi f}

so the sweet area shrinks as frequency rises. At first order and 1 kHz the region of faithful reconstruction has a radius of only about 5.5 cm (roughly 11 cm across), smaller than a human head, which is why low-order Ambisonics is best-effort even for a single listener and why higher orders (and more speakers) are needed to enlarge the listening area.
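The count and radius rules above are easy to tabulate. The sketch below assumes $c = 343$ m/s and uses the approximate radius formula from the text as-is; `min_speakers` and `faithful_radius_m` are illustrative names.

```python
import math

def min_speakers(order, periphonic=True):
    """Minimum loudspeaker count for an Ambisonic decode of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2 if periphonic else 2 * order + 1

def faithful_radius_m(order, freq_hz, c=343.0):
    """Approximate radius (m) of accurate reconstruction: r = N*c / (2*pi*f)."""
    return order * c / (2 * math.pi * freq_hz)

for n in (1, 2, 3, 5):
    r_cm = 100 * faithful_radius_m(n, 1000.0)
    print(f"order {n}: 3D min {min_speakers(n):3d}, "
          f"2D min {min_speakers(n, periphonic=False):2d}, "
          f"radius at 1 kHz ~ {r_cm:.1f} cm")
```

At 1 kHz the radius grows by about 5.5 cm per order, so covering even a single head at that frequency takes second order or higher.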

WFS: the spatial-aliasing spacing constraint

Wave field synthesis reconstructs a true wavefront from a line of secondary sources (the loudspeakers) spaced $\Delta x$ apart. Discretising a continuous wavefront into point sources works only up to a frequency where the wavelength is comparable to the spacing. Above the spatial-aliasing frequency

f_{\text{alias}} \approx \frac{c}{2\,\Delta x}

the synthesis breaks down: the array radiates spurious directions and the field is no longer faithfully reconstructed (it still sounds like something, but imaging and timbre degrade).

Worked example 2 — WFS driver spacing. With $c = 343$ m/s and drivers spaced $\Delta x = 0.17$ m (17 cm, a common WFS module pitch):

f_{\text{alias}} \approx \frac{343}{2 \times 0.17} \approx 1009 \ \text{Hz}.

Synthesis is therefore faithful only up to about 1 kHz. To push aliasing to 2 kHz you must halve the spacing to about 8.5 cm, doubling the driver count per metre. This is why true WFS installations have hundreds of drivers and why most "WFS-like" commercial systems accept aliasing above 1–1.5 kHz, relying on the ear's reduced sensitivity to interaural phase cues in that range. The spacing constraint is the single hardest economic fact of WFS.
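The spacing trade-off can be explored with a small calculator. This sketch uses the simple $c/(2\Delta x)$ approximation from the text with $c = 343$ m/s; the function names are illustrative, and a real array's aliasing frequency also depends on source and listener geometry.

```python
def alias_frequency_hz(spacing_m, c=343.0):
    """Approximate spatial-aliasing frequency for a driver spacing in metres."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)

def max_spacing_m(f_max_hz, c=343.0):
    """Largest driver spacing that keeps aliasing above f_max_hz."""
    if f_max_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / (2.0 * f_max_hz)

for spacing_cm in (30.0, 17.0, 8.5):
    f = alias_frequency_hz(spacing_cm / 100.0)
    per_metre = 100.0 / spacing_cm
    print(f"{spacing_cm:4.1f} cm pitch: alias ~ {f:6.0f} Hz, "
          f"~ {per_metre:.1f} drivers per metre")
```

Each halving of the pitch doubles both the aliasing frequency and the driver count, so the cost of extra bandwidth grows linearly with the target frequency.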

Renderer-vs-array mismatch

The most common high-end failure is feeding a renderer more spatial detail than the array can reproduce: third-order Ambisonics into 8 speakers, or 2 kHz content into a 30 cm WFS line. The result is not graceful — it is direction errors, comb filtering, and timbral colour. Always size the array to the highest order or frequency you intend to reproduce, or downgrade the decode to match the array. See Ambisonics and WFS.

Summary of count requirements

Renderer	Minimum speakers	For good results	Governing constraint
Stereo / phantom	2	2	±30° geometry
VBAP 2D	4–6	gap ≤ 30–45°	adjacent angular gap
VBAP 3D	6–8	dome of 12–24	triangulation gaps
Ambisonics order $N$ (2D)	$2N+1$	$\geq 1.5\times$ min, uniform	order, uniformity
Ambisonics order $N$ (3D)	$(N+1)^2$	$\geq 1.5\times$ min, uniform	order, uniformity, $r\approx Nc/2\pi f$
WFS	line of many	$\Delta x \leq c/2f_{\max}$	spatial-aliasing frequency

Distance, Level, and Delay Implications of an Irregular Layout

Standards draw perfect circles; rooms force compromise. A surround speaker may sit closer than the fronts because the rear wall is near; a height speaker may be 1.5 m further away than its floor partner. Every such radius mismatch creates a level and timing error that the layout bequeaths to the alignment stage. Knowing this at layout time lets you minimise it.

Level: equal SPL at the listener, not at the speaker

Because of inverse-distance loss, two identical speakers at different radii produce different levels at the listener. A speaker at 4 m versus one at 2 m is

20\log_{10}\!\left(\frac{4}{2}\right) = 6 \ \text{dB}

quieter at the listening position even with identical drive. Calibration fixes this with per-channel gain trims (the standard procedure plays band-limited pink noise per channel and trims each to the same SPL at the reference seat — typically 79 or 85 dB SPL depending on the standard). But trimming a distant speaker up costs headroom and raises its contribution to the reverberant field; so at layout time, keep radii as equal as you can.

Delay: equal arrival time at the listener

Sound travels at roughly $c \approx 343$ m/s, so it covers 1 m in about 2.9 ms. A speaker 1 m closer than the others arrives 2.9 ms early. If that early arrival belongs to content meant to fuse with the others, the precedence effect pulls the image toward the early speaker — the soundstage skews. The fix is to delay the nearer speakers so all arrivals coincide at the reference seat:

\Delta t = \frac{d_{\max} - d_i}{c}

where $d_{\max}$ is the farthest speaker's distance and $d_i$ each speaker's distance. We design this fully in Time Alignment & Phase; here the point is preventive. The bigger the radius spread, the more delay you must add; that delay is latency you spend, and the level trims that accompany it are headroom you spend.

Worked example 3 — irregular surround. Fronts at 3.0 m, side surrounds at 2.1 m, heights at 3.4 m. The farthest is 3.4 m. Delays to equalise arrival:

Heights (3.4 m): $\Delta t = 0$ ms (reference).
Fronts (3.0 m): $\Delta t = (3.4 - 3.0)/343 = 0.4/343 \approx 1.17$ ms.
Surrounds (2.1 m): $\Delta t = (3.4 - 2.1)/343 = 1.3/343 \approx 3.79$ ms.

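And the level trims. Each nearer speaker arrives louder at the seat than the 3.4 m heights by $20\log_{10}(d_{\max}/d_i)$, so it is cut by that amount (or, equivalently, the farther speakers are raised):

Fronts: $20\log_{10}(3.4/3.0) \approx 1.1$ dB louder, so trim $-1.1$ dB.
Surrounds: $20\log_{10}(3.4/2.1) \approx 4.2$ dB louder, so trim $-4.2$ dB.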

So this "irregular but typical" room needs up to 3.8 ms of delay and 4 dB of trim just to undo the geometry. None of it would be necessary on a perfect equal-radius circle — which is the whole argument for designing the layout on a circle whenever the room permits.
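The same bookkeeping can be automated for any set of measured distances. This is a minimal sketch assuming free-field inverse-distance loss, $c = 343$ m/s, and alignment to the farthest speaker; `align_to_farthest` is an illustrative name. It reports, per group, the delay to add and the level excess at the seat, that is, how many dB louder the speaker arrives than the farthest one before any trim.

```python
import math

def align_to_farthest(distances_m, c=343.0):
    """Return {name: (delay_ms, excess_db)} relative to the farthest speaker.

    delay_ms:  delay to add so the arrival coincides with the farthest speaker.
    excess_db: how much louder this speaker arrives than the farthest one;
               cut it by this amount, or raise the farther speakers instead.
    """
    d_max = max(distances_m.values())
    result = {}
    for name, d in distances_m.items():
        delay_ms = (d_max - d) / c * 1000.0
        excess_db = 20.0 * math.log10(d_max / d)
        result[name] = (delay_ms, excess_db)
    return result

room = {"heights": 3.4, "fronts": 3.0, "surrounds": 2.1}
for name, (delay_ms, excess_db) in align_to_farthest(room).items():
    print(f"{name:9s} delay {delay_ms:4.2f} ms, level excess {excess_db:3.1f} dB")
```

Feeding it an equal-radius layout returns zeros throughout, which is the numerical form of the argument for designing on a circle.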

Do not "fix" geometry with EQ

A level gradient or a timing skew created by unequal speaker distances is a geometry defect. Reaching for equalisation to mask it is the wrong tool: EQ changes spectrum, not arrival time or the spatial level map across seats. Fix distances and aiming first; trim level and delay second; EQ last, and only for spectral balance. Inverting the order is the most common way calibrations go wrong.

Listening Area vs Whole-Audience Design

A central design decision precedes all the geometry: are you optimising for one seat or for everyone? The two goals pull in opposite directions and most failures come from confusing them.

The sweet spot and why it is small

Phantom imaging depends on relative level and time from multiple speakers arriving coherently at the ears. Move off the centre line and two things happen: the nearer speaker arrives earlier (precedence skews the image toward it) and louder (level skews it too). For a stereo pair, moving sideways by even 0.5 m can collapse a centre image to the near speaker. The sweet spot is the locus where these errors are tolerable, and for amplitude-panned content it is genuinely small — roughly a head-width region at the design centre for critical imaging, widening for less critical content.

This is the regime of the mastering room, the cinema reference seat, the BS.1116 critical-listening position. You design every radius, angle, and trim around one point and accept that other seats are approximations.

Whole-audience strategies

For a paying audience you cannot give everyone the sweet spot, so you change strategy. Options, roughly in increasing order of cost:

Hard centre channels so dialogue and lead vocals do not rely on a phantom that only the centre seat hears correctly — the entire reason cinema is LCR not LR.
Wider speaker counts (extended frontal, dense rings) so the angular gap between adjacent speakers is small; a smaller gap means a panned image wanders less as the listener moves, because the precedence/level errors are bounded by the gap.
Distance-based topologies — WFS in particular synthesises a wavefront that is correct over a whole area, not a point, which is its headline advantage over panning for large audiences.
Decorrelation for envelopment: where precise imaging is hopeless (deep surround fields), feed decorrelated content so the goal becomes envelopment rather than localization — envelopment is far more robust to listener position than a phantom image.

The honest framing: amplitude-panning topologies optimise a point and tolerate the rest; scene-based and dense topologies trade speaker count for a genuine listening area. Choose before you place a single speaker, because the count, spacing, and budget all follow from it.

Worked Example: Laying Out a 7.1.4 Room

Let us design a complete 7.1.4 immersive room to BS.2051/Atmos-style angles, on an equal-radius circle, and check coverage. The room is 6.0 m wide, 7.0 m deep, 3.0 m high. The reference listening position is on the centre line, 3.5 m from the front wall, ears at 1.2 m.

Step 1 — choose the design radius

We want all bed speakers equidistant from the reference seat, so every room dimension caps the radius $R$. Width at the front: a speaker at ±30 degrees sits at lateral offset $R\sin 30^\circ = 0.5R$; keeping it within 2.6 m of centre (the front wall spans ±3 m, and we leave margin from the corners) gives $0.5R \leq 2.6$, so $R \leq 5.2$ m. Depth: the centre speaker sits $R$ ahead of the seat and the front wall is 3.5 m away, so $R < 3.5$ m. Width at the sides: the side surrounds sit $R$ to either side and the side walls are 3.0 m away, so $R < 3.0$ m. Height: if the height speakers are to share the same radius at +45 degrees elevation, then $1.2 + R\sin 45^\circ \leq 3.0$, so $R \leq 2.55$ m. The ceiling is the binding constraint. Choose $R = 2.4$ m as the bed radius: the front L/R then sit $2.4\cos 30^\circ \approx 2.08$ m forward of the seat, about 1.4 m clear of the front wall.

Step 2 — place the horizontal bed (7.1)

At $R = 2.4$ m, ear height 1.2 m, relative to the seat (forward = +y, right = +x):

Channel	Azimuth	$x = R\sin\theta$ (m)	$y = R\cos\theta$ (m)
Centre	0°	0.00	+2.40
Left / Right	±30°	±1.20	+2.08
Side surround L/R	±90°	±2.40	0.00
Rear surround L/R	±135°	±1.70	−1.70

Check the room: side surrounds at ±2.4 m need a 4.8 m span; the room is 6.0 m wide, so they sit 0.6 m off each side wall — fine. Rear surrounds at $y = -1.7$ m relative to the seat are 1.7 m behind it, i.e. 3.5 + 1.7 = 5.2 m from the front wall, leaving 1.8 m to the rear wall — fine.
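The bed coordinates and the wall checks above can be reproduced with a short script. This sketch assumes the room and seat dimensions of this worked example and treats each speaker as a point at its acoustic centre (cabinet depth and mounting hardware are ignored); the helper names are illustrative.

```python
import math

ROOM_WIDTH_M = 6.0
ROOM_DEPTH_M = 7.0
SEAT_FROM_FRONT_WALL_M = 3.5
BED_RADIUS_M = 2.4

# Nominal azimuths in degrees (negative = left, positive = right).
BED_AZIMUTHS = {"C": 0, "L": -30, "R": 30,
                "Lss": -90, "Rss": 90, "Lrs": -135, "Rrs": 135}

def bed_position(az_deg, radius=BED_RADIUS_M):
    """(x, y) relative to the seat: right = +x, forward = +y."""
    a = math.radians(az_deg)
    return radius * math.sin(a), radius * math.cos(a)

def wall_clearances(x, y):
    """Distance from a speaker to the nearest side wall, front wall, rear wall."""
    side = ROOM_WIDTH_M / 2.0 - abs(x)
    front = SEAT_FROM_FRONT_WALL_M - y
    rear = (ROOM_DEPTH_M - SEAT_FROM_FRONT_WALL_M) + y
    return side, front, rear

for name, az in BED_AZIMUTHS.items():
    x, y = bed_position(az)
    side, front, rear = wall_clearances(x, y)
    print(f"{name:3s} x={x:+.2f} y={y:+.2f}  "
          f"side {side:.2f} m, front {front:.2f} m, rear {rear:.2f} m")
```

Any negative clearance would mean the nominal position falls outside the room, which is the cue to shrink the radius or move that speaker and record the deviation.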

Step 3 — place the four height speakers

For 7.1.4, two front-height (Ltf/Rtf) and two rear-height (Ltb/Rtb), at elevation +45 degrees and azimuth roughly over the front and rear pairs. We keep the same slant radius $R = 2.4$ m so arrival times match the bed (this is the trick that minimises the delay corrections from the previous section). At elevation $\phi = 45^\circ$:

Height above ear level: $R\sin 45^\circ = 2.4 \times 0.707 \approx 1.70$ m, so the speaker is at $1.2 + 1.70 = 2.90$ m above the floor — and the ceiling is 3.0 m. That fits with 0.10 m to spare.
Horizontal radius: $R\cos 45^\circ \approx 1.70$ m.

Front heights at ±45 degrees azimuth, rear heights at ±135 degrees azimuth, each at horizontal radius 1.70 m:

Channel	Az / El	$x$ (m)	$y$ (m)	$z$ above floor (m)
Top front L/R	±45° / +45°	±1.20	+1.20	2.90
Top back L/R	±135° / +45°	±1.20	−1.20	2.90

The four height speakers form a 2.4 m × 2.4 m square on the ceiling centred over the seat — a clean, symmetric overhead array, each one 2.4 m from the reference seat just like the bed.
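The height coordinates follow from a standard spherical-to-Cartesian conversion, which also lets you confirm that every speaker really is 2.4 m from the seat. This sketch assumes the same axes as the bed table (right = +x, forward = +y) with $z$ measured from the floor and ears at 1.2 m; the function names are illustrative.

```python
import math

EAR_HEIGHT_M = 1.2

def speaker_xyz(az_deg, el_deg, slant_radius_m, ear_height_m=EAR_HEIGHT_M):
    """Position from azimuth, elevation and slant distance to the seat.

    Returns (x, y, z): x to the right, y forward, z above the floor.
    """
    az = math.radians(az_deg)
    el = math.radians(el_deg)
    horizontal = slant_radius_m * math.cos(el)
    x = horizontal * math.sin(az)
    y = horizontal * math.cos(az)
    z = ear_height_m + slant_radius_m * math.sin(el)
    return x, y, z

def distance_to_seat(x, y, z, ear_height_m=EAR_HEIGHT_M):
    """Straight-line distance from a speaker to the listener's ears."""
    return math.sqrt(x * x + y * y + (z - ear_height_m) ** 2)

HEIGHTS = {"Ltf": (-45, 45), "Rtf": (45, 45), "Ltb": (-135, 45), "Rtb": (135, 45)}
for name, (az, el) in HEIGHTS.items():
    x, y, z = speaker_xyz(az, el, 2.4)
    print(f"{name}: x={x:+.2f} y={y:+.2f} z={z:.2f}  "
          f"distance {distance_to_seat(x, y, z):.2f} m")
```

Running the same conversion with elevation 0 reproduces the bed table, so one function can generate the whole coordinate sheet handed to the renderer.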

Step 4 — coverage check

Each speaker must cover the listening zone: say a 2.0 m × 2.0 m area around the reference seat (a small audience or a producer plus clients). Take the centre speaker at 2.4 m, aimed at the reference seat. Seats level with the reference position lie up to 1.0 m off the centre line at 2.4 m range, i.e.

\arctan\!\left(\frac{1.0}{2.4}\right) \approx 22.6^\circ

to each side, a 45-degree included angle. The true worst case is the front corners of the zone, which are 1.0 m off the centre line but only 1.4 m in front of the speaker, at $\arctan(1.0/1.4) \approx 35.5^\circ$ off-axis. A typical studio monitor with a 90° × 70° pattern has its −6 dB points at ±45 degrees horizontally, so even there the listener is inside the −6 dB cone. For the height speakers, the same zone is covered by their (usually wider) dispersion. Coverage passes.
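A slightly more thorough version of this check evaluates the off-axis angle at several points in the zone, not just directly beside the seat. The sketch below works in the horizontal plane only and assumes the centre speaker is aimed at the reference seat; `off_axis_deg` is an illustrative name, and the sample points are the corners and edge midpoints of the 2.0 m × 2.0 m zone.

```python
import math

def off_axis_deg(speaker, aim_point, listener):
    """Angle (degrees) between a speaker's aim direction and a listener.

    All arguments are (x, y) tuples in metres, horizontal plane only.
    """
    ax, ay = aim_point[0] - speaker[0], aim_point[1] - speaker[1]
    lx, ly = listener[0] - speaker[0], listener[1] - speaker[1]
    cos_angle = (ax * lx + ay * ly) / (math.hypot(ax, ay) * math.hypot(lx, ly))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

centre_speaker = (0.0, 2.4)
reference_seat = (0.0, 0.0)
zone_points = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]

worst = max(off_axis_deg(centre_speaker, reference_seat, p) for p in zone_points)
half_coverage = 90.0 / 2.0  # half of a 90-degree horizontal pattern
print(f"worst off-axis angle: {worst:.1f} deg "
      f"(limit {half_coverage:.0f} deg): {'pass' if worst <= half_coverage else 'fail'}")
```

The largest angle occurs at the front corners of the zone, the points nearest the speaker, and it still falls inside the ±45 degree half-angle of a 90-degree pattern.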

Step 5 — residual trims and delays

Because we held every speaker at $R = 2.4$ m, the ideal delays and trims are all zero at the reference seat — the payoff of equal-radius design. In practice you still measure: real driver-to-acoustic-centre offsets, baffle positions, and a few centimetres of mounting reality mean you will measure and apply small corrections (a fraction of a millisecond, under a dB) at the alignment stage. But the layout itself has contributed essentially no geometric error — contrast this with Worked Example 3's irregular room needing 3.8 ms and 4 dB of correction. That difference is the entire value of designing on a circle (or sphere).

Design on a sphere, deviate only as forced

Start from a perfect equal-radius sphere (or circle) at the standard angles. Then move each speaker the minimum the room forces, and record the as-built coordinates for the renderer and the alignment DSP. Every metre of radius equalisation you preserve is delay and level trim you do not have to spend — and headroom and image stability you keep.

Common Mistakes and Pitfalls

Uneven radii. The single most common layout error: surrounds bolted to whatever wall was handy, heights mounted wherever a joist allowed, with radii ranging from 1.8 to 3.6 m. The result is large geometric delay and level errors (Worked Example 3) that eat headroom and, if uncorrected, skew every pan. Equalise radii first.

Wrong angles. Placing surrounds at ±150 degrees when the standard calls for ±100–120 (or vice versa), or putting "height" speakers at +20 degrees because the ceiling was low, breaks the format's spatial map. Channel-based content suffers immediately; even a renderer cannot put a sound where there is no speaker near the target direction.

Too few speakers for the renderer. Decoding third-order Ambisonics on eight speakers, or running WFS at 30 cm spacing and expecting fidelity to 4 kHz. The math (count $(N+1)^2$, aliasing $f \approx c/2\Delta x$) is unforgiving. Match the array to the most demanding rendering you will use, or downgrade the rendering.

Height speakers not over their floor partners. If the top-front speakers sit at a different azimuth than the front L/R, vertical pans read as diagonal smears. Keep height azimuths aligned with the corresponding floor speakers.

Mixing coverage and resolution goals. Designing a "whole-audience" PA layout and then expecting it to image like a near-field studio, or vice versa. Decide point-optimised vs area-optimised first; the count and spacing follow.

Aiming everything at the centre seat in a deep room. This maximises the front-to-back level gradient. In a deep audience, aim for the far seats and let near seats fall off-axis.

Ignoring the floor and ceiling reflections in height placement. A height speaker firing across a hard ceiling adds a strong early reflection that competes with the direct sound and muddies the direct-to-reverberant balance. Aim height speakers down at the audience, not along the ceiling.

Limits

Layout design has hard ceilings that no cleverness removes. Room geometry is the first: a long, narrow, low room cannot host wide front speakers at ±60 degrees or height speakers at +45 degrees without violating one constraint or another, and you must then pick which compromise least harms the content. Speaker-count economics is the second: true WFS and high-order periphonic Ambisonics demand dozens to hundreds of channels of amplification and DSP, which puts them outside most budgets; the table of count requirements is also a table of cost. The sweet-spot/area tradeoff is the third, and it is fundamental, not an engineering shortfall: amplitude panning cannot give a large audience a correct image, and only field-synthesis topologies (at their cost) escape it. Frequency-dependent reconstruction is the fourth: every array has a frequency above which its spatial fidelity degrades, WFS above $f_{\text{alias}}$ and Ambisonics as the faithful radius $r \approx Nc/(2\pi f)$ shrinks. And finally, the room itself imposes a limit the layout cannot beat: above the reverberation-dominated transition, the diffuse field blurs whatever the direct field tried to image. Layout maximises the direct-field cues; it cannot abolish the room. Those cues are then preserved, or squandered, by the alignment, EQ, and bass-management stages that follow, and verified by measurement. Layout is necessary but never sufficient; it is the foundation the rest of Part V builds on.
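The shrinking Ambisonic radius is worth tabulating for the orders you are considering. This sketch evaluates the rule of thumb $r \approx Nc/(2\pi f)$ and its inverse. The function names, the 343 m/s speed of sound, and the 9 cm "head-sized" radius are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def faithful_radius_m(order, f_hz, c=SPEED_OF_SOUND):
    """Approximate radius of accurate reconstruction: r = N * c / (2 * pi * f)."""
    return order * c / (2.0 * math.pi * f_hz)

def faithful_limit_hz(order, radius_m, c=SPEED_OF_SOUND):
    """Highest frequency reconstructed accurately over the given radius."""
    return order * c / (2.0 * math.pi * radius_m)

HEAD_RADIUS_M = 0.09  # hypothetical head-sized listening region

for order in (1, 3, 5):
    r_cm = faithful_radius_m(order, 1000.0) * 100.0
    f_lim = faithful_limit_hz(order, HEAD_RADIUS_M)
    print(f"order {order}: r = {r_cm:.1f} cm at 1 kHz, "
          f"head-sized region holds up to about {f_lim:.0f} Hz")
```

Because the radius scales linearly with order and inversely with frequency, doubling the order only doubles the frequency reach for a fixed listening region, while the speaker count grows roughly fourfold.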

Where this chapter hands off

You now have an as-built geometry: every speaker's coordinates, angles, and distance to the reference seat. The next chapter, Time Alignment & Phase, turns those distances into delays so arrivals coincide; then Equalisation & Room Correction flattens the spectrum, Subwoofers & Bass Management integrates the low end, and Measurement & Calibration verifies the whole. Carry the coordinate table forward, because every later stage needs it.
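One way to carry that table forward is a small record per speaker that derives angles and distance from the measured position. The sketch below is an assumption about representation, not a format any renderer requires: the `Speaker` class is hypothetical, positions are in metres relative to the reference seat at ear height, x points forward, y to the left, z up, and positive azimuth is taken to the listener's left.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Speaker:
    name: str
    x: float  # metres forward of the reference seat
    y: float  # metres to the listener's left
    z: float  # metres above ear height

    @property
    def distance_m(self):
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)

    @property
    def azimuth_deg(self):
        # 0 = straight ahead, positive = left (assumed convention)
        return math.degrees(math.atan2(self.y, self.x))

    @property
    def elevation_deg(self):
        return math.degrees(math.atan2(self.z, math.hypot(self.x, self.y)))

# Hypothetical as-built positions, for illustration only.
layout = [
    Speaker("L", 2.60, 1.50, 0.00),
    Speaker("R", 2.60, -1.50, 0.00),
    Speaker("Ltf", 1.50, 1.50, 2.10),
]
for s in layout:
    print(f"{s.name:>4}: r = {s.distance_m:.2f} m, "
          f"az = {s.azimuth_deg:+.1f} deg, el = {s.elevation_deg:+.1f} deg")
```

Storing the Cartesian measurement and deriving the angles keeps a single source of truth, so a re-measured position updates every downstream value at once.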

References

ITU-R BS.775-3, Multichannel stereophonic sound system with and without accompanying picture, International Telecommunication Union, 2012. (Canonical 5.1 angles and tolerances.)
ITU-R BS.2051-3, Advanced sound system for programme production, International Telecommunication Union, 2022. (Immersive bed and height-speaker geometries, Systems A–J.)
F. E. Toole, Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, 3rd ed., Routledge/Focal Press, 2017. (Directivity, coverage, sweet-spot and whole-audience behaviour.)
D. Davis, E. Patronis, and P. Brown, Sound System Engineering, 4th ed., Focal Press, 2013. (Inverse-distance laws, directivity factor $Q$, coverage and array design.)
W. Ahnert and F. Steffen, Sound Reinforcement Engineering: Fundamentals and Practice, E & FN Spon, 1999. (Line-array behaviour, coverage tiling, $Q$ and $DI$.)
Dolby Laboratories, Dolby Atmos Home Theater Installation Guidelines and Dolby Atmos Cinema Installation technical documents, Dolby, current editions. (7.1.4 / 9.1.6 angles, height placement, renderer configuration.)
M. A. Gerzon, "Periphony: With-Height Sound Reproduction," Journal of the Audio Engineering Society, vol. 21, no. 1, pp. 2–10, 1973. (Ambisonic order and loudspeaker-count relationships.)
A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic control by wave field synthesis," Journal of the Acoustical Society of America, vol. 93, no. 5, pp. 2764–2778, 1993. (WFS principle and the spatial-aliasing spacing constraint.)
