Surround Recording

Stereo recording, covered in Stereo Microphone Techniques, asks a microphone array to map a roughly $60^\circ$ frontal stage onto two loudspeakers spanning $\pm30^\circ$ . Surround recording asks something far harder: it must populate a full $360^\circ$ horizontal field with three front channels, two (or four) surround channels, and an optional low-frequency channel, while keeping the front stage as sharp as a good stereo pair and making the rear feel like an enveloping acoustic space rather than a second band playing behind you. This chapter develops the main microphone arrays that have become standards for 5.1 and 7.1 capture — the Decca tree, OCT, INA-3/5, the Fukada tree, and the Hamasaki square — entirely from the geometry and psychoacoustics that justify them.

The recurring theme of Part IV holds throughout: capture and reproduction are duals. Coincident arrays encode direction as inter-channel level differences; spaced arrays encode it as inter-channel time differences. These are exactly the two binaural cues the ear decodes — interaural level differences (ILD) and interaural time differences (ITD) — described in Psychoacoustics, and exactly the two cues amplitude panning and time panning exploit on the reproduction side in Techniques. A surround microphone array is a spatial encoder; the loudspeaker layout is its matched decoder. Keep that duality in mind and every array below stops being a recipe and becomes a derivation.

The Goal: A 5.1/7.1 Sound Field

The target format is defined precisely in Formats, but the loudspeaker geometry is worth restating because every array is engineered against it. The ITU-R BS.775 layout places the Centre at $0^\circ$, Left and Right at $\pm30^\circ$ (the legacy stereo positions), and the two surrounds, Left Surround and Right Surround, at $\pm110^\circ$, well behind the listener's ears. The asymmetry is deliberate and consequential: the front three speakers cover a $60^\circ$ arc densely, while an $80^\circ$ gap opens on each side between the front and surround speakers (Right at $+30^\circ$ to Right Surround at $+110^\circ$), and an even wider $140^\circ$ gap separates the two surrounds across the back. A 7.1 layout adds rear-back or side channels to subdivide those gaps, but the fundamental problem remains.

This geometry dictates a split philosophy. The front must deliver localisation: a violinist should sit at a definite azimuth and stay there. The sides and rear must deliver envelopment — the sense, analysed in Direct, Diffuse and Envelopment, of being surrounded by a reverberant field arriving from all directions. Crucially, envelopment is not localised sound from behind. A correlated reverb tail panned hard into both surrounds collapses to a phantom centre-rear and sounds like a defect. Genuine envelopment requires decorrelated signals — different waveforms, not merely different levels of the same waveform — at the two surround channels. This single fact, that the front wants correlation-derived imaging and the rear wants decorrelation-derived envelopment, explains why almost every successful surround array uses different microphone strategies front and back: a coherent main array forward, a spaced or figure-of-eight ambience array aft.

The surround formats and their reproduction logic are detailed in Surround and Matrix; this chapter is the input side of that same coin.

Key takeaway

A surround array is two arrays bolted together: a front imaging array optimised for stable phantom sources between L, C and R, and a rear ambience array optimised for low inter-channel coherence. Designing one with the goals of the other is the most common conceptual error in the discipline.

Main Microphone vs Spot Microphone Philosophy

Two recording philosophies coexist and, in practice, combine.

The main-microphone (Hauptmikrofon) approach uses a single coherent array near the conductor's position to capture the whole ensemble and its acoustic in one balanced perspective. Spatial relationships — who sits left, who sits right, how reverberant the hall is — are encoded by the array geometry itself. This is the European classical tradition (Decca, the German schools) and it produces a natural, internally consistent image because all instruments share one set of inter-channel time and level relationships and one reverberant perspective.

The spot-microphone (multi-mic) approach, dominant in larger productions and most non-classical work, places close microphones on individual sections or instruments and pans them into the surround field at mixdown using amplitude panning (see Amplitude Panning). It offers control — balance, presence, the ability to lift a weak soloist — at the cost of a synthetic spatial image that must be reassembled by the engineer and that lacks a unified acoustic.

Real sessions blend the two: a main array establishes the spatial frame and the overall sound, and a handful of spots add definition where the main array is weak (a quiet harpsichord, a distant timpani). Section 7 covers how to reconcile them without phase damage.

The front triplet and the centre problem

The defining headache of surround capture over stereo is the centre channel. In two-channel stereo, the centre of the stage is a phantom image formed by equal signals in L and R; the ear places it at $0^\circ$ by summing the two loudspeaker contributions. Surround provides a real loudspeaker at $0^\circ$ , which should anchor dialogue and central soloists rock-solidly — but only if it is fed sensibly. There are three options:

Phantom-only (C silent or low): keep the classic stereo phantom centre and use C lightly or not at all. Imaging is smooth and stereo-compatible, but the hard centre is less stable off-axis and the real C speaker is wasted.
Discrete C from a dedicated centre mic: a third front microphone feeds C directly. This gives a stable, rock-anchored centre but risks the centre build-up problem: the same source energy also appears, partially correlated, in L and R, which produces comb filtering and a narrowed, mono-ish front.
Derived/divergence centre: C is matrixed or panned from L/R with controlled divergence.

The art of every front array below is how it splits the L/C/R region so that the three channels overlap just enough for smooth panning between them, but not so much that correlated overlap combs and narrows the image. The governing quantity is the array's recording angle — the range of source azimuths mapped across the full width of the reproduced front — and the crossover between adjacent capsules.

The Decca Tree

The Decca tree is the oldest and still one of the most beloved surround-capable main arrays, predating surround itself by decades. It originated at Decca Records in the early 1950s, credited to engineers Roy Wallace and Arthur Haddy, who needed a stereo image with both the localisation of a coincident pair and the spaciousness of spaced omnis. Their solution: three spaced omnidirectional microphones on a T-shaped bar.

Geometry

The classic Decca tree mounts two microphones as a left/right pair separated by roughly $2\,\text{m}$ , with a third microphone pushed forward about $1.5\,\text{m}$ ahead of and centred between them, forming a T (or an inverted Y). The whole assembly is flown high above and slightly behind the conductor. Traditionally the capsules are large-diaphragm omnis (the Neumann M50, with its pressure-built presence rise, is iconic), though wide cardioids are sometimes substituted.

Why three spaced omnis work where two would fail comes down to time-of-arrival differences. For a spaced pair, a source at azimuth $\theta$ off the array axis reaches the two capsules with a path difference

$\Delta d = d\,\sin\theta,$

where $d$ is the microphone spacing, producing an inter-channel time difference

$\Delta t = \frac{d\,\sin\theta}{c},$

with $c \approx 343\,\text{m/s}$ . For $d = 2\,\text{m}$ and a source $30^\circ$ off-axis, $\Delta t = (2\times\sin30^\circ)/343 \approx 2.9\,\text{ms}$ — a large time cue that the ear reads as strong lateralisation and, because the two omni signals are largely decorrelated at high frequencies, as spaciousness. The omnis also capture deep, extended low frequency that directional microphones lose to their pressure-gradient roll-off.
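The relation above is easy to tabulate. The following Python sketch assumes a far-field source (plane-wave arrival) and the text's $c = 343\,\text{m/s}$; the function name `interchannel_delay_ms` is illustrative and does not come from any library.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def interchannel_delay_ms(spacing_m, azimuth_deg, c=SPEED_OF_SOUND):
    """Time difference in ms between two spaced capsules (far-field assumption)."""
    path_difference_m = spacing_m * math.sin(math.radians(azimuth_deg))
    return 1000.0 * path_difference_m / c


# Classic Decca L-R base of 2 m, sources from centre out to 40 degrees off-axis.
for azimuth in (0, 10, 20, 30, 40):
    delay = interchannel_delay_ms(2.0, azimuth)
    print(f"{azimuth:2d} deg -> {delay:.2f} ms")
```

For the $2\,\text{m}$ base and a $30^\circ$ source this reproduces the figure of about $2.9\,\text{ms}$ quoted above. Close sources violate the far-field assumption, so treat the numbers as estimates.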

The centre microphone is the Decca tree's masterstroke for surround. It directly feeds the C channel and fills the central hole that two widely-spaced omnis would otherwise leave — with a $2\,\text{m}$ gap, sources near $0^\circ$ produce a vague, pulled-apart phantom because the time differences become ambiguous near the centre. The forward centre capsule arrives earliest for central sources (it is physically closest to the stage), so by the precedence effect it dominates localisation of the centre and pins it firmly, while the L/R omnis supply width and depth. This is precisely the discrete-centre solution to the triplet problem, achieved acoustically rather than by matrixing.

From stereo to surround

For surround, the three tree capsules feed L, C and R. Envelopment and the rear channels come from a separate outrigger pair — two further omnis placed wide, several metres out to the left and right of the tree (sometimes $3$ – $6\,\text{m}$ apart) — and/or a dedicated rear array (Section 5). The outriggers widen the front, feed the surrounds with decorrelated hall sound, or both, depending on the production. Decca's own surround practice typically combined the tree with such ambience microphones distributed in the hall.

Rule of thumb

A Decca tree's "sound" is large, warm and enveloping because it is built from omnis — full low end and, above a few hundred hertz, strong inter-channel decorrelation. Its weakness is precision: with no level differences and only time cues, phantom images between the capsules are diffuse. Choose it when you want the hall and a big, generous image; choose a directional array (below) when you want pinpoint section localisation.

Front Arrays: OCT, INA-3/5 and the Fukada Tree

Where the Decca tree relies on time differences from spaced omnis, the modern German and Japanese front arrays engineer the L/C/R region using directional microphones so that level differences (from the polar patterns) and time differences (from the spacing) combine to give controllable recording angles and clean channel crossovers. These are near-coincident to spaced hybrids, and each makes a different trade.

OCT (Optimised Cardioid Triangle)

Developed by Günther Theile at the IRT, OCT is engineered specifically to minimise crosstalk between front channels. Its centre microphone is a cardioid facing the stage. The Left and Right microphones are supercardioids (or hypercardioids) facing outward, to the sides, not the stage. Pointing L and R sideways is the key trick: their nulls face the centre of the stage, so the L and R capsules pick up little of the central source. That source is captured almost exclusively by the centre cardioid, which dramatically reduces the centre build-up and comb filtering that plague arrays where all three capsules see the centre strongly. The L–R spacing is typically around $40$ – $90\,\text{cm}$ (adjustable to set the recording angle), with the centre microphone advanced about $8\,\text{cm}$ forward. OCT often adds two omni "OCT-surround" or low-frequency support capsules at the sides to restore the bass that the directional capsules lose. The result is a clean, well-separated front triplet with reliable localisation — at the cost of the warmth and effortless space of an all-omni tree.

INA-3 and INA-5 (Williams)

Michael Williams formalised the design of multichannel arrays through a rigorous geometric framework based on critical linking of recording-angle segments. His INA (Ideal Cardioid Array) family treats the front as a chain of stereo pairs — L–C and C–R — each covering a contiguous segment of the stage, and computes the spacings and angles so that the segments join seamlessly into one continuous recording angle with no gap and no overlap at the crossover.

The governing idea is that any stereo pair has a recording angle determined jointly by its microphones' inter-axis angle and their spacing, via the combination of level differences (from the angle between directional capsules) and time differences (from the spacing). Williams published extensive "critical linking" curves giving the exact spacing/angle pairs. INA-3 uses three cardioids: a centre facing forward and L/R splayed outward and spaced (typically the L–R base around $0.5$ – $0.7\,\text{m}$ with inter-axis angles near $90^\circ$ ), yielding a total front recording angle that can be tuned from roughly $90^\circ$ to $180^\circ$ . INA-5 extends the same logic to all five channels, adding rear capsules so that the entire $360^\circ$ field is partitioned into linked segments — front L–C, C–R, plus side and rear segments — a fully integrated surround main array rather than a front array with bolt-on ambience.

The Fukada tree

Akira Fukada (NHK) sought to keep the Decca tree's beloved spaciousness while adding the channel separation that omnis lack. The Fukada tree therefore uses three cardioids in the Decca T-configuration (L, C, R) for directional front imaging, but flanks them with two omnis placed wide to the sides for low-frequency extension and spaciousness, and adds a rear pair (cardioids facing backward) for the surrounds. It is, in effect, a Decca tree's geometry rebuilt with directional front capsules plus omni "wings" — a deliberate compromise that captures the whole surround field in one coherent array with both good front localisation and a natural, enveloping bloom. Its larger overall footprint (the omni wings can sit a couple of metres out) demands more space and careful rigging.

ORTF-Surround

The ORTF-Surround (a Schoeps configuration) generalises the classic ORTF pair into a four-capsule horizontal surround array. Four supercardioids sit on the corners of a small rectangle (roughly $10\,\text{cm}$ wide by $20\,\text{cm}$ deep), the front pair splayed outward by about $\pm 55^\circ$ and the rear pair by about $\pm 125^\circ$, so the four look directions partition the horizontal circle. Like the stereo ORTF it imitates, each adjacent pair is near-coincident (small spacing, moderate angle), giving sharp, well-focused imaging and good mono behaviour from a remarkably compact, gig-friendly body. Why it exists: to bring ORTF's loved blend of focus and portability to full $360^\circ$ surround capture and, crucially, to serve as the horizontal base beneath an ORTF-3D height layer. Strengths: compact, sharp imaging, travels well, integrates directly with a height layer. Weaknesses: the small footprint yields less envelopment than a widely spaced array, and supercardioid off-axis colour must be watched in the diffuse field.

Comparison

The table contrasts the principal front/main arrays. Spacings are typical starting points, not absolutes; most arrays are tunable.

Array	Capsule types (L/C/R)	Primary cue	Approx. L–R base	Centre handling	Strength	Weakness
Decca tree	3 × omni	Time (Δt)	~2.0 m	Forward omni, precedence-pinned	Warmth, deep bass, big space	Diffuse imaging, large footprint
OCT	super/hyper (L,R out) + cardioid (C)	Level + time	~0.4–0.9 m	Cardioid C, L/R nulls at centre	Very low centre crosstalk, clean front	Less natural space, needs LF support
INA-3	3 × cardioid	Level + time	~0.5–0.7 m	Cardioid C, critically linked	Continuous, gap-free recording angle	Requires precise geometry
INA-5	5 × cardioid	Level + time	~0.5–0.7 m	As INA-3, full 360° linking	Integrated whole-field array	Complex setup, less LF
Fukada tree	3 × cardioid + 2 omni wings + rear pair	Time + level	~1–1.8 m	Cardioid C in T-config	Decca-like space + separation	Big, many channels to manage
ORTF-Surround	4 × supercardioid (no centre)	Level + small time	~0.1 m (10×20 cm)	No discrete centre (phantom)	Compact, sharp, height-ready	Less envelopment; off-axis colour

note

Notice the duality made concrete in the "Primary cue" column. Omni arrays (Decca) encode direction almost purely as time differences — the same cue that ITD provides binaurally and that the spaced playback techniques exploit. Directional arrays (OCT, INA) add level differences from their polar patterns — the ILD/intensity cue. Most modern arrays use both, exactly as the ear does and exactly as the panning laws in Amplitude Panning blend the two.

The Rear and Ambience: Surround Pairs and the Hamasaki Square

The surround channels exist to reproduce the diffuse reverberant field of the recording space — the late, direction-rich energy responsible for listener envelopment, as developed in Reverberation and Direct, Diffuse and Envelopment. The single most important design constraint is decorrelation: the LS and RS signals must be as mutually independent as possible, because envelopment correlates with low inter-channel coherence of the lateral sound field. Feeding the surrounds correlated signals produces phantom imaging behind the listener and destroys envelopment.

Dedicated surround pairs

The simplest approach is a widely spaced pair of omnis or cardioids placed deep in the hall, well back from the main array, feeding LS and RS. The large spacing (several metres) guarantees that the two captured reverberant signals differ in waveform — different room modes, different reflection sequences — so their coherence at the surround speakers is low and the rear field feels diffuse. Distance from the stage also ensures the surrounds carry mostly reverberant energy (past the critical distance discussed in Distance and Air) rather than direct sound, which would pull the orchestra into the back of the room.

The Hamasaki square

Kimio Hamasaki (NHK) devised the most influential dedicated ambience array, the Hamasaki square. It uses four figure-of-eight (bidirectional) microphones arranged in a horizontal square, typically $2$ – $3\,\text{m}$ on a side (sometimes larger in big halls), with every capsule oriented so that its null points at the stage — that is, the figure-of-eights are sideways-facing, their maximum sensitivity aimed laterally (left and right), their dead axis aimed front-and-back toward the source.

The genius is in the polar pattern's orientation. A figure-of-eight has the equation $g(\theta) = \cos\theta$ relative to its main axis; orienting that axis laterally means

$g_{\text{lateral}}(\phi) = \sin\phi,$

with $\phi$ measured from the front. The response is therefore zero for sound arriving from directly ahead (the direct sound of the orchestra) and maximum for sound arriving from the sides — which is overwhelmingly lateral reflected and reverberant energy. Lateral reflections are precisely the components that psychoacoustics identifies as the dominant contributor to apparent source width and listener envelopment (the inter-aural cross-correlation, IACC, is governed by lateral energy). The square thus captures the most envelopment-relevant part of the sound field while rejecting the direct sound that would muddy the rear image. The four capsules' wide spacing keeps their signals decorrelated. The front pair of the square can also feed a small amount into L/R (or into 7.1 side channels) to broaden the front spaciousness; the rear pair feeds LS/RS.
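To see how sharply the null rejects frontal sound, the lateral figure-of-eight response can be evaluated directly. This is a sketch for an ideal figure-of-eight (real capsules have shallower, frequency-dependent nulls), and the helper names are illustrative.

```python
import math


def figure8_gain(arrival_deg, axis_deg=90.0):
    """Ideal figure-of-eight gain: cosine of the angle between arrival and main axis.

    Angles are measured from the front. axis_deg=90 aims the lobe laterally,
    which reduces to sin(arrival_deg) as in the text.
    """
    return math.cos(math.radians(arrival_deg - axis_deg))


def to_db(gain, floor_db=-120.0):
    """Convert a linear gain to dB, clamping the ideal null to a finite floor."""
    magnitude = abs(gain)
    return floor_db if magnitude < 1e-6 else 20.0 * math.log10(magnitude)


for arrival in (0, 10, 30, 60, 90):
    g = figure8_gain(arrival)
    print(f"{arrival:2d} deg from front: gain {g:+.3f} ({to_db(g):6.1f} dB)")
```

In this idealised model a source $10^\circ$ off the null is still about $15\,\text{dB}$ down, so small aiming errors are tolerable, whereas turning a lobe toward the stage removes the rejection entirely.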

tip

Aim the Hamasaki square's figure-of-eights so their nulls face the stage. If you accidentally point a lobe at the orchestra, you capture direct sound in the surrounds and the ensemble smears into the back of the room — one of the fastest ways to ruin an otherwise good surround recording.

Spacing, Decorrelation and Inter-Channel Coherence

Envelopment without comb filtering is a balancing act governed by inter-channel coherence, the normalised cross-correlation between two channels:

$\gamma_{12} = \frac{\langle s_1(t)\,s_2(t)\rangle}{\sqrt{\langle s_1^2\rangle\,\langle s_2^2\rangle}},$

ranging from $+1$ (identical) through $0$ (independent) to $-1$ (inverted). For the front channels we want moderate coherence — enough shared signal that phantom images form smoothly between L, C and R. For the surround channels we want coherence near zero so that the rear field is diffuse and enveloping. The target is therefore frequency- and channel-dependent, not a single number.
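As a rough illustration of the definition, the sketch below computes the zero-lag normalised cross-correlation of two sampled signals in plain Python. The test signals are synthetic and purely illustrative; a practical measurement would normally be made per frequency band and over short time windows, which this sketch does not attempt.

```python
import math


def zero_lag_coherence(s1, s2):
    """Normalised zero-lag cross-correlation of two equal-length sample sequences."""
    if len(s1) != len(s2):
        raise ValueError("signals must have the same length")
    cross = sum(a * b for a, b in zip(s1, s2))
    energy = math.sqrt(sum(a * a for a in s1) * sum(b * b for b in s2))
    if energy == 0.0:
        raise ValueError("coherence is undefined for a silent signal")
    return cross / energy


# Illustrative signals: five whole cycles of a sine, a cosine, and an inverted sine.
n = 1000
sine = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
cosine = [math.cos(2 * math.pi * 5 * k / n) for k in range(n)]
inverted = [-x for x in sine]

print(f"identical : {zero_lag_coherence(sine, sine):+.3f}")
print(f"quadrature: {zero_lag_coherence(sine, cosine):+.3f}")
print(f"inverted  : {zero_lag_coherence(sine, inverted):+.3f}")
```

The time averages in the formula become sums here; the common factor of $1/N$ cancels between numerator and denominator. Note that the sine and cosine pair is uncorrelated at zero lag without being independent, which is one reason a single broadband number is only a first indication.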

Why spacing decorrelates

Two omnis separated by $d$ receiving a diffuse (isotropic) field have a coherence that follows the well-known sinc function:

$\gamma(d, f) = \frac{\sin(2\pi f d / c)}{2\pi f d / c} = \operatorname{sinc}\!\left(\frac{2\pi f d}{c}\right).$

Coherence first falls to zero when $2\pi f d / c = \pi$, i.e. when the spacing equals half a wavelength:

$d = \frac{c}{2f}.$

For full decorrelation down to, say, $f = 300\,\text{Hz}$ we need $d = 343/(2\times300) \approx 0.57\,\text{m}$ ; to decorrelate down to $100\,\text{Hz}$ requires $d \approx 1.7\,\text{m}$ . This is the quantitative reason ambience microphones and surround pairs are spaced metres apart: only wide spacing decorrelates the low-mid reverberant energy that carries the bulk of envelopment. At high frequencies even modest spacings decorrelate easily; the challenge is always the low end.

Worked example: coherence vs spacing

Take a surround pair of omnis spaced $d = 1.5\,\text{m}$ in a diffuse reverberant field. The first coherence null is at

$f_0 = \frac{c}{2d} = \frac{343}{2\times1.5} \approx 114\,\text{Hz}.$

Above this frequency the coherence oscillates within a decaying sinc envelope and stays low, so essentially the entire spectrum from $\sim120\,\text{Hz}$ upward is well decorrelated — excellent for envelopment. If instead we used $d = 0.3\,\text{m}$ , the first null jumps to $f_0 = 343/0.6 \approx 572\,\text{Hz}$ , leaving everything below $\sim500\,\text{Hz}$ substantially correlated: the low reverberation would image as a phantom behind the listener instead of enveloping them. This single calculation explains why "just spread the surround mics wide" is sound advice.
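The two spacings in this example can be compared numerically. The sketch below evaluates the diffuse-field sinc coherence and the first-null frequency for omni pairs, assuming an ideal isotropic field and $c = 343\,\text{m/s}$; the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def diffuse_coherence(spacing_m, freq_hz, c=SPEED_OF_SOUND):
    """Coherence of two omnis in an ideal diffuse field: sin(x)/x with x = 2*pi*f*d/c."""
    x = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x


def first_null_hz(spacing_m, c=SPEED_OF_SOUND):
    """Frequency at which the spacing equals half a wavelength."""
    return c / (2.0 * spacing_m)


for spacing in (0.3, 1.5):
    print(f"d = {spacing} m: first null at {first_null_hz(spacing):.0f} Hz")
    for freq in (50, 100, 200, 400, 800):
        print(f"  {freq:4d} Hz -> coherence {diffuse_coherence(spacing, freq):+.2f}")
```

At $100\,\text{Hz}$ the $0.3\,\text{m}$ pair is still almost fully coherent, while the $1.5\,\text{m}$ pair has already dropped to a small value. Real halls are only approximately diffuse, so measured coherence will deviate from these curves.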

The comb-filtering hazard

Decorrelation by spacing has a dark twin. If two correlated signals — the same direct source reaching two spaced capsules — are summed (as happens when they are panned to overlapping channels, or when the listener sits where both speakers combine), the delay $\Delta t = d\sin\theta/c$ produces comb filtering: cancellation notches at frequencies

$f_n = \frac{(2n-1)}{2\,\Delta t}, \qquad n = 1, 2, 3, \dots$

and reinforcement peaks midway between the notches, at integer multiples $n/\Delta t$. For $\Delta t = 1\,\text{ms}$ the first notch sits at $500\,\text{Hz}$, with further notches every $1\,\text{kHz}$ above it ($1.5\,\text{kHz}$, $2.5\,\text{kHz}$, and so on): an audible, hollow colouration. The design rule is therefore: spacing should be large enough to decorrelate the diffuse field (good), but the capsules feeding adjacent, overlapping channels must not both carry strong correlated direct sound (bad). This is why ambience arrays are decorrelated and pointed away from the source, while front arrays use directionality (OCT, INA) to limit which capsules see the direct sound.
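The notch and peak frequencies can be listed for any delay. This sketch assumes the idealised case of two equal-level copies of the same signal summed with a pure delay; the function names and the default $20\,\text{kHz}$ upper limit are illustrative choices.

```python
def comb_notches_hz(delay_s, f_max_hz=20000.0):
    """Cancellation frequencies (2n - 1) / (2 * delay) up to f_max_hz."""
    if delay_s <= 0:
        raise ValueError("delay must be positive")
    notches = []
    n = 1
    while (2 * n - 1) / (2.0 * delay_s) <= f_max_hz:
        notches.append((2 * n - 1) / (2.0 * delay_s))
        n += 1
    return notches


def comb_peaks_hz(delay_s, f_max_hz=20000.0):
    """Reinforcement frequencies n / delay up to f_max_hz."""
    if delay_s <= 0:
        raise ValueError("delay must be positive")
    peaks = []
    n = 1
    while n / delay_s <= f_max_hz:
        peaks.append(n / delay_s)
        n += 1
    return peaks


# A 1 ms inter-capsule delay, listed up to 3.5 kHz.
print("notches:", [round(f) for f in comb_notches_hz(0.001, 3500.0)])
print("peaks  :", [round(f) for f in comb_peaks_hz(0.001, 3500.0)])
```

Feeding in the delay from $\Delta t = d\sin\theta/c$ shows how quickly the first notch drops into the midrange as the spacing or the source angle grows.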

warning

Spacing is a double-edged sword. Wide spacing of ambience microphones is exactly what you want — it decorrelates reverberation for envelopment. Wide spacing of microphones that both carry direct sound and end up summed at the listener produces audible comb filtering. The fix is to keep direct sound in non-overlapping channels (directional capsules, the 3-to-1 rule below) and reserve wide spacing for diffuse ambience.

Combining Main and Spot Microphones

Spot microphones add detail to a main array, but only if their timing is reconciled with the main array, otherwise they comb against it.

Time-aligning the spots

A spot microphone sits close to its instrument; the main array sits metres away. The same instrument therefore arrives at the main array later than at the spot by the propagation delay

$\Delta t = \frac{r_{\text{main}} - r_{\text{spot}}}{c}.$

For a spot $0.5\,\text{m}$ from a violin section and a main array $7\,\text{m}$ away, $\Delta t = (7 - 0.5)/343 \approx 19\,\text{ms}$. Mixed untreated, the spot leads the main array by $19\,\text{ms}$, beyond the Haas fusion limit for some material and certainly enough to comb the section's direct sound. The remedy is to delay each spot by this propagation-time difference so that the section's direct sound arrives coincident with the main array's rendering, or slightly later, exploiting precedence so the main array still defines localisation. A common practice is to delay the spot by the time equivalent of the path-length difference plus a few milliseconds, so the spot supports presence without stealing the image from the main array.
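The delay calculation is simple enough to script for a whole spot list. A minimal sketch follows, assuming straight-line distances measured from the instrument to each microphone; the `margin_ms` parameter and the 48 kHz sample rate are illustrative assumptions, not values from the text.

```python
SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def spot_delay_ms(main_distance_m, spot_distance_m, margin_ms=0.0, c=SPEED_OF_SOUND):
    """Delay to apply to a spot so it lines up with (or trails) the main array."""
    if main_distance_m < spot_distance_m:
        raise ValueError("the spot is expected to be closer than the main array")
    return 1000.0 * (main_distance_m - spot_distance_m) / c + margin_ms


def delay_in_samples(delay_ms, sample_rate_hz=48000):
    """Round a delay in ms to whole samples at an assumed sample rate."""
    return round(delay_ms * sample_rate_hz / 1000.0)


aligned = spot_delay_ms(7.0, 0.5)
trailing = spot_delay_ms(7.0, 0.5, margin_ms=2.0)
print(f"aligned : {aligned:.2f} ms ({delay_in_samples(aligned)} samples)")
print(f"trailing: {trailing:.2f} ms ({delay_in_samples(trailing)} samples)")
```

The first line corresponds to the violin example above; the second adds a hypothetical $2\,\text{ms}$ precedence margin so the main array arrives first.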

The 3-to-1 rule

The classic guideline for limiting comb filtering between any two microphones picking up the same source is the 3-to-1 rule: the distance between two microphones should be at least three times the distance from either microphone to its intended source. If a spot is $0.3\,\text{m}$ from its instrument, the next microphone (or the main array) should be at least $0.9\,\text{m}$ away from that same instrument. The logic is level-based: at three times the distance the $1/r$ inverse-distance law

$\frac{p_2}{p_1} = \frac{r_1}{r_2} = \frac{1}{3}$

gives the second microphone roughly $-9.5\,\text{dB}$ of the source relative to the first. Summing two coherent signals with a $1/3$ amplitude ratio limits the comb-filter ripple to $20\log_{10}(1 \pm 1/3)$, about $+2.5\,\text{dB}$ at the peaks and $-3.5\,\text{dB}$ in the notches: shallow dips rather than deep cancellations, and usually benign in a mix. The rule is a guideline, not physics; with delay compensation and good panning it can be relaxed, but it is a reliable sanity check.
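The level argument behind the rule can be checked with a few lines of Python. This sketch assumes free-field $1/r$ spreading from a point source and ignores microphone directivity and room reflections; the function names are illustrative.

```python
import math


def relative_level_db(near_distance_m, far_distance_m):
    """Level at the far microphone relative to the near one under the 1/r law."""
    if near_distance_m <= 0 or far_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(near_distance_m / far_distance_m)


def min_far_distance_m(near_distance_m, ratio=3.0):
    """Smallest far-microphone distance that satisfies an N-to-1 rule."""
    return ratio * near_distance_m


spot = 0.3  # metres from the instrument, as in the text
far = min_far_distance_m(spot)
print(f"next microphone at least {far:.1f} m away")
print(f"leakage level there: {relative_level_db(spot, far):.1f} dB")
```

Changing `ratio` shows how the leakage level moves for stricter or looser spacing, which is useful when a stage layout cannot accommodate the full 3-to-1 distance.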

Panning spots into the surround image

A spot is mono and must be placed in the $360^\circ$ field by amplitude panning across the relevant speaker pair, using the constant-power laws of Amplitude Panning. A first-desk cello spot panned to match its physical seat (say $20^\circ$ right) is distributed between R and C; a percussion spot intended to feel spacious might be spread across L and LS with a little decorrelation. The cardinal rule is consistency: pan each spot to the azimuth where the main array already places that instrument, so the spot reinforces rather than fights the main image. Mismatched panning produces a "double image" — the instrument heard at two azimuths at once.
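As an illustration of placing a spot between two adjacent loudspeakers, the sketch below uses the sine/cosine law, one common constant-power pan law. It is an assumption for illustration only: the law defined in Amplitude Panning may differ (a tangent law, for example), and the function name is hypothetical.

```python
import math


def constant_power_pair_gains(azimuth_deg, speaker_a_deg, speaker_b_deg):
    """Sine/cosine pan between two adjacent speakers; returns (gain_a, gain_b)."""
    span = speaker_b_deg - speaker_a_deg
    if span == 0:
        raise ValueError("the two speakers must be at different azimuths")
    position = (azimuth_deg - speaker_a_deg) / span  # 0 at speaker A, 1 at speaker B
    if not 0.0 <= position <= 1.0:
        raise ValueError("azimuth lies outside this speaker pair")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


# Cello spot at 20 degrees right, between C (0 degrees) and R (+30 degrees).
gain_c, gain_r = constant_power_pair_gains(20.0, 0.0, 30.0)
print(f"C gain {gain_c:.3f}, R gain {gain_r:.3f}")
print(f"total power {gain_c**2 + gain_r**2:.3f}")
```

The squared gains always sum to one, so the spot keeps the same loudness wherever it sits within the pair. The linear mapping from azimuth to pan position is a simplification; perceived azimuth does not track it exactly.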

Bass Management at Capture

The LFE channel and bass management belong properly to Formats and to the calibration stage, but two capture-time considerations matter.

First, in a correctly bass-managed system the subwoofer reproduces the summed low frequency of all channels below the crossover (commonly $80\,\text{Hz}$ ), regardless of whether an LFE channel is recorded. The LFE (.1) channel is an additional effects channel (with up to $+10\,\text{dB}$ in-band headroom), not the place where the orchestra's natural bass lives. For acoustic music the LFE channel is frequently left empty; the low end comes from the main channels and is redirected to the sub by bass management at playback. So at capture you generally do not record a dedicated LFE — you make sure the main array captures full-range bass.

Second, this is a strong argument for omnidirectional main capsules (Decca, Fukada wings). Pressure-gradient directional microphones roll off in the bass and exhibit proximity effects; pure-pressure omnis extend flat to very low frequencies, so a Decca tree delivers the deep, well-defined low end that survives bass management intact. Where directional front arrays (OCT, INA) are used, omni LF-support capsules are often added precisely to restore the bottom octaves.

A summation hazard: because all channels' low frequencies are combined in the subwoofer, any low-frequency phase differences between channels, inevitable with spaced arrays, can cause partial cancellation or reinforcement in the summed bass. Widely spaced omnis capture a $40\,\text{Hz}$ note at different phases; when two such channels are summed to mono in the sub, the result can range from $+6\,\text{dB}$ (in phase) to near silence (in antiphase), relative to one channel alone. In practice this is rarely catastrophic because reverberant low end is fairly diffuse, but it is worth checking the mono-summed bass of a spaced array.
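The range of outcomes in the summed bass can be sketched for the simplest case: two equal-amplitude channels carrying the same single-frequency direct sound with a path-length difference between the capsules. The $2\,\text{m}$ path difference below is a hypothetical example, and the function names are illustrative.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def phase_difference_deg(freq_hz, path_difference_m, c=SPEED_OF_SOUND):
    """Phase offset between two capsules for a given path-length difference."""
    return 360.0 * freq_hz * path_difference_m / c


def summed_level_db(phase_deg, floor_db=-120.0):
    """Level of two equal-amplitude tones summed, relative to one channel alone."""
    magnitude = abs(1.0 + cmath.exp(1j * math.radians(phase_deg)))
    return floor_db if magnitude < 1e-6 else 20.0 * math.log10(magnitude)


for phase in (0, 60, 120, 150, 180):
    print(f"{phase:3d} deg -> {summed_level_db(phase):7.2f} dB")

# Hypothetical case: a 40 Hz note with a 2 m path difference between capsules.
phase = phase_difference_deg(40.0, 2.0)
print(f"40 Hz, 2 m path difference: {phase:.1f} deg, sum {summed_level_db(phase):+.2f} dB")
```

Because a $40\,\text{Hz}$ wavelength is over $8\,\text{m}$, metre-scale path differences produce only partial phase shifts, which is consistent with the observation that the problem is rarely catastrophic.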

info

At capture, treat the LFE as an optional effects channel, not as your music's low end. Capture full-range bass in the main channels — favouring omni capsules — and let bass management route it to the subwoofer at reproduction. See Formats for the channel definitions.

Worked Example: A Decca Tree Plus Hamasaki Square in a Concert Hall

Consider recording a symphony orchestra in a medium-large concert hall (reverberation time $\approx 2.0\,\text{s}$ , critical distance from the orchestra perhaps $4$ – $5\,\text{m}$ ) for a 5.1 release. We combine a Decca tree for the front and a Hamasaki square for the surrounds.

Step 1 — Front (Decca tree). Fly the tree about $3\,\text{m}$ high, above and slightly behind the conductor, so that the forward centre capsule sits over the first rows of players. Set the L–R omni base at $2.0\,\text{m}$ and advance the centre omni $1.5\,\text{m}$ toward the stage. These three M50-style omnis feed L, C and R. Check the front recording angle: with a $2\,\text{m}$ base, a player at the orchestra's left edge (say $40^\circ$ off the array axis, at $r \approx 6\,\text{m}$) produces an inter-channel time difference

$\Delta t = \frac{d\sin\theta}{c} = \frac{2.0\times\sin40^\circ}{343} \approx \frac{1.29}{343} \approx 3.7\,\text{ms},$

a strong-but-not-extreme time cue that the precedence-pinned centre omni keeps from collapsing the middle of the stage. If the front feels too wide or too narrow, adjust the L–R base (wider base $\Rightarrow$ larger $\Delta t \Rightarrow$ narrower apparent recording angle for a given stage; counter-intuitive but follows directly from the $\sin\theta$ relation).

Step 2 — Front width support (optional outriggers). If extra width and hall is wanted, add two outrigger omnis about $1.5\,\text{m}$ outside each tree arm (so $\sim5\,\text{m}$ outrigger base) and blend them lightly into L/R. Their wide spacing decorrelates the captured ambience, broadening the front without combing the direct sound (the direct sound at $5\,\text{m}$ base is already past the 3-to-1 comfort zone relative to the tree).

Step 3 — Surrounds (Hamasaki square). Position the square in the hall about $8$ – $10\,\text{m}$ behind the tree — comfortably past the critical distance, so it captures predominantly reverberant energy — at roughly head height of the rear seating, with sides of $2.5\,\text{m}$ . Orient all four figure-of-eights with their nulls toward the stage, lobes facing the side walls to harvest lateral reflections. The rear pair feeds LS and RS. Verify decorrelation: with $d = 2.5\,\text{m}$ between the relevant capsules, the first coherence null sits at

$f_0 = \frac{c}{2d} = \frac{343}{2\times2.5} \approx 69\,\text{Hz},$

so the surrounds are decorrelated essentially across the whole audible spectrum — exactly the diffuse, enveloping rear field we want.

Step 4 — Spots. Add a few discreet spots — a woodwind pair, a soloist, perhaps timpani. For a soloist spot $1\,\text{m}$ from the instrument with the tree $6\,\text{m}$ from it, delay the spot by

$\Delta t = \frac{6 - 1}{343} \approx 15\,\text{ms}$

(or $15\,\text{ms}$ plus a couple of milliseconds) so the tree continues to define localisation while the spot adds presence. Pan the spot to the soloist's true azimuth, around $0^\circ$ – $10^\circ$ , distributed between C and the nearer of L/R.

Step 5 — Balance and bass. Set the front-to-surround ratio by ear; a common starting point puts the surrounds $4$ – $8\,\text{dB}$ below the front so the hall envelops without pulling the orchestra backward. Leave the LFE channel empty and rely on bass management; confirm the mono-summed low end of the tree is well behaved.

The whole array — three front omnis, optional outriggers, four bidirectional ambience capsules, a handful of delayed spots — captures a coherent, enveloping 5.1 image in which the orchestra sits stable and forward and the hall wraps around the listener from genuinely decorrelated surrounds.

Common Mistakes and Pitfalls

Centre build-up. Feeding C from a microphone that also bleeds strongly into L and R places correlated energy in three channels; on summation it combs and the whole front narrows toward mono. Cure: use an array that isolates the centre (OCT's outward-facing supercardioids, a forward Decca omni pinned by precedence) or reduce the discrete C contribution.

Correlated surrounds. Deriving LS and RS from the same reverb or the same closely-spaced pair gives high coherence; the rear collapses to a phantom behind the head and envelopment evaporates. Cure: space ambience microphones metres apart and/or use the null-forward figure-of-eight orientation of the Hamasaki square to keep the lateral field decorrelated. Recall the sinc-coherence calculation: small spacings only decorrelate the highs.
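One rough way to catch correlated surrounds before the mix is to compute the zero-lag correlation coefficient between the LS and RS signals. The sketch below uses only the standard library and synthetic test signals; it is a crude broadband proxy, not the frequency-dependent coherence discussed earlier, and the function name and test signals are illustrative assumptions.

```python
import math
import random


def correlation(a, b):
    """Zero-lag normalised correlation coefficient of two equal-length signals."""
    if len(a) != len(b) or not a:
        raise ValueError("signals must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    ls = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    rs_same = list(ls)                                   # same reverb to both
    rs_indep = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # independent field
    print(f"identical feeds:   {correlation(ls, rs_same):+.3f}")
    print(f"independent feeds: {correlation(ls, rs_indep):+.3f}")
```

A value near $+1$ indicates the rear is likely to collapse to a phantom image; a value near zero is what well-spaced ambience capsules should approach on reverberant material.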

Phase and comb filtering between main and spots. Undelayed spots lead the main array, by about $15\,\text{ms}$ in the worked example above, and comb-filter its direct sound when the two are summed. Cure: delay every spot by its propagation distance (plus a precedence margin) and respect the 3-to-1 rule.
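To see why the delay matters, the sketch below lists the first comb-filter notches produced when an undelayed spot is summed with the main array, and the level drop implied by a 3:1 distance ratio. It assumes two equal-level copies of the same signal offset by the path difference (unequal levels give shallower notches) and a free-field inverse-distance law; the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def comb_notches_hz(path_difference_m, count=3, c=SPEED_OF_SOUND):
    """First `count` notch frequencies when a signal is summed with a copy
    delayed by tau = path_difference / c: f_n = (2n + 1) / (2 * tau)."""
    tau = path_difference_m / c
    return [(2 * n + 1) / (2.0 * tau) for n in range(count)]


def inverse_distance_drop_db(distance_ratio):
    """Level drop in dB for a given distance ratio (free-field point source)."""
    return 20.0 * math.log10(distance_ratio)


if __name__ == "__main__":
    # Spot 1 m from the soloist, tree 6 m away: 5 m path difference.
    notches = comb_notches_hz(5.0)
    print("first notches (Hz):", ", ".join(f"{f:.1f}" for f in notches))
    print(f"3:1 distance ratio: {inverse_distance_drop_db(3.0):.1f} dB down")
```

With a 5 m path difference the notches are spaced about 69 Hz apart from roughly 34 Hz upward, so the colouration is dense across the spectrum; the 3:1 ratio buys about 9.5 dB of isolation under the stated assumption.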

Pointing ambience mics at the stage. A figure-of-eight or cardioid surround capsule with a lobe (not a null) toward the orchestra drags direct sound into the rear and smears the ensemble around the room. Cure: nulls to the stage, lobes to the side walls.

Inconsistent spot panning. Panning a spot to an azimuth different from where the main array images that instrument creates a split double image. Cure: pan to match the main array.

Ignoring bass summation. Treating the LFE as the music's bass, or ignoring the phase of spaced omnis in the summed sub, can hollow or boom the low end. Cure: capture full-range bass in the main channels, check mono-summed lows, leave LFE for effects.

danger

The two failure modes most often heard in amateur surround recordings are a front image that collapses toward mono (centre build-up plus correlated front capsules) and rears that image as a phantom instead of enveloping (correlated, too-closely-spaced surround microphones). Both come from violating the central principle: correlate the front just enough for imaging, decorrelate the rear thoroughly for envelopment. They are opposite goals and demand opposite microphone strategies.

Limits

Surround capture inherits and amplifies the limits of stereo capture.

The sweet spot. Like all loudspeaker techniques, the captured image is correct only near the geometric centre of the ITU layout. Off-centre, the nearest speaker dominates by precedence and the front image shifts; the wide L-to-LS gap means lateral phantom images (for example, a source at $70^\circ$) are inherently unstable on any horizontal 5.1 array, because there is simply no loudspeaker there to anchor them. This is a limitation of the format, not the array, and is one motivation for the denser spatial sampling pursued in Ambisonic Recording and the systems in WFS.

No height. Every array in this chapter is horizontal, and a 5.1/7.1 array cannot encode elevation. True three-dimensional capture, with elevated and overhead information, requires the additional layers covered in Immersive 3D Recording.

Fixed perspective. A main array bakes in one listening position and one front-to-back balance. Unlike object-based production (Object-Based Audio) or scene-based Ambisonics, a channel-based surround capture cannot be rotated, re-rendered to a different layout, or re-balanced spatially after the fact without artefacts. What the array captured is what you have.

Room dependence. These techniques assume a good room. The Hamasaki square captures the hall's lateral reflections — wonderful in a fine concert hall, ruinous in a dead or boxy room. Surround recording is, even more than stereo, a collaboration with the acoustic; in a poor space the surrounds may be better synthesised from decorrelated reverberation than captured.

Format lock-in. A microphone array optimised for the $\pm30^\circ$ / $\pm110^\circ$ ITU layout is not optimal for 7.1, for 7.1.4, or for binaural rendering. Each target layout ideally wants its own array geometry, which is another reason the field increasingly turns to the layout-agnostic encodings of Ambisonics and object-based workflows. Channel-based surround capture remains the gold standard for its format, with the natural sound of real microphones in real spaces, but it does not transcend that format.

Within those limits, a well-built main-plus-ambience array — a Decca tree or OCT forward, a Hamasaki square aft, judicious delayed spots — remains one of the most musical and natural ways to put a listener inside a real performance in a real hall.

References

Rumsey, F. Spatial Audio. Focal Press, 2001. Chapters on multichannel microphone technique, inter-channel coherence and envelopment.
Williams, M. and Le Dû, G. "Microphone Array Analysis for Multichannel Sound Recording." AES 107th Convention, 1999. (INA arrays and critical linking of recording-angle segments.)
Theile, G. "Multichannel Natural Music Recording Based on Psychoacoustic Principles." AES 19th International Conference on Surround Sound, 2001. (OCT and the front-array rationale.)
Hamasaki, K., Hiyama, K. and Okumura, R. "The 22.2 Multichannel Sound System and Its Application." AES 118th Convention, 2005; and Hamasaki, K. "Reproducing Spatial Impression with Multichannel Audio." AES 24th International Conference, 2003. (The Hamasaki square and lateral ambience capture.)
Fukada, A. "A Challenge in Multichannel Music Recording." AES 19th International Conference on Surround Sound, 2001. (The Fukada tree.)
Eargle, J. The Microphone Book, 2nd ed. Focal Press, 2004. (Polar patterns, spaced and coincident principles, the Decca tree.)
Howie, W., King, R., Martin, D. and Grond, F. "Subjective Evaluation of Orchestral Music Recording Techniques for Three-Dimensional Audio." AES Convention papers, 2016–2017. (Comparative listening across main-array techniques.)
ITU-R Recommendation BS.775: "Multichannel stereophonic sound system with and without accompanying picture." International Telecommunication Union. (The 5.1 loudspeaker layout.)
