Amplitude Panning — VBAP & DBAP

Amplitude panning is the oldest and still the most widely deployed family of spatialization techniques. Its premise is disarmingly simple: feed the same signal to two or more loudspeakers, vary only the relative levels, and the listener hears a single virtual source — a phantom image — somewhere between the speakers. Everything else in this chapter is an elaboration of that one idea, generalized from the classic stereo pair to rings, domes, and arbitrary three-dimensional rigs.

This chapter assumes you have read stereo (where the phantom image and the panning laws are introduced) and the psychoacoustics chapter (where summing localization, the precedence effect, and the interaural cues are explained). Here we take those foundations and build the two dominant modern formulations: Vector Base Amplitude Panning (VBAP) by Ville Pulkki (1997) and Distance-Based Amplitude Panning (DBAP) by Trond Lossius and colleagues (2009). Both are pure gain methods — no delays, no filtering — yet they rest on opposite assumptions about what the loudspeaker array is. VBAP treats speakers as directions seen from a sweet spot; DBAP treats them as positions in a room with no privileged listener. Understanding that distinction is the key to choosing between them.

As with every method in this Part, the organizing idea is encode then decode. The encode stage is the authoring of an intended direction or position for the source. The decode stage is the computation of a gain for each physical loudspeaker present, derived on the fly from the geometry of the array. Amplitude panning is unusual in that encode and decode are fused into a single, lightweight, per-source matrix operation — which is exactly why it scales to hundreds of moving objects in real time.

The Principle: Borrowing the Phantom Image

Summing localization in one pair

Two loudspeakers radiating a coherent signal create, at the listener's ears, a superposition of two wavefronts. Below roughly 1 kHz the dominant cue is the interaural time difference (ITD): the two coherent arrivals sum into a single low-frequency waveform whose effective phase delay between the ears mimics that of a real source located somewhere between the speakers. Above ~1.5 kHz the head shadow makes the interaural level difference (ILD) dominate, and the level imbalance between the speakers again maps onto a believable direction. This dual mechanism — described in detail in psychoacoustics — is why a single amplitude ratio can steer one fused image across the gap. The phantom is not "really there"; the auditory system is fooled into integrating two sources into one because they are coherent and arrive within the temporal fusion window.

The sweet-spot caveat

The phantom image is fragile. It is sharpest for a listener equidistant from both speakers, on the axis of symmetry, and it degrades as the listener moves off-centre because the precedence effect then snaps the image toward the nearer speaker. This fragility is the original sin that propagates through every amplitude-panning system: the image lives at a sweet spot, and its quality is a function of listener position. Keep this in mind — it is the recurring theme of the Limits section.

Any pair, then a ring as a chain of pairs

The crucial generalization is that nothing about summing localization requires the two speakers to be the canonical ±30° stereo pair. Any two loudspeakers spanning a modest arc (up to roughly 60°) can host a phantom image between them. Once you accept that, a horizontal ring of $N$ speakers becomes simply a chain of adjacent pairs: to place a source at azimuth $\theta$, find the two speakers that bracket $\theta$ and pan between them, leaving every other speaker silent. As the source moves around the ring, the active pair changes, with a hand-off from one pair to the next at each speaker. This is the conceptual heart of VBAP in two dimensions.

A dome as a mesh of triangles

In three dimensions the same logic lifts one rank. Two speakers define a line of possible phantom positions; three non-collinear speakers define a triangular patch of the sphere over which a phantom can roam. A full dome or sphere of loudspeakers is therefore tiled into a mesh of triangles, and any target direction falls inside exactly one triangle. Panning uses the three vertices of that triangle and silences everyone else. The 2D pair is just the degenerate, flat case of the 3D triplet. Pulkki's contribution was to make this "which pair or triplet, and with what gains" computation exact, fast, and layout-agnostic using vector algebra — which we develop in full below.

The Panning Law and Constant-Power Normalization

Why constant power, not constant amplitude

Before VBAP, recall the panning-law question from stereo: given two speakers and a desired image position, how should the two gains $g_1$ and $g_2$ relate? Two normalizations compete:

  • Constant amplitude (linear): $g_1 + g_2 = \text{const}$. Correct if the two signals sum coherently, i.e. as pressures in phase. This holds for a listener exactly on the centre line where path lengths are equal.
  • Constant power: $g_1^2 + g_2^2 = \text{const}$. Correct if the two signals sum incoherently in energy, which is the better model once the listener is even slightly off-centre, or across frequency where phase relationships scramble.
Common mistake

Real rooms and real (moving) listeners sit between these, but constant power is the robust default for multi-speaker panning because it keeps perceived loudness stable as an image crosses the array. If you used constant amplitude and the room summed the speakers incoherently, the image would dip by 3 dB in level at the midpoint between speakers, producing an audible "loudness scallop" as sources move.
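The 3 dB figure can be checked with a few lines. This is a minimal sketch that assumes purely incoherent (power) summation of the two speakers; `summed_power_db` is a hypothetical helper name chosen for illustration.

```python
import math

def summed_power_db(g1, g2):
    """Level of two incoherently summed speakers, relative to one speaker at gain 1."""
    return 10.0 * math.log10(g1 * g1 + g2 * g2)

# Midpoint of a pair under each normalization.
linear_mid = summed_power_db(0.5, 0.5)                        # g1 + g2 = 1
power_mid = summed_power_db(math.sqrt(0.5), math.sqrt(0.5))   # g1^2 + g2^2 = 1

print(f"constant amplitude at midpoint: {linear_mid:.2f} dB")
print(f"constant power at midpoint:     {power_mid:.2f} dB")
```

Under this assumption the linear law lands about 3 dB low at the midpoint while the constant-power law stays at 0 dB, which is the scallop described above.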

The normalization rule is therefore

$$g_1^2 + g_2^2 + \cdots + g_M^2 = 1,$$

applied to whatever raw gains the panner produces, where $M$ is the number of active speakers. After computing un-normalized gains $\tilde{g}_i$, you scale them all by a single factor:

$$g_i = \frac{\tilde{g}_i}{\sqrt{\sum_{k} \tilde{g}_k^{2}}}.$$

A small numeric check

Suppose a pairwise panner yields raw gains $\tilde{g}_1 = 0.80$ and $\tilde{g}_2 = 0.30$. The normalizing denominator is $\sqrt{0.80^2 + 0.30^2} = \sqrt{0.64 + 0.09} = \sqrt{0.73} = 0.8544$. The normalized gains are $g_1 = 0.80/0.8544 = 0.9363$ and $g_2 = 0.30/0.8544 = 0.3511$. Check: $0.9363^2 + 0.3511^2 = 0.8767 + 0.1233 = 1.000$. The summed power is unity regardless of where in its travel the image sits, so a listener perceives constant loudness as the source pans. The level ratio between the speakers, here $20\log_{10}(0.9363/0.3511) = 8.5$ dB, is what encodes the direction; the normalization only fixes the overall loudness.
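The same check can be written as a short function. This is a sketch only; the name `normalize_constant_power` is an assumption made for illustration, and it works for any number of active speakers.

```python
import math

def normalize_constant_power(raw_gains):
    """Scale raw gains by one common factor so their squares sum to 1."""
    norm = math.sqrt(sum(g * g for g in raw_gains))
    if norm == 0.0:
        raise ValueError("all raw gains are zero; nothing to normalize")
    return [g / norm for g in raw_gains]

g1, g2 = normalize_constant_power([0.80, 0.30])
ratio_db = 20.0 * math.log10(g1 / g2)   # level difference that encodes direction
print(round(g1, 4), round(g2, 4), round(ratio_db, 1))  # prints 0.9363 0.3511 8.5
```

Because every gain is divided by the same factor, the ratio between speakers (and hence the encoded direction) is untouched; only the overall level changes.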

A useful intermediate is the tangent law (Bennett, refined from Blumlein's stereo geometry), which for a symmetric pair at half-angle $\theta_0$ relates the desired image angle $\theta$ to the gain ratio:

$$\frac{\tan\theta}{\tan\theta_0} = \frac{g_1 - g_2}{g_1 + g_2}.$$

We will see in the next section that VBAP reduces exactly to this tangent law for the two-speaker case — which is the proof that VBAP is not a new psychoacoustic claim, but a clean vector restatement of the classic law generalized to any geometry.

VBAP in Full

Loudspeakers as unit vectors

Pulkki's insight (1997) is to drop angles and work with vectors. Place the listener at the origin. Each loudspeaker $i$ is represented by a unit vector $\mathbf{l}_i$ pointing from the listener toward that speaker:

$$\mathbf{l}_i = \frac{\mathbf{p}_i}{\lVert \mathbf{p}_i \rVert},$$

where $\mathbf{p}_i$ is the speaker's position. Only the direction matters; distance is discarded (this is the first VBAP assumption: all speakers are treated as equidistant, i.e. on a sphere around the sweet spot). The desired source direction is likewise a unit vector $\mathbf{p}$.

The panning hypothesis is that the vector sum of the active loudspeaker vectors, weighted by their gains, points in the perceived direction. This is the velocity-vector model of summing localization (closely related to Gerzon's $\mathbf{r}_V$ used in ambisonics). We therefore require

$$\mathbf{p} = \sum_{i \in \text{active}} g_i\, \mathbf{l}_i .$$

The 2D case: choosing and solving a pair

In two dimensions, pick the active pair $\{i, j\}$ that encloses the target: the two adjacent speakers whose arc contains $\mathbf{p}$. Stack their vectors as columns of a $2\times 2$ matrix $\mathbf{L}_{ij} = [\,\mathbf{l}_i \;\; \mathbf{l}_j\,]$ and write the gain vector $\mathbf{g} = [g_i, g_j]^\top$. Then

$$\mathbf{p} = \mathbf{L}_{ij}\,\mathbf{g} \quad\Longrightarrow\quad \mathbf{g} = \mathbf{L}_{ij}^{-1}\,\mathbf{p}.$$

This is a $2\times 2$ inverse, trivial and fast. The enclosing pair is the correct one precisely because it is the unique pair for which the solution $\mathbf{g}$ has both components non-negative. Negative gains would mean driving a speaker in anti-phase, which does not create a phantom between the speakers but smears it, so VBAP tests candidate pairs and selects the one whose solved gains are all $\ge 0$. Finally, normalize to constant power as in the previous section.
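A minimal sketch of the pair solve follows. It assumes azimuths in degrees with positive angles toward the right (x = right, y = front) and solves the $2\times 2$ system by Cramer's rule rather than by storing an inverse. The names `unit_2d` and `vbap_pair` are illustrative, not taken from any library.

```python
import math

def unit_2d(azimuth_deg):
    """Unit vector for an azimuth, with x = right and y = front."""
    a = math.radians(azimuth_deg)
    return (math.sin(a), math.cos(a))

def vbap_pair(az_i, az_j, az_target):
    """Solve p = g_i * l_i + g_j * l_j and normalize to constant power.

    Returns None when a solved gain is negative (the pair does not
    enclose the target) or when the pair is degenerate.
    """
    (ix, iy), (jx, jy) = unit_2d(az_i), unit_2d(az_j)
    px, py = unit_2d(az_target)
    det = ix * jy - jx * iy
    if abs(det) < 1e-9:
        return None                      # speakers in line with the listener
    gi = (px * jy - jx * py) / det       # Cramer's rule, first unknown
    gj = (ix * py - px * iy) / det       # Cramer's rule, second unknown
    if gi < -1e-9 or gj < -1e-9:
        return None                      # sign test failed: wrong pair
    gi, gj = max(gi, 0.0), max(gj, 0.0)
    norm = math.hypot(gi, gj)
    return (gi / norm, gj / norm)

print(vbap_pair(-30.0, 30.0, 0.0))    # centred target: two equal gains
print(vbap_pair(-30.0, 30.0, 10.0))   # target nearer the +30 degree speaker
print(vbap_pair(-30.0, 30.0, 60.0))   # outside the pair: None
```

In a ring, the caller would try adjacent pairs in turn and keep the first one that does not return `None`.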

The 3D case: triplets

In three dimensions the matrix becomes $3\times 3$. For the enclosing triplet $\{i,j,k\}$,

$$\mathbf{L}_{ijk} = \big[\,\mathbf{l}_i \;\; \mathbf{l}_j \;\; \mathbf{l}_k\,\big], \qquad \mathbf{g} = \mathbf{L}_{ijk}^{-1}\,\mathbf{p}, \qquad g_i,\,g_j,\,g_k \ge 0 .$$

Again the correct triplet is the one whose three solved gains are all non-negative; geometrically, it is the triangle on the sphere that contains the target direction. Then normalize: $\mathbf{g} \leftarrow \mathbf{g} / \lVert \mathbf{g} \rVert$. The inverses $\mathbf{L}_{ijk}^{-1}$ for every triangle are precomputed once at setup, so per-sample (or per-block) panning is just a matrix–vector product followed by a sign test and a normalization, cheap enough for hundreds of objects.

Reduction to the tangent law

To see that VBAP is the tangent law in disguise, take a symmetric stereo pair at $\pm\theta_0$, so $\mathbf{l}_1 = (\sin\theta_0, \cos\theta_0)$ and $\mathbf{l}_2 = (-\sin\theta_0, \cos\theta_0)$ (using $x$ = right, $y$ = front). A target at azimuth $\theta$ is $\mathbf{p} = (\sin\theta, \cos\theta)$. Solving $\mathbf{p} = g_1\mathbf{l}_1 + g_2\mathbf{l}_2$:

$$\sin\theta = (g_1 - g_2)\sin\theta_0, \qquad \cos\theta = (g_1 + g_2)\cos\theta_0 .$$

Dividing the first by the second:

$$\frac{\tan\theta}{\tan\theta_0} = \frac{g_1 - g_2}{g_1 + g_2},$$

which is exactly the tangent panning law. VBAP thus inherits the perceptual validity of the tangent law (which holds well for low-frequency ITD-based localization) and extends it to arbitrary speaker positions with no new assumptions.
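For a symmetric pair, the tangent law can also be used directly as a closed-form panner. The sketch below assumes $0° < \theta_0 < 90°$ and a target inside the pair; `tangent_law_gains` is a hypothetical name. It picks the simplest gains with the required ratio, $g_1 \propto 1 + r$ and $g_2 \propto 1 - r$ with $r = \tan\theta/\tan\theta_0$, then normalizes to constant power.

```python
import math

def tangent_law_gains(theta_deg, theta0_deg):
    """Constant-power gains (g1, g2) for a symmetric pair at +/- theta0.

    Speaker 1 sits at +theta0 and speaker 2 at -theta0, as in the text.
    """
    if abs(theta_deg) > theta0_deg:
        raise ValueError("target lies outside the pair")
    r = math.tan(math.radians(theta_deg)) / math.tan(math.radians(theta0_deg))
    g1, g2 = 1.0 + r, 1.0 - r        # satisfies (g1 - g2) / (g1 + g2) = r
    norm = math.hypot(g1, g2)
    return (g1 / norm, g2 / norm)

g1, g2 = tangent_law_gains(10.0, 30.0)
lhs = math.tan(math.radians(10.0)) / math.tan(math.radians(30.0))
rhs = (g1 - g2) / (g1 + g2)
print(round(lhs, 6), round(rhs, 6))   # the two sides of the tangent law agree
```

The normalization does not disturb the ratio $(g_1 - g_2)/(g_1 + g_2)$, so the law still holds after scaling.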

Worked example: a 3-speaker triplet solve

Consider a small dome triplet. Three loudspeakers at directions:

  • $A$: azimuth $-30°$, elevation 0°
  • $B$: azimuth $+30°$, elevation 0°
  • $C$: azimuth 0°, elevation $45°$ (an overhead front speaker)

Using the convention $\mathbf{l} = (\cos\!\varepsilon\,\sin\alpha,\;\cos\!\varepsilon\,\cos\alpha,\;\sin\!\varepsilon)$ with azimuth $\alpha$ and elevation $\varepsilon$:

$$\mathbf{l}_A = (-0.5,\;0.8660,\;0),\quad \mathbf{l}_B = (0.5,\;0.8660,\;0),\quad \mathbf{l}_C = (0,\;0.7071,\;0.7071).$$

Target: azimuth 0°, elevation $20°$, i.e. straight ahead but lifted:

$$\mathbf{p} = (0,\;\cos20°,\;\sin20°) = (0,\;0.9397,\;0.3420).$$

We solve $\mathbf{L}\,\mathbf{g} = \mathbf{p}$ with $\mathbf{L} = [\mathbf{l}_A\;\mathbf{l}_B\;\mathbf{l}_C]$. Writing the three scalar equations:

$$\begin{aligned}
-0.5\,g_A + 0.5\,g_B + 0\,g_C &= 0 &&\text{(x)}\\
0.8660\,g_A + 0.8660\,g_B + 0.7071\,g_C &= 0.9397 &&\text{(y)}\\
0\,g_A + 0\,g_B + 0.7071\,g_C &= 0.3420 &&\text{(z)}
\end{aligned}$$

From (x): $g_A = g_B$. From (z): $g_C = 0.3420/0.7071 = 0.4837$. Substitute into (y): $0.8660(2 g_A) + 0.7071(0.4837) = 0.9397$, i.e. $1.7320\,g_A + 0.3420 = 0.9397$, so $1.7320\,g_A = 0.5977$ and $g_A = g_B = 0.3451$.

Raw gains: $(g_A, g_B, g_C) = (0.3451,\;0.3451,\;0.4837)$, all non-negative, confirming this is the enclosing triplet. Normalize:

$$\lVert \mathbf{g} \rVert = \sqrt{0.3451^2 + 0.3451^2 + 0.4837^2} = \sqrt{0.1191 + 0.1191 + 0.2340} = \sqrt{0.4722} = 0.6872,$$

$$(g_A, g_B, g_C) = (0.5022,\;0.5022,\;0.7039).$$

Check power: $0.5022^2 + 0.5022^2 + 0.7039^2 = 0.2522 + 0.2522 + 0.4955 = 1.000$. The symmetry $g_A = g_B$ correctly reflects the target's zero azimuth, while the strong $g_C$ lifts the image to $20°$ elevation. Note that the overhead speaker carries the largest single gain even though the target is only $20°$ up, a consequence of $C$ being the only vertex with a vertical component, which foreshadows the width-variation issues of the next sections.
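The worked example can be reproduced numerically. The sketch below solves the $3\times 3$ system by Cramer's rule, which is one of several equivalent ways to apply $\mathbf{L}^{-1}$; the helper names `unit_3d`, `det3` and `vbap_triplet` are assumptions for illustration. Because the hand calculation rounds intermediate values to four digits, the last digit of the code's output may differ from the figures above.

```python
import math

def unit_3d(az_deg, el_deg):
    """Unit vector (x right, y front, z up) from azimuth and elevation."""
    a, e = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(e) * math.sin(a), math.cos(e) * math.cos(a), math.sin(e))

def det3(c0, c1, c2):
    """Determinant of the 3x3 matrix whose columns are c0, c1, c2."""
    return (c0[0] * (c1[1] * c2[2] - c2[1] * c1[2])
            - c1[0] * (c0[1] * c2[2] - c2[1] * c0[2])
            + c2[0] * (c0[1] * c1[2] - c1[1] * c0[2]))

def vbap_triplet(la, lb, lc, p):
    """Raw (un-normalized) gains solving p = gA*la + gB*lb + gC*lc."""
    d = det3(la, lb, lc)
    if abs(d) < 1e-9:
        raise ValueError("degenerate triplet (vectors nearly coplanar)")
    # Cramer's rule: replace one column at a time with the target vector.
    return (det3(p, lb, lc) / d, det3(la, p, lc) / d, det3(la, lb, p) / d)

la, lb, lc = unit_3d(-30, 0), unit_3d(30, 0), unit_3d(0, 45)
raw = vbap_triplet(la, lb, lc, unit_3d(0, 20))
norm = math.sqrt(sum(g * g for g in raw))
gains = [g / norm for g in raw]

print("raw gains:       ", [round(g, 4) for g in raw])
print("normalized gains:", [round(g, 4) for g in gains])
print("enclosing triplet:", all(g >= 0.0 for g in raw))
```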

Triangulating a 3D Layout

The convex hull and the triangle mesh

To pan over a dome you must first decide which triangles tile it. The standard procedure computes the convex hull of the set of loudspeaker direction-points on the unit sphere. The hull's faces are triangles whose vertices are loudspeakers; together they form a closed mesh covering every direction. For each triangle, you precompute and store the inverse matrix $\mathbf{L}_{ijk}^{-1}$. At run time, for a target $\mathbf{p}$, you test triangles until one yields all-non-negative gains; that is the triangle containing $\mathbf{p}$.

Some constraints follow. The hull must enclose the origin (the listener), or there are directions with no enclosing triplet — VBAP cannot place a source outside the arc/cap spanned by the speakers, only on the surface they define. If a layout has a large hole (e.g. no speakers below the horizon, a common case for a half-dome), sources steered into the hole are clamped to the nearest hull edge, and a virtual "imaginary" speaker is sometimes inserted to regularize the triangulation.
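The setup and run-time steps can be sketched as follows. The triangle list is assumed to be given already (for example from a convex-hull routine); here it is written by hand for a toy four-speaker layout that has no speakers below the horizon. The inverse is built from cross products, and `precompute` and `pan` are hypothetical names.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def invert_columns(a, b, c):
    """Rows of the inverse of L = [a b c] (columns), or None if near-singular."""
    bc, ca, ab = cross(b, c), cross(c, a), cross(a, b)
    det = dot(a, bc)
    if abs(det) < 1e-9:
        return None
    return [tuple(x / det for x in row) for row in (bc, ca, ab)]

def precompute(speakers, triangles):
    """Setup step: one stored inverse per triangle, keyed by speaker indices."""
    table = {}
    for tri in triangles:
        inv = invert_columns(*(speakers[i] for i in tri))
        if inv is not None:
            table[tri] = inv
    return table

def pan(table, p, eps=1e-9):
    """Run-time step: first triangle whose gains are all >= 0, normalized."""
    for tri, inv in table.items():
        g = [dot(row, p) for row in inv]       # matrix-vector product
        if min(g) >= -eps:                     # sign test
            g = [max(x, 0.0) for x in g]
            norm = math.sqrt(dot(g, g))
            return tri, [x / norm for x in g]
    return None                                # target falls in a hole

# Toy layout: right, front, top, left (unit vectors; x right, y front, z up).
speakers = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
triangles = [(0, 1, 2), (3, 1, 2)]             # assumed given, not computed here
table = precompute(speakers, triangles)

s = 1.0 / math.sqrt(3.0)
hit = pan(table, (-s, s, s))                   # up, front and to the left
miss = pan(table, (0.0, 0.0, -1.0))            # straight down: no enclosing triplet
print(hit, miss)
```

A production panner would handle the `None` case by clamping to the nearest hull edge or by adding virtual speakers, as described above; this sketch only reports the miss.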

Why triangle shape affects stability

Not every valid triangulation is equally good. The numerical and perceptual stability of a triplet depends on its shape. A near-equilateral triangle of moderate size gives well-conditioned matrices and smooth gain variation; a long, thin "sliver" triangle does not.

Consider the conditioning. The gain solution $\mathbf{g} = \mathbf{L}^{-1}\mathbf{p}$ amplifies any imprecision in $\mathbf{p}$ (or in the assumed speaker positions) by up to the condition number of $\mathbf{L}$. For a sliver triangle the three direction vectors are nearly coplanar with a small spanning angle, $\mathbf{L}$ is near-singular, $\det(\mathbf{L}) \to 0$, and the inverse has large entries. Small changes in target direction then cause large, jumpy changes in the gains, audible as image instability and level fluctuation.
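A small numeric experiment makes the effect visible. The sketch keeps the base pair at $\pm30°$ azimuth and compares a third speaker at $45°$ elevation (a fat triangle) with one at only $3°$ elevation (a sliver), measuring $|\det(\mathbf{L})|$ and the largest change in raw gain when the target is lifted from $1°$ to $2°$. The geometry and the name `sensitivity` are illustrative assumptions, not a standard metric.

```python
import math

def unit_3d(az_deg, el_deg):
    a, e = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(e) * math.sin(a), math.cos(e) * math.cos(a), math.sin(e))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def raw_gains(a, b, c, p):
    """Raw VBAP gains for columns a, b, c, plus det(L)."""
    det = dot(a, cross(b, c))
    gains = [dot(cross(b, c), p) / det,
             dot(cross(c, a), p) / det,
             dot(cross(a, b), p) / det]
    return gains, det

def sensitivity(top_elevation_deg):
    """|det L| and the largest raw-gain change for a 1 degree lift of the target."""
    a, b, c = unit_3d(-30, 0), unit_3d(30, 0), unit_3d(0, top_elevation_deg)
    g_lo, det = raw_gains(a, b, c, unit_3d(0, 1))
    g_hi, _ = raw_gains(a, b, c, unit_3d(0, 2))
    step = max(abs(h - l) for h, l in zip(g_hi, g_lo))
    return abs(det), step

fat_det, fat_step = sensitivity(45)       # third speaker well above the pair
sliver_det, sliver_step = sensitivity(3)  # third speaker barely above the pair
print(f"fat:    |det| = {fat_det:.3f}, gain change per degree = {fat_step:.3f}")
print(f"sliver: |det| = {sliver_det:.3f}, gain change per degree = {sliver_step:.3f}")
```

In this setup the sliver's determinant is more than ten times smaller and its gains move more than ten times faster for the same one-degree change in target direction.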

Layout regularity is load-bearing

Practically, triangulation algorithms therefore maximize the minimum triangle angle (a Delaunay-like criterion on the sphere) and cap the maximum edge length, splitting large gaps with extra (real or virtual) speakers. The lesson for system designers: regular layouts triangulate into fat triangles and pan smoothly; irregular layouts with bunched and sparse regions produce slivers and unstable images. Layout regularity, discussed again under Practical Deployment, is not cosmetic — it is a precondition for clean VBAP.

VBAP Artefacts: Source-Width Variation and MDAP

The width problem

VBAP's defining virtue — it activates the fewest speakers (one, two, or three) — is also its defining artefact. When a source sits exactly at a loudspeaker, only that one speaker plays: the image is as tight and dry as the speaker itself. When the source sits between speakers, two or three play, and the phantom image is intrinsically wider and softer because it is synthesized from spatially separated sources. As a source moves across the array, its apparent width and timbre pulse — narrow at each speaker, broad in the gaps. This source-width variation is the most criticized property of plain VBAP, especially on sparse layouts where the gaps are large.

There is also a timbral component. Two coherent speakers at the ears produce comb filtering whose notches depend on the path-length difference; at the midpoint of a wide pair the colouration differs from the on-speaker case, so timbre as well as width is modulated by position.

MDAP: Multiple-Direction Amplitude Panning

Pulkki's 1999 remedy is Multiple-Direction Amplitude Panning (MDAP), sometimes called spreading. Instead of panning the source to a single point direction, you pan it to several nearby virtual directions distributed around the target, then combine the resulting gain vectors. This deliberately recruits more speakers even when the target lands on or near a speaker, so the number of active speakers, and hence the image width, stays roughly constant across the array. The width artefact is traded for a controlled, uniform spread.

Formally, replace the single target $\mathbf{p}$ with a set of $K$ spread directions $\{\mathbf{p}^{(1)}, \dots, \mathbf{p}^{(K)}\}$ arranged on a small cone of half-angle $\sigma$ (the spread parameter) about $\mathbf{p}$. Compute VBAP gains for each, then combine:

$$\tilde{g}_i = \sqrt{\sum_{m=1}^{K} \big(g_i^{(m)}\big)^2 }, \qquad g_i = \frac{\tilde{g}_i}{\sqrt{\sum_k \tilde{g}_k^2}} .$$

The power-sum combination keeps energy additive (incoherent spreading), and the final normalization restores constant power. With $\sigma = 0$ you recover ordinary point-source VBAP; increasing $\sigma$ broadens the image continuously, up to $\sigma$ large enough to wrap the whole array (a diffuse, enveloping source). The spread is thus a deliberate width control, turning VBAP's worst artefact into a usable parameter; see how tools expose this in Practical Deployment.

Worked example: spreading on a ring

Take a target between two ring speakers where plain VBAP gives $(g_1, g_2) = (0.71, 0.71)$ (a centred phantom, two speakers active). Add a single spread pair: one virtual direction toward speaker 1's side giving partial recruitment of speaker 0, one toward speaker 2's side recruiting speaker 3. Suppose the three sub-pans yield, before combination, contributions to four speakers:

| Speaker | sub-pan center | sub-pan left | sub-pan right |
| --- | --- | --- | --- |
| 0 | 0.00 | 0.45 | 0.00 |
| 1 | 0.71 | 0.89 | 0.30 |
| 2 | 0.71 | 0.30 | 0.89 |
| 3 | 0.00 | 0.00 | 0.45 |

Power-sum per speaker: $\tilde g_0 = \sqrt{0.45^2} = 0.45$; $\tilde g_1 = \sqrt{0.71^2+0.89^2+0.30^2} = \sqrt{0.504+0.792+0.09}=\sqrt{1.386}=1.177$; $\tilde g_2 = 1.177$ by symmetry; $\tilde g_3 = 0.45$. Normalizing by $\sqrt{1.177^2\cdot2 + 0.45^2\cdot2} = \sqrt{2.770 + 0.405} = \sqrt{3.175} = 1.782$ gives $(0.253, 0.661, 0.661, 0.253)$. Now four speakers are active rather than two: the image is broader and, critically, will remain broad even when the target slides onto speaker 1, because the spread keeps neighbours engaged. That is exactly the width-stabilization MDAP was designed to provide.
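The combination step is easy to express in code. The sketch below takes the three sub-pan gain vectors from the example as given (they are the supposed values above, not computed here) and applies the power-sum and normalization; `mdap_combine` is an illustrative name. Since the hand calculation rounds intermediates, the third decimal may differ slightly.

```python
import math

def mdap_combine(sub_pans):
    """Power-sum each speaker's gain across sub-pans, then constant-power normalize."""
    n_speakers = len(sub_pans[0])
    raw = [math.sqrt(sum(sp[i] ** 2 for sp in sub_pans)) for i in range(n_speakers)]
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]

# One row per sub-pan, one column per speaker 0..3 (values from the example).
sub_pans = [
    [0.00, 0.71, 0.71, 0.00],   # centre direction
    [0.45, 0.89, 0.30, 0.00],   # spread direction on speaker 1's side
    [0.00, 0.30, 0.89, 0.45],   # spread direction on speaker 2's side
]
gains = mdap_combine(sub_pans)
active = sum(1 for g in gains if g > 0.0)
print([round(g, 3) for g in gains], "active speakers:", active)
```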

DBAP in Full

A different premise: positions, not directions

VBAP assumes a listener at a known sweet spot and treats speakers as equidistant directions. Distance-Based Amplitude Panning (DBAP), introduced by Lossius, Baltazar and de la Hogue (2009), throws that assumption away. It was born from the needs of installation and theatre sound, where there is no single listener, speakers are scattered at irregular positions and distances (not on a sphere), and a "virtual source position" in the room is the natural authoring control. DBAP therefore computes gains from the physical distances between the source position and each loudspeaker position, with no assumed listener and no front/sweet spot. Every speaker can participate; the array is treated as a field of emitters in a plan view.

The distance-based gain law

Let the virtual source be at position $\mathbf{x}_s$ and loudspeaker $i$ at $\mathbf{x}_i$, with Euclidean distance

$$d_i = \lVert \mathbf{x}_s - \mathbf{x}_i \rVert .$$

The base idea is that a speaker's gain should fall off with its distance from the source, mimicking how a real source is loudest at the nearest speaker. DBAP uses a power-law roll-off controlled by a parameter $a$ (related to a roll-off in dB per doubling of distance, $R$, via $a = R/(20\log_{10}2) = R/6.02$). The un-normalized gain is

$$\tilde{g}_i = \frac{1}{d_i^{\,a}} = d_i^{-a}.$$

Mind the on-speaker singularity

A spatial blur $r_s$ regularizes the singularity when the source sits exactly on a speaker (which would otherwise demand infinite gain). Distances are softened:

$$d_i \;\rightarrow\; \sqrt{\,d_i^2 + r_s^2\,},$$

so that even on top of a speaker the gain is finite and neighbours still contribute. Larger $r_s$ spreads energy over more speakers (a wider, blurrier image); smaller $r_s$ tightens it.

Constant-power normalization, no listener

DBAP enforces the same constant-power law as everything else in this chapter, but now over all speakers (none are silenced a priori):

$$g_i = \frac{\tilde{g}_i}{\sqrt{\sum_{k=1}^{N} \tilde{g}_k^{2}}}, \qquad \sum_{i=1}^{N} g_i^2 = 1 .$$

Crucially this normalization is what makes DBAP listener-independent in a useful sense: the total radiated power is held constant as the source moves, so wherever a listener stands in the installation, the overall loudness of the source is stable even though its apparent position (governed by the nearest loud speakers, via the precedence effect) tracks the authored coordinate. There is no inverse matrix, no triangulation, no convex hull — just distances and a normalization. This makes DBAP trivial to deploy on any geometry, including non-convex rooms, ceiling grids, and lines of speakers.

Worked example: DBAP across four speakers in a room

Place four speakers at room corners (metres): $S_1=(0,0)$, $S_2=(6,0)$, $S_3=(6,4)$, $S_4=(0,4)$. Author a source at $\mathbf{x}_s=(2,1)$. Use roll-off $a = 1$ (i.e. $\approx 6$ dB per distance doubling) and blur $r_s = 0.5$ m.

Distances (with blur):

  • $d_1 = \sqrt{2^2+1^2+0.5^2} = \sqrt{4+1+0.25}=\sqrt{5.25}=2.291$
  • $d_2 = \sqrt{4^2+1^2+0.5^2} = \sqrt{16+1+0.25}=\sqrt{17.25}=4.153$
  • $d_3 = \sqrt{4^2+3^2+0.5^2} = \sqrt{16+9+0.25}=\sqrt{25.25}=5.025$
  • $d_4 = \sqrt{2^2+3^2+0.5^2} = \sqrt{4+9+0.25}=\sqrt{13.25}=3.640$

Raw gains $\tilde g_i = 1/d_i$: $\tilde g_1 = 0.4365$, $\tilde g_2 = 0.2408$, $\tilde g_3 = 0.1990$, $\tilde g_4 = 0.2747$.

Power sum: $0.4365^2 + 0.2408^2 + 0.1990^2 + 0.2747^2 = 0.1905 + 0.0580 + 0.0396 + 0.0755 = 0.3636$; $\sqrt{0.3636} = 0.6030$.

Normalized: $g_1 = 0.724$, $g_2 = 0.399$, $g_3 = 0.330$, $g_4 = 0.456$. Power check: $0.724^2+0.399^2+0.330^2+0.456^2 = 0.524+0.159+0.109+0.208 = 1.000$. The nearest speaker $S_1$ dominates as expected, but all four speakers radiate. There is no hard pair selection, which is exactly what gives DBAP its smooth, listener-agnostic behaviour and its characteristic "everything is a little bit on" diffuseness compared with VBAP's sparse activation.
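The whole DBAP pipeline for this example fits in a few lines. This is a sketch under the assumptions of the example (2D coordinates in metres, blur folded into the distance); the names `rolloff_to_exponent` and `dbap_gains` are hypothetical. Published DBAP implementations may add further terms, such as per-speaker weights, that are not modelled here.

```python
import math

def rolloff_to_exponent(rolloff_db):
    """Convert roll-off R in dB per doubling of distance to the exponent a."""
    return rolloff_db / (20.0 * math.log10(2.0))

def dbap_gains(source, speakers, a=1.0, blur=0.5):
    """Constant-power DBAP gains; blur > 0 keeps distances away from zero."""
    raw = []
    for spk in speakers:
        d = math.sqrt(sum((s - x) ** 2 for s, x in zip(source, spk)) + blur ** 2)
        raw.append(d ** -a)                      # power-law roll-off
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]

speakers = [(0.0, 0.0), (6.0, 0.0), (6.0, 4.0), (0.0, 4.0)]   # S1..S4
a = rolloff_to_exponent(20.0 * math.log10(2.0))               # about 6.02 dB -> a = 1
gains = dbap_gains((2.0, 1.0), speakers, a=a, blur=0.5)
print([round(g, 3) for g in gains])
```

Note that with `blur=0.0` and a source exactly on a speaker, the distance is zero and the power-law term raises a division error, which is the singularity the blur exists to avoid.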

VBAP vs DBAP: A Comparison

The two methods share the gain-only mechanism and the constant-power law but differ in almost every assumption. The following table summarizes the trade-offs.

| Property | VBAP (Pulkki 1997) | DBAP (Lossius 2009) |
| --- | --- | --- |
| Speaker model | Unit directions from a sweet spot (equidistant) | Physical positions in a room (any distance) |
| Assumed listener | Yes: single sweet spot | No: listener-independent |
| Geometry needed | Convex hull / triangulation of the sphere | None; raw coordinates only |
| Active speakers | Sparse: 1–2 (2D) or 1–3 (3D) | All speakers, weighted by distance |
| Image type | Directional phantom on the speaker sphere | Position-cued, diffuser, precedence-driven |
| Width control | Spread / MDAP parameter $\sigma$ | Spatial blur $r_s$ |
| Width artefact | Yes: narrows at each speaker (cured by MDAP) | Mild: inherently broader, smoother |
| Audience size | Best for one centred listener | Suits dispersed audiences, installations |
| Compute per source | Matrix inverse (precomputed) + sign test | Distances + normalization (no matrix) |
| Out-of-convex sources | Clamped to hull | Naturally handled (just farther) |
| Typical use | Cinema/dome/studio with a sweet spot | Galleries, theatre, multi-zone, irregular rigs |

Rule of thumb

A concise way to remember it: VBAP is for a listener; DBAP is for a room. VBAP gives the crispest localization when there is a sweet spot and the layout is regular; DBAP gives the most graceful behaviour when there is no sweet spot and the layout is whatever the architecture allowed. Many production tools offer both and let the operator choose per object.

Other Amplitude Methods, Briefly

KNN / nearest-neighbour panning

A family of pragmatic panners selects the $k$ nearest loudspeakers to a target position and distributes gains among them by an inverse-distance or interpolation weight, then constant-power normalizes. This is essentially DBAP with a hard neighbour cap ($k=3$ or $4$), trading DBAP's all-speakers smoothness for VBAP-like sparsity without requiring a convex hull. KNN panning is common in game audio engines and ad-hoc multi-speaker grids where triangulation is inconvenient but full DBAP diffuseness is unwanted.
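One possible shape for such a panner is sketched below, assuming inverse-distance weights and a small blur to avoid division by zero. The function `knn_gains` and its parameters are hypothetical and do not correspond to any particular engine's API.

```python
import math

def knn_gains(source, speakers, k=3, blur=0.1):
    """Inverse-distance weights on the k nearest speakers, constant-power normalized."""
    dists = [math.sqrt(sum((s - x) ** 2 for s, x in zip(source, spk)) + blur ** 2)
             for spk in speakers]
    nearest = sorted(range(len(speakers)), key=lambda i: dists[i])[:k]
    raw = [0.0] * len(speakers)
    for i in nearest:
        raw[i] = 1.0 / dists[i]          # every other speaker stays silent
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]

speakers = [(0.0, 0.0), (6.0, 0.0), (6.0, 4.0), (0.0, 4.0)]
gains = knn_gains((2.0, 1.0), speakers, k=2, blur=0.5)
print([round(g, 3) for g in gains])      # only the two nearest speakers are non-zero
```

A hard cap like this switches a speaker on or off abruptly when the neighbour set changes, so a practical version would fade the weight of the k-th neighbour or ramp the gains over time.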

ViMiC — Virtual Microphone Control

ViMiC (Braasch and colleagues) models a set of virtual microphones with chosen directivities and positions corresponding to the real loudspeakers, and computes each loudspeaker gain as the response of its virtual mic to a virtual source — optionally adding the small delays that the virtual mic spacing implies. By incorporating directivity patterns and inter-channel time differences, ViMiC blends amplitude panning with a coincident/spaced-microphone aesthetic (see the recording techniques discussion, covered in another Part), and can render distance and room cues more naturally than pure gain panning. It sits between this chapter and microphone-array methods, and it points toward the delay-plus-gain unification we return to with RIPL.

Where they fit

These methods occupy the same niche as DBAP — robust, listener-agnostic panning on irregular arrays — but each makes a different trade: KNN limits speaker count for tighter images; ViMiC adds directivity and micro-delays for naturalness at the cost of simplicity. None changes the fundamental amplitude-panning premise; they are engineering refinements of which speakers to use and how to weight them.

Practical Deployment

Layout regularity and calibration as prerequisites

Calibrate first, or the math is moot

Amplitude panning assumes that equal gain produces equal perceived level from each speaker. That is only true if the rig is calibrated: matched levels, matched delays (so all speakers are time-aligned to the reference position), and reasonably matched timbre. An uncalibrated array breaks panning at the root — a speaker that is 3 dB hot will pull every phantom toward itself regardless of the math. Calibration (gain, delay, and EQ alignment, the subject of a separate Part) is therefore a precondition, not an optional polish, for both VBAP and DBAP.

Layout regularity matters most for VBAP, as the triangulation section showed: even spacing yields fat, well-conditioned triangles and uniform width; clustered-then-sparse layouts yield slivers and width pulsing. DBAP tolerates irregularity better (it has no triangles to degenerate) but still benefits from sensible coverage so that no large region of the room is far from every speaker. Distance and air absorption between source and listener — discussed in distance and air — further modulate perceived level and should be accounted for in level calibration.

Fast movement and the active-set hand-off

Interpolate the hand-off or it clicks

A subtle real-time issue: in VBAP the active set changes discontinuously as a source crosses a speaker (the enclosing pair/triplet swaps). If gains are not interpolated, this hand-off clicks or zippers. Practical implementations interpolate gains over a short ramp (a few milliseconds) and ensure that at the hand-off boundary the outgoing and incoming triplets agree (they share the crossed speaker, which holds gain $\approx 1$ there, so the transition is continuous in principle but must be smoothed in floating-point practice). For very fast orbits, MDAP/spread also helps by keeping more speakers continuously active, eliminating the abrupt one-speaker pinch points. DBAP, having no active-set switching, is inherently click-free under motion, which is another reason it is favoured for kinetic installation pieces.
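A linear per-sample ramp is one simple way to smooth the hand-off. The sketch below assumes a 48 kHz sample rate and a 5 ms ramp purely for illustration; `ramp_gains` is a hypothetical helper, and real implementations may prefer other ramp shapes or per-block smoothing.

```python
def ramp_gains(old, new, n_samples):
    """Yield one gain vector per sample, moving linearly from old to new."""
    for n in range(1, n_samples + 1):
        t = n / n_samples
        yield [o + t * (w - o) for o, w in zip(old, new)]

sample_rate = 48000                        # assumed for the example
ramp_len = int(0.005 * sample_rate)        # 5 ms -> 240 samples

old = [0.0, 1.0, 0.0]                      # source sitting on the middle speaker
new = [0.0, 0.8, 0.6]                      # next block: shared with its neighbour
steps = list(ramp_gains(old, new, ramp_len))
print(len(steps), steps[0], steps[-1])
```

A linear ramp between two constant-power gain vectors dips slightly below unit power in the middle of the ramp; over a few milliseconds this is usually negligible, and each step can be renormalized if it matters.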

Listener-position sensitivity

Because VBAP encodes direction from a sweet spot, a listener off the sweet spot hears images collapse toward the nearer speakers via the precedence effect; the spatial scene compresses. DBAP, encoding position, degrades differently — a listener near one speaker simply hears that speaker as loudest, which is often correct for a position-based scene. Neither method reconstructs a physical wavefront, so neither is robust over a large area in the way WFS aims to be. Quantifying and accepting this listener dependence is part of choosing the method, and connects to envelopment and the direct/diffuse balance discussed in direct, diffuse and envelopment.

Worked example: a 7-speaker ring orbit

Consider a regular horizontal ring of 7 speakers at azimuths $0°, 51.4°, 102.9°, 154.3°, 205.7°, 257.1°, 308.6°$ (spacing $360°/7 = 51.43°$). We orbit a source full circle using 2D VBAP and track behaviour.

At azimuth $\theta = 25.7°$ (exactly between speakers 1 and 2, the worst case for VBAP), the enclosing pair is $\{1,2\}$ at 0° and $51.4°$. By symmetry the centred phantom gives equal normalized gains $g_1 = g_2 = 1/\sqrt2 = 0.707$; all other speakers are silent. The image is at its widest here because the source sits $25.7°$ from each active speaker, the farthest it can be from any speaker on this ring.

At $\theta = 51.4°$ (exactly on speaker 2), the solve gives $g_2 = 1$, all others $0$; the image is at its narrowest: a single dry speaker.

So as the source sweeps from 0° to $51.4°$, the active pair is always $\{1,2\}$, gains move smoothly from $(1,0)$ through $(0.707,0.707)$ to $(0,1)$, and width pulses from narrow to wide and back. Crossing $51.4°$, the active pair hands off to $\{2,3\}$; speaker 2 holds gain $1$ at the boundary so the transition is continuous. Over a full orbit there are 7 such width pulses, one per gap.

Now turn on MDAP with spread $\sigma = 25°$. Even at $\theta = 51.4°$ (on speaker 2), the spread directions at roughly $26°$ and $76°$ recruit speakers 1 and 3, so three speakers play and the on-speaker image is no longer pinched. The width pulsing flattens markedly. A practical orbit automation would therefore set a modest constant spread to keep the source's apparent size stable as it circles, the standard fix for the "lumpy ring" symptom. Switching the same orbit to DBAP (treating the ring as positions on a circle of radius $r$ around the room centre, listener unspecified) would similarly avoid pulsing because all nearby speakers stay partly active throughout, at the cost of a softer, less pinpoint image.
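The plain-VBAP part of the orbit can be traced with a short script. For an adjacent pair at azimuths $\varphi_i < \varphi_j$, solving the $2\times 2$ system gives raw gains proportional to $\sin(\varphi_j - \theta)$ and $\sin(\theta - \varphi_i)$, which the sketch uses directly. Speakers are indexed from 0 here (the text counts from 1), and `ring_vbap` is an illustrative name.

```python
import math

def ring_vbap(theta_deg, n_speakers=7):
    """Constant-power 2D VBAP gains on a regular ring with speaker 0 at 0 degrees."""
    spacing = 360.0 / n_speakers
    theta = theta_deg % 360.0
    i = int(theta // spacing) % n_speakers        # speaker at or just below theta
    j = (i + 1) % n_speakers                      # next speaker around the ring
    offset = math.radians(theta - i * spacing)    # angle travelled past speaker i
    gap = math.radians(spacing)
    gi, gj = math.sin(gap - offset), math.sin(offset)
    norm = math.hypot(gi, gj)
    gains = [0.0] * n_speakers
    gains[i], gains[j] = gi / norm, gj / norm
    return gains

for theta in (0.0, 180.0 / 7, 360.0 / 7):         # on a speaker, mid-gap, next speaker
    gains = ring_vbap(theta)
    active = sum(1 for g in gains if g > 1e-9)
    print(f"{theta:6.2f} deg  active={active}  {[round(g, 3) for g in gains]}")
```

The active count alternates between one (on a speaker) and two (in a gap) as the source circles, which is the width pulsing described above.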

Limits: What Amplitude Panning Cannot Do

Amplitude panning is cheap, robust, and perceptually effective, but it has hard ceilings that motivate the rest of this Part.

Sources are bound to the speaker surface

A phantom can only live between the loudspeakers, on the arc (2D) or sphere/hull (3D) they define. You cannot pan a source to appear closer than the speakers, behind the array, or floating in empty space inside the room with any reliability — there is no mechanism to synthesize the curvature of a wavefront converging on, or diverging from, an interior point. Distance cues must be faked by level, reverberation, and air-absorption filtering (distance and air), not by genuine wavefront geometry. DBAP's "source position" is an authoring abstraction, not a reconstructed acoustic point.

Listener dependence and no real wavefront

Both methods reconstruct only the velocity vector (the gain-weighted mean of the loudspeaker directions) at one point, not the actual sound field over an area. Off the design point the phantom shifts or collapses; there is no large sweet area, only a sweet spot (VBAP) or a position-cued field that follows precedence (DBAP). The reproduced field is not physically correct anywhere except trivially: two real wavefronts are present, not one synthesized one, so interference colouration and width variation are intrinsic.
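
To make these direction estimates concrete, the sketch below computes the velocity vector (speaker directions weighted by gain) and the energy vector (weighted by squared gain) for the two extreme cases of the ring example. It assumes the usual definitions, as in Zotter & Frank; the function names are hypothetical.

```python
import math

def panning_vectors(gains, az_deg):
    """Return (r_V, r_E) as 2D tuples for gains at the given azimuths."""
    units = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in az_deg]
    amp = sum(gains)                    # normalizer for r_V
    energy = sum(g * g for g in gains)  # normalizer for r_E
    r_v = tuple(sum(g * u[k] for g, u in zip(gains, units)) / amp for k in range(2))
    r_e = tuple(sum(g * g * u[k] for g, u in zip(gains, units)) / energy for k in range(2))
    return r_v, r_e

def polar(v):
    """(length, azimuth in degrees) of a 2D vector."""
    return math.hypot(v[0], v[1]), math.degrees(math.atan2(v[1], v[0]))

GAP = 360.0 / 7            # spacing of the 7-speaker ring
pair_az = [0.0, GAP]       # speakers 1 and 2
g = 1 / math.sqrt(2)

mid_rv, mid_re = panning_vectors([g, g], pair_az)    # source midway
on_rv, on_re = panning_vectors([0.0, 1.0], pair_az)  # source on speaker 2

print(polar(mid_re))
print(polar(on_re))
```

Midway between the speakers the energy vector points at $25.7°$ with length $\cos(25.7°) \approx 0.90$; on a speaker its length is $1$. A shorter energy vector goes with a broader image, so this is one way to quantify the width pulsing.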

Where to go next

These limits are precisely what the next two methods address. Ambisonics keeps the gain-decode philosophy but reconstructs the field as a basis-function expansion (spherical harmonics), giving a layout-independent encode, a principled decode, and control of both the velocity vector $\mathbf{r}_V$ and the energy vector $\mathbf{r}_E$ across a wider area. Wave Field Synthesis abandons the phantom entirely and physically reconstructs the wavefront from a dense array using the Huygens principle, so sources can sit in front of, behind, or inside the array with a large valid listening zone. Amplitude panning is the floor on which these stand; understanding its phantom-image mechanism and its limits is what makes the more elaborate methods legible. It is also worth remembering, per stereo is already spatial, that even two speakers already do real spatial work: VBAP and DBAP are that same trick, generalized.

Connection to DAM RIPL

The methods in this chapter are gain-only: they steer images by relative level. A parallel family steers by delay (wavefront timing), as in WFS, and a third combines both. DAM Audio's RIPL engine is built around exactly this unification — treating amplitude panning and delay-based rendering as two ends of one continuum rather than separate, incompatible paradigms.

The conceptual bridge is straightforward. A pure amplitude panner sets a per-speaker gain $g_i$ with zero delay; a pure delay/wavefront renderer sets a per-speaker delay $\tau_i$ (and a $1/\sqrt{d}$-type amplitude). A unified renderer assigns both $(g_i, \tau_i)$ to every speaker as functions of the authored source geometry, so a single object can be rendered with VBAP-like crispness at a sweet spot, DBAP-like robustness across a room, or WFS-like wavefront curvature for near-field and interior sources, and can interpolate between these as the production demands. This matters for object-based workflows, where one stream of object metadata (position, size, distance) must be decoded to whatever physical system is present: a regular dome (favour VBAP), an irregular installation (favour DBAP), or a dense linear array (favour WFS). Keeping gain-based and delay-based panning in one framework is the practical expression of this Part's recurring theme (encode once, decode to the system you actually have), and it is the design principle behind RIPL's spatialization core.
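
As a purely conceptual illustration of assigning $(g_i, \tau_i)$ per speaker, the toy sketch below blends a gain-only assignment with a point-source model that uses $1/\sqrt{d}$ amplitudes and $d/c$ delays. Everything in it is an assumption made for illustration: the linear blend, the `delay_amount` parameter, the 0.1 m distance floor and the function name are hypothetical, and it is not RIPL's algorithm or API, which the text does not specify.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def render_params(source_xy, speakers_xy, pan_gains, delay_amount):
    """Hypothetical blend: 0.0 = gain-only panning, 1.0 = delay + 1/sqrt(d) model.

    Returns a list of (gain, delay_seconds) tuples, one per speaker.
    """
    dists = [math.dist(source_xy, s) for s in speakers_xy]
    # Point-source amplitude; the 0.1 m floor (assumed) avoids division by zero.
    wave = [1.0 / math.sqrt(max(d, 0.1)) for d in dists]
    # Toy linear interpolation between the two gain sets.
    mixed = [(1.0 - delay_amount) * g + delay_amount * w
             for g, w in zip(pan_gains, wave)]
    norm = math.sqrt(sum(m * m for m in mixed))  # keep total power at 1
    nearest = min(dists)  # subtract the common latency shared by all speakers
    return [(m / norm, delay_amount * (d - nearest) / SPEED_OF_SOUND)
            for m, d in zip(mixed, dists)]

speakers = [(-1.0, 0.0), (1.0, 0.0)]
source = (1.0, 1.0)

gain_only = render_params(source, speakers, [0.6, 0.8], delay_amount=0.0)
wavefront = render_params(source, speakers, [0.6, 0.8], delay_amount=1.0)
print(gain_only)
print(wavefront)
```

With `delay_amount=0.0` the panner's gains pass through with zero delay; with `delay_amount=1.0` the nearer speaker fires first and louder. Intermediate values move between the two behaviours, which is the continuum the paragraph describes.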

References

  1. Pulkki, V. (1997). "Virtual Sound Source Positioning Using Vector Base Amplitude Panning." Journal of the Audio Engineering Society, 45(6), 456–466.
  2. Pulkki, V. (1999). "Uniform Spreading of Amplitude Panned Virtual Sources." Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY. (The MDAP method.)
  3. Lossius, T., Baltazar, P., & de la Hogue, T. (2009). "DBAP — Distance-Based Amplitude Panning." Proc. International Computer Music Conference (ICMC), Montreal.
  4. Blumlein, A. D. (1931). "Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing Systems." British Patent 394,325. (Origin of the two-channel stereo and the directional summing principle.)
  5. Zotter, F., & Frank, M. (2019). Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. Springer Open. (Velocity/energy vectors $\mathbf{r}_V$, $\mathbf{r}_E$ and the panning-law foundations.)
  6. Bennett, J. C., Barker, K., & Edeko, F. O. (1985). "A New Approach to the Assessment of Stereophonic Sound System Performance." Journal of the Audio Engineering Society, 33(5), 314–321. (Tangent law analysis.)
  7. Braasch, J., Peters, N., & Valente, D. L. (2008). "A Loudspeaker-Based Projection Technique for Spatial Music Applications Using Virtual Microphone Control (ViMiC)." Computer Music Journal, 32(3), 55–71.
  8. Pulkki, V., & Karjalainen, M. (2015). Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics. Wiley. (Summing localization and amplitude-panning perception.)
  9. Theile, G., & Wittek, H. (2004). "Wave Field Synthesis: A Promising Spatial Audio Rendering Concept." Acoustical Science and Technology, 25(6), 393–399. (Contrast of phantom-source panning with physical wavefront synthesis.)
