Distance & Air Absorption

A spatializer that can place a source to the left and right, up and down, and yet cannot make it sound near or far, has solved only half of the problem. Direction tells you where on the sphere a source sits; distance tells you how far out along the ray to that point the source actually is. The two are perceptually and physically independent, and the cues the auditory system uses for each are almost entirely different. This chapter is about the second axis: how the ear estimates distance, what physics produces the cues, and how a renderer must reproduce them so that a virtual source at 30 metres is heard as distant rather than merely quiet.

The recurring theme of this guide applies here with unusual force. A renderer does not place a source by setting a gain. It must reproduce the perceptual cues that the listener's auditory system is wired to interpret — and for distance those cues are loudness, spectral balance, the ratio of direct to reflected energy, the timing of the first reflections, and, in the proximity region, binaural cues that change shape entirely. Get one cue right and the others wrong, and the brain hears the contradiction: a source that is loud but bright and dry sits at your ear no matter what fader you pull. We will keep returning to the psychoacoustics established in Fundamentals: Psychoacoustics, because distance perception is one of the clearest cases where physics and perception must be modelled together.

Why distance is a distinct perceptual axis

Direction and distance use different machinery

The cues for direction are well known and covered in the fundamentals: interaural time differences (ITD), interaural level differences (ILD), and the spectral colouration imposed by the pinna, head, and torso (the head-related transfer function, HRTF). Crucially, all of these are angular cues. They tell you the bearing to the source — its azimuth and elevation — but in an anechoic free field they are almost completely silent about how far away it is. Move a point source from 2 m to 8 m straight ahead in an anechoic chamber and the ITD stays zero, the ILD stays near zero, and the HRTF-imposed spectrum is essentially unchanged. The only thing that changes is the overall level. The geometry of the two-ears-on-a-head system simply does not encode range once you are more than a metre or so away.

Distance, therefore, is inferred rather than measured. The auditory system fuses several weak, individually ambiguous cues — overall loudness, the proportion of reverberant energy, the high-frequency content, the delay before the first reflection, and prior knowledge of how loud the source "should" be — into a single range estimate. Because each cue is weak and context-dependent, distance perception is far less precise than direction perception. Listeners localise azimuth to a degree or two in the front; they routinely misjudge distance by tens of percent, and they systematically compress the range, hearing near sources as farther than they are and far sources as nearer. This compression follows roughly a compressive power law, perceived distance proportional to physical distance raised to an exponent below one (often quoted around $0.4$ in reverberant rooms), so that doubling the physical distance produces less than a doubling of perceived distance.

Why level alone fails

The naive model of distance is "farther equals quieter," and for an isolated tone in an anechoic field that is literally all that changes. But level alone is a catastrophically poor distance cue in practice, for three reasons that recur throughout this chapter.

First, level is absolute and unanchored. A quiet sound could be a soft source nearby or a loud source far away; nothing in the level itself disambiguates the two. The brain can only use level as a distance cue if it already knows the source's intrinsic power — a whisper, a shout, a trumpet at forte. This is the familiarity cue, and it collapses the moment the source is unfamiliar or the level is arbitrary, as it always is in a mix where the engineer set the fader.

Second, level changes with the playback system's volume control. Turn up the monitor and every source gets louder; nothing should move farther away, and indeed nothing does, because the other cues (reverberant ratio, spectrum, reflection timing) are untouched. The auditory system trusts those stable cues over the global level.

Third, indoors the level barely changes with distance at all once you pass the critical distance, because the reverberant field is essentially uniform throughout the room. A source at 3 m and a source at 12 m in a typical room can produce nearly the same sound-pressure level at the listener, yet they are heard as clearly different distances — because the ratio of direct to reverberant energy has changed even though the total has not.

Key takeaway

This single fact is why the direct-to-reverberant ratio, not level, is the dominant indoor distance cue, and we devote a full section to it below.

The inverse-distance law

Pressure, intensity, and the 6 dB rule

Start in the simplest possible world: a point source radiating steadily into free space, no boundaries, no absorption. Energy is conserved, so the acoustic power crossing every imaginary sphere centred on the source is the same. The area of a sphere grows as $4\pi r^{2}$ , so the intensity — power per unit area — must fall as $I \propto 1/r^{2}$ . This is the inverse-square law, and it is purely geometric: the energy is being spread thinner over an ever-larger surface.

Sound pressure is what microphones and ears respond to, and intensity in a free-field plane-wave-like region is proportional to pressure squared ( $I \propto p^{2}$ ). Therefore pressure falls only as $p \propto 1/r$ , the inverse-distance law. The distinction matters: the intensity obeys inverse-square, but the pressure obeys inverse-first-power. In decibels, sound pressure level is $20\log_{10}(p)$ , so doubling the distance multiplies pressure by $1/2$ and changes the level by $20\log_{10}(1/2) = -6.02$ dB. Every doubling of distance from a point source costs almost exactly 6 dB. Equivalently, the level relative to a reference distance is

$\Delta L = -20\log_{10}\!\left(\frac{r}{r_{\mathrm{ref}}}\right) \text{ decibels.}$

Worked example 1. A point source produces 94 dB SPL at the reference distance $r_{\mathrm{ref}} = 1$ m. At 2 m it is $94 - 20\log_{10}(2) = 94 - 6.02 = 88.0$ dB. At 4 m, $94 - 20\log_{10}(4) = 94 - 12.04 = 82.0$ dB. At 10 m, $94 - 20\log_{10}(10) = 94 - 20 = 74.0$ dB. At 100 m, $94 - 40 = 54.0$ dB. Note that the loss per doubling is constant (6 dB) but the loss per metre shrinks rapidly: the first metre of separation (1 m to 2 m) costs 6 dB, while going from 50 m to 51 m costs only $20\log_{10}(51/50) = 0.17$ dB. Distance attenuation is steep up close and almost flat far away — a fact with direct consequences for how renderers clamp the near field, discussed later.

Free-field assumptions and where they break

The inverse-distance law as stated assumes four things: a point source small compared to the wavelength and to $r$ ; radiation into full space with no reflecting boundaries; no absorption along the path; and a steady, far-enough field that pressure and particle velocity are in phase (so $I \propto p^{2}$ holds). Each assumption fails somewhere. Boundaries (a floor, a wall, a room) add reflected energy and break the monotone roll-off — that is the entire subject of the direct-to-reverberant section. Absorption removes energy preferentially at high frequencies — that is air absorption. And very close to the source the field is reactive, pressure and velocity fall out of phase, and the simple relations break — that is the near field. The 6 dB-per-doubling rule is the anechoic, far-field, point-source ideal; every real situation is a departure from it, and modelling distance well is largely a matter of modelling those departures.

Point versus line versus plane sources

The $1/r$ law is specific to a point (spherical) source.

warning

The geometry of the radiator changes the exponent, and a renderer or system designer who forgets this will mispredict level by many decibels over a hall.

An idealised infinite line source radiates cylindrical waves: the wavefronts are cylinders, whose surface area grows only as $2\pi r L$ — linearly with $r$ , not as $r^{2}$ . Intensity therefore falls as $I \propto 1/r$ , pressure as $p \propto 1/\sqrt{r}$ , and the level drops by $20\log_{10}(1/\sqrt{2}) = -3.01$ dB per doubling — half the rate of a point source. This is precisely why a well-designed line array can cover the back of an arena: a column of drivers behaves, over a useful range, like a line source and rolls off near 3 dB per doubling rather than 6, so the front-to-back level difference is far smaller. Real line arrays are finite, so they behave as a line source in a near region and revert to point-source 6 dB-per-doubling behaviour beyond a transition distance that depends on array length and frequency; clever array curvature ("J" arrays) extends the cylindrical region toward the audience.

An idealised infinite plane source radiates a plane wave that does not spread at all: intensity and pressure are, in the ideal, constant with distance (0 dB per doubling), losing energy only to air absorption. No real source is an infinite plane, but a large flat radiator close up, or a source in a duct, approximates it over a limited range. The table below summarises the geometry.

Source geometry	Wavefront	Intensity vs r	Pressure vs r	Level per doubling
Point (monopole)	Spherical	$1/r^{2}$	$1/r$	$-6.0$ dB
Line (infinite)	Cylindrical	$1/r$	$1/\sqrt{r}$	$-3.0$ dB
Plane (infinite)	Planar	constant	constant	0 dB

Worked example 2. Compare a point source and an ideal line source, both 100 dB at 1 m, heard at 32 m. Point: $100 - 20\log_{10}(32) = 100 - 30.1 = 69.9$ dB. Line: $100 - 10\log_{10}(32) = 100 - 15.05 = 85.0$ dB (using the cylindrical law, where level loss is $10\log_{10}(r/r_{\mathrm{ref}})$ because pressure goes as $1/\sqrt{r}$ ). The line source is 15 dB louder at the back of the room — the entire engineering rationale for line arrays in long venues. A renderer for an object-based system that allows different source types, or a system-design tool, must pick the right exponent; assuming 6 dB-per-doubling for a line array under-predicts coverage badly.

Direct-to-reverberant ratio: the dominant indoor cue

Why DRR is level-independent

Indoors, the sound reaching the listener is the sum of two fields. The direct sound travels straight from source to listener and obeys the $1/r$ law. The reverberant sound is the accumulation of countless reflections; after enough bounces it becomes a diffuse field whose energy density is roughly uniform throughout the room and essentially independent of where the source or listener stands. The total energy is direct plus reverberant.

Now move the listener farther from the source. The direct energy drops as $1/r^{2}$ . The reverberant energy stays put. So the ratio of direct to reverberant energy — the direct-to-reverberant ratio, DRR — falls steadily with distance, even while the total level barely changes. This is the magic of DRR as a cue: because it is a ratio, it is immune to the monitor volume knob and to the absolute power of the source. Double the playback gain and both direct and reverberant rise together; the ratio is unchanged and the perceived distance does not move. That invariance is exactly what level lacks, and it is why the auditory system weights DRR so heavily indoors. We treat the reverberant field itself in Reverberation and the spatial split of direct versus diffuse energy in Direct, Diffuse & Envelopment; here the point is its role as the master distance cue.

Critical distance

There is a special radius at which the direct and reverberant energies are equal: the critical distance (also reverberation distance), $r_{c}$ . Inside it, the direct sound dominates and behaves much like free field; outside it, the reverberant field dominates and the level flattens out. The critical distance depends on the room's absorption — equivalently its reverberation — and on the source's directivity. For an omnidirectional source,

$r_{c} \approx 0.057\,\sqrt{\frac{V}{RT_{60}}} \text{ metres,}$

with room volume $V$ in cubic metres and reverberation time $RT_{60}$ in seconds. The constant folds together the diffuse-field and direct-field relations. A directional source pushes more energy forward, so its critical distance on-axis is larger by a factor of $\sqrt{Q}$ , where $Q$ is the directivity factor ( $Q = 1$ omni, $Q = 2$ for a source on a hard floor radiating into a hemisphere). Source directivity is itself a major topic — see Source Directivity.

Worked example 3. A room of $V = 3000\ \mathrm{m^{3}}$ with $RT_{60} = 1.2$ s. $r_{c} \approx 0.057\,\sqrt{3000 / 1.2} = 0.057\,\sqrt{2500} = 0.057 \times 50 = 2.85$ m. So a listener only 2.85 m from an omni source already receives equal direct and reverberant energy; at 6 m the direct is 6.5 dB below the reverberant ( $-20\log_{10}(6/2.85) = -6.5$ dB on the direct, reverberant unchanged), and beyond about 10 m the listener is firmly in the reverberant field where moving farther changes the level by less than a decibel. In this room, level is nearly useless past 3 m and DRR carries the distance percept almost alone.

Mapping DRR to perceived distance

Because direct energy falls as $1/r^{2}$ while reverberant energy is flat, DRR in decibels is, to good approximation, linear in $\log(r)$ : every doubling of distance lowers the DRR by about 6 dB (the direct drops 6, the reverberant holds). The auditory system reads this monotone relationship as a distance scale. Empirically, listeners can resolve DRR changes of a few decibels, which is why DRR supports a usable but coarse distance sense — finer than level alone, coarser than direction. Several practical consequences follow. A perfectly dry (anechoic) signal has infinite DRR and tends to collapse to the head regardless of its level, which is why dry close-mic'd sources sound "in your face." Adding even a little reverberation immediately pushes a source out and back. And because DRR saturates — beyond the critical distance the direct sound is so far below the reverberant that further reduction is barely audible — rooms impose a maximum perceivable distance: you cannot make a source sound much farther than the room's reverberant field implies. A renderer that wants a source to seem distant must therefore lower the direct send and raise the reverberant send together, not merely attenuate the source. We return to this as the central design move in the "how spatializers model distance" section.

Air absorption

The physics: why air is lossy, and lossy unequally

The inverse-distance law removes no energy; it only spreads it. Air absorption actually destroys acoustic energy, converting it to heat, and it does so far more strongly at high frequencies. Three mechanisms are at work. Viscosity (shear friction between layers of air moving at different velocities within the wave) and thermal conduction (heat flowing from the warm compressed regions to the cool rarefied ones, an irreversible smoothing) together give the "classical" absorption, which rises with the square of frequency. On top of that, and usually dominant in the audible range, is molecular relaxation: oxygen and nitrogen molecules absorb a little vibrational energy from the wave and release it slightly out of phase, a lossy lag. The relaxation frequencies of $\mathrm{O_{2}}$ and $\mathrm{N_{2}}$ depend strongly on the amount of water vapour present, because water molecules catalyse the energy exchange — which is why humidity has such a large and non-obvious effect on high-frequency air absorption.

The net result is a frequency-dependent attenuation coefficient $\alpha$ (in dB per metre, or dB per 100 m) that grows steeply with frequency, depends on temperature and relative humidity, and is negligibly small at low frequencies. The international standard that quantifies it is ISO 9613-1, Attenuation of sound during propagation outdoors — Calculation of the absorption of sound by the atmosphere, which gives the relaxation-frequency formulae and the resulting $\alpha(f, T, h)$ . (Its companion ISO 9613-2 handles geometric and ground effects.) For room-scale work the same physics applies; standards such as ISO 9613-1 and the air-absorption terms in concert-hall acoustics (Beranek, Kuttruff) use identical relaxation models.

Frequency, temperature, humidity dependence

Rule of thumb

The qualitative rules every practitioner should carry: absorption rises roughly as $f^{2}$ at the high end, so it is trivial below 1 kHz and severe above 4 kHz; it is worst at low-to-moderate humidity (around 10–40 percent), where the relaxation peaks sit in the audible band, and is somewhat reduced in very dry or very humid air; and it varies with temperature, generally increasing the relaxation effect as air warms.

The numbers below are representative values near 20 °C and 50 percent relative humidity; treat them as order-of-magnitude, since changing the humidity to 20 percent can roughly double the high-frequency figures.

Frequency	Approx. absorption (dB / 100 m)	Loss over 10 m	Loss over 50 m
125 Hz	0.3	0.03 dB	0.15 dB
250 Hz	1.0	0.10 dB	0.5 dB
500 Hz	1.7	0.17 dB	0.85 dB
1 kHz	3.0	0.30 dB	1.5 dB
2 kHz	7.4	0.74 dB	3.7 dB
4 kHz	19	1.9 dB	9.5 dB
8 kHz	58	5.8 dB	29 dB
16 kHz	180	18 dB	90 dB

Why far sources sound dull

Read the bottom of that table and the perceptual consequence is obvious. Over 50 m, 1 kHz loses about 1.5 dB while 8 kHz loses 29 dB — a relative high-frequency loss of nearly 28 dB. The far source is not just quieter; it is dramatically darker, a gentle low-pass filter whose corner moves downward as distance grows. This is a powerful and unambiguous distance cue, because unlike level it cannot be faked by the volume knob and unlike DRR it works outdoors where there is no reverberant field. Distant thunder rumbles; nearby thunder cracks. A jet overhead is bright; the same jet two miles off is a muffled roar. The auditory system has learned this regularity from a lifetime of listening, and a renderer that omits it produces sources that are "near and far at once" — correct level, wrong spectrum — which the brain reads as artificial.

Worked example 4. A broadband source heard at 100 m versus 1 m outdoors, 20 °C, 50 percent humidity. Geometric loss is the same at all frequencies: $-20\log_{10}(100/1) = -40$ dB. Add air absorption over the extra 99 m: at 1 kHz, $3.0\ \mathrm{dB/100m} \times 0.99 \approx 3.0$ dB, total $\approx 43$ dB down. At 4 kHz, $19 \times 0.99 \approx 18.8$ dB, total $\approx 58.8$ dB down. At 8 kHz, $58 \times 0.99 \approx 57$ dB, total $\approx 97$ dB down — effectively gone. So relative to the 1 kHz reference the source has lost about 16 dB at 4 kHz and 54 dB at 8 kHz: a steep tilt that any listener instantly reads as "very far." A renderer that applied only the 40 dB geometric loss would leave a fully bright 8 kHz, and the source would sound close but inexplicably faint.

The initial time gap and early-reflection cues

What the initial time gap is

When a source plays in a room, the direct sound arrives first; then, after a silence, the first reflection (usually off the floor, or the nearest wall) arrives. The interval between them is the initial time-delay gap (ITDG), a term from Beranek's concert-hall research. Its size depends on geometry. If you sit close to a source in a large room, the direct sound has a short path but the first reflection still has to travel out to a wall and back, so the gap is large. If you sit far from the source, the direct path is long and the reflection paths are comparatively only a little longer, so the gap is small. The ITDG therefore shrinks as the listener moves away from the source — an inverse relationship to distance that the auditory system exploits.

How early reflections encode distance

More generally, the pattern of early reflections — their delays relative to the direct sound and their relative levels — carries rich distance and room-size information. Two cues combine. First, the delay of the first reflection relative to the direct sound: a near source produces a direct sound much earlier than its first reflection (long gap), a far source produces them almost together (short gap). Second, the level of the early reflections relative to the direct sound: as the source recedes, the direct sound drops by 6 dB per doubling while the early reflections, which travel via the room, drop more slowly, so the reflections grow in relative prominence — a smooth precursor to the full DRR effect. The brain uses these early arrivals, which fall within the roughly 50–80 ms integration window where they fuse with the direct sound (the precedence effect, see Psychoacoustics) and reinforce rather than blur it, to estimate both how far the source is and how big the room is.

Worked example 5. A source and listener 2 m apart on a floor; the ceiling is 4 m up and source and listener are each 2 m below it at mid-height. Direct path: 2 m, arriving at $2 / 343 = 5.83$ ms. The floor-bounce path, by image-source construction with the source and receiver at, say, 1.2 m height, has a reflected length of about $\sqrt{2^{2} + (2 \times 1.2)^{2}} = \sqrt{4 + 5.76} = 3.12$ m, arriving at $3.12 / 343 = 9.1$ ms. Initial gap $\approx 3.3$ ms. Now move the listener to 8 m: direct path $8 / 343 = 23.3$ ms; floor bounce $\sqrt{8^{2} + 2.4^{2}} = \sqrt{64 + 5.76} = 8.35$ m arriving at $24.3$ ms; gap $\approx 1.0$ ms. The gap collapsed from 3.3 ms to 1.0 ms as distance quadrupled — exactly the shrinking-ITDG signature of increasing distance. A renderer that models early reflections geometrically (an image-source model feeding a reverberator) reproduces this automatically; one that uses a fixed reverb tail for all distances does not.

Near-field effects and the proximity region

The reactive near field

Everything above assumed the listener is in the source's far field, where the wave is locally plane-like and pressure and particle velocity are in phase. Very close to a source — within roughly a wavelength, and in practice under about a metre for a small source — the field is reactive: a substantial part of the acoustic energy sloshes back and forth between source and air without propagating away, pressure and velocity fall out of phase, and the simple $I \propto p^{2}$ relation no longer holds. Two perceptually important things happen in this proximity region.

Low-frequency emphasis

A small source (acoustically a monopole or dipole) close to the ear produces a relative boost of low frequencies — the near-field "proximity effect," the same physics that makes a vocalist sound bassy with lips on a cardioid microphone. The HRTF measured very close to the head differs from the far-field HRTF chiefly by this low-frequency lift and by a change in overall level distribution between the ears. The auditory system, having learned that only very near sources exhibit this bass emphasis, reads strong proximity-effect bass as a powerful "right next to me" cue. This is why a whisper rendered with near-field HRTFs and a touch of LF lift sounds intimately close, and why a binaural renderer that ignores near-field HRTFs cannot convincingly place a source at, say, 20 cm.

Growing ILD at close range

Recall that in the far field the ILD for low frequencies is small, because long wavelengths diffract around the head and reach both ears at nearly equal level. In the near field this changes dramatically. When the source is, say, 15 cm from the left ear, the right ear is more than twice as far away, and the $1/r$ pressure law alone produces a large interaural level difference even at low frequencies — the far ear is in a different part of the source's rapidly falling field. So as a source approaches the side of the head, its ILD grows steeply and extends down to frequencies where, far away, ILD was negligible. This distance-dependent low-frequency ILD is one of the few binaural distance cues, and it operates only in the proximity region. It also makes lateral near sources far more vivid than frontal ones, since the front-back symmetry keeps the two ear distances similar straight ahead.

Worked example 6. A source 0.15 m from the left ear, head width putting the right ear about 0.32 m from the source (0.15 m plus a $\sim 0.17$ m interaural path around the head). The pure inverse-distance ILD is $20\log_{10}(0.32 / 0.15) = 20\log_{10}(2.13) = 6.6$ dB — and this exists at all frequencies, including the low end where the far-field ILD would be near zero. Move the same source to 1.5 m on the left: now the two ear distances are about 1.5 m and 1.67 m, giving $20\log_{10}(1.67/1.5) = 0.93$ dB of geometric ILD. The near source has roughly seven times the low-frequency geometric ILD of the far one. This is why proximity is so compelling binaurally and so hard to fake with far-field HRTFs and a gain — see Binaural.

Familiarity, loudness constancy, and the limits of the level cue

Loudness constancy

The auditory system exhibits loudness constancy, analogous to the visual system's size and brightness constancy: a familiar source is perceived to have a roughly stable intrinsic loudness even as its received level changes with distance, with the brain "explaining away" the level change as a distance effect. You hear a talker walk away as moving away, not as getting quieter, because you know how loud speech is and attribute the falling level to range. This is enormously useful — it lets level function as a distance cue for familiar sources — but it depends entirely on prior knowledge of the source's intrinsic power.

Familiarity and its failure

For unfamiliar or novel sources, the familiarity cue is absent and level becomes nearly useless for absolute distance: a listener cannot tell whether a strange tone is soft and near or loud and far. Experiments with calibrated speech versus whispers make the point: a whisper played at the same level as normal speech is judged much closer, because the listener knows whispers are intrinsically quiet, so for it to arrive at this level it must be near. The same physical level, different prior, different perceived distance. A renderer cannot rely on level to convey distance unless the listener has a stable expectation of the source's loudness, which in a mix they usually do not — the engineer set the fader arbitrarily. This is the deepest reason the field has converged on DRR, spectrum, and reflection cues as the load-bearing distance tools, with level a supporting player.

The limits of level, summarised

Putting it together, level is a usable distance cue only in the conjunction of three conditions: an anechoic or near-anechoic field (so the $1/r$ law actually governs the received level), a familiar source (so the listener knows the intrinsic power), and a fixed, known playback gain (so the absolute level is meaningful). Remove any one and level fails. Outdoors with an unfamiliar source, spectrum (air absorption) carries the day. Indoors, DRR carries it. In the proximity region, near-field HRTF and low-frequency ILD carry it. Level is the least robust of the distance cues despite being the most obvious, and a spatializer that models distance as gain alone is modelling the weakest cue while ignoring the strong ones.

How spatializers model distance

We now turn to the practical question: given a desired source distance, what does a renderer actually compute, and what parameters does it expose? The honest summary is that no production renderer applies a literal $1/r$ to a literal physical distance, because doing so is both numerically unworkable and perceptually wrong. Instead they expose a small set of tunable parameters that approximate the cue set above while staying controllable.

Reference distance, roll-off, and clamps

Three parameters define the level-versus-distance behaviour in nearly every spatial audio engine (game audio middleware, object-based renderers, and DAW spatial panners alike):

Reference distance ( $r_{\mathrm{ref}}$ ): the distance at which the source plays at its nominal, unattenuated gain. Inside it, level is held constant (or rolled off gently); this prevents the $1/r$ law from sending the gain to infinity as $r \to 0$ . A typical default is 1 m.
Maximum distance ( $r_{\max}$ ): the distance beyond which attenuation stops (or the source is culled). This bounds the quietest the source can get and prevents it from vanishing entirely.
Roll-off factor / model: how steeply level falls between $r_{\mathrm{ref}}$ and $r_{\max}$ . Common models are inverse (true $1/r$ , 6 dB per doubling), linear (level falls linearly with distance to zero at $r_{\max}$ , perceptually crude but predictable), exponential, and various clamped/scaled hybrids. A roll-off factor of 1 with the inverse model is physical 6 dB-per-doubling; values below 1 flatten it, values above steepen it.

The inverse model with clamps computes, for $r_{\mathrm{ref}} \le r \le r_{\max}$ ,

$\text{gain} = \frac{r_{\mathrm{ref}}}{r_{\mathrm{ref}} + \text{rolloff} \cdot (r - r_{\mathrm{ref}})},$

which equals 1 at $r_{\mathrm{ref}}$ , approximates $1/r$ for large $r$ when $\text{rolloff} = 1$ , and never blows up. This clamped inverse is the workhorse formula in most middleware.

Why literal 1/r is impractical

danger

A literal $1/r$ has two fatal problems for a renderer. As $r \to 0$ the gain $\to \infty$ , which is meaningless and will clip or blow up a bus; sources passing through the listener position would produce infinite level. And as $r$ grows the level falls without bound, so a source a kilometre away would be 60 dB down and consuming CPU for nothing.

The reference-distance clamp solves the first (gain held flat inside $r_{\mathrm{ref}}$ ); the max-distance clamp solves the second. Designers also frequently want a non-physical roll-off — a gentler curve so distant game objects remain audible for gameplay, or a steeper one for dramatic effect — which is why the roll-off factor is exposed as an artistic control rather than nailed to physics. The renderer's job is to be perceptually convincing and controllable, not literally correct.

The distance low-pass for air

To reproduce the spectral dulling of far sources (the air-absorption cue), renderers insert a distance-dependent low-pass filter whose cutoff falls as distance grows. A first-order low-pass with a cutoff that scales inversely with distance is the common, cheap approximation; more careful engines tabulate ISO 9613-1 $\alpha(f)$ and apply a true frequency-dependent gain, optionally exposing temperature and humidity. The parameter the user sees is usually an "air absorption" amount or an explicit cutoff-vs-distance curve. Even the crude one-pole version captures the essential perceptual effect — far equals dark — and dramatically improves distance realism over a flat gain. Object-based and game engines almost universally include some form of this filter precisely because the spectral cue is so robust against the volume-knob problem.

The distance-driven direct/reverb balance

tip

The single most important thing a good distance model does indoors is vary the direct-to-reverberant ratio with distance — sending less signal to the dry/direct path and more to the reverberant path as the source recedes, so that DRR falls by roughly 6 dB per doubling as physics requires.

This is what makes a source recede convincingly even at constant total loudness. Renderers expose it as a reverb send that scales with distance (or a "room"/"distance" knob that simultaneously lowers direct gain and raises the send), often together with a distance-dependent pre-delay and early-reflection level to mimic the shrinking ITDG of Section 6. Parametric spatial reverberators are built precisely to take a source distance (and direction) and synthesise the correct direct/early/late balance; DAM Audio's VRA parametric spatial reverb is an example of a renderer that derives the reverberant field and its distance-dependent ratio from the source geometry rather than from a static fixed-tail effect. The mechanics of that reverberant field are developed in Reverberation and its spatial distribution in Direct, Diffuse & Envelopment.

Object metadata: carrying distance and gain

In object-based formats the distance information does not have to be baked into a rendered signal at all; it can be carried as metadata alongside the audio and resolved at the renderer. An object can carry a 3-D position (and hence distance from the listener or from the array centre), a gain, and sometimes explicit size, divergence, and "use-reference-distance" flags. The renderer at playback time reads the position, computes the panning and the distance treatment (level, air low-pass, reverb send) appropriate to the actual reproduction system, and produces the speaker or binaural feeds. This separation is the whole point of object-based audio: the same object plays correctly over headphones, a 5.1 rig, or a large array, with each renderer applying its own distance model. See Object-Based Audio and the metadata discussion in Formats.

Putting it together: a source from 1 m to 64 m

We close the technical core with a single worked walk-through that exercises every cue at once. Take an omnidirectional broadband source in a moderately reverberant room, $V = 3000\ \mathrm{m^{3}}$ , $RT_{60} = 1.2$ s (so from Worked Example 3 the critical distance $r_{c} \approx 2.85$ m), at 20 °C and 50 percent humidity. We move it 1 m, 4 m, 16 m, 64 m and quantify the direct level, the high-frequency air loss, and the DRR at each step. Reference: at 1 m the direct sound is 90 dB and the DRR is, say, $+12$ dB (the listener is well inside the critical distance, direct strongly dominant).

Direct level follows $-20\log_{10}(r)$ . From 1 m: 4 m is $-12.0$ dB (78 dB), 16 m is $-24.1$ dB (65.9 dB), 64 m is $-36.1$ dB (53.9 dB). Note that indoors these are the direct-component levels only; the total received level flattens out past $r_{c}$ because the reverberant field fills in — at 16 m and 64 m the listener is deep in the reverberant field and the total level is nearly constant, dominated by reverberation.

Air absorption adds frequency-dependent loss on top, over the path length. Using the table's 8 kHz figure of $\sim 58$ dB/100 m: at 4 m the extra 8 kHz loss is $0.58\ \mathrm{dB/m} \times 3\ \mathrm{m} \approx 1.7$ dB; at 16 m, $0.58 \times 15 \approx 8.7$ dB; at 64 m, $0.58 \times 63 \approx 36.5$ dB of additional high-frequency roll-off beyond the geometric loss. So at 64 m the 8 kHz band is down 36 dB geometric plus 36.5 dB air $\approx 72$ dB, while 1 kHz is down 36 dB geometric plus only $\sim 1.9$ dB air $\approx 38$ dB. The source is not merely 36 dB quieter at distance; its top octave is gone — the unmistakable dull-far signature.

DRR falls $\sim 6$ dB per doubling of distance as the direct component drops and the reverberant holds. From $+12$ dB at 1 m: 4 m (two doublings) is about $12 - 12 = 0$ dB (direct and reverberant equal — note this is near the critical distance, consistent with $r_{c} \approx 2.85$ m); 16 m (two further doublings) is about $-12$ dB; 64 m is about $-24$ dB (direct sound 24 dB below the reverberant — the source is heard almost entirely through the room). The table consolidates the journey.

Distance	Direct level (re 1 m)	Extra 8 kHz air loss	DRR	Perceived character
1 m	0 dB (90 dB)	$\sim 0$ dB	$+12$ dB	Close, present, bright, dry
4 m	$-12$ dB	$\sim 1.7$ dB	$\sim 0$ dB	At the critical distance; room becoming audible
16 m	$-24$ dB	$\sim 8.7$ dB	$\sim -12$ dB	Clearly distant, reverberant, dulling
64 m	$-36$ dB	$\sim 36.5$ dB	$\sim -24$ dB	Far, dark, heard through the room; level nearly flat vs total field

The lesson of the table is the lesson of the chapter. To render the move from 1 m to 64 m, a spatializer cannot pull a single fader by 36 dB. It must drop the direct gain by 36 dB and progressively low-pass the signal toward a dull top end and raise the reverberant send so the DRR swings from $+12$ dB to $-24$ dB and shorten the early-reflection pre-delay. Reproduce all four and the source recedes convincingly; reproduce only the level and it merely gets faint while sitting, impossibly, right at the listener's ear.

Common mistakes and pitfalls

Gain-only "distance"

Common mistake

The most common error, by a wide margin, is treating distance as a synonym for gain — pulling a fader and calling it depth.

As Sections 1, 7, and 9 establish, level is the weakest and least robust distance cue, and on its own it does not move a source backward; it makes it quieter while leaving it spectrally bright, dry, and acoustically in your face. The brain reads the contradiction and the source refuses to recede. A "distant" sound that is still fully bright and reverb-free will be heard as a quiet near sound, not a loud far one. Always pair level with at least the spectral (air) cue and, indoors, the DRR cue.

Ignoring air absorption

Omitting the distance low-pass leaves far sources unnaturally crisp. This is especially jarring outdoors, in large-venue rendering, and in any scene with a wide distance range, where a bright, articulate source at 80 m breaks the illusion instantly because lifelong listening has taught the ear that far sources are dark. The fix is cheap — even a single distance-scaled one-pole low-pass is a large improvement — so there is little excuse to skip it.

Inconsistent models across speakers or objects

When different sources, channels, or renderers in the same scene use different distance models — one object on a literal $1/r$ , another on a linear roll-off, a third with no air filter — the scene's depth becomes incoherent and sources fail to sit in a common space. The same hazard appears across a multi-speaker system: if the per-speaker distance compensation or the source-type exponent (point versus line, Section 2) is set inconsistently, the rendered distances disagree between zones of the room. Pick one physically grounded model and apply it uniformly; the calibration chapter (covered elsewhere in this guide) addresses keeping the reproduction system itself consistent so that the renderer's distance model survives playback.

Decoupling direction and distance cues

Rendering a believable angle with HRTFs but a flat gain for distance, or vice versa, produces sources that are convincingly placed in bearing but ambiguous in range. Worse, in the proximity region (Section 6) using far-field HRTFs for a 20 cm source omits the low-frequency lift and the growing low-frequency ILD, so the source cannot be heard as truly close however much gain is applied. Near sources need near-field HRTFs; distance and direction must be modelled together.

Double-counting or mismatching reverberation

If the dry signal already contains baked-in room sound (a reverberant recording) and the renderer then adds its own distance-driven reverb send, the DRR cue is applied twice and the source sits farther and muddier than intended. Conversely, applying a fixed reverb tail that does not scale with distance gives every source the same DRR regardless of where it is placed, defeating the dominant indoor cue. The reverberant treatment must be distance-aware and must not collide with reverberation already present in the source — a point developed in Reverberation.

Limits

Even a renderer that models every cue in this chapter faces hard limits, and it is important to state them so expectations are realistic.

First, distance perception is intrinsically coarse and compressive. Listeners misjudge absolute distance by large margins and compress the scale, so no renderer can deliver fine, accurate absolute range the way it can deliver fine angular accuracy. The realistic goal is convincing relative distance and a believable sense of near versus far, not centimetre-accurate range.

Second, the room imposes a maximum perceivable distance. Because DRR saturates beyond the critical distance, a source cannot be made to sound much farther than the reproduction room's (or the simulated room's) reverberant field allows. In a dry rendering there is no DRR cue at all, so indoor-style depth simply cannot be created from level and spectrum alone; you must synthesise a reverberant field, which is why parametric room renderers exist.

Third, air-absorption modelling is only as good as its assumptions. ISO 9613-1 $\alpha$ depends strongly on temperature and humidity; a renderer using a single fixed curve will be wrong by a factor of two or more when the modelled environment differs (dry winter air versus humid summer air shifts the high-frequency loss substantially). For most production this is acceptable, but for simulation-grade work the environmental parameters must be exposed and set.

Fourth, near-field rendering needs near-field HRTFs, which are expensive to measure (a full distance-dependent HRTF set) and are absent from most HRTF libraries, so convincing sub-metre proximity over headphones remains difficult. Many renderers approximate it with a low-frequency shelf and a distance-scaled ILD, which helps but does not fully reproduce the measured near-field transfer functions.

Fifth, over loudspeakers the playback room adds its own distance cues that the renderer cannot remove. A binaural-over-headphones rendering controls the entire acoustic path; a loudspeaker rendering is heard through the listening room's reflections and reverberation, which superimpose their own DRR and may fight the intended distance. This is why critical distance work is often done on headphones or in acoustically controlled rooms, and why the calibration and installation parts of this guide matter for preserving rendered depth.

Finally, the level cue's dependence on familiarity and known playback gain (Section 7) is irreducible: for genuinely unfamiliar sources at an arbitrary mix level, no amount of modelling makes the absolute level meaningful, and the renderer must lean entirely on DRR, spectrum, and reflection cues. Distance, more than any other spatial attribute, is a perceptual inference the renderer can only support, never dictate. Keeping that humility in view — and connecting every parameter back to the cue it serves, as catalogued in Psychoacoustics — is what separates a distance model that convinces from one that merely attenuates.

References

Kuttruff, H. Room Acoustics (6th ed.), CRC Press, 2016. Diffuse field, critical distance, reverberant energy density, and air absorption in enclosures.
Beranek, L. Concert Halls and Opera Houses: Music, Acoustics, and Architecture (2nd ed.), Springer, 2004. Initial time-delay gap (ITDG) and early-reflection structure.
Blauert, J. Spatial Hearing: The Psychophysics of Human Sound Localization (revised ed.), MIT Press, 1997. Distance perception, near-field cues, and the role of loudness and DRR.
ISO 9613-1:1993, Acoustics — Attenuation of sound during propagation outdoors — Part 1: Calculation of the absorption of sound by the atmosphere. Frequency/temperature/humidity dependence of air absorption.
ISO 9613-2:1996, Acoustics — Attenuation of sound during propagation outdoors — Part 2: General method of calculation. Geometric spreading and ground/screen effects.
Bass, H. E., Sutherland, L. C., Zuckerwar, A. J., Blackstock, D. T., and Hester, D. M. "Atmospheric absorption of sound: Further developments," Journal of the Acoustical Society of America, 97(1), 1995. The molecular-relaxation model underlying ISO 9613-1.
Zahorik, P., Brungart, D. S., and Bronkhorst, A. W. "Auditory distance perception in humans: A summary of past and present research," Acta Acustica united with Acustica, 91(3), 2005. Review of distance cues, compressive power law, and DRR.
Bronkhorst, A. W., and Houtgast, T. "Auditory distance perception in rooms," Nature, 397, 1999. Direct-to-reverberant ratio as the dominant indoor distance cue.
Begault, D. R. 3-D Sound for Virtual Reality and Multimedia, Academic Press / NASA, 1994. Practical modelling of distance, reverberation, and air absorption in spatial renderers.

← Back to the Sound Field & the Room

Why distance is a distinct perceptual axis​

Direction and distance use different machinery​

Why level alone fails​

The inverse-distance law​

Pressure, intensity, and the 6 dB rule​

Free-field assumptions and where they break​

Point versus line versus plane sources​

Direct-to-reverberant ratio: the dominant indoor cue​

Why DRR is level-independent​

Critical distance​

Mapping DRR to perceived distance​

Air absorption​

The physics: why air is lossy, and lossy unequally​

Frequency, temperature, humidity dependence​

Why far sources sound dull​

The initial time gap and early-reflection cues​

What the initial time gap is​

How early reflections encode distance​

Near-field effects and the proximity region​

The reactive near field​

Low-frequency emphasis​

Growing ILD at close range​

Familiarity, loudness constancy, and the limits of the level cue​

Loudness constancy​

Familiarity and its failure​

The limits of level, summarised​

How spatializers model distance​

Reference distance, roll-off, and clamps​

Why literal 1/r is impractical​

The distance low-pass for air​

The distance-driven direct/reverb balance​

Object metadata: carrying distance and gain​

Putting it together: a source from 1 m to 64 m​

Common mistakes and pitfalls​

Gain-only "distance"​

Ignoring air absorption​

Inconsistent models across speakers or objects​

Decoupling direction and distance cues​

Double-counting or mismatching reverberation​

Limits​

References​