Why Can't You Make Dolby Atmos Without Upmixing?

20 February 2026 · 20-minute read

DAM Audio

The audio industry has developed a peculiar contradiction: Dolby Atmos is celebrated as the future of immersive audio, while upmixing — the very process that makes Atmos rendering possible — is dismissed as illegitimate. This article asks a simple question: can you actually make Dolby Atmos without upmixing?

The answer, examined through five decades of Dolby's own technology and the published physics of spatial audio rendering, is no.

1. Spatialization: A Century of Encoding Space in Audio

Before addressing upmixing, it is essential to understand the broader history of spatial audio — because every advance in spatialization has faced the same fundamental challenge: how to encode a three-dimensional sound field into a finite number of channels, and how to decode it faithfully on the other end.

1.1. From Mono to Stereo: The Birth of Spatial Encoding

The history of spatial audio begins with a single insight: that two channels can encode a continuous sound field.

In 1881, Clement Ader demonstrated the Theatrophone at the Paris Opera — a system of telephone lines transmitting binaural audio from microphones placed on either side of the stage. Listeners holding two receivers to their ears perceived spatial depth and width. It was the first documented spatial audio experience.

Fifty years later, Alan Blumlein formalized the principles. His 1931 patent (UK 394,325, filed 14 December 1931, accepted 14 June 1933) contained 70 claims covering stereo recording, reproduction, and the mathematical basis for phantom source imaging. Blumlein demonstrated that inter-channel amplitude and phase relationships could encode a continuous spatial field — that the listener perceives sounds between loudspeakers, not at them. This was not a subjective impression but a psychoacoustic phenomenon rooted in Interaural Level Differences (ILD) and Interaural Time Differences (ITD). (IEEE Milestone, 2017; EMI Archive Trust)

Stereo became the dominant consumer format by the late 1960s. It remains so today: an estimated 97–99% of the world's recorded music catalog exists in stereo or mono.

1.2. Surround Sound: Expanding the Field

The expansion from stereo to surround followed two parallel paths:

Cinema surround emerged with Dolby Stereo (1976), which encoded four channels (L, C, R, S) into a two-channel optical print via matrix encoding. Consumer versions followed with Dolby Surround (1982) and Dolby Pro Logic (1987). Each of these was, by definition, an upmixer, a point examined in detail in Section 3.

Discrete surround arrived with Dolby Digital (AC-3) in 1992, delivering 5.1 channels (L, C, R, LS, RS, LFE) as independent streams. The ITU standardized the 3/2 multichannel layout in ITU-R BS.775 (1994, revised through BS.775-4 in 2022): three front channels at 0° and ±30°, two surround channels at ±110°, and an optional Low-Frequency Effects channel limited to 20–120 Hz at +10 dB offset.

1.3. Three Paradigms of Spatial Audio

Modern spatial audio systems are classified into three paradigms:

Channel-Based Audio (CBA) assigns audio to specific loudspeakers in a predefined layout (2.0, 5.1, 7.1). Its limitation is inflexibility — content authored for one layout does not adapt to another without processing (i.e., upmixing or downmixing).

Scene-Based Audio (SBA / Ambisonics) represents the sound field as a set of spherical harmonic coefficients, independent of any specific loudspeaker layout. Michael Gerzon formalized this approach in the 1970s (JAES, vol. 21, no. 1, 1973). The decoding step — transforming Ambisonic coefficients into loudspeaker feeds — is, functionally, an upmixing operation: derived signals for a specific speaker configuration.

Object-Based Audio (OBA) stores audio elements with positional metadata (x, y, z coordinates, size, velocity). A renderer calculates per-speaker gains in real time. Dolby Atmos, MPEG-H Audio (ISO/IEC 23008-3), and DTS:X are all object-based systems. The rendering step — where metadata becomes loudspeaker feeds — is the focus of Section 4.

A fourth approach, Wave Field Synthesis (WFS), proposed by Berkhout in 1988, uses dense arrays of loudspeakers to physically reconstruct sound wavefronts based on the Huygens-Fresnel principle. Unlike all other methods, WFS creates real — not phantom — source positions.

1.4. The Common Thread

Every paradigm beyond mono shares the same fundamental architecture: encoding spatial information into a finite representation, then decoding it for a specific playback system. The decoding step always involves deriving loudspeaker feeds that did not exist in the authored content. This is the context in which upmixing must be understood.

2. What Is Upmixing? Definitions and Scope

2.1. The General Definition

Upmixing is the process of rendering an audio source of n channels to a playback system of n + m loudspeakers (where m > 0), such that the additional loudspeaker feeds are derived — not authored — by a rendering algorithm. The derivation may rely on:

Matrix decoding: extracting spatially encoded information from channel relationships (e.g., L+R for center, L−R for surround).
Signal analysis: examining inter-channel correlation, spectral content, and transient behavior to infer spatial distribution.
Metadata interpretation: using authored position data to calculate per-speaker amplitude coefficients (e.g., VBAP rendering).
Spherical harmonic decoding: transforming Ambisonic coefficients into loudspeaker feeds via decoder matrices.

All four methods produce output channels that did not exist in the original authored content. The distinction between them is one of method, not of nature.
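The first of these methods is simple enough to sketch directly. The example below is a minimal passive matrix decode built only from the L+R and L−R relationships named above. The function name and the −3 dB scaling are assumptions made for illustration, and the phase shifting, band-limiting, and delay used in commercial decoders are deliberately left out.

```python
import math

def passive_matrix_decode(left, right):
    """Derive L, C, R, S feeds from a two-channel signal (illustrative only)."""
    g = 1.0 / math.sqrt(2.0)  # assumed -3 dB scaling for the derived channels
    center = [g * (l + r) for l, r in zip(left, right)]    # sum: in-phase content
    surround = [g * (l - r) for l, r in zip(left, right)]  # difference: out-of-phase content
    return {"L": list(left), "C": center, "R": list(right), "S": surround}

# A source mixed identically into both channels lands in C and cancels in S.
centered = passive_matrix_decode([1.0, 0.5], [1.0, 0.5])

# A source mixed in anti-phase cancels in C and lands in S.
antiphase = passive_matrix_decode([1.0, 0.5], [-1.0, -0.5])
```

Two input channels go in and four feeds come out; the center and surround feeds exist only because the decoder derived them.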

2.2. What Upmixing Is Not

Several processes are sometimes confused with upmixing but are technically distinct:

Downmixing reduces n channels to fewer channels. This is a lossy reduction, not a derivation of new spatial content.
Remixing creates a new mix from original multitrack stems — an artistic reinterpretation, not an algorithmic derivation.
Source separation + re-panning uses AI-based stem splitting to extract elements from a mix, then re-positions them. Apple's guidelines explicitly classify this as distinct and prohibit it: "Extracting stems ('de-mixing') from a stereo release is not allowed." (Apple Music Provider Support, v5.3.13)
Virtualization simulates a multichannel experience over fewer speakers using head-related transfer functions (HRTFs). This is technically a downmix with spatial processing.

2.3. The Academic Literature

The peer-reviewed literature treats upmixing as a legitimate signal processing discipline. The dominant paradigm is primary-ambient decomposition: separating the input into a "primary" (direct, localizable) component and an "ambient" (diffuse, enveloping) component, then distributing each appropriately.

Key contributions include:

Avendano & Jot (2004): "A Frequency-Domain Approach to Multichannel Upmix," JAES, vol. 52, no. 7/8, pp. 740–749.
Goodwin & Jot (2007): "Primary-Ambient Signal Decomposition and Vector-Based Localization," ICASSP.
Goodwin & Jot (2008): "Spatial Audio Scene Coding," AES 125th Convention. A format-agnostic parameterization of audio scenes enabling optimal reproduction over any playback system.
Faller & Breebaart (2011): "Binaural Reproduction of Stereo Signals Using Upmixing and Diffuse Rendering," AES 131st Convention.
Kraft & Zölzer (2016): "Time-Domain Implementation of a Stereo to Surround Sound Upmix Algorithm," DAFx-16. Replaces the STFT with an IIR filter bank — achieving comparable quality with a fraction of computational cost.
Walther & Faller (2011): "Direct-Ambient Decomposition and Upmix of Surround Signals," IEEE WASPAA.

The academic community does not debate whether upmixing is legitimate. It debates which algorithms perform best.

3. Dolby: Five Decades of Upmixing

If upmixing is illegitimate, then Dolby has a problem — because the company defined the category, commercialized it globally, and built its dominant market position on successive generations of upmixing technology.

The Factual Timeline

Year	Technology	Input	Output	Method
1976	Dolby Stereo (cinema)	2-ch optical print	4 channels (L, C, R, S)	Passive matrix decoding
1982	Dolby Surround (consumer)	2-ch stereo	3 channels (L, R, mono S)	Passive matrix decoding
1987	Dolby Pro Logic	2-ch stereo	4 channels (L, C, R, S)	Active matrix with steering logic
2000	Dolby Pro Logic II	2-ch stereo	5 full-range channels	Active matrix, 6-Axis steering
2005	Dolby Pro Logic IIx	2-ch or 5.1	6.1 / 7.1 channels	Extended matrix + back surround derivation
2009	Dolby Pro Logic IIz	2-ch to 7.1	9.1 (with front heights)	Matrix + height channel derivation
2014	Dolby Surround Upmixer (DSU)	2-ch to 9.1	Up to 7.1.4 (27 speakers)	Signal analysis + spatial rendering

Each generation is, by definition, an upmixer.

Dolby Pro Logic II was not developed by Dolby. It was created by Jim Fosgate at Harman International using 6-Axis active matrix processing. Dolby licensed the technology and rebranded it. Over 300 million devices shipped with it. (Audioholics, 2022)

The Anti-Competitive Episode

In the mid-2010s, Dolby mandated that "native Dolby Atmos content shall NOT be up-mixed [...] by any 3rd party competitor upmixer." Xperi (DTS's parent company) publicly characterized this as "anti-competitive, anti-consumer, and a blatant abuse of Dolby's industry position." The European Commission opened an inquiry. Dolby withdrew the restrictions. (Audioholics, 2023)

The implication is unambiguous: Dolby spent five decades building its business on upmixing, then attempted to use its market position to prevent competitors from doing the same.

4. Dolby Atmos Rendering Is Upmixing — By Definition

This is the central argument of this article.

4.1. The Architecture: Beds, Objects, and the Renderer

Dolby Atmos supports up to 128 audio tracks: a 9.1 bed (10 channels) plus up to 118 dynamic audio objects with positional metadata. The Dolby Atmos Renderer Guide (v3.0, 2018) describes the renderer as working "in tandem with the DAW to render mixes to any playback environment based on audio and positional metadata."

In cinema, the Dolby Atmos Cinema Processor (CP850/CP950A) renders up to 64 speaker feeds from the authored beds and objects. In the home, consumer receivers render to configurations ranging from 5.1.2 to 9.1.6. On headphones, binaural rendering uses HRTFs to simulate 3D.

In every case, the renderer's output is derived — calculated from authored content and metadata, not directly authored.

4.2. The Functional Equivalence

Authored Content	Playback System	Renderer Action
7.1.4 bed	5.1.2 system	Downmix — fewer speakers than authored channels
7.1.4 bed	9.1.6 system	Derives additional channel feeds — more speakers than authored
Object at position (x, y, z)	Any speaker array	Calculates per-speaker amplitude gains — output is always derived

In the second and third cases, the renderer generates loudspeaker feeds that were never authored. The Atmos renderer uses a variant of Vector Base Amplitude Panning (VBAP) — originally published by Ville Pulkki (JAES, vol. 45, no. 6, pp. 456–466, 1997; the #1 most-cited JAES paper on Scopus).
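To make "calculates per-speaker amplitude gains" concrete, here is a minimal sketch of two-dimensional pairwise VBAP as published by Pulkki. It is not Dolby's renderer, whose internals are not public. The function name and the angle convention (degrees, measured in the horizontal plane from straight ahead) are assumptions for illustration.

```python
import math

def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Gains for one loudspeaker pair so that the phantom source points at source_deg."""
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    p, l1, l2 = unit(source_deg), unit(spk1_deg), unit(spk2_deg)
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be identical or opposite")
    # Solve p = g1 * l1 + g2 * l2 for the two gains (Cramer's rule).
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    # Normalize so that g1^2 + g2^2 = 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# A source straight ahead between speakers at +30 and -30 degrees gets equal gains.
g_center = vbap_2d(0.0, 30.0, -30.0)

# A source exactly at the first speaker is fed to that speaker only.
g_at_speaker = vbap_2d(30.0, 30.0, -30.0)
```

A source outside the arc spanned by the pair yields a negative gain, which is how a full implementation knows to pick a different pair. In every case the speaker feeds are computed from the position, not authored.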

Apply the definition from Section 2: the renderer produces loudspeaker feeds that were never authored, using a derivation algorithm applied to fewer input signals. The fact that it uses metadata rather than signal analysis does not change the nature of the operation — it changes only the method.

You cannot make Dolby Atmos without upmixing, because the Atmos renderer is an upmixer.

4.3. The Metadata Argument

One might argue that Atmos is different because it carries explicit spatial metadata. But metadata is simply a different encoding of spatial information. A stereo recording also encodes spatial information — through inter-channel relationships established by panning laws.

Jens Blauert's Spatial Hearing (MIT Press, 1997) demonstrates that human localization relies on Interaural Time Difference (ITD), Interaural Level Difference (ILD), and spectral cues (HRTF). A stereo recording encodes ITD and ILD information through inter-channel relationships. The difference between Atmos metadata (explicit x, y, z) and stereo inter-channel relationships (encoded ILD/ITD) is one of encoding method, not of legitimacy.
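The claim that a panning law encodes position can be checked with a few lines of arithmetic. The sketch below assumes a constant-power sine/cosine pan law, which is one common choice among several. The function names and the −1 to +1 position scale are illustrative assumptions.

```python
import math

def constant_power_pan(position):
    """Sine/cosine pan law. position: -1 = hard left, 0 = center, +1 = hard right."""
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)   # (left gain, right gain)

def level_difference_db(g_left, g_right):
    """Inter-channel level difference in dB (undefined when either gain is zero)."""
    return 20.0 * math.log10(g_left / g_right)

def position_from_gains(g_left, g_right):
    """Invert the pan law: recover the pan position from the two channel gains."""
    return math.atan2(g_right, g_left) * 4.0 / math.pi - 1.0

# Encode a source halfway between center and hard left, then decode it again.
g_l, g_r = constant_power_pan(-0.5)
difference_db = level_difference_db(g_l, g_r)   # positive: left channel is louder
recovered = position_from_gains(g_l, g_r)       # matches the encoded position
```

Under this assumed pan law the position survives the round trip: it is stored in the level relationship between the two channels rather than in a metadata field.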

4.4. The Binaural Case

Dolby Atmos binaural rendering — delivered to hundreds of millions of headphone users via Apple Music, Tidal, and Amazon Music — convolves object-based audio with HRTFs to produce two output channels. This is the exact inverse of stereo upmixing. Both operations rely on the same psychoacoustic principles. Both produce output that was not directly authored. If one is legitimate, so is the other.

Research by Wenzel, Arruda, Kistler & Wightman (JASA, vol. 94, no. 1, pp. 111–123, 1993) confirms that non-individualized HRTFs introduce systematic localization errors. Atmos binaural rendering is an imperfect derivation process, just like any upmixer.

5. The Legitimate Criticisms — And Where They Apply

Having established that Atmos rendering is itself upmixing, we must be intellectually honest: the backlash is not entirely unfounded.

5.1. Artifact Generation in Legacy Algorithms

Early matrix-based upmixers (including Dolby Pro Logic) had well-documented limitations: a mono surround channel band-limited to 7 kHz, slow steering logic causing audible "pumping," and phase artifacts.

FFT-based upmixers introduce latency (21–85 ms at 48 kHz) and can generate pre-echo and musical noise (Berouti, Schwartz & Makhoul, ICASSP, 1979). Beyond artifacts, FFT-based approaches are CPU-intensive, which makes them poorly suited to live applications and to entry-level audio products with limited computational resources.
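The latency range is easy to reproduce. The sketch below assumes that the algorithmic latency equals one full analysis block and that the block sizes are 1024 to 4096 samples; both are assumptions chosen because they match the quoted figures, not values taken from any specific product.

```python
def block_latency_ms(block_size, sample_rate=48_000):
    """Time needed to fill one analysis block, in milliseconds."""
    return 1000.0 * block_size / sample_rate

# Assumed FFT block sizes; each doubling of the block doubles the latency.
latencies = {n: block_latency_ms(n) for n in (1024, 2048, 4096)}

for size, ms in latencies.items():
    print(f"{size} samples -> {ms:.1f} ms")
```

Under these assumptions a 1024-sample block accounts for the lower bound and a 4096-sample block for the upper bound. Overlap-add processing and look-ahead would add to these figures in a real implementation.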

These criticisms are valid — for algorithms that exhibit these behaviors. They do not constitute an argument against upmixing as a discipline.

5.2. Artistic Intent Violations at Scale

Warner Music Group mass-upmixed catalog stereo recordings and distributed them as "Dolby Atmos" mixes on Apple Music. The community documented approximately 220 tracks from legacy artists that were mechanical upmixes labeled as native Atmos content. Some tracks were subsequently reverted to stereo. (QuadraphonicQuad, 2022)

This directly violated Apple's guidelines: "A Dolby Atmos track must be created from multitracks or stems created from multitracks." Financial incentives compounded the problem: Apple Music offered royalty bonuses for Dolby Atmos content. (Audioholics)

This criticism is valid — when upmixing is applied mechanically and misrepresented as native immersive content. It is a criticism of deceptive practices, not of the upmixing process itself.

5.3. The "Fake Immersive" Argument

Some argue that no algorithm can create "real" immersive audio from stereo. This argument has a kernel of truth: stereo does not contain discrete height information or 3D positional metadata.

However, a stereo recording does contain substantial spatial information: panning positions (inter-channel level differences), image width and diffuse/direct balance (inter-channel correlation, ICC), and depth (direct-to-reverberant ratio). Inter-channel level difference and correlation are measurable parameters defined in ISO/IEC 23003-1 (MPEG Surround).

Bob Katz's patented K-Stereo and K-Surround processes demonstrate the principle: they "recover lost or amplify hidden ambience, space and imaging [...] without adding artificial reverberation."

This criticism is valid when applied to algorithms that fabricate spatial content. It is not valid when applied to algorithms that analyze and redistribute the spatial information already present in the stereo encoding.

5.4. The Lossy Delivery Argument

Morten Lindberg (2L, 35 Grammy nominations) has stated: "The lossy version of Atmos is to me a bleak shadow of the real, uncompressed source." (Stereophile, December 2023)

This is a legitimate concern — but it applies to the codec and delivery chain, not to the spatialization method. It is orthogonal to the upmixing debate.

6. Stereo in Dolby Atmos: The Unsolved Problem

6.1. Dolby's Recommended Workflows

When stereo content must be integrated into Atmos, the available Dolby-sanctioned workflows each fail:

Option A: Stereo Bed Placement — Route to L/R channels. No spatial analysis occurs. The stereo image is anchored to physical speaker angles.

Option B: Object Placement — Place L/R as objects. This fundamentally misrepresents stereo: phantom sources exist between channels through inter-channel correlation. Treating L and R as independent objects destroys the correlated sound field.

Option C: Source Separation — Use AI to extract stems and re-pan. This is no longer the original mix. Both Apple and Dolby prohibit this.

Option D: DSU — Apply Dolby's own upmixer. This is a general-purpose upmixer, not a precision stereo analyzer.

6.2. Comb Filtering: The Physics

When a correlated signal is emitted by multiple loudspeakers at different distances from the listener, frequency-dependent interference produces a comb filter. The first cancellation frequency is:

f_null = c / (2 × Δd)

where c is the speed of sound (~343 m/s) and Δd is the path length difference.
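As a worked example of this formula, the sketch below computes the first few cancellation frequencies for a given path difference. It assumes two equal-level, fully coherent arrivals in free field, so the later nulls fall at odd multiples of the first. The function name and the 10 cm example distance are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, as in the text

def comb_null_frequencies(path_difference_m, count=3, c=SPEED_OF_SOUND):
    """Frequencies (Hz) at which the delayed copy arrives half a cycle out of phase."""
    first_null = c / (2.0 * path_difference_m)
    # Nulls repeat at odd multiples: f_k = (2k + 1) * c / (2 * delta_d)
    return [(2 * k + 1) * first_null for k in range(count)]

# A 10 cm path difference places the first null near 1.7 kHz.
nulls = comb_null_frequencies(0.10)
```

A path difference of only 10 cm puts the first null in the middle of the speech and vocal range, and halving the path difference doubles every null frequency.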

Earl Vickers (AES Paper 7916, 2009) demonstrated that phantom center images exhibit comb-filter cancellations with a measured dip of approximately −5 dB at 1.8 kHz. Zotter & Frank (Ambisonics, Springer, 2019) note that in multi-loudspeaker playback of correlated signals, "the outmost loudspeakers are strongly reduced in level (typically around −12 dB) in order to avoid annoying phasing effects."

The live sound industry has recognized this problem independently. L-Acoustics introduced the Stereo Mapper feature in L-ISA 3.0, which "maps existing stereo content to an immersive speaker configuration without changing the original artist's mix," conserving "a similar power distribution as traditional left/right array configurations to retain the original stereo image and overall mix" (L-Acoustics, 2025). This is a practical acknowledgment that stereo cannot simply be fed to a spatialization engine as two mono objects: Stereo Mapper is, in essence, an upmixing solution within a spatialization framework.

None of Dolby's workflows decode the spatial information that stereo encoding contains. They either bypass it or damage it.

7. The Answer: A Dedicated Stereo Upmixer

If Atmos cannot avoid upmixing, and if its native tools handle stereo poorly, the solution is not to reject upmixing — it is to upmix better.

A properly designed stereo upmixer operates on a fundamentally different premise: stereo is a spatial encoding that must be decoded before it can be rendered to multiple loudspeakers.

The processing chain:

Inter-channel correlation analysis: Continuously examines L/R relationships to identify spatial distribution. High-coherence components (panned sources) are distinguished from low-coherence components (diffuse ambience).
Sound field reconstruction: Reconstructs the continuous energy distribution that the stereo encoding represents. Each phantom source position is identified by its ILD and correlation signature.
Multi-speaker distribution: Renders across available speakers while maintaining energy balance, spatial coherence, and timbral neutrality by minimizing comb filtering. Compared with feeding a stereo signal directly to multiple speakers, a stereo-aware upmixer significantly reduces comb filtering, and an additional algorithm such as ICS (Interference Cancellation System) can remove the residual comb filtering that may still occur.
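The first stage of this chain can be sketched in a few lines. The example below measures the inter-channel correlation and level difference of a single broadband block and labels it as primary or ambient. It is a deliberately simplified illustration, not the algorithm of any product named in this article: practical upmixers work per frequency band with time smoothing, and the 0.7 threshold is an arbitrary assumption.

```python
import math

def interchannel_correlation(left, right):
    """Normalized zero-lag cross-correlation of one block, between -1 and 1."""
    cross = sum(l * r for l, r in zip(left, right))
    norm = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / norm if norm > 0.0 else 0.0

def interchannel_level_difference_db(left, right):
    """Left-to-right energy ratio in dB, or None if either channel is silent."""
    e_left = sum(l * l for l in left)
    e_right = sum(r * r for r in right)
    if e_left == 0.0 or e_right == 0.0:
        return None
    return 10.0 * math.log10(e_left / e_right)

def classify_block(left, right, threshold=0.7):
    """Label a block 'primary' (coherent) or 'ambient' (diffuse); threshold is assumed."""
    icc = abs(interchannel_correlation(left, right))
    return "primary" if icc >= threshold else "ambient"

# A panned source: the right channel is a scaled copy of the left.
src = [1.0, -0.5, 0.25, 0.8]
panned_left, panned_right = src, [0.5 * s for s in src]

# Two channels with no common content stand in for diffuse ambience.
diffuse_left, diffuse_right = [1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]
```

The panned block is fully coherent and its level difference identifies where it sits between the channels; the diffuse block shows no correlation and would be distributed as ambience.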

The Combined Approach: Upmixing + Spatialization

The most powerful configuration uses a dedicated stereo upmixer as a preprocessing stage before a spatialization engine:

Stereo (2 ch) → Upmixer → N spatial components → Spat / L-ISA / Soundscape → Speakers

In this workflow:

The upmixer decodes the stereo field into N spatially coherent components.
Each component is fed to the spatialization engine as an independent object — but unlike raw L/R, each object carries spatially meaningful content with coherent positioning.
The spatialization engine applies its rendering algorithm (VBAP, Ambisonics, WFS) to objects that are already spatially decomposed, not arbitrarily split stereo channels.

The result: the fidelity of stereo-aware upmixing combined with the flexibility of object-based spatialization. The sound designer retains full control over spatial positioning while the stereo field's encoded spatial information is preserved rather than destroyed. In a live touring context, the front-of-house engineer can use multiple buses to change the spatial reproduction of any stem in the master signal, maintaining standard stereo workflow compatibility while delivering immersive output.

This combined approach was presented and demonstrated at the IRCAM Forum Workshops (March 2026), comparing four configurations on a 5.0 speaker system: stereo only, stereo through Spat (2 objects), HSR alone, and HSR + Spat — demonstrating that upmixing and spatialization are not alternatives but complementary stages in the spatial audio chain.

Technical Comparison

Criterion	Atmos Bed Placement	Atmos Object Placement	DSU (Dolby Upmixer)	Dedicated Stereo Upmixer
Stereo signal analysis	None	None	General-purpose	Continuous ICC/ILD/IPD analysis
Sound field reconstruction	No	No	Partial	Yes — full spatial distribution
Comb filter risk	Low (2 speakers)	High	Medium	Low (decorrelated distribution)
Phantom source preservation	Angle-dependent	Destroyed	Approximate	Mapped to speaker array
Latency	Codec-dependent	Codec + renderer	Algorithm-dependent	5 samples (time-domain)

8. Conclusion: You Can't — So Do It Right

Why can't you make Dolby Atmos without upmixing?

Because the Atmos renderer derives loudspeaker feeds that were never authored. Because every object rendered through VBAP produces output signals that exist nowhere in the authored content. Because Dolby built five decades of technology on exactly this principle. Because Ambisonics decoding, WFS rendering, and binaural synthesis all perform the same fundamental operation.

Upmixing is not a flaw in the Atmos ecosystem; it is the mechanism that makes all spatial audio reproduction work.

The industry's rejection of upmixing is therefore not a technical position — it is a branding position. The legitimate criticisms — artifact generation, the Warner Music Group scandal, careless mass-processing — are criticisms of bad upmixing, not of upmixing as a discipline. The productive question was never whether to upmix. It is: given that upmixing is unavoidable, who does it best?

The answer is to decode the spatial information that stereo inherently contains and render it across the available loudspeaker array with precision, coherence, and respect for the original mix. Not as a compromise, not as a workaround — but as the technically correct solution to a problem that Dolby Atmos, by its own architecture, cannot avoid.

References

Patents and Standards

Blumlein, A.D. — UK Patent 394,325 (1931/1933). 70 claims covering stereo theory, matrix processing, and disc cutting.
EP0630168A1 — "Improved Dolby Prologic decoder."
US Patent 7,003,119 — "Matrix surround decoder/virtualizer."
ISO/IEC 23003-1 — MPEG Surround (ICC, IID, IPD).
ISO/IEC 23008-3 — MPEG-H 3D Audio.
ITU-R BS.775-4 (2022) — Multichannel stereophonic sound system.
ITU-R BS.1534-3 — MUSHRA.
ITU-R BS.2051-3 (2022) — Advanced sound system for programme production.

Peer-Reviewed Publications

Avendano, C. & Jot, J.-M. — JAES, vol. 52, 2004.
Berkhout, A.J. — JAES, 1988.
Berouti, M., Schwartz, R. & Makhoul, J. — ICASSP, 1979.
Blauert, J. — Spatial Hearing, MIT Press, 1997.
Faller, C. & Breebaart, J. — AES 131st Convention, 2011.
Gerzon, M.A. — JAES, vol. 21, no. 1, 1973.
Goodwin, M. & Jot, J.-M. — ICASSP, 2007.
Goodwin, M. & Jot, J.-M. — AES 125th Convention, 2008.
Kraft, S. & Zölzer, U. — DAFx-16, 2016.
Pulkki, V. — JAES, vol. 45, no. 6, 1997.
Rumsey, F. — Spatial Audio, Focal Press, 2001.
Vickers, E. — AES Paper 7916, 2009.
Walther, A. & Faller, C. — IEEE WASPAA, 2011.
Wenzel, E.M. et al. — JASA, vol. 94, no. 1, 1993.
Zotter, F. & Frank, M. — Ambisonics, Springer, 2019.

Dolby Documentation

Industry Sources

Audioholics — History of Surround Sound Processing: Pro Logic II
Audioholics — Dolby Withdraws Upmixing Restrictions
High-Def Digest — DSU vs. Neural:X vs. Auro-3D
Stereophile — Dolby Atmos: A Bleak Shadow?, December 2023.
QuadraphonicQuad — Dolby Atmos Upmixing on Streaming Services
Apple Music Provider Support — New Video and Audio Asset Guide, v5.3.13
IRCAM Forum — On the use of HSR as an upmix solution for stereo reproduction on multi-speaker systems, February 2026.
L-Acoustics — L-ISA 3.0 Stereo Mapper
IEEE ETHW — Milestones: Invention of Stereo, 1931
EMI Archive Trust — Alan Blumlein and Stereo
