Why Can't You Make Dolby Atmos Without Upmixing?
The audio industry has developed a peculiar contradiction: Dolby Atmos is celebrated as the future of immersive audio, while upmixing — the very process that makes Atmos rendering possible — is dismissed as illegitimate. This article asks a simple question: can you actually make Dolby Atmos without upmixing?
The answer, examined through five decades of Dolby's own technology and the published physics of spatial audio rendering, is no.
1. Spatialization: A Century of Encoding Space in Audio
Before addressing upmixing, it is essential to understand the broader history of spatial audio — because every advance in spatialization has faced the same fundamental challenge: how to encode a three-dimensional sound field into a finite number of channels, and how to decode it faithfully on the other end.
1.1. From Mono to Stereo: The Birth of Spatial Encoding
The history of spatial audio begins with a single insight: that two channels can encode a continuous sound field.
In 1881, Clément Ader demonstrated the Théâtrophone at the Paris Opera: a system of telephone lines transmitting binaural audio from microphones placed on either side of the stage. Listeners holding two receivers to their ears perceived spatial depth and width. It was the first documented spatial audio experience.
Fifty years later, Alan Blumlein formalized the principles. His 1931 patent (UK 394,325, filed 14 December 1931, accepted 14 June 1933) contained 70 claims covering stereo recording, reproduction, and the mathematical basis for phantom source imaging. Blumlein demonstrated that inter-channel amplitude and phase relationships could encode a continuous spatial field — that the listener perceives sounds between loudspeakers, not at them. This was not a subjective impression but a psychoacoustic phenomenon rooted in Interaural Level Differences (ILD) and Interaural Time Differences (ITD). (IEEE Milestone, 2017; EMI Archive Trust)
Stereo became the dominant consumer format by the late 1960s. It remains so today: an estimated 97–99% of the world's recorded music catalog exists in stereo or mono.
1.2. Surround Sound: Expanding the Field
The expansion from stereo to surround followed two parallel paths:
Cinema surround emerged with Dolby Stereo (1976), which encoded four channels (L, C, R, S) into a two-channel optical print via matrix encoding. Consumer versions followed with Dolby Surround (1982) and Dolby Pro Logic (1987). Each of these was, by definition, an upmixer; we will return to this in detail in Section 3.
Discrete surround arrived with Dolby Digital (AC-3) in 1992, delivering 5.1 channels (L, C, R, LS, RS, LFE) as independent streams. The ITU standardized the 3/2 multichannel layout in ITU-R BS.775 (1994, revised through BS.775-4 in 2022): three front channels at 0° and ±30°, two surround channels at ±110°, and an optional Low-Frequency Effects channel limited to 20–120 Hz at +10 dB offset.
1.3. Three Paradigms of Spatial Audio
Modern spatial audio systems are classified into three paradigms:
Channel-Based Audio (CBA) assigns audio to specific loudspeakers in a predefined layout (2.0, 5.1, 7.1). Its limitation is inflexibility — content authored for one layout does not adapt to another without processing (i.e., upmixing or downmixing).
Scene-Based Audio (SBA / Ambisonics) represents the sound field as a set of spherical harmonic coefficients, independent of any specific loudspeaker layout. Michael Gerzon formalized this approach in the 1970s (JAES, vol. 21, no. 1, 1973). The decoding step — transforming Ambisonic coefficients into loudspeaker feeds — is, functionally, an upmixing operation: derived signals for a specific speaker configuration.
Object-Based Audio (OBA) stores audio elements with positional metadata (x, y, z coordinates, size, velocity). A renderer calculates per-speaker gains in real time. Dolby Atmos, MPEG-H Audio (ISO/IEC 23008-3), and DTS:X are all object-based systems. The rendering step — where metadata becomes loudspeaker feeds — is the focus of Section 4.
A fourth approach, Wave Field Synthesis (WFS), proposed by Berkhout in 1988, uses dense arrays of loudspeakers to physically reconstruct sound wavefronts based on the Huygens-Fresnel principle. Unlike all other methods, WFS creates real — not phantom — source positions.
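To make the Scene-Based case concrete, the following is a minimal sketch of a horizontal first-order encode followed by a basic sampling decode. The function names, the four-speaker ring, and the normalization (W carried at unity gain) are illustrative assumptions; real Ambisonic toolchains use specific conventions and more refined decoders. The point is only that three transmitted components become four loudspeaker feeds that were never authored individually.

```python
import math

def encode_first_order_2d(sample, azimuth_deg):
    """Encode one sample into horizontal first-order components (W, X, Y)."""
    a = math.radians(azimuth_deg)
    return sample, sample * math.cos(a), sample * math.sin(a)

def decode_sampling_2d(w, x, y, speaker_azimuths_deg):
    """Basic sampling decoder for a regular horizontal ring of loudspeakers."""
    n = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        a = math.radians(az)
        # Evaluate the first-order sound field in the direction of this speaker.
        feeds.append((w + 2.0 * (x * math.cos(a) + y * math.sin(a))) / n)
    return feeds

# Hypothetical layout: four speakers at 0, 90, 180 and 270 degrees.
ring = [0, 90, 180, 270]
feeds = decode_sampling_2d(*encode_first_order_2d(1.0, 0.0), ring)
```

With a source encoded at 0°, the speaker at 0° receives the largest feed and the other three receive smaller derived signals, which is the behavior the decoding step is expected to show.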
1.4. The Common Thread
Every paradigm beyond mono shares the same fundamental architecture: encoding spatial information into a finite representation, then decoding it for a specific playback system. The decoding step always involves deriving loudspeaker feeds that did not exist in the authored content. This is the context in which upmixing must be understood.
2. What Is Upmixing? Definitions and Scope
2.1. The General Definition
Upmixing is the process of rendering an audio source of n channels to a playback system of n + m loudspeakers (where m > 0), such that the additional loudspeaker feeds are derived — not authored — by a rendering algorithm. The derivation may rely on:
- Matrix decoding: extracting spatially encoded information from channel relationships (e.g., L+R for center, L−R for surround).
- Signal analysis: examining inter-channel correlation, spectral content, and transient behavior to infer spatial distribution.
- Metadata interpretation: using authored position data to calculate per-speaker amplitude coefficients (e.g., VBAP rendering).
- Spherical harmonic decoding: transforming Ambisonic coefficients into loudspeaker feeds via decoder matrices.
All four methods produce output channels that did not exist in the original authored content. The distinction between them is one of method, not of nature.
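The sum/difference idea behind matrix decoding can be sketched in a few lines. This is a simplified illustration only: the function name and the −3 dB scaling factor are assumptions, and the filtering, delay, and steering logic of production decoders are omitted.

```python
import math

def passive_matrix_decode(left, right):
    """Derive L, C, R, S feeds from a two-channel signal by sum and difference."""
    k = 1.0 / math.sqrt(2.0)  # -3 dB scaling, an illustrative choice
    center = [k * (l + r) for l, r in zip(left, right)]
    surround = [k * (l - r) for l, r in zip(left, right)]
    return {"L": list(left), "C": center, "R": list(right), "S": surround}

# A centre-panned source is identical in both channels:
# it appears in the derived C feed and cancels in the derived S feed.
feeds = passive_matrix_decode([0.5, -0.5], [0.5, -0.5])
```

Two input channels yield four output feeds, two of which (C and S) exist only as a result of the derivation.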
2.2. What Upmixing Is Not
Several processes are sometimes confused with upmixing but are technically distinct:
- Downmixing reduces n channels to fewer channels. This is a lossy reduction, not a derivation of new spatial content.
- Remixing creates a new mix from original multitrack stems — an artistic reinterpretation, not an algorithmic derivation.
- Source separation + re-panning uses AI-based stem splitting to extract elements from a mix, then re-positions them. Apple's guidelines explicitly classify this as distinct and prohibit it: "Extracting stems ('de-mixing') from a stereo release is not allowed." (Apple Music Provider Support, v5.3.13)
- Virtualization simulates a multichannel experience over fewer speakers using HRTFs. This is technically a downmix with spatial processing.
2.3. The Academic Literature
The peer-reviewed literature treats upmixing as a legitimate signal processing discipline. The dominant paradigm is primary-ambient decomposition: separating the input into a "primary" (direct, localizable) component and an "ambient" (diffuse, enveloping) component, then distributing each appropriately.
Key contributions include:
- Avendano & Jot (2004): "A Frequency-Domain Approach to Multichannel Upmix," JAES, vol. 52, no. 7/8, pp. 740–749.
- Goodwin & Jot (2007): "Primary-Ambient Signal Decomposition and Vector-Based Localization," ICASSP.
- Goodwin & Jot (2008): "Spatial Audio Scene Coding," AES 125th Convention. A format-agnostic parameterization of audio scenes enabling optimal reproduction over any playback system.
- Faller & Breebaart (2011): "Binaural Reproduction of Stereo Signals Using Upmixing and Diffuse Rendering," AES 131st Convention.
- Walther & Faller (2011): "Direct-Ambient Decomposition and Upmix of Surround Signals," IEEE WASPAA.
- Kraft & Zölzer (2016): "Time-Domain Implementation of a Stereo to Surround Sound Upmix Algorithm," DAFx-16. Replaces the STFT with an IIR filter bank, achieving comparable quality at a fraction of the computational cost.
The academic community does not debate whether upmixing is legitimate. It debates which algorithms perform best.
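As a rough illustration of primary-ambient decomposition, the sketch below weights one frame of a stereo signal by its inter-channel correlation. It is a deliberately crude, broadband simplification: the published methods cited above operate per time-frequency tile with more careful estimators, and the function names and the square-root weighting are assumptions made for this example.

```python
import math

def correlation(left, right):
    """Normalized inter-channel correlation of one frame, in [-1, 1]."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy > 0 else 0.0

def split_primary_ambient(left, right):
    """Split a frame into primary and ambient parts with energy-preserving weights."""
    # Negative correlation is treated as fully diffuse; clamp to [0, 1].
    rho = min(1.0, max(0.0, correlation(left, right)))
    wp, wa = math.sqrt(rho), math.sqrt(1.0 - rho)  # wp**2 + wa**2 == 1
    primary = ([wp * l for l in left], [wp * r for r in right])
    ambient = ([wa * l for l in left], [wa * r for r in right])
    return primary, ambient

# Fully correlated frame (a centre-panned source): everything is "primary".
primary, ambient = split_primary_ambient([1.0, -1.0, 0.5], [1.0, -1.0, 0.5])
```

An upmixer built on this idea would then route the primary part to the front loudspeakers and the ambient part to the surround and height loudspeakers; that routing is not shown here.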
3. Dolby: Five Decades of Upmixing
If upmixing is illegitimate, then Dolby has a problem — because the company defined the category, commercialized it globally, and built its dominant market position on successive generations of upmixing technology.
The Factual Timeline
| Year | Technology | Input | Output | Method |
|---|---|---|---|---|
| 1976 | Dolby Stereo (cinema) | 2-ch optical print | 4 channels (L, C, R, S) | Passive matrix decoding |
| 1982 | Dolby Surround (consumer) | 2-ch stereo | 3 channels (L, R, mono S) | Passive matrix decoding |
| 1987 | Dolby Pro Logic | 2-ch stereo | 4 channels (L, C, R, S) | Active matrix with steering logic |
| 2000 | Dolby Pro Logic II | 2-ch stereo | 5 full-range channels | Active matrix, 6-Axis steering |
| 2005 | Dolby Pro Logic IIx | 2-ch or 5.1 | 6.1 / 7.1 channels | Extended matrix + back surround derivation |
| 2009 | Dolby Pro Logic IIz | 2-ch to 7.1 | 9.1 (with front heights) | Matrix + height channel derivation |
| 2014 | Dolby Surround Upmixer (DSU) | 2-ch to 9.1 | Up to 7.1.4 (27 speakers) | Signal analysis + spatial rendering |
Each generation is, by definition, an upmixer.
Dolby Pro Logic II was not developed by Dolby. It was created by Jim Fosgate at Harman International using 6-Axis active matrix processing. Dolby licensed the technology and rebranded it. Over 300 million devices shipped with it. (Audioholics, 2022)
The Anti-Competitive Episode
In the mid-2010s, Dolby mandated that "native Dolby Atmos content shall NOT be up-mixed [...] by any 3rd party competitor upmixer." Xperi (DTS's parent company) publicly characterized this as "anti-competitive, anti-consumer, and a blatant abuse of Dolby's industry position." The European Commission opened an inquiry. Dolby withdrew the restrictions. (Audioholics, 2023)
The implication is unambiguous: Dolby spent five decades building its business on upmixing, then attempted to use its market position to prevent competitors from doing the same.
4. Dolby Atmos Rendering Is Upmixing — By Definition
This is the central argument of this article.
4.1. The Architecture: Beds, Objects, and the Renderer
Dolby Atmos supports up to 128 audio tracks: a 9.1 bed (10 channels) plus up to 118 dynamic audio objects with positional metadata. The Dolby Atmos Renderer Guide (v3.0, 2018) describes the renderer as working "in tandem with the DAW to render mixes to any playback environment based on audio and positional metadata."
In cinema, the Dolby Atmos Cinema Processor (CP850/CP950A) renders up to 64 speaker feeds from the authored beds and objects. In the home, consumer receivers render to configurations ranging from 5.1.2 to 9.1.6. On headphones, binaural rendering uses HRTFs to simulate a three-dimensional sound field over two channels.
In every case, the renderer's output is derived — calculated from authored content and metadata, not directly authored.
4.2. The Functional Equivalence
| Authored Content | Playback System | Renderer Action |
|---|---|---|
| 7.1.2 bed | 5.1.2 system | Downmix: fewer speakers than authored channels |
| 7.1.2 bed | 9.1.6 system | Derives additional channel feeds: more speakers than authored channels |
| Object at position (x, y, z) | Any speaker array | Calculates per-speaker amplitude gains — output is always derived |
In the second and third cases, the renderer generates loudspeaker feeds that were never authored. The Atmos renderer uses a variant of Vector Base Amplitude Panning (VBAP) — originally published by Ville Pulkki (JAES, vol. 45, no. 6, pp. 456–466, 1997; the #1 most-cited JAES paper on Scopus).
Apply the definition from Section 2: the renderer produces loudspeaker feeds that were never authored, using a derivation algorithm applied to fewer input signals. The fact that it uses metadata rather than signal analysis does not change the nature of the operation — it changes only the method.
You cannot make Dolby Atmos without upmixing, because the Atmos renderer is an upmixer.
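The per-speaker gain calculation can be illustrated with the textbook two-dimensional, pairwise form of VBAP from Pulkki's paper. This is a sketch of the published technique, not of Dolby's renderer, whose internal variant is not documented here; the function name and the ±30° example layout are assumptions.

```python
import math

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Gains for one loudspeaker pair in 2-D VBAP (all angles in degrees)."""
    p = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    l1 = (math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg)))
    l2 = (math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg)))
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions are collinear")
    # Solve p = g1*l1 + g2*l2 for (g1, g2) by Cramer's rule.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    # Normalize so that g1**2 + g2**2 == 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# A source straight ahead, between speakers at +30 and -30 degrees.
center_gains = vbap_pair_gains(0.0, 30.0, -30.0)
# A source located exactly at the first speaker.
edge_gains = vbap_pair_gains(30.0, 30.0, -30.0)
```

One authored signal plus a position yields two loudspeaker feeds whose levels are computed, not authored. A negative gain would indicate that the source lies outside the arc spanned by the pair, in which case a full implementation would select a different pair.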
4.3. The Metadata Argument
One might argue that Atmos is different because it carries explicit spatial metadata. But metadata is simply a different encoding of spatial information. A stereo recording also encodes spatial information — through inter-channel relationships established by panning laws.
Jens Blauert's Spatial Hearing (MIT Press, 1997) demonstrates that human localization relies on Interaural Time Difference (ITD), Interaural Level Difference (ILD), and spectral cues (HRTF). A stereo recording encodes ITD and ILD information through inter-channel relationships. The difference between Atmos metadata (explicit x, y, z) and stereo inter-channel relationships (encoded ILD/ITD) is one of encoding method, not of legitimacy.
4.4. The Binaural Case
Dolby Atmos binaural rendering — delivered to hundreds of millions of headphone users via Apple Music, Tidal, and Amazon Music — convolves object-based audio with HRTFs to produce two output channels. This is the exact inverse of stereo upmixing. Both operations rely on the same psychoacoustic principles. Both produce output that was not directly authored. If one is legitimate, so is the other.
Research by Wenzel, Arruda, Kistler & Wightman (JASA, vol. 94, no. 1, pp. 111–123, 1993) confirms that non-individualized HRTFs introduce systematic localization errors. Atmos binaural rendering is an imperfect derivation process, just like any upmixer.
5. The Legitimate Criticisms — And Where They Apply
Having established that Atmos rendering is itself upmixing, we must be intellectually honest: the backlash is not entirely unfounded.
5.1. Artifact Generation in Legacy Algorithms
Early matrix-based upmixers (including Dolby Pro Logic) had well-documented limitations: mono surround channels bandwidth-limited to 7 kHz, slow steering logic causing audible "pumping," phase artifacts.
FFT-based upmixers introduce latency (21–85 ms at 48 kHz) and can generate pre-echo and musical noise (Berouti, Schwartz & Makhoul, ICASSP, 1979). Beyond artifacts, FFT-based approaches are CPU-intensive, which makes them unsuitable for live applications and for entry-level audio products where computational resources are limited.
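The latency range quoted above is consistent with the time needed to fill one analysis frame. The frame sizes used below (1024 to 4096 samples) are an assumption inferred from the quoted figures, not values stated by any cited source, and practical overlap-add implementations may add further delay.

```python
def frame_latency_ms(frame_samples, sample_rate_hz=48_000):
    """Time needed to fill one analysis frame, in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz

# Assumed frame sizes; each line shows the buffering delay at 48 kHz.
for n in (1024, 2048, 4096):
    print(f"{n:>5} samples -> {frame_latency_ms(n):.1f} ms")
```

At 48 kHz this gives roughly 21.3 ms for 1024 samples and 85.3 ms for 4096 samples, which brackets the 21–85 ms range.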
These criticisms are valid — for algorithms that exhibit these behaviors. They do not constitute an argument against upmixing as a discipline.
5.2. Artistic Intent Violations at Scale
Warner Music Group mass-upmixed catalog stereo recordings and distributed them as "Dolby Atmos" mixes on Apple Music. The community documented approximately 220 tracks from legacy artists that were mechanical upmixes labeled as native Atmos content. Some tracks were subsequently reverted to stereo. (QuadraphonicQuad, 2022)
This directly violated Apple's guidelines: "A Dolby Atmos track must be created from multitracks or stems created from multitracks." Financial incentives compounded the problem: Apple Music offered royalty bonuses for Dolby Atmos content. (Audioholics)
This criticism is valid — when upmixing is applied mechanically and misrepresented as native immersive content. It is a criticism of deceptive practices, not of the upmixing process itself.
5.3. The "Fake Immersive" Argument
Some argue that no algorithm can create "real" immersive audio from stereo. This argument has a kernel of truth: stereo does not contain discrete height information or 3D positional metadata.
However, a stereo recording does contain substantial spatial information: panning positions (level differences, ILD), image width and diffuse/direct balance (inter-channel correlation, ICC), and depth (direct-to-reverberant ratio). These are measurable parameters defined in ISO/IEC 23003-1.
Bob Katz's patented K-Stereo and K-Surround processes demonstrate the principle: they "recover lost or amplify hidden ambience, space and imaging [...] without adding artificial reverberation."
This criticism is valid when applied to algorithms that fabricate spatial content. It is not valid when applied to algorithms that analyze and redistribute the spatial information already present in the stereo encoding.
5.4. The Lossy Delivery Argument
Morten Lindberg (2L, 35 Grammy nominations) has stated: "The lossy version of Atmos is to me a bleak shadow of the real, uncompressed source." (Stereophile, December 2023)
This is a legitimate concern — but it applies to the codec and delivery chain, not to the spatialization method. It is orthogonal to the upmixing debate.
6. Stereo in Dolby Atmos: The Unsolved Problem
6.1. Dolby's Recommended Workflows
When stereo content must be integrated into Atmos, the available Dolby-sanctioned workflows each fail:
Option A: Stereo Bed Placement — Route to L/R channels. No spatial analysis occurs. The stereo image is anchored to physical speaker angles.
Option B: Object Placement — Place L/R as objects. This fundamentally misrepresents stereo: phantom sources exist between channels through inter-channel correlation. Treating L and R as independent objects destroys the correlated sound field.
Option C: Source Separation — Use AI to extract stems and re-pan. This is no longer the original mix. Both Apple and Dolby prohibit this.
Option D: DSU — Apply Dolby's own upmixer. This is a general-purpose upmixer, not a precision stereo analyzer.
6.2. Comb Filtering: The Physics
When correlated signal is emitted by multiple loudspeakers at different distances from the listener, frequency-dependent interference produces a comb filter. The first cancellation frequency is:
f_null = c / (2 × Δd)
where c is the speed of sound (~343 m/s) and Δd is the path length difference.
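As a worked example of the formula, the sketch below computes the first few cancellation frequencies for a given path difference, along with the level of the summed signal at any frequency. It assumes two arrivals of equal level and an idealized free-field sum; with unequal levels or room reflections the nulls are shallower. The function names and the 0.1 m example are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text

def comb_nulls(path_difference_m, count=3):
    """First `count` cancellation frequencies (Hz) for two equal-level arrivals."""
    f_null = SPEED_OF_SOUND / (2.0 * path_difference_m)
    # Further nulls fall at odd multiples of the first one.
    return [f_null * (2 * k + 1) for k in range(count)]

def summed_level_db(freq_hz, path_difference_m):
    """Level of the two-arrival sum relative to a single arrival, in dB."""
    delay = path_difference_m / SPEED_OF_SOUND
    # |1 + exp(-j*2*pi*f*delay)| = 2*|cos(pi*f*delay)|
    magnitude = abs(2.0 * math.cos(math.pi * freq_hz * delay))
    return 20.0 * math.log10(max(magnitude, 1e-12))  # floor avoids log(0)

nulls = comb_nulls(0.1)  # 10 cm path difference
```

For a 10 cm path difference the first null falls at about 1715 Hz, with further nulls near 5145 Hz and 8575 Hz, while low frequencies sum to about +6 dB. A path difference of a few centimetres is therefore enough to place cancellations in the middle of the audible band.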
Earl Vickers (AES Paper 7916, 2009) demonstrated that phantom center images exhibit comb-filter cancellations with a measured dip of approximately −5 dB at 1.8 kHz. Zotter & Frank (Ambisonics, Springer, 2019) note that in multi-loudspeaker playback of correlated signals, "the outmost loudspeakers are strongly reduced in level (typically around −12 dB) in order to avoid annoying phasing effects."
The live sound industry has recognized this problem independently. L-Acoustics introduced the Stereo Mapper feature in L-ISA 3.0, which "maps existing stereo content to an immersive speaker configuration without changing the original artist's mix," conserving "a similar power distribution as traditional left/right array configurations to retain the original stereo image and overall mix" (L-Acoustics, 2025). This is a practical acknowledgment that stereo cannot simply be fed to a spatialization engine as two mono objects; Stereo Mapper is, in essence, an upmixing solution within a spatialization framework.
None of Dolby's workflows decode the spatial information that stereo encoding contains. They either bypass it or damage it.
7. The Answer: A Dedicated Stereo Upmixer
If Atmos cannot avoid upmixing, and if its native tools handle stereo poorly, the solution is not to reject upmixing — it is to upmix better.
A properly designed stereo upmixer operates on a fundamentally different premise: stereo is a spatial encoding that must be decoded before it can be rendered to multiple loudspeakers.
The processing chain:
- Inter-channel correlation analysis: Continuously examines L/R relationships to identify spatial distribution. High-coherence components (panned sources) are distinguished from low-coherence components (diffuse ambience).
- Sound field reconstruction: Reconstructs the continuous energy distribution that the stereo encoding represents. Each phantom source position is identified by its ILD and correlation signature.
- Multi-speaker distribution: Renders across available speakers while maintaining energy balance, spatial coherence, and timbral neutrality (no comb filtering). An additional algorithm, like ICS (Interference Cancellation System), can remove residual comb-filtering that may still occur. Compared to feeding a stereo signal directly to multiple speakers, using a stereo-aware upmixer significantly reduces the comb-filtering effect.
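The idea that a phantom source position can be read back from its level signature can be sketched as follows. The example assumes a single dominant source panned with a sine/cosine (constant-power) law, which is an assumption of this illustration and not a description of any particular product; a real analyzer would work per frequency band and combine the level estimate with the correlation measure.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_pan(left, right):
    """Estimate pan position in [-1, 1]: -1 is hard left, +1 is hard right."""
    l_rms, r_rms = rms(left), rms(right)
    if l_rms == 0.0 and r_rms == 0.0:
        return 0.0  # silence: no position to report, default to centre
    # Under a sine/cosine pan law, L = cos(theta)*s and R = sin(theta)*s,
    # so theta in [0, pi/2] follows from the ratio of the channel levels.
    theta = math.atan2(r_rms, l_rms)
    return theta / (math.pi / 4.0) - 1.0

# Pan a test signal two thirds of the way to the right, then recover the position.
signal = [1.0, -0.5, 0.25, -1.0]
theta = math.pi / 3.0
left = [math.cos(theta) * s for s in signal]
right = [math.sin(theta) * s for s in signal]
position = estimate_pan(left, right)
```

Under these assumptions the recovered position is about +0.33, matching the pan angle that was applied. The estimate can then be used to place the component in the loudspeaker array, which is the step the list above calls multi-speaker distribution.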
The Combined Approach: Upmixing + Spatialization
The most powerful configuration uses a dedicated stereo upmixer as a preprocessing stage before a spatialization engine:
Stereo (2 ch) → Upmixer → N spatial components → Spat / L-ISA / Soundscape → Speakers
In this workflow:
- The upmixer decodes the stereo field into N spatially coherent components.
- Each component is fed to the spatialization engine as an independent object — but unlike raw L/R, each object carries spatially meaningful content with coherent positioning.
- The spatialization engine applies its rendering algorithm (VBAP, Ambisonics, WFS) to objects that are already spatially decomposed, not arbitrarily split stereo channels.
The result: the fidelity of stereo-aware upmixing combined with the flexibility of object-based spatialization. The sound designer retains full control over spatial positioning while the stereo field's encoded spatial information is preserved rather than destroyed. In a live touring context, by using multiple buses, the front-of-house engineer can adjust the spatial reproduction of any stem included in the master signal, which maintains compatibility with the standard stereo workflow while delivering immersive output.
This combined approach was presented and demonstrated at the IRCAM Forum Workshops (March 2026), comparing four configurations on a 5.0 speaker system: stereo only, stereo through Spat (2 objects), HSR alone, and HSR + Spat — demonstrating that upmixing and spatialization are not alternatives but complementary stages in the spatial audio chain.
Technical Comparison
| Criterion | Atmos Bed Placement | Atmos Object Placement | DSU (Dolby Upmixer) | Dedicated Stereo Upmixer |
|---|---|---|---|---|
| Stereo signal analysis | None | None | General-purpose | Continuous ICC/ILD/IPD analysis |
| Sound field reconstruction | No | No | Partial | Yes — full spatial distribution |
| Comb filter risk | Low (2 speakers) | High | Medium | Low (decorrelated distribution) |
| Phantom source preservation | Angle-dependent | Destroyed | Approximate | Mapped to speaker array |
| Latency | Codec-dependent | Codec + renderer | Algorithm-dependent | 5 samples (time-domain) |
8. Conclusion: You Can't — So Do It Right
Why can't you make Dolby Atmos without upmixing?
Because the Atmos renderer derives loudspeaker feeds that were never authored. Because every object rendered through VBAP produces output signals that exist nowhere in the authored content. Because Dolby built five decades of technology on exactly this principle. Because Ambisonics decoding, WFS rendering, and binaural synthesis all perform the same fundamental operation.
Upmixing is not a flaw in the Atmos ecosystem; it is the mechanism that makes all spatial audio reproduction work.
The industry's rejection of upmixing is therefore not a technical position — it is a branding position. The legitimate criticisms — artifact generation, the Warner Music Group scandal, careless mass-processing — are criticisms of bad upmixing, not of upmixing as a discipline. The productive question was never whether to upmix. It is: given that upmixing is unavoidable, who does it best?
The answer is to decode the spatial information that stereo inherently contains and render it across the available loudspeaker array with precision, coherence, and respect for the original mix. Not as a compromise, not as a workaround — but as the technically correct solution to a problem that Dolby Atmos, by its own architecture, cannot avoid.
References
Patents and Standards
- Blumlein, A.D. — UK Patent 394,325 (1931/1933). 70 claims covering stereo theory, matrix processing, and disc cutting.
- EP0630168A1 — "Improved Dolby Prologic decoder."
- US Patent 7,003,119 — "Matrix surround decoder/virtualizer."
- ISO/IEC 23003-1 — MPEG Surround (ICC, IID, IPD).
- ISO/IEC 23008-3 — MPEG-H 3D Audio.
- ITU-R BS.775-4 (2022) — Multichannel stereophonic sound system.
- ITU-R BS.1534-3 — MUSHRA.
- ITU-R BS.2051-3 (2022) — Advanced sound system for programme production.
Peer-Reviewed Publications
- Avendano, C. & Jot, J.-M. — JAES, vol. 52, 2004.
- Berkhout, A.J. — JAES, 1988.
- Berouti, M., Schwartz, R. & Makhoul, J. — ICASSP, 1979.
- Blauert, J. — Spatial Hearing, MIT Press, 1997.
- Faller, C. & Breebaart, J. — AES 131st Convention, 2011.
- Gerzon, M.A. — JAES, vol. 21, no. 1, 1973.
- Goodwin, M. & Jot, J.-M. — ICASSP, 2007.
- Goodwin, M. & Jot, J.-M. — AES 125th Convention, 2008.
- Kraft, S. & Zölzer, U. — DAFx-16, 2016.
- Pulkki, V. — JAES, vol. 45, no. 6, 1997.
- Rumsey, F. — Spatial Audio, Focal Press, 2001.
- Vickers, E. — AES Paper 7916, 2009.
- Walther, A. & Faller, C. — IEEE WASPAA, 2011.
- Wenzel, E.M. et al. — JASA, vol. 94, no. 1, 1993.
- Zotter, F. & Frank, M. — Ambisonics, Springer, 2019.
Dolby Documentation
- Dolby Atmos Renderer Guide, v3.0 (PDF)
- Dolby Atmos for Home Theater (White Paper)
- CP850 Product Sheet (PDF)
- CP950A Digital Brochure (PDF)
- Source Separation and Upmixing Guidelines
- What are Beds and Objects?
Industry Sources
- Audioholics — History of Surround Sound Processing: Pro Logic II
- Audioholics — Dolby Withdraws Upmixing Restrictions
- High-Def Digest — DSU vs. Neural:X vs. Auro-3D
- Stereophile — Dolby Atmos: A Bleak Shadow?, December 2023.
- QuadraphonicQuad — Dolby Atmos Upmixing on Streaming Services
- Apple Music Provider Support — New Video and Audio Asset Guide, v5.3.13
- IRCAM Forum — On the use of HSR as an upmix solution for stereo reproduction on multi-speaker systems, February 2026.
- L-Acoustics — L-ISA 3.0 Stereo Mapper
- IEEE ETHW — Milestones: Invention of Stereo, 1931
- EMI Archive Trust — Alan Blumlein and Stereo
