Object-Based Audio
An approach in which each sound source is treated as an "object" (an audio signal plus position metadata) rather than being pre-mixed into a signal for a specific speaker configuration.
Concept
Audio Object
An audio object consists of:
- Audio signal (mono or stereo)
- Position metadata (x, y, z)
- Other attributes (size, behavior, snap)
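As an illustration, the structure above can be sketched as a small Python dataclass. The class and field names (`AudioObject`, `signal`, `position`, `size`, `snap`) are hypothetical and do not follow any particular format's schema. The position uses normalized coordinates in [-1, +1] and the signal is kept mono for simplicity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioObject:
    """Hypothetical container for one audio object: signal plus metadata."""
    name: str
    signal: List[float]                                     # mono samples for simplicity
    position: Tuple[float, float, float] = (0.0, 1.0, 0.0)  # (x, y, z), each in [-1, +1]
    size: float = 0.0                                       # 0 = point source, > 0 = extended
    snap: bool = False                                      # True = snap to the nearest speaker

    def __post_init__(self) -> None:
        # Reject metadata outside the normalized ranges.
        if any(not -1.0 <= c <= 1.0 for c in self.position):
            raise ValueError(f"position {self.position} is outside [-1, 1]")
        if self.size < 0.0:
            raise ValueError("size must be >= 0")


dialog = AudioObject("dialog", signal=[0.0, 0.1, 0.2])  # defaults to front center
flyby = AudioObject("flyby", signal=[0.3, 0.2], position=(-1.0, 0.5, 0.8), size=0.2)
```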
Renderer
The renderer converts objects to signals for a given speaker configuration:
Objects + Metadata ──► Renderer ──► Target Configuration (5.1, 7.1.4, etc.)
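A minimal sketch of what a renderer does, assuming a simplified distance-based gain law in the spirit of DBAP (each speaker gain proportional to 1/distance, then normalized to unit power) and hypothetical speaker coordinates placed in the same normalized space as the object positions. The same scene is rendered to stereo and to 5.0 without touching the objects. Production renderers use considerably more elaborate algorithms; all names here are illustrative.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical speaker positions in the normalized (x, y, z) object space.
STEREO: Dict[str, Vec3] = {"L": (-1.0, 1.0, 0.0), "R": (1.0, 1.0, 0.0)}
FIVE_0: Dict[str, Vec3] = {
    "L": (-1.0, 1.0, 0.0), "C": (0.0, 1.0, 0.0), "R": (1.0, 1.0, 0.0),
    "Ls": (-1.0, -1.0, 0.0), "Rs": (1.0, -1.0, 0.0),
}


def distance_gains(pos: Vec3, layout: Dict[str, Vec3], blur: float = 0.1) -> Dict[str, float]:
    """Simplified DBAP-style gains: g_i ~ 1/d_i, normalized so sum(g_i^2) == 1."""
    raw = {}
    for name, spk in layout.items():
        # 'blur' keeps the distance non-zero when the object sits on a speaker.
        d = math.sqrt(sum((p - s) ** 2 for p, s in zip(pos, spk)) + blur ** 2)
        raw[name] = 1.0 / d
    norm = math.sqrt(sum(g * g for g in raw.values()))
    return {name: g / norm for name, g in raw.items()}


def render(objects: List[Tuple[List[float], Vec3]], layout: Dict[str, Vec3]) -> Dict[str, List[float]]:
    """Mix every (signal, position) object into one feed per speaker of the layout."""
    n = max(len(sig) for sig, _ in objects)
    feeds = {name: [0.0] * n for name in layout}
    for sig, pos in objects:
        gains = distance_gains(pos, layout)
        for name, g in gains.items():
            for i, s in enumerate(sig):
                feeds[name][i] += g * s
    return feeds


scene = [([1.0, 0.5], (0.0, 1.0, 0.0)),    # centered dialog object
         ([0.2, 0.2], (-1.0, -1.0, 0.0))]  # ambience object at back left
stereo_mix = render(scene, STEREO)
surround_mix = render(scene, FIVE_0)
```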
Advantages
Adaptability
The same content can be reproduced on different systems:
- Cinema (Atmos dome)
- Home theater (soundbar)
- Headphones (binaural)
Future-Proofing
Content isn't tied to a specific playback technology: renderers can be improved or replaced without modifying the content.
Interactivity
Users can be offered options to:
- Choose the dialogue language
- Adjust the relative levels of elements (e.g., dialogue vs. background)
- Personalize the experience
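Because the objects stay separate until playback, personalization amounts to choosing which objects are rendered and at what gain (MPEG-H, for instance, exposes this kind of user interaction). The sketch below assumes each object carries hypothetical `kind` and `language` tags; these names and the tag values are illustrative, not part of any format's metadata schema.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class TaggedObject:
    """Hypothetical object description carrying tags a player could expose to users."""
    name: str
    kind: str                # illustrative tags: "dialogue", "music", "effects"
    language: Optional[str]  # language code for dialogue objects, None otherwise
    gain_db: float = 0.0


def personalize(objects: List[TaggedObject], language: str,
                dialogue_offset_db: float = 0.0) -> List[TaggedObject]:
    """Keep only the chosen dialogue language and apply the user's dialogue level offset."""
    result = []
    for obj in objects:
        if obj.kind == "dialogue":
            if obj.language != language:
                continue  # drop dialogue objects in other languages
            result.append(replace(obj, gain_db=obj.gain_db + dialogue_offset_db))
        else:
            result.append(obj)  # music and effects pass through unchanged
    return result


def db_to_linear(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


program = [
    TaggedObject("dialogue_en", "dialogue", "en"),
    TaggedObject("dialogue_fr", "dialogue", "fr"),
    TaggedObject("music", "music", None, gain_db=-3.0),
]
mix = personalize(program, language="fr", dialogue_offset_db=6.0)
```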
Object-Based Formats
| Format | Max Objects | Usage |
|---|---|---|
| Dolby Atmos | 128 (cinema), 16 (home) | Cinema, streaming, music |
| DTS:X | 32 | Cinema, home |
| MPEG-H | Variable | Broadcast |
| Sony 360 RA | 24 | Music streaming |
Bed vs Objects
Bed
Traditional channel-based content (e.g., 7.1) integrated into the mix:
- Ambiences, music
- Predictable behavior
- Less resource-intensive
Objects
Individual sources with metadata:
- Dialog, spot effects
- Precise positioning
- Adapts to any configuration
Typical Mix
7.1.2 Bed: Music, ambiences
+ Objects: Dialog, effects, moving elements
= Complete Atmos Mix
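The bed/object split is also a budgeting decision. Assuming, as in Dolby Atmos cinema and the Dolby Atmos Renderer, that bed channels and objects share one pool of 128 inputs (the cinema figure from the format table), a 7.1.2 bed consumes 7 + 1 + 2 = 10 inputs and leaves 118 for objects. The helper names below are hypothetical.

```python
def layout_channels(layout: str) -> int:
    """Count channels in a layout string such as "5.1", "7.1.2" or "7.1.4"."""
    return sum(int(part) for part in layout.split("."))


def object_budget(bed_layout: str, total_inputs: int = 128) -> int:
    """Inputs left for objects once the bed is allocated (128 assumed, as in Atmos cinema)."""
    bed = layout_channels(bed_layout)
    if bed > total_inputs:
        raise ValueError(f"bed {bed_layout} needs {bed} inputs, only {total_inputs} available")
    return total_inputs - bed
```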
Production Workflow
- Mix the bed as a traditional channel-based mix
- Create objects for specific elements
- Automate object positions
- Render and verify on different speaker configurations
- QC the downmixes
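For the downmix QC step, one common check is to fold a 5.1 render down to stereo and look for overloads. The sketch below assumes ITU-R BS.775-style coefficients (center and surrounds at -3 dB, LFE discarded) and checks sample peaks only; actual delivery specifications may mandate different coefficients or true-peak limits, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

MINUS_3DB = 1.0 / math.sqrt(2.0)  # about 0.707


def downmix_51_to_stereo(ch: Dict[str, List[float]]) -> Tuple[List[float], List[float]]:
    """Lo = L + 0.707*C + 0.707*Ls, Ro = R + 0.707*C + 0.707*Rs; LFE is dropped."""
    lo = [l + MINUS_3DB * (c + ls) for l, c, ls in zip(ch["L"], ch["C"], ch["Ls"])]
    ro = [r + MINUS_3DB * (c + rs) for r, c, rs in zip(ch["R"], ch["C"], ch["Rs"])]
    return lo, ro


def peak_dbfs(signal: List[float]) -> float:
    """Sample peak in dBFS, with full scale = 1.0."""
    peak = max(abs(s) for s in signal)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


def qc_downmix(ch: Dict[str, List[float]], ceiling_dbfs: float = 0.0) -> Dict[str, bool]:
    """Flag each downmix channel whose sample peak exceeds the ceiling."""
    lo, ro = downmix_51_to_stereo(ch)
    return {"Lo": peak_dbfs(lo) > ceiling_dbfs, "Ro": peak_dbfs(ro) > ceiling_dbfs}


render_51 = {"L": [0.5], "R": [0.0], "C": [0.5], "LFE": [0.9], "Ls": [0.5], "Rs": [0.0]}
```

Here each 5.1 channel is within full scale, yet the left fold-down sums to about 1.21 and would clip, which is exactly what this QC pass is meant to catch.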
Link to DAM Audio
Spacelite takes a hybrid approach:
- Input: Stereo signal (simple channel-based)
- Processing: HSR distributes the signal as if it were an "object" with a configurable position
- Output: Flexible configuration (1-16 channels per bus)
This makes it possible to turn existing stereo content into quasi-object-based content, with each bus configured for a different destination.
Object Metadata
Position
Positions are typically expressed as normalized Cartesian coordinates:
- X: -1 (left) to +1 (right)
- Y: -1 (back) to +1 (front)
- Z: -1 (below) to +1 (above)
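Panners often need these positions as angles. A naive geometric conversion is sketched below, assuming azimuth 0° at the front with positive values to the right and elevation positive upward. The sign convention is an assumption (some standards count azimuth positive to the left), and standardized renderers may use a non-linear mapping between cube and sphere rather than this direct one. The function name is hypothetical.

```python
import math
from typing import Tuple


def cartesian_to_az_el(x: float, y: float, z: float) -> Tuple[float, float]:
    """Convert normalized (x, y, z) to (azimuth, elevation) in degrees.

    Assumed convention: azimuth 0 = front, +90 = right, -90 = left, 180 = back;
    elevation 0 = horizontal plane, +90 = directly above.
    """
    azimuth = math.degrees(math.atan2(x, y))  # x points right, y points front
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```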
Size
Object spread/size:
- Point source (0)
- Extended source (>0)
Snap
Behavior at speaker boundaries:
- Snap to nearest speaker
- Smooth panning between speakers
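The two boundary behaviors can be compared on a horizontal ring of speakers. This sketch assumes a 5.0 layout at common azimuths (0°, ±30°, ±110°, positive to the right) and uses plain constant-power pairwise panning for the smooth case; the layout dictionary and function names are hypothetical.

```python
import math
from typing import Dict

# Hypothetical 5.0 layout: azimuths in degrees, positive to the right.
LAYOUT_50: Dict[str, float] = {"C": 0.0, "R": 30.0, "Rs": 110.0, "Ls": -110.0, "L": -30.0}


def angular_distance(a: float, b: float) -> float:
    """Shortest angle between two azimuths, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def snap_gains(azimuth: float, speakers: Dict[str, float]) -> Dict[str, float]:
    """Snap: send the whole object to the single nearest speaker."""
    nearest = min(speakers, key=lambda n: angular_distance(azimuth, speakers[n]))
    return {n: 1.0 if n == nearest else 0.0 for n in speakers}


def smooth_gains(azimuth: float, speakers: Dict[str, float]) -> Dict[str, float]:
    """Smooth: constant-power pan between the two speakers that bracket the azimuth."""
    if len(speakers) < 2:
        raise ValueError("need at least two speakers")
    ring = sorted(speakers, key=lambda n: speakers[n] % 360.0)
    gains = {n: 0.0 for n in speakers}
    for i, a in enumerate(ring):
        b = ring[(i + 1) % len(ring)]
        span = (speakers[b] - speakers[a]) % 360.0    # arc from speaker a to speaker b
        offset = (azimuth - speakers[a]) % 360.0      # how far past speaker a the source is
        if span > 0.0 and offset <= span:
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2.0)
            gains[b] = math.sin(frac * math.pi / 2.0)
            return gains
    return gains
```

An object moving through 15° jumps abruptly from C to R with snap, while the smooth panner crossfades between them at constant total power.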
Rendering Approaches
Point-Source Rendering
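Objects are rendered as point sources using amplitude panning such as VBAP (Vector Base Amplitude Panning) or DBAP (Distance-Based Amplitude Panning):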
- Precise localization
- Efficient
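For a horizontal layout, VBAP reduces to a 2×2 system per speaker pair: the source direction p is written as g1·l1 + g2·l2, where l1 and l2 are the unit vectors of two adjacent speakers; the pair whose arc contains the source is used, and its gains are normalized to unit power. The sketch assumes azimuths in degrees, positive to the right, and covers only 2D VBAP (3D VBAP uses speaker triplets). Function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def unit_vector(azimuth_deg: float) -> Tuple[float, float]:
    """Direction on the horizontal plane: x to the right, y to the front."""
    a = math.radians(azimuth_deg)
    return math.sin(a), math.cos(a)


def vbap_pair(source_az: float, az1: float, az2: float) -> Tuple[float, float]:
    """Solve p = g1*l1 + g2*l2 by Cramer's rule, then normalize so g1^2 + g2^2 = 1."""
    px, py = unit_vector(source_az)
    x1, y1 = unit_vector(az1)
    x2, y2 = unit_vector(az2)
    det = x1 * y2 - x2 * y1
    if abs(det) < 1e-12:
        raise ValueError("speaker vectors are collinear")
    g1 = (px * y2 - x2 * py) / det
    g2 = (x1 * py - px * y1) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


def vbap_2d(source_az: float, speakers: Dict[str, float]) -> Dict[str, float]:
    """Pan with the adjacent speaker pair whose arc (< 180 degrees) contains the source."""
    ring = sorted(speakers, key=lambda n: speakers[n] % 360.0)
    for i, a in enumerate(ring):
        b = ring[(i + 1) % len(ring)]
        span = (speakers[b] - speakers[a]) % 360.0
        offset = (source_az - speakers[a]) % 360.0
        if 0.0 < span < 180.0 and offset <= span:
            g1, g2 = vbap_pair(source_az, speakers[a], speakers[b])
            gains = {n: 0.0 for n in speakers}
            gains[a], gains[b] = max(g1, 0.0), max(g2, 0.0)  # clamp tiny negative rounding
            return gains
    raise ValueError("source lies in a gap of 180 degrees or more between speakers")
```

For a stereo pair, these gains satisfy the stereophonic tangent law, tan(φ)/tan(φ0) = (gR - gL)/(gR + gL).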
Extent Rendering
Objects with a nonzero size are rendered as several virtual sources spread around the object's position:
- More natural for large sources
- Higher CPU usage
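A gain-only sketch of extent rendering: the object is replaced by `n_virtual` point sources spread evenly across `width_deg`, each panned with a constant-power pairwise panner, and the per-speaker powers are averaged so total power stays at 1. Averaging powers assumes the virtual sources are mutually decorrelated (real extent renderers typically add decorrelation filters); the mapping from the size metadata to a width in degrees, the layout, and all names are hypothetical. The loop also shows where the extra CPU goes: `n_virtual` pans per object instead of one.

```python
import math
from typing import Dict

# Hypothetical 5.0 layout: azimuths in degrees, positive to the right.
LAYOUT_50: Dict[str, float] = {"C": 0.0, "R": 30.0, "Rs": 110.0, "Ls": -110.0, "L": -30.0}


def pan_point(azimuth: float, speakers: Dict[str, float]) -> Dict[str, float]:
    """Constant-power pan between the two speakers bracketing the azimuth."""
    ring = sorted(speakers, key=lambda n: speakers[n] % 360.0)
    gains = {n: 0.0 for n in speakers}
    for i, a in enumerate(ring):
        b = ring[(i + 1) % len(ring)]
        span = (speakers[b] - speakers[a]) % 360.0
        offset = (azimuth - speakers[a]) % 360.0
        if span > 0.0 and offset <= span:
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2.0)
            gains[b] = math.sin(frac * math.pi / 2.0)
            break
    return gains


def pan_extended(azimuth: float, width_deg: float, speakers: Dict[str, float],
                 n_virtual: int = 9) -> Dict[str, float]:
    """Spread the object over n_virtual point sources across width_deg, total power = 1."""
    if width_deg <= 0.0 or n_virtual < 2:
        return pan_point(azimuth, speakers)  # size 0: plain point source
    power = {n: 0.0 for n in speakers}
    for k in range(n_virtual):
        virtual_az = azimuth - width_deg / 2.0 + width_deg * k / (n_virtual - 1)
        for n, g in pan_point(virtual_az, speakers).items():
            power[n] += g * g / n_virtual  # average power over the virtual sources
    return {n: math.sqrt(p) for n, p in power.items()}
```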
Distance Rendering
Distance affects:
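- Level (inverse square law: intensity falls as 1/r², about -6 dB per doubling of distance)
- Reverb amount (the direct-to-reverberant ratio decreases with distance)
- Spectral content (high frequencies are attenuated by air absorption over long distances)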
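The level and reverb effects can be sketched with textbook formulas: direct-sound amplitude falling as 1/r (about -6 dB per doubling of distance) and a diffuse reverberant level assumed roughly constant in the room, so the direct-to-reverberant ratio falls with distance and crosses 0 dB at the critical distance. The reference and critical distances are illustrative assumptions, spectral (air-absorption) filtering is omitted, and the function name is hypothetical.

```python
import math
from typing import Tuple


def distance_model(distance_m: float, ref_distance_m: float = 1.0,
                   critical_distance_m: float = 2.0) -> Tuple[float, float]:
    """Return (direct level in dB relative to the reference distance, DRR in dB)."""
    if distance_m <= 0.0:
        raise ValueError("distance must be positive")
    r = max(distance_m, ref_distance_m)  # no extra boost inside the reference distance
    direct_db = 20.0 * math.log10(ref_distance_m / r)          # amplitude ~ 1/r
    drr_db = 20.0 * math.log10(critical_distance_m / distance_m)  # 0 dB at critical distance
    return direct_db, drr_db
```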
Comparison: Channel vs Object
| Aspect | Channel-Based | Object-Based |
|---|---|---|
| Flexibility | Fixed to format | Any format |
| Production | Simpler | More complex |
| File size | Fixed by the format's channel count | Varies with the number of objects |
| Metadata | None | Position, size, etc. |
| Personalization | None | Possible |
| Legacy content | Native | Requires conversion |
Object-Based Tools
Production
- Dolby Atmos Production Suite
- DTS:X Creator Suite
- Nuendo with Atmos integration
- Logic Pro with Atmos
Monitoring
- Dolby Atmos Renderer
- DTS:X Encoder Suite
- Apple Spatial Audio