Today, a short tale of fun mystery-solving. I’ve been working on a project that involves a server dynamically generating audio files and streaming them to a client via a WebRTC session.
The dynamic audio generation process works like this: first, run a program that generates a .wav file. Then compress the .wav to an .ogg containing an Opus audio stream and deliver it over the network.
The WebRTC portion is handled by the awesome Pion
library, a pure Golang implementation which
makes customizing WebRTC (for example by streaming a dynamically generated audio
file) super easy.
So here was the bug: certain audio files were delivered successfully over the
wire and played seamlessly in the browser. Other files weren’t. I knew there
wasn’t anything fundamentally wrong with the files that wouldn’t transmit
because I could listen to them with any old audio player (Google Chrome, for
example). All of the files were
ogg + Opus sampled at 48 kHz.
Since I could open the files locally, I figured there was probably an issue with
the network. Chrome has a handy tool at
chrome://webrtc-internals for inspecting WebRTC
sessions. Sure enough, this tool revealed that the client never actually
received any bytes of the problematic
ogg files. But why?
Ogg is a container format which can hold multiple logical data streams, each with its own encoding. In this case, the
ogg files held a single
logical Opus stream. Mozilla maintains a useful tool
opusinfo in the opus-tools package
that inspects Opus streams. Here’s what the
opusinfo output looks like for one
of the files which transmitted successfully:
```
$ opusinfo good_sound.ogg
Processing file "good_sound.ogg"...

New logical stream (#1, serial: 1fe69032): type opus
Encoded with Lavf58.45.100
User comments section follows...
    encoder=Lavc58.91.100 libopus
Opus stream 1:
    Pre-skip: 312
    Playback gain: 0 dB
    Channels: 2
    Original sample rate: 48000Hz
    Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
    Page duration:     20.0ms (max),   20.0ms (avg),   20.0ms (min)
    Total data length: 20863 bytes (overhead: 12.8%)
    Playback length: 0m:01.779s
    Average bitrate: 93.79 kb/s, w/o overhead: 81.78 kb/s
Logical stream 1 ended
```
And here’s the output for one which didn’t transmit:
```
$ opusinfo bad_sound.ogg
Processing file "bad_sound.ogg"...

New logical stream (#1, serial: 9f763f54): type opus
Encoded with Lavf58.45.100
User comments section follows...
    encoder=Lavc58.91.100 libopus
Opus stream 1:
    Pre-skip: 312
    Playback gain: 0 dB
    Channels: 2
    Original sample rate: 48000Hz
    Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
    Page duration:   1000.0ms (max),  900.0ms (avg),  800.0ms (min)
    Total data length: 18487 bytes (overhead: 1.6%)
    Playback length: 0m:01.779s
    Average bitrate: 83.11 kb/s, w/o overhead: 81.78 kb/s
Logical stream 1 ended
```
Pretty similar! But there's one obvious difference: the bad files had a much longer
Page duration than the good ones.
Could this make a difference? It could! When the
ogg files are streamed over the
WebRTC media channel, they are sent via RTP, a protocol over UDP. Each RTP
datagram contains one page of the
ogg file. This meant that the WebRTC server
was attempting to send 10 kB datagrams. Too big! (The RTP MTU is 1200 bytes.)
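The back-of-the-envelope math checks out against the opusinfo numbers above. A rough sketch (the 1200-byte figure is the per-packet budget mentioned above; the helper name is mine):

```go
package main

import "fmt"

// approxPageBytes estimates one page's size from the stream bitrate
// (kb/s, taken from the opusinfo output) and the page duration in ms.
func approxPageBytes(bitrateKbps float64, pageMs float64) int {
	return int(bitrateKbps * 1000 / 8 * pageMs / 1000)
}

func main() {
	const rtpBudget = 1200                 // per-packet budget in bytes
	bad := approxPageBytes(83.11, 1000)    // bad files: 1000 ms pages
	good := approxPageBytes(83.11, 20)     // good files: 20 ms pages
	fmt.Printf("1000 ms page: about %d bytes, far over the %d-byte budget\n", bad, rtpBudget)
	fmt.Printf("20 ms page: about %d bytes, fits comfortably\n", good)
}
```

At 83.11 kb/s, a 1000 ms page carries roughly 10 kB of audio, while a 20 ms page carries a couple hundred bytes.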
Why did the bad files have such large pages? They were generated by compressing
the .wav files with ffmpeg, invoked like so:
```
ffmpeg -i file.wav -c:a libopus -ac 2 file.ogg
```
ffmpeg's Ogg muxer has a
-page_duration setting to specify how to slice up the pages. I hadn’t known
about this setting and wasn’t using it. The default: 1000ms. And so, the
19-character fix for my bug:
```
# page_duration unit is microseconds
ffmpeg -i file.wav -c:a libopus -page_duration 2000 -ac 2 file.ogg
```
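If you want to double-check the result without opusinfo, page durations can be read straight off the container: for Opus-in-Ogg, each page header's granule position counts samples at 48 kHz (regardless of the original sample rate), so the delta between consecutive pages' granule positions gives the page duration. A minimal sketch in Go (the helper name is mine):

```go
package main

import "fmt"

// pageDurationMs derives a page's duration from its granule position and
// the previous page's. For Opus-in-Ogg the granule position always counts
// samples at 48 kHz, whatever the input's original sample rate was.
func pageDurationMs(prevGranule, granule uint64) uint64 {
	return (granule - prevGranule) * 1000 / 48000
}

func main() {
	fmt.Println(pageDurationMs(0, 960))   // one 20 ms Opus frame per page
	fmt.Println(pageDurationMs(0, 48000)) // the old default: 1000 ms pages
}
```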
And all my files streamed happily ever after.