Audio processing libraries receive far less security attention than their image and video counterparts, yet they parse equally complex binary formats in the same kind of memory-unsafe C and C++ code. Every application that plays, records, converts, or analyzes audio depends on them, and they have a track record of exploitable vulnerabilities that most developers are unaware of.
The Audio Library Ecosystem
libvorbis decodes Ogg Vorbis audio. It is used by game engines, media players, and web browsers.
libopus is the reference implementation of the Opus codec, which WebRTC implementations are required to support for voice and music. It ships in every major browser that supports WebRTC calls.
libFLAC handles FLAC lossless audio. It is used by music applications, media servers, and archiving tools.
libmpg123 decodes MP3 audio. Despite MP3's age, it remains one of the most widely used audio formats, and mpg123 is widely deployed.
libsndfile reads and writes WAV, AIFF, FLAC, and dozens of other audio formats. It is a common dependency for audio processing applications.
PortAudio provides cross-platform audio I/O. It is not a codec library but handles audio device interaction, and bugs in device handling can be security-relevant.
FFmpeg's libavcodec includes decoders for hundreds of audio formats alongside its video codecs.
SoX (Sound eXchange) is a command-line audio processing tool with codec implementations for many formats.
Vulnerability Patterns
Audio libraries share vulnerability patterns with image and video libraries, but with format-specific variations:
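Sample rate and channel count overflows. Audio files declare sample rates, channel counts, and frame counts in their headers, and decoders multiply these values to size their buffers. A file claiming 96 channels at 192kHz forces a large allocation on its own. When the header values are chosen so that the multiplication wraps around in a 32-bit integer, the decoder allocates an undersized buffer and then writes past its end, producing a heap overflow.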
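The wraparound is easier to see with concrete numbers. The sketch below is illustrative only: Python integers do not overflow, so `wrapped_size_u32` simulates what a C decoder would compute in a 32-bit unsigned variable, and `checked_size` shows one possible guard. Both function names and the 256 MiB cap are assumptions made for this example, not part of any library.

```python
UINT32_MAX = 0xFFFFFFFF


def wrapped_size_u32(frames: int, channels: int, bytes_per_sample: int) -> int:
    """Simulate the buffer size a C decoder gets from unchecked 32-bit math."""
    return (frames * channels * bytes_per_sample) & UINT32_MAX


def checked_size(frames: int, channels: int, bytes_per_sample: int,
                 max_bytes: int = 256 * 1024 * 1024) -> int:
    """Compute the true buffer size and reject implausible header values."""
    for name, value in (("frames", frames), ("channels", channels),
                        ("bytes_per_sample", bytes_per_sample)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    total = frames * channels * bytes_per_sample  # exact: Python ints never wrap
    if total > max_bytes:
        raise ValueError(f"declared audio needs {total} bytes, limit is {max_bytes}")
    return total


# A header claiming 0x40000001 stereo 16-bit frames really needs about 4 GiB,
# but the 32-bit product wraps around to just 4 bytes.
print(wrapped_size_u32(0x40000001, 2, 2))  # 4
print(checked_size(44100, 2, 2))           # 176400
```

In C, the equivalent guard is to check each multiplication for overflow before allocating (for example with compiler overflow builtins) and to enforce an application-level cap on the result.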
Chunk parsing bugs. WAV files use a RIFF chunk structure. AIFF uses a similar chunk format. Malformed chunks with incorrect sizes can cause out-of-bounds reads or writes during parsing.
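A defensive chunk walker checks every declared size against the bytes that actually remain before touching the payload. The following is a minimal sketch for RIFF/WAVE data held in memory. The function name `iter_riff_chunks` is hypothetical, and a production parser would also need to handle variants this sketch ignores, such as RF64.

```python
import struct


def iter_riff_chunks(data: bytes):
    """Yield (chunk_id, payload) pairs from a RIFF/WAVE buffer.

    Raises ValueError as soon as any declared size exceeds the bytes available.
    """
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    (riff_size,) = struct.unpack_from("<I", data, 4)
    end = 8 + riff_size  # riff_size counts everything after the first 8 bytes
    if riff_size < 4 or end > len(data):
        raise ValueError("RIFF size does not match file length")
    pos = 12
    while pos < end:
        if end - pos < 8:
            raise ValueError("truncated chunk header")
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        pos += 8
        if size > end - pos:  # the check that malformed files rely on being absent
            raise ValueError(f"chunk {chunk_id!r} claims {size} bytes, "
                             f"only {end - pos} remain")
        yield chunk_id, data[pos:pos + size]
        pos += size + (size & 1)  # chunks are padded to an even length
```

Because this is a generator, the size checks run lazily; a caller that wants to reject a malformed file up front can wrap the call in `list(...)` before processing any chunk.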
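Huffman table corruption. MP3, Vorbis, and other compressed formats use Huffman coding. Crafted Huffman tables or codewords can corrupt decoder state or index past the end of lookup tables. The surrounding layer III code is just as fragile: CVE-2017-11126 in libmpg123 was a buffer over-read in the III_i_stereo function of layer3.c, triggered by a crafted audio file.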
Metadata injection. Audio files contain metadata (ID3 tags in MP3, Vorbis comments in Ogg, AIFF chunks). This metadata can contain embedded images, URLs, and other data that requires its own parsing. Metadata parsing bugs are a common vulnerability source.
Looping and seeking exploits. Some audio formats support loop points and seek tables. Crafted loop or seek data can cause infinite loops (denial of service) or incorrect buffer positioning (memory corruption).
Notable CVEs
CVE-2017-0381 (libopus). Memory corruption in the SILK code (silk/NLSF_stabilize.c), reported against Android's Mediaserver and triggerable by a crafted file. Since Opus is the standard WebRTC audio codec, bugs in libopus have unusually broad reach.
CVE-2017-12562 (libsndfile). Heap buffer overflow in psf_binheader_writef in common.c, the header-writing routine shared across format handlers. libsndfile is widely used in audio processing applications, so this affected many downstream projects.
CVE-2019-3832 (libsndfile). Out-of-bounds read in wav_write_header in wav.c, left behind by an incomplete fix for CVE-2018-19758. The same library, a different code path, and a bug that survived its first patch.
CVE-2017-14632 and CVE-2017-14633 (libvorbis). Two separate bugs in libvorbis 1.3.5: a free of uninitialized memory in vorbis_analysis_headerout when the channel count is zero or negative, and an out-of-bounds array read in mapping0_forward.
CVE-2021-40426 (SoX). Heap buffer overflow in the start_read function of sphere.c, triggered by a crafted file. SoX is often used in server-side audio processing pipelines.
Supply Chain Considerations
Audio libraries propagate through dependency trees similarly to image libraries:
Python's pydub wraps FFmpeg. soundfile wraps libsndfile. audioread tries multiple backends including FFmpeg and GStreamer.
Node.js audio packages often shell out to FFmpeg or SoX. Ruby and Go have similar wrapper packages.
Game engines (Unity, Unreal, Godot) bundle their own audio codec implementations or link against system libraries.
Voice and conferencing applications use libopus and other codecs in real-time processing paths, where a vulnerability could be exploited through a live audio stream with no file download required.
Music streaming services transcode audio through server-side pipelines. A crafted audio file uploaded by a user exercises the transcoding codecs.
IoT devices often use stripped-down audio libraries for voice assistants and notification sounds. These embedded implementations may lack security patches.
Mitigation
Treat audio input as untrusted. Apply the same security posture to audio files as you would to executable code. Validate format headers before full decoding.
Sandbox audio processing. Like image and video processing, audio decoding should run in sandboxed environments with minimal privileges.
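As one small piece of such a sandbox, the decoder can run in a child process with hard resource limits. The sketch below uses only the Python standard library on a Unix-like system. The helper names (`run_decoder`, `_apply_limits`) and the specific limits are assumptions chosen for illustration. Resource limits alone are not a complete sandbox; they would normally be combined with a dedicated low-privilege user, a container, or a system call filter.

```python
import resource
import subprocess
import sys


def _limit(res: int, value: int) -> None:
    """Lower one resource limit, never exceeding the existing hard limit."""
    _, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(res, (value, value))


def _apply_limits() -> None:
    """Runs in the child process after fork and before the decoder starts."""
    _limit(resource.RLIMIT_CPU, 10)          # seconds of CPU time
    _limit(resource.RLIMIT_AS, 1 << 30)      # 1 GiB of address space
    _limit(resource.RLIMIT_FSIZE, 64 << 20)  # 64 MiB per file written


def run_decoder(argv, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run an external decoder command under resource limits and a wall-clock timeout."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        preexec_fn=_apply_limits,  # Unix only; avoid in multi-threaded parents
        check=False,
    )
```

A caller would pass the decoder command line as `argv`, for instance an FFmpeg or SoX invocation, and treat a nonzero return code or a `subprocess.TimeoutExpired` exception as a rejected input.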
Limit supported formats. If your application only needs MP3 and WAV, do not include a library that supports fifty formats.
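An allow-list can also be enforced at the application boundary, before any decoder sees the data, by checking the leading bytes of each upload. This is a rough sketch: `sniff_format` and `is_allowed` are hypothetical helpers, the signature checks are deliberately simple, and an ID3v2 tag is assumed to indicate MP3 even though such tags occasionally prefix other formats. Sniffing is a gate, not validation, so the decoder behind it still has to be hardened.

```python
from typing import Optional


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the container or codec from the first bytes of a file."""
    if len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[0:3] == b"ID3":
        return "mp3"  # assumption: an ID3v2 tag is followed by MP3 data
    if (len(header) >= 2 and header[0] == 0xFF
            and (header[1] & 0xE0) == 0xE0      # 11-bit MPEG audio frame sync
            and (header[1] & 0x06) == 0x02):    # layer bits select Layer III
        return "mp3"
    if header[0:4] == b"fLaC":
        return "flac"
    if header[0:4] == b"OggS":
        return "ogg"
    return None


def is_allowed(header: bytes, allowed=frozenset({"mp3", "wav"})) -> bool:
    """Accept only formats the application has explicitly opted into."""
    return sniff_format(header) in allowed
```

With the default allow-list, a FLAC or Ogg upload is rejected before it reaches any decoder, which keeps those parsers out of the attack surface even if the underlying library still contains them.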
Fuzz your audio pipeline. If you process user-uploaded audio, run continuous fuzzing against your audio processing pipeline. Tools like AFL++ and libFuzzer are effective at finding audio parser bugs.
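The shape of a fuzzing harness is simple even though production fuzzers are not. The toy example below mutates a valid WAV file at random and feeds it to Python's standard `wave` module, which stands in here for whatever parser your pipeline actually uses. It is a naive mutation loop with no coverage feedback, so it illustrates the harness structure only and is not a substitute for AFL++ or libFuzzer. All function names are made up for this sketch.

```python
import io
import random
import wave


def make_seed_wav(frames: int = 64) -> bytes:
    """Build a small, valid mono 16-bit WAV file to use as the seed input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * frames)
    return buf.getvalue()


def mutate(data: bytes, rng: random.Random, max_flips: int = 8) -> bytes:
    """Overwrite a few randomly chosen bytes with random values."""
    out = bytearray(data)
    for _ in range(rng.randint(1, max_flips)):
        out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def parse_target(data: bytes) -> None:
    """The code under test: parse the header and read a bounded number of frames."""
    with wave.open(io.BytesIO(data), "rb") as r:
        r.readframes(min(r.getnframes(), 4096))


def fuzz(iterations: int = 500, seed: int = 0):
    """Return (input, exception) pairs for failures other than clean rejections."""
    rng = random.Random(seed)
    corpus = make_seed_wav()
    findings = []
    for _ in range(iterations):
        sample = mutate(corpus, rng)
        try:
            parse_target(sample)
        except (wave.Error, EOFError):
            continue  # the parser rejected the input cleanly
        except Exception as exc:  # anything else deserves a closer look
            findings.append((sample, repr(exc)))
    return findings
```

For a native library, the same structure applies: a target function that takes a byte buffer, a corpus of valid seed files, and a clear rule for what counts as a finding (a crash or sanitizer report rather than a Python exception).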
Monitor for patches. Audio library CVEs may not make headlines, but they are regularly published. Include audio libraries in your dependency monitoring.
How Safeguard.sh Helps
Safeguard.sh identifies audio processing libraries in your dependency tree, including libopus, libvorbis, libFLAC, libsndfile, and others buried inside language-specific wrappers. When audio library CVEs are published, Safeguard.sh maps them against your inventory and identifies the affected projects and versions. For applications that process user-supplied audio, this automated tracking keeps audio library vulnerabilities from slipping through the cracks of your vulnerability management program.