Finding Bugs Got Cheap. Fixing Them Didn't.

An AI agent just walked through FFmpeg's 1.5 million lines of C code, found 21 zero-day vulnerabilities, and handed the security community a problem it cannot solve. The scan cost roughly $1,000. The bugs include one that lets an attacker execute arbitrary code on a remote server by sending a single 183-byte RTP packet. Some of these flaws have been hiding since 2003.

This is not a theoretical exercise. FFmpeg sits inside almost everything that plays video: browsers, streaming servers, container images, mobile apps, cloud transcoding pipelines. If it touches video, FFmpeg is probably in the stack. And a startup called depthfirst built an AI agent that found 21 ways to break it.

What the Agent Actually Did

Depthfirst's autonomous security agent does not work like traditional fuzzing. It performs threat modeling first, mapping the architecture of the codebase and identifying which parsers and handlers are exposed to untrusted input. Then it traces data flows through the code, following inputs from network-facing entry points all the way to dangerous sinks like memory allocation functions. Finally, and this is the part that matters, it generates concrete proof-of-concept inputs that trigger each vulnerability and confirms the bug by execution.
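
The middle step, following untrusted input from an entry point to a dangerous sink, can be pictured as a reachability search over a call graph. The sketch below is only an illustration of that idea and not depthfirst's implementation: the graph, the function names, and the choice of breadth-first search are all assumptions made for the example.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
# These names are illustrative and are not real FFmpeg symbols.
CALL_GRAPH = {
    "rtsp_read_packet": ["rtp_parse_packet"],
    "rtp_parse_packet": ["av1_handle_packet"],
    "av1_handle_packet": ["copy_obu", "log_warning"],
    "copy_obu": ["memcpy"],
    "log_warning": [],
    "read_local_config": ["parse_config"],
    "parse_config": [],
}

# Assumed classification of what is network-facing and what is dangerous.
UNTRUSTED_ENTRY_POINTS = ["rtsp_read_packet"]
DANGEROUS_SINKS = {"memcpy", "malloc"}


def paths_to_sinks(graph, entry_points, sinks):
    """Breadth-first search from each entry point to every reachable sink."""
    found = []
    for entry in entry_points:
        queue = deque([[entry]])
        visited = {entry}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                found.append(path)
                continue
            for callee in graph.get(node, []):
                if callee not in visited:
                    visited.add(callee)
                    queue.append(path + [callee])
    return found


for path in paths_to_sinks(CALL_GRAPH, UNTRUSTED_ENTRY_POINTS, DANGEROUS_SINKS):
    print(" -> ".join(path))
```

A path found this way is only a candidate. What the article describes as the decisive step is the last one: producing an input that actually drives execution down that path and crashes the target.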

The agent scanned roughly 1.5 million lines of heavily optimized C code across hundreds of media format parsers. It found 21 confirmed zero-day vulnerabilities. Nine have been assigned CVEs (CVE-2026-39210 through CVE-2026-39218). The rest are fixed upstream but unnumbered. Each finding comes with a reproducible PoC input, not a theoretical report.

The 23-Year-Old Bug

The oldest vulnerability the agent found dates back to 2003. CVE-2026-39214 is a stack overflow in FFmpeg's service-description-table parser. It has been sitting in the codebase for 23 years, through hundreds of releases, and no human reviewer caught it.

Other bugs are more recent but equally concerning. CVE-2026-39210 is a heap overflow in the TS demuxer introduced around 2010. CVE-2026-39217 is a heap overflow in the VP9 decoder from March 2025. CVE-2026-39218 affects the DASH demuxer and has been present since 2017. Some bugs predate modern streaming. Others were introduced last year.

The 183-Byte RCE

The most severe finding is tracked as DFVULN-127. It is a heap buffer overflow in FFmpeg's AV1 RTP depacketizer, located in libavformat/rtpdec_av1.c. The vulnerability allows remote code execution through a single 183-byte RTP packet delivered over RTSP, without authentication or user interaction.

The root cause is a subtle logic error in how FFmpeg handles Temporal Delimiter OBUs. The code advances an output cursor (pktpos) when it encounters a TD, but never allocates memory for the space it claims, and never advances the input pointer (buf_ptr). This means the next loop iteration re-parses the same bytes, allowing an attacker to control data written at a poisoned offset. The overflow targets the AVBuffer.free function pointer at offset 152. When the packet is freed, the corrupted pointer executes attacker-controlled code.
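
The cursor-versus-allocation mismatch is easier to see in a toy model. The following Python sketch is not the FFmpeg code and not the upstream patch. Only the name pktpos comes from the description above; the unit layout, the sizes, and the "fixed" branch are assumptions chosen to show the pattern, and the input-pointer re-parse is not modelled.

```python
# Simplified model of an output cursor that outruns its allocation.
# Hypothetical throughout, apart from the name pktpos.

TD = "temporal_delimiter"
DATA = "data"


def depacketize(units, buggy):
    """Return (allocated_bytes, writes); each write is (offset, length, allocated_at_write)."""
    allocated = 0   # bytes the packet buffer actually owns
    pktpos = 0      # output cursor
    writes = []
    for kind, size in units:
        if kind == TD:
            if buggy:
                # Bug pattern: the cursor claims space that was never allocated.
                pktpos += size
            # Assumed safe behaviour: a delimiter adds no output,
            # so the cursor stays where it is.
            continue
        allocated += size                          # grow the buffer for this unit
        writes.append((pktpos, size, allocated))   # then write at the cursor
        pktpos += size
    return allocated, writes


def out_of_bounds(writes):
    """Writes whose end falls past what was allocated at the time."""
    return [w for w in writes if w[0] + w[1] > w[2]]


units = [(DATA, 16), (TD, 2), (DATA, 8)]

for buggy in (True, False):
    allocated, writes = depacketize(units, buggy)
    print("buggy" if buggy else "fixed", allocated, out_of_bounds(writes))
```

In the buggy run the second data unit is written at offset 18 into a 24-byte buffer, so its last two bytes land outside the allocation. In the real bug, per the write-up, that spill is what reaches the function pointer.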

Depthfirst released the PoC code on GitHub so defenders can test their own FFmpeg builds.

The Chrome 429 Problem

The FFmpeg discovery landed the same week Google shipped Chrome 149, which patched a record 429 security bugs. More than 100 of those were rated critical or high severity. The most severe, CVE-2026-10881, is an out-of-bounds read/write in the ANGLE graphics engine with a CVSS score of 9.6. Google paid $97,000 for that report.

Google has not attributed the 429 bugs to AI. But the timing is notable. In April, Google overhauled its Chrome and Android bug bounty programs, specifically to handle a flood of AI-generated submissions. The new rules require concise reproducers instead of the long-form writeups that AI agents tend to produce. The company is essentially saying: we believe you found real bugs, but we cannot process them at the volume AI is generating them.

The Economics of Vulnerability Discovery

The cost comparison tells the story. Depthfirst's agent found 21 zero-days in FFmpeg for roughly $1,000. Anthropic's Mythos model separately found a 16-year-old H.264 flaw in the same codebase for around $10,000. Google's Big Sleep agent previously found an exploitable SQLite issue. The trend is clear: autonomous agents are getting better at finding bugs, and the cost is dropping fast.
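
As a rough back-of-the-envelope check, the per-finding cost can be worked out from the two figures quoted above. The numbers are the article's approximations, and the comparison ignores differences in severity and bug class, so treat the result as an order-of-magnitude illustration only.

```python
# Approximate figures quoted in the article.
depthfirst_cost_usd = 1_000
depthfirst_findings = 21
mythos_cost_usd = 10_000
mythos_findings = 1

per_finding_depthfirst = depthfirst_cost_usd / depthfirst_findings
per_finding_mythos = mythos_cost_usd / mythos_findings

# Ratio computed from integer products to avoid floating-point drift.
ratio = (mythos_cost_usd * depthfirst_findings) / (depthfirst_cost_usd * mythos_findings)

print(f"depthfirst: ${per_finding_depthfirst:.2f} per finding")
print(f"Mythos:     ${per_finding_mythos:.2f} per finding")
print(f"ratio:      {ratio:.0f}x")
```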

This creates what security researchers call the "triage gap." Discovery is now cheap and automated. But the work of confirming impact, writing fixes, landing patches, testing deployments, and getting updates distributed to millions of systems is still manual and slow. The FFmpeg maintainers have been responsive, but FFmpeg is not the only project affected. The library sits inside vendor appliances, Docker layers, mobile app build chains, and old desktop applications that bundled it years ago and never updated.
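
A first step toward closing that gap is simply knowing where copies of FFmpeg live. The sketch below walks a directory tree and lists files whose names look like FFmpeg binaries or libraries. The file-name patterns are assumptions, and a name-based search will miss renamed or statically linked copies, so an empty result does not prove that FFmpeg is absent.

```python
import fnmatch
import os

# Assumed file-name patterns for common FFmpeg binaries and shared libraries.
PATTERNS = [
    "ffmpeg",
    "ffmpeg.exe",
    "ffprobe",
    "libavformat*.so*",
    "libavcodec*.so*",
    "libavformat*.dylib",
    "libavcodec*.dylib",
    "avformat-*.dll",
    "avcodec-*.dll",
]


def find_bundled_ffmpeg(root):
    """Return sorted paths under root whose file names match a known pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # fnmatchcase keeps matching case-sensitive on every platform.
            if any(fnmatch.fnmatchcase(name, pattern) for pattern in PATTERNS):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_bundled_ffmpeg on an unpacked container image or an application install directory gives a starting list to check against the vendor's patch status.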

Why FFmpeg Is the Perfect Target

FFmpeg parses hundreds of complex media formats, many of which are inherently hostile. A video file is not inert data. It is a program that tells the decoder how to allocate memory, which structures to initialize, and how to interpret bytes. Parsing untrusted video is one of the hardest problems in systems security, and FFmpeg does it at massive scale.
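
To make "the file tells the decoder how to allocate memory" concrete, here is a minimal sketch using a made-up format, not any real container: a 4-byte big-endian length followed by a payload. The format, the function names, and the check are assumptions for illustration.

```python
import struct

# Hypothetical format: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct(">I")


def declared_payload_size(data):
    """What a trusting parser would allocate: whatever the file says."""
    (declared,) = HEADER.unpack_from(data, 0)
    return declared


def checked_payload_size(data):
    """Reject a declared length that exceeds the bytes actually present."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    (declared,) = HEADER.unpack_from(data, 0)
    available = len(data) - HEADER.size
    if declared > available:
        raise ValueError(f"declared {declared} bytes, only {available} present")
    return declared


# A 12-byte input that claims a payload of roughly 4 GiB.
hostile = HEADER.pack(0xFFFFFFFF) + b"\x00" * 8

print(declared_payload_size(hostile))
try:
    checked_payload_size(hostile)
except ValueError as err:
    print("rejected:", err)
```

Real formats nest many such fields, and the checks interact with each other, which is where the subtle bugs tend to hide.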

Traditional static analysis tools struggle with this kind of code. The bugs are not syntax errors or obvious buffer overflows. They are subtle logic errors in state machines, edge cases in format specifications, and interaction bugs between components. The depthfirst agent succeeded where previous scanners failed because it combines threat modeling with concrete validation. It does not just flag suspicious code. It proves the bug is reachable and exploitable.

What This Means for Open Source

FFmpeg is one of the most widely deployed open source projects in the world. It has dedicated maintainers, significant corporate backing, and a history of responding to security reports. If 21 zero-days can hide in FFmpeg for years, the implications for less-maintained projects are grim.

FFmpeg is not an isolated case. A separate agent aimed at the Linux kernel reproduced working exploits for over 50% of 100 tested N-day bugs, outperforming traditional fuzzing. The pattern is consistent: AI agents are finding vulnerabilities that human review and existing tooling missed. The question is no longer whether they can find bugs. The question is whether the ecosystem can absorb the findings.

What Surprised Me

The thing that caught me off guard is the economics. A $1,000 scan that finds 21 zero-days, including a 23-year-old stack overflow and a network-reachable RCE, changes the calculus for anyone running FFmpeg in production. You are not defending against theoretical attackers. You are defending against commodity AI agents that, at roughly $48 per finding, cost less per bug than an hour of many developers' time.

The Chrome 429 story matters too, but for a different reason. Google is not panicking about the bugs themselves. They are panicking about the volume. When you overhaul your bounty program to handle AI-generated submissions, you are admitting that the old model of human-speed vulnerability reporting is over. The bottleneck has shifted from "who can find the bug" to "who can process the report."

I keep thinking about the 23-year-old bug. It sat there through the entire history of modern streaming, through the rise and fall of Flash, through the birth of WebRTC. No human found it. An AI agent found it for $1,000. That should make everyone running FFmpeg in production nervous. Not because the bug is catastrophic on its own, but because it means there are probably more where that came from.
