Web Codecs API: Low-Level Audio and Video Processing

Introduction

The Web Codecs API provides low-level access to browser media encoders and decoders, enabling frame-by-frame processing of audio and video entirely within the browser. Unlike the <video> element which abstracts away all encoding and decoding internals, Web Codecs hands you raw pixel data and audio samples to manipulate, transform, or transmit however you see fit. This is the API that powers browser-based video editors, real-time streaming platforms, computer vision pipelines, and custom media processing workflows that previously required native code or server-side infrastructure.

Before Web Codecs, developers who needed granular control over media processing had to rely on workarounds like decoding video by drawing frames to a canvas and reading pixels back, or sending video to a server for processing. These approaches were slow, inefficient, and limited. The Web Codecs API changes the equation entirely by exposing the same hardware-accelerated encoders and decoders that the browser uses internally for <video> and <audio> playback, giving web applications native-level media processing performance.

This guide covers every aspect of the Web Codecs API — from the core architecture and individual class interfaces to real-world implementation patterns for video transcoding, screen recording, real-time effects processing, and WebRTC integration. You will also learn about codec selection strategies, memory management best practices, container format handling, and browser compatibility considerations.

Architecture Overview

The Web Codecs API is built around a clean pipeline architecture that separates concerns between raw media data, encoded (compressed) media data, and the encoders/decoders that transform between them.

The Processing Pipeline

Every Web Codecs application follows the same fundamental pattern: source → decode → process → encode → output. The source can be a camera feed via getUserMedia, a video file read through a demuxing library, or frames generated programmatically. The output can be a rendered canvas, a recorded file, a live stream, or anything else that consumes encoded or raw media data.

┌────────────┐     ┌──────────────┐     ┌────────────┐     ┌──────────────┐
│ Media      │────>│ Video/Audio  │────>│ Process    │────>│ Video/Audio  │──> Output
│ Source     │     │ Decoder      │     │ (optional) │     │ Encoder      │
└────────────┘     └──────────────┘     └────────────┘     └──────────────┘

Core Class Hierarchy

The API introduces eight primary classes organized into two parallel hierarchies — one for video and one for audio:

Class	Purpose	Data Type
`VideoFrame`	Raw video frame with pixel data and metadata	Pixels in GPU or CPU memory
`EncodedVideoChunk`	Compressed video frame data	Binary buffer
`VideoDecoder`	Transforms `EncodedVideoChunk` → `VideoFrame`	—
`VideoEncoder`	Transforms `VideoFrame` → `EncodedVideoChunk`	—
`AudioData`	Raw audio samples (PCM)	Float32/Int16 arrays
`EncodedAudioChunk`	Compressed audio data	Binary buffer
`AudioDecoder`	Transforms `EncodedAudioChunk` → `AudioData`	—
`AudioEncoder`	Transforms `AudioData` → `EncodedAudioChunk`	—

Raw and encoded representations generally correspond one to one: decoding N encoded video chunks normally yields N video frames. Audio decoders may regroup samples, so do not rely on receiving exactly one AudioData object per EncodedAudioChunk.

Asynchronous Processing Model

Each encoder and decoder maintains an internal processing queue. configure(), encode(), decode(), and flush() append control messages to that queue and return immediately; only flush() returns a promise, which resolves once all previously queued work has produced its output. The actual work happens in the background, often on dedicated hardware. reset() and close() take effect synchronously: reset() discards pending work and returns the codec to the unconfigured state so it can be configured again, while close() permanently shuts down the instance and releases its resources.

// The processing model in action
encoder.configure({ codec: 'avc1.640028', width: 1920, height: 1080, bitrate: 5_000_000 }); // Level 4.0 covers 1080p
encoder.encode(frame1);  // Queued — returns immediately
encoder.encode(frame2);  // Queued — returns immediately
await encoder.flush();   // Waits for both frames to be encoded

Understanding this queue-based model is critical for managing backpressure. If you queue frames faster than the encoder can process them, the internal queue grows unbounded. Production applications must implement flow control, typically by checking encoder.encodeQueueSize before encoding more frames.

Supported Codecs

The Web Codecs API supports a carefully curated set of industry-standard codecs. However, actual availability depends on the browser and underlying hardware. Always verify codec support at runtime using the isConfigSupported() static methods.

Video Codecs

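H.264 (AVC) is the most universally supported video codec. Nearly every device with hardware video acceleration can encode and decode H.264. Codec strings follow the pattern avc1.PPCCLL, where PP is the profile, CC the constraint flags, and LL the level, each written as two hex digits, such as avc1.64001E for High Profile Level 3.0 or avc1.4d001f for Main Profile Level 3.1. Level 3.0 tops out at roughly 720×576, so 1080p content needs Level 4.0 (avc1.640028) or higher.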

VP9 is an open-source codec developed by Google that offers better compression than H.264 at equivalent quality. It is widely used on YouTube and in WebM containers. Codec strings use the pattern vp09.{profile}.{level}.{bitDepth}.{chromaSubsampling}, such as vp09.00.40.08 for Profile 0, Level 4.0, 8-bit.

AV1 is the newest royalty-free codec. Figures of roughly 30-50% better compression than H.264 and 20-30% better than VP9 are commonly quoted, though actual gains depend on the content and encoder settings. Hardware decoding is common on recent devices, but hardware encoding is still limited to newer GPUs. Codec strings follow av01.{profile}.{level}{tier}.{bitDepth}, with the level and tier written together, such as av01.0.08M.08 for Main Profile, Level 4.0, Main Tier, 8-bit.

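H.265 (HEVC) offers better compression than H.264, but browser support is uneven because of patent licensing: Safari supports it, while other browsers typically depend on the platform's hardware decoders. Codec strings begin with hev1. or hvc1. followed by profile, compatibility, tier and level, and constraint fields, such as hev1.1.6.L93.B0 for Main Profile, Main Tier, Level 3.1.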
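
To make the H.264 codec string layout concrete, the following sketch splits an avc1 string into its profile, constraint, and level bytes, and picks the lowest level that fits a given resolution and frame rate. The function names are hypothetical helpers rather than Web Codecs API calls, and the level table is only a subset of the H.264 level limits starting at Level 3.0, so treat the result as a starting point to confirm with isConfigSupported().

```typescript
// Illustrative helpers; these names are not part of the Web Codecs API.

interface AvcCodecInfo {
  profileIdc: number;      // 66 = Baseline, 77 = Main, 100 = High
  constraintFlags: number; // constraint_set flags byte
  levelIdc: number;        // level times 10: 30 = Level 3.0, 40 = Level 4.0
}

// Split "avc1.PPCCLL" into its three hex bytes. Returns null if malformed.
function parseAvcCodecString(codec: string): AvcCodecInfo | null {
  const match = /^avc[13]\.([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})$/.exec(codec);
  if (match === null) return null;
  return {
    profileIdc: parseInt(match[1], 16),
    constraintFlags: parseInt(match[2], 16),
    levelIdc: parseInt(match[3], 16),
  };
}

// Subset of the H.264 level limits, from Level 3.0 upward.
// maxFs = max frame size in macroblocks, maxMbps = max macroblocks per second.
const AVC_LEVEL_LIMITS = [
  { levelIdc: 30, maxFs: 1620, maxMbps: 40500 },
  { levelIdc: 31, maxFs: 3600, maxMbps: 108000 },
  { levelIdc: 32, maxFs: 5120, maxMbps: 216000 },
  { levelIdc: 40, maxFs: 8192, maxMbps: 245760 },
  { levelIdc: 42, maxFs: 8704, maxMbps: 522240 },
  { levelIdc: 50, maxFs: 22080, maxMbps: 589824 },
  { levelIdc: 51, maxFs: 36864, maxMbps: 983040 },
];

// Lowest level in the table above that fits the resolution and frame rate.
function minimumAvcLevelIdc(width: number, height: number, fps: number): number | null {
  const macroblocks = Math.ceil(width / 16) * Math.ceil(height / 16);
  for (const limit of AVC_LEVEL_LIMITS) {
    if (macroblocks <= limit.maxFs && macroblocks * fps <= limit.maxMbps) {
      return limit.levelIdc;
    }
  }
  return null;
}

// Assemble a codec string from the three byte values.
function buildAvcCodecString(profileIdc: number, constraintFlags: number, levelIdc: number): string {
  const hex = (value: number): string => (value < 16 ? '0' : '') + value.toString(16);
  return 'avc1.' + hex(profileIdc) + hex(constraintFlags) + hex(levelIdc);
}
```

For example, minimumAvcLevelIdc(1920, 1080, 30) selects level 40, which buildAvcCodecString(100, 0, 40) turns into a High Profile string for 1080p at 30 fps.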

Audio Codecs

Opus is the recommended codec for most Web Codecs audio use cases. It provides excellent quality at low bitrates with very low latency, making it ideal for real-time communication and streaming.

AAC (Advanced Audio Coding) is widely supported and commonly found in MP4 containers. The codec string mp4a.40.2 refers to AAC-LC (Low Complexity).

PCM (Pulse Code Modulation) represents uncompressed audio with no quality loss but very large file sizes. Useful as an intermediate format during processing.

FLAC (Free Lossless Audio Codec) provides lossless compression. Useful for archival quality audio processing.

Checking Codec Support at Runtime

Never assume a codec is available. Always verify before creating encoders or decoders:

async function checkVideoCodecSupport(): Promise<Map<string, boolean>> {
  const codecs = [
    'avc1.640028',           // H.264 High Profile, Level 4.0 (fits 1080p)
    'vp09.00.40.08',         // VP9 Profile 0, Level 4.0, 8-bit
    'av01.0.08M.08',         // AV1 Main Profile, Level 4.0, Main Tier, 8-bit
  ];
 
  const support = new Map<string, boolean>();
 
  for (const codec of codecs) {
    try {
      const result = await VideoDecoder.isConfigSupported({
        codec,
        codedWidth: 1920,
        codedHeight: 1080,
      });
      support.set(codec, result.supported);
    } catch {
      support.set(codec, false);
    }
  }
 
  return support;
}
 
async function checkAudioCodecSupport(): Promise<Map<string, boolean>> {
  const codecs = ['opus', 'mp4a.40.2', 'flac'];
  const support = new Map<string, boolean>();
 
  for (const codec of codecs) {
    try {
      const result = await AudioEncoder.isConfigSupported({
        codec,
        sampleRate: 48000,
        numberOfChannels: 1,
        bitrate: 128_000,
      });
      support.set(codec, result.supported);
    } catch {
      support.set(codec, false);
    }
  }
 
  return support;
}

Video Decoding in Depth

The VideoDecoder transforms compressed video chunks into raw pixel data that you can render, analyze, or process. Understanding the decoder's behavior is essential for building reliable media applications.

Configuring the Decoder

The decoder must be configured before it can accept input. Only the codec string is strictly required; the coded dimensions are optional but commonly supplied. Other optional parameters include description data (codec-specific initialization bytes taken from the container, such as the avcC record needed for H.264 in its MP4-style format; VP9 needs none), display dimensions, and color space information.

const decoder = new VideoDecoder({
  output: (frame: VideoFrame) => {
    // Each decoded frame arrives here asynchronously
    console.log(`Decoded frame at ${frame.timestamp}μs`);
    console.log(`  Dimensions: ${frame.displayWidth}×${frame.displayHeight}`);
    console.log(`  Format: ${frame.format}`);          // e.g., "I420", "NV12", "RGBA"
    console.log(`  Duration: ${frame.duration}μs`);
 
    // Render to canvas
    const canvas = document.querySelector('canvas')!;
    const ctx = canvas.getContext('2d')!;
    canvas.width = frame.displayWidth;
    canvas.height = frame.displayHeight;
    ctx.drawImage(frame, 0, 0);
 
    // CRITICAL: Release GPU memory held by this frame
    frame.close();
  },
  error: (e: DOMException) => {
    console.error('Decoder error:', e.message);
    // Common errors: NotSupportedError, DataError, InvalidStateError
  },
});
 
decoder.configure({
  codec: 'avc1.640028',  // High Profile, Level 4.0; with no description, H.264 input is treated as Annex B
  codedWidth: 1920,
  codedHeight: 1080,
  // Optional: hardwareAcceleration: 'prefer-hardware',
});

Feeding Encoded Data to the Decoder

Once configured, you create EncodedVideoChunk objects and pass them to the decoder's decode() method. Each chunk must specify whether it is a key frame (type: 'key') or a delta frame (type: 'delta'), along with a timestamp in microseconds and the raw encoded data.

// When reading from a demuxed file:
for (const packet of demuxedPackets) {
  const chunk = new EncodedVideoChunk({
    type: packet.isKeyFrame ? 'key' : 'delta',
    timestamp: packet.timestamp,       // In microseconds
    duration: packet.duration,
    data: packet.data,
  });
  decoder.decode(chunk);
}
 
// Important: flush to ensure all frames are output
await decoder.flush();
decoder.close();

The decoder requires a key frame as the first chunk after configure() or flush(). Delta frames depend on previously decoded frames and cannot be decoded independently. If decode() is called with a delta chunk while a key frame is required, it throws a DataError.
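
One way to respect the key frame requirement is to filter packets before they reach the decoder. The sketch below assumes a hypothetical packet shape with isKeyFrame and timestamp fields (similar to the demuxed packets in the loop above) and that key frames appear in increasing timestamp order. It shows two helpers: one drops leading delta packets, the other finds where to restart decoding when seeking.

```typescript
// Hypothetical packet shape; a real demuxer will expose its own fields.
interface DemuxedPacket {
  isKeyFrame: boolean;
  timestamp: number; // microseconds
}

// Drop any delta packets that precede the first key frame so that the
// first chunk handed to decode() is always a key chunk.
function skipToFirstKeyFrame<T extends { isKeyFrame: boolean }>(packets: readonly T[]): T[] {
  let start = 0;
  while (start < packets.length && !packets[start].isKeyFrame) {
    start++;
  }
  return packets.slice(start);
}

// For seeking: index of the last key frame at or before the target time,
// or -1 if no key frame precedes it. Decoding must restart from that index.
function findSeekStartIndex(packets: readonly DemuxedPacket[], targetTimestamp: number): number {
  let index = -1;
  for (let i = 0; i < packets.length; i++) {
    if (packets[i].isKeyFrame && packets[i].timestamp <= targetTimestamp) {
      index = i;
    }
  }
  return index;
}
```

After seeking, frames between the key frame and the target still have to be decoded; they can simply be closed without rendering until the target timestamp is reached.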

Video Encoding in Depth

The VideoEncoder transforms raw video frames into compressed chunks suitable for storage or transmission. Encoder configuration has a significant impact on output quality, file size, and encoding speed.

Encoder Configuration Strategies

// Configuration for high-quality recording (offline encoding)
const recordingConfig: VideoEncoderConfig = {
  codec: 'avc1.640028',      // High Profile, Level 4.0 (1080p exceeds Level 3.0)
  width: 1920,
  height: 1080,
  bitrate: 8_000_000,        // 8 Mbps for high quality
  framerate: 30,
  latencyMode: 'quality',    // Optimize for quality over latency
  // avc: { format: 'avc' }, // 'avc' = length-prefixed NAL units (MP4 style); 'annexb' = start codes
};
 
// Configuration for real-time streaming (low latency)
const streamingConfig: VideoEncoderConfig = {
  codec: 'avc1.42001f',       // Baseline Profile, Level 3.1, for broad compatibility
  width: 1280,
  height: 720,
  bitrate: 2_500_000,         // 2.5 Mbps for 720p
  framerate: 30,
  latencyMode: 'realtime',    // Optimize for speed over quality
  bitrateMode: 'variable',    // VBR for better quality in static scenes
};
 
// Configuration for adaptive bitrate streaming
const abrConfig: VideoEncoderConfig = {
  codec: 'vp09.00.10.08',
  width: 1920,
  height: 1080,
  bitrate: 5_000_000,
  framerate: 30,
  scalabilityMode: 'L1T3',   // Temporal scalability for ABR
};
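
The bitrates in these configurations can be sanity-checked with a bits-per-pixel rule of thumb. The sketch below is a rough heuristic, not something the Web Codecs API provides: the default factor of 0.1 bits per pixel is an assumed starting value for H.264, and more efficient codecs or static content can usually go lower.

```typescript
// Rule-of-thumb bitrate estimate: pixels per second times a bits-per-pixel
// factor. The 0.1 default is an assumption to tune for your content and codec.
function estimateBitrate(
  width: number,
  height: number,
  framerate: number,
  bitsPerPixel: number = 0.1,
): number {
  return Math.round(width * height * framerate * bitsPerPixel);
}

// Approximate size of the encoded video stream, ignoring container overhead
// and audio.
function estimateFileSizeBytes(bitrate: number, durationSeconds: number): number {
  return Math.round((bitrate * durationSeconds) / 8);
}
```

At 1080p and 30 fps the default factor gives about 6.2 Mbps, which sits between the 5 Mbps and 8 Mbps values used above; one minute at 8 Mbps comes to roughly 60 MB.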

Encoding Frames with Key Frame Control

const encoder = new VideoEncoder({
  output: (chunk: EncodedVideoChunk, metadata?: EncodedVideoChunkMetadata) => {
    const data = new Uint8Array(chunk.byteLength);
    chunk.copyTo(data);
 
    // The metadata may contain codec-specific data (e.g., SPS/PPS for H.264)
    if (metadata?.decoderConfig) {
      console.log('Decoder config updated:', metadata.decoderConfig);
      // Store or transmit the decoder config alongside the chunk
    }
 
    // Transmit or store the encoded data
    muxer.addChunk(chunk, data);
  },
  error: (e: DOMException) => {
    console.error('Encoder error:', e.message);
  },
});
 
encoder.configure(recordingConfig);
 
let frameCount = 0;
const keyFrameInterval = 150; // Force key frame every 5 seconds at 30fps
 
function encodeVideoFrame(source: CanvasImageSource, timestamp: number) {
  const frame = new VideoFrame(source, { timestamp });
 
  // Force key frames at regular intervals for seeking support
  const isKeyFrame = frameCount % keyFrameInterval === 0;
  encoder.encode(frame, { keyFrame: isKeyFrame });
  frame.close(); // Release the source frame immediately
 
  frameCount++;
 
  // Monitor queue pressure
  if (encoder.encodeQueueSize > 10) {
    console.warn('Encoder queue backing up, consider throttling input');
  }
}
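
When frames are generated programmatically rather than captured, timestamps and key frame positions are usually derived from the frame index. A minimal sketch, with hypothetical helper names, that keeps timestamps as whole microseconds and expresses the key frame interval in seconds rather than frames:

```typescript
// Timestamp in whole microseconds for the given frame index.
// Computing from the index avoids the drift of repeatedly adding 1/fps.
function frameTimestampMicros(frameIndex: number, framerate: number): number {
  return Math.round((frameIndex * 1_000_000) / framerate);
}

// True when the frame at this index should be forced to be a key frame,
// given a desired key frame interval in seconds.
function isKeyFrameIndex(frameIndex: number, intervalSeconds: number, framerate: number): boolean {
  const intervalFrames = Math.max(1, Math.round(intervalSeconds * framerate));
  return frameIndex % intervalFrames === 0;
}
```

With a 5 second interval at 30 fps, isKeyFrameIndex marks frames 0, 150, 300, and so on, matching the keyFrameInterval of 150 used above.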

Monitoring and Managing the Encode Queue

The encodeQueueSize property tells you how many frames are waiting to be processed. If this number grows, your pipeline is producing frames faster than the encoder can consume them:

function shouldEncodeMore(): boolean {
  // Backpressure: stop feeding frames once more than 5 are waiting.
  // At or below that threshold it is safe to keep encoding.
  return encoder.encodeQueueSize <= 5;
}
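
For live sources, where waiting is not an option, a common approach is to drop frames while the queue is congested. The sketch below adds hysteresis so the pipeline does not flip between encoding and dropping on every frame. The class name and the default marks of 5 and 2 are assumptions chosen for illustration, and the caller is expected to pass in encoder.encodeQueueSize.

```typescript
// Decides whether to encode or drop each incoming frame based on queue size.
// Dropping starts above the high-water mark and continues until the queue
// has drained to the low-water mark.
class EncodeGate {
  private dropping = false;
  private readonly highWater: number;
  private readonly lowWater: number;

  constructor(highWater: number = 5, lowWater: number = 2) {
    this.highWater = highWater;
    this.lowWater = lowWater;
  }

  // Returns true if the next frame should be encoded, false if it should
  // be dropped (and closed) by the caller.
  admit(queueSize: number): boolean {
    if (this.dropping) {
      if (queueSize <= this.lowWater) {
        this.dropping = false;
      }
    } else if (queueSize > this.highWater) {
      this.dropping = true;
    }
    return !this.dropping;
  }
}
```

In a capture loop this would look like `if (gate.admit(encoder.encodeQueueSize)) encoder.encode(frame);` followed by `frame.close()` in either case.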

Audio Encoding and Decoding

The audio interfaces mirror the video ones but work with PCM sample data instead of pixel data.

Audio Encoding with Opus

const audioEncoder = new AudioEncoder({
  output: (chunk: EncodedAudioChunk, metadata?: EncodedAudioChunkMetadata) => {
    const data = new Uint8Array(chunk.byteLength);
    chunk.copyTo(data);
 
    // Send to muxer or network
    audioMuxer.write(chunk, data);
  },
  error: (e: DOMException) => console.error('Audio encoder error:', e),
});
 
audioEncoder.configure({
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 1,
  bitrate: 128_000,            // 128 kbps for speech, 256k for music
  // opus: { complexity: 10 }, // Max quality (0-10)
});
 
// Encode audio from microphone capture
async function encodeMicrophoneAudio() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const audioContext = new AudioContext({ sampleRate: 48000 });
  const source = audioContext.createMediaStreamSource(stream);
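  // Note: ScriptProcessorNode is deprecated; AudioWorklet or
  // MediaStreamTrackProcessor are the modern alternatives
  const processor = audioContext.createScriptProcessor(4096, 1, 1);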
 
  source.connect(processor);
  processor.connect(audioContext.destination);
 
  let timestamp = 0;
  const frameDuration = 4096 / 48000; // ~85ms per chunk
 
  processor.onaudioprocess = (event) => {
    const inputBuffer = event.inputBuffer;
    const samples = inputBuffer.getChannelData(0);
 
    const audioData = new AudioData({
      format: 'f32',
      sampleRate: 48000,
      numberOfFrames: inputBuffer.length,
      numberOfChannels: 1,
      timestamp: timestamp * 1_000_000, // Convert to microseconds
      data: samples,
    });
 
    audioEncoder.encode(audioData);
    audioData.close();
    timestamp += frameDuration;
  };
}
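
Accumulating a fractional duration in seconds, as the capture loop above does, can drift slightly over a long recording. A small sketch of an alternative, using a hypothetical helper class, counts the audio frames emitted so far and derives each timestamp from that integer count:

```typescript
// Produces AudioData timestamps (in microseconds) from a running frame count.
class AudioTimestampClock {
  private framesEmitted = 0;
  private readonly sampleRate: number;

  constructor(sampleRate: number) {
    this.sampleRate = sampleRate;
  }

  // Returns the timestamp for a chunk containing `frameCount` frames,
  // then advances the clock past that chunk.
  next(frameCount: number): number {
    const timestamp = Math.round((this.framesEmitted * 1_000_000) / this.sampleRate);
    this.framesEmitted += frameCount;
    return timestamp;
  }
}
```

In the microphone example, `timestamp: clock.next(inputBuffer.length)` would replace the manual seconds counter.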

Audio Decoding and Playback

// Create one AudioContext up front; making a new one per chunk is expensive
// and browsers cap how many can exist at once
const audioCtx = new AudioContext({ sampleRate: 48000 });
let playhead = 0; // AudioContext time at which the next chunk should start

const audioDecoder = new AudioDecoder({
  output: (audioData: AudioData) => {
    const buffer = audioCtx.createBuffer(
      audioData.numberOfChannels,
      audioData.numberOfFrames,
      audioData.sampleRate
    );

    // Copy each channel out as planar float32, converting if the decoder
    // produced a different sample format
    for (let ch = 0; ch < audioData.numberOfChannels; ch++) {
      const channelData = new Float32Array(audioData.numberOfFrames);
      audioData.copyTo(channelData, { planeIndex: ch, format: 'f32-planar' });
      buffer.copyToChannel(channelData, ch);
    }
    audioData.close();

    // Schedule chunks back to back so they play in sequence instead of
    // all starting at once
    const bufferSource = audioCtx.createBufferSource();
    bufferSource.buffer = buffer;
    bufferSource.connect(audioCtx.destination);
    playhead = Math.max(playhead, audioCtx.currentTime);
    bufferSource.start(playhead);
    playhead += buffer.duration;
  },
  error: (e: DOMException) => console.error('Audio decoder error:', e),
});
 
audioDecoder.configure({
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 1,
});
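
AudioData can hold samples either interleaved (for example 'f32', where channels alternate sample by sample in one plane) or planar (for example 'f32-planar', one plane per channel). AudioData.copyTo() can convert to 'f32-planar' for you, so the following is only a sketch of how the two layouts relate, using a hypothetical helper name:

```typescript
// Split an interleaved buffer (L R L R ...) into one Float32Array per channel.
function deinterleave(interleaved: Float32Array, numberOfChannels: number): Float32Array[] {
  const frames = Math.floor(interleaved.length / numberOfChannels);
  const planes: Float32Array[] = [];
  for (let ch = 0; ch < numberOfChannels; ch++) {
    const plane = new Float32Array(frames);
    for (let i = 0; i < frames; i++) {
      plane[i] = interleaved[i * numberOfChannels + ch];
    }
    planes.push(plane);
  }
  return planes;
}
```

This also explains the buffer sizes copyTo() expects: an interleaved copy needs numberOfFrames × numberOfChannels elements in plane 0, while a planar copy needs numberOfFrames elements per plane.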

Muxing and Demuxing: Working with Container Formats

The Web Codecs API only handles encoding and decoding — it has no concept of container formats like MP4, WebM, or MKV. To read encoded chunks from a video file, you need a demuxing library. To write encoded chunks to a playable file, you need a muxing library.
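
To give a sense of what a demuxer does first, the sketch below walks the top-level boxes of an MP4 (ISO BMFF) file, where each box starts with a 32-bit big-endian size and a four-character type. It is a minimal illustration with hypothetical names, not a substitute for a demuxing library: extracting samples also requires parsing the nested boxes inside moov.

```typescript
interface Mp4Box {
  type: string;   // four-character code, e.g. "ftyp", "moov", "mdat"
  offset: number; // byte offset of the box header
  size: number;   // total box size in bytes, header included
}

// List the top-level boxes in an MP4 buffer without descending into them.
function listTopLevelBoxes(buffer: ArrayBuffer): Mp4Box[] {
  const view = new DataView(buffer);
  const boxes: Mp4Box[] = [];
  let offset = 0;

  while (offset + 8 <= view.byteLength) {
    let size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      view.getUint8(offset + 4),
      view.getUint8(offset + 5),
      view.getUint8(offset + 6),
      view.getUint8(offset + 7),
    );
    let headerSize = 8;

    if (size === 1) {
      // A size of 1 means a 64-bit "largesize" follows the type field
      if (offset + 16 > view.byteLength) break;
      size = view.getUint32(offset + 8) * 0x100000000 + view.getUint32(offset + 12);
      headerSize = 16;
    } else if (size === 0) {
      // A size of 0 means the box extends to the end of the file
      size = view.byteLength - offset;
    }

    if (size < headerSize) break; // Malformed box; stop rather than loop forever
    boxes.push({ type, offset, size });
    offset += size;
  }

  return boxes;
}
```

A typical file yields something like ftyp, moov, and mdat: moov holds the track metadata (including the codec description), and mdat holds the encoded samples that become EncodedVideoChunk and EncodedAudioChunk objects.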

Demuxing MP4 Files for Decoding

import { Mp4Demuxer } from './mp4-demuxer'; // Using a demuxing library
 
async function decodeVideoFile(file: File) {
  const buffer = await file.arrayBuffer();
 
  const demuxer = new Mp4Demuxer(buffer);
 
  const decoder = new VideoDecoder({
    output: (frame) => {
      renderFrameToCanvas(frame);
      frame.close();
    },
    error: console.error,
  });
 
  // Configure decoder from the demuxer's track info
  const videoTrack = demuxer.getVideoTrack();
  decoder.configure({
    codec: videoTrack.codec,
    codedWidth: videoTrack.width,
    codedHeight: videoTrack.height,
    description: videoTrack.description, // Codec-specific init data
  });
 
  // Decode all chunks from the file
  for (const chunk of demuxer.getChunks()) {
    decoder.decode(chunk);
  }
 
  await decoder.flush();
  decoder.close();
}

Muxing Encoded Chunks into WebM

import { WebMMuxer } from 'webm-muxer';
 
async function encodeAndMuxToWebM(
  canvas: HTMLCanvasElement,
  durationSeconds: number
) {
  const muxer = new WebMMuxer({
    target: 'buffer',
    video: { codec: 'V_VP9', width: canvas.width, height: canvas.height },
  });
 
  const encoder = new VideoEncoder({
    output: (chunk, metadata) => {
      muxer.addVideoChunk(chunk, metadata);
    },
    error: console.error,
  });
 
  encoder.configure({
    codec: 'vp09.00.10.08',
    width: canvas.width,
    height: canvas.height,
    bitrate: 5_000_000,
    framerate: 30,
  });
 
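  // Encode frames from canvas (redraw the canvas before each capture,
  // otherwise every frame holds the same image)
  for (let i = 0; i < durationSeconds * 30; i++) {
    // Let the encoder drain so queued frames do not pile up in memory
    while (encoder.encodeQueueSize > 5) {
      await new Promise((resolve) => setTimeout(resolve, 10));
    }
    const timestamp = Math.round((i / 30) * 1_000_000); // Microseconds
    const frame = new VideoFrame(canvas, { timestamp });
    encoder.encode(frame, { keyFrame: i % 90 === 0 });
    frame.close();
  }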
 
  await encoder.flush();
  encoder.close();
 
  const buffer = muxer.finalize();
  // buffer now contains a valid WebM file
  downloadBlob(new Blob([buffer], { type: 'video/webm' }), 'output.webm');
}

Real-World Use Case: Screen Recording Application

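A screen recording application demonstrates how the Web Codecs APIs work together. This example captures the screen, encodes video with H.264 and system audio with Opus, and passes the encoded chunks to a muxer. Because WebM does not allow H.264, pair this configuration with an MP4 or Matroska muxer, or switch the video codec to VP9 or AV1 to produce WebM. The muxer setup and the audio capture helper are omitted for brevity.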

class ScreenRecorder {
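  private videoEncoder!: VideoEncoder; // Assigned in setupEncoder()
  private audioEncoder!: AudioEncoder; // Assigned in setupEncoder()
  private muxer: any; // WebM or MP4 muxer (its setup is omitted from this example)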
  private isRecording = false;
  private frameCount = 0;
 
  constructor(private outputCanvas?: HTMLCanvasElement) {
    this.setupEncoder();
  }
 
  private setupEncoder() {
    this.videoEncoder = new VideoEncoder({
      output: (chunk, meta) => this.muxer.addVideoChunk(chunk, meta),
      error: (e) => console.error('Video encoder error:', e),
    });
 
    this.audioEncoder = new AudioEncoder({
      output: (chunk, meta) => this.muxer.addAudioChunk(chunk, meta),
      error: (e) => console.error('Audio encoder error:', e),
    });
  }
 
  async start() {
    const stream = await navigator.mediaDevices.getDisplayMedia({
      video: { width: 1920, height: 1080, frameRate: 30 },
      audio: {
        echoCancellation: false,
        noiseSuppression: false,
        sampleRate: 48000,
      },
    });
 
    // Configure encoders
    this.videoEncoder.configure({
      codec: 'avc1.640028', // High Profile, Level 4.0 for 1080p
      width: 1920,
      height: 1080,
      bitrate: 8_000_000,
      framerate: 30,
      latencyMode: 'realtime',
    });
 
    const audioTracks = stream.getAudioTracks();
    if (audioTracks.length > 0) {
      this.audioEncoder.configure({
        codec: 'opus',
        sampleRate: 48000,
        numberOfChannels: 1,
        bitrate: 128_000,
      });
    }
 
    this.isRecording = true;
    this.frameCount = 0;
 
    // Process video frames using MediaStreamTrackProcessor
    const videoTrack = stream.getVideoTracks()[0];
    const processor = new MediaStreamTrackProcessor({ track: videoTrack });
    const reader = processor.readable.getReader();
 
    // Process audio if available
    // (processAudio() is not shown; it would feed AudioData from the track into audioEncoder)
    if (audioTracks.length > 0) {
      this.processAudio(audioTracks[0]);
    }
 
    // Video processing loop
    while (this.isRecording) {
      const { done, value } = await reader.read();
      if (done) break;
 
      const frame = new VideoFrame(value, {
        timestamp: performance.now() * 1000,
      });
 
      // Optional: render to canvas for preview or processing
      if (this.outputCanvas) {
        const ctx = this.outputCanvas.getContext('2d')!;
        ctx.drawImage(frame, 0, 0);
      }
 
      const isKeyFrame = this.frameCount % 90 === 0; // Every 3 seconds
      this.videoEncoder.encode(frame, { keyFrame: isKeyFrame });
      frame.close();
      value.close();
      this.frameCount++;
    }
 
    await this.videoEncoder.flush();
    await this.audioEncoder.flush();
    this.videoEncoder.close();
    this.audioEncoder.close();
    stream.getTracks().forEach(t => t.stop());
  }
 
  stop() {
    this.isRecording = false;
  }
}

Real-Time Video Effects Pipeline

Combining Web Codecs with Canvas and WebGL enables real-time video effects processing entirely in the browser. This pattern is useful for video conferencing filters, AR overlays, and live production tools.

class VideoEffectsProcessor {
  private gl: WebGLRenderingContext;
  private program!: WebGLProgram; // Assigned once the shaders are compiled and linked
 
  constructor(private canvas: HTMLCanvasElement) {
    this.gl = canvas.getContext('webgl')!;
    this.setupShaders();
  }
 
  private setupShaders() {
    // Vertex shader: pass-through
    const vsSource = `
      attribute vec2 a_position;
      attribute vec2 a_texCoord;
      varying vec2 v_texCoord;
      void main() {
        gl_Position = vec4(a_position, 0.0, 1.0);
        v_texCoord = a_texCoord;
      }
    `;
 
    // Fragment shader: color manipulation effect
    const fsSource = `
      precision mediump float;
      varying vec2 v_texCoord;
      uniform sampler2D u_image;
      uniform float u_time;
 
      void main() {
        vec4 color = texture2D(u_image, v_texCoord);
        // Apply a subtle color grading effect
        color.r = pow(color.r, 0.9);
        color.b = pow(color.b, 1.1);
        // Add a vignette
        float dist = distance(v_texCoord, vec2(0.5, 0.5));
        color.rgb *= smoothstep(0.8, 0.3, dist);
        gl_FragColor = color;
      }
    `;
 
    // Compile and link shader program...
  }
 
  async processStream(stream: MediaStream) {
    const track = stream.getVideoTracks()[0];
    const processor = new MediaStreamTrackProcessor({ track });
    const reader = processor.readable.getReader();
 
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
 
      const frame = new VideoFrame(value, {
        timestamp: performance.now() * 1000,
      });
 
      // Upload frame texture to WebGL
      this.gl.texImage2D(
        this.gl.TEXTURE_2D, 0, this.gl.RGBA,
        this.gl.RGBA, this.gl.UNSIGNED_BYTE, frame
      );
 
      // Draw with effect shader
      this.gl.drawArrays(this.gl.TRIANGLE_STRIP, 0, 4);
 
      // Create output frame from canvas
      const outputFrame = new VideoFrame(this.canvas, {
        timestamp: frame.timestamp,
      });
 
      // Pass outputFrame to an encoder or display, then release every frame
      outputFrame.close();
      frame.close();
      value.close();
    }
  }
}

Worker-Based Architecture for Production Applications

Production Web Codecs applications should run encoding and decoding in Web Workers to avoid blocking the main thread. The VideoFrame and AudioData objects support transferable semantics, allowing zero-copy handoff between threads.

Dedicated Worker for Video Processing

// video-processor.worker.ts
// VideoEncoder, VideoDecoder, and VideoFrame are globals in worker scope; no import is needed
 
class VideoProcessorWorker {
  private decoder: VideoDecoder;
  private encoder: VideoEncoder;
 
  constructor() {
    this.decoder = new VideoDecoder({
      output: (frame) => this.processDecodedFrame(frame),
      error: (e) => self.postMessage({ type: 'error', message: e.message }),
    });
 
    this.encoder = new VideoEncoder({
      output: (chunk, meta) => {
        const data = new Uint8Array(chunk.byteLength);
        chunk.copyTo(data);
        self.postMessage(
          { type: 'encoded', data, timestamp: chunk.timestamp, metadata: meta },
          [data.buffer]  // Transfer ownership — zero copy
        );
      },
      error: (e) => self.postMessage({ type: 'error', message: e.message }),
    });
  }
 
  private async processDecodedFrame(frame: VideoFrame) {
    // Apply transformations, filters, or analysis
    // Then re-encode if needed
    this.encoder.encode(frame);
    frame.close();
  }
 
  configure(config: { decoder: VideoDecoderConfig; encoder: VideoEncoderConfig }) {
    this.decoder.configure(config.decoder);
    this.encoder.configure(config.encoder);
  }
 
  decode(data: ArrayBuffer, timestamp: number, type: 'key' | 'delta') {
    const chunk = new EncodedVideoChunk({
      type,
      timestamp,
      data,
    });
    this.decoder.decode(chunk);
  }
}
 
const processor = new VideoProcessorWorker();
 
self.onmessage = (event) => {
  const { action, payload } = event.data;
  switch (action) {
    case 'configure':
      processor.configure(payload);
      break;
    case 'decode':
      processor.decode(payload.data, payload.timestamp, payload.type);
      break;
  }
};

Transferable Objects for Zero-Copy Communication

When passing VideoFrame objects between threads, use the transfer list to avoid expensive copies of GPU-backed pixel data:

// Main thread → Worker: transfer the frame
const frame = new VideoFrame(videoElement, { timestamp: performance.now() * 1000 });
worker.postMessage({ frame, timestamp: frame.timestamp }, [frame]);
// frame is now detached on the main thread and cannot be used here

// Inside the worker: the frame arrives without being copied
self.onmessage = (event) => {
  const { frame } = event.data;
  // frame is valid and usable in the worker
  processFrame(frame);
  frame.close(); // The worker now owns the frame and must release it
};

Video Transcoding Pipeline

Building a browser-based video transcoder demonstrates the full power of Web Codecs. This pattern reads a file, decodes it, optionally applies transformations, and re-encodes in a different format or quality.

class BrowserTranscoder {
  async transcode(inputFile: File, outputConfig: VideoEncoderConfig) {
    const buffer = await inputFile.arrayBuffer();
    const demuxer = new Mp4Demuxer(buffer);
    const muxer = new WebMMuxer({
      target: 'buffer',
      video: { codec: 'V_VP9', width: outputConfig.width!, height: outputConfig.height! },
    });
 
    const encoder = new VideoEncoder({
      output: (chunk, meta) => muxer.addVideoChunk(chunk, meta),
      error: console.error,
    });
    encoder.configure(outputConfig);
 
    const decoder = new VideoDecoder({
      output: (frame) => {
        // Optional: resize, crop, or apply effects here
        encoder.encode(frame);
        frame.close();
      },
      error: console.error,
    });
 
    const videoTrack = demuxer.getVideoTrack();
    decoder.configure({
      codec: videoTrack.codec,
      codedWidth: videoTrack.width,
      codedHeight: videoTrack.height,
      description: videoTrack.description,
    });
 
    for (const chunk of demuxer.getChunks()) {
      decoder.decode(chunk);
    }
 
    await decoder.flush();
    await encoder.flush();
    decoder.close();
    encoder.close();
 
    return muxer.finalize(); // Returns ArrayBuffer containing WebM data
  }
}

This transcoding pipeline runs entirely in the browser with no server round-trips. For large files, integrate it with the worker architecture above to keep the UI responsive.

WebRTC Integration

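The Web Codecs API can complement WebRTC for real-time communication, giving you fine-grained control over encoding parameters that the standard WebRTC API does not expose. In practice, encoded chunks are usually carried over an RTCDataChannel or WebTransport, or combined with the encoded transform (insertable streams) APIs where available; injecting your own chunks directly into an RTCRtpSender is not universally supported. The example below is a skeleton that leaves the transport step open.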

// Use WebCodecs encoder for custom WebRTC sender
async function setupCustomWebRTCSender(pc: RTCPeerConnection, stream: MediaStream) {
  const videoTrack = stream.getVideoTracks()[0];
  const processor = new MediaStreamTrackProcessor({ track: videoTrack });
  const reader = processor.readable.getReader();
 
  const encoder = new VideoEncoder({
    output: (chunk, metadata) => {
      // Create an EncodedVideoChunk and send via RTCRtpSender
      const sender = pc.getSenders().find(s => s.track === videoTrack);
      if (sender) {
        // Use insertable streams or encoded transform
        // for direct encoded frame injection
      }
    },
    error: console.error,
  });
 
  encoder.configure({
    codec: 'avc1.42001f',    // Baseline for max WebRTC compatibility
    width: 640,
    height: 480,
    bitrate: 1_000_000,
    framerate: 30,
    latencyMode: 'realtime',
  });
 
  // Adaptive bitrate based on network conditions
  function adjustBitrate(availableBandwidth: number) {
    encoder.configure({
      codec: 'avc1.42001f',
      width: 640,
      height: 480,
      bitrate: Math.min(availableBandwidth * 0.8, 2_500_000),
      framerate: 30,
      latencyMode: 'realtime',
    });
  }
}
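
Calling configure() on every bandwidth estimate can be wasteful, since reconfiguring an encoder is not free. The sketch below extends the adjustBitrate idea with clamping and a minimum-change threshold so that the encoder is only reconfigured when the target moves meaningfully. The names and the limit values are assumptions for illustration; the 0.8 headroom and 2.5 Mbps ceiling mirror the example above.

```typescript
interface BitrateLimits {
  min: number;       // lowest bitrate worth encoding at, in bits per second
  max: number;       // ceiling, in bits per second
  headroom: number;  // fraction of the estimated bandwidth to use, e.g. 0.8
  minChange: number; // relative change required before reconfiguring, e.g. 0.1
}

// Returns the bitrate the encoder should use next. If the result equals
// `current`, the caller can skip the configure() call.
function nextTargetBitrate(
  current: number,
  availableBandwidth: number,
  limits: BitrateLimits,
): number {
  const desired = Math.min(
    limits.max,
    Math.max(limits.min, availableBandwidth * limits.headroom),
  );
  const relativeChange = Math.abs(desired - current) / current;
  return relativeChange >= limits.minChange ? Math.round(desired) : current;
}
```

A caller would compare the result with the current bitrate and call encoder.configure() with the new value only when they differ.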

Memory Management Best Practices

The Web Codecs API deals with large binary buffers and GPU-backed frame data. Improper memory management leads to memory leaks that degrade performance and can crash the application.
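
To see why unreleased frames add up quickly, it helps to estimate how many bytes one frame occupies. On a real frame, VideoFrame.allocationSize() reports this; the sketch below is a standalone approximation for a few common pixel formats, using a hypothetical helper name and ignoring any row padding the platform may add.

```typescript
type CommonPixelFormat = 'I420' | 'NV12' | 'RGBA' | 'BGRA';

// Approximate number of bytes needed for one tightly packed frame.
function frameByteLength(format: CommonPixelFormat, width: number, height: number): number {
  const lumaSamples = width * height;
  if (format === 'RGBA' || format === 'BGRA') {
    return lumaSamples * 4; // 4 bytes per pixel
  }
  // I420 and NV12: full-resolution luma plus two chroma components
  // subsampled by 2 in each dimension (8-bit samples)
  const chromaSamples = Math.ceil(width / 2) * Math.ceil(height / 2);
  return lumaSamples + 2 * chromaSamples;
}
```

A single 1080p RGBA frame is about 8.3 MB, so holding just 30 unclosed frames (one second of video) pins roughly 250 MB.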

The Golden Rule: Always Close Resources

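Every VideoFrame and AudioData object holds a reference to an underlying media buffer, often in GPU memory, that the garbage collector may not reclaim promptly. These objects must be released explicitly with close(). EncodedVideoChunk and EncodedAudioChunk have no close() method and are garbage collected like ordinary objects: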

// WRONG: memory leak, because frames accumulate and are never closed
const frames: VideoFrame[] = [];
const leakyDecoder = new VideoDecoder({
  output: (frame: VideoFrame) => { frames.push(frame); }, // Never closed
  error: console.error,
});
 
// CORRECT: Close each frame after processing
const decoder = new VideoDecoder({
  output: (frame: VideoFrame) => {
    try {
      processFrame(frame);
    } finally {
      frame.close(); // Always close, even if processing fails
    }
  },
  error: console.error,
});
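
The try/finally pattern can be wrapped in a small helper so it is harder to forget. This is a sketch with a hypothetical name; it works with anything that has a close() method, which includes VideoFrame and AudioData. It is intended for synchronous callbacks only: if the callback returns a promise, the resource is closed before that promise settles.

```typescript
interface Closable {
  close(): void;
}

// Run `fn` with the resource and close it afterwards, even if `fn` throws.
function useAndClose<T extends Closable, R>(resource: T, fn: (resource: T) => R): R {
  try {
    return fn(resource);
  } finally {
    resource.close();
  }
}
```

In a decoder callback this becomes `output: (frame) => useAndClose(frame, processFrame)`.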

Handling Backpressure

When processing cannot keep up with the input rate, implement flow control:

class BackpressureAwareDecoder {
  private decoder: VideoDecoder;
  private pendingFrames = 0;
  private maxPending = 3;
 
  constructor() {
    this.decoder = new VideoDecoder({
      output: (frame) => {
        this.pendingFrames--;
        this.processAndClose(frame);
      },
      error: console.error,
    });
  }
 
  async decode(chunk: EncodedVideoChunk) {
    // Wait if too many frames are pending
    while (this.pendingFrames >= this.maxPending) {
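      // setTimeout also works in workers and background tabs, where
      // requestAnimationFrame may be unavailable or paused
      await new Promise(resolve => setTimeout(resolve, 0));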
    }
    this.pendingFrames++;
    this.decoder.decode(chunk);
  }
 
  private processAndClose(frame: VideoFrame) {
    try {
      // Process the frame...
    } finally {
      frame.close();
    }
  }
}

Browser Compatibility

The Web Codecs API has broad support across modern browsers, though audio support lags behind video support on some platforms:

Feature	Chrome	Edge	Firefox	Safari
`VideoDecoder`	94+	94+	130+ (desktop)	16.4+
`VideoEncoder`	94+	94+	130+ (desktop)	16.4+
`AudioDecoder`	94+	94+	130+ (desktop)	❌
`AudioEncoder`	94+	94+	130+ (desktop)	❌
`VideoFrame`	94+	94+	130+ (desktop)	16.4+
`AudioData`	94+	94+	130+ (desktop)	❌
Hardware acceleration	✅	✅	Partial	✅

Feature Detection and Fallbacks

function isWebCodecsSupported(): boolean {
  return (
    typeof VideoEncoder !== 'undefined' &&
    typeof VideoDecoder !== 'undefined' &&
    typeof VideoFrame !== 'undefined'
  );
}
 
function getMediaProcessingStrategy() {
  if (isWebCodecsSupported()) {
    return 'webcodecs'; // Best: frame-level control with hardware acceleration
  }
 
  if (typeof MediaRecorder !== 'undefined') {
    return 'mediarecorder'; // Good: simple API but less control
  }
 
  if (typeof MediaSource !== 'undefined') {
    return 'mediasource'; // Fallback: MSE for streaming
  }
 
  return 'server'; // Last resort: send to server for processing
}
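
Checking that the classes exist only tells you the API is present, not that a particular codec and resolution will work. `VideoEncoder.isConfigSupported()` answers that second question, and a common pattern is to probe a preference-ordered list of configurations. The sketch below shows one way to do that; `pickFirstSupported` and `SupportResult` are hypothetical names, and the probe is passed in as a parameter so the helper does not depend on browser globals.

```typescript
// Minimal shape of the object that isConfigSupported() resolves with.
interface SupportResult<C> {
  supported?: boolean;
  config?: C;
}

// Hypothetical helper: returns the first candidate the probe reports as supported, or null.
async function pickFirstSupported<C>(
  candidates: C[],
  probe: (config: C) => Promise<SupportResult<C>>,
): Promise<C | null> {
  for (const candidate of candidates) {
    try {
      const result = await probe(candidate);
      if (result.supported) return candidate;
    } catch {
      // A malformed config makes the probe reject; treat it as unsupported and move on.
    }
  }
  return null;
}

// Browser usage (not executed here), trying H.264 first, then VP9, then VP8:
// const base = { width: 1920, height: 1080, bitrate: 5_000_000 };
// const config = await pickFirstSupported(
//   [
//     { ...base, codec: 'avc1.640028' },
//     { ...base, codec: 'vp09.00.40.08' },
//     { ...base, codec: 'vp8' },
//   ],
//   (c) => VideoEncoder.isConfigSupported(c),
// );
// if (config) encoder.configure(config);
```

If the helper returns null, fall through to the next strategy in getMediaProcessingStrategy() rather than calling configure() and waiting for the error callback.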

Performance Considerations

Hardware vs. Software Encoding

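Browsers choose between hardware (GPU or dedicated media engine) and software (CPU) encoding based on the configuration and the hardware available. Hardware encoding is usually much faster and lighter on the CPU, but it may produce slightly lower quality at the same bitrate. The hardwareAcceleration option is a hint rather than a guarantee, and on a machine without a suitable hardware encoder a 'prefer-hardware' configuration may be reported as unsupported. Set the preference when latency matters: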

encoder.configure({
  codec: 'avc1.640028', // High profile, level 4.0; level 3.0 ('64001E') does not cover 1080p
  width: 1920,
  height: 1080,
  bitrate: 5_000_000,
  hardwareAcceleration: 'prefer-hardware', // 'prefer-software' | 'no-preference'
});

OffscreenCanvas for Worker-Based Processing

Move expensive frame processing off the main thread using OffscreenCanvas in a Web Worker:

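// In a Web Worker
let canvas: OffscreenCanvas | null = null;

self.onmessage = (event: MessageEvent) => {
  const { frame, width, height } = event.data as {
    frame: VideoFrame;
    width: number;
    height: number;
  };

  try {
    // Reuse one canvas across frames; reallocate only when the size changes
    if (!canvas || canvas.width !== width || canvas.height !== height) {
      canvas = new OffscreenCanvas(width, height);
    }
    const ctx = canvas.getContext('2d', { willReadFrequently: true })!;
    ctx.drawImage(frame, 0, 0);

    // Apply pixel-level processing (this reads the pixels back to the CPU)
    const imageData = ctx.getImageData(0, 0, width, height);
    const pixels = imageData.data;

    // Example: Convert to grayscale
    for (let i = 0; i < pixels.length; i += 4) {
      const gray = pixels[i] * 0.299 + pixels[i + 1] * 0.587 + pixels[i + 2] * 0.114;
      pixels[i] = pixels[i + 1] = pixels[i + 2] = gray;
    }

    ctx.putImageData(imageData, 0, 0);

    // Return the processed frame, transferring ownership to the receiver
    const outputFrame = new VideoFrame(canvas, { timestamp: frame.timestamp });
    self.postMessage({ frame: outputFrame }, [outputFrame]);
  } finally {
    frame.close(); // Close the input frame even if processing throws
  }
};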
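
The per-pixel loop is the part of this worker most worth testing on its own. One option, sketched here with a hypothetical helper name, is to pull it into a pure function over the RGBA byte array so it can run in a plain unit test without a canvas or a VideoFrame. The weights are the same ones used in the worker above.

```typescript
// Hypothetical pure helper: converts RGBA pixels to grayscale in place.
// Expects 4 bytes per pixel (R, G, B, A), as returned by ImageData.data.
function grayscaleInPlace(pixels: Uint8ClampedArray): void {
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    const gray = pixels[i] * 0.299 + pixels[i + 1] * 0.587 + pixels[i + 2] * 0.114;
    // Uint8ClampedArray rounds and clamps to 0..255 on store.
    pixels[i] = gray;
    pixels[i + 1] = gray;
    pixels[i + 2] = gray;
    // pixels[i + 3] (alpha) is left untouched.
  }
}
```

Inside the worker, the loop then becomes a single call: `grayscaleInPlace(imageData.data)`.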

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Not calling `frame.close()`	GPU memory leak, eventual crash	Always close in a `finally` block
Missing key frames at start	Decoder error, no output	Send a key frame first after `configure()` and after every `flush()`
Using unsupported codec string	`NotSupportedError`	Call `isConfigSupported()` first
Unbounded queue growth	Rising latency and memory use, eventual OOM	Monitor `encodeQueueSize` and `decodeQueueSize`, implement backpressure
Wrong timestamp units	Audio/video desync	Web Codecs uses microseconds, not milliseconds
Ignoring `decoderConfig` in metadata	Cannot reconstruct file	Store `decoderConfig` with encoded chunks
Calling `flush()` too frequently	Degrades encode quality	Only flush at natural boundaries
Mixing pixel formats	Garbled output	Ensure `VideoFrame.format` matches encoder expectations
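
The timestamp pitfall is easy to avoid with two small helpers. This is a sketch with hypothetical function names; the only fact taken from the API is that `timestamp` and `duration` are expressed in microseconds. Deriving each timestamp from the frame index, rather than adding a rounded per-frame duration repeatedly, keeps rounding error from accumulating over a long recording.

```typescript
const MICROSECONDS_PER_SECOND = 1_000_000;

// Hypothetical helper: timestamp in microseconds for the Nth frame at a constant frame rate.
function frameTimestampUs(frameIndex: number, fps: number): number {
  if (fps <= 0) throw new RangeError('fps must be positive');
  return Math.round((frameIndex * MICROSECONDS_PER_SECOND) / fps);
}

// Hypothetical helper: converts a millisecond value (e.g. from performance.now()) to microseconds.
function msToUs(milliseconds: number): number {
  return Math.round(milliseconds * 1000);
}
```

For example, `new VideoFrame(canvas, { timestamp: frameTimestampUs(i, 30) })` stamps frame `i` of a 30 fps sequence.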

Conclusion

The Web Codecs API is a transformative technology that brings native-grade media processing capabilities to the browser. By exposing hardware-accelerated encoders and decoders at the frame level, it enables a new class of web applications — from browser-based video editors and real-time streaming tools to AI-powered computer vision pipelines and custom transcoding services.

Key takeaways from this guide:

Frame-level control — Process individual video frames and audio samples with pixel-perfect precision, enabling effects, analysis, and transformations that were previously impossible in the browser.
Hardware acceleration — The API leverages GPU encoders and decoders automatically, delivering performance comparable to native applications without requiring plugins or server-side processing.
Codec flexibility — Support for H.264, VP9, AV1, Opus, AAC, and more gives you the freedom to choose the right codec for your use case, whether prioritizing compatibility, quality, or compression efficiency.
Memory discipline is mandatory — Every VideoFrame and AudioData must be explicitly closed. Build cleanup into your processing pipelines from the start, using try/finally patterns and backpressure management.
Container formats require external libraries — Web Codecs handles encoding and decoding only. For reading and writing MP4 or WebM files, use dedicated muxing and demuxing libraries like Mediabunny or webm-muxer.
Feature detection is essential — Browser support varies, especially for audio codecs. Always check isConfigSupported() and provide graceful fallbacks for unsupported environments.

The ecosystem around Web Codecs continues to mature rapidly, with growing library support for container formats, improving hardware encoder availability for AV1, and expanding Safari support for audio interfaces. For any application that needs low-level media processing in the browser, Web Codecs is the foundation to build on.

Minh Vo
