How to Extract Audio from a Video File in Your Browser (No Upload, No App)

Zoom calls saved as MP4, lecture recordings, DJ sets, raw interview footage: here's how to strip the audio track client-side using the Web Audio API, with specific numbers on what you can expect.

The problem I kept running into

I record most of my team calls on Zoom. The result is an MP4 file, usually around 180-220 MB for a one-hour call. The video itself is useless: a static grid of small faces that nobody wants to watch. What I actually want is a 40-50 MB MP3 I can drop into Whisper or Descript for transcription.

The naïve approach is to upload the MP4 to a cloud converter, wait for the upload, wait for processing, download the result. That worked, but I started wondering exactly what happened to those 200 MB recordings during the wait. Some of those calls had salary discussions and product roadmaps in them. I stopped uploading after I noticed one converter's URL was still live and had no expiry notice.

The better approach: do it all in the browser, where the file never leaves your device.

How a video-to-audio extractor actually works in the browser

A video file is a container. MP4 (the MPEG-4 container format), WebM (Google's open-source container), and MOV (QuickTime) are all wrappers that typically hold two separate streams:

  • A video stream, encoded as H.264, H.265 (HEVC), VP8, VP9, or AV1 depending on how the file was created.
  • An audio stream, encoded as AAC (most common for MP4), Opus (WebM), or PCM (uncompressed, rare in video files).

Extracting audio means: read the container, identify the audio track, discard the video track, re-encode the audio into a standalone format (MP3 or WAV), and write the output file.

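In the browser, this is done via the Web Audio API and the browser's built-in media decoder. The MediaRecorder API handles the final re-encoding step; the output formats it offers differ between browsers, and MediaRecorder.isTypeSupported() reports which ones are available. Chrome 88+, Firefox 85+, and Safari 14+ all support this pipeline natively.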
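
To make the decode-then-re-encode idea concrete, here is a minimal sketch of the WAV half of that pipeline in TypeScript. It is an illustration under stated assumptions, not the converter's actual source: `encodeWav` and the `PcmAudio` interface are names made up for this example, and the output is plain 16-bit PCM. `PcmAudio` is shaped so that a browser `AudioBuffer` (what `AudioContext.decodeAudioData()` resolves to) fits it.

```typescript
// Minimal shape of decoded audio. A browser AudioBuffer satisfies this
// structurally; the interface name itself is hypothetical.
interface PcmAudio {
  numberOfChannels: number;
  sampleRate: number;
  length: number; // sample frames per channel
  getChannelData(channel: number): Float32Array;
}

// Encode decoded audio as a 16-bit PCM WAV file (44-byte header + samples).
function encodeWav(audio: PcmAudio): ArrayBuffer {
  const bytesPerSample = 2;
  const blockAlign = audio.numberOfChannels * bytesPerSample;
  const dataSize = audio.length * blockAlign;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, audio.numberOfChannels, true);
  view.setUint32(24, audio.sampleRate, true);
  view.setUint32(28, audio.sampleRate * blockAlign, true); // bytes per second
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Interleave the channels and convert floats in [-1, 1] to signed 16-bit.
  const channels: Float32Array[] = [];
  for (let ch = 0; ch < audio.numberOfChannels; ch++) {
    channels.push(audio.getChannelData(ch));
  }
  let offset = 44;
  for (let i = 0; i < audio.length; i++) {
    for (const samples of channels) {
      const sample = Math.max(-1, Math.min(1, samples[i] ?? 0));
      const scaled = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
      view.setInt16(offset, Math.round(scaled), true);
      offset += bytesPerSample;
    }
  }
  return buffer;
}
```

In a browser, the input would typically come from something like `await new AudioContext().decodeAudioData(await file.arrayBuffer())`, where the browser is able to decode the file, and the returned bytes can be wrapped in a `Blob` of type `audio/wav` for download. Note that `decodeAudioData` resamples to the context's sample rate and holds the whole decoded track in memory.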

What to expect: real numbers from my test files

I ran six Zoom recordings through the browser-based video-to-audio converter to see what the results looked like. Here's the data:

Source file             Duration   Source size   MP3 output   WAV output
Team standup            22 min     84 MB         12.3 MB      118 MB
Client demo             47 min     196 MB        26.4 MB      252 MB
Lecture recording       63 min     241 MB        35.1 MB      338 MB
WebM screen recording   18 min     31 MB         10.1 MB      96 MB

Key takeaway: MP3 runs about 85-90% smaller than the source MP4. WAV is uncompressed, so it ends up larger than the source video: the source stores both its video and audio streams compressed, while WAV stores the audio raw. Only choose WAV if you need to do further editing in a DAW and want to avoid generational quality loss.
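
The WAV sizes follow from simple arithmetic, sketched here in TypeScript with hypothetical function names. The default parameters (16-bit, mono, 44.1 kHz) are an assumption that happens to land within about 2% of the WAV column above; the converter's real settings are not stated in this post. The MP3 helper assumes a constant bitrate, which real encoders do not always use.

```typescript
// Uncompressed PCM WAV: each second costs sampleRate * channels * bytes per sample.
function estimateWavBytes(
  durationSec: number,
  sampleRate = 44100,
  channels = 1,
  bitsPerSample = 16,
): number {
  const headerBytes = 44;
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return headerBytes + Math.round(durationSec * bytesPerSecond);
}

// Constant-bitrate MP3: size depends only on the bitrate and the duration.
function estimateMp3Bytes(durationSec: number, kbps: number): number {
  return Math.round((durationSec * kbps * 1000) / 8);
}

// The 22-minute standup from the table, under the assumed WAV parameters.
const standupWavMB = estimateWavBytes(22 * 60) / 1e6;
console.log(standupWavMB.toFixed(1)); // prints 116.4
```

Doubling the channel count or the bit depth doubles the WAV estimate, which is why stereo 16-bit exports of the same calls would come out at roughly twice the sizes shown.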

MP3 vs WAV: the actual decision criteria

Every explainer I've read says "MP3 for sharing, WAV for editing," which is technically correct but too simple to be actionable. Here's how I actually decide:

Choose MP3 when: The file is going to a transcription service (Whisper, Descript, Otter.ai). These tools accept MP3 and file size directly affects upload speed and API cost. A 47-minute call at 26 MB is a lot easier to work with than 252 MB. 128 kbps is fine for spoken word. 192 kbps if the recording has significant background music.

Choose WAV when: You're doing post-production in a proper DAW (Adobe Audition, Logic, Reaper). The noise-reduction and EQ passes that make a podcast sound professional compound quality loss on a lossy source. Start lossless, apply your edits, then export the final version as MP3. Starting from MP3 and going through two more lossy re-encodes will audibly degrade the output.

The VFR problem with smartphone videos

This one took me a while to notice. Smartphone cameras record in Variable Frame Rate (VFR): the frame rate adapts to motion and lighting. This is fine for watching the video, but it creates a subtle problem if you plan to re-sync the extracted audio back to a different video track.

The audio stream is linear time. The video stream in a VFR file has varying timestamps. When you extract the audio and later try to sync it to a constant-frame-rate (CFR) track, they drift. The drift is usually imperceptible in the first minute but can be a half-second off by the end of a 20-minute clip.
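
One simplified way to picture the drift: an editor treats every frame as lasting exactly 1/30 s, while the phone actually delivered slightly fewer frames than that over the recording. The TypeScript sketch below uses that model, which is an assumption made for illustration (real VFR timing is more irregular), and `driftSeconds` is a hypothetical helper name.

```typescript
// Drift between the audio and a video track that is played back as if it
// were constant frame rate. Positive result: the video ends before the audio.
function driftSeconds(
  frameCount: number,
  assumedFps: number,
  audioDurationSec: number,
): number {
  const videoDurationSec = frameCount / assumedFps;
  return audioDurationSec - videoDurationSec;
}

// A 20-minute clip whose frames averaged 29.9875 fps instead of a nominal 30.
const clipAudioSec = 20 * 60;
const clipFrames = Math.round(clipAudioSec * 29.9875); // 35985 frames
console.log(driftSeconds(clipFrames, 30, clipAudioSec)); // prints 0.5
```

Under this model, an average shortfall of barely 0.04% in frame rate is enough to produce the half-second offset described above.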

Fix: if you plan to re-sync, convert the source video to CFR first using HandBrake (free, open-source) before extracting the audio. HandBrake's “Constant Framerate” option with your target frame rate (usually 30 fps) handles this in one pass. (The neighboring “Peak Framerate” option only caps the rate and leaves the output variable.)

If you're just sending the audio to a transcription service and never re-syncing, you can ignore this entirely.

Browser limitations: what the client-side approach can't do

I believe in being honest about limitations. Here's where the browser-based approach falls short compared to FFmpeg or a cloud service:

  • No bitrate control. The browser's MediaRecorder picks a bitrate automatically. For MP3, Chrome typically produces 128 kbps stereo. You can't set 320 kbps in the browser without a WASM-compiled encoder. If bitrate matters (it usually doesn't for speech), FFmpeg is the right tool.
  • No channel mixing. If your source has a 5.1 or 7.1 audio track (common for professionally produced video), the browser will downmix to stereo automatically. Most Zoom recordings are stereo or mono already, so this is rarely an issue.
  • Processing speed caps out at your device's CPU. A 2-hour 4K video with a huge audio track can take a noticeable amount of time in the browser. Cloud processing would be faster here, but at the cost of uploading 1+ GB files.
  • Safari has limited WebM support. Safari can decode H.264 MP4 and MOV reliably, but WebM (VP8/VP9) support was patchy until Safari 16. If you're using Safari on macOS Monterey or older, stick to MP4 and MOV inputs.
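
Rather than relying on version numbers, a page can ask the browser what it is able to decode. The TypeScript sketch below wraps that check in a pure function so it can run anywhere; `pickPlayable`, `Probe`, and the stand-in probe are made up for illustration. In a browser, the probe would be `HTMLMediaElement.canPlayType()`, which answers "probably", "maybe", or an empty string.

```typescript
// A probe answers "", "maybe" or "probably" for a MIME type string.
type Probe = (mimeType: string) => string;

// Return the MIME types the probe reports as decodable, strongest answers first.
function pickPlayable(candidates: string[], probe: Probe): string[] {
  const rank = (answer: string): number =>
    answer === "probably" ? 2 : answer === "maybe" ? 1 : 0;
  return candidates
    .map((mime) => ({ mime, score: rank(probe(mime)) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.mime);
}

// Stand-in probe imitating a browser without WebM support (illustrative only).
const noWebmProbe: Probe = (mime) => {
  if (mime.indexOf("video/mp4") === 0) return "probably";
  if (mime.indexOf("video/quicktime") === 0) return "maybe";
  return "";
};

const inputFormats = [
  'video/webm; codecs="vp9, opus"',
  "video/quicktime",
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"',
];
console.log(pickPlayable(inputFormats, noWebmProbe));
```

On a real page the call might look like `pickPlayable(inputFormats, (m) => document.createElement("video").canPlayType(m))`, with unsupported inputs rejected before any processing starts.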

Step by step: the actual process

  1. Open the Video to Audio Converter. No account needed.
  2. Drag your MP4, WebM, or MOV file into the upload zone. The file loads into the browser's memory; nothing is sent to a server. You can verify this by opening your browser's Network tab (F12 → Network) and confirming there are no outgoing requests to external hosts after the page has loaded.
  3. Choose MP3 or WAV based on the criteria above.
  4. Click Convert. Processing time scales roughly linearly with file size. A 200 MB MP4 typically takes 15-30 seconds on a mid-range laptop.
  5. Click Download. The browser writes the file to your Downloads folder directly.

What I actually use this for

My regular workflow: Zoom recording exported as MP4 → extract as MP3 in the browser → upload to Whisper (or Otter.ai for live transcription) → paste transcript into Claude for meeting notes. The whole pipeline from raw recording to structured notes is about 8-10 minutes, most of which is the transcription waiting time.

I also use it to pull audio from training videos before going on a long flight. The audio-only file is 10× smaller, which matters when I'm pre-caching content on a device with limited storage.

Related tools you might need next

  • Audio format converter: convert the resulting MP3 to WAV, OGG, FLAC, or M4A if your downstream tool needs a specific format.
  • AI Audio Enhancer: uses AI (not just DSP) to denoise and improve clarity. Useful if the Zoom recording has significant background noise or echo.
  • Free Video Editor: trim the video to the section you need before extracting, if you only want a specific clip.

Written by Achraf A., founder of TheFreeAITools, built in Morocco. Last tested on Chrome 124, Firefox 125, and Safari 17.4 on macOS Sonoma.
