> TorchCodec now has a dedicated WavDecoder for decoding WAV files. It bypasses FFmpeg entirely and reads WAV data directly, resulting in significantly faster decoding.

I've been working in this area recently and am very keen to use this given the claimed performance benefits, but I went through all your links and didn't see any actual performance numbers. Do you have any to share?

IMO a fair performance benchmark for those not tied to the full PyTorch stack would have both ffmpeg and the WAV already loaded into memory before execution. Given that torchcodec relies on the user-supplied ffmpeg installation, I suspect ffmpeg is not already in memory in that setup, at least not by default.
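To make the "already in memory" part concrete, here is a minimal sketch of the kind of harness I mean. It uses Python's stdlib `wave` module purely as a stand-in decoder, since I don't know the WavDecoder API; the helper names (`make_wav_bytes`, `decode_wav`, `bench`) are made up for illustration. The point is only that the WAV bytes are in RAM before the clock starts and that warm-up runs are excluded.

```python
import io
import time
import wave


def make_wav_bytes(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit PCM WAV of silence entirely in memory."""
    n_frames = int(seconds * sample_rate)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def decode_wav(data: bytes) -> bytes:
    """Stand-in decoder: parse the WAV header and return the raw PCM bytes."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.readframes(r.getnframes())


def bench(decode, data: bytes, repeats: int = 50, warmup: int = 5) -> float:
    """Mean seconds per decode, with the input already resident in memory."""
    for _ in range(warmup):  # warm caches and lazy imports before timing
        decode(data)
    start = time.perf_counter()
    for _ in range(repeats):
        decode(data)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    wav = make_wav_bytes(seconds=10.0)
    print(f"{bench(decode_wav, wav) * 1e6:.1f} us per decode")
```

Any other decoder can be dropped in by passing a different callable that takes `bytes`; for an ffmpeg subprocess you would additionally want the binary itself warm before timing.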

I understand why Meta wouldn't want to do this (you would inevitably be distributing exploitable security vulnerabilities in pytorch, because ffmpeg will probably always have them), but I've been statically linking ffmpeg and keeping the binary in memory while still using separate processes for different batches of audio, with I/O over UDS (Unix domain sockets) between the parent and ffmpeg. The parent then does VAD on the PCM on CPU before any further inference. My implementation for static linking is similar to the pattern in https://github.com/amenzhinsky/go-memexec#static-binary. It would be interesting to see whether this is possible in the pytorch/python ecosystem, or maybe it's already been done.

Hi, in the past I have used NVVideoCodec and VPI for GPU-accelerated decoding and processing. What would torchcodec's appeal be here? VPI already provides a zero-copy interface with PyTorch.

Thanks!
