Save VideoFrames and AudioFrames merged in one joint .mp4 file #1071

@anna-charlotte

Overview

I have several VideoFrames and AudioFrames that I want to save together in a single .mp4 file. Right now I can save the VideoFrames to one file and the AudioFrames to a separate file, but I haven't found a way to store them in one joint .mp4 file that contains both audio and video.

Expected behavior

I want to store an audio stream and a video stream to the same .mp4 file.

Actual behavior

No error is thrown, but the resulting .mp4 file cannot be opened and appears to be corrupt.

Investigation

This results in a flawed .mp4 file:

import av
import numpy as np

# 'rgb24' video frames expect uint8 data, 'fltp' audio frames expect float32 data
video_tensor = np.zeros((250, 176, 320, 3), dtype=np.uint8)  # (frames, height, width, channels)
audio_tensor = np.zeros((930, 2, 1024), dtype=np.float32)  # (frames, channels, samples)

with av.open('save_to.mp4', mode='w', format='mp4') as container:
    stream_audio = container.add_stream('mp3', rate=48000)  # maybe the 'mp3' needs to be changed?
    stream_video = container.add_stream('h264', rate=24)
    stream_video.height = video_tensor.shape[-3]
    stream_video.width = video_tensor.shape[-2]

    # video encoding: all video packets are muxed before any audio packet
    for vid in video_tensor:
        frame = av.VideoFrame.from_ndarray(vid, format='rgb24')
        for packet in stream_video.encode(frame):
            container.mux(packet)
    # flush the video encoder
    for packet in stream_video.encode(None):
        container.mux(packet)

    # audio encoding
    for i, audio in enumerate(audio_tensor):
        frame = av.AudioFrame.from_ndarray(array=audio, format='fltp', layout='stereo')
        frame.rate = 48000
        frame.pts = 1024 * i  # pts counted in samples, 1024 samples per frame
        for packet in stream_audio.encode(frame):
            container.mux(packet)

    # flush the audio encoder
    for packet in stream_audio.encode(None):
        container.mux(packet)
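
For reference, here is a rough sketch of how the timestamps and durations in the snippet above work out. It assumes the audio pts is counted in samples (a time base of 1/48000) and that the video plays at 24 fps. The helper names (`audio_pts`, `video_duration`, `audio_duration`) are hypothetical and only for illustration; they are not part of PyAV.

```python
from fractions import Fraction

# Shapes and rates copied from the snippet above.
NUM_VIDEO_FRAMES = 250
VIDEO_FPS = 24
NUM_AUDIO_FRAMES = 930
SAMPLES_PER_AUDIO_FRAME = 1024
AUDIO_SAMPLE_RATE = 48000


def audio_pts(frame_index: int) -> int:
    """pts of an audio frame in samples (assumes a time base of 1/48000)."""
    return frame_index * SAMPLES_PER_AUDIO_FRAME


def video_duration() -> Fraction:
    """Total video length in seconds."""
    return Fraction(NUM_VIDEO_FRAMES, VIDEO_FPS)


def audio_duration() -> Fraction:
    """Total audio length in seconds."""
    return Fraction(NUM_AUDIO_FRAMES * SAMPLES_PER_AUDIO_FRAME, AUDIO_SAMPLE_RATE)


if __name__ == "__main__":
    print("video seconds:", float(video_duration()))
    print("audio seconds:", float(audio_duration()))
```

With these shapes the two streams are not the same length: the video is about 10.4 seconds and the audio is 19.84 seconds, so in a joint file the audio would run past the end of the video.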
        

Storing the audio frames and video frames separately works fine.
For VideoFrames to .mp4 (without audio):

import av
import numpy as np

# 'rgb24' video frames expect uint8 data
video_tensor = np.zeros((250, 176, 320, 3), dtype=np.uint8)  # (frames, height, width, channels)

with av.open('save_to.mp4', mode='w', format='mp4') as container:
    stream_video = container.add_stream('h264', rate=24)
    stream_video.height = video_tensor.shape[-3]
    stream_video.width = video_tensor.shape[-2]

    # video encoding
    for vid in video_tensor:
        frame = av.VideoFrame.from_ndarray(vid, format='rgb24')
        for packet in stream_video.encode(frame):
            container.mux(packet)
    # flush the video encoder
    for packet in stream_video.encode(None):
        container.mux(packet)

For AudioFrames to .mp3:

import av
import numpy as np

# 'fltp' audio frames expect float32 data
audio_tensor = np.zeros((930, 2, 1024), dtype=np.float32)  # (frames, channels, samples)

with av.open('save_to.mp3', mode='w', format='mp3') as container:
    stream_audio = container.add_stream('mp3', rate=48000)

    # audio encoding
    for i, audio in enumerate(audio_tensor):
        frame = av.AudioFrame.from_ndarray(array=audio, format='fltp', layout='stereo')
        frame.rate = 48000
        frame.pts = 1024 * i  # pts counted in samples, 1024 samples per frame
        for packet in stream_audio.encode(frame):
            container.mux(packet)

    # flush the audio encoder
    for packet in stream_audio.encode(None):
        container.mux(packet)

Research

I have done the following:
