I want to get separate video and audio objects from an ffmpeg stream (Python).
To do this, I run ffmpeg like this on my Raspberry Pi:
ffmpeg -f alsa -thread_queue_size 1024 -channels 1 -i hw:2,0 -thread_queue_size 1024 -s 1920x1080 -i /dev/video0 -listen 1 -f matroska -vcodec libx264 -preset veryfast -tune zerolatency http://:8080
From the server side, I connect to the stream like this. I know how to get the sound out of this packet object, but I don't understand how to get a video frame from it. I would like to receive the video stream picture by picture, with the audio separate, so that I can process audio and video in the program.
process = (
    ffmpeg.input("http://192.168.1.78:8080").output(
        '-',
        format='matroska',
        acodec='libvorbis',
        vcodec='libx264'
    ).run_async(pipe_stdout=True, pipe_stderr=True)
)

while process.poll() is None:
    packet = process.stdout.read(4096)
Using Python 3.9 and ffmpeg-python==0.2.0.
P.S. Essentially I need a numpy array of video and separate audio for each packet.
The difficult part is how to pipe 2 different streams, and your approach depends on your OS. (I'm a Windows guy for the most part, so take this with a grain of salt.)
- Use the pass_fds option of subprocess.Popen to create a second "stdout". See this link for an example of how to pass an additional pipe via pass_fds.
- Use 'pipe:3' to make FFmpeg write the 2nd output stream to the extra pipe. For example:

ffmpeg -i input_url -f rawvideo -pix_fmt rgb24 - \
       -f s16le pipe:3

(You may need to specify the codecs and more options, but you get the idea.)
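As a rough sketch of what that looks like from Python (POSIX only, since it relies on pass_fds), assuming the input URL from the question and a known frame size; the extra pipe is addressed by its actual descriptor number rather than hard-coding pipe:3:

import os
import subprocess
import numpy as np

width, height = 1920, 1080       # assumed; must match the incoming video

# Extra pipe whose write end FFmpeg inherits via pass_fds (POSIX only).
audio_rfd, audio_wfd = os.pipe()

proc = subprocess.Popen(
    ['ffmpeg', '-i', 'http://192.168.1.78:8080',
     '-map', '0:v', '-f', 'rawvideo', '-pix_fmt', 'rgb24', 'pipe:1',
     '-map', '0:a', '-f', 's16le', f'pipe:{audio_wfd}'],
    stdout=subprocess.PIPE,
    pass_fds=(audio_wfd,),
)
os.close(audio_wfd)              # parent only needs the read end

# One raw RGB frame from stdout and one chunk of PCM samples from the
# extra pipe. A real program should read both pipes concurrently
# (threads or select) so neither one fills up and stalls FFmpeg.
frame = np.frombuffer(proc.stdout.read(width * height * 3),
                      np.uint8).reshape([height, width, 3])
audio = np.frombuffer(os.read(audio_rfd, 4096), np.int16)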
Since pass_fds is not available on Windows, the alternative there is to mux both streams into a single output that Python can then split apart. AVI can carry rawvideo plus PCM audio:

ffmpeg -i input_url -f avi -c:v rawvideo -pix_fmt rgb24 -c:a pcm_s16le -
I happen to know all this because I'm currently developing this exact mechanism for my ffmpegio library. The release of this feature is still a little way off, but you can check out my test implementation: tests/test_media.py and src/ffmpegio/utils/avi.py. Note that the test is written for a file read instead of a stream read, so test_media.py needs to be modified (use Popen instead of run on line 9, and change BytesIO(out.stdout) to out.stdout on line 30).
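For the AVI route, here is a deliberately minimal sketch of splitting the two streams back out of the AVI bytes on stdout. It assumes the question's URL and frame size, skips straight to the 'movi' list, and ignores everything ffmpegio's avi.py handles properly (headers, indexes, error recovery), so treat it as an illustration of the idea rather than a drop-in parser:

import struct
import subprocess
import numpy as np

width, height = 1920, 1080       # assumed; must match the incoming video

proc = subprocess.Popen(
    ['ffmpeg', '-i', 'http://192.168.1.78:8080', '-f', 'avi',
     '-c:v', 'rawvideo', '-pix_fmt', 'rgb24', '-c:a', 'pcm_s16le', '-'],
    stdout=subprocess.PIPE)

def read_exact(f, n):
    """Read exactly n bytes from the pipe or raise EOFError."""
    buf = b''
    while len(buf) < n:
        chunk = f.read(n - len(buf))
        if not chunk:
            raise EOFError
        buf += chunk
    return buf

# Skip the AVI header until the 'movi' list, which holds the data chunks.
header = b''
while not header.endswith(b'movi'):
    header += read_exact(proc.stdout, 1)

while True:
    fourcc = read_exact(proc.stdout, 4)
    if fourcc == b'LIST':                    # nested list: skip size + list type
        read_exact(proc.stdout, 8)
        continue
    size, = struct.unpack('<I', read_exact(proc.stdout, 4))
    data = read_exact(proc.stdout, size + (size & 1))[:size]  # chunks are padded to even sizes
    if fourcc[2:] in (b'dc', b'db'):         # stream 00: raw RGB video frame
        frame = np.frombuffer(data, np.uint8).reshape([height, width, 3])
    elif fourcc[2:] == b'wb':                # stream 01: 16-bit PCM audio
        samples = np.frombuffer(data, np.int16)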
I managed to get the AVI streaming mechanism to work with my ffmpegio library, so here are a couple of examples if you are interested in giving it a try.
Your code in the OP suggests packet-wise processing; set blocksize=0 for this:
import ffmpegio

with ffmpegio.open("http://192.168.1.78:8080", 'rva', blocksize=0) as stream:
    for st_spec, chunk in stream:
        # st_spec: stream specifier string: 'v:0', 'a:0', etc.
        # chunk: retrieved AVI chunk data as a numpy array,
        #        e.g., [1xheightxwidthx3] for video or [1357x1] for mono audio
        your_process(st_spec, chunk)
If you want ffmpegio to gather data per stream and output whenever one of the streams has X number of blocks, set blocksize to a positive int:
with ffmpegio.open("http://192.168.1.78:8080", 'rva', blocksize=1, ref_stream='v:0') as stream:
    for frames in stream:
        # frames: dict of retrieved raw data of video & audio streams,
        # e.g., {'v:0': [1xheightxwidthx3] ndarray, 'a:0': [1357x1] ndarray}
        your_process(frames)
You can add (dashless) FFmpeg options to the ffmpegio.open argument list as you wish. For input options, append "_in" to the option name.
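For instance (the specific options here are hypothetical choices, just to show the naming convention described above):

# Output options are passed as-is; input options get an "_in" suffix.
with ffmpegio.open("http://192.168.1.78:8080", 'rva', blocksize=1,
                   ref_stream='v:0',
                   pix_fmt='rgb24',               # output option: -pix_fmt rgb24
                   thread_queue_size_in=1024      # input option: -thread_queue_size 1024
                   ) as stream:
    for frames in stream:
        your_process(frames)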
My quick benchmark on my old laptop suggests that it decodes the video & audio streams at about 3x real-time speed, so it should be able to handle a live stream on a modern rig.
Finally, my standard disclaimer: the library is very young, so please report any issues on GitHub and I'll address them ASAP.
It's worth understanding off the top that these are bindings for FFmpeg, which does all the work. It's useful to understand the FFmpeg program itself, in particular the command-line arguments it takes. There is a lot there, but you can learn it a piece at a time according to your actual needs.
Your existing input stream:
process = (
    ffmpeg.input("http://192.168.1.78:8080").output(
        '-',
        format='matroska',
        acodec='libvorbis',
        vcodec='libx264'
    ).run_async(pipe_stdout=True, pipe_stderr=True)
)
Let's compare that to the one in the example partway down the documentation, titled "Process video frame-by-frame using numpy:" (I reformatted it a little to match):
process1 = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='rawvideo',
        pix_fmt='rgb24'
    ).run_async(pipe_stdout=True)
)
It does not matter whether we use a file or a URL for our input source - ffmpeg.input figures that out for us, and at that point we just have an ffmpeg.Stream either way. (Just like we could use either as the -i argument to the command-line ffmpeg program.)
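In other words, both of these give us the same kind of object to chain .output() onto:

stream_from_file = ffmpeg.input('input.mkv')                  # hypothetical local file
stream_from_url = ffmpeg.input('http://192.168.1.78:8080')    # the URL from the question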
The next step is to specify how the stream outputs (i.e., what kind of data we will get when we read from the stdout of the process). The documentation's example uses 'pipe:' to specify writing to stdout; this should be the same as '-'. The documentation's example does not pipe_stderr, but that shouldn't matter, since we do not plan to read from stderr either way.
The key difference is that we specify a format that we know how to handle. 'rawvideo' means exactly what it sounds like, and is suitable for reading the data into a Numpy array. (This is what we would pass as a -f option at the command line.)
The pix_fmt keyword parameter means what it sounds like: the pixel format. Here, rgb24 means 24 bits per pixel, representing red, green and blue components at one byte each. There are a bunch of pre-defined values for this, which you can list with ffmpeg -pix_fmts. And, yes, you would specify this as -pix_fmt at the command line.
Having created such an input stream, we can read from its stdout and create Numpy arrays from each piece of data. We don't want to read data in arbitrary "packet" sizes for this; we want to read exactly as much data as is needed for one frame. That will be the width of the video, times the height, times three (for RGB components at 1 byte each). Which is exactly what we see later in the example:
while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )
Pretty straightforward: we iteratively read that amount of data, check for the end of the stream, and then create the frame with standard Numpy stuff.
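One thing this snippet takes for granted is that width and height are already known. The ffmpeg-python documentation's examples obtain them with ffmpeg.probe before starting the stream; roughly:

# Probe the input once up front to learn the video dimensions.
probe = ffmpeg.probe(in_filename)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])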
Notice that at no point here did we attempt to separate audio and video - this is because the rawvideo format, as the name implies, won't carry any audio data. We don't need to select the video from the input stream in order to filter out the audio. But we can - it's as simple as shown at the top of the documentation: ffmpeg.input(...).video.output(...). Similarly for audio.
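If you do want to be explicit about it, the selection looks something like this (same output arguments as before):

video_only = (
    ffmpeg
    .input(in_filename)
    .video                          # select only the video stream
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)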
We can process the audio by creating a separate stream. Choose an appropriate audio format, and specify any other needed arguments. So, perhaps something like:
process2 = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='s16le',
        ar='44100'    # audio sample rate (the -ar option)
    ).run_async(pipe_stdout=True)
)
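Reading from it then works the same way as for the video, except the natural unit is a block of samples rather than a frame; with s16le each sample is two bytes, so for example:

in_bytes = process2.stdout.read(4096)              # 2048 samples per read if the audio is mono
audio_chunk = np.frombuffer(in_bytes, np.int16)    # 1-D array of 16-bit samples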