I want to get separate video and audio objects from an ffmpeg stream (Python).
To do this, I run ffmpeg like this on my Raspberry Pi:
ffmpeg -f alsa -thread_queue_size 1024 -channels 1 -i hw:2,0 -thread_queue_size 1024 -s 1920x1080 -i /dev/video0 -listen 1 -f matroska -vcodec libx264 -preset veryfast -tune zerolatency http://:8080
From the server side, I connect to the stream like this. I know how to get the sound out of this packet object, but I don't understand how to get a video frame from it. I would like to receive the video stream picture by picture, with the audio separate, so that I can process audio and video in the program.
process = (
    ffmpeg.input("http://192.168.1.78:8080").output(
        '-',
        format='matroska',
        acodec='libvorbis',
        vcodec='libx264'
    ).run_async(pipe_stdout=True, pipe_stderr=True)
)

while process.poll() is None:
    packet = process.stdout.read(4096)
Using Python 3.9 and ffmpeg-python==0.2.0.
P.S. Essentially I need a numpy array of video and separate audio for each packet.
The difficult part is how to pipe 2 different streams, and your approach depends on your OS. (I'm a Windows guy for the most part, so take this with a grain of salt.)
- Use the pass_fds option of subprocess.Popen to create a second "stdout". See this link for an example of how to pass an additional pipe via pass_fds.
- Use 'pipe:3' to make FFmpeg write the 2nd output stream to the extra pipe. For example:

ffmpeg -i input_url -f rawvideo -pix_fmt rgb24 - \
       -f s16le pipe:3

(You may need to specify the codecs and more options, but you get the idea.)
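As a rough sketch of what that looks like from Python (POSIX only, since it relies on pass_fds), assuming the input URL from the question and a known frame size; the extra pipe is addressed by its actual descriptor number rather than hard-coding pipe:3:

import os
import subprocess
import numpy as np

width, height = 1920, 1080       # assumed; must match the incoming video

# Extra pipe whose write end FFmpeg inherits via pass_fds (POSIX only).
audio_rfd, audio_wfd = os.pipe()

proc = subprocess.Popen(
    ['ffmpeg', '-i', 'http://192.168.1.78:8080',
     '-map', '0:v', '-f', 'rawvideo', '-pix_fmt', 'rgb24', 'pipe:1',
     '-map', '0:a', '-f', 's16le', f'pipe:{audio_wfd}'],
    stdout=subprocess.PIPE,
    pass_fds=(audio_wfd,),
)
os.close(audio_wfd)              # parent only needs the read end

# One raw RGB frame from stdout and one chunk of PCM samples from the
# extra pipe. A real program should read both pipes concurrently
# (threads or select) so neither one fills up and stalls FFmpeg.
frame = np.frombuffer(proc.stdout.read(width * height * 3),
                      np.uint8).reshape([height, width, 3])
audio = np.frombuffer(os.read(audio_rfd, 4096), np.int16)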
Since pass_fds is not available on Windows, the alternative there is to mux both streams into a single output that Python can then split apart. AVI can carry rawvideo plus PCM audio:

ffmpeg -i input_url -f avi -c:v rawvideo -pix_fmt rgb24 -c:a pcm_s16le -
I happen to know all this because I'm currently developing this exact mechanism for my ffmpegio library. The release of this feature is still a little way off, but you can check out my test implementation: tests/test_media.py and src/ffmpegio/utils/avi.py. Note that the test is written for a file read instead of a stream read, so test_media.py needs to be modified (use Popen instead of run on line 9, and change BytesIO(out.stdout) to out.stdout on line 30).
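For the AVI route, here is a deliberately minimal sketch of splitting the two streams back out of the AVI bytes on stdout. It assumes the question's URL and frame size, skips straight to the 'movi' list, and ignores everything ffmpegio's avi.py handles properly (headers, indexes, error recovery), so treat it as an illustration of the idea rather than a drop-in parser:

import struct
import subprocess
import numpy as np

width, height = 1920, 1080       # assumed; must match the incoming video

proc = subprocess.Popen(
    ['ffmpeg', '-i', 'http://192.168.1.78:8080', '-f', 'avi',
     '-c:v', 'rawvideo', '-pix_fmt', 'rgb24', '-c:a', 'pcm_s16le', '-'],
    stdout=subprocess.PIPE)

def read_exact(f, n):
    """Read exactly n bytes from the pipe or raise EOFError."""
    buf = b''
    while len(buf) < n:
        chunk = f.read(n - len(buf))
        if not chunk:
            raise EOFError
        buf += chunk
    return buf

# Skip the AVI header until the 'movi' list, which holds the data chunks.
header = b''
while not header.endswith(b'movi'):
    header += read_exact(proc.stdout, 1)

while True:
    fourcc = read_exact(proc.stdout, 4)
    if fourcc == b'LIST':                    # nested list: skip size + list type
        read_exact(proc.stdout, 8)
        continue
    size, = struct.unpack('<I', read_exact(proc.stdout, 4))
    data = read_exact(proc.stdout, size + (size & 1))[:size]  # chunks are padded to even sizes
    if fourcc[2:] in (b'dc', b'db'):         # stream 00: raw RGB video frame
        frame = np.frombuffer(data, np.uint8).reshape([height, width, 3])
    elif fourcc[2:] == b'wb':                # stream 01: 16-bit PCM audio
        samples = np.frombuffer(data, np.int16)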
I managed to get the AVI streaming mechanism to work with my ffmpegio library, so here are a couple of examples if you are interested in giving it a try.
Your code in the OP suggests packet-wise processing; set blocksize=0 for this:
import ffmpegio

with ffmpegio.open("http://192.168.1.78:8080", 'rva', blocksize=0) as stream:
    for st_spec, chunk in stream:
        # st_spec: stream specifier string: 'v:0', 'a:0', etc.
        # chunk: retrieved AVI chunk data as a numpy array,
        #        e.g., [1xheightxwidthx3] for video or [1357x1] for mono audio
        your_process(st_spec, chunk)
If you want ffmpegio to gather data per stream and output whenever one of the streams has X number of blocks, set blocksize to a positive int:
with ffmpegio.open("http://192.168.1.78:8080", 'rva', blocksize=1, ref_stream='v:0') as stream:
    for frames in stream:
        # frames: dict of retrieved raw data of video & audio streams,
        # e.g., {'v:0': [1xheightxwidthx3] ndarray, 'a:0': [1357x1] ndarray}
        your_process(frames)
You can add (dashless) FFmpeg options to the ffmpegio.open argument list as you wish. For input options, append "_in" to the option name.
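For instance (the specific options here are hypothetical choices, just to show the naming convention described above):

# Output options are passed as-is; input options get an "_in" suffix.
with ffmpegio.open("http://192.168.1.78:8080", 'rva', blocksize=1,
                   ref_stream='v:0',
                   pix_fmt='rgb24',               # output option: -pix_fmt rgb24
                   thread_queue_size_in=1024      # input option: -thread_queue_size 1024
                   ) as stream:
    for frames in stream:
        your_process(frames)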
My quick benchmark on my old laptop suggests that it decodes the video & audio streams at about 3x real-time speed, so it should be able to handle a live stream on a modern rig.
Finally, my standard disclaimer: the library is very young, so please report any issues on GitHub and I'll address them ASAP.
It's worth understanding off the top that these are bindings for FFmpeg, which does all the work. It's useful to understand the FFmpeg program itself, in particular the command-line arguments it takes. There is a lot there, but you can learn it a piece at a time according to your actual needs.
Your existing input stream:
process = (
    ffmpeg.input("http://192.168.1.78:8080").output(
        '-',
        format='matroska',
        acodec='libvorbis',
        vcodec='libx264'
    ).run_async(pipe_stdout=True, pipe_stderr=True)
)
Let's compare that to the one in the example partway down the documentation, titled "Process video frame-by-frame using numpy:" (I reformatted it a little to match):
process1 = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='rawvideo',
        pix_fmt='rgb24'
    ).run_async(pipe_stdout=True)
)
It does not matter whether we use a file or a URL for our input source - ffmpeg.input figures that out for us, and at that point we just have an ffmpeg.Stream either way. (Just like we could use either as the -i argument to the command-line ffmpeg program.)
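In other words, both of these give us the same kind of object to chain .output() onto:

stream_from_file = ffmpeg.input('input.mkv')                  # hypothetical local file
stream_from_url = ffmpeg.input('http://192.168.1.78:8080')    # the URL from the question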
The next step is to specify how the stream outputs (i.e., what kind of data we will get when we read from the stdout of the process). The documentation's example uses 'pipe:' to specify writing to stdout; this should be the same as '-'. The documentation's example does not pipe_stderr, but that shouldn't matter, since we do not plan to read from stderr either way.
The key difference is that we specify a format that we know how to handle. 'rawvideo' means exactly what it sounds like, and is suitable for reading the data into a Numpy array. (This is what we would pass as a -f option at the command line.)
The pix_fmt keyword parameter means what it sounds like: the pixel format. Here, rgb24 means 24 bits per pixel, representing red, green and blue components at one byte each. There are a bunch of pre-defined values for this, which you can list with ffmpeg -pix_fmts. And, yes, you would specify this as -pix_fmt at the command line.
Having created such an input stream, we can read from its stdout and create Numpy arrays from each piece of data. We don't want to read data in arbitrary "packet" sizes for this; we want to read exactly as much data as is needed for one frame. That will be the width of the video, times the height, times three (for RGB components at 1 byte each). Which is exactly what we see later in the example:
while True:
    in_bytes = process1.stdout.read(width * height * 3)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 3])
    )
Pretty straightforward: we iteratively read that amount of data, check for the end of the stream, and then create the frame with standard Numpy stuff.
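One thing this snippet takes for granted is that width and height are already known. The ffmpeg-python documentation's examples obtain them with ffmpeg.probe before starting the stream; roughly:

# Probe the input once up front to learn the video dimensions.
probe = ffmpeg.probe(in_filename)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])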
Notice that at no point here did we attempt to separate audio and video - this is because the rawvideo format, as the name implies, won't carry any audio data. We don't need to select the video from the input stream in order to filter out the audio. But we can - it's as simple as shown at the top of the documentation: ffmpeg.input(...).video.output(...). Similarly for audio.
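If you do want to be explicit about it, the selection looks something like this (same output arguments as before):

video_only = (
    ffmpeg
    .input(in_filename)
    .video                          # select only the video stream
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)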
We can process the audio by creating a separate stream. Choose an appropriate audio format, and specify any other needed arguments. So, perhaps something like:
process2 = (
    ffmpeg.input(in_filename).output(
        'pipe:',
        format='s16le',
        ar='44100'    # audio sample rate (the -ar option)
    ).run_async(pipe_stdout=True)
)
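Reading from it then works the same way as for the video, except the natural unit is a block of samples rather than a frame; with s16le each sample is two bytes, so for example:

in_bytes = process2.stdout.read(4096)              # 2048 samples per read if the audio is mono
audio_chunk = np.frombuffer(in_bytes, np.int16)    # 1-D array of 16-bit samples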