I need to convert a .rtp file (recorded with RTP proxy) to a .wav file. If anyone knows how this can be done, please share your solution.
Thanks in advance :)
A little late to the party perhaps, but I recently had the same problem and thought I should share my solution here in case someone else has this question. I also used RTP-proxy to capture audio streams, which were saved as two .rtp files, one for each channel: the .o. file holds the output of the party initiating the call (caller) and the .a. file that of the party receiving the call (callee).
Solution 1. RTP-proxy ships with a built-in tool called "extractaudio" that does the WAV conversion for you. The documentation is lacking, to say the least, but you can use it from the command line as follows:
extractaudio -F wav -B /path/to/rtp /path/of/outfile.wav
This converts one RTP file at a time to a WAV file. By default the tool encodes the created WAV files with GSM. If this is undesired, you can pass -D pcm_16 as an extra argument to switch the encoding to 16-bit linear PCM, which is a much better format for retaining audio quality. I extracted WAV files this way programmatically from Python, using the subprocess module to make the command-line calls.
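A minimal sketch of driving extractaudio from Python with the subprocess module, using only the flags shown above (-F wav, -B and, optionally, -D pcm_16). The function names build_extractaudio_cmd and rtp_to_wav are hypothetical, and the sketch assumes extractaudio is on the PATH and accepts the options before the positional output file, as in the command line above:

```python
import subprocess


def build_extractaudio_cmd(rtp_path, wav_path, pcm16=True):
    """Build the argv list for extractaudio (hypothetical helper).

    Mirrors: extractaudio -F wav [-D pcm_16] -B <rtp file> <out.wav>
    """
    cmd = ["extractaudio", "-F", "wav"]
    if pcm16:
        # switch from the default GSM encoding to 16-bit linear PCM
        cmd += ["-D", "pcm_16"]
    cmd += ["-B", rtp_path, wav_path]
    return cmd


def rtp_to_wav(rtp_path, wav_path, pcm16=True):
    """Run extractaudio; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(build_extractaudio_cmd(rtp_path, wav_path, pcm16))
```

For example, rtp_to_wav("my_file.o.rtp", "my_file.o.wav") would convert the caller's side. Passing an argument list instead of one concatenated string keeps paths that contain spaces intact.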
Solution 2. You can extract the raw RTP data directly and convert it to a WAV file using third-party software like SoX or FFmpeg. This solution requires SoX, FFmpeg and tshark as dependencies. You could do without tshark if you opened the RTP file yourself and extracted the UDP data, but tshark makes this easy.
Here is my code for it (Python 2.7.9):
import os
import subprocess
import shlex
import binascii

FILENAME = "my_file"
WORKING_DIR = os.path.dirname(os.path.realpath(__file__))
IN_FILE_O = os.path.join(WORKING_DIR, "%s.o.rtp" % FILENAME)
IN_FILE_A = os.path.join(WORKING_DIR, "%s.a.rtp" % FILENAME)
CODEC = "G722"  # must match the codec actually used in the recorded call
RTP_HEADER_LEN = 12  # fixed RTP header size in bytes (assumes no CSRCs or extension)

# Decoder commands: the first %s is the raw payload file, the second the output WAV.
conversion_list = {
    "PCMU": "sox -t ul -r 8000 -c 1 %s %s",
    "GSM": "sox -t gsm -r 8000 -c 1 %s %s",
    "PCMA": "sox -t al -r 8000 -c 1 %s %s",
    "G722": "ffmpeg -f g722 -i %s -acodec pcm_s16le -ar 16000 -ac 1 %s",
    "G729": "ffmpeg -f g729 -i %s -acodec pcm_s16le -ar 8000 -ac 1 %s",
}


def run(args):
    """Run a command and return its stdout as text."""
    return subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True).communicate()[0]


def extract_payload(rtp_file, payload_file):
    """Write the audio payload of every packet in rtp_file to payload_file."""
    # tshark prints one hex string per packet: the UDP data (RTP header + audio)
    output = run(["tshark", "-n", "-r", rtp_file, "-T", "fields", "-e", "data"])
    with open(payload_file, "wb") as new_codec:  # "wb" replaces any old file
        for line in output.split("\n"):
            line = line.strip()
            # drop the RTP header: two hex characters per byte
            new_codec.write(binascii.unhexlify(line[2 * RTP_HEADER_LEN:]))


def convert(payload_file, wav_file, codec):
    """Decode a raw payload file to WAV with the matching SoX/FFmpeg command."""
    if os.path.exists(wav_file):
        os.remove(wav_file)  # ffmpeg would otherwise ask before overwriting
    print("Creating %s with %s" % (wav_file, payload_file))
    run(shlex.split(conversion_list[codec] % (payload_file, wav_file)))


if __name__ == "__main__":
    for leg, rtp_file in (("o", IN_FILE_O), ("a", IN_FILE_A)):
        payload_file = os.path.join(WORKING_DIR, "payload_%s.%s" % (leg, CODEC.lower()))
        wav_file = os.path.join(WORKING_DIR, "%s.%s.wav" % (FILENAME, leg))
        extract_payload(rtp_file, payload_file)
        convert(payload_file, wav_file, CODEC)
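If SoX is not installed, a raw PCMU payload file extracted as above can also be decoded to WAV in pure Python. This is a sketch of the standard G.711 mu-law expansion plus the standard-library wave module (Python 3); the function names are hypothetical, and it assumes 8000 Hz mono, which is what the PCMU entry in conversion_list passes to SoX:

```python
import io
import struct
import wave


def ulaw_to_linear(u):
    """Decode one G.711 mu-law byte (0-255) to a signed 16-bit sample."""
    u = ~u & 0xFF                                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = ((mantissa << 3) + 0x84) << exponent   # 0x84 is the mu-law bias
    return (0x84 - magnitude) if sign else (magnitude - 0x84)


# Lookup table so each byte is decoded with a single index operation.
ULAW_TABLE = [ulaw_to_linear(i) for i in range(256)]


def decode_ulaw(data):
    """Decode a bytes object of mu-law samples to a list of ints."""
    return [ULAW_TABLE[b] for b in bytearray(data)]


def ulaw_to_wav_bytes(data, rate=8000):
    """Return a mono 16-bit PCM WAV file (as bytes) for raw mu-law input."""
    samples = decode_ulaw(data)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # PCMU is sampled at 8000 Hz
        # native byte order; wave stores the frames little-endian
        w.writeframes(struct.pack("=%dh" % len(samples), *samples))
    return buf.getvalue()


def ulaw_file_to_wav(payload_path, wav_path, rate=8000):
    """Convert a raw mu-law payload file to a WAV file."""
    with open(payload_path, "rb") as f:
        data = f.read()
    with open(wav_path, "wb") as f:
        f.write(ulaw_to_wav_bytes(data, rate))
```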
I have hardcoded G722 as the input encoding in my solution, but it should work with any input encoding, given the correct SoX/FFmpeg command for it. I've added a few different encodings in a predefined dict. The drawback of this solution is that you have to know the encoding of the call recorded in the RTP file. I tried to find a parameter in the RTP file equivalent to the rtp.p_type field found in PCAP files, which identifies the codec used, but didn't have any luck. I'm not familiar enough with RTP files, though, so it might be present in the data somewhere.
Another drawback is that the produced audio files can sometimes be shorter than the original audio. I'm assuming this is due to silence suppression, in which case it could be fixed by inserting silence yourself wherever the timestamps indicate that audio was removed (not transmitted).
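The silence-insertion idea can be sketched for PCMU, where RFC 3551 sets the RTP clock to 8000 Hz and each payload byte is one sample, so a timestamp jump of N ticks means N missing bytes. This is a hypothetical helper, not part of the script above: it takes whole RTP packets (header included, i.e. before the 12-byte header the script skips is dropped), reads the 32-bit timestamp from header bytes 4-7, assumes a fixed 12-byte header with no padding, fills gaps with 0xFF (mu-law zero amplitude), and simply appends reordered packets:

```python
import struct

RTP_HEADER_LEN = 12     # assumes no CSRCs, header extension or padding
ULAW_SILENCE = b"\xff"  # mu-law code for a zero-amplitude sample
TS_MOD = 2 ** 32        # RTP timestamps are 32-bit and wrap around


def rtp_timestamp(packet):
    """Read the 32-bit big-endian timestamp from bytes 4-7 of an RTP header."""
    return struct.unpack("!I", packet[4:8])[0]


def fill_silence_pcmu(packets):
    """Concatenate PCMU payloads, inserting silence where timestamps jump."""
    out = bytearray()
    expected = None  # timestamp the next packet should carry if nothing was skipped
    for packet in packets:
        ts = rtp_timestamp(packet)
        payload = packet[RTP_HEADER_LEN:]
        if expected is not None:
            gap = (ts - expected) % TS_MOD
            if 0 < gap < TS_MOD // 2:  # forward jump: audio was not transmitted
                out += ULAW_SILENCE * gap
        out += payload
        expected = (ts + len(payload)) % TS_MOD
    return bytes(out)
```

Other codecs would need their own silence representation, so for those it may be simpler to pad the decoded WAV instead.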
A great way to view information about RTP files is the tshark command:
tshark -n -r /path/to/file.rtp
Hope this helps someone!
EDIT: I found another question about detecting the encoding within an RTP file.
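For reference, RFC 3550 puts a 7-bit payload type field in the second byte of every RTP header, which is the value tshark reports as rtp.p_type; it sits inside the first 12 bytes that the script above discards. A minimal Python 3 sketch (function names hypothetical) that parses the header, including the CSRC, extension and padding cases a fixed 12-byte skip ignores, and maps the static payload types from RFC 3551 onto keys of conversion_list. Dynamic types (96-127) are negotiated in SDP and cannot be resolved from the packet alone:

```python
import struct

# Static payload types from RFC 3551 that match keys in conversion_list.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 3: "GSM", 8: "PCMA", 9: "G722", 18: "G729"}


def parse_rtp(packet):
    """Split one RTP packet into (payload_type, sequence, timestamp, payload)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)            # skip CSRC identifiers (CC field)
    if b0 & 0x10:                            # X bit: header extension follows
        ext_words = struct.unpack("!H", packet[offset + 2:offset + 4])[0]
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:                            # P bit: last byte is the padding count
        end -= packet[-1]
    return b1 & 0x7F, seq, ts, packet[offset:end]   # mask off the marker bit


def guess_codec(packet):
    """Return a conversion_list key, or None for a dynamic/unknown payload type."""
    return STATIC_PAYLOAD_TYPES.get(parse_rtp(packet)[0])
```

Feeding it binascii.unhexlify(line) for one line of the tshark output (before the header is stripped) should tell you which conversion command to use, as long as the call used a static payload type.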