
How to convert a .rtp file (recorded using RTP Proxy, codec G711) to a .wav file

I need to convert a .rtp file (recorded using RTP proxy) to a .wav file. If anyone knows how this can be done, please share your solution.

Thanks in advance:)

Prakash GiBBs asked Dec 06 '25 19:12



1 Answer

A little late to the party perhaps, but I recently had the same problem and thought I should share my solution here in case someone else has this question. I also used RTP-proxy to capture audio streams, which were saved as two .rtp files, one for each channel: .o. is the output of the party initiating the call (caller) and .a. is the output of the party receiving the call (callee).

Solution 1. RTP-proxy has a built-in module called "extractaudio" which does the WAV conversion for you. The documentation is lacking, to say the least, but you can use it from the command line as follows:

extractaudio -F wav -B /path/to/rtp /path/of/outfile.wav

This will convert one RTP file at a time to a WAV file. By default the module encodes the WAV files it creates with GSM encoding. If this is undesired, you can pass -D pcm_16 as an extra argument to switch the encoding to Linear PCM 16, which is a much better format for retaining audio quality. I extracted WAV files this way programmatically from Python, using subprocesses to make the command-line calls.
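For reference, a subprocess call of that kind could look roughly like the sketch below. The helper names build_extractaudio_cmd and convert are made up for illustration, and placing -D pcm_16 before -B is an assumption on my part; only the flags themselves come from the command line shown above.

```python
import subprocess


def build_extractaudio_cmd(rtp_path, wav_path, pcm16=True):
    """Build the argv list for the extractaudio call shown above.

    The flag order is an assumption; the flags mirror the answer's command.
    """
    cmd = ["extractaudio", "-F", "wav"]
    if pcm16:
        # Switch from the default GSM encoding to Linear PCM 16.
        cmd += ["-D", "pcm_16"]
    cmd += ["-B", rtp_path, wav_path]
    return cmd


def convert(rtp_path, wav_path, pcm16=True):
    """Run extractaudio; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(build_extractaudio_cmd(rtp_path, wav_path, pcm16))
```

Calling convert("/path/to/rtp", "/path/of/outfile.wav") would then convert one RTP file, so a call with two channels needs two invocations.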

Solution 2. You can extract the raw RTP data directly and convert it to a WAV file using third-party software like SoX or FFmpeg. This solution requires SoX, FFmpeg and tshark as dependencies. You could do without tshark if you opened the RTP file yourself and extracted the UDP data, but tshark makes it easy.

Here is my code for it (Python 2.7.9):

import os
import subprocess
import shlex
import binascii

FILENAME = "my_file"
WORKING_DIR = os.path.dirname(os.path.realpath(__file__))
IN_FILE_O = "%s/%s.o.rtp" % (WORKING_DIR, FILENAME)
IN_FILE_A = "%s/%s.a.rtp" % (WORKING_DIR, FILENAME)

conversion_list = {"PCMU" : "sox -t ul -r 8000 -c 1 %s %s",
                  "GSM" : "sox -t gsm -r 8000 -c 1 %s %s" ,
                  "PCMA" : "sox -t al -r 8000 -c 1 %s %s",
                  "G722" : "ffmpeg -f g722 -i %s -acodec pcm_s16le -ar 16000 -ac 1 %s",
                  "G729": "ffmpeg -f g729 -i %s -acodec pcm_s16le -ar 8000 -ac 1 %s"
}

if __name__ == "__main__":
  args_o = "tshark -n -r " + IN_FILE_O + " -T fields -e data"
  args_a = "tshark -n -r " + IN_FILE_A + " -T fields -e data"
  f_o = WORKING_DIR + "/" + "payload_o.g722"
  f_a = WORKING_DIR + "/" + "payload_a.g722"
  payload_o = subprocess.Popen(shlex.split(args_o), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()[0]
  payload_a = subprocess.Popen(shlex.split(args_a), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()[0]

  if os.path.exists(f_o):
    os.remove(f_o)
  if os.path.exists(f_a):
    os.remove(f_a)  

  # Each tshark line is one packet as hex text, two characters per byte.
  # The first 12 bytes (indices 0-11) are assumed to be the fixed RTP header,
  # so only bytes from index 12 onwards are written out as codec payload.
  with open(f_o, "ab") as new_codec:
    payload = payload_o.split("\n")
    for line in payload:
      line = line.rstrip()
      for index, (op, code) in enumerate(zip(line[0::2], line[1::2])):
        if index > 11:
          new_codec.write(binascii.unhexlify(op + code))

  # Same extraction for the callee stream.
  with open(f_a, "ab") as new_codec:
    payload = payload_a.split("\n")
    for line in payload:
      line = line.rstrip()
      for index, (op, code) in enumerate(zip(line[0::2], line[1::2])):
        if index > 11:
          new_codec.write(binascii.unhexlify(op + code))

  owav = WORKING_DIR + "/" + "%s.o.wav" % FILENAME
  awav = WORKING_DIR + "/" + "%s.a.wav" % FILENAME

  if os.path.exists(owav):
    os.remove(owav)
  if os.path.exists(awav):
    os.remove(awav)  

  print("Creating %s with %s" % (owav, f_o))
  print("Creating %s with %s" % (awav, f_a))
  subprocess.Popen(shlex.split(conversion_list["G722"] % (f_o, owav)), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()[0]
  subprocess.Popen(shlex.split(conversion_list["G722"] % (f_a, awav)), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()[0]

I have G722 hardcoded as the input encoding in my solution, but it should work with any input encoding provided you have the correct SoX/FFmpeg command for it. I've added a few different encodings in a predefined dict. The drawback of this solution is that you have to know the encoding of the call recorded in the RTP file. I tried to find a parameter in the RTP file equivalent to the rtp.p_type found in PCAP files, which identifies the codec used, but didn't have any luck. I'm not familiar enough with RTP files though, so it might be present in the data somewhere.

Another drawback is that the produced audio files can sometimes be shorter than the original audio. I'm assuming this is due to silence suppression, in which case it could be fixed by inserting silence yourself at the places where the timestamps indicate that silence has been removed (not transmitted).
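A rough sketch of that silence-insertion idea, which I have not tested on real recordings. It assumes PCMU (G.711 u-law) data, where one payload byte is one sample and 0xFF decodes to silence, and it assumes that each tshark data line starts with the 12-byte fixed RTP header, as the script above does. The names pad_silence and MAX_GAP are made up for illustration.

```python
import binascii
import struct

# Hypothetical safety limit: ignore gaps longer than 60 s at 8000 Hz,
# which are more likely reordered packets than suppressed silence.
MAX_GAP = 8000 * 60


def pad_silence(hex_lines, silence_byte=b"\xff"):
    """Rebuild a PCMU payload stream, filling timestamp gaps with silence."""
    out = bytearray()
    expected_ts = None
    for line in hex_lines:
        line = line.strip()
        if len(line) <= 24:
            continue  # empty line or a packet without payload
        packet = binascii.unhexlify(line)
        # Bytes 4-7 of the fixed RTP header hold the 32-bit timestamp.
        timestamp = struct.unpack("!I", packet[4:8])[0]
        payload = packet[12:]
        if expected_ts is not None:
            # Timestamps wrap around at 2**32, hence the mask.
            gap = (timestamp - expected_ts) & 0xFFFFFFFF
            if 0 < gap < MAX_GAP:
                out += silence_byte * gap
        out += payload
        # For PCMU the timestamp advances by one tick per payload byte.
        expected_ts = (timestamp + len(payload)) & 0xFFFFFFFF
    return bytes(out)
```

The returned bytes could be written to the payload file in place of the raw concatenation before handing it to SoX. Other codecs would need a different silence value and a different byte-to-timestamp ratio.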

A great way to view information about RTP files is with the tshark command:

tshark -n -r /path/to/file.rtp 

Hope it will help someone!

EDIT: I found another question about detecting the encoding within an RTP file.
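One possible approach, which is not necessarily what that question describes: the RTP fixed header carries a 7-bit payload type in its second byte, and the static payload type numbers cover the codecs in my dict. The sketch below assumes each tshark data line starts with the RTP header, as my script does; guess_codec and STATIC_PAYLOAD_TYPES are names I made up. Dynamic payload types (96-127) are negotiated per call and cannot be resolved this way.

```python
import binascii

# Static RTP payload type numbers for the codecs used in conversion_list.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 3: "GSM", 8: "PCMA", 9: "G722", 18: "G729"}


def guess_codec(hex_line):
    """Return the codec name for one hex-encoded packet, or None if unknown."""
    packet = bytearray(binascii.unhexlify(hex_line.strip()))
    # A valid packet has at least the 12-byte header and RTP version 2
    # in the top two bits of the first byte.
    if len(packet) < 12 or (packet[0] >> 6) != 2:
        return None
    # Second byte: marker bit (top bit) followed by the 7-bit payload type.
    payload_type = packet[1] & 0x7F
    return STATIC_PAYLOAD_TYPES.get(payload_type)
```

Running guess_codec on the first non-empty line of the tshark output could then pick the key to use in conversion_list instead of the hardcoded "G722".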

havokles answered Dec 08 '25 22:12
