
How to input audio data into a deep learning algorithm?

I'm very new to deep learning, and I'm aiming to use a GAN (Generative Adversarial Network) to recognize emotional speech. I've only seen images used as inputs to most deep learning algorithms, such as GANs, but I'm curious how audio data can be fed in, other than by using images of spectrograms as the input. Also, I'd appreciate it if you could explain it in layman's terms.

silvermaze asked Mar 19 '26 07:03



1 Answer

Audio data can be represented as numpy arrays, but before moving on to that you must understand what audio really is. If you think about what audio looks like, it is nothing but a wave, where the amplitude of the audio changes with respect to time.


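Assuming that our audio is represented in the time domain, we can extract the amplitude values at regular intervals, say every half-second (an arbitrary choice; real audio is sampled thousands of times per second). This is called sampling, and the number of values taken per second is the sampling rate. Converting the data into the frequency domain can reduce the amount of computation required, since frequency-domain features (for example, a mel spectrogram) are usually more compact than the raw samples.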
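To make the idea of sampling concrete, here is a minimal sketch using only numpy. The tone frequency, duration, and sampling rate are made-up values for illustration; they are not taken from any particular audio file.

```python
import numpy as np

sampling_rate = 8000   # samples per second (assumed value for illustration)
duration = 1.0         # length of the clip in seconds (assumed)
frequency = 440.0      # pitch of a hypothetical test tone, in Hz

# Time stamps (in seconds) at which the continuous wave is measured
t = np.arange(int(sampling_rate * duration)) / sampling_rate

# Amplitude of the wave at each time stamp: this array *is* the audio data
samples = np.sin(2 * np.pi * frequency * t)

print(samples.shape)   # (8000,)
```

One second of audio at 8000 samples per second becomes an array of 8000 numbers, which is the kind of array a model can take as input.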

Now, let's load the data. We'll use a library called librosa, which can be installed using pip.

import librosa
data, sampling_rate = librosa.load('audio.wav')

Now, you have both the data and the sampling rate. We can plot the waveform now.

import librosa.display
librosa.display.waveshow(data, sr=sampling_rate)  # called waveplot before librosa 0.10

Now you have the audio data in the form of a numpy array. You can study the features of the data and extract the ones you find interesting to train your models.
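As one possible example of such a feature, the sketch below computes a simple magnitude spectrogram with numpy alone. The function name `magnitude_spectrogram`, the frame and hop sizes, and the synthetic tone standing in for the loaded `data` array are all assumptions made for illustration. librosa also offers ready-made features such as `librosa.feature.mfcc`, which may be a better fit in practice.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_length=512, hop_length=256):
    """Split a 1-D signal into overlapping frames and return |FFT| of each frame."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    window = np.hanning(frame_length)  # tapers frame edges to reduce leakage
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # One row per frame, one column per frequency bin
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for the array returned by librosa.load (hypothetical 1 kHz tone)
sampling_rate = 8000
t = np.arange(sampling_rate) / sampling_rate
data = np.sin(2 * np.pi * 1000.0 * t)

features = magnitude_spectrogram(data)
print(features.shape)   # (30, 257)
```

The result is a 2-D array (time frames by frequency bins), so it can be fed to a network much like a single-channel image, without saving it as a picture first.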

Ayush Chaurasia answered Mar 21 '26 23:03



