How to input audio data into deep learning algorithm?

Question

I'm very new to deep learning, and I'm aiming to use a GAN (Generative Adversarial Network) to recognize emotional speech. I've mostly seen images used as inputs to deep learning algorithms such as GANs, but I'm curious how audio data can be used as an input, other than by feeding in images of spectrograms. Also, I'd appreciate it if you could explain it in layman's terms.

Ayush Chaurasia · Accepted Answer

Audio data can be represented as NumPy arrays, but before getting to that you need to understand what audio really is. If you think about what audio looks like, it is nothing but a wave: a signal whose amplitude changes over time.
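To make this concrete, here is a minimal sketch that builds a short synthetic tone directly as a NumPy array. The 440 Hz pitch, 16,000 Hz sampling rate and 0.5 amplitude are arbitrary illustrative values, not anything required by the answer; a mono recording loaded from disk is the same kind of object: a 1-D array with one amplitude value per time step.

```python
import numpy as np

# Illustrative values (assumptions, not from the original answer).
sampling_rate = 16_000  # amplitude measurements per second
duration = 1.0          # seconds
frequency = 440.0       # pitch of the tone in Hz

# Time, in seconds, at which each amplitude value is measured.
t = np.arange(int(sampling_rate * duration)) / sampling_rate

# The "audio" is simply one amplitude value per time step.
audio = (0.5 * np.sin(2 * np.pi * frequency * t)).astype(np.float32)

print(audio.shape)  # (16000,)
print(audio.dtype)  # float32
```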


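Assuming our audio is represented in the time domain, we measure (sample) its amplitude at regular intervals, for example every 1/16,000 of a second. The number of samples taken per second is called the sampling rate; speech datasets are often sampled at 16,000 Hz and CD-quality music at 44,100 Hz. Converting the data into the frequency domain (for example with a Fourier transform over short windows) describes each window by how strongly each frequency is present. This gives the model far fewer time steps to process and can reduce the amount of computation required.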
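As a rough illustration of moving from the time domain to the frequency domain, the sketch below takes one second of a 440 Hz test tone and applies NumPy's real FFT. The tone and the 16,000 Hz rate are assumed values for illustration; the point is that the strongest frequency component comes out at the tone's pitch.

```python
import numpy as np

sampling_rate = 16_000                        # samples per second (assumed)
t = np.arange(sampling_rate) / sampling_rate  # 1 second of sample times
signal = np.sin(2 * np.pi * 440.0 * t)        # 440 Hz test tone

# Magnitude of each frequency component (real FFT keeps only non-negative frequencies).
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sampling_rate)

peak_hz = freqs[np.argmax(spectrum)]
print(len(signal), "time-domain samples ->", len(spectrum), "frequency bins")
print("strongest frequency:", peak_hz, "Hz")
```

With one second of audio the frequency bins are 1 Hz apart, so the peak lands exactly on 440 Hz.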

Now, let's load the data. We'll use a library called librosa, which can be installed using pip (pip install librosa).

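import librosa
data, sampling_rate = librosa.load('audio.wav')  # mono float32 array, resampled to 22,050 Hz by default (pass sr=None to keep the file's own rate)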
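Under the hood, loading boils down to decoding the file's stored integers into floating-point amplitudes. The sketch below shows roughly what that means using only Python's standard wave module and NumPy; it assumes a 16-bit mono PCM WAV and is not librosa's actual implementation (librosa additionally resamples to 22,050 Hz by default and mixes stereo down to mono). The helper name read_wav_as_float is hypothetical, and a tiny WAV is written into memory first so the example runs without a file on disk.

```python
import io
import wave

import numpy as np


def read_wav_as_float(file_obj):
    """Hypothetical helper: decode 16-bit mono PCM WAV into float32 in [-1, 1) plus its rate."""
    with wave.open(file_obj, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sampling_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2")  # little-endian int16, as stored in WAV
    return samples.astype(np.float32) / 32768.0, sampling_rate


# Build a 0.1 s, 440 Hz tone as an in-memory WAV file (illustrative values).
sr = 16_000
t = np.arange(sr // 10) / sr
tone = (0.5 * np.sin(2 * np.pi * 440.0 * t) * 32767).astype("<i2")
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(sr)
    wf.writeframes(tone.tobytes())
buffer.seek(0)

data, sampling_rate = read_wav_as_float(buffer)
print(data.shape, data.dtype, sampling_rate)  # (1600,) float32 16000
```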

Now you have both the data and the sampling rate, and we can plot the waveform.

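import librosa.display  # plotting helpers; requires matplotlib
librosa.display.waveshow(data, sr=sampling_rate)  # older versions used waveplot, which was removed in librosa 0.10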

At this point you have the audio data as a NumPy array. From here you can study the data and extract the features you find useful (for example, spectrograms or MFCCs) to train your models.
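One possible way to get from "a NumPy array" to "model input" is sketched below with NumPy only. A network needs every example to have the same shape, so each clip is first padded or trimmed to a fixed length. It can then be fed either as a raw waveform (for a 1-D convolutional network) or as a log-magnitude spectrogram computed directly from the samples, which is a 2-D array of numbers rather than a saved picture. The 1-second clip length, 512-sample frames, 256-sample hop and the random clips standing in for real recordings are all assumptions; in practice librosa.feature.melspectrogram or librosa.feature.mfcc compute similar features.

```python
import numpy as np


def fix_length(signal, target_len):
    """Zero-pad or truncate a 1-D signal so every clip has the same number of samples."""
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))


def log_spectrogram(signal, n_fft=512, hop=256):
    """Log-magnitude STFT: frame the signal, apply a Hann window, FFT each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log1p(magnitude).T                      # (frequency bins, time frames)


sr = 16_000
target_len = sr  # every clip becomes exactly 1 second (assumed)
rng = np.random.default_rng(0)
# Hypothetical clips of different lengths standing in for loaded speech recordings.
clips = [rng.standard_normal(n).astype(np.float32) * 0.1 for n in (12_000, 16_000, 20_000)]

waveform_batch = np.stack([fix_length(c, target_len) for c in clips])
spectrogram_batch = np.stack([log_spectrogram(fix_length(c, target_len)) for c in clips])

print(waveform_batch.shape)     # (3, 16000)   -> input for a 1-D convolutional network
print(spectrogram_batch.shape)  # (3, 257, 61) -> input for a 2-D convolutional network
```

The spectrogram route also illustrates the computation point made earlier: 16,000 raw time steps become 61 frames per clip.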

Tags: classification, deep-learning, audio-processing, speech, generative-adversarial-network

silvermaze
