
What audio formats are supported by Azure Cognitive Services' Speech Service (STT)?

Bearing in mind that the Microsoft/Azure Cognitive Services' "Speech Service" is currently going through a rationalisation exercise, as far as I can tell from looking at

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#speech-to-text

https://learn.microsoft.com/en-us/azure/cognitive-services/speech/home

only .wav binaries are acceptable, with anything else giving the response:

{"Message":"Unsupported audio format"}

Is there any other way to discover the acceptable audio formats/encodings/etc., or is this it?

[Bonus points for tips on preprocessing arbitrary audio formats such as .m4a with pydub in Python so that they meet the bar. This currently works for .mp3 but not for .m4a.]

Thanks!

jtlz2 asked Oct 21 '25 15:10


1 Answer

The currently supported format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Support for more formats and codecs will be added in the future.
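For the pydub part of the question, here is a minimal sketch rather than a tested recipe. It assumes pydub is installed and that an ffmpeg build able to decode the input (for example .m4a) is on the PATH. The helper names `is_supported_wav` and `convert_to_supported_wav` are made up for illustration. The first uses only the standard-library `wave` module to check a file against the constraints above; the second resamples arbitrary audio to 16 kHz, 16-bit, mono PCM WAV.

```python
import wave

# Sample rates named in the answer above (Hz).
SUPPORTED_RATES = {8000, 16000}


def is_supported_wav(path):
    """Return True if `path` is a mono, 16-bit PCM WAV at 8 kHz or 16 kHz."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnchannels() == 1          # mono
                and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
                and wav.getframerate() in SUPPORTED_RATES
            )
    except (wave.Error, EOFError):
        # Not a WAV file, not PCM-encoded, or truncated.
        return False


def convert_to_supported_wav(src, dst, rate=16000):
    """Convert any audio file ffmpeg can decode to 16-bit mono PCM WAV.

    Assumption: pydub and ffmpeg are available. The import is local so the
    checker above still works without them.
    """
    from pydub import AudioSegment  # third-party dependency

    audio = AudioSegment.from_file(src)  # format is left for ffmpeg to detect
    audio = audio.set_frame_rate(rate).set_channels(1).set_sample_width(2)
    audio.export(dst, format="wav")
    return dst
```

A call such as `convert_to_supported_wav("clip.m4a", "clip.wav")` followed by `is_supported_wav("clip.wav")` would be the intended usage. If .m4a input still fails while .mp3 works, the installed ffmpeg may lack an AAC decoder, which is worth checking before blaming pydub.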

Zhou answered Oct 23 '25 05:10