Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oboe Async Audio Extraction

I am trying to build a NDK based C++ low latency audio player which will encounter three operations for multiple audios.

  1. Play from assets.
  2. Stream from an online source.
  3. Play from local device storage.

From one of the Oboe samples provided by Google, I added another function to the class NDKExtractor.cpp to extract a URL based audio and render it to audio device while reading from source at the same time.

int32_t NDKExtractor::decode(char *file, uint8_t *targetData, AudioProperties targetProperties) {

    LOGD("Using NDK decoder: %s",file);

    // Extract the audio frames
    AMediaExtractor *extractor = AMediaExtractor_new();
    // using this method instead of AMediaExtractor_setDataSourceFd() as used for asset files in the rythem game example
    media_status_t amresult = AMediaExtractor_setDataSource(extractor, file);


    if (amresult != AMEDIA_OK) {
        LOGE("Error setting extractor data source, err %d", amresult);
        return 0;
    }
    // Specify our desired output format by creating it from our source
    AMediaFormat *format = AMediaExtractor_getTrackFormat(extractor, 0);

    int32_t sampleRate;
    if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_SAMPLE_RATE, &sampleRate)) {
        LOGD("Source sample rate %d", sampleRate);
        if (sampleRate != targetProperties.sampleRate) {
            LOGE("Input (%d) and output (%d) sample rates do not match. "
                 "NDK decoder does not support resampling.",
                 sampleRate,
                 targetProperties.sampleRate);
            return 0;
        }
    } else {
        LOGE("Failed to get sample rate");
        return 0;
    };

    int32_t channelCount;
    if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_CHANNEL_COUNT, &channelCount)) {
        LOGD("Got channel count %d", channelCount);
        if (channelCount != targetProperties.channelCount) {
            LOGE("NDK decoder does not support different "
                 "input (%d) and output (%d) channel counts",
                 channelCount,
                 targetProperties.channelCount);
        }
    } else {
        LOGE("Failed to get channel count");
        return 0;
    }

    const char *formatStr = AMediaFormat_toString(format);
    LOGD("Output format %s", formatStr);

    const char *mimeType;
    if (AMediaFormat_getString(format, AMEDIAFORMAT_KEY_MIME, &mimeType)) {
        LOGD("Got mime type %s", mimeType);
    } else {
        LOGE("Failed to get mime type");
        return 0;
    }

    // Obtain the correct decoder
    AMediaCodec *codec = nullptr;
    AMediaExtractor_selectTrack(extractor, 0);
    codec = AMediaCodec_createDecoderByType(mimeType);
    AMediaCodec_configure(codec, format, nullptr, nullptr, 0);
    AMediaCodec_start(codec);

    // DECODE

    bool isExtracting = true;
    bool isDecoding = true;
    int32_t bytesWritten = 0;

    while (isExtracting || isDecoding) {

        if (isExtracting) {

            // Obtain the index of the next available input buffer
            ssize_t inputIndex = AMediaCodec_dequeueInputBuffer(codec, 2000);
            //LOGV("Got input buffer %d", inputIndex);

            // The input index acts as a status if its negative
            if (inputIndex < 0) {
                if (inputIndex == AMEDIACODEC_INFO_TRY_AGAIN_LATER) {
                    // LOGV("Codec.dequeueInputBuffer try again later");
                } else {
                    LOGE("Codec.dequeueInputBuffer unknown error status");
                }
            } else {

                // Obtain the actual buffer and read the encoded data into it
                size_t inputSize;
                uint8_t *inputBuffer = AMediaCodec_getInputBuffer(codec, inputIndex,
                                                                  &inputSize);
                //LOGV("Sample size is: %d", inputSize);

                ssize_t sampleSize = AMediaExtractor_readSampleData(extractor, inputBuffer,
                                                                    inputSize);
                auto presentationTimeUs = AMediaExtractor_getSampleTime(extractor);

                if (sampleSize > 0) {

                    // Enqueue the encoded data
                    AMediaCodec_queueInputBuffer(codec, inputIndex, 0, sampleSize,
                                                 presentationTimeUs,
                                                 0);
                    AMediaExtractor_advance(extractor);

                } else {
                    LOGD("End of extractor data stream");
                    isExtracting = false;

                    // We need to tell the codec that we've reached the end of the stream
                    AMediaCodec_queueInputBuffer(codec, inputIndex, 0, 0,
                                                 presentationTimeUs,
                                                 AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM);
                }
            }
        }

        if (isDecoding) {
            // Dequeue the decoded data
            AMediaCodecBufferInfo info;
            ssize_t outputIndex = AMediaCodec_dequeueOutputBuffer(codec, &info, 0);

            if (outputIndex >= 0) {

                // Check whether this is set earlier
                if (info.flags & AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM) {
                    LOGD("Reached end of decoding stream");
                    isDecoding = false;
                } else {
                    // Valid index, acquire buffer
                    size_t outputSize;
                    uint8_t *outputBuffer = AMediaCodec_getOutputBuffer(codec, outputIndex,
                                                                        &outputSize);

                    /*LOGV("Got output buffer index %d, buffer size: %d, info size: %d writing to pcm index %d",
                         outputIndex,
                         outputSize,
                         info.size,
                         m_writeIndex);*/

                    // copy the data out of the buffer
                    memcpy(targetData + bytesWritten, outputBuffer, info.size);
                    bytesWritten += info.size;
                    AMediaCodec_releaseOutputBuffer(codec, outputIndex, false);
                }

            } else {

                // The outputIndex doubles as a status return if its value is < 0
                switch (outputIndex) {
                    case AMEDIACODEC_INFO_TRY_AGAIN_LATER:
                        LOGD("dequeueOutputBuffer: try again later");
                        break;
                    case AMEDIACODEC_INFO_OUTPUT_BUFFERS_CHANGED:
                        LOGD("dequeueOutputBuffer: output buffers changed");
                        break;
                    case AMEDIACODEC_INFO_OUTPUT_FORMAT_CHANGED:
                        LOGD("dequeueOutputBuffer: output outputFormat changed");
                        format = AMediaCodec_getOutputFormat(codec);
                        LOGD("outputFormat changed to: %s", AMediaFormat_toString(format));
                        break;
                }
            }
        }
    }

    // Clean up
    AMediaFormat_delete(format);
    AMediaCodec_delete(codec);
    AMediaExtractor_delete(extractor);
    return bytesWritten;
}

Now the problem I am facing is that this code first extracts all the audio data, saves it into a buffer which then becomes part of AFileDataSource which I derived from DataSource class in the same sample. And after it's done extracting the whole file it plays by calling the onAudioReady() for Oboe AudioStreamBuilder. What I need is to play as it streams the chunk of audio buffer.

Optional Query: Also aside from the question it blocks the UI even though I created a foreground service to communicate with the NDK functions to execute this code. Any thoughts on this?

like image 780
Atif Rehman Avatar asked Oct 20 '25 02:10

Atif Rehman


1 Answers

You probably solved this already, but for future readers... You need a FIFO buffer to store the decoded audio. You can use the Oboe's FIFO buffer e.g. oboe::FifoBuffer. You can have a low/high watermark for the buffer and a state machine, so you start decoding when the buffer is almost empty and you stop decoding when it's full (you'll figure out the other states that you need). As a side note, I implemented such player only to find at some later time, that the AAC codec is broken on some devices (Xiaomi and Amazon come to mind), so I had to throw away the AMediaCodec/AMediaExtractor parts and use an AAC library instead.

like image 118
Roman Avatar answered Oct 21 '25 16:10

Roman