Series index
- Audio Codecs with Core Audio (Part 1): Audio Decoding
- Audio Codecs with Core Audio (Part 2): Audio Encoding

Preface
In "Audio Codecs with Core Audio (Part 1): Audio Decoding" we covered Core Audio's common data structures and basic concepts; if you haven't read that yet, it's worth doing so first.
Core Audio's way of representing audio data is not as simple as telling you "hi, this is an mp3 file". There is a substantial difference between the file format and the format of the audio data inside the file.
Many format details can look rather arbitrary, but Audio File Services provides an interesting function, `AudioFileGetGlobalInfo`, which returns information not about an individual file but about Core Audio's handling of audio files in general. Here is the information `AudioFileGetGlobalInfo` can query:
```cpp
kAudioFileGlobalInfo_ReadableTypes
kAudioFileGlobalInfo_WritableTypes
kAudioFileGlobalInfo_FileTypeName
kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat
kAudioFileGlobalInfo_AvailableFormatIDs
kAudioFileGlobalInfo_AllExtensions
kAudioFileGlobalInfo_AllHFSTypeCodes
kAudioFileGlobalInfo_AllUTIs
kAudioFileGlobalInfo_AllMIMETypes
kAudioFileGlobalInfo_ExtensionsForType
kAudioFileGlobalInfo_HFSTypeCodesForType
kAudioFileGlobalInfo_UTIsForType
kAudioFileGlobalInfo_MIMETypesForType
kAudioFileGlobalInfo_TypesForMIMEType
kAudioFileGlobalInfo_TypesForUTI
kAudioFileGlobalInfo_TypesForHFSTypeCode
kAudioFileGlobalInfo_TypesForExtension
```
For example, given a file type (`AudioFileTypeID`), `kAudioFileGlobalInfo_AvailableFormatIDs` returns a set of format IDs describing the data formats that the file type supports.
Here is an example of using `AudioFileGetGlobalInfo` to retrieve the information we want. Suppose we want to know which formats are supported when the file type is `kAudioFileMPEG4Type`; we can do the following:
```cpp
#include <AudioToolbox/AudioToolbox.h>
#include <cstdlib>
#include <iostream>
using namespace std;

OSStatus err;
UInt32 file_type = kAudioFileMPEG4Type;
UInt32 size;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(UInt32),
                                 &file_type,
                                 &size);
auto* formats = (UInt32*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                             sizeof(UInt32),
                             &file_type,
                             &size,
                             formats);
int format_cnt = size / sizeof(UInt32);
for (int i = 0; i < format_cnt; ++i) {
    // Swap to big-endian so the four bytes read as characters in order
    UInt32 format4cc = CFSwapInt32HostToBig(formats[i]);
    // Build a 4-char string explicitly; the raw pointer is not NUL-terminated
    cout << i << ": mFormatId: " << string((char*)&format4cc, 4) << endl;
}
free(formats);
```
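The `CFSwapInt32HostToBig` call turns the FourCC into bytes that read in order on a little-endian host. As a self-contained sketch (no CoreFoundation required; `fourCCToString` is a hypothetical helper, not a Core Audio API), the same conversion can be written by hand:

```cpp
#include <cstdint>
#include <string>

// Portable FourCC-to-string sketch: reads the four bytes most-significant
// first (what CFSwapInt32HostToBig arranges on a little-endian host) and
// masks out non-printable bytes so odd codes stay readable.
std::string fourCCToString(uint32_t code) {
    std::string s(4, '?');
    for (int i = 0; i < 4; ++i) {
        char c = char((code >> (8 * (3 - i))) & 0xFF);
        if (c >= 32 && c < 127) s[i] = c;
    }
    return s;
}
```

For instance, `kAudioFormatLinearPCM` is the FourCC `'lpcm'` (0x6C70636D), which this helper renders as "lpcm".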
The code prints more than a dozen entries; `kAudioFileMPEG4Type` supports a rich set of formats:

```
0: mFormatId: .mp1
1: mFormatId: .mp2
2: mFormatId: .mp3
3: mFormatId: aac
4: mFormatId: aace
5: mFormatId: aacf
6: mFormatId: aacg
7: mFormatId: aach
8: mFormatId: aac
9: mFormatId: aacp
10: mFormatId: ac-3
11: mFormatId: alac
12: mFormatId: ec-3
13: mFormatId: usac
```
What about `kAudioFileAIFFType`? It supports exactly one format:

```
0: mFormatId: lpcm
```
As another example, given a file type (`AudioFileTypeID`) and a format ID, `kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat` returns a set of `AudioStreamBasicDescription` structures with the following fields filled in: mFormatID, mFormatFlags, and mBitsPerChannel. This is very helpful when writing files; after all, you don't want to dig through endless documentation for this information.
```cpp
AudioFileTypeAndFormatID file_type_and_format_id;
file_type_and_format_id.mFileType = kAudioFileAIFFType;
file_type_and_format_id.mFormatID = kAudioFormatLinearPCM;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                                 sizeof(file_type_and_format_id),
                                 &file_type_and_format_id,
                                 &size);
auto* asbds = (AudioStreamBasicDescription*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                             sizeof(file_type_and_format_id),
                             &file_type_and_format_id,
                             &size,
                             asbds);
int asbd_count = size / sizeof(AudioStreamBasicDescription);
for (int i = 0; i < asbd_count; ++i) {
    UInt32 format4cc = CFSwapInt32HostToBig(asbds[i].mFormatID);
    cout << i << ": mFormatId: " << string((char*)&format4cc, 4)
         << ", mFormatFlags: " << asbds[i].mFormatFlags
         << ", mChannelsPerFrame: " << asbds[i].mChannelsPerFrame
         << ", mBytesPerFrame: " << asbds[i].mBytesPerFrame
         << ", mBitsPerChannel: " << asbds[i].mBitsPerChannel << endl;
}
free(asbds);
```
In the code above, the file type is `kAudioFileAIFFType` and the data format is `kAudioFormatLinearPCM`; the output is:

```
0: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 8
1: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 16
2: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 24
3: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 32
```
The output shows that 8-, 16-, 24-, and 32-bit data are supported, and that `mFormatFlags = 14`, i.e. `0x2 + 0x4 + 0x8`, which is `kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked`.
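As a quick sanity check of that decomposition, here is a small sketch with the flag bit values copied from CoreAudioTypes.h (redeclared locally so it compiles without the Core Audio headers; `isBigEndianSignedPacked` is a hypothetical helper):

```cpp
#include <cstdint>

// mFormatFlags bit values, as declared in CoreAudioTypes.h, reproduced here
// so this sketch compiles without the Core Audio headers.
constexpr uint32_t kAudioFormatFlagIsFloat         = 1u << 0;  // 0x1
constexpr uint32_t kAudioFormatFlagIsBigEndian     = 1u << 1;  // 0x2
constexpr uint32_t kAudioFormatFlagIsSignedInteger = 1u << 2;  // 0x4
constexpr uint32_t kAudioFormatFlagIsPacked        = 1u << 3;  // 0x8

// 14 == 0x2 + 0x4 + 0x8: big-endian, signed integer, packed -- and not float.
constexpr bool isBigEndianSignedPacked(uint32_t flags) {
    return (flags & kAudioFormatFlagIsBigEndian) &&
           (flags & kAudioFormatFlagIsSignedInteger) &&
           (flags & kAudioFormatFlagIsPacked) &&
           !(flags & kAudioFormatFlagIsFloat);
}
```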
Audio encoding
In the preface we showed how to retrieve information with `AudioFileGetGlobalInfo`; this matters a great deal for audio encoding, because encoding follows these steps:
- Decide on the file type. What kind of file do you want: wav, aiff, or aac?
- Decide on the format ID. Different file types support different data formats; this can be determined with `AudioFileGetGlobalInfo` and `kAudioFileGlobalInfo_AvailableFormatIDs`.
- Choose suitable `mFormatFlags` and `mBitsPerChannel`. Picking the right flags and bit depth ensures the file can be opened without errors; these can be determined with `AudioFileGetGlobalInfo` and `kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat`.
Show me the code
Enough talk; here is the code, with the detailed explanation afterwards.
```cpp
#include <AudioToolbox/AudioToolbox.h>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

int main(int argc, char* argv[])
{
    AudioFileTypeID file_type = kAudioFileMPEG4Type;
    int o_channels = 2;
    double o_sr = 44100;
    AudioStreamBasicDescription output_asbd;
    memset(&output_asbd, 0, sizeof(output_asbd));
    output_asbd.mSampleRate = o_sr;
    output_asbd.mChannelsPerFrame = o_channels;
    output_asbd.mFormatID = kAudioFormatMPEG4AAC;
    UInt32 size = sizeof(output_asbd);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);

    // open output file
    // createCFURLWithStdString is a small helper defined in the accompanying project
    CFURLRef output_url = createCFURLWithStdString("sin440.aac");
    ExtAudioFileRef output_file;
    OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                                &output_asbd, nullptr,
                                                kAudioFileFlags_EraseFile,
                                                &output_file);
    assert(status == noErr);

    double i_sr = 44100;
    int i_channels = 2;
    AudioStreamBasicDescription input_asbd;
    FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
    status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                     sizeof(input_asbd), &input_asbd);
    assert(status == noErr);

    const int num_frame_out_per_block = 1024;
    AudioBufferList outputData;
    outputData.mNumberBuffers = 1;
    outputData.mBuffers[0].mNumberChannels = i_channels;
    outputData.mBuffers[0].mDataByteSize = sizeof(float) * num_frame_out_per_block * i_channels;
    std::vector<float> buffer(num_frame_out_per_block * i_channels);
    outputData.mBuffers[0].mData = buffer.data();

    float t = 0;
    float tincr = 2 * M_PI * 440.0f / i_sr;
    for (int i = 0; i < 200; ++i) {
        for (int j = 0; j < num_frame_out_per_block; ++j) {
            buffer[j * i_channels] = sin(t);
            buffer[j * i_channels + 1] = buffer[j * i_channels];
            t += tincr;
        }
        // write one block of audio
        status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
        assert(status == noErr);
    }

    ExtAudioFileDispose(output_file);
    CFRelease(output_url);
    return 0;
}
```
First, we create an `AudioStreamBasicDescription` and set the sample rate, channel count, and data format (the file type here is `kAudioFileMPEG4Type`, so we use `kAudioFormatMPEG4AAC`). Everything else is zeroed out, and we then call `AudioFormatGetProperty` to fill in the remaining fields. For `kAudioFormatLinearPCM`, however, you should use `FillOutASBDForLPCM` instead.
```cpp
AudioFileTypeID file_type = kAudioFileMPEG4Type;
AudioStreamBasicDescription output_asbd;
memset(&output_asbd, 0, sizeof(output_asbd));
output_asbd.mSampleRate = o_sr;
output_asbd.mChannelsPerFrame = o_channels;
output_asbd.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(output_asbd);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
```
Next, we create and open the file with `ExtAudioFileCreateWithURL`; `kAudioFileFlags_EraseFile` means any existing file will be overwritten.
```cpp
CFURLRef output_url = createCFURLWithStdString("sin440.aac");
ExtAudioFileRef output_file;
OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                            &output_asbd, nullptr,
                                            kAudioFileFlags_EraseFile,
                                            &output_file);
```
The next step is crucial: set the client format with `ExtAudioFileSetProperty`, which declares the format of the audio data that will be fed in during encoding. In this example, the input is two-channel interleaved float.
```cpp
AudioStreamBasicDescription input_asbd;
FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                 sizeof(input_asbd), &input_asbd);
```
Then we create an `AudioBufferList` to hold the audio data. Since the data is interleaved float, `mNumberBuffers = 1`.
```cpp
const int num_frame_out_per_block = 1024;
AudioBufferList outputData;
outputData.mNumberBuffers = 1;
outputData.mBuffers[0].mNumberChannels = i_channels;
outputData.mBuffers[0].mDataByteSize = sizeof(float) * num_frame_out_per_block * i_channels;
std::vector<float> buffer(num_frame_out_per_block * i_channels);
outputData.mBuffers[0].mData = buffer.data();
```
Next, we write the audio data; the example writes a 440 Hz sine wave.

```cpp
float t = 0;
float tincr = 2 * M_PI * 440.0f / i_sr;
for (int i = 0; i < 200; ++i) {
    for (int j = 0; j < num_frame_out_per_block; ++j) {
        buffer[j * i_channels] = sin(t);
        buffer[j * i_channels + 1] = buffer[j * i_channels];
        t += tincr;
    }
    // write audio block
    status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
}
```
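The phase handling here is worth isolating: `t` accumulates across blocks, so each 1024-frame block continues exactly where the previous one ended. A self-contained sketch of the same fill logic (`fillSineBlock` is a hypothetical helper, not part of Core Audio):

```cpp
#include <cmath>
#include <vector>

// Fill `frames` frames of an interleaved buffer with a sine tone, copying the
// same sample to every channel, as the encoding loop above does. The phase
// accumulator `t` is passed by reference so consecutive blocks stay continuous.
void fillSineBlock(std::vector<float>& buf, int frames, int channels,
                   float freq, float sampleRate, float& t) {
    const float tincr = 2.0f * float(M_PI) * freq / sampleRate;
    for (int f = 0; f < frames; ++f) {
        const float s = std::sin(t);
        for (int c = 0; c < channels; ++c)
            buf[f * channels + c] = s;  // same sample on every channel
        t += tincr;
    }
}
```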
Finally, don't forget to release the resources.
Q&A
How should input data in planar format be handled?
When `kAudioFormatFlagIsNonInterleaved` is set, the data is in planar format; the header carries a notable comment about this case:
```cpp
// Typically, when an ASBD is being used, the fields describe the complete layout
// of the sample data in the buffers that are represented by this description -
// where typically those buffers are represented by an AudioBuffer that is
// contained in an AudioBufferList.
//
// However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
// AudioBufferList has a different structure and semantic. In this case, the ASBD
// fields will describe the format of ONE of the AudioBuffers that are contained in
// the list, AND each AudioBuffer in the list is determined to have a single (mono)
// channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
// total number of AudioBuffers that are contained within the AudioBufferList -
// where each buffer contains one channel. This is used primarily with the
// AudioUnit (and AudioConverter) representation of this list - and won't be found
// in the AudioHardware usage of this structure.
```
In that case the semantics of the `AudioBufferList` change, and it is used roughly as follows:
```cpp
int i_channels = 2;
const int num_frame_out_per_block = 1024;
// One AudioBuffer per channel, so allocate room for (i_channels - 1) extra buffers
AudioBufferList* outputData = (AudioBufferList*)malloc(sizeof(AudioBufferList) +
                                                       sizeof(AudioBuffer) * (i_channels - 1));
// if input_asbd is non-interleaved (planar data), mNumberBuffers is the number of channels
outputData->mNumberBuffers = i_channels;
for (auto i = 0; i < i_channels; ++i) {
    outputData->mBuffers[i].mNumberChannels = 1;
    outputData->mBuffers[i].mDataByteSize = sizeof(float) * num_frame_out_per_block;
    outputData->mBuffers[i].mData = new float[num_frame_out_per_block];
}
```
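Feeding such a list means splitting each interleaved frame across the per-channel buffers. A minimal sketch of that deinterleave step, in plain C++ (`deinterleave` is a hypothetical helper, not a Core Audio API):

```cpp
#include <cstddef>
#include <vector>

// Split an interleaved buffer [L R L R ...] into per-channel (planar) buffers,
// matching the one-mono-channel-per-AudioBuffer layout described above.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& in,
                                             int channels) {
    const size_t frames = in.size() / channels;
    std::vector<std::vector<float>> out(channels, std::vector<float>(frames));
    for (size_t f = 0; f < frames; ++f)
        for (int c = 0; c < channels; ++c)
            out[c][f] = in[f * channels + c];
    return out;
}
```

Each `out[c]` would then be copied into the corresponding `mBuffers[c].mData` before calling `ExtAudioFileWrite`.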
Summary
The most important part of encoding audio files with Core Audio is finding the right `AudioStreamBasicDescription`. With `AudioFileGetGlobalInfo` you can start from a file type, find a suitable data format, and finally arrive at a suitable `AudioStreamBasicDescription`. `ExtAudioFile` then handles the rest of the work concisely and efficiently.
The complete code is in CoreAudioExtAudioFileExample.