Audio Encoding and Decoding with Core Audio (Part 2): Audio Encoding

Series Index

  • Audio Encoding and Decoding with Core Audio (Part 1): Audio Decoding
  • Audio Encoding and Decoding with Core Audio (Part 2): Audio Encoding

Introduction

In Audio Encoding and Decoding with Core Audio (Part 1): Audio Decoding, we covered the common data structures and basic concepts of Core Audio. If you haven't read that yet, it's worth going through first.

The way Core Audio represents audio data is not as simple as telling you "hi, this is an mp3 file." There is a significant difference between a file's format and the format of the audio data inside it.

Much of this format handling may look arbitrary, but Audio File Services provides an interesting function called AudioFileGetGlobalInfo, which reports information not about a single file but about how Core Audio handles audio files in general. Here are the properties AudioFileGetGlobalInfo can query:

kAudioFileGlobalInfo_ReadableTypes
kAudioFileGlobalInfo_WritableTypes
kAudioFileGlobalInfo_FileTypeName
kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat
kAudioFileGlobalInfo_AvailableFormatIDs

kAudioFileGlobalInfo_AllExtensions
kAudioFileGlobalInfo_AllHFSTypeCodes
kAudioFileGlobalInfo_AllUTIs
kAudioFileGlobalInfo_AllMIMETypes

kAudioFileGlobalInfo_ExtensionsForType
kAudioFileGlobalInfo_HFSTypeCodesForType
kAudioFileGlobalInfo_UTIsForType
kAudioFileGlobalInfo_MIMETypesForType

kAudioFileGlobalInfo_TypesForMIMEType
kAudioFileGlobalInfo_TypesForUTI
kAudioFileGlobalInfo_TypesForHFSTypeCode
kAudioFileGlobalInfo_TypesForExtension

Take kAudioFileGlobalInfo_AvailableFormatIDs, for example: given a file type (AudioFileTypeID), it returns an array of format IDs representing the data formats that file type supports.

Here is an example of using AudioFileGetGlobalInfo to retrieve this information. Suppose we want to know which formats are supported when the file type is kAudioFileMPEG4Type; we can do the following:

OSStatus err;
UInt32 file_type = kAudioFileMPEG4Type;
UInt32 size;
// First ask for the size of the result array.
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(UInt32),
                                 &file_type,
                                 &size);
assert(err == noErr);

auto* formats = (UInt32*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                             sizeof(UInt32),
                             &file_type,
                             &size,
                             formats);
assert(err == noErr);

int format_cnt = size / sizeof(UInt32);
for(int i = 0; i < format_cnt; ++i){
    // Format IDs are FourCCs; swap to big-endian so the bytes print in order.
    UInt32 format4cc = CFSwapInt32HostToBig(formats[i]);
    // Print exactly 4 characters; the FourCC is not null-terminated.
    cout << i << ": mFormatId: " << string((char*)&format4cc, 4) << endl;
}
free(formats);

The code prints more than a dozen entries; kAudioFileMPEG4Type supports quite a rich set of formats:

0: mFormatId: .mp1
1: mFormatId: .mp2
2: mFormatId: .mp3
3: mFormatId: aac 
4: mFormatId: aace
5: mFormatId: aacf
6: mFormatId: aacg
7: mFormatId: aach
8: mFormatId: aacl
9: mFormatId: aacp
10: mFormatId: ac-3
11: mFormatId: alac
12: mFormatId: ec-3
13: mFormatId: usac

What about kAudioFileAIFFType? It supports just one format:

0: mFormatId: lpcm
           

For another example, take kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat: given a file type (AudioFileTypeID) and a format ID, it returns an array of AudioStreamBasicDescription structures with the following fields filled in: mFormatID, mFormatFlags, and mBitsPerChannel. This information is very helpful when writing files; you certainly don't want to hunt for it in a sea of documentation.

AudioFileTypeAndFormatID file_type_and_format_id;
file_type_and_format_id.mFileType = kAudioFileAIFFType;
file_type_and_format_id.mFormatID = kAudioFormatLinearPCM;

UInt32 size;
OSStatus err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                                          sizeof(file_type_and_format_id),
                                          &file_type_and_format_id,
                                          &size);
assert(err == noErr);

auto* asbds = (AudioStreamBasicDescription*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                             sizeof(file_type_and_format_id),
                             &file_type_and_format_id,
                             &size,
                             asbds);
assert(err == noErr);

int asbd_count = size / sizeof(AudioStreamBasicDescription);

for(int i = 0; i < asbd_count; ++i){
    UInt32 format4cc = CFSwapInt32HostToBig(asbds[i].mFormatID);
    // Only mFormatID, mFormatFlags, and mBitsPerChannel are filled in.
    cout << i << ": mFormatId: " << string((char*)&format4cc, 4)
         << ", mFormatFlags: " << asbds[i].mFormatFlags
         << ", mBitsPerChannel: " << asbds[i].mBitsPerChannel << endl;
}
free(asbds);

In the code above, the file type is kAudioFileAIFFType and the data format is kAudioFormatLinearPCM; the output is:

0: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 8
1: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 16
2: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 24
3: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 32
           

The output shows that 8-, 16-, 24-, and 32-bit samples are supported, and mFormatFlags = 14 decomposes as 0x2 + 0x4 + 0x8, i.e.

kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked

Audio Encoding

In the introduction we saw how to query information with AudioFileGetGlobalInfo. This matters a great deal when encoding, because encoding follows these steps:

  1. Determine the file type. What kind of file do you want: wav, aiff, or aac?
  2. Determine the data format. Different file types support different data formats; find them with AudioFileGetGlobalInfo and kAudioFileGlobalInfo_AvailableFormatIDs.
  3. Determine suitable mFormatFlags and mBitsPerChannel. Choosing the right flags and bit depth ensures the file can be opened without errors; find them with AudioFileGetGlobalInfo and kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat.

Show me the code

Enough talk; here is the code, with a detailed explanation afterwards.

#include <AudioToolbox/AudioToolbox.h>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

int main(int argc, char* argv[])
{
    AudioFileTypeID file_type = kAudioFileMPEG4Type;
    int o_channels = 2;
    double o_sr = 44100;

    // Fill in only the fields we know; AudioFormatGetProperty completes the rest.
    AudioStreamBasicDescription output_asbd;
    memset(&output_asbd, 0, sizeof(output_asbd));
    output_asbd.mSampleRate = o_sr;
    output_asbd.mChannelsPerFrame = o_channels;
    output_asbd.mFormatID = kAudioFormatMPEG4AAC;
    UInt32 size = sizeof(output_asbd);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);

    // Open the output file, overwriting any existing file.
    CFURLRef output_url = createCFURLWithStdString("sin440.aac"); // helper from the sample project
    ExtAudioFileRef output_file;
    OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                                &output_asbd, nullptr,
                                                kAudioFileFlags_EraseFile,
                                                &output_file);
    assert(status == noErr);

    // Client format: what we feed in (interleaved 32-bit float, little-endian).
    double i_sr = 44100;
    int i_channels = 2;
    AudioStreamBasicDescription input_asbd;
    FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
    status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                     sizeof(input_asbd), &input_asbd);
    assert(status == noErr);

    // One buffer holds all channels because the data is interleaved.
    const int num_frame_out_per_block = 1024;
    AudioBufferList outputData;
    outputData.mNumberBuffers = 1;
    outputData.mBuffers[0].mNumberChannels = i_channels;
    outputData.mBuffers[0].mDataByteSize = sizeof(float) * num_frame_out_per_block * i_channels;
    std::vector<float> buffer(num_frame_out_per_block * i_channels);
    outputData.mBuffers[0].mData = buffer.data();

    // Generate a 440 Hz sine wave and write it block by block.
    float t = 0;
    float tincr = 2 * M_PI * 440.0f / i_sr;
    for(int i = 0; i < 200; ++i){
        for(int j = 0; j < num_frame_out_per_block; ++j){
            buffer[j * i_channels] = sin(t);
            buffer[j * i_channels + 1] = buffer[j * i_channels];

            t += tincr;
        }

        // write audio block
        status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
        assert(status == noErr);
    }

    ExtAudioFileDispose(output_file);
    CFRelease(output_url);

    return 0;
}

First we create an AudioStreamBasicDescription, choose kAudioFileMPEG4Type as the file type, and set the sample rate, channel count, and format ID. Everything else is zeroed out, and AudioFormatGetProperty is called to fill in the remaining fields. If the format is kAudioFormatLinearPCM, however, you are better off filling in the fields with FillOutASBDForLPCM.

AudioFileTypeID file_type = kAudioFileMPEG4Type;

AudioStreamBasicDescription output_asbd;
memset(&output_asbd, 0, sizeof(output_asbd));
output_asbd.mSampleRate = o_sr;
output_asbd.mChannelsPerFrame = o_channels;
output_asbd.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(output_asbd);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);

Next, the file is created and opened with ExtAudioFileCreateWithURL; kAudioFileFlags_EraseFile means that any existing file will be overwritten.

CFURLRef output_url = createCFURLWithStdString("sin440.aac");
ExtAudioFileRef output_file;
OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                            &output_asbd, nullptr,
                                            kAudioFileFlags_EraseFile,
                                            &output_file);

The next step is essential: use ExtAudioFileSetProperty to set the client format, which tells the encoder the format of the audio data we will feed in. In this example, the input is two-channel interleaved float.

AudioStreamBasicDescription input_asbd;
// 32-bit float, little-endian, interleaved.
FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                 sizeof(input_asbd), &input_asbd);

Then an AudioBufferList is created to hold the audio data. Since the data is interleaved float, mNumberBuffers = 1:

const int num_frame_out_per_block = 1024;
AudioBufferList outputData;
outputData.mNumberBuffers = 1;
outputData.mBuffers[0].mNumberChannels = i_channels;
outputData.mBuffers[0].mDataByteSize = sizeof(float)*num_frame_out_per_block*i_channels;
std::vector<float> buffer(num_frame_out_per_block * i_channels);
outputData.mBuffers[0].mData = buffer.data();
           

Next comes writing the audio data; the example writes a 440 Hz sine wave.

float t = 0;
float tincr = 2 * M_PI * 440.0f / i_sr;
for(int i = 0; i < 200; ++i){
    for(int j = 0; j < num_frame_out_per_block; ++j){
        buffer[j * i_channels] = sin(t);
        buffer[j * i_channels + 1] = buffer[j * i_channels];

        t += tincr;
    }

    // write audio block
    status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
}
           

Finally, don't forget to release the resources.
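
The teardown is just the two calls at the end of main (output_url comes from the ExtAudioFileCreateWithURL step):

ExtAudioFileDispose(output_file);  // flushes any remaining data and closes the file
CFRelease(output_url);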

Q&A

What if the input data is in planar format? When kAudioFormatFlagIsNonInterleaved is set, the data is planar, and the CoreAudioTypes header has a dedicated comment explaining this case:

//    Typically, when an ASBD is being used, the fields describe the complete layout
//    of the sample data in the buffers that are represented by this description -
//    where typically those buffers are represented by an AudioBuffer that is
//    contained in an AudioBufferList.
//
//    However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
//    AudioBufferList has a different structure and semantic. In this case, the ASBD
//    fields will describe the format of ONE of the AudioBuffers that are contained in
//    the list, AND each AudioBuffer in the list is determined to have a single (mono)
//    channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
//    total number of AudioBuffers that are contained within the AudioBufferList -
//    where each buffer contains one channel. This is used primarily with the
//    AudioUnit (and AudioConverter) representation of this list - and won't be found
//    in the AudioHardware usage of this structure.

The semantics of the AudioBufferList change in this case, and it is used roughly as follows:

int i_channels = 2;
const int num_frame_out_per_block = 1024;
// AudioBufferList declares a single inline AudioBuffer, so allocate extra
// space for the remaining channels.
AudioBufferList *outputData = (AudioBufferList*)malloc(sizeof(AudioBufferList) + sizeof(AudioBuffer) * (i_channels - 1));

// With non-interleaved (planar) data, mNumberBuffers equals the channel count
// and each buffer holds a single mono channel.
outputData->mNumberBuffers = i_channels;
for(int i = 0; i < i_channels; ++i){
    outputData->mBuffers[i].mNumberChannels = 1;
    outputData->mBuffers[i].mDataByteSize = sizeof(float) * num_frame_out_per_block;
    outputData->mBuffers[i].mData = new float[num_frame_out_per_block];
}

Summary

When encoding audio files with Core Audio, the most important task is finding the right AudioStreamBasicDescription. With AudioFileGetGlobalInfo you can start from the file type, find a suitable data format, and finally arrive at a suitable AudioStreamBasicDescription. The rest of the work can be handed over to ExtAudioFile, which gets it done cleanly and efficiently.
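
The whole lookup chain can be wrapped in one helper; a sketch combining the two queries shown earlier, with availableASBDs as an illustrative name and error handling elided:

#include <AudioToolbox/AudioToolbox.h>
#include <vector>

// Return the ASBD templates a file type supports for a given format ID,
// using the same size/get call pair as the examples above.
std::vector<AudioStreamBasicDescription> availableASBDs(AudioFileTypeID file_type,
                                                        UInt32 format_id)
{
    AudioFileTypeAndFormatID spec;
    spec.mFileType = file_type;
    spec.mFormatID = format_id;

    UInt32 size = 0;
    AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                               sizeof(spec), &spec, &size);

    std::vector<AudioStreamBasicDescription> asbds(size / sizeof(AudioStreamBasicDescription));
    AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                           sizeof(spec), &spec, &size, asbds.data());
    return asbds;
}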

The complete code is available in CoreAudioExtAudioFileExample.
