Series index
- Audio Codecs with Core Audio (Part 1): Audio Decoding
- Audio Codecs with Core Audio (Part 2): Audio Encoding

Preface
In "Audio Codecs with Core Audio (Part 1): Audio Decoding" we covered Core Audio's common data structures and basic concepts; if you haven't read that yet, it's worth doing so first.
Core Audio's way of representing audio data is not as simple as telling you "hi, this is an mp3 file". There is a substantial difference between the file format and the format of the audio data inside the file.
Many format details can look rather arbitrary, but Audio File Services provides an interesting function, `AudioFileGetGlobalInfo`, which returns information not about an individual file but about Core Audio's handling of audio files in general. Here is the information `AudioFileGetGlobalInfo` can query:
```cpp
kAudioFileGlobalInfo_ReadableTypes
kAudioFileGlobalInfo_WritableTypes
kAudioFileGlobalInfo_FileTypeName
kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat
kAudioFileGlobalInfo_AvailableFormatIDs
kAudioFileGlobalInfo_AllExtensions
kAudioFileGlobalInfo_AllHFSTypeCodes
kAudioFileGlobalInfo_AllUTIs
kAudioFileGlobalInfo_AllMIMETypes
kAudioFileGlobalInfo_ExtensionsForType
kAudioFileGlobalInfo_HFSTypeCodesForType
kAudioFileGlobalInfo_UTIsForType
kAudioFileGlobalInfo_MIMETypesForType
kAudioFileGlobalInfo_TypesForMIMEType
kAudioFileGlobalInfo_TypesForUTI
kAudioFileGlobalInfo_TypesForHFSTypeCode
kAudioFileGlobalInfo_TypesForExtension
```
For example, given a file type (`AudioFileTypeID`), `kAudioFileGlobalInfo_AvailableFormatIDs` returns a set of format IDs describing the data formats that the file type supports.
Here is an example of using `AudioFileGetGlobalInfo` to retrieve the information we want. Suppose we want to know which formats are supported when the file type is `kAudioFileMPEG4Type`; we can do the following:
```cpp
#include <AudioToolbox/AudioToolbox.h>
#include <cstdlib>
#include <iostream>
using namespace std;

OSStatus err;
UInt32 file_type = kAudioFileMPEG4Type;
UInt32 size;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(UInt32),
                                 &file_type,
                                 &size);
auto* formats = (UInt32*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                             sizeof(UInt32),
                             &file_type,
                             &size,
                             formats);
int format_cnt = size / sizeof(UInt32);
for (int i = 0; i < format_cnt; ++i) {
    // Swap to big-endian so the four bytes read as characters in order
    UInt32 format4cc = CFSwapInt32HostToBig(formats[i]);
    // Build a 4-char string explicitly; the raw pointer is not NUL-terminated
    cout << i << ": mFormatId: " << string((char*)&format4cc, 4) << endl;
}
free(formats);
```
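The `CFSwapInt32HostToBig` call turns the FourCC into bytes that read in order on a little-endian host. As a self-contained sketch (no CoreFoundation required; `fourCCToString` is a hypothetical helper, not a Core Audio API), the same conversion can be written by hand:

```cpp
#include <cstdint>
#include <string>

// Portable FourCC-to-string sketch: reads the four bytes most-significant
// first (what CFSwapInt32HostToBig arranges on a little-endian host) and
// masks out non-printable bytes so odd codes stay readable.
std::string fourCCToString(uint32_t code) {
    std::string s(4, '?');
    for (int i = 0; i < 4; ++i) {
        char c = char((code >> (8 * (3 - i))) & 0xFF);
        if (c >= 32 && c < 127) s[i] = c;
    }
    return s;
}
```

For instance, `kAudioFormatLinearPCM` is the FourCC `'lpcm'` (0x6C70636D), which this helper renders as "lpcm".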
The code prints more than a dozen entries; `kAudioFileMPEG4Type` supports a rich set of formats:

```
0: mFormatId: .mp1
1: mFormatId: .mp2
2: mFormatId: .mp3
3: mFormatId: aac
4: mFormatId: aace
5: mFormatId: aacf
6: mFormatId: aacg
7: mFormatId: aach
8: mFormatId: aac
9: mFormatId: aacp
10: mFormatId: ac-3
11: mFormatId: alac
12: mFormatId: ec-3
13: mFormatId: usac
```
What about `kAudioFileAIFFType`? It supports exactly one format:

```
0: mFormatId: lpcm
```
As another example, given a file type (`AudioFileTypeID`) and a format ID, `kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat` returns a set of `AudioStreamBasicDescription` structures with the following fields filled in: mFormatID, mFormatFlags, and mBitsPerChannel. This is very helpful when writing files; after all, you don't want to dig through endless documentation for this information.
```cpp
AudioFileTypeAndFormatID file_type_and_format_id;
file_type_and_format_id.mFileType = kAudioFileAIFFType;
file_type_and_format_id.mFormatID = kAudioFormatLinearPCM;
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                                 sizeof(file_type_and_format_id),
                                 &file_type_and_format_id,
                                 &size);
auto* asbds = (AudioStreamBasicDescription*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                             sizeof(file_type_and_format_id),
                             &file_type_and_format_id,
                             &size,
                             asbds);
int asbd_count = size / sizeof(AudioStreamBasicDescription);
for (int i = 0; i < asbd_count; ++i) {
    UInt32 format4cc = CFSwapInt32HostToBig(asbds[i].mFormatID);
    cout << i << ": mFormatId: " << string((char*)&format4cc, 4)
         << ", mFormatFlags: " << asbds[i].mFormatFlags
         << ", mChannelsPerFrame: " << asbds[i].mChannelsPerFrame
         << ", mBytesPerFrame: " << asbds[i].mBytesPerFrame
         << ", mBitsPerChannel: " << asbds[i].mBitsPerChannel << endl;
}
free(asbds);
```
In the code above, the file type is `kAudioFileAIFFType` and the data format is `kAudioFormatLinearPCM`; the output is:

```
0: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 8
1: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 16
2: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 24
3: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 32
```
The output shows that 8-, 16-, 24-, and 32-bit data are supported, and that `mFormatFlags = 14`, i.e. `0x2 + 0x4 + 0x8`, which is `kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked`.
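As a quick sanity check of that decomposition, here is a small sketch with the flag bit values copied from CoreAudioTypes.h (redeclared locally so it compiles without the Core Audio headers; `isBigEndianSignedPacked` is a hypothetical helper):

```cpp
#include <cstdint>

// mFormatFlags bit values, as declared in CoreAudioTypes.h, reproduced here
// so this sketch compiles without the Core Audio headers.
constexpr uint32_t kAudioFormatFlagIsFloat         = 1u << 0;  // 0x1
constexpr uint32_t kAudioFormatFlagIsBigEndian     = 1u << 1;  // 0x2
constexpr uint32_t kAudioFormatFlagIsSignedInteger = 1u << 2;  // 0x4
constexpr uint32_t kAudioFormatFlagIsPacked        = 1u << 3;  // 0x8

// 14 == 0x2 + 0x4 + 0x8: big-endian, signed integer, packed -- and not float.
constexpr bool isBigEndianSignedPacked(uint32_t flags) {
    return (flags & kAudioFormatFlagIsBigEndian) &&
           (flags & kAudioFormatFlagIsSignedInteger) &&
           (flags & kAudioFormatFlagIsPacked) &&
           !(flags & kAudioFormatFlagIsFloat);
}
```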
Audio encoding
In the preface we showed how to retrieve information with `AudioFileGetGlobalInfo`; this matters a great deal for audio encoding, because encoding follows these steps:
- Decide on the file type. What kind of file do you want: wav, aiff, or aac?
- Decide on the format ID. Different file types support different data formats; this can be determined with `AudioFileGetGlobalInfo` and `kAudioFileGlobalInfo_AvailableFormatIDs`.
- Choose suitable `mFormatFlags` and `mBitsPerChannel`. Picking the right flags and bit depth ensures the file can be opened without errors; these can be determined with `AudioFileGetGlobalInfo` and `kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat`.
Show me the code
Enough talk; here is the code, with the detailed explanation afterwards.
```cpp
#include <AudioToolbox/AudioToolbox.h>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

int main(int argc, char* argv[])
{
    AudioFileTypeID file_type = kAudioFileMPEG4Type;
    int o_channels = 2;
    double o_sr = 44100;
    AudioStreamBasicDescription output_asbd;
    memset(&output_asbd, 0, sizeof(output_asbd));
    output_asbd.mSampleRate = o_sr;
    output_asbd.mChannelsPerFrame = o_channels;
    output_asbd.mFormatID = kAudioFormatMPEG4AAC;
    UInt32 size = sizeof(output_asbd);
    AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);

    // open output file
    // createCFURLWithStdString is a small helper defined in the accompanying project
    CFURLRef output_url = createCFURLWithStdString("sin440.aac");
    ExtAudioFileRef output_file;
    OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                                &output_asbd, nullptr,
                                                kAudioFileFlags_EraseFile,
                                                &output_file);
    assert(status == noErr);

    double i_sr = 44100;
    int i_channels = 2;
    AudioStreamBasicDescription input_asbd;
    FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
    status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                     sizeof(input_asbd), &input_asbd);
    assert(status == noErr);

    const int num_frame_out_per_block = 1024;
    AudioBufferList outputData;
    outputData.mNumberBuffers = 1;
    outputData.mBuffers[0].mNumberChannels = i_channels;
    outputData.mBuffers[0].mDataByteSize = sizeof(float) * num_frame_out_per_block * i_channels;
    std::vector<float> buffer(num_frame_out_per_block * i_channels);
    outputData.mBuffers[0].mData = buffer.data();

    float t = 0;
    float tincr = 2 * M_PI * 440.0f / i_sr;
    for (int i = 0; i < 200; ++i) {
        for (int j = 0; j < num_frame_out_per_block; ++j) {
            buffer[j * i_channels] = sin(t);
            buffer[j * i_channels + 1] = buffer[j * i_channels];
            t += tincr;
        }
        // write one block of audio
        status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
        assert(status == noErr);
    }

    ExtAudioFileDispose(output_file);
    CFRelease(output_url);
    return 0;
}
```
First, we create an `AudioStreamBasicDescription` and set the sample rate, channel count, and data format (the file type here is `kAudioFileMPEG4Type`, so we use `kAudioFormatMPEG4AAC`). Everything else is zeroed out, and we then call `AudioFormatGetProperty` to fill in the remaining fields. For `kAudioFormatLinearPCM`, however, you should use `FillOutASBDForLPCM` instead.
```cpp
AudioFileTypeID file_type = kAudioFileMPEG4Type;
AudioStreamBasicDescription output_asbd;
memset(&output_asbd, 0, sizeof(output_asbd));
output_asbd.mSampleRate = o_sr;
output_asbd.mChannelsPerFrame = o_channels;
output_asbd.mFormatID = kAudioFormatMPEG4AAC;
UInt32 size = sizeof(output_asbd);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
```
Next, we create and open the file with `ExtAudioFileCreateWithURL`; `kAudioFileFlags_EraseFile` means any existing file will be overwritten.
```cpp
CFURLRef output_url = createCFURLWithStdString("sin440.aac");
ExtAudioFileRef output_file;
OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                            &output_asbd, nullptr,
                                            kAudioFileFlags_EraseFile,
                                            &output_file);
```
The next step is crucial: set the client format with `ExtAudioFileSetProperty`, which declares the format of the audio data that will be fed in during encoding. In this example, the input is two-channel interleaved float.
```cpp
AudioStreamBasicDescription input_asbd;
FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                 sizeof(input_asbd), &input_asbd);
```
Then we create an `AudioBufferList` to hold the audio data. Since the data is interleaved float, `mNumberBuffers = 1`.
```cpp
const int num_frame_out_per_block = 1024;
AudioBufferList outputData;
outputData.mNumberBuffers = 1;
outputData.mBuffers[0].mNumberChannels = i_channels;
outputData.mBuffers[0].mDataByteSize = sizeof(float) * num_frame_out_per_block * i_channels;
std::vector<float> buffer(num_frame_out_per_block * i_channels);
outputData.mBuffers[0].mData = buffer.data();
```
Next, we write the audio data; the example writes a 440 Hz sine wave.

```cpp
float t = 0;
float tincr = 2 * M_PI * 440.0f / i_sr;
for (int i = 0; i < 200; ++i) {
    for (int j = 0; j < num_frame_out_per_block; ++j) {
        buffer[j * i_channels] = sin(t);
        buffer[j * i_channels + 1] = buffer[j * i_channels];
        t += tincr;
    }
    // write audio block
    status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
}
```
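The phase handling here is worth isolating: `t` accumulates across blocks, so each 1024-frame block continues exactly where the previous one ended. A self-contained sketch of the same fill logic (`fillSineBlock` is a hypothetical helper, not part of Core Audio):

```cpp
#include <cmath>
#include <vector>

// Fill `frames` frames of an interleaved buffer with a sine tone, copying the
// same sample to every channel, as the encoding loop above does. The phase
// accumulator `t` is passed by reference so consecutive blocks stay continuous.
void fillSineBlock(std::vector<float>& buf, int frames, int channels,
                   float freq, float sampleRate, float& t) {
    const float tincr = 2.0f * float(M_PI) * freq / sampleRate;
    for (int f = 0; f < frames; ++f) {
        const float s = std::sin(t);
        for (int c = 0; c < channels; ++c)
            buf[f * channels + c] = s;  // same sample on every channel
        t += tincr;
    }
}
```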
Finally, don't forget to release the resources.
Q&A
How should input data in planar format be handled?
When `kAudioFormatFlagIsNonInterleaved` is set, the data is in planar format; the header carries a notable comment about this case:
```cpp
// Typically, when an ASBD is being used, the fields describe the complete layout
// of the sample data in the buffers that are represented by this description -
// where typically those buffers are represented by an AudioBuffer that is
// contained in an AudioBufferList.
//
// However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
// AudioBufferList has a different structure and semantic. In this case, the ASBD
// fields will describe the format of ONE of the AudioBuffers that are contained in
// the list, AND each AudioBuffer in the list is determined to have a single (mono)
// channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
// total number of AudioBuffers that are contained within the AudioBufferList -
// where each buffer contains one channel. This is used primarily with the
// AudioUnit (and AudioConverter) representation of this list - and won't be found
// in the AudioHardware usage of this structure.
```
In that case the semantics of the `AudioBufferList` change, and it is used roughly as follows:
```cpp
int i_channels = 2;
const int num_frame_out_per_block = 1024;
// One AudioBuffer per channel, so allocate room for (i_channels - 1) extra buffers
AudioBufferList* outputData = (AudioBufferList*)malloc(sizeof(AudioBufferList) +
                                                       sizeof(AudioBuffer) * (i_channels - 1));
// if input_asbd is non-interleaved (planar data), mNumberBuffers is the number of channels
outputData->mNumberBuffers = i_channels;
for (auto i = 0; i < i_channels; ++i) {
    outputData->mBuffers[i].mNumberChannels = 1;
    outputData->mBuffers[i].mDataByteSize = sizeof(float) * num_frame_out_per_block;
    outputData->mBuffers[i].mData = new float[num_frame_out_per_block];
}
```
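Feeding such a list means splitting each interleaved frame across the per-channel buffers. A minimal sketch of that deinterleave step, in plain C++ (`deinterleave` is a hypothetical helper, not a Core Audio API):

```cpp
#include <cstddef>
#include <vector>

// Split an interleaved buffer [L R L R ...] into per-channel (planar) buffers,
// matching the one-mono-channel-per-AudioBuffer layout described above.
std::vector<std::vector<float>> deinterleave(const std::vector<float>& in,
                                             int channels) {
    const size_t frames = in.size() / channels;
    std::vector<std::vector<float>> out(channels, std::vector<float>(frames));
    for (size_t f = 0; f < frames; ++f)
        for (int c = 0; c < channels; ++c)
            out[c][f] = in[f * channels + c];
    return out;
}
```

Each `out[c]` would then be copied into the corresponding `mBuffers[c].mData` before calling `ExtAudioFileWrite`.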
Summary
The most important part of encoding audio files with Core Audio is finding the right `AudioStreamBasicDescription`. With `AudioFileGetGlobalInfo` you can start from a file type, find a suitable data format, and finally arrive at a suitable `AudioStreamBasicDescription`. `ExtAudioFile` then handles the rest of the work concisely and efficiently.
The complete code is in CoreAudioExtAudioFileExample.