Implementing an FFmpeg + SDL player (audio/video synchronization)

This article should be read together with https://blog.csdn.net/qq_15255121/article/details/117327999?spm=1001.2014.3001.5501

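Audio/video synchronization relies on timestamps.

Types of timestamps:

PTS: presentation timestamp, the time at which a frame should be displayed

DTS: decoding timestamp, the time at which a frame should be decoded

Both timestamps exist mainly because of I, P, and B frames, which make the presentation order differ from the decoding order.

I frame: key frame, decodable on its own

P frame: forward-predicted frame, references earlier frames

B frame: bidirectionally predicted frame, references both earlier and later frames

For details, see https://blog.csdn.net/qq_15255121/article/details/115494817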
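To make the difference between decoding order and presentation order concrete, here is a minimal sketch. The `DemoFrame` struct and `sort_by_pts` are hypothetical names used only for illustration; a real player would read these timestamps from `AVPacket` and `AVFrame`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal frame record, not an FFmpeg type. */
typedef struct {
    char type;    /* 'I', 'P' or 'B' */
    int64_t dts;  /* decoding timestamp, in time-base ticks */
    int64_t pts;  /* presentation timestamp, in time-base ticks */
} DemoFrame;

/* Order two frames by their presentation timestamp. */
static int cmp_pts(const void *a, const void *b) {
    const DemoFrame *fa = a;
    const DemoFrame *fb = b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Reorder frames from decoding order into presentation order. */
void sort_by_pts(DemoFrame *frames, size_t count) {
    qsort(frames, count, sizeof(DemoFrame), cmp_pts);
}
```

For a stream decoded as I, P, B, B, the P frame has to be decoded before the two B frames that reference it, yet it is displayed after them, so sorting by PTS yields I, B, B, P.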

Where to get the PTS

From AVPacket

From AVFrame. For audio/video synchronization we generally use the PTS obtained after decoding.

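Time base (as the name suggests, the unit in which time is measured)

tbr: derived from the frame rate; if tbr is 25, the corresponding time base is 1/25 second

tbn: time base of the stream

tbc: time base of the codec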
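Because the stream and the codec can use different time bases, a tick count sometimes has to be converted from one to the other. The following is a simplified sketch of that conversion; `DemoRational` and `demo_rescale` are hypothetical stand-ins so the code builds without FFmpeg. In real code FFmpeg's `av_rescale_q()` does this job with rounding and overflow protection.

```c
#include <stdint.h>

/* Stand-in for FFmpeg's AVRational so the sketch builds without FFmpeg. */
typedef struct {
    int num;
    int den;
} DemoRational;

/* Convert a tick count from one time base to another.
 * Simplified: truncates toward zero and ignores overflow. */
int64_t demo_rescale(int64_t ticks, DemoRational from, DemoRational to) {
    return ticks * from.num * to.den / ((int64_t)from.den * to.num);
}
```

For example, 90000 ticks in a 1/90000 time base is one second, which is 1000 ticks in a 1/1000 (millisecond) time base.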

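How is a PTS converted to real time? (A pts can be understood as a count of ticks of the current time base.)

av_q2d converts the time base to a double, that is, the number of seconds per tick.

seconds = pts * av_q2d(video_stream->time_base)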
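The conversion above can be sketched without FFmpeg as follows. `DemoRational`, `demo_q2d`, and `pts_to_seconds` are hypothetical names that mimic `AVRational` and `av_q2d()`; they are not part of the FFmpeg API.

```c
#include <stdint.h>

/* Stand-in for FFmpeg's AVRational so the sketch builds without FFmpeg. */
typedef struct {
    int num;
    int den;
} DemoRational;

/* Same idea as av_q2d(): the number of seconds represented by one tick. */
double demo_q2d(DemoRational r) {
    return (double)r.num / (double)r.den;
}

/* Convert a pts expressed in ticks into seconds. */
double pts_to_seconds(int64_t pts, DemoRational time_base) {
    return (double)pts * demo_q2d(time_base);
}
```

With a time base of 1/25, a pts of 50 corresponds to about 2 seconds; with a time base of 1/90000, a pts of 180000 also corresponds to about 2 seconds.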

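Ways to synchronize audio and video

Synchronize video to audio

Synchronize audio to video

Synchronize both audio and video to the system clock

To synchronize at all, we need a master clock. The master clock can be the video clock, the audio clock, or the system time.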
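One way to picture the three options is a single function that returns whichever clock has been chosen as master. This is only a sketch: `DemoSyncType`, `DemoClocks`, and `get_master_clock` are hypothetical names, and the player in this article hard-codes the audio clock instead of making the choice configurable.

```c
/* Hypothetical selector for the master clock. */
typedef enum {
    DEMO_SYNC_AUDIO_MASTER,
    DEMO_SYNC_VIDEO_MASTER,
    DEMO_SYNC_EXTERNAL_MASTER
} DemoSyncType;

/* Hypothetical holder for the three candidate clocks, all in seconds. */
typedef struct {
    double audio_clock;
    double video_clock;
    double external_clock; /* e.g. derived from the system time */
    DemoSyncType sync_type;
} DemoClocks;

/* Return the clock that every other stream should follow. */
double get_master_clock(const DemoClocks *c) {
    switch (c->sync_type) {
    case DEMO_SYNC_VIDEO_MASTER:
        return c->video_clock;
    case DEMO_SYNC_EXTERNAL_MASTER:
        return c->external_clock;
    case DEMO_SYNC_AUDIO_MASTER:
    default:
        return c->audio_clock;
    }
}
```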

Here we use audio as the master clock. The audio clock is computed as follows:

is->audio_clock += (double)data_size /(double)(out_channel * av_get_bytes_per_sample(out_format) * is->audio_ctx->sample_rate);

data_size is the size in bytes of the data in each decoded frame.

out_channel * av_get_bytes_per_sample(out_format) * is->audio_ctx->sample_rate is the number of bytes the audio device consumes in one second: channel count * bytes per sample * sample rate.

So (double)data_size /(double)(out_channel * av_get_bytes_per_sample(out_format) * is->audio_ctx->sample_rate) is the time needed to play this data.

After this update, is->audio_clock is the audio clock at the point where the current audio data has been fully played.
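The arithmetic behind that update can be checked in isolation. The helper below is a hypothetical sketch, not a function from the player; it takes the three factors as plain integers instead of reading them from `VideoState`.

```c
/* Duration, in seconds, of data_size bytes of interleaved PCM audio.
 * Returns 0 when the format parameters are not usable. */
double audio_bytes_to_seconds(int data_size, int channels,
                              int bytes_per_sample, int sample_rate) {
    int bytes_per_sec = channels * bytes_per_sample * sample_rate;
    if (bytes_per_sec <= 0) {
        return 0.0;
    }
    return (double)data_size / (double)bytes_per_sec;
}
```

For example, a frame of 1024 stereo samples in 16-bit format is 4096 bytes. At 44100 Hz the device consumes 176400 bytes per second, so this frame advances the audio clock by 4096 / 176400, roughly 0.0232 seconds.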

Getting the audio clock:

double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size;
  int bytes_per_sec; // bytes consumed per second
  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  if(is->audio_st) {
    bytes_per_sec = is->audio_ctx->sample_rate * out_channel * av_get_bytes_per_sample(out_format);
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}

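Because is->audio_clock represents the clock after all of the current audio data has been played, the current clock is obtained by subtracting the time taken up by the data that has not been played yet.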
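The subtraction can be sketched as a standalone function. `current_audio_clock` is a hypothetical helper that takes the buffer state as arguments instead of reading it from `VideoState`; it also guards against the index running past the buffer size, which the original function does not check.

```c
/* Current audio time: the clock after the whole buffer is played,
 * minus the duration of the bytes still waiting in the buffer. */
double current_audio_clock(double audio_clock, unsigned int buf_size,
                           unsigned int buf_index, int bytes_per_sec) {
    double pts = audio_clock;
    if (bytes_per_sec > 0 && buf_size > buf_index) {
        pts -= (double)(buf_size - buf_index) / bytes_per_sec;
    }
    return pts;
}
```

If the clock reads 1.0 second and a full 4096-byte buffer is still unplayed at 176400 bytes per second, the current audio time is about 1.0 - 0.0232 = 0.9768 seconds.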

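Video rendering must follow the audio clock

The initial video time is set to the current system time when the video stream is opened, before the first video frame is processed.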

int stream_component_open(VideoState *is, int stream_index)
{
  .......

  switch (codecCtx->codec_type)
  {
  case AVMEDIA_TYPE_AUDIO:
    ........
    break;

  case AVMEDIA_TYPE_VIDEO:
    is->video_st = pFormatCtx->streams[stream_index];
    is->video_ctx = codecCtx;
    is->frame_timer = (double)av_gettime()/1000000.0;
    is->frame_last_delay = 40e-3;
    packet_queue_init(&is->videoq);
    is->video_tid = SDL_CreateThread(video_thread, "video_thread", is);
    is->sws_ctx = sws_getContext(is->video_ctx->width,
                                 is->video_ctx->height,
                                 is->video_ctx->pix_fmt,
                                 is->video_ctx->width,
                                 is->video_ctx->height,
                                 out_pix_foramt,
                                 SWS_BILINEAR,
                                 NULL, NULL, NULL);
    break;
  default:
    break;
  }

  return 0;
}

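is->frame_timer = (double)av_gettime()/1000000.0; initializes the base time with the current time, in seconds.

To know when the next video frame should be rendered, we must know that frame's pts.

......

AVFrame's best_effort_timestamp is FFmpeg's best estimate of the decoded frame's pts.

pts *= av_q2d(is->video_st->time_base); converts it into the frame's display time in seconds.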
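When a frame carries no usable pts, the player falls back to a running video clock, as synchronize_video does in the full listing. The sketch below reproduces that logic on plain doubles. `DemoVideoClock` and `sync_video_pts` are hypothetical names, and `frame_duration` is assumed to be the nominal duration of one frame in seconds.

```c
/* Hypothetical holder for the predicted time of the next frame. */
typedef struct {
    double video_clock; /* seconds */
} DemoVideoClock;

/* Return the pts to use for this frame and advance the predicted clock.
 * A pts of 0 is treated as "unknown", as in the player's code. */
double sync_video_pts(DemoVideoClock *vc, double pts,
                      double frame_duration, int repeat_pict) {
    double frame_delay;
    if (pts != 0.0) {
        vc->video_clock = pts;   /* trust the frame's own timestamp */
    } else {
        pts = vc->video_clock;   /* fall back to the prediction */
    }
    /* a repeated picture extends the frame by half a duration per repeat */
    frame_delay = frame_duration + repeat_pict * (frame_duration * 0.5);
    vc->video_clock += frame_delay;
    return pts;
}
```

One consequence of treating 0 as "unknown" is that a genuine first frame at time 0 takes the fallback path, which is harmless because the clock also starts at 0.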

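Rendering code

With audio as the master clock, the video must be compared against the audio clock continuously. If the current video frame's display time is earlier than the audio clock, the frame is late and must be shown immediately or dropped. If its display time is later than the audio clock, the frame is early, so we wait until the audio clock has caught up with the frame's display time before showing it.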

void video_refresh_timer(void *userdata)
{

      。。。。。。
      vp = &is->pictq[is->pictq_rindex];
      delay = vp->pts - is->frame_last_pts;
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer vp->pts=%f  is->frame_last_pts=%f delay=%f\n", vp->pts, is->frame_last_pts, delay);
      if(delay <= 0 || delay >= 1.0){
        delay = is->frame_last_delay;
      }
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      ref_clock = get_audio_clock(is); // reference clock: the current audio playback time
      diff = vp->pts - ref_clock;
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer ref_clock=%f, diff=%f\n", ref_clock,  diff);

      sync_threshold = (delay > AV_SYNC_THRESHOLD)?delay:AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD){
         if(diff <= -sync_threshold){
           delay = 0;
         }else if(diff >= sync_threshold){
           delay = 2 * delay;
         }
      }
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer delay=%f, is->frame_timer=%f\n", delay,  is->frame_timer);

      is->frame_timer += delay;
      double current_time = av_gettime()/1000000.0;
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer 11111 is->frame_timer=%f, current_time=%f\n", is->frame_timer, current_time);

      actual_delay = is->frame_timer - current_time; // av_gettime returns microseconds, converted to seconds above
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer 222222 actual_delay=%f\n", actual_delay);

      if(actual_delay < AV_SYNC_THRESHOLD ){
        actual_delay = AV_SYNC_THRESHOLD;
      }

      /* Schedule the next refresh. schedule_refresh() takes milliseconds,
         and adding 0.5 rounds to the nearest integer. */
      schedule_refresh(is, actual_delay * 1000 + 0.5);

      /* show the picture! */
      video_display(is);

    。。。。。。
}

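1. First compute how long the current frame should stay on screen. vp->pts is the display time of the next frame, and is->frame_last_pts is the display time of the current frame.

delay = vp->pts - is->frame_last_pts;

2. Get the current audio clock.

ref_clock = get_audio_clock(is); // reference clock: the current audio playback time

3. Compute the difference between the next video frame and the audio clock.

diff = vp->pts - ref_clock;

4. Compute the sync threshold for the current video frame.

sync_threshold = (delay > AV_SYNC_THRESHOLD)?delay:AV_SYNC_THRESHOLD;

This shows that the threshold is never smaller than AV_SYNC_THRESHOLD, which is 0.01 s (10 ms).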

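5. Adjust the delay using the difference computed in step 3.

if(fabs(diff) < AV_NOSYNC_THRESHOLD){
  if(diff <= -sync_threshold){
    delay = 0;
  }else if(diff >= sync_threshold){
    delay = 2 * delay;
  }
}

If the difference is at or below the negative threshold, the video is behind the audio: if the current frame were shown for the full delay, the next frame would already be due before it finished. So delay is set to 0 and the next frame is shown as soon as possible.

If the difference is at or above the threshold, the video is ahead of the audio, so the next frame must be rendered later. That is why delay = 2 * delay. When the absolute difference reaches AV_NOSYNC_THRESHOLD (10 s), no correction is applied at all.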
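Steps 4 and 5 can be combined into one small function that is easy to test on its own. This is a sketch using the `diff <= -sync_threshold` form from the video_refresh_timer excerpt; `adjust_delay` and the `DEMO_` constants are hypothetical names that copy the values of AV_SYNC_THRESHOLD and AV_NOSYNC_THRESHOLD.

```c
#define DEMO_SYNC_THRESHOLD 0.01   /* seconds, same value as AV_SYNC_THRESHOLD */
#define DEMO_NOSYNC_THRESHOLD 10.0 /* seconds, same value as AV_NOSYNC_THRESHOLD */

/* delay: nominal time until the next frame, in seconds.
 * diff:  video pts minus audio clock, in seconds (negative = video is late).
 * Returns the corrected delay. */
double adjust_delay(double delay, double diff) {
    double sync_threshold =
        (delay > DEMO_SYNC_THRESHOLD) ? delay : DEMO_SYNC_THRESHOLD;
    double abs_diff = (diff < 0) ? -diff : diff;

    if (abs_diff < DEMO_NOSYNC_THRESHOLD) {
        if (diff <= -sync_threshold) {
            delay = 0.0;          /* video is late: show the frame at once */
        } else if (diff >= sync_threshold) {
            delay = 2.0 * delay;  /* video is early: wait twice as long */
        }
    }
    return delay;
}
```

With a nominal delay of 40 ms, a difference of -100 ms gives a delay of 0, a difference of +100 ms gives 80 ms, and a small difference such as +10 ms leaves the delay unchanged.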

6. Compute the actual render time of the next frame.

is->frame_timer += delay; // time at which the next frame should be rendered
double current_time = av_gettime()/1000000.0;
actual_delay = is->frame_timer - current_time; // how long to wait from now before rendering
if(actual_delay < AV_SYNC_THRESHOLD ){
  actual_delay = AV_SYNC_THRESHOLD;
}
schedule_refresh(is, actual_delay * 1000 + 0.5);
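Step 6 can also be exercised without SDL by passing the current time in as a parameter. `next_refresh_ms` and `DEMO_MIN_DELAY` are hypothetical names for this sketch; in the player the current time comes from av_gettime() and the result is handed to schedule_refresh().

```c
#define DEMO_MIN_DELAY 0.01 /* seconds, same value as AV_SYNC_THRESHOLD */

/* Advance the frame timer by delay and return how many milliseconds
 * to wait from `now` (both in seconds) before the next refresh. */
int next_refresh_ms(double *frame_timer, double delay, double now) {
    double actual_delay;

    *frame_timer += delay;
    actual_delay = *frame_timer - now;
    if (actual_delay < DEMO_MIN_DELAY) {
        actual_delay = DEMO_MIN_DELAY; /* never schedule in the past */
    }
    return (int)(actual_delay * 1000 + 0.5); /* round to nearest ms */
}
```

Because frame_timer accumulates the ideal schedule rather than restarting from the current time, time spent decoding and rendering does not add up as drift. If the player falls behind, the wait is clamped to 10 ms so that it can catch up.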

That completes the audio/video synchronization.

The complete code follows.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <math.h>

#include <SDL2/SDL.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#include <libswresample/swresample.h>
#include <libavutil/samplefmt.h>
#include <libavutil/imgutils.h>
#include <libavutil/mem.h>
#include <libavutil/time.h>

#define MAX_AUDIO_FRAME_SIZE 192000

#define MAX_AUDIOQ_SIZE (5 * 16 * 1024)
#define MAX_VIDEOQ_SIZE (5 * 256 * 1024)

#define FF_REFRESH_EVENT SDL_USEREVENT
#define FF_QUIT_EVENT (SDL_USEREVENT + 1)
#define REFRESH_TIME 45

#define VIDEO_PICTURE_QUEUE_SIZE 1
#define AV_SYNC_THRESHOLD 0.01
#define AV_NOSYNC_THRESHOLD 10.0
static enum AVPixelFormat out_yuv_foramt = AV_PIX_FMT_YUV420P;

typedef struct PacketQueue
{
  AVPacketList *first_pkt, *last_pkt;
  int nb_packets;
  int size;
  SDL_mutex *mutex;
  SDL_cond *cond;
  int total;
  int end;
  int useCout;
  int flash;
} PacketQueue;

typedef struct VideoPicture
{
  AVFrame *yuv_frame;
  int width, height;
  int allocated;
  double pts; // pts of this video frame, in seconds
} VideoPicture;

typedef struct VideoState
{
  char filename[1024];
  AVFormatContext *pFormatCtx;
  int videoStream, audioStream;

  //audio
  AVStream *audio_st;
  AVCodecContext *audio_ctx;
  PacketQueue audioq;
  uint8_t audio_buf[(MAX_AUDIO_FRAME_SIZE * 3) / 2];
  unsigned int audio_buf_size;
  unsigned int audio_buf_index;
  struct SwrContext *audio_swr_ctx;

  //video
  AVStream *video_st;
  AVCodecContext *video_ctx;
  PacketQueue videoq;
  struct SwsContext *sws_ctx;

  VideoPicture pictq[VIDEO_PICTURE_QUEUE_SIZE];
  int pictq_size, pictq_rindex, pictq_windex;

  //for thread
  SDL_mutex *pictq_mutex;
  SDL_cond *pictq_cond;

  SDL_Thread *parse_tid;
  SDL_Thread *video_tid;

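  double audio_clock; // audio playback time, in seconds
  double video_clock; // predicted display time of the next video frame
  double frame_timer; // time at which the next refresh callback should fire
  double frame_last_pts; // pts of the previous video frame
  double frame_last_delay; // delay used for the previous video frame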


  int quit;
} VideoState;

//SDL_mutex       *texture_mutex;
SDL_Window *win;
SDL_Renderer *renderer;
SDL_Texture *texture;

VideoState *global_video_state;
static Uint8 out_channel = 2;
static enum AVSampleFormat out_format = AV_SAMPLE_FMT_S16;
static int out_nb_samples = 0; // normally the output sample count should equal the input sample count
static int out_sample_rate = 0;
static enum AVPixelFormat out_pix_foramt = AV_PIX_FMT_YUV420P;

void packet_queue_init(PacketQueue *q)
{

  memset(q, 0, sizeof(PacketQueue));
  q->mutex = SDL_CreateMutex();
  q->cond = SDL_CreateCond();
}

int packet_queue_put(PacketQueue *q, AVPacket *srcpkt)
{
  AVPacketList *pkt1;

  pkt1 = av_mallocz(sizeof(AVPacketList));
  if (!pkt1)
    return -1;
  // take the new reference directly in the list node, so no temporary AVPacket is leaked
  if (av_packet_ref(&pkt1->pkt, srcpkt) < 0)
  {
    av_free(pkt1);
    return -1;
  }
  pkt1->next = NULL;

  SDL_LockMutex(q->mutex);

  if (!q->last_pkt)
    q->first_pkt = pkt1;
  else
    q->last_pkt->next = pkt1;
  q->last_pkt = pkt1;
  q->nb_packets++;
  q->size += pkt1->pkt.size;
  //fprintf(stderr, "enqueue, packets:%d, send cond signal\n", q->nb_packets);
  SDL_CondSignal(q->cond);

  SDL_UnlockMutex(q->mutex);
  return 0;
}

int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block)
{
  AVPacketList *pkt1;
  int ret;

  SDL_LockMutex(q->mutex);

  for (;;)
  {

    if (global_video_state->quit)
    {
      fprintf(stderr, "quit from queue_get\n");
      ret = -1;
      break;
    }

    pkt1 = q->first_pkt;
    if (pkt1)
    {
      q->first_pkt = pkt1->next;
      if (!q->first_pkt)
        q->last_pkt = NULL;
      q->nb_packets--;
      q->size -= pkt1->pkt.size;
      *pkt = pkt1->pkt;
      av_free(pkt1);
      ret = 1;
      break;
    }
    else if (!block)
    {
      ret = 0;
      break;
    }
    else if (!(q->end))
    {
      fprintf(stderr, "queue is empty, so wait a moment and wait a cond signal\n");
      SDL_CondWait(q->cond, q->mutex);
    }
    else
    {
      ret = -1;
      break;
    }
  }
  SDL_UnlockMutex(q->mutex);
  return ret;
}

int audio_decode_frame(VideoState *is, uint8_t *audio_buf, int buf_size, double *pts_ptr)
{

  static AVPacket pkt;
  static uint8_t *audio_pkt_data = NULL;
  static int audio_pkt_size = 0;
  static AVFrame frame;
  int data_size = 0;
  int ret = 0;
  av_init_packet(&pkt);
  pkt.data = NULL;
  pkt.size = 0;
  int index = 0;
  uint64_t out_channel_layout = AV_CH_LAYOUT_STEREO;
  // avcodec_receive_frame() unrefs the frame and allocates its buffers itself,
  // so the frame is not pre-allocated here (doing so on every call would leak memory)
  double pts;
  int n;
  for (;;)
  {

    if (pkt.data)
      av_packet_unref(&pkt);

    if (is->quit)
    {
      return -1;
    }

    ret = avcodec_receive_frame(is->audio_ctx, &frame);
    if (ret == 0)
    {
      goto __SWR_DATA;
    }

    if (packet_queue_get(&(is->audioq), &pkt, 1) < 0)
    {
      if (is->audioq.flash)
      {
        goto __RECEIVE;
      }
      av_log(NULL, AV_LOG_ERROR, "flash audio\n");
      is->audioq.flash = 1;
      ret = avcodec_send_packet(is->audio_ctx, NULL);
      if (ret < 0)
      {
        return -1;
      }
      goto __RECEIVE;
    }
    ++(is->audioq.useCout);
    ret = avcodec_send_packet(is->audio_ctx, &pkt);
    if (ret < 0)
    {
      ret = -1;
      printf("decode error");
      av_packet_unref(&pkt);
      return -1;
    }
    if (pkt.data)
    {
      av_packet_unref(&pkt);
    }
  __RECEIVE:
    index = 0;
    ret = avcodec_receive_frame(is->audio_ctx, &frame);
    if (ret < 0)
    {
      return ret;
    }
  __SWR_DATA:
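    // swr_convert() returns the number of samples written per channel (negative on error)
    n = swr_convert(is->audio_swr_ctx,
                    &audio_buf,
                    out_nb_samples,
                    (const uint8_t **)frame.data,
                    frame.nb_samples);
    if (n < 0)
    {
      return -1;
    }
    // size in bytes of the samples that were actually converted
    data_size = av_get_bytes_per_sample(out_format) * out_channel * n;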
    pts = is->audio_clock;
    *pts_ptr = pts;    
    is->audio_clock += (double)data_size /(double)(out_channel * av_get_bytes_per_sample(out_format) * is->audio_ctx->sample_rate);
    return data_size;
  }
}

void audio_callback(void *userdata, Uint8 *stream, int len)
{

  VideoState *is = (VideoState *)userdata;
  int len1, audio_size;
  double pts;

  SDL_memset(stream, 0, len);

  while (len > 0)
  {
    if (is->audio_buf_index >= is->audio_buf_size)
    {
      /* We have already sent all our data; get more */
      audio_size = audio_decode_frame(is, is->audio_buf, sizeof(is->audio_buf), &pts);
      if (audio_size < 0)
      {
        /* If error, output silence */
        is->audio_buf_size = 1024 * 2 * 2;
        memset(is->audio_buf, 0, is->audio_buf_size);
      }
      else
      {
        is->audio_buf_size = audio_size;
      }
      is->audio_buf_index = 0;
    }
    len1 = is->audio_buf_size - is->audio_buf_index;
    // fprintf(stderr, "stream addr:%p, audio_buf_index:%d, audio_buf_size:%d, len1:%d, len:%d\n",
    //         stream,
    //         is->audio_buf_index,
    //         is->audio_buf_size,
    //         len1,
    //         len);

    if (len1 > len)
      len1 = len;
    SDL_MixAudio(stream, (uint8_t *)is->audio_buf + is->audio_buf_index, len1, SDL_MIX_MAXVOLUME);
    len -= len1;
    stream += len1;
    is->audio_buf_index += len1;
  }
}

static Uint32 sdl_refresh_timer_cb(Uint32 interval, void *opaque)
{
  SDL_Event event;
  event.type = FF_REFRESH_EVENT;
  event.user.data1 = opaque;
  SDL_PushEvent(&event);
  return 0; /* 0 means stop timer */
}

static void schedule_refresh(VideoState *is, int delay)
{
  // av_log(NULL, AV_LOG_INFO, "schedule_refresh delay=%d\n", delay);
  SDL_AddTimer(delay, sdl_refresh_timer_cb, is);
}
void video_display(VideoState *is)
{

  SDL_Rect rect;
  VideoPicture *vp;
  float aspect_ratio;
  int w, h, x, y;
  int i;

  vp = &is->pictq[is->pictq_rindex];
  if (vp->yuv_frame)
  {
    if (is->video_ctx->sample_aspect_ratio.num == 0)
    {
      aspect_ratio = 0;
    }
    else
    {
      aspect_ratio = av_q2d(is->video_ctx->sample_aspect_ratio) *
                     is->video_ctx->width / is->video_ctx->height;
    }

    if (aspect_ratio <= 0.0)
    {
      aspect_ratio = (float)is->video_ctx->width /
                     (float)is->video_ctx->height;
    }
    // size_t buffer_size = av_image_get_buffer_size(AV_PIX_FMT_YUV420P, 960, 540, 32);
    // uint8_t *buffer = malloc(buffer_size);
    // int y_size = 960 * 540;
    // memcpy(buffer, vp->yuv_frame->data[0], y_size);
    // memcpy(buffer + y_size, vp->yuv_frame->data[1], y_size / 4);
    // memcpy(buffer + y_size + y_size / 4, vp->yuv_frame->data[2], y_size / 4);
    SDL_UpdateYUVTexture(texture, NULL,
                         vp->yuv_frame->data[0], vp->yuv_frame->linesize[0],
                         vp->yuv_frame->data[1], vp->yuv_frame->linesize[1],
                         vp->yuv_frame->data[2], vp->yuv_frame->linesize[2]);
    // SDL_UpdateTexture(texture, NULL, buffer, 960);

    rect.x = 0;
    rect.y = 0;
    rect.w = is->video_ctx->width;
    rect.h = is->video_ctx->height;

    //SDL_LockMutex(texture_mutex);
    SDL_RenderClear(renderer);
    SDL_RenderCopy(renderer, texture, NULL, &rect);
    SDL_RenderPresent(renderer);
    //SDL_UnlockMutex(texture_mutex);
  }
}

double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size;
  int bytes_per_sec; // bytes consumed per second
  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  if(is->audio_st) {
    bytes_per_sec = is->audio_ctx->sample_rate * out_channel * av_get_bytes_per_sample(out_format);
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}

void video_refresh_timer(void *userdata)
{

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;

  if (is->video_st)
  {
    if (is->pictq_size == 0)
    {
      schedule_refresh(is, 1); //if the queue is empty, so we shoud be as fast as checking queue of picture
    }
    else
    {
      vp = &is->pictq[is->pictq_rindex];
      delay = vp->pts - is->frame_last_pts;
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer vp->pts=%f  is->frame_last_pts=%f delay=%f\n", vp->pts, is->frame_last_pts, delay);
      if(delay <= 0 || delay >= 1.0){
        delay = is->frame_last_delay;
      }
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      ref_clock = get_audio_clock(is); // reference clock: the current audio playback time
      diff = vp->pts - ref_clock;
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer ref_clock=%f, diff=%f\n", ref_clock,  diff);

      sync_threshold = (delay > AV_SYNC_THRESHOLD)?delay:AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD){
         if(diff <= -sync_threshold){
           delay = 0;
         }else if(diff >= sync_threshold){
           delay = 2 * delay;
         }
      }
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer delay=%f, is->frame_timer=%f\n", delay,  is->frame_timer);

      is->frame_timer += delay;
      double current_time = av_gettime()/1000000.0;
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer 11111 is->frame_timer=%f, current_time=%f\n", is->frame_timer, current_time);

      actual_delay = is->frame_timer - current_time; // av_gettime returns microseconds, converted to seconds above
      // av_log(NULL, AV_LOG_INFO, "video_refresh_timer 222222 actual_delay=%f\n", actual_delay);

      if(actual_delay < AV_SYNC_THRESHOLD ){
        actual_delay = AV_SYNC_THRESHOLD;
      }
      av_log(NULL, AV_LOG_INFO, "video_refresh_timer actual_delay=%f ms\n", actual_delay * 1000 + 0.5);

      /* Schedule the next refresh. schedule_refresh() takes milliseconds,
         and adding 0.5 rounds to the nearest integer. */
      schedule_refresh(is, actual_delay * 1000 + 0.5);

      /* show the picture! */
      video_display(is);

      /* update queue for next picture! */
      if (++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE)
      {
        is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  }
  else
  {
    schedule_refresh(is, 100);
  }
}

void alloc_picture(void *userdata)
{

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;

  vp = &is->pictq[is->pictq_windex];
  if (vp->yuv_frame)
  { // release the previous frame; av_frame_free() also sets the pointer to NULL
    av_frame_free(&(vp->yuv_frame));
  }

  // Allocate a place to put our YUV image on that screen
  //SDL_LockMutex(texture_mutex);

  vp->yuv_frame = av_frame_alloc();
  vp->yuv_frame->width = is->video_ctx->width;
  vp->yuv_frame->height = is->video_ctx->height;
  vp->yuv_frame->format = out_yuv_foramt;
  av_frame_get_buffer(vp->yuv_frame, 32);

  vp->width = is->video_ctx->width;
  vp->height = is->video_ctx->height;
  vp->allocated = 1;
}

int queue_picture(VideoState *is, AVFrame *pFrame, double pts)
{

  VideoPicture *vp;

  /* wait until we have space for a new pic */
  SDL_LockMutex(is->pictq_mutex);
  while (is->pictq_size >= VIDEO_PICTURE_QUEUE_SIZE &&
         !is->quit)
  {
    SDL_CondWait(is->pictq_cond, is->pictq_mutex);
  }
  SDL_UnlockMutex(is->pictq_mutex);

  if (is->quit)
  {
    fprintf(stderr, "quit from queue_picture....\n");
    return -1;
  }

  // windex is set to 0 initially
  vp = &is->pictq[is->pictq_windex];

  /*
  fprintf(stderr, "vp.width=%d, vp.height=%d, video_ctx.width=%d, video_ctx.height=%d\n", 
		  vp->width, 
		  vp->height, 
		  is->video_ctx->width,
		  is->video_ctx->height);
  */

  /* allocate or resize the buffer! */
  if (!vp->yuv_frame ||
      vp->width != is->video_ctx->width ||
      vp->height != is->video_ctx->height)
  {

    vp->allocated = 0;
    alloc_picture(is);
    if (is->quit)
    {
      fprintf(stderr, "quit from queue_picture2....\n");
      return -1;
    }
  }

  /* We have a place to put our picture on the queue */

  if (vp->yuv_frame)
  {
    vp->pts=pts;
    // Convert the image into YUV format that SDL uses
    sws_scale(is->sws_ctx,
              (uint8_t const *const *)pFrame->data,
              pFrame->linesize,
              0,
              is->video_ctx->height,
              vp->yuv_frame->data,
              vp->yuv_frame->linesize);

    /* now we inform our display thread that we have a pic ready */
    if (++is->pictq_windex == VIDEO_PICTURE_QUEUE_SIZE)
    {
      is->pictq_windex = 0;
    }
    SDL_LockMutex(is->pictq_mutex);
    is->pictq_size++;
    SDL_UnlockMutex(is->pictq_mutex);
  }
  return 0;
}

double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {

  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_ctx->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}

int video_thread(void *arg)
{
  VideoState *is = (VideoState *)arg;
  static AVPacket pkt;
  static AVFrame pFrame;
  int ret = 0;
  av_init_packet(&pkt);
  pkt.data = NULL;
  pkt.size = 0;
  double pts = 0;
  for (;;)
  {
    if (pkt.data)
      av_packet_unref(&pkt);

    if (packet_queue_get(&is->videoq, &pkt, 1) < 0)
    {
      if (is->videoq.flash)
      {
        goto __RECEIVE;
      }
      av_log(NULL, AV_LOG_ERROR, "flush video\n");
      is->videoq.flash = 1;
      ret = avcodec_send_packet(is->video_ctx, NULL);
      if (ret < 0)
      {
        goto __ERROR;
      }
      goto __RECEIVE;
    }
    pts = 0;
    ret = avcodec_send_packet(is->video_ctx, &pkt);
    if (ret != 0)
    {
      printf("decode error");
      goto __ERROR;
    }
  __RECEIVE:
    ret = avcodec_receive_frame(is->video_ctx, &pFrame);
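    if (ret == AVERROR_EOF)
    {
      // the decoder is fully drained after the flush: stop instead of spinning on an empty queue
      break;
    }
    if (ret != 0)
    {
      continue;
    }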
    pts = pFrame.best_effort_timestamp;
    if(pts == AV_NOPTS_VALUE){
        pts = 0;
    }
    pts *= av_q2d(is->video_st->time_base);
    pts = synchronize_video(is, &pFrame, pts);
    ret = queue_picture(is, &pFrame, pts);
    if (ret < 0)
    {
      goto __ERROR;
    }
  }
__ERROR:
  if (pkt.data)
  {
    av_packet_unref(&pkt);
  }
  return ret;
}

int stream_component_open(VideoState *is, int stream_index)
{

  int64_t in_channel_layout, out_channel_layout;

  AVFormatContext *pFormatCtx = is->pFormatCtx;
  AVCodecContext *codecCtx = NULL;
  AVCodec *codec = NULL;
  SDL_AudioSpec wanted_spec, spec;
  AVCodecParameters *codec_par = NULL;

  if (stream_index < 0 || stream_index >= pFormatCtx->nb_streams)
  {
    return -1;
  }

  codec_par = pFormatCtx->streams[stream_index]->codecpar;
  if (stream_index == is->audioStream)
  {
    out_nb_samples = codec_par->frame_size;
    out_sample_rate = codec_par->sample_rate;
  }

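  codec = avcodec_find_decoder(codec_par->codec_id);
  if (!codec)
  {
    fprintf(stderr, "Unsupported codec!\n");
    return -1;
  }

  // only allocate and fill the context once the decoder is known to exist
  codecCtx = avcodec_alloc_context3(codec);
  if (!codecCtx || avcodec_parameters_to_context(codecCtx, codec_par) < 0)
  {
    fprintf(stderr, "Could not set up the codec context\n");
    avcodec_free_context(&codecCtx);
    return -1;
  }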

  if (codecCtx->codec_type == AVMEDIA_TYPE_AUDIO)
  {
    // Set audio settings from codec info
    wanted_spec.freq = codecCtx->sample_rate;
    wanted_spec.format = AUDIO_S16SYS;
    wanted_spec.channels = out_channel;
    wanted_spec.silence = 0;
    wanted_spec.samples = out_nb_samples;
    wanted_spec.callback = audio_callback;
    wanted_spec.userdata = is;

    if (SDL_OpenAudio(&wanted_spec, &spec) < 0)
    {
      fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
      return -1;
    }
  }

  if (avcodec_open2(codecCtx, codec, NULL) < 0)
  {
    fprintf(stderr, "Unsupported codec!\n");
    return -1;
  }

  switch (codecCtx->codec_type)
  {
  case AVMEDIA_TYPE_AUDIO:
    is->audio_st = pFormatCtx->streams[stream_index];
    is->audio_ctx = codecCtx;
    is->audio_buf_size = 0;
    is->audio_buf_index = 0;
    packet_queue_init(&is->audioq);
    SDL_PauseAudio(0);

    //Out Audio Param
    uint64_t out_channel_layout = av_get_default_channel_layout(out_channel);

    //uint8_t *out_buffer=(uint8_t *)av_malloc(MAX_AUDIO_FRAME_SIZE*2);
    int64_t in_channel_layout = av_get_default_channel_layout(is->audio_ctx->channels);

    struct SwrContext *audio_convert_ctx = NULL;
    audio_convert_ctx = swr_alloc();
    if (!audio_convert_ctx)
    {
      printf("Failed to swr_alloc\n");
      return -1;
    }
    swr_alloc_set_opts(audio_convert_ctx,
                       out_channel_layout,
                       out_format,
                       out_sample_rate,
                       in_channel_layout,
                       is->audio_ctx->sample_fmt,
                       is->audio_ctx->sample_rate,
                       0,
                       NULL);

    // fprintf(stderr, "swr opts: out_channel_layout:%lld, out_sample_fmt:%d, out_sample_rate:%d, in_channel_layout:%lld, in_sample_fmt:%d, in_sample_rate:%d\n",
    //         out_channel_layout,
    //         out_format,
    //         out_sample_rate,
    //         in_channel_layout,
    //         is->audio_ctx->sample_fmt,
    //         is->audio_ctx->sample_rate);

    swr_init(audio_convert_ctx);
    is->audio_swr_ctx = audio_convert_ctx;

    break;

  case AVMEDIA_TYPE_VIDEO:
    is->video_st = pFormatCtx->streams[stream_index];
    is->video_ctx = codecCtx;
    is->frame_timer = (double)av_gettime()/1000000.0;
    is->frame_last_delay = 40e-3;
    packet_queue_init(&is->videoq);
    is->video_tid = SDL_CreateThread(video_thread, "video_thread", is);
    is->sws_ctx = sws_getContext(is->video_ctx->width,
                                 is->video_ctx->height,
                                 is->video_ctx->pix_fmt,
                                 is->video_ctx->width,
                                 is->video_ctx->height,
                                 out_pix_foramt,
                                 SWS_BILINEAR,
                                 NULL, NULL, NULL);
    break;
  default:
    break;
  }

  return 0;
}

int decode_thread(void *arg)
{
  VideoState *is = arg;
  AVPacket packet;
  av_init_packet(&packet);
  packet.data = NULL;
  packet.size = 0;

  if (is->audioStream >= 0)
  {
    stream_component_open(is, is->audioStream);
  }
  if (is->videoStream >= 0)
  {
    stream_component_open(is, is->videoStream);
  }

  fprintf(stderr, "video context: width=%d, height=%d\n", is->video_ctx->width, is->video_ctx->height);

  // main decode loop
  for (;;)
  {

    if (is->quit)
    {
      SDL_CondSignal(is->videoq.cond);
      SDL_CondSignal(is->audioq.cond);
      break;
    }

    // seek stuff goes here
    if (is->audioq.size > MAX_AUDIOQ_SIZE ||
        is->videoq.size > MAX_VIDEOQ_SIZE)
    {
      SDL_Delay(10);
      continue;
    }
    int ret = av_read_frame(is->pFormatCtx, &packet);
    // fprintf(stderr, "av_read_frame, ret :%s\n", av_err2str(ret));

    if (ret < 0)
    {
      break;
    }

    // Is this a packet from the video stream?
    if (packet.stream_index == is->videoStream)
    {
      packet_queue_put(&is->videoq, &packet);
      ++(is->videoq.total);
      // fprintf(stderr, "put video queue, size :%d\n", is->videoq.total);
    }
    else if (packet.stream_index == is->audioStream)
    {
      packet_queue_put(&is->audioq, &packet);
      ++(is->audioq.total);
      // fprintf(stderr, "put audio queue, size :%d\n", is->audioq.total);
    }
    av_packet_unref(&packet);
  }

  is->audioq.end = 1;
  is->videoq.end = 1;

  /* all done - wait for it */
  while (!is->quit)
  {
    SDL_Delay(100);
  }

fail:
  if (1)
  {
    SDL_Event event;
    event.type = FF_QUIT_EVENT;
    event.user.data1 = is;
    SDL_PushEvent(&event);
  }

  return 0;
}

int init_VideoState(VideoState *is)
{
  Uint32 pixformat;
  AVFormatContext *pFormatCtx = NULL;
  AVPacket pkt1, *packet = &pkt1;

  int i;

  is->videoStream = -1;
  is->audioStream = -1;

  global_video_state = is;

  // Open video file
  if (avformat_open_input(&pFormatCtx, is->filename, NULL, NULL) != 0)
    return -1; // Couldn't open file

  is->pFormatCtx = pFormatCtx;

  // Retrieve stream information
  if (avformat_find_stream_info(pFormatCtx, NULL) < 0)
    return -1; // Couldn't find stream information

  // Dump information about file onto standard error
  av_dump_format(pFormatCtx, 0, is->filename, 0);

  // Find the first video stream
  is->videoStream = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, -1);
  is->audioStream = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, -1);

  if (is->videoStream < 0 || is->audioStream < 0)
  {
    av_log(NULL, AV_LOG_ERROR, "%s: could not open codecs\n", is->filename);
    return -1;
  }

  return 0;
}

int main(int argc, char *argv[])
{

  int ret = -1;

  SDL_Event event;

  VideoState *is;

  if (argc < 2)
  {
    fprintf(stderr, "Usage: test <file>\n");
    exit(1);
  }
  av_log_set_level(AV_LOG_INFO);
  //big struct, it's core
  is = av_mallocz(sizeof(VideoState));

  // Register all formats and codecs

  if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER))
  {
    fprintf(stderr, "Could not initialize SDL - %s\n", SDL_GetError());
    exit(1);
  }

  //texture_mutex = SDL_CreateMutex();
  // copy at most sizeof(is->filename) - 1 characters and always NUL-terminate
  snprintf(is->filename, sizeof(is->filename), "%s", argv[1]);
  is->pictq_mutex = SDL_CreateMutex();
  is->pictq_cond = SDL_CreateCond();

  ret = init_VideoState(is);
  if (ret < 0)
  {
    goto __FAIL;
  }

  AVCodecParameters *video_paramters = is->pFormatCtx->streams[is->videoStream]->codecpar;
  win = SDL_CreateWindow("Media Player",
                         SDL_WINDOWPOS_UNDEFINED,
                         SDL_WINDOWPOS_UNDEFINED,
                         video_paramters->width,
                         video_paramters->height,
                         SDL_WINDOW_OPENGL | SDL_WINDOW_RESIZABLE);
  renderer = SDL_CreateRenderer(win, -1, 0);

  texture = SDL_CreateTexture(renderer,
                              SDL_PIXELFORMAT_IYUV,
                              SDL_TEXTUREACCESS_STREAMING,
                              video_paramters->width,
                              video_paramters->height);
  //set timer
  schedule_refresh(is, 40);

  is->parse_tid = SDL_CreateThread(decode_thread, "decode_thread", is);
  if (!is->parse_tid)
  {
    av_freep(&is); // also sets is to NULL, so the cleanup below does not touch freed memory
    goto __FAIL;
  }

  for (;;)
  {

    SDL_WaitEvent(&event);
    switch (event.type)
    {
    case FF_QUIT_EVENT:
    case SDL_QUIT:
      fprintf(stderr, "receive a QUIT event: %d\n", event.type);
      is->quit = 1;
      SDL_CondSignal(is->audioq.cond);
      SDL_CondSignal(is->pictq_cond);
      goto __QUIT;
      break;
    case FF_REFRESH_EVENT:
      //fprintf(stderr, "receive a refresh event: %d\n", event.type);
      video_refresh_timer(event.user.data1);
      break;
    default:
      break;
    }
  }

__QUIT:
  ret = 0;

__FAIL:
  SDL_Delay(20);
  SDL_Quit();
  if (is)
  {
    if (is->audio_swr_ctx)
    {
      swr_close(is->audio_swr_ctx);
      swr_free(&(is->audio_swr_ctx));
    }
    if (is->sws_ctx)
    {
      sws_freeContext(is->sws_ctx);
    }

    if (is->audio_ctx)
    {
      avcodec_free_context(&is->audio_ctx);
    }

    if (is->video_ctx)
    {
      avcodec_free_context(&is->video_ctx);
    }

    if (is->pFormatCtx)
    {
      // avformat_close_input() frees the context and sets the pointer to NULL
      avformat_close_input(&(is->pFormatCtx));
    }
  }
  return ret;
}
