Introduction to the BAM/SAM data format (Part 2)

5. Detailed explanation

Example:

E00606:11:H2CC3CCXY:8:1101:7172:14195 77 * 0 0 * * 0 0 CTACGAGTCATTTAGCACCGGGTTCTCCACAAACTTGCGGTGCGTCTCCAGAGAGGGGCGGCACTCGTTCGGCCGCACCCCGGTCCAGTCACGAACGGCTCTCCACACCGGCCGGCCCCGGGGGGTCGACCGGCTATCCCAGGCCAATCA AAFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JFJJJJ<FJJJJJJJJJJJJJJJJJ)FJ<JJJJJJFJJJJJJJJJJFJJ< XM:i:0

E00606:11:H2CC3CCXY:8:1101:7172:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0 AGACATTTGGTGCGTGTGCTTGGCTGAGGAGCCACTGGTGCGAAGCTACCATCTGTGGGATTATGACTGAACGCCTCTAAGTCAGAATCCCGCCTAAACGTAACGATACCGCAGCGCCGCGGGACTTTGATTGGCCTGGGATAGCCGGTC AAAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJJ<JAJJJJJJJ<JJJJJ<JJJJJJJJJF7JJFFFJJJJJFJAJJJAJFJJJJ7JJFJJFFA-A7FFJJJJF-AFJJJJJJJJ XM:i:0

E00606:11:H2CC3CCXY:8:1101:6400:14195 77 * 0 0 * * 0 0 GCGGGATGCAGGCCGCTCACCATGGCGACGGAGCTGGAGGCGTGGCTCATGTATGAGGATGTCTGGGGCAGCGGATACGTCACCACCTCCAGTACATCATGAGAGCTGCGCTTGAAGCGGTTATTACTGGGCAGCGGCAGCAGGGGGCAG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ XM:i:0

E00606:11:H2CC3CCXY:8:1101:7963:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0 GAGTCTAACGCACGCGCGAGTCAAAGGGTGTCTCCGAGCCCCCACGGCGCAATGAAGGTGAAGGCCGGCGCTCGCCGGCCCAGGTGGGATCCCCCCGCCCCGGCGGGGGGCGCACCACCGGCCCGTCTCGCCCGCACCGCCGGGCAGGTG AAAFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJFJFJJFJJJJJ<JJJJJJJ<FJJJFJJJFFFJJJJFJJJFJJJJJJJJJJJJJ)AFFJJJJFJJJFFJJJJJJJJ)7<JF--<FJFJ)< XM:i:0

1)QNAME

Query name, usually the name of the read, e.g. E00606:11:H2CC3CCXY:8:1101:7172:14195
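As a minimal sketch of how QNAME and the other columns can be pulled out of a record, the snippet below splits one line on tabs, assuming the file is a standard tab-separated SAM file with the 11 mandatory columns followed by optional tags. The names `SAM_FIELDS` and `parse_sam_line` are hypothetical helpers, not part of samtools, and SEQ and QUAL are shortened to the first 10 characters of the first example record purely for readability.

```python
# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one tab-separated SAM record into a dict.

    Mandatory columns are stored under their names; any optional
    tags (column 12 onward) are kept as a list under "TAGS".
    """
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    record["FLAG"] = int(record["FLAG"])
    record["TAGS"] = columns[11:]
    return record


# First example record from above. SEQ and QUAL are cut to 10 characters
# here only to keep the line short; everything else is unchanged.
line = "\t".join([
    "E00606:11:H2CC3CCXY:8:1101:7172:14195", "77", "*", "0", "0", "*",
    "*", "0", "0", "CTACGAGTCA", "AAFFFFFJJJ", "XM:i:0",
])

record = parse_sam_line(line)
print(record["QNAME"])  # E00606:11:H2CC3CCXY:8:1101:7172:14195
print(record["FLAG"])   # 77
```

For real data a dedicated library is usually a better choice than hand-written splitting; this sketch only illustrates the column layout.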

2)FLAG  

The following information comes from: http://www.cnblogs.com/xudongliang/p/5437850.html

#define BAM_FPAIRED        1

#define BAM_FPROPER_PAIR   2

#define BAM_FUNMAP         4

#define BAM_FMUNMAP        8

#define BAM_FREVERSE      16

#define BAM_FMREVERSE     32

#define BAM_FREAD1        64

#define BAM_FREAD2       128

#define BAM_FSECONDARY   256

#define BAM_FQCFAIL      512

#define BAM_FDUP        1024

#define BAM_FSUPPLEMENTARY 2048

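1: the read comes from paired-end (PE) sequencing

2: the read is mapped in a proper pair, i.e. both ends are aligned in a way the aligner considers consistent (for example, expected orientation and insert size)

4: the read is not mapped to the reference

8: the mate of this read is not mapped to the reference; for example, if this read is R1, its R2 is unmapped

16: the read maps to the reverse (minus) strand of the reference

32: the mate of this read maps to the reverse (minus) strand of the reference

64: the read is the R1 read, read1

128: the read is the R2 read, read2

256: the record is not the primary alignment. A read may align to several positions on the reference; only one of them is the primary alignment and the rest are secondary

512: the read failed QC and was filtered out (this flag is not commonly used)

1024: the read is a PCR or optical duplicate (this flag is not commonly used)

2048: the record is a supplementary alignment, i.e. one part of a read whose alignment is split across several locations (a chimeric alignment); this flag is not commonly used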

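All of these flag values are powers of 2. Such a sequence has a useful property: any subset of the values has a unique sum, so a combined flag can be decoded in exactly one way. For example, 65 can only be 1 + 64, which means the read comes from paired-end sequencing and is read1.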
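The decoding described above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic, not how samtools itself is implemented; `FLAG_BITS` and `decode_flag` are hypothetical names, with the bit names taken from the BAM_F* definitions listed earlier.

```python
# (bit value, name) pairs, mirroring the BAM_F* definitions above.
FLAG_BITS = [
    (1, "PAIRED"), (2, "PROPER_PAIR"), (4, "UNMAP"), (8, "MUNMAP"),
    (16, "REVERSE"), (32, "MREVERSE"), (64, "READ1"), (128, "READ2"),
    (256, "SECONDARY"), (512, "QCFAIL"), (1024, "DUP"),
    (2048, "SUPPLEMENTARY"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM flag."""
    # Each value is a distinct power of 2, so a bitwise AND tests
    # exactly one bit and the decomposition is unique.
    return [name for bit, name in FLAG_BITS if flag & bit]


print(decode_flag(65))  # ['PAIRED', 'READ1']
```

Because each name corresponds to a single bit, the same function works for any valid flag value from 0 to 4095.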

The samtools flags subcommand shows the details of a flag value, for example:

$samtools flags 77

0x4d    77      PAIRED,UNMAP,MUNMAP,READ1

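The flag value is 77.

PAIRED means the read comes from paired-end sequencing; its value is 1.

UNMAP means the read is not mapped to the reference; its value is 4.

MUNMAP means the mate of this read is not mapped to the reference; its value is 8.

READ1 means the read is the R1 read; its value is 64.

These values sum to 77.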

$samtools flags 141

0x8d    141     PAIRED,UNMAP,MUNMAP,READ2

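The flag value is 141.

PAIRED means the read comes from paired-end sequencing; its value is 1.

UNMAP means the read is not mapped to the reference; its value is 4.

MUNMAP means the mate of this read is not mapped to the reference; its value is 8.

READ2 means the read is the R2 read; its value is 128.

These values sum to 141.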
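The two breakdowns above can be checked by building the flags from their individual bits. The following is a small sketch using plain integer constants (the variable names are chosen here for illustration); it also shows that the two mates of this unmapped pair differ only in the READ1/READ2 bits.

```python
# Bit values taken from the BAM_F* definitions above.
PAIRED, UNMAP, MUNMAP, READ1, READ2 = 1, 4, 8, 64, 128

# Combine the bits with bitwise OR; for distinct powers of 2 this
# gives the same result as adding the values.
flag_r1 = PAIRED | UNMAP | MUNMAP | READ1
flag_r2 = PAIRED | UNMAP | MUNMAP | READ2

print(flag_r1, hex(flag_r1))  # 77 0x4d
print(flag_r2, hex(flag_r2))  # 141 0x8d

# XOR keeps only the bits on which the two flags disagree.
differing = flag_r1 ^ flag_r2
print(differing == (READ1 | READ2))  # True

# Testing a single bit: is the read itself unmapped?
print(bool(flag_r1 & UNMAP))  # True
```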

3)RNAME

reference sequence name

Usually the name of a chromosome in the reference genome; if the read is not mapped, it is shown as *
