Contents
- ANNOVAR program modules
- ANNOVAR input files
- Converting input files to ANNOVAR format
- ANNOVAR annotation features
- Downloading databases
- Annotate_variation.pl
- Annotate_variation.pl examples
- Region-based annotation
- Filter-based annotation
- Two commonly used filter annotations
- dbSNP annotations
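ANNOVAR program modules
(ANNOVAR program structure
│ annotate_variation.pl # main program; downloads databases and runs the three types of annotation
│ coding_change.pl # infers the mutated protein sequence
│ convert2annovar.pl # converts various formats to .avinput
│ retrieve_seq_from_fasta.pl # builds transcript sequences for other species
│ table_annovar.pl # annotation wrapper; runs all three annotation types in one pass
│ variants_reduction.pl # for more flexible, customized filtering pipelines
│
├─example # example files
│
└─humandb # human annotation databases)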
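ANNOVAR input files
ANNOVAR uses the .avinput format, as in the example below. Columns are tab-separated, and each line must provide the following fields:
Chromosome
Start position
End position
Reference allele
Alternative allele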
......
chrM 302 302 - C chrM 302 . A AC 93.73 PASS AC=4;AF=0.500;AN=8;ClippingRankSum=0.000;DP=121;ExcessHet=3.0103;MLEAC=1;MLEAF=0.500;set=Intersection GT:AD:DP:GQ:PL 0/1:16,9:25:99:183,0,384
chrM 963 963 T - chrM 962 . CT C 258.73 PASS AC=3;AF=0.500;AN=6;ClippingRankSum=0.000;DP=75;ExcessHet=3.0103;MLEAC=1;MLEAF=0.500;set=variant2-variant3-variant4 GT:AD:DP:GQ:PL 0/1:11,11:22:99:296,0,247
chr1 2178293 2178294 GG - chr1 2178292 . TGG T 128 PASS AC=6;AF=1.00;AN=6;DP=31;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;set=variant-variant2-variant3 GT:AD:DP:GQ:PL 1/1:0,4:4:12:165,12,0
chr1 2248382 2248382 - C chr1 2248382 . G GC 98.25 PASS AC=4;AF=1.00;AN=4;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;set=variant2-variant3 GT:AD:DP:GQ:PL 1/1:0,3:3:9:135,9,0
chr1 3278899 3278899 - C chr1 3278899 . A AC 30.71 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.00;QD=15.35;SOR=0.693;set=variant2 GT:AD:DP:GQ:PL 1/1:0,2:2:6:67,6,0
chr1 3817092 3817092 - T chr1 3817092 . C CT 22.73 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.00;QD=11.36;SOR=0.693;set=variant2 GT:AD:DP:GQ:PL 1/1:0,2:2:6:59,6,0
chr1 6067261 6067261 - G chr1 6067261 . T TG 21.73 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.00;QD=10.87;SOR=0.693;set=variant2 GT:AD:DP:GQ:PL 1/1:0,2:2:6:58,6,0
chr1 6211850 6211850 - C chr1 6211850 . A AC 30.71 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.00;QD=15.35;SOR=0.693;set=variant2 GT:AD:DP:GQ:PL 1/1:0,2:2:6:67,6,0
chr1 7538772 7538772 - TTTA chr1 7538772 . C CTTTA 53.70 PASS AC=2;AF=1.00;AN=2;DP=2;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=37.00;QD=26.85;SOR=2.303;set=variant2 GT:AD:DP:GQ:PL 1/1:0,2:2:6:90,6,0
chr1 7709649 7709649 - T chr1 7709649 . C CT 62.74 PASS AC=2;AF=0.500;AN=4;ClippingRankSum=0.000;DP=15;ExcessHet=3.0103;MLEAC=1;MLEAF=0.500;MQ=37.00;MQRankSum=0.000;set=variant2-variant3 GT:AD:DP:GQ:PL 0/1:1,3:4:25:100,0,25
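The five-column layout described above can be read with a few lines of Python. This is a minimal sketch, not part of ANNOVAR: the names `Variant`, `parse_avinput_line` and `variant_kind` are hypothetical, and it assumes the file is tab-delimited with "-" marking the empty allele of an insertion or deletion, as in the example lines.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alt: str
    extra: List[str]  # any columns after the first five, kept as-is


def parse_avinput_line(line: str) -> Variant:
    """Split one tab-delimited .avinput line into its five core fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, ref, alt = fields[:5]
    return Variant(chrom, int(start), int(end), ref, alt, fields[5:])


def variant_kind(v: Variant) -> str:
    """Classify a variant using the '-' placeholder for indels."""
    if v.ref == "-":
        return "insertion"
    if v.alt == "-":
        return "deletion"
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    return "block substitution"


if __name__ == "__main__":
    row = "chr1\t2178293\t2178294\tGG\t-\tchr1\t2178292\t.\tTGG\tT"
    v = parse_avinput_line(row)
    print(v.chrom, v.start, v.end, variant_kind(v))
```

Keeping the extra columns in a list means lines produced with -includeinfo can be parsed with the same function.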
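Converting input files to ANNOVAR format
ANNOVAR converts input files with convert2annovar.pl. The converted file is trimmed down and mainly contains the five columns described above. To carry all content of the original file into the converted .avinput file, add the -includeinfo option; to write a separate .avinput file for each sample, add the -allsample option; other options are available as well.
$ convert2annovar.pl -format vcf4 example/ex2.vcf > ex2.avinput
# -format vcf4 specifies that the input is in VCF format
ANNOVAR also supports conversion from the following formats: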
SAMtools pileup format
Complete Genomics format
GFF3-SOLiD calling format
SOAPsnp calling format
MAQ calling format
CASAVA calling format
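ANNOVAR annotation features
Table_annovar.pl (runs all three annotation types in one pass)
The simplest way to use ANNOVAR is to annotate with table_annovar.pl. It accepts several input formats, including VCF, and writes a tab-delimited output file in which each column holds one annotation.
Example annotation command:
$ ~/biosoft/ANNOVAR/annovar/table_annovar.pl 15_indel_pre.avinput.hg19.variant2.avinput ~/biosoft/ANNOVAR/annovar/humandb/ -buildver hg19 -out myanno -remove -protocol refGene,cytoBand,genomicSuperDups,esp6500siv2_all,1000g2015aug_all,1000g2015aug_eur,exac03,avsnp147,dbnsfp30a -operation g,r,r,f,f,f,f,f,f -nastring . -csvout
# -buildver hg19: use the hg19 genome build
# -out myanno: use myanno as the output file prefix
# -remove: delete the temporary files created during annotation
# -protocol: the databases to annotate with, comma-separated; the order matters
# -operation: the type of each database, in the same order as -protocol (g = gene-based, r = region-based, f = filter-based), comma-separated
# -nastring .: print "." for missing values
# -csvout: write the final output as a .csv file
The output CSV file contains the five main input columns plus the annotations from each database. In addition, table_annovar.pl can annotate a VCF file directly, without format conversion (using the -vcfinput option); the annotations are then written to the "INFO" field of the VCF file.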
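Because -protocol and -operation must line up one-to-one, it can help to build both lists from a single sequence of pairs. The sketch below assembles the argument list for the command shown above; `build_table_annovar_cmd` is a hypothetical helper, it accepts only the three operation codes described in this article (g, r, f), and it only builds the command rather than running it.

```python
from typing import List, Sequence, Tuple

# Only the three operation codes described in this article are accepted here.
VALID_OPERATIONS = {"g", "r", "f"}


def build_table_annovar_cmd(
    input_file: str,
    db_dir: str,
    buildver: str,
    out_prefix: str,
    annotations: Sequence[Tuple[str, str]],
) -> List[str]:
    """Return a table_annovar.pl argument list from (protocol, operation) pairs."""
    if not annotations:
        raise ValueError("at least one (protocol, operation) pair is required")
    for protocol, operation in annotations:
        if operation not in VALID_OPERATIONS:
            raise ValueError(f"unknown operation {operation!r} for {protocol}")
    protocols = ",".join(p for p, _ in annotations)
    operations = ",".join(op for _, op in annotations)
    return [
        "table_annovar.pl", input_file, db_dir,
        "-buildver", buildver,
        "-out", out_prefix,
        "-remove",
        "-protocol", protocols,
        "-operation", operations,
        "-nastring", ".",
        "-csvout",
    ]


if __name__ == "__main__":
    cmd = build_table_annovar_cmd(
        "ex1.avinput", "humandb/", "hg19", "myanno",
        [("refGene", "g"), ("cytoBand", "r"), ("avsnp147", "f")],
    )
    print(" ".join(cmd))
```

Pairing each database with its type in one place makes it impossible for the two comma-separated lists to drift out of order.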
Downloading databases
# Download the 1000g2015aug database
$ perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2015aug humandb/
Annotate_variation.pl
Annotate_variation.pl supports three types of annotation:
Gene-based annotation
Region-based annotation
Filter-based annotation
annotate_variation.pl -geneanno -buildver hg19 example/ex1.avinput humandb/
annotate_variation.pl -regionanno -dbtype cytoBand -buildver hg19 example/ex1.avinput humandb/
annotate_variation.pl -filter -dbtype 1000g2014oct_all -buildver hg19 example/ex1.avinput humandb/
# The three example commands above use the data bundled with the package and correspond, in order, to the three annotation types
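Annotate_variation.pl examples
Gene-based annotation
As the name suggests, gene-based annotation uses the positions of SNPs and CNVs to determine whether they alter the coding sequence or the open reading frame and thereby change amino acids. Users can choose which gene definitions to annotate against: RefSeq genes, UCSC genes, ENSEMBL genes, GENCODE genes, AceView genes, and others.
Example command:
$ annotate_variation.pl -geneanno -dbtype refGene -out ex1 -build hg19 example/ex1.avinput humandb/
# -geneanno: use gene-based annotation
# -dbtype refGene: use the "refGene" database
# -out ex1: use ex1 as the output file prefix
Because annotate_variation.pl defaults to gene-based annotation with the refGene database, -geneanno -dbtype refGene can be omitted from the command above.
The run produces two files:
#ex1.variant_function
$ cat ex1.variant_function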
UTR5 ISG15(NM_005101:c.-33T>C) 1 948921 948921 T C comments: rs15842, a SNP in 5' UTR of ISG15
UTR3 ATAD3C(NM_001039211:c.*91G>T) 1 1404001 1404001 G T comments: rs149123833, a SNP in 3' UTR of ATAD3C
splicing NPHP4(NM_001291593:exon19:c.1279-2T>A,NM_001291594:exon18:c.1282-2T>A,NM_015102:exon22:c.2818-2T>A) 1 5935162 5935162 A T comments: rs1287637, a splice site variant in NPHP4
intronic DDR2 1 162736463 162736463 C T comments: rs1000050, a SNP in Illumina SNP arrays
intronic DNASE2B 1 84875173 84875173 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
intergenic LOC645354(dist=11566),LOC391003(dist=116902) 1 13211293 13211294 TC - comments: rs59770105, a 2-bp deletion
intergenic UBIAD1(dist=55105),PTCHD2(dist=135699) 1 11403596 11403596 - AT comments: rs35561142, a 2-bp insertion
intergenic LOC100129138(dist=872538),NONE(dist=NONE) 1 105492231 105492231 A ATAAA comments: rs10552169, a block substitution
exonic IL23R 1 67705958 67705958 G A comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
exonic ATG16L1 2 234183368 234183368 A G comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
exonic NOD2 16 50745926 50745926 C T comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
exonic NOD2 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
exonic NOD2 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
exonic GJB2 13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
exonic CRYL1,GJB6 13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
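The first file annotates every variant by prepending two tab-separated columns to each input line.
The first column is the type of region the variant falls in, such as exonic, intronic, UTR5, UTR3, or intergenic.
The second column gives details for the first, such as the gene name or, for intergenic variants, the neighboring genes and their distances.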
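A quick way to summarize a .variant_function file is to tally the region types in its first column. The following is a minimal sketch that assumes the tab-delimited layout described above; `count_region_types` is a hypothetical name, not an ANNOVAR utility.

```python
from collections import Counter
from typing import Iterable


def count_region_types(lines: Iterable[str]) -> Counter:
    """Count how many variants fall into each region type (first column)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        region = line.split("\t", 1)[0]
        counts[region] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "UTR5\tISG15(NM_005101:c.-33T>C)\t1\t948921\t948921\tT\tC",
        "intronic\tDDR2\t1\t162736463\t162736463\tC\tT",
        "exonic\tIL23R\t1\t67705958\t67705958\tG\tA",
        "exonic\tNOD2\t16\t50745926\t50745926\tC\tT",
    ]
    for region, n in count_region_types(sample).most_common():
        print(region, n)
```

In practice the function can be given an open file handle, since a file object iterates over its lines.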
#ex1.exonic_variant_function
$ cat ex1.exonic_variant_function
line9 nonsynonymous SNV IL23R:NM_144701:exon9:c.G1142A:p.R381Q, 1 67705958 67705958 G A comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
line10 nonsynonymous SNV ATG16L1:NM_001190267:exon9:c.A550G:p.T184A,ATG16L1:NM_017974:exon8:c.A841G:p.T281A,ATG16L1:NM_001190266:exon9:c.A646G:p.T216A,ATG16L1:NM_030803:exon9:c.A898G:p.T300A,ATG16L1:NM_198890:exon5:c.A409G:p.T137A, 2 234183368 234183368 A G comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
line11 nonsynonymous SNV NOD2:NM_022162:exon4:c.C2104T:p.R702W,NOD2:NM_001293557:exon3:c.C2023T:p.R675W, 16 50745926 50745926 C T comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
line12 nonsynonymous SNV NOD2:NM_022162:exon8:c.G2722C:p.G908R,NOD2:NM_001293557:exon7:c.G2641C:p.G881R, 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
line13 frameshift insertion NOD2:NM_022162:exon11:c.3017dupC:p.A1006fs,NOD2:NM_001293557:exon10:c.2936dupC:p.A979fs, 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
line14 frameshift deletion GJB2:NM_004004:exon2:c.35delG:p.G12fs, 13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
line15 frameshift deletion GJB6:NM_001110221:wholegene,GJB6:NM_001110220:wholegene,GJB6:NM_001110219:wholegene,CRYL1:NM_015974:wholegene,GJB6:NM_006783:wholegene, 13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
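The second output file, ending in .exonic_variant_function, lists only exonic variants (those that can change the amino acid sequence).
The first column is the line number of the variant in the first file;
the second column is the functional consequence of the variant, such as an amino acid change caused by the exonic change or a frameshift;
the third column gives the gene name, the transcript identifier, and the sequence change in each affected transcript;
the fourth column onward is the original input line.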
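The third column packs one colon-separated entry per transcript into a comma-separated list, so it usually needs to be unpacked before further analysis. A minimal sketch follows, assuming the entry shapes seen in the output above (gene:transcript:exonN:c.change:p.change, or gene:transcript:wholegene); `parse_transcript_changes` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def parse_transcript_changes(field: str) -> List[Dict[str, Optional[str]]]:
    """Unpack the comma-separated transcript entries of the third column."""
    changes: List[Dict[str, Optional[str]]] = []
    for entry in field.rstrip(",").split(","):
        parts = entry.split(":")
        if len(parts) < 2:
            continue  # empty or malformed entry
        record: Dict[str, Optional[str]] = {
            "gene": parts[0],
            "transcript": parts[1],
            "exon": None,
            "cdna": None,
            "protein": None,
        }
        # Remaining parts are identified by prefix; "wholegene" matches none of them.
        for part in parts[2:]:
            if part.startswith("exon"):
                record["exon"] = part
            elif part.startswith("c."):
                record["cdna"] = part
            elif part.startswith("p."):
                record["protein"] = part
        changes.append(record)
    return changes


if __name__ == "__main__":
    field = ("NOD2:NM_022162:exon4:c.C2104T:p.R702W,"
             "NOD2:NM_001293557:exon3:c.C2023T:p.R675W,")
    for change in parse_transcript_changes(field):
        print(change["transcript"], change["protein"])
```

Matching on prefixes rather than fixed positions keeps whole-gene deletions from being mislabeled as exon or protein changes.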
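Region-based annotation
In contrast to gene-based annotation, region-based annotation identifies variants that fall within specific genomic regions, for example regions conserved among 44 species, predicted transcription factor binding sites, segmental duplication regions, GWAS hits, variant databases, and epigenomic sites. Conserved genomic elements annotation is used here to illustrate region-based annotation:
Example commands:
# Download the database
$ annotate_variation.pl -build hg19 -downdb phastConsElements46way humandb/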
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/phastConsElements46way.txt.gz ... OK
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory
# Annotate with the downloaded database
$ annotate_variation.pl -regionanno -build hg19 -out ex1 -dbtype phastConsElements46way example/ex1.avinput humandb/
NOTICE: Reading annotation database humandb/hg19_phastConsElements46way.txt ... Done with 5163775 regions
NOTICE: Finished region-based annotation on 12 genetic variants in ex1.hg19.avinput
NOTICE: Output files were written to ex1.hg19_phastConsElements46way
# -regionanno: use region-based annotation
# -dbtype phastConsElements46way: use the "phastConsElements46way" database; note that it must be a region-based database
# Output file
$ cat ex1.hg19_phastConsElements46way
phastConsElements46way Score=387;Name=lod=50 1 67705958 67705958 G A comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
phastConsElements46way Score=420;Name=lod=68 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
phastConsElements46way Score=385;Name=lod=49 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
phastConsElements46way Score=395;Name=lod=54 13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
phastConsElements46way Score=545;Name=lod=218 13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
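Output file: the first column is "phastConsElements46way", the annotation type; here the phastCons 46-way alignments annotate conserved genomic regions;
the second column holds a score and a name. The score comes from UCSC, and --score_threshold or --normscore_threshold can be used to filter out variants in low-scoring regions. "Name=lod=x" is the name of the region;
the remaining columns are the input line.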
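The second column can also be unpacked after the fact, for instance to keep only variants in high-scoring elements. This is a sketch of post-processing in Python, not a reimplementation of ANNOVAR's own threshold options; `parse_region_annotation` and `keep_high_scoring` are hypothetical names, and the input is assumed to be tab-delimited as described above.

```python
from typing import Dict, Iterable, List


def parse_region_annotation(field: str) -> Dict[str, str]:
    """Turn 'Score=387;Name=lod=50' into {'Score': '387', 'Name': 'lod=50'}."""
    result: Dict[str, str] = {}
    for item in field.split(";"):
        # Split on the first '=' only, so values such as 'lod=50' stay intact.
        key, sep, value = item.partition("=")
        if sep:
            result[key] = value
    return result


def keep_high_scoring(lines: Iterable[str], min_score: int) -> List[str]:
    """Keep annotated lines whose Score is at least min_score."""
    kept: List[str] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        score = int(parse_region_annotation(cols[1]).get("Score", "0"))
        if score >= min_score:
            kept.append(line)
    return kept


if __name__ == "__main__":
    rows = [
        "phastConsElements46way\tScore=387;Name=lod=50\t1\t67705958\t67705958\tG\tA",
        "phastConsElements46way\tScore=545;Name=lod=218\t13\t20797176\t21105944\t0\t-",
    ]
    for row in keep_high_scoring(rows, 400):
        print(row)
```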
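Filter-based annotation
Filter-based annotation identifies variants that are already recorded in a specific database. For example, to tell whether a variant is novel you need to know whether it is present in dbSNP; you may also want its allele frequency in the 1000 Genomes Project, or to compute a set of prediction scores for the variants and filter on them. It differs from region-based annotation in that it works on the exact nucleotide change, whereas region-based annotation works on chromosome coordinates. For example, region-based annotation matches chr1:1000-1000, while filter-based annotation matches the A->G change at chr1:1000-1000.
Many databases are available, covering variant frequencies from whole-genome sequencing, variant frequencies from whole-exome sequencing, variant frequencies in isolated or small populations, functional predictions for whole-genome variants, functional predictions for whole-exome variants, functional predictions for splice variants, disease-associated variants, variant identifiers, and more.
Two commonly used filter annotations are introduced below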
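The difference between the two matching rules can be illustrated with toy data. The sketch below is purely conceptual and is not how ANNOVAR stores or indexes its databases: the region list, the frequency table, and the function names are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[str, int, int]                # (chrom, start, end)
VariantKey = Tuple[str, int, int, str, str]  # (chrom, start, end, ref, alt)


def region_match(variant: VariantKey, regions: List[Region]) -> bool:
    """Region-based rule: coordinates overlap; the alleles are ignored."""
    chrom, start, end, _ref, _alt = variant
    return any(
        chrom == r_chrom and start <= r_end and end >= r_start
        for r_chrom, r_start, r_end in regions
    )


def filter_match(variant: VariantKey, database: Dict[VariantKey, str]) -> Optional[str]:
    """Filter-based rule: position and both alleles must match exactly."""
    return database.get(variant)


if __name__ == "__main__":
    regions = [("chr1", 900, 1100)]                       # toy conserved element
    database = {("chr1", 1000, 1000, "A", "G"): "0.12"}   # toy frequency table
    a_to_g = ("chr1", 1000, 1000, "A", "G")
    a_to_t = ("chr1", 1000, 1000, "A", "T")
    print(region_match(a_to_g, regions), filter_match(a_to_g, database))
    print(region_match(a_to_t, regions), filter_match(a_to_t, database))
```

Both variants sit at the same position and so both hit the region, but only the A->G change is found by the exact-allele lookup.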
1000 Genomes Project annotations
$ annotate_variation.pl -filter -dbtype 1000g2012apr_eur -buildver hg19 -out ex1 example/ex1.avinput humandb/
NOTICE: Variants matching filtering criteria are written to ex1.hg19_EUR.sites.2012_04_dropped, other variants are written to ex1.hg19_EUR.sites.2012_04_filtered
NOTICE: Processing next batch with 15 unique variants in 15 input lines
NOTICE: Database index loaded. Total number of bins is 2766067 and the number of bins to be scanned is 12
NOTICE: Scanning filter database humandb/hg19_EUR.sites.2012_04.txt...Done
# View the output format
$ cat ex1.hg19_EUR.sites.2012_04_dropped
1000g2012apr_eur 0.04 1 1404001 1404001 G T comments: rs149123833, a SNP in 3' UTR of ATAD3C
1000g2012apr_eur 0.87 1 162736463 162736463 C T comments: rs1000050, a SNP in Illumina SNP arrays
1000g2012apr_eur 0.81 1 5935162 5935162 A T comments: rs1287637, a splice site variant in NPHP4
1000g2012apr_eur 0.06 1 67705958 67705958 G A comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
1000g2012apr_eur 0.54 1 84875173 84875173 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
1000g2012apr_eur 0.96 1 948921 948921 T C comments: rs15842, a SNP in 5' UTR of ISG15
1000g2012apr_eur 0.05 16 50745926 50745926 C T comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
1000g2012apr_eur 0.01 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
1000g2012apr_eur 0.01 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
1000g2012apr_eur 0.53 2 234183368 234183368 A G comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
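# -filter: use filter-based annotation
# -dbtype 1000g2012apr_eur: use the "1000g2012apr_eur" database
This annotation uses the April 2012 release of the 1000 Genomes Project data for the European population. It writes two output files, a *dropped file and a *filtered file.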
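# *dropped file
As with region-based annotation, the first column is the database name;
the second column is the allele frequency, and the -maf 0.05 option can be used to filter out variants with a frequency below 0.05;
the third column onward is again the input line.
# Note that -maf 0.05 -reverse can likewise be used to filter out variants with a frequency above 0.05; for filtering on the ALT allele frequency, however, the -score_threshold option is recommended.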
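The frequency column of a *dropped file can also be inspected after annotation. The sketch below splits the lines into rare and common groups at a chosen cutoff; it is plain post-processing under the column layout described above and makes no claim about how ANNOVAR's own threshold options behave internally. `split_by_frequency` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def split_by_frequency(lines: Iterable[str], threshold: float) -> Tuple[List[str], List[str]]:
    """Split *dropped lines into (rare, common) by the frequency in column 2."""
    rare: List[str] = []
    common: List[str] = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # not an annotated variant line
        freq = float(cols[1])
        if freq < threshold:
            rare.append(line)
        else:
            common.append(line)
    return rare, common


if __name__ == "__main__":
    rows = [
        "1000g2012apr_eur\t0.04\t1\t1404001\t1404001\tG\tT",
        "1000g2012apr_eur\t0.87\t1\t162736463\t162736463\tC\tT",
        "1000g2012apr_eur\t0.01\t16\t50756540\t50756540\tG\tC",
    ]
    rare, common = split_by_frequency(rows, 0.05)
    print(len(rare), "rare;", len(common), "common")
```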
dbSNP annotations
With dbSNP annotation, ANNOVAR identifies variants that are already present in dbSNP and annotates them with their SNP identifiers.
Commands:
# Download the dbSNP 138 database
$ annotate_variation.pl -downdb -buildver hg19 -webfrom annovar snp138 humandb
NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_snp138.txt.gz ... OK
NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_snp138.txt.idx.gz ... OK
NOTICE: Uncompressing downloaded files
NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory
# Annotate with dbSNP 138
$ annotate_variation.pl -filter -out ex1 -build hg19 -dbtype snp138 example/ex1.avinput humandb/
NOTICE: Variants matching filtering criteria are written to ex1.hg19_snp138_dropped, other variants are written to ex1.hg19_snp138_filtered
NOTICE: Processing next batch with 15 unique variants in 15 input lines
NOTICE: Database index loaded. Total number of bins is 2858459 and the number of bins to be scanned is 12
NOTICE: Scanning filter database humandb/hg19_snp138.txt...Done
# View the dropped file
$ cat ex1.hg19_snp138_dropped
snp138 rs35561142 1 11403596 11403596 - AT comments: rs35561142, a 2-bp insertion
snp138 rs149123833 1 1404001 1404001 G T comments: rs149123833, a SNP in 3' UTR of ATAD3C
snp138 rs1000050 1 162736463 162736463 C T comments: rs1000050, a SNP in Illumina SNP arrays
snp138 rs1287637 1 5935162 5935162 A T comments: rs1287637, a splice site variant in NPHP4
snp138 rs11209026 1 67705958 67705958 G A comments: rs11209026 (R381Q), a SNP in IL23R associated with Crohn's disease
snp138 rs6576700 1 84875173 84875173 C T comments: rs6576700 or SNP_A-1780419, a SNP in Affymetrix SNP arrays
snp138 rs15842 1 948921 948921 T C comments: rs15842, a SNP in 5' UTR of ISG15
snp138 rs80338939 13 20763686 20763686 G - comments: rs1801002 (del35G), a frameshift mutation in GJB2, associated with hearing loss
snp138 rs2066844 16 50745926 50745926 C T comments: rs2066844 (R702W), a non-synonymous SNP in NOD2
snp138 rs2066845 16 50756540 50756540 G C comments: rs2066845 (G908R), a non-synonymous SNP in NOD2
snp138 rs2066847 16 50763778 50763778 - C comments: rs2066847 (c.3016_3017insC), a frameshift SNP in NOD2
snp138 rs2241880 2 234183368 234183368 A G comments: rs2241880 (T300A), a SNP in the ATG16L1 associated with Crohn's disease
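# *dropped file
As with region-based annotation, the first column is the database name;
the second column is the identifier of the variant as recorded in the database;
the third column onward is again the input line.
This annotation also writes two output files, a *dropped file and a *filtered file.
The *filtered file contains the variants that are not found in the filter database.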
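Since the *dropped file holds the variants found in dbSNP, it can be used to label each input variant as known or novel. The following is a minimal sketch assuming tab-delimited files with the column layout described above; `load_known_ids` and `label_variants` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple

VariantKey = Tuple[str, ...]  # (chrom, start, end, ref, alt) as strings


def load_known_ids(dropped_lines: Iterable[str]) -> Dict[VariantKey, str]:
    """Map each variant in a *dropped file to its identifier (column 2)."""
    known: Dict[VariantKey, str] = {}
    for line in dropped_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        known[tuple(cols[2:7])] = cols[1]
    return known


def label_variants(input_lines: Iterable[str], known: Dict[VariantKey, str]) -> List[Tuple[VariantKey, str]]:
    """Pair every input variant with its identifier, or 'novel' if absent."""
    labelled: List[Tuple[VariantKey, str]] = []
    for line in input_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        key = tuple(cols[:5])
        labelled.append((key, known.get(key, "novel")))
    return labelled


if __name__ == "__main__":
    dropped = ["snp138\trs15842\t1\t948921\t948921\tT\tC\tcomments: rs15842"]
    avinput = [
        "1\t948921\t948921\tT\tC\tcomments: rs15842",
        "1\t13211293\t13211294\tTC\t-\tcomments: a 2-bp deletion",
    ]
    for key, label in label_variants(avinput, load_known_ids(dropped)):
        print("\t".join(key), label)
```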