ALTER: a small tool for converting sequence alignment formats

Multiple sequence alignments can be stored in a large variety of formats.

The most common ones include:

FASTA

>ccsA1
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA2
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA3
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA4
ATGATATTTTCAACTTTAGAGCATATAT           
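
To make the FASTA layout concrete, here is a minimal Python sketch that reads records like the ones above and checks that all sequences have the same length, which is what makes the file an alignment rather than just a set of sequences. The function name parse_fasta is my own choice for illustration; this is not how ALTER reads FASTA internally.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # Sequence data may span several lines; collect the pieces.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = """>ccsA1
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA2
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA3
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA4
ATGATATTTTCAACTTTAGAGCATATAT
"""

records = parse_fasta(example)
lengths = {len(seq) for _, seq in records}
print(len(records), "sequences; aligned lengths:", sorted(lengths))
```

If the set of lengths contains more than one value, the file is not a valid alignment and a converter would be right to reject it.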

Clustal

CLUSTAL W (1.8) multiple sequence alignment (ALTER 1.3.3)


ccsA1           ATGATATTTTCAACTTTAGAGCATATAT
ccsA2           ATGATATTTTCAACTTTAGAGCATATAT
ccsA3           ATGATATTTTCAACTTTAGAGCATATAT
ccsA4           ATGATATTTTCAACTTTAGAGCATATAT
                ****************************           
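
The last line of the Clustal block is the conservation line: a "*" under a column means every sequence has the same residue there. A small Python sketch of that rule follows; conservation_line is a hypothetical helper name, and the sketch ignores the ":" and "." similarity marks that Clustal can add for protein alignments.

```python
def conservation_line(sequences):
    """Return '*' for columns where all sequences share one residue, else ' '."""
    marks = []
    for column in zip(*sequences):
        first = column[0]
        # A column made only of gaps is not counted as conserved.
        identical = first != "-" and all(base == first for base in column)
        marks.append("*" if identical else " ")
    return "".join(marks)


aligned = ["ATGATATTTTCAACTTTAGAGCATATAT"] * 4
print(conservation_line(aligned))
```

Because the four ccsA sequences in the example are identical, every column is marked.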

NEXUS

#NEXUS
BEGIN DATA;
dimensions ntax=4 nchar=28;
format missing=?
interleave=yes datatype=DNA gap=- match=.;

matrix
ccsA1       ATGATATTTTCAACTTTAGAGCATATAT
ccsA2       ATGATATTTTCAACTTTAGAGCATATAT
ccsA3       ATGATATTTTCAACTTTAGAGCATATAT
ccsA4       ATGATATTTTCAACTTTAGAGCATATAT

;
end;           
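
The NEXUS block above carries the same four sequences plus a header giving the number of taxa (ntax) and alignment length (nchar). As a rough illustration of what a converter has to produce, here is a Python sketch that writes a minimal, non-interleaved DATA block. The function name to_nexus and the exact set of format options are assumptions for illustration; ALTER's real output contains more options, as shown above.

```python
def to_nexus(records, datatype="DNA"):
    """Build a minimal sequential NEXUS DATA block from (name, sequence) pairs."""
    if not records:
        raise ValueError("no sequences given")
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("sequences differ in length; not an alignment")
    # Pad names so that all sequences start in the same column.
    width = max(len(name) for name, _ in records) + 2
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"dimensions ntax={len(records)} nchar={nchar};",
        f"format datatype={datatype} missing=? gap=-;",
        "matrix",
    ]
    lines += [f"{name:<{width}}{seq}" for name, seq in records]
    lines += [";", "end;"]
    return "\n".join(lines) + "\n"


records = [(f"ccsA{i}", "ATGATATTTTCAACTTTAGAGCATATAT") for i in range(1, 5)]
nexus_text = to_nexus(records)
print(nexus_text, end="")
```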

PHYLIP

4 28
ccsA1       atgatatttt caactttaga gcatatat
ccsA2       atgatatttt caactttaga gcatatat
ccsA3       atgatatttt caactttaga gcatatat
ccsA4       atgatatttt caactttaga gcatatat           
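
PHYLIP starts with a line giving the number of sequences and the alignment length, followed by one padded name per row. The Python sketch below mimics the layout shown above (lowercase residues in blocks of 10). The 12-column name field and the name to_phylip are assumptions made to match this example; strict PHYLIP readers expect names in a fixed 10-character field.

```python
def to_phylip(records, name_width=12, block=10):
    """Write (name, sequence) pairs as sequential PHYLIP text."""
    if not records:
        raise ValueError("no sequences given")
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("sequences differ in length; not an alignment")
    lines = [f"{len(records)} {nchar}"]
    for name, seq in records:
        if len(name) >= name_width:
            raise ValueError(f"name too long for the name field: {name}")
        seq = seq.lower()
        # Split the sequence into space-separated blocks for readability.
        blocks = [seq[i:i + block] for i in range(0, len(seq), block)]
        lines.append(name.ljust(name_width) + " ".join(blocks))
    return "\n".join(lines) + "\n"


records = [(f"ccsA{i}", "ATGATATTTTCAACTTTAGAGCATATAT") for i in range(1, 5)]
phylip_text = to_phylip(records)
print(phylip_text, end="")
```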

MEGA

#mega
TITLE: MSA converted with ALTER 1.3.3

#ccsA1       ATGATATTTT CAACTTTAGA GCATATAT
#ccsA2       ATGATATTTT CAACTTTAGA GCATATAT
#ccsA3       ATGATATTTT CAACTTTAGA GCATATAT
#ccsA4       ATGATATTTT CAACTTTAGA GCATATAT           

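Different alignment programs write different output formats, and downstream analysis tools differ in the input formats they accept. For alignment I usually use MAFFT, which writes FASTA by default and can optionally write Clustal. If you then want to build a Bayesian tree with MrBayes, the alignment has to be converted to NEXUS. For this kind of conversion I recommend ALTER (http://www.sing-group.org/ALTER/). If you only have a few sequences, the web version is enough; for larger datasets you can install the local version from https://github.com/sing-group/ALTER by following its installation steps. My own installation ran without any errors.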

Installation steps

git clone https://github.com/sing-group/ALTER.git
cd ALTER
mvn package           

Dependencies

Git, for cloning the latest version
A Java compiler and tools (a JDK)
Maven
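
Before running the build it can help to confirm that these tools are actually on your PATH. A minimal Python sketch using the standard library is shown below; the executable names git, java, javac and mvn are the usual ones, but treat them as assumptions if your system names them differently.

```python
import shutil


def missing_tools(tools=("git", "java", "javac", "mvn")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All build dependencies were found on PATH.")
```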

All of the dependencies above can be installed with conda. For a conda installation tutorial, search WeChat for the post titled "价值999的全外显子教学视频--免费送".

Once the build has finished, run:

java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar help

# Output
No argument is allowed: help
 -c (--collapse)              : Collapse sequences to haplotypes.
 -cg (--collapseGaps)         : Treat gaps as missing data when collapsing.
 -cl (--collapseLimit) N      : Connection limit (sequences differing at <= l sites will be collapsed) (default is l=0).
 -cm (--collapseMissing)      : Count missing data as differences when collapsing.
 -i (--input) FILE            : Input file.
 -ia (--inputAutodetect)      : Autodetect format (other input options are omitted).
 -if (--inputFormat) VAL      : Input format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
 -io (--inputOS) VAL          : Input operating system (Linux, MacOS or Windows).
 -ip (--inputProgram) VAL     : Input program (Clustal, MAFFT, MUSCLE, PROBCONS or TCoffee).
 -o (--output) FILE           : Output file.
 -of (--outputFormat) VAL     : Output format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
 -ol (--outputLowerCase)      : Lowe case output.
 -om (--outputMatch)          : Output match characters.
 -on (--outputResidueNumbers) : Output residue numbers (only ALN format).
 -oo (--outputOS) VAL         : Output operating system (Linux, MacOS or Windows).
 -op (--outputProgram) VAL    : Output program (jModelTest, MrBayes, PAML, PAUP, PhyML, ProtTest, RAxML, TCS, CodABC, BioEdit, MEGA, dnaSP, Se-Al, Mesquite, SplitsTree, Clustal, MAFFT, MUSCLE, PROBCONS, TCoffee, Gblocks, SeaView, trimAl or GENERAL)
 -os (--outputSequential)     : Sequential output (only NEXUS and PHYLIP formats).

Here is how I converted my own FASTA alignment to NEXUS format:

java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar -i ~/mingyan/practice_assorted/Myrtales_CP_genomes/another/Myrtales_cp_genome_aligned.fasta-gb -ia -o ./output.nex -of NEXUS -op MrBayes -oo Linux

# Run output
<INFO> : FASTA format detected.
<INFO> : MSA read in FASTA format (Taxa = 90, Length =  106571).
<INFO> : Nucleotide MSA type inferred.
<INFO> : MSA successfully converted to NEXUS format!           

The paper describing the tool

ALTER: program-oriented conversion of DNA and protein alignments

Journal

Nucleic Acids Research, 2010