天天看点

计算基因组染色体长度的Python脚本

调用命令:python cal_chrom_length.py hg19.fasta hg19.len 

import sys
import os

fasta_path = sys.argv[1]
len_path = sys.argv[2]
temp_path = len_path + ".temp"

fw = open(len_path, "w+")
os.system("fastalength " + fasta_path + " > " + temp_path)
with open(temp_path) as f:
    lines = f.readlines()
    for line in lines:
        cols = line.strip("\n").split()
        fw.write("\t".join([cols[1], cols[0]]) + "\n")
fw.close()
os.remove(temp_path)
           

输出的hg19.len文件的结构是:染色体名称<TAB>长度,适用于下游的genomeCoverageBed、bedGraphToBigWig等分析。

继续阅读