
Single-Cell Study Notes 5: Naming Rules for Sequencing Data in cellranger

1. Locate the .fastq.gz files and rename them where needed (Specifying Input FASTQ Files for 10x Pipelines); file names must follow the cellranger naming convention

[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz

Where Read Type is one of:

I1: Sample index read (optional)
R1: Read 1
R2: Read 2
           

*Note: I1 is optional; my data contains no I1 reads.
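The naming rule above can be checked mechanically before starting a run. The following is a minimal sketch, not an official validator: the function name `is_cellranger_fastq_name` is made up for illustration, the sample-name character set (letters, digits, underscores, hyphens) is an assumption, and only the read types listed above (I1, R1, R2) are accepted. The sample number is allowed to be any integer because the directory listings later in these notes contain S0 through S4.

```shell
# Return 0 if a file name matches
# [Sample Name]_S<number>_L00<lane>_<I1|R1|R2>_001.fastq.gz
# Assumption: sample names contain only letters, digits, "_" and "-".
is_cellranger_fastq_name() {
    printf '%s\n' "$(basename "$1")" |
        grep -Eq '^[A-Za-z0-9_-]+_S[0-9]+_L00[0-9]_(I1|R1|R2)_001\.fastq\.gz$'
}

# Demo on one conforming and one non-conforming name.
for name in test_sample1_S1_L001_R1_001.fastq.gz sample_1.fq.gz; do
    if is_cellranger_fastq_name "$name"; then
        echo "ok:  $name"
    else
        echo "bad: $name"
    fi
done
```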

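Four arguments are mainly involved:

--fastqs  ## Absolute path to the .fastq.gz files; if the data are spread over several folders, separate the paths with commas (,)
--sample  ## Optional; the sample or samples (comma-separated) to analyze
--libraries	 ## Required for feature-barcode analysis; absolute path to the .csv file describing the input libraries
             ## Note: cannot be combined with --fastqs or --sample
--lanes	  ## Large runs (e.g. 150G) are sequenced across several lanes; this argument restricts the analysis to specific lanes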
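To show how these arguments fit together, here is a small sketch that assembles a `cellranger count` command line and prints it instead of running it. The helper name `build_count_cmd` and every value passed to it are placeholders. `--id` and `--transcriptome` are not covered in these notes but are part of a normal `cellranger count` call, and some Cell Ranger versions require further arguments, so check the help output of your installed version.

```shell
# Assemble (but do not run) a cellranger count command.
# $1 run id, $2 reference path, $3 fastq directory, $4 sample name(s),
# $5 optional lane list. All values are supplied by the caller.
# Limitation of this sketch: values containing spaces are not quoted.
build_count_cmd() {
    cmd="cellranger count --id=$1 --transcriptome=$2 --fastqs=$3 --sample=$4"
    if [ -n "${5:-}" ]; then
        cmd="$cmd --lanes=$5"
    fi
    printf '%s\n' "$cmd"
}

# Placeholder values for illustration only.
build_count_cmd run1 /PATH/TO/reference MKFASTQ_ID/outs/fastq_path test_sample1 1
```

Printing the command first makes it easy to review the paths and sample names before committing to a long run.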
           

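Sequencing providers differ widely in the directory layout and file naming of the data they deliver, which is why 10x documents the expected paths and file names explicitly. This is the layout in which my data was delivered: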

[Figure: two screenshots of the delivered data directory; images not preserved]
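When delivered files do not follow the cellranger convention, they have to be renamed. The sketch below assumes a hypothetical vendor naming of `<sample>_1.fq.gz` for read 1 and `<sample>_2.fq.gz` for read 2, and assigns `S1` and `L001` to every file, which only makes sense for one sample sequenced on a single lane. The function names and the naming scheme are illustrative, so adapt the patterns to whatever your provider actually delivers.

```shell
# Map a hypothetical vendor name (<sample>_1.fq.gz / <sample>_2.fq.gz)
# to the cellranger convention. S1 and L001 are assumed, not detected.
to_cellranger_name() {
    base=$(basename "$1")
    case "$base" in
        *_1.fq.gz) printf '%s_S1_L001_R1_001.fastq.gz\n' "${base%_1.fq.gz}" ;;
        *_2.fq.gz) printf '%s_S1_L001_R2_001.fastq.gz\n' "${base%_2.fq.gz}" ;;
        *) echo "unrecognized name: $base" >&2; return 1 ;;
    esac
}

# Dry run: print the mv commands for a directory instead of executing them.
plan_renames() {
    for f in "$1"/*_[12].fq.gz; do
        [ -e "$f" ] || continue
        echo mv "$f" "$(dirname "$f")/$(to_cellranger_name "$f")"
    done
}

# Usage (hypothetical directory): plan_renames /PATH/TO/raw_data
```

Reviewing the printed commands before running them avoids overwriting files when two inputs map to the same target name.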

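The rawest output of a sequencing run is the set of .bcl files, which hold the per-cycle base calls derived from the fluorescence signals. They can be converted to the familiar .fastq.gz files with cellranger mkfastq or bcl2fastq; the resulting paths and file names differ depending on the command and arguments used: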

Case 1: each sample's data sits in a folder named after the sample, the same layout and naming as my own data above. Directory structure and arguments:

MKFASTQ_ID
|-- MAKE_FASTQS_CS
`-- outs
    |-- fastq_path
        |-- HFLC5BBXX
            |-- test_sample1
            |   |-- test_sample1_S1_L001_I1_001.fastq.gz
            |   |-- test_sample1_S1_L001_R1_001.fastq.gz
            |   |-- test_sample1_S1_L001_R2_001.fastq.gz
            |   |-- test_sample1_S1_L002_I1_001.fastq.gz
            |   |-- test_sample1_S1_L002_R1_001.fastq.gz
            |   |-- test_sample1_S1_L002_R2_001.fastq.gz
            |   |-- test_sample1_S1_L003_I1_001.fastq.gz
            |   |-- test_sample1_S1_L003_R1_001.fastq.gz
            |   `-- test_sample1_S1_L003_R2_001.fastq.gz
            |-- test_sample2
            |   |-- test_sample2_S2_L001_I1_001.fastq.gz
            |   |-- test_sample2_S2_L001_R1_001.fastq.gz
            |   |-- test_sample2_S2_L001_R2_001.fastq.gz
            |   |-- test_sample2_S2_L002_I1_001.fastq.gz
            |   |-- test_sample2_S2_L002_R1_001.fastq.gz
            |   |-- test_sample2_S2_L002_R2_001.fastq.gz
            |   |-- test_sample2_S2_L003_I1_001.fastq.gz
            |   |-- test_sample2_S2_L003_R1_001.fastq.gz
            |   `-- test_sample2_S2_L003_R2_001.fastq.gz
        |-- Reports
        |-- Stats
        |-- Undetermined_S0_L001_I1_001.fastq.gz
        ...
        `-- Undetermined_S0_L003_R2_001.fastq.gz
           
--fastqs=MKFASTQ_ID/outs/fastq_path   ## all samples

--fastqs=MKFASTQ_ID/outs/fastq_path1,MKFASTQ_ID/outs/fastq_path2  ## all samples, sequenced on
                                                                  ## multiple flowcells
--fastqs=/PATH/TO/bcl2fastq_output    ## path to .fastq.gz files generated by bcl2fastq

--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=test_sample1                 ## sample1, all lanes (lane1-3)

--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=test_sample1 \
--lanes=1                             ## sample1, lane 1 only

--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=test_sample1,test_sample2    ## analyze sample1 and sample2 combined as one sample
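Before choosing values for --sample and --lanes, it helps to see which sample names and lanes are actually present under the FASTQ path. This is a rough sketch with a made-up function name; it looks only at R1 files, skips the Undetermined reads, and relies on the file naming pattern described at the top of these notes.

```shell
# Print one "sample lane" pair per line for every R1 file below a directory.
# Undetermined_* files are ignored.
list_samples_and_lanes() {
    find "$1" -type f -name '*_S[0-9]*_L00[0-9]_R1_001.fastq.gz' \
        ! -name 'Undetermined_*' |
        sed -E 's#.*/##; s/^(.+)_S[0-9]+_L00([0-9])_R1_001\.fastq\.gz$/\1 \2/' |
        sort -u
}

# Usage: list_samples_and_lanes MKFASTQ_ID/outs/fastq_path
```

The first column gives candidate values for --sample and the second column the lanes available for that sample.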
           

Case 2: the files of one sample are spread over several folders (one per sample index, four in total). Directory structure and arguments:

bcl2fastq_output
|-- HFLC5BBXX
    |-- SI-GA-A1_1
    |   |-- SI-GA-A1_1_S1_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_1_S1_L001_R1_001.fastq.gz
    |   `-- SI-GA-A1_1_S1_L001_R2_001.fastq.gz
    |-- SI-GA-A1_2
    |   |-- SI-GA-A1_2_S2_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_2_S2_L001_R1_001.fastq.gz
    |   `-- SI-GA-A1_2_S2_L001_R2_001.fastq.gz
    |-- SI-GA-A1_3
    |   |-- SI-GA-A1_3_S3_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_3_S3_L001_R1_001.fastq.gz
    |   `-- SI-GA-A1_3_S3_L001_R2_001.fastq.gz
    |-- SI-GA-A1_4
    |   |-- SI-GA-A1_4_S4_L001_I1_001.fastq.gz
    |   |-- SI-GA-A1_4_S4_L001_R1_001.fastq.gz
    |   `-- SI-GA-A1_4_S4_L001_R2_001.fastq.gz
|-- Reports
|-- Stats
|-- Undetermined_S0_L001_I1_001.fastq.gz
|-- Undetermined_S0_L001_R1_001.fastq.gz
`-- Undetermined_S0_L001_R2_001.fastq.gz
           
--fastqs=MKFASTQ_ID/outs/fastq_path      ## all samples

## combine the reads from all four sample-index folders into one sample
--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=SI-GA-A1_1,SI-GA-A1_2,SI-GA-A1_3,SI-GA-A1_4

## analyze only the first sample index
--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=SI-GA-A1_1
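Typing out four sample-index names by hand is error-prone, so the comma-separated --sample value can be derived from the folder names. The sketch below assumes the layout shown in this case, with one subfolder per index named `<prefix>_<n>`; the function name is invented for illustration.

```shell
# Build a comma-separated --sample value from subfolders named <prefix>_*.
# $1: directory holding the per-index folders, $2: sample-set prefix.
sample_list_for_prefix() {
    for dir in "$1"/"$2"_*/; do
        [ -d "$dir" ] || continue
        basename "$dir"
    done | sort | paste -sd, -
}

# Usage: sample_list_for_prefix /PATH/TO/bcl2fastq_output/HFLC5BBXX SI-GA-A1
```

The result can be passed directly as `--sample="$(sample_list_for_prefix ...)"`.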
           

Case 3: similar to Case 1, but with every file in a single flat folder, which looks much messier:

fastq_path
|-- Reports
|-- Stats
|-- test_sample_S1_L001_I1_001.fastq.gz
|-- test_sample_S1_L001_R1_001.fastq.gz
|-- test_sample_S1_L001_R2_001.fastq.gz
|-- test_sample_S1_L002_I1_001.fastq.gz
|-- test_sample_S1_L002_R1_001.fastq.gz
|-- test_sample_S1_L002_R2_001.fastq.gz
|-- test_sample_S1_L003_I1_001.fastq.gz
|-- test_sample_S1_L003_R1_001.fastq.gz
|-- test_sample_S1_L003_R2_001.fastq.gz
|-- Undetermined_S0_L001_I1_001.fastq.gz
...
`-- Undetermined_S0_L003_R2_001.fastq.gz
           
--fastqs=MKFASTQ_ID/outs/fastq_path   ## all samples, mkfastq

--fastqs=/PATH/TO/bcl2fastq_output    ## all samples, bcl2fastq

## test_sample data
--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=test_sample

## test_sample, lane 1 only
--fastqs=MKFASTQ_ID/outs/fastq_path \
--sample=test_sample \
--lanes=1
           

Case 4: directory structure and arguments:

PROJECT_FOLDER
|-- MySample_S1_L001_I1_001.fastq.gz
|-- MySample_S1_L001_R1_001.fastq.gz
|-- MySample_S1_L001_R2_001.fastq.gz
|-- MySample_S1_L002_I1_001.fastq.gz
|-- MySample_S1_L002_R1_001.fastq.gz
|-- MySample_S1_L002_R2_001.fastq.gz
           
--fastqs=/PATH/TO/PROJECT_FOLDER   ## all samples

## one particular sample
--fastqs=/PATH/TO/PROJECT_FOLDER \
--sample=MySample

## MySample, lane 1 only
--fastqs=/PATH/TO/PROJECT_FOLDER \
--sample=MySample \
--lanes=1
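For a flat folder like this one, a quick sanity check is to confirm that every R1 file has its R2 mate, since only I1 is described as optional in the naming rule. The following sketch uses an invented function name and checks a single directory without descending into subfolders.

```shell
# Return non-zero if any *_R1_001.fastq.gz in a directory lacks its R2 mate.
check_read_pairs() {
    status=0
    for r1 in "$1"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing R2 for $(basename "$r1")" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_read_pairs /PATH/TO/PROJECT_FOLDER && echo "all pairs present"
```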