ComplexHeatmap Study Notes and Summary
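Note:
This document summarizes the methods I came across while learning ComplexHeatmap and testing its examples. It is meant as a handy reference for later review and for getting data visualizations done quickly. If my understanding is wrong anywhere, corrections are welcome.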
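1. ComplexHeatmap Overview
The ComplexHeatmap package is mainly used to draw heatmaps. A heatmap here consists of the body and its components: title, dendrograms, matrix names (row and column names), and heatmap annotations. Annotations sit beside the heatmap body and can be combined to build more complex figures.
The heatmap body can be split by rows and by columns.
Overall, the package resembles ggplot in spirit: it works with a layout, different tracks are combined with +, and annotation graphics can be added directly to the main plot.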
![](https://img.laitimes.com/img/__Qf2AjLwojIjJCLyojI0JCLiIXZ05WZj91YpB3I2EzX4xSZz91ZsAzNfRHLGZkRGZkRfJ3bs92YsAjMfVmepNHL9gzVZBHaYFWN1cVWwJlbihmVIVWQClGVF5UMR9Fd4VGdsATNfd3bkFGazxycykFaKdkYzZUbapXNXlleSdVY2pESa9VZwlHdssmch1mclRXY39CXldWYtlWPzNXZj9mcw1ycz9WL49zZuBnLzMzN3IGM0EWN1UTM4UzM5gDZzQTOlV2YkhDZ1MmZyI2Lc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
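2. A Single Heatmap
2.1 Input format
A single heatmap offers much the same functionality as pheatmap, and the two can be used together.
The input is a numeric matrix; row and column names can be set with rownames() and colnames().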
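library(ComplexHeatmap)
library(circlize)
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red")) # matrix values are mapped to colors between -2 and 2; adjust the breaks to your own data
Because colorRamp2() fixes the mapping between values and colors, using the same color function for several heatmaps makes them directly comparable, so differences between treatments are easy to see.
For continuous data you can also supply a plain color vector, which behaves roughly like colorRamp2(seq(min(mat), max(mat), length = 10), rev(rainbow(10))): the colors are mapped automatically from the minimum to the maximum of the matrix, so the result is strongly affected by outliers. For discrete values (numeric or character) you need to supply a named vector of colors, one per level.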
Heatmap(mat, name = "mat", col = col_fun)
Heatmap(mat, name = "mat", col = col_fun, column_title = "mat")
Heatmap(mat/4, name = "mat", col = col_fun, column_title = "mat/4")
Heatmap(abs(mat), name = "mat", col = col_fun, column_title = "abs(mat)")
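To make the fixed mapping concrete, here is a small sketch that calls the color function directly. The sample values are my own and not from the original notes; it only assumes that circlize is installed.

```r
library(circlize)

# Same color function as above: breaks at -2, 0 and 2
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))

# A color function maps numbers to hex colors; the same value always
# gets the same color, whichever matrix it comes from
cols = col_fun(c(-3, -2, 0, 2, 3))
print(cols)

# Values beyond the outermost breaks are clamped to the end colors
clamped_low  = col_fun(-3) == col_fun(-2)
clamped_high = col_fun(3) == col_fun(2)
```

This is why mat, mat/4 and abs(mat) in the calls above can be compared by eye: a given cell color stands for the same value in all three heatmaps.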
**discrete values:**
discrete_mat = matrix(sample(letters[1:4], 100, replace = TRUE), 10, 10)
colors = structure(1:4, names = letters[1:4])
Heatmap(discrete_mat, name = "mat", col = colors,
column_title = "a discrete character matrix")
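About pheatmap:
pheatmap(test, legend = FALSE); commonly used parameters and what they do:
border_color: color of the cell borders (a color value)
border: logical, whether to draw the border
show_rownames and show_colnames: logical, whether to show the row and column names
display_numbers: logical, whether to print the values inside the cells
number_format: format string for the displayed numbers
cellwidth and cellheight: width and height of each cell, i.e. the cell size
annotation_col: add column annotations
annotation_row: add row annotations. Note that the annotations must match the matrix: the rownames of annotation_row must correspond to the matrix rownames, and the rownames of annotation_col to the matrix colnames
annotation_colors: a list of colors for the annotation levels; each list element has the form var = c(level = "color")
gaps_row and gaps_col: numeric vectors giving the indices after which a gap is inserted; cluster_rows and cluster_cols need to be set to FALSE
fontsize: numeric, font size
scale: normalize the values by row (scale = "row"); "column" and "none" are also accepted
**labels_row:** show only a few selected gene names, similar to a mark annotation, e.g. labels_row = c("", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "Il10", "Il15", "Il1b")
angle_col: angle of the column labels, e.g. angle_col = 45
fontsize_row and fontsize_col: font sizes of the row and column names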
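The parameters listed above can be combined in one call. The following is only a sketch on made-up data: the matrix test, the Group annotation and its colors are hypothetical, and it assumes the pheatmap package is installed.

```r
library(pheatmap)

set.seed(1)
# Hypothetical data: 20 genes x 10 samples
test = matrix(rnorm(200), 20, 10)
rownames(test) = paste0("Gene", 1:20)
colnames(test) = paste0("S", 1:10)

# Column annotation: its rownames must match the matrix colnames
annotation_col = data.frame(Group = rep(c("ctrl", "treat"), each = 5),
                            row.names = colnames(test))
ann_colors = list(Group = c(ctrl = "grey60", treat = "firebrick"))

# Label only the last three genes, leave the other row labels empty
labels_row = rep("", nrow(test))
labels_row[18:20] = rownames(test)[18:20]

pdf(NULL)  # null device so the sketch also runs non-interactively
p = pheatmap(test, scale = "row",
             cluster_cols = FALSE, gaps_col = 5,   # gaps need clustering off
             annotation_col = annotation_col,
             annotation_colors = ann_colors,
             labels_row = labels_row, angle_col = 45,
             fontsize_row = 8, silent = TRUE)
invisible(dev.off())
```

With silent = TRUE nothing is drawn immediately; the returned object carries the plot as a gtable that can be drawn or saved later.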
2.2 Choosing colors and related parameters
By default, colors are linearly interpolated in the LAB color space; the space argument of colorRamp2() changes this.
f1 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"))
f2 = colorRamp2(seq(min(mat), max(mat), length = 3), c("blue", "#EEEEEE", "red"), space = "RGB")
Heatmap(mat, name = "mat1", col = f1, column_title = "LAB color space")
Heatmap(mat, name = "mat2", col = f2, column_title = "RGB color space")
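border/border_gp and rect_gp control the border of the heatmap body and the cells, respectively.
border can be a logical value (TRUE) or a color; border_gp is a gpar object. As I understand it, gpar (grid::gpar()) plays a role similar to par() in base R or a style attribute in HTML: it sets graphical properties.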
Heatmap(mat, name = "mat", border_gp = gpar(col = "black", lty = 2),
column_title = "set heatmap borders")
Heatmap(mat, name = "mat", column_title = "I am a column title at the bottom", column_title_side = "bottom")
Heatmap(mat, name = "mat", column_title = "I am a column title",
column_title_gp = gpar(fill = "red", col = "white", border = "blue"))
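The calls above cover border_gp and the title settings but not rect_gp. A minimal sketch of the cell-level counterpart follows; the random matrix and the chosen colors are assumptions for illustration only.

```r
library(ComplexHeatmap)

set.seed(123)
mat = matrix(rnorm(100), 10)  # hypothetical example data

ht = Heatmap(mat, name = "mat",
    rect_gp = gpar(col = "white", lwd = 2),    # white lines between cells
    border_gp = gpar(col = "black", lty = 2),  # dashed frame around the body
    column_title = "rect_gp for cells, border_gp for the body")

pdf(NULL)  # null device so the sketch also runs non-interactively
draw(ht)
invisible(dev.off())
```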
2.3 Clustering
Heatmap(mat, name = "mat", clustering_distance_rows = "pearson",
column_title = "pre-defined distance method (1 - pearson)") #pearson
Heatmap(mat, name = "mat", clustering_distance_rows = function(m) dist(m), column_title = "a function that calculates distance matrix") # distance matrix
Heatmap(mat, name = "mat", clustering_distance_rows = function(x, y) 1 - cor(x, y), column_title = "a function that calculates pairwise distance") #pairwise distance
# pairwise distance that ignores outliers: only values between the 10% and 90% quantiles are kept
mat_with_outliers = mat
for(i in 1:10) mat_with_outliers[i, i] = 1000
robust_dist = function(x, y) {
qx = quantile(x, c(0.1, 0.9))
qy = quantile(y, c(0.1, 0.9))
l = x > qx[1] & x < qx[2] & y > qy[1] & y < qy[2]
x = x[l]
y = y[l]
sqrt(sum((x - y)^2))
}
Heatmap(mat_with_outliers, name = "mat",
col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
clustering_distance_rows = robust_dist,
clustering_distance_columns = robust_dist,
column_title = "robust_dist")
cell_fun controls how each cell is drawn. The clustering method is set through clustering_method_rows and clustering_method_columns, which take the same methods as **hclust()**.
library(cluster)
Heatmap(mat, name = "mat", cluster_rows = diana(mat),
cluster_columns = agnes(t(mat)), column_title = "clustering objects")
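For the clustering_method_rows argument mentioned above, a short sketch with single linkage on the rows is shown below. The data are made up, and "single" is just one of the methods that hclust() accepts.

```r
library(ComplexHeatmap)

set.seed(123)
mat = matrix(rnorm(100), 10)  # hypothetical example data
rownames(mat) = paste0("R", 1:10)

ht = Heatmap(mat, name = "mat",
    clustering_distance_rows = "euclidean",
    clustering_method_rows = "single",   # same method names as hclust()
    column_title = "single linkage on rows")

pdf(NULL)  # null device so the sketch also runs non-interactively
ht = draw(ht)
invisible(dev.off())

# Row indices in display order after clustering
ro = unlist(row_order(ht))
```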
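To change the style of the dendrogram at the side, first build a dendrogram object and then set its edges and nodes through nodePar and edgePar. This is not needed very often.
library(dendextend)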
row_dend = as.dendrogram(hclust(dist(mat)))
row_dend = color_branches(row_dend, k = 2) # `color_branches()` returns a dendrogram object
Heatmap(mat, name = "mat", cluster_rows = row_dend)
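Likewise, row_dend_gp and column_dend_gp control the graphical settings of the dendrograms.
Use row_order and column_order to control the display order of rows and columns; factors can be used as well. Naturally, this only takes effect when row and column clustering is turned off.
Parameters for where the row/column names and dendrograms are placed:
row_names_side # side of the row names
row_dend_side # side of the row dendrogram
column_names_side # side of the column names
column_dend_side # side of the column dendrogram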
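A small sketch of manual ordering together with the side parameters; the matrix and the chosen orders are assumptions for illustration.

```r
library(ComplexHeatmap)

set.seed(123)
mat = matrix(rnorm(100), 10)  # hypothetical example data
rownames(mat) = paste0("R", 1:10)
colnames(mat) = paste0("C", 1:10)

ht = Heatmap(mat, name = "mat",
    cluster_rows = FALSE, cluster_columns = FALSE,
    row_order = 10:1,                     # rows shown as R10, R9, ..., R1
    column_order = order(colnames(mat)),  # alphabetical: C1, C10, C2, ...
    row_names_side = "left",
    column_names_side = "top")

pdf(NULL)  # null device so the sketch also runs non-interactively
ht = draw(ht)
invisible(dev.off())

# Row names in the order they appear in the plot, top to bottom
shown_rows = rownames(mat)[unlist(row_order(ht))]
```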
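2.4 Splitting the heatmap
Parameters that control splitting: row_km, row_split, column_km, column_split
row_km and column_km split by k-means clustering. In addition, row_km_repeats and column_km_repeats run k-means several times and take a consensus partition at the end; the resulting number of groups may be smaller than the number requested.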
Heatmap(mat, name = "mat", row_km = 2, row_km_repeats = 100,
column_km = 3, column_km_repeats = 100)
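Splitting by a character vector is the more common case: row_split or column_split takes a character vector or a data frame whose length (or number of rows) must match the corresponding dimension of the matrix.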
Heatmap(mat, name = "mat",
row_split = rep(c("A", "B"), 9), column_split = rep(c("C", "D"), 12))
# character matrix
# split by the first column in `discrete_mat`
Heatmap(discrete_mat, name = "mat", col = 1:4, row_split = discrete_mat[, 1])
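Order of the slices (subgroups): by default the slices are clustered and reordered. Setting cluster_row_slices or cluster_column_slices to FALSE keeps them in the order given by the split variable (the factor levels when row_split or column_split is a factor).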
Heatmap(mat, name = "mat",
row_split = rep(LETTERS[1:3], 6),
column_split = rep(letters[1:6], 4))
Heatmap(mat, name = "mat", row_split = factor(rep(LETTERS[1:3], 6), levels = LETTERS[3:1]),column_split=factor(rep(letters[1:6], 4), levels = letters[6:1]), cluster_row_slices = FALSE, cluster_column_slices = FALSE)
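To check the slice order rather than read it off the plot, one option is to inspect row_order() after drawing. This is a sketch on made-up data (an 18 x 24 random matrix, matching the dimensions the calls above imply); the variable names are my own.

```r
library(ComplexHeatmap)

set.seed(123)
mat = matrix(rnorm(18 * 24), 18, 24)  # hypothetical example data

# Levels reversed on purpose: slices should appear as C, B, A
split = factor(rep(LETTERS[1:3], 6), levels = LETTERS[3:1])

ht = Heatmap(mat, name = "mat",
    row_split = split, cluster_row_slices = FALSE)

pdf(NULL)  # null device so the sketch also runs non-interactively
ht = draw(ht)
invisible(dev.off())

# One element per slice, each holding the row indices of that slice
ro = row_order(ht)
first_slice_level = as.character(split[ro[[1]][1]])
```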
Other slice properties: vectors of graphic parameters must have the same length as the number of slices.
ht_opt$TITLE_PADDING = unit(c(4, 4), "points")
Heatmap(mat, name = "mat",
row_km = 2, row_title_gp = gpar(col = c("red", "blue"), font = 1:2),
row_names_gp = gpar(col = c("green", "orange"), fontsize = c(10, 14)),
column_km = 3, column_title_gp = gpar(fill = c("red", "blue", "green"), font = 1:3),
column_names_gp = gpar(col = c("green", "orange", "purple"), fontsize = c(10, 14, 8)))
Gap between slices: row_gap = unit(5, "mm")
Border: border = TRUE
When the heatmap is split, the annotations are split along with it. cell_fun draws the cells one at a time, and layer_fun is its vectorized version:
small_mat = mat[1:9, 1:9]
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
Heatmap(small_mat, name = "mat", col = col_fun,
cell_fun = function(j, i, x, y, width, height, fill) {
grid.text(sprintf("%.1f", small_mat[i, j]), x, y, gp = gpar(fontsize = 10))
})
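For comparison, the same labels can be drawn with layer_fun, which receives all cells of a slice at once. This is a sketch on a made-up 9 x 9 matrix; pindex() is the ComplexHeatmap helper for picking paired matrix entries.

```r
library(ComplexHeatmap)
library(circlize)

set.seed(123)
small_mat = matrix(rnorm(81), 9, 9)  # hypothetical example data
col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))

ht = Heatmap(small_mat, name = "mat", col = col_fun,
    layer_fun = function(j, i, x, y, width, height, fill) {
        # j, i, x, y are vectors covering every cell of the current slice;
        # pindex() returns small_mat[i[k], j[k]] for each k
        grid.text(sprintf("%.1f", pindex(small_mat, i, j)), x, y,
            gp = gpar(fontsize = 10))
    })

pdf(NULL)  # null device so the sketch also runs non-interactively
draw(ht)
invisible(dev.off())

# pindex() on its own: paired indexing, not the full sub-matrix
picked = pindex(small_mat, 1:3, 3:1)
```

Since layer_fun is called once per slice instead of once per cell, it tends to be the faster choice for large matrices.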
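3. Heatmap Annotations
Annotations enrich the figure by showing extra information associated with the rows and columns. top_annotation, bottom_annotation, left_annotation and right_annotation control where they are placed. Each of these arguments takes a HeatmapAnnotation object, built with HeatmapAnnotation() or rowAnnotation() (for row annotations). According to the official documentation, rowAnnotation() is just the special case HeatmapAnnotation(..., which = "row").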
set.seed(123)
mat = matrix(rnorm(100), 10)
rownames(mat) = paste0("R", 1:10)
colnames(mat) = paste0("C", 1:10)
column_ha = HeatmapAnnotation(foo1 = runif(10), bar1 = anno_barplot(runif(10)))
row_ha = rowAnnotation(foo2 = runif(10), bar2 = anno_barplot(runif(10)))
Heatmap(mat, name = "mat", top_annotation = column_ha, right_annotation = row_ha)
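The equivalence between rowAnnotation() and HeatmapAnnotation(..., which = "row") can be sketched as follows. The values are made up, and names() and nobs() are the utility methods ComplexHeatmap provides for annotation objects.

```r
library(ComplexHeatmap)

set.seed(123)
vals = runif(10)  # hypothetical annotation values, one per row

# Two ways of building the same kind of row annotation
ha1 = rowAnnotation(foo = vals)
ha2 = HeatmapAnnotation(foo = vals, which = "row")

# Both are HeatmapAnnotation objects with one annotation named "foo"
anno_names = c(names(ha1), names(ha2))
n_obs = nobs(ha1)  # number of observations; must equal the number of matrix rows
```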
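Note that column annotations must match the number of columns of the heatmap matrix, and row annotations the number of rows. The annotation graphics functions are named anno_*() and can draw blocks, images, points, lines, barplots, boxplots, histograms, densities, text and marks. An example of each: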
# blocks
Heatmap(matrix(rnorm(100), 10), name = "mat",
top_annotation = HeatmapAnnotation(foo = anno_block(gp = gpar(fill = 2:4))),
column_km = 3)
#images
image_png = sample(dir("IcoMoon-Free-master/PNG/64px", full.names = TRUE), 10)
ha = HeatmapAnnotation(foo = anno_image(image_png))
ha = HeatmapAnnotation(foo = anno_image(image_png, space = unit(3, "mm")))
#points
ha = HeatmapAnnotation(foo = anno_points(matrix(runif(20), nc = 2),
pch = 1:2, gp = gpar(col = 2:3))) # foo is just a name and can be changed
# lines
ha = HeatmapAnnotation(foo = anno_lines(cbind(c(1:5, 1:5), c(5:1, 5:1)),
gp = gpar(col = 2:3), add_points = TRUE, pt_gp = gpar(col = 5:6), pch = c(1, 16))) # the data can be a vector or a matrix
# barplot
ha = HeatmapAnnotation(foo = anno_barplot(1:10, gp = gpar(fill = 1:10))) # a vector gives an ordinary bar plot
ha = HeatmapAnnotation(foo = anno_barplot(cbind(1:10, 10:1), # a matrix gives stacked bars
gp = gpar(fill = 2:3, col = 2:3)))
# boxplot
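m = matrix(rnorm(100), 10) # example data (not in the original notes): 10 columns, one box per column
ha = HeatmapAnnotation(foo = anno_boxplot(m, height = unit(4, "cm"),
gp = gpar(fill = 1:10)))
# Histogram: better suited to row annotations; the input is like that of anno_boxplot() [***a matrix or a list***]
m = matrix(rnorm(1000), nc = 100)
ha = rowAnnotation(foo = anno_histogram(m))
# Density: like the histogram, but draws a fitted distribution curve; the type argument matters and can be set to "heatmap" or "violin"; the input should be a numeric matrix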
m2 = matrix(rnorm(50*10), nrow = 50)
m = matrix(rnorm(1000), nc = 100)
ha = rowAnnotation(foo = anno_density(m, joyplot_scale = 2,
gp = gpar(fill = "#CCCCCC80")))
ha = rowAnnotation(foo = anno_density(m2, type = "heatmap", width = unit(6, "cm")))
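# Mark annotation: when there are many rows or columns and only some of them need labels, use anno_mark(); it needs at least two arguments: at, the indices in the original matrix, and labels, the corresponding text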
m = matrix(rnorm(1000), nrow = 100)
rownames(m) = 1:100
ha = rowAnnotation(foo = anno_mark(at = c(1:4, 20, 60, 97:100), labels = month.name[1:10]))
Heatmap(m, name = "mat", cluster_rows = FALSE, right_annotation = ha,
row_names_side = "left", row_names_gp = gpar(fontsize = 4))
Heatmap(m, name = "mat", cluster_rows = FALSE, right_annotation = ha,
row_names_side = "left", row_names_gp = gpar(fontsize = 4), row_km = 4)
For multiple annotations, just pass them as name-value pairs to HeatmapAnnotation():
ha = HeatmapAnnotation(foo = 1:10,
bar = cbind(1:10, 10:1),
pt = anno_points(1:10),
show_legend = c("bar" = FALSE)
)
Heatmap(matrix(rnorm(100), 10), name = "mat", top_annotation = ha)
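Summary:
Annotation graphics are created with the anno_*() functions, their position is set through [right|left|bottom|top]_annotation, and annotation objects can be joined with +.
hitlist <- anno1 + anno2; draw(hitlist), which works the same way as combining several heatmaps, described below.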
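4. Working with Heatmap Lists
Heatmap lists work much like the annotations above: several heatmaps can be combined into a list and displayed with draw(). They can be concatenated horizontally or vertically; horizontal concatenation is more common because it makes comparison easy. All heatmaps and annotations in a horizontal list must have the same number of rows.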
# mat1, mat2 and le are assumed to be defined already, all with the same number of rows
ht1 = Heatmap(mat1, name = "rnorm")
ht2 = Heatmap(mat2, name = "runif")
ht3 = Heatmap(le, name = "letters")
ht_list = ht1 + ht2 + ht3 # returns a heatmap list; more heatmaps can be appended later with +
draw(ht_list, row_title = "Three heatmaps, row title", row_title_gp = gpar(col = "red"),
column_title = "Three heatmaps, column title", column_title_gp = gpar(fontsize = 16))
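Since mat1, mat2 and le are not defined in these notes, here is a self-contained sketch of the same list. The data are made up (12 rows each, with matching row names) and are not the original author's.

```r
library(ComplexHeatmap)

set.seed(123)
# Hypothetical inputs: all three share the same 12 rows
mat1 = matrix(rnorm(120), 12, 10)
mat2 = matrix(runif(120), 12, 10)
le = sample(letters[1:3], 12, replace = TRUE)
rownames(mat1) = paste0("R", 1:12)
rownames(mat2) = paste0("R", 1:12)
names(le) = paste0("R", 1:12)

ht1 = Heatmap(mat1, name = "rnorm")
ht2 = Heatmap(mat2, name = "runif")
ht3 = Heatmap(le, name = "letters")  # a vector is treated as a one-column matrix

ht_list = ht1 + ht2 + ht3  # horizontal concatenation

pdf(NULL)  # null device so the sketch also runs non-interactively
ht_list = draw(ht_list, ht_gap = unit(1, "cm"),
    row_title = "Three heatmaps, row title")
invisible(dev.off())

# The first heatmap is the main one; its row order is shared by the others
main_order = unlist(row_order(ht_list))
```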
Commonly adjusted parameters for combined plots:
1) Size: width = unit(5, "mm")
2) Gap between heatmaps: draw(ht_list, ht_gap = unit(1, "cm"))
3) Row annotations: annotations can be appended to a horizontal heatmap list
ha1 = rowAnnotation(foo = 1:12, bar = anno_barplot(1:12, width = unit(4, "cm")))
ht1 = Heatmap(mat1, name = "rnorm", col = col_rnorm, row_km = 2)
ht1 + ha1
Heatmap(mat1, name = "rnorm", col = col_rnorm, row_km = 2) +
rowAnnotation(foo = 1:12) +
rowAnnotation(bar = anno_barplot(1:12, width = unit(4, "cm")))
4) Adding row names as a text annotation
ht1 + ha1 + rowAnnotation(rn = anno_text(rownames(mat1),
location = unit(0, "npc"), just = "left"))
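5) adjust_annotation_extension: whether the extensions of annotations (axes and annotation names) are taken into account when the layout is adjusted automatically; this affects the blank space between the heatmap names and the heatmap body
For other parameters, see the official documentation: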
https://jokergoo.github.io/ComplexHeatmap-reference/book/upset-plot.html