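Academic plotting with ggplot2: color schemes, themes, and multi-panel figures
ggplot2's default output falls short of what academic journals expect: a gray background, the default palette, small text, and a legend left in its default position. This article covers the key adjustments that take a default plot to publication quality: journal-style (Cell/Nature/Science) color palettes, statistical annotation with ggpubr, multi-panel layout (cowplot/patchwork; the examples here use patchwork), and export as vector graphics or 300 dpi raster images. Every step comes with runnable code.
Test environment: Debian 13, R 4.3.2, ggplot2 3.5.0.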
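1. Data preparation: a simulated differential-expression table

    library(tidyverse)  # attaches ggplot2, dplyr, tibble, and friends
    library(ggplot2)    # already attached by tidyverse; listed for clarity

    # Simulated differential-expression results (5000 genes)
    # Note: pvalue is drawn uniformly and is unrelated to log2FC (pure null data)
    set.seed(42)
    degs <- tibble(
      gene_id  = paste0("ENSG", sprintf("%08d", 1:5000)),
      log2FC   = rnorm(5000, mean = 0, sd = 1.2),
      pvalue   = runif(5000, 0, 1),
      padj     = p.adjust(pvalue, method = "BH"),
      baseMean = 10^rnorm(5000, mean = 3, sd = 1.5)
    ) %>%
      mutate(
        direction = case_when(
          log2FC > 1 & padj < 0.05  ~ "Up",
          log2FC < -1 & padj < 0.05 ~ "Down",
          TRUE ~ "NS"
        )
      )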
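One caveat about this simulated table: the p-values are uniform and independent of log2FC, so after Benjamini-Hochberg adjustment few or no genes are expected to pass padj < 0.05, and the volcano plots would then show almost only NS points. The base-R sketch below illustrates this and shows one hypothetical alternative that yields a mix of Up, Down, and NS genes. The `se` vector and the z-test construction are assumptions made for illustration, not part of the original workflow.

```r
set.seed(42)
n <- 5000
log2FC <- rnorm(n, mean = 0, sd = 1.2)

# Null case: uniform p-values, as in the table above
p_null <- runif(n)
padj_null <- p.adjust(p_null, method = "BH")
n_sig_null <- sum(padj_null < 0.05)   # usually 0 under a pure null

# Hypothetical alternative: give each gene a standard error and derive a
# two-sided z-test p-value, so a large |log2FC| tends to be significant
se <- runif(n, min = 0.2, max = 1)
p_alt <- 2 * pnorm(-abs(log2FC / se))
padj_alt <- p.adjust(p_alt, method = "BH")

direction <- ifelse(padj_alt < 0.05 & log2FC > 1, "Up",
             ifelse(padj_alt < 0.05 & log2FC < -1, "Down", "NS"))
table(direction)
```

Real DESeq2 or edgeR output would slot into the rest of the walkthrough in the same way, as long as it provides log2FC, padj, and baseMean columns.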
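
    # Count genes per category
    degs %>% count(direction)

2. Default plot vs. academic plot: where they differ
Start with the default output and remember what it looks like:

    # Default version (ugly, but it runs)
    p_default <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point()
    ggsave("default.png", p_default, width = 8, height = 6, dpi = 72)

Problems with the default plot: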
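- Gray background with gray gridlines: journals generally expect a white background
- Default palette at full saturation: the hues share nearly the same lightness, so strong red and blue turn into one dark smear in black-and-white print
- Text too small: axis labels become unreadable once the figure is scaled to page width
- The legend title "direction" tells the reader nothing
- No point transparency: with many thousands of points, dense regions blur into a solid blob
The following sections fix these one by one.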
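3. Color: from the default to journal-style palettes
3.1 Color principles
The core constraints for academic color schemes: print-friendly, colorblind-friendly, and within the journal's rules.
Most journals (Nature, Cell, and Science among them) accept color figures, though color in print can carry an extra charge, so many authors submit with palettes that still work in grayscale. Three options are worth recommending: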
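| Package | Scale function | Best for |
|---|---|---|
| viridis | scale_color_viridis_d() | Continuous or discrete data; colorblind-friendly |
| ggsci | scale_color_npg() and others | Palettes modeled on those common in Cell/Nature/Science-type journals |
| RColorBrewer | scale_color_brewer() | Classic academic palettes |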

    library(viridis)
    library(ggsci)

    # Option A: viridis (colorblind-friendly; the continuous variants suit heatmaps)
    p_viridis <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point(alpha = 0.4, size = 0.6) +
      scale_color_viridis_d(option = "D", end = 0.85)

    # Option B: the NPG palette from ggsci (Nature Publishing Group style)
    p_npg <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point(alpha = 0.4, size = 0.6) +
      scale_color_npg()

    # Option C: manual values, the most controllable
    p_custom <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point(alpha = 0.4, size = 0.6) +
      scale_color_manual(
        values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB"),
        labels = c("Up" = "Up-regulated", "Down" = "Down-regulated", "NS" = "Not significant"),
        name = NULL  # drop the legend title
      )

Pro tip: specify manual colors as hex codes and pick them on colorbrewer2.org. A red-blue pair is the de facto standard for volcano plots in bioinformatics.
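Since grayscale compatibility is one of the constraints listed in 3.1, it can help to check how far apart the chosen colors are in brightness. A minimal base-R sketch, assuming the Rec. 601 luma weights (0.299, 0.587, 0.114) as a rough stand-in for perceived brightness; the helper name `luma` is made up for this example.

```r
# Hypothetical helper: approximate brightness (0-255) of hex colors
luma <- function(hex) {
  rgb <- grDevices::col2rgb(hex)   # 3 x n matrix with rows red, green, blue
  drop(c(0.299, 0.587, 0.114) %*% rgb)
}

palette_volcano <- c(Up = "#E64B35", Down = "#4DBBD5", NS = "#BBBBBB")
brightness <- luma(palette_volcano)
round(brightness)

# Gaps between neighbouring brightness values; small gaps mean the
# colors will be hard to tell apart once printed in grayscale
diff(sort(brightness))
```

This is only a screening step; a grayscale proof print or a colorblindness simulator remains the more reliable check.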
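3.2 Continuous palettes: expression heatmaps
Heatmaps need a continuous gradient. The default runs from dark blue to light blue, which reviewers tend to dislike.

    # Classic blue-white-red diverging scale (blue = low, white = midpoint, red = high)
    # heatmap_data is assumed to be a long-format data frame with the columns
    # sample, gene, and expression (row-wise Z-scores)
    ggplot(heatmap_data, aes(x = sample, y = gene, fill = expression)) +
      geom_tile() +
      scale_fill_gradient2(
        low = "#2166AC",   # dark blue (low expression)
        mid = "#F7F7F7",   # near-white (midpoint)
        high = "#B2182B",  # dark red (high expression)
        midpoint = 0,
        name = "Z-score"
      )

The key here is the midpoint argument: if the data are not symmetric around 0 (for example, only up-regulation and no down-regulation), set midpoint to the median instead of 0.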
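The heatmap snippet takes heatmap_data as given. One way such a table could be built is sketched below on simulated values; the matrix `expr`, its dimensions, and the plot object names are assumptions for illustration only.

```r
library(ggplot2)

set.seed(1)
# Assumed input: 10 genes x 6 samples of simulated log-expression values
expr <- matrix(
  rnorm(60, mean = 8, sd = 2), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("S", 1:6))
)

# Row-wise Z-score: each gene gets mean 0 and sd 1 across samples
z <- t(scale(t(expr)))

# Long format, one row per gene-sample pair (a matrix is stored column by column)
heatmap_data <- data.frame(
  gene       = rep(rownames(z), times = ncol(z)),
  sample     = rep(colnames(z), each = nrow(z)),
  expression = as.vector(z)
)

p_heat <- ggplot(heatmap_data, aes(x = sample, y = gene, fill = expression)) +
  geom_tile() +
  scale_fill_gradient2(low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
                       midpoint = 0, name = "Z-score")

# For one-sided data, centre the scale on the observed median instead of 0
one_sided <- transform(heatmap_data, expression = abs(expression))
p_one_sided <- ggplot(one_sided, aes(x = sample, y = gene, fill = expression)) +
  geom_tile() +
  scale_fill_gradient2(low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
                       midpoint = median(one_sided$expression),
                       name = "|Z-score|")
```

With Z-scored rows, midpoint = 0 is the natural centre; the second plot shows the median-based midpoint for values that never go below zero.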
4. Themes: goodbye to the gray background

    # Shared base plot for the theme comparison
    p_base <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point(alpha = 0.4, size = 0.6) +
      scale_color_manual(
        values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")
      )

    # Option A: theme_bw() plus tweaks (the most common choice)
    p_bw <- p_base +
      theme_bw(base_size = 12) +
      theme(
        panel.grid.minor = element_blank(),     # drop minor gridlines
        panel.grid.major = element_line(linewidth = 0.3, color = "grey90"),
        legend.position = "inside",             # ggplot2 >= 3.5.0: numeric legend.position is deprecated
        legend.position.inside = c(0.9, 0.85),  # legend in the top-right corner
        legend.background = element_rect(fill = "white", color = "grey80"),
        legend.key.size = unit(0.4, "cm")
      )

    # Option B: theme_minimal() plus a border (Nature style)
    p_nature <- p_base +
      theme_minimal(base_size = 12) +
      theme(
        panel.border = element_rect(fill = NA, color = "black", linewidth = 0.8),
        panel.grid = element_blank(),
        axis.line = element_blank(),
        legend.position = "right"
      )

    # Option C: theme_classic() (the simplest; looks like an Excel chart, but clean)
    p_classic <- p_base +
      theme_classic(base_size = 12) +
      theme(
        axis.line = element_line(color = "black", linewidth = 0.5),
        legend.position = "right"
      )

What base_size does: all text in the plot is scaled from this value. With base_size = 12, the exported PDF reads well when placed in a manuscript. When the figure is shrunk to a half-page journal column (about 80 mm), use base_size = 10.
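To see how base_size propagates, the resolved font sizes can be inspected with ggplot2's calc_element(). The sketch below was written against ggplot2 3.5.x; the helper name `resolved_sizes` is made up for this example.

```r
library(ggplot2)

# Hypothetical helper: resolved point sizes of a few text elements
resolved_sizes <- function(base_size) {
  th <- theme_bw(base_size = base_size)
  sapply(
    c("axis.title.x", "axis.text.x", "legend.text", "plot.title"),
    function(el) calc_element(el, th)$size
  )
}

sizes_12 <- resolved_sizes(12)
sizes_10 <- resolved_sizes(10)
rbind(base_12 = sizes_12, base_10 = sizes_10)
```

In theme_bw(), axis tick labels and legend text are defined as 0.8 times base_size and the plot title as 1.2 times base_size, which is why a two-point change in base_size shifts every label on the figure.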
4.1 A custom theme: define once, reuse everywhere

    # Define your own academic theme
    theme_academic <- function(base_size = 12) {
      # `+` rather than %+replace%: properties not listed here (margins,
      # relative sizes, and so on) are kept from theme_bw() instead of being reset
      theme_bw(base_size = base_size) +
        theme(
          panel.grid.minor = element_blank(),
          panel.grid.major = element_line(linewidth = 0.3, color = "grey92"),
          panel.border = element_rect(fill = NA, color = "black", linewidth = 0.8),
          strip.background = element_rect(fill = "grey95", color = "black"),
          strip.text = element_text(size = base_size - 1, face = "bold"),
          axis.text = element_text(color = "black"),
          axis.title = element_text(size = base_size),
          legend.position = "bottom",
          legend.key.size = unit(0.5, "cm"),
          plot.title = element_text(size = base_size + 2, face = "bold", hjust = 0.5),
          plot.subtitle = element_text(size = base_size, hjust = 0.5)
        )
    }

    # Usage
    p_final <- p_base + theme_academic(base_size = 12)

5. Statistical annotation: p-values in one step with ggpubr
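
    library(ggpubr)

    # Box plot with significance annotation
    data(mtcars)
    ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
      geom_boxplot(outlier.shape = NA, width = 0.6) +
      geom_jitter(width = 0.1, alpha = 0.5, size = 1.5) +
      stat_compare_means(
        comparisons = list(c("4", "6"), c("6", "8"), c("4", "8")),
        method = "t.test",
        label = "p.signif",     # show significance symbols only
        # label = "p.format",   # show the numeric p-value instead
        step.increase = 0.08    # vertical spacing between stacked brackets
      ) +
      scale_fill_manual(values = c("#4DBBD5", "#E64B35", "#00A087")) +
      theme_academic(base_size = 12) +
      labs(x = "Cylinders", y = "Miles per Gallon", fill = "")

label = "p.signif" shows symbols (*, **, ***); label = "p.format" shows the numeric value. Many journals ask for exact p-values rather than asterisks, unless the thresholds are defined in the figure legend.
Formatting very small p-values:
When annotating, report tiny values as p < 2.2e-16, which is R's double-precision machine epsilon and the floor R uses when printing p-values. Do not write p = 0; that is wrong.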
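Base R's format.pval() follows this convention: values below a chosen eps are shown as an inequality rather than as 0. A minimal sketch with made-up p-values:

```r
# Example p-values, including one far below machine epsilon
p_values <- c(0.0312, 4.7e-05, 1e-20)

# format.pval() replaces anything below eps with a "<" label
labels <- format.pval(p_values, digits = 2, eps = .Machine$double.eps)
labels

# The stored value itself is tiny but not zero
p_values[3] == 0
```

The resulting strings can be handed to any text annotation layer when the automatic labels from stat_compare_means() are not flexible enough.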
6. Multi-panel figures: patchwork does it all

    library(patchwork)

    # Build four panels
    p1 <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point(alpha = 0.3, size = 0.5) +
      scale_color_manual(values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")) +
      theme_academic() +
      labs(title = "Volcano plot")

    p2 <- ggplot(degs, aes(x = log2FC)) +
      geom_histogram(fill = "#4DBBD5", alpha = 0.7, bins = 60) +
      theme_academic() +
      labs(title = "log2FC distribution", x = "log2 Fold Change")

    p3 <- ggplot(degs, aes(x = direction, fill = direction)) +
      geom_bar() +
      scale_fill_manual(values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")) +
      theme_academic() +
      labs(title = "DEG counts", x = "") +
      theme(legend.position = "none")

    p4 <- ggplot(degs, aes(x = baseMean, y = log2FC)) +
      geom_point(alpha = 0.3, size = 0.5, color = "grey40") +
      scale_x_log10() +
      geom_hline(yintercept = c(-1, 1), linetype = "dashed", color = "red", alpha = 0.5) +
      theme_academic() +
      labs(title = "MA plot", x = "Mean expression (log10)")

    # Layout: two panels on top, two below (the classic 2 x 2)
    combined <- (p1 | p2) / (p3 | p4) +
      plot_annotation(
        title = "RNA-seq Differential Expression Analysis",
        tag_levels = "A"   # automatic A/B/C/D panel tags
      ) &
      theme(plot.tag = element_text(face = "bold", size = 14))

    # Export
    ggsave("figure_panel.pdf", combined, width = 14, height = 10, device = cairo_pdf)

The essential patchwork operators:
- |: place plots side by side
- /: stack plots vertically
- + plot_annotation(): add an overall title and panel tags
- & theme(...): apply a theme to every panel
- plot_layout(guides = "collect"): merge identical legends
7. Export: pixel-level control

    # PDF (vector; the format most journals prefer)
    ggsave("figure.pdf", p_final, width = 8, height = 6, device = cairo_pdf)

    # TIFF (300 dpi raster; commonly requested by journals such as Cell and Nature)
    ggsave("figure.tiff", p_final, width = 8, height = 6, dpi = 300,
           device = "tiff", compression = "lzw")

    # PNG (for previews)
    ggsave("figure_preview.png", p_final, width = 8, height = 6, dpi = 150)

    # SVG (vector; convenient for further editing in Illustrator; needs the svglite package)
    ggsave("figure.svg", p_final, width = 8, height = 6)

Journal width conversion:
- Single column: 80-90 mm ≈ 3.15-3.54 inches
- Double column / full page: 170-180 mm ≈ 6.7-7.1 inches

    # Fit the single-column width
    ggsave("figure_single_col.pdf", p_final, width = 3.5, height = 3, device = cairo_pdf)

8. Two advanced techniques
8.1 Gene labels on a volcano plot (ggrepel)

    library(ggrepel)

    # Pick the 10 significant genes with the largest |log2FC|
    top_genes <- degs %>%
      filter(padj < 0.05) %>%
      slice_max(order_by = abs(log2FC), n = 10)

    ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
      geom_point(alpha = 0.4, size = 0.6) +
      geom_text_repel(
        data = top_genes,
        aes(label = gene_id),
        size = 3,
        max.overlaps = 15,
        box.padding = 0.5,
        force = 2
      ) +
      scale_color_manual(
        values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")
      ) +
      theme_academic()

8.2 Facets: split one plot into a group automatically

    # Facet by chromosome (simulated assignment)
    degs_chr <- degs %>%
      mutate(chr = sample(paste0("chr", 1:22), nrow(degs), replace = TRUE)) %>%
      filter(chr %in% paste0("chr", 1:6))

    ggplot(degs_chr, aes(x = baseMean, y = log2FC)) +
      geom_point(alpha = 0.3, size = 0.3, color = "grey40") +
      facet_wrap(~ chr, ncol = 3, scales = "free_x") +
      theme_academic(base_size = 9) +
      labs(x = "Mean expression", y = "log2 Fold Change")

9. Pitfalls
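Pitfall 1: Chinese text in a PDF renders as empty boxes
Symptom: open the file written by ggsave("figure.pdf") and every Chinese title has turned into □□□.
Fix: use the cairo_pdf device, which can locate the system's Chinese fonts through fontconfig.

    ggsave("figure.pdf", p, device = cairo_pdf)

    # For bitmap devices such as png(), the Cairo backend can also be set globally
    # (this option does not change PDF output)
    options(bitmapType = "cairo")

If that still fails, check whether the system has any Chinese font installed:

    fc-list :lang=zh | head -5

If nothing is listed, install one:

    sudo apt install fonts-noto-cjk -y

Pitfall 2: wrong units in ggsave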
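Symptom: ggsave("fig.png", p, width = 80, height = 60) asks for an image 80 inches wide. ggsave stops with a "Dimensions exceed 50 inches" error, or, if limitsize = FALSE is set, writes a file of tens of megabytes.
Cause: width and height are in inches by default, not millimeters. Either convert the values or pass units = "mm".
Conversion formula:
inches = mm / 25.4
80 mm ≈ 3.15 inches. Do not pass 80 directly.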
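Both routes around the unit problem can be sketched in a few lines. The helper `mm_to_in` and the 85 x 65 mm target size are assumptions chosen for illustration; files are written to a temporary directory.

```r
library(ggplot2)

# Hypothetical helper: millimetres to inches
mm_to_in <- function(mm) mm / 25.4

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
f_in <- file.path(tempdir(), "single_col_in.pdf")
f_mm <- file.path(tempdir(), "single_col_mm.pdf")

# Route 1: convert by hand and keep the default unit (inches)
ggsave(f_in, p, width = mm_to_in(85), height = mm_to_in(65))

# Route 2: let ggsave do the conversion
ggsave(f_mm, p, width = 85, height = 65, units = "mm")

# Pixel dimensions implied by the same physical size at 300 dpi
px <- round(mm_to_in(c(width = 85, height = 65)) * 300)
px
```

Passing units = "mm" keeps the journal's specification visible in the call, which makes a mistaken value such as width = 80 in inches easier to spot in review.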
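Pitfall 3: stat_compare_means reports "Can't compute p-value"

    # Example that triggers the error
    stat_compare_means(comparisons = list(c("A", "B")), method = "t.test")
    # Error: not enough 'x' observations

Cause: one of the groups has too few samples (for example n = 1); a t-test needs at least 2 observations per group.
Check:

    # metadata stands for your own sample table
    table(metadata$group)

    # If a group has n = 1, switch to Wilcoxon (non-parametric) or to a different
    # kind of plot; a single observation gives very little power either way
    stat_compare_means(method = "wilcox.test")

Pitfall 4: duplicated legends after combining plots with patchwork
Symptom: in (p1 | p2) / (p3 | p4), each of the four panels draws its own legend, and together they take up a third of the figure.
Fix: collect the legends of all panels into one:

    combined <- (p1 | p2) / (p3 | p4) +
      plot_layout(guides = "collect") &
      theme(legend.position = "bottom")

plot_layout(guides = "collect") merges identical legends automatically, and & theme(legend.position = "bottom") moves the merged legend to the bottom.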
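Pitfall 5: the plot does not show on the current device after ggsave
Symptom: after running ggsave("x.pdf", p), the RStudio Plots pane is blank.
Cause: ggsave opens its own file device, draws the plot there, and closes it again; it never draws on the screen device. A plot that was only assigned to a variable has therefore not been displayed yet. The export itself is unaffected, but it can look as if the code did not run.
Fix: print the plot once after exporting:

    ggsave("x.pdf", p)  # export
    print(p)            # display on the current device

Separately, ggsave(..., create.dir = TRUE) (ggplot2 3.5.0 and later) at least guarantees that the output directory exists. A better habit is to end the script with a p_final line so the figure is rendered again; use print(p_final) when the script is run through source(), where a bare object name is not auto-printed.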
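A quick way to see what ggsave does to the active graphics device is to compare dev.cur() before and after the call. A minimal sketch, writing to a temporary file:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

dev_before <- grDevices::dev.cur()   # active device before exporting
out_file <- file.path(tempdir(), "x.pdf")
ggsave(out_file, p, width = 4, height = 3)
dev_after <- grDevices::dev.cur()    # active device after exporting

# Same device before and after: ggsave only used a temporary file device
unname(dev_before) == unname(dev_after)
file.exists(out_file)

print(p)  # this is the call that actually draws on the screen device
```

The comparison holds whether or not a screen device was open beforehand, so the blank Plots pane comes from the missing print step rather than from the export.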
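This article was tested on 2025-07-22 on Debian 13 with R 4.3.2. Apart from the snippets that rely on placeholder objects (heatmap_data, metadata), the code can be copied and run as is.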