DESeq2 Differential Expression Analysis: A Minimal Working RNA-seq Workflow

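Differential expression analysis answers the question "which genes change between the treatment group and the control group?" Starting from a count matrix, DESeq2 provides a standard workflow for this. This post covers only the minimal working workflow: preparing the expression matrix, building the sample table, running the analysis, extracting results, and drawing a volcano plot. It does not go into the statistical theory.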

1. Prepare the data

DESeq2 needs two inputs:

# 1. Count matrix (genes × samples, integer counts)
head(counts)
#            control1 control2 control3 treat1 treat2 treat3
# ENSG00001      200      180      190    500    520    480
# ENSG00002        5        8        3     10     12     15

# 2. Sample information (one row per sample, in the same order as the columns of counts)
coldata <- data.frame(
    row.names = colnames(counts),
    condition = factor(c(rep("Control", 3), rep("Treatment", 3)))
)

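Note: the counts must be raw integer counts. Do not use FPKM or TPM values. DESeq2 applies its own library-size normalization, so it needs the unnormalized counts.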
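You may want to try the workflow before your own data is ready. A minimal sketch for that is to simulate a count matrix in the same layout. Everything below is an assumption for illustration: the gene names (gene001 ...), the number of genes, the Poisson model and the 4× effect on the first 20 genes. None of it is real data. Real RNA-seq counts are overdispersed, so Poisson noise is only a rough stand-in. DESeq2 also ships makeExampleDESeqDataSet() for quick experiments.

```r
# Simulated toy data (NOT real measurements): 200 hypothetical genes x 6 samples
set.seed(1)
n_genes <- 200
samples <- c(paste0("control", 1:3), paste0("treat", 1:3))

# Per-gene baseline expression level, shared by all samples
base_mean <- rexp(n_genes, rate = 1 / 100)

# Poisson counts; the matrix is filled column by column, one sample per column
counts <- matrix(
  rpois(n_genes * 6, lambda = rep(base_mean, times = 6)),
  nrow = n_genes,
  dimnames = list(sprintf("gene%03d", 1:n_genes), samples)
)

# Hypothetical treatment effect: genes 1-20 are ~4x higher in the treated samples
counts[1:20, 4:6] <- rpois(20 * 3, lambda = rep(base_mean[1:20] * 4, times = 3))

# Sample table; setting levels explicitly makes "Control" the reference level
coldata <- data.frame(
  row.names = colnames(counts),
  condition = factor(c(rep("Control", 3), rep("Treatment", 3)),
                     levels = c("Control", "Treatment"))
)

# Sanity checks: integer counts, and sample names aligned with the count columns
stopifnot(is.integer(counts), identical(rownames(coldata), colnames(counts)))
head(counts)
```

The explicit levels argument does not change the results here, because the contrast used later names both groups. It does matter if you call results(dds) without a contrast. In that case the first factor level is treated as the reference.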

2. Run the differential analysis

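library(DESeq2)

# Build the DESeqDataSet object
dds <- DESeqDataSetFromMatrix(
    countData = counts,
    colData = coldata,
    design = ~ condition
)

# Filter out low-expression genes (keep genes with count >= 10 in at least 3 samples)
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep, ]

# Run the analysis (size factors, dispersion estimation, Wald test)
dds <- DESeq(dds)

# Extract results as Treatment vs Control (positive log2FoldChange = higher in Treatment);
# alpha is the FDR cutoff used for independent filtering and by summary()
res <- results(dds, contrast = c("condition", "Treatment", "Control"),
               alpha = 0.05)
summary(res)

# Export (row names, i.e. gene IDs, are written as the first column)
write.csv(as.data.frame(res), "DEG_results.csv")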
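Here is what the pre-filter rowSums(counts(dds) >= 10) >= 3 actually keeps, shown on a plain base-R matrix. The four genes and their counts are made up for illustration. geneB copies the ENSG00002 row from the example above.

```r
# Toy 4-gene x 6-sample count matrix (made-up numbers)
m <- matrix(
  c(200, 180, 190, 500, 520, 480,   # geneA: high everywhere
      5,   8,   3,  10,  12,  15,   # geneB: >= 10 in exactly 3 samples
      0,   0,   0,  12,  11,   2,   # geneC: >= 10 in only 2 samples
      1,   2,   0,   3,   1,   0),  # geneD: never reaches 10
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]),
                  c(paste0("control", 1:3), paste0("treat", 1:3)))
)

n_pass <- rowSums(m >= 10)   # per gene: number of samples with count >= 10
keep <- n_pass >= 3          # same rule as the DESeq2 pre-filter above
print(n_pass)                # geneA 6, geneB 3, geneC 2, geneD 0
print(keep)                  # geneA and geneB kept; geneC and geneD dropped
```

The minimum of 3 samples equals the size of the smallest group. As a result, a gene that is expressed only in the treatment group (like geneB) can still pass the filter.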

3. Filter for differentially expressed genes

1
res_sig <- as.data.frame(res) %>%
2
    filter(padj < 0.05, abs(log2FoldChange) > 1) %>%
3
    arrange(padj)

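Threshold	Meaning
padj < 0.05	Significant after multiple-testing correction (Benjamini-Hochberg by default)
|log2FC| > 1	Fold change greater than 2 in either direction (up or down)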
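The |log2FC| > 1 threshold means the fold change is above 2 or below 0.5, because 2^1 = 2 and 2^-1 = 0.5. The sketch below applies both thresholds and also splits genes into up- and down-regulated. label_deg() is a hypothetical helper, not part of DESeq2, and the five-gene data frame is made up. It treats padj = NA as not significant. DESeq2 can report NA in padj, for example for genes removed by independent filtering.

```r
# Hypothetical helper: label each gene "Up", "Down" or "NS" (not significant)
label_deg <- function(df, padj_cut = 0.05, lfc_cut = 1) {
  # NA padj is counted as not significant
  sig <- !is.na(df$padj) & df$padj < padj_cut &
    abs(df$log2FoldChange) > lfc_cut
  df$regulation <- ifelse(sig & df$log2FoldChange > 0, "Up",
                   ifelse(sig & df$log2FoldChange < 0, "Down", "NS"))
  df
}

# Made-up example in the same column layout as as.data.frame(res)
toy <- data.frame(
  row.names = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -1.8, 0.4, 3.0, -2.0),
  padj = c(0.001, 0.01, 0.0001, 0.2, NA)
)
toy <- label_deg(toy)
toy$regulation        # g1 Up, g2 Down, g3-g5 NS
table(toy$regulation)
```

With real results you would call label_deg(as.data.frame(res)) and count up- and down-regulated genes with table().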

4. Volcano plot

library(ggplot2)

# Genes with padj = NA cannot be placed on the y axis, so drop them first
plot_df <- subset(as.data.frame(res), !is.na(padj))
ggplot(plot_df, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point(alpha = 0.3, size = 0.5) +
    geom_vline(xintercept = c(-1, 1), linetype = "dashed") +
    geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
    theme_bw()
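The plot above colors every point the same. Here is a dependency-free variant drawn with base R graphics instead of ggplot2, coloring points by direction. It uses the same thresholds (padj < 0.05, |log2FC| > 1). The 500-row data frame is simulated as a stand-in for as.data.frame(res). The colors and the output file name volcano.pdf are arbitrary choices.

```r
set.seed(42)
# Simulated stand-in for as.data.frame(res): 500 hypothetical genes
df <- data.frame(
  log2FoldChange = rnorm(500, sd = 1.5),
  padj = runif(500)^3
)

df <- df[!is.na(df$padj), ]   # with real results, drop genes without padj
sig <- df$padj < 0.05 & abs(df$log2FoldChange) > 1
df$col <- ifelse(sig & df$log2FoldChange > 0, "firebrick",   # up
          ifelse(sig & df$log2FoldChange < 0, "steelblue",   # down
                 "grey70"))                                  # not significant

pdf("volcano.pdf", width = 6, height = 5)
plot(df$log2FoldChange, -log10(df$padj),
     pch = 16, cex = 0.4, col = df$col,
     xlab = "log2 fold change", ylab = "-log10(padj)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)   # threshold lines
invisible(dev.off())
```

In the ggplot2 version, the same effect comes from adding a color column and mapping it with aes(color = ...).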

5. Pitfalls

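Pitfall 1: the count matrix contains decimals. Some count tables contain fractional values; featureCounts, for example, produces them when run with --fraction. DESeqDataSetFromMatrix rejects non-integer counts, so round them with round() first.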

Pitfall 2: sample names do not match. The row names of coldata must be identical to the column names of the count matrix, and in the same order.

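Pitfall 3: errors when there are no biological replicates. DESeq2 needs replicates to estimate dispersion, so each group should have at least 2. With n = 1 you can fall back to design = ~ 1, but this is not recommended.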
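The three checks can be run before calling DESeqDataSetFromMatrix. Below is a minimal base-R sketch. prepare_inputs() and its group_col argument are hypothetical names, not part of DESeq2, and the small example matrix is made up. The helper rounds fractional counts, reorders coldata to follow the count columns, and warns when a group has fewer than 2 replicates.

```r
# Hypothetical helper covering the three pitfalls above (base R only)
prepare_inputs <- function(counts, coldata, group_col = "condition") {
  counts <- as.matrix(counts)

  # Pitfall 1: round fractional counts and store them as integers
  if (any(counts != round(counts))) {
    message("Non-integer counts found; rounding with round().")
    counts <- round(counts)
  }
  storage.mode(counts) <- "integer"

  # Pitfall 2: every count column needs a coldata row; then match the order
  missing <- setdiff(colnames(counts), rownames(coldata))
  if (length(missing) > 0) {
    stop("Samples missing from coldata: ", paste(missing, collapse = ", "))
  }
  coldata <- coldata[colnames(counts), , drop = FALSE]

  # Pitfall 3: warn if any group has fewer than 2 biological replicates
  n_per_group <- table(as.character(coldata[[group_col]]))
  if (any(n_per_group < 2)) {
    warning("Groups with < 2 replicates: ",
            paste(names(n_per_group)[n_per_group < 2], collapse = ", "))
  }
  list(counts = counts, coldata = coldata)
}

# Made-up example: fractional counts, and coldata rows in a different order
raw <- matrix(c(200.4, 180, 190, 500, 519.6, 480,
                  5,     8,   3,  10,  12,    15),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("gene1", "gene2"),
                              c("control1", "control2", "control3",
                                "treat1", "treat2", "treat3")))
cd <- data.frame(row.names = c("treat1", "treat2", "treat3",
                               "control1", "control2", "control3"),
                 condition = c(rep("Treatment", 3), rep("Control", 3)))
out <- prepare_inputs(raw, cd)
out$counts    # integers; 200.4 -> 200 and 519.6 -> 520
out$coldata   # rows now in the same order as the count columns
```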

Tested on 2025-08-20.