
clustermarkersofallcells

Author: jakexiaohubgithub

ClusterMarkersOfAllCells Process Configuration

Purpose

Finds marker genes for clusters of ALL cells before T/B cell selection. This process identifies differentially expressed genes across unsupervised clusters to help identify broad cell types (T cells, B cells, Myeloid cells, NK cells, etc.) in mixed immune cell populations.

When to Use

  • After SeuratClusteringOfAllCells: Runs on all cells before T/B selection
  • Before TOrBCellSelection: Provides markers to identify which clusters are T/B cells
  • Broad cell type identification: Distinguish major immune cell types from mixed populations
  • Mixed cell populations: When your data contains T, B, Myeloid, NK, and other cell types
  • Initial cell typing: First-pass identification before detailed annotation
  • Data quality check: Verify expected cell types are present in your data

Configuration Structure

Process Enablement

[ClusterMarkersOfAllCells]
cache = true

Input Specification

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
# Accepts output from SeuratClusteringOfAllCells process

Environment Variables

All parameters are inherited from ClusterMarkers and MarkersFinder:

[ClusterMarkersOfAllCells.envs]
# Parallel computing
ncores = 1

# Grouping (uses seurat_clusters by default)
# group_by = "seurat_clusters"  # Unset by default (TOML has no null); falls back to Seurat::Idents()

# Statistical test parameters (passed to Seurat::FindMarkers(); write - for . in argument names)
test-use = "wilcox"            # wilcox (Wilcoxon), bimod, roc, t, negbinom, poisson
min-pct = 0.1                  # Only test genes detected in >=10% of cells
logfc-threshold = 0.25         # Minimum log2 fold change

# Marker filtering
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"  # Filter for significant markers

# Enrichment analysis
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
enrich_style = "enrichr"       # enrichr or clusterprofiler

# Error handling
error = false                  # Don't error out if no markers found

# Visualization
marker_plots_defaults = { order_by = "desc(avg_log2FC)" }
allmarker_plots = { "Top 10 markers of all clusters" = { plot_type = "heatmap" } }

External References

Seurat FindMarkers Parameters

  • Full reference: https://satijalab.org/seurat/reference/findmarkers
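  • Statistical tests: test.use parameter (test-use in TOML)
    • "wilcox": Wilcoxon Rank Sum test (default, recommended)
    • "bimod": Likelihood-ratio test for single-cell gene expression
    • "roc": Receiver Operating Characteristic analysis
    • "t": Student's t-test
    • "negbinom": Negative binomial generalized linear model (UMI-based data only)
    • "poisson": Poisson generalized linear model (UMI-based data only)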
  • Common arguments (use - instead of . in TOML):
    • min-pct: Minimum detection percentage in either group
    • logfc-threshold: Minimum log2 fold change threshold
    • only-pos: Only return positive markers
    • min-diff-pct: Minimum difference in detection percentage
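
As a minimal sketch of the naming rule above, the snippet below passes two extra FindMarkers arguments through envs using hyphenated keys. The values are illustrative assumptions, not recommended settings. TOML reads a dotted bare key such as only.pos as a nested table (only, then pos), which is why the hyphen form is used.

```toml
[ClusterMarkersOfAllCells.envs]
# Hyphenated keys stand in for the dotted R argument names
only-pos = true        # only.pos: return only positive (upregulated) markers
min-diff-pct = 0.1     # min.diff.pct: illustrative value, tune for your data
```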

Enrichment Databases

  • MSigDB: https://www.gsea-msigdb.org/gsea/msigdb/
  • KEGG: https://www.genome.jp/kegg/
  • Reactome: https://reactome.org/
  • GO: http://geneontology.org/
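
A hedged sketch of how the Reactome and GO resources above might be selected: the default configuration uses Enrichr-style library names, so the names below follow that convention. They are assumptions for illustration; check them against the Enrichr library list before relying on them.

```toml
[ClusterMarkersOfAllCells.envs]
# Assumed Enrichr-style library names; verify they exist in your Enrichr release
dbs = ["Reactome_2022", "GO_Biological_Process_2023"]
```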

Configuration Examples

Minimal Configuration

[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

Standard Marker Finding

[SeuratClusteringOfAllCells]
[ClusterMarkersOfAllCells]

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]

[ClusterMarkersOfAllCells.envs]
# Find markers for broad cell type identification
dbs = ["MSigDB_Hallmark_2020", "KEGG_2021_Human"]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"

# Generate key visualizations
[ClusterMarkersOfAllCells.envs.marker_plots."Volcano Plot (log2FC)"]
plot_type = "volcano_log2fc"

[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 10 markers of all clusters"]
plot_type = "heatmap"

[ClusterMarkersOfAllCells.envs.enrich_plots."Bar Plot"]
plot_type = "bar"
top_term = 10

Common Patterns

Pattern 1: Broad Cell Type Markers

[ClusterMarkersOfAllCells.envs]
# Optimized for distinguishing T/B/Myeloid/NK cells
min-pct = 0.1              # Require detection in >=10% of cells
logfc-threshold = 0.25     # Minimum log2 fold change
test-use = "wilcox"        # Fast and robust
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0"

# Visualize markers to identify cell types
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top 20 markers per cluster"]
plot_type = "heatmap"

# Check for expected markers in outputs
# T cells: CD3D, CD3E, CD3G, CD4, CD8A
# B cells: CD19, MS4A1 (CD20), CD79A, CD79B
# Myeloid: CD14, LYZ, FCGR3A, CD68
# NK cells: NCAM1 (CD56), KLRD1 (CD94), NKG7

Pattern 2: Quick Wilcoxon for Large Datasets

[ClusterMarkersOfAllCells.envs]
# Fast analysis for large datasets (>50k cells)
ncores = 8                  # Use multiple cores
test-use = "wilcox"
min-pct = 0.15              # More stringent to reduce noise
logfc-threshold = 0.3
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 0.5"

# Skip enrichment to save time
dbs = []

# Generate only essential plots
[ClusterMarkersOfAllCells.envs.allmarker_plots."Top markers heatmap"]
plot_type = "heatmap"

Pattern 3: Identify T/B Cell Clusters

[ClusterMarkersOfAllCells.envs]
# Focus on finding T and B cell markers for selection
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

# Will help identify which clusters express:
# T cell markers: CD3D, CD3E, CD3G
# B cell markers: CD19, MS4A1, CD79A

[ClusterMarkersOfAllCells.envs.allmarker_plots."All markers heatmap"]
plot_type = "heatmap"

Difference from ClusterMarkers

| Aspect | ClusterMarkersOfAllCells | ClusterMarkers |
|--------|--------------------------|----------------|
| Timing | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Data Scope | ALL cells (mixed population) | SELECTED T/B cells only |
| Purpose | Identify broad cell types | Fine-grained sub-clusters |
| Typical markers | CD3, CD19, CD14, NK markers | Activation, differentiation markers |
| Use case | "Which clusters are T/B/Myeloid?" | "What subtypes exist within T cells?" |
| Upstream | SeuratClusteringOfAllCells | SeuratClustering (post-selection) |
| Downstream | TOrBCellSelection | Cell type annotation, downstream analysis |

Key insight: Use ClusterMarkersOfAllCells when you need to separate T/B cells from other cell types. Use ClusterMarkers when you want to analyze sub-clusters within already-purified T or B cell populations.
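
To make the contrast concrete, here is a sketch of both processes configured in one file, assuming ClusterMarkers accepts the same envs keys (the document states the parameters are inherited from it). The thresholds are illustrative assumptions: a stricter fold-change cutoff for the broad all-cells pass, a looser one for the subtler differences between T/B sub-clusters.

```toml
# Pass 1: all cells, before T/B selection (broad lineage markers)
[ClusterMarkersOfAllCells.envs]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 1"

# Pass 2: selected T/B cells only (finer sub-cluster markers)
[ClusterMarkers.envs]
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.25"
```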

Dependencies

Upstream Processes

  • SeuratClusteringOfAllCells: Required - provides clustered object with seurat_clusters metadata
  • SeuratPreparing: Indirect - provides normalized Seurat object
  • SampleInfo or LoadingRNAFromSeurat: Entry point for data

Downstream Processes

  • TOrBCellSelection: Primary consumer - uses marker results to select T/B cells
  • TopExpressingGenesOfAllCells: Optional complementary analysis

Validation Rules

Required Inputs

[ClusterMarkersOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]  # Must be specified

Process Enablement

  • Process automatically enabled when SeuratClusteringOfAllCells is in config
  • No need to explicitly set [ClusterMarkersOfAllCells] if SeuratClusteringOfAllCells is enabled

Parameter Constraints

  • test-use: Must be a test supported by Seurat::FindMarkers(), such as "wilcox", "bimod", "roc", "t", "negbinom", or "poisson"
  • min-pct: Should be between 0 and 1 (e.g., 0.1 = 10%)
  • logfc-threshold: Numeric value (log2 scale)
  • sigmarkers: Valid dplyr filter expression
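
As an illustration of the sigmarkers constraint, the sketch below combines several conditions in one filter expression. It assumes the marker table keeps the standard Seurat::FindMarkers() output columns (p_val_adj, avg_log2FC, pct.1, pct.2); the cutoffs are illustrative only.

```toml
[ClusterMarkersOfAllCells.envs]
# Significant, upregulated, and detected in at least 25% of the cluster's cells
sigmarkers = "p_val_adj < 0.05 & avg_log2FC > 0.5 & pct.1 > 0.25"
```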

Common Errors

  • Missing clustering: Ensure SeuratClusteringOfAllCells runs first
  • No markers found: Adjust sigmarkers or logfc-threshold if too stringent
  • Memory issues: With large datasets, reduce ncores or subset the data

Troubleshooting

Issue: No significant markers found

Symptoms: Empty output directory or warning about no markers

Solutions:

[ClusterMarkersOfAllCells.envs]
# Less stringent thresholds
logfc-threshold = 0.1           # Lower fold change requirement
min-pct = 0.05                 # Lower detection percentage
sigmarkers = "p_val_adj < 0.1"  # More relaxed p-value

# Or check data quality
# - Are cells properly clustered?
# - Is expression matrix normalized?
# - Are there enough cells per cluster (>30 recommended)?

Issue: Too many markers (slow enrichment)

Symptoms: Process takes very long, memory issues

Solutions:

[ClusterMarkersOfAllCells.envs]
# More stringent filtering
logfc-threshold = 0.5
min-pct = 0.2
sigmarkers = "p_val_adj < 0.01 & avg_log2FC > 1"

# Reduce enrichment databases
dbs = ["MSigDB_Hallmark_2020"]

# Or skip enrichment entirely (use instead of the line above; a TOML key can be set only once)
# dbs = []

Issue: Can't identify T/B cell clusters

Symptoms: Markers don't show clear T/B cell signatures

Solutions:

  1. Check marker gene presence:

    # Verify expected markers are in your data
    # Use SeuratClusterStats to visualize:
    [SeuratClusterStats.envs.features_defaults]
    features = ["CD3D", "CD3E", "CD19", "MS4A1", "CD14", "LYZ"]
    
  2. Adjust clustering parameters:

    [SeuratClusteringOfAllCells.envs.FindClusters]
    resolution = 0.5  # Try different resolutions (0.2-1.5)
    
  3. Check data quality:

    • Are genes properly normalized?
    • Are there enough cells per cluster?
    • Is species correct (human vs mouse gene symbols)?

Issue: Process not running

Symptoms: Process skipped in workflow

Solutions:

  • Verify SeuratClusteringOfAllCells is in config
  • Check dependencies are running correctly
  • Confirm that your data actually needs T/B cell selection (that is, the cells are not already all T cells)

Typical Marker Genes for Identification

| Cell Type | Positive Markers | Negative Markers |
|-----------|------------------|------------------|
| T cells | CD3D, CD3E, CD3G, CD4, CD8A | CD19, MS4A1, CD14 |
| B cells | CD19, MS4A1 (CD20), CD79A, CD79B | CD3E, CD3D, CD14 |
| Monocytes | CD14, LYZ, FCGR3A, S100A8 | CD3E, CD19 |
| NK cells | NCAM1 (CD56), KLRD1 (CD94), NKG7 | CD3E, CD19, CD14 |
| Dendritic cells | FCER1A, CST3 | CD3E, CD19, CD14 |
| Megakaryocytes | PPBP, PF4 | CD3E, CD19, CD14 |

Use these marker lists to identify which clusters correspond to which cell types in your allmarker_plots heatmaps.