@KevinMLanderos
Collaborator

This section allows the pipeline to run 'SAMPLE' and 'COMPARE' modules on samples pseudo-bulked by phenotype.

@github-actions

github-actions bot commented Jan 11, 2026

Unit Test Results

10 tests  ±0   10 ✅ ±0   2m 45s ⏱️ -1s
 2 suites ±0    0 💤 ±0 
 1 files   ±0    0 ❌ ±0 

Results for commit 9113490. ± Comparison against base commit b862c22.

♻️ This comment has been updated with latest results.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds phenotype-based pseudobulking functionality to the TCR toolkit pipeline, allowing samples to be pseudo-bulked by phenotype annotations from Seurat objects and then analyzed through the existing SAMPLE and COMPARE modules.

Changes:

  • Added phenotype pseudobulking workflow that processes samples by phenotype when a Seurat GEX object is provided
  • Extended the AIRR_CONVERT subworkflow to generate phenotype-specific pseudobulk files from CellRanger data
  • Created parallel analysis paths (SAMPLE_PHENO and COMPARE_PHENO) to process phenotype-segregated data
  • Enhanced Python scripts to support phenotype metadata extraction and improved data handling in compare_calc.py
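As a rough illustration of what phenotype pseudobulking means here, the sketch below sums clonotype counts per (sample, phenotype) group. It is not the code in bin/pseudobulk.py: the function name, the row keys (`sample`, `barcode`, `clonotype`), and the phenotype lookup structure are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def pseudobulk_by_phenotype(contig_rows, phenotype_by_cell):
    """Sum clonotype counts per (sample, phenotype) group.

    contig_rows: iterable of dicts with hypothetical keys
        'sample', 'barcode', 'clonotype'.
    phenotype_by_cell: dict mapping (sample, barcode) -> phenotype label.
    Cells without a phenotype annotation are skipped.
    """
    groups = defaultdict(Counter)
    seen = set()
    for row in contig_rows:
        cell = (row["sample"], row["barcode"])
        phenotype = phenotype_by_cell.get(cell)
        if phenotype is None:
            continue
        # Count each (cell, clonotype) pair once, even if it appears on several rows.
        cell_clone = cell + (row["clonotype"],)
        if cell_clone in seen:
            continue
        seen.add(cell_clone)
        groups[(row["sample"], phenotype)][row["clonotype"]] += 1
    return {group: dict(counts) for group, counts in groups.items()}


# Toy input: three annotated cells and one cell with no phenotype.
contigs = [
    {"sample": "S1", "barcode": "AAAC-1", "clonotype": "CASSL"},
    {"sample": "S1", "barcode": "AAAG-1", "clonotype": "CASSL"},
    {"sample": "S1", "barcode": "AAAT-1", "clonotype": "CASSP"},
    {"sample": "S1", "barcode": "AACC-1", "clonotype": "CASSQ"},
]
phenotypes = {
    ("S1", "AAAC-1"): "CD8_effector",
    ("S1", "AAAG-1"): "CD8_effector",
    ("S1", "AAAT-1"): "CD4_naive",
}
result = pseudobulk_by_phenotype(contigs, phenotypes)
```

Each (sample, phenotype) key in `result` can then be treated as its own pseudo-sample by downstream sample-level and comparison steps.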

Reviewed changes

Copilot reviewed 12 out of 15 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| workflows/tcrtoolkit.nf | Main workflow integration of phenotype analysis with conditional execution based on sobject_gex parameter |
| subworkflows/local/sample_pheno.nf | New subworkflow for sample-level analysis of phenotype-pseudobulked data |
| subworkflows/local/compare_pheno.nf | New subworkflow for comparison analysis of phenotype-pseudobulked data |
| subworkflows/local/resolve_samplesheet_pheno.nf | New subworkflow to collect phenotype sample files for downstream processing |
| subworkflows/local/map_phenotypes.nf | New subworkflow to transform phenotype files and generate phenotype-specific samplesheet |
| subworkflows/local/airr_convert.nf | Extended to support phenotype pseudobulking for CellRanger input format |
| subworkflows/local/sample.nf | Removed commented-out emit statements |
| subworkflows/local/compare.nf | Added collectFile operations for similarity matrices and removed commented code |
| modules/local/samplesheet/generate_pheno_samplesheet.nf | New module to generate phenotype-specific samplesheet from metadata |
| modules/local/airr_convert/pseudobulk_phenotype_cellranger.nf | New module to run phenotype-based pseudobulking on CellRanger data |
| modules/local/airr_convert/pseudobulk_cellranger.nf | Added container directive |
| conf/modules.config | Added publishDir configurations for phenotype analysis outputs |
| bin/pseudobulk.py | Added phenotype processing functions and command-line options |
| bin/create_pheno_samplesheet.py | New script to generate phenotype samplesheet from JSON metadata |
| bin/compare_calc.py | Improved data handling with better validation and numeric type handling |

Comments suppressed due to low confidence (1)

subworkflows/local/compare.nf:56

  • The collectFile operations are collecting outputs from COMPARE_CALC.out but then COMPARE_PLOT is still using the original COMPARE_CALC.out channels (lines 50-52), not the collected files. This creates redundant file collection. Either use the collected files in COMPARE_PLOT or remove the collectFile operations if they're not needed.
    COMPARE_CALC.out.jaccard_mat
        .collectFile(name: 'jaccard_mat.csv', sort: true, 
                     storeDir: "${params.outdir}/compare")
        .set { jaccard_mat }

    COMPARE_CALC.out.sorensen_mat
        .collectFile(name: 'sorensen_mat.csv', sort: true, 
                     storeDir: "${params.outdir}/compare")
        .set { sorensen_mat }

    COMPARE_CALC.out.morisita_mat
        .collectFile(name: 'morisita_mat.csv', sort: true, 
                     storeDir: "${params.outdir}/compare")
        .set { morisita_mat }

    COMPARE_PLOT( samplesheet_resolved,
                  COMPARE_CALC.out.jaccard_mat,
                  COMPARE_CALC.out.sorensen_mat,
                  COMPARE_CALC.out.morisita_mat,
                  file(params.compare_stats_template),
                  params.project_name,
                  all_sample_files
                  )
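
For readers unfamiliar with the three matrices named in the snippet above, here is a minimal sketch of the pairwise overlap indices they are usually taken to hold. It is not the implementation in bin/compare_calc.py. The function names are hypothetical, and reading "morisita" as the Morisita-Horn index is an assumption.

```python
def jaccard(a, b):
    """Jaccard index on the sets of clonotypes present in each sample."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def sorensen(a, b):
    """Sorensen-Dice index on the sets of clonotypes present in each sample."""
    set_a, set_b = set(a), set(b)
    total = len(set_a) + len(set_b)
    return 2 * len(set_a & set_b) / total if total else 0.0


def morisita_horn(a, b):
    """Morisita-Horn index; a and b map clonotype -> count."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    # Sum of count products over clonotypes shared by both samples.
    cross = sum(n * b.get(clone, 0) for clone, n in a.items())
    # Simpson-style dominance terms for each sample.
    d_a = sum(n * n for n in a.values()) / total_a ** 2
    d_b = sum(n * n for n in b.values()) / total_b ** 2
    return 2 * cross / ((d_a + d_b) * total_a * total_b)


# Two toy repertoires sharing one of their two clonotypes.
sample_a = {"x": 2, "y": 2}
sample_b = {"y": 2, "z": 2}
```

Jaccard and Sorensen use only presence or absence of each clonotype, while Morisita-Horn also weights by clonotype abundance.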

- reformat convert subworkflow into more granular branches
- temporarily using phenotype .csv from SCRATCH-annotate as GEX input
- code cleanup to minimize needed inputs

To do:
- clean up pseudobulk.py
  - remove hardcoding of metadata
  - needs more robust sample/cell id/barcode matching
- update GEX input from SCRATCH (make generalizable)
- update readme
- reformat data flow so ps-phenotype files can go through truncated version of main pipeline, rather than its own subworkflow
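
One possible direction for the barcode-matching item in the to-do list is sketched below. It assumes 16-nucleotide 10x-style cell barcodes that may carry a sample prefix or a numeric suffix; the function names and the example id formats are hypothetical, and this is not what bin/pseudobulk.py currently does.

```python
import re

# Assumption: cell barcodes are 16 nucleotides, optionally followed by "-<digits>".
_BARCODE_RE = re.compile(r"([ACGT]{16})(?:-\d+)?")


def normalize_barcode(cell_id):
    """Return the 16-nt barcode core of a cell id, or None if none is found.

    Handles hypothetical forms such as 'AAAC...-1' and 'S1_AAAC...-1'.
    """
    match = _BARCODE_RE.search(cell_id)
    return match.group(1) if match else None


def match_cells(vdj_ids, gex_ids):
    """Map each VDJ cell id to the GEX cell id that shares its barcode core."""
    gex_by_core = {}
    for gex_id in gex_ids:
        core = normalize_barcode(gex_id)
        if core is not None:
            # Keep the first GEX id seen for a given core.
            gex_by_core.setdefault(core, gex_id)
    matched = {}
    for vdj_id in vdj_ids:
        core = normalize_barcode(vdj_id)
        if core in gex_by_core:
            matched[vdj_id] = gex_by_core[core]
    return matched


vdj = ["AAACCTGAGAAACCAT-1", "not-a-barcode"]
gex = ["S1_AAACCTGAGAAACCAT-1", "S1_TTTGTCATCTTTAGGG-1"]
pairs = match_cells(vdj, gex)
```

Because the same barcode sequence can occur in different samples, a matcher like this should only be applied to ids from one sample at a time.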
@dltamayo dltamayo merged commit 55167e4 into main Jan 23, 2026
3 checks passed
2 participants