Pheno pseudobulk #71
…x' when testing on GBM data
Pull request overview
This PR adds phenotype-based pseudobulking functionality to the TCR toolkit pipeline, allowing samples to be pseudo-bulked by phenotype annotations from Seurat objects and then analyzed through the existing SAMPLE and COMPARE modules.
Changes:
- Added phenotype pseudobulking workflow that processes samples by phenotype when a Seurat GEX object is provided
- Extended the AIRR_CONVERT subworkflow to generate phenotype-specific pseudobulk files from CellRanger data
- Created parallel analysis paths (SAMPLE_PHENO and COMPARE_PHENO) to process phenotype-segregated data
- Enhanced Python scripts to support phenotype metadata extraction and to improve data handling in compare_calc.py
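The core idea of phenotype pseudobulking can be sketched as follows. This assumes a CellRanger-style `filtered_contig_annotations.csv` (columns `barcode`, `chain`, `cdr3`, `v_gene`, `j_gene`, `productive`) and a barcode-to-phenotype mapping already extracted from the Seurat object's cell metadata. The function name `pseudobulk_by_phenotype`, the restriction to TRB, and counting clonotypes by number of cells are illustrative assumptions, not the logic in `bin/pseudobulk.py`.

```python
import csv
import io
from collections import Counter, defaultdict


def pseudobulk_by_phenotype(contigs_csv, phenotypes, chain="TRB"):
    """Count cells per clonotype within each phenotype (hypothetical helper).

    contigs_csv: text of a CellRanger-style filtered_contig_annotations.csv.
    phenotypes:  dict of cell barcode -> phenotype label (assumed to come
                 from the Seurat object's cell metadata).
    Returns {phenotype: Counter({(cdr3, v_gene, j_gene): n_cells})}.
    """
    counts = defaultdict(Counter)
    seen = set()
    for row in csv.DictReader(io.StringIO(contigs_csv)):
        # Keep only productive contigs for the requested chain.
        if row["chain"] != chain or row["productive"].lower() != "true":
            continue
        pheno = phenotypes.get(row["barcode"])
        if pheno is None:
            continue  # cell has no phenotype annotation in the GEX object
        clone = (row["cdr3"], row["v_gene"], row["j_gene"])
        if (row["barcode"], clone) in seen:
            continue  # count each clonotype at most once per cell
        seen.add((row["barcode"], clone))
        counts[pheno][clone] += 1
    return dict(counts)


# Toy input: two annotated CD8 cells share a clonotype, one Treg cell,
# and one unannotated, non-productive contig that is dropped.
contigs = """barcode,chain,cdr3,v_gene,j_gene,productive
AAAC-1,TRB,CASSL,TRBV1,TRBJ1,True
AAAG-1,TRB,CASSL,TRBV1,TRBJ1,True
AAAT-1,TRB,CASRP,TRBV2,TRBJ2,True
AAAT-1,TRA,CAVS,TRAV1,TRAJ1,True
AACC-1,TRB,CASQQ,TRBV3,TRBJ1,False
"""
phenos = {"AAAC-1": "CD8_effector", "AAAG-1": "CD8_effector", "AAAT-1": "Treg"}
result = pseudobulk_by_phenotype(contigs, phenos)
```

Writing each phenotype's counts out as its own file per sample would let the SAMPLE and COMPARE steps treat every sample/phenotype pair as an ordinary sample.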
Reviewed changes
Copilot reviewed 12 out of 15 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| workflows/tcrtoolkit.nf | Main workflow integration of phenotype analysis with conditional execution based on sobject_gex parameter |
| subworkflows/local/sample_pheno.nf | New subworkflow for sample-level analysis of phenotype-pseudobulked data |
| subworkflows/local/compare_pheno.nf | New subworkflow for comparison analysis of phenotype-pseudobulked data |
| subworkflows/local/resolve_samplesheet_pheno.nf | New subworkflow to collect phenotype sample files for downstream processing |
| subworkflows/local/map_phenotypes.nf | New subworkflow to transform phenotype files and generate phenotype-specific samplesheet |
| subworkflows/local/airr_convert.nf | Extended to support phenotype pseudobulking for CellRanger input format |
| subworkflows/local/sample.nf | Removed commented-out emit statements |
| subworkflows/local/compare.nf | Added collectFile operations for similarity matrices and removed commented code |
| modules/local/samplesheet/generate_pheno_samplesheet.nf | New module to generate phenotype-specific samplesheet from metadata |
| modules/local/airr_convert/pseudobulk_phenotype_cellranger.nf | New module to run phenotype-based pseudobulking on CellRanger data |
| modules/local/airr_convert/pseudobulk_cellranger.nf | Added container directive |
| conf/modules.config | Added publishDir configurations for phenotype analysis outputs |
| bin/pseudobulk.py | Added phenotype processing functions and command-line options |
| bin/create_pheno_samplesheet.py | New script to generate phenotype samplesheet from JSON metadata |
| bin/compare_calc.py | Improved data handling with better validation and numeric type handling |
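For `bin/create_pheno_samplesheet.py`, a minimal sketch of turning JSON metadata into a per-phenotype samplesheet is shown here. The JSON schema (a list of objects with `sample`, `phenotype` and `file` keys), the output columns, and the helper names are all hypothetical; the real script's inputs and samplesheet schema may differ.

```python
import csv
import io
import json
import re


def safe_label(label):
    """Turn a phenotype label such as 'CD4+ T cells' into an ID-safe token."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")


def make_pheno_samplesheet(metadata_json):
    """Build a samplesheet with one row per (sample, phenotype) file.

    metadata_json is assumed to be a JSON list of objects with keys
    'sample', 'phenotype' and 'file' (hypothetical schema).
    """
    records = json.loads(metadata_json)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["sample", "file", "origin_sample", "phenotype"],
        lineterminator="\n",
    )
    writer.writeheader()
    # Sort so the samplesheet is identical across runs.
    for rec in sorted(records, key=lambda r: (r["sample"], r["phenotype"])):
        writer.writerow({
            # Composite ID so each phenotype behaves like its own sample.
            "sample": f"{rec['sample']}_{safe_label(rec['phenotype'])}",
            "file": rec["file"],
            "origin_sample": rec["sample"],
            "phenotype": rec["phenotype"],
        })
    return out.getvalue()
```

Keeping `origin_sample` and `phenotype` as separate columns means downstream steps can regroup results by patient or by phenotype without parsing the composite ID.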
Comments suppressed due to low confidence (1)
subworkflows/local/compare.nf:56
- The collectFile operations are collecting outputs from COMPARE_CALC.out but then COMPARE_PLOT is still using the original COMPARE_CALC.out channels (lines 50-52), not the collected files. This creates redundant file collection. Either use the collected files in COMPARE_PLOT or remove the collectFile operations if they're not needed.
COMPARE_CALC.out.jaccard_mat
    .collectFile(name: 'jaccard_mat.csv', sort: true,
                 storeDir: "${params.outdir}/compare")
    .set { jaccard_mat }
COMPARE_CALC.out.sorensen_mat
    .collectFile(name: 'sorensen_mat.csv', sort: true,
                 storeDir: "${params.outdir}/compare")
    .set { sorensen_mat }
COMPARE_CALC.out.morisita_mat
    .collectFile(name: 'morisita_mat.csv', sort: true,
                 storeDir: "${params.outdir}/compare")
    .set { morisita_mat }
COMPARE_PLOT( samplesheet_resolved,
              COMPARE_CALC.out.jaccard_mat,
              COMPARE_CALC.out.sorensen_mat,
              COMPARE_CALC.out.morisita_mat,
              file(params.compare_stats_template),
              params.project_name,
              all_sample_files
)
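The three matrices collected in `compare.nf` hold pairwise repertoire similarities. As a rough reference for what each entry measures, here is a sketch of the three indices on clonotype count tables. The exact formulas in `bin/compare_calc.py` are not shown in this PR; in particular, this assumes the Morisita-Horn variant of the Morisita index, and the function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Shared clonotypes over all clonotypes (presence/absence only)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sorensen(a, b):
    """Twice the shared clonotypes over the summed repertoire sizes."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def morisita_horn(a, b):
    """Abundance-weighted overlap (Morisita-Horn), in [0, 1]."""
    x_tot, y_tot = sum(a.values()), sum(b.values())
    if x_tot == 0 or y_tot == 0:
        return 0.0
    dx = sum(v * v for v in a.values()) / x_tot ** 2
    dy = sum(v * v for v in b.values()) / y_tot ** 2
    cross = sum(a[k] * b[k] for k in a.keys() & b.keys())
    return 2 * cross / ((dx + dy) * x_tot * y_tot)


def similarity_matrix(repertoires, metric):
    """Square matrix (dict of dicts) of metric values for every pair."""
    names = sorted(repertoires)
    return {r: {c: metric(repertoires[r], repertoires[c]) for c in names}
            for r in names}
```

Jaccard and Sørensen ignore clone sizes, while Morisita-Horn weights by abundance, so an expanded clone shared between two phenotypes moves the Morisita matrix much more than the other two.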
- reformat convert subworkflow into more granular branches
- temporarily using phenotype .csv from SCRATCH-annotate as GEX input
- code cleanup to minimize needed inputs

To do:
- clean up pseudobulk.py
- remove hardcoding of metadata
- add more robust sample/cell ID/barcode matching
- update GEX input from SCRATCH (make generalizable)
- update README
- reformat data flow so ps-phenotype files can go through a truncated version of the main pipeline rather than their own subworkflow
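One way to approach the barcode-matching to-do item is to key phenotypes on `(sample, barcode)` rather than on the raw cell name. Seurat cell names often carry a sample prefix (for example from `add.cell.ids` when merging objects), and 10x barcodes come from a fixed whitelist, so the same barcode can legitimately appear in two samples. This sketch assumes 16-nt 10x barcodes with an optional `-N` suffix; the function names are hypothetical.

```python
import re

# Assumed barcode shape: 16 nt, optionally followed by a GEM-well suffix.
BARCODE_RE = re.compile(r"([ACGTN]{16})(?:-\d+)?$")


def barcode_core(cell_id):
    """Strip any sample prefix and '-N' suffix from a cell name."""
    m = BARCODE_RE.search(cell_id)
    return m.group(1) if m else None


def build_phenotype_lookup(rows):
    """Map (sample, barcode) -> phenotype.

    rows: iterable of (sample, seurat_cell_name, phenotype) tuples.
    Raises ValueError if one cell is annotated with two phenotypes.
    """
    lookup = {}
    for sample, cell, pheno in rows:
        core = barcode_core(cell)
        if core is None:
            continue  # not a recognisable 10x barcode; skip it
        key = (sample, core)
        if key in lookup and lookup[key] != pheno:
            raise ValueError(f"conflicting phenotypes for {key}")
        lookup[key] = pheno
    return lookup
```

Dropping the `-N` suffix assumes each sample comes from a single GEM well; for `cellranger aggr` outputs the suffix distinguishes libraries and would need to stay in the key.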
This section allows the pipeline to run 'SAMPLE' and 'COMPARE' modules on samples pseudo-bulked by phenotype.