nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
Version 3.3.1. The latest stable release is 3.5.1.
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
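The pattern above can be checked before launching a run. The following is a minimal sketch in Python, assuming only the regular expression shown in this entry; the function name `is_valid_samplesheet_path` is hypothetical and not part of the pipeline.

```python
import re

# Pattern copied verbatim from the schema entry above.
CSV_PATH = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in .csv."""
    return CSV_PATH.match(path) is not None
```

Under this pattern, a path containing a space is rejected even if the file exists, so it may be worth renaming such files before use.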
Automatic retrieval for restart
string
^\S+\.csv$
Starting step
string
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Most common options used for the pipeline
Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.
integer
50000000
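To get a feel for how the split size relates to the number of chunks, here is a rough arithmetic sketch in Python. It assumes a file is simply cut into consecutive chunks of at most the given number of reads; the pipeline's actual splitting behaviour may differ in detail, and the function name `number_of_splits` is hypothetical.

```python
import math


def number_of_splits(total_reads: int, reads_per_split: int = 50_000_000) -> int:
    """Estimate how many chunks a FastQ file would be cut into.

    A value of 0 for reads_per_split means splitting is turned off,
    so the file stays as a single chunk.
    """
    if reads_per_split == 0:
        return 1
    return math.ceil(total_reads / reads_per_split)
```

For example, a file of 120 million reads with the default split size of 50 million would give three chunks under this assumption.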
Enable when exome or panel data is provided.
boolean
Path to target bed file in case of whole exome or targeted sequencing or intervals file.
string
Estimate interval size.
number
200000
Disable usage of intervals.
boolean
Tools to use for duplicate marking, variant calling and/or for annotation.
string
Disable specified tools.
string
Trim fastq file or handle UMIs
Run FastP for read trimming
boolean
Remove bp from the 5’ end of read 1
integer
Remove bp from the 5’ end of read 2
integer
Remove bp from the 3’ end of read 1
integer
Remove bp from the 3’ end of read 2
integer
Removing poly-G tails.
integer
Save trimmed FastQ file intermediates.
boolean
Specify UMI read structure
string
Default strategy with UMI
string
Adjacency
If set, publishes split FASTQ files. Intended for testing purposes.
boolean
Configure preprocessing tools
Specify aligner to be used to map reads to reference genome.
string
Save mapped files.
boolean
Saves output from mapping (if --save_mapped), Markduplicates & Baserecalibration as BAM file instead of CRAM
boolean
Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration
string
Configure variant calling tools
Option for concatenating germline vcf-files.
boolean
If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.
boolean
Turn on the joint germline variant calling for GATK haplotypecaller
boolean
Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named with patient ID under the variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.
boolean
Overwrite ASCAT min base quality required for a read to be counted.
number
20
Overwrite ASCAT minimum depth required in the normal for a SNP to be considered.
number
10
Overwrite ASCAT min mapping quality required for a read to be counted.
number
35
Overwrite ASCAT ploidy.
number
Overwrite ASCAT purity.
number
Specify a custom chromosome length file.
string
Overwrite Control-FREEC coefficientOfVariation
number
0.05
Overwrite Control-FREEC contaminationAdjustment
boolean
Specify a known contamination value for Control-FREEC
number
Minimal sequencing quality for a position to be considered in BAF analysis.
number
Minimal read coverage for a position to be considered in BAF analysis.
number
Genome ploidy used by Control-FREEC
string
2
Overwrite Control-FREEC window size.
number
Copy-number reference for CNVkit
string
Panel-of-normals VCF (bgzipped) for GATK Mutect2
string
Index of the panel-of-normals (PON) VCF.
string
Do not analyze soft clipped bases in the reads for GATK Mutect2.
boolean
Option for selecting output and emit-mode of Sentieon’s Haplotyper.
string
variant
Path to VEP cache.
string
s3://annotation-cache/vep_cache/
Path to snpEff cache.
string
s3://annotation-cache/snpeff_cache/
Allow usage of fasta file for annotation with VEP
boolean
Enable the use of the VEP dbNSFP plugin.
boolean
Path to dbNSFP processed file.
string
Path to dbNSFP tabix indexed file.
string
Consequence to annotate with
string
Fields to annotate with
string
rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Enable the use of the VEP LOFTEE plugin.
boolean
Enable the use of the VEP SpliceAI plugin.
boolean
Path to spliceai raw scores snv file.
string
Path to spliceai raw scores snv tabix indexed file.
string
Path to spliceai raw scores indel file.
string
Path to spliceai raw scores indel tabix indexed file.
string
Enable the use of the VEP SpliceRegion plugin.
boolean
Add an extra custom argument to VEP.
string
--everything --filter_common --per_gene --total_length --offline --format vcf
Use annotation cache keys for snpeff_cache and vep_cache.
boolean
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
VEP output-file format.
string
Reference genome related files and options required for the workflow.
Name of iGenomes reference.
string
GATK.GRCh38
ASCAT genome.
string
Path to ASCAT allele zip file.
string
Path to ASCAT loci zip file.
string
Path to ASCAT GC content correction file.
string
Path to ASCAT RT (replication timing) correction file.
string
Path to BWA mem indices.
string
Path to bwa-mem2 mem indices.
string
Path to chromosomes folder used with Control-FREEC.
string
Path to dbsnp file.
string
Path to dbsnp index.
string
label string for VariantRecalibration (haplotypecaller joint variant calling)
string
Path to FASTA dictionary file.
string
Path to dragmap indices.
string
Path to FASTA genome file.
string
^\S+\.fn?a(sta)?(\.gz)?$
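The FASTA pattern above accepts several common extensions. The following Python sketch, which assumes nothing beyond the regular expression in this entry, shows which file names pass; the function name `is_valid_fasta_path` is hypothetical.

```python
import re

# Pattern copied verbatim from the schema entry above.
FASTA_PATH = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True for whitespace-free paths ending in .fa, .fna or .fasta,
    optionally followed by .gz."""
    return FASTA_PATH.match(path) is not None
```

Note that the pattern does not accept the `.fas` extension, and FastQ names such as `reads.fastq` are rejected as well.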
Path to FASTA reference index.
string
Path to GATK Mutect2 Germline Resource File.
string
Path to GATK Mutect2 Germline Resource Index.
string
Path to known indels file.
string
Path to known indels file index.
string
If you use AWS iGenomes, this has already been set for you appropriately.
1st label string for VariantRecalibration (haplotypecaller joint variant calling)
string
If you use AWS iGenomes, this has already been set for you appropriately.
Path to known snps file.
string
Path to known snps file index.
string
If you use AWS iGenomes, this has already been set for you appropriately.
label string for VariantRecalibration (haplotypecaller joint variant calling)
string
Path to Control-FREEC mappability file.
string
snpEff DB version.
string
snpEff genome.
string
VEP genome.
string
VEP species.
string
VEP cache version.
number
Save built references.
boolean
Only build references.
boolean
Download annotation cache.
boolean
Directory / URL base for iGenomes references.
string
s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
boolean
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Base path / URL for data used in the test profiles
string
https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
Sequencing center information to be added to read group (CN field).
string
Sequencing platform information to be added to read group (PL field).
string
ILLUMINA
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Maximum amount of memory that can be requested for any single job.
string
128.GB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Maximum amount of time that can be requested for any single job.
string
240.h
^(\d+\.?\s*(s|m|h|d|day)\s*)+$
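The memory and time limits above are plain strings that must match the listed patterns. As a minimal sketch, assuming only the two regular expressions shown in these entries, the following Python helpers check a value before it goes into a config file. The function names are hypothetical.

```python
import re

# Patterns copied verbatim from the two schema entries above.
MEMORY = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """Return True for values such as '128.GB' or '8 GB'."""
    return MEMORY.match(value) is not None


def is_valid_time(value: str) -> bool:
    """Return True for values such as '240.h' or '1d 12h'."""
    return TIME.match(value) is not None
```

The memory pattern requires the trailing `B`, so `128G` is rejected while `128.GB` is accepted. The time pattern only knows the units `s`, `m`, `h`, `d` and `day`, so spelled-out units such as `2 hours` do not match.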
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
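Both email entries above share the same pattern. The Python sketch below, assuming only that regular expression, illustrates which addresses it accepts; the function name `is_valid_email` is hypothetical.

```python
import re

# Pattern copied verbatim from the two email entries above.
EMAIL = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern."""
    return EMAIL.match(address) is not None
```

Because the final group allows only two to five letters, addresses whose top-level domain is longer (for example `.technology`) would be rejected by this pattern even though they can be valid in practice.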
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
boolean
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime.
boolean
true
Show all params when using --help
boolean
Validation of parameters fails when an unrecognised parameter is found.
boolean
Validation of parameters in lenient mode.
boolean
Incoming hook URL for messaging service
string