nf-core/sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing

annotation, cancer, gatk4, genomics, germline, pre-processing, somatic, target-panels, variant-calling, whole-exome-sequencing, whole-genome-sequencing

These pages are for an old version of the pipeline (3.5.0). The latest stable release is 3.6.0.

An incompatibility advisory with severity high has been issued for this version of the pipeline.
See the advisory entry for more information.

Launch version 3.5.0 https://github.com/nf-core/sarek

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

type: string

pattern: ^\S+\.csv$
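The pattern above accepts any path that contains no whitespace and ends in `.csv`. As a minimal sketch, assuming the pattern is applied to the whole string, the same check can be reproduced in Python. The helper name `is_valid_samplesheet_path` is hypothetical and is not part of the pipeline.

```python
import re

# Pattern copied verbatim from the parameter description above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path has no whitespace and ends in '.csv'.

    Hypothetical helper for illustration; the pipeline performs its own
    schema validation.
    """
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None


if __name__ == "__main__":
    for candidate in ["samplesheet.csv", "my samples.csv", "samplesheet.tsv"]:
        print(candidate, is_valid_samplesheet_path(candidate))
```

Checking a path this way before launching can catch a stray space or a wrong extension early, although it says nothing about the contents of the file.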

Automatic retrieval for restart

hidden

type: string

pattern: ^\S+\.csv$

Starting step

required

type: string

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required

type: string

Most common options used for the pipeline

Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.

type: integer

default: 50000000
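To get a feel for what the default means in practice, the sketch below estimates how many chunks a FastQ file would be divided into. It assumes that every chunk except the last holds exactly the configured number of reads and that the last chunk holds the remainder. That is an illustrative assumption, not a statement about how the pipeline splits files internally. The function name `expected_chunks` is hypothetical.

```python
def expected_chunks(total_reads: int, reads_per_split: int = 50_000_000) -> int:
    """Estimate the number of FastQ chunks for a given read count.

    Assumption for illustration: full chunks of `reads_per_split` reads,
    plus one final chunk for any remainder. A value of 0 disables
    splitting, so the file stays in one piece.
    """
    if total_reads < 0 or reads_per_split < 0:
        raise ValueError("read counts must be non-negative")
    if reads_per_split == 0:
        return 1
    # Ceiling division without floating point.
    return -(-total_reads // reads_per_split)


if __name__ == "__main__":
    print(expected_chunks(120_000_000))     # default split size
    print(expected_chunks(120_000_000, 0))  # splitting disabled
```

Under these assumptions a smaller split size gives more chunks that can be aligned in parallel, at the cost of more tasks to schedule.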

Estimate interval size.

type: integer

default: 200000

Path to target bed file in case of whole exome or targeted sequencing or intervals file.

type: string

Disable usage of intervals.

type: boolean

Enable when exome or panel data is provided.

type: boolean

Tools to use for duplicate marking, variant calling and/or for annotation.

type: string

Disable specified tools.

type: string

Trim fastq file or handle UMIs

Run FastP for read trimming

type: boolean

Remove bp from the 5’ end of read 1

type: integer

Remove bp from the 5’ end of read 2

type: integer

Remove bp from the 3’ end of read 1

type: integer

Remove bp from the 3’ end of read 2

type: integer

Remove poly-G tails.

type: integer

Minimum length of reads to keep

type: integer

default: 15

Save trimmed FastQ file intermediates.

type: boolean

Specify UMI read structure

type: string

Default strategy with UMI

type: string

default: Adjacency

If set, publishes split FASTQ files. Intended for testing purposes.

type: boolean

Configure preprocessing tools

Specify aligner to be used to map reads to reference genome.

type: string

Save mapped files.

type: boolean

Saves output from mapping (if --save_mapped), MarkDuplicates and base recalibration as BAM files instead of CRAM.

type: boolean

Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration

type: string

Configure variant calling tools

If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.

type: boolean

Overwrite the ASCAT minimum base quality required for a read to be counted.

type: integer

default: 20

Overwrite the ASCAT minimum depth required in the normal for a SNP to be considered.

type: integer

default: 10

Overwrite the ASCAT minimum mapping quality required for a read to be counted.

type: integer

default: 35

Overwrite ASCAT ploidy.

type: number

Overwrite ASCAT purity.

type: number

Specify a custom chromosome length file.

type: string

Overwrite Control-FREEC coefficientOfVariation

type: number

default: 0.05

Overwrite Control-FREEC contaminationAdjustment

type: boolean

Specify a known contamination value for Control-FREEC

type: integer

Minimal sequencing quality for a position to be considered in BAF analysis.

type: integer

Minimal read coverage for a position to be considered in BAF analysis.

type: integer

Genome ploidy used by Control-FREEC

type: string

default: 2

Overwrite Control-FREEC window size.

type: number

Copy-number reference for CNVkit

type: string

Enable joint germline variant calling with GATK HaplotypeCaller

type: boolean

Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named with patient ID under variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.

type: boolean

Do not analyze soft clipped bases in the reads for GATK Mutect2.

type: boolean

Panel-of-normals VCF (bgzipped) for GATK Mutect2

type: string

Index of the panel-of-normals (PON) VCF.

type: string

Option for selecting output and emit-mode of Sentieon’s Haplotyper.

type: string

default: variant

Option for selecting output and emit-mode of Sentieon’s Dnascope.

type: string

default: variant

Option for selecting the PCR indel model used by Sentieon Dnascope.

type: string

default: CONSERVATIVE

Option for concatenating germline VCF files.

type: boolean

Allow usage of fasta file for annotation with VEP

type: boolean

Enable the use of the VEP dbNSFP plugin.

type: boolean

Path to dbNSFP processed file.

type: string

Path to dbNSFP tabix indexed file.

type: string

Consequence to annotate with

type: string

Fields to annotate with

type: string

default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF

Enable the use of the VEP LOFTEE plugin.

type: boolean

Enable the use of the VEP SpliceAI plugin.

type: boolean

Path to SpliceAI raw scores SNV file.

type: string

Path to SpliceAI raw scores SNV tabix-indexed file.

type: string

Path to SpliceAI raw scores indel file.

type: string

Path to SpliceAI raw scores indel tabix-indexed file.

type: string

Enable the use of the VEP SpliceRegion plugin.

type: boolean

Add an extra custom argument to VEP.

type: string

default: --everything --filter_common --per_gene --total_length --offline --format vcf

Should reflect the VEP version used in the container.

type: string

default: 111.0-0

The output directory where the cache will be saved. Use absolute paths when writing to storage on cloud infrastructure.

type: string

VEP output-file format.

type: string

A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.

type: string

Index file for bcftools_annotations

type: string

Text file with the header lines of bcftools_annotations

type: string

General options to interact with reference genomes.

The base path to the igenomes reference files

type: string

default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.

type: boolean

Save built references.

type: boolean

Only build references.

type: boolean

Download annotation cache.

type: boolean

Reference genome related files and options required for the workflow. If you use AWS iGenomes, this has already been set for you appropriately.

Name of iGenomes reference.

type: string

default: GATK.GRCh38

ASCAT genome.

type: string

Path to ASCAT allele zip file.

type: string

Path to ASCAT loci zip file.

type: string

Path to ASCAT GC content correction file.

type: string

Path to ASCAT RT (replication timing) correction file.

type: string

Path to BWA mem indices.

type: string

Path to bwa-mem2 mem indices.

type: string

Path to chromosomes folder used with Control-FREEC.

type: string

Path to dbsnp file.

type: string

Path to dbsnp index.

type: string

Label string for VariantRecalibration (haplotypecaller joint variant calling).

If you use AWS iGenomes, this has already been set for you appropriately.

type: string

Path to FASTA dictionary file.

type: string

Path to dragmap indices.

type: string

Path to FASTA genome file.

type: string

pattern: ^\S+\.fn?a(sta)?(\.gz)?$
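The pattern above accepts whitespace-free paths ending in `.fa`, `.fna` or `.fasta`, each optionally followed by `.gz`. The following Python sketch shows which file names pass, assuming the pattern is applied to the whole string. The helper name `is_valid_fasta_path` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_valid_fasta_path(path: str) -> bool:
    """Return True if the path matches the FASTA file-name pattern.

    Hypothetical helper for illustration only.
    """
    return FASTA_PATTERN.fullmatch(path) is not None


if __name__ == "__main__":
    names = ["genome.fa", "genome.fna", "genome.fasta.gz", "genome.fas", "reads.fastq"]
    for name in names:
        print(name, is_valid_fasta_path(name))
```

Note that other common extensions such as `.fas` are rejected by this pattern, so a reference file may need renaming before it is accepted.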

Path to FASTA reference index.

type: string

Path to GATK Mutect2 Germline Resource File.

type: string

Path to GATK Mutect2 Germline Resource Index.

type: string

Path to known indels file.

type: string

Path to known indels file index.

type: string

Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.

type: string

Path to known SNPs file.

type: string

Path to known SNPs file index.

type: string

Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.

type: string

Path to Control-FREEC mappability file.

type: string

Path to SNP bed file for sample checking with NGSCheckMate

type: string

Machine learning model for Sentieon Dnascope.

type: string

Path to snpEff cache.

type: string

default: s3://annotation-cache/snpeff_cache/

snpEff DB version.

type: string

Path to VEP cache.

type: string

default: s3://annotation-cache/vep_cache/

VEP cache version.

type: string

VEP genome.

type: string

VEP species.

type: string

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Base path / URL for data used in the test profiles

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/sarek3

Base path / URL for data used in the modules

hidden

type: string

Sequencing center information to be added to read group (CN field).

hidden

type: string

Sequencing platform information to be added to read group (PL field).

hidden

type: string

default: ILLUMINA

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
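The email pattern above is stricter than general email syntax: it allows only letters, digits, underscores, hyphens and dots on either side of the `@`, and the final domain label must be 2 to 5 letters. A minimal Python sketch of the check, assuming whole-string matching, makes the consequences visible. The helper name `is_accepted_email` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted_email(address: str) -> bool:
    """Return True if the address matches the documented pattern.

    Hypothetical helper for illustration; it mirrors the pattern only and
    does not check that the address is deliverable.
    """
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    for address in ["user@example.com", "user+tag@example.com", "user@example"]:
        print(address, is_accepted_email(address))
```

Addresses that use a `+` tag or a top-level domain longer than five letters do not match, even though they can be valid elsewhere.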

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
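The size pattern above accepts an integer or decimal number, an optional dot, optional whitespace, and a unit of `B`, `KB`, `MB`, `GB` or `TB`. That is why the default `25.MB` is valid. The Python sketch below tests a few spellings, assuming whole-string matching. The helper name `is_valid_size` is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value matches the documented size pattern.

    Hypothetical helper for illustration only.
    """
    return SIZE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["25.MB", "25 MB", "1.5GB", "25", "25MiB"]:
        print(value, is_valid_size(value))
```

A bare number without a unit and binary-style units such as `MiB` are both rejected by this pattern.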

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/
