GATK
- Version: 4.2.6.1, 4.1.0.0
- Category: bio
- Cluster: Loki
Description
The Genome Analysis Toolkit (GATK) is a collection of bioinformatics tools for analyzing high-throughput sequencing (HTS) data, with a focus on variant discovery: calling variants in DNA and RNA-seq data with HaplotypeCaller and other tools. The toolkit is well established for germline short variant discovery from whole-genome and exome sequencing data.
Documentation
Usage template for all tools (uses --spark-runner LOCAL when used with a Spark tool)
gatk AnyTool toolArgs
Usage template for Spark tools (will NOT work on non-Spark tools)
gatk SparkTool toolArgs [ -- --spark-runner <LOCAL | SPARK | GCS> sparkArgs ]
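As a minimal sketch of the two templates above, the following shell functions wrap one standard tool and one Spark tool. HaplotypeCaller and PrintReadsSpark are GATK tools, but the function names are hypothetical, any file names passed to them are placeholders, and the arguments shown (-R, -I, -O) are only the bare minimum for each tool.

```shell
# Template 1: standard (non-Spark) tool.
# Usage: call_variants REFERENCE.fasta INPUT.bam OUTPUT.vcf.gz
# (call_variants is a hypothetical helper name)
call_variants() {
    gatk HaplotypeCaller -R "$1" -I "$2" -O "$3"
}

# Template 2: Spark tool. Everything after the bare "--" is meant for
# the Spark runner rather than for the tool itself.
# Usage: print_reads_spark INPUT.bam OUTPUT.bam
# (print_reads_spark is a hypothetical helper name)
print_reads_spark() {
    gatk PrintReadsSpark -I "$1" -O "$2" -- --spark-runner LOCAL
}
```

With the gatk module loaded, `call_variants reference.fasta sample.bam sample.vcf.gz` would run the caller on a single sample; these file names are illustrative only.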
Getting help
gatk --list          Print the list of available tools
gatk Tool --help     Print help on a particular tool
Configuration File Specification
--gatk-config-file PATH/TO/GATK/PROPERTIES/FILE
The gatk wrapper script forwards commands to GATK and adds some convenience options for submitting Spark jobs:
--spark-runner <target> controls how spark tools are run
    valid targets are:
    LOCAL: run using the in-memory spark runner
    SPARK: run using spark-submit on an existing cluster
        --spark-master must be specified
        --spark-submit-command may be specified to control the Spark submit command
        arguments to spark-submit may optionally be specified after --
    GCS: run using Google cloud dataproc
        commands after the -- will be passed to dataproc
        --cluster <your-cluster> must be specified after the --
        spark properties and some common spark-submit parameters will be translated
        to dataproc equivalents
--dry-run may be specified to output the generated command line without running it
--java-options 'OPTION1[ OPTION2=Y ... ]' optional; passes the given string of options to the
    JVM at runtime.
    Java options MUST be passed inside a single string with space-separated values.
--debug-port <number> sets up a JVM debug agent to listen for debugger connections on a
    particular port number. This adds the necessary JVM arguments,
    so you do not need to specify them explicitly using --java-options.
--debug-suspend sets up the JVM debug agent so that the run is immediately suspended,
    waiting for a debugger to connect. The default port number is 5005, which
    can be changed using --debug-port.
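The wrapper options described above can be combined as in the sketch below. It assumes that wrapper options such as --dry-run and --java-options are placed before the tool name. The heap settings -Xmx4g and -Xms1g are standard JVM flags with arbitrarily chosen values, and the function names, file names, and Spark master URL are hypothetical.

```shell
# Print (without running) the command line gatk would execute for a
# standard tool. Both JVM flags travel inside ONE quoted string.
# Usage: preview_call REFERENCE.fasta INPUT.bam OUTPUT.vcf.gz
# (preview_call is a hypothetical helper name)
preview_call() {
    gatk --dry-run --java-options "-Xmx4g -Xms1g" \
        HaplotypeCaller -R "$1" -I "$2" -O "$3"
}

# Preview submission of a Spark tool to an existing Spark cluster.
# The SPARK runner requires --spark-master, given after the bare "--".
# Usage: preview_spark_submit MASTER_URL INPUT.bam OUTPUT.bam
# (preview_spark_submit is a hypothetical helper name)
preview_spark_submit() {
    gatk --dry-run PrintReadsSpark -I "$2" -O "$3" \
        -- --spark-runner SPARK --spark-master "$1"
}
```

Dropping --dry-run from either function would execute the command instead of only printing it.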
Examples/Usage
List available modules:
$ module avail gatk
Load the gatk module:
$ module load bio/gatk/4.2.6.1
Check the loaded modules:
$ module list
Unload the gatk module:
$ module unload bio/gatk/4.2.6.1
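Putting the module commands together with a GATK call, a job script might contain something like the sketch below. It assumes the `module` command is available in the shell, as it is in the interactive examples above. The function name run_gatk_with_module is hypothetical, and no scheduler directives are shown because this page does not specify a batch system.

```shell
# Load a specific GATK module version, run one gatk command, then unload.
# Usage: run_gatk_with_module VERSION GATK_ARGS...
# (run_gatk_with_module is a hypothetical helper name)
run_gatk_with_module() {
    version="$1"
    shift
    # Stop early if the requested module version cannot be loaded.
    module load "bio/gatk/${version}" || return 1
    gatk "$@"
    status=$?    # keep gatk's exit code across the unload step
    module unload "bio/gatk/${version}"
    return "$status"
}
```

For example, `run_gatk_with_module 4.2.6.1 --list` would print the list of available tools using version 4.2.6.1.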
Installation
Source code is obtained from gatk