GATK

Version:

4.2.6.1, 4.1.0.0

Category:

bio

Cluster:

Loki

Author / Distributor

https://gatk.broadinstitute.org/hc/en-us

Description

The Genome Analysis Toolkit (GATK) is a collection of bioinformatics tools for analyzing high-throughput sequencing (HTS) data, with a focus on variant discovery: calling variants in DNA and RNA-seq data with HaplotypeCaller and other tools. The toolkit is well established for germline short variant discovery from whole-genome and exome sequencing data.

Documentation

Usage template for all tools (uses --spark-runner LOCAL when used with a Spark tool)
   gatk AnyTool toolArgs
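
As an illustration of the template above, here is a minimal sketch with HaplotypeCaller as the tool. The file names (reference.fasta, sample.bam, sample.vcf.gz) are hypothetical placeholders, not files provided on the cluster. The script runs GATK only if the gatk command and the input files are present.

```shell
#!/bin/sh
# Hypothetical input and output paths; replace them with your own files.
REF="reference.fasta"
BAM="sample.bam"
VCF="sample.vcf.gz"

# "gatk AnyTool toolArgs" with AnyTool = HaplotypeCaller
GATK_CMD="gatk HaplotypeCaller -R $REF -I $BAM -O $VCF"
echo "$GATK_CMD"

# Run the tool only when gatk is on PATH and both inputs exist.
if command -v gatk >/dev/null 2>&1 && [ -f "$REF" ] && [ -f "$BAM" ]; then
    gatk HaplotypeCaller -R "$REF" -I "$BAM" -O "$VCF"
else
    echo "gatk or the input files were not found; nothing was run" >&2
fi
```

The arguments accepted by each tool differ, so check gatk HaplotypeCaller --help (or the help of whichever tool you use) before adapting this.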

Usage template for Spark tools (will NOT work on non-Spark tools)
   gatk SparkTool toolArgs  [ -- --spark-runner <LOCAL | SPARK | GCS> sparkArgs ]
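
The Spark template above might look as follows for a local run. This is a sketch under assumptions: MarkDuplicatesSpark is picked only as an example of a Spark tool, the BAM file names are hypothetical, and the core count of 4 is arbitrary. Everything after the bare -- is passed to the Spark runner, not to the tool.

```shell
#!/bin/sh
# Hypothetical file names and core count; adjust to your data and allocation.
IN_BAM="sample.bam"
OUT_BAM="sample.marked.bam"
CORES=4

# Spark master string for the local runner, e.g. local[4]
SPARK_MASTER="local[$CORES]"
echo "Spark master: $SPARK_MASTER"

if command -v gatk >/dev/null 2>&1 && [ -f "$IN_BAM" ]; then
    # Tool arguments come first; Spark arguments follow the bare "--".
    gatk MarkDuplicatesSpark -I "$IN_BAM" -O "$OUT_BAM" \
        -- --spark-runner LOCAL --spark-master "$SPARK_MASTER"
else
    echo "gatk or $IN_BAM was not found; nothing was run" >&2
fi
```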

Getting help
   gatk --list       Print the list of available tools

   gatk Tool --help  Print help on a particular tool

Configuration File Specification
    --gatk-config-file                PATH/TO/GATK/PROPERTIES/FILE

gatk forwards commands to GATK and adds some sugar for submitting spark jobs

  --spark-runner <target>    controls how spark tools are run
    valid targets are:
    LOCAL:      run using the in-memory spark runner
    SPARK:      run using spark-submit on an existing cluster
                --spark-master must be specified
                --spark-submit-command may be specified to control the Spark submit command
                arguments to spark-submit may optionally be specified after --
    GCS:        run using Google cloud dataproc
                commands after the -- will be passed to dataproc
                --cluster <your-cluster> must be specified after the --
                spark properties and some common spark-submit parameters will be translated
                to dataproc equivalents

  --dry-run      may be specified to output the generated command line without running it
  --java-options 'OPTION1[ OPTION2=Y ... ]'   optional - pass the given string of options to the
                JVM at runtime.
                Java options MUST be passed inside a single string with space-separated values.

  --debug-port <number> sets up a Java VM debug agent to listen to debugger connections on a
                        particular port number. This in turn will add the necessary java VM arguments
                        so that you don't need to explicitly indicate these using --java-options.
  --debug-suspend       sets the Java VM debug agent up so that the run gets immediately suspended
                        waiting for a debugger to connect. By default the port number is 5005 but
                        can be customized using --debug-port
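
The --java-options and --dry-run flags described above can be combined to inspect the generated command line before committing to a long run. A minimal sketch, assuming a 4 GB heap and /tmp as the temporary directory (both are illustrative values, not site recommendations) and hypothetical input file names:

```shell
#!/bin/sh
# All JVM options go into ONE string, separated by spaces.
# The heap size and temp directory here are illustrative assumptions.
JAVA_OPTS="-Xmx4g -Djava.io.tmpdir=/tmp"
echo "JVM options: $JAVA_OPTS"

if command -v gatk >/dev/null 2>&1; then
    # --dry-run outputs the generated command line without running it.
    gatk --dry-run --java-options "$JAVA_OPTS" \
        HaplotypeCaller -R reference.fasta -I sample.bam -O sample.vcf.gz
else
    echo "gatk not found on PATH; load the module first" >&2
fi
```

Quoting "$JAVA_OPTS" matters: without the quotes the shell would split the options into separate arguments, and only the first would be treated as the value of --java-options.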

Examples/Usage

  • List available modules:

    $ module avail gatk
    
  • Load the gatk module:

    $ module load bio/gatk/4.2.6.1
    
  • Check the loaded modules:

    $ module list
    
  • Unload the gatk module:

    $ module unload bio/gatk/4.2.6.1
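
The steps above can be strung together into one short script. This is a sketch only: it assumes the module name follows the bio/gatk/<version> pattern shown above, and it checks that the module command exists because that command is normally defined only in cluster login or job shells.

```shell
#!/bin/sh
# Version taken from the list at the top of this page; the module path
# pattern bio/gatk/<version> is assumed from the examples above.
GATK_VERSION="4.2.6.1"
GATK_MODULE="bio/gatk/$GATK_VERSION"
echo "Using module: $GATK_MODULE"

if command -v module >/dev/null 2>&1; then
    module load "$GATK_MODULE"
    # Print the list of available tools to confirm the module works.
    gatk --list
    module unload "$GATK_MODULE"
else
    echo "module command not available; run this on the cluster" >&2
fi
```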
    

Installation

Source code is obtained from the GATK project (see Author / Distributor above).