Stacks

Version:: 2.68
Category:: bio
Cluster:: Loki, Vali

Author / Distributor

https://catchenlab.life.illinois.edu/stacks/manual/

Description

The Stacks pipeline is modular and supports several types of analysis. Programs listed under Raw Reads clean and filter raw sequence data. Programs under Core make up the main Stacks pipeline: building single-end loci (ustacks), creating a catalog of loci (cstacks), matching samples back against the catalog (sstacks), transposing the data so that it is organized by locus instead of by sample (tsv2bam), and assembling the paired-end contig, calling variable sites in the population, and genotyping each sample at those sites (gstacks). Finally, populations performs a population genomics analysis. Programs under Execution Control run the whole pipeline.
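
The order of the Core programs described above can be sketched as a dry run that only prints the commands. This is a minimal sketch under assumptions: the `run` wrapper, the `stacks_core` function, the `./output` directory, and `popmap.tsv` are hypothetical names, and the `-P`/`-M` options shown for tsv2bam, gstacks, and populations are assumed from the Stacks manual rather than documented on this page.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# Set DRY_RUN=0 to execute the commands on a system where Stacks is loaded.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Core stages in the order given in the description.
stacks_core() {
    dir=$1      # hypothetical directory holding the Stacks files
    popmap=$2   # hypothetical population map path
    run ustacks -f sample_1.fq -i 1 -o "$dir"   # build loci (repeat per sample)
    run cstacks -P "$dir" -M "$popmap"          # build the catalog
    run sstacks -P "$dir" -M "$popmap"          # match samples to the catalog
    run tsv2bam -P "$dir" -M "$popmap"          # assumed options: reorganize by locus
    run gstacks -P "$dir" -M "$popmap"          # assumed options: call variants, genotype
    run populations -P "$dir" -M "$popmap"      # assumed options: population statistics
}

stacks_core ./output popmap.tsv
```

Printing the plan first makes it easy to review the stage order before submitting it as a job.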

Documentation

ustacks -f file_path -i id -o path [-M max_dist] [-m min_cov] [-p num_threads]
 f: input file path.
 i: a unique integer ID for this sample.
 o: output path to write results.
 M: Maximum distance (in nucleotides) allowed between stacks (default 2).
 m: Minimum depth of coverage required to create a stack (default 3).
 N: Maximum distance allowed to align secondary reads to primary stacks (default: M + 2).
 p: enable parallel execution with num_threads threads.
 t: input file type. Supported types: fasta, fastq, gzfasta, or gzfastq (default: guess).
 --name: a name for the sample (default: input file name minus the suffix).
 R: retain unused reads.
 H: disable calling haplotypes from secondary reads.

 Stack assembly options:
   --force-diff-len: allow raw input reads of different lengths, e.g. after trimming (default: ustacks prefers raw input reads of uniform length).
   --keep-high-cov: disable the algorithm that removes highly-repetitive stacks and nearby errors.
   --high-cov-thres: highly-repetitive stacks threshold, in standard deviation units (default: 3.0).
   --max-locus-stacks <num>: maximum number of stacks at a single de novo locus (default 3).
   --k-len <len>: specify k-mer size for matching between alleles and loci (automatically calculated by default).
   --deleverage: enable the Deleveraging algorithm, used for resolving over-merged tags.

 Gapped assembly options:
   --max-gaps: number of gaps allowed between stacks before merging (default: 2).
   --min-aln-len: minimum length of aligned sequence in a gapped alignment (default: 0.80).
   --disable-gapped: do not perform gapped alignments between stacks (default: gapped alignments enabled).

 Model options:
   --model-type: either 'snp' (default), 'bounded', or 'fixed'
   For the SNP or Bounded SNP model:
     --alpha <num>: chi square significance level required to call a heterozygote or homozygote, either 0.1, 0.05 (default), 0.01, or 0.001.
   For the Bounded SNP model:
     --bound-low <num>: lower bound for epsilon, the error rate, between 0 and 1.0 (default 0).
     --bound-high <num>: upper bound for epsilon, the error rate, between 0 and 1.0 (default 1).
   For the Fixed model:
     --bc-err-freq <num>: specify the barcode error frequency, between 0 and 1.0.


sstacks -P dir -M popmap [-p n_threads]
sstacks -c catalog_path -s sample_path [-s sample_path ...] -o path [-p n_threads]
 -P,--in-path: path to the directory containing Stacks files.
 -M,--popmap: path to a population map file from which to take sample names.
 -s,--sample: filename prefix from which to load sample loci.
 -c,--catalog: path to the catalog.
 -p,--threads: enable parallel execution with n_threads threads.
 -o,--out-path: output path to write results.
 -x: don't verify haplotype of matching locus.

Gapped assembly options:
 --disable-gapped: disable gapped alignments between stacks (default: enable gapped alignments).


cstacks -P in_dir -M popmap [-n num_mismatches] [-p num_threads]
cstacks -s sample1_path [-s sample2_path ...] -o path [-n num_mismatches] [-p num_threads]

 -P,--in-path: path to the directory containing Stacks files.
 -M,--popmap: path to a population map file.
 -n: number of mismatches allowed between sample loci when building the catalog (default 1; suggested: set to ustacks -M).
 -p,--threads: enable parallel execution with num_threads threads.
 -s: sample prefix from which to load loci into the catalog.
 -o,--outpath: output path to write results.
 -c,--catalog <path>: add to an existing catalog.

Gapped assembly options:
 --max-gaps: number of gaps allowed between stacks before merging (default: 2).
 --min-aln-len: minimum length of aligned sequence in a gapped alignment (default: 0.80).
 --disable-gapped: disable gapped alignments between stacks (default: use gapped alignments).

Advanced options:
 --k-len <len>: specify k-mer size for matching between catalog loci (automatically calculated by default).
 --report-mmatches: report query loci that match more than one catalog locus.
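
The `-M` option of cstacks and sstacks takes a population map. As a rough sketch, assuming the usual Stacks layout of one sample per line with a tab between the sample name and its population, such a file could be written as follows. The sample names, population names, file name, and the `./output` directory are hypothetical; the two commands are only printed, not executed.

```shell
# Write a two-column, tab-separated population map:
# column 1 is the sample name, column 2 is the population.
popmap=popmap.tsv
printf 'sample_1\tpop_a\nsample_2\tpop_a\nsample_3\tpop_b\n' > "$popmap"

# Print the catalog-building and matching commands that would use this map.
# -n 2 follows the suggestion above to set -n to the ustacks -M value.
echo "cstacks -P ./output -M $popmap -n 2 -p 4"
echo "sstacks -P ./output -M $popmap -p 4"
```

The sample names in the first column are expected to match the file name prefixes of the ustacks output in the `-P` directory.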

Examples/Usage

List available modules:
```
$ module avail stacks
```

Load the Stacks module:

$ module load bio/Stacks/2.62-foss-2022a

Check the loaded modules:
```
$ module list
```

Unload the Stacks module:

$ module unload bio/Stacks/2.62-foss-2022a

Run on individual samples to identify loci:

$ ustacks -t fastq -f sample_1.fq -o ./output -i 1 -m 3 -M 2
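
Because `-i` must be a unique integer for each sample, running ustacks on several samples needs a counter. The helper below is a small sketch that prints one command per input file with an incrementing ID. The function name `ustacks_cmds` and the sample file names are hypothetical, and file names containing spaces are not handled.

```shell
# Print one ustacks command per FASTQ file, giving each sample a unique integer ID.
ustacks_cmds() {
    out_dir=$1
    shift
    id=1
    for fq in "$@"; do
        printf 'ustacks -t fastq -f %s -o %s -i %d -m 3 -M 2\n' "$fq" "$out_dir" "$id"
        id=$((id + 1))
    done
}

ustacks_cmds ./output sample_1.fq sample_2.fq sample_3.fq
```

After checking the printed commands, they can be executed by piping the output to `sh` in a session where the Stacks module is loaded.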

Installation

Source code is obtained from Stacks