cutadapt

Version:

4.0

Category:

bio

Cluster:

Loki

Author / Distributor

https://cutadapt.readthedocs.io/en/stable/

Description

Cutadapt finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequence from your high-throughput sequencing reads. Cutadapt helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way. It can also modify and filter single-end and paired-end reads in various ways.

Documentation

Usage:
   cutadapt -a ADAPTER [options] [-o output.fastq] input.fastq

For paired-end reads:
   cutadapt -a ADAPT1 -A ADAPT2 [options] -o out1.fastq -p out2.fastq in1.fastq in2.fastq

Replace "ADAPTER" with the actual sequence of your 3' adapter. IUPAC wildcard
characters are supported. All reads from input.fastq will be written to
output.fastq with the adapter sequence removed. Adapter matching is
error-tolerant. Multiple adapter sequences can be given (use further -a
options), but only the best-matching adapter will be removed.

Input may also be in FASTA format. Compressed input and output is supported and
auto-detected from the file name (.gz, .xz, .bz2). Use the file name '-' for
standard input/output. Without the -o option, output is sent to standard output.
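
The usage above can be sketched on a tiny demonstration file. The directory cutadapt_demo, the file names, the two reads, the helper make_record, and the adapter sequence AGATCGGAAGAGC are assumptions made up for illustration, not values from this page. The cutadapt call is skipped when the command is not on the PATH (for example, before the module is loaded).

```shell
# Build a two-read demo FASTQ; read1 ends with the assumed adapter.
mkdir -p cutadapt_demo
make_record() {
    # $1: read name, $2: sequence; the quality string is all 'I', same length as the sequence
    printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}
{
    make_record read1 ACGTACGTACGTAGATCGGAAGAGC
    make_record read2 TTGACCATGGCATTGACCATGGCAT
} > cutadapt_demo/input.fastq

if command -v cutadapt >/dev/null 2>&1; then
    # -a: 3' adapter; the .gz suffix on the output name selects gzip compression
    cutadapt -a AGATCGGAAGAGC -o cutadapt_demo/output.fastq.gz cutadapt_demo/input.fastq
else
    echo "cutadapt not found; load the module first" >&2
fi
```

Because -o is given, the summary report goes to standard output while the trimmed reads go to the named file.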

Citation:

Marcel Martin. Cutadapt removes adapter sequences from high-throughput
sequencing reads. EMBnet.Journal, 17(1):10-12, May 2011.
http://dx.doi.org/10.14806/ej.17.1.200

Run "cutadapt --help" to see all command-line options.
See https://cutadapt.readthedocs.io/ for full documentation.

Options:
 -h, --help            Show this help message and exit
 --version             Show version number and exit
 --debug               Print debug log. Use twice to also print DP matrices
 -j CORES, --cores CORES
                       Number of CPU cores to use. Use 0 to auto-detect. Default: 1

Finding adapters:
 Parameters -a, -g, -b specify adapters to be removed from each read (or from R1 if data is
 paired-end). If specified multiple times, only the best matching adapter is trimmed (but see the
 --times option). Use notation 'file:FILE' to read adapter sequences from a FASTA file.

 -a ADAPTER, --adapter ADAPTER
                       Sequence of an adapter ligated to the 3' end (paired data: of the first read).
                       The adapter and subsequent bases are trimmed. If a '$' character is appended
                       ('anchoring'), the adapter is only found if it is a suffix of the read.
 -g ADAPTER, --front ADAPTER
                       Sequence of an adapter ligated to the 5' end (paired data: of the first read).
                       The adapter and any preceding bases are trimmed. Partial matches at the 5' end
                       are allowed. If a '^' character is prepended ('anchoring'), the adapter is only
                       found if it is a prefix of the read.
 -b ADAPTER, --anywhere ADAPTER
                       Sequence of an adapter that may be ligated to the 5' or 3' end (paired data: of
                       the first read). Both types of matches as described under -a and -g are allowed.
                       If the first base of the read is part of the match, the behavior is as with -g,
                       otherwise as with -a. This option is mostly for rescuing failed library
                       preparations - do not use if you know which end your adapter was ligated to!
 -e E, --error-rate E, --errors E
                       Maximum allowed error rate (if 0 <= E < 1), or absolute number of errors for
                       full-length adapter match (if E is an integer >= 1). Error rate = no. of errors
                       divided by length of matching region. Default: 0.1 (10%)
 --no-indels           Allow only mismatches in alignments. Default: allow both mismatches and indels
 -n COUNT, --times COUNT
                       Remove up to COUNT adapters from each read. Default: 1
 -O MINLENGTH, --overlap MINLENGTH
                       Require MINLENGTH overlap between read and adapter for an adapter to be found.
                       Default: 3
 --match-read-wildcards
                       Interpret IUPAC wildcards in reads. Default: False
 -N, --no-match-adapter-wildcards
                       Do not interpret IUPAC wildcards in adapters.
 --action {trim,retain,mask,lowercase,none}
                       What to do if a match was found. trim: trim adapter and up- or downstream
                       sequence; retain: trim, but retain adapter; mask: replace with 'N' characters;
                       lowercase: convert to lowercase; none: leave unchanged. Default: trim
 --rc, --revcomp       Check both the read and its reverse complement for adapter matches. If match is
                       on reverse-complemented version, output that one. Default: check only read
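
The -e and -O settings interact with the length of the matching region. As a small worked example, assuming (as the Cutadapt documentation describes it) that the number of allowed errors is the error rate times the match length, rounded down, the default rate of 0.1 permits no errors in matches shorter than 10 bases. The function name allowed_errors and the example lengths are hypothetical, chosen only for illustration.

```shell
# allowed_errors LENGTH RATE_PERCENT
# Integer arithmetic: LENGTH * RATE_PERCENT / 100, rounded down.
allowed_errors() {
    echo $(( $1 * $2 / 100 ))
}

# Default -e 0.1 expressed as 10 percent; 3 is the default minimum overlap (-O).
for len in 3 9 10 13 20; do
    printf 'match length %2d: up to %d error(s)\n' "$len" "$(allowed_errors "$len" 10)"
done
```

Under this assumption a partial match of 9 bases or fewer must be exact, and a 13-base match may contain one error.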

Additional read modifications:
 -u LENGTH, --cut LENGTH
                       Remove bases from each read (first read only if paired). If LENGTH is positive,
                       remove bases from the beginning. If LENGTH is negative, remove bases from the
                       end. Can be used twice if LENGTHs have different signs. This is applied *before*
                       adapter trimming.
 --nextseq-trim 3'CUTOFF
                       NextSeq-specific quality trimming (each read). Trims also dark cycles appearing
                       as high-quality G bases.
 -q [5'CUTOFF,]3'CUTOFF, --quality-cutoff [5'CUTOFF,]3'CUTOFF
                       Trim low-quality bases from 5' and/or 3' ends of each read before adapter
                       removal. Applied to both reads if data is paired. If one value is given, only
                       the 3' end is trimmed. If two comma-separated cutoffs are given, the 5' end is
                       trimmed with the first cutoff, the 3' end with the second.
 --quality-base N      Assume that quality values in FASTQ are encoded as ascii(quality + N). This
                       needs to be set to 64 for some old Illumina FASTQ files. Default: 33
 --length LENGTH, -l LENGTH
                       Shorten reads to LENGTH. Positive values remove bases at the end while negative
                       ones remove bases at the beginning. This and the following modifications are
                       applied after adapter trimming.
 --trim-n              Trim N's on ends of reads.
 --length-tag TAG      Search for TAG followed by a decimal number in the description field of the
                       read. Replace the decimal number with the correct length of the trimmed read.
                       For example, use --length-tag 'length=' to correct fields like 'length=123'.
 --strip-suffix STRIP_SUFFIX
                       Remove this suffix from read names if present. Can be given multiple times.
 -x PREFIX, --prefix PREFIX
                       Add this prefix to read names. Use {name} to insert the name of the matching
                       adapter.
 -y SUFFIX, --suffix SUFFIX
                       Add this suffix to read names; can also include {name}
 --rename TEMPLATE     Rename reads using TEMPLATE containing variables such as {id}, {adapter_name}
                       etc. (see documentation)
 --zero-cap, -z        Change negative quality values to zero.
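
The --quality-base encoding, ascii(quality + N), can be checked by hand. This is a minimal sketch, assuming a POSIX shell whose printf accepts the 'c form for character codes; the function name phred is hypothetical.

```shell
# phred CHAR [BASE]: decode one FASTQ quality character; BASE defaults to 33.
phred() {
    code=$(printf '%d' "'$1")      # ASCII code of the character
    echo $(( code - ${2:-33} ))
}

phred I        # 'I' is ASCII 73, so quality 40 with the default base 33
phred '#'      # '#' is ASCII 35, so quality 2
phred h 64     # 'h' is ASCII 104, so quality 40 with --quality-base 64
```

Decoding a base-64 file with the default base of 33 would shift every quality up by 31, so cutoffs given with -q would no longer mean what was intended.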

Filtering of processed reads:
 Filters are applied after above read modifications. Paired-end reads are always discarded pairwise
 (see also --pair-filter).

 -m LEN[:LEN2], --minimum-length LEN[:LEN2]
                       Discard reads shorter than LEN. Default: 0
 -M LEN[:LEN2], --maximum-length LEN[:LEN2]
                       Discard reads longer than LEN. Default: no limit
 --max-n COUNT         Discard reads with more than COUNT 'N' bases. If COUNT is a number between 0 and
                       1, it is interpreted as a fraction of the read length.
 --max-expected-errors ERRORS, --max-ee ERRORS
                       Discard reads whose expected number of errors (computed from quality values)
                       exceeds ERRORS.
 --discard-trimmed, --discard
                       Discard reads that contain an adapter. Use also -O to avoid discarding too many
                       randomly matching reads.
 --discard-untrimmed, --trimmed-only
                       Discard reads that do not contain an adapter.
 --discard-casava      Discard reads that did not pass CASAVA filtering (header has :Y:).
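
The expected number of errors used by --max-expected-errors can be approximated by hand. This sketch assumes the usual definition, in which a base with quality Q contributes an error probability of 10^(-Q/10) and the expected number of errors is the sum over the read. It illustrates the arithmetic only and is not Cutadapt's own code; the function name expected_errors and the quality string are hypothetical.

```shell
# expected_errors QUALITIES [BASE]: sum of per-base error probabilities.
expected_errors() {
    printf '%s\n' "$1" | LC_ALL=C awk -v base="${2:-33}" '
        # Lookup table from printable ASCII character to its code
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        {
            ee = 0
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - base   # decoded quality value
                ee += 10 ^ (-q / 10)               # error probability of this base
            }
            printf "%.4f\n", ee
        }'
}

# Qualities 40, 40, 20, 10: 0.0001 + 0.0001 + 0.01 + 0.1
expected_errors 'II5+'    # prints 0.1102
```

Under this definition a single low-quality base dominates the sum, so a read like this one would exceed a limit of --max-ee 0.1.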

Output:
 --quiet               Print only error messages.
 --report {full,minimal}
                       Which type of report to print: 'full' or 'minimal'. Default: full
 --json FILE           Dump report in JSON format to FILE
 -o FILE, --output FILE
                       Write trimmed reads to FILE. FASTQ or FASTA format is chosen depending on input.
                       Summary report is sent to standard output. Use '{name}' for demultiplexing (see
                       docs). Default: write to standard output
 --fasta               Output FASTA to standard output even on FASTQ input.
 -Z                    Use compression level 1 for gzipped output files (faster, but uses more space)
 --info-file FILE      Write information about each read and its adapter matches into FILE. See the
                       documentation for the file format.
 -r FILE, --rest-file FILE
                       When the adapter matches in the middle of a read, write the rest (after the
                       adapter) to FILE.
 --wildcard-file FILE  When the adapter has N wildcard bases, write adapter bases matching wildcard
                       positions to FILE. (Inaccurate with indels.)
 --too-short-output FILE
                       Write reads that are too short (according to length specified by -m) to FILE.
                       Default: discard reads
 --too-long-output FILE
                       Write reads that are too long (according to length specified by -M) to FILE.
                       Default: discard reads
 --untrimmed-output FILE
                       Write reads that do not contain any adapter to FILE. Default: output to same
                       file as trimmed reads
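
As a hedged sketch of how the output options combine, the following diverts too-short reads to their own file and saves both report formats. The directory, file names, read names, sequences, and adapter are assumptions made up for illustration; the cutadapt call is skipped when the command is not available.

```shell
mkdir -p cutadapt_demo
# Demo FASTA input (names and sequences are made up); FASTA is accepted as input.
printf '>long_read\nACGTACGTACGTACGTACGTACGTAGATCGGAAGAGC\n>short_read\nACGTAGATCGGAAGAGC\n' \
    > cutadapt_demo/reads.fasta

if command -v cutadapt >/dev/null 2>&1; then
    # -m 10: keep reads of at least 10 bases after trimming.
    # --too-short-output: write shorter reads to a file instead of discarding them.
    # --json: store the report as JSON; the text report on stdout is redirected to a file.
    cutadapt -a AGATCGGAAGAGC -m 10 \
        --too-short-output cutadapt_demo/too_short.fasta \
        --json cutadapt_demo/report.json \
        -o cutadapt_demo/trimmed.fasta cutadapt_demo/reads.fasta \
        > cutadapt_demo/report.txt
else
    echo "cutadapt not found; load the module first" >&2
fi
```

The read named short_read is intended to fall below the -m threshold once its adapter is removed, so it is the one expected to be diverted.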

Paired-end options:
 The -A/-G/-B/-U/-Q options work like their lowercase counterparts, but are applied to R2 (second
 read in pair).

 -A ADAPTER            3' adapter to be removed from R2
 -G ADAPTER            5' adapter to be removed from R2
 -B ADAPTER            5'/3' adapter to be removed from R2
 -U LENGTH             Remove LENGTH bases from R2
 -Q [5'CUTOFF,]3'CUTOFF
                       Quality-trimming cutoff for R2. Default: same as for R1
 -p FILE, --paired-output FILE
                       Write R2 to FILE.
 --pair-adapters       Treat adapters given with -a/-A etc. as pairs. Either both or none are removed
                       from each read pair.
 --pair-filter {any,both,first}
                       Which of the reads in a paired-end read have to match the filtering criterion in
                       order for the pair to be filtered. Default: any
 --interleaved         Read and/or write interleaved paired-end reads.
 --untrimmed-paired-output FILE
                       Write second read in a pair to this FILE when no adapter was found. Use with
                       --untrimmed-output. Default: output to same file as trimmed reads
 --too-short-paired-output FILE
                       Write second read in a pair to this file if pair is too short.
 --too-long-paired-output FILE
                       Write second read in a pair to this file if pair is too long.
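
A paired-end run combining the options above might look like the following sketch. The directory, file names, read name, sequences, helper make_record, and the adapter used for both reads are assumptions made up for illustration; real libraries often have different R1 and R2 adapters. The cutadapt call is skipped when the command is not available.

```shell
mkdir -p cutadapt_demo
make_record() {
    # $1: read name, $2: sequence; the quality string is all 'I', same length as the sequence
    printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}
# One read pair; both mates carry the same read name.
make_record pair1 ACGTACGTACGTACGTACGTAGATCGGAAGAGC > cutadapt_demo/in1.fastq
make_record pair1 TTGACCATGGCATTGACCATAGATCGGAAGAGC > cutadapt_demo/in2.fastq

if command -v cutadapt >/dev/null 2>&1; then
    # -a/-A: 3' adapters for R1/R2; -q 20: quality-trim the 3' ends of both reads
    # -m 15 with --pair-filter any: drop the pair if either read is shorter than 15
    cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 20 -m 15 --pair-filter any \
        -o cutadapt_demo/out1.fastq -p cutadapt_demo/out2.fastq \
        cutadapt_demo/in1.fastq cutadapt_demo/in2.fastq
else
    echo "cutadapt not found; load the module first" >&2
fi
```

Since --pair-filter defaults to any, naming it here only makes the pairwise filtering explicit.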

Examples/Usage

  • List available modules:

    $ module avail cutadapt
    
  • Load the Cutadapt module:

    $ module load bio/cutadapt/4.0
    
  • Check the loaded modules:

    $ module list
    
  • Unload the Cutadapt module:

    $ module unload bio/cutadapt/4.0
    

Installation

Source code is obtained from the Cutadapt project (https://cutadapt.readthedocs.io/en/stable/).