Tools

All scripts are written in Python and can be found in the ddrage/tools/ folder of the code repository. Entry points are also provided for all scripts, so each one can be launched using just the script name. ddRAGE needs to be installed to use these scripts.

BBD Visualization: visualize_bbd

Pick parameters for the beta-binomial model using a live rendering of the coverage function. This script is implemented using bokeh and hence requires the bokeh Python module. You need to launch the server script using bokeh and then open the web page with the visualization. The easiest way to do this is by using the visualize_bbd script:

me@machine:~/$ visualize_bbd

Alternatively you can also launch the server manually, like this:

me@machine:~/$ cd tools
me@machine:~/tools$ bokeh serve bbd_visualization.py  # start the bokeh server
me@machine:~/tools$ firefox localhost:5006/bbd_visualization  # open html page with visualization

Instead of the last step, you can also enter localhost:5006/bbd_visualization in your web browser.
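
To get a feeling for how the parameters shape the coverage function before starting the server, the beta-binomial probability mass function can be evaluated directly. The following is a minimal sketch using only the Python standard library. The function names and the (n, alpha, beta) parametrization are assumptions made for illustration; they are not taken from the ddRAGE code, which may parametrize the model differently.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_binomial(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_pmf(k, n, alpha, beta):
    """Probability of observing k under a beta-binomial model BB(n, alpha, beta)."""
    if not 0 <= k <= n:
        return 0.0
    log_p = (
        log_binomial(n, k)
        + log_beta(k + alpha, n - k + beta)
        - log_beta(alpha, beta)
    )
    return exp(log_p)


def coverage_profile(n, alpha, beta):
    """Probabilities for all values 0..n (hypothetical helper, not a ddRAGE function)."""
    return [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]


if __name__ == "__main__":
    # Print the distribution for one example parameter set.
    for k, p in enumerate(coverage_profile(30, 2.0, 5.0)):
        print(f"{k:2d}  {p:.4f}")
```

Working in log space with lgamma avoids overflow for larger values of n. The expected value of this distribution is n * alpha / (alpha + beta), which is a quick sanity check when trying out parameters.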

Randomize Read Order: randomize_fastq

The FASTQ files generated by ddRAGE are written in the order in which the reads were simulated. Since this can create very easy instances for some analysis tools, the FASTQ files need to be randomized for a realistic assessment. The randomize_fastq script takes one, two, or four input parameters.

With one parameter, the FASTQ file will be shuffled in place and overwritten. Two parameters act as input and output path. With four parameters, the first two are input files, the last two are output files.

me@machine:~/$ ls
ddRAGEds_ATCACG_1.fastq   ddRAGEds_ATCACG_2.fastq
me@machine:~/$ randomize_fastq ddRAGEds_ATCACG_1.fastq ddRAGEds_ATCACG_2.fastq ddRAGEds_ATCACG_randomized_1.fastq ddRAGEds_ATCACG_randomized_2.fastq

If the name of an output file ends with ".gz", the output is written as a gzipped file. Note that this script cannot read gzipped files.
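
The important property of the four-parameter call is that both files are shuffled with the same permutation, so that read pairs stay aligned. The following sketch illustrates that idea in Python. It is not the implementation of randomize_fastq; the function names and the seed parameter are hypothetical, and all reads are held in memory, which may not be suitable for very large files.

```python
import gzip
import random


def read_records(path):
    """Read a plain-text FASTQ file into a list of four-line records."""
    with open(path, "r") as handle:
        lines = handle.read().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError(f"{path}: line count is not a multiple of four")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def write_records(path, records):
    """Write records to path, gzipped if the file name ends with '.gz'."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wt") as handle:
        for record in records:
            handle.write("\n".join(record) + "\n")


def shuffle_paired(in_1, in_2, out_1, out_2, seed=None):
    """Shuffle two paired FASTQ files with one shared permutation."""
    forward = read_records(in_1)
    reverse = read_records(in_2)
    if len(forward) != len(reverse):
        raise ValueError("input files contain different numbers of reads")
    order = list(range(len(forward)))
    random.Random(seed).shuffle(order)  # seed is only used for reproducibility
    write_records(out_1, [forward[i] for i in order])
    write_records(out_2, [reverse[i] for i in order])
```

Shuffling a list of indices once and applying it to both files guarantees that the read at position i of the first output still belongs to the read at position i of the second output.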

Learn a Quality Model: learn_qmodel

Analyze one or more FASTQ files, compute a quality profile from the data set, and create a .qmodel file.

me@machine:~/$ learn_qmodel -1 my_dataset_1.fastq -2 my_dataset_2.fastq -o my_dataset.qmodel

The progress of the analysis can be visualized by passing the -v parameter.
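
As an illustration of what a quality profile can look like, the following sketch counts how often each Phred score occurs at each read position. This is only one plausible reading of "quality profile". The function names are hypothetical, a Phred+33 encoding is assumed, and the sketch does not write a .qmodel file, since that file format is not described here.

```python
from collections import Counter


def quality_profile(fastq_lines, offset=33):
    """Count Phred scores per read position.

    fastq_lines is an iterable of FASTQ lines (four lines per read).
    Returns a list with one Counter per position, mapping score -> count.
    The Phred+33 offset is an assumption about the input encoding.
    """
    profile = []
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:  # only every fourth line holds quality values
            continue
        for position, char in enumerate(line.rstrip("\n")):
            if position == len(profile):
                profile.append(Counter())
            profile[position][ord(char) - offset] += 1
    return profile


def mean_quality(profile):
    """Mean Phred score at each read position."""
    return [
        sum(score * count for score, count in counts.items()) / sum(counts.values())
        for counts in profile
    ]


if __name__ == "__main__":
    example = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "5I!"]
    print(mean_quality(quality_profile(example)))
```

Keeping the full per-position counts, rather than only the means, retains the information needed to sample quality values at each position later on.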

Remove FASTQ annotation: remove_annotation

Some ddRAD analysis tools, like Stacks, cannot handle modified FASTQ name lines. To remove the annotation that ddRAGE adds to the name lines, you can use the remove_annotation script:

me@machine:~/$ remove_annotation ddRAGEds_ATCACG_1.fastq ddRAGEds_ATCACG_2.fastq
Reading FASTQ file ddRAGEds_ATCACG_1.fastq
Writing output files:
  - ddRAGEds_ATCACG_1_noheader.fastq
  - ddRAGEds_ATCACG_1_annotation.txt
Reading FASTQ file ddRAGEds_ATCACG_2.fastq
Writing output files:
  - ddRAGEds_ATCACG_2_noheader.fastq
  - ddRAGEds_ATCACG_2_annotation.txt

This preserves the simulated FASTQ file(s) and writes a new file without the annotation for each input file (with the _noheader suffix). Additionally, the annotation of each input file is written to a separate file (with the _annotation.txt suffix).
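
Conceptually, the script separates each name line into a part that stays in the FASTQ file and an annotation part that is moved to a text file. The following sketch shows this split in Python. It is not the remove_annotation implementation. In particular, the assumption that the first two whitespace-separated fields of a name line are kept (keep_fields=2), as well as the example name line in the usage part, are made up for illustration and may not match the actual ddRAGE name line layout.

```python
def split_name_line(name_line, keep_fields=2):
    """Split a FASTQ name line into (kept part, annotation part).

    keep_fields is a hypothetical parameter: the number of leading
    whitespace-separated fields that remain in the FASTQ file.
    """
    fields = name_line.rstrip("\n").split()
    return " ".join(fields[:keep_fields]), " ".join(fields[keep_fields:])


def strip_annotation(fastq_lines, keep_fields=2):
    """Return (cleaned FASTQ lines, list of removed annotations)."""
    cleaned, annotations = [], []
    for index, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if index % 4 == 0:  # name line of a four-line FASTQ record
            name, annotation = split_name_line(line, keep_fields)
            cleaned.append(name)
            annotations.append(annotation)
        else:
            cleaned.append(line)
    return cleaned, annotations


if __name__ == "__main__":
    # Made-up name line; the real annotation fields may differ.
    lines = ["@read1 1:N:0:ATCACG locus:'3' type:'singleton'", "ACGT", "+", "IIII"]
    cleaned, annotations = strip_annotation(lines)
    print(cleaned[0])
    print(annotations[0])
```

Only every fourth line is inspected, so sequence and quality lines pass through unchanged, including quality lines that happen to start with "@".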

Split multi-p7 barcode files: split_by_p7_barcode

After creating a multi-p7 barcode set using the --multiple-p7-barcodes parameter, the split_by_p7_barcode tool can be used to split the generated FASTQ files by their p7 barcode.

Example:

$ rage --multiple-p7-barcodes
Simulating reads from 3 individuals at 3 loci with a coverage of 30.

Created output files:
    p5 reads                  data_folder/ddRAGEdataset_2_p7_barcodes_1.fastq
    p7 reads                  data_folder/ddRAGEdataset_2_p7_barcodes_2.fastq
    ground truth              data_folder/ddRAGEdataset_2_p7_barcodes_gt.yaml
    barcode file              data_folder/ddRAGEdataset_2_p7_barcodes_barcodes.txt
    annotation file           data_folder/logs/ddRAGEdataset_2_p7_barcodes_annotation.txt
    statistics file           data_folder/logs/ddRAGEdataset_2_p7_barcodes_statstics.pdf

$ cat data_folder/logs/ddRAGEdataset_2_p7_barcodes_annotation.txt
#  Ind.      p5 bc   p7 bc   p5 spc  p7 spc  Annotation
Individual 05        ACAGTG  ATCACG  AC              Annotation 1
Individual 12        CTTGTA  ATCACG  GAC             Annotation 1
Individual 54        GCCAAT  TAGCTT          AT      Annotation 3

The files contain reads with two different p7 barcodes (ATCACG and TAGCTT). To split them up, call split_by_p7_barcode and pass the two FASTQ files as parameters:

$ split_by_p7_barcode data_folder/ddRAGEdataset_2_p7_barcodes_1.fastq data_folder/ddRAGEdataset_2_p7_barcodes_2.fastq

Found new barcode: TAGCTT
Writing to:
  -> reads_TAGCTT_1.fastq
  -> reads_TAGCTT_2.fastq

Found new barcode: ATCACG
Writing to:
  -> reads_ATCACG_1.fastq
  -> reads_ATCACG_2.fastq

This leaves you with two FASTQ files for each barcode, placed in the current working folder. The tool preserves the file ending; hence, if you pass two .fq.gz files, the output will also be in gzipped FASTQ format.

If these target files are already present, you need to pass the --force parameter to overwrite them.
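
The grouping step behind this tool can be sketched in a few lines of Python. This is not the split_by_p7_barcode implementation. The sketch assumes that the p7 barcode is the last colon-separated entry of the second whitespace-separated field of the name line (as in Illumina-style name lines such as "@id 1:N:0:ATCACG"), which may not match how ddRAGE stores it. All function names are hypothetical; only the reads_<barcode>_1 / reads_<barcode>_2 naming pattern is taken from the output shown above.

```python
from collections import defaultdict


def p7_barcode(name_line):
    """Extract the p7 barcode from a name line (assumed Illumina-style layout)."""
    fields = name_line.split()
    if len(fields) < 2:
        raise ValueError(f"no barcode field in name line: {name_line!r}")
    return fields[1].rsplit(":", 1)[-1]


def group_by_p7(records_1, records_2):
    """Group paired records by the p7 barcode found in the first read's name line.

    Each record is a tuple of four FASTQ lines. Returns a dict that maps
    barcode -> (list of p5 records, list of p7 records).
    """
    if len(records_1) != len(records_2):
        raise ValueError("input files contain different numbers of reads")
    groups = defaultdict(lambda: ([], []))
    for record_1, record_2 in zip(records_1, records_2):
        barcode = p7_barcode(record_1[0])
        groups[barcode][0].append(record_1)
        groups[barcode][1].append(record_2)
    return dict(groups)


def output_names(barcode, suffix=".fastq"):
    """Output file names following the reads_<barcode>_<1|2> pattern."""
    return f"reads_{barcode}_1{suffix}", f"reads_{barcode}_2{suffix}"
```

Both mates of a pair are assigned using the barcode of the first read, so a pair can never end up in files for two different barcodes.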