Tools
All scripts are written in Python and can be found in the folder ddrage/tools/
of the code repository.
Additionally, entry points for all scripts are available so that they can be launched using just the script name.
To use these scripts, ddRAGE needs to be installed.
BBD Visualization: visualize_bbd
Pick parameters for the Beta-binomial model using a live rendering of the coverage function.
This script is implemented using bokeh and hence requires the bokeh Python module.
You need to launch the server script using bokeh and then open the website with the visualization.
The easiest way to do this is by using the visualize_bbd
script:
me@machine:~/$ visualize_bbd
Alternatively you can also launch the server manually, like this:
me@machine:~/$ cd tools
me@machine:~/tools$ bokeh serve bbd_visualization.py # start the bokeh server
me@machine:~/tools$ firefox localhost:5006/bbd_visualization # open html page with visualization
Instead of the last step, you can also enter localhost:5006/bbd_visualization in your web browser.
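To get a feel for how the parameters shape the distribution without starting the bokeh server, the Beta-binomial probability mass function can be evaluated directly. The following is a minimal sketch using only the standard library. It uses the textbook parameterization (n, alpha, beta), which may differ from the parameter names and scaling that ddRAGE exposes, and the values chosen below are purely illustrative.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ Beta-binomial(n, alpha, beta) (textbook form)."""
    if not 0 <= k <= n:
        return 0.0
    # log of the binomial coefficient "n choose k"
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))


# Illustrative values only, not ddRAGE defaults.
n, alpha, beta = 60, 6.0, 2.0
pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]

# The mean of the distribution is n * alpha / (alpha + beta).
mean = sum(k * p for k, p in enumerate(pmf))
print(f"mean = {mean:.2f}")
```

Working in log space with lgamma avoids overflow for larger n. Plotting pmf for a few parameter pairs gives a static version of what the live rendering shows.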
Randomize Read Order: randomize_fastq
The FASTQ files generated by ddRAGE are written in the order in which the reads were simulated.
Since this can create very easy instances for some analysis tools, the FASTQ files need to be randomized for a realistic assessment.
The randomize_fastq
script takes one, two or four input parameters.
With one parameter, the FASTQ file will be shuffled in place and overwritten. Two parameters act as input and output path. With four parameters, the first two are input files, the last two are output files.
me@machine:~/$ ls
ddRAGEds_ATCACG_1.fastq ddRAGEds_ATCACG_2.fastq
me@machine:~/$ randomize_fastq ddRAGEds_ATCACG_1.fastq ddRAGEds_ATCACG_2.fastq ddRAGEds_ATCACG_randomized_1.fastq ddRAGEds_ATCACG_randomized_2.fastq
By passing a file name ending with ".gz"
as an output file, the output will be written as a gzipped file.
Note that this script cannot read gzipped files.
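The important property when shuffling paired-end data is that both files receive the same permutation, so that the i-th read in the first file still belongs to the i-th read in the second. The sketch below illustrates that idea on in-memory records. It is not the code of randomize_fastq, and the helper names are made up for this example.

```python
import random


def read_fastq_records(lines):
    """Group an iterable of lines into 4-line FASTQ records (tuples)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def shuffle_paired(records_1, records_2, seed=None):
    """Apply one random permutation to both record lists."""
    if len(records_1) != len(records_2):
        raise ValueError("paired files must contain the same number of reads")
    order = list(range(len(records_1)))
    random.Random(seed).shuffle(order)  # seed makes the order reproducible
    return ([records_1[i] for i in order],
            [records_2[i] for i in order])


def write_fastq_records(records, handle):
    """Write records back out, one line per FASTQ field."""
    for record in records:
        handle.write("\n".join(record) + "\n")
```

Shuffling an index list instead of the records themselves is what keeps the mates aligned. This approach holds all reads in memory, which is fine for small simulated data sets.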
Learn a Quality Model: learn_qmodel
Analyze one or more FASTQ files, compute a quality profile from the data set, and create a .qmodel file.
me@machine:~/$ learn_qmodel -1 my_dataset_1.fastq -2 my_dataset_2.fastq -o my_dataset.qmodel
The progress of the analysis can be visualized by passing the -v
parameter.
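As an illustration of what a quality profile is, the sketch below counts Phred scores per read position and derives the mean quality at each position. It assumes Phred+33 encoded quality lines. The data structure is chosen for this example only and is not the .qmodel file format written by learn_qmodel.

```python
from collections import Counter


def quality_profile(quality_lines, offset=33):
    """Count Phred scores per read position (Phred+33 assumed)."""
    profile = []  # profile[pos] maps Phred score -> number of reads
    for line in quality_lines:
        for pos, char in enumerate(line.strip()):
            if pos == len(profile):
                profile.append(Counter())
            profile[pos][ord(char) - offset] += 1
    return profile


def mean_qualities(profile):
    """Mean Phred score at each read position."""
    return [sum(score * count for score, count in counts.items())
            / sum(counts.values())
            for counts in profile]


def quality_lines_of(fastq_lines):
    """Yield every fourth line of a FASTQ file, i.e. the quality strings."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line
```

With an open file handle this could be used as quality_profile(quality_lines_of(handle)).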
Remove FASTQ Annotation: remove_annotation
Some ddRAD analysis tools, like Stacks, cannot handle modified FASTQ name lines.
To remove the annotation ddRAGE adds to the file you can use the remove_annotation
script:
me@machine:~/$ remove_annotation ddRAGEds_ATCACG_1.fastq ddRAGEds_ATCACG_2.fastq
Reading FASTQ file ddRAGEds_ATCACG_1.fastq
Writing output files:
- ddRAGEds_ATCACG_1_noheader.fastq
- ddRAGEds_ATCACG_1_annotation.txt
Reading FASTQ file ddRAGEds_ATCACG_2.fastq
Writing output files:
- ddRAGEds_ATCACG_2_noheader.fastq
- ddRAGEds_ATCACG_2_annotation.txt
This will preserve the simulated FASTQ file(s) and write two new files without the annotation (with the _noheader suffix). Additionally, the annotation is written to a new file (with the _annotation.txt suffix).
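The idea behind this step can be sketched in a few lines: keep the standard part of each name line and move everything after it into a separate list. The sketch assumes that a name line consists of two whitespace-separated standard fields followed by the annotation. That layout is an assumption made for this example; the actual name line format and the logic of remove_annotation may differ.

```python
def split_name_line(name_line, keep_fields=2):
    """Split a name line into (header, annotation).

    keep_fields is a hypothetical parameter: the number of leading
    whitespace-separated fields that are treated as the plain header.
    """
    fields = name_line.rstrip("\n").split(maxsplit=keep_fields)
    header = " ".join(fields[:keep_fields])
    annotation = fields[keep_fields] if len(fields) > keep_fields else ""
    return header, annotation


def strip_annotation(lines, keep_fields=2):
    """Return (FASTQ lines without annotation, list of annotations)."""
    cleaned, annotations = [], []
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index % 4 == 0:  # every fourth line, starting at 0, is a name line
            line, annotation = split_name_line(line, keep_fields)
            annotations.append(annotation)
        cleaned.append(line)
    return cleaned, annotations
```

Name lines are identified by position rather than by a leading @, because quality lines may also start with that character.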
Split Multi-p7 Barcode Files: split_by_p7_barcode
After creating a multi-p7 barcode set using the --multiple-p7-barcodes
parameter, the split_by_p7_barcode
tool can be used to split the
generated FASTQ files up by their p7 barcode.
Example:
$ rage --multiple-p7-barcodes
Simulating reads from 3 individuals at 3 loci with a coverage of 30.
Created output files:
p5 reads data_folder/ddRAGEdataset_2_p7_barcodes_1.fastq
p7 reads data_folder/ddRAGEdataset_2_p7_barcodes_2.fastq
ground truth data_folder/ddRAGEdataset_2_p7_barcodes_gt.yaml
barcode file data_folder/ddRAGEdataset_2_p7_barcodes_barcodes.txt
annotation file data_folder/logs/ddRAGEdataset_2_p7_barcodes_annotation.txt
statistics file data_folder/logs/ddRAGEdataset_2_p7_barcodes_statstics.pdf
$ cat data_folder/logs/ddRAGEdataset_2_p7_barcodes_annotation.txt
# Ind. p5 bc p7 bc p5 spc p7 spc Annotation
Individual 05 ACAGTG ATCACG AC Annotation 1
Individual 12 CTTGTA ATCACG GAC Annotation 1
Individual 54 GCCAAT TAGCTT AT Annotation 3
The files contain reads with two different p7 barcodes (ATCACG and TAGCTT).
To split them up, call split_by_p7_barcode file_1.fq file_2.fq
and pass the two FASTQ
files as parameters:
$ split_by_p7_barcode data_folder/ddRAGEdataset_2_p7_barcodes_1.fastq data_folder/ddRAGEdataset_2_p7_barcodes_2.fastq
Found new barcode: TAGCTT
Writing to:
-> reads_TAGCTT_1.fastq
-> reads_TAGCTT_2.fastq
Found new barcode: GGCTAC
Writing to:
-> reads_GGCTAC_1.fastq
-> reads_GGCTAC_2.fastq
This leaves you with two FASTQ files for each barcode,
placed in the current working folder.
The tool preserves the file ending, hence if you pass two .fq.gz
files,
the output will also be in gzipped FASTQ format.
If these target files are already present, you need to pass the
--force
parameter to overwrite them.
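The splitting step itself amounts to grouping read pairs by a barcode taken from the name line. The sketch below assumes an Illumina/Casava-style name line in which the second whitespace-separated field has the form read:filtered:control:index, and it takes the index as the p7 barcode. Where split_by_p7_barcode actually reads the barcode from is not stated here, so treat that layout and the helper names as assumptions. Only the output naming scheme (reads_BARCODE_1.fastq, reads_BARCODE_2.fastq) is taken from the transcript above.

```python
from collections import defaultdict


def p7_barcode(name_line):
    """Extract the index sequence from a Casava-style name line.

    Assumed layout: '@<read id> <read>:<filtered>:<control>:<index> ...'
    """
    comment = name_line.split()[1]
    return comment.split(":")[3]


def split_by_barcode(records_1, records_2):
    """Group paired 4-line records by the barcode of the first read."""
    groups = defaultdict(lambda: ([], []))
    for rec_1, rec_2 in zip(records_1, records_2):
        bucket_1, bucket_2 = groups[p7_barcode(rec_1[0])]
        bucket_1.append(rec_1)
        bucket_2.append(rec_2)
    return dict(groups)


def output_names(barcode, suffix=".fastq"):
    """File names following the pattern shown in the tool's output."""
    return f"reads_{barcode}_1{suffix}", f"reads_{barcode}_2{suffix}"
```

Each record is a tuple of the four FASTQ lines. Writing the groups to disk then only requires opening the two files returned by output_names for every barcode.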