Overview#
To assist customers in biological interpretation of their data, we provide a script distributed outside of modality to annotate the results of the D(h)MR analysis.
The script creates a copy of the D(h)MR results BED file, with additional columns from the annotation file used.
There is also a Distance column, that describes the distance in bp to the nearest feature, for example a gene.
If the annotation file contains columns, chr
, start
, end
, A
, B
, C
, then the resulting file will contain all columns in the D(h)MR output plus A
, B
, C
and Distance
.
Optionally, a user can create a volcano plot; shown below for example GIAB data between Han Chinese and Ashkenazi Jewish trios.
Usage#
The script takes two main arguments; the D(h)MR output file created by modality and your preferred annotation file that can be downloaded from public resources such as UCSC. BED, GFF3 and GTF formats are supported.
The recommended use case is a BED file of gene annotations, generated from UCSC.
The name of the downloaded file must end with .bed
or .bed.gz
so pyranges
knows how to read the file.
In general the filetype is inferred from the name, with .gff3(.gz)
being interpreted as GFF3 and .gtf(.gz)
as GTF.
Another option, though less favoured as it contains many overlapping features and is much larger, is using annotation from GENCODE, which is distributed as a GFF3.
If the GENCODE data is used, we recommmend either pre-filtering for annotations of interest, or pairing with the --filter
option. For example --filter Feature==gene
will filter the file to that type, avoiding annotating with secondary transcripts, introns, exons etc.
See the API below for full details.
Filtering of D(h)MRs is done by qvalue, and excludes D(h)MRs before annotation. By default all D(h)MRs passing the q-threshold are output, but there is a distance-cutoff threshold if users wish to restrict results to those within or close to a feature.
API reference#
Usage: annotate_dmrs.py [OPTIONS]
Options:
--dmr-bedfile PATH * BED file containing D(h)MRs output from modality
CLI. [required]
--annotation-file PATH Annotation file to use for annotating D(h)MRs.
Supported formats: BED, GFF3, GTF. Not strictly
required. A user may want to use this script to
filter D(h)MRs without annotation and/or draw a
volcano plot.
--qvalue-cutoff FLOAT Q-value threshold for filtering D(h)MRs. Exclude D(h)MRs
with a q-score > value. Default: 0.05. {value}
--distance-cutoff FLOAT Distance (bp) threshold for annotating D(h)MRs. If
specified D(h)MRs further from the annotation than
value will be excluded. Default: None. {value}
--filter TEXT Filters to apply to the annotation file in
key=value form. i.e. --filter Gene=ENSG0000012345
would include only annotations where the 'Gene'
column is 'ENSG0000012345'. Multiple filters can be
provided. {key=value}
--volcano-plot Draw a volcano plot. Default: False.
--output-dir PATH Output directory for the annotated bedfile and
volcano plot if requested. {value}
--output-stem TEXT Output stem for the annotated bedfile and volcano
plot. If not provided, the input bedfile name less
the suffix will be used. NB the output file names
will have '_annotated.bed' and '_volcano.png'
appended to the stem. {value}
--help Show this message and exit.
Dependencies#
The script relies on four packages not included in the standard library: pyranges
, fsspec
, matplotlib
, numpy
.
Which will need to be installed prior to use.
pip install pyranges fsspec matplotlib numpy
All these are required for modality
, so the script will run in any environment with modality
installed.
To install python/pip follow these instructions.
Availability#
The script can be downloaded from here. NB: You may need to right-click and use “save target as…” or “save link as”.
Notes#
Note- this script is designed to work with the output of the CLI DM(h)R caller. However, if you wish to use it with the results of interactive D(h)MR calling, you can simply:
import pyranges as pr
dmr_result = call_dmrs(...) # precise command depending
pr.PyRanges(dmr_result).to_bed(path, keep=True)