Overview

To help customers interpret their data biologically, we provide a script, distributed separately from modality, that annotates the results of a D(h)MR analysis.

The script creates a copy of the D(h)MR results BED file with additional columns taken from the annotation file. It also adds a Distance column that gives the distance in bp to the nearest feature, for example a gene. If the annotation file contains the columns chr, start, end, A, B and C, then the resulting file will contain all columns in the D(h)MR output plus A, B, C and Distance.
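The column layout described above can be sketched in plain Python. This is an illustration only, not the script's implementation: the helper names `gap_bp` and `annotate_with_nearest` are hypothetical, the rows are made-up examples, and the distance convention (gap between half-open intervals, 0 on overlap) is an assumption that may differ from what the script reports.

```python
# Illustrative sketch only; not the script's actual implementation.

def gap_bp(a_start, a_end, b_start, b_end):
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def annotate_with_nearest(dmrs, features, extra_cols):
    """Copy each D(h)MR row and append the nearest feature's columns plus Distance."""
    annotated = []
    for dmr in dmrs:
        candidates = [f for f in features if f["chr"] == dmr["chr"]]
        row = dict(dmr)  # keep every original D(h)MR column
        if candidates:
            nearest = min(
                candidates,
                key=lambda f: gap_bp(dmr["start"], dmr["end"], f["start"], f["end"]),
            )
            for col in extra_cols:
                row[col] = nearest[col]
            row["Distance"] = gap_bp(
                dmr["start"], dmr["end"], nearest["start"], nearest["end"]
            )
        else:
            # Assumption: leave the annotation empty when no feature is on the chromosome.
            for col in extra_cols:
                row[col] = None
            row["Distance"] = None
        annotated.append(row)
    return annotated


# Made-up example rows.
dmrs = [{"chr": "chr1", "start": 100, "end": 200}]
features = [
    {"chr": "chr1", "start": 250, "end": 400, "A": "a1", "B": "b1", "C": "c1"},
    {"chr": "chr1", "start": 1000, "end": 2000, "A": "a2", "B": "b2", "C": "c2"},
]
result = annotate_with_nearest(dmrs, features, ["A", "B", "C"])
print(list(result[0]))  # D(h)MR columns first, then A, B, C and Distance
```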

Optionally, the script can also draw a volcano plot. The example below uses GIAB data and compares the Han Chinese and Ashkenazi Jewish trios.

Volcano plot of D(h)MRs between Han Chinese and Ashkenazi Jewish trios from GIAB.

Usage

The script takes two main arguments: the D(h)MR output file created by modality, and your preferred annotation file, which can be downloaded from public resources such as UCSC. BED, GFF3 and GTF formats are supported.

The recommended use case is a BED file of gene annotations generated from UCSC. The name of the downloaded file must end with .bed or .bed.gz so that pyranges knows how to read it. In general, the file type is inferred from the file name, with .gff3(.gz) interpreted as GFF3 and .gtf(.gz) as GTF.
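The suffix rule above can be written as a small lookup. This is a minimal sketch of the naming convention only; the function name `infer_annotation_format` is hypothetical, and the script may perform the check differently (for example, with different handling of letter case).

```python
def infer_annotation_format(filename):
    """Return 'BED', 'GFF3' or 'GTF' based on the file name suffix."""
    name = filename
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # a compressed file keeps its inner suffix
    for suffix, fmt in ((".bed", "BED"), (".gff3", "GFF3"), (".gtf", "GTF")):
        if name.endswith(suffix):
            return fmt
    raise ValueError(f"Cannot infer annotation format from file name: {filename}")


# Hypothetical file names.
for example in ("genes.bed", "genes.bed.gz", "annotation.gff3.gz", "annotation.gtf"):
    print(example, infer_annotation_format(example))
```

Renaming a downloaded file so that it ends in one of these suffixes is enough to satisfy a check of this kind.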

Another option, though less favoured because it contains many overlapping features and is much larger, is to use the annotation from GENCODE, which is distributed as GFF3. If the GENCODE data is used, we recommend either pre-filtering for annotations of interest or pairing it with the --filter option. For example, --filter Feature=gene will restrict the file to that feature type, which avoids annotating with secondary transcripts, introns, exons etc. See the API reference below for full details.
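To make the effect of filtering concrete, here is a minimal sketch of how filters in the key=value form listed in the API reference could be applied to annotation rows. The helper names `parse_filters` and `apply_filters` are hypothetical and the rows are made up. Requiring every filter to match when several are given is an assumption, not something the help text states.

```python
def parse_filters(items):
    """Turn ['Feature=gene', ...] into {'Feature': 'gene', ...}."""
    parsed = {}
    for item in items:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"Filter must be in key=value form: {item!r}")
        parsed[key] = value
    return parsed


def apply_filters(rows, parsed):
    """Keep rows whose columns equal every requested value (assumed AND logic)."""
    return [
        row for row in rows
        if all(str(row.get(key)) == value for key, value in parsed.items())
    ]


# Made-up annotation rows resembling a GFF3 with mixed feature types.
rows = [
    {"Feature": "gene", "Name": "g1"},
    {"Feature": "exon", "Name": "g1"},
    {"Feature": "transcript", "Name": "g1"},
    {"Feature": "gene", "Name": "g2"},
]
genes_only = apply_filters(rows, parse_filters(["Feature=gene"]))
print([row["Name"] for row in genes_only])
```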

D(h)MRs are filtered by q-value before annotation, so D(h)MRs that fail the threshold are never annotated. By default, all D(h)MRs passing the q-value threshold are output, but the --distance-cutoff option lets users restrict the results to D(h)MRs within or close to a feature.
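The two thresholds act at different stages, which the following sketch illustrates. It is not the script's code: the function names and the `q` column name are hypothetical, while the comparisons follow the help text (exclude q-values greater than the cutoff, exclude D(h)MRs further away than the distance cutoff).

```python
def filter_by_qvalue(rows, qvalue_cutoff=0.05):
    """Applied before annotation: drop rows with a q-value above the cutoff."""
    return [row for row in rows if row["q"] <= qvalue_cutoff]


def filter_by_distance(rows, distance_cutoff=None):
    """Applied after annotation: drop rows further away than the cutoff, if one is set."""
    if distance_cutoff is None:
        return list(rows)  # default: keep everything that passed the q-value filter
    return [row for row in rows if row["Distance"] <= distance_cutoff]


# Made-up rows; 'Distance' would normally be added by the annotation step.
rows = [
    {"name": "dmr1", "q": 0.001, "Distance": 0},
    {"name": "dmr2", "q": 0.04, "Distance": 25000},
    {"name": "dmr3", "q": 0.2, "Distance": 10},
]
significant = filter_by_qvalue(rows)
near_features = filter_by_distance(significant, distance_cutoff=1000)
print([row["name"] for row in significant])
print([row["name"] for row in near_features])
```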

API reference

Usage: annotate_dmrs.py [OPTIONS]

Options:
  --dmr-bedfile PATH       * BED file containing D(h)MRs output from modality
                           CLI.  [required]
  --annotation-file PATH   Annotation file to use for annotating D(h)MRs.
                           Supported formats: BED, GFF3, GTF. Not strictly
                           required. A user may want to use this script to
                           filter D(h)MRs without annotation and/or draw a
                           volcano plot.
  --qvalue-cutoff FLOAT    Q-value threshold for filtering D(h)MRs. Exclude D(h)MRs
                           with a q-score > value. Default: 0.05. {value}
  --distance-cutoff FLOAT  Distance (bp) threshold for annotating D(h)MRs. If
                           specified D(h)MRs further from the annotation than
                           value will be excluded. Default: None. {value}
  --filter TEXT            Filters to apply to the annotation file in
                           key=value form. i.e. --filter Gene=ENSG0000012345
                           would include only annotations where the 'Gene'
                           column is 'ENSG0000012345'. Multiple filters can be
                           provided. {key=value}
  --volcano-plot           Draw a volcano plot. Default: False.
  --output-dir PATH        Output directory for the annotated bedfile and
                           volcano plot if requested. {value}
  --output-stem TEXT       Output stem for the annotated bedfile and volcano
                           plot. If not provided, the input bedfile name less
                           the suffix will be used. NB the output file names
                           will have '_annotated.bed' and '_volcano.png'
                           appended to the stem. {value}
  --help                   Show this message and exit.
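As a usage sketch, the options above can be assembled into a command line from Python. Only the option names come from the help text. The function `build_command`, the file names, and running the script through `python` are assumptions made for illustration. The resulting list could be passed to `subprocess.run` in an environment where the script and its dependencies are available.

```python
import shlex


def build_command(dmr_bedfile, annotation_file=None, qvalue_cutoff=None,
                  distance_cutoff=None, filters=(), volcano_plot=False,
                  output_dir=None, output_stem=None):
    """Build an argument list for annotate_dmrs.py from the documented options."""
    # Assumption: the script is run with the interpreter on PATH.
    cmd = ["python", "annotate_dmrs.py", "--dmr-bedfile", dmr_bedfile]
    if annotation_file is not None:
        cmd += ["--annotation-file", annotation_file]
    if qvalue_cutoff is not None:
        cmd += ["--qvalue-cutoff", str(qvalue_cutoff)]
    if distance_cutoff is not None:
        cmd += ["--distance-cutoff", str(distance_cutoff)]
    for item in filters:  # --filter may be given several times
        cmd += ["--filter", item]
    if volcano_plot:
        cmd.append("--volcano-plot")
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if output_stem is not None:
        cmd += ["--output-stem", output_stem]
    return cmd


# Hypothetical input and output names.
cmd = build_command(
    "dmrs.bed",
    annotation_file="genes.bed.gz",
    distance_cutoff=1000,
    filters=["Gene=ENSG0000012345"],
    volcano_plot=True,
    output_dir="results",
)
print(shlex.join(cmd))
```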

Dependencies

The script relies on four packages that are not part of the standard library: pyranges, fsspec, matplotlib and numpy. These must be installed before the script is used.

pip install pyranges fsspec matplotlib numpy

All these are required for modality, so the script will run in any environment with modality installed.
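If you are unsure whether an environment already has these packages, a quick check along the following lines can help. It is a minimal sketch using only the standard library; the function name `missing_packages` is hypothetical, and the check only looks for importable top-level modules without verifying versions.

```python
import importlib.util

REQUIRED = ("pyranges", "fsspec", "matplotlib", "numpy")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```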

To install python/pip follow these instructions.

Availability

The script can be downloaded from here. NB: You may need to right-click and use “save target as…” or “save link as”.

Notes

Note: this script is designed to work with the output of the CLI D(h)MR caller. However, if you wish to use it with the results of interactive D(h)MR calling, you can first write those results to a BED file:

import pyranges as pr

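dmr_result = call_dmrs(...)  # the exact call depends on your analysis

# Write the result as a BED file; keep=True retains the extra columns.
# path is the location of the output BED file, to be passed to --dmr-bedfile.
pr.PyRanges(dmr_result).to_bed(path, keep=True)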