:orphan: Overview -------- To assist customers in biological interpretation of their data, we provide a script distributed outside of modality to annotate the results of the D(h)MR analysis. The script creates a copy of the D(h)MR results BED file, with additional columns from the annotation file used. There is also a Distance column, that describes the distance in bp to the nearest feature, for example a gene. If the annotation file contains columns, ``chr``, ``start``, ``end``, ``A``, ``B``, ``C``, then the resulting file will contain all columns in the D(h)MR output plus ``A``, ``B``, ``C`` and ``Distance``. Optionally, a user can create a volcano plot; shown below for example GIAB data between Han Chinese and Ashkenazi Jewish trios. .. figure:: images/volcano.png :alt: Volcano plot of D(h)MRs between Han Chinese and Ashkenazi Jewish trios from GIAB. :width: 80% Usage ----- The script takes two main arguments; the D(h)MR output file created by modality and your preferred annotation file that can be downloaded from public resources such as UCSC. BED, GFF3 and GTF formats are supported. The recommended use case is a BED file of gene annotations, generated from `UCSC `_. The name of the downloaded file must end with ``.bed`` or ``.bed.gz`` so ``pyranges`` knows how to read the file. In general the filetype is inferred from the name, with ``.gff3(.gz)`` being interpreted as GFF3 and ``.gtf(.gz)`` as GTF. Another option, though less favoured as it contains many overlapping features and is much larger, is using annotation from `GENCODE `_, which is distributed as a GFF3. If the GENCODE data is used, we recommmend either pre-filtering for annotations of interest, or pairing with the ``--filter`` option. For example ``--filter Feature==gene`` will filter the file to that type, avoiding annotating with secondary transcripts, introns, exons etc. See the API below for full details. Filtering of D(h)MRs is done by qvalue, and excludes D(h)MRs before annotation. By default all D(h)MRs passing the q-threshold are output, but there is a distance-cutoff threshold if users wish to restrict results to those within or close to a feature. API reference ------------- .. code-block:: none Usage: annotate_dmrs.py [OPTIONS] Options: --dmr-bedfile PATH * BED file containing D(h)MRs output from modality CLI. [required] --annotation-file PATH Annotation file to use for annotating D(h)MRs. Supported formats: BED, GFF3, GTF. Not strictly required. A user may want to use this script to filter D(h)MRs without annotation and/or draw a volcano plot. --qvalue-cutoff FLOAT Q-value threshold for filtering D(h)MRs. Exclude D(h)MRs with a q-score > value. Default: 0.05. {value} --distance-cutoff FLOAT Distance (bp) threshold for annotating D(h)MRs. If specified D(h)MRs further from the annotation than value will be excluded. Default: None. {value} --filter TEXT Filters to apply to the annotation file in key=value form. i.e. --filter Gene=ENSG0000012345 would include only annotations where the 'Gene' column is 'ENSG0000012345'. Multiple filters can be provided. {key=value} --volcano-plot Draw a volcano plot. Default: False. --output-dir PATH Output directory for the annotated bedfile and volcano plot if requested. {value} --output-stem TEXT Output stem for the annotated bedfile and volcano plot. If not provided, the input bedfile name less the suffix will be used. NB the output file names will have '_annotated.bed' and '_volcano.png' appended to the stem. {value} --help Show this message and exit. Dependencies ------------ The script relies on four packages not included in the standard library: ``pyranges``, ``fsspec``, ``matplotlib``, ``numpy``. Which will need to be installed prior to use. .. code-block:: bash pip install pyranges fsspec matplotlib numpy All these are required for ``modality``, so the script will run in any environment with ``modality`` installed. To install python/pip follow these `instructions `_. Availability ------------ The script can be downloaded from `here `_. NB: You may need to right-click and use "save target as..." or "save link as". Notes ----- Note- this script is designed to work with the output of the CLI DM(h)R caller. However, if you wish to use it with the results of interactive D(h)MR calling, you can simply: .. code-block:: python :emphasize-lines: 5 import pyranges as pr dmr_result = call_dmrs(...) # precise command depending pr.PyRanges(dmr_result).to_bed(path, keep=True)