modality CLI

The information below pertains to the modality CLI.

Background

The modality command line interface (CLI) is a tool that allows users to perform genomic data analysis on Zarr stores output by the duet multiomics solution bioinformatics pipeline. A Zarr store contains the epigenetic quantification output for your samples. Zarr is a format for the storage of chunked, compressed, N-dimensional arrays, which allows large heterogeneous data to be stored efficiently.

Requirements

To use the modality CLI you need modality version >= 0.13.0 installed. See Quickstart for installation instructions.
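The version requirement can also be checked from Python. The following is a minimal sketch, assuming the package is installed under the distribution name "modality" (an assumption, not confirmed here); the helper names are hypothetical and not part of modality itself.

```python
import re
from importlib import metadata

MINIMUM = (0, 13, 0)  # minimum version stated in the requirements above


def parse_version(text):
    """Return the leading numeric release components as a tuple of three ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # A missing patch component (e.g. "1.0") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM):
    """True if the version string is at least the required minimum."""
    return parse_version(text) >= minimum


def installed_version(distribution="modality"):
    """Return the installed version string, or None if it is not installed.

    Assumption: the distribution is named "modality".
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("modality is not installed in this environment")
    else:
        status = "OK" if meets_minimum(found) else "too old"
        print(f"modality {found}: {status}")
```

Only the leading numeric components are compared, so a pre-release suffix such as "rc1" is ignored by this sketch.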

Example commands

Running modality version confirms which version of modality is installed and also tests that the CLI is working.

Running modality --help lists the available modality CLI commands and how to use them. Each command also has its own help; for example, modality dmr --help describes the options available when performing D(h)MR calling.

Here is some guidance on how to interpret the CLI help syntax:

  • * - mandatory inputs.

  • --parameter - a flag parameter. Depending on the flag, it takes one input, several inputs, or none; the form each flag takes is shown in the --help output, e.g. modality dmr --help.

    • [--output-dir OUTPUT_DIR] - flag parameter that takes 1 input for the output directory.

    • [--sample-ids SAMPLE_IDS [SAMPLE_IDS ...]] - flag parameter that takes >= 1 input of sample IDs separated by spaces e.g. sample1 sample2 sample3.

    • [--tabix] - flag parameter that turns on tabix indexing if passed.

  • {some_value} - Example of how to use the flag parameter.
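The bracket notation above follows the same convention that Python's argparse module uses in its usage lines. The sketch below builds a hypothetical parser with the three flag styles to show how each one behaves when parsed. It is for illustration only; whether modality itself is built on argparse is an assumption, and the program name "example" is made up.

```python
import argparse

# Hypothetical parser for illustration only; not modality's implementation.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--output-dir")                   # takes exactly 1 input
parser.add_argument("--sample-ids", nargs="+")        # takes 1 or more inputs
parser.add_argument("--tabix", action="store_true")   # switch: takes no input

args = parser.parse_args(
    [
        "--output-dir", "output_folder",
        "--sample-ids", "sample1", "sample2", "sample3",
        "--tabix",
    ]
)

print(args.output_dir)   # the single value given to --output-dir
print(args.sample_ids)   # a list with every value given to --sample-ids
print(args.tabix)        # True because the switch was passed
print(parser.format_usage())  # usage line in the bracket notation
```

Omitting --tabix leaves the switch off, and omitting the other two flags leaves their values unset.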

modality join - Combine 2 or more Zarr stores/files

To combine Zarr stores produced by multiple pipeline runs, run the following:

modality join \
    --zarr-path mock_mod_evoC.zarr mock_mod_evoC.zarr \
    --output-path test.combined.zarr
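When the list of stores comes from a script, the same command can be assembled programmatically. This is a minimal sketch using only the --zarr-path and --output-path flags shown above; build_join_command and run_join are hypothetical helper names, and the store names in the usage line are placeholders.

```python
import subprocess


def build_join_command(zarr_paths, output_path):
    """Assemble the argument list for `modality join` (hypothetical helper)."""
    paths = [str(path) for path in zarr_paths]
    if len(paths) < 2:
        # Joining is described above as combining 2 or more stores.
        raise ValueError("at least two Zarr stores are needed to join")
    return ["modality", "join", "--zarr-path", *paths, "--output-path", str(output_path)]


def run_join(zarr_paths, output_path):
    """Run the join; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_join_command(zarr_paths, output_path), check=True)


# Placeholder store names, used only to show the assembled command.
print(" ".join(build_join_command(["run1.zarr", "run2.zarr"], "combined.zarr")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.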

modality export - Export quantification data to common public file formats

modality allows users to export the Zarr store into public file formats that can be used with common downstream tools. A list of available exports can be found by running modality export --help. Below is an example of how to export the quantification output from a Zarr store into a Bismark file for all samples present in the Zarr store.

modality export bismark \
    --zarr-path file.zarr \
    --tag genome \
    --modification-count-column num_modc \
    --denominator-column num_total_c \
    --skip-rows-zero-counts \
    --tabix \
    --output-dir output_folder

modality dmr - Call D(h)MRs with or without covariates

To perform D(h)MR calling with modality, users will need to provide the following:

  • Zarr store (mandatory)

    • a Zarr store output from the duet pipeline containing the quantification output for your samples.

  • Sample sheet (mandatory)

    • a sample sheet containing a sample_id column and at least one column to be used as a condition in D(h)MR calling, e.g. family. To call D(h)MRs with covariates, include each covariate as an additional column in the sample sheet, e.g. smoker.

  • BED file (optional)

    • Users can provide a BED file containing regions to be used in the D(h)MR calling process. Users who do not have a BED file can instead use the --window-size flag to generate windows of a specified length to use as the regions on which D(h)MRs are called.
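To make the idea of fixed-length windows concrete, the sketch below tiles a chromosome into consecutive regions in BED coordinates (0-based start, end-exclusive). It illustrates the concept only: how modality builds its windows internally, including how it treats the final partial window, is not documented here, and fixed_windows is a hypothetical name.

```python
def fixed_windows(chrom, chrom_length, window_size):
    """Tile [0, chrom_length) into consecutive windows of window_size bases.

    Returns (chrom, start, end) tuples in BED convention. The last window is
    truncated at the chromosome end (an assumption of this sketch).
    """
    if window_size <= 0:
        raise ValueError("window_size must be a positive integer")
    return [
        (chrom, start, min(start + window_size, chrom_length))
        for start in range(0, chrom_length, window_size)
    ]


# Print the windows as tab-separated BED lines for a toy 250-base chromosome.
for chrom, start, end in fixed_windows("chr1", 250, 100):
    print(f"{chrom}\t{start}\t{end}")
```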

Below is an example of a sample sheet, metadata.tsv, followed by an example of how to call D(h)MRs using the modality CLI.

sample_id       family          smoker
sample_name1    African         yes
sample_name2    Ashkenazi Jew   yes
sample_name3    African         no
sample_name4    Ashkenazi Jew   no

modality dmr \
    --sample-sheet metadata.tsv \
    --zarr-path file.zarr \
    --condition-array-name family \
    --covariates smoker \
    --methylation-contexts num_mc \
    --bedfile regions.bed \
    --filter-context-depth 0 \
    --output-dir output_folder
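Before running the command above, it can help to sanity-check the sample sheet. The following is a minimal sketch, assuming the sheet is tab-separated (as the .tsv extension suggests) and that the condition and covariate columns are named family and smoker as in the example; validate_sample_sheet is a hypothetical helper, not part of modality, and modality may apply different or additional checks.

```python
import csv
import io

EXAMPLE_SHEET = (
    "sample_id\tfamily\tsmoker\n"
    "sample_name1\tAfrican\tyes\n"
    "sample_name2\tAshkenazi Jew\tyes\n"
    "sample_name3\tAfrican\tno\n"
    "sample_name4\tAshkenazi Jew\tno\n"
)


def validate_sample_sheet(handle, required_columns):
    """Read a tab-separated sample sheet and check its basic structure.

    Returns the rows as a list of dicts; raises ValueError on a missing
    column, an empty required value, or a duplicated sample_id.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in required_columns if name not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    rows = list(reader)
    seen = set()
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        for name in required_columns:
            if not (row[name] or "").strip():
                raise ValueError(f"line {line_number}: empty value in {name!r}")
        if row["sample_id"] in seen:
            raise ValueError(f"line {line_number}: duplicate sample_id")
        seen.add(row["sample_id"])
    return rows


rows = validate_sample_sheet(
    io.StringIO(EXAMPLE_SHEET), ["sample_id", "family", "smoker"]
)
print(f"{len(rows)} samples, conditions: {sorted({row['family'] for row in rows})}")
```

To check a file on disk, pass an open file handle instead of the io.StringIO object, for example open("metadata.tsv", newline="").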

Annotating DMR output

Users can also annotate the DMR output with genomic features and optionally plot their D(h)MR results in a volcano plot. A standalone script is distributed for this purpose. Details on how to access and run the script can be found on the examples page; see Examples.