Configuration file

The configuration file is a JSON (JavaScript Object Notation) file in which keys and values can be set easily. To run UROPA, use the following template for the config file:

{
"queries":[
    {"feature":"",    "feature.anchor": "",    "distance":"",    "strand":"",
            "direction":"",    "internals":"",     "filter.attribute":"",
            "attribute.value":"",     "show.attributes":"" }
          ],
"priority": "",
"gtf": ".gtf",
"bed": ".bed"
}

Three global keys are mandatory: 'queries', 'gtf', and 'bed'. A fourth global key, 'priority', is optional. For a default annotation, the user only needs to specify the 'gtf' and 'bed' keys (relative file paths). The key 'queries' has to be present in the config file, but can be left empty (e.g. "queries": []). Empty or missing key-value pairs are filled with their default values by UROPA.
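As a minimal sketch of the default case described above, the following Python snippet writes out a config that sets only the mandatory keys and leaves 'queries' empty. The file names "annotation.gtf" and "peaks.bed" are placeholders chosen for illustration, not files shipped with UROPA.

```python
import json

# Minimal configuration: 'queries' is present but empty, so UROPA's
# documented defaults apply. Both file names are hypothetical placeholders.
config = {
    "queries": [],
    "gtf": "annotation.gtf",
    "bed": "peaks.bed",
}

# Serialize to JSON text; this text is what would be saved as the config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```

Generating the file with a JSON library rather than by hand avoids quoting and comma mistakes.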

Queries

The 'queries' key holds a list with 1 to n entries, each specifying one valid annotation query with specific parameters for UROPA's annotation process.

Hint

  • If more than one query is given, each query should be enclosed in its own curly brackets, for example [{}, {}].
  • Double-check the spelling of keywords and the placement of commas; otherwise the UROPA annotation might differ from expectations.
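One way to follow the hint above is to check a config before running UROPA. The sketch below is not part of UROPA: the helper name unknown_query_keys and the file names in the example are assumptions made for illustration. It compares the keys of each query against the query keys listed in this documentation.

```python
import json

# Query keys as listed in this documentation.
KNOWN_QUERY_KEYS = {
    "feature", "feature.anchor", "distance", "strand", "direction",
    "internals", "filter.attribute", "attribute.value", "show.attributes",
}

def unknown_query_keys(config_text):
    """Return (query index, key) pairs for keys that are not documented query keys.

    Hypothetical helper for illustration; it is not provided by UROPA.
    """
    # json.loads raises json.JSONDecodeError for invalid JSON, e.g. a trailing comma.
    config = json.loads(config_text)
    problems = []
    for index, query in enumerate(config.get("queries", [])):
        for key in query:
            if key not in KNOWN_QUERY_KEYS:
                problems.append((index, key))
    return problems

# Two queries, each in its own curly brackets; the second one misspells
# 'feature.anchor' as 'feature_anchor'.
example = """
{
"queries": [
    {"feature": "gene", "distance": 5000},
    {"feature": "transcript", "feature_anchor": "start"}
],
"gtf": "annotation.gtf",
"bed": "peaks.bed"
}
"""

print(unknown_query_keys(example))  # [(1, 'feature_anchor')]
```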

Each query can specify the following keys:

Query-specific keys

  • feature: Peaks will be annotated only to listed features from the 3rd column of the file specified by 'gtf'.

    Default: All available features from 'gtf'.

    Example: 'feature': ['gene','transcript'] or 'feature': 'exon'.

  • feature.anchor: The position(s) from which the distance to the peak center will be calculated. If multiple anchors are defined, the best annotation is the one with the minimum distance. Valid distances are less than or equal to the value of the 'distance' key (see below).

    Default: ['start', 'center', 'end']

    Example: 'feature.anchor': ['start']

  • distance: Maximum permitted distance from the genomic feature anchor to the peak center. If only one value is given, this distance is valid in both directions from the feature anchor. If two values are given, the first value corresponds to the distance upstream of the feature anchor, and the second value to the distance downstream of the feature anchor.

    Default: 100000

    Example: 'distance': [2000,5000] or 'distance': [5000] or 'distance': 5000.

  • strand: The desired strand of the annotated feature relative to the peak. A constraint on strand specificity is only successfully evaluated if strand information is available for the feature and the peak.

    Default: ['ignore', 'same', 'opposite']

    Example: 'strand': ['same'] or 'strand': 'same'.

  • direction: Defines the peak location relative to the feature’s location, with respect to its orientation. A peak is ‘upstream’ if its center is upstream of a feature anchor position. Accordingly, a peak is ‘downstream’ if its center is downstream of a feature anchor position. A visual representation is included in Figure 1 of section Application examples.

    Default: 'any_direction'

    Example: 'direction': ['upstream','downstream']

  • internals: A minimum overlap fraction for annotations where peaks are located (partially) within features or vice versa. This key allows peaks to be annotated to features with a wide size range, such as large genes, which would otherwise be removed due to the distance thresholds. Further, this key can be used to allow partial overlaps to be reported, even if the feature anchor is too far away. A visual representation is included in Figure 1 of section Application examples.

    Default: 0.0, which means that partially overlapping annotations are not reported if they are located outside of the distance thresholds.

    Furthermore, 'T', 'True', 'Y', 'Yes' are allowed and will be converted to 1.0, 'F', 'False', 'N', 'No' are allowed and will be converted to 0.0.

    Example: 'internals':'1.0' for allowing annotations where peaks are fully within features or vice versa. 'internals':'0.1' for allowing annotations where peaks overlap features (or vice versa) by at least 10 percent.

  • filter.attribute: Filters on an attribute found in the 9th column of the GTF file. If a 'filter.attribute' is given, only features that have the given 'attribute.value' for this attribute are kept as valid annotations. If this key is set, the key 'attribute.value' is mandatory, too (see below).

    Default: 'None'

    Example: 'filter.attribute': ['gene_type']

  • attribute.value: Corresponding attribute value for the 'filter.attribute' found in the 9th column of the GTF file. If a 'filter.attribute' is given, only features that have this 'attribute.value' for the attribute can be valid annotations.

    Default: 'None'

    Example: 'attribute.value': ['protein_coding']

  • show.attributes: A list of attributes found in the 9th column of the GTF file which should appear in the final output tables. If nonexistent attributes are specified, annotated peaks will display 'NA' for those attributes. If set to 'all', the list of attributes will be defined from all possible attributes in the annotated features from the GTF file.

    Default: 'None'

    Example: 'show.attributes':['gene_id', 'gene_biotype']
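The accepted forms of 'distance' and 'internals' described above can be sketched as two small normalization functions. This is an illustration of the documented rules, not UROPA's own code: the function names are hypothetical, and treating the yes/no strings case-insensitively is an assumption.

```python
def normalize_distance(value):
    """Return (upstream, downstream) limits from one value or a list of one or two values."""
    values = value if isinstance(value, list) else [value]
    if len(values) == 1:
        # A single value is valid in both directions from the feature anchor.
        return (int(values[0]), int(values[0]))
    if len(values) == 2:
        # First value: upstream limit; second value: downstream limit.
        return (int(values[0]), int(values[1]))
    raise ValueError("'distance' takes one or two values")

def normalize_internals(value):
    """Return the minimum overlap fraction as a float."""
    text = str(value).strip().lower()  # case-insensitive matching is an assumption
    if text == "":
        return 0.0  # empty values fall back to the documented default
    if text in {"t", "true", "y", "yes"}:
        return 1.0
    if text in {"f", "false", "n", "no"}:
        return 0.0
    return float(text)

print(normalize_distance(5000))          # (5000, 5000)
print(normalize_distance([2000, 5000]))  # (2000, 5000)
print(normalize_internals("Yes"))        # 1.0
print(normalize_internals("0.1"))        # 0.1
```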

Prioritizing queries

priority: Allows multiple queries to be treated as a hierarchy, which means that a peak can be annotated according to subsequent queries only if no match to the preceding query is found. If ‘False’, all given queries are weighted equally and any feature matching any of these queries will be a valid annotation. If only one query is provided, the value of ‘priority’ has no influence on the annotation process. Allowed values are one of 'T', 'True', 'Y', 'Yes' or 'F', 'False', 'N', 'No'.

Default: 'False'

Example: 'priority':'Yes'
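The difference between the two 'priority' settings can be sketched for a single peak as follows. This is a simplified illustration of the behaviour described above, not UROPA's implementation; the function name select_annotations and the feature names are hypothetical.

```python
def select_annotations(matches_per_query, priority):
    """Pick valid annotations for one peak.

    matches_per_query: list, in query order, of the features that each
    query matched for this peak (hypothetical input structure).
    """
    if priority:
        # Hierarchy: a later query is only used if all earlier ones found nothing.
        for matches in matches_per_query:
            if matches:
                return list(matches)
        return []
    # No hierarchy: matches from every query are equally valid.
    return [feature for matches in matches_per_query for feature in matches]

# The first query finds nothing, the second finds one gene, the third two exons.
matches = [[], ["geneA"], ["exon1", "exon2"]]

print(select_annotations(matches, priority=True))   # ['geneA']
print(select_annotations(matches, priority=False))  # ['geneA', 'exon1', 'exon2']
```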

Annotation database (GTF)

gtf: A path to a file in standard GTF format (9 columns), as described by Ensembl GTF format. The GTF file acts as the annotation database. If your annotation database is not in the Ensembl GTF format, UROPA can convert it. For more information see UROPA to GTF utility.

Required, no default.

Genomic regions (BED)

bed: A path to a file in BED format, as described by Ensembl BED format. The BED file can be any tab-delimited file containing the genomic regions, e.g. enriched regions from a peak-calling tool (e.g. MACS2, MUSIC, FindPeaks, CisGenome, PeakSeq), with a minimum of 3 columns (chr/start/stop).

Required, no default.
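To check that a region file meets the three-column minimum before passing it to UROPA, a line can be parsed as in the sketch below. The function parse_bed_line is a hypothetical helper for illustration and does not reflect how UROPA itself reads BED files.

```python
def parse_bed_line(line):
    """Split one tab-delimited BED line into (chrom, start, end, extra_columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least 3 columns (chr/start/stop)")
    # The first three columns are required; anything after them is optional.
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return chrom, start, end, fields[3:]

print(parse_bed_line("chr1\t100\t200\tpeak_1\n"))  # ('chr1', 100, 200, ['peak_1'])
```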