Alignment

The alignment process takes the available mappings and produces a set of stable merges (output/domain_ontology/merges.csv). In merge example (one to one), the node SNOMED:31387002 is deemed the same as, i.e. a duplicate of, the node MONDO:0004979 (indeed, both nodes represent the disease concept asthma).

merge example (one to one)

source_id          target_id
SNOMED:31387002    MONDO:0004979

Stable means that, from the perspective of the source ontology, there are no splits: each source node is merged into at most one target node. In merge example (many to one), two OMIM nodes are mapped to the same MONDO node. This retains the concept representation granularity of the target ontology but loses the granularity of the source ontology.

merge example (many to one)

source_id          target_id
OMIM:608584        MONDO:0004979
OMIM:600807        MONDO:0004979

In cases where a source concept would map to two different concepts of the target ontology, merging is inappropriate; such cases are handled by the connectivity process instead.
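The stability property can be checked mechanically. The following is a minimal sketch, assuming merges are held as (source_id, target_id) pairs; the helper names find_splits and is_stable are hypothetical and not part of the tool. Many-to-one merges pass, while a source node merged into two targets is reported as a split.

```python
from collections import defaultdict


def find_splits(merges):
    """Return {source_id: [target_ids]} for every source merged into more than one target."""
    targets_by_source = defaultdict(set)
    for source_id, target_id in merges:
        targets_by_source[source_id].add(target_id)
    return {
        source_id: sorted(targets)
        for source_id, targets in targets_by_source.items()
        if len(targets) > 1
    }


def is_stable(merges):
    """A merge set is stable when no source node is split across several targets."""
    return not find_splits(merges)


# The two example tables combined: one-to-one and many-to-one merges are both stable.
merges = [
    ("SNOMED:31387002", "MONDO:0004979"),
    ("OMIM:608584", "MONDO:0004979"),
    ("OMIM:600807", "MONDO:0004979"),
]
print(is_stable(merges))  # True
```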

Steps

Establishing source alignment order

First we produce the source alignment order. The alignment process is a sequence of steps; in each step we attempt to merge nodes onto a given ontology. The goal is to end up with the minimal number of nodes, drawn from the minimal number of different sources. Therefore the alignment order puts the seed ontology first (this should have the most mappings and the desired hierarchy), followed by the remaining ontologies in descending order of node frequency. For example, for the example data set the order would be MONDO, MEDDRA, ICD10CM, MESH, ....
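A minimal sketch of how such an order could be computed, assuming the source of a node is its CURIE prefix (the text before the first colon) and that "frequency" means the number of nodes per source. The function name alignment_order is hypothetical, and the node IDs other than MONDO:0004979 are made up for illustration.

```python
from collections import Counter


def alignment_order(node_ids, seed):
    """Seed ontology first, then the other sources by descending node count.

    Assumes the source of a node is the CURIE prefix before the first ':'.
    Counter.most_common keeps first-seen order for sources with equal counts.
    """
    counts = Counter(node_id.split(":", 1)[0] for node_id in node_ids)
    others = [source for source, _ in counts.most_common() if source != seed]
    return [seed] + others


# Toy node list: only MONDO:0004979 is a real ID.
nodes = [
    "MONDO:0004979",
    "MEDDRA:1", "MEDDRA:2", "MEDDRA:3",
    "ICD10CM:1", "ICD10CM:2",
    "MESH:1",
]
print(alignment_order(nodes, seed="MONDO"))
# ['MONDO', 'MEDDRA', 'ICD10CM', 'MESH']
```

The seed is placed first unconditionally, so it leads the order even when another source contributes more nodes, as MEDDRA does in this toy list.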

Pre-processing mappings

Next we preprocess the mappings.

Mappings typically contain internal code reassignments: mappings between nodes of the same source that point deprecated nodes to their new code(s). In the internal code reassignment mapping example, the node MONDO:0022856 was deprecated and replaced by the node MONDO:0001217. The input table nodes_obsolete.csv is used to determine which node ID is deprecated and which is current, because some input mappings may be parsed inconsistently with regard to directionality.

internal code reassignment mapping example

source_id          target_id
MONDO:0022856      MONDO:0001217
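Because the direction of an internal mapping cannot be trusted, the deprecated side has to be identified from the obsolete-node list. A minimal sketch, assuming the IDs from nodes_obsolete.csv have already been loaded into a set and that the source of a node is its CURIE prefix; the names source_of and internal_reassignments are hypothetical.

```python
def source_of(node_id):
    """CURIE prefix, e.g. 'MONDO' for 'MONDO:0022856' (assumed naming scheme)."""
    return node_id.split(":", 1)[0]


def internal_reassignments(mappings, obsolete_ids):
    """Map each deprecated node ID to its current ID.

    Only same-source mappings are considered. Direction is taken from
    obsolete_ids (assumed to be loaded from nodes_obsolete.csv), not from the
    column order, because input mappings may be flipped. Pairs where neither
    or both IDs are obsolete are skipped.
    """
    reassigned = {}
    for id_a, id_b in mappings:
        if source_of(id_a) != source_of(id_b):
            continue  # cross-source mapping: not an internal reassignment
        a_obsolete = id_a in obsolete_ids
        b_obsolete = id_b in obsolete_ids
        if a_obsolete and not b_obsolete:
            reassigned[id_a] = id_b
        elif b_obsolete and not a_obsolete:
            reassigned[id_b] = id_a
    return reassigned


# The example mapping, deliberately given in the "wrong" direction.
mappings = [("MONDO:0001217", "MONDO:0022856")]
print(internal_reassignments(mappings, obsolete_ids={"MONDO:0022856"}))
# {'MONDO:0022856': 'MONDO:0001217'}
```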

These internal code reassignments are removed from the full mapping set (mappings.csv). The remaining mappings are then updated using the internal code reassignment mappings, i.e. references to deprecated node IDs are replaced with the current IDs (the full mapping set may contain mappings from different sources that are not necessarily up to date, as described here).
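A minimal sketch of this update step, assuming mappings are (source_id, target_id) pairs and the reassignments are a dictionary from deprecated ID to current ID. The function name apply_reassignments is hypothetical, and the sketch applies only a single reassignment step (it does not follow chains such as A to B to C). The OMIM:TOY1 ID is made up for illustration.

```python
def apply_reassignments(mappings, reassigned):
    """Drop internal code reassignments and rewrite deprecated IDs.

    reassigned maps deprecated ID -> current ID. Duplicates that appear after
    rewriting are removed while preserving the original order.
    """
    updated = []
    for source_id, target_id in mappings:
        is_internal = (
            reassigned.get(source_id) == target_id
            or reassigned.get(target_id) == source_id
        )
        if is_internal:
            continue  # the reassignment itself is not a cross-source mapping
        updated.append((reassigned.get(source_id, source_id),
                        reassigned.get(target_id, target_id)))
    return list(dict.fromkeys(updated))


reassigned = {"MONDO:0022856": "MONDO:0001217"}
mappings = [
    ("MONDO:0022856", "MONDO:0001217"),  # internal reassignment: removed
    ("OMIM:TOY1", "MONDO:0022856"),      # outdated target: rewritten
    ("OMIM:TOY1", "MONDO:0001217"),      # duplicate after rewriting: dropped
]
print(apply_reassignments(mappings, reassigned))
# [('OMIM:TOY1', 'MONDO:0001217')]
```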

Aligning sources for mapping type groups

The alignment is run in several batches, where each batch aligns nodes to each source in the source alignment order. The first batch uses the strongest mapping relation type group, equivalence; subsequent batches use database reference and then the remaining type groups.
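The batch structure amounts to a nested loop: the outer loop runs over mapping type groups from strongest to weakest, and the inner loop runs over sources in alignment order. A minimal sketch follows. The group labels in TYPE_GROUPS are hypothetical ("other" stands in for "the rest"), and align_nodes_to_source is a hypothetical callback standing for the per-source procedure.

```python
from itertools import product

# Hypothetical group labels, strongest first.
TYPE_GROUPS = ["equivalence", "database_reference", "other"]


def alignment_steps(source_order, type_groups=TYPE_GROUPS):
    """List (type_group, source) pairs in processing order.

    product() varies the last argument fastest, so every source is aligned
    with the strongest group before the next group is tried.
    """
    return list(product(type_groups, source_order))


def run_alignment(source_order, align_nodes_to_source, type_groups=TYPE_GROUPS):
    """Accumulate merges across all batches.

    align_nodes_to_source(source, type_group, merged_ids) is a hypothetical
    callback returning new (source_id, target_id) merges for one step.
    """
    merges = []
    merged_ids = set()
    for type_group, source in alignment_steps(source_order, type_groups):
        new_merges = align_nodes_to_source(source, type_group, merged_ids)
        merges.extend(new_merges)
        # Merged source nodes are excluded from all later steps.
        merged_ids.update(source_id for source_id, _ in new_merges)
    return merges


print(alignment_steps(["MONDO", "MEDDRA"])[:3])
# [('equivalence', 'MONDO'), ('equivalence', 'MEDDRA'), ('database_reference', 'MONDO')]
```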

Aligning nodes to a source

This process is repeated for each source.

  1. Filter mappings for the source: filters the available mappings to
    find all where either the source or the target node is from the given
    source (e.g. MONDO)
  2. Filter mappings for the permitted type: filters source mappings for
    the mapping type group (e.g. equivalence)
  3. Orient mappings towards source: source and target node IDs are
    flipped where necessary so that the target node ID is always from the
    ontology that we are aligning onto (e.g. the mapping MONDO:0004979,
    OMIM:608584 becomes OMIM:608584, MONDO:0004979)
  4. Get one or many source to one target node mappings:
    • mappings are de-duplicated: if two mappings have the same source
      and target node IDs but different mapping relations, they are
      reduced to one (this is safe because both relations belong to the
      same type group)
    • filtering for unmapped nodes: only mappings whose source node ID
      has not already been merged are retained
    • filtering for multiplicity: only mappings that do not form a one
      source to many target (i.e. split) mapping cluster are kept, so the
      remaining mappings are all one or many source to one target node
      mappings (the dropped mappings are saved for debugging in the folder
      PROJECT_FOLDER/output/intermediate/dropped_mappings, with the step
      ID, aligned source ID and mapping strength in the file name, e.g.
      ../equivalence_1_MONDO.csv)
  5. The filtered mappings are saved as merges onto the source
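The five steps above can be sketched as a single function. This is an illustrative sketch, not the tool's implementation: it assumes mappings are records with source_id, target_id, relation and type_group fields, that a node's source is its CURIE prefix, and that merged_ids holds source node IDs merged in earlier steps. The names Mapping, source_of and align_nodes_to_source are hypothetical, the relation strings are illustrative, and the IDs containing TOY are made up. Dropped mappings are returned rather than written to the debug folder.

```python
from collections import defaultdict, namedtuple

# Assumed record layout for one mapping.
Mapping = namedtuple("Mapping", "source_id target_id relation type_group")


def source_of(node_id):
    return node_id.split(":", 1)[0]


def align_nodes_to_source(mappings, source, type_group, merged_ids):
    """Return (merges, dropped) for one alignment step."""
    # 1. Keep mappings where either node is from `source`.
    kept = [m for m in mappings
            if source in (source_of(m.source_id), source_of(m.target_id))]
    # 2. Keep only the permitted mapping type group.
    kept = [m for m in kept if m.type_group == type_group]
    # 3. Orient so the target ID is always from `source`; the relation is
    #    dropped because all remaining mappings share one type group.
    pairs = []
    for m in kept:
        s_in = source_of(m.source_id) == source
        t_in = source_of(m.target_id) == source
        if s_in and t_in:
            continue  # same-source pair: not expected after pre-processing
        pairs.append((m.source_id, m.target_id) if t_in else (m.target_id, m.source_id))
    # 4a. De-duplicate pairs that differed only in their relation.
    pairs = list(dict.fromkeys(pairs))
    # 4b. Keep only pairs whose source node has not been merged yet.
    pairs = [(s, t) for s, t in pairs if s not in merged_ids]
    # 4c. Drop one-source-to-many-target (split) clusters.
    targets = defaultdict(set)
    for s, t in pairs:
        targets[s].add(t)
    merges = [(s, t) for s, t in pairs if len(targets[s]) == 1]
    dropped = [(s, t) for s, t in pairs if len(targets[s]) > 1]
    # 5. `merges` become merges onto `source`; `dropped` is kept for debugging.
    return merges, dropped


mappings = [
    Mapping("MONDO:0004979", "OMIM:608584", "skos:exactMatch", "equivalence"),
    Mapping("OMIM:608584", "MONDO:0004979", "owl:equivalentClass", "equivalence"),
    Mapping("OMIM:600807", "MONDO:0004979", "skos:exactMatch", "equivalence"),
    Mapping("OMIM:TOY1", "MONDO:TOY_A", "skos:exactMatch", "equivalence"),
    Mapping("OMIM:TOY1", "MONDO:TOY_B", "skos:exactMatch", "equivalence"),
    Mapping("SNOMED:31387002", "MONDO:0004979", "xref", "database_reference"),
]
merges, dropped = align_nodes_to_source(mappings, "MONDO", "equivalence", merged_ids=set())
print(merges)   # [('OMIM:608584', 'MONDO:0004979'), ('OMIM:600807', 'MONDO:0004979')]
print(dropped)  # [('OMIM:TOY1', 'MONDO:TOY_A'), ('OMIM:TOY1', 'MONDO:TOY_B')]
```

In this toy input the first two mappings collapse into one after orientation, the SNOMED mapping waits for the database reference batch, and OMIM:TOY1 is dropped because it would split across two MONDO nodes.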