Project structure and data¶
This page describes the project folder structure, and the input and output data.
Project folder organisation¶
We recommend creating an empty folder for the project. The path of this
project folder is the input to the API. The input folder must be created
manually and populated with the data; the remaining folders are generated
by the API.
PROJECT_FOLDER
├── input
│   └── ...
└── output
    ├── domain_ontology
    │   ├── ...
    ├── intermediate
    │   ├── ...
    └── report
        └── ...
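As a minimal sketch of the manual step described above, the input folder can be created with Python's standard library. The function name `create_project_skeleton` is hypothetical and not part of the API; only the `input` folder is created here, since the output folders are generated by the API.

```python
from pathlib import Path


def create_project_skeleton(project_folder: str) -> Path:
    """Create PROJECT_FOLDER/input and return the path of the input folder.

    Hypothetical helper for illustration only; it is not part of the API.
    """
    input_dir = Path(project_folder) / "input"
    # parents=True also creates PROJECT_FOLDER itself if it is missing;
    # exist_ok=True makes the call safe to repeat.
    input_dir.mkdir(parents=True, exist_ok=True)
    return input_dir
```

The returned path is where the input tables are then placed by hand.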
Data schema¶
General points:
The default_id column is mandatory (nodes and nodes_obsolete tables).
The default_label column is optional in all tables.
source_id and target_id must be in the same format as default_id; for more on this, please read the node ID schema.
Node ID schema¶
All node IDs must be formatted as NAMESPACE:CODE, for example MONDO:0000123 or SNOMED:1233425, where:
NAMESPACE: denotes the source of the node, e.g. MONDO, SNOMED; this should be all caps for uniformity.
CODE: is the actual node identifier code, which must not contain the : character.
The NAMESPACE and CODE parts must be concatenated by the : character.
There must be only one : character in the node ID.
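The rules above can be sketched as a small check. The function name `is_valid_node_id` is hypothetical and not part of the API. Note one assumption: the text says the namespace "should" be all caps, and this sketch treats that as a hard requirement.

```python
def is_valid_node_id(node_id: str) -> bool:
    """Return True if node_id follows the NAMESPACE:CODE rules.

    Hypothetical helper for illustration only; it is not part of the API.
    """
    # Exactly one ':' in the ID means splitting yields exactly two parts.
    parts = node_id.split(":")
    if len(parts) != 2:
        return False
    namespace, code = parts
    # Both the namespace and the code must be non-empty.
    if not namespace or not code:
        return False
    # Assumption: the all-caps recommendation is enforced strictly here.
    return namespace == namespace.upper()
```

For example, `is_valid_node_id("MONDO:0000123")` passes, while an ID with a second `:` or a lowercase namespace is rejected.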
Input data¶
The input consists of four tables with the schema specified below.
All four tables must be present; however, nodes_obsolete.csv
can be an empty table containing only a header.
PROJECT_FOLDER
├── input
│   ├── config.json
│   ├── edges_hierarchy.csv
│   ├── mappings.csv
│   ├── nodes.csv
│   └── nodes_obsolete.csv
├── ...
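Before running the process, it can help to confirm that the four required tables are in place. The following is a minimal sketch; `missing_input_tables` is a hypothetical helper, not part of the API, and it checks only the four tables named above (config.json, which also appears in the tree, is not checked).

```python
from pathlib import Path
from typing import List

# The four input tables listed in this section.
REQUIRED_TABLES = (
    "edges_hierarchy.csv",
    "mappings.csv",
    "nodes.csv",
    "nodes_obsolete.csv",
)


def missing_input_tables(project_folder: str) -> List[str]:
    """Return the names of required tables absent from PROJECT_FOLDER/input.

    Hypothetical helper for illustration only; it is not part of the API.
    """
    input_dir = Path(project_folder) / "input"
    return [name for name in REQUIRED_TABLES if not (input_dir / name).is_file()]
```

An empty list means all four tables were found.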
Table nodes.csv¶
default_id | default_label
MONDO:0004979 | asthma
Table nodes_obsolete.csv¶
default_id | default_label
MONDO:0006775 | obsolete haemophilus influenzae meningitis
Table edges_hierarchy.csv¶
source_id | target_id | relation | prov
MONDO:0008798 | MONDO:0019211 | rdfs:subClassOf | MONDO
Table mappings.csv¶
source_id | target_id | relation | prov
MONDO:0004979 | SNOMED:31387002 | equivalent_to | MONDO
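To make the schemas concrete, the sketch below writes the example rows from the nodes, edges_hierarchy and mappings tables into CSV files, and writes nodes_obsolete.csv as a header-only table, which the text allows. The function name `write_example_tables` is hypothetical, and the comma delimiter and column order are assumptions based on the `.csv` extension and the tables shown here.

```python
import csv
from pathlib import Path

EDGE_COLUMNS = ["source_id", "target_id", "relation", "prov"]
NODE_COLUMNS = ["default_id", "default_label"]

# File name -> (header, rows); rows are the examples shown in this section.
EXAMPLE_TABLES = {
    "nodes.csv": (NODE_COLUMNS, [["MONDO:0004979", "asthma"]]),
    # Header only: an empty nodes_obsolete table is permitted.
    "nodes_obsolete.csv": (NODE_COLUMNS, []),
    "edges_hierarchy.csv": (
        EDGE_COLUMNS,
        [["MONDO:0008798", "MONDO:0019211", "rdfs:subClassOf", "MONDO"]],
    ),
    "mappings.csv": (
        EDGE_COLUMNS,
        [["MONDO:0004979", "SNOMED:31387002", "equivalent_to", "MONDO"]],
    ),
}


def write_example_tables(input_dir: str) -> None:
    """Write the four example tables into input_dir (which must exist).

    Hypothetical helper for illustration only; it is not part of the API.
    """
    for name, (header, rows) in EXAMPLE_TABLES.items():
        path = Path(input_dir) / name
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
```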
Output data¶
PROJECT_FOLDER
├── input
│   └── ...
└── output
    ├── domain_ontology
    │   ├── edges_hierarchy.csv
    │   ├── mappings.csv
    │   ├── merges.csv
    │   └── nodes.csv
    ├── intermediate
    │   ├── data_tests
    │   │   └── ...
    │   ├── dropped_mappings
    │   │   └── equivalence_1_MONDO.csv
    │   ├── ...
    └── report
        ├── data_docs
        │   └── ...
        ├── data_profile_reports
        │   ├── nodes_report.html
        │   └── ...
        ├── index.html
        └── logs
            └── onto-merger.logger
Folders¶
domain_ontology: contains the output of the alignment process, i.e. the final merged ontology in table format.
intermediate: contains the intermediate files generated during the alignment process, data testing and data profiling.
report: contains the report of the alignment process with links to the data profiling and data testing pages. Also includes the log output file: onto-merger.logger.
Table edges_hierarchy.csv¶
Same format as in the input set. Contains the merged ontology hierarchy.
Table mappings.csv¶
Same format as in the input set. Node IDs may be updated with canonical node IDs and/or internal code reassignments.
Table merges.csv¶
source_id | target_id
SNOMED:31387002 | MONDO:0004979
Table nodes.csv¶
Same format as in the input set. Contains the final set of nodes (i.e. canonical nodes). Merged nodes are excluded from this table.
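Because merged nodes do not appear in the output nodes.csv, a downstream consumer may want to translate an original ID into the ID it was merged into. The sketch below is one way to do that with the standard library. It rests on assumptions not stated in this page: that each row of merges.csv means source_id was merged into target_id, that the file is comma-delimited with a header row, and that one lookup is enough (merge chains are not followed). The function names are hypothetical and not part of the API.

```python
import csv
from typing import Dict


def load_merges(path: str) -> Dict[str, str]:
    """Read merges.csv into a {source_id: target_id} dictionary.

    Hypothetical helper for illustration only; it is not part of the API.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["source_id"]: row["target_id"] for row in csv.DictReader(handle)}


def canonical_id(node_id: str, merges: Dict[str, str]) -> str:
    """Return the merge target for node_id, or node_id itself if it was not merged."""
    # Assumption: a single lookup suffices; merge chains are not followed.
    return merges.get(node_id, node_id)
```

With the example row above, looking up SNOMED:31387002 would yield MONDO:0004979, while an ID with no merge row is returned unchanged.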