umls2neo4j: UMLS to Neo4j Importer
This Python script parses selected relationships from the UMLS Metathesaurus (MRREL.RRF and MRCONSO.RRF) and loads them into a Neo4j graph database.
Important
Requires a UMLS licence!
Features
- Filters and loads PAR (parent) and CHD (child) relationships from MRREL.RRF
- Loads only preferred English concept names from MRCONSO.RRF
Quickstart
Create a configuration file that stores your database connection details, e.g. umls.conf in your home directory.
[neo4j]
uri = bolt://localhost:7687
username = neo4j
password = myfancypassword
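For illustration, a file in this format can be read with Python's standard configparser module. This is a minimal sketch, not necessarily how umls2neo4j.py does it; the function name read_neo4j_settings is hypothetical.

```python
import configparser
from pathlib import Path


def read_neo4j_settings(conf_path):
    """Return (uri, username, password) from the [neo4j] section.

    Hypothetical helper for illustration; the real script may differ.
    """
    # interpolation=None keeps a literal "%" in the password intact.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips unreadable files, so check its return value.
    if not parser.read(Path(conf_path).expanduser()):
        raise FileNotFoundError(f"cannot read configuration file: {conf_path}")
    section = parser["neo4j"]
    return section["uri"], section["username"], section["password"]
```

The three values could then be handed to the official Neo4j Python driver, for example GraphDatabase.driver(uri, auth=(username, password)).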
Start the program by passing the location of your configuration file and the locations of the UMLS files:
python3 src/umls2neo4j.py --conf ~/umls.conf --mrconsofiles ~/umls/MRCONSO.RRF --mrrelfiles ~/umls/MRREL.RRF
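The three options in the command above could be declared with argparse roughly as follows. This is a sketch only: the option names come from the command line shown, while accepting several files per option (nargs="+") is an assumption based on the plural names, and build_arg_parser is a hypothetical function name.

```python
import argparse


def build_arg_parser():
    """Build a parser for the options shown in the Quickstart command."""
    parser = argparse.ArgumentParser(
        description="Load selected UMLS relationships into Neo4j"
    )
    parser.add_argument("--conf", required=True,
                        help="path to the configuration file")
    # Assumption: the plural option names suggest one or more files each.
    parser.add_argument("--mrconsofiles", nargs="+", required=True,
                        help="one or more MRCONSO.RRF files")
    parser.add_argument("--mrrelfiles", nargs="+", required=True,
                        help="one or more MRREL.RRF files")
    return parser
```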
Requirements
- make sure python3 is installed
- install the required libraries with pip install -r requirements.txt
- download the UMLS Metathesaurus files (MRREL.RRF, MRCONSO.RRF); this requires a UMLS licence
- have a running Neo4j DB (Neo4j version 5) with APOC installed
- create the configuration file as described in the Quickstart section
Detailed Information
The script will:
- Load preferred English concept names from MRCONSO.RRF
- Parse allowed relationships from MRREL.RRF
- Insert nodes and relationships into Neo4j using chunked batches
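The first two steps can be sketched as follows. The column positions follow the standard pipe-delimited RRF layout (MRCONSO: CUI, LAT, TS, LUI, STT, SUI, ISPREF, ..., STR at index 14; MRREL: CUI1, AUI1, STYPE1, REL, CUI2, ...). Treating "preferred" as TS = P, STT = PF and ISPREF = Y is an assumption about what the script does, and both function names are hypothetical.

```python
ALLOWED_RELS = {"PAR", "CHD"}  # the relationship types named in this README


def _split_rrf(line):
    # RRF rows are pipe-delimited and end with a trailing pipe.
    return line.rstrip("\r\n").split("|")


def load_preferred_english_names(lines):
    """Map CUI -> preferred English name from MRCONSO.RRF rows."""
    names = {}
    for line in lines:
        row = _split_rrf(line)
        cui, lat, ts, stt, ispref, name = (
            row[0], row[1], row[2], row[4], row[6], row[14]
        )
        # Assumption: "preferred" means TS=P, STT=PF and ISPREF=Y.
        if lat == "ENG" and ts == "P" and stt == "PF" and ispref == "Y":
            names.setdefault(cui, name)  # keep the first match per concept
    return names


def parse_allowed_relationships(lines, allowed=ALLOWED_RELS):
    """Yield (CUI1, REL, CUI2) for MRREL.RRF rows whose REL is allowed."""
    for line in lines:
        row = _split_rrf(line)
        cui1, rel, cui2 = row[0], row[3], row[4]
        if rel in allowed:
            yield cui1, rel, cui2
```

Both functions accept any iterable of lines, so an open file handle works directly, e.g. load_preferred_english_names(open(path, encoding="utf-8")). Note that in MRREL the REL value describes how the second concept relates to the first, which matters when choosing the direction of the edge in the graph.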
Customisation
- Adjust ALLOWED_RELS in the script to include more relationship types
- Tune batch_chunk_size and apoc_batch_size for better performance
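To make the role of batch_chunk_size concrete, here is a minimal sketch of chunked insertion. It assumes a session object with a run(query, **parameters) method, as provided by the official Neo4j Python driver. The Concept label, the cui and name properties, the Cypher statement and the function names are all hypothetical, and the script's actual queries (including how it uses APOC and apoc_batch_size) may look different.

```python
from itertools import islice

# Hypothetical Cypher statement: one round trip inserts a whole chunk.
MERGE_CONCEPTS = (
    "UNWIND $rows AS row "
    "MERGE (c:Concept {cui: row.cui}) "
    "SET c.name = row.name"
)


def chunked(items, size):
    """Yield lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_concepts(session, names, batch_chunk_size=10_000):
    """Send the CUI -> name mapping to Neo4j one chunk at a time."""
    for chunk in chunked(names.items(), batch_chunk_size):
        rows = [{"cui": cui, "name": name} for cui, name in chunk]
        session.run(MERGE_CONCEPTS, rows=rows)
```

Larger chunks mean fewer round trips but more memory per request, on both the client and the server; that is the trade-off to keep in mind when tuning the value.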