BRAinS
This repo contains the code for creating the BRAinS-Graph (Biomedical Knowledge Graph for Recommending and Analysing Health Studies). It loads data from four sources into a Neo4j graph database:
- ClinicalTrials.gov
- The Portal of Medical Data Models (MDM Portal)
- Medical Subject Headings (MeSH)
- Unified Medical Language System (UMLS)
Structure of the repository
The repository consists of three dataloaders (study2neo4j, moi, umls2neo4j) and a postprocessing script (postprocessing).
Note
The dataloader for the MDM Portal, mdm2neo4j, lives in a separate repository (the mdm2neo4j repo). It must be run as well to create the complete BRAinS-Graph.
File Structure
src/
│
├── study2neo4j
│ ├── run.py # Main script to execute the ClinicalTrials.gov import
│ ├── ct2neo4j.py # Functions for database connecting and data importing
│ └── README.md # Documentation for setup and usage
│
├── moi
│ ├── moi.py # Main script to execute the ontology import and processing
│ ├── methods.py # Functions for configuring, importing, and processing the graph
│ └── README.md # Documentation for setup and usage
│
├── umls2neo4j
│ ├── umls2neo4j.py # Main script to execute the UMLS import
│ ├── methods_umls2neo4j.py # Functions for loading CUI names, parsing relations, and loading into Neo4j
│ └── README.md # Documentation for setup and usage
│
└── postprocessing
  └── postprocess.py # Main script for postprocessing (creating relationships between data sources)
How to
Create a configuration file that stores your database connection details, e.g., in your home directory under the name brains.conf.
It should have the following structure:
[neo4j]
uri = bolt://localhost:7687
username = neo4j
password = myfancypassword
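A minimal sketch of how a file with this structure can be read, using only Python's standard configparser module. The helper names parse_neo4j_config and load_neo4j_config and the default path ~/brains.conf are assumptions made for illustration; the dataloaders may read the configuration differently (see their README.md files).

```python
import configparser
from pathlib import Path

# Example content mirroring the structure shown above (placeholder credentials).
EXAMPLE_CONF = """\
[neo4j]
uri = bolt://localhost:7687
username = neo4j
password = myfancypassword
"""


def parse_neo4j_config(text):
    """Parse configuration text and return (uri, username, password)."""
    # interpolation=None keeps a '%' inside the password from being
    # treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["neo4j"]  # raises KeyError if the section is missing
    return section["uri"], section["username"], section["password"]


def load_neo4j_config(path=None):
    """Read the file at `path` (assumed default: ~/brains.conf)."""
    path = Path(path) if path is not None else Path.home() / "brains.conf"
    return parse_neo4j_config(path.read_text(encoding="utf-8"))


uri, username, password = parse_neo4j_config(EXAMPLE_CONF)
print(uri, username)
```

Parsing the file once up front makes a missing section or key fail early, before any import work starts.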
Run the dataloaders study2neo4j, moi, and umls2neo4j, which can be found under 📁 src.
Instructions for usage are provided in the individual README.md files.
Run the dataloader mdm2neo4j from the mdm2neo4j repo.
Important
In general, the individual dataloaders (study2neo4j, moi, umls2neo4j) can be run in any order. The postprocessing only works once data has been loaded into Neo4j, so it should be run last, after all dataloaders have finished. Do not forget to run mdm2neo4j as a dataloader as well.
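The ordering rule above can be sketched as a small driver script. The script paths come from the file structure listed earlier; everything else is an assumption: the helper names build_pipeline and run_pipeline are hypothetical, and the scripts are invoked without arguments only for illustration, since the actual usage is documented in the individual README.md files. By default the sketch performs a dry run and only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Script paths taken from the file structure above; these may run in any order.
DATALOADERS = [
    Path("src/study2neo4j/run.py"),
    Path("src/moi/moi.py"),
    Path("src/umls2neo4j/umls2neo4j.py"),
]
POSTPROCESSING = Path("src/postprocessing/postprocess.py")


def build_pipeline(extra_loaders=()):
    """Return the scripts in a valid order: all dataloaders, then postprocessing."""
    return [*DATALOADERS, *(Path(p) for p in extra_loaders), POSTPROCESSING]


def run_pipeline(scripts, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for script in scripts:
        command = [sys.executable, str(script)]
        print("would run:" if dry_run else "running:", " ".join(command))
        if not dry_run:
            # check=True stops the pipeline as soon as one loader fails,
            # so postprocessing never runs on a partially loaded graph.
            subprocess.run(command, check=True)


run_pipeline(build_pipeline())
```

The entry script of mdm2neo4j lives in a separate repository, so its path is not known here; it can be passed through the extra_loaders parameter once it is checked out locally.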
Requirements
- make sure python3 is installed
- install the required libraries with pip install -r requirements.txt
- have a running Neo4j instance (Neo4j version 5) with APOC and neosemantics installed
- create the configuration file as described in the How to section
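The Neo4j requirements (version 5, APOC, neosemantics) can be checked before starting an import. The following is a sketch and not part of the repository: check_neo4j_requirements is a hypothetical helper that receives a callable run_query, assumed to execute a Cypher string and return the rows as a list of dictionaries. It relies on the Cypher commands CALL dbms.components() and SHOW PROCEDURES, and on the fact that APOC and neosemantics procedures are prefixed with apoc. and n10s. respectively.

```python
COMPONENTS_QUERY = "CALL dbms.components() YIELD name, versions RETURN name, versions"
PROCEDURES_QUERY = "SHOW PROCEDURES YIELD name RETURN name"


def check_neo4j_requirements(run_query):
    """Return a list of unmet requirements (an empty list means all checks passed).

    `run_query` is a hypothetical callable: Cypher string -> list of dict rows.
    """
    problems = []

    # Collect every reported version string and look for major version 5.
    versions = [v for row in run_query(COMPONENTS_QUERY) for v in row["versions"]]
    if not any(v.split(".")[0] == "5" for v in versions):
        problems.append(f"Neo4j version 5 required, found {versions}")

    # A plugin counts as installed if at least one of its procedures is registered.
    procedures = [row["name"] for row in run_query(PROCEDURES_QUERY)]
    for plugin, prefix in (("APOC", "apoc."), ("neosemantics", "n10s.")):
        if not any(name.startswith(prefix) for name in procedures):
            problems.append(f"{plugin} procedures ({prefix}*) not found")
    return problems


def fake_run_query(query):
    """Stand-in for a real database call, used only to demonstrate the check."""
    if query == COMPONENTS_QUERY:
        return [{"name": "Neo4j Kernel", "versions": ["5.12.0"]}]
    return [{"name": "apoc.help"}, {"name": "n10s.graphconfig.init"}]


print(check_neo4j_requirements(fake_run_query))
```

With the official Neo4j Python driver, run_query could presumably wrap an open session, for example lambda q: session.run(q).data(); that wiring is left out here so the sketch runs without a database.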
Licence
This program is released under Version 3 of the GPL or any later version.