This repository contains the code for creating the BRAinS-Graph (Biomedical Knowledge Graph for Recommending and Analysing Health Studies). It loads data from four sources (ClinicalTrials.gov, ontologies, UMLS, and the MDM Portal) into a Neo4j graph database.

Structure of the repository

The repository consists of three dataloaders (study2neo4j, moi, umls2neo4j) and a postprocessing script (postprocessing).

Note

The dataloader for the MDM Portal, mdm2neo4j, lives in a separate repository: mdm2neo4j repo. It must also be run to create the complete BRAinS-Graph.

File Structure

src/
│
├── study2neo4j
│   ├── run.py                  # Main script to execute the ClinicalTrials.gov import
│   ├── ct2neo4j.py             # Functions for database connecting and data importing
│   └── README.md               # Documentation for setup and usage
│
├── moi
│   ├── moi.py                  # Main script to execute the ontology import and processing
│   ├── methods.py              # Functions for configuring, importing, and processing the graph
│   └── README.md               # Documentation for setup and usage
│
├── umls2neo4j
│   ├── umls2neo4j.py           # Main script to execute the umls import
│   ├── methods_umls2neo4j.py   # Functions for loading cui names, parsing relations, loading into neo4j
│   └── README.md               # Documentation for setup and usage
│
└── postprocessing
    └── postprocess.py          # Main script for postprocessing (creating relationships between data sources)

How to

Create a configuration file that stores your database connection details, e.g., a file named brains.conf in your home directory. It should have the following structure:

[neo4j]
uri = bolt://localhost:7687
username = neo4j
password = myfancypassword
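
As an illustration of how a file with this structure can be read, here is a minimal sketch using Python's standard configparser module. The function name load_neo4j_config and the location ~/brains.conf are assumptions made for this example; the dataloaders in this repository may read the file differently, so check their README.md files.

```python
import configparser
from pathlib import Path


def load_neo4j_config(path):
    """Read the [neo4j] section and return (uri, username, password).

    Hypothetical helper for illustration; not part of this repository.
    """
    # interpolation=None keeps '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    # ConfigParser.read() silently skips missing files, so check the result.
    if not parser.read(path):
        raise FileNotFoundError(f"Configuration file not found: {path}")
    section = parser["neo4j"]
    return section["uri"], section["username"], section["password"]


if __name__ == "__main__":
    # Assumed location: brains.conf in the home directory.
    conf_path = Path.home() / "brains.conf"
    if conf_path.exists():
        uri, username, _password = load_neo4j_config(conf_path)
        print(f"Would connect to {uri} as {username}")
    else:
        print(f"No configuration file at {conf_path}")
```

The returned values could then be passed to a Neo4j client, for example neo4j.GraphDatabase.driver(uri, auth=(username, password)) from the official Python driver.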

Run the dataloaders study2neo4j, moi, and umls2neo4j, which can be found under 📁 src. Usage instructions are provided in the individual README.md files. Also run the dataloader from the mdm2neo4j repo.

Important

In general, the individual dataloaders (study2neo4j, moi, umls2neo4j) can be run in any order. The postprocessing only works once data has been loaded into Neo4j, so it should be run last, after all dataloaders have finished. Do not forget to run mdm2neo4j as a dataloader as well.
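
The ordering rule can be sketched as a small run plan. The script paths below are taken from the file structure above; the helper build_run_plan is hypothetical, and the sketch assumes the scripts are started from the repository root. It only prints the order and does not execute anything, because each script may need arguments described in its own README.md.

```python
from pathlib import Path

# Entry points as listed in the file structure. Any command-line arguments
# they may need are not modelled here (see the individual README.md files).
DATALOADERS = [
    Path("src/study2neo4j/run.py"),
    Path("src/moi/moi.py"),
    Path("src/umls2neo4j/umls2neo4j.py"),
]
POSTPROCESSING = Path("src/postprocessing/postprocess.py")


def build_run_plan(loaders=DATALOADERS, postprocessing=POSTPROCESSING):
    """Return the scripts in a valid order: dataloaders first, postprocessing last."""
    plan = [script for script in loaders if script != postprocessing]
    plan.append(postprocessing)
    return plan


if __name__ == "__main__":
    print("Run mdm2neo4j (separate repository) at any point before postprocessing.")
    for step, script in enumerate(build_run_plan(), start=1):
        print(f"{step}. python3 {script.as_posix()}")
```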

Requirements

Make sure Python 3 is installed.
Install the required libraries with pip install -r requirements.txt.
Have a running Neo4j instance (version 5) with APOC and neosemantics installed.
Create the configuration file as described in the How to section.
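
One way to check the plugin requirement before loading data is to list the procedures registered in the database and look for the apoc. and n10s. prefixes (n10s is the procedure namespace used by neosemantics). The sketch below is an assumption-laden example, not part of this repository: the helper names are hypothetical, and list_procedures assumes the official neo4j Python driver in a 5.x release that provides Driver.execute_query.

```python
# Procedure name prefixes registered by each required plugin.
REQUIRED_PLUGINS = {"APOC": "apoc.", "neosemantics": "n10s."}


def missing_plugins(procedure_names):
    """Return the required plugins for which no procedure is registered."""
    return [
        plugin
        for plugin, prefix in REQUIRED_PLUGINS.items()
        if not any(name.startswith(prefix) for name in procedure_names)
    ]


def list_procedures(uri, username, password):
    """Fetch all procedure names from a running Neo4j 5 instance."""
    # Imported here so the pure check above works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(username, password)) as driver:
        records, _summary, _keys = driver.execute_query(
            "SHOW PROCEDURES YIELD name RETURN name"
        )
        return [record["name"] for record in records]
```

With a running database, missing_plugins(list_procedures(uri, username, password)) returns an empty list when both plugins are available.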

Licence

This program is released under Version 3 of the GPL or any later version.