# BRAinS

This repo contains the code for creating the BRAinS-Graph (*Biomedical Knowledge Graph for Recommending and Analysing Health Studies*). It loads data from four sources into a Neo4j graph database:

- [ClinicalTrials.gov](https://clinicaltrials.gov/)
- [The Portal of Medical Data Models (MDM Portal)](https://medical-data-models.org/)
- [Medical Subject Headings (MeSH)](https://www.nlm.nih.gov/mesh/meshhome.html)
- [Unified Medical Language System (UMLS)](https://www.nlm.nih.gov/research/umls/index.html)

## Structure of the repository

The repository consists of three dataloaders (`study2neo4j`, `moi`, `umls2neo4j`) and a postprocessing script (`postprocessing`).

> [!NOTE]
> The dataloader for the MDM Portal, `mdm2neo4j`, can be found in an additional repository: [mdm2neo4j repo](https://git.uni-greifswald.de/MILA_public/mdm2neo4j).
> It should be run as well to create the BRAinS-Graph.

### File Structure

```
src/
│
├── study2neo4j
│   ├── run.py                 # Main script to execute the ClinicalTrials.gov import
│   ├── ct2neo4j.py            # Functions for connecting to the database and importing data
│   └── README.md              # Documentation for setup and usage
│
├── moi
│   ├── moi.py                 # Main script to execute the ontology import and processing
│   ├── methods.py             # Functions for configuring, importing, and processing the graph
│   └── README.md              # Documentation for setup and usage
│
├── umls2neo4j
│   ├── umls2neo4j.py          # Main script to execute the UMLS import
│   ├── methods_umls2neo4j.py  # Functions for loading CUI names, parsing relations, and loading into Neo4j
│   └── README.md              # Documentation for setup and usage
│
└── postprocessing
    └── postprocess.py         # Main script for postprocessing (creating relationships between data sources)
```

## How to

Create a configuration file that stores your database connection details, e.g., in your home directory under the name `brains.conf`.
It should have the following structure:

```ini
[neo4j]
uri = bolt://localhost:7687
username = neo4j
password = myfancypassword
```

Run the dataloaders `study2neo4j`, `moi`, and `umls2neo4j`, which can be found under :file_folder: `src`. Instructions for usage are provided in the individual README.md files.

Run the dataloader from the [mdm2neo4j repo](https://git.uni-greifswald.de/MILA_public/mdm2neo4j).

> [!IMPORTANT]
> In general, the individual dataloaders (`study2neo4j`, `moi`, `umls2neo4j`) can be run in any order.
> The postprocessing only works if data has been loaded into Neo4j. It should be run last (after all dataloaders have been run).
> Make sure to also run `mdm2neo4j` as a dataloader.

## Requirements

- make sure `python3` is installed
- install the required libraries with `pip install -r requirements.txt`
- have a running Neo4j DB (Neo4j version 5) with the APOC and neosemantics plugins installed
- create the configuration file as described in the [How to section](#how-to)

## Licence

This program is released under [Version 3 of the GPL or any later version](https://www.gnu.org/licenses/gpl-3.0.en.html).
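
## Example: reading the configuration file

As a rough illustration of how a configuration file like the `brains.conf` described in the How to section could be consumed, here is a minimal Python sketch based on the standard-library `configparser`. The function name `load_neo4j_config` and the default path `~/brains.conf` are assumptions made for this example; the dataloaders in this repository may read the file differently.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("uri", "username", "password")


def load_neo4j_config(path="~/brains.conf"):
    """Return the uri, username, and password from the [neo4j] section as a dict.

    Hypothetical helper for illustration; not part of the BRAinS code base.
    """
    config_path = Path(path).expanduser()
    if not config_path.is_file():
        raise FileNotFoundError(f"Configuration file not found: {config_path}")

    # Disable interpolation so that a password containing '%' is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)

    if not parser.has_section("neo4j"):
        raise KeyError("Missing [neo4j] section in configuration file")

    section = parser["neo4j"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"Missing keys in [neo4j] section: {', '.join(missing)}")

    return {key: section[key] for key in REQUIRED_KEYS}
```

The returned values could then be passed to the Neo4j Python driver, for example `GraphDatabase.driver(cfg["uri"], auth=(cfg["username"], cfg["password"]))`, to check the connection before running a dataloader.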