# MeDaX Pipeline

## 📋 Description

The MeDaX pipeline transforms healthcare data from FHIR databases into Neo4j graph databases. This conversion enables efficient searching, querying, and analyses of interconnected health data that would otherwise be complex to retrieve using traditional SQL databases.

## ✨ Features

- Seamless conversion from FHIR to Neo4j graph structure
- Support for patient-centric data retrieval using FHIR's `$everything` operation
- Configurable batch processing for handling large datasets
- Docker-based deployment for easy setup and portability
- Compatible with public FHIR servers (e.g., HAPI FHIR) and private authenticated instances

## ⚙️ Prerequisites

- [Docker](https://docs.docker.com/engine/install/) with the [Docker Compose plugin](https://docs.docker.com/compose/install/linux/)
- A FHIR database with API access and the `$everything` operation enabled for retrieving patient data
  - Alternatively: Use a public FHIR server such as [HAPI FHIR](https://hapi.fhir.org/) (default configuration)

## 🚀 Installation

### Setup

1. Clone this repository
2. Create an environment configuration file
3. Configure the environment variables in `.env`:
   - For HAPI test server (default): No changes needed
   - For custom FHIR server:
     - Change `MODE` to anything else
     - Uncomment and set `URL`, `PASSWORD`, and `USERNAME` variables
     - Adjust `BATCH_SIZE` and `NUMBER_OF_PATIENTS` according to your needs
     - Configure any required proxy settings
4. If needed, modify proxy settings in the `Dockerfile`
   - Uncomment and set proxy variables

### Running the Pipeline

**Start the containers:**

```bash
docker compose up --build
```

**Stop and clean up (between runs):**

```bash
docker compose down --volumes
```

**Complete removal (containers and images):**

```bash
docker compose down --volumes --rmi all
```

> **Note:** Depending on your Docker installation, you might need to use `docker-compose` instead of `docker compose`.

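
Before pointing the pipeline at a custom server, it can be useful to check that the server answers a `$everything` request. The following is a minimal Python sketch and not part of the MeDaX code base: the function names are hypothetical, and HTTP Basic authentication is an assumption, since the authentication scheme the pipeline actually uses is not described here. Only the URL pattern `[base]/Patient/[id]/$everything` is taken from the FHIR specification.

```python
import base64
import json
import urllib.request
from typing import Optional


def everything_url(base_url: str, patient_id: str) -> str:
    """Build the FHIR instance-level $everything URL for one patient."""
    return f"{base_url.rstrip('/')}/Patient/{patient_id}/$everything"


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value.

    Assumption: the server accepts Basic auth; adapt this for other schemes.
    """
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def fetch_everything(
    base_url: str,
    patient_id: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    timeout: float = 30.0,
) -> dict:
    """Request $everything for one patient and return the parsed JSON body."""
    request = urllib.request.Request(
        everything_url(base_url, patient_id),
        headers={"Accept": "application/fhir+json"},
    )
    # Only send credentials when both are provided (hypothetical private server).
    if username is not None and password is not None:
        request.add_header("Authorization", basic_auth_header(username, password))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_everything("https://hapi.fhir.org/baseR4", "<patient-id>")` should return a resource whose `resourceType` is `Bundle` if the operation is enabled. The base URL is assumed to be the public HAPI R4 endpoint, and `<patient-id>` must be replaced with an id that exists on the server.
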
## 🔍 Accessing the Neo4j Database

Once the pipeline has completed processing, you can access the Neo4j database:

1. Open your browser and navigate to `http://localhost:8080/`
2. Connect using the following credentials:
   - Username: neo4j
   - Password: neo4j
3. Set a new password and save it in a secure password manager

## 📊 Example Queries

Here are some basic Cypher queries to get you started with exploring your health data:

```cypher
// Count all nodes by type
MATCH (n)
RETURN labels(n) AS NodeType, count(*) AS Count;

// Find all records for a specific patient
MATCH (p:Patient {id: 'patient-id'})-[r]-(connected)
RETURN p, r, connected;

// Retrieve all medication prescriptions
MATCH (m:Medication)-[r]-(p:Patient)
RETURN m, r, p;
```

## ❓ Troubleshooting

**Common Issues:**

- **Connection refused to FHIR server**: Check your network settings and ensure the FHIR server is accessible from within the Docker container.
- **Authentication failures**: Verify your credentials in the `.env` file.
- **Container startup failures**: Ensure all required Docker ports are available and not used by other applications.
- **No data found in FHIR bundle**: Ensure that the FHIR server is up and returning patient data. Try setting the `COMPLEX_PATIENTS` variable to `FALSE` in your `.env` file. Some FHIR servers might not support the required FHIR search logic.

## 📚 Architecture

The MeDaX pipeline consists of the following components:

1. **FHIR Client**: Connects to the FHIR server and retrieves patient data
2. **Data Transformer**: Converts FHIR resources into graph entities and relationships
3. **Reference Processor**: Converts references to relationships
4. **BioCypher Adapter**: Prepares the transformed data for Neo4j admin import
5. **Neo4j Database**: Stores and serves the graph representation of the health data

## ✍️ Citation

If you use the MeDaX pipeline in your research, please cite: 10.5281/zenodo.15229077 and Mazein, I and Gebhardt, T et al.
[MeDaX, a knowledge graph on FHIR.](https://doi.org/10.3233/shti240423)

## 🙏 Acknowledgements

- We are leveraging [BioCypher](https://biocypher.org) [![DOI](https://zenodo.org/badge/DOI/10.1038/s41587-023-01848-y.svg)](https://doi.org/10.1038/s41587-023-01848-y) to create the Neo4j admin input.
  - Remark: We introduced slight adjustments to BioCypher's code to support batching.
- We used BioCypher's git template as a starting point for our development:
  - Lobentanzer, S., BioCypher Consortium, & Saez-Rodriguez, J. Democratizing knowledge representation with BioCypher [Computer software]. https://github.com/biocypher/biocypher
- We used synthetic data generated with [Synthea](https://doi.org/10.1093/jamia/ocx079) during the development process. This data is provided in the `testData` folder.