medax_pipeline

MILA_public/medax_pipeline

release faf1ab1c99

MeDaX Pipeline v1.0.0 Stable

gebhardtt released this 2025-04-24 15:37:50 +02:00 | 4 commits to main since this release
Release Notes - MeDaX Pipeline v1.0.0 (Initial Release)

Overview

We are pleased to announce the first release of the MeDaX pipeline, which transforms hospital FHIR data into a Neo4j graph database to support innovative medical data exploration at German Data Integration Centres. This pipeline facilitates:
- Local setup for controlled data handling
- Efficient data exploration and analysis across connected healthcare data
- Improved query capabilities for complex healthcare relationships
Key Features

Easy Deployment
- Containerised setup using Docker Compose for straightforward deployment and configuration
- Flexible configuration options for connecting to any FHIR server through customisable URL and proxy settings
- Built-in support for Open Access HAPI FHIR server, enabling immediate testing and validation
- Simple environment variable configuration through .env file
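As an illustration of the environment-based configuration described above, here is a minimal sketch of how such settings could be read and validated. The variable names `FHIR_BASE_URL`, `FHIR_PROXY` and `BATCH_SIZE` are hypothetical and are not taken from the repository; the README lists the names the pipeline actually expects. The default URL is the public HAPI FHIR R4 test server, assumed here to be the Open Access server mentioned above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FhirSettings:
    base_url: str
    proxy: Optional[str]
    batch_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> FhirSettings:
    """Read connection settings from environment-style key/value pairs.

    All variable names below are illustrative assumptions, not the
    names used by the MeDaX pipeline itself.
    """
    # Strip a trailing slash so later URL joins stay predictable.
    base_url = env.get("FHIR_BASE_URL", "https://hapi.fhir.org/baseR4").rstrip("/")
    # Treat an empty proxy value the same as an unset one.
    proxy = env.get("FHIR_PROXY") or None
    batch_size = int(env.get("BATCH_SIZE", "200"))
    if batch_size <= 0:
        raise ValueError("BATCH_SIZE must be a positive integer")
    return FhirSettings(base_url=base_url, proxy=proxy, batch_size=batch_size)
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try out without touching the real environment.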
Data Processing
- Validated with real hospital data, ensuring production readiness
- Property convolution and reference path reduction to shrink the resulting graph efficiently
- Manually curated graph schema to enable semantic enrichment with ontological information, currently using BioLink (BioCypher default)
- Automated schema extension for unspecified input data to maintain compatibility with evolving FHIR resources
- Batching of input data to process large-scale data sets
- Support for patient-centric data retrieval using FHIR's $everything operation
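The batching and patient-centric retrieval points can be sketched as follows. This is not the pipeline's own code: the helper names are hypothetical, and the sketch only relies on two parts of the FHIR specification, the `Patient/[id]/$everything` operation and the `next` link that a paged Bundle carries. No network call is made, so the functions can be combined with any HTTP client.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional
from urllib.parse import quote


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` patient ids."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def everything_url(base_url: str, patient_id: str) -> str:
    """Build the URL of the FHIR $everything operation for one patient."""
    return f"{base_url.rstrip('/')}/Patient/{quote(patient_id, safe='')}/$everything"


def next_page_url(bundle: dict) -> Optional[str]:
    """Return the URL of the next Bundle page, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None
```

A caller would loop over `batched(patient_ids, 200)`, request `everything_url(...)` for each id, and keep following `next_page_url(...)` until it returns `None`.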
Extensibility
- Developed using BioCypher framework, enabling modular architecture
- Support for additional data source adapters, allowing future expansion to different resources
- Flexible architecture consisting of:
  1. FHIR Import module for data retrieval
  2. Reference Processor for relationship management
  3. Property convolution for complexity reduction
  4. BioCypher Adapter for Neo4j integration
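To make the middle two stages more concrete, the sketch below shows one plausible reading of them. It assumes that reference processing means turning FHIR `reference` fields into graph edges, and that property convolution means collapsing nested FHIR elements into flat node properties. Both interpretations, and both function names, are assumptions for illustration; the repository's actual implementation may differ.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_references(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (element path, target) for every FHIR reference in a resource.

    Each pair is a candidate edge from the resource to the target.
    """
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str):
            yield path, ref
        for key, value in node.items():
            if key != "reference":
                yield from iter_references(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        # List items share the path of the list itself.
        for item in node:
            yield from iter_references(item, path)


def flatten_properties(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts and lists into a single flat property map."""
    flat: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten_properties(value, f"{prefix}_{key}" if prefix else key))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            flat.update(flatten_properties(item, f"{prefix}_{index}"))
    else:
        flat[prefix] = node
    return flat
```

Flattening keeps a single node per resource instead of one node per nested element, which is the kind of size reduction the feature list describes.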
Installation & Usage

For detailed installation instructions and information on how to cite this work, please refer to the README included in the repository. Basic setup involves:
1. Clone the repository
2. Configure the environment variables
3. Run docker compose up --build
4. Access Neo4j at http://localhost:8080/
Known Issues and Limitations

Performance Considerations
- Processing large hospital datasets requires significant computational resources and time due to data complexity
- Complete pipeline restart required when modifying graph reduction parameters
- Initial load time may be extensive for large datasets
User Interface
- Currently limited to standard Neo4j browser interface
- Default UI may not be optimal for specialised healthcare use cases
Technical Requirements
- Docker and Docker Compose
- Sufficient storage and computational resources for processing FHIR datasets
  - Memory is currently the bottleneck; 12 GB RAM is recommended for a batch size of 200
- Network access to FHIR server
Next Steps

We are actively working on:
- Testing large-scale data sets
- Integrating suitable visualisation interfaces
- Implementing incremental update capability
- Integration with BRO (Biomedical Resource Ontology) through curated schema mapping for standardised terminology
- Performance optimisations for handling larger datasets
Feedback and Contributions

We welcome feedback, bug reports, and contributions! Please submit issues and pull requests through our repository.

© 2025 MeDaX Project Team | Released under MIT License