• gebhardtt released this 2025-05-12 13:31:27 +02:00 | 2 commits to main since this release

    Release Notes - MeDaX Pipeline v1.0.1 (Patch Release)

    Overview

    This patch release fixes a critical issue in the handling of nextLinks (the links a FHIR server returns to point at the next page of results) during the import of large datasets, improving the reliability and completeness of data retrieval from FHIR servers.

    Bug Fix

    • Resolved an issue in the nextLinks pagination mechanism that could interrupt data import for large datasets
    • Improved robustness of the import process when dealing with paginated FHIR resources
    • Ensured complete data retrieval across multiple pagination cycles
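
    To illustrate what following nextLinks involves, here is a minimal sketch, not the pipeline's actual code. It assumes the standard FHIR Bundle layout, in which each page carries a "link" array and the entry with relation "next" holds the URL of the following page. The names next_link, iter_resources and fetch are hypothetical; fetch stands for any callable that returns the parsed JSON Bundle for a URL.

```python
from typing import Callable, Iterator, Optional


def next_link(bundle: dict) -> Optional[str]:
    """Return the URL of the Bundle's 'next' link, or None on the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_resources(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every resource from all pages, following 'next' links until none remain.

    `fetch` is a hypothetical callable supplied by the caller, for example a
    thin wrapper around an HTTP client that returns the decoded JSON Bundle.
    """
    url: Optional[str] = first_url
    seen = set()  # guard against a server linking back to an already visited page
    while url is not None and url not in seen:
        seen.add(url)
        bundle = fetch(url)
        for entry in bundle.get("entry", []):
            if "resource" in entry:
                yield entry["resource"]
        url = next_link(bundle)
```

    Passing the fetch function in as a parameter keeps the paging logic independent of any HTTP library, so it can be exercised with canned pages before being pointed at a real server.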

    Compatibility

    • Fully compatible with v1.0.0 configuration and deployment
    • No changes required to existing environment setups

    Upgrade Recommendation

    We strongly recommend that users working with large FHIR datasets upgrade to this version to ensure complete and reliable data import.


    © 2025 MeDaX Project Team | Released under MIT License

  • gebhardtt released this 2025-04-24 15:37:50 +02:00 | 4 commits to main since this release

    Release Notes - MeDaX Pipeline v1.0.0 (Initial Release)

    Overview

    We are pleased to announce the first release of the MeDaX pipeline, which transforms hospital FHIR data into a Neo4j graph database to enable innovative medical data exploration at German Data Integration Centres. This pipeline facilitates:

    • Local setup for controlled data handling
    • Efficient data exploration and analysis across connected healthcare data
    • Improved query capabilities for complex healthcare relationships

    Key Features

    Easy Deployment

    • Containerised setup using Docker Compose for straightforward deployment and configuration
    • Flexible configuration options for connecting to any FHIR server through customisable URL and proxy settings
    • Built-in support for the open-access HAPI FHIR server, enabling immediate testing and validation
    • Simple environment variable configuration through a .env file
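
    As an illustration of environment-based configuration, the following sketch reads a server URL, an optional proxy and a batch size from environment variables. The variable names FHIR_SERVER_URL, FHIR_PROXY and FHIR_BATCH_SIZE, the FhirSettings class and the default of 200 are assumptions made for this example only; the keys the pipeline actually expects are documented in the repository's README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FhirSettings:
    server_url: str
    proxy: Optional[str]
    batch_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> FhirSettings:
    """Build settings from environment variables.

    All variable names here are hypothetical placeholders, not the
    pipeline's real configuration keys.
    """
    url = env.get("FHIR_SERVER_URL", "").strip()
    if not url:
        raise ValueError("FHIR_SERVER_URL must be set")
    # An empty or missing proxy value means "connect directly".
    proxy = env.get("FHIR_PROXY", "").strip() or None
    # Default of 200 is an assumption for this sketch.
    batch_size = int(env.get("FHIR_BATCH_SIZE", "200"))
    if batch_size <= 0:
        raise ValueError("FHIR_BATCH_SIZE must be a positive integer")
    # Drop a trailing slash so request paths can be appended uniformly.
    return FhirSettings(url.rstrip("/"), proxy, batch_size)
```

    Docker Compose can pass the values from a .env file into the container environment, at which point a loader like this one validates them once at start-up instead of failing midway through an import.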

    Data Processing

    • Validated with real hospital data, ensuring production readiness
    • Implemented property convolution and reference path reduction for efficient graph size reduction
    • Manually curated graph schema to enable semantic enrichment with ontological information, currently using BioLink (BioCypher default)
    • Automated schema extension for unspecified input data to maintain compatibility with evolving FHIR resources
    • Batching of input data to process large-scale data sets
    • Support for patient-centric data retrieval using FHIR's $everything operation
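
    The batching and patient-centric retrieval mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the pipeline's implementation: the helper names batched and everything_url are hypothetical, and the only FHIR-specific fact used is the standard URL form of the operation, [base]/Patient/[id]/$everything.

```python
from itertools import islice
from typing import Iterable, Iterator, List
from urllib.parse import quote


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Split an iterable into lists of at most `size` items, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def everything_url(base_url: str, patient_id: str) -> str:
    """Build the URL of the $everything operation for a single patient."""
    # Percent-encode the id so unexpected characters cannot alter the path.
    return f"{base_url.rstrip('/')}/Patient/{quote(patient_id, safe='')}/$everything"
```

    Processing one batch of patient ids at a time bounds how many resources are held in memory at once, which is the reason batch size and available RAM are linked.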

    Extensibility

    • Developed using the BioCypher framework, enabling a modular architecture
    • Support for additional data source adapters, allowing future expansion to different resources
    • Flexible architecture consisting of:
      1. FHIR Import module for data retrieval
      2. Reference Processor for relationship management
      3. Property convolution for complexity reduction
      4. BioCypher Adapter for Neo4j integration
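
    To give a sense of what relationship management over FHIR references can look like, here is a minimal sketch. It is not the Reference Processor itself and does not use the BioCypher API. It assumes only the standard FHIR Reference shape, an object with a "reference" string such as "Patient/p1"; the function names iter_references and to_edges and the (source, path, target) tuple layout are hypothetical.

```python
from typing import Any, Iterator, List, Tuple


def iter_references(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Walk a resource recursively and yield (path, target) for each reference."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str):
            yield path, ref
        for key, value in node.items():
            if key != "reference":
                child_path = f"{path}.{key}" if path else key
                yield from iter_references(value, child_path)
    elif isinstance(node, list):
        # List items share the path of the field that contains them.
        for item in node:
            yield from iter_references(item, path)


def to_edges(resource: dict) -> List[Tuple[str, str, str]]:
    """Turn one resource into (source, path, target) edge tuples."""
    source = f"{resource['resourceType']}/{resource['id']}"
    return [(source, path, target) for path, target in iter_references(resource)]
```

    For an Observation with a subject and a performer, this yields one edge per reference, labelled with the field path it was found under; such tuples are the kind of input a graph adapter could map to relationships.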

    Installation & Usage

    For detailed installation instructions and information on how to cite this work, please refer to the README document included in the repository. Basic setup involves:

    1. Clone the repository
    2. Configure the environment variables
    3. Run docker compose up --build
    4. Access Neo4j at http://localhost:8080/

    Known Issues and Limitations

    Performance Considerations

    • Processing large hospital datasets requires significant computational resources and time due to data complexity
    • Complete pipeline restart required when modifying graph reduction parameters
    • Initial load time may be extensive for large datasets

    User Interface

    • Currently limited to standard Neo4j browser interface
    • Default UI may not be optimal for specialised healthcare use cases

    Technical Requirements

    • Docker and Docker Compose
    • Sufficient storage and computational resources for processing FHIR datasets
      • Memory is currently the bottleneck; 12 GB of RAM is recommended for a batch size of 200
    • Network access to FHIR server

    Next Steps

    We are actively working on:

    • Testing large-scale data sets
    • Integrating suitable visualisation interfaces
    • Implementing incremental update capability
    • Integration with BRO (Biomedical Resource Ontology) through curated schema mapping for standardised terminology
    • Performance optimisations for handling larger datasets

    Feedback and Contributions

    We welcome feedback, bug reports, and contributions! Please submit issues and pull requests through our repository.


