Initial commit

This commit is contained in:
Markus Mandalka 2024-09-08 23:35:23 +02:00
parent 2049c6ab70
commit bbd08001ff
9 changed files with 998 additions and 1 deletions

5
.env.example Normal file

@ -0,0 +1,5 @@
FHIR_VALIDATION_DATASOURCE_BASEURL=http://myfhirserver.mydomain.mynet/fhir
FHIR_VALIDATION_DATASOURCE_AUTH_NAME=myFHIRusername
FHIR_VALIDATION_DATASOURCE_AUTH_PASSWORD=changeMe
JUPYTER_TOKEN=changeThisToken

1
.gitignore vendored Normal file

@ -0,0 +1 @@
.env

120
README.md

@ -1,2 +1,120 @@
# Bulk-FHIR-Validation
---
gitea: none
include_toc: true
---
# Bulk FHIR validation
Dockerized Open Source environment for **bulk FHIR validation of FHIR resources**.
## Aggregation and presentation of bulk validation results
This environment **aggregates, groups, and presents the validation results** of bulk-validated FHIR Search results.
![Bulk FHIR validation](bulk-fhir-validation.png)
## Based on open standards and powerful, flexible Open Source software
This validation environment uses the following standards and Open Source software via the Python library [fhirvalidation.py](home/fhirvalidation.py):
- Loading FHIR resources to be validated by [FHIR search](https://www.hl7.org/fhir/search.html) (for documentation see section "Select resources to be validated by FHIR Search parameters" below)
- [FHIR validation](https://www.hl7.org/fhir/validation.html#op) by [HAPI FHIR Validator](https://hapifhir.io/hapi-fhir/docs/validation/introduction.html) configured by Docker environment variables
- Aggregation by [Python Pandas](https://pandas.pydata.org/docs/user_guide/index.html) dataframe
- Presentation of validation results in web UI by [Jupyter Lab](https://jupyterlab.readthedocs.io/en/latest/)
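Each resource is validated through the standard FHIR `$validate` operation against the HAPI validator. A minimal sketch of the request the environment performs per resource (the server URL and the sample resource are illustrative assumptions; the request is only constructed here, not sent):

```python
import json

# Illustrative assumptions: local HAPI validator and a minimal sample resource
fhir_server = "http://localhost:8080/fhir"
resource = {"resourceType": "Condition", "code": {"text": "example"}}

# Per the FHIR spec, the validation endpoint is <base>/<resourceType>/$validate
url = f"{fhir_server}/{resource['resourceType']}/$validate"
payload = json.dumps(resource)
headers = {"Content-Type": "application/fhir+json"}

print(url)
# Sending it would be e.g.: requests.post(url, data=payload, headers=headers)
```

The response is a FHIR `OperationOutcome` whose `issue` entries carry severity, location, and diagnostics.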
## Architecture
![Software architecture](bulk-fhir-validator.drawio.png)
## Installation and Configuration
### Setup FHIR Packages
Download the FHIR NPM packages of the [German MII Core Dataset modules](https://www.medizininformatik-initiative.de/de/uebersicht-ueber-versionen-der-kerndatensatz-module) (Kerndatensatz der Medizininformatik Initiative) to the directory `packages`.
For example by running [download-packages.sh](download-packages.sh):
```shell
bash download-packages.sh
```
If you want to use other FHIR packages, download the NPM packages to the `packages` directory and set them up via the environment variables of the HAPI validation service.
The environment variable names are derived from the config section `implementationguides` in HAPI's [application.yaml](https://github.com/hapifhir/hapi-fhir-jpaserver-starter/blob/master/src/main/resources/application.yaml)
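The derivation follows Spring Boot's relaxed binding: the configuration path is uppercased and its separators become underscores. A tiny illustrative helper (not part of this repo) makes the mapping concrete:

```python
def yaml_path_to_env_var(path: str) -> str:
    """Illustrative sketch of Spring Boot relaxed binding:
    uppercase the config path and replace '.' and '-' with '_'."""
    return path.upper().replace(".", "_").replace("-", "_")

# e.g. the packageUrl of the 'debasis' implementation guide entry:
print(yaml_path_to_env_var("hapi.fhir.implementationguides.debasis.packageUrl"))
# -> HAPI_FHIR_IMPLEMENTATIONGUIDES_DEBASIS_PACKAGEURL
```

So for each additional package you add one `…_NAME`, `…_VERSION`, and `…_PACKAGEURL` variable, as shown in [docker-compose.yml](docker-compose.yml).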
### Create config file
Copy `.env.example` to `.env` (so `.env`, which will contain your credentials, is excluded from the git repo by `.gitignore`):
`cp .env.example .env`
### Setup your FHIR server parameters
Edit [.env](.env.example) and set up custom parameters like the URL for your FHIR Server.
### Setup initial password for Jupyter Lab UI
Set a custom initial token in the variable `JUPYTER_TOKEN` in `.env`.
### Start validation environment
Start the validation environment with:
```shell
docker compose up -d
```
## Usage
### Web UI
Access the [web user interface of Jupyter Lab](https://jupyterlab.readthedocs.io/en/latest/) on the configured port (default: 80):
http://yourserver/
#### Login
Login with the initial password / token you configured in `.env`.
#### Start validation
Now you can start the validation and aggregation of validation results.
To do so, run the Jupyter Notebook [fhir-validation.ipynb](home/fhir-validation.ipynb).
#### Navigate validation results
You can navigate the validation results via the table of contents of Jupyter Lab: switch the left navigation bar from "File Browser" to "Table of Contents".
#### User documentation
Further user documentation is embedded in the Jupyter Notebook:
the different outputs are described in markdown cells and the parameters used are described in the code cells.
### Select resources to be validated by FHIR Search parameters
You can select/filter the resources to be validated by [FHIR search](https://www.hl7.org/fhir/search.html) parameters.
For filter options you can set `search_parameters`; see the [FHIR search common parameters for all resource types](https://www.hl7.org/fhir/search.html#standard), as well as the additional FHIR search parameters for certain resource types such as [Patient](https://www.hl7.org/fhir/patient.html#search), [Condition](https://www.hl7.org/fhir/condition.html#search), [Observation](https://www.hl7.org/fhir/observation.html#search), ...
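The `search_parameters` dict is sent as query parameters of the FHIR search request. A small sketch of how a filter translates into the request URL (the base URL matches the `.env.example` placeholder; the parameter values are illustrative):

```python
from urllib.parse import urlencode

# placeholder base URL from .env.example; replace with your FHIR server
fhir_base_url = "http://myfhirserver.mydomain.mynet/fhir"

search_parameters = {
    "code": "A00.0",                  # resource-type-specific parameter (e.g. Condition code)
    "_lastUpdated": "gt2024-01-01",   # common parameter available for all resource types
    "_count": 200,                    # page size used for FHIR search paging
}

# the library requests pages of <base>/<resourceType> with these query parameters
url = f"{fhir_base_url}/Condition?{urlencode(search_parameters)}"
print(url)
```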
### Python library
If you don't want to use Jupyter Lab as a user interface (e.g. if you want to generate markdown for CI/CD reports), you can use the Python library [fhirvalidation.py](home/fhirvalidation.py), which returns a [pandas](https://pandas.pydata.org/docs/user_guide/index.html) dataframe independently of Jupyter Lab.
The Jupyter Notebook contains documentation on how to use the library, including examples with code snippets.
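For a CI/CD report you typically group the returned dataframe and render the counts as markdown. A sketch with a synthetic dataframe shaped like the library's output (real data would come from `Validator().search_and_validate(...)`; the markdown table is built by hand here to avoid the extra `tabulate` dependency that `DataFrame.to_markdown` requires):

```python
import pandas as pd

# synthetic stand-in for the dataframe returned by search_and_validate()
df = pd.DataFrame({
    "severity": ["error", "error", "warning"],
    "diagnostics_aggregated": ["missing version", "missing version", "unknown code"],
    "fullUrl": ["http://x/Condition/1", "http://x/Condition/2", "http://x/Condition/3"],
})

# count affected resources per grouped issue, as in the notebook
counts = (df.groupby(["severity", "diagnostics_aggregated"])["fullUrl"]
            .count().sort_values(ascending=False))

# render a simple GitHub-flavored markdown table
lines = ["| severity | issue | resources |", "| --- | --- | --- |"]
for (severity, issue), n in counts.items():
    lines.append(f"| {severity} | {issue} | {n} |")
markdown = "\n".join(lines)
print(markdown)
```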

BIN
bulk-fhir-validation.png Normal file


77
docker-compose.yml Normal file

@ -0,0 +1,77 @@
version: '3.7'

services:

  # HTTP server (Nginx) providing a FHIR NPM repository with packages from our local directory ./packages
  fhir-packages-repository-service:
    image: nginx
    restart: unless-stopped
    volumes:
      # The FHIR NPM packages (.tgz archives) are located in the local directory ./packages
      # This source directory with the FHIR packages is mounted as the Nginx standard content directory /usr/share/nginx/html
      - ./packages:/usr/share/nginx/html:ro

  # FHIR server (HAPI)
  fhir-validation-server:
    # HAPI FHIR (https://hapifhir.io/)
    image: hapiproject/hapi:v7.0.3
    restart: unless-stopped
    depends_on:
      - fhir-packages-repository-service
    ports:
      - 8080:8080
    environment:
      # Load the FHIR NPM packages with the MII Kerndatensatz modules and their dependencies from the
      # NPM repository packages.fhir.org, mocked by fhir-packages-repository-service
      HAPI_FHIR_IMPLEMENTATIONGUIDES_DEBASIS_NAME: "de.basisprofil.r4"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_DEBASIS_VERSION: "1.4.0"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_DEBASIS_PACKAGEURL: "http://fhir-packages-repository-service/de.basisprofil.r4-1.4.0.tgz"
      #HAPI_FHIR_IMPLEMENTATIONGUIDES_DEBASIS_PACKAGEURL: "http://fhir-packages-repository-service/de.basisprofil.r4-1.4.0-explicit-versions-in-valueset.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSMETA_NAME: "de.medizininformatikinitiative.kerndatensatz.meta"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSMETA_VERSION: "1.0.3"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSMETA_PACKAGEURL: "http://fhir-packages-repository-service/de.medizininformatikinitiative.kerndatensatz.meta-1.0.3.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSPERSON_NAME: "de.medizininformatikinitiative.kerndatensatz.person"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSPERSON_VERSION: "2024.0.0"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSPERSON_PACKAGEURL: "http://fhir-packages-repository-service/de.medizininformatikinitiative.kerndatensatz.person-2024.0.0.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSDIAGNOSE_NAME: "de.medizininformatikinitiative.kerndatensatz.diagnose"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSDIAGNOSE_VERSION: "2024.0.0"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSDIAGNOSE_PACKAGEURL: "http://fhir-packages-repository-service/de.medizininformatikinitiative.kerndatensatz.diagnose-2024.0.0.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSPROZEDUR_NAME: "de.medizininformatikinitiative.kerndatensatz.prozedur"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSPROZEDUR_VERSION: "2024.0.0"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSPROZEDUR_PACKAGEURL: "http://fhir-packages-repository-service/de.medizininformatikinitiative.kerndatensatz.prozedur-2024.0.0.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSFALL_NAME: "de.medizininformatikinitiative.kerndatensatz.fall"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSFALL_VERSION: "2024.0.1"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSFALL_PACKAGEURL: "http://fhir-packages-repository-service/de.medizininformatikinitiative.kerndatensatz.fall-2024.0.1.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_HL7FHIRUVIPS_NAME: "hl7.fhir.uv.ips"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_HL7FHIRUVIPS_VERSION: "1.0.0"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_HL7FHIRUVIPS_PACKAGEURL: "http://fhir-packages-repository-service/hl7.fhir.uv.ips-1.0.0.tgz"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSLABORBEFUND_NAME: "de.medizininformatikinitiative.kerndatensatz.laborbefund"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSLABORBEFUND_VERSION: "1.0.6"
      HAPI_FHIR_IMPLEMENTATIONGUIDES_MIIKDSLABORBEFUND_PACKAGEURL: "http://fhir-packages-repository-service/de.medizininformatikinitiative.kerndatensatz.laborbefund-1.0.6.tgz"

  jupyter:
    image: jupyter/scipy-notebook
    restart: unless-stopped
    ports:
      - 80:8888
    # set custom token
    command: start-notebook.py --NotebookApp.token='${JUPYTER_TOKEN}'
    environment:
      # set credentials in .env so this docker-compose.yml can be fully versioned in git!
      FHIR_VALIDATION_DATASOURCE_BASEURL: ${FHIR_VALIDATION_DATASOURCE_BASEURL}
      FHIR_VALIDATION_DATASOURCE_AUTH_NAME: ${FHIR_VALIDATION_DATASOURCE_AUTH_NAME}
      FHIR_VALIDATION_DATASOURCE_AUTH_PASSWORD: ${FHIR_VALIDATION_DATASOURCE_AUTH_PASSWORD}
    volumes:
      - ./home/:/home/jovyan/

13
download-packages.sh Normal file

@ -0,0 +1,13 @@
#!/bin/sh
# if the downloads fail because of missing proxy parameters, export the environment variable HTTPS_PROXY beforehand
curl https://packages.fhir.org/de.basisprofil.r4/1.4.0 -o packages/de.basisprofil.r4-1.4.0.tgz
curl https://packages.fhir.org/de.medizininformatikinitiative.kerndatensatz.meta/1.0.3 -o packages/de.medizininformatikinitiative.kerndatensatz.meta-1.0.3.tgz
curl https://packages.fhir.org/de.medizininformatikinitiative.kerndatensatz.person/2024.0.0 -o packages/de.medizininformatikinitiative.kerndatensatz.person-2024.0.0.tgz
curl https://packages.fhir.org/de.medizininformatikinitiative.kerndatensatz.diagnose/2024.0.0 -o packages/de.medizininformatikinitiative.kerndatensatz.diagnose-2024.0.0.tgz
curl https://packages.fhir.org/de.medizininformatikinitiative.kerndatensatz.prozedur/2024.0.0 -o packages/de.medizininformatikinitiative.kerndatensatz.prozedur-2024.0.0.tgz
curl https://packages.fhir.org/de.medizininformatikinitiative.kerndatensatz.fall/2024.0.1 -o packages/de.medizininformatikinitiative.kerndatensatz.fall-2024.0.1.tgz
curl https://packages.fhir.org/hl7.fhir.uv.ips/1.0.0 -o packages/hl7.fhir.uv.ips-1.0.0.tgz
curl https://packages.fhir.org/de.medizininformatikinitiative.kerndatensatz.laborbefund/1.0.6 -o packages/de.medizininformatikinitiative.kerndatensatz.laborbefund-1.0.6.tgz
chmod o+r packages/*

479
home/fhir-validation.ipynb Normal file

@ -0,0 +1,479 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "70659ec5-54c4-4eee-ba6a-c3f17ac88638",
"metadata": {},
"source": [
"## Output options for rendering the dataframes/tables"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98078b40-9c8f-4b74-aaa7-275df72c9b79",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"# how many rows of table to show\n",
"pd.set_option(\"display.max_rows\", 1000)\n",
"# we want to see the full error messages (often longer than default colwidth)\n",
"pd.set_option(\"max_colwidth\", 10000)"
]
},
{
"cell_type": "markdown",
"id": "9566f3ad-3587-415b-a49a-02ffaee35b34",
"metadata": {},
"source": [
"## Validate and render validation results of one FHIR-Resource"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f93e44bc-9644-4c50-94af-d9a29a91d54b",
"metadata": {},
"outputs": [],
"source": [
"#from fhirvalidation import Validator\n",
"#validator = Validator()\n",
"#validator.validate_resource_and_render_validation_outcome ('Condition/resource1')\n"
]
},
{
"cell_type": "markdown",
"id": "e0cd94b5-3a97-40c8-9a4a-d5d33feb9d32",
"metadata": {},
"source": [
"## Bulk validation of found resources by FHIR Search\n",
"\n",
"Validate all resources from FHIR search results"
]
},
{
"cell_type": "markdown",
"id": "d8ec4f1c-d933-4222-a35a-178454aba98a",
"metadata": {},
"source": [
"#### Validate conditions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42eeca2e-8b33-44d7-b194-e72630e66140",
"metadata": {
"editable": true,
"scrolled": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"from fhirvalidation import Validator\n",
"\n",
"validator = Validator()\n",
"\n",
"# Set auth parameters for your FHIR server (if you do not want to use basic auth credentials from the environment variables in .env)\n",
"# Documentation: https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication\n",
"# validator.requests_kwargs['auth'] = ('myusername', 'mypassword')\n",
"\n",
"# Search for all resources of the resource_type\n",
"search_parameters = {}\n",
"\n",
"# Search resources with a certain code\n",
"#search_parameters={\"code\": \"A00.0\"}\n",
"\n",
"df = validator.search_and_validate(resource_type=\"Condition\", search_parameters=search_parameters, limit=10000)"
]
},
{
"cell_type": "markdown",
"id": "a400b6d7-2565-4c06-8fcf-354cd9f0e970",
"metadata": {},
"source": [
"## Issues\n",
"Found issues in dataframe returned by bulk validation"
]
},
{
"cell_type": "markdown",
"id": "001dcc6b-0517-4101-8270-686dcde78e55",
"metadata": {},
"source": [
"### Count of resources with issues\n",
"\n",
"Count of resources (unique fullURL) with issues of all severities (even severity \"info\", so maybe no real issue)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "efa56586-2b67-4219-814a-3d679f360faa",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"len( pd.unique(df['fullUrl']) )"
]
},
{
"cell_type": "markdown",
"id": "7e0fe487-fa30-4c64-8ba2-b13ac20a7714",
"metadata": {},
"source": [
"### Grouped issues with aggregation of codesystems sorted by count of affected resources\n",
"\n",
"Issues grouped with additional aggregation of codesystems (e.g. ICD10): the individual codes of the same codesystem are removed, so there is no separate issue for each code (e.g. each ICD10 code) of the codesystem\n",
"\n",
"Sorted by count of affected resources"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09af41b7-c4c4-422a-a1c6-577afbec98ac",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"df[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby([\"severity\", \"location_aggregated\", \"diagnostics_aggregated\"]).count().sort_values(['fullUrl'], ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "e4761060-19c3-4a9e-84a5-ce83088caefd",
"metadata": {},
"source": [
"### Grouped issues with aggregation of codesystems sorted by severity\n",
"\n",
"Issues grouped with additional aggregation of codesystems (e.g. ICD10): the individual codes of the same codesystem are removed, so there is no separate issue for each code (e.g. each ICD10 code) of the codesystem\n",
"\n",
"Sorted by severity"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e596c4df-bfc2-4faa-b63a-b606e96dbade",
"metadata": {
"editable": true,
"scrolled": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"df[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby([\"severity\", \"location_aggregated\", \"diagnostics_aggregated\"]).count().sort_values(['severity','fullUrl'], ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "b84016bd-2aba-4c28-a20a-f4b48a497234",
"metadata": {},
"source": [
"### Grouped issues without aggregation of codesystems\n",
"\n",
"Issues and count of affected resources, sorted by the number of affected resources. Since codesystems are not aggregated here (for additional aggregation see the sections above), this shows a separate issue for each code used from a codesystem"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83f03e4a-d3f5-406a-aa16-de846a72b0b1",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"df[['severity', 'location', 'diagnostics', 'fullUrl']].groupby([\"severity\", \"location\", \"diagnostics\"]).count().sort_values(['fullUrl'], ascending=False)\n"
]
},
{
"cell_type": "markdown",
"id": "227a8710-5b07-421f-91ba-8a2a5ba25172",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"### Filter on severity \"error\"\n",
"\n",
"Show only issues filtered by severity \"error\""
]
},
{
"cell_type": "markdown",
"id": "20b5c938-6a08-4698-9297-7d2764c49838",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"#### Count of resources with severity \"error\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5609dfb4-501b-4cc8-9318-b19cd6399b69",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"len( pd.unique(df[df['severity']==\"error\"]['fullUrl']) )"
]
},
{
"cell_type": "markdown",
"id": "8e72ccfa-813f-43f6-9e1d-c713c03c2714",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"#### Show only issues with severity \"error\" grouped by codesystems\n",
"\n",
"Show grouped issues with filter on severity \"error\"\n",
"\n",
"Issues grouped with additional aggregation of codesystems (e.g. ICD10): the individual codes of the same codesystem are removed, so there is no separate issue for each code (e.g. each ICD10 code) of the codesystem\n",
"\n",
"Sorted by count of affected resources\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1205db8-0efe-491c-961b-97ad7d8149cf",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"df.query('severity==\"error\"')[['location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby([\"location_aggregated\", \"diagnostics_aggregated\"]).count().sort_values(['fullUrl'], ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "1a8f220d-d314-4053-b42b-1ef6bf9a9bdd",
"metadata": {},
"source": [
"#### Grouped issues with severity \"error\" without aggregation of codesystems\n",
"\n",
"Issues and count of affected resources, sorted by the number of affected resources.\n",
"Since codesystems are not aggregated here (for additional aggregation see the sections above), this shows a separate issue for each code used from a codesystem"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e12ab11b-c245-440f-acfb-400cdb24e6d1",
"metadata": {},
"outputs": [],
"source": [
"df.query('severity==\"error\"')[['location', 'diagnostics', 'fullUrl']].groupby([\"location\", \"diagnostics\"]).count().sort_values(['fullUrl'], ascending=False)"
]
},
{
"cell_type": "markdown",
"id": "ea051ae1-6698-4888-a672-3bc55c6740cd",
"metadata": {},
"source": [
"## Resources with a specific error"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cef7c2f4-22a3-4c18-8eb2-ae7f811df532",
"metadata": {},
"outputs": [],
"source": [
"myerror = \"Condition.code.coding:icd10-gm.version: minimum required = 1, but only found 0 (from https://www.medizininformatik-initiative.de/fhir/core/modul-diagnose/StructureDefinition/Diagnose|2024.0.0)\"\n",
"\n",
"# Use Python syntax:\n",
"# df[df['diagnostics']==myerror]\n",
"#\n",
"# or use df.query\n",
"# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html and https://docs.python.org/3/reference/lexical_analysis.html#f-strings:\n",
"df_query = f'diagnostics==\"{myerror}\"'\n",
"\n",
"df.query(df_query)"
]
},
{
"cell_type": "markdown",
"id": "5f4146e5-ce88-42e0-b1ac-41b93c1d59f1",
"metadata": {},
"source": [
"## Info\n",
"\n",
"Information concerning the dataframe, e.g. dataframe memory usage"
]
},
{
"cell_type": "markdown",
"id": "2be8ee0e-5c48-4453-804e-c2db23115bd9",
"metadata": {},
"source": [
"### Dataframe memory usage"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73994859-6824-42e3-9ee3-9570ef9183a8",
"metadata": {},
"outputs": [],
"source": [
"df.info()\n",
"df.memory_usage(deep=True)"
]
},
{
"cell_type": "markdown",
"id": "c46b93d4-35ad-427d-bda8-44c42f6b91a1",
"metadata": {},
"source": [
"### Head - Returns first rows of dataframe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e521216-ad7d-4ca6-8e04-7d86435a3a6a",
"metadata": {},
"outputs": [],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "6311f067-72d1-4a32-91fe-585ebfb74c55",
"metadata": {},
"source": [
"## Snippets\n",
"\n",
"Additional code snippets"
]
},
{
"cell_type": "markdown",
"id": "ff1ed096-dc18-492b-988b-8c5b7899adb9",
"metadata": {},
"source": [
"### Markdown generation\n",
"\n",
"How to generate table in markdown format (e.g. for CI/CD status report)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8843debf-ca4c-409c-89eb-8eba64432438",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# The reserved pipe char | has to be escaped as \\| (https://github.com/astanin/python-tabulate/issues/241)\n",
"df_escaped = df.applymap(lambda s: s.replace('|','\\\\|') if isinstance(s, str) else s)\n",
"\n",
"print(df_escaped[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby([\"severity\", \"location_aggregated\", \"diagnostics_aggregated\"]).count().sort_values(['fullUrl'], ascending=False).to_markdown(tablefmt=\"github\") )\n"
]
},
{
"cell_type": "markdown",
"id": "a9dd17c8-3249-4959-967c-affb1d30cf23",
"metadata": {},
"source": [
"### Navigate validation results dataframe with interactive user interface\n",
"\n",
"Use interactive UI to navigate and filter the dataframe\n",
"\n",
"Documentation: [English](https://docs.kanaries.net/pygwalker#use-pygwalker-in-jupyter-notebook) / [German](https://docs.kanaries.net/de/pygwalker#verwendung-von-pygwalker-in-jupyter-notebook)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "086cda6f-2508-4b04-9c29-2894cf3d8b4b",
"metadata": {},
"outputs": [],
"source": [
"# Install pip package in the current Jupyter kernel\n",
"import sys\n",
"# if behind a proxy, append e.g.: --proxy http://yourproxy:8080/\n",
"!{sys.executable} -m pip install pygwalker"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4b8b1e8-bb9d-48ae-818f-b32b75b47ec6",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# render dataframe with pygwalker\n",
"import pygwalker as pyg\n",
"walker = pyg.walk(df)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "054d49ef-1c15-4ce1-bc0a-d941446e60dd",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

304
home/fhirvalidation.py Normal file

@ -0,0 +1,304 @@
import requests
import json
import pandas as pd
import re
import os
import logging


class Validator():

    def __init__(self, fhir_base_url=None):
        self.fhir_base_url = fhir_base_url
        if not self.fhir_base_url:
            self.fhir_base_url = os.environ.get('FHIR_VALIDATION_DATASOURCE_BASEURL')
        # Keyword arguments for HTTP(S) requests (e.g. for auth)
        # Example parameters:
        # Authentication: https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication
        # Proxies: https://requests.readthedocs.io/en/latest/user/advanced/#proxies
        # SSL certificates: https://requests.readthedocs.io/en/latest/user/advanced/#ssl-cert-verification
        self.requests_kwargs = {}
        # Init basic auth credentials from environment variables
        if os.environ.get('FHIR_VALIDATION_DATASOURCE_AUTH_NAME'):
            self.requests_kwargs['auth'] = (os.environ.get('FHIR_VALIDATION_DATASOURCE_AUTH_NAME'),
                                            os.environ.get('FHIR_VALIDATION_DATASOURCE_AUTH_PASSWORD'))

    def fhir_operation_validate(self, resource_type, resource, send_pretty=False):
        headers = {'User-Agent': 'Bulk FHIR validator',
                   'Content-Type': 'application/fhir+json'}
        if send_pretty:
            data = json.dumps(resource, indent=4)
        else:
            data = json.dumps(resource)
        # todo: use an environment variable (set in docker-compose) instead of the hardcoded validator URL
        r = requests.post('http://fhir-validation-server:8080/fhir/' + resource_type + '/$validate',
                          headers=headers, data=data)
        outcome = r.json()
        return outcome

    def validate(self, resource_type, entry):
        resource = entry.get('resource')
        fullUrl = entry.get('fullUrl')
        logging.debug(f"Validating {fullUrl}")
        outcome = self.fhir_operation_validate(resource_type, resource)
        df = pd.DataFrame()
        for issue in outcome.get('issue'):
            diagnostics = issue.get('diagnostics')
            diagnostics_aggregated = remove_value_code(diagnostics)
            diagnostics_aggregated = remove_array_index(diagnostics_aggregated)
            severity = issue.get('severity')
            location = issue.get('location')
            location = location[0]
            location_aggregated = remove_array_index(location)
            df_add = pd.DataFrame(
                {'severity': severity, 'location': location, 'location_aggregated': location_aggregated,
                 'diagnostics': diagnostics, 'diagnostics_aggregated': diagnostics_aggregated, 'fullUrl': fullUrl},
                index=[0])
            df = pd.concat([df, df_add], ignore_index=True)
        return df

    def search_and_validate(self, resource_type="Patient", search_parameters=None, limit=0):
        # avoid a mutable default argument: '_count' is added to search_parameters below
        if search_parameters is None:
            search_parameters = {}
        count = 0
        page = 0
        headers = {'User-Agent': 'Bulk FHIR validator',
                   'Content-Type': 'application/fhir+json',
                   # 'handling=strict': "Client requests that the server return an error for any unknown or
                   # unsupported parameter" (e.g. a typo in a search parameter) instead of ignoring it and
                   # returning unfiltered results (https://www.hl7.org/fhir/R4/search.html#errors)
                   'Prefer': 'handling=strict',
                   }
        if '_count' not in search_parameters:
            search_parameters['_count'] = 200
        df = pd.DataFrame()
        is_limit_reached = False
        page_url = f'{self.fhir_base_url}/{resource_type}'
        while page_url and not is_limit_reached:
            page += 1
            if page == 1:
                logging.info(f"FHIR Search: Requesting {page_url}")
                r = requests.get(page_url,
                                 params=search_parameters,
                                 headers=headers,
                                 **self.requests_kwargs)
            else:
                logging.info(f"FHIR Search: Requesting next page {page_url}")
                r = requests.get(page_url,
                                 headers=headers,
                                 **self.requests_kwargs)
            r.raise_for_status()
            bundle_dict = r.json()
            if page == 1:
                total = bundle_dict.get('total')
                if total is None:
                    total = 0
                logging.info(f"Found {total} resources")
            entries = bundle_dict.get('entry')
            if entries:
                count_entries = len(entries)
                logging.info(f"Starting validation of {count_entries} entries on this page")
                for entry in entries:
                    df_add = self.validate(resource_type, entry)
                    df = pd.concat([df, df_add], ignore_index=True)
                    count += 1
                    if limit > 0 and count >= limit:
                        is_limit_reached = True
                        logging.info(
                            f"Custom limit of {limit} resources reached, no further FHIR search paging and validation")
                        break
            if (limit == 0) or (total < limit):
                logging.info(f"Validated {count} of {total} resources")
            else:
                logging.info(
                    f"Validated {count} of {limit} resources (custom limit, found resources by FHIR search query: {total})")
            page_url = get_next_page_url(bundle_dict)
        if count > 0:
            logging.info(f"Search and validation done for {count} of {total} found resources")
        return df

    def validate_resource_and_render_validation_outcome(self, resource_url, resource_type=None):
        resource_url = self.fhir_base_url + '/' + resource_url
        # if no resource_type parameter is set, extract the FHIR resource type from the URL
        if not resource_type:
            find_resource_type = re.search(r".*/(.*)/.*", resource_url)
            resource_type = find_resource_type.groups()[0]
        headers = {'User-Agent': 'Bulk FHIR validator',
                   'Content-Type': 'application/fhir+json'}
        r = requests.get(resource_url,
                         headers=headers,
                         **self.requests_kwargs)
        resource = r.json()
        outcome = self.fhir_operation_validate(resource_type, resource, send_pretty=True)
        render_validation_outcome(resource, outcome, resource_url=resource_url)


def get_next_page_url(bundle_dict):
    links = bundle_dict.get('link')
    if links:
        for link in links:
            relation = link.get('relation')
            if relation == 'next':
                return link.get('url')
    return None


def remove_value_code(diagnostics):
    find_value_code = re.search(r"Coding provided \(.+?\#(.+?)\) is not in the value set", diagnostics)
    if not find_value_code:
        find_value_code = re.search(r"Unknown code in fragment CodeSystem \'.+?\#(.+?)\'", diagnostics)
    if find_value_code:
        value_code = find_value_code.groups()[0]
        diagnostics_removed_valuecode = diagnostics.replace(value_code, "REMOVEDCODE")
    else:
        diagnostics_removed_valuecode = diagnostics
    return diagnostics_removed_valuecode


def remove_array_index(diagnostics):
    # raw string avoids an invalid escape sequence warning
    diagnostics_removed_array_index = re.sub(r"\[[0-9]+\]", "[x]", diagnostics)
    return diagnostics_removed_array_index


def select_location_line(issue):
    # Get the location line from the FHIR extension
    # http://hl7.org/fhir/StructureDefinition/operationoutcome-issue-line
    # (previous approach: scrape the line number from the element location by regex)
    # location_linecolumn = issue['location'][1]
    # find_line = re.search(r"Line\[([0-9]+)\]", location_linecolumn)
    # location_line = find_line.groups()[0]
    # location_line = int(location_line)
    # return location_line
    extensions = issue.get('extension')
    if extensions:
        for extension in extensions:
            url = extension.get('url')
            if url == 'http://hl7.org/fhir/StructureDefinition/operationoutcome-issue-line':
                return extension.get('valueInteger')
    return None


def render_validation_outcome(resource, outcome, resource_url=None, do_print_linenumber=True):
    from IPython.display import display, HTML
    import html
    resource_id = resource.get('id')
    resource_html = json.dumps(resource, indent=4)
    resource_html = html.escape(resource_html)
    resource_html = resource_html.replace(" ", "&nbsp;").replace("\n", "<br>")
    resource_html_array = resource_html.split('<br>')
    if do_print_linenumber:
        resource_html_with_linenumber = []
        linenumber = 0
        for line in resource_html_array:
            linenumber += 1
            line = '<span style="background: lightgray;">' + str(linenumber).zfill(3) + "</span> " + line
            resource_html_with_linenumber.append(line)
        resource_html_array = resource_html_with_linenumber
    # sort the issues by line number so the status info for "issue 1 of 5", "issue 2 of 5" etc.
    # is in the same order as the lines of the document;
    # iterate in reverse because each issue is prepended to its line of the FHIR resource
    # and multiple issues can be added to the same line
    issues_sorted = sorted(outcome['issue'], key=select_location_line, reverse=True)
    count_issues = len(issues_sorted)
    issuenumber = count_issues
    summary_html = ''
    for issue in issues_sorted:
        location_element = issue['location'][0]
        location_line = select_location_line(issue)
        match issue['severity']:
            case "error":
                style = "color: black; background: red;"
            case "warning":
                style = "color: black; background: orange;"
            case _:
                style = "color: black; background: lightgray;"
        # Issue number and navigation
        issue_html = f'<span id="{resource_id}-issue{issuenumber}"><li style="' + style + '"><small>'
        # Link to previous issue
        if issuenumber > 1:
            issue_html += f'<a href="#{resource_id}-issue{issuenumber - 1}">&lt; Previous issue</a> | '
        issue_html += f'Issue {issuenumber} of {count_issues}'
        # Link to summary
        issue_html += f' | <a href="#{resource_id}">&circ; Back to summary</a>'
        # Link to next issue
        if issuenumber < count_issues:
            issue_html += f' | <a href="#{resource_id}-issue{issuenumber + 1}">Next issue &gt;</a>'
        issue_html += '</small><br>'
        issue_html += (f'{issue["severity"]} for element <i><b>{location_element}</b></i>'
                       f' (beginning at line {location_line}):<br><b>{issue["diagnostics"]}</b>')
        issue_html += '</li></span>'
        summary_html = (f'<li style="{style}">{issue["severity"]} for element <i><b>{location_element}</b></i>'
                        f' (beginning at line {location_line}):<br><b>{issue["diagnostics"]}</b></li>'
                        f'<p><a href="#{resource_id}-issue{issuenumber}">'
                        'Navigate to the JSON code of the FHIR resource to the location where this issue occurs</a>'
                        ) + summary_html
        # prepend the issue html to the corresponding line of the FHIR resource
        resource_html_array[location_line] = issue_html + resource_html_array[location_line]
        issuenumber -= 1
    resource_html = '<br style="font-family: monospace;">'.join(resource_html_array)
    summary_html = (f'<h3 id="{resource_id}">Validation result for resource {resource_id}</h3>'
                    f'<p>URL of the validated FHIR resource: '
                    f'<a target="_blank" href="{resource_url}">{resource_url}</a></p>'
                    f'<h3>Issues</h3>FHIR Validation returned {count_issues} issues:<ol>{summary_html}')
    resource_html = summary_html + '</ol><h4>Where issues occur in the JSON code of the FHIR resource</h4>' + resource_html
    resource_html += f'<p><a href="#{resource_id}">Back to summary</a></p>'
    display(HTML(resource_html))
    outcome_html = html.escape(json.dumps(outcome, indent=4))
    outcome_html = outcome_html.replace(" ", "&nbsp;").replace("\n", "<br>")
    # display(HTML(outcome_html))