
Output options for rendering the dataframes/tables

In [ ]:
import pandas as pd
# maximum number of table rows to show
pd.set_option("display.max_rows", 1000)
# show the full error messages (often longer than the default column width)
pd.set_option("display.max_colwidth", 10000)

Validate and render the validation results of a single FHIR resource

In [ ]:
#from fhirvalidation import Validator
#validator = Validator()
#validator.validate_resource_and_render_validation_outcome('Condition/resource1')

Validate all resources from FHIR search results

Validate conditions

In [ ]:
from fhirvalidation import Validator

validator = Validator()

# Set auth parameters for your FHIR server (if you do not want to use the basic auth credentials from the environment variables in .env)
# Documentation: https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication
# validator.requests_kwargs['auth'] = ('myusername', 'mypassword')

# Search for all resources of the resource_type
search_parameters = {}

# Search for resources with a certain code
#search_parameters={"code": "A00.0"}

df = validator.search_and_validate(resource_type="Condition", search_parameters=search_parameters, limit=10000)

Issues

Issues found in the dataframe returned by bulk validation

Count of resources with issues

Count of resources (unique fullUrl) with issues of any severity (including severity "info", which may not indicate an actual problem)

In [ ]:
import pandas as pd
len( pd.unique(df['fullUrl']) )
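
A quick follow-up (a minimal sketch using only the severity and fullUrl columns already present in the dataframe): the same count broken down per severity.

In [ ]:
# Number of distinct resources (unique fullUrl) per issue severity
df.groupby('severity')['fullUrl'].nunique().sort_values(ascending=False)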

Grouped issues with aggregation of code systems, sorted by count of affected resources

Issues grouped with additional aggregation of code systems (e.g. ICD-10): the individual codes of a code system are collapsed, so there is no separate issue per code (e.g. per ICD-10 code).

Sorted by count of affected resources

In [ ]:
df[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby(["severity", "location_aggregated", "diagnostics_aggregated"]).count().sort_values(['fullUrl'], ascending=False)

Grouped issues with aggregation of code systems, sorted by severity

Issues grouped with additional aggregation of code systems (e.g. ICD-10): the individual codes of a code system are collapsed, so there is no separate issue per code (e.g. per ICD-10 code).

Sorted by severity

In [ ]:
df[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby(["severity", "location_aggregated", "diagnostics_aggregated"]).count().sort_values(['severity','fullUrl'], ascending=False)

Grouped issues without aggregation of code systems

Issues and the count of affected resources, sorted by the number of affected resources. Since code systems are not aggregated here (see the sections above for aggregation), each code used from a code system appears as a separate issue.

In [ ]:
df[['severity', 'location', 'diagnostics', 'fullUrl']].groupby(["severity", "location", "diagnostics"]).count().sort_values(['fullUrl'], ascending=False)

Filter on severity "error"

Show only issues filtered by severity "error"

Count of resources with severity "error"

In [ ]:
len( pd.unique(df[df['severity']=="error"]['fullUrl']) )

Issues with severity "error", grouped with aggregation of code systems

Grouped issues, filtered to severity "error"

Issues grouped with additional aggregation of code systems (e.g. ICD-10): the individual codes of a code system are collapsed, so there is no separate issue per code (e.g. per ICD-10 code).

Sorted by count of affected resources

In [ ]:
df.query('severity=="error"')[['location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby(["location_aggregated", "diagnostics_aggregated"]).count().sort_values(['fullUrl'], ascending=False)

Grouped issues with severity "error", without aggregation of code systems

Issues and the count of affected resources, sorted by the number of affected resources. Since code systems are not aggregated here (see the sections above for aggregation), each code used from a code system appears as a separate issue.

In [ ]:
df.query('severity=="error"')[['location', 'diagnostics', 'fullUrl']].groupby(["location", "diagnostics"]).count().sort_values(['fullUrl'], ascending=False)
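
To share the error-level issues outside the notebook, the filtered dataframe can be written to a CSV file. A minimal sketch; the file name is just an example.

In [ ]:
# Export all issues with severity "error" to a CSV file (example file name)
df.query('severity=="error"').to_csv('validation-errors.csv', index=False)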

Resources with a specific error

In [ ]:
myerror = "Condition.code.coding:icd10-gm.version: minimum required = 1, but only found 0 (from https://www.medizininformatik-initiative.de/fhir/core/modul-diagnose/StructureDefinition/Diagnose|2024.0.0)"

# Either use plain pandas boolean indexing:
# df[df['diagnostics'] == myerror]
#
# or use df.query with an f-string:
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html and https://docs.python.org/3/reference/lexical_analysis.html#f-strings
df_query = f'diagnostics=="{myerror}"'

df.query(df_query)
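
To see which resources are affected by this specific error, the distinct fullUrl values can be listed (a sketch reusing the df_query built above).

In [ ]:
# Distinct resources (unique fullUrl) affected by the selected error
df.query(df_query)['fullUrl'].unique()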

Info

Information concerning the dataframe, e.g. dataframe memory usage

Dataframe memory usage

In [ ]:
df.info()
df.memory_usage(deep=True)
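
If memory usage becomes a problem for large validation runs, repetitive string columns can be converted to the pandas "category" dtype. A minimal sketch working on a copy of the dataframe, assuming the listed columns contain plain strings.

In [ ]:
# Optional: reduce memory by converting repetitive string columns to "category".
# Done on a copy so the original dataframe and the cells above stay unchanged.
df_compact = df.copy()
for col in ['severity', 'location_aggregated', 'diagnostics_aggregated']:
    if col in df_compact.columns:
        df_compact[col] = df_compact[col].astype('category')
df_compact.memory_usage(deep=True)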

Head - returns the first rows of the dataframe

In [ ]:
df.head()

Snippets

Additional code snippets

Markdown generation

How to generate a table in Markdown format (e.g. for a CI/CD status report)

In [ ]:
# The reserved pipe character | has to be escaped as \| (https://github.com/astanin/python-tabulate/issues/241)
df_escaped = df.applymap(lambda s: s.replace('|','\\|') if isinstance(s, str) else s)

print(df_escaped[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby(["severity", "location_aggregated", "diagnostics_aggregated"]).count().sort_values(['fullUrl'], ascending=False).to_markdown(tablefmt="github") )
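
The generated Markdown can also be written to a file so a CI/CD job can publish it as a report artifact. A sketch; the file name is just an example.

In [ ]:
# Write the Markdown table to a file (example file name) for use as a CI/CD artifact
markdown_table = df_escaped[['severity', 'location_aggregated', 'diagnostics_aggregated', 'fullUrl']].groupby(["severity", "location_aggregated", "diagnostics_aggregated"]).count().sort_values(['fullUrl'], ascending=False).to_markdown(tablefmt="github")
with open("validation-report.md", "w", encoding="utf-8") as f:
    f.write(markdown_table)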

Navigate the validation results dataframe with an interactive user interface

Use an interactive UI to navigate and filter the dataframe

Documentation: English / German

In [ ]:
# Install pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install pygwalker
In [ ]:
# render dataframe with pygwalker
import pygwalker as pyg
walker = pyg.walk(df)
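
For large result sets it can be easier to explore only a subset interactively, e.g. only the error-level issues (a sketch combining calls already used above).

In [ ]:
# Explore only the issues with severity "error" in the interactive UI
walker_errors = pyg.walk(df.query('severity=="error"'))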