Giter Club home page Giter Club logo

fhir-populator's Introduction

FHIR Populator

PyPI version

A tool to load a lot of FHIR resources into a "naked" FHIR server.

It is intended to quickly load a package of FHIR Profiles (StructureDefinitions) and associated artefacts (such as CodeSystem, ValueSet, ConceptMap) into a FHIR server that has just been spun up.

This tool was developed in the context of the Core Dataset (KDS) of the Medical Informatics Initiative (MII) in Germany as well as the German Corona Consensus Dataset (GECCO) developed by the Network University Medicine.

The script is written in Python 3 and available on PyPI.

Installation

As the package is available on the Python package index, you can install it quickly into a Virtual Environment. First, you may need to create a folder for FHIR populator and a Virtual Environment (all commands are for Unix-based OS and may need tweaking on Windows):

mkdir fhir-populator
cd fhir-populator
python -m venv .venv
source .venv/bin/activate

On Windows without Windows Subsystem for Linux, you will need to change the last command to .venv\bin\activate.bat.

These commands will create a new directory, visit it, create the virtual environment, and activate it.

Next, load the package from PyPI:

python -m pip install fhir-populator

You can now start it as a Python module:

python -m fhir_populator --help

and the help will be printed:

usage: fhir_populator [-h] --endpoint ENDPOINT [--authorization-header AUTHORIZATION_HEADER] [--log-file LOG_FILE]
                      [--get-dependencies] [--non-interactive] [--include-examples]
                      [--log-level {INFO,WARNING,DEBUG,ERROR}] [--rewrite-versions] [--only-put] [--versioned-ids]
                      [--exclude-resource-type [EXCLUDE_RESOURCE_TYPE ...] | --only [ONLY ...]]
                      [--registry-url REGISTRY_URL] [--package PACKAGES [PACKAGES ...]]
                      [--persist] [--from-persistence] [--persistence-dir PERSISTANCE_DIR]

optional arguments:
  -h, --help            show this help message and exit
  --endpoint ENDPOINT   The FHIR server REST endpoint (default: None)
  --authorization-header AUTHORIZATION_HEADER
                        an authorization header to use for uploading. If none, nothing will be sent. (default: None)
  --log-file LOG_FILE   A log file path (default: None)
  --get-dependencies    if provided, dependencies will be retrieved from the FHIR registry. (default: False)
  --non-interactive     In case of errors returned by this FHIR server, the error will be ignored with only a log
                        message being written out. Might be helpful when integrating this module into a script.
                        (default: False)
  --include-examples    If provided, the resources in the 'examples' folder of the packages will be uploaded.
                        (default: False)
  --log-level {INFO,WARNING,DEBUG,ERROR}
                        The level to log at (default: INFO)
  --rewrite-versions    If provided, all versions of FHIR resources will be modified to be consistent with the
                        package version. Otherwise, the version is used as-is! (default: False)
  --only-put            if provided, IDs will be generated for all resources that lack one. This can be combined with
                        --versioned-ids. (default: False)
  --versioned-ids       if provided, all resource IDs will be prefixed with the package version. (default: False)
  --exclude-resource-type [EXCLUDE_RESOURCE_TYPE ...]
                        Specify resource types to ignore! (default: None)
  --only [ONLY ...]     Only upload the resource types provided here, e.g. only StructureDefinitions, CodeSystems and
                        ValueSets (default: None)
  --registry-url REGISTRY_URL
                        The FHIR registry url, Simplifier by default (default: https://packages.simplifier.net)
  --package PACKAGES [PACKAGES ...]
                        Specification for the package to download and push to the FHIR server. You can specify more
                        than one package. Use the syntax 'package@version', or leave out the version to use the
                        latest package available on the registry. (default: None)
  --persist          if provided the package will be persisted in the persist-dir
  --persistence-dir      directory where the persisted packages will be stored or loaded from
  --from-persistence     if provided the package will be loaded from the persistence-dir                      

There are a lot of command line options that can be used to customize the behaviour of the program.

Example Invocation

To try out the program, you can spin up a FHIR server, such as HAPI FHIR JPA Server Starter on your local machine, e.g. using Docker. Assuming the endpoint of the server is http://localhost:8080/fhir, you can upload the latest version of the GECCO package, including dependencies (e.g. the MII KDS modules used by that package), thus:

python -m fhir_populator --endpoint http://localhost:8080/fhir --get-dependencies --package de.gecco

As this example does not specify a version of the de.gecco package, the latest version of the package will first be determined from the Simplifier API. You can also specify a version using the syntax package@version:

python -m fhir_populator --endpoint http://localhost:8080/fhir --get-dependencies --package [email protected]

Also, you can specify as many packages as you like, and mix-and-match versioned references with unversioned ones:

python -m fhir_populator --endpoint http://localhost:8080/fhir --get-dependencies --package [email protected] de.medizininformatikinitiative.kerndatensatz.person

Implementation Details

The script is broken into multiple steps:

  1. All unversioned package references are converted to versioned references, by retrieving the package metadata from the NPM registry.
  2. The packages are downloaded as Tarballs into a temporary directory (under /tmp for Unix systems), and extracted there
  3. After each package is downloaded, the package.json is examined, and dependencies are added to the download queue, if desired. During this download, a dependency graph is built from the downloaded packages, to make sure that every package is uploaded after its dependencies
  4. The packages are uploaded, file-by-file, to the FHIR server. This uses the topological sort of the directed dependency graph, to maintain consistency. Also, the files are uploaded in logical versions (e.g. CodeSystem before ValueSet before StructureDefinition before Patient etc.)
  5. If the FHIR server returns an error, the user is prompted interactively for input.
  6. When all resources are uploaded (or if the user aborts execution with CTRL-C), the temporary directory is recursively deleted.

Configuration

There are a number of configuration options, which are (hopefully) mostly self-explanatory. Some of the more obscure ones are explained below:

  • --authorization-header: USe if your server is configured for Authentication. You can enter something like `--authorization-header "Bearer asdf" here, which will be presented to the server for each request.
  • --exclude-resource-type: You can skip resource types, e.g. --exclude-resource-type CodeSystem ValueSet ConceptMap. This is not case-sensitive, the lower-case version of the resource type will be matched against the lower-case parameter list.
  • --include-examples: Examples in FHIR packages are great, but often not consistent across packages. For example, an Observation example might reference Patient/example, and this patient is nowhere to be found in the package, or its dependencies. Some FHIR servers (such as HAPI JPA Server) validate references on CREATE and return errors for missing references. Hence, examples (files in the examples folder of the package, as per the spec) are ignored by default.
  • --non-interactive: If provided, errors returned by the FHIR server will be ignored, and only a warning will be printed out.
  • --only-put: FHIR requires that IDs are present for all resources that are uploaded via HTTP PUT. Hence, if IDs are missing, an HTTP POST request is used by the script. This does not generate stable, or nice, IDs by default. You can provide this parameter to make the script generate IDs from the file name of the resource, which should be stable across reruns. This uses a "slugified" version of the filename without unsafe characters, and restricted to 64 characters, as per the specification.
  • --registry-url: While the script was only tested using the Simplifier registry, it should be compatible to other implementations of the FHIR NPM Package Spec, which is implemented by the Simplifier software. You can provide the endpoint of an alternative registry hence.
  • --rewrite-versions: If provided, all version attributes of the resources will be rewritten to match the version in the package.json, to separate these definitions from previous versions. You will need to think about the versions numbers you use when communicating with others, who might not use the same versions - ⚠️ use with caution! ⚠️
  • --versioned-ids: To separate versions of the resources on the same FHIR server, you can override the IDs provided in the resources, by including the slugified version of the package in the ID. If combined with the --only-put switch, this will work the same, versioning existing IDs, and slugifying + versioning the filename of resources without IDs.
  • --persist: If provided, the downloaded packages will be persisted in the --persistence-dir directory.
  • --persistence-dir: The directory where the persisted packages will be stored or loaded from.
  • --from-persistence: If provided, the package will be loaded from the --persistence-dir directory.

Proxy:

  • --http-proxy: URL of your HTTP proxy, may optionally include credentials (c.f. the Requests documentation)
  • --https-proxy: URL for HTTPS requests, if not provided, the HTTP proxy is used instead
  • --proxy-for-fhir: If provided, the proxy is also used for requests to your FHIR server
  • --proxy-verify: If provided, this public key (-chain) on your disk is used for validating the re-encrypted traffic to your proxy
  • --proxy-for-fhir: If provided, the proxy is also used for FHIR requests, not only for NPM requests`

Updating

cd fhir-populator
source venv/bin/activate
python -m pip install --upgrade fhir-populator

Hacking

If you want to customize the program, you should:

  1. create a fork in GitHub, and clone it.
  2. create a new virtual environment in your fork: python -m venv .venv; source .venv/bin/activate
  3. Install the package locally, using pip install .
  4. Customize the script. Re-run step 3 if you change the script.
  5. python -m fhir_populator, as before.
  6. Create an issue and pull request in the GitHub Repo! We welcome contributions!

Changelog

Version Date Changes
v1.0.10 2021-06-03 Initial release
v1.1.0 2021-06-08 - handle Unicode filenames, especially on BSD/macOS (#1)
- do not serialize null ID for POST (#2)
- include option for only certain resource types(#6)
- fix XML handling (#6)
- add LICENSE
v1.1.1 2021-06-09 - explicitly open files with UTF-8 encoding (#12)
- ignore pycharm and vscode (#11)
v1.2.0 2022-12-13 - support HTTP/HTTPS proxies
v1.3.0 2023-04-03 - support using a persistence directory

fhir-populator's People

Contributors

alexanderkiel avatar geloro94 avatar jpwiedekopf avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

fhir-populator's Issues

Remove varargs as command line variables

The varargs of the program i.e. for --package, cause https://github.com/koalaman/shellcheck/wiki/SC2086 when setting the args in bash:
python -m fhir_populator --endpoint "${BLAZE_SERVER_URL}" --get-dependencies --non-interactive --only-put --package ${PACKAGES}

Quoting ${PACKAGES} to resolve the issue is not possible, as the arguments are no longer interpreted as:
arg1 arg2 arg3
but instead as
'arg1 arg2 arg3'
which causes the application to crash.

Offline workflow when no internet connection is available

Some users may require a "sneaker-net" based approach for getting FHIR packages on the machine the FHIR Populator is running on, and can't use proxies. Support that use case by allowing retrieval of FHIR packages from a directory instead of downloading from a registry.

Illegal Byte Sequence Error on macOS

I'm on Catalina and have installed Python 3.9.5 using Homebrew. I created the venv as described in the README. The output is:

(.venv) fhir-populator $ python -m fhir_populator --endpoint http://localhost:8080/fhir --get-dependencies --package de.gecco
[20:49:45] INFO      - endpoint : http://localhost:8080/fhir                                                                                                         populator.py:181
           INFO      - authorization_header : None                                                                                                                   populator.py:181
           INFO      - log_file : None                                                                                                                               populator.py:181
           INFO      - get_dependencies : True                                                                                                                       populator.py:181
           INFO      - non_interactive : False                                                                                                                       populator.py:181
           INFO      - include_examples : False                                                                                                                      populator.py:181
           INFO      - log_level : INFO                                                                                                                              populator.py:181
           INFO      - rewrite_versions : False                                                                                                                      populator.py:181
           INFO      - only_put : False                                                                                                                              populator.py:181
           INFO      - versioned_ids : False                                                                                                                         populator.py:181
           INFO      - exclude_resource_type : None                                                                                                                  populator.py:181
           INFO      - registry_url : https://packages.simplifier.net                                                                                                populator.py:181
           INFO      - packages : ['de.gecco']                                                                                                                       populator.py:181
           INFO     Downloading package with spec [email protected]                                                                                                     populator.py:288
[20:49:46] INFO     Downloading package with spec [email protected]                                                      populator.py:288
           INFO     Downloading package with spec [email protected]                                                        populator.py:288
[20:49:47] INFO     Downloading package with spec [email protected]                                                     populator.py:288
[20:49:48] INFO     Downloading package with spec [email protected]                                                        populator.py:288
[20:49:49] INFO     Downloading package with spec [email protected]                                                                                              populator.py:288
           INFO     Downloading package with spec [email protected]                                                            populator.py:288
           INFO     Downloading package with spec [email protected]                                                                                           populator.py:288
           INFO     Downloading package with spec [email protected]                                                                                                    populator.py:288
Traceback (most recent call last):
  File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/akiel/fhir-populator/.venv/lib/python3.9/site-packages/fhir_populator/__main__.py", line 4, in <module>
    Populator().populate()
  File "/Users/akiel/fhir-populator/.venv/lib/python3.9/site-packages/fhir_populator/populator.py", line 351, in populate
    dependency_graph = self.download_packages(packages)
  File "/Users/akiel/fhir-populator/.venv/lib/python3.9/site-packages/fhir_populator/populator.py", line 289, in download_packages
    package_path = self.download_untar_package(package_name=package)
  File "/Users/akiel/fhir-populator/.venv/lib/python3.9/site-packages/fhir_populator/populator.py", line 332, in download_untar_package
    download_tar_fs.extractall(extract_path)
  File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/tarfile.py", line 2036, in extractall
    self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/tarfile.py", line 2077, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/tarfile.py", line 2150, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/usr/local/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/tarfile.py", line 2191, in makefile
    with bltn_open(targetpath, "wb") as target:
OSError: [Errno 92] Illegal byte sequence: '/var/folders/gg/_48t_zj16b53c2vl04s52cjr0000gn/T/fhir-populator0j0pw0l2/extract/kbv.basis_1.1.3/package/examples/Patient - Ludger K\udcfdnigstein.json'

The FHIR server is Blaze but it didn't receive any request.

Don't include a Null ID into Resources

While uploading resources without an ID, using POST, the resource send to the FHIR server includes a JSON null value for the id property. Blaze complains about this, which it shouldn't, but nevertheless you shouldn't send such resources in the first place. I'll fix this in samply/blaze#370.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.