
fhir-analytics-pipelines's Introduction

Health Data Analytics Pipelines

Health Data Analytics Pipelines is an open source project that helps build components and pipelines for transforming and moving FHIR and DICOM data from FHIR and DICOM servers to Azure Data Lake, making it available for analytics with Azure Synapse, Power BI, and Azure Machine Learning.

This OSS project currently has the following solutions:

  1. FHIR to Synapse sync agent: An Azure Container App that extracts data from a FHIR server using the FHIR Resource APIs, converts it to hierarchical Parquet files, and writes it to Azure Data Lake in near real time. It also includes a script to create external tables and views in a Synapse Serverless SQL pool pointing to the Parquet files.

    This solution enables you to query against the entire FHIR data with tools such as Synapse Studio, SSMS, and Power BI. You can also access the Parquet files directly from a Synapse Spark pool. You should consider this solution if you want to access all of your FHIR data in near real time, and want to defer custom transformation to downstream systems.

    Supported FHIR servers: FHIR service in Azure Health Data Services, Azure API for FHIR, FHIR Server for Azure

  2. FHIR to CDM Pipeline Generator: A tool that generates an ADF pipeline for moving a snapshot of data from a FHIR server, via the $export API, to a CDM folder in Azure Data Lake Storage Gen 2 in CSV format. The tool requires a user-created configuration file containing instructions to project and flatten FHIR resources and fields into tables. You can also follow the instructions for creating a downstream pipeline in a Synapse workspace to move data from the CDM folder to a Synapse dedicated SQL pool.

    This solution enables you to transform the data into tabular format as it gets written to CDM folder. You should consider this solution if you want to transform FHIR data into a custom schema as it is extracted from the FHIR server.
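The projection and flattening described above (select element paths, take the first element of array-valued nodes by default, emit one row per resource) can be illustrated with a minimal Python sketch. The `flatten` helper is hypothetical and only illustrates the idea, not the tool's actual code:

```python
import json

def flatten(resource, paths):
    """Project selected FHIR element paths into a flat row.

    For array-valued nodes, take the first element (mirroring the
    documented default); missing paths yield None; complex values that
    are not unrolled are kept as JSON text.
    """
    row = {"ResourceId": resource.get("id")}
    for path in paths:
        node = resource
        for part in path.split("."):
            if isinstance(node, list):        # default: first array element
                node = node[0] if node else None
            if isinstance(node, dict):
                node = node.get(part)
            else:
                node = None
                break
        if isinstance(node, (dict, list)):    # complex type kept as JSON text
            node = json.dumps(node)
        row[path] = node
    return row

patient = {
    "resourceType": "Patient",
    "id": "example",
    "gender": "male",
    "name": [{"family": "Schultz", "given": ["Aaron"]}],
}
print(flatten(patient, ["gender", "name.family"]))
```

Note how `name.family` resolves through the first element of the `name` array, which is the same "first element by default" rule the configuration file describes.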

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

fhir-analytics-pipelines's People

Contributors

boyawu10, dependabot[bot], evachen96, hongyh13, microsoft-github-operations[bot], microsoft-github-policy-service[bot], moria97, mustal-du, qiwjin, quanwanxx, ranvijaykumar, ruiyic, smithago, sowu880, stevewohl, tongwu-sh


fhir-analytics-pipelines's Issues

Synapse Cannot Infer Parquet Schema

We'd like to use the Parquet files in Synapse notebooks, but Synapse cannot infer the schema from the output Parquet files. Is there an issue with the Parquet files, or does something special need to happen in Synapse to use these like other Parquet files?


Configuration generator unroll error

I'm using PowerShell 7 (x64). I'm a noob at Node; this is my first experience with it.

I issue:
node .\generate-config.bat
from the Configuration-Generator root. I get expected results including a new directory named output with all the expected json files and the PropertiesGroup subdirectory with all the expected unrolled json files.

I then run:
node .\program.js generate-config-unrollpath -u output -o configuration
from Configuration-Generator root
and get:

C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\configuration_generator.js:38
Object.keys(schemaDefinitions[property].properties).forEach(function(subProperty, _) {
^

TypeError: Cannot read property 'properties' of undefined
at recursivelyCollectPropertiesGroupTypes (C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\configuration_generator.js:38:45)
at C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\configuration_generator.js:80:52
at Array.forEach ()
at C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\configuration_generator.js:76:44
at Array.forEach ()
at getRelatedPropertiesGroupTypes (C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\configuration_generator.js:75:36)
at Object.publishUnrollConfiguration (C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\configuration_generator.js:313:25)
at Command. (C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\program.js:42:26)
at Command.listener [as _actionHandler] (C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\node_modules\commander\index.js:426:31)
at Command._parseCommand (C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator\node_modules\commander\index.js:1002:14)
PS C:\Users\user\source\repos\FHIR-Analytics-Pipelines\Configuration-Generator>

I have tried reinstalling with
npm install
at the Configuration-Generator root.

npm WARN [email protected] No description
npm WARN [email protected] No repository field.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules\fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"win32","arch":"x64"})

audited 116 packages in 1.686s

19 packages are looking for funding
run npm fund for details

found 0 vulnerabilities

And also reinstalling commander with
npm install commander
at the Configuration-Generator root.

npm WARN [email protected] No description
npm WARN [email protected] No repository field.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules\fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"win32","arch":"x64"})

19 packages are looking for funding
run npm fund for details

found 0 vulnerabilities

If I already have the subdirectory with the unrolled elements, do I need to be concerned that this command errors? It seems like the job it would do has already been done, though I can't be 100% sure.
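The TypeError indicates that `schemaDefinitions[property]` is undefined for some property referenced during the unroll, so the crash is in the lookup itself rather than the input files. A defensive version of that recursive collection, sketched in Python (the actual generator is JavaScript, and all names here are illustrative):

```python
def collect_properties_group_types(schema_definitions, type_name, seen=None):
    """Recursively collect the complex types reachable from type_name,
    skipping (and reporting) references that have no schema entry
    instead of crashing on a missing definition."""
    seen = set() if seen is None else seen
    definition = schema_definitions.get(type_name)
    if definition is None:            # the unguarded lookup crashes here
        print(f"warning: no schema definition for '{type_name}', skipping")
        return seen
    for prop, spec in definition.get("properties", {}).items():
        child = spec.get("type")
        if child and child not in seen and child in schema_definitions:
            seen.add(child)
            collect_properties_group_types(schema_definitions, child, seen)
    return seen

schemas = {
    "HumanName": {"properties": {"period": {"type": "Period"}}},
    "Period": {"properties": {"start": {"type": "dateTime"}}},
}
print(collect_properties_group_types(schemas, "HumanName"))
```

With a guard like this, a missing definition produces a warning naming the offending type, which would make the root cause of the error above much easier to diagnose.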

Configuration-Generator process not recognizing fields in Fhir spec

```yaml
# Name of the base resource from which the tables will be derived.
AllergyIntolerance:
  # Each node should either resolve to a primitive data type, or a complex data
  # type for which the yaml config is defined in PropertiesGroupConfig.yml.

  # List the array-type node paths for which you want to create separate
  # tables. These paths need not be direct children of the resource.
  unrollPath: [identifier, category, reaction]
  # List the nodes that you want to include in the resource's main table. If
  # the node is of array type, the first element of the array will be used by
  # default. ResourceId is generated by default using the id attribute.
  propertiesByDefault: [clinicalStatus, verificationStatus, type, criticality, code, patient, encounter, recordedDate, recorder, asserter, lastOccurrence, note]
  # Customize the expressions for columns if the default does not meet your
  # requirements.
  customProperties:
  # For choice types, give all the options.
  - {name: OnsetDateTime, path: onset, type: dateTime}
  - {name: OnsetAge, path: onset, type: age}
  - {name: OnsetPeriod, path: onset, type: period}
  - {name: OnsetRange, path: onset, type: range}
  - {name: OnsetString, path: onset, type: string}
```

Causes error

Error: Invalid path: AllergyIntolerance.category

Even though it is part of the spec: https://www.hl7.org/fhir/allergyintolerance.html

A couple other examples of errors are with:
HealthcareService.availableTime.daysOfWeek
CoverageEligibilityRequest.purpose
CoverageEligibilityResponse.purpose

Config File is not working for Coding

Hi,

I have created the config files (attached), but they do not capture the data for coding, for example patient identifier codes (Patient - identifier - CodeableConcept - coding). I have attached the files that I am using. Can anyone please check and let me know what the issue is here?
config (2).zip

Regards,
SAM

Unhandled error occurred in data processing job 3. Reason : Unable to load DLL 'ParquetNative' or one of its dependencies: The specified module could not be found. (0x8007007E)

  1. Unhandled exception when converting input data to parquet for "CarePlan".
  2. Unhandled error occurred in data processing job 3. Reason : Unable to load DLL 'ParquetNative' or one of its dependencies: The specified module could not be found. (0x8007007E)

It's deployed as an Azure Function on a Windows App Service (Operating System: Windows).
What could be the issue?

Add Deployment Templates to Registry

I am working to use this OSS project with another project. Bicep deployment templates support publishing to an Azure Container Registry. It would be easier to use this with other solutions if:

  • The deployment templates were Bicep & ARM
  • The templates were published to a public ACR where other deployments can reference them.

Happy to contribute here if this aligns with your roadmap/vision.

Thanks,
Mikael

Please make FhirToCdm targeted

Currently, if I want to capture FhirToCdm data for only a targeted dataset, I have to run the whole pipeline and selectively extract the bits I care about. Say I'm only interested in QuestionnaireResponses: I would like to be able to call &_type=QuestionnaireResponse and have it run only for that resource. If I do that currently, it runs generate-schema for ALL resources. transform-data uses the --inputBlobUri switch in a loop and works fine.
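For reference, `_type` is a standard parameter of the FHIR bulk `$export` API, so a targeted run would only need the kickoff URL to carry it. A minimal sketch of building such a URL (`build_export_url` is a hypothetical helper, not part of the pipeline):

```python
from urllib.parse import urlencode

def build_export_url(fhir_base, resource_types=None):
    """Build a FHIR $export kickoff URL, optionally restricted to
    specific resource types via the standard _type parameter."""
    url = f"{fhir_base.rstrip('/')}/$export"
    if resource_types:
        url += "?" + urlencode({"_type": ",".join(resource_types)})
    return url

print(build_export_url("https://example.azurehealthcareapis.com",
                       ["QuestionnaireResponse"]))
```

The same `_type` list could plausibly drive which resources generate-schema runs for, which is the targeting this issue asks for.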

Azure Function getting 401 error when calling Fhir

When looking at Application Insights, I have seen that the Azure Function has trouble authenticating with the FHIR API. The Azure Function has the FHIR Reader role assigned.

Please feel free to ask any other data that might be helpful.


Example of Log

Exception while executing function: Functions.JobManagerFunction Result: Failure
Exception: System.AggregateException: One or more errors occurred. (Task execution failed)
---> Microsoft.Health.Fhir.Synapse.Scheduler.Exceptions.ExecuteTaskFailedException: Task execution failed
---> System.AggregateException: One or more errors occurred. (Search FHIR server failed. Search url: 'https://fhir-analytics.azurehealthcareapis.com/AdverseEvent?_lastUpdated=ge1970-01-01T00%3a00%3a00%2b00%3a00&_lastUpdated=lt2022-02-21T19%3a56%3a19.6280037%2b00%3a00&_count=1000&_sort=_lastUpdated'. )
---> Microsoft.Health.Fhir.Synapse.DataSource.Exceptions.FhirSearchException: Search FHIR server failed. Search url: 'https://fhir-analytics.azurehealthcareapis.com/AdverseEvent?_lastUpdated=ge1970-01-01T00%3a00%3a00%2b00%3a00&_lastUpdated=lt2022-02-21T19%3a56%3a19.6280037%2b00%3a00&_count=1000&_sort=_lastUpdated'.
---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 401 (Unauthorized).
at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
at Microsoft.Health.Fhir.Synapse.DataSource.FhirDataClient.SearchAsync(TaskContext context, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataSource\FhirDataClient.cs:line 119
--- End of inner exception stack trace ---
at Microsoft.Health.Fhir.Synapse.DataSource.FhirDataClient.SearchAsync(TaskContext context, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataSource\FhirDataClient.cs:line 119
at Microsoft.Health.Fhir.Synapse.DataSource.FhirDataClient.GetAsync(TaskContext context, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataSource\FhirDataClient.cs:line 61
at Microsoft.Health.Fhir.Synapse.Scheduler.Tasks.TaskExecutor.ExecuteAsync(TaskContext taskContext, IProgress`1 progress, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Scheduler\Tasks\TaskExecutor.cs:line 70
at Microsoft.Health.Fhir.Synapse.Scheduler.Jobs.JobManager.<>c__DisplayClass11_1.<b__1>d.MoveNext() in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Scheduler\Jobs\JobManager.cs:line 206
--- End of inner exception stack trace ---

Incomplete CDM translation

The TransformToCDM step seems to be incompatible with one of our test records. Fake patient Aaron Schultz with ResourceId d54661b3-e0b3-bd47-00e7-80de6d12d637 is present in the $export Patient-1.ndjson file, but not in cdm\LocalPatient*.csv file.

cmd /c powershell .\RunAdfCustomActivity.ps1

In fact, his record, or possibly the one before it (which has a photo), seems to cause the transform to stop transforming patients to CSV. We have 4029 fake patients in the ndjson file, but only 3620 make it to the CSV file. Record 3621 in the ndjson file is the Aaron Schultz patient record.

Here is the json for Aaron:
{"resourceType":"Patient","id":"d54661b3-e0b3-bd47-00e7-80de6d12d637","meta":{"versionId":"1","lastUpdated":"2021-09-24T13:22:11.896+00:00","profile":["http://standardhealthrecord.org/fhir/StructureDefinition/shr-entity-Patient"]},"text":{"status":"generated","div":"<div xmlns=\"http://www.w3.org/1999/xhtml\">Generated by <a href=\"https://github.com/synthetichealth/synthea\">Synthea</a>.Version identifier: master-branch-latest-2-g1ac455d8\n . Person seed: 4421811931708538165 Population seed: 1624641604004</div>"},"extension":[{"extension":[{"url":"ombCategory","valueCoding":{"system":"urn:oid:2.16.840.1.113883.6.238","code":"2106-3","display":"White"}},{"url":"text","valueString":"White"}],"url":"http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"},{"extension":[{"url":"ombCategory","valueCoding":{"system":"urn:oid:2.16.840.1.113883.6.238","code":"2186-5","display":"Non Hispanic or Latino"}},{"url":"text","valueString":"Non Hispanic or Latino"}],"url":"http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity"},{"url":"http://hl7.org/fhir/StructureDefinition/patient-mothersMaidenName","valueString":"Leora789 Pfeffer420"},{"url":"http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex","valueCode":"M"},{"url":"http://hl7.org/fhir/StructureDefinition/patient-birthPlace","valueAddress":{"city":"Lynn","state":"Massachusetts","country":"US"}},{"url":"http://standardhealthrecord.org/fhir/StructureDefinition/shr-actor-FictionalPerson-extension","valueBoolean":true},{"url":"http://standardhealthrecord.org/fhir/StructureDefinition/shr-entity-FathersName-extension","valueHumanName":{"text":"Austin578 
Schultz619"}},{"url":"http://standardhealthrecord.org/fhir/StructureDefinition/shr-demographics-SocialSecurityNumber-extension","valueString":"999-82-2338"},{"url":"http://synthetichealth.github.io/synthea/disability-adjusted-life-years","valueDecimal":11.338841186955664},{"url":"http://synthetichealth.github.io/synthea/quality-adjusted-life-years","valueDecimal":59.66115881304434}],"identifier":[{"system":"https://github.com/synthetichealth/synthea","value":"d54661b3-e0b3-bd47-00e7-80de6d12d637"},{"type":{"coding":[{"system":"http://terminology.hl7.org/CodeSystem/v2-0203","code":"MR","display":"Medical Record Number"}],"text":"Medical Record Number"},"system":"http://hospital.smarthealthit.org","value":"d54661b3-e0b3-bd47-00e7-80de6d12d637"},{"type":{"coding":[{"system":"http://terminology.hl7.org/CodeSystem/v2-0203","code":"SS","display":"Social Security Number"}],"text":"Social Security Number"},"system":"http://hl7.org/fhir/sid/us-ssn","value":"999-82-2338"},{"type":{"coding":[{"system":"http://terminology.hl7.org/CodeSystem/v2-0203","code":"DL","display":"Driver's License"}],"text":"Driver's License"},"system":"urn:oid:2.16.840.1.113883.4.3.25","value":"S99925574"},{"type":{"coding":[{"system":"http://terminology.hl7.org/CodeSystem/v2-0203","code":"PPN","display":"Passport Number"}],"text":"Passport Number"},"system":"http://standardhealthrecord.org/fhir/StructureDefinition/passportNumber","value":"X84173412X"}],"name":[{"use":"official","family":"Schultz619","given":["Aaron697"],"prefix":["Mr."]}],"telecom":[{"system":"phone","value":"555-113-3127","use":"home"}],"gender":"male","birthDate":"1949-02-13","address":[{"extension":[{"extension":[{"url":"latitude","valueDecimal":41.708340797966564},{"url":"longitude","valueDecimal":-70.18051732525078}],"url":"http://hl7.org/fhir/StructureDefinition/geolocation"},{"url":"http://hl7.org/fhir/StructureDefinition/geolocationdata"}],"line":["970 Jenkins Byway","1234 Any 
Street"],"city":"Dennis","state":"MA","country":"US"}],"maritalStatus":{"coding":[{"system":"http://terminology.hl7.org/CodeSystem/v3-MaritalStatus","code":"M","display":"M"}],"text":"M"},"multipleBirthBoolean":false,"contact":[{"relationship":[{"coding":[{"system":"https://build.fhir.org/ig/HL7/US-Core/StructureDefinition-us-core-patient.html","version":"1.21","code":"mother/sister","display":"this is not really real"}],"text":"Here is Jake his mother sister"}]}],"communication":[{"language":{"coding":[{"system":"urn:ietf:bcp:47","code":"en-US","display":"English"}],"text":"English"}}]}

Coverage and DocumentReference Resources Parquet not generated

Observation: if a resource bundle contains a "Coverage" resource within another resource, e.g. "ExplanationOfBenefit" or "Claim", no Parquet file is generated for the Coverage. This is an issue for us because Coverage details will most likely come as a referenced resource within other resources, as per the FHIR documentation http://build.fhir.org/coverage.html.
A similar issue exists with the DocumentReference resource, which will mostly be part of other resources: http://hl7.org/fhir/R4/documentreference.html.

I tried uploading the attached bundle and was expecting a Parquet file for the Coverage resource, but no file was generated.

Ali918_Stokes453_f5aa3408-57b3-4c05-a1d3-e4511b4be50e.txt

Error Generating the Parquet - Stating it doesn't find the ParquetNative dll

Example

2022-12-29T01:01:56Z [Error] Unhandled exception when converting input data to parquet for "Patient".
2022-12-29T01:01:56Z [Error] Unhandled error occurred in data processing job 5. Reason : Unable to load shared library 'ParquetNative' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libParquetNative: cannot open shared object file: No such file or directory
2022-12-29T01:01:56Z [Error] Unhandled exception. System.DllNotFoundException: Unable to load shared library 'ParquetNative' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libParquetNative: cannot open shared object file: No such file or directory

Automatically add the FHIR Data Reader Role Assignment for the Function App

If Managed Identity is enabled for the Azure FHIR to Parquet pipeline, it would be super great for the deployment template to automatically deploy a "FHIR Reader" role assignment for the function app on the FHIR Service / API for FHIR.

It's possible to parse the URL and determine which SKU is being used. Then you can attempt to deploy a role assignment.
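A sketch of the hostname-based SKU detection suggested above, assuming Azure API for FHIR accounts live under *.azurehealthcareapis.com and Azure Health Data Services FHIR services under *.fhir.azurehealthcareapis.com (these patterns are an assumption here and should be verified against current Azure documentation):

```python
from urllib.parse import urlparse

def guess_fhir_sku(fhir_url):
    """Guess the FHIR offering from its hostname so the right role
    assignment target can be chosen; falls back to treating anything
    else as an OSS FHIR server."""
    host = urlparse(fhir_url).hostname or ""
    if host.endswith(".fhir.azurehealthcareapis.com"):
        return "healthcare-apis-workspace"   # Azure Health Data Services
    if host.endswith(".azurehealthcareapis.com"):
        return "azure-api-for-fhir"
    return "oss-fhir-server"

print(guess_fhir_sku("https://demo.fhir.azurehealthcareapis.com"))
```

A Bicep deployment template could apply the same branching to decide which resource type the role assignment scopes to.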

Impossible to have a FHIR hostname that's different from the FHIR URL

I am trying to configure the Synapse Sync agent for a FHIR server that is exposed through a hostname that's different from its URL. I did give the function app a FHIR Data Reader role.

  • When I use the hostname for the parameter Fhir Server Url, I get an error where the access token is unable to be generated:

Microsoft.Azure.Services.AppAuthentication.AzureServiceTokenProviderException: Parameters: Connection String: [No connection string specified], Resource: <REMOVED: fhir hostname>, Authority: . Exception Message: Tried the following 3 methods to get an access token, but none of them worked.\r\nParameters: Connection String: [No connection string specified], Resource: <REMOVED: fhir hostname>, Authority: . Exception Message: Tried to get token using Managed Service Identity. Access token could not be acquired. Failed after 5 retries. MSI ResponseCode: InternalServerError, Response: {"StatusCode":500,"Message":"An unexpected error occured while fetching the AAD Token. Please contact support with this provided Correlation Id","CorrelationId":"4fecc64e-60a0-4bc8-9e20-5523e47825c1"}\r\nParameters: Connection String: [No connection string specified], Resource: <REMOVED: fhir hostname> Authority: . Exception Message: Tried to get token using Visual Studio. Access token could not be acquired. Visual Studio token provider file not found at "C:\local\LocalAppData\.IdentityService\AzureServiceAuth\tokenprovider.json"\r\nParameters: Connection String: [No connection string specified], Resource: <REMOVED: fhir hostname>, Authority: . Exception Message: Tried to get token using Azure CLI. Access token could not be acquired. 
'az' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n\r\n\r\n at Microsoft.Azure.Services.AppAuthentication.AzureServiceTokenProvider.GetAuthResultAsyncImpl(String resource, String authority, Boolean forceRefresh, CancellationToken cancellationToken)\r\n at Microsoft.Azure.Services.AppAuthentication.AzureServiceTokenProvider.GetAccessTokenAsync(String resource, String tenantId, Boolean forceRefresh, CancellationToken cancellationToken)\r\n at Microsoft.Health.Fhir.Synapse.DataClient.Api.AzureAccessTokenProvider.GetAccessTokenAsync(String resourceUrl, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataClient\Api\AzureAccessTokenProvider.cs:line 47\r\n at Microsoft.Health.Fhir.Synapse.DataClient.Api.FhirApiDataClient.SearchAsync(FhirSearchParameters searchParameters, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataClient\Api\FhirApiDataClient.cs:line 89\r\n --- End of inner exception stack trace ---\r\n at Microsoft.Health.Fhir.Synapse.DataClient.Api.FhirApiDataClient.SearchAsync(FhirSearchParameters searchParameters, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataClient\Api\FhirApiDataClient.cs:line 89\r\n at Microsoft.Health.Fhir.Synapse.Core.Tasks.TaskExecutor.ExecuteAsync(TaskContext taskContext, JobProgressUpdater progressUpdater, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Tasks\TaskExecutor.cs:line 70\r\n at Microsoft.Health.Fhir.Synapse.Core.Jobs.JobExecutor.<>c__DisplayClass5_1.<b__1>d.MoveNext() in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Jobs\JobExecutor.cs:line 74\r\n --- End of inner exception stack trace ---\r\n --- End of inner exception stack trace ---\r\n at Microsoft.Health.Fhir.Synapse.Core.Jobs.JobExecutor.ExecuteAsync(Job job, CancellationToken cancellationToken) in 
D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Jobs\JobExecutor.cs:line 61\r\n at Microsoft.Health.Fhir.Synapse.Core.Jobs.JobManager.RunAsync(CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Jobs\JobManager.cs:line 81","ResumedJobId":"efa184faf7fa4e34aeadd1581a760050"}

  • When I use the FHIR URL for the parameter Fhir Server Url, I get a 403 Forbidden error:

Health.Fhir.Synapse.Core.Exceptions.ExecuteTaskFailedException: Task execution failed\r\n ---> System.AggregateException: One or more errors occurred. (Search FHIR server failed. Search url: 'https://z-ago-dhp-aid-ew1-fhir01.azurehealthcareapis.com/Appointment?_lastUpdated=ge1970-01-01T00%3a00%3a00%2b00%3a00&_lastUpdated=lt2022-05-05T10%3a23%3a06.6073304%2b00%3a00&_count=1000&_sort=_lastUpdated'. )\r\n ---> Microsoft.Health.Fhir.Synapse.DataClient.Exceptions.FhirSearchException: Search FHIR server failed. Search url: '<REMOVED: FHIR URL>/Appointment?_lastUpdated=ge1970-01-01T00%3a00%3a00%2b00%3a00&_lastUpdated=lt2022-05-05T10%3a23%3a06.6073304%2b00%3a00&_count=1000&_sort=_lastUpdated'. \r\n ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 403 (Forbidden).\r\n at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()\r\n at Microsoft.Health.Fhir.Synapse.DataClient.Api.FhirApiDataClient.SearchAsync(FhirSearchParameters searchParameters, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataClient\Api\FhirApiDataClient.cs:line 77\r\n --- End of inner exception stack trace ---\r\n at Microsoft.Health.Fhir.Synapse.DataClient.Api.FhirApiDataClient.SearchAsync(FhirSearchParameters searchParameters, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.DataClient\Api\FhirApiDataClient.cs:line 85\r\n at Microsoft.Health.Fhir.Synapse.Core.Tasks.TaskExecutor.ExecuteAsync(TaskContext taskContext, JobProgressUpdater progressUpdater, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Tasks\TaskExecutor.cs:line 70\r\n at Microsoft.Health.Fhir.Synapse.Core.Jobs.JobExecutor.<>c__DisplayClass5_1.<b__1>d.MoveNext() in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Jobs\JobExecutor.cs:line 74\r\n --- End of inner exception stack trace ---\r\n --- End of inner exception stack 
trace ---\r\n at Microsoft.Health.Fhir.Synapse.Core.Jobs.JobExecutor.ExecuteAsync(Job job, CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Jobs\JobExecutor.cs:line 61\r\n at Microsoft.Health.Fhir.Synapse.Core.Jobs.JobManager.RunAsync(CancellationToken cancellationToken) in D:\a\1\s\FhirToDataLake\src\Microsoft.Health.Fhir.Synapse.Core\Jobs\JobManager.cs:line 68","ResumedJobId":"d2ab4a23dde8494c91900f6babad8e81"}

Is there a way in the configuration to distinguish between the URL used to generate the bearer token and the hostname used for the subsequent requests?
Thank you!
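One possible shape for such a configuration split, shown as a hypothetical sketch (this is not an existing setting of the sync agent): keep the request base URL and the token audience as separate values, with the audience defaulting to the base URL so the common case is unchanged.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FhirEndpointConfig:
    """Hypothetical split of the single 'Fhir Server Url' setting.

    base_url: where search requests are sent (e.g. a custom hostname).
    token_audience: resource URI used when acquiring the AAD access
    token; when None, the base URL is used, matching today's behavior.
    """
    base_url: str
    token_audience: Optional[str] = None

    def audience(self):
        # Fall back to the base URL when no separate audience is set.
        return self.token_audience or self.base_url

cfg = FhirEndpointConfig(
    base_url="https://fhir.contoso.example",
    token_audience="https://myservice.azurehealthcareapis.com",
)
print(cfg.audience())
```

With this split, the 401/403 pair above would be avoidable: requests go to the custom hostname while the token is minted for the FHIR service's own URL.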

Unable to generate csv file

Hi,

We are in the process of deploying the FHIR to CDM accelerator. We have followed the steps to export the data from Azure API for FHIR to a storage account in CSV format, but it is not generating CSV files. We can see the ndjson files being generated by "StartExportTask" in the ADF pipeline for each resource in FHIR. The ADF pipeline "egress-pipeline" completes with no error. As mentioned in the document, if no files are generated, check the adfjobs folder in the storage account. I checked that folder as well; there are stderr.txt files, but they are empty. Please suggest.

Regards,
SAM

Issue while deploying the PowerShell Script in Step 6 of performing FHIR to Synapse Sync Agent

Hello,

Day 1:
While running through this repo, I came across an issue in step 6 while executing the PS script.

Verified the content of the SQL Source File “Basic” and it looks fine, double-checked the permissions as well, and tried running it a second time as an Administrator in PS, but came across the same issue again.

Day 2:
Reran the PS script in step 6 again today, and this time it executed successfully.

Reporting this observation to understand why it failed the first time. If somebody could help, that'd be great!

PackageLink broken URL

The ADF pipeline crashes on the GenerateSchema step because the default value for the PackageLink parameter is no longer available. The ReadMe states not to change this parameter. Could this link be updated to a location where a valid BatchExecutor.zip exists?
The current link is:
https://github.com/microsoft/FHIR-Analytics-Pipelines/releases/download/untagged-d8e03f9d7aef82990abe/BatchExecutor.zip

stderr.txt content:

At line:1 char:1
+ Invoke-WebRequest -Uri 'https://github.com/microsoft/FHIR-Analytics-P ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebExc 
   eption
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

JSON data of "coding" nested field is exported as a string instead of the proper JSON structure in parquet files

Hi,

I observed some inconsistencies between the live data in the AHDS FHIR service and the data in the Parquet files exported by the corresponding FHIR-to-Synapse Sync Agent, specifically in the value of the "coding" nested field. The FHIR data has the proper JSON structure, but in the Parquet files the same data is written as strings.

Some examples I found across different FHIR resource types:

  • patient: identifier[].type.coding
    • fhir data: [{"system":"http://terminology.hl7.org/CodeSystem/v2-0203","code":"MR","display":"Medical Record Number"}]
    • parquet data: "[{\"system\":\"http://terminology.hl7.org/CodeSystem/v2-0203\",\"code\":\"MR\",\"display\":\"Medical Record Number\"}]"
  • observation: component[].code.coding
    • fhir data: [{"code":"age","display":"Age"}]
    • parquet data: "[{\"code\":\"age\",\"display\":\"Age\"}]"
  • observation: component[].value.codeableConcept.coding
    • fhir data: [{"code":"date","display":"Exact Date"}]
    • parquet data: "[{\"code\":\"date\",\"display\":\"Exact Date\"}]"

There seems to be a pattern where all values of the "coding" field are stored as strings instead of JSON structures.
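While the data stays stringified, one downstream workaround is to parse the value back into JSON before use. A minimal sketch (hypothetical helper):

```python
import json

def unwrap_coding(value):
    """Parse a 'coding' value that was exported as a JSON string back
    into a list of dicts; pass through values that are already
    structured."""
    if isinstance(value, str):
        return json.loads(value)
    return value

# The stringified form observed in the Parquet files above:
parquet_value = "[{\"code\":\"age\",\"display\":\"Age\"}]"
print(unwrap_coding(parquet_value))
```

In a Spark or pandas job this would run per cell over the affected columns; it recovers the structure but obviously doesn't fix the export itself.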

Could anyone please check if you have seen the same issue?

Best regards,
Tony

fhirServiceToCdm.json final transform run in ForEach

The TransformToCdm step is run in the context of a ForEach loop, while GenerateSchema is called only once. The only difference between the Command in GenerateSchema and TransformToCdm is the presence of -schema in the GenerateSchema step. Why does TransformToCdm need to run in a ForEach loop? When I run the pipeline, ForEach gets 47 as the input, so it runs the Command 47 times and fails on all 47. The error is:
{
"errorCode": "2508",
"message": "There’re duplicate files in resource folder. Account_Coverage.json. Avoid using activity.json, linkedServices.json, and dataSets.json which are reserved file names.",
"failureType": "UserError",
"target": "TransformToCdm",
"details": []
}
It doesn't say which folder the "resource folder" is. "Account_Coverage.json" only exists in 'tableconfig>PropertiesGroup'.
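Not part of the tool, but a hypothetical helper for narrowing down error 2508: given the file names that end up in the activity's resource folder, it flags duplicates and the reserved ADF custom-activity names the error message lists. The function name and sample inputs are illustrative only.

```python
from collections import Counter

# File names reserved by the ADF custom activity, per error 2508.
RESERVED = {"activity.json", "linkedServices.json", "dataSets.json"}

def find_conflicts(file_names):
    """Return (duplicate names, reserved names) found in file_names."""
    counts = Counter(file_names)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    reserved = sorted(set(file_names) & RESERVED)
    return duplicates, reserved

dupes, reserved = find_conflicts(
    ["Account_Coverage.json", "Account_Coverage.json", "activity.json"]
)
print(dupes)     # ['Account_Coverage.json']
print(reserved)  # ['activity.json']
```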

Can you make batchexecutor.exe folder aware

Currently the --cdmFileSystem parameter is not folder aware; I cannot pass cdm/folder. Can you please make it so I can?
In my implementation I create folders with the date and a run-time GUID for tracking:
cdm/20220307T164544-d8b2bbbd-5407-4a59-9cac-c7c95bad0061
But I have to pass 'cdm' in the pipeline, let \RunAdfCustomActivity.ps1 put the cdm.json files in the root, and then use Copy operations to move them into the folder.
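A minimal sketch of the naming convention described above, assuming the timestamp format implied by the sample path; `make_run_folder` is a hypothetical helper, not part of batchexecutor.exe.

```python
import uuid
from datetime import datetime, timezone

def make_run_folder(base="cdm", now=None, run_id=None):
    """Build a run-scoped folder path like cdm/<UTC timestamp>-<GUID>.

    The '%Y%m%dT%H%M%S' format is an assumption inferred from the
    sample path in this issue.
    """
    now = now or datetime.now(timezone.utc)
    run_id = run_id or uuid.uuid4()
    return f"{base}/{now.strftime('%Y%m%dT%H%M%S')}-{run_id}"

print(make_run_folder(
    now=datetime(2022, 3, 7, 16, 45, 44, tzinfo=timezone.utc),
    run_id="d8b2bbbd-5407-4a59-9cac-c7c95bad0061",
))
# cdm/20220307T164544-d8b2bbbd-5407-4a59-9cac-c7c95bad0061
```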

Unhandled exception. Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request.

The Azure Function App FhirToDatalake works fine without VNet integration and private endpoints for the storage account and FHIR server.

I get the following error when enabling VNet integration:

2023-09-19T14:55:58Z [Error] Unhandled exception. Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. Http2ConnectionException: The HTTP/2 server sent invalid data on the connection. HTTP/2 error code 'PROTOCOL_ERROR' (0x1).", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.
2023-09-19T14:55:58Z [Error] ---> System.Net.Http.Http2ConnectionException: The HTTP/2 server sent invalid data on the connection. HTTP/2 error code 'PROTOCOL_ERROR' (0x1).
2023-09-19T14:55:58Z [Error] at System.Net.Http.Http2Connection.ThrowProtocolError(Http2ProtocolErrorCode errorCode)
2023-09-19T14:55:58Z [Information] at System.Net.Http.Http2Connection.ReadFrameAsync(Boolean initialFrame)
2023-09-19T14:55:58Z [Information] at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
2023-09-19T14:55:58Z [Information] at System.Net.Http.Http2Connection.SendHeadersAsync(HttpRequestMessage request, CancellationToken cancellationToken, Boolean mustFlush)
2023-09-19T14:55:58Z [Information] at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2023-09-19T14:55:58Z [Information] --- End of inner exception stack trace ---
2023-09-19T14:55:58Z [Information] at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2023-09-19T14:55:58Z [Information] at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
2023-09-19T14:55:58Z [Information] at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
2023-09-19T14:55:58Z [Information] at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)")
2023-09-19T14:55:58Z [Information] at Grpc.Net.Client.Internal.HttpContentClientStreamWriter`2.WriteAsyncCore(TRequest message)
2023-09-19T14:55:58Z [Information] at Microsoft.Azure.Functions.Worker.GrpcWorker.SendStartStreamMessageAsync(IClientStreamWriter`1 requestStream) in D:\a\1\s\src\DotNetWorker.Grpc\GrpcWorker.cs:line 86
2023-09-19T14:55:58Z [Information] at Microsoft.Azure.Functions.Worker.WorkerHostedService.StartAsync(CancellationToken cancellationToken) in D:\a\1\s\src\DotNetWorker.Core\WorkerHostedService.cs:line 25
2023-09-19T14:55:58Z [Information] at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
2023-09-19T14:55:58Z [Information] at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
2023-09-19T14:55:58Z [Information] at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
2023-09-19T14:55:58Z [Information] at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(IHost host)
2023-09-19T14:55:58Z [Information] at Microsoft.Health.AnalyticsConnector.FunctionApp.Program.Main() in /home/vsts/work/1/s/FhirToDataLake/src/Microsoft.Health.AnalyticsConnector.FunctionApp/Program.cs:line 26
2023-09-19T14:55:59Z [Verbose] Handling WorkerErrorEvent for runtime:dotnet-isolated, workerId:dotnet-isolated. Failed with: Microsoft.Azure.WebJobs.Script.Workers.WorkerProcessExitException: dotnet exited with code 134 (0x86)
---> System.Exception: Unhandled exception. Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. Http2ConnectionException: The HTTP/2 server sent invalid data on the connection. HTTP/2 error code 'PROTOCOL_ERROR' (0x1).", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request., ---> System.Net.Http.Http2ConnectionException: The HTTP/2 server sent invalid data on the connection. HTTP/2 error code 'PROTOCOL_ERROR' (0x1)., at System.Net.Http.Http2Connection.ThrowProtocolError(Http2ProtocolErrorCode errorCode)
--- End of inner exception stack trace ---
2023-09-19T14:55:59Z [Verbose] Attempting to dispose webhost or jobhost channel for workerId: '01f90328-9cf1-4124-a9d6-0612b755d12a', runtime: 'dotnet-isolated'
2023-09-19T14:55:59Z [Verbose] Disposing language worker channel with id:01f90328-9cf1-4124-a9d6-0612b755d12a
2023-09-19T14:55:59Z [Verbose] Disposed language worker channel with id:01f90328-9cf1-4124-a9d6-0612b755d12a
2023-09-19T14:55:59Z [Verbose] No initialized worker channels for runtime 'dotnet-isolated'. Delaying future invocations
2023-09-19T14:55:59Z [Verbose] Restarting worker channel for runtime: 'dotnet-isolated'
2023-09-19T14:55:59Z [Error] Exceeded language worker restart retry count for runtime:dotnet-isolated. Shutting down and proactively recycling the Functions Host to recover
2023-09-19T14:55:59Z [Verbose] Hosting stopping
2023-09-19T14:55:59Z [Verbose] Stopping file watchers.
2023-09-19T14:55:59Z [Verbose] Waiting for RpcFunctionInvocationDispatcher to shutdown
2023-09-19T14:55:59Z [Verbose] Draining invocations from language worker channel completed. Shutting down 'RpcFunctionInvocationDispatcher'
2023-09-19T14:55:59Z [Information] Stopping JobHost
2023-09-19T14:55:59Z [Verbose] Stopping ScriptHost instance '345f9d0c-c78e-4b97-8e2c-9a9e395c0366'.
2023-09-19T14:55:59Z [Information] Stopping the listener 'Microsoft.Azure.WebJobs.Host.Listeners.SingletonListener' for function 'JobManagerFunction'

We converted the JSON template to Bicep using https://bicep.kwitantie.app/.

Can anyone suggest a workaround?
azure-function-fhirtodatalake.txt

FhirToDataLake - Any plans to migrate to Azure Function 4.x

I have noticed that the function currently requires FUNCTIONS_EXTENSION_VERSION set to ~2 and WEBSITE_NODE_DEFAULT_VERSION set to ~10. Link
Are there any plans for these versions to be updated and tested?

The reason behind my questions is:
• FUNCTIONS_EXTENSION_VERSION ~2 will be out of support after December 3, 2022 – Link: Azure Functions runtime versions overview | Microsoft Learn
• Node.js 10 is not available on any other Functions extension version

Using FHIR JSON schema in Configuration-Generator is not sustainable

I noticed that the Configuration-Generator uses the FHIR JSON schema to figure out which properties are available, along with their types, and validation is performed against this schema file.

However, there is no strict 1:1 mapping between the FHIR JSON schema and the types defined by Hl7.Fhir.Model that is used in Microsoft.Health.Fhir.Transformation.Core.

For example, I had to modify the JSON schema for Observation so that the property effectivePeriod is renamed to effective, because in the class Hl7.Fhir.Model.Observation it is called Effective.
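A hedged sketch of the manual workaround described above: renaming a schema property so it matches the Hl7.Fhir.Model name. The schema fragment is a simplified illustration, not the real Observation schema, and `rename_property` is a hypothetical helper.

```python
# Simplified stand-in for the Observation portion of the FHIR JSON schema.
schema = {
    "Observation": {
        "properties": {
            "effectivePeriod": {"$ref": "#/definitions/Period"},
            "status": {"$ref": "#/definitions/code"},
        }
    }
}

def rename_property(resource_schema, old, new):
    """Rename a property key in a resource's schema, if present."""
    props = resource_schema["properties"]
    if old in props:
        props[new] = props.pop(old)
    return resource_schema

rename_property(schema["Observation"], "effectivePeriod", "effective")
print(sorted(schema["Observation"]["properties"]))  # ['effective', 'status']
```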

Use with Non-MS-FHIR-Server

Is it possible to use the agent with a non-Microsoft FHIR server such as HAPI or IBM?

From my perspective it should be possible, and all we would have to do is customize the authentication part. Am I right in this assumption?

Best
Patrick
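A minimal sketch of that assumption: if only the authentication differs, the change amounts to swapping how the bearer token is obtained. `get_token` here is a hypothetical stand-in for whatever auth flow the target server (HAPI, IBM, etc.) uses; nothing below is from the agent's actual code.

```python
def build_fhir_headers(get_token):
    """Build request headers for a FHIR REST call.

    get_token is any callable returning a bearer token string, or None
    for servers that allow anonymous access.
    """
    headers = {"Accept": "application/fhir+json"}
    token = get_token()
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

print(build_fhir_headers(lambda: None))
print(build_fhir_headers(lambda: "abc123")["Authorization"])  # Bearer abc123
```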

FHIR-Analytics-Pipelines/Templates/cdmToSynapse.json

It would be much nicer for the CDMSource to be wildcard-path aware so that I can use parameters in a ForEach loop.
Failing that, it would be better to have a single pipeline that executes all the dataflows, rather than a dataflow and matching pipeline for each CDM entity. What's the point of having a script to deploy multiple pipelines if you have to run each and every pipeline individually? I have 1,182 CDM entities from $export; running them one at a time is not workable.
I modified the deployment script to create a MasterPipeline, but I cannot get the scripts and templates to iterate. My goal is to loop over the entities and add each one to the MasterPipeline. As of right now it replaces the MasterPipeline with the current entity rather than adding it. I think it is a reference issue, but I haven't cracked it yet.
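A sketch of the appending behavior described above, under the assumption that the bug is a shared-reference issue: each entity's activity must be a fresh object appended to the MasterPipeline's activities list, not the same template object reused. The activity shapes are simplified placeholders, not the real ADF template.

```python
import copy

master = {"name": "MasterPipeline", "properties": {"activities": []}}

# Shared activity template; appending it directly for every entity would
# append the *same* object repeatedly (the likely "reference issue").
TEMPLATE = {
    "name": "",
    "type": "ExecuteDataFlow",
    "typeProperties": {"dataflow": {"referenceName": ""}},
}

def add_entity(pipeline, entity):
    activity = copy.deepcopy(TEMPLATE)  # fresh copy per entity
    activity["name"] = f"Run_{entity}"
    activity["typeProperties"]["dataflow"]["referenceName"] = f"df_{entity}"
    pipeline["properties"]["activities"].append(activity)

for entity in ["Account_Coverage", "Patient_Address"]:
    add_entity(master, entity)

print([a["name"] for a in master["properties"]["activities"]])
# ['Run_Account_Coverage', 'Run_Patient_Address']
```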
