
social-determinants-of-health's Introduction

Social Determinants of Health (SDH)

Many aspects of a person's lifestyle and status can impact health outcomes. Multiple studies suggest that Social Determinants of Health (SDH) account for between 30% and 55% of health outcomes. [The WHO defines SDH as](https://www.who.int/health-topics/social-determinants-of-health#tab=tab_1):

> …the non-medical factors that influence health outcomes. They are the conditions in which people are born, grow, work, live, and age, and the wider set of forces and systems shaping the conditions of daily life. These forces and systems include economic policies and systems, development agendas, social norms, social policies and political systems.

The correlation between SDH and health outcomes is clear: the lower the socioeconomic position, the worse the health. This in turn creates a negative feedback loop (poor health leading to poorer socioeconomic status) that widens the gap even further.

There are many public sources of SDH data at different levels of granularity (country, state/province, county, or ZIP/postal code) that can be used to analyze the impact of SDH on health outcomes. One of the main challenges in such an analysis is finding the right data source and cleaning the data.

Access SDH data via Delta Sharing

To read data that has been shared with you using the Databricks-to-Databricks sharing protocol, you must be a user on a Databricks workspace that is enabled for Unity Catalog.

In this solution accelerator, we use pre-processed and cleansed tables made available by Rearc via the Delta Sharing protocol. We explore how income, healthcare, education, and other factors affect county-level COVID-19 vaccination rates. Delta Sharing allows us to offload the "bronze to silver" data preparation step to a data provider or internal data team, so our analysis starts with "silver" or "gold" data. Using these data, we train a machine learning model to predict vaccination rates from different SDH features and then use SHAP to explain how each factor impacts the predicted vaccination rate.
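As a rough sketch of that workflow (not the accelerator's actual notebook code): the table name `sdh.silver_vaccinations`, the feature columns, and the choice of a gradient-boosted regressor below are all illustrative placeholders, and the snippet assumes it runs in a Databricks notebook where `spark` is predefined.

```python
# Minimal sketch: read a shared "silver" table, train a model, and explain it with SHAP.
# Table and column names are illustrative placeholders, not the accelerator's real schema.
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Read a shared table (assumes a Databricks notebook where `spark` is predefined).
df = spark.table("sdh.silver_vaccinations").toPandas()

# Hypothetical SDH features and target column.
features = ["median_income", "pct_college_educated", "pct_uninsured"]
X, y = df[features], df["vaccination_rate"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Any regressor works for the sketch; the accelerator may train a different model.
model = GradientBoostingRegressor().fit(X_train, y_train)

# SHAP shows how each SDH feature pushes a county's predicted vaccination rate up or down.
explainer = shap.Explainer(model.predict, X_train)
shap_values = explainer(X_test)
shap.plots.beeswarm(shap_values)
```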

License

Copyright / License info of the notebook. Copyright [2021] the Notebook Authors. The source in this notebook is provided subject to the Databricks License. All included or referenced third-party libraries are subject to the licenses set forth below.

| Library Name | Library License | Library License URL | Library Source URL |
|---|---|---|---|
| Pandas | BSD 3-Clause License | https://github.com/pandas-dev/pandas/blob/master/LICENSE | https://github.com/pandas-dev/pandas |
| Numpy | BSD 3-Clause License | https://github.com/numpy/numpy/blob/main/LICENSE.txt | https://github.com/numpy/numpy |
| Apache Spark | Apache License 2.0 | https://github.com/apache/spark/blob/master/LICENSE | https://github.com/apache/spark/tree/master/python/pyspark |
| SHAP | MIT License | https://github.com/slundberg/shap/blob/master/LICENSE | https://github.com/slundberg/shap/ |
| Plotly Express | MIT License | https://github.com/plotly/plotly_express/blob/master/LICENSE.txt | https://github.com/plotly/plotly_express/ |
Author
Databricks Inc.

social-determinants-of-health's People

Contributors

danielsparing, dbbnicole

social-determinants-of-health's Issues

Accelerator fails with `There is no Credential Scope`

%sql
select * from silver_education limit 20
SparkException: There is no Credential Scope.

org.apache.spark.SparkException: There is no Credential Scope.
at com.databricks.unity.UCSDriver$Manager.$anonfun$getHashedCurrentScopeId$1(UCSDriver.scala:402)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.unity.UCSDriver$Manager.getHashedCurrentScopeId(UCSDriver.scala:401)
at io.delta.sharing.DeltaSharingCredentialsProvider.getCustomTablePath(DeltaSharingCredentialsProvider.scala:122)
at org.apache.spark.delta.sharing.CachedTableManager.register(PreSignedUrlCache.scala:144)
at io.delta.sharing.spark.RemoteSnapshot.filesForScan(RemoteDeltaLog.scala:269)
at io.delta.sharing.spark.RemoteDeltaSnapshotFileIndex.listFiles(RemoteDeltaFileIndex.scala:138)
at org.apache.spark.sql.execution.datasources.FileIndex.listPartitionDirectoriesAndFiles(FileIndex.scala:97)
at org.apache.spark.sql.execution.FileSourceScanLike.listFiles(DataSourceScanExec.scala:563)
at org.apache.spark.sql.execution.FileSourceScanLike.listFiles$(DataSourceScanExec.scala:552)
at org.apache.spark.sql.execution.FileSourceScanExec.listFiles(DataSourceScanExec.scala:1884)
at org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions(DataSourceScanExec.scala:601)
at org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions$(DataSourceScanExec.scala:591)
at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions(DataSourceScanExec.scala:1884)
at org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions(DataSourceScanExec.scala:664)
at org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions$(DataSourceScanExec.scala:618)
at org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions(DataSourceScanExec.scala:1884)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:1927)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:1901)
at org.apache.spark.sql.execution.FileSourceScanExec.doExecuteColumnar(DataSourceScanExec.scala:1975)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:254)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:271)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:267)
at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:250)
at org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:524)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:254)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:271)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:267)
at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:250)
at org.apache.spark.sql.execution.ColumnarToRowExec.inputRDDs(Columnar.scala:213)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:752)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:227)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:271)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:267)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:223)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:97)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:108)
at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:115)
at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:104)
at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:88)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.collectResult$1(ResultCacheManager.scala:525)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.computeResult(ResultCacheManager.scala:536)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:395)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:389)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:291)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeCollectResult$1(SparkPlan.scala:431)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:428)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3425)
at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:3416)
at org.apache.spark.sql.Dataset.$anonfun$withAction$3(Dataset.scala:4290)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:779)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4288)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:241)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:390)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:187)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:142)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:340)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4288)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3415)
at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:267)
at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:101)
at com.databricks.backend.daemon.driver.PythonDriverLocalBase.generateTableResult(PythonDriverLocalBase.scala:733)
at com.databricks.backend.daemon.driver.JupyterDriverLocal.computeListResultsItem(JupyterDriverLocal.scala:1359)
at com.databricks.backend.daemon.driver.JupyterDriverLocal$JupyterEntryPoint.addCustomDisplayData(JupyterDriverLocal.scala:490)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Command took 0.17 seconds
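
The README above notes that reading Databricks-to-Databricks shared data requires a Unity Catalog-enabled workspace, and this error is the kind of failure a Delta Sharing read can surface when that prerequisite is not met. Below is a hedged sketch of the usual sanity checks, run from a Unity Catalog-enabled cluster; the provider, share, catalog, and schema names are placeholders, not the accelerator's actual values.

```python
# Hedged sketch: sanity checks for Databricks-to-Databricks Delta Sharing.
# Provider, share, catalog, and schema names are placeholders; run on a
# Unity Catalog-enabled cluster in a Databricks notebook (`spark`, `display` predefined).

# List the Delta Sharing providers and the shares they expose to this workspace.
display(spark.sql("SHOW PROVIDERS"))
display(spark.sql("SHOW SHARES IN PROVIDER `some_provider`"))

# Mount the share as a catalog so its tables can be queried with three-level names.
spark.sql("CREATE CATALOG IF NOT EXISTS sdh_shared USING SHARE `some_provider`.`some_share`")

# Query the shared table through the catalog instead of an unqualified table name.
display(spark.sql("SELECT * FROM sdh_shared.some_schema.silver_education LIMIT 20"))
```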
