
riddle

This is a minimal package for programmatically interacting with the UNHCR Raw Internal Data Library (RIDL). Its main purpose is to make the RIDL API readily accessible from within the R ecosystem for better automation.

Install and configure the authentication token

install.packages("pak")
pak::pkg_install("edouard-legoupil/riddle") 

The riddle package requires you to add your API token and store it for further use. The easiest way to do that is to store your API token in your .Renviron file which is automatically read by R on startup.

You can retrieve your API token from your user page.

(Screenshot: the API token section of the RIDL user page.)

To use the package, you’ll need to store your RIDL API token in the RIDL_API_TOKEN environment variable. The easiest way to do that is by calling usethis::edit_r_environ() and adding the line RIDL_API_TOKEN=xxxxx to the file before saving and restarting your R session.

The package works with both the production and UAT ("User Acceptance Testing") instances of RIDL. To use the UAT instance, add the corresponding token to your .Renviron file (RIDL_UAT_API_TOKEN=xxxxx) and run Sys.setenv(USE_UAT=1) before calling any function from the package. To switch back to the production instance, call Sys.unsetenv("USE_UAT").
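As a minimal sketch of this configuration (the token values are placeholders, and in practice the token lines belong in your .Renviron file rather than in a script):

```r
## Placeholders only: in real use these two variables are set in .Renviron,
## never hard-coded in a script.
Sys.setenv(RIDL_API_TOKEN = "xxxxx",
           RIDL_UAT_API_TOKEN = "xxxxx")

## Switch to the UAT instance before calling any riddle function...
Sys.setenv(USE_UAT = 1)
stopifnot(Sys.getenv("USE_UAT") == "1")

## ...and back to the production instance.
Sys.unsetenv("USE_UAT")
stopifnot(Sys.getenv("USE_UAT") == "")
```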

A quick intro to RIDL concepts

To use the riddle package effectively, it is important to understand the three main concepts of the platform. RIDL is based on CKAN, and the CKAN documentation is available here for more details.

Container

A container is a placeholder where we can share data on RIDL. A container can hold zero or more datasets. By convention, all the datasets of an operation are grouped together within one container, but an operation container can also include multiple more specific sub-containers. The container a dataset belongs to is recorded in the dataset metadata through the owner_org variable.

Container URLs are typically formatted as:

https://ridl.unhcr.org/data-container/`__name_of_country__`

Dataset

A dataset is a placeholder where we can share a series of data and documentation files (called resources, see below), each of them linked to a data project. Each dataset is described with metadata (using the Data Documentation Initiative, DDI, format) that gives enough context on the project to properly store the data files and use them.

Dataset URLs are typically formatted as:

https://ridl.unhcr.org/dataset/`__name_of_dataset__`

Data files, e.g. an Excel file, as well as any supporting documentation, are called resources and are shared as either data or attachment within a specific dataset page.

Resource

A resource is a file shared on a dataset page. Depending on its type (data or attachment), it comes with specific minimum metadata that complements the metadata of the project itself.

Resource URLs are typically formatted as:

https://ridl.unhcr.org/dataset/`__name_of_dataset__`/resource/`__id_of_the_resource__`
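The three URL patterns above can be captured in a small helper. Note this is a hypothetical convenience function for illustration, not part of the riddle package; its name, arguments, and base URL default are assumptions:

```r
## Hypothetical helper (not part of the riddle package) that builds
## container, dataset, and resource URLs following the patterns above.
ridl_url <- function(dataset = NULL, resource = NULL, container = NULL,
                     base = "https://ridl.unhcr.org") {
  if (!is.null(container)) {
    return(paste(base, "data-container", container, sep = "/"))
  }
  url <- paste(base, "dataset", dataset, sep = "/")
  if (!is.null(resource)) {
    url <- paste(url, "resource", resource, sep = "/")
  }
  url
}

ridl_url(container = "some-country")
#> "https://ridl.unhcr.org/data-container/some-country"
ridl_url(dataset = "testing-riddle", resource = "abc123")
#> "https://ridl.unhcr.org/dataset/testing-riddle/resource/abc123"
```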

How To

As a UNHCR staff member, you should have access to a series of containers based on where you work. Within each container, if you have editor or admin rights, you can create datasets.

Use case 1: create a new dataset

To create a dataset, you first need to document the dataset metadata, including a reference to the container in which you would like the new dataset to be created. Once the dataset is created, you can add as many resources as required (either data or attachment).


library(riddle)

## Let's use the UAT instance
Sys.setenv(USE_UAT = 1)

## First we create the dataset metadata
m <- dataset_metadata(title = "Motor Trend Car Road Tests",
                      name = "testing-riddle",
                      notes = "The data was extracted from the 1974 Motor Trend
                      US magazine, and comprises fuel consumption and 10 aspects
                      of automobile design and performance for 32 automobiles
                      (1973–74 models).",
                      owner_org = "americas", ## be careful: this is the
                                              ## canonical name of the container
                      visibility = "public",
                      external_access_level = "open_access",
                      data_collector = "Motor Trend",
                      keywords = keywords[c("Environment", "Other")],
                      unit_of_measurement = "car",
                      data_collection_technique = "oth",
                      archived = "False")

## For the above to work, you need at least editor access
## to the corresponding container, i.e. owner_org = "americas"
p <- dataset_create(m)

## The return value is a representation of the dataset we just created in
## RIDL that you can inspect like any other R object.
p

Use case 2: replace a data file within a dataset

Ideally, data resources from KoboToolbox should be added using the API connection as described in Part 4 of the documentation.

However, there may be specific cases where you are building an operational dataset, scraping an official data source from the web or from a PDF, and want to add it on a regular basis as a new data resource within an existing dataset. You can check a practical example of such a use case here: darien_gap_human_mobility

Below is a simple example using the mtcars dataset.

library(riddle)

## Retrieve the details of the dataset we want to add the resource to,
## based on a search
p <- dataset_search("testing-riddle")

## We can also search for the resource, to check whether it is already there
resource_search("name:mtcars")

## Document the resource metadata
m <- resource_metadata(type = "data",
                       url = "mtcars.csv",
                       name = "mtcars.csv",
                       format = "csv",
                       file_type = "microdata",
                       date_range_start = "1973-01-01",
                       date_range_end = "1973-12-31",
                       version = "1",
                       visibility = "public",
                       process_status = "raw",
                       identifiability = "anonymized_public")

## Push the resource to the dataset
r <- resource_update(p$id, m, uat = TRUE)

## Like before, the return value is a tibble representation of the resource.
r


Use case 3: Add a new attachment with your reproducible analysis code

You want to add your own initial data exploration, data interpretation presentation and/or data storytelling report as a new attachment resource within a dataset.

You can check a practical example of such a use case here: kobocruncher

library(riddle)

# And once we’re done experimenting with the API, we should take down our
# toy dataset since we don’t really need it on RIDL.
dataset_delete(p$id)

Use case 4: Data Landscape Report

The package includes a parameterized notebook template (with parameters including region and year) to assess the data landscape.

Based on metadata, the report looks at what type of data is available per country and provides a way to perform a data gap analysis:

  • How many datasets do we have per country?
  • Datasets collected at household level
  • Datasets collected at community level
  • Data by access type
  • Data over time
  • Data collection mode
  • Data by topic
  • Data linked to Kobo in RIDL
  • Data by sampling type
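As a sketch of how one of these report metrics could be computed, the snippet below counts datasets per country from a metadata data frame. The column names (country, file_type) are assumptions for illustration, not the actual RIDL metadata schema:

```r
## Hypothetical report metric: number of datasets per country, computed
## from a data frame of dataset metadata. The `country` column is an
## assumed name, not the real RIDL schema.
datasets_per_country <- function(meta) {
  counts <- table(meta$country)
  data.frame(country = names(counts),
             n_datasets = as.integer(counts),
             row.names = NULL)
}

## Toy metadata for demonstration
meta <- data.frame(country = c("PAN", "PAN", "COL"),
                   file_type = c("microdata", "report", "microdata"))
datasets_per_country(meta)
#>   country n_datasets
#> 1     COL          1
#> 2     PAN          2
```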


riddle's Issues

Build the data landscape report

The data landscape report should provide an overview of data investment per year in each operation and help identify potential data gaps, or issues in terms of data documentation (metadata QA...).

See the initial template here: https://github.com/Edouard-Legoupil/riddle/blob/main/inst/rmarkdown/templates/summary_report/skeleton/skeleton.Rmd - it already includes some ideas on key issues to look at.

It should work based on RIDL user authentication, so that we can build a quick Shiny interface over it and give any data expert the ability to generate a data landscape report for their own container. Operation data experts should then be able to test whether metadata fixes in RIDL are solving their issues.

If the user has access to more than one container, the analysis will be iterated over each container.

API call - check access rights before checking if the container exists

Despite being an admin member of https://ridl-uat.unhcr.org/data-container/members/americas, the following does not work:

m <- riddle::dataset_metadata(title = "Motor Trend Car Road Test two",
                       name = "mtcars_ed",
                       notes = "The data was extracted from the 1974 Motor Trend
                       US magazine, and comprises fuel consumption and 10 aspects
                       of automobile design and performance for 32 automobiles
                       (1973–74 models).",
                       owner_org = "Americas",
                       visibility = "public",
                       geographies = "UNSPECIFIED",
                       external_access_level = "open_access",
                       data_collector = "Motor Trend",
                       keywords = keywords[c("Environment", "Other")],
                       unit_of_measurement = "car",
                       data_collection_technique = "oth",
                       archived = "False")
p <- riddle::dataset_create(metadata = m)

[1] "Running ridl action" "package_create"
[1] " - env: UAT"
[1] " - key" "No-show!"

Error in ridl(action = "package_create", !!!metadata) :
__type: Authorization Error
message: Access denied: User legoupil not authorized to add dataset to this data container

`search_result` function not working with object usage

The search_result function encounters an issue when called with objects as arguments. The function relies on match.call, which does not evaluate the arguments properly in this context, leading to errors. This needs to be addressed for better functionality.

Need a function that pulls all nested datasets within a specific container

In order to build the data landscape report (https://edouard-legoupil.github.io/riddle/#use-case-4-data-landscape-report), we would need a function that retrieves all datasets nested within a specific container (owner_org).

There are probably some ideas to pull from https://dickoa.gitlab.io/ridl/reference/ridl_container_hierarchy_list.html (see https://gitlab.com/dickoa/ridl/-/blob/master/R/container.R), using the API call https://docs.ckan.org/en/2.9/api/index.html#ckan.logic.action.get.organization_show
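A sketch of the requested traversal, assuming the container hierarchy has already been fetched as a nested list in which each node has a `name` and a list of `children`. This shape and the helper name are assumptions for illustration, not the actual CKAN organization_show response:

```r
## Recursively collect container names from a hypothetical nested-list
## hierarchy (each node: $name plus a list of $children). The node shape
## is assumed, not the real CKAN API response.
container_names <- function(node) {
  kids <- node$children
  nested <- if (length(kids)) unlist(lapply(kids, container_names))
            else character(0)
  c(node$name, nested)
}

## Toy hierarchy: an operation container with two sub-containers.
tree <- list(name = "americas",
             children = list(list(name = "panama", children = list()),
                             list(name = "colombia", children = list())))
container_names(tree)
#> [1] "americas" "panama"   "colombia"
```

With the flat vector of container names, the datasets of each container could then be fetched, e.g. with one search per owner_org.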
