Probation Case Sampler

Produces a shortlist of cases for HMI Probation.

Introduction

The purpose of this service is to create a PrimaryCaseSampleProvisional (shortlist) from a PrimaryCaseSample (long list).

The resulting shortlist is used by inspectors to decide which cases they will inspect.

Sampling

The long list is created by taking all the cases in a region that fall within a given time frame.

Certain cases are then excluded:

Cases that are not eligible:
- Sensitive cases (currently identified by asterisks in the name)
- Cases that have not yet started
- Cases that are missing gender information

Cases that are duplicates and are not the earliest case in the long list. Matching is determined by:
- the same PNC
- the same CRN
- matching first name, last name and date of birth
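The exclusion rules above can be sketched as a single pass over the long list. This is a minimal illustration, not the service's code: the Case fields are hypothetical, "not yet started" is assumed to mean a start date after a reference date, eligibility is assumed to be checked before duplicate matching, and the long list is assumed to be ordered so that the earliest case appears first.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Case:
    # Hypothetical field names; the real request schema may differ.
    first_name: str
    last_name: str
    dob: date
    pnc: Optional[str]
    crn: Optional[str]
    gender: Optional[str]
    start_date: date


def is_eligible(case: Case, today: date) -> bool:
    sensitive = "*" in case.first_name or "*" in case.last_name  # asterisks mark sensitive cases
    started = case.start_date <= today  # assumption: started on or before the reference date
    return not sensitive and started and bool(case.gender)


def exclude(long_list: list[Case], today: date) -> list[Case]:
    """Drop ineligible cases, then keep only the first of any matching duplicates."""
    seen: set[tuple] = set()
    kept: list[Case] = []
    for case in long_list:  # assumes the earliest case comes first
        if not is_eligible(case, today):
            continue
        keys = {("name", case.first_name, case.last_name, case.dob)}
        if case.pnc:
            keys.add(("pnc", case.pnc))
        if case.crn:
            keys.add(("crn", case.crn))
        duplicate = bool(keys & seen)
        seen |= keys  # remember duplicates' keys too, so chains of matches are caught
        if not duplicate:
            kept.append(case)
    return kept
```

Recording the keys of skipped duplicates as well means a case that matches only a later duplicate (for example through its CRN) is still treated as a duplicate.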

Once this list has been created, the remaining cases are categorised into one of five strata:

MALE_COMMUNITY_NON_LOW
MALE_COMMUNITY_LOW
MALE_POST_CUSTODY_NON_LOW
MALE_POST_CUSTODY_LOW
FEMALE
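A possible mapping onto these strata is sketched below. The README does not spell out the rules, so this is an assumption based on the stratification inputs it mentions elsewhere (gender, sentence type and RoSH level): gender codes "M"/"F", a sentence category already derived from the sentence type as "COMMUNITY" or "POST_CUSTODY", and any RoSH level other than "LOW" counting as NON_LOW.

```python
def stratum(gender: str, sentence_category: str, rosh_level: str) -> str:
    """Map a case to one of the five strata (hypothetical input encodings)."""
    if gender == "F":
        return "FEMALE"  # female cases form a single stratum
    if gender != "M":
        raise ValueError(f"unknown gender: {gender}")
    if sentence_category not in ("COMMUNITY", "POST_CUSTODY"):
        raise ValueError(f"unknown sentence category: {sentence_category}")
    risk = "LOW" if rosh_level == "LOW" else "NON_LOW"
    return f"MALE_{sentence_category}_{risk}"
```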

The proportions of these strata are then calculated to ensure fair representation, and are used to calculate how many samples should be selected from each stratum for the requested total sample size. A percentage buffer is then added to the number of samples in each stratum to allow for follow-up work where needed.
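A minimal sketch of proportional allocation followed by a percentage buffer, assuming each stratum's share is rounded first and the buffer is applied afterwards. Python's round() rounds exact halves to even; the service's own rounding rule is not documented here.

```python
def stratum_sizes(counts: dict[str, int], requested: int, buffer_pct: int) -> dict[str, int]:
    """Allocate `requested` samples in proportion to stratum counts, then add a buffer."""
    total = sum(counts.values())
    # Proportional share of the requested size for each stratum.
    base = {name: round(requested * n / total) for name, n in counts.items()}
    # Inflate each stratum by the buffer percentage.
    return {name: round(size * (100 + buffer_pct) / 100) for name, size in base.items()}
```

As a consistency check, the long-list stratum counts in the example output (55, 75, 32, 19 and 13) with a requested size of 100 and a 20% buffer give 34, 47, 19, 12 and 8, which sum to the 120 shown there. The size and buffer actually used for that run are not stated, so these inputs are an assumption.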

Once the size of each stratum has been determined, the proportions of each cluster, LDU and RO are used to calculate how many samples should be selected from each subgroup (to maintain proportionality across those groups).
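The cluster, LDU and RO split can be pictured as the same proportional rule applied level by level down a tree. In this sketch the tree layout is an assumption: each cluster maps to LDUs, each LDU to ROs, and each RO to its number of long-list cases. Rounding at each level can make totals drift, which the rounding adjustment described below corrects.

```python
from typing import Union

# A node is either a leaf count of cases (an RO) or a mapping of named children.
Tree = Union[int, dict]


def total(tree: Tree) -> int:
    """Number of long-list cases under a node."""
    return tree if isinstance(tree, int) else sum(total(child) for child in tree.values())


def split(tree: Tree, size: int) -> Tree:
    """Share `size` samples across a cluster -> LDU -> RO tree in proportion to case counts."""
    if isinstance(tree, int):
        return size  # leaf: number of samples to draw from this RO
    grand = total(tree)
    return {name: split(child, round(size * total(child) / grand)) for name, child in tree.items()}
```

With the cluster counts from the example output (26, 18 and 11) and a stratum size of 34, the clusters receive 16, 11 and 7; the LDU and RO names in the test are hypothetical.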

A limit ensures that no more than 6 cases are selected from any one RO. If an RO's allocation would exceed this, the extra cases are picked from other ROs in the same LDU. (In the rare case that the ROs within an LDU do not have enough cases between them, there may be a shortfall, where the actual sample size is smaller than the requested size.)
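The per-RO limit can be sketched as capping each RO in an LDU and then handing the excess, one case at a time, to ROs in the same LDU that still have headroom. The round-robin order used here is an assumption; the README only says the extra cases come from other ROs in the same LDU.

```python
def apply_ro_cap(wanted: dict[str, int], available: dict[str, int], cap: int = 6) -> tuple[dict[str, int], int]:
    """Cap each RO within one LDU and redistribute the excess; returns (allocation, shortfall)."""
    # An RO can supply at most `cap` cases, and never more than it actually has.
    limit = {ro: min(cap, available[ro]) for ro in wanted}
    alloc = {ro: min(n, limit[ro]) for ro, n in wanted.items()}
    excess = sum(wanted.values()) - sum(alloc.values())
    while excess > 0:
        spare = [ro for ro in alloc if alloc[ro] < limit[ro]]
        if not spare:
            break  # no headroom left in this LDU: the remainder is a shortfall
        for ro in spare:
            if excess == 0:
                break
            alloc[ro] += 1
            excess -= 1
    return alloc, excess
```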

Once the size of each subgroup has been calculated, the appropriate number of cases is randomly selected from each subgroup, and the selections are aggregated to build the shortlist.
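Random selection per subgroup and aggregation can be sketched with the standard library's sampling without replacement. Passing in a seeded random.Random is an assumption made here so that runs are reproducible; the subgroup names are hypothetical.

```python
import random


def draw(subgroups: dict[str, list], sizes: dict[str, int], rng: random.Random) -> list:
    """Randomly pick sizes[name] cases from each subgroup and aggregate them into one shortlist."""
    shortlist = []
    for name, cases in subgroups.items():
        k = min(sizes.get(name, 0), len(cases))  # cannot draw more cases than exist
        shortlist.extend(rng.sample(cases, k))  # sampling without replacement
    return shortlist
```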

Sample sizes are also adjusted to account for rounding. If rounding would result in fewer cases across the subgroups than requested, single cases are added to each subgroup in turn (smallest subgroup to largest) until the sample size matches the requested size. Similarly, if rounding would result in more cases than requested, cases are removed from subgroups in turn (largest to smallest).
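The rounding adjustment can be sketched as follows, assuming "smallest" and "largest" refer to each subgroup's number of cases in the long list (ties keep their original order). The function name and inputs are illustrative.

```python
def adjust_for_rounding(sizes: dict[str, int], counts: dict[str, int], requested: int) -> dict[str, int]:
    """Nudge rounded subgroup sizes so that they sum to `requested`."""
    if not sizes or requested < 0:
        raise ValueError("need at least one subgroup and a non-negative size")
    sizes = dict(sizes)
    smallest_first = sorted(sizes, key=lambda name: counts[name])
    diff = requested - sum(sizes.values())
    while diff > 0:  # too few: add one case per subgroup, smallest to largest
        for name in smallest_first:
            if diff == 0:
                break
            sizes[name] += 1
            diff -= 1
    while diff < 0:  # too many: remove one case per subgroup, largest to smallest
        for name in reversed(smallest_first):
            if diff == 0:
                break
            if sizes[name] > 0:
                sizes[name] -= 1
                diff += 1
    return sizes
```

For example, three equal subgroups sharing 10 samples round to 3 each; the missing case goes to the first of the (equally) smallest subgroups.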

Implementation

This service exposes two endpoints:

POST /sample?size=${size of requested sample}
POST /analyse?size=${size of requested sample}

The /sample endpoint receives a JSON list of cases and returns a map of stratum to the list of rows (cases) selected for the sample, along with some metadata: a generated ID and a timestamp.

The /analyse endpoint also receives a JSON list of cases and returns the same sample information. In addition, it returns information about how the samples were allocated across strata, clusters, LDUs and ROs.

Testing with the example spreadsheet.

Download the sample spreadsheet "yr 2 CRC Domain 2 case sample long list v0.1.xlsx" and put it in src/test/resources
Run the ImportFullSample test (after removing its @Disabled annotation)
This will:
- Create a sample.json request file from the spreadsheet
- Start the app
- POST the file to the /analyse endpoint and parse the response
- Print a short breakdown of the produced sample

Id:                      d78b278b-238b-4902-bdc0-a61f72e38d65
Timestamp:               2020-06-03T11:59:04.276141
Total sample size:       120
Stratum Summary:
- name: MALE_POST_CUSTODY_NON_LOW      size: 34/55    (original: 28.35%, actual: 28.33%)
- name: MALE_COMMUNITY_NON_LOW         size: 47/75    (original: 38.66%, actual: 39.17%)
- name: FEMALE                         size: 19/32    (original: 16.49%, actual: 15.83%)
- name: MALE_COMMUNITY_LOW             size: 12/19    (original: 9.79%, actual: 10.00%)
- name: MALE_POST_CUSTODY_LOW          size: 8/13     (original: 6.70%, actual: 6.67%)



MALE_POST_CUSTODY_NON_LOW count: 34/55   
- name: Westeros                       size: 16/26    (original: 47.27%, actual: 47.06%)
- name: Utopia                         size: 11/18    (original: 32.73%, actual: 32.35%)
- name: Fantasyland                    size: 7/11     (original: 20.00%, actual: 20.59%)

...
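Reading the figures in the output above, "original" appears to be a group's share of its parent in the long list and "actual" its share of the selected sample: 55 of the 194 long-list cases is 28.35%, and 34 of the 120 selected cases is 28.33%. This interpretation is inferred from the numbers rather than documented; the helper below only reproduces the arithmetic.

```python
def shares(selected: int, available: int, sample_total: int, list_total: int) -> tuple[str, str]:
    """Return (original, actual) percentages formatted as in the breakdown output."""
    original = f"{100 * available / list_total:.2f}%"  # share of the long list
    actual = f"{100 * selected / sample_total:.2f}%"  # share of the shortlist
    return original, actual
```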

Calling the service:

To grab a token from auth:

TOKEN=$(curl -X POST "https://sign-in-dev.hmpps.service.justice.gov.uk/auth/oauth/token?grant_type=client_credentials" -H 'Content-Type: application/json' -H 'Content-Length: 0' -H "Authorization: Basic $(echo -n client:secret| base64)" | jq -r '.access_token')

To create a sample:

curl -v -X POST "https://probation-case-sampler-dev.prison.service.justice.gov.uk/sample?size=30" -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json'  --data @src/test/resources/sample.json

To get a sample along with statistics about its make-up:

curl -v -X POST "https://probation-case-sampler-dev.prison.service.justice.gov.uk/analyse?size=30" -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json'  --data @src/test/resources/sample.json
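The same calls can be made from a script. The sketch below builds the requests with Python's standard library and mirrors the curl commands above (query parameters, headers and the dev URLs are taken from them); the function names are hypothetical, and nothing is sent until a request is passed to urllib.request.urlopen.

```python
import base64
import json
import urllib.parse
import urllib.request

AUTH_URL = "https://sign-in-dev.hmpps.service.justice.gov.uk/auth/oauth/token"
SAMPLER_URL = "https://probation-case-sampler-dev.prison.service.justice.gov.uk"


def token_request(client_id: str, secret: str) -> urllib.request.Request:
    """Client-credentials token request, equivalent to the curl call above."""
    basic = base64.b64encode(f"{client_id}:{secret}".encode()).decode()
    return urllib.request.Request(
        AUTH_URL + "?grant_type=client_credentials",
        data=b"",
        headers={"Authorization": f"Basic {basic}", "Content-Type": "application/json"},
        method="POST",
    )


def sample_request(token: str, cases: list, size: int, analyse: bool = False) -> urllib.request.Request:
    """POST the long list to /sample (or /analyse) with a bearer token."""
    path = "/analyse" if analyse else "/sample"
    query = urllib.parse.urlencode({"size": size})
    return urllib.request.Request(
        f"{SAMPLER_URL}{path}?{query}",
        data=json.dumps(cases).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Usage (performs network calls):
# with urllib.request.urlopen(token_request("client", "secret")) as resp:
#     token = json.load(resp)["access_token"]
# with open("src/test/resources/sample.json") as f:
#     cases = json.load(f)
# with urllib.request.urlopen(sample_request(token, cases, 30)) as resp:
#     sample = json.load(resp)
```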

sample.json can be generated by downloading the sample spreadsheet and running the ImportFullSample test.

TODO:

Move service to probation subdomain once it exists
Add CRN to case matching rules
Update list of sentence types and categorisation of sentence types once received

Out of scope for this phase:

Exclude unpaid work cases where the work requirement is less than 40 hours
Domain 3 stratification
Retrieve data from community-api
Create probation areas register
Add handling / validation for:
- Dates match
- Missing stratification info - e.g. RoSH level, Gender, Sentence Type
