
corrections's Introduction

This project was done in support of the presentation "An example driven introduction to Data Science" that I gave for @KC_DC.

Engineer's Notebook

I've documented my thoughts about creating and using a notebook. Here I discuss today's digital notebook experience.

Finding the data

sourcing the data

learning key words

searching

bearing fruit

Missouri Department of Corrections Sunshine Law Offender Data File

Analysis

Offender Data File Layout specification

The data file is too large for most applications to open. That is why I have included a sample file of the first 200 lines of the file.
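A sample like this can be produced with a few lines of Python. The sketch below is one way to do it and is not necessarily how the included sample was made; the function name `write_sample` and any file paths passed to it are hypothetical.

```python
from itertools import islice


def write_sample(source_path, sample_path, line_count=200):
    """Copy the first `line_count` lines of a large text file to a new file.

    The source is streamed line by line, so it is never loaded whole into
    memory. errors="replace" is an assumption of this sketch: it substitutes
    undecodable bytes instead of raising UnicodeDecodeError.
    """
    with open(source_path, encoding="utf-8", errors="replace") as source, \
            open(sample_path, "w", encoding="utf-8") as sample:
        lines = list(islice(source, line_count))
        sample.writelines(lines)
    # Number of lines actually written (fewer if the source is short).
    return len(lines)
```

Calling `write_sample("offender_data.txt", "offender_sample.txt")` with your own paths writes at most 200 lines and returns how many were copied.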

Extract, Transform and Load

After you download and extract the Offender Data file, update the path to the file in the LoadOffenderData.py code.

Next, run the Python script.

python LoadOffenderData.py > offender_data.json

This creates a new file, offender_data.json, which contains the offender records as JSON documents. You may also see some errors resulting from UTF-8 translation.

Each document in the generated file looks like this:

{
	"sentenceLengthDays":99,
	"suffix":"",
	"sentenceDate":"19560316",
	"MissouriCharge":"10021040",
	"birthDate":"19290622",
	"sentenceProbationDate":"00000000",
	"probationType":"",
	"sentenceLengthYears":9999,
	"OffenseDescription":"TC:MURDER1ST-FIST",
	"middleName":"",
	"sentenceMinimumReleaseDate":"99999999",
	"probationTermYears":0,
	"sentenceLengthMonths":99,
	"CcCsInd":"",
	"probationTermDays":0,
	"docId":"00000001",
	"OffenseCounty":"St.LouisCity",
	"completed":"Y",
	"NcicCode":"0904",
	"firstName":"PAUL",
	"CauseNo":"1265D",
	"probationTermMonths":0,
	"lastName":"SMITH",
	"SentenceCounty":"St.LouisCity",
	"DocLocFuncFlag":"",
	"sentenceMaximumReleaseDate":"99999999",
	"offenderAssignedPlace":"",
	"race":"Black",
	"gender":"Male"
}
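The date fields in the document above are eight-character YYYYMMDD strings, and the sample shows "00000000" and "99999999" where no real calendar date applies. The sketch below converts these strings for analysis. Treating both placeholder values as "no date" is an assumption, since their exact meaning should be confirmed in the layout specification; the function name `parse_doc_date` is hypothetical.

```python
from datetime import datetime

# Placeholder values seen in the sample document. Treating them as
# "no calendar date" is an assumption of this sketch.
PLACEHOLDER_DATES = {"", "00000000", "99999999"}


def parse_doc_date(value):
    """Convert a YYYYMMDD string to a date; return None for placeholders
    and for values that are not valid calendar dates."""
    if value in PLACEHOLDER_DATES:
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        return None
```

For the sample document, `parse_doc_date("19560316")` gives the sentence date as a `datetime.date`, while `sentenceProbationDate` and the two release dates come back as `None`.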

Now, take this data file and load it into MongoDB.

mongoimport --db doc --collection offenders offender_data.json

You'll end up with a doc database containing an offenders collection that uses about 2 GB of disk space.

Additional Analysis

Using the distinct operation on the offenders collection is a good way to do some discovery work in the data.

> db.offenders.distinct('race')
[
        "Asian/Pacific Islander",
        "Black",
        "Nat Am/Alaskan",
        "Unknown",
        "White"
]

> db.offenders.distinct('gender')
[ "Male", "Female", "Unknown" ]

Important note about the Missouri Charge field.

Refer back to the DOC format description and you will find:

This will contain the 8-digit code associated with this offense from court papers or the Missouri Charge Code Manual. Felony class may be used to insure the correct match. Positions 1 through 5 are the major category code. Positions 6 and 7 contain the NCIC/State Modifier range. These positions of the MO Code match the last two digits of the NCIC code for the charge. The eighth position may be 0 for Not Applicable, 1 for Attempt, 2 for Accessory or 3 for Conspiracy.

So, to find the charge codes recorded for murder in the first degree, query on the 10021 major category prefix:

> db.offenders.distinct('MissouriCharge',{"MissouriCharge":/^10021.*/})
[
        "10021040",
        "10021070",
        "10021990",
        "10021020",
        "10021010",
        "10021030",
        "10021110",
        "10021120",
        "10021991",
        "1002199",
        "10021993",
        "10021992",
        "10021090",
        "10021",
        "10021050",
        "10021033",
        "10021121",
        "10021113",
        "10021013",
        "10021103",
        "10021023",
        "10021043"
]
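The positional layout quoted from the DOC format description can be applied to these values in Python. The following is a minimal sketch: the function name `split_charge_code` and the keys of the returned dictionary are hypothetical, and only the position meanings come from the description above.

```python
# Eighth-position meanings, as listed in the DOC format description.
CHARGE_TYPES = {
    "0": "Not Applicable",
    "1": "Attempt",
    "2": "Accessory",
    "3": "Conspiracy",
}


def split_charge_code(code):
    """Split an 8-digit Missouri charge code into its documented parts.

    Returns None when the value is not exactly eight ASCII digits.
    """
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        return None
    return {
        "majorCategory": code[0:5],  # positions 1 through 5
        "ncicModifier": code[5:7],   # positions 6 and 7
        # Position 8; values outside the documented 0-3 map to "Unknown".
        "chargeType": CHARGE_TYPES.get(code[7], "Unknown"),
    }
```

For the sample document's charge "10021040", the modifier is "04", which matches the last two digits of its NcicCode "0904". Two values in the result above, "1002199" and "10021", are shorter than eight digits, so this function returns `None` for them; they may be worth a closer look before any aggregation.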

Other fields

The completed field indicates whether the offender has completed the sentence. SentenceCounty may differ from OffenseCounty; think change of venue. A sentenceLengthYears value of all 9's indicates a life sentence.
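These field notes translate into small helper functions. The sketch below is illustrative only: the function names are hypothetical, and matching exactly 9999 for a life sentence is an assumption based on the sample document shown earlier.

```python
def describe_sentence(doc):
    """Return a short, human-readable sentence length for one document."""
    years = doc.get("sentenceLengthYears", 0)
    # All 9's in sentenceLengthYears marks a life sentence. Checking for
    # exactly 9999 is an assumption taken from the sample document.
    if years == 9999:
        return "life"
    months = doc.get("sentenceLengthMonths", 0)
    days = doc.get("sentenceLengthDays", 0)
    return f"{years}y {months}m {days}d"


def counties_differ(doc):
    """True when the sentencing county is not the offense county,
    for example after a change of venue."""
    return doc.get("SentenceCounty") != doc.get("OffenseCounty")
```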

Visualization

Springboard

I think it would be interesting to get this data into a graph-oriented database and do some querying and visualization that way. For entities (nodes), I'm thinking offenders, counties and charges would be prime candidates.
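As a starting point, the node and edge sets could be derived from the documents before choosing any particular graph database. This is a rough sketch in plain Python, not tied to a specific product; the function name `build_graph` and the relation labels `CHARGED_WITH` and `OFFENSE_IN` are hypothetical.

```python
def build_graph(docs):
    """Build node and edge sets from offender documents.

    Nodes are (kind, key) tuples; edges are (source, relation, target)
    tuples. The relation names are hypothetical labels for this sketch.
    """
    nodes, edges = set(), set()
    for doc in docs:
        offender = ("offender", doc["docId"])
        charge = ("charge", doc["MissouriCharge"])
        county = ("county", doc["OffenseCounty"])
        nodes.update([offender, charge, county])
        edges.add((offender, "CHARGED_WITH", charge))
        edges.add((offender, "OFFENSE_IN", county))
    return nodes, edges
```

Because nodes and edges are stored in sets, offenders who share a county or a charge end up connected to the same node, which is the property that makes graph queries over this data interesting.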
