openpolicedata / openpolicedata

The OpenPoliceData (OPD) Python library is the most comprehensive centralized public access point for incident-level police data in the United States. OPD provides easy access to 425+ incident-level datasets for about 4800 police agencies. Types of data include traffic stops, use of force, officer-involved shootings, and complaints.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
accountability arcgis-api data-science officer-involved-shootings open-data pandas police-complaints police-data python socrata-api traffic-stops transparency use-of-force

openpolicedata's Introduction

OpenPoliceData

The OpenPoliceData (OPD) Python library is the most comprehensive centralized public access point for incident-level police data in the United States. OPD provides easy access to 425+ incident-level datasets for about 4850 police agencies. Types of data include traffic stops, use of force, officer-involved shootings, and complaints.

Users request data by department name and type of data, and the data is returned as a pandas DataFrame. There is no need to manually find the data online or to know how to work with open data APIs (ArcGIS, Socrata, etc.). When data is loaded by OPD, the returned data is unmodified (with the exception of formatting known date fields) from what appears on the source's site, and OPD provides links to the original data for transparency.

OpenPoliceData can be installed from the Python Package Index (PyPI):

pip install openpolicedata

OpenPoliceData provides access to police data with 2 simple lines of code:

> import openpolicedata as opd
> src = opd.Source("New Orleans")
> data = src.load(table_type="USE OF FORCE", year=2022)

NEW STARTING IN VERSION 0.6: OPD now provides tools for automated data standardization. Applying these tools allows you to start your analysis more quickly by replacing column names and data with standard values for some common column types. Learn how it works and how to use it here.

Latest Datasets Added to OPD

  • Asheville, NC: Arrests, citations, complaints, incidents, pointing weapon, traffic stops, use of force, and 2023 calls for service
  • Sacramento, CA: 2024 calls for service, 2021-2024 incidents, and 2023-2024 citations
  • Albemarle County, VA: Stops
  • Norman, OK: Crashes, incidents, and traffic stops data (new) and most recent arrests, complaints and use of force data
  • Oakland, CA: Stops
  • Washington D.C.: Lawsuits against MPD
  • Bloomington, IN: Use of Force and Citations
  • Wallkill, NY: Employee and Stops
  • Bremerton, WA: Arrests, Citations, and Incidents
  • Phoenix, AZ: Officers Firearm Pointing
  • Phoenix, AZ: 2024 Calls for Service
  • Boston, MA: Deaths in Custody
  • San Jose, CA: 2024 Calls for Service
  • Portland, OR: 2024 Calls for Service
  • Santa Monica, CA: 2022-2023 Incidents

v0.7.1 - 2024-05-10

Added

  • Added POINTING WEAPON (by officer) table type
  • Added data loader to combine multiple files that span a single year into a single dataset
  • Added support for more text date column formats in ArcGIS loader.
  • Added url_contains input to get_count, load_iter, load, and load_from_csv of Source class to distinguish between multiple datasets matching a data request
  • Added datasets input to get_years to allow getting the years in specific datasets.
  • Added Year Filter Guide to documentation

Changed

  • Updates to standardization to handle more datasets

Fixed

  • Fixed year filtering for Tucson OFFICER-INVOLVED SHOOTINGS - INCIDENTS dataset. The dataset is no longer available in OpenPoliceData versions prior to 0.7.

Complete change log available at: https://github.com/openpolicedata/openpolicedata/blob/main/CHANGELOG.md

Contributing

All contributions are welcome, including code enhancements, bug fixes, bug reports, documentation updates, and locating new datasets. If you're interested in helping out, see our Contributing Guide or reach out by email.

openpolicedata's People

Contributors

potto216, sowdm

openpolicedata's Issues

Add ability for users to flag inappropriate data

We will strive not to increase access to data that is not appropriate to share online, such as people's names, addresses, or other identifying information. Indicate in the documentation how users can flag inappropriate data shared in this package.

Should our package be added to conda-forge?

Our repo is currently dependent on geopandas. In Windows, it is easier to install geopandas using conda. If we add our package and required packages (as needed) to conda-forge, our installation instructions could be simplified to: conda install -c conda-forge openpolicedata

https://conda-forge.org/docs/user/introduction.html
https://conda-forge.org/docs/maintainer/adding_pkgs.html#the-staging-process

Some of our packages (at least Socrata) aren't available through conda, but it looks like we could add them to conda-forge: "You don’t have to be the upstream maintainer of a package in order to contribute it to conda-forge. "

Example installation using conda-forge package: https://github.com/cenpy-devs/cenpy

Charlotte Traffic Stops ArcGIS page appears to have problems filtering for data

Standard way of filtering is not working with this data set.

Also tried running a query here: https://gis.charlottenc.gov/arcgis/rest/services/CMPD/CMPD/MapServer/14/query

Setting WHERE to Month_of_Stop >= '2020-01-01' AND Month_of_Stop < '2021-01-01' should give dates in 2020.

The first 1000 results are all either Month_of_Stop: 2021/09 or Month_of_Stop: 2021/10.

Same with WHERE = Month_of_Stop >= '2020-01' AND Month_of_Stop < '2021-01'
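For anyone trying to reproduce this outside the package, here is a minimal sketch that builds the same year-filter request against the endpoint above using only the Python standard library. The helper name `build_year_query` and the record limit are illustrative assumptions, not part of OPD. The sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Query endpoint quoted in this issue
QUERY_URL = "https://gis.charlottenc.gov/arcgis/rest/services/CMPD/CMPD/MapServer/14/query"


def build_year_query(date_field: str, year: int, max_records: int = 1000) -> str:
    """Return a query URL requesting rows whose date_field falls within `year`.

    Hypothetical helper for illustration only.
    """
    # Half-open interval: start of `year` up to, but excluding, the start of the next year
    where = f"{date_field} >= '{year}-01-01' AND {date_field} < '{year + 1}-01-01'"
    params = {
        "where": where,
        "outFields": "*",                  # return all fields
        "resultRecordCount": max_records,  # assumed page size
        "f": "json",                       # response format
    }
    return f"{QUERY_URL}?{urlencode(params)}"


print(build_year_query("Month_of_Stop", 2020))
```

Pasting the printed URL into a browser should show whether the server honors the WHERE clause for this field.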

The dataset has been removed from the package until this is resolved.

Add Data Standardization features

This is a big issue that is being included as a single issue for now for simplicity. This should be broken into multiple issues once it is started.

Data standardization consists of the following:

  • Conversion of raw data fields (column names) to standard ones
  • Conversion of raw data values (i.e. column contents) to standard ones

This might include:

  • Creation of tools that help identify if data standardization works on new data sets that are added
  • Tools for informing the user which fields and data values have changed during standardization
  • Tools or documentation for informing the user what the definitions of standardized data fields and data values are
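As a starting point for discussion, the two conversions and the change report could be sketched as follows. This is only an illustration under stated assumptions: the mappings `COLUMN_MAP` and `VALUE_MAP`, the function `standardize`, and the sample column names are all hypothetical, and plain dictionaries stand in for DataFrame rows so the sketch has no dependencies.

```python
# Hypothetical mappings; each real dataset would likely need its own
COLUMN_MAP = {"subj_race": "race"}
VALUE_MAP = {"race": {"W": "WHITE", "B": "BLACK"}}


def standardize(rows):
    """Return (standardized_rows, changes) without modifying the input rows.

    `changes` records which column names and which values were replaced,
    so the user can be told what standardization did.
    """
    changes = {"columns": {}, "values": {}}
    out = []
    for row in rows:
        new_row = {}
        for col, val in row.items():
            # Step 1: raw column name -> standard column name
            std_col = COLUMN_MAP.get(col, col)
            if std_col != col:
                changes["columns"][col] = std_col
            # Step 2: raw value -> standard value for that column
            std_val = VALUE_MAP.get(std_col, {}).get(val, val)
            if std_val != val:
                changes["values"].setdefault(std_col, {})[val] = std_val
            new_row[std_col] = std_val
        out.append(new_row)
    return out, changes


raw = [{"subj_race": "W", "age": 31}, {"subj_race": "B", "age": 45}]
clean, changes = standardize(raw)
print(clean)
print(changes)
```

Returning the change report alongside the data keeps the raw values recoverable, which matters for transparency about what was altered.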

What are the objectives of the project?

  • Provide a single source for datasets from many different jurisdictions
  • Provide a standard way to access datasets by the name of the jurisdiction rather than by website
  • Provide a standard data type (pandas or geodataframe) rather than CSV, JSON, PDF, etc.
  • Standardize data field names since different jurisdictions use different field names
  • Standardize data since different jurisdictions record data in different ways

Create online documentation

Include:

  • How to use package
  • How to contribute new datasets to package
  • Table of available data including PDs where no data was found. Include how to contribute to this table as well (i.e. suggested datasets or PDs without data)

What is the best host for the documentation?

Include Socrata instructions for getting an app token in CONTRIBUTING.MD

Data from the Socrata source requires an app token if you don't want to be throttled when making data requests.

  1. Get an App Token here: http://dev.socrata.com/docs/app-tokens.html
  2. Copy the app token
  3. Create an environment variable SODAPY_API_KEY and set it equal to the app token
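For the CONTRIBUTING.MD text, it may help to show how a script can check that the variable from step 3 is set. The following is a minimal sketch; the helper name `get_app_token` is an assumption for illustration and not part of the package.

```python
import os


def get_app_token(env_var: str = "SODAPY_API_KEY"):
    """Return the app token stored in the environment, or None if unset or empty."""
    token = os.environ.get(env_var, "").strip()
    return token or None


token = get_app_token()
if token is None:
    print("No app token found; Socrata requests may be throttled.")
else:
    print("App token found.")
```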

Restructure repo to be compatible with pypi

This has a tutorial that walks you through how to package a simple Python project. It will show you how to add the necessary files and structure to create the package, how to build the package, and how to upload it to the Python Package Index.

Would a GUI have value?

It could provide export to CSV capability. It might also support analysis, including analysis by geography.

Decide on initial interface for software

Suggestion:

from datasets import datasets

All dataset information is held in a pandas DataFrame. Return a list of available datasets. Columns of the dataset table include id (which we generate to make loading a specific dataset easier), state, department, year, type of table (traffic stops, use of force, arrests, etc.), URL, and a description.

datasets.list()

Return all the datasets for Virginia as a pandas DataFrame

datasets.list(state="Virginia")

Return all the datasets for Fairfax County, Virginia as a pandas DataFrame

datasets.list(state="Virginia", county="Fairfax County Police Department")

Load the dataset corresponding to this id

ds = datasets.load(id)

Export to CSV. Filename uses standard structure so that it can be imported later.

ds.to_csv(outputDir=folder)

Load from CSV.

ds = datasets.from_csv(id, outputDir=folder)

The dataset is stored in a pandas or geopandas DataFrame

df = ds.df

Auto-updating of annual datasets where the URL is predictable

The current method for adding datasets requires adding each dataset individually. For departments whose datasets each contain a single year, with a new dataset added annually for the latest year, can the URL be predicted? For example, if the URL was whatever.com/police-data-2019 for 2019 and whatever.com/police-data-2020 for 2020, perhaps we could add code that searches over the years to find all the datasets.

Example from Baltimore Calls for Service Data:

_builder.add_data(
    state="Maryland",
    jurisdiction="Baltimore",
    table_type=TableTypes.CALLS_FOR_SERVICE,
    url=[
        "https://opendata.baltimorecity.gov/egis/rest/services/Hosted/911_Calls_For_Service_2017_csv/FeatureServer/0",
        "https://opendata.baltimorecity.gov/egis/rest/services/Hosted/911_Calls_For_Service_2018_csv/FeatureServer/0",
        "https://opendata.baltimorecity.gov/egis/rest/services/Hosted/911_Calls_For_Service_2019_csv/FeatureServer/0",
        "https://opendata.baltimorecity.gov/egis/rest/services/Hosted/911_Calls_For_Service_2020_csv/FeatureServer/0",
    ],
    data_type=DataTypes.ArcGIS,
    description=" Police Emergency and Non-Emergency calls to 911",
    years=[2017, 2018, 2019, 2020],
    lut_dict={"date_field": "calldatetime"},
)
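One possible shape for the search is sketched below. It assumes the only part of the Baltimore URL that changes is the year, and it leaves the actual availability check to the caller. The names `URL_TEMPLATE`, `candidate_urls`, `find_datasets`, and `exists` are hypothetical and not part of the package.

```python
# Template inferred from the Baltimore URLs above; {year} is assumed to be the only varying part
URL_TEMPLATE = (
    "https://opendata.baltimorecity.gov/egis/rest/services/Hosted/"
    "911_Calls_For_Service_{year}_csv/FeatureServer/0"
)


def candidate_urls(template, first_year, last_year):
    """Map each year in [first_year, last_year] to its predicted URL."""
    return {year: template.format(year=year) for year in range(first_year, last_year + 1)}


def find_datasets(template, first_year, last_year, exists):
    """Keep only the predicted URLs for which exists(url) returns True.

    `exists` is supplied by the caller (for example, an HTTP request that
    checks for a successful response), so this sketch does no network access.
    """
    candidates = candidate_urls(template, first_year, last_year)
    return {year: url for year, url in candidates.items() if exists(url)}


# Stand-in check for demonstration: pretend only 2017-2020 are published
published = set(candidate_urls(URL_TEMPLATE, 2017, 2020).values())
found = find_datasets(URL_TEMPLATE, 2015, 2022, exists=lambda url: url in published)
print(sorted(found))
```

The resulting year-to-URL mapping could then supply the `url` and `years` arguments instead of listing each entry by hand.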
