txappleseedmap's Introduction

Texas School Discipline Disparities Map

Maps for the Texas Appleseed "School to Prison Pipeline" projects: http://txappleseed.github.io/txappleseedmap/

This map is linked from a WordPress-hosted site with more information about the School to Prison Pipeline: http://www.texasdisciplinelab.org/

About the Data

This project documents incidents when public schools applied one of four punishments:

  • EXP: Expulsions
  • DAE: Disciplinary Alternative Education Program Removals
  • ISS: In School Suspensions
  • OSS: Out of School Suspensions

to one of ten categories of students, with the following abbreviations:

  • SPE: SPECIAL EDUCATION
  • ECO: ECONOMIC DISADVANTAGE
  • HIS: HISPANIC
  • BLA: BLACK OR AFRICAN AMERICAN
  • WHI: WHITE
  • IND: INDIGENOUS AMERICAN
  • ASI: ASIAN
  • PCI: NATIVE HAWAIIAN/OTHER PACIFIC
  • TWO: TWO OR MORE RACES
  • ALL: ALL STUDENTS

For each category, the data records a count of incidents, as well as an integer scale statistic used for coloring the map. The scale statistic ranges from 0 to 10, where 5 represents outcomes consistent with a random distribution. The number of steps above or below 5 should represent how many standard deviations the actual outcome is above or below a random distribution.
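As a rough illustration of the scale described above, here is a minimal sketch. It assumes a binomial model in which each incident falls on a member of the group with probability equal to the group's share of the district population. The function name and the binomial assumption are illustrative only and are not taken from the project's code.

```python
import math


def scale_statistic(group_incidents, total_incidents,
                    group_population, total_population):
    """Return an integer from 0 to 10, where 5 means 'as expected by chance'.

    Hypothetical helper: assumes a binomial model, which may differ from
    the calculation the project actually uses.
    """
    if not total_incidents or not total_population or group_population is None:
        return None  # not enough data to compute a statistic
    share = group_population / total_population
    if not 0 < share < 1:
        return None  # no meaningful comparison is possible
    expected = total_incidents * share
    std_dev = math.sqrt(total_incidents * share * (1 - share))
    z = (group_incidents - expected) / std_dev
    # One step per standard deviation, centered on 5 and clamped to 0..10.
    return max(0, min(10, 5 + round(z)))
```

With these assumptions, a group that makes up half of a district and receives 60 of 100 punishments sits two standard deviations above expectation, so it would land on step 7.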

School District level data comes from disciplinary data products and District and Charter Detail Data published on the Texas Education Agency website.

Some additional data is available from open records requests to the Texas Education Agency, but not currently in use. See Open Austin's #texasappleseed Slack channel for more information.

Data Updates

This project includes a command line utility for generating the data used to populate the website each year. If you want to use the utility to generate the data yourself, use GitHub's "Clone or Download" button to make a copy of this project in a folder called "txappleseedmap" on your computer. Then open a command line shell on your computer and navigate to the "makedata" subfolder.

$ cd txappleseedmap/makedata

To set up a Python environment that can run the utility, use pipenv. If needed, follow the pipenv installation instructions before you continue. Once pipenv is installed, use this pipenv command to install the Python libraries you need:

$ pipenv install

If you need to install or manage different versions of Python in order to run the required version for this project (pipenv will give you a warning), consider pyenv, which pipenv integrates with directly.

Next, activate the pipenv shell. This loads a virtual environment for running the utility.

$ pipenv shell

Then use this command to install the utility in its virtual environment:

$ pip install --editable .

After that, you should be able to use the utility with the command collectFromFile. If you type that command by itself, the utility will look in the data/from_agency folder for files from the TEA to use as input, and then try to convert them to JSON files in the format used by the map. You can also add a --help flag to read the utility's help feature without doing anything else.

$ collectFromFile --help

There are a few other important options. If you don't already have the files you need from the TEA, you can use the --download flag and the utility will download them from the TEA before converting them to a new format.

$ collectFromFile --download

You can use the -f and -l flags to set the first and last years of the range that the utility should try to process. The current defaults are 2006-2016. This example would change the range to 2012-2015.

$ collectFromFile -f 2012 -l 2015

Output Format

The three options to output the processed data are --json-folders, --csv, and --json. The current version of the map is set up to use data exported using the --json-folders option. That option is also the default, so if you don't include any of these three flags, you get the --json-folders format, which includes nested directories labeled by year, demographic, and punishment. Each JSON file contains the data corresponding to one possible user query.

$ collectFromFile --json-folders
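To make the nested layout concrete, the sketch below builds the path to the file for one query. The nesting order (year, then demographic, then punishment) follows the description above, but the exact folder and file names, including the `.json` file named after the punishment code, are assumptions for illustration. Check the utility's actual output before relying on them.

```python
from pathlib import Path

# Codes taken from the "About the Data" section of this README.
PUNISHMENTS = {"EXP", "DAE", "ISS", "OSS"}
DEMOGRAPHICS = {"SPE", "ECO", "HIS", "BLA", "WHI",
                "IND", "ASI", "PCI", "TWO", "ALL"}


def query_path(root, year, demographic, punishment):
    """Build the path to the JSON file for one year/demographic/punishment.

    Hypothetical helper: the file naming scheme is assumed, not confirmed.
    """
    if demographic not in DEMOGRAPHICS:
        raise ValueError(f"unknown demographic code: {demographic}")
    if punishment not in PUNISHMENTS:
        raise ValueError(f"unknown punishment code: {punishment}")
    return Path(root) / str(year) / demographic / f"{punishment}.json"
```

For example, `query_path("data", 2015, "BLA", "OSS")` would point at the out-of-school suspension data for Black or African American students in 2015 under this assumed layout.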

If you use the --csv flag, instead of JSON files you'll get a collection of nested folders containing CSVs. Each file will have the statistics to populate a map about one type of action taken against one demographic group in one year.

$ collectFromFile --csv

If you use the --json flag, you'll get one big JSON file with all the data (about 7 MB).

$ collectFromFile --json
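The options described in this section can be summarized with a small argument parser. This is only a sketch built with Python's standard `argparse` module to show how the documented flags and defaults relate to each other. It is not the utility's actual implementation, and the attribute names (`first_year`, `last_year`, `output`) are made up for the example.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="collectFromFile-sketch")
    parser.add_argument("--download", action="store_true",
                        help="download the source files from the TEA first")
    parser.add_argument("-f", dest="first_year", type=int, default=2006,
                        help="first year of the range to process")
    parser.add_argument("-l", dest="last_year", type=int, default=2016,
                        help="last year of the range to process")
    # Only one output format can be chosen at a time.
    formats = parser.add_mutually_exclusive_group()
    for name in ("json-folders", "csv", "json"):
        formats.add_argument(f"--{name}", dest="output",
                             action="store_const", const=name)
    # --json-folders is the documented default.
    parser.set_defaults(output="json-folders")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Grouping the three format flags as mutually exclusive reflects the README's description that each run produces one output format.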

Website

This project uses leaflet.js and carto.js to render the map. Carto documentation: https://carto.com/docs/

txappleseedmap's People

Contributors

codecofee19, gryffs, itamargal, jgrenadier, jplaut, mlsintx, mscarey, rebfrank, robindykema

txappleseedmap's Issues

New Data Available?

Another issue is that this project is now so old that there's another year of data available. If we want, I can do the analysis over with the new data. I've learned a lot since last time, so it should be easier now.
@mscarey

Point locations for charter schools

Currently, charter schools aren't being shown on the map because they don't have districts that can be shown as regions on the map. But they are reported separately on the map, and starting in 2017 there'll be a column in the TEA's data that says whether the row represents a district or a charter. We should either make a separate list of charters (with color-coded disparity data comparable to the map), or else add color-coded point data to a map for each charter school.

An idea for identifying charter schools prior to 2017 would be to take all the unique ID numbers in the TEA dataset, and then subtract all of the ID numbers that match up with a district shape in the shapefile.
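The subtraction idea amounts to a set difference. Here is a minimal sketch; the function name and the sample IDs in the usage note are made up, and loading the IDs from the TEA files and the shapefile is left out.

```python
def likely_charter_ids(tea_ids, shapefile_district_ids):
    """Return IDs that appear in the TEA data but have no district shape.

    IDs are kept as strings so that leading zeros are preserved.
    """
    return set(tea_ids) - set(shapefile_district_ids)
```

For instance, `likely_charter_ids(["001902", "001903", "227803"], ["001902", "001903"])` would leave only `"227803"` as a candidate charter. Any ID that is missing from the shapefile for another reason would also show up, so the result is a list of candidates rather than a confirmed list.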

Add test coverage for javascript

I'd like to add some regression coverage for the js as we make changes to the way we process the data to make sure the popup content and display colors are calculated correctly.

duplicate district geometry

It looks like we have multiple copies of the school district shapes in the geojson directory: one copy for each type of disciplinary action. Does that mean if someone visits the map and selects each of the menu options, the identical district shapes will have to be loaded to their browser multiple times? Can we make it so the district shapes only load once, even if the other data reloads when a button is pushed?

Clean up data directory

From @mscarey on Slack:

There's a lot of stuff in the data directory. I'm sure some of it is obsolete, and it's hard for me to know which files actually need to be documented. Is DistrictPercentage2015.csv the source for all the data that's actually in use in the map? I produced some files after that: the ones that came from the "StudentSums2016.ipynb" file on December 12. (Morgan asked for them in an email.) Did any of the December 12 files also get incorporated in the map? There's new data available from the state for 2016, so I could do an update that matches the format of DistrictPercentage2015.csv if that works for everybody.

No data color

There was a concern from Yamanda that the gray #ebeaea (which represents No Data) is too close to #f2f0f7 (which is the 5th color in our 10 color scale going from purple to red). Maybe we could replace the gray with a white. What do y'all think?

Automate annual updates

Appleseed would like to be able to update the map on an annual basis with new data from the TEA, without consulting us. We should think about how we can package all the data-processing code we're writing into a program that's relatively easy to use.

Of course, if the format used to publish the data changes too much, anything we write isn't going to keep working year after year.

evaluate usefulness of color scale

To repeat what I put on the Slack:

Now that I can see the changes I've made to the data files loaded into the map, I wonder if the statistical measure I used for the scale colors is Basically Just a Population Map. I based the map colors on the statistical significance of the racial disparities in each district. But the map looks consistent with the hypothesis that the racial disparities are basically the same everywhere, except that the more highly populated districts have a larger sample size to study, and the larger sample sizes increase the statistical significance of the finding that a disparity exists.

Maybe each color threshold should require a certain percentage disparity in addition to a certain level of statistical significance?
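One way to sketch that idea: compute how many steps the district earns on significance, how many it earns on the size of the disparity, and color it by the smaller of the two. Everything here is hypothetical, including the function name, the step sizes, and the choice to take the minimum.

```python
def color_step(z_score, relative_disparity,
               z_per_step=1.0, disparity_per_step=0.25, max_steps=5):
    """Return a signed step count; 0 means 'no disparity shown'.

    z_score: standard deviations above or below the expected outcome.
    relative_disparity: e.g. 0.5 means 50% more punishments than expected.
    All thresholds are placeholder values for illustration.
    """
    if z_score * relative_disparity <= 0:
        return 0  # zero, or the two measures disagree on direction
    significance_steps = int(abs(z_score) / z_per_step)
    disparity_steps = int(abs(relative_disparity) / disparity_per_step)
    steps = min(significance_steps, disparity_steps, max_steps)
    return steps if z_score > 0 else -steps
```

Under this sketch, a large district with a highly significant but small disparity (for example `color_step(8.0, 0.05)`) stays at step 0, which is the population-map effect the issue is trying to avoid.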

"About this map/data" section for README and maybe for future about section

Morgan wants to be able to explain how we arrived at values in the map and what they represent. I figured @mscarey would be the best person to help write out a brief of the methodology and document some explanations. We could just do this in the README for now, and potentially make a better about page. This would mainly be to help when they get asked questions.

More maps?

  • Alternative School Placements
  • in-school suspensions
  • out-of-school suspensions
  • expulsions

There are apparently three more maps that still need to be created, but I'm not sure if we ever figured out what they were.

The three other maps that would have been created would have covered in-school suspensions, out-of-school suspensions, and expulsions, for the same categories as the existing map. Those categories are all in the CSV that was used to create the map. (It would have been best to show everything in one map, but nobody could figure out how.)

Failing to show "no data" message when population is null

For some of the demographic groups in the earlier years, there's no population data, only data about actions taken against members of those groups. So in the JSON files that get loaded into the map, the "P" fields will be null. One example would be for Asian students in 2006-2007.

The map appropriately shows diagonal stripes covering the whole state, but it doesn't update the popup messages. Instead, it continues to give the popup messages for whatever dataset you were viewing previously.

Feedback from 11/16 Call with Morgan

Changes we discussed for the maps:

  1. City names darker if possible

  2. Instead of "no data" use "Data not available for this student group."

  3. In rollover text change to: ___% of ______ ISD's 10957 students are classified as _____. _____ students received out-of-school suspensions ____ times, accounting for ___ % of all out-of-school suspensions.

Call to Action Form

Creating a popup form on top of the map that essentially gives the visitor action items:

  • contact school rep.
  • contact TexasAppleseed
  • take other course of action etc.

add option to plot stats for full district populations

One of Appleseed's revision requests was to include statistics for the full district populations, comparing them to the overall statewide outcomes. The CLI already outputs data for that (under "ALL"), but there's no button to select it in the map. We'll also need different popup text to explain the different comparison that's being made in that case.

Clean up /data directory

We have a messy data directory. We should get rid of the stuff we don't need.

  • What files are we actually using?
  • What are the "raw" files?
  • What steps did we use for processing along the way?

data update process to GeoJSON

The process of going from what's in /data/processed/ to what's in the geojson folder isn't on GitHub. We need to get the full automated process into GitHub.

Is "Inequity Level" the best measurement?

The data in the map right now shows a ratio of "1" whenever the racial disparity fails a statistical significance test. That was an awkward compromise to address the problem of outliers in sparsely-populated districts, and I'd feel more comfortable changing it somehow.

Also, the definition of the term "Inequity Level" on the map doesn't explain that aspect of the calculation. To switch to a version of the data that shows the real ratio in those cases, use "ratioDistrictNones.csv" instead of "ratioDistrictSignificant.csv".

The data for "Economically disadvantaged" students is garbage, which is why it was left off of the map.

I'm also not sure about the term "Inequity level". We can't really prove what's equitable or not. I'd prefer something like "Disparity in punishment, compared to districtwide average" or even just "Disparity index".
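The compromise described in this issue, reporting a ratio of 1 whenever the disparity is not statistically significant, can be sketched as follows. The significance test used here (a z-score from a binomial model with a 1.96 cutoff) is an assumption chosen for illustration, as are the function and parameter names. The issue does not say which test the project actually used.

```python
import math


def disparity_ratio(group_actions, total_actions,
                    group_population, total_population,
                    z_threshold=1.96, gate=True):
    """Ratio of a group's share of punishments to its share of students.

    With gate=True, non-significant results are reported as 1.0.
    Assumes total_actions > 0 and a population share strictly between
    0 and 1.
    """
    pop_share = group_population / total_population
    action_share = group_actions / total_actions
    ratio = action_share / pop_share
    if not gate:
        return ratio  # the "real ratio" variant
    std_dev = math.sqrt(total_actions * pop_share * (1 - pop_share))
    z = (group_actions - total_actions * pop_share) / std_dev
    return ratio if abs(z) >= z_threshold else 1.0
```

In a tiny district where a group receives 3 of 4 punishments, the gated version returns 1.0 while the ungated version returns 1.5, which shows how the gate hides outliers from sparsely populated districts.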

Change Popup Values

Change the popup to show each group's percentage of the student body population instead of the raw number of students.

add drop-down to select data from other years

Currently the map can only show data from the current year. It'll only be practical to fix this if we can store the data in some other format than what we currently have, which is four TopoJSON files kept together in the same directory. It wouldn't make sense to store and load all the different TopoJSON files we'd create if we produced four of them for every year.

more informative popup text

I think we should include the raw numbers in the popup text, not just the percentages. Otherwise readers won't know whether a statistic is about a population of 5 students or 5000. I think this feature should be easy to add after we finish revising the process for loading new data in response to the user's selection.

Currently our format is like this:
In Olney ISD, Latino students received 55.42% of in-school suspensions and represent 36.52% of the student population

I think it should be like this:
In Olney ISD, the NUMBER Latino students received NUMBER (55.42%) of the in-school suspensions and represented 36.52% of the population.

Also, we need to change the popup text to handle erroneous data. We currently have this:
In Baird ISD, White students received 1450% of in-school suspensions and represent 76.74% of the student population

I would change it to this:
The TEA's reports seem to have an error. They indicate that in Baird ISD, the NUMBER White students received 1450% (NUMBER) of the in-school suspensions and represented 76.74% of the population.

The alternative would be not to provide the numbers and just say the data is not available. Actually "not available" might be a fairer choice than blaming the TEA unless I do more to make sure I'm not the one who introduced the errors.
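The proposed wording, together with the "not available" fallback for impossible percentages, could look roughly like this. The map's popups are written in JavaScript, so this Python sketch only illustrates the logic. The function and parameter names are hypothetical, and the fallback sentence is the wording requested elsewhere in these issues.

```python
NOT_AVAILABLE = "Data not available for this student group."


def popup_text(district, group, group_students, total_students,
               group_actions, total_actions,
               action="in-school suspensions"):
    """Build popup text with raw counts next to the percentages."""
    if not total_students or not total_actions:
        return NOT_AVAILABLE
    action_pct = 100 * group_actions / total_actions
    population_pct = 100 * group_students / total_students
    # A share above 100% signals an error somewhere in the data.
    if action_pct > 100 or population_pct > 100:
        return NOT_AVAILABLE
    return (
        f"In {district}, the {group_students} {group} students received "
        f"{group_actions} ({action_pct:.2f}%) of the {action} and "
        f"represented {population_pct:.2f}% of the population."
    )
```

With made-up inputs, `popup_text("Example ISD", "Latino", 40, 100, 11, 20)` yields a sentence in the proposed format, while inputs that imply a share above 100% fall back to the "not available" message.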

Legend and Scaling

I also think the scale on the legend isn't very readable. Maybe we should label the left side with "less than average" and the right with "more than average" to help the readers understand what we're saying. Also, it might be better to change the left part of the scale to a negative percentage, the part that represents "no disparity" to zero (instead of one), and the right part to a positive percentage (but the positive percentage can go higher than 100%).
@mscarey
