txappleseed / txappleseedmap

maps for the school to prison pipeline projects

Home Page: http://txappleseed.github.io/txappleseedmap/

CSS 1.74% HTML 12.65% JavaScript 8.43% Ruby 2.95% Python 16.23% Jupyter Notebook 57.99%

prison-pipeline social-justice openaustin mapping

txappleseedmap's Introduction

Texas School Discipline Disparities Map

Maps for the Texas Appleseed "School to Prison Pipeline" projects: http://txappleseed.github.io/txappleseedmap/

This map is linked from a WordPress hosted site with more information about the School to Prison Pipeline: http://www.texasdisciplinelab.org/

About the Data

This project documents incidents when public schools applied one of four punishments:

EXP: Expulsions
DAE: Disciplinary Alternative Education Program Removals
ISS: In School Suspensions
OSS: Out of School Suspensions

to one of ten categories of students, with the following abbreviations:

SPE: SPECIAL EDUCATION
ECO: ECONOMIC DISADVANTAGE
HIS: HISPANIC
BLA: BLACK OR AFRICAN AMERICAN
WHI: WHITE
IND: INDIGENOUS AMERICAN
ASI: ASIAN
PCI: NATIVE HAWAIIAN/OTHER PACIFIC
TWO: TWO OR MORE RACES
ALL: ALL STUDENTS

For each category, the data records a count of incidents, as well as an integer scale statistic used for coloring the map. The scale statistic ranges from 0 to 10, where 5 represents outcomes consistent with a random distribution. The number of steps above or below 5 should represent how many standard deviations the actual outcome is above or below a random distribution.
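As a rough illustration of how a 0 to 10 scale with 5 as its midpoint could be derived, here is a minimal sketch. It assumes a binomial model (each incident is equally likely to involve any student in the district) and truncates the standard-deviation count toward zero. Both choices are assumptions made for illustration; the function name and parameters are hypothetical and are not taken from the utility's code.

```python
import math


def scale_statistic(group_incidents, total_incidents,
                    group_population, total_population):
    """Return an integer from 0 to 10, where 5 means 'consistent with random'.

    Hypothetical sketch: assumes incidents follow a binomial distribution
    with probability equal to the group's share of the student population.
    """
    if not total_incidents or not total_population:
        return None  # nothing to score

    share = group_population / total_population        # group's share of students
    expected = total_incidents * share                 # incidents expected by chance
    std_dev = math.sqrt(total_incidents * share * (1 - share))
    if std_dev == 0:
        return 5  # no variation is possible, so treat as "no disparity"

    z = (group_incidents - expected) / std_dev         # standard deviations from expected
    steps = int(z)                                     # truncate toward zero
    return max(0, min(10, 5 + steps))                  # clamp to the 0-10 range
```

With these assumptions, a group that makes up half of a district and receives 60 of 100 incidents sits two standard deviations above expectation and would score 7.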

School District level data comes from disciplinary data products and District and Charter Detail Data published on the Texas Education Agency website.

Some additional data is available from open records requests to the Texas Education Agency, but not currently in use. See Open Austin's #texasappleseed Slack channel for more information.

Data Updates

This project includes a command line utility for generating the data used to populate the website each year. If you want to use the utility to generate the data yourself, use GitHub's "Clone or Download" button to make a copy of this project in a folder called "txappleseedmap" on your computer. Then open a command line shell on your computer and navigate to the "makedata" subfolder.

$ cd txappleseedmap/makedata

To set up a Python environment that can run the utility, use pipenv. If needed, follow the pipenv installation instructions before you continue. Once pipenv is installed, use this pipenv command to install the Python libraries you need:

$ pipenv install

If you need to install or manage different versions of Python in order to run the version this project requires (pipenv will warn you if so), consider pyenv, which pipenv integrates with directly.

Next, activate the Pipenv shell. This loads a virtual environment for running the utility.

$ pipenv shell

Then use this command to install the utility in its virtual environment:

$ pip install --editable .

After that, you should be able to use the utility with the command collectFromFile. If you type that command by itself, the utility will look in the data/from_agency folder for files from the TEA to use as input, and then try to convert them to JSON files in the format used by the map. You can also add a --help flag to read the utility's help feature without doing anything else.

$ collectFromFile --help

There are a few other important options. If you don't already have the files you need from the TEA, you can use the --download flag and the utility will download them from the TEA before converting them to a new format.

$ collectFromFile --download

You can use the -f and -l flags to set the first and last years of the range that the utility should try to process. The current defaults are 2006-2016. This example would change the range to 2012-2015.

$ collectFromFile -f 2012 -l 2015

Output Format

The three options to output the processed data are --json-folders, --csv, and --json. The current version of the map is set up to use data exported using the --json-folders option. That option is also the default, so if you don't include any of these three flags, you get the --json-folders format, which includes nested directories labeled by year, demographic, and punishment. Each JSON file contains the data corresponding to one possible user query.

$ collectFromFile --json-folders
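To make the "one file per user query" idea concrete, here is a small sketch that maps a query to a file path. The abbreviation sets come from the "About the Data" section above. The directory order (year, then demographic, then a file named after the punishment), the `.json` naming, and the `query_path` helper are assumptions for illustration only; check the utility's actual output before relying on them.

```python
from pathlib import Path

# Abbreviations listed in "About the Data"
PUNISHMENTS = {"EXP", "DAE", "ISS", "OSS"}
DEMOGRAPHICS = {"SPE", "ECO", "HIS", "BLA", "WHI",
                "IND", "ASI", "PCI", "TWO", "ALL"}


def query_path(root, year, demographic, punishment):
    """Build the path of the JSON file answering one query.

    Hypothetical layout: <root>/<year>/<demographic>/<punishment>.json
    """
    if demographic not in DEMOGRAPHICS:
        raise ValueError(f"unknown demographic code: {demographic}")
    if punishment not in PUNISHMENTS:
        raise ValueError(f"unknown punishment code: {punishment}")
    return Path(root) / str(year) / demographic / f"{punishment}.json"
```

Validating the codes up front means a typo in a query fails immediately instead of producing a "file not found" error later.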

If you use the --csv flag, instead of JSON files you'll get a collection of nested folders containing CSVs. Each file will have the statistics to populate a map about one type of action taken against one demographic group in one year.

$ collectFromFile --csv

If you use the --json flag, you'll get one big JSON file with all the data (about 7 MB).

$ collectFromFile --json

Website

This project uses leaflet.js and carto.js to render the map. https://carto.com/docs/

txappleseedmap's People

diego-codes codecofee19 mlsintx mateoclarke mscarey gryffs henryhedges rockinrobin714 durs125 jgrenadier ellenastone alpha-tango furuutsuponchisamurai jplaut rebfrank

txappleseedmap's Issues

New Data Available?

Another issue is that this project is now so old that there's another year of data available. If we want, I can do the analysis over with the new data. I've learned a lot since last time, so it should be easier now.
@mscarey

Point locations for charter schools

Currently, charter schools aren't being shown on the map because they don't have districts that can be shown as regions on the map. But they are reported separately in the TEA's data, and starting in 2017 there'll be a column in that data that says whether the row represents a district or a charter. We should either make a separate list of charters (with color-coded disparity data comparable to the map), or else add color-coded point data to a map for each charter school.

An idea for identifying charter schools prior to 2017 would be to take all the unique ID numbers in the TEA dataset, and then subtract all of the ID numbers that match up with a district shape in the shapefile.
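The subtraction idea is a plain set difference. A minimal sketch, assuming both sources have already been reduced to lists of ID values (the function name and the string normalization are hypothetical):

```python
def find_charter_ids(tea_district_ids, shapefile_district_ids):
    """Return TEA IDs that have no matching district shape.

    IDs are compared as stripped strings so that 1902 and "1902 " match.
    """
    tea = {str(i).strip() for i in tea_district_ids}
    shapes = {str(i).strip() for i in shapefile_district_ids}
    return sorted(tea - shapes)
```

Any ID that is missing a shape for some other reason (for example, a district that was renamed or merged) would also land in this list, so the result is a candidate list to review rather than a confirmed list of charters.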

Add test coverage for javascript

I'd like to add some regression coverage for the js as we make changes to the way we process the data to make sure the popup content and display colors are calculated correctly.

duplicate district geometry

It looks like we have multiple copies of the school district shapes in the geojson directory: one copy for each type of disciplinary action. Does that mean if someone visits the map and selects each of the menu options, the identical district shapes will have to be loaded to their browser multiple times? Can we make it so the district shapes only load once, even if the other data reloads when a button is pushed?

Highlight School District Border on Click

This was a feature that @mlsintx suggested that would parallel the effect you want when you use the search.

🐛 Popover should be in front of the student group picker

Right now it shows behind.

Clean up data directory

From @mscarey on Slack:

There's a lot of stuff in the data directory. I'm sure some of it is obsolete, and it's hard for me to know which files actually need to be documented. Is DistrictPercentage2015.csv the source for all the data that's actually in use in the map? I produced some files after that: the ones that came from the "StudentSums2016.ipynb" file on December 12. (Morgan asked for them in an email.) Did any of the December 12 files also get incorporated in the map? There's new data available from the state for 2016, so I could do an update that matches the format of DistrictPercentage2015.csv if that works for everybody.

No data color

There was a concern from Yamanda that the gray #ebeaea (which represents No Data) is too close to #f2f0f7 (which is the 5th color in our 10 color scale going from purple to red). Maybe we could replace the gray with a white. What do y’all think?

Automate annual updates

Appleseed would like to be able to update the map on an annual basis with new data from the TEA, without consulting us. We should think about how we can package all the data-processing code we're writing into a program that's relatively easy to use.

Of course, if the format used to publish the data changes too much, anything we write isn't going to keep working year after year.

evaluate usefulness of color scale

To repeat what I put on the Slack:

Now that I can see the changes I've made to the data files loaded into the map, I wonder if the statistical measure I used for the scale colors is Basically Just a Population Map. I based the map colors on the statistical significance of the racial disparities in each district. But the map looks consistent with the hypothesis that the racial disparities are basically the same everywhere, except that the more highly populated districts have a larger sample size to study, and the larger sample sizes increase the statistical significance of the finding that a disparity exists.

Maybe each color threshold should require a certain percentage disparity in addition to a certain level of statistical significance?
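One way to sketch that combined rule: a district only moves a step away from the middle of the scale if it clears both a significance threshold and a size threshold for that step. The one-standard-deviation-per-step and ten-percentage-points-per-step values below are placeholders chosen for illustration, not values anyone has agreed on, and `color_step` is a hypothetical name.

```python
def color_step(z_score, disparity_pct, pct_per_step=10.0):
    """Return a 0-10 color step where 5 means 'no notable disparity'.

    Step k away from 5 requires BOTH:
      * |z_score| >= k            (statistical significance)
      * |disparity_pct| >= k * pct_per_step   (size of the disparity)
    """
    significance_steps = int(abs(z_score))
    size_steps = int(abs(disparity_pct) // pct_per_step)
    steps = min(significance_steps, size_steps, 5)   # scale has 5 steps each way
    direction = 1 if z_score > 0 else -1
    return 5 + direction * steps
```

Under this rule a large district with a highly significant but small disparity stays near the middle of the scale, which addresses the "population map" concern.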

"About this map/data" section for README and maybe for future about section

Morgan wants to be able to explain how we arrived at values in the map and what they represent. I figured @mscarey would be the best person to help write out a brief of the methodology and document some explanations. We could just do this in the README for now, and potentially make a better about page. This would mainly be to help when they get asked questions.

More maps?

Alternative School Placements
in-school suspensions
out-of-school suspensions
expulsions

There are apparently three more maps that still need to be created, but I'm not sure if we ever figured out what they were.

@codecofee19

The three other maps that would have been created would have covered in-school suspensions, out-of-school suspensions, and expulsions, for the same categories as the existing map. Those categories are all in the CSV that was used to create the map. (It would have been best to show everything in one map, but nobody could figure out how.)

@mscarey

Failing to show "no data" message when population is null

For some of the demographic groups in the earlier years, there's no population data, only data about actions taken against members of those groups. So in the JSON files that get loaded into the map, the "P" fields will be null. One example would be for Asian students in 2006-2007.

The map appropriately shows diagonal stripes covering the whole state, but it doesn't update the popup messages. Instead, it continues to give the popup messages for whatever dataset you were viewing previously.

Feedback from 11/16 Call with Morgan

Changes we discussed for the maps:

1. City names darker if possible
2. instead of "no data" use "Data not available for this student group."
3. in rollover text change to: ___% of ______ ISD's 10957 students are classified as _____. _____ students received out-of-school suspensions ____ times, accounting for ___ % of all out-of-school suspensions.

School District Search Bar

Having a search bar to locate a particular school district

Call to Action Form

Creating a popup form on top of the map that essentially gives the visitor action items:

contact school rep.
contact TexasAppleseed
take other course of action etc.

Update no data popover text

should indicate which district

add option to plot stats for full district populations

One of Appleseed's revision requests was to include statistics for the full district populations, comparing them to the overall statewide outcomes. The CLI already outputs data for that (under "ALL"), but there's no button to select it in the map. We'll also need different popup text to explain the different comparison that's being made in that case.

Update map with new data from 2015-16

Clean up /data directory

We have a messy data directory. We should get rid of the stuff we don't need

What files are we actually using?
What are the "raw" files?
What steps did we use for processing along the way?

data update process to GeoJSON

The process of going from what's in /data/processed/ to what's in the geojson folder isn't on GitHub. We need to get the full automated process into GitHub.

Is "Inequity Level" the best measurement?

The data in the map right now shows a ratio of "1" whenever the racial disparity fails a statistical significance test. That was an awkward compromise to address the problem of outliers in sparsely-populated districts, and I'd feel more comfortable changing it somehow.

Also, the definition of the term "Inequity Level" on the map doesn't explain that aspect of the calculation. To switch to a version of the data that shows the real ratio in those cases, use "ratioDistrictNones.csv" instead of "ratioDistrictSignificant.csv".

The data for "Economically disadvantaged" students is garbage, which is why it was left off of the map.

I'm also not sure about the term "Inequity level". We can't really prove what's equitable or not. I'd prefer something like "Disparity in punishment, compared to districtwide average" or even just "Disparity index".

@mscarey

Change Popup Values

Change the popup so that, instead of raw counts of students, it shows each group's percentage of the student body population.

add drop-down to select data from other years

Currently the map can only show data from the current year. It'll only be practical to fix this if we can store the data in some other format than what we currently have, which is four TopoJSON files kept together in the same directory. It wouldn't make sense to store and load all the different TopoJSON files we'd create if we produced four of them for every year.

more informative popup text

I think we should include the raw numbers in the popup text, not just the percentages. Otherwise readers won't know whether a statistic is about a population of 5 students or 5000. I think this feature should be easy to add after we finish revising the process for loading new data in response to the user's selection.

Currently our format is like this:
In Olney ISD, Latino students received 55.42% of in-school suspensions and represent 36.52% of the student population

I think it should be like this:
In Olney ISD, the NUMBER Latino students received NUMBER (55.42%) of the in-school suspensions and represented 36.52% of the population.

Also, we need to change the popup text to handle erroneous data. We currently have this:
In Baird ISD, White students received 1450% of in-school suspensions and represent 76.74% of the student population

I would change it to this:
The TEA's reports seem to have an error. They indicate that in Baird ISD, the NUMBER White students received 1450% (NUMBER) of the in-school suspensions and represented 76.74% of the population.

The alternative would be not to provide the numbers and just say the data is not available. Actually "not available" might be a fairer choice than blaming the TEA unless I do more to make sure I'm not the one who introduced the errors.
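A minimal sketch of the proposed wording, using the "not available" fallback for impossible percentages. The function and parameter names are hypothetical, and the rule that any percentage outside 0 to 100 counts as erroneous is an assumption:

```python
def popup_text(district, group, group_count, incident_count,
               incident_pct, population_pct,
               punishment="in-school suspensions"):
    """Build popup text with raw numbers, or a fallback for bad data."""
    values_missing = incident_pct is None or population_pct is None
    if values_missing or not 0 <= incident_pct <= 100:
        # Covers cases like the 1450% figure without blaming any source.
        return f"Data is not available for {group} students in {district}."
    return (
        f"In {district}, the {group_count:,} {group} students received "
        f"{incident_count:,} ({incident_pct:.2f}%) of the {punishment} "
        f"and represented {population_pct:.2f}% of the population."
    )
```

Keeping the validity check in one place makes it easy to tighten later, for example by also rejecting incident counts larger than the district total.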

Make the map responsive and viewable on mobile phones

⚠ lower priority says Morgan

map is blocked by data toggling box

Legend and Scaling

I also think the scale on the legend isn't very readable. Maybe we should label the left side with "less than average" and the right with "more than average" to help the readers understand what we're saying. Also, it might be better to change the left part of the scale to a negative percentage, the part that represents "no disparity" to zero (instead of one), and the right part to a positive percentage (but the positive percentage can go higher than 100%).
@mscarey
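The relabeling suggested above amounts to subtracting one from the ratio and expressing the result as a percentage. A short sketch, with hypothetical function names:

```python
def ratio_to_signed_percent(ratio):
    """Convert a disparity ratio (1 = no disparity) to a signed percentage."""
    return (ratio - 1.0) * 100.0


def legend_label(ratio):
    """Return legend text such as '-50% (less than average)'."""
    pct = round(ratio_to_signed_percent(ratio))
    if pct == 0:
        return "0% (no disparity)"
    side = "more than average" if pct > 0 else "less than average"
    return f"{pct:+d}% ({side})"
```

A ratio of 0.5 becomes "-50% (less than average)" and a ratio of 2.5 becomes "+150% (more than average)", which shows why the positive side can exceed 100% while the negative side cannot go below -100%.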

automate download from TEA

Annual updates to the data used for the map involve downloading all 20 of the year's region files from http://rptsvr1.tea.texas.gov/adhocrpt/Disciplinary_Data_Products/Download_Region_Districts.html and pasting them together. This is tedious and it's a part of the process that's likely to generate errors. jgrenadier has looked at automating this with Selenium.
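The "pasting together" step can be scripted separately from the download. The sketch below assumes the region files have already been saved locally as CSVs that share an identical header row; that format, the file handling, and the `combine_region_files` name are assumptions, not a description of the TEA files or of the utility.

```python
import csv


def combine_region_files(region_paths, output_path):
    """Concatenate CSV files with a shared header into one file.

    Returns the number of data rows written. Raises ValueError if any
    file's header differs from the first, instead of silently
    misaligning columns.
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sorted(region_paths):
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Checking the header of every file catches the error-prone case where one region's download has a different layout from the rest.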

Update README

It's out of date... 😿