happyjack27 / autoredistrict

Programmatically makes a fair congressional district map (prevents gerrymandering)

License: GNU General Public License v3.0

Shell 0.02% Java 99.49% Batchfile 0.01% CSS 0.28% Hack 0.20%

autoredistrict's People

carlschroedl dsummersl eliasingea crashburnrepeat johnrattz justincohler ishan16d manuelvargas1251 dlederle beastneedsmoretorque turnouttokens zvrablik johnathanbostrom noreaster76

autoredistrict's Issues

districts scattered after loading pop from census, saving, loading

map doesn't refresh when you merge in new data

map should refresh when you merge in new data.
currently you have to switch the district column and switch it back

when loading a new column - needs to clear population

when loading a new column - needs to clear population.
sometimes switching from one column to another and back resets other maps to random. need to close the program and reload the map to reset.

better data source for election data?

one thing i've struggled with a lot is getting good, clean data on vote counts at the voting tabulation district level. I've been able to get clean vtd-level shapefiles and population data from census.gov, but election data seems to be all over the place, varying not only in format but also in quality. e.g. i've found some shapefiles with so many geometric areas that they are unusable.

would like a nice, clean, centralized way to import this data, despite the apparent great variety in source, format, and quality of data depending on state.

suggestion from fairvote.org - more competitiveness stats

In addition to the absolute number of D/R leaning seats, and the summed victory margins that measure competitiveness, it would be great to have totals for the number of safe R seats, safe D seats, and competitive seats. We usually break it down further than that: partisanship 8 percentage points over an electoral threshold defines a safe R or safe D seat, 3 or 4 to 8 points over defines a lean R or lean D seat, and within 3 or 4 points of a threshold is considered a tossup seat. These measures provide a quick way to get a sense of both the overall partisan impact of a plan and its impact on competitiveness.

So, as an example, we would say that a five-seat district with 60% Democratic partisanship (D+10 in PVI) has 3 safe D seats, 1 lean R seat, and 1 safe R seat.

(add to summary stats and by-party stats.)
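To make the safe/lean/tossup breakdown concrete, here is a minimal Java sketch. It assumes that the k-th seat of an n-seat district is won at a vote share of k/(n+1), which is the usual threshold of exclusion and reduces to 50% for a single-member district. It uses the 8-point and 3-point bands from the suggestion and treats the Democratic share as a share of the two-party vote. `SeatClassifier` and its method names are hypothetical, not existing autoredistrict code.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch: rate each seat of a district as safe, lean, or tossup. */
final class SeatClassifier {
    enum Rating { SAFE_D, LEAN_D, TOSSUP, LEAN_R, SAFE_R }

    static final double SAFE = 0.08;   // 8 points past a threshold = safe
    static final double TOSSUP = 0.03; // within 3 points of a threshold = tossup

    /**
     * demShare: Democratic share of the two-party vote, in [0, 1].
     * Assumption: the k-th seat is won at a share above k / (seats + 1).
     */
    static Map<Rating, Integer> classify(double demShare, int seats) {
        Map<Rating, Integer> counts = new EnumMap<>(Rating.class);
        for (Rating r : Rating.values()) counts.put(r, 0);
        for (int k = 1; k <= seats; k++) {
            double threshold = (double) k / (seats + 1);
            // Positive margin: D wins this seat. The R margin is just the negative.
            double margin = demShare - threshold;
            Rating r;
            if (margin >= SAFE) r = Rating.SAFE_D;
            else if (margin >= TOSSUP) r = Rating.LEAN_D;
            else if (margin > -TOSSUP) r = Rating.TOSSUP;
            else if (margin > -SAFE) r = Rating.LEAN_R;
            else r = Rating.SAFE_R;
            counts.merge(r, 1, Integer::sum);
        }
        return counts;
    }
}
```

Under these assumptions, `classify(0.60, 5)` gives 3 SAFE_D, 1 LEAN_R, and 1 SAFE_R, which matches the five-seat example above. Each seat's R margin is the negative of its D margin, so a single loop over the thresholds covers both parties.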

Use gson for JSON functionality

Lots of folks at my work and in the wider software world use Google's gson for (de)serializing to/from JSON. Gson has very sensible, concise default serialization behavior, while still permitting you to customize serialization where you need. Incorporating gson would simplify the existing code base by eliminating most of the serialization logic.

The advantages of gson over current checked-in dependencies are similar to those outlined in #14.

it has automated tests, and by virtue of being widely-used, is battle-tested against many corner cases
many developers are already familiar with it, so it would lower the barrier to contribution
the project publishes their artifacts in maven central so if we move to a formal dependency management solution like maven, gradle, or ivy, we could stop checking in the jar

What do you think?

Add "how to build" documentation

I'm so coddled by formal dependency management tools like maven that I don't even know where to start building this. Could you provide some advice? I need it to verify the size reduction issue.

in-app explanation of elitism

was asked about it, seems like something a lot of people would wonder about.

it'd be nice to have this explained in-app somewhere.

this was my answer:

What does the % elitism slider do?

https://www.researchgate.net/post/What_is_meant_by_the_term_Elitism_in_the_Genetic_Algorithm

"Elitism involves copying a small proportion of the fittest candidates, unchanged, into the next generation. This can sometimes have a dramatic impact on performance by ensuring that the EA does not waste time re-discovering previously discarded partial solutions. Candidate solutions that are preserved unchanged through elitism remain eligible for selection as parents when breeding the remainder of the next generation."

So basically it takes a small fraction of the best ones and copies them over unchanged to the next generation, so they are essentially your immortals. Everyone else only lasts 1 generation.

i'd suggest just leaving that slider where it starts, around 25%. no need to ever touch it, really.

in the more recent versions there should also be a "% elites mutated" slider; in the earlier versions it's just a checkbox, mutate elites or not. Notice that in the description above, the elites remain unchanged between generations. With mutate elites selected, the elites will slowly mutate along with the rest of the population. This helps it search a little faster, but when it gets down to fine-tuning, where you only want the very best, you want to turn this off, as otherwise you'd just be hovering around the best.
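Here is a minimal Java sketch of one generation with elitism. The names (`Elitism`, `nextGeneration`), the 2-way tournament used to pick parents, and the assumption that higher fitness is better are all illustrative. None of it is taken from autoredistrict's actual code. Elite mutation (the checkbox or slider) is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.BinaryOperator;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of one GA generation with elitism; not autoredistrict's code. */
final class Elitism {
    static <T> List<T> nextGeneration(List<T> population,
                                      ToDoubleFunction<T> fitness, // assumption: higher is better
                                      double elitism,              // e.g. 0.25 for ~25%
                                      BinaryOperator<T> crossover,
                                      UnaryOperator<T> mutate,
                                      Random rng) {
        List<T> ranked = new ArrayList<>(population);
        ranked.sort(Comparator.comparingDouble(fitness).reversed()); // best first
        int size = ranked.size();
        int eliteCount = (int) Math.round(size * elitism);

        // Elites are copied over unchanged: the "immortals".
        List<T> next = new ArrayList<>(ranked.subList(0, eliteCount));

        // Every other slot is a fresh offspring; elites stay eligible as parents.
        while (next.size() < size) {
            T a = tournament(ranked, rng);
            T b = tournament(ranked, rng);
            next.add(mutate.apply(crossover.apply(a, b)));
        }
        return next;
    }

    /** 2-way tournament: ranked is sorted best-first, so the lower index wins. */
    private static <T> T tournament(List<T> ranked, Random rng) {
        int i = rng.nextInt(ranked.size());
        int j = rng.nextInt(ranked.size());
        return ranked.get(Math.min(i, j));
    }
}
```

The top `elitism` fraction survives untouched. Everyone else lasts exactly one generation.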

add ability to copy a data column to a new column

would be a real quick update, and quite useful.
in one of the menus, maybe file menu. select which column to copy, type in name for new column, make it.

outreach / syndication

not directly related to the software, but related to the website.
need to work on getting the word out. right now looking at blog/content syndication.

so far what strikes me is Google+ and RSS: http://www.skilledup.com/articles/blog-syndication

and there's also this: https://www.quicksprout.com/the-complete-guide-to-building-your-blog-audience-chapter-8/ but i'm not happy about "cost-per-click". just want to get the entries on google blog search, really.

feature to lock together tabulation districts?

maybe a feature to lock together tabulation districts so they act as 1 tabulation district?

would go under the communities of interest menu.

mixed feelings about the idea because the more communities of interest there are, the less fair the map is. also it can be abused to gerrymander which violates rule #1.
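If this does get added, one simple way to make locked tabulation districts act as a single unit is a union-find (disjoint-set) structure. Locking two units merges their sets. Before scoring, every unit is then forced into its group representative's district. This is a hypothetical Java sketch: `UnitLocks`, `lock`, and `enforce` are illustrative names, and units are assumed to be indexed 0..n-1.

```java
/** Hypothetical sketch: lock units together with union-find so they act as one unit. */
final class UnitLocks {
    private final int[] parent;

    UnitLocks(int unitCount) {
        parent = new int[unitCount];
        for (int i = 0; i < unitCount; i++) parent[i] = i; // every unit starts alone
    }

    /** Representative of the locked group containing unit i (with path halving). */
    int find(int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /** Lock units a and b (and everything already locked to them) together. */
    void lock(int a, int b) {
        parent[find(a)] = find(b);
    }

    /** Give every unit the district of its group's representative. */
    void enforce(int[] districtOfUnit) {
        for (int i = 0; i < districtOfUnit.length; i++) {
            districtOfUnit[i] = districtOfUnit[find(i)];
        }
    }
}
```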

seats/votes curve window improvements

2x antialias (draw at twice the size on a buffer, then shrink via interpolation)
make bigger?
combine ranked districts window and seats-votes curve window into one?
show partisan gerrymandering measures?
** wang median - mean (x @ y = 0.5 - 0.5)
** grofman / king asymmetry ( y @ x = 0.5 - 0.5)
** baas asymmetry (total shaded area)
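Here is a sketch of the first two measures, computed from per-district Democratic vote shares. It builds the seats-votes curve by assuming a uniform partisan swing. It uses the unweighted average of the district shares as the "mean"; using the statewide popular vote instead would be a one-line change. The class and method names are hypothetical, and the Baas shaded-area measure is not shown.

```java
import java.util.Arrays;

/** Hypothetical sketch of two partisan-symmetry measures from district vote shares. */
final class SymmetryMeasures {
    static double mean(double[] shares) {
        return Arrays.stream(shares).average().orElse(Double.NaN);
    }

    static double median(double[] shares) {
        double[] s = shares.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /** Wang-style measure: median district share minus mean district share. */
    static double medianMinusMean(double[] shares) {
        return median(shares) - mean(shares);
    }

    /** Seat share won if every district swings uniformly until the mean equals statewideVote. */
    static double seatsAtVote(double[] shares, double statewideVote) {
        double swing = statewideVote - mean(shares);
        long won = Arrays.stream(shares).filter(s -> s + swing > 0.5).count();
        return (double) won / shares.length;
    }

    /** Seats-votes asymmetry: seat share at 50% of the vote, minus 0.5. */
    static double asymmetryAtHalf(double[] shares) {
        return seatsAtVote(shares, 0.5) - 0.5;
    }
}
```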

jcom doesn't work - complains mix of 32-bit with 64-bit

jcom doesn't work - complains mix of 32-bit with 64-bit
i know it works on its own, because i tested it outside of autoredistrict.
in any case, might need to look for a different lightweight export-to-Excel solution.

make the map prettier?

my maps aren't as pretty as ones i've seen on e.g. web-based GIS. not sure why.

Remove the weird char stopping compilation

The degree symbol we discussed earlier.

lines don't connect exactly

due to the way shapefiles store data, vertices aren't shared between lines, nor are lines shared between polygons.

would be nice to be able to post-process the geometry data from the shapefile to make lines shared, so could get rid of the tiny white spaces.

also then would need to redo the polygon vertex count reduction code, so that it would be a line segment vertex count reduction, for each shared border / edge.
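One common way to do this post-processing is to snap every vertex to a grid whose cell size is a small tolerance. Nearly identical vertices from neighboring polygons then get the same key, and each edge can be keyed by its unordered pair of snapped endpoints. An edge that shows up in two polygons is a shared border. This is a hypothetical Java sketch: `SharedEdges` is an illustrative name. It assumes projected coordinates and assumes both neighbors have vertices at the same corners (no T-junctions). Points that straddle a grid-cell boundary can still fail to merge, so treat it as a starting point rather than a robust topology builder.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: find shared borders by snapping vertices to a tolerance grid. */
final class SharedEdges {
    /** Snap a point {x, y} to integer grid cells of size tolerance. */
    static long[] snap(double[] p, double tolerance) {
        return new long[] {Math.round(p[0] / tolerance), Math.round(p[1] / tolerance)};
    }

    /** Undirected edge key: endpoints stored in lexicographic order. */
    static List<Long> edgeKey(long[] a, long[] b) {
        boolean aFirst = a[0] < b[0] || (a[0] == b[0] && a[1] <= b[1]);
        return aFirst ? List.of(a[0], a[1], b[0], b[1]) : List.of(b[0], b[1], a[0], a[1]);
    }

    /** rings.get(i) is polygon i's ring of {x, y} points; returns edge -> polygon count. */
    static Map<List<Long>, Integer> countEdges(List<double[][]> rings, double tolerance) {
        Map<List<Long>, Integer> counts = new HashMap<>();
        for (double[][] ring : rings) {
            for (int i = 0; i < ring.length; i++) {
                long[] a = snap(ring[i], tolerance);
                long[] b = snap(ring[(i + 1) % ring.length], tolerance);
                // Skip zero-length edges (e.g. a ring that repeats its first point at the end).
                if (a[0] == b[0] && a[1] == b[1]) continue;
                counts.merge(edgeKey(a, b), 1, Integer::sum);
            }
        }
        return counts; // count 2 = border shared by two polygons
    }
}
```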

add to file menu: open recent?

ability to open recently opened shapefiles, add to file menu?

Enable a Continuous Integration Service

Weee! Let's try this in Agile User Story Format:
As a X I want Y so that Z.

As a maintainer of autoredistrict I want to enable a continuous integration service like travis ci so that I know that the project's master branch currently builds in a clean-slate environment.

As a maintainer of autoredistrict reviewing a pull request I want a continuous integration service to test if the pull request breaks the build so that I can refrain from looking at it too closely until it builds.

As a contributor to autoredistrict I want a continuous integration service to test if my pull request breaks the build so that I receive automatic feedback on whether my contributions would break the build and can react accordingly without waiting for maintainers to pull down my proposed pull request and run the tests themselves.

per-election imputation (substitute columns)

right now you can pick up to 3 elections, and 1 substitute election (for imputation).

instead it should be able to pick a substitute column for each election

also double-check that i'm really averaging them together.
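Here is a hypothetical sketch of per-election substitution followed by averaging. It assumes each election column and its own substitute column are per-unit arrays with `NaN` marking missing data. It also assumes the averaged values are comparable across elections (e.g. vote shares rather than raw counts). `Imputation.impute` is an illustrative name.

```java
/** Hypothetical sketch: per-election substitute columns, then averaging across elections. */
final class Imputation {
    /**
     * primary[e][u]: value of election e in unit u, or NaN if missing.
     * substitute[e][u]: the substitute column chosen for election e (may also be NaN).
     * Returns, per unit, the average over the elections that have a value after substitution.
     */
    static double[] impute(double[][] primary, double[][] substitute) {
        int units = primary[0].length;
        double[] result = new double[units];
        for (int u = 0; u < units; u++) {
            double sum = 0;
            int n = 0;
            for (int e = 0; e < primary.length; e++) {
                // Fall back to this election's own substitute column.
                double v = Double.isNaN(primary[e][u]) ? substitute[e][u] : primary[e][u];
                if (!Double.isNaN(v)) {
                    sum += v;
                    n++;
                }
            }
            result[u] = n > 0 ? sum / n : Double.NaN; // really an average, not a sum
        }
        return result;
    }
}
```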

Reduce repository size

Github has a soft repo size limit of 1GB. Though this is a small project, this repository is already approaching the soft limit. Contributors to large file size include checked-in:

dependencies (jars)
data (.zip, shapefiles, etc)
compiled artifacts (the autoredistrict jar)
release artifacts (autoredistrict.zip on the GH releases entries)

I have experience with tooling that we can use to address all four. We could resolve these issues by...

using formal dependency management tools like maven, gradle, ivy, etc. I have used maven on more than 50 projects. If you would prefer a different tool, I'd consider learning it.
moving the data files to git-lfs, or hosting them on an ftp server. Though git-lfs is new, I'm using it on a fledgling work project, so I understand some of the pitfalls.
Using formal dependency management tools like maven, we can upload compiled artifacts to publicly accessible locations so that others can download the autoredistrict jar for purposes of running it or re-using it from other code. I do this at work for all of our java projects.
If you want to include autoredistrict.zip in your github releases, we can attach it automatically when we make the tag by running Travis CI. I've used Travis CI before. I haven't used the github releases deployment step, but it looks easy.

What do you think?

better / smarter annealing

would like a better mutation rate control, and maybe some stopping criteria.
would also like it to be smarter, such as knowing when to stop mutating elites (for fine tuning), when to force mutation of disconnected pieces. stuff like that. to try to make it as good as a human.

clean obsolete items from evolution menu

clean obsolete items from evolution menu - the annealing stuff

a graphical display for racial vote dilution

would be nice to have a graphical display for racial vote dilution.
what i'm thinking is just show a vertical line for the average wasted vote rate,
and then bars on either side, for each ethnicity, showing how much higher or lower it is than the average.

rename and reorder fairness sliders

add slider for anneal rate, anneal floor

add a slider to control annealing rate, and annealing floor.

...or for the floor thing, instead of having a floor, just force the mutation rate to anneal exactly
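Here are two hypothetical mutation-rate schedules in Java. The first is exponential decay clamped at a floor, matching the anneal rate and anneal floor sliders. The second is one possible reading of "anneal exactly": the rate interpolates geometrically from its starting value to a target over a fixed number of generations, with no floor. Both that interpretation and the method names are assumptions.

```java
/** Hypothetical sketch of annealed mutation-rate schedules. */
final class MutationSchedule {
    /** Exponential decay by decayPerGen each generation, never dropping below floor. */
    static double withFloor(double initial, double decayPerGen, double floor, int generation) {
        return Math.max(floor, initial * Math.pow(decayPerGen, generation));
    }

    /**
     * No floor: the rate goes from initial to target over totalGens generations
     * (geometric interpolation), then stays at target. Assumes totalGens > 0.
     */
    static double exact(double initial, double target, int totalGens, int generation) {
        double t = Math.min(1.0, (double) generation / totalGens);
        return initial * Math.pow(target / initial, t);
    }
}
```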

some way to visualize the population stats

some way to visualize the relative scores on the metrics for all members of the population.

the graph shows a historical view. this would be more like a lateral view.

to clarify by "population" i mean the current candidate maps.

fairvote.org suggestion - add some by ethnicity stats

add by-ethnicity stats that would treat ethnicities like parties, and calculate winners based on demographics (weighted by total vote count) instead of votes.

this is from the fairvote.org policy analyst:

Although I agree with you that using majority-minority districts to achieve minority representation is not without serious drawbacks, in the law, the measures of ethnic fairness are strictly about results and the ability to elect an in-group candidate. To be consistent with section 2 of the VRA we try to ensure that plans we propose would at least maintain the current level of likely minority representation. Additionally, with lower thresholds, the consequences of doing this in multi-seat districts can be less severe.

All that said, it would be great to have a measure of the number of seats voters of different ethnic groups would be likely to win in the stats window (assuming ethnically cohesive voting). It could be seats where their proportion of the population is over the threshold, or within some % of the threshold, or both.

Even better would be an additional fairness criteria slider that allowed you to try and maximize the proportionality of ethnic representation. I know you may have some qualms about this, but I just thought I would make my pitch.

make it so you can run it gui-less

make it so you can run it headless, by just specifying a bunch of stuff from the command line.

add button to aggregate census data from census.gov

add ability to export to html page

on the stats page i have html buttons that just put html into the clipboard.

I want to advance that a lot - instead of that, have it create an .html file in the /autoredistrict_data/[state]/ directory, and then open that up in a web browser.

there'd be a style sheet in the /resources folder.

the export would also export some images, like the district maps, the seats-votes curve, etc. would look real pretty

also the web page could have a link to the data file.

oh- and have it say "made with auto-redistrict" and link to the website. :-)

stats panel doesn't refresh when there's no vote count data

Use geotools for shapefile functionality

We depend on geotools and a lot of tools that depend on geotools at the USGS. It's a stable, well-tested, actively-developed, and widely-used geospatial library. We can import only the shapefile functionality into autoredistrict.

Docs on the shapefile module:
http://docs.geotools.org/latest/userguide/library/data/shape.html

Advantages of geotools over current checked-in dependencies:

it has automated tests, and by virtue of being widely-used, is battle-tested against many corner cases
many GIS-savvy developers are already familiar with the geotools APIs, so it would lower the barrier to contribution
geotools has official support for a very large list of geospatial data formats that we could add autoredistrict support for in the future
the geotools project publishes their artifacts in maven central so if we move to a formal dependency management solution like maven, gradle, or ivy, we could stop checking in the jar

suggestion from fairvote.org: better control over contiguity

A check box to mandate full contiguity from the beginning when making a map. Would be nice to not have to worry about getting to full contiguity, since I think it is mandated anyway for congressional districts.

my response:
3. not as simple as what you say. the goal is, yes, to end up with fully contiguous districts. but the way the genetic algorithm works, when you recombine two fully contiguous maps, you're not guaranteed to get a contiguous result. i added the "mutate disconnected" options to help get rid of the discontinuities. i'll think about how to improve this, maybe add a "reject discontinuous" option. either way, this dramatically slows down the search and limits exploration, so it's generally better not to enforce this rule until the very end. i'll make an enhancement ticket, though.

also maybe add an in-app explanation about this.
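For a "reject discontinuous" option, the core check is a flood fill per district. This hypothetical Java sketch assumes units indexed 0..n-1, an adjacency list, and a district number per unit. It returns false as soon as any district turns out to have a second, disconnected piece. A candidate map that fails the check could then be rejected or penalized.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: is every district a single connected piece? */
final class Contiguity {
    /** neighbors.get(u): units adjacent to u; district[u]: u's district in [0, districtCount). */
    static boolean allContiguous(List<int[]> neighbors, int[] district, int districtCount) {
        int n = district.length;
        boolean[] seen = new boolean[n];
        boolean[] districtStarted = new boolean[districtCount];
        for (int start = 0; start < n; start++) {
            if (seen[start]) continue;
            int d = district[start];
            // An unvisited unit in a district we already flooded = disconnected piece.
            if (districtStarted[d]) return false;
            districtStarted[d] = true;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            seen[start] = true;
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : neighbors.get(u)) {
                    if (!seen[v] && district[v] == d) { // stay inside district d
                        seen[v] = true;
                        queue.add(v);
                    }
                }
            }
        }
        return true;
    }
}
```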

can't outline districts

because of the way districts are done - as collections of atoms. coloring them is easy enough, but...

to outline them, have to find all the edges that have a different district on the other side, and then only draw those lines.

some sort of draw mode to draw outlines of districts.
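Given a table of shared edges that records the unit on each side, the outline rule described above becomes a simple filter. In this hypothetical Java sketch, `edges[i]` holds the two units bordering edge i. Using -1 for the map's exterior is an assumption about how outside edges would be marked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only the edges that separate two different districts. */
final class DistrictOutlines {
    /**
     * edges[i] = {unitA, unitB}: the units on either side of edge i (-1 = outside the map).
     * Returns the indices of the edges to draw as district outlines.
     */
    static List<Integer> outlineEdges(int[][] edges, int[] district) {
        List<Integer> toDraw = new ArrayList<>();
        for (int i = 0; i < edges.length; i++) {
            int a = edges[i][0], b = edges[i][1];
            boolean exterior = a < 0 || b < 0; // map border is always an outline
            if (exterior || district[a] != district[b]) toDraw.add(i);
        }
        return toDraw;
    }
}
```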

add ability to compare stats of two maps

would just use two different columns as data sources.
would show the stats for each one somehow, and which is better
maybe a way to export this comparison to html

switch normalization to rank-based
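Here is a hypothetical sketch of rank-based normalization in Java. Each raw score for one metric across the current population of candidate maps is replaced by its rank, scaled to [0, 1]. Tied scores share the average of their ranks. Whether a low or high rank counts as "better" depends on the metric, so that direction is left to the caller.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: replace raw metric scores with normalized ranks in [0, 1]. */
final class RankNormalize {
    /** Lowest score maps to 0, highest to 1; ties get the average of their ranks. */
    static double[] normalize(double[] scores) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> scores[idx]));
        double[] out = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over a run of tied scores.
            while (j + 1 < n && scores[order[j + 1]] == scores[order[i]]) j++;
            double avgRank = (i + j) / 2.0;
            for (int k = i; k <= j; k++) out[order[k]] = n > 1 ? avgRank / (n - 1) : 0.0;
            i = j + 1;
        }
        return out;
    }
}
```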

maybe make switching between single-member and multi-member districts turn on/off criteria

single member districts should use partisan symmetry and not proportionalness
and conversely multi-member should use proportionalness and not partisan symmetry
maybe have those sliders grey out and be treated as zero when you switch between single and multi member districts?

in app help - add tooltips

add tooltips on everything.

add ability to convert from state planes to gps coordinates

add ability to convert from state planes to gps coordinates

more map coloring modes

more map coloring modes
and clearly delineate which ones are per-district coloring and which ones are per-vtd coloring

new coloring modes:

per-district racial composition
per-district victory margin

able to see more maps at once

right now you can see 0,1, or 4 maps at once. would be nice (and pretty easy) to up that to 9, 16. maybe with scrolling can add more...

Getting more done in GitHub with ZenHub

Hola! @happyjack27 has created a ZenHub account for the happyjack27 organization. ZenHub is the leading team collaboration and project management solution built for GitHub.

How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

Real-time, customizable task boards for GitHub issues;
Burndown charts, estimates, and velocity tracking based on GitHub Milestones;
Personal to-do lists and task prioritization;
“+1” button for GitHub issues and comments;
Drag-and-drop file sharing;
Time-saving shortcuts like a quick repo switcher.

Add ZenHub to GitHub

Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @happyjack27.

deduplicate jcom jars

There are two nearly identical jars in the project:
jcom.jar
and
src/excel/jcom_new.jar

;)

I listed the contents of both jars by running:

unzip -l src/excel/jcom_new.jar > jcom_new.jar.list
unzip -l jcom.jar > jcom.jar.list

Then printed the column headers and the associated diff by running:

head -n 3 jcom.jar.list | tail -n 2
diff jcom.jar.list jcom_new.jar.list

Which yields:

Length      Date    Time    Name
---------  ---------- -----   ----
1c1
< Archive:  jcom.jar
---
> Archive:  src/excel/jcom_new.jar
24c24
<      3452  2004-05-31 02:42   jp/ne/so_net/ga2/no_ji/jcom/IDispatch.class
---
>      4387  2013-09-05 14:12   jp/ne/so_net/ga2/no_ji/jcom/IDispatch.class
41c41
<     67220                     36 files
---
>     68155                     36 files

So, the files within the jars are nearly all the same except for IDispatch.class, a bigger version of which was added to the new jar in 2013. Any idea which one we prefer?

add some summary statistics

statistics to add:

next-most popular compactness metric (which is pretty good): sum of population weighted distances from all units in a district to a central unit in that district. use centroids of vtd's vs centroid of district.
sam wang's proposed measure: the distance of the median district's partisanship from the mean (the mean being the total popular vote). (this is not as good as seats-votes asymmetry, but minimizing seats-votes asymmetry will also minimize this measure.)
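Here is a hypothetical Java sketch of the population-weighted distance compactness measure. It assumes unit centroids are in a projected (planar) coordinate system. It uses each district's population-weighted centroid as its center; measuring to a central unit instead, as the item also mentions, would be a small change. Lower totals mean more compact districts.

```java
/** Hypothetical sketch of the population-weighted distance compactness measure. */
final class WeightedDistance {
    /**
     * x[u], y[u]: centroid of unit u (planar coordinates assumed); pop[u]: population;
     * district[u]: district of u. Returns, per district, the sum over its units of
     * pop[u] * distance(unit centroid, district's population-weighted centroid).
     */
    static double[] score(double[] x, double[] y, double[] pop, int[] district, int districtCount) {
        double[] sx = new double[districtCount];
        double[] sy = new double[districtCount];
        double[] sp = new double[districtCount];
        for (int u = 0; u < x.length; u++) { // accumulate weighted sums per district
            int d = district[u];
            sx[d] += pop[u] * x[u];
            sy[d] += pop[u] * y[u];
            sp[d] += pop[u];
        }
        double[] total = new double[districtCount];
        for (int u = 0; u < x.length; u++) {
            int d = district[u];
            if (sp[d] == 0) continue; // empty district: leave its score at 0
            double cx = sx[d] / sp[d], cy = sy[d] / sp[d];
            total[d] += pop[u] * Math.hypot(x[u] - cx, y[u] - cy);
        }
        return total;
    }
}
```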

districts sometimes reset to blank

districts sometimes reset to blank (0)
may be when a non-numeric or non-integer district column was selected.

add images to html export from stats panel

Achieve 20% test coverage

We need to ensure that the app works and continues to work as it grows. Insert all normal reasons for testing here. We could do 20% in terms of line coverage, or branch coverage, either is good with me; this is just a milestone on a march to a more stable codebase. Should probably use JUnit to run the tests. I'm not sold on any particular tool to measure coverage. I've heard coveralls is cool.

ability to deagg/re-agg in a different shapefile

could go from vtd to election districts resolution or vice-versa.

would need an option to accumulate values or use majority vote.

connect multi-vtd islands

detect islands that aren't connected to the biggest region, and connect them (make the closest vtd's neighbors)

i'd say start with the biggest block, connect up the closest unconnected one, and spread out from there (always the closest unconnected one to the biggest block).

maybe this feature could be turned on or off.
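Here is a hypothetical Java sketch of the greedy approach described above. It finds connected components and starts from the largest one. It then repeatedly links the closest unit outside that region to the closest unit inside it, absorbing that unit's whole island. It assumes mutable adjacency lists and unit centroids in planar coordinates. The search is O(n^2) per added link, which is fine for a sketch but likely too slow for large states without a spatial index.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: attach islands to the largest connected region, closest first. */
final class IslandConnector {
    /** adj: mutable adjacency lists; x, y: unit centroids. Returns the added {a, b} links. */
    static List<int[]> connect(List<List<Integer>> adj, double[] x, double[] y) {
        int n = x.length;
        List<int[]> added = new ArrayList<>();
        int[] comp = components(adj, n);
        int biggest = largest(comp, n);
        boolean[] inMain = new boolean[n];
        for (int u = 0; u < n; u++) inMain[u] = comp[u] == biggest;
        while (true) {
            int bestIn = -1, bestOut = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int a = 0; a < n; a++) {
                if (!inMain[a]) continue;
                for (int b = 0; b < n; b++) {
                    if (inMain[b]) continue;
                    double d = Math.hypot(x[a] - x[b], y[a] - y[b]);
                    if (d < best) { best = d; bestIn = a; bestOut = b; }
                }
            }
            if (bestOut < 0) return added; // nothing left outside: fully connected
            adj.get(bestIn).add(bestOut);  // make the two closest units neighbors
            adj.get(bestOut).add(bestIn);
            added.add(new int[] {bestIn, bestOut});
            int c = comp[bestOut];
            for (int u = 0; u < n; u++) if (comp[u] == c) inMain[u] = true; // absorb island
        }
    }

    /** Label connected components with breadth-first search. */
    static int[] components(List<List<Integer>> adj, int n) {
        int[] comp = new int[n];
        Arrays.fill(comp, -1);
        int next = 0;
        for (int s = 0; s < n; s++) {
            if (comp[s] >= 0) continue;
            Deque<Integer> q = new ArrayDeque<>();
            q.add(s);
            comp[s] = next;
            while (!q.isEmpty()) {
                int u = q.poll();
                for (int v : adj.get(u)) if (comp[v] < 0) { comp[v] = next; q.add(v); }
            }
            next++;
        }
        return comp;
    }

    /** Label of the component with the most units. */
    static int largest(int[] comp, int n) {
        int[] size = new int[n];
        int best = 0;
        for (int c : comp) if (++size[c] > size[best]) best = c;
        return best;
    }
}
```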

add ability to lock vtd's together?

could help with islands and such.

make them act as 1 vtd.

worry about this though, because it can be abused (to gerrymander)

better to do this automatically so it can't be abused

(copied from duplicate:)