
maps's Introduction

Crime Map

An easy way to see recent 911 calls made in Northwest Arkansas

Awards

We made this project for the 2019 JB Hunt Hackathon and actually won first prize!

We also submitted this project to the 2019 Congressional App Challenge in our district, and won.

Here is our demonstration video.

Installation

Installation is pretty simple. Clone the repo, then install the Python dependencies with pipenv:

pipenv install

Next, run the scraper with the following sample command:

python -m maps.cli scrape fay 30

or

flask scrape fay 30

which scrapes Fayetteville 911 calls from the past 30 days. Read the Scraper documentation for more info.

Once you've scraped the data, start the Flask server, which lives in run.py.

Configuration

The following variables go in a config.py file inside an instance directory at the root of the repo.

  • BING_MAPS_KEY: An API key for the Bing Maps Geocode Dataflow API. It's required for cities like Springdale, which don't provide coordinates.

  • MAPS_SENTRY_DSN: A DSN for Sentry. If provided, Sentry error reporting is set up.

An example looks like this:

# An API key for the Bing Maps Geocode Dataflow API.
# It's required for cities like Springdale which don't provide coordinates.
BING_MAPS_KEY = "1232134234234242fdsfsfsf"

# A DSN for Sentry. If provided, Sentry error reporting is set up.
MAPS_SENTRY_DSN = "[email protected]/32423432"
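Flask's instance folder is the usual home for a file like this. Assuming the app loads it with app.config.from_pyfile("config.py") on an app created with instance_relative_config=True (an assumption; the repo's setup isn't shown here), the standard-library sketch below mimics that behaviour: it executes the file and keeps only the uppercase names, then treats Sentry as enabled only when a non-empty DSN is present. Both helper names are hypothetical.

```python
import runpy


def load_instance_config(path):
    """Execute a config.py file and keep only its UPPERCASE names.

    A rough stand-in for what Flask's app.config.from_pyfile() does:
    lowercase names and dunders like __name__ are ignored.
    """
    namespace = runpy.run_path(str(path))
    return {key: value for key, value in namespace.items() if key.isupper()}


def sentry_enabled(config):
    # Sentry is optional: only set it up when a non-empty DSN is configured.
    return bool(config.get("MAPS_SENTRY_DSN"))
```

For example, load_instance_config("instance/config.py") on the file above would return a dict with both keys, and sentry_enabled() would be true.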

Website

Screenshot: the main map view

Once the server is up and running, this is what users see. We hope it's pretty clear how to use!

Scraper

The Scraper fetches 911 calls and inserts them into the call database. You can run the scraper like this:

python -m maps.cli scrape fay

or

flask scrape fay

Either command above scrapes the past 24 hours of data. You can also specify how far back to scrape:

python -m maps.cli scrape fay 15

or

flask scrape fay 15

Either command above scrapes the past 15 days.

Available Commands

The following scraping commands are available in the CLI. (Every command is prefixed with the CLI entry point: python -m maps.cli or flask.)

  • scrape fay [days] : Scrape Fayetteville calls; the optional days param (int) controls how many days back to scrape (default: the past 24 hours)
  • scrape spr : Scrape the past 24 hours of Springdale calls
  • scrape wash : Scrape the past few days of Washington County calls
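The project exposes these commands through Flask's CLI (which is built on click), hence flask scrape .... As a hypothetical stand-in, the argparse sketch below shows the same argument shape: a region plus an optional day count defaulting to 1 (the 24-hour default), and how that count could become a cutoff timestamp. build_parser and scrape_cutoff are illustrative names, not the repo's.

```python
import argparse
from datetime import datetime, timedelta

REGIONS = ("fay", "spr", "wash")


def build_parser():
    parser = argparse.ArgumentParser(prog="maps.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    scrape = sub.add_parser("scrape", help="fetch 911 calls into the call database")
    scrape.add_argument("region", choices=REGIONS)
    # Only Fayetteville honours this; Springdale and Washington County
    # always return their own fixed window.
    scrape.add_argument("days", nargs="?", type=int, default=1)
    return parser


def scrape_cutoff(days, now=None):
    """Oldest call time to keep when scraping `days` days back."""
    if now is None:
        now = datetime.now()
    return now - timedelta(days=days)
```

For instance, build_parser().parse_args(["scrape", "fay", "15"]) yields region "fay" and days 15, matching the second Fayetteville example above.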

Supported Regions

  • Fayetteville, Arkansas
  • Springdale, Arkansas (note: doesn't support historical scraping)
  • Washington County, Arkansas (also doesn't support historical scraping)


maps's Issues

City is missing in Bing Maps Response

Sometimes Bing Maps doesn't return the city input when returning a Search by City result.

Sentry Issue: CRIME-MAP-1E

ValueError: not enough values to unpack (expected 8, got 7)
(11 additional frame(s) were not displayed)
...
  File "maps/scraper/washington.py", line 103, in scrape_to_db
    new_calls = geocode_calls(new_calls)
  File "maps/scraper/washington.py", line 19, in geocode_calls
    address_to_geocode = geocode_lookup_city(addresses_cities)
  File "maps/scraper/geocoder/__init__.py", line 41, in geocode_lookup_city
    job_manager.fetch_results()
  File "maps/scraper/geocoder/jobmanager.py", line 166, in fetch_results
    result = Result(row)
  File "maps/scraper/geocoder/jobmanager.py", line 56, in __init__
    self.id, self.address, self.zipOrCityInput, self.state, self.country, self.lat, self.lon, self.city = values
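One defensive option, assuming Bing drops the trailing city field (the traceback shows eight names on the left and only seven values), is to pad missing trailing fields with None and let later code decide what to do with a result that has no city. A sketch follows; parse_result_row and ResultRow are hypothetical names, with the field order copied from the unpacking line in jobmanager.py shown above.

```python
from collections import namedtuple

FIELDS = ("id", "address", "zipOrCityInput", "state",
          "country", "lat", "lon", "city")
ResultRow = namedtuple("ResultRow", FIELDS)


def parse_result_row(values):
    """Unpack a geocoder result row, tolerating missing trailing fields.

    Assumes Bing drops fields from the end of the row; a row with
    too many values is still treated as an error.
    """
    values = list(values)
    if len(values) > len(FIELDS):
        raise ValueError(f"expected at most {len(FIELDS)} values, got {len(values)}")
    values += [None] * (len(FIELDS) - len(values))
    return ResultRow(*values)
```

Rows that come back with city set to None could then be flagged or re-geocoded instead of crashing the whole Washington County scrape.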

Catch more Bing Maps Errors

Looking through Sentry, there seem to be cases where something errors out because of an external server, but it's registered as our error! There are three specific, infrequent spots where this happens.

  • CRIME-MAP-1B: ConnectionError on the initial Springdale request. It may be a good idea to catch ConnectionErrors on all fetches, so a connection failure isn't logged as our fault. Almost nothing is lost by skipping a scrape unless we skip 48 consecutive ones, which is unlikely, and if their server keeps breaking, that's probably not our fault.
  • CRIME-MAP-Z: JSON decoding the Fayetteville data. If the Fayetteville site returns a "this broke on our end" error page in HTML, we get a JSONDecodeError; we should catch those.
  • CRIME-MAP-X: List index out of range when getting job status. Sometimes Bing Maps returns a 404 for a job, saying the job can't be found. Since this happens so infrequently (only 5 events), it probably doesn't hurt to just throw it out.

None of this is really high priority though, since these are all low occurrence errors.
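A minimal sketch of the "skip, don't report" idea for the first two cases. fetch_calls and its fetch_text callable are hypothetical; the sketch catches Python's built-in ConnectionError and json.JSONDecodeError and logs a warning instead of raising.

```python
import json
import logging

log = logging.getLogger(__name__)


def fetch_calls(fetch_text):
    """Run one scrape fetch, skipping it on upstream failures.

    `fetch_text` is a hypothetical zero-argument callable returning the
    raw response body. Returns the decoded JSON, or None if the remote
    side failed (connection drop, or an HTML error page instead of JSON).
    """
    try:
        return json.loads(fetch_text())
    except ConnectionError as exc:
        log.warning("skipping scrape, connection failed: %s", exc)
    except json.JSONDecodeError as exc:
        log.warning("skipping scrape, response was not JSON: %s", exc)
    return None
```

If the scrapers use requests, note that requests.exceptions.ConnectionError does not subclass the built-in ConnectionError, so it would need its own entry in the except clause.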

Instance of "scoped_session" has no "commit" member

For some reason, whenever we use the SQLAlchemy db.session and call a method such as commit on it, pylint complains that the member doesn't exist.

This isn't urgent and doesn't really affect the product itself; it's just weird. If anyone knows why, please share.

Expose database via endpoint

This would let another server (perhaps my Pi) download the database once a day or so.

There's lots of important data! Almost a year's worth!
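Assuming the call database is SQLite (the issue doesn't say), one way to hand out a consistent copy is Python's sqlite3 Connection.backup, which yields a coherent snapshot even if the scraper writes during the copy. An endpoint could then serve the snapshot file, for example with Flask's send_file. snapshot_database is a hypothetical helper.

```python
import sqlite3


def snapshot_database(src_path, dest_path):
    """Copy a live SQLite database to dest_path as a consistent snapshot."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies every page through SQLite's backup API,
        # so the result is a valid, self-consistent database file.
        with dest:
            src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Serving a fresh snapshot rather than the live file avoids handing out a half-written database.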

Date Range Issues

When the start date and the end date are the same day, no results show up, because the query searches 2020-06-12T00:00:00 through 2020-06-12T00:00:00.

Maybe the end date's time should be 23:59:59.
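Rather than an end-of-day time like 23:59:59, which would still miss calls in the final fractional second if timestamps carry sub-second precision, a half-open range avoids the edge entirely: start at midnight of the start date and stop before midnight of the day after the end date. A sketch with hypothetical helper names:

```python
from datetime import datetime, time, timedelta


def day_range(start_date, end_date):
    """Half-open [start, end) datetime bounds covering whole days.

    Using midnight of the day *after* end_date as an exclusive bound
    includes everything on end_date, fractional seconds included.
    """
    start = datetime.combine(start_date, time.min)
    end = datetime.combine(end_date + timedelta(days=1), time.min)
    return start, end


def in_range(timestamp, bounds):
    start, end = bounds
    return start <= timestamp < end
```

In a SQLAlchemy query this would become a filter like Call.timestamp >= start, Call.timestamp < end, where Call and timestamp are assumed names.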

Better icon management

Right now, the only way to add an icon to a call type is to edit the JS file. We should split this out into a separate file, probably JSON, and remove it from git, so we don't have to make a commit every time we add an icon.

It'd be even cooler if there was a web portal or something where icons could easily be assigned, obviously behind an auth wall.
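A minimal sketch of the server side, assuming the mapping moves to a JSON object of call type to icon filename (for example an untracked icons.json; the file name, DEFAULT_ICON, and both helpers are hypothetical). The frontend could fetch the same file instead of hard-coding the table in JS.

```python
import json

DEFAULT_ICON = "default.png"  # hypothetical fallback icon name


def load_icon_map(path):
    """Read a call-type -> icon filename mapping from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        mapping = json.load(fh)
    if not isinstance(mapping, dict):
        raise ValueError("icon map must be a JSON object")
    return mapping


def icon_for(call_type, mapping, default=DEFAULT_ICON):
    # Unknown call types fall back to a generic marker instead of erroring.
    return mapping.get(call_type, default)
```

A web portal for assigning icons would then only need to rewrite this one file.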

Implement use of "confidence level"

The Bing Maps API seems to be able to return a property that shows "the level of confidence that the geocoded location result is a match" (docs). If we can utilize this for our purposes, we would be able to weed out and better handle (maybe raise a warning so we can look into them) calls that are incorrectly geocoded.
We'll have to do some more research to figure out if this is feasible.
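If the research pans out, the filtering itself is simple. The sketch below assumes each result exposes a confidence label of High, Medium, or Low (to be checked against the Dataflow docs) and logs a warning for anything below a chosen minimum; needs_review and flag_low_confidence are hypothetical names.

```python
import logging

log = logging.getLogger(__name__)

# Assumed ordering of Bing's confidence labels.
CONFIDENCE_RANK = {"Low": 0, "Medium": 1, "High": 2}


def needs_review(confidence, minimum="High"):
    """True if a geocode result is below the minimum confidence.

    Unknown or missing labels are treated as lower than any known label.
    """
    rank = CONFIDENCE_RANK.get(confidence, -1)
    return rank < CONFIDENCE_RANK[minimum]


def flag_low_confidence(results, minimum="High"):
    """Log a warning for each (address, confidence) pair worth checking."""
    flagged = [(addr, conf) for addr, conf in results if needs_review(conf, minimum)]
    for addr, conf in flagged:
        log.warning("low-confidence geocode for %r: %s", addr, conf)
    return flagged
```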

More Detailed Date Picker

There should be a more advanced date picker that allows selecting a custom date range.

I’m pretty sure the API only supports a number of days back, so that may have to be updated too.

Geocoder struggles when City is wrong

Springdale, for example, sometimes handles calls that are not inside Springdale (but in neighboring towns like Lowell).

Because they don't provide a city in their data, we are forced to guess Springdale is the city (which is usually right).

However, when it's not, Bing Maps has to make a guess on where in Arkansas we are talking about (see Magnolia Arkansas for an example).

Anyway, the solution would be one of the following:

  1. Instead of the city, send a zip code, and tell Bing to find the closest address to that zip code.
  2. If possible, tell Bing to find the closest valid address to the city provided, though this is less preferable than the zip code method.
  3. If neither of the above works, detect when Bing Maps guesses wrong (e.g. a call 100 miles from Springdale) and route those calls to manual sorting or some other system. This is probably the worst-case scenario.
  4. Anything else that fixes it also works.
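Option 3 could start as a plain distance check. The sketch computes the great-circle (haversine) distance between a geocoded point and the city the call supposedly belongs to, flagging anything past 100 miles. The Springdale coordinates are approximate and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8
# Approximate centre of Springdale, AR (illustrative, not from the project).
SPRINGDALE = (36.19, -94.13)


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def looks_misplaced(point, centre=SPRINGDALE, max_miles=100):
    """Flag a geocode that landed implausibly far from the assumed city."""
    return haversine_miles(point, centre) > max_miles
```

A point near Lowell passes, while one geocoded to Magnolia, Arkansas (roughly 200 miles south) gets flagged for review.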

This isn't the highest priority, since only a small number of calls are affected, but it'll be something we need to tackle at some point, especially if we expand.

Bing Maps Jobs Stall and Cause Errors

So, for some reason, sometimes Bing Maps jobs stall for a few hours. This isn’t great, but what can you do.

Anyway, I’ve scoured the web, and there appears to be no way to terminate a pending job. Because we can only have 3 pending jobs at a time, 3 stalls = lag!

So, honestly, if we get 3 stalls, we’ve just got to wait it out, and they shouldn’t last a whole day (or anywhere close to that), so we can just get the data later.

This also explains the IntegrityErrors we were getting earlier today: a job would start and stall; 30 minutes later, another job would start, process both the new data and the data the original job never finished, and commit it to the DB.

When the stalled job finally finishes, it tries to put its data in the DB, and errors occur. I added some protection against this race condition in 5b30b17: if an IntegrityError occurs, it tries to merge instead, and only if that fails does it raise the error. That hasn't happened yet, so that's good.
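The fix in 5b30b17 presumably relies on SQLAlchemy's merge; that code isn't shown here. As an illustration of the same catch-then-merge pattern using only the standard library, the sketch below uses sqlite3 and a hypothetical calls(id PRIMARY KEY, description) table. SQLite's INSERT ... ON CONFLICT DO UPDATE upsert (SQLite 3.24+) is an alternative that avoids the exception entirely.

```python
import sqlite3


def insert_or_merge(conn, call_id, description):
    """Insert a call; if another job already stored it, update it instead."""
    try:
        # `with conn` commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO calls (id, description) VALUES (?, ?)",
                (call_id, description),
            )
    except sqlite3.IntegrityError:
        # A stalled job finished after a newer one already wrote this row.
        # If this UPDATE fails too, the error propagates to the caller.
        with conn:
            conn.execute(
                "UPDATE calls SET description = ? WHERE id = ?",
                (description, call_id),
            )
```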

Anyway, what needs to be done here is to give up after a certain amount of waiting, maybe 5 minutes or so. I had processes running for hours waiting on stalled jobs, which also used thousands of API requests (not billed and not counted toward our limit, but still!).

Also, if a BingAPIError comes back with a quota error (full error: ["JobUsageQuota: Account already has 3 'Pending' jobs"]), just exit silently, I guess. This is the best solution I can think of.
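A sketch of both ideas, under assumptions: get_status is a hypothetical callable returning Bing's job status string (e.g. "Pending"), the 300-second default matches the "maybe 5 minutes" above, and BingAPIError here is a local stand-in for the project's exception. Injecting clock and sleep keeps the loop testable without real waiting.

```python
import time

QUOTA_MARKER = "JobUsageQuota"


class BingAPIError(Exception):
    """Local stand-in for the project's BingAPIError (hypothetical here)."""


def wait_for_job(get_status, timeout=300.0, interval=10.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll a geocode job until it leaves "Pending", or give up.

    Returns the final status, or None once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status != "Pending":
            return status
        if clock() >= deadline:
            return None
        sleep(interval)


def is_quota_error(exc):
    # The 3-pending-jobs limit shows up as a JobUsageQuota message.
    return QUOTA_MARKER in str(exc)
```

A caller would catch BingAPIError, return quietly when is_quota_error(exc) is true, and re-raise otherwise; a None from wait_for_job means the data gets picked up by a later scrape.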

Add Job Manager Error Handling

Occasionally, Bing Maps errors out when we hit its API. We can't control this, so instead of our program erroring out too, it should first retry, and if that also fails, fail gracefully (basically, don't report to Sentry if it's a Bing Maps internal error).
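A minimal retry wrapper along those lines. BingMapsUnavailable is a hypothetical exception the job manager would raise for errors on Bing's side; anything else propagates as before, so genuine bugs still surface.

```python
import logging

log = logging.getLogger(__name__)


class BingMapsUnavailable(Exception):
    """Hypothetical marker for failures on Bing's side, not ours."""


def call_with_retry(fn, attempts=2, retry_on=(BingMapsUnavailable,)):
    """Call fn(), retrying on upstream errors; give up quietly after `attempts`.

    Returns fn()'s result, or None if every attempt failed with one of the
    `retry_on` exceptions. Other exceptions propagate unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            log.warning("Bing Maps attempt %d/%d failed: %s", attempt, attempts, exc)
    return None
```

With the default of two attempts this is "try it again once", and the final failure becomes a warning rather than an exception.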
