
ais's Introduction

Address Information System (AIS)

AIS provides a unified view of City data for an address.

API usage documentation

Goals

  • Simplify relationships between land records, real estate properties, streets, and addresses
  • Provide a way of standardizing addresses citywide
  • Support applications that require geocoding and address-based data lookups
  • Provide a feedback mechanism for continually improving parity between department datasets
  • Deprecate legacy systems for geocoding and address standardizing

Components

  • geocoder
  • address standardizer (see passyunk)
  • integration environment for address-centric data
  • API

Production Processes

This AIS repository is used in two distinct ways.

  1. Built into a Docker container and pushed to AWS ECR, where it runs in ECS. This is done using the 'build-test-compose.yml' docker-compose file and should happen programmatically via the GitHub Actions file build_and_deploy.yml. If the action fails, you can troubleshoot manually by following the steps laid out in build_and_deploy.yml on our production AIS build server.

  2. Installed and run directly on our production AIS build server, where we run various build engine scripts to create database tables from various sources. These tables are then pushed up to our AWS RDS PostgreSQL instances for use by our ECS cluster.

Development

To develop locally:

  1. git clone https://github.com/CityOfPhiladelphia/ais
  2. cd ais
  3. Create and activate a virtualenv.
  4. pip install -r requirements.txt. You may have to work through installing some dependencies by hand, especially on Windows.
  5. Copy Passyunk data files. See README for more instructions.
  6. Create an empty file at /ais/instance/config.py. To run engine scripts, you'll need to add a dictionary called DATABASE to this file, mapping database names to connection strings. (TODO: commit sample instance config)
  7. Rename .env.sample in the root directory to .env and add real environment settings. (TODO: commit .env.sample)
  8. honcho start. This will start serving over port 5000. Note that this is blocked on CityNet, so you'll have to be on a public network to access http://0.0.0.0:5000.
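
For step 6, a minimal sketch of what /ais/instance/config.py might contain. Only the DATABASE dictionary name comes from the instructions above; the "engine" key and the connection string format are placeholders assumed for illustration, not the project's real values.

```python
# instance/config.py (sketch; the key and connection string are placeholders)

# Maps a database name to its connection string. The name "engine" and the
# PostgreSQL URL shape are assumptions, not values taken from the project.
DATABASE = {
    "engine": "postgresql://<user>:<password>@<host>:5432/<dbname>",
}
```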

Docker Container Dev

For building the docker container, you'll need some environment/build arg variables first. Copy the example .env file used with docker-compose and populate it:

  1. cp env.example .env && chmod +x .env
  2. Populate it, setting the $ENGINE_DB_HOST var to your database CNAME or IP. Note that in our build and deploy process at citygeo this is done automatically and is not needed.

Check to make sure docker-compose is populating your args:

  1. docker-compose -f build-test-compose.yml config

Note that you may need to set ENGINE_DB_HOST to a direct IP instead of a CNAME to get it working in-office. Now run the 'pull-private-passyunkdata.sh' script to download the CSVs needed in the Dockerfile.

  1. chmod +x pull-private-passyunkdata.sh; ./pull-private-passyunkdata.sh

Then build and start the container.

  1. Via docker-compose: docker-compose -f build-test-compose.yml up --build -d
  2. Directly:
docker build -t ais:latest .
docker run -itd --name ais -p 8080:8080 -e ENGINE_DB_HOST=$ENGINE_DB_HOST -e ENGINE_DB_PASS=$ENGINE_DB_PASS ais:latest

If the container could successfully contact the DB then it should stay up and running. You may now run tests to confirm functionality.

  1. docker exec ais bash -c 'cd /ais && . ./env/bin/activate && pytest /ais/ais/api/tests/'

Testing

The API and the Engine can be tested separately using pytest after sourcing the virtual environment (venv).

Important note: to run pytest against a locally running database, either set ENGINE_DB_HOST and ENGINE_DB_PASS to the local instance and its password, or export DEV_TEST='true' to use the local credentials specified in your .env file automatically.

export DEV_TEST='true'
pytest $WORKING_DIRECTORY/ais/tests/engine -vvv -ra --showlocals --tb=native --disable-warnings --skip=$skip_engine_tests 

pytest $WORKING_DIRECTORY/ais/tests/api -vvv -ra --showlocals --tb=native --disable-warnings --skip=$skip_api_tests 

To make direct queries using AIS, you can run the following on the dev box:

source ~/.env/bin/activate
export DEV_TEST='true'
gunicorn application --bind 0.0.0.0:8080 --workers 4 --worker-class gevent --access-logfile '-' --log-level 'notice'
curl localhost:8080/search/1234%20Market%20Street | jq .
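
The same query can be issued from Python. This is a minimal standard-library sketch, assuming the gunicorn server above is listening on localhost:8080; the helper names search_url and search are hypothetical and not part of AIS.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # assumes the gunicorn command above is running


def search_url(query: str) -> str:
    """Build a /search URL, percent-encoding spaces and other unsafe characters."""
    return f"{BASE_URL}/search/{quote(query, safe='')}"


def search(query: str) -> dict:
    """Fetch and decode the JSON response for a query (needs a running server)."""
    with urlopen(search_url(query)) as resp:
        return json.load(resp)
```

Calling search_url("1234 Market Street") produces the same URL as the curl command above.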

For reasons currently unknown, the tests in tests/api/test_views.py cannot be run on their own: almost all of them fail with a 404 response error. All API tests must therefore be run together.

ais's People

Contributors

ajrothwell, alexander-m-waldman, bertday, dependabot[bot], floptical, jrmidkiff, mjumbewu, timwis, tswanson

ais's Issues

load_addresses doesn't explode range addresses with a unit num

When load_addresses encounters a range address with a unit num like 1234-36 CHESTNUT ST # 200, it adds the base address 1234-36 CHESTNUT ST but it doesn't create the child addresses for that range. Note that those addresses should not have the unit number of the original address -- use address_link instead to make that connection.
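
A minimal sketch of the missing step, not the actual load_addresses code. It assumes the address has already been parsed into a full low number, a full high number, and a street name (1234, 1236, "CHESTNUT ST" for the example above), and that child addresses share the parity of the low number. The function name explode_range is hypothetical.

```python
from typing import List


def explode_range(address_low: int, address_high: int, street_full: str) -> List[str]:
    """Return the child street addresses of a range address.

    The unit of the original address is deliberately not carried over; that
    connection is meant to be recorded separately (e.g. in address_link).
    """
    if address_high < address_low:
        raise ValueError("address_high must not be less than address_low")
    # Step by 2 so children stay on the same side of the street (same parity).
    return [f"{num} {street_full}" for num in range(address_low, address_high + 1, 2)]
```

For 1234-36 CHESTNUT ST # 200 this yields 1234 CHESTNUT ST and 1236 CHESTNUT ST, neither carrying the # 200 unit.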

Create true range view in DB bootstrap script

Add a create statement for this view:

SELECT COALESCE(r.seg_id, l.seg_id) AS seg_id,
   r.low AS true_right_from,
   r.high AS true_right_to,
   l.low AS true_left_from,
   l.high AS true_left_to
  FROM ( SELECT asr.seg_id,
           min(a.address_low) AS low,
           GREATEST(max(a.address_low), max(a.address_high)) AS high
          FROM address a
            JOIN address_street asr ON a.street_address = asr.street_address
         GROUP BY asr.seg_id, asr.seg_side
        HAVING asr.seg_id IS NOT NULL AND asr.seg_side = 'R'::text) r
    FULL JOIN ( SELECT asl.seg_id,
           min(a.address_low) AS low,
           GREATEST(max(a.address_low), max(a.address_high)) AS high
          FROM address a
            JOIN address_street asl ON a.street_address = asl.street_address
         GROUP BY asl.seg_id, asl.seg_side
        HAVING asl.seg_id IS NOT NULL AND asl.seg_side = 'L'::text) l ON r.seg_id = l.seg_id
 ORDER BY r.seg_id

Honcho + .env files

Hey @mjumbewu ! I'm trying to get the API running on my local machine and couldn't remember exactly how you were using .env files. I remember you had separate ones for dev/prod; how did that work again? If you give me some pointers I'll write them into the docs and maybe commit an .env.sample.

Thanks!

Store service area values as key-value pairs in new ServiceAreaTag model

Currently, service area values are calculated in make_service_area_summary and stored in the service_area_summary table, where each column corresponds to a service area. Since service areas can be added or removed in config.py, the table is generated dynamically on each run and does not have a fixed schema in models.py. This eases the configuration process but has created problems with migrations and initializing new engine databases. It may be helpful to deprecate the service_area_summary in favor of a service_area_tag table with a key-value structure like:

|   id   |   street_address   |   service_area_id   |   value   |
=================================================================
|   1    |   12 OAK LN        |   census_block      |   1       |
|   2    |   12 OAK LN        |   rubbish_day       |   Thurs   |

@mjumbewu do you think this could be workable in terms of API performance? With the right indexes?
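
To make the proposed shape concrete, here is a small sketch that turns one wide summary row into key-value rows. The function name and the dict-based row format are assumptions for illustration; the real engine script may represent rows differently.

```python
from typing import Any, Dict, List


def to_service_area_tags(street_address: str, summary_row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert one wide service_area_summary row into key-value tag rows.

    summary_row maps a service area id (the former column name) to its value.
    """
    return [
        {"street_address": street_address, "service_area_id": area_id, "value": value}
        for area_id, value in summary_row.items()
    ]


rows = to_service_area_tags("12 OAK LN", {"census_block": "1", "rubbish_day": "Thurs"})
```

Each former column becomes one row, so adding or removing a service area in config.py would change the data rather than the table schema.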

New /search response

MATCH TYPES

  • exact
  • base_address
  • unit_child: if we search for 1769 FRANKFORD AVE specifying include_units, return 1) an exact match for the base address followed by all children with type unit_child
  • unit_sibling: assume AIS has # 4 and APT 4. If we search for UNIT 4, it should return # 4 and APT 4 with match type same_unit. Use the matches_unit relationship type from address_link.
  • unmatched: if we search for an address that doesn't exist in AIS, drop down to centerline geocode and return service areas.

sample response:

{
  "search_type": "address", TODO
  "search_params": { TODO
    
  },
  "query": "1769 frankford ave",
  "normalized": "1769 FRANKFORD AVE",
  "page": 1,
  "page_count": 1,
  "page_size": 1,
  "total_size": 1,
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "match_type": TODO (exact/base_address/unit_child/unit_sibling/unmatched)
      "ais_feature_type": "address",
      "properties": {
        "street_address": "1769 FRANKFORD AVE",
        "address_low": 1769,
        "address_low_suffix": "",
        "address_low_frac": "",
        "address_high": null,
        "street_predir": "",
        "street_name": "FRANKFORD",
        "street_suffix": "AVE",
        "street_postdir": "",
        "unit_type": "",
        "unit_num": "",
        "street_full": "FRANKFORD AVE",
        "street_code": 34960,
        "seg_id": 543011,
        "zip_code": "19125",
        "zip_4": "2422",
        "usps_bldgfirm": "",
        "usps_type": "S",
        "election_block_id": "24010847",
        "election_precinct": "1810",
        "pwd_parcel_id": "128133",
        "dor_parcel_id": null,
        "li_address_key": "748166",
        "pwd_account_nums": [
          "3496001769001"
        ],
        "opa_account_num": null,
        "opa_owners": null,
        "opa_address": null,
        "geom_type": "centroid",
        "geom_source": "pwd_parcel",
        "center_city_district": "",
        "cua_zone": "Asociaci\u00f3n Puertorrique\u00f1os en Marcha for Everyon",
        "li_district": "North",
        "philly_rising_area": "",
        "census_tract_2010": "015800",
        "census_block_group_2010": "1",
        "census_block_2010": "1009",
        "council_district_2016": "5",
        "political_ward": "18",
        "political_division": "1810",
        "planning_district": "River Wards",
        "elementary_school": "Adaire",
        "middle_school": "Adaire",
        "high_school": "Penn Treaty",
        "zoning": "RM1",
        "police_division": "EPD",
        "police_district": "26",
        "police_service_area": "263",
        "recreation_district": "6",
        "rubbish_recycle_day": "FRI",
        "recycling_diversion_rate": 0.062,
        "leaf_collection_area": "Saturday Bag Dropoff",
        "sanitation_area": "3",
        "sanitation_district": "3F",
        "historic_street": "",
        "highway_district": "3",
        "highway_section": "3F",
        "highway_subsection": "3F10",
        "traffic_district": "1",
        "traffic_pm_district": "1212",
        "street_light_route": "48",
        "pwd_maint_district": "3E",
        "pwd_pressure_district": "TLS",
        "pwd_treatment_plant": "BAXTER",
        "pwd_water_plate": "39",
        "pwd_center_city_district": "",
        "related_addresses": [  TODO
          {
            "address": "",
            "relationship": "range_child/range_parent/base_address/same_unit"
          },
          ...
        ]
      },
      "geometry": {
        "geocode_type": "",
        "type": "Point",
        "coordinates": [
          -75.131696154867,
          39.976398436979
        ]
      }
    }
  ]
}

Zip code searches can time out

When I searched for a valid zipcode (e.g. 19143) I eventually got a 502 Bad Gateway. When I do the search directly against the API EC2 machine, after a long wait, I get, e.g.:

GET /addresses/19143

{
  "query": "19143",
  "normalized": [
    null
  ],
  "page": 1,
  "page_count": 6592,
  "page_size": 100,
  "total_size": 659161,
  "type": "FeatureCollection",
  "features": [
    ...
  ]
}

The API currently searches for addresses, and will return the full normalized version of the address string in normalized. It's returning null because there is no address string here; Passyunk correctly identifies this as just a zipcode. However, the API then tries to go and query for everything, apparently ("total_size": 659161,).

Overlapping ranged address.

Ranged addresses that overlap each other need to be flagged and reported.

2108-14 MARKET ST - OPA
2110-12 MARKET ST - DOR?

Non-ranged addresses
2108 MARKET ST
2110 MARKET ST
2112 MARKET ST
2114 MARKET ST
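
A sketch of how overlapping ranges could be flagged, assuming each ranged address has been reduced to a label plus full low and high numbers, and that all inputs are on the same street and side. The function name find_overlaps is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

RangedAddress = Tuple[str, int, int]  # (label, low number, high number)


def find_overlaps(ranges: List[RangedAddress]) -> List[Tuple[str, str]]:
    """Return every pair of ranged addresses whose number ranges intersect."""
    overlaps = []
    for (a, a_low, a_high), (b, b_low, b_high) in combinations(ranges, 2):
        # Two closed intervals intersect when each one starts before the other ends.
        if a_low <= b_high and b_low <= a_high:
            overlaps.append((a, b))
    return overlaps


flagged = find_overlaps([
    ("2108-14 MARKET ST", 2108, 2114),
    ("2110-12 MARKET ST", 2110, 2112),
])
```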

1830 rittenhouse sq

There are 75 records returned. About half do not have coordinates; the other half do.

If unit address can't be found, return base address

Searching for 8201 HENRY AVE APT 32B returns a 404 because the address doesn't exist in the database. As a result, if you look up that address in the new AIS-backed Property Search you get 0 properties found, even though 8201 HENRY AVE is an OPA address. It seems like the current Property Search has the same behavior so this is "no harm done", but might be worth revisiting later.

404/500 handlers throwing error

Noticed an error in Sentry where this line of the 404/500 error handler throws an error because the e object doesn't have an attribute code.

@app.errorhandler(404)
@app.errorhandler(500)
def handle_errors(e):
    error = json_error(e.code, e.description, None)  # happening here
    return json_response(response=error, status=e.code)
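
One possible defensive approach, sketched without Flask so it runs on its own. A plausible cause (not confirmed by the report) is that the 500 handler sometimes receives a plain Python exception, which has no code or description attribute. The helper name error_code_and_description is hypothetical.

```python
from typing import Tuple


def error_code_and_description(e: BaseException) -> Tuple[int, str]:
    """Extract an HTTP status and message from any exception.

    HTTP-style exceptions carry integer `code` and string `description`
    attributes; anything else falls back to 500 and the exception text.
    """
    code = getattr(e, "code", None)
    if not isinstance(code, int):
        code = 500
    description = getattr(e, "description", None) or str(e) or "Internal Server Error"
    return code, description
```

The handler above could then call this helper once and pass the resulting code and description to json_error and json_response instead of reading e.code directly.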

Cached (?) errors return 200 status code

If you search for /addresses/randomzxoicuvk, it returns an HTTP status code of 404 the first time, but if you hit refresh it returns 200 every time after that. Could this be gatekeeper caching it? Hopefully it's just a flasky thing.

Ranged addresses w/ children rows don't return in block search response

Address summary has rows for these 3 addresses: 949-51 N LAWRENCE ST, 951 N LAWRENCE ST, 949 N LAWRENCE ST.
The address link table lists 949 and 951 Lawrence St as “in range” of 949-51 N Lawrence St, but does not have an entry for the ranged address itself (949-51).
Block search uses the exclude_children function to filter out addresses with address_link relationship=“in range”. (This happens for other addresses having this relationship as well.)
This ranged address (949-51) also doesn’t have an OPA account #, so none of these show up in the block search response.

opa_address is being normalized by passyunk

Change update statement (ln. 341) to get opa_address from op.source_address instead of op.street_address

print('Populating OPA accounts...')
prop_stmt = '''
update address_summary asm
set opa_account_num = op.account_num,
opa_owners = op.owners,
opa_address = op.street_address
from address_property ap, opa_property op
where asm.street_address = ap.street_address and
ap.opa_account_num = op.account_num
'''
db.execute(prop_stmt)
db.save()
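
A sketch of the statement after the proposed change. It assumes opa_property has a source_address column holding the un-normalized address, as the issue implies; everything else is unchanged from the statement above.

```python
# Proposed statement: opa_address is read from op.source_address (the raw OPA
# value) instead of op.street_address (the Passyunk-normalized value).
prop_stmt = '''
    update address_summary asm
    set opa_account_num = op.account_num,
        opa_owners = op.owners,
        opa_address = op.source_address
    from address_property ap, opa_property op
    where asm.street_address = ap.street_address and
          ap.opa_account_num = op.account_num
'''
```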

Implement `Street` model

Currently AIS has a single model for street segments, but for implementing a /search endpoint that supports a generic street lookup we'll need a more abstract model called Street that relates all the segs with the same street code. The fields should be:

predir: text (not null)
name: text (not null)
suffix: text (not null)
postdir: text (not null)
full: text (not null, indexed, unique)
code: integer (not null, indexed, unique)

Add a foreign key to StreetSegment called street that references a Street object via its street code. This should be indexed as well. Remove fields in StreetSegment that are covered in the new model (street_predir, street_name, street_suffix, street_postdir, street_code).

Some of the engine scripts will need to be tweaked to use this model, most importantly load_streets, load_dor_parcels, and load_addresses.

API Requirements

This is a list of requirements that should be satisfied to launch the AIS API. Tests will be built around these requirements.

Address Search API Requirements

  • Can filter to return only addresses that have OPA numbers

  • A single base address that is not in a range returns a single result.

  • A range address query returns a single result (i.e., does not include
    child addresses)

  • A query for an address that falls in a ranged address should return
    a single result where the address is the one queried, but the
    opa_address field is the ranged address. For example, a query for
    523 N Broad St will return the address:

    {
        "type": "Feature",
        "properties": {
            "street_address": "523 N BROAD ST",
            "opa_address": "523-25 N BROAD ST",
            ...
        },
        "geometry": {
            "type": "Point",
            "coordinates": [
                -75.16053912062759,
                39.962545839472334
            ]
        }
    }
    
  • A query for a base address returns the base address as well as any units
    in that base.

  • A query for a base address that is a range returns the units in all of
    the child addresses for that range.

  • The API treats unit types of Apt, Unit, #, and Ste as interchangeable;
    searching for one will search for any of them.

  • The API does not treat other unit types interchangeably; a unit type of
    "floor" will only match "floor".
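
The two unit-type rules above can be sketched as a single comparison. The set contents come from the requirement itself; the function name and the upper-casing of inputs are assumptions.

```python
# Unit types the requirements treat as interchangeable.
INTERCHANGEABLE_UNIT_TYPES = {"APT", "UNIT", "#", "STE"}


def unit_types_match(a: str, b: str) -> bool:
    """Return True if two unit types should be considered the same for search."""
    a, b = a.strip().upper(), b.strip().upper()
    if a in INTERCHANGEABLE_UNIT_TYPES and b in INTERCHANGEABLE_UNIT_TYPES:
        return True
    # Any other unit type (e.g. "floor") only matches itself.
    return a == b
```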

Owner Search API Requirements

  • A query should match any portion of any owner's first or last name.
  • Parts of a query separated by spaces should act as a disjunction.
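
A minimal in-memory sketch of the owner-matching rule, assuming owners are plain name strings. The real API would presumably express this as a database query; the function name owner_matches is hypothetical.

```python
from typing import List


def owner_matches(query: str, owners: List[str]) -> bool:
    """Return True if any space-separated query term appears in any owner name."""
    terms = query.upper().split()
    # Disjunction: one matching term in one owner name is enough.
    return any(term in owner.upper() for term in terms for owner in owners)
```

For example, the query "smith jones" matches an owner named SMITH JOHN because the first term alone is sufficient.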

Block Search

  • A query should return a list of all addresses that lie in the same 100-range as the address searched.
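
The 100-range for a block can be computed from the address number alone. A sketch, with a hypothetical function name:

```python
from typing import Tuple


def block_bounds(address_num: int) -> Tuple[int, int]:
    """Return the inclusive low and high address numbers of the 100-block."""
    low = (address_num // 100) * 100
    return low, low + 99
```

A search for 1935 CHESTNUT ST would therefore cover address numbers 1900 through 1999 on that street.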

OPA Number Search

  • A query should always return a single result.

All Endpoints Requirements

  • Queries should be case insensitive.
  • Pagination built-in
  • Data is returned as valid GeoJSON
  • User can request only addresses that have an OPA account number

1935-37 Chestnut St, Confusing matching to the high and low numbers

These addresses are matched to PWD parcel 996352.
1935 CHESTNUT ST APT 2F
1935 CHESTNUT ST APT 2R
1935 CHESTNUT ST APT 3F
1935 CHESTNUT ST APT 3R
1935 CHESTNUT ST
1935-37 CHESTNUT ST

These addresses match to DOR Parcel 001S110288
1935 CHESTNUT ST
1935-37 CHESTNUT ST
1937 CHESTNUT ST - dor_parcel match

These addresses are true_range
1937 CHESTNUT ST # 1
1937 CHESTNUT ST APT 2F
1937 CHESTNUT ST APT 2R
1937 CHESTNUT ST APT 3F
1937 CHESTNUT ST APT 3R
1937 CHESTNUT ST APT 4F

Landmarks/place names

ULRS resolves TAGGERT SCHOOL to 1701-47 CHELTEN AVE. AIS API is not handling this currently.

Add geom to address_summary

Currently address geometry is stored as text values in geocode_x, geocode_y fields (srid: 2272). Create geometry in this table for users to easily explore data spatially.

Logging

AIS should use logging and have some framework in place for managing/shipping logs.

Intersection results

/addresses/WELSH RD AND FRANKFORD AVE seems to return all WELSH RD addresses. Passyunk is typing this as an intersection_addr.

{'components': {'address': {'addr_suffix': None,
                            'addrnum_type': None,
                            'fractional': None,
                            'full': '1',
                            'high': None,
                            'high_num': None,
                            'high_num_full': None,
                            'isaddr': False,
                            'low': None,
                            'low_num': None,
                            'parity': None},
                'base_address': 'WELSH RD & FRANKFORD AVE',
                'bldgfirm': None,
                'cl_addr_match': 'RANGE:2',
                'matchdesc': None,
                'responsibility': 'TOWNSHIP',
                'seg_id': '1060825',
                'st_code': '82240',
                'street': {'full': 'WELSH RD',
                           'is_centerline_match': True,
                           'name': 'WELSH',
                           'parse_method': '2ANS',
                           'postdir': None,
                           'predir': None,
                           'suffix': 'RD'},
                'street_2': {'full': 'FRANKFORD AVE',
                             'is_centerline_match': True,
                             'name': 'FRANKFORD',
                             'parse_method': '2ANS',
                             'postdir': None,
                             'predir': None,
                             'suffix': 'AVE'},
                'street_address': 'WELSH RD & FRANKFORD AVE',
                'unit': {'unit_num': None, 'unit_type': None},
                'uspstype': None,
                'zip4': None,
                'zipcode': None},
 'input_address': 'WELSH RD AND FRANKFORD AVE',
 'type': 'intersection_addr'}

Create `Street` view

After #63 is complete, we'll need a view that takes a street_full and returns the corresponding Street object. The response should look like:

{
    "query": "MARKET STREET",
    "normalized": [
        "MARKET ST"
    ],
    "page": 1,
    "page_count": 1,
    "page_size": 1,
    "total_size": 1,
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "street_predir": "",
                "street_name": "MARKET",
                "street_suffix": "ST",
                "street_postdir": "",
                "street_code": 53560
            },
            "geometry": {
                "type": "Point",
                "coordinates": [
                    <midpoint x>,
                    <midpoint y>
                ]
            }
        }
    ]
}

For the geometry, query all the segs with that street code and find the midpoint (snapped to a seg).
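
One way to read "midpoint (snapped to a seg)" is sketched below in plain Python: average the midpoints of all segments, then move that point to the nearest location on any segment. This interpretation, the unweighted average, and the function names are all assumptions; a production version would more likely use spatial functions in the database.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def closest_point_on_segment(p: Point, seg: Segment) -> Point:
    """Project p onto seg, clamping to the segment's endpoints."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both endpoints coincide
        return (x1, y1)
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (x1 + t * dx, y1 + t * dy)


def street_midpoint(segments: List[Segment]) -> Point:
    """Average the segment midpoints, then snap the result to the nearest segment."""
    if not segments:
        raise ValueError("at least one segment is required")
    mids = [((x1 + x2) / 2, (y1 + y2) / 2) for (x1, y1), (x2, y2) in segments]
    cx = sum(m[0] for m in mids) / len(mids)
    cy = sum(m[1] for m in mids) / len(mids)
    candidates = [closest_point_on_segment((cx, cy), seg) for seg in segments]
    return min(candidates, key=lambda q: (q[0] - cx) ** 2 + (q[1] - cy) ** 2)
```

Snapping matters for curved or L-shaped streets, where the plain average can fall off the street entirely.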

address_property table has invalid "generic unit" relationships

The load_addresses script is relating 1911 GREEN ST FL 2 to the property for 1911 GREEN ST # 2, which is not really valid (and causing FL 2 to show up in Property Search even though it's not an OPA address). Only interchangeable units like #, APT, UNIT, and STE should be allowed to have generic unit relationships.
