
UrbanAccess


A tool for computing GTFS transit and OSM pedestrian networks for accessibility analysis.

Integrated AC Transit and BART transit and pedestrian network travel times for Oakland, CA

Overview

UrbanAccess is a tool for creating multi-modal graph networks for use in multi-scale (e.g. address level to metropolitan level) transit accessibility analyses with the network analysis tool Pandana. UrbanAccess uses open General Transit Feed Specification (GTFS) data to represent disparate transit operators' schedule networks and OpenStreetMap (OSM) data to represent the pedestrian network. UrbanAccess provides a generalized, computationally efficient, and unified accessibility calculation framework by linking tools for: 1) network data acquisition, validation, and processing; 2) computing an integrated pedestrian and transit weighted network graph; and 3) network analysis using Pandana.

UrbanAccess offers the following tools:

  • GTFS and OSM network data acquisition via APIs
  • Network data validation and regional network aggregation
  • Compute network impedance:
    • by transit schedule day of the week and time of day
    • by transit mode
    • by including average passenger headways to approximate passenger transit stop wait time
  • Integrate pedestrian and transit networks to approximate pedestrian scale accessibility
  • Resulting networks are designed to be used to compute accessibility metrics using the open source network analysis tool Pandana
    • Compute cumulative accessibility metrics
    • Nearest feature analysis using POIs

Let us know what you are working on, or if you think you have a great use case, by tweeting us at @urbansim or posting on the UrbanSim forum.

Citation and academic literature

To cite this tool and for a complete description of the UrbanAccess methodology see the paper below:

Samuel D. Blanchard and Paul Waddell. 2017. "UrbanAccess: Generalized Methodology for Measuring Regional Accessibility with an Integrated Pedestrian and Transit Network." Transportation Research Record: Journal of the Transportation Research Board. No. 2653. pp. 35–44.

For other related literature see here.

Reporting bugs

Please report any bugs you encounter via GitHub issues.

Contributing to UrbanAccess

If you have improvements or new features you would like to see in UrbanAccess:

  1. Open a feature request via GitHub issues.
  2. Contribute your code from a fork or branch by using a Pull Request and request a review so it can be considered as an addition to the codebase.

Install the latest release

conda

UrbanAccess is available on Conda Forge and can be installed with:

conda install urbanaccess -c conda-forge

pip

UrbanAccess is available on PyPI and can be installed with:

pip install urbanaccess

Development Installation

Developers contributing code can install using the develop command rather than install.

To install UrbanAccess follow these steps:

  1. Git clone the UrbanAccess repo
  2. In the cloned directory, run: python setup.py develop

To update to the latest development version:

Use git pull inside the cloned repository

Documentation and demo

Documentation for UrbanAccess can be found here.

A demo jupyter notebook for UrbanAccess can be found in the demo directory.

Minimum GTFS data requirements

The minimum GTFS data types required to use UrbanAccess are: stop_times, stops, routes, and trips, plus one of either calendar or calendar_dates.
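As a quick sanity check before loading a feed, the file-level requirement above can be verified with a few lines of standard-library Python (the helper name check_gtfs_minimum is illustrative, not part of UrbanAccess):

```python
import os

REQUIRED = ['stop_times.txt', 'stops.txt', 'routes.txt', 'trips.txt']
CALENDAR = ['calendar.txt', 'calendar_dates.txt']

def check_gtfs_minimum(feed_dir):
    """Return a list of problems; an empty list means the feed meets
    UrbanAccess's minimum GTFS file requirements."""
    files = set(os.listdir(feed_dir))
    problems = ['missing ' + name for name in REQUIRED if name not in files]
    # At least one of calendar.txt / calendar_dates.txt must be present
    if not any(name in files for name in CALENDAR):
        problems.append('missing calendar.txt or calendar_dates.txt')
    return problems
```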

Related UDST libraries


Issues

Specific Time Point Aggregations

Hi @sablanchard,

We are evaluating using Pandana for a project, but getting deeper into the API it seems you use the average headway to identify the appropriate wait times. I have used Pandana before for its speed, and hope to again, but I need to investigate access variability as well as the average (specific time points are helpful here).

  1. If you use a very constrained time range to define a network, does it fail to fill out all the headway values as a result?
  2. I saw the headway statistics can also provide the standard deviation, min, and max. One approach could be to manually adjust headways based on some assumed distribution of arrival events to show a range of accessibility values. Is it possible to adjust network headways up or down manually?
  3. Is it on the road map to test at specific time points, or is that incompatible with Pandana for now?

Merge attempted to be performed on service_id when it is not included

This line calls the function calendar_dates_agencyid:
https://github.com/UDST/urbanaccess/blob/master/urbanaccess/gtfs/utils_format.py#L417

A subset of the following dataframes are included as keyword arguments: routes_df, trips_df, and agency_df.

In the calendar_dates_agencyid function definition, the prior 3 dataframes are merged together and then need to be left-merged onto a calendar_dates_df dataframe (here). The left merge states it needs to occur on service_id.

At this point, the function will error out with the following: KeyError: 'service_id'.

Solution: Include service_id (for example, it exists in trips.txt) from one of the first 3 dataframes.
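A minimal pandas sketch of the proposed fix, with made-up table contents: carrying service_id through from trips_df lets the later left merge onto calendar_dates_df find its key instead of raising KeyError:

```python
import pandas as pd

# Toy stand-ins for the GTFS dataframes involved
routes_df = pd.DataFrame({'route_id': ['r1'], 'agency_id': ['a1']})
trips_df = pd.DataFrame({'trip_id': ['t1', 't2'],
                         'route_id': ['r1', 'r1'],
                         'service_id': ['s1', 's2']})  # service_id lives here
calendar_dates_df = pd.DataFrame({'service_id': ['s1', 's2'],
                                  'date': ['20190810', '20190811']})

# Keep service_id when combining routes and trips so the subsequent
# left merge on 'service_id' has its key available
merged = routes_df.merge(trips_df[['route_id', 'service_id']], on='route_id')
result = calendar_dates_df.merge(merged, how='left', on='service_id')
```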

Workflow bug: Destinations for accessibility queries should be linked only to the base network

This issue came up after investigating a report that adding a rail network alongside a bus network had the unexpected effect of reducing accessibility (jobs within x minutes) calculated by Pandana for certain locations.

The cause turned out to be a workflow error, but it's something we should fix in the demo notebook and explain better in the documentation.

Problem

Suppose you have job counts by census block. You want to link these to an UrbanAccess network so that you can calculate how many jobs are accessible within x minutes from each location.

If you link each block to the closest node in the integrated network, some of the job counts may end up associated with transit station nodes rather than street nodes. Transit nodes typically have an impedance for people coming from other networks (to capture headways), making the jobs less accessible than if they were associated with a neighboring street node.

Solution

The solution is to make sure jobs (or other destinations) are only assigned to the subset of nodes in the integrated graph that come from the base network. Code examples here: https://github.com/ual/pandana-urbanaccess-issue
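As a rough illustration of the fix (the net_type column and its 'walk' label are assumptions about the integrated-network node table, not verified against the codebase), restrict the snapping candidates to base-network nodes before assigning destinations:

```python
import pandas as pd

# Hypothetical integrated-network node table; net_type is assumed to
# distinguish OSM ('walk') nodes from transit nodes
net_nodes = pd.DataFrame({
    'id_int': [1, 2, 3, 4],
    'x': [-122.30, -122.31, -122.32, -122.33],
    'y': [47.60, 47.61, 47.62, 47.63],
    'net_type': ['walk', 'transit', 'walk', 'transit']})

# Only base-network (pedestrian) nodes should receive destination data,
# so jobs never attach to transit stops and their boarding impedance
base_nodes = net_nodes[net_nodes['net_type'] == 'walk']
```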

Next steps

  1. The UrbanAccess demo notebook currently does this incorrectly, so we should fix it: simple_example.ipynb

  2. We should add a note to the documentation as well.

  3. Identifying the correct subnetwork currently requires some manual filtering, but it would be easy to automate with a new UrbanAccess helper function or two!

Gracefully handle the error raised when more bins are requested than there are unique edges

Right now, if you request more bins than there are unique edges, an error will be raised. I wonder if it would be reasonable to simply take the min() of the number of unique edges and the requested num_bins parameter?

Example:

>>> colors = urbanaccess.plot.col_colors(
...                         df=network,
...                         col='mean',
...                         num_bins=10,
...                         cmap='spectral',
...                         start=0.1,
...                         stop=0.9)
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
  File "urbanaccess/plot.py", line 160, in col_colors
    categories = pd.qcut(x=col_values, q=num_bins, labels=bin_labels)
  File "/usr/local/lib/python2.7/site-packages/pandas/tools/tile.py", line 175, in qcut
    precision=precision, include_lowest=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/tools/tile.py", line 194, in _bins_to_cuts
    raise ValueError('Bin edges must be unique: %s' % repr(bins))
ValueError: Bin edges must be unique: array([  0.        ,   8.15789474,  13.        ,  16.71428571,
        25.5       ,  29.66666667,  30.        ,  30.        ,
        30.6       ,  37.5       ,  86.        ])
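One possible fix, sketched with made-up values that have the heavy ties seen in the traceback above: clamp the number of bins to the number of unique values, and let qcut drop any duplicate bin edges that remain:

```python
import pandas as pd

# Edge weights with heavy ties, similar to the failing case above
col_values = pd.Series([0, 30, 30, 30, 30, 30, 30, 30, 30, 86])

num_bins = 10
# Never ask for more quantiles than there are unique values, and drop
# duplicate bin edges as a fallback for remaining ties
safe_bins = min(num_bins, col_values.nunique())
categories = pd.qcut(x=col_values, q=safe_bins, labels=False,
                     duplicates='drop')
```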

Python 3 Support or Clarification

From what I understand, Pandana was written in and only supports Python 2.x (the Travis config indicates 2.7 is used in tests, for example). Because this library is built on Pandana, I suspect it has the same constraints. If that is the case, it would be helpful to state that in the repo's README.

More broadly, I would be interested in hearing from those who have spent more time with Pandana than I have as to how difficult it would be to enable Pandana to work with Python 3. If the lift is not that severe, this is something I would be quite interested in.

TypeError when generating plot colors

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
  File "urbanaccess/plot.py", line 163, in col_colors
    colors = [color_list[cat] for cat in categories]
TypeError: list indices must be integers, not numpy.float64

This happens because, in col_colors, this operation (colors = [color_list[cat] for cat in categories]) requires that each cat not be a float. Yet, if you were to print the type of each cat beforehand, you would see that they can come through the loop as floats.

Here is a quick example of what a segment of that would look like, were you to log the type of each:

<type 'numpy.float64'>
0.0
<type 'numpy.float64'>
0.0
<type 'numpy.float64'>
0.0
<type 'numpy.float64'>
0.0
<type 'numpy.float64'>
nan

...which identifies another issue! As you can see in the last logged float, it's a nan. These can't be converted to an integer, so we need to decide what to do with them. I think, at least for the purposes of binning, they should be pruned from the parent data set and removed from the binning process.
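A small pure-Python sketch of that pruning-and-casting step (the variable names are illustrative, not the library's):

```python
import math

# Float category codes as produced by the binning step, including a nan
categories = [0.0, 1.0, 2.0, float('nan'), 1.0]
color_list = ['#d7191c', '#ffffbf', '#2c7bb6']

# Drop nan codes before indexing, and cast the remaining floats to int
# so they are valid list indices
colors = [color_list[int(cat)] for cat in categories if not math.isnan(cat)]
```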

Unicode decode error when trying to load gtfs of buses in Buenos Aires

Description of the bug

Hi, thanks for making this software available.

I am trying to load the GTFS feed for buses in Buenos Aires from either of:

But I run into the encoding error below.

I seemed to make some limited progress by changing line 128 in urbanaccess > gtfs > load.py to

with open(os.path.join(csv_rootpath, folder, textfile), encoding='latin-1') as f:

but it still seems to load only a fraction of the data in the gtfs feed. Any advice that you have would be great.
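A generic way to make the read tolerant of non-UTF-8 feeds (a sketch, not UrbanAccess's actual implementation) is to try encodings in order, with latin-1 as a last resort since it accepts any byte sequence:

```python
def read_text_with_fallback(path, encodings=('utf-8', 'latin-1')):
    """Read all lines from path, trying each encoding in order."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.readlines()
        except UnicodeDecodeError:
            continue  # try the next encoding
    raise ValueError('unable to decode {} with {}'.format(path, encodings))
```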

Many thanks,

Nick Bristow

GTFS feed or OSM data (optional)

https://transitfeeds.com/p/colectivos-buenos-aires/1037

direct link to zip:

https://openmobilitydata-data.s3-us-west-1.amazonaws.com/public/feeds/colectivos-buenos-aires/1037/20190810/gtfs.zip

Environment

  • Operating system:
    Windows 10 v 1903
  • Python version:
    3.6.7 (running inside latest version of Conda)
  • UrbanAccess version:
    0.2.0
  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

import os # added for dummy calendar.txt file
import pandas as pd
import urbanaccess as ua

# download and extract feeds
ua.gtfsfeeds.feeds.add_feed(add_dict={'colectivos': 'https://openmobilitydata-data.s3-us-west-1.amazonaws.com/public/feeds/colectivos-buenos-aires/1037/20190810/gtfs.zip'})
ua.gtfsfeeds.download()

# Colectivos feed lacks a calendar.txt - creating dummy following Issue #56 
script_path = os.path.dirname(os.path.abspath(''))
root_path = os.path.join(script_path, 'mwe', 'data')
dummy_txt_file = os.path.join(root_path,
                              'gtfsfeed_text',
                              'colectivos',
                              'calendar.txt')
data = {'service_id': -99, 'monday': 0, 'tuesday': 0, 'wednesday': 0,
        'thursday': 0, 'friday': 0, 'saturday': 0, 'sunday': 0}
index = range(1)
pd.DataFrame(data, index).to_csv(dummy_txt_file, index=False)

%%time
loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None,
                                           validation=True,
                                           verbose=True,
                                           bbox=(-59.8,-35.6,-57.2,-33.6),
                                           remove_stops_outsidebbox=False,
                                           append_definitions=True)

Paste the error message (if applicable):

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<timed exec> in <module>

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in gtfsfeed_to_df(gtfsfeed_path, validation, verbose, bbox, remove_stops_outsidebbox, append_definitions)
    220                 'must be specified for validation.')
    221 
--> 222     _standardize_txt(csv_rootpath=gtfsfeed_path)
    223 
    224     folderlist = [foldername for foldername in os.listdir(gtfsfeed_path) if

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in _standardize_txt(csv_rootpath)
     35     if six.PY2:
     36         _txt_encoder_check(gtfsfiles_to_use, csv_rootpath)
---> 37     _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
     38 
     39 

~\Miniconda3\envs\caf_urban_access\lib\site-packages\urbanaccess\gtfs\load.py in _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
    127                 # Read from file
    128                 with open(os.path.join(csv_rootpath, folder, textfile)) as f:
--> 129                     lines = f.readlines()
    130                 lines[0] = re.sub(r'\s+', '', lines[0]) + '\n'
    131                 # Write to file

~\Miniconda3\envs\caf_urban_access\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 4618: character maps to <undefined>

Custom headway assessment

Idea:

Thread through a way to perform custom summary stat calculations other than those generated by Pandas' describe() method in the headways computation step of UA.

This is currently in headways.py in the gtfs dir.

Proposal:
After the results[unique_stop_route] = pd.Series(adjusted_time_diff).describe() is applied on a grouped set, go ahead and run a custom method (which could just be selecting, say, the mean result value).

This will return all the describe columns as well as a new column which is selected_result or whatever. So, downstream _add_headway_impedance would not need a parameter for which headway to use, it could simply opt to rely on selected_result column.
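A toy pandas sketch of the proposal (the 'selected_result' label is the issue author's example name; the custom statistic here is an arbitrary 90th percentile):

```python
import pandas as pd

# Stand-in for one stop/route group's adjusted time differences (minutes)
adjusted_time_diff = [5.0, 10.0, 15.0, 10.0]

# describe() yields count/mean/std/min/quartiles/max; append a custom
# statistic so downstream code can simply read 'selected_result'
stats = pd.Series(adjusted_time_diff).describe()
stats['selected_result'] = pd.Series(adjusted_time_diff).quantile(0.9)
```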

Penalty being applied for both boarding and alighting transit route

Heads up that the internal logic suggests that, while the boarding penalty (1/2 headway) from the OSM network to the transit network (ped_to_transit_edges_df) was intended to be applied in only that direction, the argument being passed to it (net_connector_edges) actually contains both to and from edges. Thus, the library is applying a penalty for both boarding and alighting when connecting a transit route to the walk network.

Location of the issue:

net_connector_edges = _connector_edges(
    osm_nodes=urbanaccess_network.osm_nodes,
    transit_nodes=urbanaccess_network.transit_nodes,
    travel_speed_mph=3)
urbanaccess_network.net_connector_edges = _add_headway_impedance(
    ped_to_transit_edges_df=net_connector_edges,
    headways_df=urbanaccess_gtfsfeeds_df.headways,
    headway_statistic=headway_statistic)
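A minimal pandas sketch of the intended behavior (column names and the 'osm to transit' label are assumptions about the connector-edge table, not verified against the codebase): apply the half-headway penalty only to boarding-direction edges:

```python
import pandas as pd

# Hypothetical connector edges: both directions are present in one table
net_connector_edges = pd.DataFrame({
    'from': ['osm1', 'stop1', 'osm2', 'stop2'],
    'to': ['stop1', 'osm1', 'stop2', 'osm2'],
    'net_type': ['osm to transit', 'transit to osm'] * 2,
    'weight': [2.0, 2.0, 3.0, 3.0]})

# Assumed half-headway (minutes) per transit stop
half_headway = {'stop1': 5.0, 'stop2': 7.0}

# Penalize only the boarding (walk -> transit) direction, leaving the
# alighting (transit -> walk) edges at their base walk time
boarding = net_connector_edges['net_type'] == 'osm to transit'
net_connector_edges.loc[boarding, 'weight'] += (
    net_connector_edges.loc[boarding, 'to'].map(half_headway))
```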

interpolatestoptimes() function continues to interpolate past known time values

The interpolatestoptimes() function continues to interpolate past known time values using the last value in the series. This results downstream in a travel time value of 0 between those stops. These consecutive nans at the end of a series should be removed and not interpolated.

incorrect:
input = nan, 2, 3, nan, 5, 6, nan, nan
output = nan, 2, 3, 4, 5, 6, 6, 6

correct:
input = nan, 2, 3, nan, 5, 6, nan, nan
output = nan, 2, 3, 4, 5, 6 (drop the last consecutive records with nan from result)
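The correct behavior above maps neatly onto pandas' interpolate with limit_area='inside', which fills only NaNs bounded by known values; this is a sketch of the desired fix, not the library's current code:

```python
import pandas as pd

s = pd.Series([None, 2, 3, None, 5, 6, None, None], dtype='float64')

# Fill only interior gaps; leading/trailing NaNs are left untouched
filled = s.interpolate(limit_area='inside')

# Drop the trailing NaNs entirely rather than padding with the last value
trimmed = filled.loc[:filled.last_valid_index()]
```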

integrate_network returning floats instead of ints

I'm unable to create a pandana network object after processing OSM and GTFS data with urbanaccess. I'm using this OSM network and this GTFS feed. If I use either of those data sources directly, I can successfully instantiate a pandana.Network.

Once I try to create a multimodal network with integrate_network, it completes successfully:


Loaded UrbanAccess network components comprised of:
     Transit: 2,415 nodes and 8,385 edges;
     OSM: 486,514 nodes and 742,113 edges
Connector edges between the OSM and transit network nodes successfully completed. Took 1.16 seconds
Edge and node tables formatted for Pandana with integer node ids: id_int, to_int, and from_int. Took 3.15 seconds
Network edge and node network integration completed successfully resulting in a total of 488,929 nodes and 755,328 edges:
     Transit: 2,415 nodes 8,385 edges;
     OSM: 486,514 nodes 742,113 edges; and
     OSM/Transit connector: 4,830 edges.
<urbanaccess.network.urbanaccess_network at 0x7fb8d761df50>

however, if I try to create a pdna.Network from the integrated data, I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-2640ece2a378> in <module>
      4                                urbanaccess_net.net_edges["to_int"],
      5                                urbanaccess_net.net_edges[["weight"]],
----> 6                                twoway=False)

~/anaconda3/envs/healthacc/lib/python3.7/site-packages/pandana/network.py in __init__(self, node_x, node_y, edge_from, edge_to, edge_weights, twoway)
    101                                                           .astype('double')
    102                                                           .values,
--> 103                             twoway)
    104 
    105         self._twoway = twoway

src/cyaccess.pyx in pandana.cyaccess.cyaccess.__cinit__()

ValueError: Buffer dtype mismatch, expected 'long' but got 'double'

Looking closer at the ua_network.net_edges object, I can see that the two columns to_int and from_int are actually floats, though looking at the code I can't see why that would be the case. I'm guessing it's the underlying reason I can't build a network, since the docs seem to indicate pandana needs integers in the from/to columns (though it also seems to work OK with strings if I try to build a network exclusively from the GTFS data), but I was curious if you had any insight.

I could post the whole notebook if it's useful.
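One workaround, sketched with a toy frame (column names follow the integrated-network output above): cast the id columns back to integers before handing the edges to Pandana, since a merge involving NaNs silently upcasts int columns to float:

```python
import pandas as pd

# Toy net_edges frame where a merge has upcast the id columns to float
net_edges = pd.DataFrame({'from_int': [1.0, 2.0],
                          'to_int': [2.0, 3.0],
                          'weight': [5.5, 7.0]})

# Pandana's C extension expects integer ('long') node ids, so cast back
for col in ('from_int', 'to_int'):
    net_edges[col] = net_edges[col].astype('int64')
```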

  • Operating system:
    macos
  • Python version:
    3.7
  • UrbanAccess version:
    0.2.0 (albeit with a small local fix for the as_matrix issue)

`create_transit_net` not producing a complete network from GB Rail GTFS

Description of the bug

Incomplete network produced from gtfs data for the GB Rail network. I am assuming something about the gtfs data is causing the issue but urbanaccess did not throw any real errors so I'm not sure. The script below is fairly self contained so it should be easy to replicate. Many thanks for making this module of course!

I also had to fake a 1-row dummy calendar_dates.txt (below), as urbanaccess doesn't run without one present. None of the feeds I've come across have this file, so this may also be causing the issue, but I wouldn't know how.

service_id,date,exception_type
FCK,99999999999999999999,2

GTFS feed or OSM data (optional)

http://www.gbrail.info/gtfs.zip

Environment

  • Operating system:
    Ubuntu 16.04

  • Python version:
    2.7

  • UrbanAccess version:
    Built from github today

Paste the code that reproduces the issue here:

import urbanaccess as ua
import pandas as pd
import pandana as pdna
import urllib

# get gtfs data
gtfs_zip = urllib.URLopener()
gtfs_zip.retrieve("http://www.gbrail.info/gtfs.zip", "gb_rail/gtfs.zip")

# unzip
import zipfile
zip_ref = zipfile.ZipFile('gb_rail/gtfs.zip', 'r')
zip_ref.extractall('gb_rail/gtfs')
zip_ref.close()

# build GTFS dataframe within boundingbox
gtfs_df = ua.gtfs.load.gtfsfeed_to_df(r'gb_rail/gtfs',
                                      validation=False, verbose=False,
                                      bbox=None,
                                      remove_stops_outsidebbox=None,
                                      append_definitions=True)

# build network from GTFS dataframe
tfl_net = ua.gtfs.network.create_transit_net(gtfs_df,
                                   day='monday',
                                   timerange=["07:00:00","19:00:00"],
                                   overwrite_existing_stop_times_int=False,
                                   use_existing_stop_times_int=True)

tfl_net.transit_edges.to_csv(r'gb_rail/network/edge.csv')
tfl_net.transit_nodes.to_csv(r'gb_rail/network/node.csv')

Paste the error message (if applicable):

The process completed without any fatal error messages, with only these two warnings:

sys:1: DtypeWarning: Columns (9,17) have mixed types. Specify dtype option on import or set low_memory=False.
sys:1: DtypeWarning: Columns (9,11) have mixed types. Specify dtype option on import or set low_memory=False.

The green points are the original gtfs/stops.csv; the purple points are what is saved in node.csv from the UrbanAccess dataframe of the nodes. The edge network is similarly incomplete. I chose a 7:00–19:00 time period on a Monday, so the majority of the routes should be included.

Missing nodes in green

ua_error

Incomplete edge network

ua_edges

Dockerfile for repo

Why? Dockerfiles are valuable for creating a consistent environment for developers to work in. They also remove any complexities in setting up the appropriate environment within which to run a repo. This is valuable both for persons new to a project when working through examples or documentation, as well as to direct contributors developing new features/fixes.

I've got a Dockerfile that creates a container with the necessary environment dependencies to run UrbanAccess. Given that the dependencies for this are a bit tricky to set up (HDF5 store, etc.), it's valuable (I believe) to have something "set in stone" that can be used by anyone to get something consistent to work in.

I'd be happy to make a PR to include this. I think it would also be helpful for folks who would want to work through example projects without running into setup blockers (also starting example documentation of how to set this up as I infer the workflow from the codebase here - although could make more sense as a Gist or blog post).

Hi,

Please note that while GTFS feeds are a standardized format, not all transit services follow the same data standards or supply all of the information in their GTFS feed that UrbanAccess requires to compute a travel-time-weighted graph. UrbanAccess attempts to deal with the most common GTFS feed data schema issues; however, it is expected that some GTFS feeds may raise errors or may not fit the expected data schema. If you encounter a bug in UrbanAccess please: 1) first search the previously opened issues to see if the problem has already been reported; 2) if not, fill in the template below and tag it with the appropriate Type label (and Data label if applicable). You can delete any sections that do not apply.

Description of the bug

GTFS feed or OSM data (optional)

If the issue is related to a specific GTFS feed or OSM data please provide the URL to download the GTFS feed or the bounding box used to extract the OSM data.

Environment

  • Operating system:

  • Python version:

  • UrbanAccess version:

  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

# place code here

Paste the error message (if applicable):

# place error message here

UrbanAccess/Pandana network not working with Pandana nearest_pois query

Description of the bug

In the code below, I have successfully created a Pandana network from an integrated UrbanAccess network, and everything seems to work when running a Pandana aggregate query. However, when running a nearest_pois query, the code crashes on the last line and produces the error message below, which seems to suggest that the set of coordinates I am using for the query is out of bounds. However, I am using the same set of coordinates for both queries, and I have confirmed that the coordinates are within the bbox of the network. Edit: looking closer at libch.cpp, the error has to do with a data array, not the bounding box.

At one point I was able to get the pois query to work, but it was with a Pandana network that had the default Oneway parameter set to True. I changed this to False, cleaned up the code a bit, and then it no longer worked. Even when reverting back, I was not able to get it to work again. I still have the working network saved and can provide it if it will help the troubleshooting process.

We are really excited about this library and have already made some cool maps with the aggregate function. Thanks!

-Stefan

GTFS feed or OSM data (optional)

GTFS:
https://file.ac/-LhHHOc0HaA/

Environment

  • Operating system:
    Windows

  • Python version:
    2.7

  • UrbanAccess version:

  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

import urbanaccess as ua
import pandas as pd
import pandana as pdna 

# GTFS
gtfs_df = ua.gtfs.load.gtfsfeed_to_df(r'D:\stefan\Isochrone\repository\gtfs\transit_agencies', 
                                      validation=False, verbose=True, 
                                      bbox=(-122.459702,47.480999,-122.224426,47.734138), 
                                      remove_stops_outsidebbox=True, 
                                      append_definitions=False)


ua_net = ua.gtfs.network.create_transit_net(gtfs_df, 
                                   day='monday', 
                                   timerange=["06:00:00","09:00:00"], 
                                   overwrite_existing_stop_times_int=False, 
                                   use_existing_stop_times_int=True, 
                                   save_processed_gtfs=False, 
                                   save_dir='data', 
                                   save_filename=None)

ua.gtfs.headways.headways(gtfs_df,["06:00:00","09:00:00"])

# OSM:
osm_data = ua.osm.load.ua_network_from_bbox(bbox=(-122.459702,47.480999,-122.224426,47.734138), 
                                           network_type='walk', 
                                           timeout=180, 
                                           memory=None, 
                                           max_query_area_size=2500000000L, 
                                           remove_lcn=True)

ua_net = ua.osm.network.create_osm_net(osm_edges=osm_data[1], osm_nodes=osm_data[0], travel_speed_mph=3, network_type='walk')

# Integrate the network
ua_integrated_net = ua.network.integrate_network(ua_net, 
                             headways=True, 
                             urbanaccess_gtfsfeeds_df=gtfs_df, 
                             headway_statistic='mean')

# I thought this was the fix when I got it to work:
ua.network._format_pandana_edges_nodes(ua_integrated_net.net_edges, ua_integrated_net.net_nodes)

# Create a pandana network
imp = pd.DataFrame(ua_integrated_net.net_edges['weight'])
net = pdna.network.Network(ua_integrated_net.net_nodes.x, ua_integrated_net.net_nodes.y, 
                                ua_integrated_net.net_edges.from_int, ua_integrated_net.net_edges.to_int, imp, False)

dist = 30

def assign_nodes_to_dataset(dataset, network, column_name, x_name, y_name):
    """Adds an attribute node_ids to the given dataset."""
    dataset[column_name] = network.get_node_ids(dataset[x_name].values, dataset[y_name].values)


# *******Code works for Pandana aggregate query:
coords_dict = [{'x' : -122.355, 'y' : 47.687, 'var' : 1}]
df = pd.DataFrame(coords_dict)
assign_nodes_to_dataset(df, net, 'node_ids', 'x', 'y')
net.set(df.node_ids, variable=df['var'], name='test') 
aggr = net.aggregate(distance = dist, type = 'sum', decay='flat', imp_name = 'weight', name='test')
newdf = pd.DataFrame({'test': aggr, "node_ids": aggr.index.values})
print len(newdf[(newdf.test==1)])


# *******Code crashes on Pandana nearest_pois query:
net.init_pois(1, dist, 1)
x = pd.Series(-122.355)
y = pd.Series(47.687)
# Set as a point of interest on the pandana network
net.set_pois('tstop', x, y)
# Find distance to point from all nodes, everything over max_dist gets a value of 99999
res = net.nearest_pois(dist, 'tstop', num_pois=1, max_distance=99999)

Paste the error message (if applicable):

[error src/contraction_hierarchies/src/libch.cpp:368] POI Category is out of Bounds
An internal error has occurred in the Interactive window. Please restart Visual Studio.
The Python REPL process has exited

Use Custom Pandana Network/Weights

This is definitely an enhancement idea:
It seems that currently the assumption is that the street network is pulled from OSM, but there is little stopping it from using an arbitrary Pandana network that was already created. What is the possibility of integrating custom travel networks?

Generally, I think this is possible, but the big concern I have here is you can't set the CRS assumptions for urbanaccess because of its use of the vincenty function from geopy (which will be removed in 2.0 as an FYI). So you will have to set the pandana network to WGS 1984, but we could document that.
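If vincenty's removal becomes a problem, a dependency-free great-circle (haversine) distance is one possible stand-in for WGS 84 lon/lat inputs; this is a sketch, and it is less accurate than an ellipsoidal distance:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    # Standard haversine formula
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * asin(sqrt(a)) * 3958.8  # mean Earth radius in miles
```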

Curious what your thoughts on this might be @sablanchard.

I implemented this with the following as a quick draft, and I think it works (same principle as loading OSM, but loading WGS 1984 projected edges/nodes).

def add_custom_network(network_edges, network_nodes, ua_network,
                       chosen_weight='weight', net_type='walk'):
    """Integrate a custom set of nodes and edges, projected to WGS 1984,
    so that it integrates with an UrbanAccess network."""
    network_edges['weight'] = network_edges[chosen_weight]
    # assign node and edge net type
    network_edges['net_type'] = net_type
    network_nodes['net_type'] = net_type
    ua_network.osm_nodes = network_nodes
    ua_network.osm_edges = network_edges
    print('Updated UrbanAccess network with updated edges and nodes')
    return ua_network

Errors on module import

Description of the bug

When importing the module, errors are produced and the module fails to be imported.

Environment

  • Operating system: Xubuntu 16.04

  • Python version: 2.7.12

  • UrbanAccess version: Latest development version as of now

  • UrbanAccess required packages versions (optional):
    The pip install process indicated that all dependencies are installed and up to date.

The code that reproduces the issue:

import urbanaccess

The error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "urbanaccess/__init__.py", line 5, in <module>
    from .osm.load import *
  File "urbanaccess/osm/load.py", line 12, in <module>
    reserve_num_graphs(40)
  File "/usr/local/lib/python2.7/dist-packages/pandana/network.py", line 16, in reserve_num_graphs
    raise Exception("reserve_num_graphs is no longer required - remove from your code")
Exception: reserve_num_graphs is no longer required - remove from your code
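Since `reserve_num_graphs` was simply removed from newer Pandana, one backward-compatible option on the UrbanAccess side would be to guard the call so both old and new Pandana versions import cleanly. A sketch (not the fix that was actually merged):

```python
def safe_reserve_num_graphs(num):
    """Call pandana's reserve_num_graphs only where it still exists.

    Newer pandana manages graph memory automatically and either no
    longer exports reserve_num_graphs or raises when it is called,
    so any failure here is simply ignored.
    """
    try:
        from pandana.network import reserve_num_graphs
        reserve_num_graphs(num)
        return True
    except Exception:
        return False

safe_reserve_num_graphs(40)
```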

Add integration tests to CI builds

We should add scripts that run a "full" workflow to the Travis CI builds, covering a good number of regions with different data schemas and varying degrees of data cleanliness. I think it would help catch the kind of edge cases that are coming up now.

Plot does not render title/axis titles

Description of the bug

Plotting with Pandana/UrbanAccess doesn't properly render the title when calling .set_title.

GTFS feed or OSM data (optional)

bbox = [43.435966,-80.682529,43.528884,-80.418858]

Environment

  • Operating system: Windows

  • Python version: 3.9.7

  • UrbanAccess version:

Paste the code that reproduces the issue here:

n = 1
bmap, fig, ax = network.plot(access[n], bbox=bbox, plot_kwargs=plot_kwargs,
                             fig_kwargs=fig_kwargs)
ax.set_axis_bgcolor('k')
ax.set_title('Walking distance (m) to nearest amenity around Waterloo', fontsize=15)

Paste the error message (if applicable):

No error, but the title does not appear in the rendered plot (screenshot omitted).

Kernel appears to have died when saving h5

Please note that while GTFS is a standardized format, not all transit services follow the same data standards or supply all of the information in their GTFS feed that UrbanAccess requires to compute a travel time weighted graph. UrbanAccess attempts to handle the most common GTFS feed data schema issues; however, some GTFS feeds may still raise errors or may not fit the expected data schema. If you encounter a bug in UrbanAccess please: 1) first search the previously opened issues to see if the problem has already been reported; 2) if not, fill in the template below and tag it with the appropriate Type label (and Data label if applicable). You can delete any sections that do not apply.

Description of the bug

GTFS feed or OSM data (optional)

If the issue is related to a specific GTFS feed or OSM data please provide the URL to download the GTFS feed or the bounding box used to extract the OSM data.

Environment

  • Operating system: Windows

  • Python version: 2

  • UrbanAccess version: '0.2.0'

  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

ua.network.save_network(urbanaccess_network=urbanaccess_net,
                        filename='final_net.h5',
                        overwrite_key=True)

Paste the error message (if applicable):

Kernel appears to have died

403 error with certain transit providers due to missing headers in request

import geopandas as gpd

rside = gpd.read_file("https://www.dropbox.com/s/u4ah7y8t4a9jg45/rside.zip?dl=1")

feed =  {'riversidetransitagency': 'http://www.riversidetransit.com/google_transit.zip'}

import urbanaccess as ua

ua.gtfsfeeds.feeds.add_feed(feed)
ua.gtfsfeeds.download(data_folder=".")

yields

1 GTFS feeds will be downloaded here: ./gtfsfeed_zips
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-1-273db80af85d> in <module>
      8 
      9 ua.gtfsfeeds.feeds.add_feed(feed)
---> 10 ua.gtfsfeeds.download(data_folder=".")

~/anaconda3/envs/catshoods/lib/python3.7/site-packages/urbanaccess/gtfsfeeds.py in download(data_folder, feed_name, feed_url, feed_dict, error_pause_duration, delete_zips)
    469 
    470         if 'http' in feed_url_value:
--> 471             status_code = urlopen(feed_url_value).getcode()
    472             if status_code == 200:
    473                 file = urlopen(feed_url_value)

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~/anaconda3/envs/catshoods/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden

this can be resolved by #71

Read in txt files as txt files, rather than deal with encoding errors

As per our discussion IRL (with Sam at Maptime), we observed that this line:

if raw.startswith(codecs.BOM_UTF8):

and the corresponding actions if it is true are designed to deal with GTFS zip files stored as .txt rather than .csv.

Alternatively, just read the CSV in with the file type set as txt: check the file extension via something like foo.lower().endswith('.txt') and, if true, read it in as a text file.
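The extension check described above could look something like this (a sketch; the `utf-8-sig` codec also transparently strips the UTF-8 BOM that the `codecs.BOM_UTF8` branch currently handles by hand):

```python
import csv

def open_gtfs_table(path):
    """Open a GTFS table for reading, dispatching on file extension.

    GTFS files are CSV-formatted but shipped with a .txt extension;
    'utf-8-sig' removes a leading UTF-8 BOM if one is present.
    """
    if path.lower().endswith('.txt'):
        return open(path, encoding='utf-8-sig', newline='')
    return open(path, newline='')

# usage: rows = list(csv.reader(open_gtfs_table('stops.txt')))
```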

`geopy` 2.0 does not support `vincenty`

It seems that geopy 2.0 no longer provides the vincenty distance function.

On a fresh install, I get:

In [1]: import urbanaccess
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-ddb7d43afadf> in <module>
----> 1 import urbanaccess

/opt/conda/lib/python3.7/site-packages/urbanaccess/__init__.py in <module>
      1 from .gtfs.load import *
----> 2 from .gtfs.network import *
      3 from .gtfs.gtfsfeeds_dataframe import *
      4 from .gtfs.headways import *
      5 from .osm.load import *

/opt/conda/lib/python3.7/site-packages/urbanaccess/gtfs/network.py in <module>
      5
      6 from urbanaccess.utils import log, df_to_hdf5, hdf5_to_df
----> 7 from urbanaccess.network import ua_network
      8 from urbanaccess import config
      9 from urbanaccess.gtfs.gtfsfeeds_dataframe import gtfsfeeds_dfs

/opt/conda/lib/python3.7/site-packages/urbanaccess/network.py in <module>
      2 from sklearn.neighbors import KDTree
      3 import pandas as pd
----> 4 from geopy.distance import vincenty
      5
      6 from urbanaccess.utils import log, df_to_hdf5, hdf5_to_df

ImportError: cannot import name 'vincenty' from 'geopy.distance' (/opt/conda/lib/python3.7/site-packages/geopy/distance.py)

In [2]: import geopy

In [3]: geopy.__version__
Out[3]: '2.0.0'

In [4]:

Might be an easy fix to update to the recommended geodesic distance?
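For reference, geopy's documented replacement for `vincenty` is `geopy.distance.geodesic`, so the fix may be a one-line import swap. Alternatively, a dependency-free great-circle helper would remove the geopy requirement entirely; a sketch:

```python
import math

def great_circle_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters via the haversine formula.

    Assumes a spherical Earth (R = 6,371 km), which stays within
    roughly half a percent of the ellipsoidal (vincenty/geodesic)
    result -- likely fine for nearest-node snapping.
    """
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Oakland City Hall to the SF Ferry Building, roughly 13 km
d = great_circle_m(37.8044, -122.2712, 37.7749, -122.4194)
```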

Unnecessary merge in _stops_agencyid()?

In _stops_agencyid, a merge is made between one dataframe and another that is only one column wide. Since that column already exists in the left dataframe of the merge, this step should not be needed (or a second column that should be there is missing!).

Happy to submit a PR to fix but wanted to check first as to what the intent was with this merge.

Location: https://github.com/UDST/urbanaccess/blob/73fcab917a3c044a5360836237910fcdd81da05e/urbanaccess/gtfs/utils_format.py#L300

This happens in a few of these helper functions, as well, such as:
https://github.com/UDST/urbanaccess/blob/73fcab917a3c044a5360836237910fcdd81da05e/urbanaccess/gtfs/utils_format.py#L241
in _calendar_agencyid

Which makes me wonder whether the intent is to account for/capture column values that are null after the left join? If so, I assume that is the purpose of the null check and update here:
https://github.com/UDST/urbanaccess/blob/73fcab917a3c044a5360836237910fcdd81da05e/urbanaccess/gtfs/utils_format.py#L478

Two feeds not passed into network

I am trying to create a network with all train lines in Spain, but two feeds are not passed into the network: they download properly, but when I create the network they are not included. The two GTFS feeds giving me problems are the following.

GTFS feeds

https://ssl.renfe.com/ftransit/Fichero_CER_FOMENTO/fomento_transit.zip

https://www.fgc.cat/google/google_transit.zip

Environment

  • Operating system: Windows 10

  • Python version: 3.7.8

  • UrbanAccess version: 0.2.2

The code I'm running is found here

https://github.com/marcbosch-idencity/urbanaccess-example/blob/main/longitudes_frecuencias_trenes.ipynb

Here is a specific script only for one of the GTFS feeds giving me problems.

https://github.com/marcbosch-idencity/urbanaccess-example/blob/main/red_Cercanias_feve.ipynb

When I run this line

ua.gtfs.network.create_transit_net(gtfsfeeds_dfs=loaded_feeds,
                                   day='tuesday',
                                   timerange=['00:00:00', '23:59:59'],
                                   calendar_dates_lookup=None)

I get the following error

WARNING: Time range passed: ['00:00:00', '23:59:59'] is a 23 hour period. Long periods over 3 hours may take a significant amount of time to process.
Using calendar to extract service_ids to select trips.
48 service_ids were extracted from calendar
14,057 trip(s) 14.47 percent of 97,135 total trip records were found in calendar for GTFS feed(s): ['cercanias']
NOTE: If you expected more trips to have been extracted and your GTFS feed(s) have a calendar_dates file, consider utilizing the calendar_dates_lookup parameter in order to add additional trips based on information inside of calendar_dates. This should only be done if you know the corresponding GTFS feed is using calendar_dates instead of calendar to specify service_ids. When in doubt do not use the calendar_dates_lookup parameter.
14,057 of 97,135 total trips were extracted representing calendar day: tuesday. Took 0.09 seconds
There are no departure time records missing from trips following the specified schedule. There are no records to interpolate.
Difference between stop times has been successfully calculated. Took 0.00 seconds

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-8-356b49904f86> in <module>
      2                                    day='tuesday',
      3                                    timerange=['00:00:00', '23:59:59'],
----> 4                                    calendar_dates_lookup=None)

C:\ProgramData\Anaconda3\envs\gds\lib\site-packages\urbanaccess\gtfs\network.py in create_transit_net(gtfsfeeds_dfs, day, timerange, calendar_dates_lookup, overwrite_existing_stop_times_int, use_existing_stop_times_int, save_processed_gtfs, save_dir, save_filename)
    156         df=gtfsfeeds_dfs.stop_times_int,
    157         starttime=timerange[0],
--> 158         endtime=timerange[1])
    159 
    160     final_edge_table = _format_transit_net_edge(

C:\ProgramData\Anaconda3\envs\gds\lib\site-packages\urbanaccess\gtfs\network.py in _time_selector(df, starttime, endtime)
    709         '.2f} seconds'.format(
    710             starttime, endtime, len(selected_stop_timesdf), len(df),
--> 711             (len(selected_stop_timesdf) / len(df)) * 100,
    712             time.time() - start_time))
    713 

ZeroDivisionError: division by zero


When running the notebook with all feeds, the script does not return any errors; it just does not include the stops from the two 'problematic' feeds in the network.
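The immediate crash is the unguarded percentage in `_time_selector`'s log line when the selected stop times table comes back empty; a defensive sketch of that calculation:

```python
def pct_selected(n_selected, n_total):
    """Share of stop_time records selected, safe for empty tables."""
    if n_total == 0:
        return 0.0  # avoid ZeroDivisionError when no records survive
    return (n_selected / n_total) * 100.0
```

Guarding the log line would at least surface the real problem (no stop times matched) instead of a ZeroDivisionError.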

Multiple networks in memory at once

I'm having trouble loading more than one network into memory at the same time using urbanaccess.load_network(). Earlier network variables end up pointing to the most recently loaded network:

(screenshot omitted)

Diagnosis

UrbanAccess maintains a record of the current network, since most of the time users are only working with one at a time. It looks like the problem might be with how load_network() updates it: it's overwriting the nodes and edges rather than replacing the object. I'll see if I can fix this. network.py#L539-L542

Environment

actransit_with_headways.h5
actransit_bart_with_headways.h5

MacOS 10.14
Python 3.8
UrbanAccess 0.2.1
Pandas 1.1
Jupyter 1.0

Mean wait time vs. mean headway

When UrbanAccess creates connector edges for agents moving from the base network to the transit network, the default impedance is the mean headway at a given transit stop.

This happens in urbanaccess.integrate_network(), and current options are mean, std, min, and max, defined here: network.py#L103.

I'd like to propose adding an option for half of the mean, which would be a better approximation for the average expected wait time. It's also possible to achieve this through post-processing of the integrated network, but automatic calculation of it might be nice!

This is related to issues #36 and #51, I think.
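Until a built-in option exists, the post-processing could be a one-liner on the integrated edge table. A sketch (the `net_type` values used to identify connector edges here are assumptions about the integrated network's schema):

```python
import pandas as pd

def halve_connector_weights(edges,
                            connector_types=('transit to osm',
                                             'osm to transit')):
    """Approximate expected wait time as half the mean headway by
    halving the weight on transit/walk connector edges."""
    out = edges.copy()
    mask = out['net_type'].isin(connector_types)
    out.loc[mask, 'weight'] = out.loc[mask, 'weight'] / 2.0
    return out
```

This assumes headways are roughly uniform; for irregular schedules, half the mean headway is only a first-order approximation of expected wait.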

GTFS feed won't load to df (calendar_dates file instead of calendar file)

Hi,

My name is Martijn Verhoeven and I work for the City of Amsterdam. I am currently trying to apply the Bay Area demo to Amsterdam with a GTFS feed from our local transit authority, GVB. However, when trying to load the feed into a dataframe it gives the following error:

'ValueError: calendar.txt is a required GTFS text file and was not found in folder'

The feed does have a calendar_dates.txt file, however, and I don't know how to apply the calendar_dates_lookup parameter when loading a feed into a dataframe.

Do you have any suggestions or python scripts to sort this out?

Kind regards,

Martijn Verhoeven

stop_times_int is never calculated

I'm not certain this is the case, but I'm a bit confused: it appears that although stop_times_int is instantiated in the class's init, it is never updated.

Specifically, in urbanaccess.gtfs.load.gtfsfeed_to_df, in load.py, the following attributes are updated:

    # set gtfsfeeds_df object to merged GTFS dfs
    gtfsfeeds_df.stops = merged_stops_df
    gtfsfeeds_df.routes = merged_routes_df
    gtfsfeeds_df.trips = merged_trips_df
    gtfsfeeds_df.stop_times = merged_stop_times_df
    gtfsfeeds_df.calendar = merged_calendar_df
    gtfsfeeds_df.calendar_dates = merged_calendar_dates_df

That is, all but stop_times_int. Yet, subsequent steps, such as urbanaccess.gtfs.headways.headways check to ensure that stop_times_int is not False or .empty(). I suspect there is an intermediary step that is missing (?).

Analysis with defined transit service area (limited walk distance to transit stops)

Description of the bug

I stepped through the demo notebook using our own transit feed and walk network. The code worked well, but on inspecting the results it looks like places far away from the transit network also got significant job counts.

My understanding is that after combining the transit and walk networks, a Pandana query will search for jobs regardless of network type. So a transit accessibility analysis could end up including travel by walk + transit, transit only, and walk only. But in general, a transit accessibility study targets people living within a certain distance of transit service, so places beyond a reasonable walking distance/time to transit should be filtered out of the analysis, and transit should be one of the major/necessary modes in the trip.

This issue is similar to issue #38 but for a different reason. Also, after filtering out far-away walk network nodes, the indicator locations need to be filtered as well (get_node_ids with distance limits).

Let me know if anything is wrong with my analysis or interpretation.
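One post-processing sketch, assuming you have first computed a `dist_to_stop` value for every walk node (e.g. via a `nearest_pois` query against the transit stops); the column name and the 800 m threshold are assumptions:

```python
import pandas as pd

def clip_to_service_area(nodes, access, max_walk_m=800.0):
    """Drop walk nodes (and their accessibility results) farther than
    max_walk_m from the nearest transit stop.

    nodes and access must share an index of node ids; nodes needs a
    precomputed 'dist_to_stop' column in meters.
    """
    keep = nodes.index[nodes['dist_to_stop'] <= max_walk_m]
    return nodes.loc[keep], access.loc[keep]
```

This only trims the reported results; it does not prevent walk-only paths inside the query itself, which would require restricting the graph before running Pandana.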

Environment

  • Operating system:

  • Python version:

  • UrbanAccess version:

  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

# place code here

Paste the error message (if applicable):

# place error message here

Minor problems with automatic installation of dependencies

I'm running into a couple of problems trying to run python setup.py develop in a clean Python 2.7 conda environment (Mac OS 10.14). (Pip install works fine.)

  1. One problem has to do with dependencies whose newest versions no longer support Python 2.7. For example, if matplotlib isn't installed yet, I get this error:

    error: Setup script exited with 
    Matplotlib 3.0+ does not support Python 2.x, 3.0, 3.1, 3.2, 3.3, or 3.4.
    Beginning with Matplotlib 3.0, Python 3.5 and above is required.
    

    Adding 'matplotlib < 3.0' to setup.py resolves this.

  2. The second problem is something with osmnet that I'm not sure how to diagnose. After fixing the matplotlib error, I get this one:

    ValueError: invalid wheel name: 'osmnet-0.1.0-py2.py3-any.whl'
    

    Manually installing osmnet allows python setup.py develop to run properly for urbanaccess, but I'm not sure what the general fix is.

OSM download function broken

Description of the bug

Hello everyone,

I am trying to follow the urbanaccess example code, but it returns an error at the OSM data download part, something related to geopandas...

GTFS feed or OSM data (optional)

GTFS data is for Belgian transit agency De Lijn. OSM bounding box is 4.6109,50.5534,6.1737,51.5273 (Basically Belgian Limburg and Dutch Limburg, with Eindhoven, Aachen, and Liege included.)

Environment

  • Operating system: Windows 10

  • Python version: 3.9.13

  • UrbanAccess version: 0.2.2

  • UrbanAccess required packages versions (optional): requests 2.28.1 / six 1.16.0 / pandas 1.4.4 / geopandas 0.12.2 / numpy 1.21.5 / osmnet 0.1.6 / pandana 0.6.1 / matplotlib 3.5.2 / geopy 2.3.0 / pyyaml 6.0 / scikit-learn 1.0.2
    (I just installed it using conda install urbanaccess -c conda-forge)

Paste the code that reproduces the issue here:

nodes, edges = urbanaccess.osm.load.ua_network_from_bbox(bbox=bbox,
                                                remove_lcn=True)

Everything else is exactly the same as the example code, except:

  1. Loading GTFS from disk instead of downloading it first. Downloading GTFS instead of loading from my PC did not change the error.
validation = True
verbose = True
remove_stops_outsidebbox = True
append_definitions = True
# bbox for Euregio
bbox = (4.6109,50.5534,6.1737,51.5273)
# Change directory to change scenarios
loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path="D:\Data\GTFS",
                                           validation=validation,
                                           verbose=verbose,
                                           bbox=bbox,
                                           remove_stops_outsidebbox=remove_stops_outsidebbox,
                                           append_definitions=append_definitions)
  2. Converted the date column from int64 to object. Removing this didn't change the error.
loaded_feeds.calendar_dates=loaded_feeds.calendar_dates.astype({'date':'object'})
loaded_feeds.calendar_dates['date']=loaded_feeds.calendar_dates['date'].apply(lambda x: str(x))
print (loaded_feeds.calendar_dates.dtypes)

loaded_feeds.calendar_dates.head()

ua.gtfs.network.create_transit_net(gtfsfeeds_dfs=loaded_feeds,
                                   day='thursday',
                                   timerange=['07:30:00', '09:30:00'],
                                   calendar_dates_lookup={'date':'20211007'})

Paste the error message (if applicable):

C:\Anaconda\lib\site-packages\geopandas\geodataframe.py:202: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
   super().__setattr__(attr, val)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Anaconda\lib\site-packages\geopandas\geodataframe.py in crs(self)
    430         try:
--> 431             return self.geometry.crs
    432         except AttributeError:

C:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5574             return self[name]
-> 5575         return object.__getattribute__(self, name)
   5576 

C:\Anaconda\lib\site-packages\geopandas\geodataframe.py in _get_geometry(self)
    231 
--> 232             raise AttributeError(msg)
    233         return self[self._geometry_column_name]

AttributeError: You are calling a geospatial method on the GeoDataFrame, but the active geometry column ('geometry') is not present. 
There are no existing columns with geometry data type. You can add a geometry column as the active geometry column with df.set_geometry. 

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
C:\Anaconda\lib\site-packages\pandas\core\generic.py in __setattr__(self, name, value)
   5599             try:
-> 5600                 existing = getattr(self, name)
   5601                 if isinstance(existing, Index):

C:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5574             return self[name]
-> 5575         return object.__getattribute__(self, name)
   5576 

C:\Anaconda\lib\site-packages\geopandas\geodataframe.py in crs(self)
    432         except AttributeError:
--> 433             raise AttributeError(
    434                 "The CRS attribute of a GeoDataFrame without an active "

AttributeError: The CRS attribute of a GeoDataFrame without an active geometry column is not defined. Use GeoDataFrame.set_geometry to set the active geometry column.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_19732\129358322.py in <module>
----> 1 nodes, edges = ua.osm.load.ua_network_from_bbox(bbox=bbox,
      2                                                 remove_lcn=True)

C:\Anaconda\lib\site-packages\urbanaccess\osm\load.py in ua_network_from_bbox(lat_min, lng_min, lat_max, lng_max, bbox, network_type, timeout, memory, max_query_area_size, remove_lcn)
     74     two_way = False
     75 
---> 76     nodes, edges = network_from_bbox(lat_min=lat_min, lng_min=lng_min,
     77                                      lat_max=lat_max, lng_max=lng_max,
     78                                      bbox=bbox, network_type=network_type,

C:\Anaconda\lib\site-packages\osmnet\load.py in network_from_bbox(lat_min, lng_min, lat_max, lng_max, bbox, network_type, two_way, timeout, memory, max_query_area_size, custom_osm_filter)
    844         'lat_min, lng_min, lat_max, and lng_max must be floats'
    845 
--> 846     nodes, ways, waynodes = ways_in_bbox(
    847         lat_min=lat_min, lng_min=lng_min, lat_max=lat_max, lng_max=lng_max,
    848         network_type=network_type, timeout=timeout,

C:\Anaconda\lib\site-packages\osmnet\load.py in ways_in_bbox(lat_min, lng_min, lat_max, lng_max, network_type, timeout, memory, max_query_area_size, custom_osm_filter)
    648     """
    649     return parse_network_osm_query(
--> 650         osm_net_download(lat_max=lat_max, lat_min=lat_min, lng_min=lng_min,
    651                          lng_max=lng_max, network_type=network_type,
    652                          timeout=timeout, memory=memory,

C:\Anaconda\lib\site-packages\osmnet\load.py in osm_net_download(lat_min, lng_min, lat_max, lng_max, network_type, timeout, memory, max_query_area_size, custom_osm_filter)
    135     polygon = Polygon([(lng_max, lat_min), (lng_min, lat_min),
    136                        (lng_min, lat_max), (lng_max, lat_max)])
--> 137     geometry_proj, crs_proj = project_geometry(polygon,
    138                                                crs={'init': 'epsg:4326'})
    139 

C:\Anaconda\lib\site-packages\osmnet\load.py in project_geometry(geometry, crs, to_latlong)
    443     """
    444     gdf = gpd.GeoDataFrame()
--> 445     gdf.crs = crs
    446     gdf.name = 'geometry to project'
    447     gdf['geometry'] = None

C:\Anaconda\lib\site-packages\geopandas\geodataframe.py in __setattr__(self, attr, val)
    200             object.__setattr__(self, attr, val)
    201         else:
--> 202             super().__setattr__(attr, val)
    203 
    204     def _get_geometry(self):

C:\Anaconda\lib\site-packages\pandas\core\generic.py in __setattr__(self, name, value)
   5614                         stacklevel=find_stack_level(),
   5615                     )
-> 5616                 object.__setattr__(self, name, value)
   5617 
   5618     @final

C:\Anaconda\lib\site-packages\geopandas\geodataframe.py in crs(self, value)
    441         """Sets the value of the crs"""
    442         if self._geometry_column_name not in self:
--> 443             raise ValueError(
    444                 "Assigning CRS to a GeoDataFrame without a geometry column is not "
    445                 "supported. Use GeoDataFrame.set_geometry to set the active "

ValueError: Assigning CRS to a GeoDataFrame without a geometry column is not supported. Use GeoDataFrame.set_geometry to set the active geometry column.

pyrosm for network download

hey folks, thanks again for this library. I was wondering if you'd seen the recently released pyrosm? it's able to parse OSM data much faster than alternative methods, so might be a more performant way to construct networks. Just wanted to raise the possibility of incorporating it here

Top-level agency IDs propagated

Put an agency ID in the GTFS collection dataframe; it would then have an additional .agency frame that keeps the hash.

So at the very beginning, when we start working on a new GTFS feed, we will need to run generate_name; it can take the agency.txt file and whatever we have there and try to generate unique names, with hashes as well (in case there is a naming collision). With these results, we will always refer to the resultant names in the rest of the cleaning process.

ua.network.integrate_network drops transit_nodes.stop_names

Description of the bug

Calling ua.network.integrate_network() with downloaded OSM network data drops the network's transit_nodes stop_names.

GTFS feed or OSM data (optional)

While I think this issue is pandas-only, for the sake of testing, here's what I used:
bbox=(-72.720566,42.221907,-72.464134,42.441194)
{'gtfs_feeds': {'PVTA': 'http://www.gtfs-data-exchange.com/agency/pvta/latest.zip'}}

Environment

  • Operating system:
    Ubuntu 16.04
  • Python version:
    2.7
  • UrbanAccess version:
    0.1a
  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

# ripped nearly exactly from the demo
import urbanaccess as ua
from urbanaccess.config import settings
from urbanaccess.gtfsfeeds import feeds
from urbanaccess import gtfsfeeds
from urbanaccess.gtfs.gtfsfeeds_dataframe import gtfsfeeds_dfs
from urbanaccess.network import ua_network, load_network

settings.log_console = True
# main 
gtfsfeeds.search(search_text='Pioneer Valley',
                 search_field=None,
                 match='contains',
                 add_feed=True,
                 overwrite_feed=True)
gtfsfeeds.download()
validation = True
verbose = True
append_definitions = True
bbox=(-72.720566,42.221907,-72.464134,42.441194)

loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(
    gtfsfeed_path=None,
    validation=validation,
    bbox=bbox,
    remove_stops_outsidebbox=True,
    verbose=verbose,
    append_definitions=append_definitions
)

ua.gtfs.network.create_transit_net(
    gtfsfeeds_dfs=loaded_feeds,
    day='monday',
    timerange=['07:00:00', '10:00:00'],
    calendar_dates_lookup=None
)
urbanaccess_net = ua.network.ua_network
nodes, edges = ua.osm.load.ua_network_from_bbox(
    bbox=bbox,
    remove_lcn=True
)
ua.osm.network.create_osm_net(
    osm_edges=edges,
    osm_nodes=nodes,
    travel_speed_mph=3
)
ua.gtfs.headways.headways(
    gtfsfeeds_df=loaded_feeds,
    headway_timerange=['07:00:00','10:00:00']
)
ua.network.integrate_network(
    urbanaccess_network=urbanaccess_net,
    headways=True,
    urbanaccess_gtfsfeeds_df=loaded_feeds,
    headway_statistic='mean'
)

Commenting out ua.network.integrate_network(...) retains the stop_names in urbanaccess_net.transit_nodes. loaded_feeds also retains stop_names.
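Until the underlying bug is fixed, a workaround sketch is to snapshot the column before integrating and reattach it afterwards. This is generic pandas, not an UrbanAccess API; the column name is whatever your transit nodes carry:

```python
import pandas as pd

def reattach_column(before, after, col):
    """Copy a column that an in-place rebuild dropped, matching on index.

    Rows present only in `after` (e.g. OSM nodes added during
    integration) get NaN for `col`.
    """
    out = after.copy()
    out[col] = before[col].reindex(out.index)
    return out
```

Usage would be: keep a copy of transit_nodes before calling integrate_network(), then reattach the name column to the integrated nodes table.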


Trying to read utf-8 file on cp1252 system

Description of the bug

Trying to read a UTF-8 file on a cp1252 system fails in _txt_header_whitespace_check.

As far as I know, Python uses the system's default encoding when reading files, leading to errors like this; the correct encoding needs to be passed as a parameter. It would be nice if we could pass it from UrbanAccess itself.
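A sketch of what the fix inside `_txt_header_whitespace_check` might look like: open the file with an explicit encoding instead of the platform default. The `encoding` parameter and its `utf-8-sig` default are assumptions about how UrbanAccess could expose this:

```python
import re

def strip_header_whitespace(path, encoding='utf-8-sig'):
    """Remove whitespace from a GTFS file's header row, reading and
    writing with an explicit encoding rather than the OS default
    (cp1252 on many Windows systems)."""
    with open(path, encoding=encoding) as f:
        lines = f.readlines()
    lines[0] = re.sub(r'\s+', '', lines[0]) + '\n'
    with open(path, 'w', encoding='utf-8') as f:
        f.writelines(lines)
```

The `utf-8-sig` codec also strips a leading BOM, which covers the common case of Windows-exported GTFS text files.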

Environment

  • Operating system:
    Windows 10
  • Python version:
    3.8.5
  • UrbanAccess version:
    0.2.1

Paste the code that reproduces the issue here:

import urbanaccess as ua
loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(
    gtfsfeed_path=gtfsfeeds,  # directory holding the downloaded GTFS feeds
    validation=True,
    bbox=bbox,
    remove_stops_outsidebbox=True,
    append_definitions=True
)

Paste the error message (if applicable):

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
c:\Users\arqui\Documents\Repositorios\urbanaccess-poa\urban.py in 
      27 append_definitions = True
     28 
---> 29 loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(
     30     gtfsfeed_path=gtfsfeeds,
     31     validation=validation,

~\anaconda3\envs\geo\lib\site-packages\urbanaccess\gtfs\load.py in gtfsfeed_to_df(gtfsfeed_path, validation, verbose, bbox, remove_stops_outsidebbox, append_definitions)
    220                 'must be specified for validation.')
    221 
--> 222     _standardize_txt(csv_rootpath=gtfsfeed_path)
    223 
    224     folderlist = [foldername for foldername in os.listdir(gtfsfeed_path) if

~\anaconda3\envs\geo\lib\site-packages\urbanaccess\gtfs\load.py in _standardize_txt(csv_rootpath)
     35     if six.PY2:
     36         _txt_encoder_check(gtfsfiles_to_use, csv_rootpath)
---> 37     _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
     38 
     39 

~\anaconda3\envs\geo\lib\site-packages\urbanaccess\gtfs\load.py in _txt_header_whitespace_check(gtfsfiles_to_use, csv_rootpath)
    127                 # Read from file
    128                 with open(os.path.join(csv_rootpath, folder, textfile)) as f:
--> 129                     lines = f.readlines()
    130                 lines[0] = re.sub(r'\s+', '', lines[0]) + '\n'
    131                 # Write to file

~\anaconda3\envs\geo\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1570: character maps to <undefined>

Saving network to shapefile


Description of the bug

Any way to save an integrated urban access network with pandana to shapefile?

GTFS feed or OSM data (optional)

If the issue is related to a specific GTFS feed or OSM data please provide the URL to download the GTFS feed or the bounding box used to extract the OSM data.

https://github.com/UDST/urbanaccess/blob/dev/demo/simple_example.ipynb

Environment

  • Operating system:

  • Python version:

  • UrbanAccess version:

  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

# place code here

Paste the error message (if applicable):

# place error message here

Edit to variable name in documentation, headways() positional args

In both the code and the documentation for urbanaccess.gtfs.load.gtfsfeed_to_df, the return variable is called gtfsfeeds_df.

This is a bit confusing, as the _df suffix typically indicates that a variable is a DataFrame. In this case it is not: it is a dictionary containing DataFrames. Perhaps a better name would be gtfsfeeds_dict or gtfsfeeds_dfs_dict.

This naming convention could also be carried over to urbanaccess.gtfs.headways.headways, where the parameter gtfsfeeds_df could be similarly renamed.

11:10 AM PST - Edit: In the above I refer to the return value as a dictionary, when in fact it is an object instantiated from the class described in this file. That said, some rename would still be very helpful in clarifying that it is not a DataFrame.

Routing that's sensitive to departure time

UrbanAccess currently creates static networks representing aggregate transit service over a given time window. If a transit route runs during the window, it will be available, and the wait time for all agents will be based on the average headway.
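As a toy illustration of the wait-time component (not UrbanAccess internals): under random passenger arrivals, expected wait is commonly approximated as half the average headway, computed from the scheduled departures in the analysis window.

```python
import statistics

def mean_wait_minutes(departures):
    """Approximate expected stop wait time from scheduled departure times
    (minutes). Assumes passengers arrive uniformly at random, so expected
    wait is half the mean headway between consecutive departures."""
    deps = sorted(departures)
    headways = [b - a for a, b in zip(deps, deps[1:])]
    return statistics.mean(headways) / 2

# e.g. a route departing every 15 minutes -> ~7.5 minute expected wait
```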

How hard would it be to represent each scheduled bus or train, and route agents onto specific vehicles based on their departure time? I think this is a feature request that comes up from time to time, although it's more relevant for travel modeling than for accessibility calculations.

This issue writes up some notes about potential approaches, for future reference. Tagging issue #51 as related to this.

Pandana improvements

First, as of Pandana v0.5 it's now much easier to calculate and inspect point-to-point routings (see Pandana-demo.ipynb). This makes it easier to understand what's happening in UrbanAccess networks: which transit lines are used for which trips, how long the transit and non-transit portions take, etc.

This provides more information than has previously been available, and might actually make departure-time-sensitive routing less important.

Helpful paper

I recently came across this paper about using contraction hierarchies with transit networks, which is helpful for understanding what's feasible using UrbanAccess + Pandana. (This is the algorithm that Pandana uses for fast shortest-path calculations.)

https://i11www.iti.kit.edu/_media/teaching/theses/da-wirth-15.pdf

Earliest arrival queries

Drawing from that paper, one way to place trips onto individual vehicles (which the author calls “earliest arrival queries”) is to put this info into the graph itself — each departure has its own nodes and edges that overlap in physical space but are connected based on the GTFS schedule (section 3).

This gets impractical for large time windows because of graph complexity, but it might work well for travel-modeling use cases. The big catch is that it’s hard to capture incoming connections from the street network without also duplicating the street nodes — because an agent's starting point needs to capture time as well as location (section 5).
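A toy sketch of the time-expanded idea described above (illustrative only, not UrbanAccess code): each (stop, departure time) pair becomes its own node, ride edges follow a trip's consecutive stop-times, and waiting edges chain successive departures at the same stop. The `trips` input is a simplified stand-in for GTFS stop_times.

```python
from collections import defaultdict

def build_time_expanded_edges(trips):
    """Build (node, node, weight) edges for a time-expanded transit graph.

    `trips` is a list of trips, each a list of (stop_id, time_in_minutes)
    tuples in stop sequence order.
    """
    edges = []
    # ride edges: consecutive stop-times within a trip, weighted by ride time
    for stop_times in trips:
        for (s1, t1), (s2, t2) in zip(stop_times, stop_times[1:]):
            edges.append(((s1, t1), (s2, t2), t2 - t1))
    # waiting edges: at each stop, link earlier departures to later ones
    by_stop = defaultdict(list)
    for stop_times in trips:
        for s, t in stop_times:
            by_stop[s].append(t)
    for s, times in by_stop.items():
        ordered = sorted(set(times))
        for t1, t2 in zip(ordered, ordered[1:]):
            edges.append(((s, t1), (s, t2), t2 - t1))
    return edges
```

The graph-size problem is visible even here: every extra departure adds nodes and edges, which is why large time windows blow up.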

Implementing this in UrbanAccess

I think our presumption has been that contraction hierarchies aren't compatible with routing that's sensitive to departure time. It seems like this isn't necessarily true.

It's probably feasible to adapt UrbanAccess to create networks that can route agents onto specific vehicles based on their departure time. But the catch is that we'd have to be strict about delineating catchment areas for trip origins and station transfers, which may be complex to implement and would also make the routings themselves less useful.

Before doing this, I think we'd want to explore using Pandana's new route-inspection functionality to generate metrics that might be able to stand in for more detailed trip assignment.

Needs updating to assure Pandas v.0.23 compatibility

Description of the bug

Pandas v0.23.0 (released on May 15, 2018) deprecated DataFrame.as_matrix(), which is used when creating the integrated transit and pedestrian network.

It is called in _nearest_neighbor(df1, df2) within network.py.

It should be replaced by DataFrame.values.
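A minimal illustration of the swap; `.values` (or `.to_numpy()`, the method pandas has recommended since v0.24) returns the same NumPy array that `.as_matrix()` did.

```python
import numpy as np
import pandas as pd

# coordinate columns like those _nearest_neighbor() pulls from the node tables
df = pd.DataFrame({"x": [0.0, 1.0, 2.0], "y": [0.0, 1.0, 4.0]})

coords = df[["x", "y"]].values          # replaces deprecated df.as_matrix()
coords_alt = df[["x", "y"]].to_numpy()  # preferred since pandas 0.24

assert isinstance(coords, np.ndarray)
assert np.array_equal(coords, coords_alt)
```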

GTFS feed or OSM data (optional)

If the issue is related to a specific GTFS feed or OSM data please provide the URL to download the GTFS feed or the bounding box used to extract the OSM data.

Environment

  • Operating system: MacOS

  • Python version: 3.7.0

  • UrbanAccess version: 0.2.0

  • UrbanAccess required packages versions (optional):

Paste the code that reproduces the issue here:

ua.network.integrate_network(urbanaccess_network=urbanaccess_net, headways=False)

Paste the error message (if applicable):

# AttributeError: 'DataFrame' object has no attribute 'as_matrix'
