rsginc / bca4abm

Benefit Cost Analysis for Travel Demand Models
Home Page: http://rsginc.github.io/bca4abm/
License: Other
All the model step settings should be in the model step yaml files. For example, move aggregate_zone_file_names from settings.yaml to aggregate_zone.yaml.
I think we should remove the docs and tutorials folder. The tutorial could move to @toliwaga's account, a separate branch, devtools, or the doc folder.
like ActivitySim - https://activitysim.github.io/activitysim/
We should update the ABM aggregate_trips_processor (and therefore aggregate_data_manifest.csv) to allow for a more flexible set of inputs like the four_step aggregate_od processor.
The activitysim convention of prefixing expressions with an at sign (@) to indicate that they should be evaluated with a python eval rather than a pandas eval is a little inconvenient if you maintain your csv expression files in Excel, since those statements are interpreted as formulas. We might want a different way of tagging these expressions that plays better with Excel.
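For context, a minimal sketch of the dispatch convention described above (the function name and signature here are illustrative, not the actual activitysim API): expressions starting with `@` go through python's `eval`, everything else through `DataFrame.eval`.

```python
import pandas as pd

def eval_expression(expr, df, locals_d=None):
    # Hypothetical sketch of the @-prefix convention: expressions starting
    # with '@' are evaluated with python eval (so they can call arbitrary
    # methods); everything else goes through the faster pandas eval.
    if expr.startswith('@'):
        return eval(expr[1:], globals(), dict(locals_d or {}, df=df))
    return df.eval(expr)

df = pd.DataFrame({'time': [10, 20], 'cost': [1.0, 2.0]})
pandas_result = eval_expression('time * cost', df)          # pandas eval
python_result = eval_expression('@df.time.clip(0, 15)', df)  # python eval
```

An Excel-friendly alternative tag would only need to change the `startswith` test, e.g. a `py:` prefix instead of `@`.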
In example_4step/configs/settings.yaml, locals_OD_aggregate: should be locals_aggregate_od:
Update default crash cost to a more reasonable value to avoid confusion
assignment_expressions is now a DataFrame, not a Series
Daysim writes tsv by default. Please expose the pandas read_csv sep argument in the settings.yaml file:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
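A minimal sketch of what exposing this might look like (the `csv_sep` settings key is a hypothetical name, not an existing bca4abm setting):

```python
import io
import pandas as pd

# Hypothetical settings dict as parsed from settings.yaml; a csv_sep key
# would let users read Daysim's tab-separated output without pre-conversion.
settings = {'persons_file_name': 'persons.tsv', 'csv_sep': '\t'}

# stand-in for the actual file on disk
tsv_data = io.StringIO("hhno\tpno\tpdpurp\n1\t1\t1\n1\t2\t2\n")
persons = pd.read_csv(tsv_data, sep=settings.get('csv_sep', ','))
print(persons.columns.tolist())  # → ['hhno', 'pno', 'pdpurp']
```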
Often many build scenarios are run against a single base scenario. Should we copy the base folder input data each time, or just add support for multiple build folders? Or maybe just a base and build input and outputs folder location in the settings file?
What is the copyright/ownership status of the existing sandag bca tool - can I quote/excerpt from the MSSQL stored procedures in the bca4abm source or documentation?
For the aggregate_zone processor, we will allow the analyst to specify a list of csv files that should be combined into a single table.
I am wondering whether to include the 1024 column cval table, and whether that needs to be special-cased. All the other zone files will have two versions, one for the build and one for the base scenario. I was thinking we could handle that by automatically prepending base_ or build_ to the column names, but we probably don’t want to create two versions of the 1024 cval columns (unless they can be different in build and base? In which case we also need two versions in the aggregate demographics processor?)
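The prefixing idea above can be sketched in a few lines with pandas (the column and file names here are made up for illustration):

```python
import pandas as pd

# Sketch of the proposed convention: the same zone file is read from the
# base and build folders, and columns are prefixed so both versions can
# live side by side in a single combined zone table.
base = pd.DataFrame({'zone': [1, 2], 'emp': [100, 200]}).set_index('zone')
build = pd.DataFrame({'zone': [1, 2], 'emp': [110, 190]}).set_index('zone')

zones = base.add_prefix('base_').join(build.add_prefix('build_'))
print(zones.columns.tolist())  # → ['base_emp', 'build_emp']
```

A scenario-invariant table like cval could simply be joined in once, unprefixed.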
I thought there was supposed to be a different vot for commute and non-commute tours?
tests fail with orca 1.4.0
Which of the various daysim purpose category codes can appear as tour purposes (pdpurp)?
0 'home'
1 'work'
2 'school'
3 'escort'
4 'pers.bus'
5 'shop'
6 'meal'
7 'social/recreation'
10 'change mode (park and ride)'
The sandag bca tool carries a very large number of columns into the multiyear processor and final report. This greatly increases its complexity, and reduces its flexibility.
The more parsimonious we are in this area, the easier it will be for the tool to be adapted to different abm data sources. Also it will make it more flexible in terms of experimental modifications to the bca calculations.
My intent is to start by implementing the bare minimum and we can debate the tradeoffs inherent in adding more detailed reporting as we move forward.
Scream if you have a problem with this approach.
it would be good to speed up the slow processors - most likely the od processor. We can test with the full Oregon Metro example data set and set of expressions.
Need to make sure bca4abm works for Python 3 (and update all related materials as well). Updating ActivitySim to work for both 2 and 3 wasn't a big deal, so updating bca4abm should be relatively straightforward.
@toliwaga and I decided on the following terminology and model setup.
The BCA tool will expect the user to provide a base folder location and a build folder location. The two folders will contain the same input files.
The four sets of alternatives are named as follows: within the code and in the exposed expressions, alt and altlos are used.
For the four step example, aggregate_results does not output anything. The bug appears to be in add_aggregate_results().
cval is Metro specific language, so we should change this to something more generic like hhs. See cval_file_name: mf.cval.csv in https://github.com/RSGInc/bca4abm/blob/master/example_4step/configs/tables.yaml for example.
Hardwired travis to use toolz=0.8.0 because 0.8.2 is not in the cache. We should check back and remove this pin when it is fixed.
The sandag bca tool takes scenario runs for multiple years and does some sort of interpolation/extrapolation between the years. Do we need to support anything like that? And if so, what is the spec for which columns are to be handled in what ways?
The sandag bca tool weights ovt differently for vot calculations but ovt isn't broken out for transit trips in bcatest6 sample db
Are we just going to ignore this for the initial version?
We need to be able to read multiple link files and sum them up to daily totals.
Sometimes we need to read a zone vector into the od processors as well. For example, parking cost at the destination. We should add the ability to also read in the zone data and specify if the zone vector is replicated to a full matrix by row or by column. For now you can pre-process the zone vector to create a matrix.
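The row/column replication described above is a one-liner with numpy broadcasting; a sketch, with a hypothetical per-zone parking cost vector:

```python
import numpy as np

# Replicate a per-zone vector (e.g. parking cost at the destination) to a
# full zone-by-zone matrix, either constant down each column (varies by
# destination) or constant across each row (varies by origin).
parking_cost = np.array([1.0, 2.5, 0.0])  # one value per zone (illustrative)
n = len(parking_cost)

by_destination = np.broadcast_to(parking_cost[np.newaxis, :], (n, n))
by_origin = np.broadcast_to(parking_cost[:, np.newaxis], (n, n))
```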
at least there isn't any in the trip file variables in bcatest6 spreadsheet.
Up until now we have only run this against the trivially small sample database.
We have no idea what the performance will be with a big dataset. It would be nice to find out.
Who is tasked with assembling a full scale dataset for testing?
Why bother with orig and dest taz or microzone ids in the trips file if we already have travel time and cost columns?
Sandag BCA tool maps taxi to auto. I'm looking at daysim TMODEDETP and MODE category codes and wondering how taxi trips are categorized in daysim.
Also wondering whether toll and fare are aggregated into travcost - and whether this is handled the same in CT-RAMP and daysim?
vehicle ownership is by households, but some coc determinations (age) are person-based and some are hh based (income)
to allocate auto ownership by coc, do we use the age of the oldest hh member?
Suppose there is a household with two members aged 35 and 85 and two cars: do we try to be really clever and allocate one car to each hh member? And what if there are three adult hh members and two cars?
The aggregate_od.csv output file label is "link" but should be something like "od". This is the last line in
https://github.com/RSGInc/bca4abm/blob/mce/bca4abm/processors/four_step/aggregate_od.py
The specification that @jfdman provided for SANDAG includes special logic for dealing with the toll matrix since some of the OD pairs have a toll + 10000 to identify it as a special toll. @mabcal suggested using a SCALE column for calculating the auto operating cost since some agencies only use a distance matrix times a scalar. To support these types of customizations in the matrix specification, I think we should add an EXPRESSION column for each matrix read that supports a Python expression. This expression could be applied when the matrix is first read, or could be applied on demand. Exposing this flexibility in the form of a Python expression is the spirit of ActivitySim and would go a long way toward a more generic tool. @toliwaga and I decided to wait and implement this after we get the minimal operating solution up and running.
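A minimal sketch of the proposed EXPRESSION column, not the implemented behavior: the raw matrix is bound to a name (here `m`, an assumption) and the expression is evaluated when the matrix is read. This covers both the SANDAG toll+10000 flag and the distance-times-scalar auto operating cost.

```python
import numpy as np

def read_matrix(raw, expression=None):
    # Hypothetical EXPRESSION column support: a python expression applied
    # to the matrix when it is first read, with the matrix bound to 'm'.
    m = np.asarray(raw, dtype=float)
    if expression:
        m = eval(expression, {'np': np, 'm': m})
    return m

# strip the +10000 special-toll flag from flagged OD pairs
toll = read_matrix([[0, 10002], [3, 0]], 'np.where(m >= 10000, m - 10000, m)')

# auto operating cost as distance times an agency scalar (illustrative value)
auto_cost = read_matrix([[0, 5], [5, 0]], 'm * 0.185')
```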
a link results output table is missing
specifically they could go in transposed summary_results.csv and coc_silos.csv
Link level benefits are currently computed as results = results['base'] - results['build'] but should be results['build'] - results['base']. Line 93 of link.py.
It might be worthwhile to have a load_data_processor that reads the csv data files into an hdf5 store and then the individual processors could read their input from the store (a la activitysim.)
This might be a lot faster, as the load process wold only need to be run if the input data changed, which might be convenient while the model is being initially built, tweaked, and re-run with the same data, but revised specs and settings.
Also, it would make it more flexible because different versions of the load_data_processor could read data from different sources, including reading the data from, say, a MSSQL database.
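A rough sketch of that load step, assuming the optional pytables dependency is available (the function name and table keys are illustrative):

```python
import io
import os
import tempfile
import pandas as pd

def load_data(csv_sources, store_path):
    # Hypothetical load_data_processor: parse each csv input once and write
    # it to an hdf5 store; downstream processors then read from the store
    # instead of re-parsing csv on every run.
    with pd.HDFStore(store_path, mode='w') as store:
        for table_name, source in csv_sources.items():
            store[table_name] = pd.read_csv(source)

path = os.path.join(tempfile.mkdtemp(), 'bca_inputs.h5')
load_data({'trips': io.StringIO("otaz,dtaz\n1,2\n2,1\n")}, path)
trips = pd.read_hdf(path, 'trips')
```

Swapping in a MSSQL-backed loader would only mean replacing the `pd.read_csv` call with `pd.read_sql`.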
We currently write out trace files for 4step as follows:
What we want to do instead for the od processor is write a sum for each origin district to destination district pair. We will code the districts at the TAZ level, probably in the COCs definition file. We will then trace out a FROM DISTRICT, TO DISTRICT, aggregated calculation result for expression 1, aggregated calculation result for expression 2, etc. The output file will look something like this:
| FROM DISTRICT | TO DISTRICT | TRIPS | TIMES | ... |
|---|---|---|---|---|
| zone-group-1 | zone-group-1 | 5678 | 78 | ... |
| zone-group-1 | zone-group-2 | 456 | 34 | ... |
| zone-group-2 | zone-group-1 | 1234 | 234 | ... |
| zone-group-2 | zone-group-2 | 8786 | 222 | ... |
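The district-level aggregation above is a straightforward groupby; a sketch, with a made-up taz-to-district mapping standing in for the COCs definition file:

```python
import pandas as pd

# Map each taz to a district (hypothetically defined in the COCs file),
# then sum the per-OD expression results within each district pair.
taz_district = {1: 'zone-group-1', 2: 'zone-group-1', 3: 'zone-group-2'}
od = pd.DataFrame({'otaz': [1, 1, 3], 'dtaz': [2, 3, 1],
                   'trips': [100, 50, 75], 'times': [10, 20, 15]})

od['from_district'] = od['otaz'].map(taz_district)
od['to_district'] = od['dtaz'].map(taz_district)
summary = od.groupby(['from_district', 'to_district'])[['trips', 'times']].sum()
```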
If the travcost trip file variable is for an exploded person-trip, has the trip travel cost been pro-rated or otherwise allocated to the appropriate persons travelling on that trip?
it will get long for big datasets...
@VinceBernardin said "I think the logic to determine which nodes are intersections would be pretty basic. If the node is connected to a freeway link, it is not an intersection, and if the node is a centroid it is not an intersection. I think that would probably be good enough. Then we would just need to determine the number of legs and the max and min volume approaches which takes a little work, requiring a join between the links and nodes but isn’t really too difficult."
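A rough sketch of that rule, assuming made-up link columns (`a_node`, `b_node`, `freeway`, `volume`) and a centroid set; each link contributes an approach at both of its end nodes:

```python
import pandas as pd

# Illustrative link table; column names are hypothetical, not a real schema.
links = pd.DataFrame({'a_node': [1, 2, 2, 3], 'b_node': [2, 3, 4, 4],
                      'freeway': [False, False, False, True],
                      'volume': [500, 300, 200, 900]})
centroids = {4}

# join links to nodes: stack both link ends so each row is one approach
approaches = pd.concat([links.rename(columns={'a_node': 'node'}),
                        links.rename(columns={'b_node': 'node'})])[
                            ['node', 'freeway', 'volume']]

by_node = approaches.groupby('node').agg(legs=('volume', 'size'),
                                         max_vol=('volume', 'max'),
                                         min_vol=('volume', 'min'),
                                         touches_freeway=('freeway', 'any'))

# a node is an intersection unless it touches a freeway link or is a centroid
by_node['is_intersection'] = (~by_node['touches_freeway']
                              & ~by_node.index.isin(list(centroids)))
```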
In order to illustrate the outputs, I've created an outputs table in the wiki for the example. We need to populate it with descriptions, units, and anything else important.