Comments (9)
Hi @jamiecook. I'm no longer actively supporting this project, but when we added the multiprocessing component, we ran several comparisons and validation checks for the Oregon statewide model implementation. This was for @bettinardi at ODOT and was done by @goreaditya at RSG. Maybe @bettinardi can help investigate?
from populationsim.
Thanks @bstabler! It's my understanding that our tests showed exactly the same results? There's nothing that stands out in your configuration file that seems problematic. @goreaditya or @bettinardi - any ideas?
Versions I'm using
λ pip list | grep sim
activitysim 1.0.4
populationsim 0.5.1
I've set up two tarballs here with the simple test I'm running. The first works correctly because its run.py disables multiprocessing (MP); the second removes that line and generates the strange output.
github_issue_mp=1.tar.gz
github_issue_mp=2.tar.gz
The easiest way to see the difference is to count the persons by their SA3.
☢ cut -d, -f2 github_issue_mp\=1/output/synthetic_persons.csv | sort | uniq -c
57958 30204
45277 30402
1 SA3
20220202 18:36:17 jamie@hikaru:/mnt/hdd_data/jamie_data/move2.0/runs/InterimResults/population_synthesis/domestic/processing
λ cut -d, -f2 github_issue_mp\=2/output/synthetic_persons.csv | sort | uniq -c
63870 30204
1 SA3
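For anyone who prefers to do the same comparison in Python, here is a minimal sketch. It assumes the second column of synthetic_persons.csv carries the header SA3 (as the `1 SA3` line in the output above suggests); the helper names count_by_column and missing_zones are made up for illustration.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column="SA3"):
    """Count rows per distinct value of `column` in a CSV with a header row."""
    with open(csv_path, newline="") as f:
        return Counter(row[column] for row in csv.DictReader(f))


def missing_zones(reference_counts, candidate_counts):
    """Zones that appear in the reference run but not in the candidate run."""
    return sorted(set(reference_counts) - set(candidate_counts))


# Usage against the two runs from this thread (paths as in the tarballs):
# single = count_by_column("github_issue_mp=1/output/synthetic_persons.csv")
# multi = count_by_column("github_issue_mp=2/output/synthetic_persons.csv")
# print(missing_zones(single, multi))
```

Given the counts pasted above, the multiprocess run has no persons in 30402, so that zone is what missing_zones would flag.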
I'm hoping @goreaditya can weigh in. I have reviewed overall results at a higher level than this discussion and do not have anything immediate to contribute to this issue. I am thankful that @jamiecook is flagging this and hope that we can find the issue(s), if they exist, and have a cleaner product if there is a bug here.
Any update on this issue? At the moment I'm pushing ahead by wrapping my own multiprocess Pool around multiple calls to activitysim.cli.run - but that seems less than ideal in the long run.
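A workaround along those lines might look roughly like the sketch below. It is not the code used in this thread: it swaps the multiprocessing Pool for a thread pool that launches one child process per working directory, and it assumes each directory holds its own run.py with multiprocessing disabled (as in the first tarball). The names run_one and run_all and the directory layout are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(working_dir, command):
    """Run one single-process synthesis inside its own working directory."""
    result = subprocess.run(
        list(command), cwd=working_dir, capture_output=True, text=True
    )
    return str(working_dir), result.returncode


def run_all(working_dirs, command=(sys.executable, "run.py"), max_workers=2):
    """Launch one child process per directory, at most `max_workers` at a time.

    Threads only wait on the child processes, so the heavy work still runs
    in separate Python interpreters.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda d: run_one(d, command), working_dirs))


# Hypothetical usage: one directory per slice of the study area.
# results = run_all(["runs/slice_a", "runs/slice_b"])
# failed = [d for d, code in results if code != 0]
```

Each run writes its own outputs, so the per-directory results still have to be concatenated afterwards; that step is left out here.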
I am using PopulationSim on a different project, but with the same geographies (household travel survey at Region level, controls at SA3 and SA1, where SA1 is the smallest level). If you do the multiprocessing like the test example, i.e. only parallelise over the lowest level (TAZ there, here SA1), then the results look correct for me.
In terms of mp settings, the last 20 lines of the yaml Jamie attached would then read
slice_geography: SA3
multiprocess: True
multiprocess_steps:
  - name: mp_seed_balancing
    begin: input_pre_processor
  - name: mp_sub_balancing_SA1
    begin: sub_balancing.geography=SA1
    num_processes: 2
    slice:
      tables:
        - slice_crosswalk
        - crosswalk
      # don't slice any tables not explicitly listed above in slice.tables
      except: True
      # the following tables are added by sub_balancer and should be coalesced
      coalesce:
        - SA1_weights
        - SA1_weights_sparse
        - trace_SA1_weights
  - name: mp_summarize
    begin: expand_households
Also, @jamiecook is no longer working on this project. Matt, do you have any further updates on this? (Sorry for the link; I cannot tag m-richards, but I sent him a message.)
@janzill Thanks for checking. For context, I have picked up the work Jamie was doing using populationsim to produce the above outputs. The code is now at a point where I haven't been able to replicate the problem documented in this issue.
I'm seeing reasonable, comparable results using both a manual multiprocessing pool and multiprocessing at the SA1 (smallest geography) level.
So ... was this a Jamie problem all along? Or was anyone else actually able to reproduce the example that I uploaded?
Could we have this reviewed and finalized under Phase 9? That means either closing it as not an issue, resolving it if there is an issue, or, if the bug is large, establishing a clear issue describing what it will take to fix.