Comments (26)

LAAP avatar LAAP commented on September 27, 2024

Hi @doorleyr ,

Thanks for sharing this; I think it is a very clear example. @nqlong-vlab, please let us know your thoughts.

from csl_hcmc.

Leon-Carto avatar Leon-Carto commented on September 27, 2024

Hi @LAAP

  1. In the survey data, salary is divided into 15 ranges. Do you want to aggregate these into 3 groups: low, medium and high?

Screen Shot 2021-05-17 at 00 16 16

  2. For the age groups, I will aggregate by 10 years: 1-10, 11-20, 21-30, ... Is this right?

doorleyr avatar doorleyr commented on September 27, 2024

I think low, medium and high makes sense for both income and age. Ideally, the categories should be like quantiles, i.e. overall they should contain equal proportions of the population.
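
The quantile idea can be sketched roughly as follows. This is only an illustration, not the method used in the project: the respondent counts are made up, and the function name `group_ranges_by_quantile` is hypothetical. It assigns each of the ordered salary ranges to a group according to where the midpoint of its cumulative population share falls, so the groups end up holding roughly (not exactly) equal shares.

```python
def group_ranges_by_quantile(counts, n_groups):
    """Assign each ordered range to a group so groups hold roughly equal shares.

    counts: respondents per range, ordered from the lowest to the highest range.
    Returns one group index (0 .. n_groups - 1) per range.
    """
    total = sum(counts)
    groups = []
    cumulative = 0
    for count in counts:
        # Place the range by the midpoint of its cumulative population share.
        midpoint_share = (cumulative + count / 2) / total
        groups.append(min(int(midpoint_share * n_groups), n_groups - 1))
        cumulative += count
    return groups


# Hypothetical respondent counts for 15 ordered salary ranges (not survey values).
counts = [5, 10, 20, 30, 40, 45, 40, 30, 25, 20, 15, 10, 5, 3, 2]
labels = ["low", "medium", "high"]
groups = group_ranges_by_quantile(counts, len(labels))

# Population share that ends up in each group.
shares = [0.0] * len(labels)
for count, group in zip(counts, groups):
    shares[group] += count / sum(counts)

for label, share in zip(labels, shares):
    print(f"{label}: {share:.0%}")
```

Because whole ranges cannot be split, the shares will only approximate equal proportions; passing `n_groups=4` gives the quartile variant.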

Leon-Carto avatar Leon-Carto commented on September 27, 2024

I will classify both income and age into 3 equal groups (33% / 33% / 33%) and we will see what insights that gives.
Personally, I prefer quartiles: 4 groups of 25% each.
Alternatively, I will aggregate using both methods.

doorleyr avatar doorleyr commented on September 27, 2024

An O-D matrix file (Survey_OD) has been added. However, this is district-to-district, which is too coarse. Since the survey data contain ward-level addresses, it should be possible to create a ward-to-ward O-D matrix.
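
As a minimal sketch of what a ward-to-ward O-D matrix amounts to: count respondents per (home ward, work ward) pair. The records and ward IDs below are invented for illustration and are not taken from the survey.

```python
from collections import Counter

# Hypothetical survey records: one (home_ward, work_ward) pair per respondent.
# The ward IDs are made up for illustration.
trips = [
    ("W01", "W02"),
    ("W01", "W02"),
    ("W01", "W03"),
    ("W02", "W02"),
    ("W03", "W01"),
]

# Each key is an origin-destination pair; each value is the number of respondents.
od_counts = Counter(trips)

for (origin, destination), n in sorted(od_counts.items()):
    print(origin, destination, n)
```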

Leon-Carto avatar Leon-Carto commented on September 27, 2024

I have uploaded the ward-to-ward O-D matrix in This Folder. You can view it now.

doorleyr avatar doorleyr commented on September 27, 2024

@nqlong-vlab thanks, the ward-to-ward OD matrix looks good.

As for the residential-population and working-population files, please let me know if you want my help in producing these. I'm happy to help but I just want to make sure we don't duplicate efforts.

doorleyr avatar doorleyr commented on September 27, 2024

@nqlong-vlab On closer inspection of the O-D file, I'm having some trouble with it. I need to be able to link the origins and destinations to a shapefile. Therefore, the origin and destination wards should be specified using a ward ID which corresponds to the IDs used in a shapefile. There are several columns in the OD file relating to wards, but it's not clear to me which GIS file I can link these to and which columns should be used. E.g. the columns start_code and end_code seem to refer to wards. However, this numbering is different to the numbering of the population shapefile.

Please make sure that there is a ward shapefile in the repository which uses the same unique identifiers as the OD file and tell me which columns to use in each file.
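
A quick way to check this kind of linkage is to compare the set of ward IDs in the OD file against the set of IDs in the shapefile. The sketch below assumes the OD rows are already loaded as dictionaries with the `start_code` and `end_code` columns mentioned above; the row values, trip counts, and the `shapefile_ward_ids` set are hypothetical.

```python
# Hypothetical OD rows using the start_code / end_code columns named above.
od_rows = [
    {"start_code": 26734, "end_code": 26740, "trips": 12},
    {"start_code": 26740, "end_code": 27268, "trips": 7},
    {"start_code": 3, "end_code": 26734, "trips": 4},  # code from a different numbering
]

# Hypothetical set of ward IDs read from the shapefile's ID column.
shapefile_ward_ids = {26734, 26740, 27268}

# Every ward ID that appears as an origin or a destination.
od_ward_ids = {row["start_code"] for row in od_rows} | {
    row["end_code"] for row in od_rows
}

# IDs that cannot be joined to any polygon in the shapefile.
unmatched = sorted(od_ward_ids - shapefile_ward_ids)
print("OD ward IDs missing from the shapefile:", unmatched)
```

An empty `unmatched` list would mean every origin and destination can be joined to a ward polygon.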

Leon-Carto avatar Leon-Carto commented on September 27, 2024

@doorleyr I have updated the codes to match the ward codes in Population in 6f352bd
Screen Shot 2021-06-01 at 21 01 52
Screen Shot 2021-06-01 at 21 02 03

doorleyr avatar doorleyr commented on September 27, 2024

The O-D file looks good now. Keeping the issue open as we still need the home and work files as described in the original issue.

Hai-Hoang-88 avatar Hai-Hoang-88 commented on September 27, 2024

Thanks Long @nqlong-vlab. The table is expected to be available this weekend.
@doorleyr We are having a bit of an issue finding the number of jobs (work_population) in the WAC dataset.
We plan to scrape the locations of workplaces (offices, markets, ...) from Google Maps and other sources, and link those points to our building footprints to obtain working areas. From that we will calculate the number of jobs.
Another issue relates to dividing work_population by age group: we currently have no clue how to fit them into age groups.

doorleyr avatar doorleyr commented on September 27, 2024

@Hai-Hoang-88 since we have the O-D matrix, this means we must already have information about the number of jobs in each district. It's just a matter of aggregating the survey data by work ward only rather than aggregating by both home and work ward.

The "field of employment" is also in the survey data (Q29) so you can aggregate by work ward and job type to get the number of jobs in each field in each ward.

Similarly, you can aggregate the survey data by home ward and income level (Q32) to get the number of people of each income group living in each ward.

If you're not sure how to do this, I'm happy to help, but I can't work with the survey data right now as it's all in Vietnamese. Even with Google Translate I'm finding it too difficult to understand. One option would be for your team to create a file from the survey data with just the columns we need in English (home ward, work ward, job status, industry, income level, education level). Then I can do the aggregations.
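
For illustration, the three aggregations described above could look roughly like this once such an English-language file exists. The records, field names (`home_ward`, `work_ward`, `industry`, `income_level`) and values below are hypothetical placeholders, and the counts are raw respondent counts with no survey weighting applied.

```python
from collections import Counter

# Hypothetical per-respondent records using the English columns suggested above.
respondents = [
    {"home_ward": 26734, "work_ward": 26740, "industry": "retail", "income_level": "low"},
    {"home_ward": 26734, "work_ward": 26740, "industry": "services", "income_level": "medium"},
    {"home_ward": 26740, "work_ward": 26740, "industry": "retail", "income_level": "high"},
    {"home_ward": 27268, "work_ward": 26734, "industry": "services", "income_level": "low"},
]

# Jobs per ward: aggregate by work ward only.
jobs_per_ward = Counter(r["work_ward"] for r in respondents)

# Jobs per ward and field of employment.
jobs_per_ward_industry = Counter((r["work_ward"], r["industry"]) for r in respondents)

# Residents per ward and income group: aggregate by home ward and income level.
residents_per_ward_income = Counter(
    (r["home_ward"], r["income_level"]) for r in respondents
)

print(jobs_per_ward)
print(jobs_per_ward_industry)
print(residents_per_ward_income)
```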

doorleyr avatar doorleyr commented on September 27, 2024

Update: @Hai-Hoang-88 can you please take a look at this notebook where I have attempted to create a basic version of the home and work area files:
https://github.com/CityScope/CSL_HCMC/blob/main/notebooks/aggregate_survey_data.ipynb

Does this make sense? Also can you confirm if the home and work address fields in these survey data correspond to the 'Com_ID' field in the Population shapefile?

dangbuingochan avatar dangbuingochan commented on September 27, 2024

Hi @doorleyr, for the HIDS surveys, we have an English version of the survey form. I attach it here for you: (1) form for all members, (2) form for the household head.
I hope it helps. If you need anything more, please let me know.
Thank you!

Hai-Hoang-88 avatar Hai-Hoang-88 commented on September 27, 2024

@doorleyr the file looks good to me. @nqlong-vlab Do you have any comments on this?
For the EN version, please refer to @dangbuingochan's (Han) comment. The file you have loaded in Jupyter is an Excel file; the questions include:

  1. the question number
  2. the questions themselves (in VN, unfortunately)

You can find the corresponding question in the EN version by matching the question number with the file Han has sent. (I think you did it correctly.)

And I confirm 'Com_ID' stands for "commune ID" == ward ID.

doorleyr avatar doorleyr commented on September 27, 2024

@Hai-Hoang-88 the home and work address fields in the survey data range from 1 to 311. The Com_ID in the Population file is a 5-digit number, e.g. '27268'.

How can I map between these two different codings?

From looking at the two iterations of the OD file, it looks like the 'Start_Code' field was updated between iterations (e.g. 1 -> 26734). So I think I just need this mapping. Can you tell me where to find it?

dangbuingochan avatar dangbuingochan commented on September 27, 2024

@doorleyr I double-checked: "Start_code" in OD_2014 and "Com_id" in Population use the same codes. Please check again and tell me if there are any errors.

The map code for the HIDS survey is the Zone code; you can find it here (table 4.2.1, p. 65). I have also uploaded the Census shapefile covering income and occupation, which was extracted from this survey. One problem is that the HIDS doesn't cover all wards in the model area (only 26 of 40 wards have observations), and I don't know how to estimate values for the others. Please go through it and give me some hints for the next step.

Thank you!

doorleyr avatar doorleyr commented on September 27, 2024

@dangbuingochan yes, it's true that the 'Start_code' in OD_2014 matches the 'ComId' in Population.shp. However, I am now working directly with the survey data and the home and work addresses in the raw survey data are integers between 1 and 311. This does not match with the 'Start_code' in OD_2014 or the 'ComId' in Population.shp.

We previously had the same problem with an earlier iteration of the OD_2014 file and @nqlong-vlab updated it with the new codes. I need the mapping between these two sets of codes, i.e. {1: 26734, 26: 27274, ...}. Can someone please add this to the repo?
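
To make the request concrete, here is a rough sketch of how such a mapping would be applied to the raw survey rows. Only the two pairs quoted above come from this thread; the survey rows and the field names `home_zone` and `work_zone` are hypothetical, and the real mapping would be loaded from whatever lookup file is added to the repo.

```python
# The two pairs quoted above; the full mapping would come from a lookup file.
zone_to_ward = {1: 26734, 26: 27274}

# Hypothetical raw survey rows with zone codes in the 1-311 range.
survey_rows = [
    {"home_zone": 1, "work_zone": 26},
    {"home_zone": 26, "work_zone": 1},
    {"home_zone": 1, "work_zone": 300},  # 300 has no entry in this partial mapping
]

recoded = []      # rows whose home and work zones both map to a ward code
unmapped = set()  # zone codes with no entry in the mapping

for row in survey_rows:
    home = zone_to_ward.get(row["home_zone"])
    work = zone_to_ward.get(row["work_zone"])
    if home is None or work is None:
        # Remember which codes are missing instead of silently dropping them.
        unmapped.update(
            z for z in (row["home_zone"], row["work_zone"]) if z not in zone_to_ward
        )
        continue
    recoded.append({"home_ward": home, "work_ward": work})

print(recoded)
print("Zone codes without a ward code:", sorted(unmapped))
```

Collecting the unmapped codes separately makes it easy to see whether the mapping is incomplete or whether some zones simply have no ward code.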

dangbuingochan avatar dangbuingochan commented on September 27, 2024

@doorleyr Oh, I see. I uploaded it to Google Drive; you can find it here.

agrignard avatar agrignard commented on September 27, 2024

@dangbuingochan would it be possible to upload all the relevant files to the GitHub repo instead, as agreed, so it's easier for us to keep track of the work done?

At some point, once the files are consolidated, it would be nice to document (why not in a wiki) which files are relevant and what the corresponding OD matrices are. @doorleyr is this something you can take care of, in a wiki page for instance?

doorleyr avatar doorleyr commented on September 27, 2024

@dangbuingochan thanks, this is the right type of file but unfortunately it only goes up to 265. The codes in the survey data go up to 311. Can you provide the complete mapping?

dangbuingochan avatar dangbuingochan commented on September 27, 2024

@doorleyr The zone codes from 266 to 311 in the HIDS survey belong to regions outside Ho Chi Minh City.

doorleyr avatar doorleyr commented on September 27, 2024

@dangbuingochan thanks for the clarification. I noticed another issue with the mapping. Several of the survey areas map to the same ward code, e.g. areas 3, 4 and 5 all map to the code '26740'. Is this expected or an error? If it's expected, can you explain why?
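
For reference, this kind of many-to-one case can be found by inverting the mapping and listing ward codes that have more than one survey area. The sketch below uses only pairs mentioned in this thread (areas 3, 4 and 5 mapping to 26740, plus 1 and 26) and is not the full mapping file.

```python
from collections import defaultdict

# Partial mapping built from pairs mentioned in this thread, not the full file.
zone_to_ward = {1: 26734, 3: 26740, 4: 26740, 5: 26740, 26: 27274}

# Invert the mapping: ward code -> list of survey areas pointing at it.
zones_by_ward = defaultdict(list)
for zone, ward in zone_to_ward.items():
    zones_by_ward[ward].append(zone)

# Keep only ward codes shared by more than one survey area.
shared = {ward: zones for ward, zones in zones_by_ward.items() if len(zones) > 1}
print(shared)
```

If the many-to-one mapping turns out to be expected, recoding first and aggregating by ward code afterwards would simply sum those survey areas into the one ward.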

dangbuingochan avatar dangbuingochan commented on September 27, 2024

@doorleyr In the HIDS survey, there are some cases like that; you can see them in the images attached below.
Screen Shot 2021-06-15 at 17 48 34
Screen Shot 2021-06-15 at 17 53 50

Hai-Hoang-88 avatar Hai-Hoang-88 commented on September 27, 2024

@doorleyr As @dangbuingochan mentioned to me, when income is aggregated into 3 groups (low, medium, high), the number of people in the medium range is significantly lower than in the other ranges. Our concern is sampling bias: is there any method to work around or validate this statistic?

doorleyr avatar doorleyr commented on September 27, 2024

@Hai-Hoang-88 I have done a first pass of the aggregations in this notebook:
https://github.com/CityScope/CSL_HCMC/blob/main/data_analysis/aggregate_survey_data.ipynb
I used 4 income level groups which have approximately equal numbers of people.
