predicting-poverty's People

Contributors

brunosan, nealjean, sangmichaelxie, wmadavis

predicting-poverty's Issues

How to create the trained CNN model?

Hi Neal,
I am trying to replicate this work to predict poverty in another country. However, you have already provided the trained CNN (predicting_poverty_trained.caffemodel), which extract_features.py uses to extract 4096 image features for each cluster. Since I want to build a model for another country using training images from that country, I would like to know how you built the trained CNN model.

  1. Do we train the model to perform a classification task, since I noticed you used a SOFTMAX in the last layer? What is the label for each image? (I don't think we have labels or classes for the images.)
  2. Is there any reason why you use the features from the conv7 layer?

Thank you so much for your help. Looking forward to hearing from you.
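For context, the document's other issues confirm that the training step is a transfer-learning classification task: the CNN is fine-tuned to predict binned nightlight intensity (a proxy label) for each daytime image, which is why the last layer is a softmax. Below is a minimal sketch in PyTorch rather than the original Caffe; the VGG16 backbone, the 3-class bin count, and the hyperparameters are illustrative stand-ins, not the repo's actual configuration.

import torch
import torch.nn as nn
import torchvision.models as models

# Fine-tune an ImageNet-pretrained CNN to classify daytime images into
# nightlight-intensity bins; the 4096-d penultimate layer then serves as
# the feature extractor (the repo's conv7 plays this role in Caffe).
model = models.vgg16(weights='IMAGENET1K_V1')
model.classifier[6] = nn.Linear(4096, 3)           # 3 nightlight bins (illustrative)
criterion = nn.CrossEntropyLoss()                  # softmax classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Dummy batch standing in for (daytime image, nightlight bin) pairs.
dataloader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 3, (8,)))]
for images, nightlight_bins in dataloader:
    optimizer.zero_grad()
    loss = criterion(model(images), nightlight_bins)
    loss.backward()
    optimizer.step()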

Why not getting good results?

Hi Neal,
Why am I not getting very good results? For the following figure, I chose the first 800 images from candidate_download_locs.txt, extracted features from them, and generated Figure 3.

[figure: nigeria]

For the next figure, I used the cluster lat/lon directly to download images, then extracted features and generated Figure 3.

[figure: nigeria1]

Could you please help me out? It would be highly appreciated.

Is the satellite imagery georeferenced?

Hey, I am looking into using satellite imagery to predict economic activity. I saw previous questions about how the images are downloaded. I just wanted to ask: are your images georeferenced?

400*400 pixel daytime image

Hey, I saw in one of the issues about the watermark at the bottom that you downloaded a slightly larger image. May I know what pixel size you used to download the images? And did it affect the square-km area of the images downloaded?
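For illustration, the workaround referenced above can be as simple as downloading a slightly taller tile and cropping the watermark strip off the bottom. A sketch with PIL; the file names and the 400x425 size are assumptions, not the repo's actual values.

from PIL import Image

# Download a slightly taller tile (e.g. 400x425), then crop away the
# bottom rows that carry the watermark, leaving a clean 400x400 image.
img = Image.open('tile_400x425.png')
img.crop((0, 0, 400, 400)).save('tile_400x400.png')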

Issues in ProcessSurveyData.R and the README

Hi! I am trying to reproduce your research to learn more about applied machine learning with satellite imagery. I ran into a few issues I thought you might want to hear about:

First, on line 131 of ProcessSurveyData.R (for Malawi), the nl function is called with two arguments, but I get an unused-argument error:

Error in nl(., mwi13.vars, 2013) : unused argument (2013)
In addition: There were 15 warnings (use warnings() to see them)

My guess from the other code is that this function used to take multiple parameters, and that the vars argument is no longer needed and should be removed:

nl(mwi13.vars, 2013) -> nl(2013)

Second, in the README.md, you mention that the Tanzania data from LSMS should be relabeled to DATA:

  3. Unzip these files so that **data/input/LSMS** contains the following folders of data:
       1. UGA_2011_UNPS_v01_M_STATA
       2. DATA (formerly TZA_2012_LSMS_v01_M_STATA_English_labels before a re-upload in January 2016)
       3. NGA_2012_LSMS_v03_M_STATA
       4. MWI_2013_IHPS_v01_M_STATA

But in the code in ProcessSurveyData.R you have "DATA" as the directory for the Nigeria data:

## Nigeria ##
nga13.cons <- read.dta('data/input/LSMS/DATA/cons_agg_w2.dta') %$%
  data.frame(hhid = hhid, cons = pcexp_dr_w2/365)
nga13.cons$cons <- nga13.cons$cons*110.84/(79.53*100)
nga13.geo <- read.dta('data/input/LSMS/DATA/Geodata Wave 2/NGA_HouseholdGeovars_Y2.dta')
nga13.coords <- data.frame(hhid = nga13.geo$hhid, lat = nga13.geo$LAT_DD_MOD, lon = nga13.geo$LON_DD_MOD)
nga13.rururb <- data.frame(hhid = nga13.geo$hhid, rururb = nga13.geo$sector, stringsAsFactors = F)
nga13.weight <- read.dta('data/input/LSMS/DATA/HHTrack.dta')[,c('hhid', 'wt_wave2')]
names(nga13.weight)[2] <- 'weight'
nga13.phhh8 <- read.dta('data/input/LSMS/DATA/Post Harvest Wave 2/Household/sect8_harvestw2.dta')
nga13.room <- data.frame(hhid = nga13.phhh8$hhid, room = nga13.phhh8$s8q9)
nga13.metal <- data.frame(hhid = nga13.phhh8$hhid, metal = nga13.phhh8$s8q7=='IRON SHEETS')
nga13.elev <- raster('data/input/DIVA-GIS/NGA_alt.gri') %>%
  extract(., nga13.coords[,c('lon', 'lat')]) %>%
  data.frame(hhid = nga13.coords$hhid, elev = .) %>% na.omit()

Which should be fixed: the code or the README?

Different output file names in different scripts

Hi!

The script 'extract_features.py' stores the CNN features and other aspects of the model as 'conv_features.npy' and 'image_counts.npy'. But the functions 'load_country_lsms' and 'load_country_dhs' in 'fig_utils.py' look for 'cluster_conv_features.npy' and 'cluster_image_counts.npy', which are not generated at any other point in the workflow. The same goes for 'nightlights.npy', 'consumptions.npy', and 'households.npy'. Am I missing something here, or are both scripts supposed to refer to the same files?

Thanks
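If the two scripts are indeed meant to refer to the same arrays, a small shim like the following (a guess, not a confirmed fix) would bridge the naming gap until the repo is updated:

import os

# Map extract_features.py output names onto the names fig_utils.py expects.
for src, dst in [('conv_features.npy', 'cluster_conv_features.npy'),
                 ('image_counts.npy', 'cluster_image_counts.npy')]:
    if os.path.exists(src) and not os.path.exists(dst):
        os.rename(src, dst)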

fig D - Tanzania differs

Hi Neal,

Maybe not an issue, just a note.
After replicating Fig. 1, I noticed that in the original paper, panel D for Tanzania has a much different shape than the other countries. This could be caused by not enough data points in the higher-consumption segment; actually, it looks like much more data is now available for Tanzania for that period.

My figure for this country looks similar to Uganda and Nigeria; I can supply it if needed. I used the same data, downloaded these days.

Btw, nice work, regards
Tom

Out-of-sample/cluster Prediction

I've succeeded in replicating your results (great work, by the way), but I'm now trying to predict consumption/assets in places outside the original DHS/LSMS clusters, i.e. out-of-sample predictions from additional satellite imagery of non-DHS/LSMS locations. I can see from extract_features.py that features are estimated for every image before being aggregated to the cluster level, so this should be feasible. But I'm unsure how to use these image-specific features in the regression model produced in fig_utils.py, partly because everything is coded at the cluster level (reflecting the available level of the DHS/LSMS data), and partly because I'm not familiar with the cross-validation approach. How would you advise applying the regression model to make predictions for individual images? Thanks,
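One plausible approach (a sketch, not the authors' confirmed workflow): fit the cluster-level ridge regression on the saved feature and consumption arrays, then apply the fitted model to features extracted from the new images. The file new_conv_features.npy below is hypothetical, standing in for per-image features from extract_features.py run on the new locations.

import numpy as np
from sklearn.linear_model import RidgeCV

X_train = np.load('cluster_conv_features.npy')   # (n_clusters, 4096)
y_train = np.log(np.load('consumptions.npy'))    # paper regresses log consumption
X_new = np.load('new_conv_features.npy')         # (n_images, 4096), hypothetical

model = RidgeCV(alphas=np.logspace(-3, 3, 7))    # cross-validated ridge penalty
model.fit(X_train, y_train)
pred = model.predict(X_new)  # per-image predictions; average within an area if needed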

pixel2coord in get_image_download.py

Hi Neal,

My name is Kishen. I was looking at your code.

In the pixel2coord function in scripts/get_image_download.py: every pixel spans some range of latitude and longitude. Shouldn't you use the mean (center) of the pixel's latitude and longitude? Otherwise the result corresponds to a ~0.45 km shift on the ground.
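For reference, here is a pixel-center version of such a conversion using a standard GDAL geotransform; this is a generic sketch, not the repo's exact code.

def pixel2coord_center(gt, col, row):
    # gt is a GDAL geotransform; the +0.5 offsets map to the pixel's
    # center rather than its corner, avoiding the ~half-pixel shift
    # (~0.45 km for 1 km pixels) described above.
    lon = gt[0] + (col + 0.5) * gt[1] + (row + 0.5) * gt[2]
    lat = gt[3] + (col + 0.5) * gt[4] + (row + 0.5) * gt[5]
    return lon, lat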

Training data and testing data for Caffe Model

Greetings Authors,
I have a few questions about the training data set for the Caffe model. After going over your code, it seems that all modified coordinates from the LSMS were used to create downloaded_locs.txt, and downloaded_locs.txt is used by extract_features.py, so those locations become the test set before being used in the regression. To simplify: were the training set and the test set mutually exclusive? If yes, is it possible for you to share the training coordinates for each country?

Thanks,
Vinit

Per pixel area of night time light?

The nightlight rasters are very large and cannot be viewed in a normal photo editor. What is the resolution of the nightlight images, and how much area does each pixel cover (per-pixel area of the nightlights)?

Thanks in advance
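For what it's worth, the DMSP-OLS composites (the 0-62 digital-number rasters referenced elsewhere in these issues) are on a 30 arc-second grid, so per-pixel ground area is roughly 0.86 km² at the equator and shrinks with latitude. A quick back-of-the-envelope check:

import math

# 30 arc-seconds in degrees, times ~111.32 km per degree of latitude;
# east-west extent scales with cos(latitude).
def pixel_area_km2(lat_deg, arcsec=30.0):
    deg = arcsec / 3600.0
    height_km = deg * 111.32
    width_km = height_km * math.cos(math.radians(lat_deg))
    return width_km * height_km

print(pixel_area_km2(0.0))   # ~0.86 km^2 at the equator
print(pixel_area_km2(9.0))   # ~0.85 km^2 around Nigeria's latitude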

n*4096 features

Hey
So I have extracted the n*4096 features from the satellite images. I was wondering: are all of the features meaningful for your poverty measure, and from the output, do you know which feature represents what?

Also, in one of your YouTube videos I saw that the output you refer to is a linear combination of the extracted features. Is that a summation of the features or something else?
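To make the "linear combination" concrete: the prediction is a weighted sum of the 4096 features plus an intercept, with the weights learned by regression; it is not an unweighted summation. A toy illustration with dummy arrays:

import numpy as np

features = np.random.rand(4096)              # one image's CNN features
weights, bias = np.random.rand(4096), 0.1    # learned by (ridge) regression
prediction = features @ weights + bias       # weighted sum, not a plain sum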

Object "pcexp_dr_w2" not found

Hello,

I've tried running the ProcessSurveyData.R script, but I keep getting the following error:

Error in data.frame(hhid = hhid, cons = pcexp_dr_w2/365) : object 'pcexp_dr_w2' not found

The error comes from the following part of the code:

nga13.cons <- read.dta('./data/input/LSMS/DATA/cons_agg_wave2_visit2.dta') %$%
  data.frame(hhid = hhid, cons = pcexp_dr_w2/365)
nga13.cons$cons <- nga13.cons$cons*110.84/(79.53*100)
nga13.geo <- read.dta('./data/input/LSMS/DATA/Geodata Wave 2/NGA_HouseholdGeovars_Y2.dta')

How can I fix the code?

the 2nd step

Hey, so for the second training step, nightlights are used. Are they in image form, or values ranging from 0 to 62?
Also, the second training step predicts nightlight intensities from the daytime images. Does that mean you also derive a new data set of predicted light-intensity values from your training, along with the third row of images in Figure 2?
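On the 0-62 question: the DMSP digital numbers are scalar intensities per pixel, and for the classification step they can be binned into a small number of intensity classes. A sketch; the cutoffs here are illustrative, not the repo's actual values.

import numpy as np

def nightlight_class(dn, cutoffs=(3, 35)):
    # 0 = low, 1 = medium, 2 = high nightlight intensity
    return int(np.digitize(dn, cutoffs))

print(nightlight_class(0), nightlight_class(10), nightlight_class(60))  # 0 1 2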

Missing Model Weights - FOUND

The saved weights for the trained model are missing. In extract_features.py (see here) we expect the weights file to be located at ../model/predicting_poverty_trained.caffemodel, but such a file does not exist in the repo.

Downloading images from Google Map API at correct coordinates

Greetings Authors,
Thanks for sharing your code. As the README in the repository mentions, each line of candidate_download_locs.txt has the form [image_lat] [image_long] [cluster_lat] [cluster_long], and these coordinates give locations for downloading 1 km x 1 km RGB satellite images of size 400x400 pixels. In the context of using the Google Maps API to download images, I assumed that [image_lat], [image_long], [cluster_lat], and [cluster_long] were the rectangular coordinates of the geometry object used to download the 400x400 image, i.e. top-left corner = ([image_lat], [image_long]) and bottom-right corner = ([cluster_lat], [cluster_long]). To verify this assumption I used the haversine distance formula, but I obtained areas greater than 25 km² in some cases. So now I am assuming that you instead took 1 km x 1 km patches around ([image_lat], [image_long]), i.e. treated ([image_lat], [image_long]) as the center point. Is this what you did, or was some other method used?
Thank you,
Vinit
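A quick way to sanity-check the center-point assumption is to build the 1 km x 1 km box around ([image_lat], [image_long]) and verify its corners with the haversine formula, as the question does. A small sketch using approximate spherical-Earth constants; the example coordinates are arbitrary.

import math

def box_around(lat, lon, half_km=0.5):
    # Corners of a ~1 km x 1 km box centered on (lat, lon).
    dlat = half_km / 111.32
    dlon = half_km / (111.32 * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)

print(box_around(9.06, 7.49))  # example point near Abuja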

Deriving specific filtered images

Hi,
I have run the extract-features script on some images to derive the convolutional features and the filtered images. I was able to derive the n*4096 array conv_features.npy and the 64 filtered images. But I see from Figure 2 of your paper that you identified specific convolutional filters. I was wondering if you ran a separate script to identify a particular convolutional feature such as roads, buildings, concrete structures, etc. In particular, is it possible to extract values (in a tabular format) that measure the total extent of particular features in an image? For example, out of the total number of pixels, X pixels have the features of concrete structures, Y pixels are roads, etc.
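As far as the repo shows, there is no script that labels filters as "roads" or "buildings"; those interpretations come from inspecting what each filter responds to. If a crude tabular measure is wanted anyway, one option is to threshold a filter's activation map and count the fraction of responding pixels. A sketch with a dummy activation map; the threshold and helper name are illustrative.

import numpy as np

def active_pixel_fraction(activation_map, thresh=0.5):
    # Normalize to [0, 1], then count pixels above the threshold.
    a = activation_map / (activation_map.max() + 1e-8)
    return float((a > thresh).mean())

amap = np.random.rand(100, 100)      # stand-in for one filter's activations
print(active_pixel_fraction(amap))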

How to actually download satellite images?

Hi there!

Thanks for the detailed description of how to get and process the data. It seems to me that the only missing piece is how to actually download the satellite images (and where to download them from). Is it possible to do that automatically using GDAL? It would be wonderful if you could share the script you used to retrieve the images.

Thanks,
Maruan
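As the question notes, the download step isn't included in the repo. One common approach is the Google Static Maps API (subject to its terms of use); a minimal sketch, assuming zoom 16 as an approximation of the 1 km / 400 px scale near the equator:

import requests

def download_tile(lat, lon, api_key, out_path):
    # Fetch a 400x400 satellite tile centered on (lat, lon); at zoom 16
    # this covers roughly 1 km x 1 km near the equator.
    params = {'center': '%f,%f' % (lat, lon), 'zoom': 16,
              'size': '400x400', 'maptype': 'satellite', 'key': api_key}
    r = requests.get('https://maps.googleapis.com/maps/api/staticmap',
                     params=params)
    r.raise_for_status()
    with open(out_path, 'wb') as f:
        f.write(r.content)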

Problem replicating results after using extract_features.py

Hello,

I'm having some trouble replicating the figures after running extract_features.py myself, and am getting results like those attached below for the Figure 3 cluster-level consumptions:

[screenshots: four Figure 3 cluster-level consumption plots, 2017-12-12]

I'm pretty sure I have followed all the steps correctly, unless I missed something. Do you have any idea what I may have done wrong?

Thanks!

trainable?

Is this model trainable? I want to train the model further on some other data.

Training Data

Hi,
In the third step of predicting poverty, we require the survey data along with the corresponding extracted features. If so, then we need some sort of corresponding training data for the images in order to predict poverty. So how can we calculate a poverty or economic measure just from the daytime images? I don't know if I am missing something. My question is: how can we predict values of poverty or economic activity from the daytime images without any survey or training data?

Out of sample training

Hey, I was able to train for some countries, replicating your work. Now I want to do some out-of-sample predictions. I see you used countries with similar characteristics for the out-of-sample prediction. Do you think we can use a model trained on a country that is very different in terms of economic development? For instance, using a model trained on, say, the Netherlands to do out-of-sample prediction for Nigeria?
