wfp-vam / hrm
High Resolution Mapping of Food Security
Home Page: https://wfp-vam.github.io/HRM/
License: MIT License
NNExtractor requires GRID.download_images to be run first; otherwise GRID.image_dir is None when the extractor is built:
network = NNExtractor(id, sat, GRID.image_dir, network_model, step, GRID)
ZCA improves performance, so we should replace the PCA transform with ZCA. Problem: this requires using the Y variable as well, to do feature selection once the variables have been transformed by ZCA.
fyi this is Gaurav showing Jackson how to use Github tho the comment still applies
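For reference, a minimal sketch of ZCA whitening with NumPy. The function name `zca_whiten`, the `eps` value and the synthetic data are assumptions for illustration and are not taken from the HRM code. Unlike PCA, ZCA rotates the whitened data back into the original feature space, so its output columns have no variance ordering to truncate on, which is presumably why a supervised selection step using Y comes up.

```python
import numpy as np


def zca_whiten(X, eps=1e-5):
    """Center X and whiten it with the ZCA transform.

    Returns the whitened data plus the mean and whitening matrix, so the
    same transform can be applied to held-out data.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    # ZCA matrix: U diag(1 / sqrt(lambda + eps)) U^T
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, mean, W


# Synthetic correlated features standing in for the extracted image features.
rng = np.random.default_rng(0)
mixing = np.eye(5) + 0.3 * np.ones((5, 5))
X = rng.normal(size=(500, 5)) @ mixing

X_zca, mean, W = zca_whiten(X)
cov_after = X_zca.T @ X_zca / X_zca.shape[0]  # close to the identity matrix
```

To avoid leaking information across folds, the mean and `W` would be estimated on the training fold only and then applied to the test fold as `(X_test - mean) @ W`.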
Try a non-linear regressor, such as kernel ridge regression: http://scikit-learn.org/stable/modules/kernel_ridge.html
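A rough sketch of how such a comparison could be set up with scikit-learn. The data is synthetic and the `alpha` and `gamma` values are arbitrary assumptions; nothing here reflects results on the HRM features.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic non-linear target, used only to illustrate the API.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

linear = Ridge(alpha=1.0)
kernel = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)

# Mean cross-validated R2 for each model.
linear_r2 = cross_val_score(linear, X, y, cv=5, scoring="r2").mean()
kernel_r2 = cross_val_score(kernel, X, y, cv=5, scoring="r2").mean()
```

On real features, `alpha` and `gamma` would need tuning, for example with `GridSearchCV`, before drawing conclusions.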
Test whether removing the within-folder parallelization when downloading the images fixes the slow performance for images that are already downloaded.
Let's move the methods download() and score_merge in mater_utils.py so that they become methods of their respective classes, img_lib and nn_extractor.
Downloading images is an async job, so it should be easy, but multiprocessing requires the async function to be defined at the top level of a module, so we need to rethink the g and s data source classes.
We should make it possible to score only a subset of images, for example only the relevant pictures in a production environment.
At the moment the features are standardized before the evaluation loops (mean removal and division by the standard deviation) with the following:
data_features = (data_features - data_features.mean()) / data_features.std()
in master.py
They are also normalized (mean removal and division by the l2-norm) in each cross-validation fold with the following:
model = Ridge(normalize=True)
in modeller.py
This is not optimal because:
Strangely, for some configs (2000, for example), removing the normalization in the Ridge regression has a large impact on the results (R2 drops from 20% to 0%)!
One way to implement more complex transformations inside each cross-validation fold is to use the Pipeline class of sklearn. For example, to perform scaling (between 0 and 1) followed by Ridge, we would do:
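from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
model = Ridge()
minmax_scaler = MinMaxScaler()
pipeline = make_pipeline(minmax_scaler, model)
scores = cross_val_score(pipeline, X, y)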
However, my attempts to combine Normalization and Ridge in a pipeline have led to very different results compared to using the normalize=True argument of the Ridge regression...
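One possible explanation, offered as an assumption because the thread does not show the attempted pipeline: in the scikit-learn versions that supported it, Ridge(normalize=True) centered each feature column and divided it by that column's l2-norm, whereas sklearn.preprocessing.Normalizer rescales each sample (row) to unit norm, which is a different operation. The sketch below uses a hypothetical ColumnL2Scaler helper to mimic the column-wise behaviour and shows that StandardScaler gives the same predictions once alpha is multiplied by the number of training samples. The data and alpha are synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class ColumnL2Scaler(BaseEstimator, TransformerMixin):
    """Hypothetical helper: center each column, then divide by its l2-norm."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        norms = np.linalg.norm(X - self.mean_, axis=0)
        norms[norms == 0] = 1.0  # leave constant columns unscaled
        self.norm_ = norms
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.norm_


# Synthetic features with very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * [1.0, 10.0, 100.0, 0.1]
y = X @ [1.0, 0.1, 0.01, 10.0] + rng.normal(size=50)

alpha = 1.0
col_l2 = make_pipeline(ColumnL2Scaler(), Ridge(alpha=alpha)).fit(X, y)
# StandardScaler divides by std = l2-norm / sqrt(n), so alpha is scaled by n.
std = make_pipeline(StandardScaler(), Ridge(alpha=alpha * len(X))).fit(X, y)

same_predictions = np.allclose(col_l2.predict(X), std.predict(X))
```

Inside cross-validation the number of training samples changes with the fold size, so a fixed alpha combined with StandardScaler would not match the old normalize=True behaviour exactly.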
In src/sentinel_images.py we download the zip and compose the image out of it. We could instead unzip it to an in-memory file.
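A minimal sketch of in-memory extraction with the standard library. The helper name `read_zip_members` and the member file names are made up for illustration; the real archive layout is not shown in this thread.

```python
import io
import zipfile


def read_zip_members(zip_bytes):
    """Return {member name: content bytes} for a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }


# Build a small archive in memory to stand in for a downloaded response body.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("band_B4.tif", b"red")
    archive.writestr("band_B3.tif", b"green")

members = read_zip_members(buffer.getvalue())
```

Each value in `members` can then be wrapped in `io.BytesIO` and handed to whichever reader composes the image, without writing to disk.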
re-implement the logic that filters "bad" satellite images.
File "../Src/img_lib.py", line 183, in download_images
self._save_img(url, self.image_dir, file_name, provider)
File "../Src/img_lib.py", line 238, in _save_img
gee_tif = sentinel_utils.download_and_unzip(buffer, 3, 6, file_path)
File "../Src/sentinel_utils.py", line 88, in download_and_unzip
zip_file = ZipFile(buffer)
File "/home/anaconda3/envs/HRM/lib/python3.5/zipfile.py", line 1026, in init
self._RealGetContents()
File "/home/anaconda3/envs/HRM/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
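The traceback above means the downloaded bytes were not a ZIP archive. One common cause, which is an assumption here, is the server returning an error page instead of the file. A sketch of a guard that surfaces the start of the payload to make that easier to diagnose; `open_zip` is a hypothetical helper, not part of sentinel_utils.py.

```python
import io
import zipfile


def open_zip(payload):
    """Open payload bytes as a ZIP archive, or fail with a readable error."""
    try:
        return zipfile.ZipFile(io.BytesIO(payload))
    except zipfile.BadZipFile as err:
        preview = payload[:80]
        raise ValueError(
            f"response is not a ZIP archive; first bytes: {preview!r}"
        ) from err


# A valid archive opens normally.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("tile.tif", b"data")
names = open_zip(good.getvalue()).namelist()

# A non-ZIP body (for example an HTML error page) raises ValueError.
try:
    open_zip(b"<html>Too many requests</html>")
    failed = False
except ValueError:
    failed = True
```

The caller could catch the ValueError to skip or retry that image instead of aborting the whole download run.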
I think that loading batches of images and calling predict() once per batch should be much faster than calling predict() for every image.
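A sketch of the batching idea, independent of any particular network library. `extract_features`, `iter_batches` and the stand-in `fake_predict` are hypothetical names; in practice `predict` would be the model's own batch prediction call.

```python
import numpy as np


def iter_batches(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def extract_features(images, predict, batch_size=32):
    """Call predict once per batch and concatenate the per-batch outputs."""
    outputs = [
        predict(np.stack(batch)) for batch in iter_batches(images, batch_size)
    ]
    return np.concatenate(outputs, axis=0)


calls = []


def fake_predict(batch):
    """Stand-in for a model: records the batch size, returns one value per image."""
    calls.append(len(batch))
    return batch.reshape(len(batch), -1).mean(axis=1, keepdims=True)


images = [np.full((4, 4, 3), i, dtype=float) for i in range(70)]
features = extract_features(images, fake_predict, batch_size=32)
```

With 70 images and a batch size of 32, `predict` is called three times instead of 70; the best batch size depends on available memory.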
and rename it too
title is self explanatory.