Comments (13)
Hi @adamjstewart, I am interested in contributing to this, but I am fairly new and will need more guidance. What is a good place for me to start?
from torchgeo.
Hi @Haimantika, thanks for volunteering!
Let's pick a single dataset, maybe torchgeo/datasets/benin_cashews.py, and try to convert it to the new syntax. Once that's working, we can repeat for the other 7 datasets and remove all mention of radiant-mlhub.
This is the new dataset website: https://beta.source.coop/technoserve/cashews-benin/
If you create an account, log in, and click generate credentials, you'll see that the Azure URI is https://radiantearth.blob.core.windows.net/mlhub/technoserve-cashew-benin
We'll add a new dependency on azure-storage-blob in pyproject.toml, requirements/datasets.txt, and requirements/min-reqs.old. I can help determine the minimum supported version.
We'll probably add something similar to download_radiant_mlhub_dataset but for Azure blobs in torchgeo/datasets/utils.py. This can then be imported in torchgeo/datasets/benin_cashews.py and used in _download.
Let me know if anything is unclear. The first dataset is going to be a bit of work, but once we have one working, the rest should be easy.
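A minimal sketch of the kind of Azure blob helper described above, not an existing TorchGeo function. It assumes the Azure URI splits into the account URL https://radiantearth.blob.core.windows.net, the container mlhub, and the prefix technoserve-cashew-benin. The name download_azure_container, its parameters, and the injectable client argument are hypothetical; only ContainerClient, list_blobs, and download_blob come from azure-storage-blob.

```python
import os
from typing import Any, Optional


def download_azure_container(
    account_url: str,
    container: str,
    prefix: str,
    root: str,
    credential: Optional[str] = None,
    client: Optional[Any] = None,
) -> list[str]:
    """Download every blob under ``prefix`` into ``root`` (hypothetical helper).

    Returns the list of local file paths that were written.
    """
    if client is None:
        # Imported lazily so the optional dependency is only needed when downloading.
        from azure.storage.blob import ContainerClient

        # ``credential`` may be a SAS token; None assumes anonymous read access.
        client = ContainerClient(
            account_url=account_url,
            container_name=container,
            credential=credential,
        )

    paths = []
    for blob in client.list_blobs(name_starts_with=prefix):
        # Strip the prefix so files land directly under ``root``.
        relative = blob.name[len(prefix):].lstrip("/")
        # Skip the prefix itself and any directory-marker blobs.
        if not relative or relative.endswith("/"):
            continue
        # Blob names use "/" separators; map them onto the local filesystem.
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            client.download_blob(blob.name).readinto(f)
        paths.append(path)
    return paths
```

A dataset's _download method could then call something like download_azure_container("https://radiantearth.blob.core.windows.net", "mlhub", "technoserve-cashew-benin", self.root), assuming those values match the generated credentials.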
This is very helpful. Thanks a lot. I will start working on it and get back to you with questions, if any.
from torchgeo.
I have not seen any PRs that implement download support for Source Cooperative. Which PR are you referring to?
My bad. This one just mentioned the issue.
from torchgeo.
Yes, this is a 9th dataset that will benefit from your contribution.
P.S. I reached out to the folks at Source Cooperative. One thing to note is that azure-storage-blob will copy raw files/directories, not zip/tar files. So there won't be an easy way to checksum these. For now, let's just focus on downloading and ignore checksumming.
from torchgeo.
Hi @adamjstewart, I finally got some time to work on it. I see a PR has been raised; is the issue solved already?
from torchgeo.
I have not seen any PRs that implement download support for Source Cooperative. Which PR are you referring to?
from torchgeo.
Hi, I was doing a bit of research, and the latest version of Source Cooperative that I could find was beta.source.coop.
Is that it, or am I missing something? I have made the changes and can open a PR for you to take a look.
from torchgeo.
Yes, that's the new website.
from torchgeo.
@adamjstewart I have raised a PR. There is a chance that this is not the solution you are looking for. However, I would like to give it one more try after your review and then unassign myself if it does not work, to respect your time. :)
from torchgeo.
First, a review of Microsoft's azure-sdk-for-python, which includes examples like this. Second, a look at the azcopy tool. Python is preferred for TorchGeo; it is not clear how portable dependency management would work for azcopy. Spack and conda have hooks, but pip does not have good hooks for this kind of binary tool dependency. Simply recommending azcopy and failing gracefully when it is not present was discussed briefly. Not yet resolved.
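One way the "recommend azcopy and fail gracefully" idea could look, as a sketch only. The function name download_with_azcopy and its which and run parameters are hypothetical (they exist here so the lookup and the subprocess call can be swapped out); the azcopy copy subcommand and its --recursive flag are part of the real tool.

```python
import shutil
import subprocess
from typing import Callable, Optional


def download_with_azcopy(
    url: str,
    root: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Copy ``url`` into ``root`` with azcopy, or fail with a clear message."""
    azcopy = which("azcopy")
    if azcopy is None:
        # Fail gracefully: tell the user what is missing instead of crashing later.
        raise RuntimeError(
            "azcopy was not found on PATH. Install azcopy and retry, "
            f"or download the data manually from {url}."
        )
    # --recursive copies the whole directory tree below the URL.
    run([azcopy, "copy", url, root, "--recursive"], check=True)
```

This keeps azcopy out of the pip dependency list entirely, at the cost of asking users to install the binary themselves.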
from torchgeo.
We definitely don't need all of azure; azure-storage-blob would suffice.
from torchgeo.
This file appears to implement the basic functionality: https://github.com/kartAI/kartAI/blob/master/azure/blobstorage.py
from torchgeo.