
crops-parser's Introduction

Crops parser

This shell script parses data from the Food and Agriculture Organization of the United Nations (FAO) about the plants and fruits cultivated around the world into a YAML file. The file groups the crops per country, so you can see the top 15 for each country.

It has been created for the OpenStreetMap mapping app StreetComplete, see this issue for details.

How to download data?

Go to the FAO website and download the data. Things to remember:

  1. Select all countries and make sure to select the FAO coding system.
  2. Either select the area harvested (in ha) or the production quantity (in tonnes) to get useful results.
  3. Select all crops in the items list. (The new FAO website merged crops [C] and livestock [L].)
  4. Save the data.

screenshot of the FAO website export with important things to select highlighted as explained above

How to run it?

The script is mostly POSIX-compliant, so it should work on most systems. However, the CLI tool csvtool has to be installed, as it is used as the CSV parser.
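A minimal sketch of how you could check for csvtool before running the script. The helper name `require_cmd` is hypothetical and not part of parseCrops.sh; only the POSIX `command -v` builtin is used.

```shell
#!/bin/sh
# Hypothetical helper (not part of parseCrops.sh):
# succeeds if the given command is available in PATH, fails otherwise.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        printf 'Error: %s is required but was not found in PATH.\n' "$1" >&2
        return 1
    fi
}

# Check the dependency named in this README before starting the parser.
require_cmd csvtool || echo "Please install csvtool first."
```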

If this is done, you can just execute it:

$ ./parseCrops.sh source/area_harvested_2019+2020.csv result/OsmOnly/mostAreaHarvest_2019+2020.yml
Prepare CSV…
Adjusting datasets…
Sum up duplicate elements…
Summed up 289 duplicates.
Calculate yearly average…
Sort data…
Evaluate data…
WARNING: No language code for China could be found. Skip.
Finish processing…

The language code warning for China is to be expected, see the contributing guide for details.

What does it do?

This is an overview of what happens:

  • Prepare CSV… – Strips the table header and extracts the columns of interest.
  • Adjusting datasets… – Adjusts each dataset. E.g. it strips commas for easier processing, applies the blacklist and converts the crop names to OSM keys (optional).
  • Sum up duplicate elements… – Finds exact duplicates (considering the year too) and sums them up. Afterwards it reports how many duplicates were summed up. (Usually items should only be summed up when converting to OSM tags.)
  • Calculate yearly average… – Calculates the average tonnes/area in production when multiple years are given.
  • Sort data… – Sorts the whole data according to the tonnes of produced crops, independent of the country.
  • Evaluate data… – Extracts all crops for each country and transforms the first fifteen crops listed into the YAML format. Additionally, it replaces the country name with the 2-letter country code (ISO 3166).
  • Finish processing… – Adds the header and default crops and sorts the YAML once more, so the countries are sorted.
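The sort and evaluate steps above can be sketched roughly as follows. This is not the code from parseCrops.sh: the function name `top_n_per_country` and the intermediate `country,crop,value` line format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: keep the first N crops per country.
# Assumed input on stdin: lines of "country,crop,value".
# $1 = number of crops to keep per country.
top_n_per_country() {
    # Sort all rows by value (descending), independent of the country,
    # then let awk keep only the first N rows seen for each country.
    sort -t, -k3,3nr |
        awk -F, -v n="$1" '{ if (++seen[$1] <= n) print $1 "," $2 }'
}

printf 'DE,apple,100\nDE,grape,300\nFR,grape,500\nDE,cherry,50\n' |
    top_n_per_country 2
# prints:
# FR,grape
# DE,grape
# DE,apple
```

Sorting globally first and filtering afterwards mirrors the order of the steps in the list: the per-country ranking falls out of the global order without a separate sort per country.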

Result

The results can be seen in the directory result. All legacy and more up-to-date data are included.

The script can handle data from multiple years quite well. After summing up equal items per year (and country), it calculates the average of the production numbers across the given years.
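As a rough illustration of that averaging, here is a sketch under stated assumptions: the function name `average_per_year` and the `country,crop,year,value` line format are hypothetical, and the average divides by the number of years that actually have data for a country/crop pair, which may differ from what the script does.

```shell
#!/bin/sh
# Hypothetical sketch: sum equal items per year, then average over the years.
# Assumed input on stdin: lines of "country,crop,year,value".
average_per_year() {
    awk -F, '
        {
            key = $1 "," $2
            # Count each year only once per country/crop pair.
            if (!((key, $3) in year_seen)) {
                year_seen[key, $3] = 1
                years[key]++
            }
            total[key] += $4
        }
        END {
            # Total over all years divided by the number of years
            # equals the average of the per-year sums.
            for (k in total) printf "%s,%.1f\n", k, total[k] / years[k]
        }
    ' | sort
}

printf 'DE,apple,2019,100\nDE,apple,2019,50\nDE,apple,2020,250\nFR,grape,2019,10\n' |
    average_per_year
# prints:
# DE,apple,200.0
# FR,grape,10.0
```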

Extras

Additionally, there is a collection of square images of all "OSM fruits", which are included in the top-15. You can find it in the directory images.

Legal stuff

The data taken from the FAO is licensed under the terms they describe, i.e. CC BY-NC-SA 3.0 IGO. This is described in detail in this document.

This work is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO license (CC BY-NC-SA 3.0 IGO; https://creativecommons.org/licenses/by-nc-sa/3.0/igo). In addition to this license, some database specific terms of use are listed in the Terms of Use of Datasets.

Apart from that, all code is licensed under the MIT license.

crops-parser's People

Contributors

rugk

crops-parser's Issues

Batch processing for images

@westnordost

In which size do we need these images? I saw we need an HDPI version and a low one…
(In case we later need to restore the image, it is in the git history of this repo 😃)

Also I'd pass them through https://tinyjpg.com/, so we can save much space.

Coconut

Needs review as you mostly only find pictures of them when they are not on the tree anymore. And they are yellow on a tree, BTW. But maybe that picture even helps to find them.

Hazel nut

As the hazel nut looks quite different as a fruit on the tree than it does later when freed from its husk, I've included a collage version hanzel_with_nut.jpg (and the source .xcf for further editing).

May we use this picture? @westnordost

Remove Jute

In my opinion, Jute should not be in that list. It is harvested fully, it is not perennial and it does not look like an orchard from aerial pictures. (Curiously, a lot of sisal plantations are shown if you search images for "jute plantation".)

Strawberries tag

Why are strawberries in the list? Is this a mistake?

Also, in the list they are mentioned as crop=strawberry, but all the other berries are mentioned like e.g. trees=raspberry, so that looks like a mistake.

trees=*_plants vs crop=*

In the csv, it is simply crop=vanilla. If it is grown on an orchard, and it seems that it is, trees=vanilla_plants should be ok as well, as for all the other values.

Apricot

My girlfriend thought the current picture is a nectarine. Apricots usually have a similar surface as plums or peaches while nectarines have a smooth surface.

Tomatoes?

Currently excluded, wonder why… usually picked up.

  • is perennial/always grown in monoculture (mostly greenhouses)
  • is grown on orchards/fruit gardens/satellite images can be confused with orchards

Rubber alternatives

As for the rubber tree, I did, of course, not choose an image showing the fruits, but the tapped tree showing the rubber…

However, one may also like to see the tree itself, as it is likely not being harvested all the time and one should recognize the tree without the rubber.

So do you like the image I chose (soon uploaded)? If not, here are nice alternatives:

@westnordost

Pineapple

I think the current picture for the pineapple is not so good. It should be a photo taken slightly from above, so that the background shows the plant itself rather than a house.

Cherries

The current cherries have quite an unusual color; darker ones would be better IMO.

Final review of OSM tag list

@westnordost

When all other issues here are closed, please have a final look at the crop OSM list. Look for crops, which may not fit the criteria of perennial/monoculture and are either planted in orchards or can be confused with orchards from satellite images. Maybe also have a look at the new OSM tags.
It's also a good idea to check all crops we removed/excluded from the data. Just have a look over them.

Steps:

  • Close all other issues regarding inclusion/exclusions of crops.
  • @rugk: Regenerate another list with all blacklisted crops
  • @westnordost: Review OSM "whitelist". Any crop is listed here.
  • @westnordost: Review the regenerated removed crops list. Did I/we remove some, which may be included?
  • Possibly apply changes.
  • Get unofficial OSM tags approved, suggest them, or just add them to the wiki page
  • Possibly apply changes to OSM tags.
  • Regenerate top-5 list.

Afterwards we should have data we can work with and can continue with streetcomplete/StreetComplete#368.

Fallback

So for a few countries, there are very few entries, sometimes only one, in mostPlantedCrops_2014%2B2013_OSMonly.yml. This is a problem because then, e.g. for Vietnam, you'd have only one choice: bananas. Obviously the FAO data is not complete, so there must be some kind of fallback.

Do you have ideas?

Solutions come to my mind:

  • don't show the quest at all for countries with fewer than 3 choices. More a workaround than a solution
  • have the "show more" button show all defined fruits - this is... not so good because most are irrelevant for any given location
  • sort entries by climate zones, have a dictionary (country -> climate zones) and do the same as above
  • manually research at least for the countries that have too few choices what else would grow there / what else is being known for growing there
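To find out which countries are affected, one could count the entries per country. This is only a sketch: the function name `countries_below` and the flat `country,crop` input format are assumptions, so the generated YAML would first have to be flattened into such lines.

```shell
#!/bin/sh
# Hypothetical sketch: list countries that have fewer than MIN crop choices.
# Assumed input on stdin: lines of "country,crop".
# $1 = minimum number of choices a country should have.
countries_below() {
    awk -F, -v min="$1" '
        { n[$1]++ }
        END { for (c in n) if (n[c] < min) print c "," n[c] }
    ' | sort
}

printf 'VN,banana\nDE,apple\nDE,grape\nDE,cherry\n' | countries_below 3
# prints:
# VN,1
```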

Raspberry

I like the current raspberry pic, because you can see the leaves, but this one looks really great, promising and is a good photo. (On Wikicommons it's also a "quality image".)

So leave or change it? Or maybe yet another photo?

Apple

From f94e800:

apple: better a picture where the top part with the stem is also visible as this is one characteristic of apples?

Chili tag

What is the correct trees=??? tag for chilis?

StreetComplete source data: tonnes or area?

@cyanate made me aware in #47 that we could also use the harvested area instead of tonnes as source data. The script stays the same, only the result may naturally differ…

Just opening this opportunity. If you think we should switch, then we may switch and I'll look for missing pics for crops or so, in case this is needed.

Hop

I just noticed that in the CSV, hop plants are in landuse=farmland. Is this an error?

Image for brazil nut

(German: Paranuss)

I could not find one, which satisfies our requirements.

Pictures are really rare.

We could, however, use the bloom. There we have enough pictures.

Some nice images by fir0002 have license problems

https://commons.wikimedia.org/wiki/File:Pair_of_lemons.jpg and https://commons.wikimedia.org/wiki/File:Lemon_closeup.jpg have license problems.

License: CC BY-NC or GFDL 1.2 or GFDL v1.2

If I understand it correctly, Wikipedia/Wikicommons does not allow NC either, so they seem to use GFDL v1.2 and do the crazy attribution required there. You can also mail the author to negotiate a different license…

These images are very yellow 😄 🍋, but I'll add another not-so-nice, but acceptable photo for now.

Sugar cane?

@westnordost
I excluded sugar beets (certainly not on orchards), but what about sugar cane? Could it be confused with trees/bushes, or is that unlikely?

In #2 (comment) you were for including it, but sugar cane fields are unlikely to be recognized as orchards from satellite images, right?

Grapes

I've added two versions: grape1 or grape2? The second one is a bit dark, but has a nice form; the first one is light, but looks like a big pile…

Missing data

In the list everything marked with "???????" has an OSM tag, but we have no data for it.
I think we can likely ignore it for now, as I trust the FAO data quite a lot. I mean, if blackberry is not produced and there are no statistics for it, then we cannot do anything, and maybe there are really no orchards with them.

Also, we have no data for tree nurseries (a thing to re-tag), as the FAO of course does not include this (it's neither agriculture nor food).
We could integrate this data or… well… leave it out for now.

Sum up same values

E.g. after the OSM tag conversion we could sum up all "crop=nut" entries, as we have a list of diverse nuts there and most (all?) are just tagged as nuts. This of course affects the statistics, as countries where many different nuts are grown may then get "nuts" into the top-5.
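A minimal sketch of such a summation, assuming the rows have already been converted to OSM tags. The function name `sum_same_tags` and the `country,tag,value` line format are hypothetical, and the tag values in the example are only placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: sum up rows that share the same country and OSM tag.
# Assumed input on stdin: lines of "country,tag,value".
sum_same_tags() {
    awk -F, '
        { sum[$1 "," $2] += $3 }
        END { for (k in sum) print k "," sum[k] }
    ' | sort
}

printf 'DE,crop=nut,30\nDE,crop=nut,20\nDE,trees=apple_trees,40\n' | sum_same_tags
# prints:
# DE,crop=nut,50
# DE,trees=apple_trees,40
```

In this made-up example the two nut rows (30 and 20) each rank below the apple row (40) on their own, but the merged "crop=nut" row (50) ranks above it, which is the effect on the statistics described above.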

Hemp?

In some countries it is legal, as it seems.

Affects Hemp tow waste and Hempseed

'NO'

Because YAML is weird, you need to put 'NO' (Norway) in '' in the output.
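The background is that YAML 1.1 parsers treat unquoted scalars such as NO, yes or on as booleans. A sketch of a quoting helper follows; the name `yaml_key` is hypothetical and this is not necessarily how the script solves it.

```shell
#!/bin/sh
# Hypothetical helper: print a country code as a YAML-safe scalar.
# Values that a YAML 1.1 parser could read as a boolean are single-quoted.
# The patterns are case-insensitive, so they may quote a few more
# spellings than strictly necessary, which is harmless.
yaml_key() {
    case "$1" in
        [Yy]|[Nn]|[Yy][Ee][Ss]|[Nn][Oo]|[Oo][Nn]|[Oo][Ff][Ff]|[Tt][Rr][Uu][Ee]|[Ff][Aa][Ll][Ss][Ee])
            printf "'%s'\n" "$1" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

yaml_key NO   # prints 'NO'
yaml_key DE   # prints DE
```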

Strawberries?

Needs inclusion. (Previously excluded, because… no idea)

I mean in any case these are clearly berries: #5

Pepper, piper, what?

Cleaning up my confusion:

  • Black pepper = Schwarzer Pfeffer = Piper nigrum = piper.jpg
    Genus: Piper/Pepper; pepper family
  • Paprika (incl. chilli) = Capsicum = pepper.jpg is our chili image, the Thai chili version
    Genus: Capsicum; nightshade family
    The fruits of this plant are called "sweet/hot/… pepper".

So these are two very different plants. See also this for a detailed explanation.
Quoting one Wiki article about Capsicum:

The name "pepper" comes from the similarity of the flavor to black pepper, Piper nigrum, although there is no botanical relationship with it or with Sichuan pepper.

Who has invented such a thing…? The German names are clearly easier. 😄

So that is why the source data has one entry named "piper" and another named "peppers".

And we may want to differentiate between chilli 🌶️ and "other" Capsicum such as paprika. At least we do not want to have all Capsicum fields marked as "chilli" fields. I think this is already satisfied as we split the data.

Ideas about special cases

Questions for each case:

  1. Should this be handled in the parser (i.e. should it change the entries in the YAML file?) or can/should/may StreetComplete handle this when parsing/using the YAML file?
  2. In any case, how should it be done?
  3. Are syntax changes necessary?

Some special cases:

  1. Sometimes the source data is so broad to not differentiate between two different OSM tags. This is e.g. the case for lime vs lemon trees.
    1.1. This script could split these into two entries. But what to do with the data ("value in tonnes")? Duplicate/bisect? (IMHO this would be faking the data) Or maybe ignore the "top-5" limit for this case and just add one item more?
    1.3. Possible syntax: trees=lemon_tree|trees=lime_tree
  2. Sometimes we decided we include a crop and then re-tag it under a different tag. Should this information be included in the YAML file?
    2.3. Possible syntax: landuse=farmland+crop=hemp (retag this orchard as landuse=farmland)
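If the two proposed syntaxes were adopted, a consumer could split such entries as sketched below. Both function names are hypothetical, and the separators `|` (alternatives) and `+` (tags that belong together) are only the proposals from this issue, not an implemented format.

```shell
#!/bin/sh
# Hypothetical sketch for the proposed syntaxes; nothing here is implemented
# in the parser. Each function prints one tag per line.

# "a|b" means: the source data cannot distinguish between tags a and b.
split_alternatives() {
    printf '%s\n' "$1" | tr '|' '\n'
}

# "a+b" means: tags a and b are applied together when re-tagging.
split_combined() {
    printf '%s\n' "$1" | tr '+' '\n'
}

split_alternatives 'trees=lemon_tree|trees=lime_tree'
# prints:
# trees=lemon_tree
# trees=lime_tree

split_combined 'landuse=farmland+crop=hemp'
# prints:
# landuse=farmland
# crop=hemp
```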

Nuts & palms

I think these two should be excluded as well because they are too generic. Especially "palms", what does this even mean?
Regarding nuts, the main difficulty is that the image shows the processed nuts, so how would a surveyor know that any kind of plantation is a plantation that grows nuts (but none of the other ones)? For other values, there are no category-answers like "fruits", "berries" etc., so "nuts" feels out of place, imo.

Beans?

@westnordost

(image)

They grow on fields like this… Possibly include them, right?

However, soybeans are not that tall, so maybe exclude them?
