
dfg-trees's Introduction

Falling Fruit Legacy

This is a Rails 3 web application and API for Falling Fruit, built for use with a PostgreSQL + PostGIS database.

Who is responsible?

Falling Fruit co-founders Caleb Phillips and Ethan Welty. More info at fallingfruit.org/about.

How can I help?

If you want to help with development, feel free to fork the project. If you have something to submit upstream, send a pull request from your fork. Cool? Cool!

Status

The website is live at fallingfruit.org. However, maintaining both a website and a mobile app that share no code proved too time-consuming, so we are slowly phasing out this project in favor of a mobile-friendly web app (falling-fruit-web). All versions of the mobile app still rely on the Rails API.

Development

Install PostgreSQL, Ruby, and dependencies

PostgreSQL (15) & PostGIS (3.3), for example with Homebrew:

brew install postgresql@15 postgis

ImageMagick:

brew install imagemagick

Ruby (2.3.4), for example with rbenv:

rbenv install 2.3.4
rbenv shell 2.3.4

Bundler (1.17.3) with RubyGems:

gem install bundler -v 1.17.3

Project gems with Bundler:

bundle install

Initialize the configuration files:

cp config/database.yml.dist config/database.yml
cp config/s3.yml.dist config/s3.yml
cp config/initializers/credentials.rb.dist config/initializers/credentials.rb
cp config/initializers/secret_token.rb.dist config/initializers/secret_token.rb
cp .phraseapp.yml.dist .phraseapp.yml

Add a desired development database name and your database username, password, and port to config/database.yml. Add Amazon S3 and Google API credentials to config/s3.yml and config/initializers/credentials.rb. If working with translations, add Phrase credentials to .phraseapp.yml.
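The exact keys expected come from the `.dist` templates, so check those first; as a rough sketch, a development entry in config/database.yml typically looks something like this (all values below are placeholders):

```yaml
development:
  adapter: postgresql
  database: fallingfruit_development
  username: your_db_user
  password: your_db_password
  host: localhost
  port: 5432
```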

Start the app

Create and structure the database, then seed it with db/seeds.rb:

rake db:create
rake db:migrate
rake db:seed

Install and start the Falling Fruit API.

Finally, start the web server and navigate to localhost:3000:

thin start

Translation

For translators

Website translations are managed on the Phrase project Falling Fruit (web). To contribute, email us ([email protected]) and we'll add you as a translator.

For developers

Install the Phrase CLI:

brew install phrase-cli
cp .phraseapp.yml.dist .phraseapp.yml

Add your Phrase access token to .phraseapp.yml.

Add a new translation

In the Falling Fruit (web) project, select the default locale (English/en), and add a new translation key. If the same word or phrase appears often, add it as glossary.<key name>.

Then, update your translation files (in config/locales/*.yml):

phrase pull

Use the translation key in your template.

<!-- Instead of adding text to the markup: -->
<span>Map</span>

<!-- render the translation key's value with translate(): -->
<span><%= translate("glossary.map") %></span>


dfg-trees's Issues

Semi-automated Matching to Common Schema (3): Advanced Matching from input to target schema

Team members: Aadit, Michael, Jaqueline
Sprint 6

Overall goal:
Improve the previous semi-automated system for writing crosswalks for new datasets. We want to look at the different data types and assign them to the correct target schema column names. We also want to account for a target schema column name having multiple possible input column names.

Comparing hashes of data to prevent saving the same data (2): compare hashes + either store or point to previous version

Team members: Jayesh, Kajoyrie, Jason
Sprint 5: 6/26-7/3

Overall Goal:
After computing a hash, we check whether it differs from the hash of the last version of the data. If it does, we store the new hash and its timestamp. If it matches the previous hash, we do not store the hash a second time; instead, we want an entry in the registry recording that we downloaded the data again and it was identical to the previous version. This can be done as an additional attribute, or by having the path point to the previous file with the previous timestamp.
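As a sketch of this store-or-point logic: the registry shape below (entries keyed by id, each holding a list of versions with `hash`, `date`, and `path`) is an assumption for illustration, not the project's actual schema.

```javascript
// Record a download: store a new version if the hash changed, otherwise
// add an entry that points back at the previous file's path.
function recordDownload(registry, id, newHash, date, path) {
  const versions = registry[id] || (registry[id] = [])
  const last = versions[versions.length - 1]
  if (last && last.hash === newHash) {
    // Same data as before: log the download, but reuse the old file's path
    versions.push({ hash: newHash, date, path: last.path, unchanged: true })
  } else {
    versions.push({ hash: newHash, date, path })
  }
  return versions[versions.length - 1]
}
```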

Store metadata (especially crosswalk) into the archive too.

This depends on the archive existing (see: “Insert data directly into the archive”). The goal here is to enrich the archive by storing the “crosswalk” data as a field in the JSON object for a given archive entry.

Semi-automated Matching to Common Schema (1): Scrape Data

Team members: Aadit, Michael, Jaqueline
Sprint 4: 6/19-6/26

Overall goal:
Create a function that scrapes all of the crosswalks from our source metadata files. Then, for each column name in the target schema, collect the input column names that were matched to it and look for useful patterns. We want to introspect current crosswalks and build a lookup table between input and target schema column names, so that we can examine the past 100–200 crosswalks and identify a crosswalk that might be a good starting point.

Problem Statement:
Data from our datasets have all different types of names for their data (e.g. location can be described with an address, or latitude and longitude, or miles from a city center, etc). Every time we add a new dataset, someone has to manually look through the data and find out how to transform the data into a target schema. We have a lot of datasets where this has been done already, and we would like to find patterns in how datasets are structured so we can eventually generate a draft of the correct mappings based on previous mappings. This would save a lot of time as we try to add thousands more datasets.

What does success look like?

  • First, get the data for 100–200 crosswalks and try to identify any patterns (list out maybe 3–5).
  • Then, test your ideas about these patterns on a larger set of crosswalks and see if they still hold true.
  • Summarize your findings for the team!

Comments:

  • The functions were written in similar ways, so using regular expressions may be helpful and efficient.
  • Crosswalks are a dictionary (also called an Object in JS) where the key is the target data label and the value is either (1) a header name in the data, or (2) a function that produces a processed value from one or more headers.
  • You'll notice that the set of keys is not always consistent. This is because there are special keys that help magically convert different units to the target schema. You can find the true target schema in the README. Datasets are often missing fields, so many target keys do not appear in every crosswalk.
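To make the dictionary shape concrete, here is an invented crosswalk (the header names `COMMON`, `LAT`, and `LON` are illustrative, not from any real dataset):

```javascript
// Keys are target schema labels; values are either a source header name
// or a function that computes a value from one or more headers in a row.
const crosswalk = {
  common: 'COMMON',                                 // direct header match
  lat: (row) => parseFloat(row['LAT']),             // function of one header
  location: (row) => `${row['LAT']},${row['LON']}`  // function of two headers
}
```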

Insert data directly into the archive (add a line to the registry)

We want to be able to add data directly into our directory of information sources.

Create sources_archive.jsonl using the JSON Lines data format, and a function that adds data in the format proposed by the PRD in the Engineering Details section (https://docs.google.com/document/d/17QpEdYUS41PH88iS2k875Y36Y7y7jun3wxnA4UPxJqM/edit#).

The function should take as parameters: url, source_id, url_type, and api_type. url_type and api_type are optional.

Comparing hashes of data to prevent saving the same data (1): hashing data file

Team members: Jayesh, Kajoyrie, Jason
Sprint 4: 6/19-6/26

Overall Goal:
When we download data within the archiver, we hash the data and then check the registry to see whether the hash differs from that of the last version of the data.

What does success look like?

  • We first want a function in archiver.js that takes a data file and a registry id, applies an MD5 hash to the data file, searches the registry for the data assigned to that registry id, finds the data hash (which we will later implement to be stored in the registry too), and compares the two hashes, returning true if they are the same.
  • If the data hash or registry id does not exist in the registry, return false.
  • We will then want to add the data hash as a piece of data stored in the registry too when we archive data.

Comments:

  • Ideally, we will compute the hash when we download the data initially so that we do not have to read the data twice.
  • Hashing: we can use MD5 and hash the file incrementally while reading its contents, instead of loading the whole file into memory.
    • Good resource to look at when starting: archiving the file name in Ethan's archive demo
    • It will be interesting to see if the hash is the same depending on if we read the file as the bytes vs. text.
  • We want to only do this for data files because the about info files would change very frequently without much benefit for us.

Add support for additional URLs in addition to metadata files (e.g. other web pages describing the dataset)

The current data source scheme allows one webpage URL (info) and either a single download URL (download: string), multiple download URLs (download: string array – a rare case where each part of a shapefile is provided as a separate download rather than in a zip file), or an ArcGIS REST endpoint (featureLayer).

This ticket will affect the way we do downloading and archiving.
One way to generalize this to multiple webpages or metadata files could be something like this, where pages are URLs to web pages describing the data, metadata are URLs to files describing the data, and data are URLs to the data itself. To support API endpoints more generally, perhaps each (meta)data url could either be a string (HTTP URL) or an object {url, api}.
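One possible shape for that generalized scheme (the URLs and the `api` value below are invented for illustration; the actual field set is still to be decided):

```javascript
// pages: web pages describing the data; metadata: files describing the
// data; data: the data itself. Each (meta)data URL is either a plain
// HTTP URL string or an { url, api } object for API endpoints.
const source = {
  pages: ['https://example.com/dataset-landing-page'],
  metadata: ['https://example.com/dataset.xml'],
  data: [
    'https://example.com/dataset.zip',
    { url: 'https://example.com/FeatureServer/0', api: 'arcgis' }
  ]
}
```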

Please see the Notion for more details.

Semi-automated Matching to Common Schema (2): Simple Matching from input to target schema

Team members: Aadit, Michael, Jaqueline
Sprint 5: 6/26-7/3

Overall goal:
Create a basic semi-automated system for writing crosswalks for new datasets. We want to use the existing crosswalk information as a blueprint for transforming downloaded data into the format we want. Crosswalks are currently defined by hand, so we want a faster, semi-automated way to do this.

What does success look like?

  • We want a function that takes in a dataset and returns a draft crosswalk mapping (as an Object in JS). When we integrate this into existing functionality, the user will be able to review this object and approve or modify it.
  • We will also need a JSON file called crosswalk_mappings.json that maps each target header to an array of source headers and of functions of one or more source headers.
  • Start with 1:1 mappings between target data labels and the data labels we have. This is enough for success now.
  • Later we can consider adding mappings from target headers to functions of one or more source headers.

Comments:

  • Based on previous crosswalk data and which input column names were associated with each target schema column name, we want a basic system where we can supply an input column name and get back the target schema column name associated with it.
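The 1:1 lookup described above can be sketched as follows. Here `crosswalkMappings` stands in for the contents of crosswalk_mappings.json, mapping each target header to the source headers seen for it in past crosswalks; all names are invented for the example.

```javascript
// Draft a crosswalk by picking, for each target header, the first known
// source header that appears in this dataset's headers. The user would
// then review and approve or modify the draft.
function draftCrosswalk(sourceHeaders, crosswalkMappings) {
  const draft = {}
  for (const [target, candidates] of Object.entries(crosswalkMappings)) {
    const match = candidates.find((c) => sourceHeaders.includes(c))
    if (match) draft[target] = match
  }
  return draft
}
```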
