Data Commons: Import Tools and Pipelines

This is a repository for tools and pipelines for importing data into Data Commons.

About Data Commons

Data Commons is an Open Knowledge Graph that provides a unified view across multiple public data sets and statistics. It includes APIs and visual tools to easily explore and analyze data across different datasets without data cleaning or joining.

Using Import Tool

Detailed documentation on the Import Tool is available here.

  • Make sure Java 11+ is installed (download link).

  • Download the tool and run it with:

    java -jar <path-to-jar> lint <list of mcf/tmcf/csv files>

    It's useful to create an alias like

    alias dc-import='java -jar <path-to-jar>'

    so you can invoke the tool as dc-import lint

  • If there are warnings or errors, the tool will produce a JSON report with a table of exemplar errors.

    • It's useful to install an extension like Json-As-Table to view the JSON report (but be sure to allow the extension access to file URLs like this).

      Another option is to copy/paste the JSON content in jsongrid.

  • To see the list of flags that can be used and what the default values are: dc-import --help.
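In bash, aliases are not expanded in non-interactive scripts by default, so a small shell function can be a handier wrapper than the alias when the tool is called from a script. The following is a minimal sketch, assuming a hypothetical environment variable `DC_IMPORT_JAR` that points at the downloaded jar; neither the variable nor the function name is defined by the tool itself.

```shell
# Hypothetical wrapper around the import tool jar.
# DC_IMPORT_JAR is an assumed variable name, not something the tool defines.
dc_import() {
  if [ -z "${DC_IMPORT_JAR:-}" ]; then
    echo "DC_IMPORT_JAR is not set; point it at the downloaded jar" >&2
    return 1
  fi
  # Forward every argument (subcommand, flags, files) unchanged to the jar.
  java -jar "$DC_IMPORT_JAR" "$@"
}
```

After `export DC_IMPORT_JAR=<path-to-jar>`, the tool can be invoked as `dc_import lint <list of mcf/tmcf/csv files>`.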

Development

NOTE: The instructions below are relevant only for developing the tool. To use the tool, you just need Java 11+ installed (see instructions above).

Prerequisites

  1. The tools are built using Apache Maven version 3.8.0.
    • For MacOS: brew install maven
  2. The tools use protobuf and require that protoc be installed.
    • For MacOS: brew install protobuf
  3. Make sure Java 11+ is installed
    • You can install it from here
  4. Check what version of Java Maven is using: mvn --version
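The Java 11+ requirement can also be checked in a script. The helpers below are a sketch, not part of this repo: the function names are hypothetical, and they only parse a version string such as `11.0.2`, the legacy `1.8.0_292`, or `17-ea`.

```shell
# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "11.0.2" / "17" scheme.
java_major_version() {
  local version="$1"
  local major="${version%%.*}"          # text before the first dot
  if [ "$major" = "1" ]; then
    # Legacy scheme: "1.8.0_292" means Java 8, so take the second field.
    local rest="${version#1.}"
    major="${rest%%.*}"
  fi
  # Drop any non-numeric suffix such as "-ea".
  major="${major%%[!0-9]*}"
  printf '%s\n' "$major"
}

# Succeed only when the given version string satisfies the Java 11+ requirement.
java_is_11_plus() {
  [ "$(java_major_version "$1")" -ge 11 ]
}
```

To feed it the installed version, one option is `java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'`, which assumes the usual `version "x.y.z"` line that `java -version` writes to stderr.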

Build and Test Import Tool

You can build and test the Java code from a Unix shell.

To build: mvn compile

To run tests: mvn test

To build binary: mvn package

  • which will produce tool/target/datacommons-import-tool-0.1-alpha.1-jar-with-dependencies.jar

  • and you can run it with

    java -jar tool/target/datacommons-import-tool-0.1-alpha.1-jar-with-dependencies.jar

To run the above Maven commands on M1 Macs (details), add the -Dos.arch=x86_64 option.

e.g. mvn compile -Dos.arch=x86_64
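If the same build script runs on several machines, the flag can be added conditionally. This sketch uses a hypothetical helper, `maven_arch_flag`, and assumes that an M1 Mac reports `Darwin` from `uname -s` and `arm64` from `uname -m` (a shell running under Rosetta may report `x86_64` instead).

```shell
# Print the extra Maven flag for Apple Silicon Macs, or nothing elsewhere.
# Both arguments are optional and default to the values reported by uname.
maven_arch_flag() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    printf '%s\n' "-Dos.arch=x86_64"
  fi
}
```

It can then be used as `mvn compile $(maven_arch_flag)`, which expands to a plain `mvn compile` on other platforms.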

Run Server

The repo also hosts an experimental server for private DC.

To build: mvn compile

To run tests: mvn test

To build binary: mvn package

  • which will produce server/target/datacommons-server-0.1-alpha.1.jar

  • and you can run it with

    java -jar server/target/datacommons-server-0.1-alpha.1.jar <file1.tmcf> <file2.csv>

Send a request:

curl "http://localhost:8080/stat/series?place=country/USA&statVar=<statVar>"

You should then see "Hello World!" in the console output.
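Because `&` is special to the shell, the request URL has to be quoted or the command will be cut short at that character. One way to avoid quoting mistakes is to build the URL in a helper. The sketch below uses a hypothetical function name, takes the host and port from the example request above, and does not percent-encode its arguments.

```shell
# Build the /stat/series request URL for a place and a statistical variable.
# The optional third argument overrides the base URL (default: localhost:8080).
series_url() {
  local place="$1" stat_var="$2" base="${3:-http://localhost:8080}"
  printf '%s/stat/series?place=%s&statVar=%s\n' "$base" "$place" "$stat_var"
}
```

A request then becomes `curl "$(series_url country/USA "<statVar>")"`, with `<statVar>` replaced by the variable you want to query.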

Coding Guidelines

The code is formatted using google-java-format. Please follow instructions in the README to integrate with IntelliJ/Eclipse IDEs.

The formatting is done as part of the build. It can be checked by running: mvn com.spotify.fmt:fmt-maven-plugin:check

Contributing Changes

From the repo page, click the "Fork" button to fork the repo.

Clone your forked repo to your desktop.

Add datacommonsorg/import repo as a remote:

git remote add dc https://github.com/datacommonsorg/import.git

Each time you want to send a Pull Request, follow these steps:

git checkout master
git pull dc master
git checkout -b new_branch_name
# Make some code change
git add .
git commit -m "commit message"
git push -u origin new_branch_name

Then in your forked repo, you can send a Pull Request. If this is your first time contributing to a Google Open Source project, you may need to follow the steps in contributing.md.

Wait for approval of the Pull Request and merge the change.

License

Apache 2.0

Support

For general questions or issues, please open an issue on our issues page. For all other questions, please send an email to [email protected].

Note - This is not an officially supported Google product.
