datapackage-java's People

Contributors

dependabot[bot], isnow, jdbranham, johannesjander, pwalsh, roll, shashigharti

datapackage-java's Issues

NPE when creating Dialect from CSVFormat

Overview

I get a NullPointerException when trying this:

Dialect.fromCsvFormat(CSVFormat.Predefined.DEFAULT.getFormat());

The error comes from

dialect.setDoubleQuote(format.getQuoteMode().equals(QuoteMode.MINIMAL));

As

format.getQuoteMode()

returns null for some CSVFormats (e.g. DEFAULT or EXCEL), because no quote mode is set for them.
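
A null-safe comparison is one way to avoid the exception. The sketch below is only an illustration: it uses a stand-in QuoteMode enum and a hypothetical isDoubleQuote helper instead of the real Commons CSV and Dialect classes, and it assumes that an unset quote mode should be treated like MINIMAL.

```java
class NullSafeQuoteModeSketch {

    // Stand-in for org.apache.commons.csv.QuoteMode, reduced to two constants
    // so that the sketch compiles without the Commons CSV dependency.
    enum QuoteMode { MINIMAL, ALL }

    // Hypothetical helper: maps a possibly-null quote mode to the doubleQuote flag.
    // Assumption: a null (unset) quote mode is treated like MINIMAL.
    static boolean isDoubleQuote(QuoteMode mode) {
        return mode == null || mode == QuoteMode.MINIMAL;
    }

    public static void main(String[] args) {
        System.out.println(isDoubleQuote(null));              // unset quote mode
        System.out.println(isDoubleQuote(QuoteMode.MINIMAL)); // explicit MINIMAL
        System.out.println(isDoubleQuote(QuoteMode.ALL));     // any other mode
    }
}
```

Writing the comparison as QuoteMode.MINIMAL.equals(format.getQuoteMode()) would also avoid the NullPointerException, but it would then report false for an unset quote mode.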


Please preserve this line to notify @iSnow (lead of this repository)

Rename DataPackage to Package class?

Overview

We've recently renamed the DataPackage class to Package in all FD implementations. I'm not sure whether that's possible in Java; this is just a note that it has happened in the other libraries.

character set of a resource

Overview

It seems that the character set of a resource is not taken into account. Here is a simple test case with ISO-8859-1 encoding.

URL url = new URL("https://opendata.zitsh.de/frictionless/haltestellen-smartes-dorfshuttle-stand-01-2022.json");
String jsonString =  new Scanner(url.openStream()).useDelimiter("\\A").next();
ObjectNode resourceJson = (ObjectNode) JsonUtil.getInstance().createNode(jsonString);
Resource resource = Resource.build(resourceJson, null, false);
Iterator<String[]> iter = resource.stringArrayIterator();
while(iter.hasNext()) {
    System.out.println(Arrays.toString(iter.next()));
}
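
To illustrate why the declared encoding matters, here is a minimal stand-alone sketch. It does not use the library: the decode helper is hypothetical and simply assumes the charset name comes from the resource descriptor's encoding property, with UTF-8 as the fallback when none is declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ResourceEncodingSketch {

    // Hypothetical helper: decodes raw resource bytes using the charset named in
    // the descriptor's "encoding" property, falling back to UTF-8 when it is absent.
    static String decode(byte[] rawBytes, String encodingName) {
        Charset charset = (encodingName == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(encodingName);
        return new String(rawBytes, charset);
    }

    public static void main(String[] args) {
        // "Muehle" with a u-umlaut, stored as ISO-8859-1: the umlaut is the single byte 0xFC.
        byte[] raw = "M\u00FChle".getBytes(StandardCharsets.ISO_8859_1);

        // Honouring the declared encoding restores the original text.
        System.out.println(decode(raw, "ISO-8859-1"));
        // Ignoring it and decoding as UTF-8 replaces the umlaut with U+FFFD.
        System.out.println(decode(raw, null));
    }
}
```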

Please preserve this line to notify @iSnow (lead of this repository)

Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas

Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas.

  • Create package from JSONObject descriptor.
  • Create package from JSON String descriptor.
  • Create package from Remote File descriptor.
  • Create package from Local File descriptor.
  • Schema validation.
  • Support strict validation flag (GitHub Issue #17).
  • Profile-specific validation via the registry of JSON Schemas.

Streamline `Package.java` API

Overview

Besides the no-args constructor, the Package class has 8 constructors, half of them just convenience functions (See "AS IS" below).

The way the API is structured is also not very Java-like, as

  • URL-based constructors are limited to HTTP/HTTPS protocols according to the JavaDoc, but should (and probably do) also work for file-URLs.

  • Using file paths in the form of Strings is at least unusual and feels like a direct Python port. A Java API should preferably use either File/Path or InputStream (which would cover URLs, Files, and even JSON Strings if need be).

  • Using a String to either hold JSON-content or a file-path to a ZIP-file in public Package(String jsonStringSource, boolean strict) is rather unusual for a Java-API.

Proposal 1: Trim and make idiomatic Java

It would be sufficient to trim the host of constructors to three, maybe four:

/**
 * Load from String representation of JSON object.
 */
public Package(String jsonStringSource, boolean strict)   

/**
 * Load from URL (either 'http'/'https' or file URL).
 */
public Package(URL urlSource, boolean strict) 

/**
 * Load from File
 */
public Package(File sourceFile, boolean strict) 

/**
 * Load from InputStream.
 */
public Package(InputStream source, boolean strict) 

The most fundamental constructor would rely on an InputStream. The other constructors are syntactic sugar: each would simply create an InputStream on the URL, the File or the JSON String and delegate to the InputStream constructor.

It would be necessary to validate that this setup supports ZIP-packaged DataPackages, but that should be possible. Going through the code, it seems that Resource resolution has a handful of problems in both directory-based datapackages with resources (paths can't really be relative to the datapackage.json) and ZIP-packaged DataPackages, but I would have to look into this more deeply.
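
The delegation idea from Proposal 1 could look roughly like the following sketch. StreamBackedPackageSketch is a hypothetical stand-in for Package that only stores the raw descriptor text; it does no JSON parsing, validation or ZIP handling, and it assumes UTF-8 descriptors. The open helpers and accessor names are likewise invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for Package: it only keeps the raw descriptor text.
class StreamBackedPackageSketch {
    private final String descriptorText;
    private final boolean strict;

    // Fundamental constructor: every other constructor delegates to this one.
    StreamBackedPackageSketch(InputStream source, boolean strict) {
        this.descriptorText = readAll(source);
        this.strict = strict;
    }

    // Convenience: wrap the JSON string in an in-memory stream.
    StreamBackedPackageSketch(String jsonStringSource, boolean strict) {
        this(new ByteArrayInputStream(jsonStringSource.getBytes(StandardCharsets.UTF_8)), strict);
    }

    // Convenience: open the file and delegate.
    StreamBackedPackageSketch(File sourceFile, boolean strict) {
        this(open(sourceFile), strict);
    }

    // Convenience: open the URL (http, https or file) and delegate.
    StreamBackedPackageSketch(URL urlSource, boolean strict) {
        this(open(urlSource), strict);
    }

    String getDescriptorText() { return descriptorText; }

    boolean isStrict() { return strict; }

    // Reads the whole stream as UTF-8 text and closes it afterwards.
    private static String readAll(InputStream source) {
        try (InputStream in = source) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static InputStream open(File file) {
        try {
            return Files.newInputStream(file.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static InputStream open(URL url) {
        try {
            return url.openStream();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StreamBackedPackageSketch pkg = new StreamBackedPackageSketch("{\"name\":\"demo\"}", true);
        System.out.println(pkg.getDescriptorText());
    }
}
```

Wrapping IOException in UncheckedIOException is a choice made only to keep the sketch short; a real API might prefer to declare checked exceptions.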

Proposal 2: Switch to a Builder-pattern

Since fluent interfaces have been an established part of Java for a number of years, it might make sense to follow that practice and switch to a builder-based API:

// API outline only: method bodies are omitted.
public class Package {
    private Package();

    public static PackageBuilder builder();

    public static class PackageBuilder {

        public PackageBuilder fromSource(String jsonStringSource);

        public PackageBuilder fromSource(URL urlSource);

        public PackageBuilder fromSource(InputStream source);

        // maybe add
        public PackageBuilder fromZipSource(URL urlSource);

        public PackageBuilder setStrict(boolean strict);

        public Package build();
    }
}

The special method for reading from a ZIP file might be needed to read Resources from inside the ZIP file, but I am not totally sure.
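
For illustration, a runnable reduction of the builder outline might look like the sketch below. BuilderPackageSketch is a hypothetical stand-in for Package that only stores the descriptor text; the URL and ZIP variants are left out, UTF-8 is assumed, and the accessor names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Package, reduced to the descriptor text and the strict flag.
class BuilderPackageSketch {
    private final String descriptorText;
    private final boolean strict;

    // Private constructor: instances are only created through the builder.
    private BuilderPackageSketch(String descriptorText, boolean strict) {
        this.descriptorText = descriptorText;
        this.strict = strict;
    }

    static PackageBuilder builder() {
        return new PackageBuilder();
    }

    String getDescriptorText() { return descriptorText; }

    boolean isStrict() { return strict; }

    static class PackageBuilder {
        private InputStream source;
        private boolean strict = false;

        // Convenience overload: wraps the JSON string in an in-memory stream.
        PackageBuilder fromSource(String jsonStringSource) {
            return fromSource(new ByteArrayInputStream(
                    jsonStringSource.getBytes(StandardCharsets.UTF_8)));
        }

        PackageBuilder fromSource(InputStream source) {
            this.source = source;
            return this;
        }

        PackageBuilder setStrict(boolean strict) {
            this.strict = strict;
            return this;
        }

        // Reads the configured source and creates the immutable result.
        BuilderPackageSketch build() {
            if (source == null) {
                throw new IllegalStateException("No source configured");
            }
            try (InputStream in = source) {
                String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return new BuilderPackageSketch(text, strict);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        BuilderPackageSketch pkg = BuilderPackageSketch.builder()
                .fromSource("{\"name\":\"demo\"}")
                .setStrict(true)
                .build();
        System.out.println(pkg.getDescriptorText() + " strict=" + pkg.isStrict());
    }
}
```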

If there's any interest in this, I would volunteer to work on either proposal to demo its validity.


AS IS:

/**
 * Load from native Java JSONObject.
 */
public Package(JSONObject jsonObjectSource, boolean strict) 

/**
 * Load from native Java JSONObject.
 */
public Package(JSONObject jsonObjectSource) 

/**
 * Load from String representation of JSON object or from a zip file path.
 */
public Package(String jsonStringSource, boolean strict)         

/**
 * Load from String representation of JSON object or from a zip file path.
 */
public Package(String jsonStringSource) 

/**
 * Load from URL (must be in either 'http' or 'https' schemes).
 */
public Package(URL urlSource, boolean strict) 

/**
 * Load from URL (must be in either 'http' or 'https' schemes).
 * No validation by default.
 */
public Package(URL urlSource) 

/**
 * Load from local file system path.
 */
public Package(String filePath, String basePath, boolean strict) 

/**
 * Load from local file system path.
 * No validation by default.
 */
public Package(String filePath, String basePath) 

Please preserve this line to notify @georgeslabreche (maintainer of this repository)

Data Resource JSON schema definition for `schema` seems to contradict specification

Overview

Caution: there are a lot of different uses of "schema" coming, so bear with me...

In src/main/resources/, we find the JSON schema definition files for the various parts of a DataPackage. Among them is data-resource.json, where the formal definition for a schema property of a Resource is:

"schema": {
  "propertyOrder": 40,
  "title": "Schema",
  "description": "A schema for this resource.",
  "type": "object"
},

This allows only JSON objects as schema definitions, i.e. schemas embedded inline in the descriptor. It disallows strings such as URLs or file paths.

This seemingly contradicts the specification of a schema property:

The value for the schema property on a resource MUST be an object representing the schema OR a string that identifies the location of the schema.

(Emphasis mine).

I believe it should read (not sure about the syntax):

"oneOf": [
    {
      "title": "Schema path",
      "description": "A fully qualified URL, or a POSIX file path..",
      "type": "string"
    },
    {
      "title": "Schema encoded as JSON",
      "type": "object"
    }
]
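
The rule quoted from the specification can be sketched in plain Java as follows. This is only an illustration of the "object OR string" check, not the library's validation code: it assumes a parsed JSON value is represented as a plain Java object (String for a JSON string, Map for a JSON object), and the helper name is hypothetical.

```java
import java.util.Map;

class SchemaValueSketch {

    // Mirrors the proposed "oneOf": accept a string (URL or file path)
    // or an object (inline schema), and reject everything else.
    static boolean isAcceptableSchemaValue(Object value) {
        return value instanceof String || value instanceof Map;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableSchemaValue("schema.json"));         // string location
        System.out.println(isAcceptableSchemaValue(Map.of("fields", "x"))); // inline object
        System.out.println(isAcceptableSchemaValue(42));                    // neither form
    }
}
```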

Please preserve this line to notify @georgeslabreche (lead of this repository)

Document Relations

Can you please document the expected behaviour of relations, which are referenced in many places, e.g. here?

I had naively assumed a DP reader would follow foreign keys and return the end of the relation in a nested object inline.

Thanks!

Implement more thorough testing for resource schema dereferencing with invalid resource schema

Using an invalid resource schema like this one does not trigger a validation exception:
https://github.com/frictionlessdata/datapackage-java/blob/6710fe22ded6674af7699a773a70bff918e7f76c/src/test/resources/fixtures/schema/invalid_population_schema.json

This is because the data package schema only checks whether the resource schema is an object:

"schema": {
  "propertyOrder": 40,
  "title": "Schema",
  "description": "A schema for this resource.",
  "type": "object"
}

Certainly, something is missing here.

Finishing up

Description

This issue describes the set of tasks to complete in order to finish up work on the library.

Tasks

  • Touch base with @jobarratt and @pwalsh to notify them that you consider the work complete
  • Provide a short description / link to code for how each action is implemented, with a link to unit tests that prove each action
  • Tag your candidate code as v0.1
  • Set up Travis to auto-deploy tagged versions to the package management solution for your language
  • Ensure that the OKI account on the package management platform is an administrator/maintainer of the package, along with yourself
  • Receive code review from @pwalsh and address any remaining issues
  • Publish final version

Migrate out of org.json library

Using org.json as a JSON library is problematic, as the licence of this library includes an additional clause "The Software shall be used for Good, not Evil."

This is regarded as a non-free licence (it is not OSI-compliant).

That makes it impossible for projects relying on this library (such as OpenRefine) to be OSI-compliant in turn.

Incorrect check asText on ArrayNode element

Here you're checking that contributors are not empty:

https://github.com/frictionlessdata/datapackage-java/blob/main/src/main/java/io/frictionlessdata/datapackage/Package.java#L775

That doesn't look like the correct way to do it. Contributors is an ArrayNode, which is derived from ContainerNode, and asText is implemented there as:

@Override
public String asText() { return ""; }

https://github.com/FasterXML/jackson-databind/blob/2.15/src/main/java/com/fasterxml/jackson/databind/node/ContainerNode.java#L41

I think the right way would be:

!jsonNodeSource.get(Package.JSON_KEY_CONTRIBUTORS).isEmpty()

@iSnow

Getting started

Description

This issue documents the initial steps to get started with a new Frictionless Data implementation.

Tasks

  • Travis configuration
  • Coveralls configuration
  • Basic setup of README with badges
  • Basic setup of license
  • Review the whole family of specifications
  • Review the implementation notes
  • Review either the Python (Data Package, Table Schema) or JavaScript (Data Package, Table Schema) reference implementations (whichever language you feel most comfortable reading)
    • Note that we have high nineties test coverage on these libraries. Similar test coverage is expected here
  • Review the stack reference
  • Review the blog post that announces v1 of the specifications
  • Review the test packages that can be used to test your work (in addition to the normal and expected unit tests)
  • Review the OKFN Coding Standards
    • Parts of the coding standard are language specific, and parts are workflow specific. The workflow points are important for you. If you want to contribute language-related standards to our docs for your language, we welcome it!
  • Write a set of high-level issues for each library, on the respective issue tracker, that outline the work plan
    • Note the structure of this issue: A narrative description and a specific list of tasks. Follow a similar pattern
    • The sequence of work is important: start with the Table Schema library first, as the Data Package library has a direct dependency on it
  • Note the communication protocol for this work: All communication around the implementation must be in public. We want our work dynamic here to serve as an example for other implementors, and to share as much information as possible. There are two channels of communication:
  • Note the staff from OKI who are here to support you:
    • Jo Barratt - Project Manager for Frictionless Data
    • Evgeny Karev - Tech Lead for core Frictionless Data libraries
    • Serah Rono - Developer Advocate at Open Knowledge International, Dissemination Lead for the Tool Fund
    • Dan Fowler - Developer Advocate at Open Knowledge International, Pilot Lead for Frictionless Data, OKI Labs Lead
    • Adam Kariv - Engineering Lead at Open Knowledge International, Tech Lead on OpenSpending
    • Paul Walsh - Chief Product Officer at Open Knowledge International
  • Any communication around the grant agreement should be directly done by email with Jo Barratt, Frictionless Data Project Manager

Create a Resource class. Handle get/set Resource with the created Resource object.

At the moment, a Resource is simply a JSONObject and a list of Resources is a JSONArray of JSONObjects.

Something like getResources() returns a JSONArray of JSONObjects representing Resource objects. This goes back to my first question in Issue #7. The reason I went ahead with JSONObject and JSONArray is that it seems like 1) we don't necessarily want to map every single datapackage JSON element to an equivalent class and its properties, and 2) in the other libraries we interact with the JSON structure directly, so I thought I'd preserve that element. Nothing stops us from creating a Resource class and working with that, but where do we draw the line as to what is represented by a class and what isn't?
