frictionlessdata / datapackage-java
A Java library for working with Frictionless Data Data Packages.
License: MIT License
I get a NullPointerException when trying this:
Dialect.fromCsvFormat(CSVFormat.Predefined.DEFAULT.getFormat());
The error occurs because format.getQuoteMode() returns null for some CSVFormats (e.g. DEFAULT or EXCEL), since the value is never set for them.
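One way to avoid the NPE is to substitute a default before the quote mode is used. The sketch below assumes MINIMAL as the fallback, which, as far as I know, is also what Commons CSV itself applies when printing with a null quote mode. The local QuoteMode enum only stands in for org.apache.commons.csv.QuoteMode so the example compiles on its own, and the helper name effectiveQuoteMode is hypothetical, not part of datapackage-java.

```java
import java.util.Optional;

public class QuoteModeFallback {
    // Local stand-in for org.apache.commons.csv.QuoteMode so this sketch compiles without Commons CSV.
    enum QuoteMode { ALL, ALL_NON_NULL, MINIMAL, NON_NUMERIC, NONE }

    // Hypothetical helper: map an unset (null) quote mode to MINIMAL
    // instead of calling methods on the null reference.
    static QuoteMode effectiveQuoteMode(QuoteMode fromFormat) {
        return Optional.ofNullable(fromFormat).orElse(QuoteMode.MINIMAL);
    }
}
```

In Dialect.fromCsvFormat, the value returned by format.getQuoteMode() would be passed through such a helper before it is inspected.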
Please preserve this line to notify @iSnow (lead of this repository)
We've recently renamed the DataPackage class to Package in all FD implementations. Not sure it's possible in Java, but just a note that it's happened in other libs.
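For what it's worth, the name is usable in Java: a class called Package shadows the implicitly imported java.lang.Package wherever it is in scope (a top-level class in the library's own package would do the same), and the JDK class stays reachable by its fully qualified name. A minimal demonstration; the class names are illustrative only:

```java
public class PackageNameDemo {
    // A class named Package shadows java.lang.Package within this scope.
    static class Package {
        final String name;
        Package(String name) { this.name = name; }
    }

    static String describe() {
        Package dataPackage = new Package("my-dataset");          // resolves to the nested class
        java.lang.Package jdkPackage = String.class.getPackage(); // JDK class, via its fully qualified name
        return dataPackage.name + " / " + jdkPackage.getName();
    }
}
```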
My understanding, based on the spec, is that all the data files referenced in the JSON descriptor should also be packaged in the zip file along with the datapackage.json file.
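A minimal sketch of that packaging step using java.util.zip, assuming the data files have already been loaded into memory and are keyed by the relative path that the descriptor uses. The class and method names are hypothetical; a real implementation would read each resource's path from disk instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipPackageWriter {
    // Writes datapackage.json plus every data file it references into one ZIP archive.
    // dataFiles maps the relative path used in the descriptor (e.g. "data/a.csv") to the file content.
    static byte[] writeZip(String descriptorJson, Map<String, byte[]> dataFiles) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("datapackage.json"));
            zip.write(descriptorJson.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (Map.Entry<String, byte[]> file : dataFiles.entrySet()) {
                // Same relative path, so the descriptor's "path" values still resolve inside the archive.
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } // closing the stream writes the ZIP central directory
        return buffer.toByteArray();
    }
}
```

Keeping the entry names identical to the descriptor's relative paths is what lets a reader resolve resources inside the archive without rewriting the descriptor.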
It seems that the character set (encoding) of a resource is not taken into account. Here is a simple test case with ISO-8859-1 encoding:
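import java.net.URL;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Scanner;
import com.fasterxml.jackson.databind.node.ObjectNode;
// plus the imports for Resource (datapackage-java) and JsonUtil (tableschema-java)

URL url = new URL("https://opendata.zitsh.de/frictionless/haltestellen-smartes-dorfshuttle-stand-01-2022.json");
String jsonString = new Scanner(url.openStream()).useDelimiter("\\A").next();
ObjectNode resourceJson = (ObjectNode) JsonUtil.getInstance().createNode(jsonString);
Resource resource = Resource.build(resourceJson, null, false);
Iterator<String[]> iter = resource.stringArrayIterator();
while (iter.hasNext()) {
    System.out.println(Arrays.toString(iter.next()));
}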
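A fix would presumably read the resource's encoding property and decode the data with it. The sketch below assumes the raw bytes of the data file are already available and that the property value is passed in as a string; the class and method names are illustrative. Per the Data Resource spec, UTF-8 is the default when no encoding is given.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResourceEncodingSketch {
    // Decode raw resource bytes with the resource's declared "encoding",
    // falling back to UTF-8 when none is declared.
    static String decode(byte[] raw, String declaredEncoding) {
        Charset charset = (declaredEncoding == null || declaredEncoding.isEmpty())
                ? StandardCharsets.UTF_8
                : Charset.forName(declaredEncoding);
        return new String(raw, charset);
    }
}
```

Decoding ISO-8859-1 bytes such as the single byte for "ß" as UTF-8 yields a replacement character, which is the kind of corruption the test case above would show.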
Please preserve this line to notify @iSnow (lead of this repository)
Read and validate a data package descriptor, including profile-specific validation via the registry of JSON Schemas.
Create and edit a data package descriptor, including methods to add and remove resources and validation after editing.
Besides the no-args constructor, the Package class has 8 constructors, half of them just convenience overloads (see "AS IS" below).
The way the API is structured is also not very Java-like:
- URL-based constructors are limited to the HTTP/HTTPS protocols according to the JavaDoc, but should (and probably do) also work for file URLs.
- Passing file paths as Strings is unusual at best and feels like a direct Python port. A Java API should preferably use either File/Path or InputStream (which would cover URLs, files, and even JSON strings if need be).
- Using a String to hold either JSON content or the path to a ZIP file in public Package(String jsonStringSource, boolean strict) is rather unusual for a Java API.
It would be sufficient to trim the host of constructors to three, maybe four:
/**
* Load from String representation of JSON object.
*/
public Package(String jsonStringSource, boolean strict)
/**
* Load from URL (either 'http'/'https' or file URL).
*/
public Package(URL urlSource, boolean strict)
/**
* Load from File
*/
public Package(File sourceFile, boolean strict)
/**
* Load from InputStream.
*/
public Package(InputStream source, boolean strict)
The most fundamental constructor would rely on an InputStream; the other constructors are syntactic sugar and would simply create an InputStream on the URL, the File, or the JSON string and delegate to it.
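A minimal sketch of the delegation idea, assuming the InputStream constructor takes ownership of the stream (reads it fully and closes it) so that the convenience constructors cannot leak the streams they open. PackageSketch and its accessors are hypothetical; parsing and validation are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PackageSketch {
    private final String descriptorJson;
    private final boolean strict;

    // Fundamental constructor: reads the stream to the end and closes it.
    public PackageSketch(InputStream source, boolean strict) throws IOException {
        this.descriptorJson = readAndClose(source);
        this.strict = strict;
        // Parsing and, if strict, validation of descriptorJson would go here.
    }

    // Syntactic sugar: each constructor only opens a stream and delegates.
    public PackageSketch(String jsonStringSource, boolean strict) throws IOException {
        this(new ByteArrayInputStream(jsonStringSource.getBytes(StandardCharsets.UTF_8)), strict);
    }

    public PackageSketch(URL urlSource, boolean strict) throws IOException {
        this(urlSource.openStream(), strict); // http(s) and file: URLs take the same path
    }

    public PackageSketch(File sourceFile, boolean strict) throws IOException {
        this(new FileInputStream(sourceFile), strict);
    }

    private static String readAndClose(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public String getDescriptorJson() { return descriptorJson; }
    public boolean isStrict() { return strict; }
}
```

Because URL.openStream() handles file: URLs as well, the URL constructor in this sketch is not limited to HTTP/HTTPS.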
It would be necessary to verify that this setup supports ZIP-packaged DataPackages, but that should be possible. Going through the code, it seems that Resource resolution has a handful of problems both in directory-based DataPackages with resources (paths can't really be relative to the datapackage.json) and in ZIP-packaged DataPackages, but I would have to look into this more deeply.
Since fluent interfaces have been very much a part of Java for a couple of years now, it might make sense to follow that practice and switch to a builder-based API:
public class Package {
    // API outline only: method bodies omitted
    private Package();
    public static PackageBuilder builder();
    public static class PackageBuilder {
        public PackageBuilder fromSource(String jsonStringSource);
        public PackageBuilder fromSource(URL urlSource);
        public PackageBuilder fromSource(InputStream source);
        // maybe add
        public PackageBuilder fromZipSource(URL urlSource);
        public PackageBuilder setStrict(boolean strict);
        public Package build();
    }
}
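A minimal, runnable version of the builder outline, assuming the builder defers opening its source until build() so that a stream is opened only once and closed after reading. The class name BuiltPackage, the nested StreamOpener interface, and storing the descriptor as a String are illustrative choices; fromZipSource is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BuiltPackage {
    private final String descriptorJson;
    private final boolean strict;

    // Private: instances are only created through the builder.
    private BuiltPackage(String descriptorJson, boolean strict) {
        this.descriptorJson = descriptorJson;
        this.strict = strict;
    }

    public static Builder builder() {
        return new Builder();
    }

    public String getDescriptorJson() { return descriptorJson; }
    public boolean isStrict() { return strict; }

    public static class Builder {
        // Opening the source is deferred until build(), so no stream is opened (or leaked) early.
        @FunctionalInterface
        private interface StreamOpener {
            InputStream open() throws IOException;
        }

        private StreamOpener opener;
        private boolean strict;

        public Builder fromSource(String jsonStringSource) {
            this.opener = () -> new ByteArrayInputStream(jsonStringSource.getBytes(StandardCharsets.UTF_8));
            return this;
        }

        public Builder fromSource(URL urlSource) {
            this.opener = urlSource::openStream;
            return this;
        }

        public Builder fromSource(InputStream source) {
            this.opener = () -> source; // caller-supplied stream; build() closes it after reading
            return this;
        }

        public Builder setStrict(boolean strict) {
            this.strict = strict;
            return this;
        }

        public BuiltPackage build() throws IOException {
            if (opener == null) {
                throw new IllegalStateException("no source set");
            }
            try (InputStream in = opener.open()) {
                // Parsing and, if strict, validation would happen here.
                return new BuiltPackage(new String(in.readAllBytes(), StandardCharsets.UTF_8), strict);
            }
        }
    }
}
```

Usage reads as a single chain, e.g. BuiltPackage.builder().fromSource(json).setStrict(true).build(); calling fromSource twice simply replaces the earlier source.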
The special method for reading from a ZIP file might be needed to read Resources from inside the ZIP file; I am not totally sure.
If there's any interest in this, I would volunteer to work on either proposal to demo its validity.
AS IS:
/**
* Load from native Java JSONObject.
*/
public Package(JSONObject jsonObjectSource, boolean strict)
/**
* Load from native Java JSONObject.
*/
public Package(JSONObject jsonObjectSource)
/**
* Load from String representation of JSON object or from a zip file path.
*/
public Package(String jsonStringSource, boolean strict)
/**
* Load from String representation of JSON object or from a zip file path.
*/
public Package(String jsonStringSource)
/**
* Load from URL (must be in either 'http' or 'https' schemes).
*/
public Package(URL urlSource, boolean strict)
/**
* Load from URL (must be in either 'http' or 'https' schemes).
* No validation by default.
*/
public Package(URL urlSource)
/**
* Load from local file system path.
*/
public Package(String filePath, String basePath, boolean strict)
/**
* Load from local file system path.
* No validation by default.
*/
public Package(String filePath, String basePath)
Please preserve this line to notify @georgeslabreche (maintainer of this repository)
Caution: there's a lot of different uses of "schema" coming, so bear with me...
In src/main/resources/, we find the JSON Schema definition files for the various parts of a DataPackage. Among them is data-resource.json, where the formal definition of the schema property of a Resource is:
"schema": {
"propertyOrder": 40,
"title": "Schema",
"description": "A schema for this resource.",
"type": "object"
},
This allows only inline JSON objects as schema definitions; it disallows URLs or file paths.
This seemingly contradicts the specification of the schema property:
The value for the schema property on a resource MUST be an object representing the schema OR a string that identifies the location of the schema.
(Emphasis mine).
I believe it should read (not sure about the syntax):
"oneOf": [
{
"title": "Schema path",
"description": "A fully qualified URL, or a POSIX file path.",
"type": "string"
},
{
"title": "Schema encoded as JSON",
"type": "object"
}
]
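Once the JSON Schema accepts both forms, the library code would also need to branch on which form it received. A minimal sketch, assuming the descriptor has already been parsed so that a JSON object arrives as a Map and a JSON string as a String (datapackage-java itself presumably works with Jackson nodes instead). The URL check is a simple prefix test, and all names are illustrative.

```java
import java.util.Map;

public class SchemaPropertySketch {
    enum SchemaKind { INLINE_OBJECT, URL, PATH }

    // Classify an already-parsed "schema" value: an object is an inline schema,
    // a string is a location (URL or POSIX path) that must be dereferenced first.
    static SchemaKind classify(Object schemaValue) {
        if (schemaValue instanceof Map) {
            return SchemaKind.INLINE_OBJECT;
        }
        if (schemaValue instanceof String) {
            String location = (String) schemaValue;
            boolean isUrl = location.startsWith("http://") || location.startsWith("https://");
            return isUrl ? SchemaKind.URL : SchemaKind.PATH;
        }
        throw new IllegalArgumentException("schema must be an object or a string");
    }
}
```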
Please preserve this line to notify @georgeslabreche (lead of this repository)
Support for schema dereferencing before validation:
Support for CSV dialects.
Supporting strict/non-strict validation when creating data package object.
Cover the latest changes in the specs such as string/array for path etc.
Provide an API to interact with a data package descriptor.
Streaming and reading of resources through a table schema with cast on iteration.
Save a data package descriptor to a file path.
The no-argument Package() constructor should load a template JSON file and populate the jsonObject field instead of leaving it empty, so that users can fill in values for the template's keys instead of getting a blank instance they cannot use.
Can you please document the expected behaviour of relations, which are referenced in many places, e.g. here?
I had naively assumed a DP reader would follow foreign keys and return the end of the relation in a nested object inline.
Thanks!
Putting an invalid resource schema like this one does not trigger a validation exception:
https://github.com/frictionlessdata/datapackage-java/blob/6710fe22ded6674af7699a773a70bff918e7f76c/src/test/resources/fixtures/schema/invalid_population_schema.json
This is because the data package schema only checks whether the resource schema is an object:
datapackage-java/src/main/resources/schemas/data-package.json
Lines 308 to 312 in 6710fe2
Certainly, something is missing here.
Implement a read() method in the Resource class. It should use tableschema-java to achieve this.
Save a data package as a zip file on disk.
According to the Frictionless specs, a contributor's role is only recommended to be one of author, publisher, maintainer, wrangler, and contributor. The current implementation allows only values from the enum https://github.com/frictionlessdata/datapackage-java/blob/main/src/main/java/io/frictionlessdata/datapackage/Contributor.java#L98
There are plenty of cases where those five are not enough, so the role should be a String.
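A minimal sketch of the String-based alternative, assuming the recommended values are kept only as a lookup set so callers can warn about non-standard roles without rejecting them. ContributorRoleSketch and hasRecommendedRole are hypothetical names, not the library's Contributor API.

```java
import java.util.Objects;
import java.util.Set;

public class ContributorRoleSketch {
    // Values the spec recommends; other roles are accepted rather than rejected.
    static final Set<String> RECOMMENDED_ROLES =
            Set.of("author", "publisher", "maintainer", "wrangler", "contributor");

    private final String role;

    public ContributorRoleSketch(String role) {
        this.role = Objects.requireNonNull(role, "role");
    }

    public String getRole() { return role; }

    // Lets callers flag non-standard roles without failing.
    public boolean hasRecommendedRole() {
        return RECOMMENDED_ROLES.contains(role);
    }
}
```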
Read a zip file containing a data package.
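A minimal sketch of the first step, locating and reading the descriptor inside the archive with java.util.zip. It assumes datapackage.json sits at the root of the ZIP; resolving the resources' paths against the archive is not covered, and the names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipPackageReader {
    // Scans a ZIP archive and returns the content of its root-level datapackage.json, if present.
    static Optional<String> readDescriptor(InputStream zipData) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().equals("datapackage.json")) {
                    // readAllBytes() stops at the end of the current entry.
                    return Optional.of(new String(zip.readAllBytes(), StandardCharsets.UTF_8));
                }
            }
        }
        return Optional.empty();
    }
}
```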
This issue is not part of the first iteration of work; it was created for a possible future implementation.
Specification - http://dataprotocols.org/data-package-identifier/
This issue describes the set of tasks to complete in order to finish up work on the library.
Using org.json as a JSON library is problematic, as the licence of this library includes an additional clause "The Software shall be used for Good, not Evil."
This is regarded as a non-free licence (it is not OSI-compliant).
That makes it impossible for projects relying on this library (such as OpenRefine) to be OSI-compliant in turn.
Here you're checking that contributors are not empty:
That doesn't look like the correct way to do it. The contributors node is an ArrayNode, which derives from ContainerNode, where asText() is implemented as:
@Override
public String asText() { return ""; }
I think the right way would be:
!jsonNodeSource.get(Package.JSON_KEY_CONTRIBUTORS).isEmpty()
Support for multipart resources.
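Since a multipart resource's path is an array of strings rather than a single string, a first step could be to normalize both forms into one list of paths in declaration order; the parts would then be read one after another. A minimal sketch, assuming the descriptor value has been parsed into a String or a List; the method name is illustrative, and the treatment of header rows in later CSV parts is not addressed here.

```java
import java.util.ArrayList;
import java.util.List;

public class ResourcePathSketch {
    // "path" may be a single string or, for a multipart resource, an array of strings.
    // Normalizing both forms to a list lets the reading code treat them the same way.
    static List<String> normalizePath(Object pathValue) {
        if (pathValue instanceof String) {
            return List.of((String) pathValue);
        }
        if (pathValue instanceof List) {
            List<String> paths = new ArrayList<>();
            for (Object part : (List<?>) pathValue) {
                if (!(part instanceof String)) {
                    throw new IllegalArgumentException("path array must contain only strings");
                }
                paths.add((String) part); // declaration order is preserved
            }
            return List.copyOf(paths);
        }
        throw new IllegalArgumentException("path must be a string or an array of strings");
    }
}
```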
Most Java libraries are available on Maven Central (http://search.maven.org/), but this one does not seem to be. It would be very useful to upload it there.
This issue documents the initial steps to get started with a new Frictionless Data implementation.
Now that the library has been updated to use an appropriately licensed open-source JSON parser, let's do a release so downstream projects can pull it in. This would be relevant for OpenRefine, see e.g. OpenRefine/OpenRefine#778 (comment)
@roll would you be able to lead on this?
/cc @lwinfree
At the moment, a Resource is simply a JSONObject and a list of Resources is a JSONArray of JSONObjects.
For example, getResources() returns a JSONArray of JSONObjects representing Resource objects. This goes back to my first question in Issue #7. The reason I went ahead with JSONObject and JSONArray is that it seems 1) we don't necessarily want to map every single datapackage JSON element to an equivalent class and properties, and 2) in the other libraries we interact with the JSON structure directly, so I thought I'd preserve that. Nothing stops us from creating a Resource class and working with that, but where do we draw the line between what is represented by a class and what isn't?
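One possible middle ground for where to draw that line: a thin Resource class that keeps the raw JSON for everything and adds typed accessors only for the properties the library itself needs. A minimal sketch, assuming the descriptor is available as a Map; ResourceView and its accessors are hypothetical, not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class ResourceView {
    // Raw descriptor properties are kept as-is, so nothing in the JSON is lost.
    private final Map<String, Object> json;

    public ResourceView(Map<String, Object> json) {
        this.json = new LinkedHashMap<>(json);
    }

    // Typed accessors only for the properties the library actually works with.
    public Optional<String> getName() { return stringProperty("name"); }
    public Optional<String> getEncoding() { return stringProperty("encoding"); }

    // Escape hatch for everything else, including custom properties.
    public Object get(String key) { return json.get(key); }

    private Optional<String> stringProperty(String key) {
        Object value = json.get(key);
        return value instanceof String ? Optional.of((String) value) : Optional.empty();
    }
}
```

The design keeps the JSON-first behaviour of the other libraries while still giving Java callers type safety where it matters.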
@roll we have been using Travis-CI dot ORG (https://travis-ci.org/frictionlessdata/datapackage-java) as our CI and test pipeline, but that's gone away. Could you please register us at https://www.travis-ci.com/ instead?
TIA
This issue is not part of the first iteration of work; it was created for a possible future implementation.
See - https://github.com/frictionlessdata/datapackage-js#foreign-keys
I would like to see datapackage-java reference the tableschema-java data model (or vice versa) so users can benefit from both specs/repos.
We will handle these objects as JSONArray objects.