Comments (4)

danfowler commented on September 1, 2024

Hi @esjewett just flagging that I am available for any questions, both here and in our Frictionless Data chat: https://gitter.im/frictionlessdata/chat 👍

from breve.

esjewett commented on September 1, 2024

I think we'd certainly be interested in adding support for these formats to Palladio, and we have looked at them before, but there are a few questions. It's worth noting that Breve uses Palladio's data processing engine internally and doesn't expose all of Palladio's capabilities. If you can provide some insight into these issues, that would be helpful:

  1. Is there a single-file format available? The ability for users to port data using a single file is fairly integral to the approach of Palladio and Breve, not least because the ability to download the file from the browser client-side is limited to one file at a time.
  2. How much flexibility is there in the types? Here are some examples of where the JSON Table Schema types and the Palladio types don't agree, as far as I recall:
    1. No URL/URI type in JSON Table Schema
    2. Palladio supports heterogeneous data columns (e.g. a column with both 1234-12-12 YYYY-MM-DD and 1234 YYYY formats) and we hope to support even more in the future. The goal is to be able to support different types of fuzzy dates and durations.
    3. Breve supports ordinal/nominal indicators and such a concept doesn't seem to exist in JSON Table Schema.
  3. The Palladio format stores multiple tables in a single data file along with information about the user-defined mapping/join between the tables. Would this be possible with the Data Packages format?
  4. Related to 2 above, the Palladio format stores further user-supplied information about dimensions. Does the format allow storing this type of additional information? A couple of examples:
    1. If the user has selected the dimension as displayed or not.
    2. If the dimension uses multi-value delimiters internally.
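
To make the heterogeneous date column in 2.2 concrete, here is a minimal sketch of parsing a column that mixes YYYY-MM-DD and YYYY values while recording the precision of each value. The `FORMATS` list and `parse_fuzzy_date` helper are hypothetical names for illustration, not Palladio's or any library's API.

```python
from datetime import date, datetime

# Hypothetical list of accepted formats, most specific first.
FORMATS = [("%Y-%m-%d", "day"), ("%Y-%m", "month"), ("%Y", "year")]


def parse_fuzzy_date(text):
    """Return (date, precision) for the first matching format, else None."""
    for fmt, precision in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date(), precision
        except ValueError:
            continue  # try the next, less specific format
    return None


# A single column holding values at different precisions.
column = ["1234-12-12", "1234", "not a date"]
parsed = [parse_fuzzy_date(value) for value in column]
```

Keeping the precision next to the parsed date is one way to avoid treating a bare year as if it meant January 1 of that year.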

I don't mean to imply that the Palladio format is some sort of superior format. It is really just based on Palladio's internal representation. If we could switch to a format based on Data Packages, that would probably be ideal, but as a research project we also need to maintain the ability to be flexible and expressive in areas where Data Packages has made the sort of decisions to limit expression that make perfect sense in a standard format.

Thanks!


danfowler commented on September 1, 2024

Hi Ethan, thanks for your quick, thoughtful, and thorough response!

  1. Data Packages can support a single-file use case similar to Palladio's export format. A given "resource" in the resources array (equivalent to a "file" in the Palladio export JSON files array) in a Data Package can have one of "url", "path", or "data"; the "data" attribute can be used to store in-line data in a JSON array exactly equivalent to the "data" attribute in Palladio's export format.

    http://specs.frictionlessdata.io/data-packages/#inline-data

  2. Types

    1. Each field type in JSON Table Schema has a set of format options. For strings, this does, in fact, include a uri format: http://specs.frictionlessdata.io/json-table-schema/#string

    2. For a field type of date (or time or datetime), there actually is a format option "any" which is specified like so:

      any: Any parsable representation of the type. The implementing library can attempt to parse the datetime via a range of strategies. An example is dateutil.parser.parse from the python-dateutils library.

      I'm not sure if this fully supports your use case, so let me know.

    3. No explicit support for "ordinal" or "nominal" indicators on a field.

  3. The Tabular Data Package format supports multiple tables and user-defined relations between them. See this section for details:

    http://specs.frictionlessdata.io/json-table-schema/#foreign-keys

    Example: countries-and-currencies/datapackage.json

  4. The Data Package format does allow for extra fields.

    NOTE: A Data Package author MAY add any number of additional fields beyond those listed in the specification here.

    http://specs.frictionlessdata.io/data-packages/#optional-fields
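
Pulling points 1, 3 and 4 together, a single-file export might look roughly like the sketch below. The `resources`, inline `data`, and the string `uri` format come from the answers above. The `foreignKeys` shape is my reading of the foreign-keys section and should be checked against the spec. The `palladio:`-prefixed keys and the `dangling_references` helper are hypothetical, and placing extra properties on individual field descriptors (rather than at the top level of the package) is an assumption.

```python
import json

# Hypothetical single-file descriptor: two inline tables plus a join between them.
package = {
    "name": "palladio-export-sketch",
    "resources": [
        {
            "name": "people",
            "data": [
                {"id": "p1", "name": "Ada", "birthplace": "c1"},
                {"id": "p2", "name": "Ben", "birthplace": "c2"},
            ],
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    # Assumed extra properties, not defined by the spec.
                    {"name": "name", "type": "string",
                     "palladio:displayed": True,
                     "palladio:multiValueDelimiter": ";"},
                    {"name": "birthplace", "type": "string"},
                ],
                "primaryKey": "id",
                "foreignKeys": [
                    {"fields": "birthplace",
                     "reference": {"resource": "places", "fields": "id"}}
                ],
            },
        },
        {
            "name": "places",
            "data": [
                {"id": "c1", "url": "http://example.org/places/c1"},
                {"id": "c2", "url": "http://example.org/places/c2"},
            ],
            "schema": {
                "fields": [
                    {"name": "id", "type": "string"},
                    {"name": "url", "type": "string", "format": "uri"},
                ],
                "primaryKey": "id",
            },
        },
    ],
}


def dangling_references(pkg):
    """Return (resource, field, value) triples whose foreign key has no match."""
    by_name = {res["name"]: res for res in pkg["resources"]}
    missing = []
    for res in pkg["resources"]:
        for fk in res["schema"].get("foreignKeys", []):
            target = by_name[fk["reference"]["resource"]]
            known = {row[fk["reference"]["fields"]] for row in target["data"]}
            for row in res["data"]:
                if row[fk["fields"]] not in known:
                    missing.append((res["name"], fk["fields"], row[fk["fields"]]))
    return missing


# Everything serializes to one JSON document, suitable for a one-file download.
single_file = json.dumps(package, indent=2)
```

The helper only handles single-field keys within one package, which seems enough to show how a user-defined join between two tables could round-trip through one file.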

cc'ing @rgrp @pwalsh as they might have further thoughts on the above


esjewett commented on September 1, 2024

Hi Dan,

This is really encouraging. Given all this, I think it may be possible to simply move Palladio's data format to Data Packages, which would solve this problem for Palladio as well as Breve. It's going to be a process, but I'll probably start prototyping in a branch of the Palladio repo soon.

Thanks,
Ethan
