Giter Club home page Giter Club logo

Comments (6)

jpivarski avatar jpivarski commented on June 13, 2024

You can share types between different sections of PFA, and Hadrian (and pfainspector, which is actually part of Titus) support this. To diagnose your error, it would be best to see an example and the specific error message, but here's an example that works, which might help as a guide.

{"input": "double",
 "output": {"type": "record",
            "name": "SharedType",
            "fields": [{"name": "num", "type": "double"},
                       {"name": "sum", "type": "double"}]},
 "cells": {"tally": {"type": "SharedType", "init": {"num": 0, "sum": 0}}},
 "action":
     {"cell": "tally",
      "to": {"params": [{"old": "SharedType"}],
             "ret": "SharedType",
             "do": {"type": "SharedType",
                    "new": {"num": {"+": ["old.num", 1]},
                            "sum": {"+": ["old.sum", "input"]}}}}}
 }

Here, SharedType is defined in the output and used in a cell named "tally". The scoring engine uses that cell to keep a counter and a running sum.

Perhaps the issue is that a type can only be defined once: after that, it has to be referenced. (Or before that, since order is irrelevant.) In this example, the SharedType is represented as

{"type": "record",
 "name": "SharedType",
 "fields": [{"name": "num", "type": "double"},
            {"name": "sum", "type": "double"}]}

in the output section and just "SharedType" in the cell's type, the parameter and return types of the inline function ("params": [{"old": "SharedType"}], "ret": "SharedType"), and the new record constructor ("type": "SharedType", "new": {...}). The full definition could have been given in any one of these places, as long as it is given exactly once.

This is an Avro rule. It ensures that types are not redefined in different ways. Unfortunately, this rule complicates the building of PFA files programmatically.

At least, that's what I remember. However, I'm testing it now and Hadrian and Titus are both allowing SharedType to be multiply defined as long as the type definition is exactly the same each time. I must have made some concession to convenience at some point. I'll have to check to see if multiple definitions of the same type, defined exactly the same way, is allowed by the PFA specification, not just Hadrian and Titus.

Please open a new issue with your exact error, unless the above example clarifies things in such a way that you can fix it on your own. Thanks!

from pfa.

jpivarski avatar jpivarski commented on June 13, 2024

I don't see any mention of this rule (or its negation) on pages 34-36 of the specification. This is a candidate for clarification. I vote for allowing repeated type definitons, as long as the type is defined exactly the same way each time, the way that Hadrian and Titus do it now.

from pfa.

ludovicc avatar ludovicc commented on June 13, 2024

@jpivarski Thanks for the answer. I'm doing something similar to your example, but using an array of records:

https://gist.github.com/ludovicc/045ccfba464a99d7cff0

pfainspector doesn't like the duplicated ReducedData name:

pfa valid tsne_pfa
Could not resolve the following types:
{"doc": "New representation for the data", "type": "array", "items": {"fields": [{"doc": "Reduced dimension 1", "type": "double", "name": "1"}, {"doc": "Reduced dimension 2", "type": "double", "name": "2"}, {"doc": "Variable prov", "type": {"symbols": ["AD1", "AD2"], "type": "enum", "name": "Enumprov"}, "name": "prov"}, {"doc": "Variable age", "type": "double", "name": "age"}], "type": "record", "name": "ReducedData"}} (Items schema ({u'fields': [{u'doc': u'Reduced dimension 1', u'type': u'double', u'name': u'1'}, {u'doc': u'Reduced dimension 2', u'type': u'double', u'name': u'2'}, {u'doc': u'Variable prov', u'type': {u'symbols': [u'AD1', u'AD2'], u'type': u'enum', u'name': u'Enumprov'}, u'name': u'prov'}, {u'doc': u'Variable age', u'type': u'double', u'name': u'age'}], u'type': u'record', u'name': u'ReducedData'}) not a valid Avro schema: The name "ReducedData" is already in use. (known names: [u'Query', u'AdditionalParameters', u'Enumprov', u'ReducedData']))

from pfa.

jpivarski avatar jpivarski commented on June 13, 2024

The doc strings make it not exactly the same. You have two choices: either define the record type once and only reference it later (e.g. {"type": "array", "items": "ReducedData"}) or ensure that the two definitions are identical.

The way Hadrian and Titus ignore multiple definitions of the same type is by putting the JSON in canonical form (ignore whitespace and sort keys) and using this as a key in a hashmap. Thus, differences in typesetting don't affect equality, but doc strings do. Perhaps they ought to strip the doc strings when defining the canonical form, too.

The convenience I was referring to is in building PFA programatically. If you're building up a data structure representing the PFA, reused types are probably copies or references of the same data structure in memory, and a naive conversion to JSON would cause these definitions to be written out each time, exactly the same way. This is not a likely pattern when writing PFA by hand--- then it's quite easy to create types that are not exactly the same because it fails the DRY principle. That case could introduce ambiguity: if you take snapshots of the running PFA engine, which doc strings should it write out? (It would only be one because each type should be identified as a single entity at runtime.)

The PFA specification should clarify this: how identical do the multiple definitions of the types have to be to allow them? Hadrian/Titus currently require JSON-identity, but since these are Avro schemas, perhaps we should extend the definition of equality to ignore doc strings (accepting the first doc string as the true one? the last?).

For your problem, @ludovicc, just make the doc strings the same or replace the second definition of the ReduceData record with "ReduceData". (The latter is the DRYer solution.) I put a correction on your Gist.

from pfa.

ludovicc avatar ludovicc commented on June 13, 2024

@jpivarski

Type checking failing because of a difference in doc strings is quite... surprising!
Docs should be stripped when comparing types, even if it's at the cost of some confusion when resolving documentation for a type.

I have tried to replace the second definition of ReduceData, but pfainspector still fails:
https://gist.github.com/ludovicc/f0eaeb28ebdfb0f51810

Could not resolve the following types:
{"items": {"type": "ReducedData"}, "type": "array"} (Items schema ({u'type': u'ReducedData'}) not a valid Avro schema: Undefined type: ReducedData (known names: [u'Query', u'AdditionalParameters', u'Enumprov', u'ReducedData']))

If I use exactly the same documentation, validation works. Thanks!

Btw, @jpivarski , Hadrian looks nice but why didn't you use a proper open source license, even GPL would have been usable for me and my project (HumanBrainProject).

from pfa.

jpivarski avatar jpivarski commented on June 13, 2024

I commented on the first Gist with a working DRY version. The second has a minor error: {"type": "ReducedData"} should be "ReducedData". (Avro spec)

It may be surprising that two Avro type schemas with different doc strings are (currently) regarded as different, but it's part of the can of worms that gets opened when we allow the type to be specified multiple times. Like I said, one solution would be to just never allow it, which makes the PFA completely unambiguous, but that would complicate algorithms that write PFA. The current solution of allowing multiply defined schemas if they are JSON-identical is completely friendly to algorithms (which would write them exactly the same way each time), but can be confusing if you're writing the PFA by hand (not the intended use-case: it's hard to code in JSON). Allowing multiply defined schemas with different doc strings introduces an ambiguity: only one version of the doc string can be stored in memory. Hours later, when the user takes a snapshot of their running PFA, they'll get one doc string or the other, or none. If we take the first or the last or a random one, somebody someday will ask, "Why is PFA dropping my doc strings?"

Issues related to Hadrian's license should be directed to the Hadrian GitHub project: https://github.com/opendatagroup/hadrian

Thanks!

from pfa.

Related Issues (14)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.