
Comments (3)

piotrszul commented on June 15, 2024

I have had a look at this issue and unfortunately user-defined data types are not public in Spark 2.x. They might be made public in 3.x; see: https://issues.apache.org/jira/browse/SPARK-7768

I think, however, that there might be a way around this: modifying the schema (and encoders) of elements that have decimal primitives to include an extra field with the precision. For example, here is the schema for Observation (the relevant part):

 |-- valueQuantity: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- value: decimal(12,4) (nullable = true)
 |    |-- value_precision: integer (nullable = true)
 |    |-- comparator: string (nullable = true)
 |    |-- unit: string (nullable = true)
 |    |-- system: string (nullable = true)
 |    |-- code: string (nullable = true)
 |-- valueRange: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- low: struct (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- value: decimal(12,4) (nullable = true)
 |    |    |-- value_precision: integer (nullable = true)
 |    |    |-- comparator: string (nullable = true)
 |    |    |-- unit: string (nullable = true)
 |    |    |-- system: string (nullable = true)
 |    |    |-- code: string (nullable = true)
 |    |-- high: struct (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- value: decimal(12,4) (nullable = true)
 |    |    |-- value_precision: integer (nullable = true)
 |    |    |-- comparator: string (nullable = true)
 |    |    |-- unit: string (nullable = true)
 |    |    |-- system: string (nullable = true)
 |    |    |-- code: string (nullable = true)
 |-- valueRatio: struct (nullable = true)
 |    |-- id: string (nullable = true)
 |    |-- numerator: struct (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- value: decimal(12,4) (nullable = true)
 |    |    |-- value_precision: integer (nullable = true)
 |    |    |-- comparator: string (nullable = true)
 |    |    |-- unit: string (nullable = true)
 |    |    |-- system: string (nullable = true)
 |    |    |-- code: string (nullable = true)

So essentially, for every value field we add a hidden value_precision field after it.
In SQL queries we can then still use 'value' as a Decimal, but the encoders will be aware of the extra fields and adjust the scale/precision accordingly.
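
To make the idea concrete, here is a rough sketch in plain Python (standard library only, no Spark) of how a value could round-trip through a fixed-scale column plus a companion field. The names `encode`, `decode` and `STORAGE_SCALE` are hypothetical, and treating value_precision as "number of decimal places in the original value" is an assumption, not something decided here.

```python
from decimal import Decimal
from typing import Tuple

# Assumption: the column uses the decimal(12,4) type shown in the schema above.
STORAGE_SCALE = 4


def encode(value: Decimal) -> Tuple[Decimal, int]:
    """Return (value at the fixed storage scale, original decimal places).

    Assumes a finite value. Decimal places beyond STORAGE_SCALE are rounded
    away, so the recorded scale is capped at STORAGE_SCALE.
    """
    original_scale = min(max(-value.as_tuple().exponent, 0), STORAGE_SCALE)
    stored = value.quantize(Decimal(1).scaleb(-STORAGE_SCALE))
    return stored, original_scale


def decode(stored: Decimal, original_scale: int) -> Decimal:
    """Restore the original number of decimal places from the companion field."""
    return stored.quantize(Decimal(1).scaleb(-original_scale))


stored, scale = encode(Decimal("1.50"))
print(stored, scale)          # value as held in the fixed-scale column, plus its scale
print(decode(stored, scale))  # value with its original trailing zeros restored
```

The point of the sketch is only that the trailing zeros in something like 1.50 survive the round trip, which a bare fixed-scale column cannot guarantee.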

The schema generation part is easy, but the encoders seem to depend on the order of the fields (not their names) so I need to understand how to fix them.
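
A minimal sketch of the schema side, again in plain Python rather than the actual Spark types: the schema is modelled as nested lists of `(name, type)` pairs, and `add_precision_fields` is a hypothetical helper, not code from the project. It also shows why order-dependent encoders are affected: every field after a decimal moves one position.

```python
def add_precision_fields(fields):
    """Insert a '<name>_precision' integer field after every decimal field.

    `fields` is a list of (name, type) pairs, where type is either a type
    string such as 'decimal(12,4)' or a nested list of fields (a struct).
    """
    result = []
    for name, field_type in fields:
        if isinstance(field_type, list):
            # Struct: recurse into the nested fields.
            result.append((name, add_precision_fields(field_type)))
        else:
            result.append((name, field_type))
            if field_type.startswith("decimal"):
                result.append((name + "_precision", "integer"))
    return result


# Simplified Quantity struct, based on the schema listing above.
quantity = [
    ("id", "string"),
    ("value", "decimal(12,4)"),
    ("comparator", "string"),
    ("unit", "string"),
]
schema = [("valueQuantity", quantity), ("valueRange", [("id", "string"), ("low", quantity)])]

extended = add_precision_fields(schema)
names_before = [name for name, _ in quantity]
names_after = [name for name, _ in extended[0][1]]
print(names_before.index("comparator"), names_after.index("comparator"))
```

In this sketch `comparator` moves from position 2 to position 3, so anything that addresses fields by ordinal rather than by name has to account for the shift.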

@johngrimes would that be a reasonable solution?

from pathling.

johngrimes commented on June 15, 2024

Yes, that sounds fine.

Have you thought about what revised values for precision and scale might be appropriate?


piotrszul commented on June 15, 2024

Yes, I think DECIMAL(24, 6) is appropriate. This gives us at least 18 digits at any scale up to 6 (as per the xs:decimal minimum requirements), as well as the ability to represent up to 6 decimal places. In terms of coordinates (which the FHIR spec says are the values most likely to need the largest scale), that gives a location precision of ~10 cm.
I have implemented all the changes, but since the schema has changed, all the data needs to be re-imported.
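
The arithmetic behind those two numbers, as a quick sanity check. The metres-per-degree figure is my own approximation (Earth's equatorial circumference divided by 360), so the result applies to latitude, or to longitude near the equator.

```python
PRECISION, SCALE = 24, 6

# Digits left of the decimal point when all 6 decimal places are in use.
integer_digits = PRECISION - SCALE

# Assumption: about 40,075 km equatorial circumference, so roughly 111.3 km per degree.
METERS_PER_DEGREE = 40_075_017 / 360

# Smallest coordinate step representable with 6 decimal places, in metres.
step_meters = 10 ** -SCALE * METERS_PER_DEGREE

print(integer_digits)         # 18
print(round(step_meters, 3))  # 0.111
```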

