coursera / courier
Data interchange for the modern web + mobile stack.
Home Page: http://coursera.github.io/courier/
License: Apache License 2.0
Example Courier schema:
namespace org.example.other
import org.coursera.customtypes.CustomUnionTestId
import org.coursera.customtypes.CustomRecord
typeref EvilUnion = union[CustomUnionTestId, CustomRecord]
The Scala compiler will fail because Custom and SingleElementCaseClassCoercer have not been imported.
Pegasus schemas have a 'null' type. Courier has not supported it up to this point, largely because, when writing idiomatic Scala, there is no need for it. But to be 100% compatible with Pegasus, we should support it (and then discourage its use!). Once supported, we can guarantee developers that any .pdsc schema can be handled correctly by Courier.
Hello! An older, unknown version of courier currently lives at https://dl.bintray.com/coursera/generic/courier .
Can somebody with credentials please release the latest version? Thanks!
Erem
Based on how we have observed the Courier schema language being used in practice, we plan to remove = nil defaults. What we’ve seen is that in many cases optional fields that could/should be marked as nil are not. This is inconvenient because it then requires that callers explicitly provide None for parameters that they should not need to. To fix this we are going to change the default behavior. All optional fields will be generated with = None by default in Scala code (= nil in Swift) unless an @explicit property is added to the field (exact name of the property TBD).
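The proposed rule can be sketched as a small generator helper. This is only an illustration in Python, not Courier's actual generator code: FieldSpec, scala_param and the explicit flag are hypothetical names standing in for a parsed field and the still-unnamed @explicit property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical, minimal description of one schema field."""
    name: str
    scala_type: str
    optional: bool = False
    explicit: bool = False  # stands in for the proposed @explicit property


def scala_param(field: FieldSpec) -> str:
    """Render one constructor parameter under the proposed default rule."""
    if not field.optional:
        return f"{field.name}: {field.scala_type}"
    base = f"{field.name}: Option[{field.scala_type}]"
    # Optional fields get "= None" unless the schema author opted out.
    return base if field.explicit else f"{base} = None"
```

With this rule, callers only have to pass an optional field when the schema author explicitly asked for that.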
The documentation says that the "include" key adds all the fields but does not do inheritance. While this is easy to get around for new projects, people looking to switch to this framework may be relying on inheritance in their code (at least I am).
I think it's a good idea to allow an option to generate classes for records instead of structs (the default). This ticket is a suggestion for the future rather than a bug with the current implementation.
First, a couple of things:
Here's why I think classes are better for this use-case than structs:
I would opt instead to use final class. As far as I can tell, for immutable models, struct and final class are completely interchangeable. As in, from the view of the developer, there is no change in functionality or behavior between the two. Both are immutable, unsubclassable, and thread-safe. (There may be differences under the hood, but none of this will create any changes in program execution.)
Therefore, I'd always opt for final class over struct if I could, and it'd be cool if this library supported it.
It's a lot faster than GSON, consumes less memory and does not use reflection. The generated code is a bit more verbose, but the tradeoff is worth it.
[UnusedParameter] Parameter record is not used in method unapply.
The docs reference outdated version 1.2.2 and there is no indication that SBT >1 is not supported
There are a number of binary protocols that provide "JSON equivalent" semantics:
We should carefully review the performance of these (https://github.com/eishay/jvm-serializers/wiki is a good start) and determine which is the best and add codec support to Courier for that protocol.
Pegasus already includes support for PSON and BSON, so we really only need to review UBJSON, Smile, CBOR or any other contenders and determine if they are better than PSON or BSON, and if so, add codec support.
I'm not clear on all the details here, but there are certainly some problems: https://phabricator.dkandu.me/D49527
Right now, records that use field inclusion have the standard two construction methods in their companion (building from DataMap and with all fields specified explicitly). I think it'd be useful to add additional methods, perhaps called assemble, that accept the included arguments as records instead of individual fields. For example:
record A {
field1: int
// many others
}
record B {
...A
fieldB: string
}
I'd like to construct B with B.assemble(A(...), "fieldB"), instead of B(1, ..., "fieldB").
For records with multiple inclusions, I think generating a single assemble that accepts all included records plus all additional fields is sufficiently useful (rather than worrying about the complexity of overloads for different inclusion combinations).
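To make the request concrete, here is a rough sketch of what an assemble method could do, written in Python rather than generated Scala. The field2 field is an assumption standing in for the "many others" in record A, and the flattened layout of B only mimics how inclusion copies fields.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class A:
    field1: int
    field2: str  # assumed stand-in for the "many others"


@dataclass(frozen=True)
class B:
    # Fields included from A, flattened the way field inclusion copies them.
    field1: int
    field2: str
    fieldB: str

    @classmethod
    def assemble(cls, a: A, fieldB: str) -> "B":
        # Spread every field of the included record, then add B's own field.
        included = {f.name: getattr(a, f.name) for f in fields(a)}
        return cls(**included, fieldB=fieldB)


b = B.assemble(A(1, "x"), "fieldB")
```

A record with several inclusions would take one argument per included record and merge them the same way.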
Courier is unable to compile structures like so:
namespace example
record Test1 {
tst: Test2
}
namespace example
record Test2 {
tst: Test1
}
So it's impossible to express complex recursive structures (like JSON tree, for example) with courier schemas.
Hey y'all,
Having Python bindings for Courier has become increasingly important for us at Instrumental as we scale up our dependence on the language for ML and other flows. As such, I've posted a work-in-progress PR that generates python3 bindings from courier templates. It's at the point where it works in most cases and can be used to idiomatically create, serialize, deserialize, and validate the types in your courier templates. It's not at the point where it's worth reviewing the code.
In the README you will see the details of what is yet-to-be-implemented and what I intend to leave for later. At this point I will be interested in a couple high-level questions:
courier.dumps(obj))
I will be continuing to work on it here and there for the next few weeks before it is ready for merge. In particular, we will be dogfooding it internally to hammer out the rough points of the generated API.
See the python test-cases for examples of how the API works as-written. The basic gist is:
# Assume we have generated python bindings into the `generated` package
import generated.courier as courier
from generated.org.example.MagicEightBall import MagicEightBall
from generated.org.example.MagicEightBallAnswer import MagicEightBallAnswer
json = """{"question": "Will I ever love again?", "answer": "IT_IS_CERTAIN"}"""
ball = courier.parse(MagicEightBall, json) # raises courier.ValidationError if doesn't match schema
assert(courier.serialize(ball) == json) # Passes
ball.question = 'Am I human?'  # field name matches the "question" key in the JSON above
new_ball = MagicEightBall(question='Am I human?', answer=MagicEightBallAnswer.IT_IS_CERTAIN)
assert(ball == new_ball) # Passes
It would be great to see support for Scala 2.13 for this project.
On quick review, it looks like the main breaking change here is the change to the Map API.
Other than that it seems like https://github.com/coursera/courscala would also need to be updated.
While Courier supports data validation, the validator functions it provides need to be called explicitly.
That's feasible when used with the right frameworks - such as Naptime - which call the validator functions at interface boundaries, but it is impractical for other use cases.
For example, consider the following model:
record JeopardyResponse {
@validate.regex = {
"regex": "^(What|Who) is"
}
question: string
}
A user might prefer if invalid constructions such as new JeopardyResponse(question = "Why is the sky blue") failed early without any explicit call to validation.
Currently the closest they can get is by either
final)
The bottom line is that users who want their models validated at construction time have to write code that could be generated.
Adding a new annotation would let users label types which need to be validated at construction time:
@validateConstruction
record JeopardyResponse {
@validate.regex = {
"regex": "^(What|Who) is"
}
question: string
}
Courier would then generate constructor code that calls the validator function and signals failure in an idiomatic way. For example, in Java and Scala, new JeopardyResponse(question = "Why is the sky blue") would throw an IllegalArgumentException.
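As an illustration of the behaviour being requested, here is a minimal Python sketch of a model that validates itself at construction time. It is not generated Courier code. ValueError plays the role that IllegalArgumentException would play in Java and Scala, and the regex is the one from the schema above.

```python
import re
from dataclasses import dataclass

# Same pattern as the @validate.regex annotation in the schema.
_QUESTION_RE = re.compile(r"^(What|Who) is")


@dataclass(frozen=True)
class JeopardyResponse:
    question: str

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so an invalid instance
        # is rejected before any caller can observe it.
        if not _QUESTION_RE.search(self.question):
            raise ValueError(
                f"question {self.question!r} does not match {_QUESTION_RE.pattern!r}"
            )
```

Constructing JeopardyResponse("Why is the sky blue") raises immediately, with no separate validation call.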
Hello,
I have a case where the JSON has keys with boolean values encoded as the strings "0" and "1".
It comes from an external system, so I cannot influence it.
I implemented a coercer and a custom type, and managed to get values coerced from "0" or "1" to IntBoolean(false) and IntBoolean(true), but I cannot manage to get those values from record.data(), nor to get them into Avro.
A simple example of what I did:
IntBooleanRecord.courier
namespace test
record IntBooleanRecord {
key : IntBoolean
}
IntBooleanCoercer.scala
package test
import com.linkedin.data.template.{Custom, DirectCoercer}

case class IntBoolean(value: Boolean) extends AnyVal

class IntBooleanCoercer extends DirectCoercer[IntBoolean] {
  override def coerceInput(obj: IntBoolean): AnyRef = {
    Boolean.box(obj.value)
  }

  override def coerceOutput(obj: Any): IntBoolean = {
    obj match {
      case value: String =>
        if (value == "0") {
          IntBoolean(false)
        } else if (value == "1") {
          IntBoolean(true)
        } else {
          throw new IllegalArgumentException(s"$value is not 0 or 1")
        }
      case _: Any =>
        throw new IllegalArgumentException(
          s"Field must be string with value 0 or 1, but was ${obj.getClass}"
        )
    }
  }
}

object IntBooleanCoercer {
  registerCoercer()

  def registerCoercer(): Unit = {
    Custom.registerCoercer(new IntBooleanCoercer, classOf[IntBoolean])
  }
}
IntBoolean.courier
namespace test
@scala.class = "test.IntBoolean"
@scala.coercerClass = "test.IntBooleanCoercer"
typeref IntBoolean = boolean
Scala code to test:
val json=
"""{
| "key": "1"
|}""".stripMargin
val dataMap = DataTemplates.readDataMap(json)
val record=IntBooleanRecord(dataMap,DataConversion.SetReadOnly)
println(record) // IntBooleanRecord(IntBoolean(true))
println(record.data()) // {key=1}
I was expecting record.data() to output {key=true}.
Am I doing something wrong, or is this not supposed to work this way?
If it is not, what would be the way to do it?
Help would be appreciated.
Example input:
record Record {
field: int
/** dangling doc comment */
}
Hello,
Thanks for writing Courier!
It is unclear to me which JVM Courier targets, but it would be really nice to have default bindings for common non-primitive types from the Java standard library:
java 1.6 +
java 1.8 +
(I'm talking about the generated scala code, and I've never looked at the code generators for other languages, but I assume that those might have similar behavior.)
When you give a field in a courier record a default value, it generates an apply method with default arguments for that field. This can hide bugs where you construct the record without giving it all the data that it needs. Therefore, I suggest that the generated methods not have default arguments.
Obviously, this will probably break a lot of existing code that uses courier. Maybe there should be some sort of "generateDefaultArgument" annotation to ease the transition.
Here is an example of a bug that the current behavior hid from me:
record SpecificationWithId {
creatorName: CreatorName
name: SpecificationName
...Specification
}
record Specification {
isStandalone: boolean?
template: AnyData
children: array[NodeRequest] = []
preCreatedNodes: array[PreCreatedNode] = []
}
object SpecificationWithIds {
  def toTuple(specificationWithId: SpecificationWithId):
      (QualifiedSpecificationName, Specification) = {
    val qualifiedSpecificationName = QualifiedSpecificationName(
      specificationWithId.creatorName,
      specificationWithId.name)
    val specification = Specification(
      specificationWithId.isStandalone,
      specificationWithId.template,
      specificationWithId.children)
    (qualifiedSpecificationName, specification)
  }
}
I added the preCreatedNodes field after I wrote the SpecificationWithIds.toTuple "deconstructor", and I forgot that I needed to update the "deconstructor".
Courier currently has a simple, runnable JMH benchmark:
https://github.com/coursera/courier/tree/benchmark/benchmark
We should flesh out these benchmarks and test cases such as large arrays. We should also benchmark our supported binary protocols using this utility. Once we are satisfied with the benchmarks, we should merge this into master.
When generating TypeScript bindings from courier records that contain recursive definitions, the generated TypeScript interface file includes an invalid import of itself.
record test {
recursiveField: test
}
import { test } from "./.test";
export interface test {
recursiveField : test;
}
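One possible fix is to drop the type's own name when collecting imports. The following Python sketch only illustrates that idea and is not the generator's real code: import_lines is a hypothetical helper, and the "./<name>" module path is an assumed layout.

```python
from typing import Iterable, List


def import_lines(own_name: str, referenced: Iterable[str]) -> List[str]:
    """Emit one TypeScript import per referenced type, skipping the type itself."""
    # A recursive record references its own name; importing it would be invalid.
    names = sorted(set(referenced) - {own_name})
    return [f'import {{ {name} }} from "./{name}";' for name in names]
```

For the recursive record above, import_lines("test", ["test"]) yields no imports at all, so only the interface declaration would remain.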
From talking to Swift developers, idiomatic bindings should look something like:
Record:
/**
A fortune cookie.
*/
struct FortuneCookie {
/**
A fortune cookie message.
*/
let message: String
var certainty: Float?
let luckyNumbers: [Int]
let map: [String: Int]
let simple: Simple
}
Union:
import Foundation
enum Telling {
case FortuneCookieType(FortuneCookie)
case MagicEightBallType(MagicEightBall)
case StringType(String)
}
Enum:
import Foundation
enum MagicEightBallAnswer {
case IT_IS_CERTAIN
/**
Where later is at least 10ms from now.
*/
case ASK_AGAIN_LATER
case OUTLOOK_NOT_SO_GOOD
}
Typeref:
import Foundation
/**
ISO 8601 date-time
*/
typealias DateTime = String
JSON serializer:
We are currently prototyping with SwiftyJSON.
Open issues:
let instead of var)? If so, how?
We have a number of persisted models with enum types in which the string representations of the enums are camel-cased. Courier only supports all-caps enums, so these enums cannot be migrated to Courier.
Possible solutions:
I noticed that unapply on unions was added in September, and so it is not part of the current 0.4.1. Can we get a 0.12.3 release into sonatype?
I love twirl, but its one down-side is legibility of the generated code due to excessive whitespace.
What do you guys think would be a good solution to this? Adding a scalariform pass is the first thing that comes to mind, but I'm not sure what your thoughts are wrt adding that dependency.
In Swift, we can handle modifications to immutable types in a similar way to how we modify Scala immutable types: via a copy method. In Scala, copy methods look like:
def copy(field1: String = this.field1, field2: Int = this.field2, field3: Boolean = this.field3)
Which makes it easy to perform a copy and change only whatever fields one wants changed using named parameters, e.g.:
instance.copy(field2 = 5)
or
instance.copy(field1 = "updated")
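The same copy-with-overrides pattern exists in other languages. As a point of comparison only, here is how it looks in Python using dataclasses.replace from the standard library. The Instance class and its three fields are assumptions that mirror the Scala signature above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    field1: str
    field2: int
    field3: bool


instance = Instance("original", 1, True)

# Copy the immutable value, overriding only the named field.
updated = replace(instance, field2=5)
```

The original value is left untouched, and every field that is not named keeps its current value. A generated Swift copy method would aim for the same effect.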
Courier provides an API in the generator-api project that is used by the build system integrations (gradle-plugin and sbt-plugin). This API is then implemented by each language specific generator (scala, swift, android java).
However, there is no implementation of the API for the standard Pegasus Java data binding generator. As a result, it is not possible to generate Pegasus Java data bindings using the Courier schema language.
This is a relatively straightforward task. We simply need to define a new java/generator project and a java/generator/src/main/java/org/coursera/courier/JavaGenerator.java class that implements PegasusCodeGenerator with a generate method that simply delegates to the existing Pegasus Java generator implementation.
Be sure to document how to set up a Courier project for Java in a README and link to it from the main courier documentation!
We’ve used the .pdsc file format up to this point for Pegasus schemas for a few reasons:
However, writing JSON by hand has some limitations:
And the .pdsc JSON structure has a number of warts as well:
We have a few goals for a grammar:
Examples:
namespace org.coursera.fortune
include org.coursera.models.common.DateTime
/** A fortune. */
record Fortune {
@{ "isTranslatable": true }
title: string,
/** The fortune telling. */
telling: FortuneCookie | MagicEightBall | string // a union
createdAt: DateTime? = nil // optional defaulted to nil/None
}
namespace org.coursera.fortune
enum MagicEightBallAnswer {
IT_IS_CERTAIN
/** Where later is at least 10 ms from now. */
ASK_AGAIN_LATER
OUTLOOK_NOT_SO_GOOD
}
namespace org.coursera.fortune
record FortuneCookie {
...SomeRecord // include fields from another record
luckyNumbers: array[int]
exampleMap: map[int, string]
}
namespace org.coursera.models.common
/** ISO 8601 date-time. */
@{
"scala": {
"class": "org.joda.time.DateTime",
"coercerClass": "org.coursera.models.common.DateTimeCoercer"
}
}
typeref DateTime = string
Features:
/* */ and //
/** */ style doc strings, but with markdown support
<identifier> ":" <type> (= <default>)?
"a string", 1, 3.14, true, ... , [ … ], { … }
<type>[<typeParams>] syntax
@...)
<type> "?" (Alternatively we could use Scala style instead, e.g. optional[])
<type>[<typeParams>]. We will use this for maps (map[key, value]), arrays (array[items]) and unions (union[Member1, Member2, ...]), and may use the syntax for user-defined generic types in the future.
For example
namespace org.coursera.learning.course.activity
record Example {
// The example's field
field: int
}
causes
[info] Courier: Generating Scala bindings for .pdsc and .courier files for 'compile' configuration.
[error] Courier generator error, cause: java.io.IOException: /Users/marc/base/coursera/infra-services/libs/models/src/main/pegasus/org/coursera/learning/course/activity/Example.courier,"field" or "org.coursera.learning.course.activity.field" cannot be resolved.
[error] 4,19: Type not found: field
[error] 4,16: token recognition error at: '''
[error] 4,19: missing ':' at 'field'
It compiles fine if I remove the apostrophe.
Also interesting: apostrophes in multi-line comments (/**/) don't break it.
Developers often want to represent a set in Courier.
Currently, the only available approach is to use a Pegasus array and transform it to/from a Scala Set manually.
Adding an option to our .pdsc files that developers could use to specify the desired binding type, e.g.:
{ "type": "array", "items": "Example", "scala": { "type": "set" } }
would allow us to generate an appropriate Set binding class.
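The manual transformation that such a binding would hide amounts to a pair of conversions at the serialization boundary. Below is a minimal Python sketch of that pair. The function names are hypothetical, and sorting on output is an assumption made here so that the serialized array is deterministic.

```python
from typing import FrozenSet, Iterable, List


def array_to_set(items: Iterable[str]) -> FrozenSet[str]:
    """Read side: collapse the wire-format array into an immutable set."""
    return frozenset(items)


def set_to_array(values: FrozenSet[str]) -> List[str]:
    """Write side: emit the set as an array in a stable order."""
    return sorted(values)
```

Duplicates in the incoming array are dropped silently in this sketch. A real binding would have to decide whether to treat them as a validation error instead.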
Right now Courier records work nicely as resource models in Naptime, but Courier doesn't support non-primitive resource keys well.
It seems natural to use records for composite resource keys. For example, suppose we have a repositories.v1 resource with key :organization~:repositoryName. We can define a record:
record RepositoryId {
organization: string
repositoryName: string
}
and make requests like GET /repositories.v1/coursera~courier.
However, if we use this key in another model, it'll be serialized as an object, not as a URL-usable string like "coursera~courier". For example:
record PullRequest {
title: string
repositoryId: RepositoryId
}
may be serialized as:
{
"title": "Example PR",
"repositoryId": {
"organization": "coursera",
"repositoryName": "courier"
}
}
For client convenience, it'd be nice to have this instead:
{
"title": "Example PR",
"repositoryId": "coursera~courier"
}
because then clients can easily pull out the repositoryId field and construct a repositories.v1 request.
A possible workaround right now is to define all ids as typeref RepositoryId = string, which produces the desired serialization, but requires language-specific coercion to preserve type safety.
Proposed new syntax:
id RepositoryId {
organization: string
repositoryName: string
}
where id objects are string-serialized when used in other record or id objects, using one of the string codecs Courier already supports.
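The intended round trip can be sketched as follows. This is a simplified Python illustration, not one of Courier's string codecs: it assumes that neither component contains the "~" separator and does no escaping, and the to_key and from_key names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryId:
    organization: str
    repository_name: str

    def to_key(self) -> str:
        # String form used when the id is embedded in another record or a URL.
        return f"{self.organization}~{self.repository_name}"

    @classmethod
    def from_key(cls, key: str) -> "RepositoryId":
        # Split on the first separator only; escaping is out of scope here.
        organization, repository_name = key.split("~", 1)
        return cls(organization, repository_name)


pull_request = {
    "title": "Example PR",
    "repositoryId": RepositoryId("coursera", "courier").to_key(),
}
```

Clients can then use the repositoryId value directly in a repositories.v1 request path, while typed code still works with a structured RepositoryId.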
(See #71 for reproduction)
For quite a few versions now (since 5acb817, I think), generated Scala records have been incompatible with the readRecord methods in org.coursera.courier.templates.DataTemplates.
Attempting to read a record results in java.lang.NoSuchMethodException: org.coursera.records.test.Simple$.apply, because the previous apply(DataMap, DataConversion) is now build(DataMap, DataConversion).
My temptation is to update DataTemplates to call build instead of apply, but that would be backwards incompatible for Courier users who are still operating on old templates. We could also perform two lookups in case the first one fails, but that would double the reflection work for either old or new clients. I'm curious to hear your thoughts on how to remediate this.
This is relatively high priority for us at Instrumental so would love your thoughts.
The plugin currently only resolves files in the <lib>/pegasus directory but does not offer a simple way to configure or resolve .courier files from a dependent library. This leads to minor annoyances with shared models from other packages that you want to reference in courier sources and is a blocker for splitting up a centralized models repository. Note that courier codegen classes are resolvable because they are on the code path after the SBT plugin resolver.