galoisinc / daedalus Goto Github PK
View Code? Open in Web Editor NEWThe Daedalus data description language
License: BSD 3-Clause "New" or "Revised" License
The Daedalus data description language
License: BSD 3-Clause "New" or "Revised" License
The PDF definition should be extended to support inherited features (Sec. 7.7.3.4).
From the chat:
Simon Winwood 3:06 PM
I suppose we could comma separate pattern cases
case v of {
Foo x, Bar x, Baz x -> ...;
}
Iavor Diatchki 3:07 PM
Maybe: case e of { P1 -> e1; P2 -> e2 }.
I'd avoid the union patterns
Simon Winwood 3:07 PM
main reason I want them is that they are used in NITF pretty heavily
(I think)
do we know the type of v when checking e.g. case?
Iavor Diatchki 3:08 PM
Ah, if we have use case that's OK. Is the semantics that case e of { P1,P2 -> e} is equivalent to case e of { P1 -> e; P2 -> e}?
Simon Winwood 3:09 PM
yes
that is how it would be in TC
Iavor Diatchki 3:10 PM
We don't know the type because it might not exist yet. I think you can type-check this in a similar way to the way is for union works (i.e., emit constraints) and it should be able to build the type on its own, or complain
Simon Winwood 3:10 PM
perhaps TCDestUnion :: [(TCName Value, TC a k)] -> TCF a k
Iavor Diatchki 3:11 PM
I think we'd definately want a version of the case statemetn that works on literals
Simon Winwood 3:11 PM
yeah, but that would be either desugared to <| or have another ctor in TC
Iavor Diatchki 3:11 PM
so case e of { 1 -> e1; 2 -> e2 }
Yeah, I guess that makes sense
(we should have a constructor)
Simon Winwood 3:12 PM
and you could maybe do case e of { 1, 2, 3 -> e1 ; _ -> e2 }
Iavor Diatchki 3:12 PM
yes, that seems nice
enum UInt8 is {
FOO = -1,
BAR = 0,
BAZ,
}
This should be similar to what we do to parsers, but some of them might get parameters too.
def Main = Many (Match1 'a' <| Match1 'b' <| Match1 'c' <| Match1 'd')
ParserGen
is working on the normalized
AST but it is preferable that it uses the TCExpr
from the specializer
. This issue will be resolved when all the dependencies on the normalized syntax are removed from ParserGen
.
It would appear that reusing definitions is not working quite as expected. I noticed it when specializing PdfValue.Value
, the output contains the following duplicates, for example:
PdfValue.When__14 (x6 : PdfValue.Number) (sign1 : PdfValue.Sign) : Grammar PdfValue.Number =
do do sign1 is pos
pure {}
($$9 : PdfValue.Number) <- pure x6
pure $$9
.
PdfValue.When__15 (x5 : PdfValue.Number) (sign : PdfValue.Sign) : Grammar PdfValue.Number =
do do sign is neg
pure {}
($$8 : PdfValue.Number) <- pure x5
pure $$8
The issue might be that we are missing some cases in the unification code, or maybe we need to do alpha renaming on the instances?
Consider the following example:
[<</A 2 /A 3>>]
The issue here is that there is a duplicate key, but instead we report "expected <<
at offset 0" which is confusing. This is likely due to us selecting the wrong possible error, but it is not quite clear why that is---in particular we first parse the dictionary entries, and then build the map, so I'd expect the error to be right after the last >>
which should be "further ahead" than offset 0, which is the beginning of the value...
Currently, daedalus takes input using the -i
flag, which is optional. This creates annoying issues with the syntax of input args because we need to omit the gap between the flag and the argument:
daedalus tests/T001.ddl -itests/T001.in
This should be fixed in a couple of ways:
Readme.rst
on the repo.Also, we should make it possible to take input from stdin
, eg:
echo "A" | daedalus tests/T001.ddl -i -
The mavlink
directory currently contains a DaeDaLus definition for which daedalus
generates C++ that does not typcheck. The DaeDaLus file is mavlink/MissionItemInt.ddl
. The commands that cause the failure are:
daedalus MissionItemInt.ddl --compile-c++ --out-dir=mission_item_int_parser
g++ -std=c++17 mission_item_int_parser/main_parser.cpp -I DAEDALUS_DIR/daedalus/rts-c/
Note:
mavlink
currently contains definitions of three formats: this is the only one that results in untypable C++.init_
methods that declared to be parameterized on the first typename, but which are called on no arguments.We also have a mechanism for adding user-defined primitives to Daedalus, which provides a way to call some custom user code from within a Daedalus parser. Obviously, one needs to be careful when adding such extensions, as to not violate assumptions made by the code generator, which should be identified and documented. Providing definitions for these primitives is done when compiling the code, at the moment, but there are also other options that we are considering (e.g., resolve primitives during linking).
Provide examples and/or documentation for each language supported, possibly including:
Some good examples of this in Haskell are the calls to FlateDecode
, which is declared as a DaeDaLus parser in PdfDecl.ddl and then implemented directly in Haskell as pFlateDecode
in src/PdfDecl.hs
.
Currently as
is used for two different things, which is confusing in combination with the implicit lifting
It is common to write parsers like this:
@x = UInt8;
Choose {
A = { Guard (x == 7); x = UInt8; y = UInt8 };
B = { Guard (x == 8); }
}
It'd be nice to write this with a case instead of all the Guard
, but our "inject in union" notation is pretty noisey
The current spec validates only weak properties of Content Streams. It should be extended to:
/Length
field or a computation over /Width
, /Height
, and /BitsPerComponent
;How to do so depends on one open question to the PDF Association concerning the interaction of the graphics state stack and other operand pairs that must be balanced.
When I run the install command as follows:
cabal install exe:daedalus --installdir=/Users/miked/bin --overwrite-policy=always
I get the following error:
Building executable 'daedalus' for daedalus-0.1.0.0..
[1 of 3] Compiling CPPDriver ( exe/CPPDriver.hs, dist/build/daedalus/daedalus-tmp/CPPDriver.o )
exe/CPPDriver.hs:8:18: error:
• Exception when trying to run compile-time code:
exe/driver-template: getDirectoryContents:openDirStream: does not exist (No such file or directory)
Code: embedDir "exe/driver-template"
• In the untyped splice: $(embedDir "exe/driver-template")
|
8 | template_files = $(embedDir "exe/driver-template")
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
cabal: Failed to build exe:daedalus from daedalus-0.1.0.0. See the build log
above for details.
Temporary fix: edit the file CPPDriver.hs
as follows:
< template_files = $(embedDir "exe/driver-template")
---
> template_files = []
Compiling the Pdf spec in the repo causes a daedalus panic:
$ daedalus --compile-hs --out-dir=tmp-hs pdf-cos/spec/PdfDecl.ddl
You have encountered a bug in Daedalus's implementation.
*** Please create an issue at https://github.com/GaloisInc/daedalus/issues
%< ---------------------------------------------------
Revision: 150b3658a46963db4368c91fad07628a3f9f7bd6
Branch: master (uncommited files present)
Location: lkpInEnv
Message: Missing variable
Name: PdfDecl.ASCII85Decode
Env: envExtern
Keys:
CallStack (from HasCallStack):
panic, called at src/Daedalus/Panic.hs:17:9 in daedalus-utils-0.1.0.0-inplace:Daedalus.Panic
panic, called at src/Daedalus/CompileHS.hs:33:16 in daedalus-0.1.0.0-inplace:Daedalus.CompileHS
%< ---------------------------------------------------
It appears the .ddl is bad/incomplete.
The panic occurs in previous versions of pdf-cos/spec/PdfDecl.ddl, I need to revert (the spec) back to 1430ec6 to get a clean run:
$ git ch 1430ec64 -- pdf-cos/spec
$ daedalus --compile-hs --out-dir=tmp-hs pdf-cos/spec/PdfDecl.ddl
pdf-cos/spec/PdfValue.ddl:14:45--14:53: Failed to match types:
• bool
• uint 8
We need to work out how types interact with the module system. Currently if A
imports B
, which imports C
, and C
defines a types T
it looks like it autmatically ends up being in scope in A
, which is probably not intentional
def Main = Many { Match1 'a'; Match1 'b'; Match1 'c' }
There are a number of places where we use a traversal when what we need is a foldMap, mainly because foldMapTCF wasn't yet implemented (e.g. tcBounds).
An updated definition of the PDF spec in pdf-driver
, now pushed to the master
branch, causes the following daedalus panic:
000577-wharris:pdf-driver wrharris$ ./scripts/build
Up to date
pdf-generator: You have encountered a bug in Daedalus's implementation.
*** Please create an issue at https://github.com/GaloisInc/daedalus/issues
%< ---------------------------------------------------
Revision: eae34fe043a72174d06510ae0df311238d457447
Branch: master (uncommited files present)
Location: hsPat
Message: Unexepected type in unoin constructor
?a5
CallStack (from HasCallStack):
panic, called at src/Daedalus/Panic.hs:17:9 in daedalus-utils-0.1.0.0-inplace:Daedalus.Panic
panic, called at src/Daedalus/CompileHS.hs:673:24 in daedalus-0.1.0.0-inplace:Daedalus.CompileHS
%< ---------------------------------------------------
One update in the current definition that may potentially cause the panic is that a single daedalus source file, Stdlib.ddl
that contains the Guard
parser, is imported by multiple clients that do not import from each other.
The idea is that this is something like trace
in Haskell, i.e., a handy utility for debugging parser rather than something that should be used in production.
It would be nice if each phase had an associated invariant, which could then be optionally checked after the pass (and after later passes where appropriate).
There is no reason to have these, and at least some cases confuse the code generator.
Define a minimal wrapper that traverses image XObjects and validates any JPEGs.
JPEGs are designated in a stream by a Filter
key set to DCTDecode
.
Point: Guards are indeed the primary mechanism for deciding if the parse should pass by inspection of the semantic value. As for processing, various language constructs can be used to do reasonably flexible processing of semantic values; numBase
and strlen
in nitf_lib.ddl
are decent first examples.
Counterpoint: The above is not strictly speaking quite right, depending on what is meant by "guards", but I don't know that it is really worth going into details here. Currently, the main way to inspect and validate values is to use the various forms of is
. This is likely to change in the future as we are planning to add some sort of case
and if
statement, which would make some checks easier to express more directly.
daedalus gives no errors:
daedalus --compile-hs pdf-driver/Toy.ddl >| pdf-driver/src/toy-subset/Toy.hs
but compiling the generated code gives a type error:
$ cabal v2-build exe:toy-subset
Build profile: -w ghc-8.8.3 -O1
In order, the following will be built (use -v for more details):
- pdf-driver-0.1.0.0 (exe:toy-subset) (file src/toy-subset/Toy.hs changed)
Preprocessing executable 'toy-subset' for pdf-driver-0.1.0.0..
Building executable 'toy-subset' for pdf-driver-0.1.0.0..
[1 of 2] Compiling Toy ( src/toy-subset/Toy.hs, /Users/tullsen/src/mydaedalus/dist-newstyle/build/x86_64-osx/ghc-8.8.3/pdf-driver-0.1.0.0/x/toy-subset/build/toy-subset/toy-subset-tmp/Toy.o )
src/toy-subset/Toy.hs:360:11: error:
• Couldn't match type ‘EDict_0’ with ‘HS.Integer’
arising from a use of ‘pToMap’
• In the second argument of ‘RTS.pEnter’, namely
‘(pToMap
@(Vector.Vector EDict_0)
@(Vector.Vector (RTS.UInt 8))
@Value
@HS.Integer
@EDict_0
es)’
In a stmt of a 'do' block:
(__ :: Map.Map (Vector.Vector (RTS.UInt 8)) Value) <- RTS.pEnter
"Toy.ToMap"
(pToMap
@(Vector.Vector EDict_0)
@(Vector.Vector (RTS.UInt 8))
@Value
@HS.Integer
@EDict_0
es)
In the expression:
do (es :: Vector.Vector EDict_0) <- RTS.pEnter "Toy.EDict" pEDict
(__ :: Map.Map (Vector.Vector (RTS.UInt 8)) Value) <- RTS.pEnter
"Toy.ToMap"
(pToMap
@(Vector.Vector EDict_0)
@(Vector.Vector (RTS.UInt 8))
@Value
@HS.Integer
@EDict_0
es)
HS.pure __
|
360 | (pToMap @(Vector.Vector EDict_0) @(Vector.Vector (RTS.UInt 8))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
The full source for Toy.ddl:
-- utils (from PdfValue.ddl): ----------------------------------------
def Name = Token { Match "/"; Many NameChar }
def NameChar = Match1 (!"\0\9\10\12\13\32()<>[]{}/%#")
def $lf = 10
def $cr = 13
def $simpleWS = 0 | 9 | 12 | 32
def SimpleEOL = { $cr; $lf } | $lf
def EOL = SimpleEOL <| $cr
def Comment = { Match "%"; Many (Match1 (! ($lf | $cr))); EOL }
def AnyWS = $simpleWS | Comment | EOL
def Token P = { $$ = P; Many AnyWS }
def KW x = @ (Token (Match x))
def Between open close P = { KW open; $$ = P; KW close }
def When P x = { P; ^ x }
--- Values [Toy] -------------------------------------------------------
def Bool =
When (KW "true") true
| When (KW "false") false
def Null = KW "null"
def Value =
Choose1 {
null = Null;
bool = Bool;
name = Name;
}
-- new stuff ----------------------------------------------------------
def EDict = {
Between "<<" ">>" (Many { key = Name; value = Value });
}
def Main = { EDict }
-- explore ------------------------------------------------------------
-- EMap1 - works as expected:
def EMap1 = {
@es = Between "<<" ">>" (Many { key = Name; value = Value });
for (d = empty; e in es) (Insert e.key e.value d);
}
-- EMap2 - works as expected:
def EMap2 = {
@es = EDict;
for (d = empty; e in es) (Insert e.key e.value d)
}
-- EMap3 - daedalus compiles this but the haskell code generated gives me a type error
def EMap3 = {
@es = EDict;
ToMap(es)
}
def ToMap es = {
for (d = empty; e in es) (Insert e.key e.value d);
}
It's very common using Daedalus to get this error:
BUG: not enough type arguments for Main.Main. This usually indicates some expression in the program became polymorphic, which can be fixed by adding more type annotations.
The canonical way to diagnose this at the moment is to go digging through the translated Haskell output, but this is pretty unpleasant when the spec is big, and in any case is completely non-obvious if you're not already a Daedalus developer.
A better output would be something like this:
BUG: types under-constrained. Consider adding a type annotation to variable blah at line blah.
Even better would be actually type checking the program at the point it's synthesized but I know this is potentially a big task.
The issue tracker should point here at Github, rather than at our internal source control.
The current definition of objects as Value
has some limitations:
Value
should be restructured accordingly.Currently all specialized instances of a definition go in the same module as the definition. This is incorrect because the specialization may require definitions from the module containing the call that gave rise to the specialization.
We can solve this by placing the specializations in the same module as the call, which may lead to code duplication, or perhaps be careful to compute the "earlier" (in dependency order) module that contains all definitions needed by a specialization.
Here is an example:
module A where
F P = ...
module B where
P1 = F UInt8
P2 = F P1
-- Specialized version of B
module B where
X1 = ... UInt8/P ... -- OK to move to `A`
P1 = X1
X2 = ... P1/P ... -- Can't move to `A` because `P1` is deifned here; we may need to add extra imports btw
P2 = X2
File empty.ddl:
def Main = Choose {}
When running the following command, I get a panic:
You have encountered a bug in Daedalus's implementation.
*** Please create an issue at https://gitlab-ext.galois.com:safedocs/safedocs
%< ---------------------------------------------------
Revision: 468c7efa5cb5fc9875218dbc7183424f742bb388
Branch: master
Location: specialiseOne
Message: Specializing a poly function
CallStack (from HasCallStack):
panic, called at src/Daedalus/Panic.hs:17:9 in daedalus-utils-0.1.0.0-inplace:Daedalus.Panic
panic, called at src/Daedalus/Specialise.hs:159:33 in daedalus-0.1.0.0-inplace:Daedalus.Specialise
%< ---------------------------------------------------
Example issue:
If I run a parser using a command of the form:
cabal run .:daedalus -- SPEC.ddl -i INPUT_FILE
it prints a fairly complete parse result to standard out. Were you running parsers in some other way, or requiring some additional information that isn't output currently?
We ran the parser the following way according to the github documentation and this method you specified. From github documentation
$ daedalus mavlink.ddl -i ./msg_types/heartbeat.bin
--- Found 1 results:
’_’
From your method (above), are there additional steps or dependencies we are missing?
$ cabal run .:daedalus -- mavlink.ddl -i ./msg_types/heartbeat.bin
cabal: There is no <pkgname>.cabal package file or cabal.project file.
Clarify the solution to the above issue in the README or elsewhere:
So, cabal
is a tool for building and installing Haskell source code. When you are within a Haskell project you may use cabal run MyExecutable
to build and execute the executable in the current project. From the error message, it looks like Andrew was trying to use this command outside of the repo, which in retrospect is not surprising. The right way to invoke the daedalus
interpreter, once it's been built and installed, would be to simply say something like daedalus MyGrammar.ddl -i my_input_file.txt
, as long as daedalus
is in the path.
The semantic value in the nonstandard_utf8
case is odd, we should figure out what it is supposed to be.
This is in the nitf
example
At the moment, Daedalus requires the existence of a Main
parser.
We should allow the user to specify the entry point to the parser. This would make it easier to debug parsers and would avoid the need for spurious Main
declarations.
Expected behavior:
daedalus foo.ddl --entry="MyParser" --interp=myfile.txt
Daedalus should use MyParser
as the top-level parser.
TCCase generalises a number of existing constructors, so they can be desugared as a Case instead of having their own constructors.
Currently the result of specialization is a single module. It might be nice to preserve at least some of the original module structure but there are some tricky issue we'd need to solve, related to the modules that should contain the specialized declarations. In particular:
To integrate Daedalus with an actual application one needs to use the Daedalus compiler, to translate the Daedalus grammar into the target language of the application. At the moment we only have a Haskell backend, and the interface to the compiler is not really user friendly at all---the compiler itself is packaged as Haskell library, so to use it you have to write a small Haskell program that configures the compiler and runs it. Have a look at "pdf-cos/generator/Main.hs" if you are curious about what "invoking the compiler" looks like. The Daedalus grammars for the compiler are in the pdf-cos/spec
and the compiler output is in pdf-cos/src
, with a name similar to the .ddl
file but with .hs
extension.
Provide documentation/tips for each backend language supported as they are likely to work a bit differently in each case. In particular, we should document what files are generated by the Daedalus compiler, and what's the expected API for using them (e.g., how to map DDL types to native types in the language, various build system related details, etc.). The best way to illustrate this might be to have a real simple and well-commented example, rather than a bunch of textual documentation. The benefit of this is that as we change things (and we are quite likely to do so), we can update the example and make sure that it still works, while with a manual, things might quickly get stale.
Once you have compiled Daedalus, you can link it and call the parsers from your application in much the same way you'd call another function, and you may call the same parser multiple times, or examine and print the values in whatever way the application needs. To see an example of what this looks like in Haskell, have a look in "pdf-driver/src/dom/Main.hs"---wherever you see runParser
that's a call to a parser generated from Daedalus
.
Currently we only build the daedalus
executable, but we don't check that other executables build.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.