haskell-hvr / hsyaml
YAML 1.2 implementation in pure Haskell
Home Page: https://hackage.haskell.org/package/HsYAML
License: GNU General Public License v2.0
I was looking at the result of decodeNode
and wanted to get rid of the source locations via void
…
https://matrix.yaml.io/details/9KAX.html doesn't pass as of HsYAML-0.1.2.0 specifically because of
---
!!map
&a8 !!str key8: value7
which yaml2token
decodes as
<stdin>:1:0: BeginDocument |
<stdin>:1:0: DirectivesEnd | ---
<stdin>:1:3: BeginNode |
<stdin>:1:3: BeginScalar |
<stdin>:1:3: EndScalar |
<stdin>:1:3: EndNode |
<stdin>:1:3: Break | \n
<stdin>:2:0: Unparsed | !!map
<stdin>:2:5: Unparsed | \n
<stdin>:3:0: Unparsed | "&a8 !!str key8: value7"
<stdin>:3:22: Unparsed | \n
<stdin>:4:0: EndDocument |
in other words, parsing chokes on the !!map
token
TODO: review YAML 1.2 spec
In some cases you might have yaml like:
list-o-things:
- "Im text"
- 2
- name: complex-case
extra-info: stuff
In yaml
/ aeson
that is pretty simple to handle:
newtype Thing = Thing Text
instance FromJSON Thing where
parseJSON (String a) = pure $ Thing a
parseJSON (Object v) = -- v is a HashMap and quite easy to work with
In HsYAML there doesn't seem to be a way to do this without getting deep into library internals.
instance FromYAML Thing where
parseYAML (Scalar _ (SStr a)) = pure $ Thing a
parseYAML (Mapping _ _ _) = -- withMap would be redundant, but how else to write the parser?
I'm adding support for HsYAML to hlint
and it would be useful for us if HsYAML
and/or HsYAML-aeson
exported a compatibility module with an identical interface to the yaml
library. Would such an interface be useful to others as well?
Blocked by:
The aeson
package uses Scientific which avoids numeric instability from floating point numbers. This package uses Double, and that causes its behavior to diverge from that of aeson
when dealing with decimal numbers. Here's an example ghci session that illustrates the issue:
λ fromRight "error" (Data.Aeson.eitherDecode "{\"foo\": 0.1}") :: Data.Aeson.Value
Object (fromList [("foo",Number 0.1)])
λ fromRight "error" (Data.YAML.Aeson.decode1Strict "foo: 0.1") :: Data.Aeson.Value
Object (fromList [("foo",Number 0.1000000000000000055511151231257827021181583404541015625)])
Would you be willing to accept a pull request that solves this problem? If so, any pointers as to what solution you'd prefer?
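The divergence in the session above can be reproduced with nothing but base. This is a minimal sketch and does not use either library: it compares the exact value of the Double nearest to 0.1 with the decimal 0.1 taken at face value. The names viaDouble, exact and diverges are made up for illustration.

```haskell
import Data.Ratio ((%))

-- Exact value of the Double nearest to 0.1 (what a Double-based parser keeps).
viaDouble :: Rational
viaDouble = toRational (0.1 :: Double)

-- The decimal literal 0.1 taken at face value (what an exact decimal type keeps).
exact :: Rational
exact = 1 % 10

-- True: the two values differ, which is the divergence shown in the session.
diverges :: Bool
diverges = viaDouble /= exact
```

Any conversion that passes through Double before reaching an exact representation carries this difference along, which is why the long expansion shows up in the second result.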
It seems like empty lines do not have a corresponding Event:
a: 1
# comment
b: 2
Results in (after passing through yaml-test yaml2yaml):
a: 1
# comment
b : 2
Would it be possible to add a new event or comment attribute to preserve empty lines?
It seems like the comment event is missing some information to indicate if it is inline or standalone:
key: "value" # a comment
Results in (after passing through YE.writeEvents YT.UTF8 . map eEvent . rights . YE.parseEvents):
key: "value"
# a comment
The events are:
MappingStart Nothing Nothing Block
Scalar Nothing Nothing Plain "key"
Scalar Nothing Nothing DoubleQuoted "value"
Comment " a comment"
MappingEnd
Would it be possible to add an attribute to the Comment event to preserve its line position?
See e.g. the haddocks for mapping
: http://hackage.haskell.org/package/HsYAML-0.2.0.0/docs/Data-YAML.html#v:mapping
HsYAML fails ungracefully on:
---
foo: 1e
---
In GHCI:
:set +m
import qualified Data.YAML as YAML
import Data.ByteString.Char8
case YAML.decodeStrict (pack "foo: 1e") of
Right xs -> show xs
Left err -> err
gives:
"*** Exception: Prelude.read: no parse
Seems to come from https://github.com/haskell-hvr/HsYAML/blob/0.2/src/Data/YAML/Schema.hs#L321
We need this for jgm/pandoc#5177
Thanks!
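For reference, the exception is the usual behaviour of Prelude.read on input it cannot parse. A total alternative from base is Text.Read.readMaybe. The sketch below is not the code in Schema.hs and not a proposed patch; parseFloatScalar is a hypothetical helper that only shows how "1e" can be turned into a Left instead of a crash.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: parse a float-looking scalar without throwing.
-- readMaybe returns Nothing where read would raise "no parse".
parseFloatScalar :: String -> Either String Double
parseFloatScalar s =
  case readMaybe s of
    Just d  -> Right d
    Nothing -> Left ("not a valid float: " ++ show s)
```

With this shape, parseFloatScalar "1e" yields a Left that a decoder could report as an ordinary error, while well-formed input such as "1e3" still parses.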
This #if:
#if !MIN_VERSION_mtl(2,2,2) || (__GLASGOW_HASKELL__ == 804 && __GLASGOW_HASKELL_PATCHLEVEL1__ < 2)
helps if you are on the very first release of ghc-8.4.1, but if you are on ghc-8.4.2 patchlevel 1 or ghc-8.4.3 patchlevel 1 it breaks the build.
See also #15 (comment)
We should have two modes: one which throws an exception when an invariant is broken by the stream; and one mode which silently fixes up the stream; this could either be a separate phase (i.e. a separate "stream transformer" function) or be integrated into the printing phase.
Something of type
Node v -> (v -> [String]) -> ByteString
where v -> [String]
argument would specify the "comments on top of node" would be great.
I have a messy implementation in https://github.com/haskell-CI/haskell-ci/blob/master/src/HaskellCI/YamlSyntax.hs#L181
I'll probably move to use Node
anyway at some point. But let's see when.
I've been trying out dumping YAML from the structure we use to represent pandoc command-line options (with HsYAML-aeson).
I'm getting this:
abbreviations: null
ascii: false
"base-header-level": 1
"cite-method": Citeproc
etc.
Keys with hyphens are quoted. This isn't necessary in YAML, so the quotes are overkill and undesirable. Can the code that determines when a key name needs quoting be tweaked to make it less aggressive? (I could take a look if you point me to the right place in the code.)
I'm assuming this is an issue with HsYAML itself rather than HsYAML-aeson.
[EDIT: fixed YAML nomenclature]
It's just slightly inconvenient to have to pattern match on Doc.
https://matrix.yaml.io/details/S98Z.html doesn't pass as of HsYAML-0.1.2.0
empty block scalar: >
# comment
I think this test case is illicit under the YAML 1.2 specification, as there are leading empty lines that contain more spaces than the first non-empty line; but this needs a proper refutation
It would be great if HsYAML would be able to generate the instances automagically through generics, like aeson does, so that there would be no need to tediously mirror data declarations with trivial instances.
If I do
case decode inp of
Right (opt :: Opt : _) -> doSomethingWith opt
Right [] -> oneKindOfError
Left (pos, err) -> anotherKindOfError
and inp
is something like
---
foo: bar
...
baz
then I get an error when it hits baz, even though it can successfully parse the one YAML document I'm asking for. Is this expected? Could the parser be made lazier, so it returns the result after having parsed the first YAML document in the stream, and doesn't worry about the rest unless I ask for it?
Alternatively, could there be an option telling it to ignore non-YAML content after the first YAML document, if you just use decode1?
I'm trying to install HsYAML on alpine linux 3.12.0.
Dockerfile:
https://github.com/yaml/yaml-runtimes/blob/update-alpine/docker/haskell/alpine-builder.dockerfile
Install script:
https://github.com/yaml/yaml-runtimes/blob/update-alpine/docker/haskell/utils/hs-build.sh
Error message:
https://gist.github.com/perlpunk/86d8701d971bcb423b7d821954e4da52#file-build-hsyaml-log
Any idea what I could do?
See also yaml/yaml-runtimes#8
When installing on gentoo with dlist-0.8.0.2 I get the following error. It is fixed by updating the dependency to dlist-0.8.0.4.
Preprocessing library for HsYAML-0.1.1.2..
Building library for HsYAML-0.1.1.2..
[1 of 7] Compiling Util ( src/Util.hs, dist/build/Util.o )
[2 of 7] Compiling Data.YAML.Token.Encoding ( src/Data/YAML/Token/Encoding.hs, dist/build/Data/YAML/Token/Encoding.o )
[3 of 7] Compiling Data.YAML.Token ( src/Data/YAML/Token.hs, dist/build/Data/YAML/Token.o )
src/Data/YAML/Token.hs:28:1: error:
Data.DList: Can't be safely imported! The module itself isn't safe.
At least the base
bounds need to be relaxed! :)
yaml2yaml
sub-command fails to round-trip the following Input YAML
Input YAML
"\\"
Output YAML
"\"
Expected Output
"\\"
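The input scalar holds a single backslash, so a writer that emits it in double quotes has to escape that backslash again. A minimal sketch of that step, assuming only backslash and double quote need handling (a real emitter must also deal with control characters and other escapes); escapeDQ and quoteDQ are hypothetical names, not functions from HsYAML.

```haskell
-- Escape the two characters that cannot appear raw inside a
-- double-quoted YAML scalar: backslash and double quote.
escapeDQ :: String -> String
escapeDQ = concatMap esc
  where
    esc '\\' = "\\\\"
    esc '"'  = "\\\""
    esc c    = [c]

-- Wrap the escaped text in double quotes.
quoteDQ :: String -> String
quoteDQ s = "\"" ++ escapeDQ s ++ "\""
```

Applied to the one-character string containing a backslash, quoteDQ produces the four characters shown under Expected Output.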
…says the readme.
Are these tests automated somewhere? I suspect not. They probably should be though, so we don't accidentally break compatibility with the YAML spec.
Encoding the following mapping
encode [mapping ["a" .= (1 :: Int), "d" .= (2 :: Int), "c" .= (3 :: Int), "b" .= (4 :: Int)]]
results in:
a: 1
b: 4
c: 3
d: 2
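The keys come out in ascending order rather than in the order they were written. One plausible explanation, stated here as an assumption rather than something the report confirms, is that the mapping is stored in an ordered map such as Data.Map, which forgets insertion order. The sketch below uses the containers package directly and no HsYAML code; pairs and viaMap are illustrative names.

```haskell
import qualified Data.Map.Strict as Map

-- The key/value pairs in the order they were written in the report.
pairs :: [(String, Int)]
pairs = [("a", 1), ("d", 2), ("c", 3), ("b", 4)]

-- Going through Map discards insertion order; toList yields ascending keys.
viaMap :: [(String, Int)]
viaMap = Map.toList (Map.fromList pairs)
```

Here viaMap lists the keys as a, b, c, d, matching the output above. Preserving the written order would need an order-preserving representation, for example a list of pairs.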
HsYAML is particularly interesting to me, as it compiles with GHCJS, while the normal yaml does not. That would be useful in HLint, for a web version. However, I strive to avoid GPL dependencies where possible, since in my experience they reduce adoption and don't increase contributions. My current thought is to have an explicit GHCJS Cabal flag + preprocessor flag, and do a compile-time switch on YAML library. That's a bit grim, and I hate writing grim code for legal reasons.
If you would consider moving to the more usual Haskell licenses of BSD/MIT/Apache that would be great - but it's your code, so entirely your choice. If not, perhaps put in the README that the project is deliberately GPL (including the reason, if you feel comfortable sharing) so that people know not to ask?
Line 115 in 777165d
Right now it's a bit confusing when you try to build this package from source.
If I understood Herbert (@hvr) correctly, he will not be available for maintenance work in the near future.
I stepped in to keep stuff buildable, but I have no genuine interest in the HsYAML
family (at least not so far). I wouldn't do more than keep the package afloat on the Haskell ecosystem and merge bugfixes.
There seem to be a couple of active developers, though, who want to hang on and see improvements. If you are one of them, maybe this is your call to step forward and volunteer as maintainer?
Applications in the comments or via email to me. (@andreasabel)
CC: @jgm @mightybyte @TristanCacqueray @vaibhavsagar @vijayphoenix
I've switched to using HsYAML in pandoc, but my Travis build has a mysterious failure for ghc 8.4.1.
Building library for HsYAML-0.1.1.1..
[1 of 7] Compiling Util ( src/Util.hs, dist/build/Util.o )
src/Util.hs:12:7: error: Not in scope: ‘liftEither’
|
12 | ( liftEither
| ^^^^^^^^^^
Other ghc versions work fine. As far as I can see, HsYAML should work for both newer and older versions of mtl. So I don't understand where this error could be coming from. Any thoughts?
One pandoc user has run into an issue with a large (100k line) bibliography in YAML format (for details see jgm/pandoc#6084). Prior to pandoc 2.8 (when we used the yaml
package), this was handled fairly quickly, but now that we use HsYAML it takes 18 seconds to read the bibliography. I confirmed that the slowdown is due to HsYAML, by loading the file in a GHCI session as b
and trying
GHCI> :set +s
GHCI> let x = decodeNode b in x `seq` 3 -- this is just to ensure it's evaluated
(25.28 secs, 82,135,579,376 bytes)
What are the performance expectations for HsYAML? Have you made efforts to optimize here? aeson claimed decoding speeds of 46M/sec on a slower machine than mine; this file is 3M. I wouldn't expect that YAML parsing could be as fast as JSON parsing, but it would be nice to get in the 4M/sec range (10x slower than aeson).
EDIT: 82G allocated with 1G max residency seems an awful lot to parse a 3M file!
Profiling reports these as the biggest cost centers:
applyParser Data.YAML.Token src/Data/YAML/Token.hs:220:1-30 32.5 0.4
*>.\ Data.YAML.Token src/Data/YAML/Token.hs:(435,5)-(439,67) 9.4 25.3
^. Data.YAML.Token src/Data/YAML/Token.hs:73:1-30 9.3 0.8
& Data.YAML.Token src/Data/YAML/Token.hs:567:1-44 4.3 4.0
<|>.decideParser Data.YAML.Token src/Data/YAML/Token.hs:(599,7)-(609,95) 3.9 5.2
nextIf.consumeNextIf Data.YAML.Token src/Data/YAML/Token.hs:(791,5)-(817,52) 3.1 0.4
prefixErrorWith.\ Data.YAML.Token src/Data/YAML/Token.hs:(913,5)-(917,95) 2.7 7.8
prefixErrorWith.\.reply Data.YAML.Token src/Data/YAML/Token.hs:913:9-49 2.4 0.0
append Data.DList src/Data/DList.hs:34:1-46 1.8 2.7
/ Data.YAML.Token src/Data/YAML/Token.hs:572:1-68 1.8 0.1
reject.\ Data.YAML.Token src/Data/YAML/Token.hs:673:5-67 1.7 8.1
*> Data.YAML.Token src/Data/YAML/Token.hs:(434,3)-(439,67) 1.6 0.0
returnReply Data.YAML.Token src/Data/YAML/Token.hs:(387,1)-(390,52) 1.5 7.9
Heap profiling shows that the DLists account for a lot of the allocation.
https://matrix.yaml.io/details/X38W.html doesn't pass as of HsYAML-0.1.2.0
{ &a [a, &b b]: *b, *a : [c, *b, d]}
fails on the first [
"" BeginDocument
"" BeginNode
"" BeginMapping
"{" Indicator
" " White
"" BeginPair
"" BeginNode
"" BeginProperties
"" BeginAnchor
"&" Indicator
"a" Meta
"" EndAnchor
"" EndProperties
"" BeginScalar
"" EndScalar
"" EndNode
"" BeginNode
"" BeginScalar
"" EndScalar
"" EndNode
"" EndPair
" " White
"" EndMapping
"" EndNode
"Unexpected '['" Error
"[a, &b b]: *b, *a : [c, *b, d]}" Unparsed
"\n" Unparsed
"" EndDocument
needs investigation