pantoniou / libfyaml
Fully feature complete YAML parser and emitter, supporting the latest YAML spec and passing the full YAML testsuite.
License: MIT License
It seems that some resources are exhausted when calling fy_document_build_from_file.
The following code crashes for me after 1021 iterations:
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include <stdlib.h>
#include <stdio.h>
#include <libfyaml.h>
int main(int argc, char *argv[])
{
    struct fy_document *fyd = NULL;
    int i=1;
    while(i<10000)
    {
        fyd = fy_document_build_from_file(NULL, "test.yaml");
        if (!fyd) {
            fprintf(stderr, "failed to build document");
            return 1;
        }
        fprintf(stdout,"%d\n",i);
        i++;
    }
    return 0;
}
The content of the file (test.yaml) seems to be irrelevant; I use the following small example:
---
test: 1
When this file (anchortest.txt) is given to fy-tool, the expected output should be close to the input. Instead, the output seems to be truncated:
%YAML 1.1 --- - !!tagaaa name: somename1 somekey1: - !!tagbbb somekey2: !!tagccc groups: - &group_
If the non-printable characters in the anchor name are replaced with something innocuous like _ (as in this file: anchortest_clean.txt), then the output is as expected:
%YAML 1.1 --- - !!tagaaa name: somename1 somekey1: - !!tagbbb somekey2: !!tagccc groups: - &group_____QMCEQMTN_AGMOQDA_B1_rsan`ps._agmoQda/ !!group name: "óà\x8A¼QMCEQMTN\x1FAGMOQDA\x1FB1\x1Ersan`ps.\x1FagmoQda/" - !!tagaaa name: end ...
It would be great to have the tool(s) available for download.
I'm getting a double free error. Debugging the code I got to line 198 of fy-doc.c:
if (data_copy)
fyi = fy_input_from_malloc_data((void *)text, len, &handle, true);
I suppose that should read:
if (data_copy)
fyi = fy_input_from_malloc_data((void *)data_copy, len, &handle, true);
Hi @pantoniou :
I ran some "stress tests" on the latest comment-parsing code and found some edge cases (perhaps you are already familiar with them):
Parsing this block:
myMap:
  e1: 10.0 # Right comment for e1 value
myMap2:
  # Top comment for e2
  e2: 10.0
myMap3:
  # Top comment for e3
  e3: 10.0 # right comment for e3 value
# Top comment for myMap4
myMap4:
  ~
# Top comment for myMap5
myMap5:
  # top comment for a4
  a4: 1
generates these events:
>> Event: Stream start
>> Event: Doc start
>> Event: MAP START
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap
>> Event: MAP START
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: e1
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: 10.0
>> token: 0x55a48705bcd0 comment [1]: 'Right comment for e1 value'
>> Event: MAP END
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap2
>> Event: MAP START
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: e2
>> token: 0x55a486fbd8d0 comment [0]: 'Top comment for e2'
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: 10.0
>> Event: MAP END
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap3
>> Event: MAP START
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: e3
>> token: 0x55a486fbdde0 comment [0]: 'Top comment for e3'
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: 10.0
>> token: 0x55a48705bcd0 comment [1]: 'right comment for e3 value'
>> Event: MAP END
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap4
>> token: 0x55a486fbdde0 comment [0]: 'Top comment for myMap4'
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: ~
>> token: 0x55a486fbd8d0 comment [1]: 'Top comment for myMap5'
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap5
>> Event: MAP START
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: a4
>> token: 0x55a48705bcd0 comment [0]: 'top comment for a4'
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: 1
>> Event: MAP END
>> Event: MAP END
Note how "Top comment for myMap5" is reported as a right-hand comment of the scalar ~, instead of as the top comment of the next item, the map key "myMap5".
Hope this test case helps in finding the issue...
First off, dejson is a bad name for this.
How about yamlfmt, yfmt, tidy or something?
JSON has nothing to do with this.
So here are some changes to the default emitter format I'd like to see...
$ echo $'foo:\n- bar' | fy-tool -m yamlfmt -
foo:
- bar
$ echo $'foo:\n- "bar\\nbaz\\nboom\\n"' | fy-tool -m yamlfmt -
---
foo:
- |
  bar
  baz
  boom
Do those for starters.
In general, @perlpunk's yamlpp-load-dump --preserve 0 does a good job for this kind of formatting.
You can install it with cpanm YAML::PP.
Hello,
I seem to have run into an issue when installing this library from scratch onto a remote Linux box (Ubuntu 18.04). When I build the software, I eventually get to the screen where the major/minor/etc. version information is displayed. I am receiving "UNKNOWN" for all version fields, which carries over into the library itself. Inspecting the /usr/local/lib folder, I see the same effect in all files associated with this software, i.e. libfyaml-UNKNOWN.UNKNOWN.la/so/etc.
Do you have any idea why this may be? I installed this same software only one month ago and didn't experience the same issue. For reference, I downloaded the source code zip from release 0.5.4 and have installed all prerequisite software as per the README.
Do let me know if you need further information, thank you.
Best regards,
Adam
Given
foo: &foo 5
bar: *foo
libfyaml/src/fy-tool -m json foo.yaml
yields:
{
"foo": 5,
"bar": "foo"
}
I would expect an error message and nonzero exit code when attempting to emit JSON corresponding to any input YAML that contains unresolved aliases.
(Note that alias resolution when emitting JSON appears to work just fine.)
The v1.2 spec states that "true", "false", "on", "off" etc. should be interpreted as their boolean value counterparts.
As this library claims to support YAML 1.2, how is that handled, or do I just have to strcmp regardless?
Thanks
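For context on the question above: the YAML 1.2 core schema lists only true/True/TRUE and false/False/FALSE as booleans; on/off and yes/no come from the YAML 1.1 type set. If scalars are handed back as plain text, a comparison helper along the following lines is one option. This is a minimal sketch, and yaml12_core_bool is a hypothetical name, not a libfyaml function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libfyaml).
 * Classifies a length-delimited scalar under the YAML 1.2 core schema:
 * returns 1 for true, 0 for false, -1 if it is not a core-schema boolean. */
int yaml12_core_bool(const char *s, size_t len)
{
    static const char *const true_words[]  = { "true", "True", "TRUE" };
    static const char *const false_words[] = { "false", "False", "FALSE" };
    size_t i;

    for (i = 0; i < 3; i++) {
        if (len == strlen(true_words[i]) && memcmp(s, true_words[i], len) == 0)
            return 1;
        if (len == strlen(false_words[i]) && memcmp(s, false_words[i], len) == 0)
            return 0;
    }
    return -1; /* "on", "off", "yes", "no" and everything else land here */
}
```

Taking an explicit length avoids relying on the scalar being NUL-terminated.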
Calling build_from_file, emitting to a string, calling build_from_string on that string, and then emitting to a file produces a file that is not the same as the original:
fyd = fy_document_build_from_file(NULL, "key.yaml");
char *out = NULL, *in = NULL;
out = fy_emit_document_to_string(fyd, FYECF_DEFAULT);
if (out) {
printf("out[\n%s\n] line[%d]\n", out, __LINE__);
}
fy_document_destroy(fyd); fyd=NULL;
in = out;
printf("in[\n%s\n] len[%ld] line[%d]\n", in, strlen(in), __LINE__);
fyd = fy_document_build_from_string(NULL, in, FY_NT); // BUG, garbled output
//fyd = fy_document_create(NULL);
//fyn = fy_node_build_from_string(fyd, in, strlen(in)); // BUG, garbled output
free(out); out=NULL;
fy_emit_document_to_file(fyd, FYECF_DEFAULT, "key.yaml.tmp2");
Hi @pantoniou ,
Just a kind note: please check this section on why /debian shouldn't appear in released tarballs...
Failed to build on Windows using MinGW.
For example,
I think something useful to the DeviceTree community would be a fy_parse_cfg_flags option that supports handling tabs. One of the limitations of the DT binding docs in YAML is that, with existing parsers, one cannot leave the usual tabs in the example usage (typically copied from a kernel .dts file with spaces). Something like FYPCF_ENABLE_TABS could enable digestion of tabs for this case.
I have the following content:
- Aa,
  Bb, C,
  D
- Eee,
  F, Gg,
  E
fy-dump -m flow-oneline test.yaml
gives
[Aa, Bb, C, D, Eee, F, Gg, E]
where only the coloring of the commas indicates that the list contains 2 string entries, not 8. When using the library directly to convert to oneline flow style,
void to_flow(char **flow, int *length_flow, const char *mixed)
{
    struct fy_document *fyd = NULL;
    enum fy_emitter_cfg_flags emit_flags = FYECF_MODE_FLOW_ONELINE | FYECF_STRIP_LABELS |
                                           FYECF_STRIP_TAGS | FYECF_STRIP_DOC;

    *flow = NULL;
    *length_flow = -1; /* stays -1 on any failure */

    fyd = fy_document_build_from_string(NULL, mixed, FY_NT);
    if (!fyd)
        return;

    /* emit only if anchors and merge keys resolved cleanly */
    if (!fy_document_resolve(fyd)) {
        *flow = fy_emit_document_to_string(fyd, emit_flags);
        if (*flow) /* guard against strlen(NULL) when emitting fails */
            *length_flow = (int)strlen(*flow);
    }

    fy_document_destroy(fyd); /* runs on the error path too, so nothing leaks */
}
the coloring is lost and one gets a list with 8 entries. Is it possible to automatically add quotes to multiline strings?
Hi,
I also found a UAF bug and a dynamic-stack-buffer-overflow bug while running experiments for AFLAPI.
Environment: Ubuntu 20.04 + gcc 9.4.0
Harness (attached: file named as "test_fy_document_build_from_file.c"):
#include <libfyaml.h>
#include <stdio.h>
int main(int argc, char** argv) {
    if(argc != 2) return 0;
    struct fy_document *fyd = NULL;
    fyd = fy_document_build_from_file(NULL, argv[1]);
    if (!fyd) {
        fprintf(stderr, "failed to build document");
        goto failed;
    }
failed:
    fy_document_destroy(fyd);
    return 0;
}
Poc:
Poc2.zip
To reproduce:
• Compile the whole project with ASAN:
CFLAGS="-fsanitize=address -g" ./bootstrap.sh
CFLAGS="-fsanitize=address -g" ./configure
make && sudo make install
• Compile the harness with ASAN:
gcc -fsanitize=address -o test_fy_document_build_from_file test_fy_document_build_from_file.c -lfyaml
• Run harness:
./test_fy_document_build_from_file ./UAF.yaml # to reproduce the UAF
./test_fy_document_build_from_file ./dynamic-stack-buffer-overflow.yaml # to reproduce the dynamic stack buffer overflow
About UAF, ASAN says:
UAF.yaml:3:18: error: cannot use tab for indentation of block entry
? a complex key
:
^~~~~~~
=================================================================
==1614640==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000000080 at pc 0x7f0adbbea7be bp 0x7ffed9c79870 sp 0x7ffed9c79860
READ of size 8 at 0x606000000080 thread T0
#0 0x7f0adbbea7bd in list_del lib/fy-list.h:120
#1 0x7f0adbbea7bd in fy_simple_key_list_del lib/fy-parse.h:79
#2 0x7f0adbbea7bd in fy_simple_key_list_pop lib/fy-parse.h:79
#3 0x7f0adbbea7bd in fy_simple_key_vacuum_internal lib/fy-types.c:31
#4 0x7f0adbba6c75 in fy_parse_cleanup lib/fy-parse.c:842
#5 0x7f0adbc2fd0e in fy_document_build_internal lib/fy-doc.c:3287
#6 0x7f0adbc3030c in fy_document_build_from_file lib/fy-doc.c:3320
#7 0x55bb3c28628b in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x128b)
#8 0x7f0adb9b2082 in __libc_start_main ../csu/libc-start.c:308
#9 0x55bb3c28616d in _start (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x116d)
0x606000000080 is located 0 bytes inside of 64-byte region [0x606000000080,0x6060000000c0)
freed by thread T0 here:
#0 0x7f0adbdcf40f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
#1 0x7f0adbbea782 in fy_simple_key_vacuum_internal lib/fy-types.c:31
#2 0x60600000007f (<unknown module>)
previously allocated by thread T0 here:
#0 0x7f0adbdcf808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
#1 0x7f0adbbea5f1 in fy_simple_key_alloc_simple_internal lib/fy-types.c:31
#2 0x7f0adbbea5f1 in fy_simple_key_alloc_simple_internal lib/fy-types.c:31
SUMMARY: AddressSanitizer: heap-use-after-free lib/fy-list.h:120 in list_del
Shadow bytes around the buggy address:
0x0c0c7fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c0c7fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c0c7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c0c7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c0c7fff8000: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa
=>0x0c0c7fff8010:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
0x0c0c7fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0c7fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0c7fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0c7fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c0c7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1614640==ABORTING
About dynamic stack buffer overflow, ASAN says:
dynamic-stack-buffer-overflow.yaml:2:255: error: plain scalar is malformed UTF8
^
~~~~~~~~~~~~~~~~
=================================================================
==1614738==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffdf6134cd5 at pc 0x7f73f2f75f3d bp 0x7ffdf6134ad0 sp 0x7ffdf6134278
WRITE of size 1793 at 0x7ffdf6134cd5 thread T0
#0 0x7f73f2f75f3c in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762
#1 0x7f73f2e3a93e in memset /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
#2 0x7f73f2e3a93e in fy_diag_error_atom_display lib/fy-diag.c:789
#3 0x7f73f2e3c236 in fy_diag_error_token_display lib/fy-diag.c:806
#4 0x7f73f2e3c236 in fy_diag_error_token_display lib/fy-diag.c:801
#5 0x7f73f2e3c236 in fy_diag_vreport lib/fy-diag.c:854
#6 0x7f73f2e3ed3f in fy_reader_diag_report lib/fy-diag.c:1243
#7 0x7f73f2e1ec4f in fy_reader_fetch_plain_scalar_handle lib/fy-parse.c:4261
#8 0x7f73f2e261c3 in fy_fetch_plain_scalar lib/fy-parse.c:4707
#9 0x7f73f2e2827f in fy_fetch_tokens lib/fy-parse.c:5022
#10 0x7f73f2e2a057 in fy_scan_peek lib/fy-parse.c:5093
#11 0x7f73f2e2a057 in fy_scan_peek lib/fy-parse.c:5038
#12 0x7f73f2e2f6ac in fy_parse_internal lib/fy-parse.c:5989
#13 0x7f73f2e84bff in fy_document_builder_load_document lib/fy-docbuilder.c:529
#14 0x7f73f2e7b6be in fy_parse_load_document_with_builder lib/fy-doc.c:1940
#15 0x7f73f2e7bacd in fy_document_build_internal lib/fy-doc.c:3242
#16 0x7f73f2e7c30c in fy_document_build_from_file lib/fy-doc.c:3320
#17 0x55698ebce28b in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x128b)
#18 0x7f73f2bfe082 in __libc_start_main ../csu/libc-start.c:308
#19 0x55698ebce16d in _start (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x116d)
Address 0x7ffdf6134cd5 is located in stack of thread T0
SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762 in __interceptor_memset
Shadow bytes around the buggy address:
0x10003ec1e940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003ec1e950: 00 00 00 00 00 00 00 00 00 00 00 00 ca ca ca ca
0x10003ec1e960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003ec1e970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003ec1e980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10003ec1e990: 00 00 00 00 00 00 00 00 00 00[05]cb cb cb cb cb
0x10003ec1e9a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10003ec1e9b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
0x10003ec1e9c0: f1 f1 04 f2 00 00 00 00 00 00 00 00 00 00 00 f2
0x10003ec1e9d0: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
0x10003ec1e9e0: 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1614738==ABORTING
FAIL: testsuite.test 1 [A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9] -
FAIL: jsontestsuite.test 1 - y_*.json
Source: https://gitlab.alpinelinux.org/oxr463/aports/-/jobs/117003#L707
I'm getting a bad symbol error on HPE NonStop when running bootstrap.sh:
configure.ac:99: error: possibly undefined macro: AC_LTDL_ENABLE_INSTALL
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
I am running autoconf 2.69.
At present, there is only one API document, which is difficult to read: no collapsing, no functional navigation.
Could you document each family of API functions separately, each on its own page?
See: https://gitlab.com/oxr463/overlay#installation
ACCEPT_KEYWORDS="~amd64" emerge -nq dev-libs/fyaml
ERROR: testemitter-streaming.test - missing test plan
ERROR: testemitter-streaming.test - exited with status 127 (command not found?)
============================================================================
Testsuite summary for libfyaml 0.7.12
============================================================================
# TOTAL: 1490
# PASS: 1485
# SKIP: 2
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 2
============================================================================
See test/test-suite.log
Please report to [email protected]
============================================================================
FAIL: testerrors
================
1..10
errmsg: :2:1: error: duplicate key
ok 1 0002 - Duplicate key (plain scalar)
PASS: testerrors.test 1 0002 - Duplicate key (plain scalar)
errmsg: :2:2: error: duplicate key
ok 2 0003 - Duplicate key (plain scalar, quoted scalar)
PASS: testerrors.test 2 0003 - Duplicate key (plain scalar, quoted scalar)
errmsg: :3:1: error: duplicate key
ok 3 0004 - Duplicate key (plain scalar, literal scalar)
PASS: testerrors.test 3 0004 - Duplicate key (plain scalar, literal scalar)
errmsg: :2:1: error: duplicate key
ok 4 0005 - Duplicate key (sequence)
PASS: testerrors.test 4 0005 - Duplicate key (sequence)
errmsg: :2:1: error: duplicate key
Segmentation fault
ok 5 0006 - Duplicate key (simple mapping)
PASS: testerrors.test 5 0006 - Duplicate key (simple mapping)
errmsg:
--- ./test-errors/0007//test.error 2019-12-18 23:13:37.000000000 -0800
+++ /tmp/tmp.TKUbWAy0 2022-11-02 23:53:05.336522000 -0700
@@ -1 +1 @@
-:2:1: error: duplicate key
+
not ok 6 0007 - Duplicate key (complex sorted mapping)
FAIL: testerrors.test 6 0007 - Duplicate key (complex sorted mapping)
errmsg: :1:5: error: invalid alias
ok 7 0008 - Unknown alias
PASS: testerrors.test 7 0008 - Unknown alias
errmsg: :2:9: error: invalid merge key value
ok 8 0009 - Invalid merge key (referencing not a mapping)
PASS: testerrors.test 8 0009 - Invalid merge key (referencing not a mapping)
errmsg: :1:8: error: invalid merge key value
ok 9 0010 - Invalid merge key (not an alias, scalar)
PASS: testerrors.test 9 0010 - Invalid merge key (not an alias, scalar)
errmsg: :2:8: error: invalid merge key value
ok 10 0011 - Invalid merge key (not an alias sequence item)
PASS: testerrors.test 10 0011 - Invalid merge key (not an alias sequence item)
Version: 0.7.12
OS: FreeBSD 13.1
FreeBSD doesn't have the alloca.h header.
It would be great to have this awesome project ported to CMake, for easier integration with other tools, and probably as a first step to solve #10 too.
My initial attempt (working for the library itself for now) is here, for your reference: https://github.com/MRPT/libfyaml/commit/75addc8ec95a55eeb9339519d28f85887b748bc2
... because bash is installed into different locations on different systems.
FreeBSD 13.1 STABLE
https://github.com/pantoniou/libfyaml/blob/cd04f85/src/lib/fy-parse.c#L40
error(compilation): clang failed with stderr: ./libfyaml/src/lib/fy-parse.c:40:2: error: invalid preprocessing directive
Is there a way to stream output? I attempted to do so by creating an fy_event manually, but there appears to be no way to create tokens.
See YAML 1.2 Section 5.4. Line Break Characters. It mentions this is specifically for JSON compatibility so it doesn't really matter here, but I think fy_is_yaml12_lb should be adjusted.
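For reference, YAML 1.2 treats only LF and CR as line breaks. NEL (x85), LS (x2028) and PS (x2029) were line breaks in YAML 1.1 but are ordinary non-break characters in 1.2, which is the JSON-compatibility point the section makes. The sketch below shows the two rule sets side by side. The function names are hypothetical, and it makes no claim about what fy_is_yaml12_lb currently does.

```c
#include <stdbool.h>

/* Hypothetical predicates over Unicode code points (not libfyaml code). */

/* YAML 1.2, section 5.4: only line feed and carriage return break a line. */
bool is_lb_yaml12(int c)
{
    return c == '\n' || c == '\r';
}

/* YAML 1.1 additionally accepted NEL, LS and PS as line breaks. */
bool is_lb_yaml11(int c)
{
    return is_lb_yaml12(c) || c == 0x85 || c == 0x2028 || c == 0x2029;
}
```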
There seem to be a number of issues with the JSON output provided by fy-tool, e.g. given an input file x.yml of:
foo:
bar: a\s+b
a simple dump command of src/fy-tool -m json x.yml yields
{
"foo": ,
"bar": "a\s+b"
}
...which isn't JSON for two reasons:
• foo should be a null literal
• bar's string isn't escaped properly.
When creating a scalar node that starts with &, libfyaml does not quote or escape the string value. When subsequently parsed, the value is mistaken for an anchor. See attached test case (change extension from .txt to .cpp).
The relevant bits of the test case:
...
fy_node* val1 = fy_node_create_scalar(fydoc, "&Hello \"Value1\"", -1);
...
fy_emit_document_to_file(fydoc, flags, yaml_file);
...
fy_document_build_from_file(&fypcfg, yaml_file);
...
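const char* nodeval = fy_node_get_scalar(val1, &nodeval_len);
/* compare length and bytes: == on char pointers only compares addresses */
const char* expected = "&Hello \"Value1\"";
puts(nodeval && nodeval_len == strlen(expected) && memcmp(nodeval, expected, nodeval_len) == 0 ? "PASS" : "FAIL");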
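To make the failure mode concrete: a value cannot be emitted as a plain (unquoted) scalar when its first character is a YAML indicator such as & (anchor), * (alias) or ! (tag). The check below is a deliberately conservative sketch of that rule. starts_with_yaml_indicator is a hypothetical helper, not a libfyaml function, and it flags a few values (for example "-x") that the specification would allow as plain scalars.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libfyaml).
 * Returns 1 if the scalar is empty or begins with a YAML indicator character,
 * i.e. if emitting it unquoted could change its meaning; 0 otherwise. */
int starts_with_yaml_indicator(const char *s, size_t len)
{
    static const char indicators[] = "-?:,[]{}#&*!|>'\"%@`";

    if (len == 0)
        return 1; /* an empty plain scalar would read back as null */
    return memchr(indicators, s[0], sizeof(indicators) - 1) != NULL;
}
```

An emitter that applied a check like this would write the test value as a quoted scalar, so it would parse back as a string rather than as an anchor.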
I'm trying to build libfyaml 0.2 on Alpine Linux 3.10. It seems they are not using glibc, so qsort_r is missing:
lib/fy-doc.c: In function 'fy_node_mapping_perform_sort':
lib/fy-doc.c:3608:2: warning: implicit declaration of function 'qsort_r'; did you mean 'qsort'? [-Wimplicit-function-declaration]
qsort_r(fynpp, count, sizeof(*fynpp), fy_node_mapping_sort_cmp, &ctx);
^~~~~~~
qsort
CC lib/fy-emit.o
CC lib/fy-utils.o
LINK libfyaml-0.2.la
ar: `u' modifier ignored since `D' is the default (see `U')
CC tool/fy_tool-fy-tool.o
LINK fy-tool
/usr/lib/gcc/x86_64-alpine-linux-musl/8.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: ./.libs/libfyaml-0.2.so: undefined reference to `qsort_r'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:641: fy-tool] Error 1
make[1]: *** [Makefile:575: all-recursive] Error 1
make: *** [Makefile:474: all] Error 2
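Where qsort_r is unavailable, one common workaround is to route the extra context argument through thread-local storage and call plain qsort. The following is a minimal sketch of that idea, assuming a C11 compiler for _Thread_local. sort_with_arg and cmp_int_dir are hypothetical names, and this is not the fix the library adopted.

```c
#include <stddef.h>
#include <stdlib.h>

/* Comparator signature with a trailing context argument, as in glibc's qsort_r. */
typedef int (*cmp_with_arg_fn)(const void *a, const void *b, void *arg);

/* Per-thread slots that carry the real comparator and its context into qsort(). */
static _Thread_local cmp_with_arg_fn tls_cmp;
static _Thread_local void *tls_arg;

static int trampoline(const void *a, const void *b)
{
    return tls_cmp(a, b, tls_arg);
}

/* Hypothetical stand-in for qsort_r built on plain qsort(). */
void sort_with_arg(void *base, size_t nmemb, size_t size,
                   cmp_with_arg_fn cmp, void *arg)
{
    /* save and restore the slots so a comparator may itself sort */
    cmp_with_arg_fn saved_cmp = tls_cmp;
    void *saved_arg = tls_arg;

    tls_cmp = cmp;
    tls_arg = arg;
    qsort(base, nmemb, size, trampoline);
    tls_cmp = saved_cmp;
    tls_arg = saved_arg;
}

/* Example comparator: arg points to an int; non-zero means descending order. */
int cmp_int_dir(const void *a, const void *b, void *arg)
{
    int x = *(const int *)a, y = *(const int *)b;
    int r = (x > y) - (x < y);

    return *(const int *)arg ? -r : r;
}
```

A configure-time check for qsort_r could select between the native function and a fallback of this kind.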
$ make check V=1 CFLAGS='-O2 -g -pipe -fsanitize=address,undefined -fsanitize-address-use-after-scope -fno-sanitize-recover=all -fno-omit-frame-pointer -fno-common'
...
# TOTAL: 996
# PASS: 650
# SKIP: 0
# XFAIL: 0
# FAIL: 346
# XPASS: 0
# ERROR: 0
In test/test-suite.log, it seems all the failures arise from the following 2 errors:
xxhash/xxhash.c:227:52: runtime error: member access within misaligned address 0x7f5615565007 for type 'struct U32_S', which requires 4 byte alignment
lib/fy-emit.c:1685:47: runtime error: left shift of 3 by 30 places cannot be represented in type 'int'
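Both diagnostics have well-known generic remedies: read unaligned words through memcpy instead of a cast pointer, and shift an unsigned operand so that bit 31 can be set without overflowing int. The sketch below illustrates the two patterns in isolation. The function names are hypothetical, and the code is not taken from xxhash.c or fy-emit.c.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address with no alignment guarantee.
 * memcpy is defined for any alignment; compilers typically turn it
 * into a single load where the target allows it. */
uint32_t read_u32_unaligned(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* Builds a two-bit mask at the given position (shift must be at most 30).
 * With a signed int operand, 3 << 30 overflows; uint32_t avoids that. */
uint32_t two_bit_mask(unsigned shift)
{
    return (uint32_t)3 << shift;
}
```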
Hi,
I am running some experiments for AFLAPI (fuzzing) and it has found a buffer overflow (to be exact, an out-of-bounds access, as debugging showed) in fy_atom_raw_line_iter_next. This bug seems to be harmless because it happens when inserting with an invalid character in the alias (fy_diag_error_atom_display --> fy_atom_raw_line_iter_next --> fy_utf8_get_right).
Environment: Ubuntu 20.04 + gcc 9.4.0
I debugged it a few hours ago but could not find what really causes this bug. I did find that the out-of-bounds access happens at fy-utf8.h, line 93, so this bug seems to be harmless.
Harness (attached: file named as "test_fy_document_insert_at.c"):
#include <libfyaml.h>
#include <stdio.h>
int main(int argc, char** argv) {
    struct fy_document *fyd = NULL;
    int rc = -1; /* set up front so the early goto returns a defined value */
    fyd = fy_document_build_from_file(NULL, "test1.yaml");
    if (!fyd) {
        fprintf(stderr, "failed to build document");
        goto failed;
    }
    char key[12] = {0x26, 0x2b, 0x74, 0x68, 0x65, 0x62, 0x65, 0x86, 0x6e, 0x67, 0x77, 0x00}; // here is the poc (len: 0xc, but access position 0xd?)
    rc = fy_document_insert_at(fyd, key, FY_NT, fy_node_buildf(fyd, "abc"));
    if (rc) {
        fprintf(stderr, "failed to insert into document\n");
        goto failed;
    }
    rc = fy_emit_document_to_fp(fyd, FYECF_DEFAULT | FYECF_SORT_KEYS, stdout);
    if (rc) {
        fprintf(stderr, "failed to emit document to stdout\n");
        goto failed;
    }
failed:
    fy_document_destroy(fyd);
    return rc;
}
The test1.yaml (attached):
base: &base
name: this-is-a-name
Poc:
Poc.zip
To reproduce:
• Compile the whole project with ASAN:
CFLAGS="-fsanitize=address -g" ./bootstrap.sh
CFLAGS="-fsanitize=address -g" ./configure
make && sudo make install
• Compile the harness with ASAN:
gcc -fsanitize=address -o test_fy_document_insert_at test_fy_document_insert_at.c -lfyaml
• Run harness:
./test_fy_document_insert_at
ASAN says:
<memory-@0x7ffd52519e40-0x7ffd52519e4a>:1:14: error: invalid character in anchor
=================================================================
==1614159==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd52519e4c at pc 0x7f0b71a150ce bp 0x7ffd52517ba0 sp 0x7ffd52517b90
READ of size 1 at 0x7ffd52519e4c thread T0
#0 0x7f0b71a150cd in fy_utf8_get_right lib/fy-utf8.h:93
#1 0x7f0b71a150cd in fy_atom_raw_line_iter_next lib/fy-atom.c:1732
#2 0x7f0b71a04339 in fy_diag_error_atom_display lib/fy-diag.c:661
#3 0x7f0b71a06236 in fy_diag_error_token_display lib/fy-diag.c:806
#4 0x7f0b71a06236 in fy_diag_error_token_display lib/fy-diag.c:801
#5 0x7f0b71a06236 in fy_diag_vreport lib/fy-diag.c:854
#6 0x7f0b71a06eee in fy_parser_diag_vreport lib/fy-diag.c:963
#7 0x7f0b71a0706f in fy_parser_diag_report lib/fy-diag.c:976
#8 0x7f0b719d7a6d in fy_fetch_anchor_or_alias lib/fy-parse.c:2894
#9 0x7f0b719f1c77 in fy_fetch_tokens lib/fy-parse.c:4976
#10 0x7f0b719f4057 in fy_scan_peek lib/fy-parse.c:5093
#11 0x7f0b719f4057 in fy_scan_peek lib/fy-parse.c:5038
#12 0x7f0b719f7144 in fy_parse_internal lib/fy-parse.c:5524
#13 0x7f0b71a4ebff in fy_document_builder_load_document lib/fy-docbuilder.c:529
#14 0x7f0b71a456be in fy_parse_load_document_with_builder lib/fy-doc.c:1940
#15 0x7f0b71a45acd in fy_document_build_internal lib/fy-doc.c:3242
#16 0x7f0b71a45e9f in fy_document_build_from_string lib/fy-doc.c:3299
#17 0x7f0b71a4603b in fy_node_mapping_lookup_pair_by_string lib/fy-doc.c:3793
#18 0x7f0b71a4603b in fy_node_mapping_lookup_pair_by_string lib/fy-doc.c:3784
#19 0x7f0b71a4609c in fy_node_mapping_lookup_by_string lib/fy-doc.c:3810
#20 0x7f0b71a434a4 in fy_node_by_path_internal lib/fy-doc.c:4202
#21 0x7f0b71a47171 in fy_node_by_path lib/fy-doc.c:4467
#22 0x7f0b71a47386 in fy_document_insert_at lib/fy-doc.c:2484
#23 0x55bc66b654a7 in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_insert_at+0x14a7)
#24 0x7f0b717c8082 in __libc_start_main ../csu/libc-start.c:308
#25 0x55bc66b6522d in _start (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_insert_at+0x122d)
Address 0x7ffd52519e4c is located in stack of thread T0 at offset 44 in frame
#0 0x55bc66b652f8 in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_insert_at+0x12f8)
This frame has 1 object(s):
[32, 44) 'key' (line 14) <== Memory access at offset 44 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow lib/fy-utf8.h:93 in fy_utf8_get_right
Shadow bytes around the buggy address:
0x10002a49b370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002a49b380: 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 04 f2 00 f2
0x10002a49b390: f2 f2 04 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
0x10002a49b3a0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f3 f3
0x10002a49b3b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10002a49b3c0: 00 00 00 00 f1 f1 f1 f1 00[04]f3 f3 00 00 00 00
0x10002a49b3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002a49b3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002a49b3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002a49b400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10002a49b410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1614159==ABORTING
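The trace shows a one-byte read past the end of the 12-byte key while the diagnostic code walks UTF-8 sequences. As a generic illustration of how such a walk can stay in bounds, the decoder below takes the number of bytes remaining and refuses to read beyond it. It is a sketch under that assumption, with a hypothetical name, and is not the code at fy-utf8.h:93. It also does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Hypothetical bounded decoder (not libfyaml code).
 * Decodes one UTF-8 sequence from at most `left` bytes at p.
 * Returns the code point and stores the sequence width in *w,
 * or returns -1 with *w == 0 on invalid or truncated input. */
int utf8_decode_bounded(const unsigned char *p, size_t left, int *w)
{
    int width, c, i;

    *w = 0;
    if (left == 0)
        return -1;
    if (p[0] < 0x80) {                 /* single-byte (ASCII) */
        *w = 1;
        return p[0];
    }
    if ((p[0] & 0xE0) == 0xC0) {
        width = 2; c = p[0] & 0x1F;
    } else if ((p[0] & 0xF0) == 0xE0) {
        width = 3; c = p[0] & 0x0F;
    } else if ((p[0] & 0xF8) == 0xF0) {
        width = 4; c = p[0] & 0x07;
    } else {
        return -1;                     /* stray continuation byte (e.g. 0x86) or bad lead */
    }
    if ((size_t)width > left)
        return -1;                     /* sequence would run past the buffer */
    for (i = 1; i < width; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return -1;                 /* not a continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    *w = width;
    return c;
}
```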
We found a case where libfyaml does not adhere to the YAML specification. We're storing IP addresses in a sequence, and one of the IP addresses is "::". That is a valid IPv6 address. A very short YAML file that's valid according to the specification is
---
- ::
...
libfyaml tries to parse the :: as a mapping. It should only be a mapping if there's a space after the last colon. libfyaml is in good company making this mistake. I've tried libyaml, Ruby's YAML parser, and a number of online validators. Only one got it right:
Here's the YAML file:
someversion: 1.2.9
core:
name: some name of something
It will load up to "some" and then cut off.
In https://github.com/pantoniou/libfyaml#usage-and-examples, there is
count = fy_document_scanf(fyd, "/invoice %u " "/bill-to/given %256s", &invoice_nr, given);
The user has to hardcode the key in the _scanf() call. Is there a way to do something like this?
count = fy_document_scanf(fyd, "%s %u " "%s %256s", key1, &invoice_nr, key2, given);
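One way to avoid hardcoding the keys, assuming the format string does not have to be a literal, is to assemble it at run time with snprintf and pass the result to fy_document_scanf. The sketch below builds only the format string. build_scanf_format is a hypothetical helper, and keys containing % are rejected because they would be read as conversions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libfyaml).
 * Writes "<key1> %u <key2> %256s" into buf.
 * Returns 0 on success, -1 if a key contains '%' or the result does not fit. */
int build_scanf_format(char *buf, size_t size, const char *key1, const char *key2)
{
    int n;

    if (strchr(key1, '%') || strchr(key2, '%'))
        return -1;
    /* "%%" emits a literal '%', so the conversions survive into the output */
    n = snprintf(buf, size, "%s %%u %s %%256s", key1, key2);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting buffer would then take the place of the literal, as in fy_document_scanf(fyd, fmt, &invoice_nr, given), with given sized for 256 characters plus the terminating NUL.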
In the doc page of fy_node_scanf(), https://pantoniou.github.io/libfyaml/libfyaml.html#fy-node-scanf
fyn = { foo: 3 } -> fy_node_scanf(fyn, "/foo %d", &var) -> var = 3
Again, the user has to hardcode the key, e.g. "/foo", in the _scanf() call. Since fyn has been located already, is there a way to do something like this?
fyn = { foo: 3 } -> fy_node_scanf(fyn, "%d", &var) -> var = 3
There is also the fy_node_get_scalar() API, which returns a char*. Are there APIs to get numeric values?
Thank you very much for the insight.
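If only the text form of a scalar is available, the conversion can be done on the caller's side. The helper below is a minimal sketch that takes a pointer and a length (the shape fy_node_get_scalar hands back), copies the bytes into a NUL-terminated buffer and parses them with strtol. scalar_to_long is a hypothetical name, not a libfyaml function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libfyaml).
 * Parses a length-delimited decimal scalar into *out.
 * Returns 0 on success, -1 on empty, oversized, non-numeric or out-of-range input. */
int scalar_to_long(const char *text, size_t len, long *out)
{
    char buf[32];
    char *end;
    long v;

    if (!text || len == 0 || len >= sizeof(buf))
        return -1;
    memcpy(buf, text, len);   /* the scalar may not be NUL-terminated */
    buf[len] = '\0';

    errno = 0;
    v = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || *end != '\0')
        return -1;            /* overflow, no digits, or trailing junk */
    *out = v;
    return 0;
}
```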
I am using the stdin based input for reading a sequence of documents.
These documents are produced in another program that pipes the result to the next program.
My problem right now is that the output on stdout is flushed without problem, but the receiving program does too good a job of buffering the input: I pass several tiny documents before the first one is even recognized by the parser.
The question is, why is the buffering done at all? The stdio functions already buffer and offer standard ways to tweak it, so in my opinion fyaml doesn't need to add another layer of buffering. For my purposes I removed the buffering and it worked fine.
When running the test suite in parallel, the two tests testsuite.test and jsontestsuite.test may be executed before their dependencies test-suite-data and json-test-suite-data are finished, as the download / cloning of the git repository takes a considerable amount of time. In this case, those two tests fail due to missing test data.
You may be able to reproduce this issue by running make check with the -j flag, e.g. env TESTS="jsontestsuite.test" make -e -j16 check. If the test is executed too fast / before the git clone(s) of test-suite-data and json-test-suite-data are finished, the test log file will include errors like:
[ERR]: failed to open json-test-suite-data/test_parsing/n_*.json
[ERR]: failed to open json-test-suite-data/test_parsing/i_*.json
It may be a solution to clone the two repos during initial build in e.g. bootstrap.sh - or, if it's desired to only download those files when tests are actually executed, inside the actual tests.
I tried using the fy_document_scanf() function to read a scalar value from an array and it looks like it doesn't work. This test code:
#include <stdio.h>
#include <stdlib.h>
#include <libfyaml.h>
int main(int argc, char *argv[])
{
    static const char *yaml =
        "---\n"
        " - aaa\n"
        " - bbb\n"
        " - ccc\n";
    static const char *json = "{\"bla\":[\"AAA\",\"BBB\",\"CCC\"]}";
    struct fy_document *fyd1, *fyd2;
    char sout[11]; /* %10s stores up to 10 characters plus the terminating NUL */
    if (!(fyd1=fy_document_build_from_string(NULL, yaml, (size_t)-1))) return EXIT_FAILURE;
    if (!(fyd2=fy_document_build_from_string(NULL, json, (size_t)-1))) return EXIT_FAILURE;
    if (fy_document_scanf(fyd1, "/[1] %10s", sout)!=1) return EXIT_FAILURE;
    printf("sout.yaml='%s'\n", sout);
    if (fy_document_scanf(fyd2, "/bla/[-2] %10s", sout)!=1) return EXIT_FAILURE;
    printf("sout.json='%s'\n", sout);
    return EXIT_SUCCESS;
}
I looked in the library and after a small intervention it looks better, I can also read the value from a document containing only one scalar (path /):
--- src/lib/fy-doc.c.orig 2022-06-11 19:57:44.000000000 +0100
+++ src/lib/fy-doc.c 2022-06-12 07:51:53.000000000 +0100
@@ -4070,9 +4070,10 @@
__func__, __LINE__, fy_node_get_path(fyn), (int)(e - s), s); */
fyn = fy_node_follow_aliases(fyn, flags, true);
- /* scalar can't match (it has no key) */
+ /* scalar can be only last element in the path (it has no key) */
if (fy_node_is_scalar(fyn)) {
- fyn = NULL;
+ if (*s)
+ fyn = NULL; /* not end of the path - fail */
goto out;
}
Please, is there a documented search path format?
"it's testsuite" should be "its testsuite".
Please add the ability to specify an indent specifically for array elements in the emitter. The current output is this:
key:
- a: 1
- b: 2
Many prefer the readability of arrays like this:
key:
  - a: 1
  - b: 2
I'm trying to get the emitter to output an array of documents in full block style. This is what the library outputs in block mode:
key: [
a: 1,
b: 2]
and this is what I'm trying to emit as output:
key:
- a: 1
- b: 2
Is this possible?
fy_event.type is never set by the event create method.
libfyaml includes alloca.h, which doesn't exist on FreeBSD. Replacing alloca.h by stdlib.h seems to solve the issue. This also works on all other systems I tested (see JuliaPackaging/Yggdrasil#4306).
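An alternative to replacing the include outright would be to select the header per platform. This is only a sketch, assuming the __FreeBSD__ predefined macro is a good enough test; stack_copy_length is a hypothetical function used purely to exercise alloca():

```c
#include <string.h>

#if defined(__FreeBSD__)
#include <stdlib.h>   /* FreeBSD declares alloca() in stdlib.h */
#else
#include <alloca.h>   /* glibc and others ship a dedicated header */
#endif

/* Hypothetical example: copy a string into stack memory obtained
 * from alloca() and return the length of the copy. */
size_t stack_copy_length(const char *s)
{
    size_t n = strlen(s) + 1;      /* include the terminating NUL */
    char *tmp = alloca(n);         /* released automatically on return */
    memcpy(tmp, s, n);
    return strlen(tmp);
}
```

A configure-time check for alloca.h would be more robust than testing a platform macro, but the idea is the same.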
Currently, to use fy-tool --tsv-event you also need --testsuite.
When I asked for the tsv output, it had nothing to do with the test suite, even
though that output is also one event per line.
The purpose of tsv was for writing a loader/composer using fyaml by just
reading stdin rather than needing to do a formal C binding.
This could be useful for all sorts of things, and honestly we might do well to
make this into a separate binary called bin/yaml-parse-events at some point.
Also it's a bit wrong to have --testsuite as the option for events formatted
for testing with the suite. In the future we will likely have other tests in
the suite besides parse events.
My suggestions at this point would be:
--mode=events-tsv
--mode=events-testsuite
You can keep --testsuite as an alias for --mode=events-testsuite.
I'd also like the events-tsv output to add a =ERR output line when a parse
error occurs. It should contain the line position numbers and the error
message.
That way a loader built over it could handle/format the error as it sees fit.
This would be cleaner than the loader having to capture an actual parse error
from fy-tool.
Note: This =ERR should not be added to the events-testsuite output.
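To illustrate how a loader might consume such a line, here is a rough sketch that only checks for the proposed =ERR prefix. The layout of the rest of the line (positions, message) is not specified above, so the sketch deliberately does not parse it; is_error_event is a hypothetical name:

```c
#include <string.h>

/* Hypothetical helper for a loader reading events-tsv lines from stdin:
 * returns 1 if the line starts with the proposed "=ERR" marker,
 * 0 otherwise. The fields after the marker are left uninterpreted. */
int is_error_event(const char *line)
{
    return strncmp(line, "=ERR", 4) == 0;
}
```

The loader could then format or rethrow the error however it sees fit instead of scraping the tool's stderr.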
I'm using YAML for configuration of scientific simulations, and in exceptional cases the file size can exceed 2GB. When using such a large file, I get [ERR]: fy_parse_load_document() failed.
Could this be related to using int for some length-related operations, or is it more likely caused by another limitation?
If it is only an integer overflow: would an MR changing int to size_t where needed be welcome? Or is the higher memory consumption not acceptable (bearing in mind that I'm probably the only person with ridiculously large YAML files)?
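To make the suspicion concrete: on platforms with a 32-bit int, INT_MAX is 2147483647, just under 2 GiB, so a byte count beyond that cannot be stored in an int. Whether libfyaml actually stores lengths in int is exactly the open question here; the sketch below only shows the general check, and length_fits_in_int is a hypothetical name:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical check (not libfyaml code): returns 1 when a byte count
 * can be stored in an int without overflow, 0 otherwise. */
int length_fits_in_int(size_t len)
{
    return len <= (size_t)INT_MAX;
}
```

A file of 2 GiB or more fails this check, which would be consistent with the failure appearing only for very large inputs.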
If the gcc option -fvisibility=hidden is set as part of CFLAGS when compiling, libfyaml's make all fails when attempting to link fy-tool. A work-around is to add the lines
#if __GNUC__ >= 4
#pragma GCC visibility push(default)
#endif
to libfyaml.h. This makes sure that the public functions are not hidden by the -fvisibility=hidden flag.
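Another common approach is to mark each public function individually with a visibility attribute. A minimal sketch, assuming GCC or a compatible compiler; EXAMPLE_EXPORT and example_public_function are hypothetical names, not part of libfyaml:

```c
/* EXAMPLE_EXPORT is a hypothetical macro, shown only for illustration. */
#if defined(__GNUC__) && __GNUC__ >= 4
#define EXAMPLE_EXPORT __attribute__((visibility("default")))
#else
#define EXAMPLE_EXPORT
#endif

/* Remains exported from a shared library even when the translation
 * unit is compiled with -fvisibility=hidden. */
EXAMPLE_EXPORT int example_public_function(int x)
{
    return x + 1;
}
```

The pragma work-around is less intrusive since it touches only the header, while per-function attributes make the exported surface explicit.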
Thanks for developing the library. It is working great!
I assume that dump_scan_token in fy-tool.c, in the case of FYTT_VALUE, should print VALUE or something like that, but not KEY.
Not running ldconfig after installing yields:
fy-tool: error while loading shared libraries: libfyaml-0.3.so.0: cannot open shared object file: No such file or directory
The install step should do this automatically, or documentation/README should be updated to indicate it as necessary.
I've been trying to add an API to get the parsed comments by adding this commit, https://github.com/MRPT/libfyaml/commit/2e67a1528fabb9f8478ebadc66d744a3b4227bea, both on top of your project's "master" and on top of the "comment-wip" branch (both result in the same error, described next).
Tokens with a comment correctly report fy_atom_is_set() as true, but trying to get the comment text from a parser event loop with the new function fy_token_get_comment() only returns a pointer to garbage.
Please take a look at my use of fy_atom_format_text(); something must be wrong there, but I can't find it.
PS: Thanks for this amazing library!
I get an odd result when trying to install from libfyaml-v0.5.7.tar.gz
which doesn't happen when I use libfyaml-0.5.5.tar.gz.
The build is on CentOS 7. I believe I'm following identical steps for each.
When I configure version 0.5.5 I get this:
---{ libfyaml 0.5.5 }---
VERSION: 0.5.5
MAJOR.MINOR: 0.5
PATCH: 5
EXTRA: 0.5.5
.
.
.
However when I configure version 0.5.7 I get this:
---{ libfyaml UNKNOWN }---
VERSION: UNKNOWN
MAJOR.MINOR: UNKNOWN.UNKNOWN
PATCH: UNKNOWN
EXTRA: UNKNOWN
As a result when I install the libraries the file names are incorrect:
for example:
libfyaml-UNKNOWN.UNKNOWN.a
I am not experienced developing with autoconf but I've installed many
packages that use it and I've never seen this before.
It seems that it is forbidden to license your library under the MIT license with code licensed under GPL 2.0 in it.