Releases: biojppm/rapidyaml
Release 0.7.2
Release 0.7.1
New features
- PR#459: Add version functions and macros:
    #define RYML_VERSION "0.7.1"
    #define RYML_VERSION_MAJOR 0
    #define RYML_VERSION_MINOR 7
    #define RYML_VERSION_PATCH 1
    csubstr version();
    int version_major();
    int version_minor();
    int version_patch();
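A minimal sketch of how version macros of this shape can be combined into a compile-time version gate. The DEMO_ names and demo_version_at_least() are hypothetical stand-ins defined here so the snippet compiles on its own; with ryml, client code would test the RYML_VERSION_* macros instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins mirroring the macro layout of the release notes.
#define DEMO_VERSION_MAJOR 0
#define DEMO_VERSION_MINOR 7
#define DEMO_VERSION_PATCH 1

// Two-level stringification, so that the macro values (not their names)
// end up in the string.
#define DEMO_STR_(x) #x
#define DEMO_STR(x) DEMO_STR_(x)
#define DEMO_VERSION \
    DEMO_STR(DEMO_VERSION_MAJOR) "." DEMO_STR(DEMO_VERSION_MINOR) "." DEMO_STR(DEMO_VERSION_PATCH)

// True when the stand-in version is >= major.minor.patch.
constexpr bool demo_version_at_least(int major, int minor, int patch)
{
    return (DEMO_VERSION_MAJOR != major) ? (DEMO_VERSION_MAJOR > major)
         : (DEMO_VERSION_MINOR != minor) ? (DEMO_VERSION_MINOR > minor)
         : (DEMO_VERSION_PATCH >= patch);
}

// Fails the build early when the library is too old.
static_assert(demo_version_at_least(0, 7, 0), "needs 0.7.0 or newer");
```

The same comparison can be written with #if on the numeric macros when the check must happen in the preprocessor.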
Fixes
- Fix #455: parsing of trailing val-less nested maps when deindented to maps (PR#460)
- Fix filtering of double-quoted keys in block maps (PR#452)
- Fix #440: some tests failing with gcc -O2 (hypothetically due to undefined behavior)
- This was accomplished by refactoring some internal parser functions; see the comments in #440 for further details.
- Also, fix all warnings from scan-build.
- Use malloc.h instead of alloca.h on MinGW (PR#447)
- Fix #442 (PR#443):
  - Ensure leading + is accepted when deserializing numbers.
  - Ensure numbers are not quoted by fixing the heuristics in scalar_style_query_plain() and scalar_style_choose().
  - Add quickstart sample for overflow detection (only of integral types).
- Parse engine: cleanup unused macros
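The two number-related points of #442 (a leading + and overflow detection for integral types) can be illustrated with a standalone sketch. The helpers parse_i32_checked() and parse_i32_or() are hypothetical and do not use ryml; they only show one way to accept a sign and to reject values that do not fit the target type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper, not part of ryml: parse a decimal int32_t,
// accepting an optional leading '+' or '-', and reporting overflow.
bool parse_i32_checked(const std::string& s, int32_t* out)
{
    std::size_t i = 0;
    bool neg = false;
    if(i < s.size() && (s[i] == '+' || s[i] == '-'))
    {
        neg = (s[i] == '-');
        ++i;
    }
    if(i == s.size())
        return false; // empty, or a sign with no digits
    // magnitude limit: 2147483648 for negatives, 2147483647 otherwise
    const int64_t limit = neg
        ? -static_cast<int64_t>(std::numeric_limits<int32_t>::min())
        :  static_cast<int64_t>(std::numeric_limits<int32_t>::max());
    int64_t acc = 0; // wider accumulator, so the range check cannot itself overflow
    for( ; i < s.size(); ++i)
    {
        if(s[i] < '0' || s[i] > '9')
            return false; // not a digit
        acc = acc * 10 + (s[i] - '0');
        if(acc > limit)
            return false; // does not fit in int32_t
    }
    *out = static_cast<int32_t>(neg ? -acc : acc);
    return true;
}

// Convenience wrapper: return a fallback when parsing fails.
int32_t parse_i32_or(const std::string& s, int32_t fallback)
{
    int32_t v = 0;
    return parse_i32_checked(s, &v) ? v : fallback;
}
```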
Release 0.7.0
Most of the changes are from the giant Parser refactor described below. Before getting to that, some other minor changes first.
Fixes
- #PR431 - Emitter: prevent stack overflows when emitting malicious trees by providing a max tree depth for the emit visitor. This was done by adding an EmitOptions structure as an argument both to the emitter and to the emit functions, which is then forwarded to the emitter. This EmitOptions structure has a max tree depth setting with a default value of 64.
- #PR431 - Fix _RYML_CB_ALLOC() using (T) in parentheses, making the macro unusable.
- #434 - Ensure empty vals are not deserialized (#PR436).
- #PR433:
- Fix some corner cases causing read-after-free in the tree's arena when it is relocated while filtering scalars.
- Improve YAML error conformance - detect YAML-mandated parse errors when:
New features
- #PR431 - append-emitting to existing containers in the emitrs_ functions, suggested in #345. This was achieved by adding a bool append=false as the last parameter of these functions.
- #PR431 - add depth query methods:
    Tree::depth_asc(id_type) const;   // O(log(num_tree_nodes)) get the depth of a node ascending (ie, from root to node)
    Tree::depth_desc(id_type) const;  // O(num_tree_nodes) get the depth of a node descending (ie, from node to deep-most leaf node)
    ConstNodeRef::depth_asc() const;  // likewise
    ConstNodeRef::depth_desc() const;
    NodeRef::depth_asc() const;
    NodeRef::depth_desc() const;
- #PR432 - Added a function to estimate the required tree capacity, based on yaml markup:
size_t estimate_tree_capacity(csubstr); // estimate number of nodes resulting from yaml
All other changes come from #PR414.
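To make the two depth queries and the max-depth guard of the emit visitor concrete, here is a toy sketch. ToyTree, depth_asc(), depth_desc() and walk_bounded() are hypothetical free-standing code, not ryml's implementation. The convention that the root has ascending depth 0 is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy tree, unrelated to ryml's Tree: nodes refer to each other by index.
struct ToyTree
{
    struct Node { int parent; std::vector<int> children; };
    std::vector<Node> nodes;

    int add(int parent) // pass -1 for the root
    {
        nodes.push_back(Node{parent, {}});
        const int id = static_cast<int>(nodes.size()) - 1;
        if(parent >= 0)
            nodes[parent].children.push_back(id);
        return id;
    }
};

// Ascending: count the hops from the node up to the root.
int depth_asc(const ToyTree& t, int id)
{
    int depth = 0;
    for(int p = t.nodes[id].parent; p >= 0; p = t.nodes[p].parent)
        ++depth;
    return depth;
}

// Descending: count the hops from the node down to its deepest leaf.
int depth_desc(const ToyTree& t, int id)
{
    int depth = 0;
    for(int child : t.nodes[id].children)
        depth = std::max(depth, 1 + depth_desc(t, child));
    return depth;
}

// Recursive visit which gives up instead of recursing past max_depth,
// so a pathologically deep tree cannot exhaust the call stack.
bool walk_bounded(const ToyTree& t, int id, int max_depth, int depth = 0)
{
    if(depth > max_depth)
        return false;
    for(int child : t.nodes[id].children)
        if(!walk_bounded(t, child, max_depth, depth + 1))
            return false;
    return true;
}

// root(0) -> a(1) -> b(2), and root(0) -> c(3)
ToyTree make_toy_tree()
{
    ToyTree t;
    const int root = t.add(-1);
    const int a = t.add(root);
    t.add(a);
    t.add(root);
    return t;
}
```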
Parser refactor
The parser was completely refactored (#PR414). This was a large and hard job carried out over several months, but it brings important improvements.
- The new parser is an event-based parser, based on an event dispatcher engine. This engine is templated on the event handler, where each event is a function call, which spares branches on the event handler. The parsing code was fully rewritten, and is now much simpler (albeit longer), and much easier to work with and fix.
- YAML standard-conformance was improved significantly. Along with many smaller fixes and additions (too many to list here), the main changes are the following:
  - The parser engine can now successfully parse container keys, emitting all the events correctly, but as before, the ryml tree cannot accommodate these (and this constraint is no longer enforced by the parser, but instead by EventHandlerTree). For an example of a handler which can accommodate key containers, see the one which is used for the test suite at test/test_suite/test_suite_event_handler.hpp
  - Anchor keys can now be terminated with colon (eg, &anchor: key: val), as dictated by the standard.
- The parser engine can now be used to create native trees in other programming languages, or in cases where the user must have container keys.
- Performance of both parsing and emitting improved significantly; see some figures below.
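The idea of an engine templated on its event handler can be sketched with a toy example. Everything here (toy_parse_seq(), CountingHandler, EventStringHandler and the event names) is hypothetical and far simpler than ryml's parse engine; it only shows how the same parsing code can drive different handlers through direct calls.

```cpp
#include <cassert>
#include <string>

// Toy engine, not ryml's: turns "a, b, c" into sequence events.
// The handler is a template parameter, so each event is a direct
// (inlinable) call rather than a branch on an event-kind value.
template<class Handler>
void toy_parse_seq(const std::string& src, Handler& handler)
{
    handler.begin_seq();
    std::string scalar;
    for(char c : src)
    {
        if(c == ',')
        {
            handler.add_scalar(scalar);
            scalar.clear();
        }
        else if(c != ' ')
        {
            scalar += c;
        }
    }
    if(!scalar.empty())
        handler.add_scalar(scalar);
    handler.end_seq();
}

// Handler 1: only counts the scalars.
struct CountingHandler
{
    int num_scalars = 0;
    void begin_seq() {}
    void add_scalar(const std::string&) { ++num_scalars; }
    void end_seq() {}
};

// Handler 2: records a textual event stream.
struct EventStringHandler
{
    std::string events;
    void begin_seq() { events += "+SEQ\n"; }
    void add_scalar(const std::string& s) { events += "=VAL " + s + "\n"; }
    void end_seq() { events += "-SEQ\n"; }
};

int toy_count_scalars(const std::string& src)
{
    CountingHandler h;
    toy_parse_seq(src, h);
    return h.num_scalars;
}

std::string toy_events(const std::string& src)
{
    EventStringHandler h;
    toy_parse_seq(src, h);
    return h.events;
}
```

A handler that builds a native tree for another language would plug into the same engine in the same way.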
Strict JSON parser
- A strict JSON parser was added. Use the parse_json_...() family of functions to parse json in stricter mode (and faster) than flow-style YAML.
YAML style preserved while parsing
- The YAML style information is now fully preserved through parsing/emitting round trips. This was made possible because the event model of the new parsing engine now incorporates style varieties. So, for example:
- a scalar parsed from a plain/single-quoted/double-quoted/block-literal/block-folded scalar will be emitted always using its original style in the YAML source
- a container parsed in block-style will always be emitted in block-style
- a container parsed in flow-style will always be emitted in flow-style
Because of this, the style of YAML emitted by ryml changes from previous releases.
- Scalar filtering was improved and is now done directly in the source being parsed (which may be in place or in the arena), except in the cases where the scalar expands and does not fit its initial range, in which case the scalar is filtered out of place to the tree's arena.
- Filtering can now be disabled while parsing, to ensure a fully-readonly parse (but this feature is still experimental and somewhat untested, given the scope of the rewrite work).
- The parser now offers methods to filter scalars in place or out of place.
- Style flags were added to NodeType_e:
    FLOW_SL     ///< mark container with single-line flow style (seqs as '[val1,val2]', maps as '{key: val,key2: val2}')
    FLOW_ML     ///< mark container with multi-line flow style (seqs as '[\n val1,\n val2\n]', maps as '{\n key: val,\n key2: val2\n}')
    BLOCK       ///< mark container with block style (seqs as '- val\n', maps as 'key: val')
    KEY_LITERAL ///< mark key scalar as multiline, block literal |
    VAL_LITERAL ///< mark val scalar as multiline, block literal |
    KEY_FOLDED  ///< mark key scalar as multiline, block folded >
    VAL_FOLDED  ///< mark val scalar as multiline, block folded >
    KEY_SQUO    ///< mark key scalar as single quoted '
    VAL_SQUO    ///< mark val scalar as single quoted '
    KEY_DQUO    ///< mark key scalar as double quoted "
    VAL_DQUO    ///< mark val scalar as double quoted "
    KEY_PLAIN   ///< mark key scalar as plain scalar (unquoted, even when multiline)
    VAL_PLAIN   ///< mark val scalar as plain scalar (unquoted, even when multiline)
- Style predicates were added to NodeType, Tree, ConstNodeRef and NodeRef:
    bool is_container_styled() const;
    bool is_block() const;
    bool is_flow_sl() const;
    bool is_flow_ml() const;
    bool is_flow() const;
    bool is_key_styled() const;
    bool is_val_styled() const;
    bool is_key_literal() const;
    bool is_val_literal() const;
    bool is_key_folded() const;
    bool is_val_folded() const;
    bool is_key_squo() const;
    bool is_val_squo() const;
    bool is_key_dquo() const;
    bool is_val_dquo() const;
    bool is_key_plain() const;
    bool is_val_plain() const;
- Style modifiers were also added:
    void set_container_style(NodeType_e style);
    void set_key_style(NodeType_e style);
    void set_val_style(NodeType_e style);
- Emit helper predicates were added, and are used when an emitted node was built programmatically without style flags:
    /** choose a YAML emitting style based on the scalar's contents */
    NodeType_e scalar_style_choose(csubstr scalar) noexcept;
    /** query whether a scalar can be encoded using single quotes.
     * It may not be possible, notably when there is leading
     * whitespace after a newline. */
    bool scalar_style_query_squo(csubstr s) noexcept;
    /** query whether a scalar can be encoded using plain style (no
     * quotes, not a literal/folded block scalar). */
    bool scalar_style_query_plain(csubstr s) noexcept;
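As an illustration of how style flags, a style mask, predicates and a style modifier can fit together, here is a toy bitmask sketch. The TOY_ constants, their bit positions and ToyNodeType are hypothetical; they are not the values or layout of ryml's NodeType_e.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values; ryml's actual bit layout is not shown here.
constexpr uint32_t TOY_VAL_PLAIN   = 1u << 0;
constexpr uint32_t TOY_VAL_SQUO    = 1u << 1;
constexpr uint32_t TOY_VAL_DQUO    = 1u << 2;
constexpr uint32_t TOY_VAL_LITERAL = 1u << 3;
constexpr uint32_t TOY_VAL_FOLDED  = 1u << 4;
// Mask covering every val style flag.
constexpr uint32_t TOY_VAL_STYLE   = TOY_VAL_PLAIN | TOY_VAL_SQUO | TOY_VAL_DQUO
                                   | TOY_VAL_LITERAL | TOY_VAL_FOLDED;
// A non-style bit, to show that restyling leaves other bits alone.
constexpr uint32_t TOY_HAS_VAL     = 1u << 8;

struct ToyNodeType
{
    uint32_t bits = 0;

    bool is_val_styled() const { return (bits & TOY_VAL_STYLE) != 0; }
    bool is_val_plain()  const { return (bits & TOY_VAL_PLAIN) != 0; }
    bool is_val_dquo()   const { return (bits & TOY_VAL_DQUO) != 0; }

    // Clear any previous val style, then set the new one.
    void set_val_style(uint32_t style)
    {
        bits = (bits & ~TOY_VAL_STYLE) | (style & TOY_VAL_STYLE);
    }
};

// Apply two styles in sequence to a node which already has a val.
ToyNodeType toy_restyled(uint32_t first, uint32_t second)
{
    ToyNodeType t;
    t.bits = TOY_HAS_VAL;
    t.set_val_style(first);
    t.set_val_style(second);
    return t;
}
```

Clearing the mask before setting the new flag keeps the styles mutually exclusive, so a node never reports two val styles at once.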
Breaking changes
As a result of the refactor, there are some limited changes with impact in client code. Even though this was a large refactor, effort was directed at keeping maximal backwards compatibility, and the changes are not wide. But they still exist:
- The existing parse_...() methods in the Parser class were all removed. Use the corresponding parse_...(Parser*, ...) function from the header c4/yml/parse.hpp.
- When instantiated by the user, the parser now needs to receive an EventHandlerTree object, which is responsible for building the tree. Although fully functional and tested, the structure of this class is still somewhat experimental and is still likely to change. There is an alternative event handler implementation responsible for producing the events for the YAML test suite in test/test_suite/test_suite_event_handler.hpp.
- The declaration and definition of NodeType was moved to a separate header file c4/yml/node_type.hpp (previously it was in c4/yml/tree.hpp).
- Some of the node type flags were removed, and several flags (and combination flags) were added.
  - Most of the existing flags are kept, as well as their meaning.
  - KEYQUO and VALQUO are now masks of the several style flags for quoted scalars. In general, however, client code...
Release 0.6.0
Add API documentation
- PR#423: add Doxygen-based API documentation, now hosted in https://rapidyaml.readthedocs.io/!
- It uses the base doxygen docs, as I couldn't get doxyrest or breathe or exhale to produce anything meaningful using the doxygen groups already defined in the source code.
Error handling
Fix major error handling problem reported in #389 (PR#411):
- The NodeRef and ConstNodeRef classes are now conditional noexcept using RYML_NOEXCEPT, which evaluates to nothing when assertions are enabled, and to noexcept otherwise. The problem was that these classes had many methods explicitly marked noexcept, but were doing assertions which could throw exceptions, causing an abort instead of a throw whenever the assertion called an exception-throwing error callback.
- This problem was compounded by assertions being enabled in every build type -- despite the intention to have them only in debug builds. There was a problem in the preprocessor code to enable assertions which led to assertions being enabled in release builds even when RYML_USE_ASSERT was defined to 0. Thanks to @jdrouhard for reporting this.
- Although the code is and was extensively tested, the testing was addressing mostly the happy path. Tests were added to ensure that the error behavior is as intended.
- Together with this changeset, a major revision was carried out of the asserting/checking status of each function in the node classes. In most cases, assertions were added to functions that were missing them. So beware - some user code that was invalid will now assert or error out. Also, assertions and checks are now directed as much as possible to the callbacks of the closest scope: ie, if a tree has custom callbacks, errors within the tree class should go through those callbacks.
- Also, the intended assertion behavior is now in place: no assertions in release builds. Beware as well - user code which was relying on this will now silently succeed and return garbage in release builds. See the next points, which may help.
- Added new methods to the NodeRef/ConstNodeRef classes:
    /** Distinguish between a valid seed vs a valid non-seed ref. */
    bool readable() const { return valid() && !is_seed(); }

    /** Get a child by name, with error checking; complexity is
     * O(num_children).
     *
     * Behaves as operator[](csubstr) const, but always raises an
     * error (even when RYML_USE_ASSERT is set to false) when the
     * returned node does not exist, or when this node is not
     * readable, or when it is not a map. This behaviour is similar to
     * std::vector::at(), but the error consists in calling the error
     * callback instead of directly raising an exception. */
    ConstNodeRef at(csubstr key) const;
    /** Likewise, but return a seed node when the key is not found */
    NodeRef at(csubstr key);

    /** Get a child by position, with error checking; complexity is
     * O(pos).
     *
     * Behaves as operator[](size_t) const, but always raises an error
     * (even when RYML_USE_ASSERT is set to false) when the returned
     * node does not exist, or when this node is not readable, or when
     * it is not a container. This behaviour is similar to
     * std::vector::at(), but the error consists in calling the error
     * callback instead of directly raising an exception. */
    ConstNodeRef at(size_t pos) const;
    /** Likewise, but return a seed node when pos is not found */
    NodeRef at(size_t pos);
- The state for NodeRef was refined, and now there are three mutually exclusive states (and class predicates) for an object of this class:
  - .invalid() when the object was not initialized to any node
  - .readable() when the object points at an existing tree+node
  - .is_seed() when the object points at a hypothetical tree+node
- The previous state .valid() was deprecated: its semantics were confusing as it actually could be any of .readable() or .is_seed()
- Deprecated also the following methods for NodeRef/ConstNodeRef:
    RYML_DEPRECATED() bool operator== (std::nullptr_t) const;
    RYML_DEPRECATED() bool operator!= (std::nullptr_t) const;
    RYML_DEPRECATED() bool operator== (csubstr val) const;
    RYML_DEPRECATED() bool operator!= (csubstr val) const;
- Added macros and respective cmake options to control error handling:
  - RYML_USE_ASSERT - enable assertions regardless of build type. This is disabled by default. This macro was already defined; the current PR adds the cmake option.
  - RYML_DEFAULT_CALLBACK_USES_EXCEPTIONS - make the default error handler provided by ryml throw exceptions instead of calling std::abort(). This is disabled by default.
- Also, RYML_DEBUG_BREAK() is now enabled only if RYML_DBG is defined, as reported in #362.
- As part of PR#423, to improve linters and codegen:
  - annotate the error handlers with [[noreturn]]/C4_NORETURN
  - annotate some error sites with C4_UNREACHABLE_AFTER_ERR()
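The interplay described in this section (a conditional noexcept macro, assertions that may call a throwing error handler, and an always-checked at()) can be sketched in isolation. TOY_USE_ASSERT, TOY_NOEXCEPT, TOY_ASSERT, toy_error() and ToySeq are hypothetical names for this sketch, which assumes a throwing error handler; it is not ryml's code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical error handler which throws; marked [[noreturn]] so the
// compiler knows that control never continues past a call to it.
[[noreturn]] inline void toy_error(const char* msg)
{
    throw std::runtime_error(msg);
}

#ifndef TOY_USE_ASSERT
#define TOY_USE_ASSERT 1 // assumption for this sketch: assertions enabled
#endif

#if TOY_USE_ASSERT
    // Assertions may throw through toy_error(), so the functions using
    // them must not be noexcept, or a throw would become an abort.
    #define TOY_NOEXCEPT
    #define TOY_ASSERT(cond) do { if(!(cond)) toy_error("assertion failed: " #cond); } while(0)
#else
    #define TOY_NOEXCEPT noexcept
    #define TOY_ASSERT(cond) ((void)0)
#endif

struct ToySeq
{
    std::vector<int> vals;

    // Checked only when assertions are enabled.
    int operator[](std::size_t pos) const TOY_NOEXCEPT
    {
        TOY_ASSERT(pos < vals.size());
        return vals[pos];
    }

    // Always checked, in the spirit of std::vector::at().
    int at(std::size_t pos) const
    {
        if(pos >= vals.size())
            toy_error("position out of range");
        return vals[pos];
    }
};

// Report whether at(pos) raises an error on a three-element sequence.
bool toy_at_throws(std::size_t pos)
{
    ToySeq seq;
    seq.vals = {10, 20, 30};
    try { (void)seq.at(pos); }
    catch(const std::runtime_error&) { return true; }
    return false;
}
```

With TOY_USE_ASSERT set to 0, operator[] becomes noexcept and unchecked, which is why an always-checked at() is useful for release builds.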
More fixes
- Tree::arena() const was returning a substr; this was an error. This function was changed to:
    csubstr Tree::arena() const;
    substr Tree::arena();
- Fix #390 - csubstr::first_real_span() failed on scientific numbers with one digit in the exponent (PR#415).
- Fix #361 - parse error on map scalars containing : and starting on the next line:
    --- # failed to parse:
    description:
      foo:bar
    --- # but this was ok:
    description: foo:bar
- PR#368 - fix pedantic compiler warnings.
- Fix #373 - false parse error with empty quoted keys in block-style map (PR#374).
- Fix #356 - fix overzealous check in emit_as(). An id may be larger than the tree's size, eg when nodes were removed. (PR#357).
- Fix #417 - add quickstart example explaining how to avoid precision loss while serializing floats (PR#420).
- Fix #380 - Debug visualizer .natvis file for Visual Studio was missing ConstNodeRef (PR#383).
- FR #403 - install is now optional when using cmake. The relevant option is RYML_INSTALL.
Release 0.5.0
Breaking changes
- Make the node API const-correct (PR#267): added ConstNodeRef to hold a constant reference to a node. As the name implies, a ConstNodeRef object cannot be used in any tree-mutating operation. It is also smaller than the existing NodeRef, and faster because it does not need to check its own validity on every access. As a result of this change, there are now some constraints when obtaining a ref from a tree, and existing code is likely to break in this type of situation:
    const Tree const_tree = ...;
    NodeRef nr = const_tree.rootref(); // ERROR (was ok): cannot obtain a mutating NodeRef from a const Tree
    ConstNodeRef cnr = const_tree.rootref(); // ok

    Tree tree = ...;
    NodeRef nr = tree.rootref(); // ok
    ConstNodeRef cnr = tree.rootref(); // ok (implicit conversion from NodeRef to ConstNodeRef)
    // to obtain a ConstNodeRef from a mutable Tree
    // while avoiding implicit conversion, use the `c`
    // prefix:
    ConstNodeRef cnr = tree.crootref();
    // likewise for tree.ref() and tree.cref().

    nr = cnr; // ERROR: cannot obtain NodeRef from ConstNodeRef
    cnr = nr; // ok
  The use of ConstNodeRef also needs to be propagated through client code. One such place is when deserializing types:
    // needs to be changed from:
    template<class T> bool read(ryml::NodeRef const& n, T *var);
    // ... to:
    template<class T> bool read(ryml::ConstNodeRef const& n, T *var);
  - The initial version of ConstNodeRef/NodeRef had the problem that const methods in the CRTP base did not participate in overload resolution (#294), preventing calls from const NodeRef objects. This was fixed by moving non-const methods to the CRTP base and disabling them with SFINAE (PR#295).
  - Also added disambiguation iteration methods: .cbegin(), .cend(), .cchildren(), .csiblings() (PR#295).
- Deprecate emit() and emitrs() (#120, PR#303): use emit_yaml() and emitrs_yaml() instead. This was done to improve compatibility with Qt, which leaks a macro named emit. For more information, see #120.
  - In the Python API:
    - Deprecate emit(), add emit_yaml() and emit_json().
    - Deprecate compute_emit_length(), add compute_emit_yaml_length() and compute_emit_json_length().
    - Deprecate emit_in_place(), add emit_yaml_in_place() and emit_json_in_place().
    - Calling the deprecated functions will now trigger a warning.
- Location querying is no longer done lazily (#260, PR#307). It now requires explicit opt-in when instantiating the parser. With this change, the accelerator structure for location querying is now built when parsing:
    Parser parser(ParserOptions().locations(true));
    // now parsing also builds location lookup:
    Tree t = parser.parse_in_arena("myfile.yml", "foo: bar");
    assert(parser.location(t["foo"]).line == 0u);
  - Locations are disabled by default:
      Parser parser;
      assert(parser.options().locations() == false);
- Deprecate Tree::arena_pos(): use Tree::arena_size() instead (PR#290).
- Deprecate pointless has_siblings(): use Tree::has_other_siblings() instead (PR#330).
Performance improvements
- Improve performance of integer serialization and deserialization (in c4core). Eg, on Linux/g++11.2, with integral types:
  - c4::to_chars() can be expected to be roughly...
    - ~40% to 2x faster than std::to_chars()
    - ~10x-30x faster than sprintf()
    - ~50x-100x faster than a naive stringstream::operator<<() followed by stringstream::str()
  - c4::from_chars() can be expected to be roughly...
    - ~10%-30% faster than std::from_chars()
    - ~10x faster than scanf()
    - ~30x-50x faster than a naive stringstream::str() followed by stringstream::operator>>()
  For more details, see the changelog for c4core 0.1.10.
- Fix #289 and #331 - parsing of single-line flow-style sequences had quadratic complexity, causing long parse times in ultra long lines (PR#293/PR#332).
  - This was due to scanning for the token : before scanning for , or ], which caused line-length scans on every scalar scan. Changing the order of the checks was enough to address the quadratic complexity, and the parse times for flow-style are now in line with block-style.
  - As part of this changeset, a significant number of runtime branches was eliminated by separating Parser::_scan_scalar() into several different {seq,map}x{block,flow} functions specific for each context. Expect some improvement in parse times.
  - Also, on Debug builds (or assertion-enabled builds) there was a paranoid assertion calling Tree::has_child() in Tree::insert_child() that caused quadratic behavior because the assertion had linear complexity. It was replaced with a somewhat equivalent O(1) assertion.
  - Now the byte throughput is independent of line size for styles and containers. This can be seen in the table below, which shows parse throughputs in MB/s of 1000 containers of different styles and sizes (flow containers are in a single line):
      Container   Style   10elms     100elms    1000elms
      1000 Maps   block   50.8MB/s   57.8MB/s   63.9MB/s
      1000 Maps   flow    58.2MB/s   65.9MB/s   74.5MB/s
      1000 Seqs   block   55.7MB/s   59.2MB/s   60.0MB/s
      1000 Seqs   flow    52.8MB/s   55.6MB/s   54.5MB/s
- Fix #329: complexity of has_sibling() and has_child() is now O(1), previously was linear (PR#330).
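The quadratic behavior described for flow-style sequences in #289/#331 can be reproduced with a toy cost model. The functions below are hypothetical and do not mirror the parser's code; they only count how many characters are examined when each scalar scan first searches the rest of the line for ':' versus stopping at the first delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of the old order of checks: for every scalar, first search
// the rest of the line for ':' and only then find the end of the scalar.
std::size_t cost_colon_first(const std::string& line)
{
    std::size_t cost = 0, pos = 0;
    while(pos < line.size())
    {
        for(std::size_t i = pos; i < line.size(); ++i)
        {
            ++cost; // one character examined while looking for ':'
            if(line[i] == ':')
                break;
        }
        std::size_t end = pos;
        while(end < line.size() && line[end] != ',' && line[end] != ']')
            ++end;
        pos = end + 1;
    }
    return cost;
}

// Toy model of the new order: stop at the first of ',', ']' or ':'.
std::size_t cost_delim_first(const std::string& line)
{
    std::size_t cost = 0, pos = 0;
    while(pos < line.size())
    {
        std::size_t end = pos;
        while(end < line.size() && line[end] != ',' && line[end] != ']' && line[end] != ':')
        {
            ++cost; // one scalar character examined
            ++end;
        }
        ++cost; // the delimiter (or end of line) check
        pos = end + 1;
    }
    return cost;
}

// Build "a,a,...,a" with n one-character scalars.
std::string make_flow_line(std::size_t n)
{
    std::string line;
    for(std::size_t i = 0; i < n; ++i)
    {
        if(i)
            line += ',';
        line += 'a';
    }
    return line;
}
```

For n one-character scalars, the colon-first model examines n*n characters while the delimiter-first model examines 2*n, which is the difference between quadratic and linear parse time on a single long line.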
Fixes
- Fix #233 - accept leading colon in the first key of a flow map (UNK node) PR#234:
    :foo:           # parse error on the leading colon
      :bar: a       # parse error on the leading colon
      :barbar: b    # was ok
      :barbarbar: c # was ok
    foo:            # was ok
      bar: a        # was ok
      :barbar: b    # was ok
      :barbarbar: c # was ok
- Fix #253: double-quoted emitter should encode carriage-return \r to preserve roundtrip equivalence:
    Tree tree;
    NodeRef root = tree.rootref();
    root |= MAP;
    root["s"] = "t\rt";
    root["s"] |= _WIP_VAL_DQUO;
    std::string s = emitrs<std::string>(tree);
    EXPECT_EQ(s, "s: \"t\\rt\"\n");
    Tree tree2 = parse_in_arena(to_csubstr(s));
    EXPECT_EQ(tree2["s"].val(), tree["s"].val());
- Fix parsing of empty block folded+literal scalars when they are the last child of a container (part of PR#264):
    seq:
      - ""
      - ''
      - >
      - |  # error, the resulting val included all the YAML from the next node
    seq2:
      - ""
      - ''
      - |
      - >  # error, the resulting val included all the YAML from the next node
    map:
      a: ""
      b: ''
      c: >
      d: |  # error, the resulting val included all the YAML from the next node
    map2:
      a: ""
      b: ''
      c: |
      d: >  # error, the resulting val included all the YAML from the next node
    lastly: the last
- Fix #274 (PR#296): Lists with unindented items and trailing empty values parse incorrectly:
    foo:
    - bar
    -
    baz: qux
  was wrongly parsed as
    foo:
    - bar
    - baz: qux
- Fix #277 (PR#340): merge fails with duplicate keys.
- Fix #337 (PR#338): empty lines in block scalars shall not have tab characters \t.
- Fix #268 (PR#339): don't override key type_bits when copying val. This was causing problematic resolution of anchors/references.
- Fix #309 (PR#310): emitted scalars containing @ or ` should be quoted.
- Fix #297 (PR#298): JSON emitter should escape control characters.
- Fix #292 (PR#299): JSON emitter should quote version string scalars like 0.1.2.
- Fix #291 (PR#299): JSON emitter should quote scala...
Release 0.4.1
Fixes
- Fix #223: assertion peeking into the last line when it was whitespaces only.
Release 0.4.0
This release improves compliance with the YAML test suite (thanks @ingydotnet and @perlpunk for extensive and helpful cooperation), and adds node location tracking using the parser.
Breaking changes
As part of the new feature to track source locations, opportunity was taken to address a number of pre-existing API issues. These changes consisted of:
- Deprecate c4::yml::parse() and c4::yml::Parser::parse() overloads; all these functions will be removed in short order. Until removal, any call from client code will trigger a compiler warning.
- Add parse() alternatives, either parse_in_place() or parse_in_arena():
  - parse_in_place() receives only substr buffers, ie mutable YAML source buffers. Trying to pass a csubstr buffer to parse_in_place() will cause a compile error:
      substr readwrite = /*...*/;
      Tree tree = parse_in_place(readwrite); // OK
      csubstr readonly = /*...*/;
      Tree tree = parse_in_place(readonly); // compile error
  - parse_in_arena() receives only csubstr buffers, ie immutable YAML source buffers. Prior to parsing, the buffer is copied to the tree's arena, then the copy is parsed in place. Because parse_in_arena() is meant for immutable buffers, overloads receiving a substr YAML buffer are now declared but marked deprecated, and intentionally left undefined, such that calling parse_in_arena() with a substr will cause a linker error as well as a compiler warning.
      substr readwrite = /*...*/;
      Tree tree = parse_in_arena(readwrite); // compile warning+linker error
    This is to prevent an accidental extra copy of the mutable source buffer to the tree's arena: substr is implicitly convertible to csubstr. If you really intend to parse an originally mutable buffer in the tree's arena, convert it first explicitly to immutable by assigning the substr to a csubstr prior to calling parse_in_arena():
      substr readwrite = /*...*/;
      csubstr readonly = readwrite; // ok
      Tree tree = parse_in_arena(readonly); // ok
    This problem does not occur with parse_in_place() because csubstr is not implicitly convertible to substr.
- In the python API, ryml.parse() was removed and not just deprecated; the parse_in_arena() and parse_in_place() now replace this.
- Callbacks: changed behavior in Parser and Tree:
- When a tree is copy-constructed or move-constructed to another, the receiving tree will start with the callbacks of the original.
- When a tree is copy-assigned or move-assigned to another, the receiving tree will now change its callbacks to the original.
- When a parser creates a new tree, the tree will now use a copy of the parser's callbacks object.
- When an existing tree is given directly to the parser, both the tree and the parser now retain their own callback objects; any allocation or error during parsing will go through the respective callback object.
New features
- Add tracking of source code locations. This is useful for reporting semantic errors after the parsing phase (ie where the YAML is syntactically valid and parsing is successful, but the tree contents are semantically invalid). The locations can be obtained lazily from the parser when the first location is queried:
    // To obtain locations, use of the parser is needed:
    ryml::Parser parser;
    ryml::Tree tree = parser.parse_in_arena("source.yml", R"({
    aa: contents,
    foo: [one, [two, three]]
    })");
    // After parsing, on the first call to obtain a location,
    // the parser will cache a lookup structure to accelerate
    // tracking the location of a node, with complexity
    // O(numchars(srcbuffer)). Then it will do the lookup, with
    // complexity O(log(numlines(srcbuffer))).
    ryml::Location loc = parser.location(tree.rootref());
    assert(parser.location_contents(loc).begins_with("{"));
    // note the location members are zero-based:
    assert(loc.offset == 0u);
    assert(loc.line == 0u);
    assert(loc.col == 0u);
    // On the next call to location(), the accelerator is reused
    // and only the lookup is done.
    loc = parser.location(tree["aa"]);
    assert(parser.location_contents(loc).begins_with("aa"));
    assert(loc.offset == 2u);
    assert(loc.line == 1u);
    assert(loc.col == 0u);
    // KEYSEQ in flow style: points at the key
    loc = parser.location(tree["foo"]);
    assert(parser.location_contents(loc).begins_with("foo"));
    assert(loc.offset == 16u);
    assert(loc.line == 2u);
    assert(loc.col == 0u);
    loc = parser.location(tree["foo"][0]);
    assert(parser.location_contents(loc).begins_with("one"));
    assert(loc.line == 2u);
    assert(loc.col == 6u);
    // SEQ in flow style: location points at the opening '[' (there's no key)
    loc = parser.location(tree["foo"][1]);
    assert(parser.location_contents(loc).begins_with("["));
    assert(loc.line == 2u);
    assert(loc.col == 11u);
    loc = parser.location(tree["foo"][1][0]);
    assert(parser.location_contents(loc).begins_with("two"));
    assert(loc.line == 2u);
    assert(loc.col == 12u);
    loc = parser.location(tree["foo"][1][1]);
    assert(parser.location_contents(loc).begins_with("three"));
    assert(loc.line == 2u);
    assert(loc.col == 17u);
    // NOTE: reusing the parser with a new YAML source buffer
    // will invalidate the accelerator.
  See more details in the quickstart sample. Thanks to @cschreib for submitting a working example proving how simple it could be to achieve this.
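The complexities quoted in the example (a one-off pass over the source to build the accelerator, then a logarithmic lookup per query) match a common technique: record the offset at which each line starts, then binary-search that table. The sketch below uses hypothetical names (ToyLineIndex, ToyLocation, toy_locate) and is not ryml's implementation, which maps nodes rather than raw offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyLocation { std::size_t offset, line, col; }; // all zero-based

struct ToyLineIndex
{
    std::vector<std::size_t> line_starts; // offset of the first char of each line

    // Build the accelerator: one pass over the source, O(numchars).
    explicit ToyLineIndex(const std::string& src)
    {
        line_starts.push_back(0);
        for(std::size_t i = 0; i < src.size(); ++i)
            if(src[i] == '\n')
                line_starts.push_back(i + 1);
    }

    // Look up an offset with a binary search, O(log(numlines)).
    ToyLocation locate(std::size_t offset) const
    {
        // first line start strictly greater than offset; the line before it contains offset
        const auto it = std::upper_bound(line_starts.begin(), line_starts.end(), offset);
        const std::size_t line = static_cast<std::size_t>(it - line_starts.begin()) - 1;
        return ToyLocation{offset, line, offset - line_starts[line]};
    }
};

// Same layout as the YAML source of the example above.
std::string toy_sample()
{
    return "{\naa: contents,\nfoo: [one, [two, three]]\n}";
}

ToyLocation toy_locate(const std::string& src, std::size_t offset)
{
    return ToyLineIndex(src).locate(offset);
}
```

In real use the index would be built once and kept, as the parser does with its accelerator; toy_locate() rebuilds it on every call only to keep the sketch short.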
- Parser:
  - add source() and filename() to get the latest buffer and filename to be parsed
  - add callbacks() to get the parser's callbacks
- Add from_tag_long() and normalize_tag_long():
    assert(from_tag_long(TAG_MAP) == "<tag:yaml.org,2002:map>");
    assert(normalize_tag_long("!!map") == "<tag:yaml.org,2002:map>");
- Add an experimental API to resolve tags based on the tree's tag directives. This API is still immature and will likely be subject to changes, so we won't document it yet.
- Regarding emit styles (see issue #37): add an experimental API to force flow/block style on container nodes, as well as block-literal/block-folded/double-quoted/single-quoted/plain styles on scalar nodes. This API is also immature and will likely be subject to changes, so we won't document it yet. But if you are desperate for this functionality, the new facilities will let you go further. See PR#191.
- Add preliminary support for bare-metal ARM architectures, with CI tests pending implementation of QEMU action. (#193, c4core#63).
- Add preliminary support for RISC-V architectures, with CI tests pending availability of RISC-V based github actions. (c4core#69).
Fixes
- Fix edge cases of parsing of explicit keys (ie keys after ?) (PR#212):
    # all these were fixed:
    ? : # empty
    ? explicit key   # this comment was not parsed correctly
    ?    # trailing empty key was not added to the map
- Fixed parsing of tabs used as whitespace tokens after : or -. This feature is costly (see some benchmark results here) and thus it is disabled by default, and requires defining a macro or cmake option RYML_WITH_TAB_TOKENS to enable (PR#211).
- Allow tab indentation in flow seqs (PR#215) (6CA3).
- ryml now parses successfully compact JSON code {"like":"this"} without any need for preprocessing. This code was not valid YAML 1.1, but was made valid in YAML 1.2. So the preprocess_json() functions, used to insert spaces after :, are no longer necessary and have been removed. If you were using these functions, remove the calls and just pass the original source directly to ryml's parser (PR#210).
- Fix handling of indentation when parsing block scalars (PR#210):
--- | hello there --- | ciao qua --- - | hello there - | ciao qua --- foo: | hello there bar: | ciao qua
- Fix parsing of maps when opening a scope with whitespace before the colon (PR#210):
foo0 : bar --- foo1 : bar # the " :" was causing an assert --- foo2 : bar --- foo3 : bar --- foo4 : bar
- Ensure container keys preserve quote flags when the key is quoted (PR#210).
- Ensure scalars beginning with % are emitted with quotes (PR#216).
- Fix #203: when parsing, do not convert null or ~ to null scalar strings. Now the scalar strings contain the verbatim contents of the original scalar; to query whether a scalar value is null, use Tree::key_is_null()/val_is_null() and NodeRef::key_is_null()/val_is_null() which return true if it is empty or any of the unquoted strings ~, null, Null, or NULL (PR#207).
- Fix #205: fix parsing of escaped characters in double-quoted strings: "\\\"\n\r\t\<TAB>\/\<SPC>\0\b\f\a\v\e\_\N\L\P" ([PR#207]...
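The null query described in the fix for #203 amounts to a small predicate over the verbatim scalar. The helper below, toy_scalar_is_null(), is hypothetical and not ryml's implementation; in particular, treating every quoted scalar (including an empty quoted one) as non-null is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not ryml's code: decide whether a verbatim
// scalar should be read as null.
bool toy_scalar_is_null(const std::string& scalar, bool quoted)
{
    // Assumption of this sketch: a quoted scalar such as "null" or '~'
    // is a real string, never null.
    if(quoted)
        return false;
    return scalar.empty()
        || scalar == "~"
        || scalar == "null"
        || scalar == "Null"
        || scalar == "NULL";
}
```

Because the tree keeps the scalar exactly as written, client code can still tell ~ from null if it needs to, and asks the predicate only when it wants the YAML-level meaning.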
Release 0.3.0
Breaking changes
Despite ryml being still in a non-stable 0.x.y version, considerable effort goes into trying to avoid breaking changes. However, this release has to collect on the semantic versioning prerogative for breaking changes. This is a needed improvement, so sorry for any nuisance!
The allocation and error callback logic was revamped on the amalgamation PR. Now trees and parsers receive (and store) a full ryml::Callbacks object instead of the (now removed) ryml::Allocator which had a pointer to a (now removed) ryml::MemoryResourceCallbacks, which was a (now removed) ryml::MemoryResource. To be clear, the Callbacks class is unchanged, other than removing some unneeded helper methods.
These changes were motivated by unfortunate name clashes between c4::Allocator/ryml::Allocator and c4::MemoryResource/ryml::MemoryResource, occurring if <c4/allocator.hpp> or <c4/memory_resource.hpp> were included before <c4/yml/common.hpp>. They also significantly simplify this part of the API, making it much easier to understand.
As a consequence of the above changes, the global memory resource getters and setters for ryml were also removed: ryml::get_memory_resource()/ryml::set_memory_resource().
Here's an example of the required changes in client code. First the old client code (from the quickstart):
struct PerTreeMemoryExample : public ryml::MemoryResource
{
void *allocate(size_t len, void * hint) override;
void free(void *mem, size_t len) override;
};
PerTreeMemoryExample mrp;
PerTreeMemoryExample mr1;
PerTreeMemoryExample mr2;
ryml::Parser parser = {ryml::Allocator(&mrp)};
ryml::Tree tree1 = {ryml::Allocator(&mr1)};
ryml::Tree tree2 = {ryml::Allocator(&mr2)};
Should now be rewritten to:
struct PerTreeMemoryExample
{
    ryml::Callbacks callbacks() const; // helper to create the callbacks
};
PerTreeMemoryExample mrp;
PerTreeMemoryExample mr1;
PerTreeMemoryExample mr2;
ryml::Parser parser = {mrp.callbacks()};
ryml::Tree tree1 = {mr1.callbacks()};
ryml::Tree tree2 = {mr2.callbacks()};
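The callbacks() helper above is only declared, so here is a rough sketch of how such a helper might be filled in. To stay self-contained it does not include ryml: sketch::Callbacks is a hypothetical stand-in for ryml::Callbacks, and its member names and function-pointer signatures are assumptions that should be checked against <c4/yml/common.hpp>. The helper is also non-const here, purely to avoid a cast.

```cpp
#include <cstddef>
#include <cstdlib>

namespace sketch {
// Hypothetical stand-in for ryml::Callbacks; NOT the real type.
// It holds an opaque user pointer plus plain function pointers.
struct Callbacks
{
    void *m_user_data;
    void *(*m_allocate)(std::size_t len, void *hint, void *user_data);
    void (*m_free)(void *mem, std::size_t len, void *user_data);
};
} // namespace sketch

struct PerTreeMemoryExample
{
    std::size_t num_allocs = 0;
    std::size_t num_frees = 0;

    void *allocate(std::size_t len, void * /*hint*/)
    {
        ++num_allocs;
        return std::malloc(len);
    }
    void free(void *mem, std::size_t /*len*/)
    {
        ++num_frees;
        std::free(mem);
    }

    // Builds the callbacks: `this` travels as user data, and captureless
    // lambdas (convertible to function pointers) forward to the members.
    sketch::Callbacks callbacks()
    {
        sketch::Callbacks cb = {};
        cb.m_user_data = this;
        cb.m_allocate = [](std::size_t len, void *hint, void *data) -> void* {
            return static_cast<PerTreeMemoryExample*>(data)->allocate(len, hint);
        };
        cb.m_free = [](void *mem, std::size_t len, void *data) {
            static_cast<PerTreeMemoryExample*>(data)->free(mem, len);
        };
        return cb;
    }
};
```

The point of this shape is that no base class or virtual call is involved: each tree or parser only stores a small value type, and the per-object state is reached through the user-data pointer.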
New features
- Add amalgamation into a single header file (PR #172):
- The amalgamated header will be available together with the deliverables from each release.
- To generate the amalgamated header:
$ python tools/amalgamate.py ryml_all.hpp
- To use the amalgamated header:
- Include at will in any header of your project.
- In one - and only one - of your project source files, #define RYML_SINGLE_HDR_DEFINE_NOW and then #include <ryml_all.hpp>. This will enable the function and class definitions in the header file. For example, here's a sample program:
#include <iostream>
#define RYML_SINGLE_HDR_DEFINE_NOW // do this before the include
#include <ryml_all.hpp>
int main()
{
    auto tree = ryml::parse("{foo: bar}");
    std::cout << tree["foo"].val() << "\n";
}
- Add Tree::change_type() and NodeRef::change_type() (PR #171):
// clears a node and sets its type to a different type (one of `VAL`, `SEQ`, `MAP`):
Tree t = parse("{keyval0: val0, keyval1: val1, keyval2: val2}");
t[0].change_type(VAL);
t[1].change_type(MAP);
t[2].change_type(SEQ);
Tree expected = parse("{keyval0: val0, keyval1: {}, keyval2: []}");
assert(emitrs<std::string>(t) == emitrs<std::string>(expected));
- Add support for compilation with emscripten (WebAssembly+javascript) (PR #176).
Fixes
- Take block literal indentation as relative to current indentation level, rather than as an absolute indentation level (PR #178):
foo:
  - |
   child0
  - |2
    child2 # indentation is 4, not 2
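The arithmetic behind this fix can be sketched in a few lines: the explicit indentation indicator is added to the current indentation level instead of being used on its own. The function names below are hypothetical and this is not the parser's code, only an illustration of the rule.

```cpp
#include <cstddef>
#include <string>

// Column at which block scalar content starts when an explicit indentation
// indicator is given: the indicator is relative to the current level.
std::size_t content_indentation(std::size_t current_level, std::size_t indicator)
{
    return current_level + indicator;
}

// Removes the content indentation from one line of the block scalar.
// Lines no longer than the indentation (e.g. blank lines) become empty.
std::string strip_indentation(const std::string &line, std::size_t indentation)
{
    if(line.size() <= indentation)
        return std::string();
    return line.substr(indentation);
}
```

For the example above, the sequence sits at level 2 and the indicator is 2, so content_indentation(2, 2) gives 4, and stripping four columns from the content line leaves the text starting at child2.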
- Fix parsing when seq member maps start without a key (PR #178):
# previously this resulted in a parse error
- - : empty key
- - : another empty key
- Prefer passing substr and csubstr by value instead of const reference (PR #171).
- Fix #173: add alias target ryml::ryml (PR #174).
- Speed up compilation of tests by removing linking with yaml-cpp and libyaml (PR #177).
- Fix c4core#53: cmake install targets were missing call to export() (PR #179).
- Add missing export to Tree (PR #181).
Thanks
Release 0.2.3
This release is focused on bug fixes and compliance with the YAML test suite.
New features
- Add support for CPU architectures aarch64, ppc64le, s390x.
- Update c4core to 0.1.7
- Tree and NodeRef: add document getters doc() and docref():
Tree tree = parse(R"(---
doc0
---
doc1
)");
NodeRef stream = tree.rootref();
assert(stream.is_stream());
// tree.doc(i): get the index of the i-th doc node.
// Equivalent to tree.child(tree.root_id(), i)
assert(tree.doc(0) == 1u);
assert(tree.doc(1) == 2u);
// tree.docref(i), same as above, return NodeRef
assert(tree.docref(0).val() == "doc0");
assert(tree.docref(1).val() == "doc1");
// stream.doc(i), same as above, given NodeRef
assert(stream.doc(0).val() == "doc0");
assert(stream.doc(1).val() == "doc1");
Fixes
- Fix compilation with C4CORE_NO_FAST_FLOAT (PR #163)
Flow maps
- Fix parse of multiline plain scalars inside flow maps (PR #161):
# test case UT92
# all parsed as "matches %": 20
- { matches % : 20 }
- { matches %: 20 }
- { matches %: 20 }
Tags
- Fix parsing of tags followed by comments in sequences (PR #161):
# test case 735Y
- !!map # Block collection
  foo : bar
Quoted scalars
- Fix filtering of tab characters in quoted scalars (PR #161):
--- # test case 5GBF
"Empty line
<TAB>
as a line feed"
# now correctly parsed as "Empty line\nas a line feed"
--- # test case PRH3
' 1st non-empty

<SPC>2nd non-empty<SPC>
<TAB>3rd non-empty '
# now correctly parsed as " 1st non-empty\n2nd non-empty 3rd non-empty "
- Fix filtering of backslash characters in double-quoted scalars (PR #161):
# test cases NP9H, Q8AD
"folded<SPC>
to a space,<TAB>
<SPC>
to a line feed, or <TAB>\
 \ <TAB>non-content"
# now correctly parsed as "folded to a space,\nto a line feed, or \t \tnon-content"
- Ensure filtering of multiline quoted scalars (PR #161):
# all scalars now correctly parsed as "quoted string",
# both for double and single quotes
---
"quoted string"
---
"quoted string"
---
- "quoted string"
---
- "quoted string"
---
"quoted string": "quoted string"
---
"quoted string": "quoted string"
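The filtering applied to multiline quoted scalars follows the YAML line-folding rule, which can be sketched as below. This is a simplified illustration and not ryml's filter: the function name fold_quoted_lines is hypothetical, the input is assumed to be the scalar already split into lines with the quotes removed, and escape sequences in double-quoted scalars are deliberately not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

static bool is_blank(char c) { return c == ' ' || c == '\t'; }

// Folds the lines of a multiline quoted scalar into a single string:
// - whitespace around each line break is dropped,
// - a lone line break becomes a single space,
// - each empty (or whitespace-only) line in between becomes a '\n'.
std::string fold_quoted_lines(const std::vector<std::string> &lines)
{
    std::string out;
    std::size_t empty_lines = 0;
    for(std::size_t i = 0; i < lines.size(); ++i)
    {
        const bool first = (i == 0);
        const bool last = (i + 1 == lines.size());
        std::size_t b = 0, e = lines[i].size();
        // leading whitespace is kept only on the first line
        if(!first) while(b < e && is_blank(lines[i][b])) ++b;
        // trailing whitespace is kept only on the last line
        if(!last) while(e > b && is_blank(lines[i][e - 1])) --e;
        if(!first && !last && b == e) { ++empty_lines; continue; }
        if(!first)
        {
            if(empty_lines) out.append(empty_lines, '\n');
            else out += ' ';
            empty_lines = 0;
        }
        out.append(lines[i], b, e - b);
    }
    return out;
}
```

Fed the lines of test case 5GBF ("Empty line", a line holding only a tab, "as a line feed"), this yields "Empty line\nas a line feed", and a "quoted" / "string" pair split across two lines folds to "quoted string".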
Block scalars
- Ensure no newlines are added when emitting block scalars (PR #161)
- Fix parsing of block spec with both chomping and indentation: chomping may come before or after the indentation (PR #161):
# the block scalar specs below now have the same effect.
# test cases: D83L, P2AD
- |2-
  explicit indent and chomp
- |-2
  chomp and explicit indent
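Accepting the chomping and indentation indicators in either order amounts to a small order-independent scan of the block scalar spec. The sketch below illustrates that idea only; BlockHeader and parse_block_header are hypothetical names, not part of ryml, and the 'c' value used for the default clip chomping is an arbitrary choice of this example.

```cpp
#include <string>

struct BlockHeader
{
    char style;      // '|' (literal) or '>' (folded)
    char chomp;      // '-' (strip), '+' (keep), or 'c' (clip, the default)
    int indentation; // explicit indicator 1..9, or 0 when absent
};

// Parses specs such as "|", ">+", "|2-" or "|-2". Each indicator may appear
// at most once, in any order. Returns false on malformed input, in which
// case `out` is left untouched.
bool parse_block_header(const std::string &spec, BlockHeader &out)
{
    if(spec.empty() || (spec[0] != '|' && spec[0] != '>'))
        return false;
    BlockHeader h = {spec[0], 'c', 0};
    bool seen_chomp = false, seen_indent = false;
    for(std::string::size_type i = 1; i < spec.size(); ++i)
    {
        const char c = spec[i];
        if((c == '-' || c == '+') && !seen_chomp)
        {
            h.chomp = c;
            seen_chomp = true;
        }
        else if(c >= '1' && c <= '9' && !seen_indent)
        {
            h.indentation = c - '0';
            seen_indent = true;
        }
        else
        {
            return false; // repeated or unknown indicator
        }
    }
    out = h;
    return true;
}
```

With this scan, "|2-" and "|-2" produce the same header (literal style, strip chomping, indentation 2), which is the equivalence the two test cases exercise.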
- Fix inference of block indentation with leading blank lines (PR #161):
# test cases: 4QFQ, 7T8X - > # child1 # parsed as "\n\n child1" --- # test case DWX9 | literal text # Comment # parsed as "\n\nliteral\n \n\ntext\n"
- Fix parsing of same-indentation block scalars (PR #161):
# test case W4TN # all docs have the same value: "%!PS-Adobe-2.0" --- | %!PS-Adobe-2.0 ... --- > %!PS-Adobe-2.0 ... --- | %!PS-Adobe-2.0 ... --- > %!PS-Adobe-2.0 ... --- | %!PS-Adobe-2.0 --- > %!PS-Adobe-2.0 --- | %!PS-Adobe-2.0 --- > %!PS-Adobe-2.0
- Folded block scalars: fix folding of newlines at the border of indented parts (PR #161):
# test case 6VJK # now correctly parsed as "Sammy Sosa completed another fine season with great stats.\n\n 63 Home Runs\n 0.288 Batting Average\n\nWhat a year!\n" > Sammy Sosa completed another fine season with great stats. 63 Home Runs 0.288 Batting Average What a year! --- # test case MJS9 # now correctly parsed as "foo \n\n \t bar\n\nbaz\n" > foo<SPC> <SPC> <SPC><TAB><SPC>bar baz
- Folded block scalars: fix folding of newlines when the indented part is at the beginning of the scalar (PR #161):
# test case F6MC a: >2 more indented regular # parsed as a: " more indented\nregular\n" b: >2 more indented regular # parsed as b: "\n\n more indented\nregular\n"
Plain scalars
- Fix parsing of whitespace within plain scalars (PR #161):
--- # test case NB6Z key: value with tabs tabs foo bar baz # is now correctly parsed as "value with\ntabs tabs\nfoo\nbar baz" --- # test case 9YRD, EX5H (trailing whitespace) a b c d e # is now correctly parsed as "a b c d\ne"
- Fix parsing of unindented plain scalars at the root level scope (PR #161):
--- # this parsed Bare scalar is indented # was correctly parsed as "Bare scalar is indented" --- # but this failed to parse successfully: Bare scalar is not indented # is now correctly parsed as "Bare scalar is not indented" --- # test case NB6Z value with tabs tabs foo bar baz # now correctly parsed as "value with\ntabs tabs\nfoo\nbar baz" --- --- # test cases EXG3, 82AN ---word1 word2 # now correctly parsed as "---word1 word2"
- Fix parsing of comments within plain scalars:
# test case 7TMG
--- # now correctly parsed as "word1"
word1
# comment
--- # now correctly parsed as [word1, word2]
[ word1
# comment
, word2]
Python API
- Add missing node predicates in SWIG API definition (PR #166):
  - is_anchor_or_ref()
  - is_key_quoted()
  - is_val_quoted()
  - is_quoted()
Thanks
--- @mbs-c
--- @simu
--- @QuellaZhang
Release 0.2.2
Yank python package 0.2.1, which was accidentally created while iterating on the PyPI submission from the GitHub action. This release does not add any changes, and is functionally the same as 0.2.1.