
xml_woarchive << std::string("...\"...") generates malformed output #229

Open
@hajokirchhoff

Description


Serializing a string to an xml_woarchive produces an invalid archive if a character that must be escaped (such as ") falls near a multiple-of-32 boundary.

Example:
xml_woarchive << nvp("err", string("01234567890123456789012345678\"-error-"));
produces something like this:

... xml serialization header ...
<err>01234567890123456789012345678&qu&quot;-error-</err>

Note the truncated &qu immediately before the complete &quot; in the output.

Here is a minimal sample:


#include <iostream>
#include <sstream>
#include <string>
#include <boost/archive/xml_woarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/string.hpp>

int main()
{
    // 29 characters followed by '"', whose escape "&quot;" crosses the 32-character boundary
    std::string instring("01234567890123456789012345678\"-here-is-the-error");

    std::wostringstream outstream;
    {
        // The archive writes its closing tag when it is destroyed at the end of this scope
        boost::archive::xml_woarchive ar(outstream);
        ar << boost::serialization::make_nvp("err", instring);
    }
    auto result = outstream.str();
    std::wcout << result;

    /*
    result will contain this:

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <!DOCTYPE boost_serialization>
    <boost_serialization signature="serialization::archive" version="18">
    <err>01234567890123456789012345678&qu&quot;-here-is-the-error</err>
    </boost_serialization>

    The bug is in <err>...678&qu&quot;...
    A partial &quot; is written to the output, then a full &quot; is written again.
    */
}

This is caused by wrapping the xml_escape<char const*> iterator within the wchar_from_mb<...> iterator.

Here, in boost::archive::save_iterator<char const *>(std::basic_ostream<wchar_t,std::char_traits<wchar_t>> & os, const char * begin, const char * end), line 58 of
x:...\vcpkg\buildtrees\boost-serialization\src\ost-1.75.0-e02749c5e1.clean\include\boost\archive\impl\xml_woarchive_impl.ipp(58):

   std::copy(
      xmbtows(begin),
      xmbtows(end),
      boost..ostream_iterator(os)
   );

This std::copy call copies the characters of the string to the archive's output stream. Its iterator arguments are passed by value.

Here is my first analysis of the cause:

The wchar_from_mb iterator has an internal buffer of 32 characters, which it fills from the xml_escape iterator. This works fine unless an escape sequence returned by the xml_escape iterator does not fit in the space left in the buffer. In that case the wchar_from_mb iterator fills its buffer, but the last escape sequence is truncated, as seen in the example: 01234...8&qu. The buffer is full, so the escape sequence is incomplete.

But because std::copy takes the iterators by value, the iterator holding the incomplete buffer is discarded. The next 32 characters then start again with a full &quot;.
