Description
Serializing a string to an xml_woarchive produces an invalid archive if a character that must be escaped (for example a double quote) falls near a "multiple of 32" boundary.
Example:
xml_woarchive << nvp("err", string("01234567890123456789012345678\"-error-"));
produces something like this:
... xml serialization header ...
<err>01234567890123456789012345678&qu&quot;-error-</err>
Note the truncated &qu in front of the complete &quot; in the output.
Here is a minimal sample:
#include <iostream>
#include <sstream>
#include <string>
#include "boost/archive/xml_woarchive.hpp"

int main()
{
    std::string instring("01234567890123456789012345678\"-here-is-the-error");
    std::wostringstream outstream;
    {
        // The archive is finalized when ar goes out of scope.
        boost::archive::xml_woarchive ar(outstream);
        ar & boost::serialization::make_nvp("err", instring);
    }
    auto result = outstream.str();
    /*
    result will contain this:
    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <!DOCTYPE boost_serialization>
    <boost_serialization signature="serialization::archive" version="18">
    <err>01234567890123456789012345678&qu&quot;-here-is-the-error</err>
    </boost_serialization>
    The bug is <err>...678&qu&quot;...
    A partial &quot; is written to the output, then a full &quot; is written again.
    */
}
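To check a result string for this kind of damage without reading it by eye, something like the following sketch could be used. It assumes that the only entity references in the output are the five predefined XML entities; entities_well_formed is a hypothetical helper name, not part of Boost.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if every '&' in xml starts one of the
// five predefined XML entity references, false otherwise.
bool entities_well_formed(const std::wstring& xml)
{
    static const std::wstring names[] = {
        L"&quot;", L"&apos;", L"&amp;", L"&lt;", L"&gt;"
    };
    for (std::size_t i = 0; i < xml.size(); ++i) {
        if (xml[i] != L'&') continue;
        bool matched = false;
        for (const std::wstring& name : names) {
            // compare() clamps the length at the end of xml, so a truncated
            // reference at the very end is reported as a mismatch.
            if (xml.compare(i, name.size(), name) == 0) {
                matched = true;
                break;
            }
        }
        if (!matched) return false;
    }
    return true;
}
```

Applied to the result string of the sample above, this check would be expected to return false because of the stray &qu.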
This is caused by wrapping the xml_escape<char const*> iterator within the wchar_from_mb<...> iterator.
Here:
boost::archive::save_iterator<char const *>(std::basic_ostream<wchar_t,std::char_traits<wchar_t>> & os, const char * begin, const char * end) Line 58
at x:...\vcpkg\buildtrees\boost-serialization\src\ost-1.75.0-e02749c5e1.clean\include\boost\archive\impl\xml_woarchive_impl.ipp(58)
std::copy(
xmbtows(begin),
xmbtows(end),
boost..ostream_iterator(os)
);
This std::copy statement copies the input from the string to the archive. The iterator parameters to this statement are passed by value.
Here is my first analysis as to the cause:
The wchar_from_mb iterator has an internal buffer of 32 characters, which it fills from the xml_escape iterator. This works fine unless the xml_escape iterator returns an escape sequence that needs more space than is left in the 32-character buffer. If that is the case, the wchar_from_mb iterator fills its internal buffer, but the last escape sequence is truncated, as seen in the example: 01234...8&qu. The buffer is full, so the escape sequence is incomplete.
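If this analysis is right, the affected inputs can be predicted with simple position arithmetic. The sketch below assumes that buffer boundaries fall at multiples of 32 characters of the escaped output and that only the five predefined XML entities are produced. Neither assumption has been checked against the Boost sources, and escaped_length and first_straddling_char are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of output characters produced for c.
std::size_t escaped_length(char c)
{
    switch (c) {
        case '"':  return 6; // &quot;
        case '\'': return 6; // &apos;
        case '&':  return 5; // &amp;
        case '<':  return 4; // &lt;
        case '>':  return 4; // &gt;
        default:   return 1;
    }
}

// Returns the index of the first input character whose escape sequence would
// not fit into the space left in the current buffer, or std::string::npos.
std::size_t first_straddling_char(const std::string& in,
                                  std::size_t buf_size = 32)
{
    std::size_t used = 0; // characters already placed in the current buffer
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::size_t len = escaped_length(in[i]);
        if (used + len > buf_size) return i;
        used = (used + len) % buf_size; // a completely full buffer starts over
    }
    return std::string::npos;
}
```

For the string from the minimal sample this returns 29, the index of the double quote: 29 plain characters leave only 3 free slots, which is why only &qu fits.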
But std::copy uses the iterators by value, so the iterator with the incomplete buffer is discarded. The next 32 characters will then start again with a full &quot;.
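The suspected behaviour can be reproduced with a small toy model that does not use Boost at all. This is only a sketch of the symptom under the assumptions above, not the actual iterator code: whenever an escape sequence does not fit into the current buffer, the part that fits is written, and the next buffer starts the sequence from the beginning because the progress inside it was lost. The names escape_char and model_buggy_copy are hypothetical, and buf_size is assumed to be at least 6.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: escaped text for one input character.
std::string escape_char(char c)
{
    switch (c) {
        case '"':  return "&quot;";
        case '\'': return "&apos;";
        case '&':  return "&amp;";
        case '<':  return "&lt;";
        case '>':  return "&gt;";
        default:   return std::string(1, c);
    }
}

// Toy model of the faulty output; buf_size must be at least 6.
std::string model_buggy_copy(const std::string& in, std::size_t buf_size = 32)
{
    std::string out;
    std::size_t room = buf_size; // free space left in the current buffer
    for (char c : in) {
        const std::string esc = escape_char(c);
        if (esc.size() > room) {
            out.append(esc, 0, room); // truncated head of the sequence
            room = buf_size;          // fresh buffer, progress forgotten
        }
        out += esc;                   // the full sequence is written (again)
        room -= esc.size();
        if (room == 0) room = buf_size;
    }
    return out;
}
```

Fed with the string from the minimal sample, the model yields the same &qu&quot; pattern that the archive contains, which is consistent with the analysis but does not prove it.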