reject non-digit wide code units in uint8/uint16 integer fast path #391

Merged: lemire merged 1 commit into fastfloat:main from sahvx655-wq:int-fast-path-wide-units on Jun 12, 2026

@sahvx655-wq

Contributor

While checking the fixed-width integer paths in parse_int_string, I noticed that the uint8_t and uint16_t base-10 fast paths read code units byte-wise (a four-byte memcpy for uint8_t, read4_to_u32 for uint16_t) and only ever look at the low byte. For wchar_t/char16_t/char32_t, that means a non-digit code point whose low byte falls in 0x30..0x39, say U+2131 through U+2139, is taken for the matching ASCII digit. As a result, from_chars(u"ℱℲℳℴ", v, 10) returns 1234 with std::errc() where it ought to fail. The generic ch_to_digit loop already masks anything above 0xFF and rejects these inputs (the existing emoji tests confirm that for int), so the fast paths quietly disagree with the rest of the library and with std::from_chars semantics.
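To make the low-byte confusion concrete, here is a minimal standalone sketch. It does not use fast_float; the names digit_from_low_byte, digit_from_full_unit, parse_digits, and demo_low_byte_confusion are hypothetical and exist only for this illustration. The first classifier looks at the low byte alone, as a byte-wise read effectively does, while the second compares the whole code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classifier: looks only at the low byte of a code unit,
// which is what a byte-wise read effectively does for wide units.
template <typename UC> int digit_from_low_byte(UC c) {
  const std::uint32_t low = static_cast<std::uint32_t>(c) & 0xFFu;
  return (low >= 0x30u && low <= 0x39u) ? static_cast<int>(low - 0x30u) : -1;
}

// Hypothetical classifier: compares the whole code unit against '0'..'9'.
template <typename UC> int digit_from_full_unit(UC c) {
  const std::uint32_t v = static_cast<std::uint32_t>(c);
  return (v >= 0x30u && v <= 0x39u) ? static_cast<int>(v - 0x30u) : -1;
}

// Accumulates n code units as base-10 digits with the given classifier.
// Returns -1 on the first code unit the classifier rejects.
template <typename UC, typename Classifier>
long parse_digits(const UC *s, std::size_t n, Classifier digit_of) {
  long value = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const int d = digit_of(s[i]);
    if (d < 0) {
      return -1;
    }
    value = value * 10 + d;
  }
  return value;
}

// U+2131..U+2134 have low bytes 0x31..0x34, i.e. ASCII '1'..'4'.
inline void demo_low_byte_confusion() {
  const char16_t text[] = u"\u2131\u2132\u2133\u2134";
  // Low-byte view: the four letterlike symbols read as "1234".
  assert(parse_digits(text, 4, digit_from_low_byte<char16_t>) == 1234);
  // Full-width view: none of them is a digit, so parsing fails.
  assert(parse_digits(text, 4, digit_from_full_unit<char16_t>) == -1);
}
```

Calling demo_low_byte_confusion() exercises both views on the same input; only the full-width comparison rejects it.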

The byte-oriented SWAR is only sound when a code unit is a single byte, so both fast paths are now gated on sizeof(UC) == 1, and wider units fall through to the generic loop that already handles them. Keeping the guard at the fast-path entry leaves the byte assumption and the digit validation in one place rather than re-checking after a truncating read. Left alone, this is a silent input-validation hole for anyone feeding untrusted UTF-16/UTF-32 into 8- or 16-bit integers. I added the wide-unit cases to tests/fast_int.cpp; they parse as valid before the change and are rejected after it.
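The shape of the guard can be sketched roughly as follows. This is not the library's code: parse_u16_sketch is a hypothetical function, the byte-wise branch uses a plain loop over four copied bytes instead of real SWAR, and unlike from_chars it rejects the whole input on any non-digit rather than stopping at it. The only point is that the byte-wise branch is entered solely when sizeof(UC) == 1, and every other code unit type goes through the full-width loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: parse [first, last) as a base-10 uint16_t.
// Returns false on empty input, any non-digit, or overflow.
template <typename UC>
bool parse_u16_sketch(const UC *first, const UC *last, std::uint16_t &out) {
  const std::size_t len = static_cast<std::size_t>(last - first);
  if (len == 0 || len > 5) {
    return false; // a uint16_t has at most 5 decimal digits
  }
  std::uint32_t value = 0;
  const UC *p = first;

  // Byte-wise fast path, gated on single-byte code units only.
  if (sizeof(UC) == 1 && len >= 4) {
    unsigned char bytes[4];
    std::memcpy(bytes, p, 4);
    for (int i = 0; i < 4; ++i) {
      if (bytes[i] < 0x30 || bytes[i] > 0x39) {
        return false;
      }
      value = value * 10 + static_cast<std::uint32_t>(bytes[i] - 0x30);
    }
    p += 4;
  }

  // Generic loop: compares the whole code unit, so wide non-digits fail.
  for (; p != last; ++p) {
    const std::uint32_t c = static_cast<std::uint32_t>(*p);
    if (c < 0x30u || c > 0x39u) {
      return false;
    }
    value = value * 10 + (c - 0x30u);
  }

  if (value > 0xFFFFu) {
    return false; // does not fit in 16 bits
  }
  out = static_cast<std::uint16_t>(value);
  return true;
}
```

With this gate, a char input such as "1234" still takes the byte-wise branch, while a char16_t input takes the generic loop, so u"1234" parses and the U+2131..U+2134 sequence is rejected.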

@lemire lemire merged commit 0dce102 into fastfloat:main Jun 12, 2026
34 of 35 checks passed