arm64: valgrind false positives in checkMaxBetweenExclusiveImpl with clang 21+ (csel definedness pessimization) #39

Open
opened 2026-06-12 20:18:26 +00:00 by andrew (Owner) · 0 comments

The 28 `conflict_set_blackbox_valgrind_*` failures on arm64 in CI are valgrind/memcheck false positives triggered by clang 21+'s aarch64 codegen, not a bug in this codebase.

## Evidence

Same source, same corpus batch, valgrind 3.26:

| Build | Valgrind result |
|---|---|
| clang 21.1.8 / trunk 22.1.7, aarch64, -O3 | uninit-value reports in `checkMaxBetweenExclusiveImpl<false>(Node3*)` |
| clang 20.1.8 -O3, clang 21 -O1, gcc -O3, any x86-64 build | clean |
| `--expensive-definedness-checks=yes` | still reports |

No functional misbehavior in any configuration (driver self-checks pass).

## Why the code and codegen are both correct

The Node3 scan deliberately reads all `kMaxNodes` index slots (slots >= `numChildren` are intentionally undefined; see the `VALGRIND_MAKE_MEM_UNDEFINED` annotations) and masks off the invalid lanes with `mask &= (1 << numChildren) - 1` before branching. clang 21 computes the per-lane `inBounds` results through the NZCV flags with `csel`/`cinc`, then masks with `bics` and branches, which is semantically identical to the source:

```asm
cmp  w8, w12             ; inBounds(index[2]) -> flags (undefined if slot uninit)
csel w8, w8, wzr, cc     ; lane bit: 4 or 0
orr  w8, w8, w10
bics w8, w8, w9          ; mask &= (1 << numChildren) - 1, sets Z
b.eq ...                 ; if (!mask): flagged by memcheck
```
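For readers who have not seen the Node3 code, the source-level pattern described above looks roughly like the sketch below. It is a hypothetical reconstruction, not the project's code: `Node3Sketch`, `inBoundsSketch`, and `firstChildBetweenExclusive` are invented names, and the predicate "strictly between `begin` and `end`" is an assumption standing in for the real `inBounds`. The point is the shape: every slot is evaluated unconditionally, and validity is enforced only by the mask applied before the branch and the `ctz`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the project's Node3; field names are assumptions.
constexpr int kMaxNodes = 3;

struct Node3Sketch {
  int numChildren;           // number of valid slots, 0..kMaxNodes
  uint8_t index[kMaxNodes];  // slots >= numChildren hold stale bytes
};

// Hypothetical predicate standing in for inBounds(): is c strictly inside (begin, end)?
inline bool inBoundsSketch(int c, int begin, int end) {
  return c > begin && c < end;
}

// Returns the slot of the first valid child strictly between begin and end, or -1.
inline int firstChildBetweenExclusive(const Node3Sketch &n, int begin, int end) {
  unsigned mask = 0;
  // Evaluate every slot unconditionally, including invalid ones. In the real
  // code those bytes are marked with VALGRIND_MAKE_MEM_UNDEFINED.
  for (int i = 0; i < kMaxNodes; ++i) {
    mask |= unsigned(inBoundsSketch(n.index[i], begin, end)) << i;
  }
  // Discard lanes at or beyond numChildren before branching on the result.
  mask &= (1u << n.numChildren) - 1;
  if (!mask) return -1;
  return __builtin_ctz(mask);  // GCC/Clang builtin: index of the lowest set lane
}
```

With `numChildren == 1` and stale bytes in slots 1 and 2 that happen to be in bounds, the mask still ends up zero, so the stale lanes never affect the result.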

Memcheck models a `csel` with an undefined condition as a fully undefined result, even though both arms (e.g. 0 and 4) agree on most bits. Once the `orr` merges that smear into the mask, it also taints the low, valid-lane bits (e.g. the lane-0 bit). The `bics` can only clear, and thereby define, the high bits, so the surviving low bits, which are actually well-defined, stay tainted. As a result, both the branch and the `ctz`-derived child pointer get reported. A bit-precise select model (`vbits = vx | vy | (x ^ y)`: a result bit is undefined only if either arm's bit is undefined or the two arms differ there) would produce no report. clang 20, gcc, and x86-64 builds keep lane validity in the data domain (branches / movemask), which memcheck tracks exactly; hence the report is specific to clang 21+ on arm64.
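To make the two select models concrete, here is a toy shadow-bit simulation of the `csel`/`orr`/`bics`/`b.eq` sequence. It is an illustrative sketch, not Valgrind code. Assumptions: a set bit in `v` means "undefined" (as in memcheck's V bits), `orShadow`/`bicShadow` approximate memcheck's refinement of AND/OR (a defined 1 in either OR input, or a defined 0 reaching the AND, fixes the result bit), and `zeroTestDefined` is only a rough approximation of how memcheck decides `x == 0`. As a simplification, lanes 1 and 2 both go through a `csel` here, and `numChildren == 1` with lane 0 out of bounds.

```cpp
#include <cstdint>

// Toy shadow value: a set bit in `v` means that bit of `val` is undefined.
struct Shadowed {
  uint32_t val;
  uint32_t v;
};

// Pessimistic select (the behaviour described above): an undefined
// condition makes every bit of the result undefined.
inline Shadowed cselPessimistic(bool cond, bool condUndefined, Shadowed x, Shadowed y) {
  Shadowed r = cond ? x : y;
  if (condUndefined) r.v = ~0u;
  return r;
}

// Bit-precise select: vbits = vx | vy | (x ^ y) when the condition is undefined.
inline Shadowed cselPrecise(bool cond, bool condUndefined, Shadowed x, Shadowed y) {
  Shadowed r = cond ? x : y;
  if (condUndefined) r.v = x.v | y.v | (x.val ^ y.val);
  return r;
}

// OR: a result bit is defined if both inputs are defined there,
// or either input has a defined 1 there.
inline Shadowed orShadow(Shadowed a, Shadowed b) {
  uint32_t definedOnes = (a.val & ~a.v) | (b.val & ~b.v);
  return {a.val | b.val, (a.v | b.v) & ~definedOnes};
}

// BIC (a & ~b): a result bit is defined if both inputs are defined there,
// or a has a defined 0, or b has a defined 1.
inline Shadowed bicShadow(Shadowed a, Shadowed b) {
  uint32_t forcedZero = (~a.val & ~a.v) | (b.val & ~b.v);
  return {a.val & ~b.val, (a.v | b.v) & ~forcedZero};
}

// Can `if (!x)` be decided? Yes if x is fully defined, or some defined bit is 1.
inline bool zeroTestDefined(Shadowed x) {
  return x.v == 0 || (x.val & ~x.v) != 0;
}

// Replays the mask computation for numChildren == 1: lane 0 is a defined 0,
// lanes 1 and 2 compare stale (undefined) index bytes.
template <class Select>
Shadowed replayMask(Select csel) {
  const Shadowed zero{0, 0};
  Shadowed lane0 = zero;
  Shadowed lane1 = csel(false, true, Shadowed{2, 0}, zero);
  Shadowed lane2 = csel(false, true, Shadowed{4, 0}, zero);
  Shadowed mask = orShadow(orShadow(lane0, lane1), lane2);
  const Shadowed invalidLanes{~1u, 0};  // ~((1 << numChildren) - 1), fully defined
  return bicShadow(mask, invalidLanes);
}
```

Under the pessimistic model, `replayMask(cselPessimistic)` leaves a zero mask whose bit 0 is still undefined, so the zero test cannot be decided and the branch gets reported. Under the precise model, `replayMask(cselPrecise)` yields a fully defined zero. Because the final value is 0 and has no defined 1 bit, even the more permissive zero test cannot help, which is consistent with `--expensive-definedness-checks=yes` still reporting.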

## Action item

File a valgrind enhancement request (https://bugs.kde.org, product valgrind, component memcheck) asking for precise definedness of ITE/`csel` with an undefined condition, using the disassembly above as the testcase.

## Workaround

Valgrind suppression for these reports, referencing this issue.


Reference: weaselab/conflict-set#39