Fix fuzzer to truncate inputs to 32-bit integers by kvpanch · Pull Request #374 · paritytech/polkavm

kvpanch · 2026-03-30T13:44:16Z

- The fuzzer identified a divergence between the interpreter and the recompiler when the
    following code was executed:

    ```
    // All registers are zeroed

    1 fallthrough
    2 shift_logical_left_imm_alt_64(A0, A0, 0x8F8F030F)
    3 sub_32(SP, T0, A2)
    4 branch_less_signed(A0, A0, target=0)
    5 memset()
    6 branch_less_signed(S1, A0, target=0)
    7 store_imm_indirect_u8(RA, 0, 0)
    ```

    When `memset` is executed, A0 is `0xFFFFFFFF8F8F030F`, which the interpreter
    truncated to `0x000000008F8F030F`, while the recompiler preserved the full value.

    As a result, the branch at instruction 6 is taken by the interpreter,
    which then runs out of gas, while the recompiler goes on to execute the store,
    which traps.

    To fix it, save the `dst` register before the 32-bit truncation, then restore the
    register to the original pointer plus the number of executed iterations. This prevents
    clobbering `A0`, which may be used later.

- The other problem was inconsistent handling of `count`:
  the recompiler truncates that register to a 32-bit value, while the interpreter
  keeps it as a 64-bit value.

  To fix it, keep the `count` register as a 64-bit integer in the recompiler.
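The effect on instruction 6 can be reproduced with plain integer arithmetic. The sketch below is a minimal model and not PolkaVM code: `branch_less_signed` is a hypothetical helper that assumes the branch is taken when the first operand is less than the second under a signed 64-bit comparison, and `truncate_to_32` models the truncation described above.

```rust
/// Hypothetical model of a signed 64-bit "less than" branch condition.
/// Assumption: the branch is taken when `lhs < rhs` as signed integers.
fn branch_less_signed(lhs: u64, rhs: u64) -> bool {
    (lhs as i64) < (rhs as i64)
}

/// Models truncating a register to its low 32 bits (zero-extended back to 64).
fn truncate_to_32(value: u64) -> u64 {
    value as u32 as u64
}

/// Returns `(taken with truncated A0, taken with preserved A0)` for
/// `branch_less_signed(S1, A0, ...)`.
fn instruction_6_outcomes(s1: u64, a0: u64) -> (bool, bool) {
    (
        branch_less_signed(s1, truncate_to_32(a0)),
        branch_less_signed(s1, a0),
    )
}
```

With `S1 = 0` and `A0 = 0xFFFFFFFF8F8F030F`, the truncated value is a positive number, so the branch is taken, while the preserved value is negative as a signed integer, so execution falls through to the store.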

koute

A few notes:

* `memset` is an experimental instruction and isn't currently enabled "in production".

* Truncating `a0` isn't necessarily a problem. All loads and stores in PVM are defined to ignore the upper 32 bits, and the address space is always 32-bit (even on a 64-bit target). So `memset` should also ignore the upper bits. The remaining question, however, is: do we preserve the full value of the destination pointer (with the upper bits intact), or do we truncate it? Either one is fine (whichever one is faster on the recompiler backend is preferable).

* We have to be careful here so that the program can't memset any of the recompiler's internal structures (which live in the upper 32 bits of the VM sandbox's address space), which is _probably_ why I was truncating the pointer. The truncation of `count` in the recompiler _might_ have been a typo, but I don't remember at this point. We should probably add a test which deliberately tries to corrupt the VM's internal structures with `memset`.
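One way to picture the two requirements together (addresses ignore the upper 32 bits, while the register may keep them) is the toy model below. It is only a sketch under stated assumptions, not the interpreter's or the recompiler's code: the register roles (`a0` = destination, `a1` = fill byte, `a2` = count), the `Memory` type, the wrap-around of the 32-bit address, and the rule that `a2` ends up holding the remaining count are all assumptions made for illustration.

```rust
/// Toy guest memory: one writable region starting at `base` in a 32-bit address space.
struct Memory {
    base: u32,
    bytes: Vec<u8>,
}

impl Memory {
    /// Writes one byte; returns false (a trap, in this model) if the address is unmapped.
    fn store_u8(&mut self, address: u32, value: u8) -> bool {
        let offset = address.wrapping_sub(self.base) as usize;
        match self.bytes.get_mut(offset) {
            Some(slot) => {
                *slot = value;
                true
            }
            None => false,
        }
    }
}

/// Hypothetical `memset` model. Returns `(new a0, new a2, trapped)`.
fn memset_model(memory: &mut Memory, a0: u64, a1: u64, a2: u64) -> (u64, u64, bool) {
    let mut written: u64 = 0;
    while written < a2 {
        // Only the low 32 bits of the pointer select the address, so nothing
        // above the 4 GiB boundary can ever be targeted.
        let address = (a0 as u32).wrapping_add(written as u32);
        if !memory.store_u8(address, a1 as u8) {
            return (a0.wrapping_add(written), a2 - written, true);
        }
        written += 1;
    }
    // The register keeps its upper bits: original pointer + bytes written.
    (a0.wrapping_add(written), 0, false)
}
```

In this model a pointer such as `0xFFFFFFFF00010002` writes to address `0x00010002`, and the register afterwards still carries the upper bits. A 64-bit `count` simply runs until the first unmapped address and traps.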

koute · 2026-04-01T00:50:53Z

+            asm::load_imm(Reg::A0, 0xffff0000),
+            asm::load_imm(Reg::A1, 0),
+            asm::mul_64(Reg::A0, Reg::A0, Reg::A0),
+            asm::load_imm(Reg::A2, 0),
+            asm::memset(),
+            asm::ret(),


This is unnecessarily roundabout.

No need to set regs to zero, as they're zero by default.

No need for `mul_64` because `load_imm64` exists; however, it's even better to just `set_reg` it directly on the instance.

koute · 2026-04-01T00:53:02Z

+    instance.set_next_program_counter(ProgramCounter(0));
+    assert!(matches!(instance.run().unwrap(), InterruptKind::Finished));
+    let a0 = instance.reg(Reg::A0);
+    assert_eq!(a0 >> 32 != 0 || a0 == 0, true, "A0 was unexpectedly truncated: {:#018x}", a0);


This should check the exact value.
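To illustrate why the exact value matters, the sketch below compares the condition from the diff with an exact comparison. `loose_check` mirrors the predicate in the diff; `exact_check` and the sample values are assumptions chosen for illustration (the expected value `0x0000000100000000` is what `0xffffffffffff0000 * 0xffffffffffff0000` wraps to in 64 bits).

```rust
/// Mirrors the condition used in the diff: passes if the upper half is
/// nonzero, or if the whole register is zero.
fn loose_check(a0: u64) -> bool {
    a0 >> 32 != 0 || a0 == 0
}

/// Hypothetical stricter check: the register must hold exactly `expected`.
fn exact_check(a0: u64, expected: u64) -> bool {
    a0 == expected
}

/// Zero-extending truncation to 32 bits, as a buggy backend might apply it.
fn truncate_to_32(value: u64) -> u64 {
    value as u32 as u64
}
```

Note that `truncate_to_32(0x0000000100000000)` is `0`, which `loose_check` accepts, so for that input the loose predicate cannot tell a truncated register from a preserved one. It also accepts any value with corrupted low bits, such as `0x0000000100000001`.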

koute · 2026-04-01T00:55:28Z

+        if reg_size == RegSize::R32 {
+            self.asm.push(mov(RegSize::R32, count, count));
+        }



You've added this but there's no test.

Sure, I can add a test if needed, but the change simply avoids the zero-extension in 64-bit mode, for which a test has been added.

kvpanch · 2026-04-01T22:48:52Z

> A few notes:
>
> * `memset` is an experimental instruction and isn't currently enabled "in production".
>
> * Truncating `a0` isn't necessarily a problem. All loads and stores in PVM are defined to ignore the upper 32 bits, and the address space is always 32-bit (even on a 64-bit target). So `memset` should also ignore the upper bits. The remaining question, however, is: do we preserve the full value of the destination pointer (with the upper bits intact), or do we truncate it? Either one is fine (whichever one is faster on the recompiler backend is preferable).
>
> * We have to be careful here so that the program can't memset any of the recompiler's internal structures (which live in the upper 32 bits of the VM sandbox's address space), which is _probably_ why I was truncating the pointer. The truncation of `count` in the recompiler _might_ have been a typo, but I don't remember at this point. We should probably add a test which deliberately tries to corrupt the VM's internal structures with `memset`.

Thanks for the explanation. The `count` change does make sense to me if you're trying to limit the sandbox to the first half of the address space. I guess I'll then need to find a way to limit the fuzzer and revisit `memset` to make sure it won't try to overwrite the second half when ptr + count > 2^32.
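The `ptr + count > 2^32` condition can be written down as a small overflow-safe check. This is a sketch only: `crosses_4gib` is a hypothetical helper, and it assumes the pointer is first reduced to its low 32 bits.

```rust
/// Returns true if writing `count` bytes starting at the low 32 bits of `ptr`
/// would run past the end of a 32-bit address space.
fn crosses_4gib(ptr: u64, count: u64) -> bool {
    const LIMIT: u64 = 1 << 32;
    let start = ptr as u32 as u64;
    // `checked_add` guards against overflow when `count` is close to `u64::MAX`.
    match start.checked_add(count) {
        Some(end) => end > LIMIT,
        None => true,
    }
}
```

A range that ends exactly at 2^32 does not cross the boundary here, since the last byte written is at `0xFFFFFFFF`.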

koute · 2026-04-03T08:23:00Z

> I guess then I'll need to find a way to limit fuzzer

That's.... not a proper fix? (:

The whole point of the fuzzer is to find divergences between the interpreter and the recompiler, that is: the same program should behave exactly the same regardless of which backend is used.

So if a user (in their program) can pass a 64-bit value to memset then the fuzzer should be able to exercise it!

Or in other words: for any possible program blob (except the only exception with validation when a Module is created, which I've mentioned) the backends should give exactly the same result, and the fuzzer should be able to generate any possible input.
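As a rough illustration of that idea, here is a toy differential check. Nothing in it is the real fuzzer: `Regs`, `BUDGET`, and the two backend functions are hypothetical stand-ins, where one backend keeps `count` as a 64-bit value and the other truncates it to 32 bits.

```rust
/// Toy register state: destination pointer and remaining count.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Regs {
    a0: u64,
    a2: u64,
}

/// Assumed per-run iteration budget, standing in for gas.
const BUDGET: u64 = 16;

/// Shared toy semantics: process up to `BUDGET` bytes of `count`.
fn step(a0: u64, count: u64) -> Regs {
    let done = count.min(BUDGET);
    Regs { a0: a0.wrapping_add(done), a2: count - done }
}

/// Toy backend that keeps `count` as a 64-bit value.
fn backend_full(regs: &Regs) -> Regs {
    step(regs.a0, regs.a2)
}

/// Toy backend that truncates `count` to 32 bits first.
fn backend_truncating(regs: &Regs) -> Regs {
    step(regs.a0, regs.a2 as u32 as u64)
}

/// Returns the index of the first input on which the two backends disagree.
fn first_divergence<I, O, F, G>(inputs: &[I], first: F, second: G) -> Option<usize>
where
    O: PartialEq,
    F: Fn(&I) -> O,
    G: Fn(&I) -> O,
{
    inputs.iter().position(|input| first(input) != second(input))
}
```

If the inputs are restricted to counts that fit in 32 bits, `first_divergence` reports nothing and the mismatch stays hidden; it only shows up once a count such as `1 << 32` is allowed through.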

kvpanch · 2026-04-03T12:54:22Z

> > I guess then I'll need to find a way to limit fuzzer
>
> That's.... not a proper fix? (:
>
> The whole point of the fuzzer is to find divergences between the interpreter and the recompiler, that is: the same program should behave exactly the same regardless of which backend is used.
>
> So if a user (in their program) can pass a 64-bit value to memset then the fuzzer should be able to exercise it!
>
> Or in other words: for any possible program blob (except the only exception with validation when a Module is created, which I've mentioned) the backends should give exactly the same result, and the fuzzer should be able to generate any possible input.

Hm... thanks. I think I completely misunderstood the recompiler and how it's used in the fuzzer.
Basically, it's not only interpreter vs recompiler; more precisely, it's interpreter vs recompiler + sandbox.
I'll revisit my 2 PRs that touched exception handling.
A question: should I be able to assume that the interpreter handles exceptions correctly (at least as a starting assumption)?

koute · 2026-04-03T13:50:19Z

How it's sandboxed is an implementation detail that's not exposed to the program.

> A question: I should be able to assume that interpreter has correct way to handle exceptions (at least start with this assumption) ?

No; there can be bugs in either one of them. If they diverge one has to look at the behavior of both and see which one makes sense.

kvpanch · 2026-04-17T13:23:48Z

@koute gentle ping

kvpanch · 2026-04-17T16:09:46Z

@copilot resolve the merge conflicts in this pull request

koute · 2026-04-23T09:04:00Z

+            // A0 = 0xffffffffffff0000 * 0xffffffffffff0000 = 0x0000000100000000
+            asm::load_imm(Reg::A0, 0xffff0000),
+            asm::mul_64(Reg::A0, Reg::A0, Reg::A0),
+            // A2 = sign_extend(0xff08bdbd) = 0xffffffffff08bdbd
+            asm::load_imm(Reg::A2, 0xff08bdbd),


Again, this part is unnecessarily complicated because you can just use set_reg to set the desired values directly instead of calculating them.

The whole set_code ideally should just look like this:

builder.set_code(&[asm::memset(), asm::ret()]);

There's no need to involve mul_64, load_imm, etc. and overcomplicate the test because this test is supposed to test memset and memset only.

kvpanch requested a review from koute March 30, 2026 13:44

koute reviewed Apr 1, 2026

kvpanch force-pushed the kvpanch/fix-memset-codegen branch from ffb5e05 to 5589353 Compare April 2, 2026 19:11

kvpanch changed the title ~~Fix memset codegen bugs found by fuzzer~~ Fix fuzzer to truncate inputs to 32-bit integers Apr 2, 2026

kvpanch added 3 commits April 17, 2026 12:12

Fix fuzzer to truncate inputs to 32-bit integers

3b39af5

Revert "Fix fuzzer to truncate inputs to 32-bit integers"

d7f7765

This reverts commit 5589353.

kvpanch force-pushed the kvpanch/fix-memset-codegen branch from 6be8eb4 to 397b0f0 Compare April 17, 2026 16:59

koute reviewed Apr 23, 2026

kvpanch added 2 commits May 4, 2026 10:36

addressed comments

d710683

Merge branch 'master' into kvpanch/fix-memset-codegen

b0c79c0

kvpanch wants to merge 5 commits into master from kvpanch/fix-memset-codegen
