Fix fuzzer to truncate inputs to 32-bit integers #374

Open
kvpanch wants to merge 5 commits into master from kvpanch/fix-memset-codegen

@kvpanch
Contributor

@kvpanch kvpanch commented Mar 30, 2026

- The fuzzer identified a divergence between the interpreter and the recompiler
    when the following code was executed:

    ```
    // All registers are zeroed

    1 fallthrough
    2 shift_logical_left_imm_alt_64(A0, A0, 0x8F8F030F)
    3 sub_32(SP, T0, A2)
    4 branch_less_signed(A0, A0, target=0)
    5 memset()
    6 branch_less_signed(S1, A0, target=0)
    7 store_imm_indirect_u8(RA, 0, 0)
    ```

    When `memset` is executed, A0 is `0xFFFFFFFF8F8F030F`, which the interpreter
    truncated to `0x000000008F8F030F`, while the recompiler preserved the full value.

    As a result, the interpreter takes the branch at instruction 6 and
    runs out of gas, while the recompiler falls through to the store,
    which traps.

    To fix it, save the `dst` register before the 32-bit truncation, then restore
    it to the original pointer plus the number of executed iterations. This
    prevents clobbering `A0`, which may be used later.

- The other problem was inconsistent handling of `count`: the
  recompiler truncates that register to a 32-bit value, while the interpreter
  keeps it as a 64-bit value.

  To fix it, keep the `count` register as a 64-bit integer in the recompiler.
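
For illustration, the divergence described above can be modeled in plain Rust without any PolkaVM code. This is only a sketch: the helper names (`truncate_to_u32`, `branch_less_signed_taken`, `dst_after_memset`) are hypothetical, and `dst_after_memset` assumes the fix means "original pointer plus bytes written, with wrapping", as the description suggests.

```rust
/// Models the truncation of a 64-bit register to its lower 32 bits
/// (zero-extended back to 64 bits).
fn truncate_to_u32(reg: u64) -> u64 {
    reg as u32 as u64
}

/// Models the condition of `branch_less_signed(lhs, rhs, ...)`:
/// both registers are compared as signed 64-bit integers.
fn branch_less_signed_taken(lhs: u64, rhs: u64) -> bool {
    (lhs as i64) < (rhs as i64)
}

/// Hypothetical model of the described fix: the destination register ends up
/// as the original (untruncated) pointer plus the number of bytes written.
/// Wrapping arithmetic is an assumption of this sketch.
fn dst_after_memset(original_dst: u64, bytes_written: u64) -> u64 {
    original_dst.wrapping_add(bytes_written)
}
```

With `S1 = 0` and `A0 = 0xFFFFFFFF8F8F030F`, `branch_less_signed_taken(0, A0)` is false because `A0` is negative as a signed value, while `branch_less_signed_taken(0, truncate_to_u32(A0))` is true. That is exactly the split between falling through to the store and looping back to target 0.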

@kvpanch kvpanch requested a review from koute March 30, 2026 13:44
Collaborator

@koute koute left a comment

A few notes:

  • memset is an experimental instruction and isn't currently enabled "in production".
  • Truncating a0 isn't necessarily a problem. All loads and stores in PVM are defined to ignore the upper 32 bits and the address space is always 32-bit (even on a 64-bit target). So memset should also ignore the upper bits. The remaining question, however, is: do we preserve the full value of the destination pointer (with the upper bits intact), or do we truncate it? Either one is fine (whichever one is faster on the recompiler backend is preferable).
  • We have to be careful here so that the program can't memset any of the recompiler's internal structures (which live in the upper 32 bits of the VM sandbox's address space), which is probably why I was truncating the pointer. The truncation of count in the recompiler might have been a typo, but I don't remember at this point. We should probably add a test which deliberately tries to corrupt the VM's internal structures with memset.
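
As a rough illustration of the second and third points, the following sketch models "ignore the upper 32 bits" as a plain mask. The names `effective_address` and `is_guest_offset` are hypothetical, and the layout (guest memory below 2^32, internal structures above) is taken from the comment above rather than from the actual sandbox code.

```rust
/// Size of the guest-visible address space in this model: 2^32 bytes.
const GUEST_SPACE: u64 = 1 << 32;

/// Models how a memory access derives its address from a 64-bit register:
/// only the lower 32 bits are used.
fn effective_address(reg: u64) -> u32 {
    reg as u32
}

/// Returns true if a sandbox-relative offset falls in the guest half,
/// i.e. below the region assumed to hold the recompiler's internal structures.
fn is_guest_offset(offset: u64) -> bool {
    offset < GUEST_SPACE
}
```

Any offset produced by `effective_address` passes `is_guest_offset`, whereas using the raw register value `0xFFFFFFFF8F8F030F` as an offset would not. Whether the register itself keeps its upper bits afterwards is a separate choice, as the comment notes.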

Comment thread crates/polkavm/src/tests.rs Outdated
Comment on lines +4496 to +4501
asm::load_imm(Reg::A0, 0xffff0000),
asm::load_imm(Reg::A1, 0),
asm::mul_64(Reg::A0, Reg::A0, Reg::A0),
asm::load_imm(Reg::A2, 0),
asm::memset(),
asm::ret(),
Collaborator

This is unnecessarily roundabout.

  1. No need to set regs to zero, as they're zero by default.
  2. No need to mul_64 because load_imm64 exists, however it's even better to just set_reg it directly on the instance.

Comment thread crates/polkavm/src/tests.rs Outdated
instance.set_next_program_counter(ProgramCounter(0));
assert!(matches!(instance.run().unwrap(), InterruptKind::Finished));
let a0 = instance.reg(Reg::A0);
assert_eq!(a0 >> 32 != 0 || a0 == 0, true, "A0 was unexpectedly truncated: {:#018x}", a0);
Collaborator

This should check the exact value.
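
To see why an exact comparison is stricter, here is a small standalone sketch. `loose_check` copies the predicate from the assertion under review; `exact_check` and the expected value `0x0000000100000000` are illustrative assumptions, not the values the final test must use.

```rust
/// The predicate from the reviewed assertion: it passes whenever the upper
/// 32 bits are non-zero, or the whole register is zero.
fn loose_check(a0: u64) -> bool {
    a0 >> 32 != 0 || a0 == 0
}

/// An exact comparison against the one value the test expects.
fn exact_check(a0: u64, expected: u64) -> bool {
    a0 == expected
}
```

For example, `loose_check(0xDEADBEEF00000000)` and `loose_check(0)` both pass, although neither equals an expected `0x0000000100000000`; `exact_check` rejects both.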

Comment thread crates/polkavm/src/compiler/amd64.rs Outdated
Comment on lines 1322 to 1325
if reg_size == RegSize::R32 {
self.asm.push(mov(RegSize::R32, count, count));
}

Collaborator

You've added this but there's no test.

Contributor Author

Sure, I can add a test if needed, but the change simply avoids the zero-extension in 64-bit mode, for which a test has been added.

@kvpanch
Contributor Author

kvpanch commented Apr 1, 2026

> A few notes:
>
> * `memset` is an experimental instruction and isn't currently enabled "in production".
>
> * Truncating `a0` isn't necessarily a problem. All loads and stores in PVM are defined to ignore the upper 32 bits and the address space is always 32-bit (even on a 64-bit target). So `memset` should also ignore the upper bits. The remaining question, however, is: do we preserve the full value of the destination pointer (with the upper bits intact), or do we truncate it? Either one is fine (whichever one is faster on the recompiler backend is preferable).
>
> * We have to be careful here so that the program can't memset any of the recompiler's internal structures (which live in the upper 32 bits of the VM sandbox's address space), which is _probably_ why I was truncating the pointer. The truncation of `count` in the recompiler _might_ have been a typo, but I don't remember at this point. We should probably add a test which deliberately tries to corrupt the VM's internal structures with `memset`.

Thanks for the explanation. The `count` change does make sense to me if you're trying to limit the sandbox to the first half of the address space. I guess then I'll need to find a way to limit the fuzzer, and revisit `memset` to make sure it won't try to overwrite the second half when ptr + count > 2^32.
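
The `ptr + count > 2^32` condition mentioned here can be sketched as a standalone range check. The function name `memset_stays_in_guest_space` is hypothetical, and the sketch assumes the pointer is first reduced to its lower 32 bits while `count` is kept as a full 64-bit value.

```rust
/// Returns true if a memset of `count` bytes starting at the 32-bit effective
/// address of `ptr` ends at or below 2^32, i.e. never reaches the upper half.
fn memset_stays_in_guest_space(ptr: u64, count: u64) -> bool {
    const GUEST_SPACE: u64 = 1 << 32;
    // Only the lower 32 bits of the pointer form the address.
    let start = ptr as u32 as u64;
    // `checked_add` guards against the 64-bit sum itself overflowing.
    match start.checked_add(count) {
        Some(end) => end <= GUEST_SPACE,
        None => false,
    }
}
```

A range that ends exactly at 2^32 is accepted, one byte more is rejected, and a huge 64-bit `count` is rejected rather than silently wrapping.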

@kvpanch kvpanch force-pushed the kvpanch/fix-memset-codegen branch from ffb5e05 to 5589353 Compare April 2, 2026 19:11
@kvpanch kvpanch changed the title from "Fix memset codegen bugs found by fuzzer" to "Fix fuzzer to truncate inputs to 32-bit integers" Apr 2, 2026
@koute
Collaborator

koute commented Apr 3, 2026

> I guess then I'll need to find a way to limit the fuzzer

That's.... not a proper fix? (:

The whole point of the fuzzer is to find divergences between the interpreter and the recompiler, that is: the same program should behave exactly the same regardless of which backend is used.

So if a user (in their program) can pass a 64-bit value to memset then the fuzzer should be able to exercise it!

Or in other words: for any possible program blob (except the only exception with validation when a Module is created, which I've mentioned) the backends should give exactly the same result, and the fuzzer should be able to generate any possible input.
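
The point can be illustrated with a toy differential check. Everything here is a stand-in: `Outcome`, `interpreter_model`, `recompiler_model` and `find_divergence` are hypothetical names, and the two models only mimic the register behavior reported in this PR, not the real backends.

```rust
/// Observable result of running a program in this toy model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Outcome {
    a0: u64,
}

/// Toy backend that truncates the register to 32 bits.
fn interpreter_model(a0: u64) -> Outcome {
    Outcome { a0: a0 as u32 as u64 }
}

/// Toy backend that preserves the full 64-bit register.
fn recompiler_model(a0: u64) -> Outcome {
    Outcome { a0 }
}

/// Runs both backends on every input and returns the first input
/// for which their outcomes differ, if any.
fn find_divergence(
    inputs: &[u64],
    a: impl Fn(u64) -> Outcome,
    b: impl Fn(u64) -> Outcome,
) -> Option<u64> {
    inputs.iter().copied().find(|&input| a(input) != b(input))
}
```

If the inputs are restricted to values that fit in 32 bits, `find_divergence` returns `None` and the mismatch stays hidden; as soon as a value such as `0xFFFFFFFF8F8F030F` is allowed, it is reported.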

@kvpanch
Contributor Author

kvpanch commented Apr 3, 2026

> > I guess then I'll need to find a way to limit the fuzzer
>
> That's.... not a proper fix? (:
>
> The whole point of the fuzzer is to find divergences between the interpreter and the recompiler, that is: the same program should behave exactly the same regardless of which backend is used.
>
> So if a user (in their program) can pass a 64-bit value to memset then the fuzzer should be able to exercise it!
>
> Or in other words: for any possible program blob (except the only exception with validation when a Module is created, which I've mentioned) the backends should give exactly the same result, and the fuzzer should be able to generate any possible input.

Hm... thanks. I think I had completely misunderstood the recompiler and how it's used in the fuzzer.
Basically, it's not only interpreter vs recompiler; more precisely, it's interpreter vs recompiler + sandbox.
I'll revisit my 2 PRs that touched exception handling.
A question: can I assume that the interpreter handles exceptions correctly (at least as a starting assumption)?

@koute
Collaborator

koute commented Apr 3, 2026

How it's sandboxed is an implementation detail that's not exposed to the program.

> A question: can I assume that the interpreter handles exceptions correctly (at least as a starting assumption)?

No; there can be bugs in either one of them. If they diverge one has to look at the behavior of both and see which one makes sense.

@kvpanch
Contributor Author

kvpanch commented Apr 17, 2026

@koute gentle ping

@kvpanch
Contributor Author

kvpanch commented Apr 17, 2026

@copilot resolve the merge conflicts in this pull request

kvpanch added 3 commits April 17, 2026 12:12
@kvpanch kvpanch force-pushed the kvpanch/fix-memset-codegen branch from 6be8eb4 to 397b0f0 Compare April 17, 2026 16:59
Comment thread crates/polkavm/src/tests.rs Outdated
Comment on lines +4583 to +4587
// A0 = 0xffffffffffff0000 * 0xffffffffffff0000 = 0x0000000100000000
asm::load_imm(Reg::A0, 0xffff0000),
asm::mul_64(Reg::A0, Reg::A0, Reg::A0),
// A2 = sign_extend(0xff08bdbd) = 0xffffffffff08bdbd
asm::load_imm(Reg::A2, 0xff08bdbd),
Collaborator

Again, this part is unnecessarily complicated because you can just use set_reg to set the desired values directly instead of calculating them.

The whole set_code ideally should just look like this:

builder.set_code(&[asm::memset(), asm::ret()]);

There's no need to involve mul_64, load_imm, etc. and overcomplicate the test because this test is supposed to test memset and memset only.
