arch/arm/src/rp23xx: correct NVIC and PendSV priority initialisation#18865
Open
dfanache wants to merge 2 commits into
Open
arch/arm/src/rp23xx: correct NVIC and PendSV priority initialisation#18865dfanache wants to merge 2 commits into
dfanache wants to merge 2 commits into
Conversation
At system startup, the NVIC was configured for proper default priority only for the first three IPR registers, even though the loop was executed 13 times - due to a gotcha on what the `NVIC_IRQ_PRIORITY(i)` macro returns. IRQs 12-51 stayed at the reset priority of 0 - highest in the system; so any peripherals issuing those interrupts will shoot through critical sections that rely on BASEPRI = NVIC_SYSH_PRIORITY_DEFAULT (0x80). This can corrupt the TCB ready-to-run list and semaphore wait queues; this is hard to reproduce (e.g. in ostest) because it requires a peripheral ISR to land inside a critical section, but the failure modes range from hangs to wild pointer crashes once it does. In my case I had SPI and I2C peripherals that were issuing IRQs and were preempting scheduler and semaphore list operations, causing corruption. Step the loop by four and bound it with RP23XX_IRQ_NEXTINT so every IPR covering IRQs 0..51 is written once with DEFPRIORITY32. Signed-off-by: Daniel Fanache <dan@rts.ro>
PendSV is the deferred half of the Cortex-M context-switch path: a higher-priority ISR that wakes a task sets PendSV pending and returns, and PendSV then performs the register save/restore. This pattern is only safe when PendSV runs at the lowest priority - otherwise a peripheral ISR can preempt the context switch mid-update and corrupt thread state. The previous write of DEFPRIORITY32 to NVIC_SYSH12_15_PRIORITY in up_irqinitialize() leaves PendSV at NVIC_SYSH_PRIORITY_DEFAULT (0x80), the same priority as peripheral IRQs. Read-modify-write the SHPR3 to lower only the PendSV byte to NVIC_SYSH_PRIORITY_MIN. This is the canonical ARM Cortex-M priority arrangement and it matches e.g. this reference: https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html (although worked with a three-bit priority system) This patch hardens up_irqinitialize against future peripheral-IRQ prioritisation. Signed-off-by: Daniel Fanache <dan@rts.ro>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note: Please adhere to Contributing Guidelines.
Summary
Two related fixes to rp23xx exception priority initialisation:
fix NVIC priority init missing IRQs 12-51 -
NVIC_IRQ_PRIORITY(n)doesn >> 2internally, so the existing loopfor (i = 0; i < 12; i++)only writes IPR0..IPR2. RP2350 has 52 external IRQs; IRQs 12-51 stay at reset priority 0 and fire through BASEPRI critical sections.place PendSV at lowest NVIC priority - defence-in-depth follow-up. PendSV otherwise shares NVIC_SYSH_PRIORITY_DEFAULT with peripheral IRQs. Aligns rp23xx with the canonical ARM Cortex-M priority layout and hardens against future peripheral-IRQ prioritisation.
See commit messages for full analysis and observed failure modes.
Impact
Should reduce TCB ready-to-run list and semaphore wait queue corruption that appear spontaneously whenever external SPI or I2C hardware is being used.
Testing
Vanilla apache/nuttx master at the time of this PR;
raspberrypi-pico-2:nsh, built innix developshell witharm-none-eabi-gcc 15.2.1, flashed to a Raspberry Pi Pico 2 W via openocd / CMSIS-DAP.Without the patches: ostest runs to completion with exit status 0. This is consistent with the bug being latent under software-only load, since ostest does not drive peripheral IRQs and patch 1's failure path needs an IRQ in the 12..51 range firing through a critical section.
With the patches: ostest behaviour unchanged (still exits 0).
topruns concurrently in the background without faults.The corruption class that patch 1 closes was originally encountered in a downstream port under SPI+I2C+context-switch load (PX4 with NuttX 12.10). I have thought for a long time that patch 2 was the fix to my woes, but I kept having crashes and poking with gdb showed i2c semaphores with orphan waiters; and then I stumbled onto the proper fix, patch 1. I am keeping patch 2 in the PR as it reflects the canonical ARM priority layout, but I'm relegating its status as just hardening.