fix: drain PHP threads during opcache restart to prevent heap corruption#2349
Closed
superdav42 wants to merge 2 commits intophp:mainfrom
Closed
fix: drain PHP threads during opcache restart to prevent heap corruption#2349superdav42 wants to merge 2 commits intophp:mainfrom
superdav42 wants to merge 2 commits intophp:mainfrom
Conversation
PHP's opcache uses fcntl() file locks for activity tracking, which are per-process, not per-thread. In FrankenPHP's threaded model, when any single thread releases its lock (request shutdown), it releases for ALL threads — making accel_is_inactive() return true while other threads are still mid-request reading from shared memory. This allows opcache to reset interned strings and hash tables via memset while other threads dereference pointers into that memory, causing zend_mm_heap corrupted crashes every ~30 minutes under normal WordPress traffic. The fix uses a pthread read-write lock around the request lifecycle: - Normal requests: read lock (concurrent, zero contention) - When opcache schedules a restart (OOM, hash overflow, opcache_reset): the zend_accel_schedule_restart_hook sets a flag - The next php_request_startup() sees the flag and takes a write lock, which blocks until all current requests finish their shutdown. Inside that exclusive startup, opcache's accel_is_inactive() correctly sees no active threads and performs the reset safely. - After startup completes, the lock is released and all threads resume. Crash backtrace that prompted this fix: #0 accel_interned_strings_restore_state() — memset zeroing shared memory (concurrent with) php#1 zend_hash_del → zend_string_release → _efree → zend_mm_panic Fixes: php#1737 Related: php/php-src#14471 Related: php#2265
- zend_accel_schedule_restart_hook was added in PHP 8.4; wrap hook function and registration in #if PHP_VERSION_ID >= 80400 to fix build on PHP 8.2 and 8.3. - Break long __atomic_* lines to satisfy clang-format column limit. - The rwlock around the request lifecycle is unconditional (all PHP versions) — it's harmless on 8.2/8.3 (read lock is a no-contention CAS) and the hook simply doesn't fire without the restart hook.
Contributor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Prevents
zend_mm_heap corruptedcrashes caused by concurrent opcache restarts under ZTS by using apthread_rwlockto serialize the actual reset against active requests.Problem
PHP's opcache tracks active processes using
fcntl()file locks (F_RDLCK/F_WRLCK). These locks are per-process, not per-thread. In FrankenPHP's threaded model:accel_deactivate_sub()→fcntl(F_UNLCK)— this releases the lock for all threads in the processphp_request_startup()→accel_activate()seesrestart_pending→accel_is_inactive()→fcntl(F_GETLK)returnsF_UNLCK(no locks held) → proceeds with resetmemset→ crashCrash backtrace (captured via debug build coredump)
Concurrent thread at crash time:
This crash is reproducible with WordPress (plugin/core updates call
opcache_invalidate()which fills opcache memory → triggers implicitopcache_reset()). In production, crashes occur every ~20-60 minutes.Fix
Uses a
pthread_rwlockaround the request lifecycle:php_request_startup(), release afterphp_request_shutdown(). Read locks are concurrent — zero performance impact.zend_accel_schedule_restart_hooksets an atomic flag. The next thread enteringphp_request_startup()sees the flag and acquires a write lock instead — blocking until all current requests complete their shutdown. Inside that exclusive startup,accel_is_inactive()correctly reports no active threads, and the reset proceeds safely against quiescent shared memory.Why
zend_accel_schedule_restart_hook?PHP exposes this hook in
Zend/zend.hspecifically for embedders to react to opcache restart scheduling. It fires fromzend_accel_schedule_restart()beforerestart_pendingis set, giving us a clean coordination point.Performance impact
pthread_rwlock_rdlock+pthread_rwlock_unlockper request. On Linux/glibc, read-lock acquisition is a single atomic CAS (~3ns) with no syscall. Unmeasurable.Testing
zend_mm_heap corruptedRelated