
fix: drain PHP threads during opcache restart to prevent heap corruption #2349

Closed
superdav42 wants to merge 2 commits into php:main from superdav42:fix/opcache-restart-drain

Conversation

@superdav42

Summary

Prevents zend_mm_heap corrupted crashes caused by concurrent opcache restarts under ZTS by using a pthread_rwlock to serialize the actual reset against active requests.

Problem

PHP's opcache tracks active processes using fcntl() file locks (F_RDLCK/F_WRLCK). These locks are per-process, not per-thread. In FrankenPHP's threaded model:

  1. Thread A finishes a request → calls accel_deactivate_sub() → fcntl(F_UNLCK), which releases the lock for all threads in the process
  2. Thread B's php_request_startup() → accel_activate() sees restart_pending → accel_is_inactive() → fcntl(F_GETLK) returns F_UNLCK (no locks held) → proceeds with reset
  3. Thread C is mid-execution, dereferencing pointers into the interned strings region that Thread B is now zeroing via memset → crash
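The per-process behaviour behind step 2 can be reproduced outside PHP. The following is a minimal standalone sketch, not opcache code: the helper names (`hold_read_lock`, `probe_reports_idle`, `demo_lock_invisible_across_threads`) and the temporary file are made up for illustration. One thread takes a read lock and leaves it held, then the same process probes with `F_GETLK`. Because POSIX record locks belong to the process, the probe reports no conflicting lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Worker "request": takes a read lock on the file and leaves it held,
 * the way an in-flight request would. */
static void *hold_read_lock(void *arg) {
    int fd = *(int *)arg;
    struct flock fl = {0};
    fl.l_type = F_RDLCK;
    fl.l_whence = SEEK_SET; /* l_start = 0, l_len = 0: whole file */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        perror("fcntl(F_SETLK)");
    }
    return NULL;
}

/* Hypothetical "is anyone active?" probe: asks whether a write lock could
 * be placed. Returns 1 if no conflicting lock is reported, 0 if one is,
 * -1 on error. */
static int probe_reports_idle(int fd) {
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) == -1) {
        return -1;
    }
    return fl.l_type == F_UNLCK;
}

/* Returns what the probe says while another thread's read lock is held. */
int demo_lock_invisible_across_threads(void) {
    char path[] = "/tmp/fcntl-demo-XXXXXX";
    int fd = mkstemp(path);
    assert(fd != -1);

    pthread_t worker;
    if (pthread_create(&worker, NULL, hold_read_lock, &fd) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    pthread_join(worker, NULL); /* the lock outlives the thread */

    /* Same process, different thread: the held lock does not conflict. */
    int idle = probe_reports_idle(fd);

    close(fd); /* closing the descriptor drops the process's locks */
    unlink(path);
    return idle;
}
```

Calling `demo_lock_invisible_across_threads()` from a `main()` returns 1: the probing thread is told the file is idle even though a sibling thread's lock is still in place. A probe from a different process would see the lock.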

Crash backtrace (captured via debug build coredump)

#11 zend_mm_panic("zend_mm_heap corrupted")        zend_alloc.c:398
#12 zend_mm_free_heap(ptr=0x41e4dcf8)              zend_alloc.c:1530
#13 _efree(ptr=0x41e4dcf8)                         zend_alloc.c:2754
#14 zend_string_release(s=0x41e4dcf8)              zend_string.h:348
#15 zend_hash_del(ht=..., key=0x41e98570)          zend_hash.c:1562
#16 ZEND_UNSET_DIM_SPEC_CV_CONST_HANDLER()         zend_vm_execute.h:45315

Concurrent thread at crash time:

#0  __memset_avx2_unaligned_erms()
#1  accel_interned_strings_restore_state()          ZendAccelerator.c:607
#3  zend_activate_modules()                         zend_API.c:3389
#4  php_request_startup()                           main.c:1875

This crash is reproducible with WordPress (plugin/core updates call opcache_invalidate() which fills opcache memory → triggers implicit opcache_reset()). In production, crashes occur every ~20-60 minutes.

Fix

Uses a pthread_rwlock around the request lifecycle:

  • Normal requests: acquire a read lock before php_request_startup(), release after php_request_shutdown(). Read locks are concurrent — zero performance impact.
  • Opcache restart: the zend_accel_schedule_restart_hook sets an atomic flag. The next thread entering php_request_startup() sees the flag and acquires a write lock instead — blocking until all current requests complete their shutdown. Inside that exclusive startup, accel_is_inactive() correctly reports no active threads, and the reset proceeds safely against quiescent shared memory.
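A minimal sketch of that locking scheme, under stated assumptions: `request_lock`, `restart_scheduled`, `on_restart_scheduled`, `request_begin` and `request_end` are hypothetical names, not the identifiers used in this PR, and the `int reason` parameter should be checked against the hook declaration in Zend/zend.h. For simplicity the sketch holds whichever lock it took until `request_end()`; it does not model exactly when the exclusive lock is dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_rwlock_t request_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_bool restart_scheduled; /* static storage: starts out false */

/* Hypothetical callback to install as the restart hook (assumed signature).
 * It only records that a restart is pending. */
void on_restart_scheduled(int reason) {
    (void)reason;
    atomic_store(&restart_scheduled, true);
}

/* Call before request startup.
 * Returns true if this request holds the lock exclusively. */
bool request_begin(void) {
    bool expected = true;
    /* Exactly one thread wins the exchange and becomes the restarting one. */
    if (atomic_compare_exchange_strong(&restart_scheduled, &expected, false)) {
        /* Blocks until every in-flight request has called request_end(). */
        int rc = pthread_rwlock_wrlock(&request_lock);
        assert(rc == 0);
        (void)rc;
        return true;
    }
    /* Normal path: shared lock, many requests may hold it at once. */
    int rc = pthread_rwlock_rdlock(&request_lock);
    assert(rc == 0);
    (void)rc;
    return false;
}

/* Call after request shutdown; releases a read or a write lock alike. */
void request_end(void) {
    int rc = pthread_rwlock_unlock(&request_lock);
    assert(rc == 0);
    (void)rc;
}
```

A worker would wrap each request as `request_begin(); /* startup, execute, shutdown */ request_end();`. The compare-and-exchange clears the flag as it is consumed, so only one thread takes the exclusive path per scheduled restart and the rest queue behind it as ordinary readers.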

Why zend_accel_schedule_restart_hook?

PHP exposes this hook in Zend/zend.h specifically for embedders to react to opcache restart scheduling. It fires from zend_accel_schedule_restart() before restart_pending is set, giving us a clean coordination point.

Performance impact

  • Normal requests: one pthread_rwlock_rdlock + pthread_rwlock_unlock per request. On Linux/glibc, read-lock acquisition is a single atomic CAS (~3ns) with no syscall. Unmeasurable.
  • During opcache restart: brief stall (~50-200ms) while active requests drain. This happens only when opcache memory fills up, which is a rare event in production.

Testing

  • Built and deployed to a production WordPress multisite (6 sites, PHP 8.4.20 ZTS, FrankenPHP built from source via static-php-cli)
  • Prior to this fix: crashes every 20-60 minutes with zend_mm_heap corrupted
  • With this fix: [monitoring in progress]

Related

PHP's opcache uses fcntl() file locks for activity tracking, which are
per-process, not per-thread. In FrankenPHP's threaded model, when any
single thread releases its lock (request shutdown), it releases for ALL
threads — making accel_is_inactive() return true while other threads
are still mid-request reading from shared memory. This allows opcache to
reset interned strings and hash tables via memset while other threads
dereference pointers into that memory, causing zend_mm_heap corrupted
crashes every ~30 minutes under normal WordPress traffic.

The fix uses a pthread read-write lock around the request lifecycle:

- Normal requests: read lock (concurrent, zero contention)
- When opcache schedules a restart (OOM, hash overflow, opcache_reset):
  the zend_accel_schedule_restart_hook sets a flag
- The next php_request_startup() sees the flag and takes a write lock,
  which blocks until all current requests finish their shutdown. Inside
  that exclusive startup, opcache's accel_is_inactive() correctly sees
  no active threads and performs the reset safely.
- After startup completes, the lock is released and all threads resume.

Crash backtrace that prompted this fix:
  #0  accel_interned_strings_restore_state() — memset zeroing shared memory
  (concurrent with)
  #1  zend_hash_del → zend_string_release → _efree → zend_mm_panic

Fixes: php#1737
Related: php/php-src#14471
Related: php#2265
- zend_accel_schedule_restart_hook was added in PHP 8.4; wrap hook
  function and registration in #if PHP_VERSION_ID >= 80400 to fix
  build on PHP 8.2 and 8.3.
- Break long __atomic_* lines to satisfy clang-format column limit.
- The rwlock around the request lifecycle is unconditional (all PHP
  versions). It is harmless on 8.2/8.3 (read lock is a no-contention
  CAS); without the restart hook the flag is never set, so the
  write-lock path is never taken.
@henderkes
Contributor

#2073

Development

Successfully merging this pull request may close these issues.

FrankenPHP crashes with zend_mm_heap corrupted while running some wordpress sites