
Comments (194)

kohlerdominik avatar kohlerdominik commented on July 23, 2024 9

Was this issue resolved? When will the fix be released?

from php-src.

cappadaan avatar cappadaan commented on July 23, 2024 6

Issue seems fixed after updating to 8.1.7

our setting:
opcache.jit=1255


Gwemox avatar Gwemox commented on July 23, 2024 4

Has anyone seen the same issue in 8.1.5?


stissot avatar stissot commented on July 23, 2024 4

We had the same OPcache segmentation fault this morning on a server running PHP-FPM 8.1.8 with the following configuration. It completely filled up the system memory and swap. A php-fpm restart was needed.

Copyright (c) The PHP Group
Zend Engine v4.1.8, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.8, Copyright (c), by Zend Technologies
opcache.enable=1
opcache.enable_cli=0
opcache.fast_shutdown=0
opcache.interned_strings_buffer=16
opcache.jit=1255
opcache.max_accelerated_files=50000
opcache.memory_consumption=2048
opcache.revalidate_freq=0
Jul 22 05:24:00 web1 kernel: [626019.962719] php-fpm8.1[921996]: segfault at 7f946d1004d0 ip 00007f9501112bc0 sp 00007ffe55c12b50 error 6 in opcache.so[7f95010f4000+b5000]
Jul 22 05:24:00 web1 kernel: [626019.962743] Code: 89 3c 90 83 43 1c 01 f6 46 1c 08 74 25 48 8b 46 08 f6 40 04 20 74 1b 49 8b 56 18 80 7a 18 00 74 11 8b 00 49 8b 95 e0 01 00 00 <48> 89 34 02 0f 1f 40 00 49 83 c>
Jul 22 05:24:00 web1 kernel: [626019.962888] php-fpm8.1[947286]: segfault at 7f94764384d0 ip 00007f9501112bc0 sp 00007ffe55c12b50 error 6 in opcache.so[7f95010f4000+b5000]
Jul 22 05:24:00 web1 kernel: [626019.962906] Code: 89 3c 90 83 43 1c 01 f6 46 1c 08 74 25 48 8b 46 08 f6 40 04 20 74 1b 49 8b 56 18 80 7a 18 00 74 11 8b 00 49 8b 95 e0 01 00 00 <48> 89 34 02 0f 1f 40 00 49 83 c>
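
For anyone triaging similar kernel lines: the instruction pointer and the mapping base printed in brackets are enough to compute the faulting offset inside opcache.so, which can then be looked up against a matching build. Both reports above share the same ip and base, so they resolve to the same offset. A minimal sketch, assuming the usual x86 Linux `segfault at ... ip ... sp ... error ... in module[base+size]` layout; the helper name `fault_offset` is made up for illustration:

```python
import re

# Matches the x86 Linux kernel segfault report, e.g.
# "segfault at <addr> ip <ip> sp <sp> error <code> in <module>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<module>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (module, offset of ip inside the mapping), or None if not applicable."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    size = int(match.group("size"), 16)
    if not (base <= ip < base + size):
        return None  # ip lies outside the reported mapping
    return match.group("module"), ip - base


line = (
    "Jul 22 05:24:00 web1 kernel: [626019.962719] php-fpm8.1[921996]: segfault at "
    "7f946d1004d0 ip 00007f9501112bc0 sp 00007ffe55c12b50 error 6 in "
    "opcache.so[7f95010f4000+b5000]"
)
module, offset = fault_offset(line)
print(module, hex(offset))  # opcache.so 0x1ebc0
```

The offset is only meaningful against the exact opcache.so binary that crashed, so keep the build (or its debug symbols) around before upgrading.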


dstogov avatar dstogov commented on July 23, 2024 3

@dominikhalvonik JIT performance depends on the application. Tracing JIT (1255) should be faster than function JIT (1205), but you should measure the performance of your app yourself. If JIT gives less than a 10% improvement (which is very probable), I would disable it altogether.
New PHP versions usually come with new features, fixes, and performance improvements unrelated to JIT. So, for some apps, 8.1 with JIT disabled might be better than 8.0 with tracing JIT.
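
For readers unfamiliar with the numbers: opcache.jit takes a four-digit CRTO value, and 1255 and 1205 differ only in the trigger digit. A rough sketch of how the digits break down, with the descriptions paraphrased from the PHP manual's opcache.jit entry (the helper name `decode_jit` and the dictionary names are made up for illustration):

```python
# Digit meanings paraphrased from the PHP manual's description of opcache.jit (CRTO).
CPU_FLAGS = {
    0: "no CPU-specific optimization",
    1: "use AVX if the CPU supports it",
}
REGISTER_ALLOCATION = {
    0: "no register allocation",
    1: "block-local register allocation",
    2: "global register allocation",
}
TRIGGER = {
    0: "compile all functions on script load",
    1: "compile functions on first execution",
    2: "profile the first request, then compile the hottest functions",
    3: "profile on the fly and compile hot functions",
    4: "currently unused",
    5: "tracing JIT: profile on the fly and compile traces of hot code",
}
OPTIMIZATION = {
    0: "no JIT",
    1: "minimal JIT (call standard VM handlers)",
    2: "inline VM handlers",
    3: "use type inference",
    4: "use call graph",
    5: "optimize whole script",
}


def decode_jit(value):
    """Split a 4-digit opcache.jit value such as 1255 or "0254" into its CRTO parts."""
    digits = str(value).strip().zfill(4)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"expected a 4-digit CRTO value, got {value!r}")
    c, r, t, o = (int(d) for d in digits)
    try:
        return {
            "cpu": CPU_FLAGS[c],
            "register_allocation": REGISTER_ALLOCATION[r],
            "trigger": TRIGGER[t],
            "optimization": OPTIMIZATION[o],
        }
    except KeyError as exc:
        raise ValueError(f"digit out of range in {digits}") from exc


for setting in (1255, 1205):
    print(setting, "->", decode_jit(setting)["trigger"])
```

This only explains what the two settings ask for; it says nothing about which one is faster or safer for a given app, which still has to be measured.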


oleg-st avatar oleg-st commented on July 23, 2024 3

One fix was made in 8.1.7 (#8461), the second in 8.1.8 (#8591).
The third related issue #8642 has not yet been fixed.


cappadaan avatar cappadaan commented on July 23, 2024 2

Disabling JIT seemed to solve the problem. So JIT is the cause of the segfault.

I have now enabled it and set the cache to 64M, will update later.


zejji avatar zejji commented on July 23, 2024 2

Having just wasted 30 hours of my life debugging segfaults which occurred immediately after new deployments of PHP containers (running a large CakePHP 4.2 application) on Docker Swarm, I can confirm that this is definitely an issue.

We are using PHP 8.1.3 via the Bitnami PHP-FPM Docker image.

The solution to the segfaults in our case was disabling JIT and setting the JIT buffer size to zero:

opcache.jit_buffer_size=0M
opcache.jit=disable

By way of additional background, I was able to consistently reproduce the errors when JIT was enabled by using the Locust load testing utility to hit the server with up to 100 requests a second during the deployment period. After disabling the JIT we are able to achieve zero-downtime deployment, which was not possible when JIT was enabled.

The kinds of errors we were seeing when JIT was enabled were as follows. They all occurred within the first minute after deployment:

WARNING: [pool www] child 37 exited on signal 11 (SIGSEGV - core dumped) after 3.219271 seconds from start
WARNING: [pool www] child 46 exited on signal 11 (SIGSEGV - core dumped) after 3.461960 seconds from start
WARNING: [pool www] child 150 exited on signal 11 (SIGSEGV - core dumped) after 2.930645 seconds from start


chelsEg avatar chelsEg commented on July 23, 2024 2

@Gwemox I think because it's very hard to reproduce...

But I have the same issue after upgrading to 8.1.

Changing to opcache.jit=1205 solved the problem!


stevenbrookes avatar stevenbrookes commented on July 23, 2024 2

Just to add that I'm seeing the same behaviour on a full stack Symfony application.

PHP 8.1.4
opcache.jit=1255 FPM SIGSEGV after 3 seconds
opcache.jit=1205 FPM works
opcache.jit=1255 CLI works
opcache.jit=1205 FPM works

Application is large so very hard to isolate. Happy to send any other information needed though.


dstogov avatar dstogov commented on July 23, 2024 2

Great 👍

Most probably, you caught the crash in JIT code. Can you please try to dig down a bit more? The following commands would allow you to find the PHP script/function/line where the problem occurs:

(gdb) p (char*)executor_globals.current_execute_data.func.op_array.filename.val
(gdb) p (char*)executor_globals.current_execute_data.func.op_array.function_name.val
(gdb) p executor_globals.current_execute_data.func.op_array.line_start
(gdb) p executor_globals.current_execute_data.opline.lineno
(gdb) p executor_globals.current_execute_data.opline - executor_globals.current_execute_data.func.op_array.opcodes

Maybe I'll be able to find the problem by analysing the PHP source code and the assembler code produced by JIT.

(gdb) disassemble  0x00007fb54a44e29c-100,  0x00007fb54a44e29c+100 

You may post the info here or send it directly to dmitrystogov at gmail dot com


cappadaan avatar cappadaan commented on July 23, 2024 1

I disabled JIT for now, will update in a few days.


meinemitternacht avatar meinemitternacht commented on July 23, 2024 1

@cmb69 I am doing some more tests on a different machine with a similar CPU instruction set and will get back with you soon.


dstogov avatar dstogov commented on July 23, 2024 1

@meinemitternacht can you provide instructions on how to reproduce the crash with FastRoute?


dstogov avatar dstogov commented on July 23, 2024 1

@meinemitternacht the fact that the problem doesn't happen consistently very probably points to a race condition, e.g. one process is writing something into shared memory (and somehow gets into an inconsistent state) while another process reads from SHM and fails because of the inconsistency. The failure then occurs only by chance, when both processes get into specific states. I think there is no difference between a fresh OS reboot and a PHP-FPM restart (both recreate SHM). PHP CLI doesn't share memory with PHP-FPM.


dstogov avatar dstogov commented on July 23, 2024 1

I cannot fix this before I get a way to reproduce the crash.


dstogov avatar dstogov commented on July 23, 2024 1

@haad I sent a private email


Gwemox avatar Gwemox commented on July 23, 2024 1

I think that I have the same issue after upgrading to 8.1 (8.1.3-4) from 8.0.16 with JIT enabled opcache.jit=1255.
(PHP-FPM, Alpine, Kubernetes)

It happens randomly, after a segfault all requests fail. If I do an opcache_reset() everything works again.

Symfony 5.4.3 and Api Platform 2.6.8, I don't use PHP-DI.

@dstogov Why is this PR resolved?


jirkace avatar jirkace commented on July 23, 2024 1

We have upgraded our cluster (36 servers for one app) over the last few days and I don't see any performance change (request time or CPU usage) between PHP 8.0 with 1255 and PHP 8.1 with 1205... BUT when we tried to turn opcache off completely, CPU usage almost doubled, from 15 to 30 percent.


dstogov avatar dstogov commented on July 23, 2024 1

@jirkace I didn't mean disabling opcache. Just JIT. opcache.jit=0


trapiche-n avatar trapiche-n commented on July 23, 2024 1

Hello!
I have this "Segmentation fault" problem.
I have php 8.1.6 with [opcache.jit = 1255]
In my case it fails every time I make changes to the code of my app with an editor and save them!!!
To make it work again I have to make a change (it can be a simple white space) in the code, with this the application works again...until the next change!
PS: haven't tested with [opcache.jit = 1205] yet


nicrame avatar nicrame commented on July 23, 2024 1

For me

php81 -v

PHP 8.1.7 (cli) (built: Jun 7 2022 18:21:38) (NTS gcc x86_64)
Copyright (c) The PHP Group
Zend Engine v4.1.7, Copyright (c) Zend Technologies
with Zend OPcache v8.1.7, Copyright (c), by Zend Technologies

In 10-opcache.ini I have:
opcache.jit_buffer_size=0
opcache.jit=0
But it still crashes:
Jun 28 07:43:52 Love-NAS kernel: traps: php-fpm[5596] general protection fault ip:7ff464b6887c sp:7fffac4a98f0 error:0 in opcache.so[7ff464b4c000+e3000]


dstogov avatar dstogov commented on July 23, 2024 1

@gregherrell thanks for the backtrace. Interestingly, it doesn't contain any JIT code. It seems like something in the shared inheritance cache was corrupted. I have no idea yet how this may be related to JIT. Could you try to run PHP with opcache.protect_memory=1 in php.ini? This should cause an immediate crash in case of an unintended write to shared memory and may give the next direction for analysis.


cmb69 avatar cmb69 commented on July 23, 2024

I'm afraid this is not actionable without further information, since you didn't provide a reproduction script, and the stack backtrace only shows 1 frame. Since you say that this happens randomly, providing a reproduction script may not be possible, but at the very least we need more info, such as whether OPcache is enabled, whether OPcache JIT is enabled (tracing or function), in which environment (SAPI) this happens, and whether it happens only occasionally (and recovers afterwards or not) or frequently. Also, try to run with the latest PHP-8.1 development version, where some PHP 8.1.1 bugs have been fixed.


cappadaan avatar cappadaan commented on July 23, 2024
  • it happens in php-fpm
  • randomly and frequently, cannot reproduce
  • there is nothing more in the backtrace, so it's hard to find what function causes this
  • opcache settings:
    opcache.enable = on;
    opcache.enable_cli=1;
    opcache.jit_buffer_size=1G;
    opcache.validate_root = on;
    opcache.enable_cli = on;
    opcache.file_update_protection = 0;
    opcache.revalidate_freq = 0;
    opcache.validate_timestamps = 0;
    opcache.interned_strings_buffer = 64;
    opcache.max_accelerated_files = 16229;
    opcache.memory_consumption = 4096;
    opcache.fast_shutdown = 1;
    opcache.log_verbosity_level = 2;
    opcache.blacklist_filename = /etc/opcache-blacklist.txt;
    opcache.force_restart_timeout = 1800;


cmb69 avatar cmb69 commented on July 23, 2024

Thanks for the further info! Does it also happen when JIT is disabled (i.e. opcache.jit_buffer_size=0)? If not, what happens if you use a smaller jit_buffer_size (1G seems like a lot; maybe try 16M or 64M)?


cmb69 avatar cmb69 commented on July 23, 2024

Okay, I'll change back to feedback status (the ticket will be kept open for 2 weeks).


cappadaan avatar cappadaan commented on July 23, 2024

Again another crash with SIGSEGV after setting JIT to 64M.
So definitely JIT causes this crash.

For now we'll leave it off until there is some sort of fix (or a newer PHP version).

If you need more info just let me know.


w3yyb avatar w3yyb commented on July 23, 2024

It's similar to this problem: https://bugs.php.net/bug.php?id=81664


cmb69 avatar cmb69 commented on July 23, 2024

Okay, so we now know there is a random but frequent segfault if (tracing) JIT is enabled under FPM, which happens at:

#0 0x000055bbf04c0f25 in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER () at /usr/src/debug/php-8.1.1/Zend/zend_vm_execute.h:10137
10137 ce = CACHED_PTR(opline->op2.num);

I'm afraid that is insufficient information to be actionable, and generally it's hard to fix a bug which is not reproducible. :(


meinemitternacht avatar meinemitternacht commented on July 23, 2024

We are also encountering this bug (or a very similar one) at my company. We use Yocto to build target images for X86_64, and PHP 8.1 is now producing segfaults with JIT enabled. What debugging options would be helpful here?


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 After some troubleshooting, it seems that our issue was due to the first flag, CPU optimizations. When enabled, CPUs with the avx instruction set, but not avx2 or avx512..., exhibited segfaults when running JIT. It may be worthwhile to see why this is occurring, but it would be more appropriate for us to open a separate issue at some point in the future.


cmb69 avatar cmb69 commented on July 23, 2024

@meinemitternacht, so you have segfaults with opcache.jit=1254, but not with opcache.jit=0254 on these machines?


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 That is correct.

After testing, this was determined to be incorrect.


cmb69 avatar cmb69 commented on July 23, 2024

@cappadaan, would opcache.jit=0254 work for you, i.e. no more segfaults? See #7817 (comment) for details.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 Still testing, but it seems that blaming the CPU optimization flag was incorrect; the difference is the function vs. tracing JIT option. The bug does not always appear, so getting false negatives is rather annoying.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 OK, here are my results:

  • opcache.jit = 0254 (fails)
  • opcache.jit = 1254 (fails)
  • opcache.jit = 0205 (passes)
  • opcache.jit = 1205 (passes)

The following is a partial backtrace output from gdb, obtained in a previous test.

#0  zend_fetch_ce_from_cache_slot (type=0x7f4f78c97170, cache_slot=0x8) at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:980
#1  zend_check_type_slow (is_internal=false, is_return_type=false, cache_slot=0x8, ref=0x0, arg=0x7f4fb6e146f0, type=0x7f4f78c97170)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1043
#2  zend_check_user_type_slow (type=0x7f4f78c97170, arg=0x7f4fb6e146f0, ref=0x0, cache_slot=0x8, is_return_type=false)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1103

Debugging:

opcache.jit = 1254
# 0b11111111111100000000
opcache.jit_debug = 1048320

ZEND_JIT_DEBUG_ASM             0
ZEND_JIT_DEBUG_SSA             0
ZEND_JIT_DEBUG_REG_ALLOC       0
ZEND_JIT_DEBUG_ASM_STUBS       0
ZEND_JIT_DEBUG_PERF            0
ZEND_JIT_DEBUG_PERF_DUMP       0
ZEND_JIT_DEBUG_OPROFILE        0
ZEND_JIT_DEBUG_VTUNE           0
ZEND_JIT_DEBUG_GDB             1
ZEND_JIT_DEBUG_SIZE            1
ZEND_JIT_DEBUG_ASM_ADDR        1
ZEND_JIT_DEBUG_TRACE_START     1
ZEND_JIT_DEBUG_TRACE_STOP      1
ZEND_JIT_DEBUG_TRACE_COMPILED  1
ZEND_JIT_DEBUG_TRACE_EXIT      1
ZEND_JIT_DEBUG_TRACE_ABORT     1
ZEND_JIT_DEBUG_TRACE_BLACKLIST 1
ZEND_JIT_DEBUG_TRACE_BYTECODE  1
ZEND_JIT_DEBUG_TRACE_TSSA      1
ZEND_JIT_DEBUG_TRACE_EXIT_INFO 1
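
The arithmetic behind the table above can be checked mechanically. A small sketch, assuming (as the table does) that the flags occupy consecutive bits starting at bit 0 in the order listed. That ordering is taken from this comment only and has not been verified against ext/opcache/jit/zend_jit.h, so check the header before relying on the individual flag names:

```python
# Flag names in the order listed in the comment above.
# ASSUMPTION: bit N corresponds to the N-th name; the real header may differ.
FLAGS_IN_LISTED_ORDER = [
    "ZEND_JIT_DEBUG_ASM",
    "ZEND_JIT_DEBUG_SSA",
    "ZEND_JIT_DEBUG_REG_ALLOC",
    "ZEND_JIT_DEBUG_ASM_STUBS",
    "ZEND_JIT_DEBUG_PERF",
    "ZEND_JIT_DEBUG_PERF_DUMP",
    "ZEND_JIT_DEBUG_OPROFILE",
    "ZEND_JIT_DEBUG_VTUNE",
    "ZEND_JIT_DEBUG_GDB",
    "ZEND_JIT_DEBUG_SIZE",
    "ZEND_JIT_DEBUG_ASM_ADDR",
    "ZEND_JIT_DEBUG_TRACE_START",
    "ZEND_JIT_DEBUG_TRACE_STOP",
    "ZEND_JIT_DEBUG_TRACE_COMPILED",
    "ZEND_JIT_DEBUG_TRACE_EXIT",
    "ZEND_JIT_DEBUG_TRACE_ABORT",
    "ZEND_JIT_DEBUG_TRACE_BLACKLIST",
    "ZEND_JIT_DEBUG_TRACE_BYTECODE",
    "ZEND_JIT_DEBUG_TRACE_TSSA",
    "ZEND_JIT_DEBUG_TRACE_EXIT_INFO",
]


def decode_jit_debug(value):
    """Map each listed flag name to 0 or 1 under the consecutive-bit assumption."""
    return {
        name: (value >> bit) & 1
        for bit, name in enumerate(FLAGS_IN_LISTED_ORDER)
    }


value = 0b11111111111100000000  # the binary form quoted in the comment
set_bits = [bit for bit in range(value.bit_length()) if (value >> bit) & 1]
decoded = decode_jit_debug(value)

print(value)                     # decimal form to put in opcache.jit_debug
print(set_bits[0], set_bits[-1])  # lowest and highest set bit
for name, enabled in decoded.items():
    print(f"{name:<31}{enabled}")
```

The binary literal and the decimal 1048320 agree, and bits 8 through 19 are the ones set. Whether those bits really carry the names shown is exactly the part that depends on the header.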

This leads to the following debug output:

[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "---- TRACE 3 start (loop) FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute() /api/vendor/nikic/fast-route/src/DataGenerator/RegexBasedAbstract.php:130"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0004 FE_FETCH_R V6 CV3($part) 0050 ; op1(packed array) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0005 INIT_NS_FCALL_BY_NAME 1 string("FastRoute\DataGenerator\is_string")"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >init is_string"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0006 SEND_VAR_EX CV3($part) 1 ; op1(packed array)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0007 V7 = DO_FCALL_BY_NAME"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >call is_string"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0008 JMPZ V7 0015 ; op1(bool)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0015 T7 = QM_ASSIGN CV3($part) ; op1(packed array)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0016 V8 = FETCH_LIST_R T7 int(0) ; op1(packed array) val(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0017 ASSIGN CV4($varName) V8 ; op1(string) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0018 V8 = FETCH_LIST_R T7 int(1) ; op1(packed array) val(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0019 ASSIGN CV5($regexPart) V8 ; op1(string) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0020 FREE T7 ; op1(packed array)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0021 T7 = ISSET_ISEMPTY_DIM_OBJ (isset) CV2($variables) CV4($varName) ; op1(array) op2(string) val(undef)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0022 ;JMPZ T7 0031"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0031 INIT_METHOD_CALL 1 THIS string("regexHasCapturingGroups")"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >init FastRoute\DataGenerator\RegexBasedAbstract::regexHasCapturingGroups"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0032 SEND_VAR CV5($regexPart) 1 ; op1(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0033 V7 = DO_FCALL"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     >enter FastRoute\DataGenerator\RegexBasedAbstract::regexHasCapturingGroups"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0001  INIT_NS_FCALL_BY_NAME 2 string("FastRoute\DataGenerator\strpos")"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "      >init strpos"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0002  SEND_VAR_EX CV0($regex) 1 ; op1(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0003  SEND_VAL_EX string("(") 2"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0004  V2 = DO_FCALL_BY_NAME"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "      >call strpos"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0005  T1 = TYPE_CHECK (false) V2 ; op1(bool)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0006  ;JMPZ T1 0008"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0007  RETURN bool(false)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     <back FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0034 JMPZ V7 0044 ; op1(bool)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0044 ASSIGN_DIM CV2($variables) CV4($varName) ; op1(array) op2(string) op3(string) val(undef)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0045 ;OP_DATA CV4($varName)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0046 T8 = CONCAT string("(") CV5($regexPart) ; op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0047 T7 = FAST_CONCAT T8 string(")") ; op1(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0048 ASSIGN_OP (CONCAT) CV1($regex) T7 ; op1(string) op2(string)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "0049 JMP 0004"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "---- TRACE 3 stop (loop)"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "---- TRACE 3 already prcessed"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     TRACE 3 exit 2 FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute() /api/vendor/nikic/fast-route/src/DataGenerator/RegexBasedAbstract.php:130"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 said into stderr: "     TRACE 3 exit 4 FastRoute\DataGenerator\RegexBasedAbstract::buildRegexForRoute() /api/vendor/nikic/fast-route/src/DataGenerator/RegexBasedAbstract.php:131"
[27-Dec-2021 14:47:17] WARNING: [pool vtl] child 9433 exited on signal 11 (SIGSEGV) after 8.303799 seconds from start

So, it seems that some of the regex compilation in FastRoute is causing issues with the JIT, at least on my machine(s). Do you have any pointers on which debugging options or tests would help move this issue forward?


nikic avatar nikic commented on July 23, 2024

Maybe @dstogov has suggestions for debugging.

In your trace, the cache_slot=0x8 is likely what ultimately causes the crash, but it's not visible where it originates.


cappadaan avatar cappadaan commented on July 23, 2024

@cmb69 I also tested with this config:

opcache.jit_buffer_size=128M;
opcache.jit = function;

No crashes. Let me know if I can help to somehow debug this for you.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@dstogov Troubleshooting this issue is quite tedious.

root@v180:~# cat /var/log/php_log |grep 10642
[28-Dec-2021 11:43:51] NOTICE: [pool vtl] child 10642 started
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10642 said into stderr: "---- TRACE 36 start (loop) Composer\Autoload\ClassLoader::findFileWithExtension() /api/vendor/composer/ClassLoader.php:501"
  -- snip --
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10642 said into stderr: "0045 RETURN CV9($file) ; op1(string)"
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10642 said into stderr: "---- TRACE 41 abort (exit from loop)"
[28-Dec-2021 11:46:17] WARNING: [pool vtl] child 10642 exited on signal 11 (SIGSEGV) after 145.578650 seconds from start
root@v180:~# cat /var/log/php_log |grep 10643 
[28-Dec-2021 11:43:51] NOTICE: [pool vtl] child 10643 started
[28-Dec-2021 11:46:13] WARNING: [pool vtl] child 10643 exited on signal 11 (SIGSEGV) after 141.926396 seconds from start

It seems that JIT debug output immediately preceding a segfault has no bearing on whether it actually caused the segfault. In this case, no debug output was gathered while the child was alive (but other FPM processes did produce output). However, with JIT tracing turned off, I never receive a segfault from PHP-FPM.

This is a self-contained build environment (Yocto), and all target machines share the same architecture (X86_64). I am going to attempt to generate some useful core files, or to run php-fpm under gdb. Do you have any suggestions to make this process easier? Are there compile-time flags that I can turn on, or extra debugging options that I can enable?

How does ZEND_JIT_DEBUG_GDB affect debug output?


cmb69 avatar cmb69 commented on July 23, 2024

Maybe https://wiki.php.net/rfc/jit#jit_debugging helps a bit. :)


dstogov avatar dstogov commented on July 23, 2024

@meinemitternacht I mean, I need to run the same PHP application, if this is possible. The smaller the app - the better.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 @dstogov Is there an easy way to run a PHP FPM process under gdb? Currently, that is the only method I have for reproducing my problem since it does not occur with the CLI. I am still attempting to narrow the test case as well.


cmb69 avatar cmb69 commented on July 23, 2024

I have no idea about debugging FPM, but debugging JIT issues is generally very hard, and it likely makes more sense to look for a way to reproduce the segfault, and provide that to Dmitry. Also try with current PHP-8.1 if possible; there have been several JIT related fixes since 8.1.1.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 I am currently attempting to attach gdb to a running FPM process, but I am encountering bugs in gdb... of all places.

Yes, I would like to provide a concise test case for him, but the bug is intermittent and I have no idea what code is actually triggering it.


dstogov avatar dstogov commented on July 23, 2024

@meinemitternacht some of your log above shows a crash after compiling just 3 traces. It shouldn't be very hard for me to analyse this if I can reproduce it. I suspect the problem may be caused by some race condition. Or we may be getting crashes because of a few different problems.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@dstogov Can you provide a short explanation of how the shared memory space works with JIT? I usually see the problem occur after a fresh reboot, but it still does not happen every time. My first goal is to get the issue to happen consistently; then I can narrow down the code that is causing the segfault.

Basically, my question is, what is the difference between an OS reboot and restarting php-fpm? Does the OS cache libraries, or should I watch out for shared memory between the PHP CLI and php-fpm? I am using the CLI opcache.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

I can continue testing tomorrow at work.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@dstogov I have GDB attached to a php-fpm process and reproduced the segfault.

(gdb) bt
#0  zend_check_type_slow (is_internal=false, is_return_type=false, cache_slot=0x8, ref=0x0, arg=0x7f6249014590, type=0x7f620af3d470)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1043
#1  zend_check_user_type_slow (type=0x7f620af3d470, arg=0x7f6249014590, ref=0x0, cache_slot=0x8, is_return_type=false)
    at /usr/src/debug/php/8.1.1-r0/php-8.1.1/Zend/zend_execute.c:1103
#2  0x00007f6249289e9a in zend_jit_verify_arg_slow (arg=0x7f6249014590, arg_info=0x7f620af3d468) at /usr/src/debug/php/8.1.1-r0/php-8.1.1/ext/opcache/jit/zend_jit_helpers.c:1467
#3  0x00007f6228cae736 in ?? ()
#4  0x00007f624907b240 in ?? ()
#5  0x00007f620af298d8 in ?? ()
#6  0x0000000000000000 in ?? ()

======

#2  0x00007f6249289e9a in zend_jit_verify_arg_slow (arg=0x7f6249014590, arg_info=0x7f620af3d468) at /usr/src/debug/php/8.1.1-r0/php-8.1.1/ext/opcache/jit/zend_jit_helpers.c:1467
1467    in /usr/src/debug/php/8.1.1-r0/php-8.1.1/ext/opcache/jit/zend_jit_helpers.c
(gdb) info args
arg = 0x7f6249014590
arg_info = 0x7f620af3d468

======

(gdb) print *arg
$9 = {
  value = {
    lval = 140060108612160,
    dval = 6.9198888018061968e-310,
    counted = 0x7f6249056240,
    str = 0x7f6249056240,
    arr = 0x7f6249056240,
    obj = 0x7f6249056240,
    res = 0x7f6249056240,
    ref = 0x7f6249056240,
    ast = 0x7f6249056240,
    zv = 0x7f6249056240,
    ptr = 0x7f6249056240,
    ce = 0x7f6249056240,
    func = 0x7f6249056240,
    ww = {
      w1 = 1225089600,
      w2 = 32610
    }
  },
  u1 = {
    type_info = 776,
    v = {
      type = 8 '\b',
      type_flags = 3 '\003',
      u = {
        extra = 0
      }
    }
  },
  u2 = {
    next = 0,
    cache_slot = 0,
    opline_num = 0,
    lineno = 0,
    num_args = 0,
    fe_pos = 0,
    fe_iter_idx = 0,
    property_guard = 0,
    constant_flags = 0,
    extra = 0
  }
}

======

(gdb) print *arg_info
$10 = {
  name = 0x7f620950bd80,
  type = {
    ptr = 0x7f620950d350,
    type_mask = 16777216
  },
  default_value = 0x6e6f6974696e00
}

======

(gdb) print *arg_info.name
$11 = {
  gc = {
    refcount = 2,
    u = {
      type_info = 342
    }
  },
  h = 17469586239392435662,
  len = 10,
  val = "d"
}

What is *arg_info.name.val ? Is that the name of a PHP variable, or something internal to the Zend engine or JIT?


meinemitternacht avatar meinemitternacht commented on July 23, 2024

After looking at a few things, it looks like it is getting hung up with the PHP-DI definitions.

(gdb) print /s (char *)(*arg).value.obj.ce.__tostring.op_array.filename.val   
$48 = 0x7f620af47980 "/api/vendor/php-di/php-di/src/Definition/FactoryDefinition.php"
(gdb) print /s (char *)(*arg_info).name.val          
$52 = 0x7f620950bd98 "definition"


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@dstogov After blacklisting /api/vendor/php-di/* from opcache, I can no longer reproduce the segfaults.

I have multiple projects running on these machines, each with their own vendor directory. Is it possible that having two different versions of PHP-DI in the same FPM address space is causing the JIT to become confused?

In one project, I am using PHP-DI version 5.4.6, and in the main project I am using 6.3.4.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@dstogov If I run separate php-fpm pools, would JIT'd code be shared between them, or is each pool separate?


cmb69 avatar cmb69 commented on July 23, 2024

All FPM pools share the same OPcache instance.


meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cmb69 Ouch. Well, I will do some more testing on Monday. It's either a problem with PHP-DI in this environment, or it's because I have two separate, but similar, versions in OPcache.


cmb69 avatar cmb69 commented on July 23, 2024

Ouch.

See https://bugs.php.net/bug.php?id=81704#1640205875 for details. :)


DevSysEngineer avatar DevSysEngineer commented on July 23, 2024

I think that I have the same issue as the original poster. We have a huge code base that is based on PHP-DI / FastRoute. Every time I deploy a new version of our code base, our PHP-FPM service fails and produces a lot of segfault log entries. When I reload my PHP-FPM service, the errors are gone: systemctl reload php8.0-fpm. It seems that the cache is not being cleared properly? With JIT disabled, the segfaults no longer appear after deploying a new version.

I have had this issue since PHP 8.0. I also run an instance with PHP 8.1, and we see the same issues there.
May 27 12:56:38 manager1 kernel: php-fpm8.0[22682]: segfault at 55f27089 ip 0000000055f27089 sp 00007ffecd55b598 error 14 in php-fpm8.0[55f2707ab000+d1000]

I tried to create test code to find out where the issue comes from, but have not succeeded yet.


drealecs avatar drealecs commented on July 23, 2024

How are you deploying the new code base versions? just git checkout?
One solution for you would be to make sure a new instance of fpm is started for the new codebase. That's usual with docker.
Another one that might work as well is to use the directory symlink replace strategy, basically replacing the whole codebase with the new codebase. The realpath will be resolved and handled as new entries.
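
The symlink replace strategy could look roughly like the sketch below. It assumes a POSIX system and a hypothetical layout in which each release is unpacked into its own directory and a `current` symlink points at the live one; the function name `switch_release` and the directory names are made up for illustration:

```python
import os
import tempfile


def switch_release(current_link, new_release_dir):
    """Repoint the `current_link` symlink at `new_release_dir` in one rename step."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up a leftover from an interrupted deploy
    os.symlink(new_release_dir, tmp_link)
    # rename(2) replaces the old symlink itself, so readers see either the
    # old target or the new one, never a missing link.
    os.replace(tmp_link, current_link)


# Demo in a throwaway directory (hypothetical release names).
root = tempfile.mkdtemp()
v1 = os.path.join(root, "releases", "v1")
v2 = os.path.join(root, "releases", "v2")
os.makedirs(v1)
os.makedirs(v2)
current = os.path.join(root, "current")

switch_release(current, v1)  # initial deploy
switch_release(current, v2)  # next deploy swaps the whole code base
print(os.path.realpath(current))
```

Because every release lives under a different real path, the files of a new release are not the same cache entries as the old ones, provided the web server and FPM resolve `current` to its real path as described above.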


DevSysEngineer avatar DevSysEngineer commented on July 23, 2024

@drealecs Yes, just a simple git checkout. Only changed files will be replaced.


dstogov avatar dstogov commented on July 23, 2024

What is *arg_info.name.val ? Is that the name of a PHP variable, or something internal to the Zend engine or JIT?

You can print a zend_string value with print (char*)arg_info.name.val


aidas-emersoft avatar aidas-emersoft commented on July 23, 2024

We're having a very similar issue. Been running a number of sites on PHP-FPM 8.0.10 with the following JIT config:

opcache.enable=1
opcache.jit=1255
opcache.jit_buffer_size=128M

Once we switched to PHP-FPM 8.1.0 (and 8.1.1), sites randomly started crashing with segfault errors. I've now disabled JIT, hoping this will temporarily fix it.

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

@aidas-emersoft Do your sites happen to utilize the PHP-DI project? https://php-di.org/

In our case, we see segfaults under the tracing JIT when we do not blacklist that vendor directory. When it is excluded from opcache, no segfaults occur.

Using the function JIT option (opcache.jit = 1205), no segfaults are produced either.
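
The two workarounds mentioned above could be written into an ini drop-in roughly like this. This is only a sketch: the function name write_jit_workaround, the file names and the example paths are hypothetical, and it assumes the standard opcache.blacklist_filename directive (a text file with one path or wildcard pattern per line).

```shell
#!/bin/sh
# Sketch only: file names and paths are assumptions, not from the thread.
write_jit_workaround() {
    conf_dir=$1      # a PHP conf.d directory, e.g. /etc/php/8.1/fpm/conf.d
    vendor_glob=$2   # pattern to exclude, e.g. /var/www/app/vendor/php-di/*

    # The blacklist file lists paths that opcache should not accelerate.
    printf '%s\n' "$vendor_glob" > "$conf_dir/opcache-blacklist.txt"

    cat > "$conf_dir/99-jit-workaround.ini" <<EOF
; Option A: keep tracing JIT but exclude the suspect directory from opcache.
opcache.blacklist_filename=$conf_dir/opcache-blacklist.txt
; Option B: switch to function JIT instead (uncomment to use).
;opcache.jit=1205
EOF
}
```

PHP-FPM has to be restarted before either setting takes effect. Both options are workarounds reported by users in this thread, not confirmed fixes.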

from php-src.

aidas-emersoft avatar aidas-emersoft commented on July 23, 2024

@meinemitternacht - no, we don't use PHP-DI. They are full stack Symfony projects. I will try opcache.jit = 1205 setting.

No crashes have happened since completely disabling JIT 12 hours ago.

from php-src.

cappadaan avatar cappadaan commented on July 23, 2024

OPcache:
Fixed bug #81679 (Tracing JIT crashes on reattaching).

This change in 8.1.2 does not fix this issue.

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

@cappadaan I don't think that fix was for this issue. It was for CGI on Windows, IIRC.

from php-src.

deluxetom avatar deluxetom commented on July 23, 2024

I'm experiencing the same issue with JIT enabled, PHP 8.1.2

fulltext:SIGSEGV log_string="[03-Feb-2022 00:26:30] WARNING: [pool www] child 161 exited on signal 11 (SIGSEGV) after 119.953665 seconds from start"

everything goes back to normal with JIT disabled

from php-src.

kohlerdominik avatar kohlerdominik commented on July 23, 2024

Just found the same issue after updating to PHP 8.1 with JIT enabled. So far it looks like changing from opcache.jit=1235 to opcache.jit=1205 resolves the issue.

We use a Laravel app with a lot of Symfony components, but no PHP-DI. Here are our Composer dependencies; maybe other contributors can spot similarities:

composer.json:
"php": "^8.1",
"ext-dom": "*",
"ext-fileinfo": "*",
"ext-json": "*",
"ext-mbstring": "*",
"ext-redis": "*",
"ext-simplexml": "*",
"absszero/laravel-stackdriver-error-reporting": "^1.6",
"askedio/laravel-soft-cascade": "^8.1",
"brick/money": "^0.5",
"ezyang/htmlpurifier": "^4.13",
"fideloper/proxy": "^4.4",
"fruitcake/laravel-cors": "^2.0",
"galbar/jsonpath": "^2.0",
"guzzlehttp/guzzle": "^7.4",
"inspheric/nova-indicator-field": "^1.43",
"intervention/validation": "^3.0",
"justinrainbow/json-schema": "^5.2",
"kalnoy/nestedset": "^6.0",
"kkomelin/laravel-translatable-string-exporter": "^1.12",
"laravel/framework": "^8.77",
"laravel/horizon": "^5.7",
"laravel/nova": "~3.25",
"laravel/passport": "^10.1",
"laravel/telescope": "^4.4",
"laravel/tinker": "^2.6",
"laravel/ui": "^3.4",
"league/fractal": "^0.19",
"lucid-arch/laravel-foundation": "^8.0",
"maatwebsite/excel": "^3.1",
"maennchen/zipstream-php": "^2.1",
"mikehaertl/php-tmpfile": "dev-feature/keep-file-after-unreferencing",
"mossadal/math-parser": "^1.3",
"mustache/mustache": "^2.13",
"prettus/l5-repository": "^2.7",
"s-ichikawa/laravel-sendgrid-driver": "^3.0",
"spatie/laravel-translatable": "^5.1",
"sprain/swiss-qr-bill": "v4.0",
"staudenmeir/belongs-to-through": "^2.11",
"superbalist/flysystem-google-storage": "dev-master as 7.2.3",
"superbalist/laravel-google-cloud-storage": "^2.2",
"symfony/intl": "^5.0",
"tedivm/jshrink": "1.4.0",
"veelasky/laravel-hashid": "^2.2",
"vmitchell85/nova-links": "^1.0"

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

@dstogov I am encountering a similar problem on a virtual machine I am using for development (also 8.1).

If I have a fresh instance of PHP-FPM, it works fine. However, as soon as I overwrite one of the project source files with changes during development, JIT will immediately cause a segfault. If I then restart PHP-FPM, it will begin working again.

It seems to occur regardless of what I change in the file (sometimes it is even just text within a string).

from php-src.

kohlerdominik avatar kohlerdominik commented on July 23, 2024

Hi @dstogov

The issue in my environment (Docker container) is 100% reproducible. It occurs on my local Docker setup as well as in our GCP K8s environment. It might even appear outside of Docker containers.

But our environment is hard to set up. So the best I could offer you is a remote VM with docker-compose and a guide on how to reproduce it (unfortunately, the issue only happens on certain endpoints). And I guess debugging JIT issues inside an Alpine Docker container is somewhere between "not the preferred way" and "impossible"...?

from php-src.

Huggyduggy avatar Huggyduggy commented on July 23, 2024

We're running 8.1.2 within AWS ECS on Fargate, so new hosts are provisioned for each deployment. In ~4 out of 5 deployments, we see multiple SIGSEGVs within the first 60 seconds, leading to a deploy-and-kill loop, which repeats a few times until we happen to get a steady server. We're currently trying our luck with JIT 1205; I'll update this comment once we have some experience with it.

(For the record, we're also running on Laravel 8.x)

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

@Huggyduggy Do you completely destroy the instances each time, or do you only deploy your codebase when you release fixes and such? I was wondering if you could provide some insight as to the behavior when you update PHP files on the filesystem without restarting the PHP-FPM instance.

from php-src.

Huggyduggy avatar Huggyduggy commented on July 23, 2024

@meinemitternacht During deployment, a bunch of new virtual AWS EC2 servers are started, they're provisioned with the latest container-software as provided by AWS. Then, containers are pulled & started automatically. Legacy servers/containers get removed from the LB and terminate. There's no real way of changing the codebase on existing containers / servers, I'm afraid.

We're developing on the same PHP-Docker images which we also use for production releases, during development I've not yet noticed any segfaults.

from php-src.

dominikhalvonik avatar dominikhalvonik commented on July 23, 2024

Guys, any update on this? As @zejji said, this issue is still present on 8.1.3 FPM. Is there any progress on this?

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

I had difficulties finding a minimal test case so @dstogov could debug the problem. Perhaps others will have more luck?

from php-src.

kvas-damian avatar kvas-damian commented on July 23, 2024

We had a similar problem after upgrading to PHP 8.1.3; before that we used PHP 8.0.3 without problems.

In our case, a Laravel 8 app based on PHP-FPM is hosted in Kubernetes. The K8s cluster is based on N2D instances with 3rd Gen AMD EPYC processors. The Dockerfile that triggered the problem for us is the following:

FROM composer:2.1.12 AS php-composer

FROM php:8.1.3-fpm-alpine3.15

USER root

RUN apk --no-cache add --virtual .build-deps \
  build-base \
  && apk --no-cache add libpng-dev libzip-dev libjpeg-turbo-dev freetype-dev \
  && docker-php-ext-configure gd \
    --with-freetype \
    --with-jpeg \
  && docker-php-ext-install -j$(nproc) bcmath gd zip mysqli pdo_mysql sockets \
  && docker-php-ext-enable opcache \
  && echo $'zend_extension=opcache\n\
[opcache]\n\
opcache.enable=1\n\
opcache.enable_cli=1\n\
opcache.validate_timestamps=0\n\
opcache.max_accelerated_files=10000\n\
opcache.memory_consumption=128\n\
opcache.max_wasted_percentage=10\n\
opcache.interned_strings_buffer=16\n\
opcache.fast_shutdown=1\n\
opcache.jit_buffer_size=100M' > /usr/local/etc/php/conf.d/docker-php-ext-opcache.ini \
  && apk del .build-deps

# copy composer from the first stage
COPY --from=php-composer /usr/bin/composer /usr/bin
RUN composer --ansi --version --no-interaction; php -v; php -m; php -r 'var_export(gd_info()); echo PHP_EOL; var_export(opcache_get_status());'

Of ~20 identical pods, only 2 were segfaulting, and they did so on all requests. We found the following entries in our logs:

[4539894.358435] php-fpm[3056477]: segfault at 8 ip 0000000048d82054 sp 00007ffe99621df0 error 4 in zero (deleted)[48d19000+6400000]\r\n

WARNING: [pool www] child 792 exited on signal 11 (SIGSEGV) after 15.030120 seconds from start
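
To see at a glance which binary or module the kernel blames across many such log lines, something like the following could be used. It is a rough sketch that assumes kernel lines of the form "... segfault at ... in <module>[...]" as quoted in this thread. The function name segfault_modules is made up for illustration.

```shell
#!/bin/sh
# Print "<count> <module>" for each module named in kernel segfault lines,
# sorted by module name (output format e.g. "2 opcache.so").
segfault_modules() {
    grep -E 'segfault at .* in ' "$1" \
        | sed -E 's/.* in ([^[ ]+).*/\1/' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}
```

Run it against a saved copy of the kernel log, e.g. segfault_modules /var/log/messages. Lines that do not contain "segfault at" (such as the php-fpm "exited on signal 11" warnings) are ignored.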

Changing from tracing JIT to function JIT (opcache.jit=1205) solved the problem.

I hope it helps you find the root cause.

from php-src.

haad avatar haad commented on July 23, 2024

I cannot fix this before I get a way to reproduce the crash.

We have a simple way of reproducing this issue on our internal application/Kubernetes. We can provide more details outside GitHub.

from php-src.

dominikhalvonik avatar dominikhalvonik commented on July 23, 2024

@dstogov if you tell me what info you need I am more than happy to give you all information that I can provide.

from php-src.

dstogov avatar dstogov commented on July 23, 2024

@dominikhalvonik Ideally, I need to reproduce the problem in my debug environment. I can start the analysis by installing your app (git clone, composer install, your instructions to reproduce), by using a VM image, or with SSH access to a test environment.

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

I don't think it should be closed just because it is hard to reproduce, if that is indeed the reason.

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

If this is going to be closed, I think the default parameter for opcache.jit should be changed to function so that we lessen the impact of this bug.

from php-src.

iluuu1994 avatar iluuu1994 commented on July 23, 2024

I think this might've just been closed by accident.

from php-src.

dominikhalvonik avatar dominikhalvonik commented on July 23, 2024

Hi guys, I know this might be a silly question but what is better from a performance point of view:

php:8.0.15-fpm + JIT in config opcache.jit=1255
OR
php:8.1.4-fpm + JIT in config opcache.jit=1205

I am asking because I want to know whether it is better to remain on PHP 8.0 with the 1255 config, or whether we would get better performance by migrating to PHP 8.1 with 1205. Any ideas?

from php-src.

tanhaei avatar tanhaei commented on July 23, 2024

Has anyone seen the same issue in 8.1.5?

We have some problems with the 8.1.5 release. Setting opcache.jit=0 temporarily resolves the problem!

from php-src.

cappadaan avatar cappadaan commented on July 23, 2024

Has anyone seen the same issue in 8.1.5?

The issue is still there and got even worse in 8.1.5. In 8.1.4 you could use opcache.jit=1205 as a workaround.
But in 8.1.5 this also gave segfaults, only opcache.jit=0 works now for me.

from php-src.

nepster-web avatar nepster-web commented on July 23, 2024

Somewhere near #8149

from php-src.

jonathantullett avatar jonathantullett commented on July 23, 2024

Has anyone seen the same issue in 8.1.5?

Yes, seeing the same issue in our symfony application. It's not immediate though and often can run for a few days before we start getting the segfaults across the board.

from php-src.

Gwemox avatar Gwemox commented on July 23, 2024

Has anyone seen the same issue in 8.1.5?

Yes, seeing the same issue in our symfony application. It's not immediate though and often can run for a few days before we start getting the segfaults across the board.

I think the problem has been found in #8461 !

from php-src.

brunohsouza avatar brunohsouza commented on July 23, 2024

I have been trying to solve this problem. I'm using PHP 8.1.5-fpm + Nginx + Symfony 6.

After applying the proposed solution of changing opcache.jit=tracing to opcache.jit=1205, it seems to be working.

What makes me wonder is that we have some environments using an environment variable like APP_ENV=dev (where the segfault is happening) and others using APP_ENV=staging (where the segfault is not happening). Even using the same versions of code, docker, libraries, etc.

Does anyone know whether there is any correlation between these segfaults and the APP_ENV variable or any other env var?

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

I applied the changes from #8461 to our environment using 8.1.6 as a base, and we are still experiencing segfaults. However, there may be multiple issues at play, and those fixes may alleviate some of the problems in this issue.

@brunohsouza I am not sure that the environment variables are impacting the problems with JIT, but they may cause the application to trigger certain behaviors depending on the environment settings.

from php-src.

brunohsouza avatar brunohsouza commented on July 23, 2024

I just found another issue which may be a clue to the problem with the APP_ENV variable: symfony/symfony#45752.

My guess is this: Symfony has predefined environments such as dev, test and prod, which are reflected in the directory structure inside /var/cache. When JIT runs in tracing mode, it tries to optimize some "hot code" inside a dynamically created file under /var/cache/{env} or inside some container class. Then, once JIT cannot find the file, it generates a segfault.

The issue above may be related, since the segfault happens when the cache folder is cleared and does not happen when it is not.

Also, if I change the APP_ENV, I cannot see the segfault because there's no folder with the new env name. Then, it will not try to open a class or file.

After changing the JIT mode to 1205 the segfaults stopped. I think that is because it then compiles all functions on script load, rather than profiling on the fly and compiling traces for hot code segments, as the documentation says.

from php-src.

usefksa avatar usefksa commented on July 23, 2024

Hello,
We also have the same problem. It happens completely at random: if we run 10 servers, one of them will have the bug.

from php-src.

nursoda avatar nursoda commented on July 23, 2024

I still have this issue with 8.1.7 and opcache.jit=1255: After server reboot, one or more of my nextcloud instances constantly segfault/coredump. Setting opcache.jit from 1255 to 1205 reduces the impact but I still see two coredumps upon server restart. After commenting out opcache.jit and opcache.jit_buffer_size (assuming that disables JIT), I have no coredumps upon reboot. If I get some upon later reboots, I shall report here.

from php-src.

chelsEg avatar chelsEg commented on July 23, 2024

@cappadaan For my project, the issue is not fixed after updating to 8.1.7.

from php-src.

cappadaan avatar cappadaan commented on July 23, 2024

We have indeed experienced 1 segfault so far, so it seems not to be 100% fixed.

from php-src.

everyx avatar everyx commented on July 23, 2024

Same problem here with config below, php version 8.1.8

opcache.memory_consumption = 192
opcache.interned_strings_buffer = 8
opcache.max_accelerated_files = 4000
opcache.revalidate_freq = 60
opcache.fast_shutdown = 1
opcache.enable_cli = 1
opcache.jit = 1235
opcache.jit_buffer_size = 64M
opcache.preload_user = www-data

WARNING: [pool www] child 289717 exited on signal 11 (SIGSEGV - core dumped) after 42913.627688 seconds from start

from php-src.

gregherrell avatar gregherrell commented on July 23, 2024

We have a CMS application that powers thousands of individual websites. Apart from the individual design files on each website, they all share the same centralized code base located in a folder on each web server. We run php-fpm. We have upgraded our servers to PHP 8.1.9 over the last week. CentOS 7, Apache 2.4.54.

After each upgrade I delete all the old opcache files before restarting the server. Each server runs a single app pool using static mode and there are thousands of instances/websites of this application on each server.

What we are finding is that php-fpm begins to return 503s on individual websites and not on the entire server. Further, some of the pages on those websites will still serve while others return 503. The 503s correspond to the "segfault in opcache.so" errors in /var/log/messages. A restart of php-fpm clears the errors. I cannot find a pattern as to why.

Interestingly, I wondered whether the local Twig cache located on each website instance might somehow be related. During one of the outages of a single website I ran a script to delete all the Twig folders on the individual websites. This immediately resulted in 503 errors for all of the sites on the server.

Below are my settings. Note that the buffer size is set to zero. Admittedly, I am unsure whether I should also set jit=disable to disable JIT entirely, or whether this is even a JIT issue.

opcache.jit => tracing => tracing
opcache.jit_bisect_limit => 0 => 0
opcache.jit_blacklist_root_trace => 16 => 16
opcache.jit_blacklist_side_trace => 8 => 8
opcache.jit_buffer_size => 0 => 0
opcache.jit_debug => 0 => 0
opcache.jit_hot_func => 127 => 127
opcache.jit_hot_loop => 64 => 64
opcache.jit_hot_return => 8 => 8
opcache.jit_hot_side_exit => 8 => 8
opcache.jit_max_exit_counters => 8192 => 8192
opcache.jit_max_loop_unrolls => 8 => 8
opcache.jit_max_polymorphic_calls => 2 => 2
opcache.jit_max_recursive_calls => 2 => 2
opcache.jit_max_recursive_returns => 2 => 2
opcache.jit_max_root_traces => 1024 => 1024
opcache.jit_max_side_traces => 128 => 128
opcache.jit_prof_threshold => 0.005 => 0.005
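
On the question of whether JIT is actually off with these settings: per the opcache documentation, a zero opcache.jit_buffer_size disables the JIT regardless of the opcache.jit mode. As a sketch, a dump like the one above (the "name => local => master" format printed by php -i) could be checked as follows. The function name jit_state is made up, and the input must come from the SAPI you care about, since CLI and FPM can load different ini files.

```shell
#!/bin/sh
# Reads `php -i`-style lines ("name => local => master") on stdin and
# prints "on", "off" or "unknown" based on the local (second) value.
jit_state() {
    awk -F' => ' '
        $1 == "opcache.jit"             { mode = $2 }
        $1 == "opcache.jit_buffer_size" { size = $2 }
        END {
            if (mode == "" || size == "")
                print "unknown"   # one of the two settings was not in the input
            else if (size == "0" || mode == "0" || mode == "off" || mode == "disable")
                print "off"
            else
                print "on"
        }'
}
```

For example, php -i | jit_state reports the state for the CLI configuration. For FPM, the same lines can be taken from a phpinfo() page served by the pool in question.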

from php-src.

gregherrell avatar gregherrell commented on July 23, 2024

Here is a back trace.

For me, changing the buffer to > 0 generates immediate random segfaults. I realize this is not reproducible, but perhaps it is something. I have several core dumps I can provide if need be.

Addendum: I got segfaults with JIT disabled after the server had been running for days.

#0 zend_accel_inheritance_cache_find (needs_autoload_ptr=, traits_and_interfaces=, parent=, ce=, entry=0x41e6ccf0)
at /usr/src/debug/php-8.1.9/ext/opcache/ZendAccelerator.c:2254
#1 zend_accel_inheritance_cache_get () at /usr/src/debug/php-8.1.9/ext/opcache/ZendAccelerator.c:2295
#2 0x0000557e80fc366f in zend_try_early_bind () at /usr/src/debug/php-8.1.9/Zend/zend_inheritance.c:3021
#3 0x0000557e80f09d93 in zend_do_delayed_early_binding (op_array=op_array@entry=0x7fe34ea02500, first_early_binding_opline=) at /usr/src/debug/php-8.1.9/Zend/zend_compile.c:1380
#4 0x00007fe353b386d4 in zend_accel_load_script () at /usr/src/debug/php-8.1.9/ext/opcache/zend_accelerator_util_funcs.c:255
#5 0x0000557e80ef1989 in compile_filename (type=type@entry=2, filename=filename@entry=0x7fe321c1d000) at /usr/src/debug/php-8.1.9/Zend/zend_language_scanner.c:707
#6 0x0000557e80f6163a in zend_include_or_eval (inc_filename_zv=, type=2) at /usr/src/debug/php-8.1.9/Zend/zend_execute.c:4623
#7 0x0000557e80f6e95a in ZEND_INCLUDE_OR_EVAL_SPEC_CV_HANDLER () at /usr/src/debug/php-8.1.9/Zend/zend_vm_execute.h:38713
#8 0x0000557e80f95516 in execute_ex () at /usr/src/debug/php-8.1.9/Zend/zend_vm_execute.h:59122
#9 0x0000557e80f21244 in zend_call_function () at /usr/src/debug/php-8.1.9/Zend/zend_execute_API.c:908
#10 0x0000557e80f21635 in zend_call_known_function () at /usr/src/debug/php-8.1.9/Zend/zend_execute_API.c:997
#11 0x0000557e80e27030 in spl_perform_autoload (class_name=0x7fe321c1cfb8, lc_name=0x7fe321d0d7e0) at /usr/src/debug/php-8.1.9/ext/spl/php_spl.c:433
#12 0x0000557e80f2051c in zend_lookup_class_ex (name=name@entry=0x7fe321c1cfb8, key=0x7fe321d0d7e0, flags=flags@entry=512) at /usr/src/debug/php-8.1.9/Zend/zend_execute_API.c:1141
#13 0x0000557e80f21982 in zend_fetch_class_by_name () at /usr/src/debug/php-8.1.9/Zend/zend_execute_API.c:1601
#14 0x0000557e80f6ba0f in ZEND_NEW_SPEC_CONST_UNUSED_HANDLER () at /usr/src/debug/php-8.1.9/Zend/zend_vm_execute.h:10147
#15 0x0000557e80f944f4 in execute_ex () at /usr/src/debug/php-8.1.9/Zend/zend_vm_execute.h:56659
#16 0x0000557e80f9d22d in zend_execute (op_array=0x7fe34ea02200, return_value=0x0) at /usr/src/debug/php-8.1.9/Zend/zend_vm_execute.h:60123

from php-src.

gregherrell avatar gregherrell commented on July 23, 2024

@gregherrell thanks for backtrace. Interesting, it doesn't contain any JIT code. It seems like something in shared inheritance cache was corrupted. I have no idea how this may be related to JIT yet. Could you try to run php with opcache.protect_memory=1 in php.ini. This should cause immediate crash in case of unintended write to shared memory and may give the next direction for analyses.

Unfortunately, this yielded a back trace with no information. I am not certain why. I generated 4 core dumps. All had the same information.

Reading symbols from /usr/sbin/php-fpm...Reading symbols from /usr/lib/debug/usr/sbin/php-fpm.debug...done.
done.
[New LWP 13843]
Core was generated by `php-fpm: pool www '.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fb54a44e29c in ?? ()
(gdb) bt
#0 0x00007fb54a44e29c in ?? ()
#1 0x0000000000007fff in ?? ()
#2 0x7431bff8d9b1e4f2 in ?? ()
#3 0x0000000000000000 in ?? ()

from php-src.

meinemitternacht avatar meinemitternacht commented on July 23, 2024

Maybe this is related to #8642 and can be fixed along with this one.

from php-src.

MichaelSch avatar MichaelSch commented on July 23, 2024

I'm adding myself to the list. I run several websites without issues, but on my Nextcloud instance php-fpm crashes randomly.

php -v
PHP 8.1.9 (cli) (built: Aug  2 2022 13:02:24) (NTS gcc x86_64)
Copyright (c) The PHP Group
Zend Engine v4.1.9, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.9, Copyright (c), by Zend Technologies

                Stack trace of thread 9677:
                #0  0x00007f94c951d13c zend_accel_inheritance_cache_get (opcache.so + 0x2113c)
                #1  0x00005603fda841f2 zend_do_link_class (php-fpm + 0x4841f2)
                #2  0x00005603fd9c91cf zend_bind_class_in_slot (php-fpm + 0x3c91cf)
                #3  0x00005603fd9c925d do_bind_class (php-fpm + 0x3c925d)
                #4  0x00005603fda22ac9 ZEND_DECLARE_CLASS_SPEC_CONST_HANDLER (php-fpm + 0x422ac9)
                #5  0x00005603fda5547d execute_ex (php-fpm + 0x45547d)
                #6  0x00005603fd9e07dc zend_call_function (php-fpm + 0x3e07dc)
                #7  0x00005603fd9e0bad zend_call_known_function (php-fpm + 0x3e0bad)
                #8  0x00005603fd8eb26a spl_perform_autoload (php-fpm + 0x2eb26a)
                #9  0x00005603fd9dfae1 zend_lookup_class_ex (php-fpm + 0x3dfae1)
                #10 0x00005603fda05791 is_a_impl (php-fpm + 0x405791)
                #11 0x00005603fda5cbd9 execute_ex (php-fpm + 0x45cbd9)
                #12 0x00005603fd9e07dc zend_call_function (php-fpm + 0x3e07dc)
                #13 0x00005603fd91d835 zif_call_user_func (php-fpm + 0x31d835)
                #14 0x00005603fda5cbd9 execute_ex (php-fpm + 0x45cbd9)
                #15 0x00005603fd9e07dc zend_call_function (php-fpm + 0x3e07dc)
                #16 0x00005603fd91d835 zif_call_user_func (php-fpm + 0x31d835)
                #17 0x00005603fda5cbd9 execute_ex (php-fpm + 0x45cbd9)
                #18 0x00005603fda5f0b9 zend_execute (php-fpm + 0x45f0b9)
                #19 0x00005603fd9eeeb0 zend_execute_scripts (php-fpm + 0x3eeeb0)
                #20 0x00005603fd989e5a php_execute_script (php-fpm + 0x389e5a)
                #21 0x00005603fd83e27d main (php-fpm + 0x23e27d)
                #22 0x00007f94c9a29550 __libc_start_call_main (libc.so.6 + 0x29550)
                #23 0x00007f94c9a29609 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29609)
                #24 0x00005603fd83efd5 _start (php-fpm + 0x23efd5)
                ELF object binary architecture: AMD x86-64

from php-src.

nikserg avatar nikserg commented on July 23, 2024

Probably a similar problem: repeated 502s on the same page, which works fine on test and local servers.

From dmesg:

Aug 29 12:39:41 admin kernel: [363104.676254] traps: php-fpm8.1[41337] general protection fault ip:7f90c7c34cfc sp:7fffb6e946d0 error:0 in opcache.so[7f90c7c2f000+b5000]

PHP version:

php -v
PHP 8.1.9 (cli) (built: Aug 15 2022 09:39:52) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.9, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.9, Copyright (c), by Zend Technologies

Adding opcache.jit=0 in /etc/php/8.1/fpm/php.ini solves the problem.

from php-src.
