Yesterday, Cloudflare had a massive outage. This doubled the number of requests sent to Hyvor Talk’s origin servers, requiring more processing power (some JavaScript is dynamically created and cached at the Cloudflare level). We are talking about 10 million additional requests per day. We soon got notifications that the CPU usage of the web servers was running at 90%+.
Our easiest option was to add another web server to handle the load. However, before that, I checked server logs (NGINX and php-fpm). There were messages like “PHP-fpm seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers)”. So, I tried tweaking some PHP-fpm configs, and it worked (for us)!
PHP-fpm config (php-fpm.conf) changes:
[www]
-pm = dynamic
-pm.max_children = 500
-pm.start_servers = 60
-pm.min_spare_servers = 60
-pm.max_spare_servers = 100
+pm = static
+pm.max_children = 2500
This article suggested pm=static is better than pm=dynamic for high-traffic web servers. In our case, the web server only ran NGINX (which is highly optimized and takes very little CPU processing power) and PHP-fpm.
I never completely understood what pm.start_servers, pm.min_spare_servers, etc. did. In short, they are additional options that control how many child processes are created dynamically when you use pm=dynamic. However, you only need pm.max_children when pm=static is used.
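To make the difference concrete, here is a minimal Python sketch that reads a pool config and reports which pm.* options actually matter for the chosen mode. The helper name relevant_settings and the RELEVANT table are made up for illustration, it only covers the two modes discussed in this post, and Python's configparser is only good enough for a simple fragment like ours, not necessarily for a full php-fpm.conf.

```python
import configparser

# pm.* directives that PHP-FPM uses in each process manager mode.
# Only the two modes discussed in this post are listed (assumption:
# nothing else in the pool config is of interest here).
RELEVANT = {
    "static": ["pm.max_children"],
    "dynamic": [
        "pm.max_children",
        "pm.start_servers",
        "pm.min_spare_servers",
        "pm.max_spare_servers",
    ],
}


def relevant_settings(pool_text, pool="www"):
    """Return the pm mode and the directives that matter for that mode."""
    parser = configparser.ConfigParser()
    parser.read_string(pool_text)
    section = parser[pool]
    mode = section["pm"]
    settings = {key: section[key] for key in RELEVANT[mode] if key in section}
    return mode, settings


# The pool settings before and after the change described in this post.
OLD_POOL = """
[www]
pm = dynamic
pm.max_children = 500
pm.start_servers = 60
pm.min_spare_servers = 60
pm.max_spare_servers = 100
"""

NEW_POOL = """
[www]
pm = static
pm.max_children = 2500
"""

print(relevant_settings(OLD_POOL))
print(relevant_settings(NEW_POOL))
```

With pm=dynamic, all four options come back. With pm=static, only pm.max_children does, which is why the new config is so much shorter.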
Our server specs:
CCX31 VPS server at Hetzner
32GB RAM
8 CPU cores
Some other notes:
pm=static may not be the best option if you also have databases like MySQL running on the same server.
The more you increase pm.max_children, the more RAM will be used. Make sure your server has enough memory to handle that. In our case, with pm.max_children=2500, ~16GB RAM is used, and the CPU usage went down to around 30%.
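If you want a rough starting point for your own server, the numbers above can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes memory grows linearly with the number of children and that all of the ~16GB belongs to PHP-fpm, so the per-child figure is an upper bound. The function names and the 10GB budget are hypothetical, and you should measure real per-process memory on your own workload.

```python
def avg_memory_per_child_mb(total_used_gb, children):
    """Average memory per PHP-FPM child in MB, from an observed total."""
    return total_used_gb * 1024 / children


def max_children_for_budget(budget_gb, per_child_mb):
    """Largest pm.max_children that fits in the memory budget, rounded down."""
    return int(budget_gb * 1024 // per_child_mb)


# Figures from this post: ~16GB used with pm.max_children=2500.
per_child = avg_memory_per_child_mb(16, 2500)
print(f"average per child: {per_child:.2f} MB")

# Hypothetical budget: how many children fit if only 10GB can go to PHP-FPM?
print(max_children_for_budget(10, per_child))
```

Treat the result as a ceiling to test against, not a value to copy into your config.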