March 8th, 2010

Faster, PHP! Kill! Kill!

PHP is easy... as programming languages go, that is. You can build sites in a real hurry.

With frameworks like Symfony, you can build them faster still, and follow modern programming practices at the same time.

And Apostrophe strips away yet another layer of effort if your site calls for a content management system.

Yes, Java has more raw speed, all else being equal (which it never is). But as the LISP programmers used to say, "a moment of regret, a lifetime of convenience."

Still, sooner or later success catches up with you and you want your site to cope with Serious Traffic... or cope with moderate traffic on a cheap virtual machine... or at the very least, not be dog-slow with just a handful of users on the system.

There's a lot of advice out there about optimizing PHP code, some of it well worth your while. And there's excitement about HipHop, Facebook's new native code compiler for PHP. But these are drastic steps that require you to rewrite your code or adopt less proven and more awkward ways of delivering your code.

Justified? Sure, sometimes, on the biggest projects in the world (like Facebook) (*). But as Donald Knuth says, "premature optimization is the root of all evil." That's because tweaking your code for speed's sake usually makes it harder to maintain and less adaptable to new requirements.

What most developers don't realize is that there are three major factors that typically slow down PHP projects based on frameworks (like Symfony or, sigh, Drupal) so much that code profiling and database query redesign don't even have a chance to become relevant factors. Fix these things first before you worry about other issues:

1. Compiling code over and over and over. Would you wait for your Mac to recompile MacOS X from source code every time you boot it up? Of course not. How about every time you fill out a dialog box? That's pretty much what you're doing every time you access a PHP-driven website that doesn't use a bytecode cache.

2. Waiting and waiting and waiting for web browsers to make another request, pinning down web server processes that your other users need. By default Apache usually lets browsers hold on to a connection for up to 15 seconds just in case they ask for more. This is a good thing in many ways, but 15 seconds is far too long. Which leads us to #3:

3. Tying up a "fat" web server process with PHP on board for every request, even requests for the zillions of little static PNGs that probably make up your page design. (**) A typical Apache web server configuration with mod_php suffers from this flaw, fatally limiting the number of simultaneous users you can handle.

So what can we do about these problems? Quite a bit as it turns out. I'll start with the low-hanging fruit and move on to the tougher stuff. The fascinating common thread with all of these suggestions: no changes at all to your PHP code.

Stop Recompiling Your World: APC

APC, aka the Alternative PHP Cache, is a bytecode cache. What does that mean and what good does it do?

Today, when users visit your website, your web server struggles to translate all of your PHP code into a more manageable "bytecode." And only then does it actually run the bytecode and output the page.

Bytecode isn't true native machine code and doesn't run as fast as compiled Java, and certainly not as fast as compiled C. That's why Facebook's HipHop project seeks to convert PHP code to C++, and then compile that to native code. But that's a drastic step with far-reaching consequences for your project. Before you think about HipHop, try a simpler fix: stop throwing away the bytecode and starting over from zero on every new request that comes to the site!

In our experience, enabling APC reduces both memory usage and the time required to respond to a request by vast amounts. Even a simple action in Symfony or Drupal can implicitly require that quite a bit of PHP code be compiled. With APC, all of that code stays compiled, and that simple action actually runs quickly and simply.

APC is simply indispensable for serious PHP programming. Turn it on. Make sure it's really working. And enjoy dramatic improvements in your social life, hair and skin.

Installing APC

Depending on your system, APC may already be installed or even, if you are very lucky, activated correctly. Virtual machine hosting from ServerGrove comes with APC "out of the box." Most shared hosting probably won't because it increases the amount of memory pinned down for each user on the shared server, but if you care about your website's speed, you're not using shared hosting. Get a small, cheap virtual machine from ServerGrove instead (or if you must, do it the hard way and set up a Slicehost or linode virtual machine; you'll learn a lot about system administration).

On our Slicehost VMs, we found it necessary to install APC with pecl, the PHP package manager for extensions (like PEAR, except that PECL extensions are not written in plain PHP):

pecl install apc

We then enable it in php.ini (you won't be able to do this in a .htaccess file):

; Must enable APC otherwise we compile all of Symfony on every hit
extension=apc.so
; Give APC enough shared memory to handle Symfony, etc.
apc.shm_size = 48
apc.include_once_override = 1
apc.mmap_file_mask = /tmp/apc.XXXXXX

Then restart Apache:

apachectl restart

And you're off to the races. The first page access will be as slow as ever, but subsequent accesses will be much, much faster. You can also check whether you're saving memory and time using Symfony's debugging toolbar.

Your system also likely has an APC status page that you can copy to your website to check out the current status of APC. It's fun to play with.
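Before hunting for the status page, a quick command-line check can tell you whether PHP sees the extension at all. This is only a sketch: it assumes "php -m" prints loaded modules one per line and that the module is listed as apc, and the helper name apc_loaded is made up for illustration. The command-line PHP may also read a different php.ini than Apache does, so treat a positive result as a hint, not proof.

```shell
# apc_loaded: reads a module list (one name per line, as printed by
# "php -m") on stdin and succeeds only if one line is exactly "apc",
# ignoring case.
apc_loaded() {
    grep -qix 'apc'
}

# On a box with PHP installed you would run:
#   php -m | apc_loaded && echo "APC is loaded" || echo "APC is missing"

# Self-contained demonstration with a made-up module list:
printf 'Core\napc\ndate\n' | apc_loaded && echo "APC is loaded"
```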

Dialing Back Keepalive: Too Much of a Good Thing

In the beginning there was NCSA Mosaic, and it was pretty cool, all things considered. But it wasn't fast.

Mosaic made only one request of a web server at a time. Trying to do more probably seemed rude. Besides, lots of web servers couldn't cope with more than one request at a time at all. But that would only happen if two users came along at once, and that was never gonna happen, right?

Ah, those were the (sad, lame, pathetic) days. Scroll to the present. Typical web browsers first request the page, say "hang on a minute," open perhaps five more connections besides and then make note of all the images, JavaScript files, CSS files and other embedded bric-a-brac and start requesting all of those things. Consecutively and in parallel. Really a lot.

The Apache web server is actually very good at handling this. But by default, it is very generous: it lets the browser leave it hangin' for fifteen seconds. That means an Apache process and all of the memory it requires is tied up for all of that time, just doin' nothin'. And that's bad news.

Fortunately it's very easy to fix this in your httpd.conf file (actually, in our setups, this is usually in apache2.conf):

KeepAliveTimeout 2

Restart Apache:

apachectl restart

And you'll find that your site can suddenly cope with higher loads. That's because you're not tying up valuable resources waiting around for more requests from browsers that are already using and reusing five other connections to your server. The browser is welcome to promptly request more files and finish its business with you, but not to stand around twiddling its thumbs while you wonder if you have a date or not.

You can turn keepalive off entirely, but doing so forces the browser to "call back" for each and every little image file on your page, which is a terrible idea. So use it... just don't let it abuse you.
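To get a feel for why the timeout matters, here is a back-of-the-envelope sketch. It assumes, very roughly, that every arriving visitor leaves one connection idle until the timeout expires. Real browsers often hold several, so real numbers can be worse. The function name and the figure of 2 visitors per second are invented for illustration.

```shell
# idle_processes ARRIVALS_PER_SEC TIMEOUT_SECS
# Rough estimate (an assumption, not a measurement): if each arriving
# visitor pins one Apache process until the keepalive timeout expires,
# about arrivals * timeout processes are sitting idle at any moment.
idle_processes() {
    echo $(( $1 * $2 ))
}

echo "KeepAliveTimeout 15: about $(idle_processes 2 15) idle processes"
echo "KeepAliveTimeout 2: about $(idle_processes 2 2) idle processes"
```

On a small VM that only has room for a couple dozen PHP-laden processes, that difference is most of your capacity.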

MaxClients: Does This mod_php Make My Butt Look Big?

mod_php is fast. Unlike the old-fashioned "PHP as CGI" approach, mod_php keeps the PHP interpreter itself loaded in memory and ready to rock at all times, and that's good.

But mod_php is also a pig. For a big-deal web application that wrangles lots of database objects, or makes thumbnails of uploaded images, or does pretty much anything cool, it's not uncommon for a PHP process to use 50MB of memory or more. More miserly code can still hit 30MB without much difficulty.

We first ran into this problem when we wanted to move smaller client sites onto small VMs, so that they could have well-secured, highly-reliable web hosting at an affordable price. They tended to bog down and become unusable quickly.

We discovered that dialing back keepalive helped tremendously with the problem. And we also learned to do the math and figure out how many reasonably fat PHP Apache processes would really fit in the memory available to the slice, and configure Apache to run no more than that. In httpd.conf:

MaxClients          20

This may seem drastic (OMG! I can't serve more than 20 simultaneous users!) but here's the thing: you already can't. If you try, if you leave this set to its much higher default, the result will be a VM crushed by the need to swap out all of these Apache processes to virtual memory while they are still trying to do work. As the operating systems geeks say, the "resident set size" of your Apache and PHP processes is bigger than your actual memory. And when that happens you're SOL.

When you set MaxClients to a realistic number, what happens instead is this:

You can handle up to 20 connections at once gracefully. That 21st connection queues up and is dealt with soon after. If the average load doesn't exceed 20 connections, you're fine.

If too many connections queue up, some people will not get through. But you will still be able to get through to other services on the VM... like the ssh connection you use to administer it and change these settings. Without that, you're stuck in a seesaw battle, shutting down your website, tweaking settings, firing it up again and watching it gradually become overwhelmed.

What Is The Right MaxClients For Me?

Take the amount of RAM in your server (for instance, a 256MB slicehost VM). Subtract a bit of overhead (at least 50MB, ideally more; bigger VMs are better that way). Now divide by the largest PHP process you typically see when running "top" or "ps auxw | grep httpd" (you might have to grep for apache rather than httpd depending on your OS). For instance, if your processes top off around 30MB, you can set MaxClients to (256-50) divided by 30. Round down and you have a MaxClients setting of 6.
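If you'd rather not do the division in your head, the same arithmetic fits in a tiny shell function. This is a minimal sketch using a 256MB VM, 50MB of overhead and 30MB processes. The function name max_clients is made up, and all sizes are whole megabytes.

```shell
# max_clients RAM_MB OVERHEAD_MB PROCESS_MB
# Shell integer division rounds down, which is the safe direction here.
max_clients() {
    echo $(( ($1 - $2) / $3 ))
}

echo "MaxClients $(max_clients 256 50 30)"
```

Plug in your own numbers from top; the point is to round down, never up.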

One snag: some operating systems show shared memory (like the memory used by APC) as memory in use by every one of the Apache processes, even though it is really only allocated once and shared by all of them. If your APC cache is set to 40MB and all of your PHP processes seem 40MB overweight, you can increase MaxClients somewhat to allow for this.
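One way to allow for that, sketched under the assumption that the whole APC segment is being counted in every process: charge the shared segment against total memory once, and divide by what each process uses privately. The function name and the example figures (a 1GB box, 100MB overhead, processes that look like 70MB, a 40MB cache) are illustrative only.

```shell
# max_clients_shared RAM_MB OVERHEAD_MB APPARENT_MB SHARED_MB
# APPARENT_MB is the per-process size as reported by top or ps;
# SHARED_MB is the APC segment assumed to be included in that figure.
max_clients_shared() {
    private=$(( $3 - $4 ))
    if [ "$private" -le 0 ]; then
        echo "shared size must be smaller than the apparent process size" >&2
        return 1
    fi
    # Count the shared segment once, then divide by the private size.
    echo $(( ($1 - $2 - $4) / private ))
}

echo "naive:    $(( (1024 - 100) / 70 ))"
echo "adjusted: $(max_clients_shared 1024 100 70 40)"
```

If your OS does not double-count shared memory, skip the adjustment and use the simple calculation.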

Yes, you can use swap space (virtual memory) to go higher, but this just forces the machine to swap, using disk space in place of memory to do hard things it needs to do quickly and freezing you out of your own machine. So don't do that.

PHP, I Love You, But Could You Sleep Downstairs?

Dialing back the number of Apache processes is a helpful step that allows your server to say no with grace and style when there is too much traffic. But the underlying problem remains: we can't deal with a lot of requests at once. And the reason we can't do that is PHP. So we're stuck. Or are we?

Our Apache processes are "fat" in terms of memory use because of the memory taken up by PHP. But most of the requests we're handling are for little PNG files and not-so-little JPEG files and CSS files and JavaScript files... all of which Apache can serve up handily with about three bytes of memory and a piece of string.

The obvious solution is to serve PHP requests and static files separately. And many sites do this explicitly, using absolute URLs pointing at a different subdomain for static files. For example:

<img src="http://static.example.com/foo.png" />

Now, this is a fine thing and I have nothing against it. When you design things this way you can use entirely separate machines to serve these requests if you wish, and that's pretty exciting from a performance perspective. Many bigger sites find this well worth their while.

But you are also complicating things. And you are forced to change working code, which introduces the potential for new and exciting bugs.

There's another approach (well, one of many): FastCGI. FastCGI is an enhanced version of the CGI protocol that appeared quite early in the evolution of the web but caught on only with those most concerned about performance. I must admit I am a very recent convert, ironic since FastCGI is a bit out of vogue today. But there are solid reasons to consider using it.

FastCGI has three big advantages:

1. FastCGI runs PHP processes separately from Apache. That means Apache can stay lean and mean, not only running much smaller processes (and therefore perhaps many more of them) but even running in a separate "worker thread" that outperforms the separate-processes mode by a long mile. This is a big, big win.

2. It can reuse a single process for new PHP requests (***), avoiding the need to restart the PHP interpreter. That's nice, but mod_php can do the same trick, and mod_php is capable of sharing APC cache memory between all of its processes. And FastCGI can't do that... or it wouldn't be able to without a little help from PHP:

3. PHP comes with a special FastCGI mode in which it manages several child processes of its own, and allows all of those processes to share a single APC cache.

Put these advantages together and you have the holy grail of PHP hosting: PHP running separately from Apache, but still maintaining all of the performance advantages of the APC cache. And we can do it without changing a line of PHP.

Installing FastCGI

There's one catch: the FastCGI module usually offered for Apache 2 is mod_fcgid, a rewritten implementation of FastCGI released under the official Apache license, instead of the oh-so-fractionally-different FastCGI license (****). And it's nice and all, except that it completely lacks the third advantage of FastCGI I mentioned above. If you want the benefits I'm describing here, you have to use the real mod_fastcgi. Accept no substitutes.

These directions work for Slicehost Ubuntu slices and other Ubuntu-based Linux. They will not work exactly as-is for ServerGrove, which is CentOS-based:

Edit /etc/apt/sources.list and add multiverse, which includes software with insufficiently "open" licenses, such as mod_fastcgi. Change intrepid to the Ubuntu version on your box; it'll be mentioned all over sources.list:

# tom@punkave.com: we want multiverse support so we can have real mod_fastcgi
deb http://us.archive.ubuntu.com/ubuntu/ intrepid multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid multiverse
deb http://us.archive.ubuntu.com/ubuntu/ intrepid-updates multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ intrepid-updates multiverse

Now do:

apt-get update
apt-get install libapache2-mod-fastcgi

Now edit /etc/apache2/mods-available/fastcgi.conf to read:

<IfModule mod_fastcgi.c>
  # One shared PHP-managed fastcgi for all sites
  Alias /fcgi /var/local/fcgi
  # IMPORTANT: without this we get more than one instance
  # of our wrapper, which itself spawns 20 PHP processes, so
  # that would be Bad (tm)
  FastCgiConfig -idle-timeout 20 -maxClassProcesses 1
  <Directory /var/local/fcgi>
    # Use the + so we don't clobber other options that
    # may be needed. You might want FollowSymLinks here
    Options +ExecCGI
  </Directory>
  AddType application/x-httpd-php5 .php
  AddHandler fastcgi-script .fcgi
  Action application/x-httpd-php5 /fcgi/php-cgi-wrapper.fcgi
</IfModule>

Now set up /var/local/fcgi/php-cgi-wrapper.fcgi:

mkdir -p /var/local/fcgi/
vi /var/local/fcgi/php-cgi-wrapper.fcgi

Paste in:

#!/bin/sh

# We like to use the same settings we formerly used for apache mod_php.
# You don't want this if your php.ini is in /etc
PHPRC="/etc/php5/apache2"
export PHPRC
# We can accommodate about 20 50mb processes on a 1GB slice. More than that
# will swap, making people wait and locking us out of our own box.
# Better idea: just make people wait to begin with
PHP_FCGI_CHILDREN=20
PHP_FCGI_MAX_REQUESTS=100
export PHP_FCGI_CHILDREN
# Must be exported too, or php-cgi never sees the request limit
export PHP_FCGI_MAX_REQUESTS
exec /usr/bin/php-cgi

Note the figure 20: this can be adjusted for the box in question if it has more or less memory to offer. Also note that we use the same php.ini we used with mod_php (/etc/php5/apache2). You can change that if you think it appropriate, but don't forget to bring your settings over.

Limiting PHP_FCGI_MAX_REQUESTS to 100 ensures that each PHP process is killed after serving a relatively small number of pages. This addresses the fact that memory used by memory-hungry individual PHP requests is not fully returned to the operating system until the process terminates. If we shut down after every page things would be needlessly slow. Exiting after 100 requests is a good compromise.

Set permissions:

chmod -R 755 /var/local/fcgi
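A wrapper that is missing, not executable, or lacking its #! line is an easy mistake to make at this point. Here is a small sanity check you could run, assuming the wrapper path used above; the function name check_wrapper is made up for this sketch.

```shell
# check_wrapper PATH
# Reports the first problem found with a FastCGI wrapper script:
# missing file, missing execute permission, or no "#!" first line.
check_wrapper() {
    if [ ! -f "$1" ]; then
        echo "missing: $1"; return 1
    fi
    if [ ! -x "$1" ]; then
        echo "not executable: $1"; return 1
    fi
    if ! head -n 1 "$1" | grep -q '^#!'; then
        echo "no #! line: $1"; return 1
    fi
    echo "ok: $1"
}

# "|| true" keeps this demonstration from aborting a script on failure.
check_wrapper /var/local/fcgi/php-cgi-wrapper.fcgi || true
```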

IMPORTANT: comment out any php_value lines in .htaccess files for the sites on this box, and move those directives to php.ini. Otherwise you will get errors from those sites and they won't work.

If two sites on the same slice really must have separate php.ini settings, consider separate slices for the sites. It's possible to set up more than one fastcgi pool, but that halves the resources available and so on.

# Commented out; moved to /etc/php5/apache2/php.ini
# php_value arg_separator.output &

Disable mod_php5, enable mod_fastcgi in its place, enable mod_actions which is required by the above, and restart Apache:

a2dismod php5; a2enmod fastcgi; a2enmod actions; apache2ctl restart

Now test the site. Also check /var/log/apache2/error.log and/or the error logs for individual sites to make sure you see mentions of fastcgi and not mod_php.

If you get an error and you see this in the log:

[Thu Mar 04 02:51:33 2010] [alert] [client 69.141.215.7] (2)No such file or directory: FastCGI: failed to connect to (dynamic) server "/var/local/fcgi/php-cgi-wrapper.fcgi": something is seriously wrong, any chance the socket/named_pipe directory was removed?, see the FastCgiIpcDir directive

There's a chance it's just a transitory problem; first-time fastcgi startups seem to be a bit temperamental, but we've had no problems in the long haul. Restart apache again:

apache2ctl restart

If it still doesn't work reenable mod_php5:

a2dismod fastcgi; a2enmod php5; apache2ctl restart

And do some more reading.

Late-breaking news: PHP 5.3.3 and the new SAPI FPM

In addition to the built-in support for managing FastCGI child processes in php-cgi, PHP 5.3.3 introduces a new PHP binary, php-fpm, that offers a superior implementation of FastCGI support. You don't have to switch if you are happy with the FastCGI support in php-cgi, which is fortunate because your operating system might not offer PHP 5.3.3 or its new php-fpm binary (aka the SAPI FPM) yet.

Setting up the new SAPI is not that different from setting up FastCGI with php-cgi, provided you can get past the probable need to compile it yourself (of course, by the time you read this, there may be a php-fpm package in your favorite Debian repository).

Interested parties should check out the new FastCGI Process Manager article on www.php.net.

Installing the Worker Thread MPM

Once you reach this point successfully, you are ready to switch Apache from "prefork" mode (which mod_php requires) to "worker thread" mode. Worker thread mode is blindingly, ridunculously fast at serving static stuff, and uses mod_fastcgi to take care of PHP. You can't do this with mod_php enabled, so you must succeed in setting up fastcgi first (see above):

apt-get install apache2-mpm-worker

That will restart Apache by itself. But if not you can do so manually.

Now test the site again. If it doesn't work, you can easily undo this:

apt-get install apache2-mpm-prefork

To verify success:

ps auxw | grep apache

Should show fewer processes (unless your operating system shows threads as processes). The processes you do have are running many threads (lightweight subprocesses that are exposed to each other's memory etc. and generally less expensive).

ps auxw | grep php

Should show roughly 20 processes, plus a process to manage the others. Significantly more than 20 (say, 40+) indicates you goofed and didn't set maxClassProcesses properly and you're going to have a lot of PHP processes on your hands in a hurry, so fix that. (I'm assuming you chose 20 processes as your limit for PHP above.)
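If you'd rather not count by eye, the check can be scripted. The sketch below reads "ps auxw" output on stdin and complains once the count reaches double your limit, mirroring the "40+" rule of thumb. The function name check_php_pool and the doubling threshold are my own assumptions, and it assumes the PHP processes show up as php-cgi.

```shell
# check_php_pool LIMIT
# Counts lines on stdin that mention php-cgi. The [p] in the pattern
# keeps grep's own command line from matching if it appears in ps output.
check_php_pool() {
    count=$(grep -c '[p]hp-cgi' || true)
    if [ "$count" -ge $(( $1 * 2 )) ]; then
        echo "suspicious: $count php-cgi processes for a limit of $1"
        return 1
    fi
    echo "looks fine: $count php-cgi processes for a limit of $1"
}

# On a live server you would run:
#   ps auxw | check_php_pool 20

# Self-contained demonstration with made-up ps lines:
printf 'www-data 101 php-cgi\nwww-data 102 php-cgi\nroot 7 sshd\n' | check_php_pool 20
```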

Now beat up on the site a bit. Remember to check any other sites on the same box: you just changed how all of your PHP sites on this box are run.

Now copy apc.php to the web folder of one of your sites and visit that URL to see a page giving stats about how well the cache is working. You should see a healthy percentage of cache hits if you have revisited some of the same pages or at least reused Symfony and Doctrine. If not check whether you ever enabled APC for this site. You can find apc.php in /usr/share/php/apc.php on some systems. Its absence doesn't mean APC is not available.

You need to set up both mpm-worker and fastcgi to get the most benefit here. The worker MPM can serve zillions of static files in a hurry with minimal memory usage each. However, after setting up fastcgi, you can certainly crank up MaxClients and stick with the prefork MPM if you prefer.

Alternatives

APC alternatives: there are alternatives to the APC cache. These include eAccelerator and Zend Optimizer. I'm not an expert on the others, but APC will be standard in PHP 6.0, so I recommend standardizing on it now. For extreme cases (i.e. "we're getting as big as Facebook and we have too much code to start thinking about improving the code itself in any serious way"), by all means consider HipHop. You can think of it as a distant relative of APC that provides a roughly 2x performance boost at a large cost in complexity and maintenance.

FastCGI alternatives: some will disagree with my recommendation of FastCGI. mod_fcgid supposedly has better child process management... with the glaring exception that APC is not shared, ballooning your memory requirements in a seriously ugly way. Better alternatives include running the nginx (pronounced "engine X") web server as a lightweight "front end" that serves static files directly and proxies other requests through to an Apache server running mod_php. One can also run Apache (without the fat mod_php module) as a front end for another instance of Apache. In these setups, the back-end Apache doesn't accept connections except from the front-end one, and Apache's reverse proxy features come into play. Personally I like the simplicity of Apache communicating directly with a master PHP process via the FastCGI protocol.

Conclusion

ERRRRRRRRRRRROOOOOMMMMMMM

Beep beep

ERRRRRRRRRRRRRRRRRROOMMMMMMMM

[HONK]

Zoom!

(*) Actually, one of Facebook's stated goals for HipHop is to avoid rewriting and presumably hand-optimizing their existing PHP code. They have billyuns and billyuns of lines of PHP code, not all of it written by hardcore software jocks. So if a handful of smart folks can make mediocre PHP run 50% faster, that's a big deal for Facebook. For most of us it makes more sense to pick the low-hanging fruit first (see all of the above), and then stop making separate SQL queries for every single darn object on the page, and then maybe think about HipHop. I'm not knocking HipHop here at all, it's an impressive technical accomplishment. But if you're starting with HipHop as your first optimization, you're not paying attention.

(**) Yes, you could use CSS sprites to speed these up, and given time you probably should, but in the short term let's talk about ways to deliver them quickly without getting bogged down in hard-to-maintain optimizations. With HTTP keepalive working for us instead of against us, the performance difference between delivering 20 PNGs and one PNG is mostly on the browser side— still good to avoid, but not a brutal impact on your web server. One step at a time, okay?

(***) Yes, FastCGI is capable of running lots of things other than PHP, but this is a PHP article. So: neener.

(****) There's nothing wrong with the FastCGI license. It's a very boring MIT-style license that shouldn't interfere with your commercial use.
