Static analysis for PHP

Lately I’ve been interested in applying static analysis to PHP projects. Static analysis is the process of analysing software code (in our case PHP source code) without actually executing it. In its simplest form, the php -l sourcefile command performs static analysis of a PHP file by checking the source for syntax errors. Other analysis methods include pattern-based static analysis, data flow static analysis, and code metrics calculation. Examples of the last method are the PMD (Project Mess Detection) and cyclomatic complexity metrics in PHPUnit.

The biggest uses for static analysis in PHP projects are security, stability and performance testing. For one, it can be used to detect unsafe practices in source code. Let’s imagine you have a $username variable, coming from $_GET['username']. Good practice says this input (like all user input) should be considered tainted and needs to be filtered. If you define patterns that look for actions on this tainted value, you can determine whether a variable could lead to a SQL injection attack or is safe enough to be used.
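To make the idea of a tainted value a bit more concrete, here is a minimal sketch of a pattern-based check built on PHP’s tokenizer. The function name findTaintedAssignments() and the list of superglobals are my own assumptions for illustration, and the check only recognises the most direct pattern (a variable assigned straight from a request superglobal). A real taint analysis would also have to follow the value through the rest of the code.

```php
<?php
// Hypothetical helper: report variables assigned directly from a request
// superglobal, i.e. the simplest possible "taint source" pattern.
function findTaintedAssignments(string $source): array
{
    // Assumption: these superglobals are treated as tainted input.
    $sources = ['$_GET', '$_POST', '$_COOKIE', '$_REQUEST'];

    // Drop whitespace and comments so the pattern matches adjacent tokens.
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) use ($ignored) {
            return !is_array($t) || !in_array($t[0], $ignored, true);
        }
    ));

    $found = [];
    $count = count($tokens);
    for ($i = 0; $i + 2 < $count; $i++) {
        $lhs = $tokens[$i];
        $op  = $tokens[$i + 1];
        $rhs = $tokens[$i + 2];

        // Pattern: <variable> = <superglobal>
        if (is_array($lhs) && $lhs[0] === T_VARIABLE
            && $op === '='
            && is_array($rhs) && $rhs[0] === T_VARIABLE
            && in_array($rhs[1], $sources, true)
        ) {
            $found[] = [
                'variable' => $lhs[1],
                'source'   => $rhs[1],
                'line'     => $lhs[2],
            ];
        }
    }

    return $found;
}
```

Calling it with file_get_contents() on a source file returns one entry per match, each with the variable name, the superglobal it came from, and the line number.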

Other uses include gathering various statistics about a PHP project: how much of my application calls a memcache server, how tightly coupled the components in a modular structure are (PHP_Depend could help out on that), which parts of my application are most prone to bugs (Sebastian Bergmann’s bug miner is suited here), and much more. Of course, much of the time a completely custom solution is needed, in which case PHP’s tokenizer functions can help.
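As a rough sketch of such a custom solution, the snippet below uses token_get_all() to count plain function calls in a piece of source code, which is one way to answer a question like how often memcache functions are called. The function name countFunctionCalls() is made up for this example, and the heuristic is deliberately simple: it ignores method calls, static calls and namespaced names.

```php
<?php
// Hypothetical helper: count calls to plain functions, keyed by
// lower-cased function name.
function countFunctionCalls(string $source): array
{
    // Remove whitespace so "name (" and "name(" look the same.
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) {
            return !is_array($t) || $t[0] !== T_WHITESPACE;
        }
    ));

    // A name preceded by one of these is a declaration, an instantiation
    // or a method/static call, not a plain function call.
    $notACall = [T_FUNCTION, T_NEW, T_OBJECT_OPERATOR, T_DOUBLE_COLON];

    $counts = [];
    foreach ($tokens as $i => $token) {
        $isName    = is_array($token) && $token[0] === T_STRING;
        $opensCall = isset($tokens[$i + 1]) && $tokens[$i + 1] === '(';
        $prev      = $i > 0 ? $tokens[$i - 1] : null;
        $excluded  = is_array($prev) && in_array($prev[0], $notACall, true);

        if ($isName && $opensCall && !$excluded) {
            $name = strtolower($token[1]);
            $counts[$name] = ($counts[$name] ?? 0) + 1;
        }
    }

    return $counts;
}
```

Run over every file in a project and summed up, the resulting array gives a crude picture of which functions the application leans on most.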

Unfortunately, one of the biggest problems with static analysis of PHP code is that PHP is a very dynamic and implicit language, from a language semantics point of view. The C language, for example, has #include, which resolves its argument at compile time. PHP’s equivalent (include()), on the other hand, takes any valid expression as an argument, so the argument is only resolved at runtime, which makes it difficult to analyse statically.

How can you make your code more statically analysable? Use as many expressions as possible that can be evaluated at analysis time. Try to use constant expressions as arguments for include() and require(). Don’t use things like magic methods or eval (actually, never use eval()!).
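A quick way to see how your own code fares on this advice is to scan it for the constructs mentioned above. The sketch below is only a heuristic under my own assumptions: findDynamicCode() is a made-up name, and an include or require is reported as dynamic as soon as its argument contains any variable. It flags eval and variable-based includes, but does not look at magic methods.

```php
<?php
// Hypothetical helper: flag eval() and include/require statements whose
// argument contains a variable and so cannot be resolved statically.
function findDynamicCode(string $source): array
{
    $includes = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    $tokens   = token_get_all($source);
    $count    = count($tokens);
    $findings = [];

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (!is_array($token)) {
            continue;
        }

        if ($token[0] === T_EVAL) {
            $findings[] = ['line' => $token[2], 'issue' => 'eval'];
        } elseif (in_array($token[0], $includes, true)) {
            // Walk the argument expression up to the closing semicolon.
            for ($j = $i + 1; $j < $count && $tokens[$j] !== ';'; $j++) {
                if (is_array($tokens[$j]) && $tokens[$j][0] === T_VARIABLE) {
                    $findings[] = ['line' => $token[2], 'issue' => 'dynamic include'];
                    break;
                }
            }
        }
    }

    return $findings;
}
```

An include built only from string literals and constants such as __DIR__ passes this check, while something like include $_GET['page'] . '.php' gets reported together with its line number.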

After this introduction to the subject you might wonder what can actually be used to implement this. One project that has been dealing almost exclusively with static analysis for PHP is Pixy. It scans PHP code and currently aims to detect things like XSS or SQL injection vulnerabilities. Some basic support for include files is also available, so in theory you could run a data flow analysis through your whole application. Unfortunately, right now Pixy only operates on PHP 4 code, which is of course pretty problematic, given that we are about ready to get our hands on PHP 5.3. This aside, the fun thing is that Pixy generates nice dot graphs, such as the call graph for a simple PHP file.

This is generated by the following code:

<?php
class foo {
        function bar($baz) {
                echo $baz;
        }
}

$x = $_GET['x'];
$foo = new foo();
$foo->bar($x);

Other useful information gets printed too, for example when there’s a security vulnerability:

Vulnerability detected!
- unconditional
- /home/felix/staticAnalysis.php:4
- Graph: xss1

If you’re interested in analysis like this, have a look at the Taint support patch from Wietse Venema, which in a way has the same concerns as Pixy, but tackles them in the PHP engine itself. It isn’t really a complete implementation of taint support in PHP, but it is a good start. At the moment it outputs warnings to tell you a tainted variable isn’t properly filtered.

Of course, static analysis is just one step that can be taken to make your code safer. It is by no means a definitive solution for securing your PHP application, and there are many more measures around that further test PHP projects. Take for example PHPUnit, SimpleTest, PHPT or Selenium. Combine this with a continuous integration tool like phpUnderControl and you might sleep a bit better at night, knowing there are some ways to ensure things won’t go wrong :-)

2 responses to “Static analysis for PHP”

  1. Hello Felix.
    Can you tell me what the state of the art in PHP static analysis is, to the best of your knowledge?

    Searching with google gives old articles…