php-parser. As the name suggests, this PHP library is a PHP parser. It lets you define different types of visitors over the AST (Abstract Syntax Tree). Given some PHP code, I define a visitor that looks for tree nodes of the type “Function Call”. The specific target node types are FuncCall, StaticCall and MethodCall. The AnalyzeBuiltinFunctionUsage class takes a directory path as input and recursively iterates over all files with the .php extension. It then parses every PHP file and traverses its syntax tree using the visitor we define below.
This visitor runs enterNode() on every node in the parse tree. On lines 9 and 10, I check whether the current node is a function call or a method call. Given the list of builtin functions, we can then count the number of calls to each of them. Note that this is a static analysis, which means we have to ignore dynamic function calls (calls whose target name is only known at runtime).
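The counting step can be approximated without any dependency. The sketch below is not the php-parser visitor described above; it swaps in PHP's built-in tokenizer (`token_get_all`) as a stand-in, and the function names `findPhpFiles`, `countBuiltinCalls` and `analyzeDirectory` are hypothetical. It assumes that a name token directly followed by `(` and not preceded by `->`, `::`, `new` or `function` is a direct function call, and it matches names textually against `get_defined_functions()['internal']`.

```php
<?php
// Sketch only: a tokenizer-based stand-in, NOT the php-parser visitor from the post.

/** Recursively collect every file with a .php extension under $directory. */
function findPhpFiles(string $directory): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

/** Count direct calls to builtin functions in one PHP source string. */
function countBuiltinCalls(string $code): array
{
    // Builtin names as reported by the running interpreter (lowercase).
    $builtins = array_flip(get_defined_functions()['internal']);

    // Drop whitespace and comments so neighbouring tokens are easy to inspect.
    $tokens = array_values(array_filter(
        token_get_all($code),
        fn($t) => !is_array($t)
            || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)
    ));

    // PHP 8 emits a single token for fully qualified names such as \strlen.
    $nameTokens = [T_STRING];
    if (defined('T_NAME_FULLY_QUALIFIED')) {
        $nameTokens[] = T_NAME_FULLY_QUALIFIED;
    }
    // A name after one of these is a method call, instantiation or declaration.
    $skipAfter = [T_FUNCTION, T_NEW, T_OBJECT_OPERATOR, T_DOUBLE_COLON];
    if (defined('T_NULLSAFE_OBJECT_OPERATOR')) {
        $skipAfter[] = T_NULLSAFE_OBJECT_OPERATOR;
    }

    $counts = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || !in_array($token[0], $nameTokens, true)) {
            continue;
        }
        if (($tokens[$i + 1] ?? null) !== '(') {
            continue; // not a call; dynamic calls like $f() never get here
        }
        $previous = $tokens[$i - 1] ?? null;
        if (is_array($previous) && in_array($previous[0], $skipAfter, true)) {
            continue;
        }
        $name = strtolower(ltrim($token[1], '\\'));
        if (isset($builtins[$name])) {
            $counts[$name] = ($counts[$name] ?? 0) + 1;
        }
    }
    return $counts;
}

/** Sum the per-file counts over a whole directory tree, most used first. */
function analyzeDirectory(string $directory): array
{
    $totals = [];
    foreach (findPhpFiles($directory) as $path) {
        $code = (string) file_get_contents($path);
        foreach (countBuiltinCalls($code) as $name => $calls) {
            $totals[$name] = ($totals[$name] ?? 0) + $calls;
        }
    }
    arsort($totals);
    return $totals;
}
```

Calling `analyzeDirectory()` on an application's root directory returns a map from function name to call count. The shortcut has limits that a real AST visitor avoids: it does not resolve namespaces, so a user-defined function that shares a name with a builtin is miscounted, and language constructs such as `eval` or `isset` have their own token types and are not counted at all.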
Now that we have the code to extract the number of calls to builtin functions, we can run it on popular PHP applications. For now we focus on WordPress, Magento, MediaWiki and phpMyAdmin, targeting different versions of each application (similar versions to those on https://debloating.com).
In the chart above, the blue bar indicates the total number of distinct builtin functions used across the various versions of each target web application. By emulating fewer than 900 builtin functions (out of ~11,000, including the ones from extensions), we can perform a sound static analysis on these popular PHP applications.
The orange bar shows the number of builtin functions used after function debloating (i.e., removing unused functions from the applications). The yellow bar indicates builtin functions with more than 50 call sites. These are the most common and important builtin functions within these applications, and they are a good starting point when deciding which builtin functions to focus on first.
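The two numbers behind the blue and yellow bars can be derived from per-version call counts roughly as follows. This is a minimal sketch with made-up data; `summarizeUsage` is a hypothetical helper, and it assumes the 50-call-site threshold is applied to the sum over all versions, which the post does not state explicitly.

```php
<?php
/**
 * Merge per-version call counts, then report how many distinct builtin
 * functions appear and which ones exceed the call-site threshold.
 */
function summarizeUsage(array $countsPerVersion, int $threshold = 50): array
{
    $totals = [];
    foreach ($countsPerVersion as $counts) {
        foreach ($counts as $name => $calls) {
            $totals[$name] = ($totals[$name] ?? 0) + $calls;
        }
    }
    // Keep only functions with strictly more call sites than the threshold.
    $frequent = array_filter($totals, fn(int $calls): bool => $calls > $threshold);
    arsort($frequent);

    return ['distinct' => count($totals), 'frequent' => $frequent];
}

// Made-up counts for two hypothetical versions of one application.
$summary = summarizeUsage([
    '1.0' => ['strlen' => 40, 'count' => 30, 'md5' => 2],
    '1.1' => ['strlen' => 45, 'count' => 15, 'trim' => 7],
]);
```

With this sample input, four distinct functions are seen and only `strlen` (85 call sites in total) passes the threshold.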
Among the builtin functions, some are more security sensitive than others. Take eval() as an example: it takes a string as input and executes it as PHP code. If an attacker-controlled value reaches a call to eval(), the attacker is essentially able to execute arbitrary code on the target web server.
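To make the risk concrete, here is a small, deliberately vulnerable sketch. The `calculate` helper is hypothetical and the payload is harmless, but the same path would run any PHP expression an attacker supplies.

```php
<?php
// Hypothetical vulnerable helper: evaluates a user-supplied arithmetic expression.
function calculate(string $expression)
{
    // Whatever is in $expression becomes part of the executed PHP code.
    return eval('return ' . $expression . ';');
}

// Intended use: plain arithmetic.
$sum = calculate('2 + 3');

// Abuse: the "expression" is an arbitrary function call chosen by the caller.
$payload = calculate('strrev("denwp")');
```

A taint analysis would flag this because the parameter flows into eval() without any validation.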
Sensitive builtin functions can be categorized as follows [Reference]:
Calls to these functions are perfectly natural in any web application. Still, the total number of such calls gives a rough estimate of how many potentially vulnerable points an application has. Taint analysis tools usually focus on a similar list of sensitive APIs and check whether user-controlled input can reach these sinks.
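Grouping call counts by sink category could look like the sketch below. The category map is illustrative only: it lists a few well-known functions for three of the categories and is an assumption, not the referenced categorization. `countByCategory` is a hypothetical helper.

```php
<?php
// Illustrative only: a few well-known sinks per category, NOT the referenced list.
const SINK_CATEGORIES = [
    'Command Execution' => ['exec', 'system', 'passthru', 'shell_exec', 'popen', 'proc_open'],
    'Callbacks'         => ['call_user_func', 'call_user_func_array', 'array_map', 'usort'],
    'Filesystem'        => ['fopen', 'file_get_contents', 'file_put_contents', 'unlink'],
];

/** Sum call counts (function name => calls) into per-category totals. */
function countByCategory(array $callCounts, array $categories = SINK_CATEGORIES): array
{
    $result = [];
    foreach ($categories as $category => $functions) {
        $result[$category] = 0;
        foreach ($functions as $name) {
            $result[$category] += $callCounts[$name] ?? 0;
        }
    }
    return $result;
}

// Made-up call counts; strlen is not a sink, so it does not contribute.
$perCategory = countByCategory(['exec' => 3, 'system' => 1, 'array_map' => 12, 'strlen' => 90]);
```

Functions that appear in no category are simply ignored, which mirrors how a sink list narrows the full set of builtin calls down to the security-relevant ones.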
The numbers below are the sum of all call sites across the different versions of each application. As a result, it is the ratios, rather than the raw numbers, that should be compared between applications.
Application | Command Execution | PHP Execution | Callbacks | Information Disclosure | Filesystem | Other |
---|---|---|---|---|---|---|
WordPress | 50 | 14 (72%▼) | 3156 | 2495 (20%▼) | 4678 | 4032 (14%▼) |
Magento | 171 | 0 (100%▼) | 776 | 100 (87%▼) | 1805 | 244 (86%▼) |
MediaWiki | 115 | 23 (80%▼) | 1802 | 639 (65%▼) | 2792 | 952 (66%▼) |
phpMyAdmin | 76 | 0 (100%▼) | 976 | 242 (75%▼) | 970 | 182 (81%▼) |
Overall, debloating proves successful in removing the majority of calls to security-sensitive builtin PHP functions. I have also uploaded the number of calls to each builtin function to the GitHub repo, under the results directory.
Finally, if you are planning to emulate PHP builtin functions for the purpose of static PHP code analysis, you know where to start.