The SPL Deserves Some Reiteration
Submitted by Matthew Turland on Thu, 02/26/2009 - 12:55While I was in college for computer science, where the preferred academic language was C++, I came into contact with the STL (Standard Template Library). You can read more about that in this interview with Bjarne Stroustrup, but its intended purpose was pretty simple: to provide standard optimized implementations for a number of commonly used conceptual structures of the time.
If any PHP extension is underrated, it's probably the SPL (Standard PHP Library). From what I can tell without having been involved in its development, its purpose is somewhat similar to the STL. A while back, it was useful mainly for allowing class instances to be iterable and simulate array access. The old API documentation, while certainly better than none at all, is not very easy to sift through. The revitalized manual documentation is a fairly large improvement over its predecessor, but certainly isn't a replacement for a real reference guide. Despite this, using the SPL classes actually turned out to be pretty straightforward once you got your hands on a good starting guide or two to help you beat the learning curve.
Most SPL guides focused on iterators because that comprised most of what it offered at the time. However, some new neat and shiny things that are coming with PHP 5.3 that add more of the STL feature set into PHP. Now you might be saying, "Well, arrays and objects give me everything I need in the way of data structures. Why do we need these?" The thing that makes them different is the implementation of the underlying algorithm. Data structures are inherently language-independent and exist as a set of logical concepts based in mathematics. These containers use different algorithms as appropriate to maximize efficiency.
For example, if you don't need the hash map capabilities of an associative array -- that is, if you aren't using the array key for a specific purpose and only need an enumerated array -- SplFixedArray (formerly SplFastArray, currently undocumented) may be a suitable replacement. The only caveat is that the size of the array is fixed, meaning that you must specify the size when you instantiate the class and an error will occur if you attempt to store more than that number of elements. This is the reason that, on average, it performs better than standard PHP arrays. While you can change the size after the instance is created, doing so tends to defeat the purpose of using the class. SplFixedArray implements the Countable, ArrayAccess, and Iterator interfaces and also provides get/setSize and from/toArray methods.
SplDoublyLinkedList is the SPL implementation of the doubly-linked list data structure. It allows for bidirectional iteration of the list and implements the same interfaces as SplFixedArray. It also serves as the base class for the SplStack and SplQueue classes, which likewise implement the stack and queue data structures.
If you've ever placed papers into a literal stack, you know that first paper on top of the stack is the last paper that was placed there, which makes it what's referred to as a LIFO container. Because of this, the iteration order of SplStack is fixed and only the iterator behavior (deleting or traversing nodes) can be modified using setIteratorMode(). Beyond that, its API is identical to its parent class, which already implements the push() and pop() methods associated with the stack data structure.
Likewise, if you've stood in line waiting to be served at a place of business before, then you're familiar with a queue, which is a FIFO container. As with SplStack, the iteration order is assumed for SplQueue and only the iterator behavior can be modified. However, SplQueue also includes enqueue() and dequeue() methods in addition to those of the parent class.
Moving on, SplHeap is an abstract class that implements common logic for a standard binary heap data structure, leaving an abstract compare() method for implementing the logic for comparing nodes in order to properly arrange them within the heap. SplMinHeap and SplMaxHeap extend SplHeap in order to implement the compare() method using equivalents of the min and max functions respectively. SplPriorityQueue doesn't extend SplHeap but does use a heap structure "under the hood" to implement a priority queue data structure, where elements are inserted into the queue with a priority value that is used when performing element comparisons.
Lastly, SplObjectStorage (also currently undocumented) provides a mechanism to use objects as keys in a dictionary without resorting to hashing them using spl_object_hash.
So, I thought I'd take a look at how the performance of these new data structures compares to approximations of their equivalent operations using normal arrays. I'm running these on a Sony Vaio VGN-NR298E with an Intel Core2Duo 1.83GHz CPU and 2 GB of RAM running Ubuntu 8.10 and a custom build of PHP 5.3.0b1. I'm using the bash script shown below to perform benchmarks with 20 executions each. Average execution times are shown in seconds in the table below. Thanks to Mikko Koppanen for pointing out php-cgi and it's -T switch. As always, there are lies, damned lies, and benchmarks and YMMV.
#!/bin/bash # Called like so where 20 is the number of executions to perform: # ./bench.sh test.php 20 time=`./php-cgi -q -T $2 $1 2>&1 | tail -n 1 | cut -d " " -f 3`; avg=`echo "scale=6; $time / $2" | bc`; echo "$1 $avg";
Standard Code |
Standard Benchmark |
SPL Code |
SPL Benchmark |
Standard:SPL Ratio |
|---|---|---|---|---|
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
$a[] = $i;
}
|
.004769 |
<?php
$a = new SplFixedArray(5000);
for($i = 0; $i < 5000; $i++) {
$a[] = $i;
}
|
.003347 |
1.424858082 |
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
$a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
array_pop($a);
}
|
1.246712 |
<?php
$a = new SplStack;
for($i = 0; $i < 5000; $i++) {
$a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
$a->pop();
}
|
.090300 |
13.806334441 |
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
$a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
array_shift($a);
}
|
1.606241 |
<?php
$a = new SplQueue;
for($i = 0; $i < 5000; $i++) {
$a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
$a->dequeue();
}
|
.092570 |
17.351636599 |
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
$a[] = rand(1, 5000);
sort($a);
}
for($i = 0; $i < 5000; $i++) {
array_shift($a);
}
|
7.181552 |
<?php
$a = new SplMinHeap;
for($i = 0; $i < 5000; $i++) {
$a->insert(rand(1, 5000));
}
for($i = 0; $i < 5000; $i++) {
$a->extract();
}
|
.269629 |
26.63493912 |
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
$a[] = rand(1, 5000);
rsort($a);
}
for($i = 0; $i < 5000; $i++) {
array_shift($a);
}
|
7.512161 |
<?php
$a = new SplMaxHeap;
for($i = 0; $i < 5000; $i++) {
$a->insert(rand(1, 5000));
}
for($i = 0; $i < 5000; $i++) {
$a->extract();
}
|
.266680 |
28.16919529 |
<?php
function priority_sort($a,$b) {
return $a[1]-$b[1];
}
$a = array();
for($i = 0; $i < 5000; $i++) {
$a[] = array($i, rand(1,10));
usort($a, 'priority_sort');
if ($i > 500) {
array_shift($a);
}
}
|
60.610793 |
<?php
$a = new SplPriorityQueue;
for($i = 0; $i < 5000; $i++) {
$a->insert($i++, rand(1,10));
if ($i > 500) {
$a->extract();
}
}
|
.140159 |
432.443103903 |
I actually wasn't able to get the standard code for the priority queue to complete 50 iterations within an hour, so I eventually gave up and used a single run as the representative average. (Yes, I'm cheating, I know. Need I point out what I said earlier about benchmarks?) Not only is SplPriorityQueue able to complete the operation, but it's able to do so fairly quickly.
The SPL classes seem to beat arrays hands down for their respective operations. This doesn't mean that you should out and replace every use of an array in your source code to a random SPL data structure when you're writing or updating it for 5.3. Rather, it's just an indication that you should be conscious of this: the wonderful flexibility that arrays offer us does not come without cost. The SPL data structures use specialized algorithms for specific purposes. As such, use the best tool for the job.
Update: I changed the description of SplObjectStorage to clarify its purpose a bit. Thanks to Elizabeth Smith for helping me to reword it.


SplObjectStorage
13x slowdown for stacks?
Re: 13x slowdown for stacks?
I agree that the gap in performance does seem fairly extreme. I'd encourage you to download PHP 5.3.0b1 and run the code samples I've provided on your own hardware to see how they measure up. My laptop is a fairly modest, if even a bit naive, test case.
for($i = 0; $i < 5000; $i++)
why do you use sorn in
I'm using sorting functions
I'm using sorting functions within loops in the Min/MaxHeap and PriorityQueue examples to accurately simulate the same algorithms that those data structures are performing.
These algorithms are suited to specific situations where the items intended to be contained by those data structures may not all be available simultaneously. That is, items may become available over time and need to leave the data structure within a given time period once they're put in, such as in a long-running or daemon-style script.
As such, the contained items much be reevaluated as each item is added in order to ensure that the next item to come out of the data structure is the correct one with respect to the algorithm in use.
So, while it is possible for the sorting to take place outside the loop, but whether or not it's feasible depends on the situation.
Cant get [] to work
Correct
I tried running my example on 5.3rc2 and got this:
PHP Fatal error: Uncaught exception 'RuntimeException' with message 'Index invalid or out of range'
I'm guessing something changed between b1 and rc1 that causes SplFixedArray to interpret [] as "increase the size of this array by 1 and set the new element slot to this value," which obviously doesn't work because the array size is fixed in this case.
My example can be modified such that $a[] becomes $a[$i] and that appears to work without issue.
Yeah, I ended up using [$i]
SPLFixedArray not so fast