The SPL Deserves Some Reiteration

While I was in college for computer science, where the preferred academic language was C++, I came into contact with the STL (Standard Template Library). You can read more about that in this interview with Bjarne Stroustrup, but its intended purpose was pretty simple: to provide standard optimized implementations for a number of commonly used conceptual structures of the time.

If any PHP extension is underrated, it's probably the SPL (Standard PHP Library). From what I can tell without having been involved in its development, its purpose is somewhat similar to the STL. A while back, it was useful mainly for allowing class instances to be iterable and simulate array access. The old API documentation, while certainly better than none at all, is not very easy to sift through. The revitalized manual documentation is a fairly large improvement over its predecessor, but certainly isn't a replacement for a real reference guide. Despite this, using the SPL classes actually turned out to be pretty straightforward once you got your hands on a good starting guide or two to help you beat the learning curve.

Most SPL guides focused on iterators because that comprised most of what it offered at the time. However, some new neat and shiny things that are coming with PHP 5.3 that add more of the STL feature set into PHP. Now you might be saying, "Well, arrays and objects give me everything I need in the way of data structures. Why do we need these?" The thing that makes them different is the implementation of the underlying algorithm. Data structures are inherently language-independent and exist as a set of logical concepts based in mathematics. These containers use different algorithms as appropriate to maximize efficiency.

For example, if you don't need the hash map capabilities of an associative array -- that is, if you aren't using the array key for a specific purpose and only need an enumerated array -- SplFixedArray (formerly SplFastArray, currently undocumented) may be a suitable replacement. The only caveat is that the size of the array is fixed, meaning that you must specify the size when you instantiate the class and an error will occur if you attempt to store more than that number of elements. This is the reason that, on average, it performs better than standard PHP arrays. While you can change the size after the instance is created, doing so tends to defeat the purpose of using the class. SplFixedArray implements the Countable, ArrayAccess, and Iterator interfaces and also provides get/setSize and from/toArray methods.

SplDoublyLinkedList is the SPL implementation of the doubly-linked list data structure. It allows for bidirectional iteration of the list and implements the same interfaces as SplFixedArray. It also serves as the base class for the SplStack and SplQueue classes, which likewise implement the stack and queue data structures.

If you've ever placed papers into a literal stack, you know that first paper on top of the stack is the last paper that was placed there, which makes it what's referred to as a LIFO container. Because of this, the iteration order of SplStack is fixed and only the iterator behavior (deleting or traversing nodes) can be modified using setIteratorMode(). Beyond that, its API is identical to its parent class, which already implements the push() and pop() methods associated with the stack data structure.

Likewise, if you've stood in line waiting to be served at a place of business before, then you're familiar with a queue, which is a FIFO container. As with SplStack, the iteration order is assumed for SplQueue and only the iterator behavior can be modified. However, SplQueue also includes enqueue() and dequeue() methods in addition to those of the parent class.

Moving on, SplHeap is an abstract class that implements common logic for a standard binary heap data structure, leaving an abstract compare() method for implementing the logic for comparing nodes in order to properly arrange them within the heap. SplMinHeap and SplMaxHeap extend SplHeap in order to implement the compare() method using equivalents of the min and max functions respectively. SplPriorityQueue doesn't extend SplHeap but does use a heap structure "under the hood" to implement a priority queue data structure, where elements are inserted into the queue with a priority value that is used when performing element comparisons.

Lastly, SplObjectStorage (also currently undocumented) provides a mechanism to use objects as keys in a dictionary without resorting to hashing them using spl_object_hash.

So, I thought I'd take a look at how the performance of these new data structures compares to approximations of their equivalent operations using normal arrays. I'm running these on a Sony Vaio VGN-NR298E with an Intel Core2Duo 1.83GHz CPU and 2 GB of RAM running Ubuntu 8.10 and a custom build of PHP 5.3.0b1. I'm using the bash script shown below to perform benchmarks with 20 executions each. Average execution times are shown in seconds in the table below. Thanks to Mikko Koppanen for pointing out php-cgi and it's -T switch. As always, there are lies, damned lies, and benchmarks and YMMV.

#!/bin/bash
# Called like so where 20 is the number of executions to perform:
# ./bench.sh test.php 20
time=`./php-cgi -q -T $2 $1 2>&1 | tail -n 1 | cut -d " " -f 3`;
avg=`echo "scale=6; $time / $2" | bc`;
echo "$1 $avg";
Standard Code
Standard
Benchmark
SPL Code
SPL
Benchmark
Standard:SPL
Ratio
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
    $a[] = $i; 
}
.004769
<?php
$a = new SplFixedArray(5000); 
for($i = 0; $i < 5000; $i++) {
    $a[] = $i; 
}
.003347
1.424858082
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
    $a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
    array_pop($a);
}
1.246712
<?php
$a = new SplStack; 
for($i = 0; $i < 5000; $i++) {
    $a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
    $a->pop();
}
.090300
13.806334441
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
    $a[] = $i; 
}
for($i = 0; $i < 5000; $i++) {
    array_shift($a);
}
1.606241
<?php
$a = new SplQueue; 
for($i = 0; $i < 5000; $i++) {
    $a[] = $i;
}
for($i = 0; $i < 5000; $i++) {
    $a->dequeue();
}
.092570
17.351636599
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
    $a[] = rand(1, 5000);
    sort($a);
}
for($i = 0; $i < 5000; $i++) {
    array_shift($a);
}
7.181552
<?php
$a = new SplMinHeap; 
for($i = 0; $i < 5000; $i++) {
    $a->insert(rand(1, 5000));
}
for($i = 0; $i < 5000; $i++) {
    $a->extract();
}
.269629
26.63493912
<?php
$a = array();
for($i = 0; $i < 5000; $i++) {
    $a[] = rand(1, 5000);
    rsort($a);
}
for($i = 0; $i < 5000; $i++) {
    array_shift($a);
}
7.512161
<?php
$a = new SplMaxHeap; 
for($i = 0; $i < 5000; $i++) {
    $a->insert(rand(1, 5000));
}
for($i = 0; $i < 5000; $i++) {
    $a->extract();
}
.266680
28.16919529
<?php
function priority_sort($a,$b) {
    return $a[1]-$b[1];
}
$a = array();
for($i = 0; $i < 5000; $i++) {
    $a[] = array($i, rand(1,10));
    usort($a, 'priority_sort');
    if ($i > 500) {
        array_shift($a);
    }   
}
60.610793
<?php
$a = new SplPriorityQueue;
for($i = 0; $i < 5000; $i++) {
    $a->insert($i++, rand(1,10));
    if ($i > 500) {
        $a->extract();
    }   
}
.140159
432.443103903

I actually wasn't able to get the standard code for the priority queue to complete 50 iterations within an hour, so I eventually gave up and used a single run as the representative average. (Yes, I'm cheating, I know. Need I point out what I said earlier about benchmarks?) Not only is SplPriorityQueue able to complete the operation, but it's able to do so fairly quickly.

The SPL classes seem to beat arrays hands down for their respective operations. This doesn't mean that you should out and replace every use of an array in your source code to a random SPL data structure when you're writing or updating it for 5.3. Rather, it's just an indication that you should be conscious of this: the wonderful flexibility that arrays offer us does not come without cost. The SPL data structures use specialized algorithms for specific purposes. As such, use the best tool for the job.

Update: I changed the description of SplObjectStorage to clarify its purpose a bit. Thanks to Elizabeth Smith for helping me to reword it.

SplObjectStorage

I think your wording is a bit confusing - SplObjectStorage lets you use objects as keys that can be mapped to data. Because the objects are the keys they will be unique in the container, but it doesn't work like a key => object mapping. (Think a Registry pattern) Thanks, Elizabeth

13x slowdown for stacks?

Wow, I'm dismayed that array_pop is so expensive. I knew array_shift reindexing was slow, but I'd been operating under the impression that PHP's standard arrays worked very well for stacks.

Re: 13x slowdown for stacks?

I agree that the gap in performance does seem fairly extreme. I'd encourage you to download PHP 5.3.0b1 and run the code samples I've provided on your own hardware to see how they measure up. My laptop is a fairly modest, if even a bit naive, test case.

for($i = 0; $i < 5000; $i++)

for($i = 0; $i < 5000; $i++) { ... sort($a); } It's so lame...

why do you use sorn in

why do you use sorn in for-loop? u can do it after the loop, so computation time will be reduced significantly

I'm using sorting functions

I'm using sorting functions within loops in the Min/MaxHeap and PriorityQueue examples to accurately simulate the same algorithms that those data structures are performing.

These algorithms are suited to specific situations where the items intended to be contained by those data structures may not all be available simultaneously. That is, items may become available over time and need to leave the data structure within a given time period once they're put in, such as in a long-running or daemon-style script.

As such, the contained items much be reevaluated as each item is added in order to ensure that the next item to come out of the data structure is the correct one with respect to the algorithm in use.

So, while it is possible for the sorting to take place outside the loop, but whether or not it's feasible depends on the situation.

Cant get [] to work

I couldn't get your example where you do $a[] = $i to work with the spl fixed array. I was using PHP 5.3rc1

Correct

I tried running my example on 5.3rc2 and got this:

PHP Fatal error:  Uncaught exception 'RuntimeException' with message 'Index invalid or out of range'

I'm guessing something changed between b1 and rc1 that causes SplFixedArray to interpret [] as "increase the size of this array by 1 and set the new element slot to this value," which obviously doesn't work because the array size is fixed in this case.

My example can be modified such that $a[] becomes $a[$i] and that appears to work without issue.

Yeah, I ended up using [$i]

Yeah, I ended up using [$i] for my own benchmarks on SplFixedArray http://www.idontplaydarts.com/2009/06/benchmarking-splfixedarray-access-... - Has this affected the performance? I noticed you got a 1.41x speed increase where as I only managed 1.39x using [$i].

SPLFixedArray not so fast

In your benchmarking example you tested only array assignments which may in fact be faster with SPLFixedArray. However, the story changes when you put such an array through foreach. Implementing Iterator, for each element in the array PHP executes several function calls: 0. rewind() at the beginning then 1. next() 2. valid() 3. current() for each element in the array, which in fact makes it almost equally fast in a foreach loop (in fact, it is marginally slower). So the gain may be "significant" only in assignments.