By stumbling on this so thread i decided to write similar test in PHP.
My test code is this:
// Slow version
$t1 = microtime(true);
for ($n = 0, $i = 0; $i < 20000000; $i++) {
$n += 2 * ($i * $i);
}
$t2 = microtime(true);
echo "n={$n}\n";
// Optimized version
$t3 = microtime(true);
for ($n = 0, $i = 0; $i < 20000000; $i++) {
$n += $i * $i;
}
$n *= 2;
$t4 = microtime(true);
echo "n={$n}\n";
$speedup = round(100 * (($t2 - $t1) - ($t4 - $t3)) / ($t2 - $t1), 0);
echo "speedup: {$speedup}%\n";
Results
- in PHP
2 * ($i * $i)
version runs quite similar like2 * $i * $i
,
so PHP interpreter isn't optimizing bytecode as JVM in Java - Even when I optimized code manually - I've got ~
8%
speedup, when
Java's version gets ~16%
speedup. So PHP version gets about 1/2 speedup factor of that in Java's code.
Rationale for optimization
I will not go into many details, but ratio of multiplications in optimized and un-optimized code is ->
1 summation : 3/4
2 summations: 4/6
3 summations: 5/8
4 summations: 6/10
...
And in general:
where n is number of summations in a loop. To be formula useful to us - we need to calculate limit of it when N approaches infinity (to replicate situation that we do A LOT of summations in a loop). So :
So we get conclusion that in optimized code there must be 50% less multiplications.
Questions
- Why PHP interpreter isn't applying code optimization ?
- Why PHP speedup factor is just half of that in Java ?
Answer
It's time to analyze PHP opcodes which are generated by PHP interpreter. For that you need to install VLD extension and use it from command line to generate opcodes of php script at hand.
Opcode analysis
- Seems that
$i++
is not the same as++$i
in terms of opcodes and memory usage. Statement $i++; generates opcodes:
POST_INC ~4 !1
FREE ~4
Increases counter by 1 and saves previous value into memory slot #4. Then, because this value is never used - frees it from memory. Question - why do we need to store value if it is never used ?
- Seems that indeed there is a loop penalty, so we can gain additional performance by performing loop unrolling.
Optimized test code
Changing POST_INC into ASSIGN_ADD (which don't saves additional info in memory) and performing loop unrolling, gives use such test code :
while (true) {
// Slow version
$t1 = microtime(true);
for ($n = 0, $i = 0; $i < 2000; $i+=10) {
// loop unrolling
$n += 2 * (($i+0) * ($i+0));
$n += 2 * (($i+1) * ($i+1));
$n += 2 * (($i+2) * ($i+2));
$n += 2 * (($i+3) * ($i+3));
$n += 2 * (($i+4) * ($i+4));
$n += 2 * (($i+5) * ($i+5));
$n += 2 * (($i+6) * ($i+6));
$n += 2 * (($i+7) * ($i+7));
$n += 2 * (($i+8) * ($i+8));
$n += 2 * (($i+9) * ($i+9));
}
$t2 = microtime(true);
echo "{$n}\n";
// Optimized version
$t3 = microtime(true);
for ($n = 0, $i = 0; $i < 2000; $i+=10) {
// loop unrolling
$n += ($i+0) * ($i+0);
$n += ($i+1) * ($i+1);
$n += ($i+2) * ($i+2);
$n += ($i+3) * ($i+3);
$n += ($i+4) * ($i+4);
$n += ($i+5) * ($i+5);
$n += ($i+6) * ($i+6);
$n += ($i+7) * ($i+7);
$n += ($i+8) * ($i+8);
$n += ($i+9) * ($i+9);
}
$n *= 2;
$t4 = microtime(true);
echo "{$n}\n";
$speedup = round(100 * (($t2 - $t1) - ($t4 - $t3)) / ($t2 - $t1), 0);
$table[$speedup]++;
echo "****************\n";
foreach ($table as $s => $c) {
if ($s >= 0 && $s <= 20)
echo "$s,$c\n";
}
}
Results
Script aggregates number of times CPU hit into one or other speedup value.
When CPU hits vs Speedup is drawn as a graph, we get such picture:
So it is most likely that script will get 10% speedup. This means that our optimizations resulted in +2% speedup (compared to original scripts 8%).
Expectations
I'm pretty sure that all these things i've done - could be done automatically by a PHP JIT'er. I don't think that it's hard to change automatically a pair of POST_INC/FREE opcodes into one PRE_INC opcode when generating binary executable. Also it's not a miracle that PHP JIT'er could apply loop unrolling. And this is just a start of optimizations !
Hopefully there will be a JIT'er in PHP 8.0
No comments:
Post a Comment