
PHP string direct access using str[index] vs splitting into an array

Tags:

arrays

string

php

I'm iterating through each character in a string in PHP. Currently I'm using direct access:

 $len = strlen($str);
 // Start at offset 0 and stop before $len so every byte is visited exactly once
 for ($i = 0; $i < $len; $i++) {
    $char = $str[$i];
    ....
 }

That got me pondering what is probably a purely academic question. How does direct access work under the hood, and is there a string length beyond which a character loop would be faster (micro though the gain may be) if the string were split into an array and the array's internal pointer were used to track the position?

TL;DR: Would accessing each member of a 5-million-item array be faster than accessing each character of a 5-million-character string directly?

user2782001 asked Dec 11 '25 23:12



1 Answer

Accessing a string's bytes by offset is orders of magnitude faster than extracting characters with mb_substr. Why? A PHP string is stored as a contiguous run of bytes, so $str[$i] likely just goes straight to the byte at that offset, reads one byte, and is done. Note that unless every character is single-byte, offset access will not give you a usable character: for a multi-byte character you get only one of its bytes.
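To make the byte-versus-character caveat concrete, here is a minimal sketch. The sample string and variable names are assumptions chosen for illustration, and the empty-pattern preg_match() call is just one way to check UTF-8 validity.

```php
<?php
// Assumed sample string: "é" (U+00E9) occupies two bytes in UTF-8.
$str = "h\u{00E9}llo";

$byte  = $str[1];            // first byte of "é" only
$whole = substr($str, 1, 2); // both bytes of "é"

// With the /u modifier, preg_match() returns false for invalid UTF-8,
// so an empty pattern doubles as a validity check.
$byteIsValid  = preg_match('//u', $byte) === 1;
$wholeIsValid = preg_match('//u', $whole) === 1;

echo strlen($str), " bytes\n";
echo bin2hex($byte), ' valid: ', var_export($byteIsValid, true), "\n";
echo bin2hex($whole), ' valid: ', var_export($wholeIsValid, true), "\n";
```

The single byte at offset 1 is not valid UTF-8 on its own, which is why offset access only yields usable characters for single-byte text.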

When accessing a potentially multi-byte string via mb_substr, a number of additional steps are needed. The function has to work out how many bytes each character occupies, locate the requested character (which likely means scanning from the start of the string, because a character position cannot be computed from a byte offset), then read each of its bytes and return the individual, possibly multi-byte, character.
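The extra steps can be sketched in plain PHP. This is not how the mbstring extension is implemented; utf8SequenceLength() and utf8Chars() are hypothetical helpers that only illustrate the kind of work involved, assuming the input is valid UTF-8.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 sequence that starts with $lead.
function utf8SequenceLength(string $lead): int
{
    $b = ord($lead);
    if ($b < 0x80) { return 1; }            // 0xxxxxxx: ASCII
    if (($b & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($b & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($b & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 1; // continuation or invalid byte: treated as a single byte here
}

// Hypothetical helper: walk the string once, yielding whole characters.
function utf8Chars(string $str): Generator
{
    $len = strlen($str);
    for ($i = 0; $i < $len; $i += $n) {
        $n = utf8SequenceLength($str[$i]);
        yield substr($str, $i, $n);
    }
}

// Sample input (an assumption): "a" is 1 byte, "é" is 2 bytes, "€" is 3 bytes.
$chars = iterator_to_array(utf8Chars("a\u{00E9}\u{20AC}"), false);
echo implode(' | ', $chars), "\n";
```

Finding the character at position N means repeating the length check for every character before it, which is why a character-indexed lookup costs more than reading a single byte at a known offset.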

So, I put together a simple test just to show that byte access by offset is orders of magnitude faster (but will not give you a usable character if a multi-byte character sits at the given byte index). I grabbed the random character function from here ( Optimal function to create a random UTF-8 string in PHP? (letter characters only) ), then added the following:

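// rand_str() is the random UTF-8 string generator from the question linked above
$str = rand_str( 5000000, 5000000 );

// strlen() counts bytes, which is what $str[$i] indexes
$len = strlen($str)-1;

// Byte access by string offset. $i++ increments before the body runs,
// so this loop visits offsets 1 through $len.
$i = 0;
$startTime = microtime(true);
while($i++<$len) {
    $char = $str[$i];
}
$endTime = microtime(true);

echo '<pre>Array access: ' . $len . ' items: ', $endTime-$startTime, ' seconds</pre>';


// Character access via mb_substr, capped at 100,000 iterations
$i = 0;
$len = mb_strlen($str)-1;
$startTime = microtime(true);
while($i++<$len) {
    $char = mb_substr($str, $i, 1);
    if( $i >= 100000 ) {
        break;
    }
}
$endTime = microtime(true);

echo '<pre>Substring access: ' . ($len+1) . ' (limited to ' . $i . ') items: ', $endTime-$startTime, ' seconds</pre>';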

You will notice that I restricted the mb_substr loop to 100,000 characters. Why? It simply takes too long to run through all 5,000,000 characters!
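If whole characters are needed, one way to avoid calling mb_substr once per position is to split the string a single time and iterate over the result. The sketch below uses preg_split() with the /u modifier; the sample string is an assumption, and mb_str_split() (PHP 7.4+) should serve the same purpose. I have not timed this against the loops above.

```php
<?php
// Assumed sample input; any valid UTF-8 string works.
$str = "na\u{00EF}ve \u{20AC}5";

// The /u modifier makes PCRE treat the subject as UTF-8, so the empty
// pattern splits between code points rather than between bytes.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $index => $char) {
    echo $index, ': ', $char, ' (', strlen($char), " bytes)\n";
}
```

The trade-off is memory: the array holds one PHP string per character, which is far larger than the original byte buffer for multi-million-character input.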

What were my results?

Array access: 12670380 items: 0.4850001335144 seconds

Substring access: 5000000 (limited to 100000) items: 17.00200009346 seconds

Notice that string offset access went through all 12,670,380 bytes (yes, 12.6 million bytes from 5 million characters, since many were multi-byte) in about half a second, while mb_substr, limited to 100,000 characters, took 17 seconds!
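The comparison the question literally asks about, string offsets versus an array of characters, can be timed the same way. Here is a minimal sketch, assuming single-byte ASCII input and a much smaller size than 5 million to keep memory modest. I am not claiming any particular timings for it; the checksums are only there so both loops do comparable work.

```php
<?php
// Assumed input: 200,000 ASCII bytes.
$str = str_repeat('abcdefghij', 20000);
$len = strlen($str);

// 1) Direct offset access on the string.
$start = microtime(true);
$sumA = 0;
for ($i = 0; $i < $len; $i++) {
    $sumA += ord($str[$i]);
}
$stringTime = microtime(true) - $start;

// 2) Split into an array first, then iterate. The split is inside the
//    timed region because it is part of the cost of this approach.
$start = microtime(true);
$sumB = 0;
foreach (str_split($str) as $char) {
    $sumB += ord($char);
}
$arrayTime = microtime(true) - $start;

printf("string offsets: %.4f s\narray of chars: %.4f s\n", $stringTime, $arrayTime);
```

Scale the repeat count up to approach the 5 million case, keeping in mind that the array version needs far more memory than the string itself.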

Jim answered Dec 13 '25 14:12
