
PHP code to generate safe URL?

We need to generate a unique URL from the title of a book, where the title can contain any character. How can we search and replace all the 'invalid' characters so that a valid, neat-looking URL is generated?

For instance:

"The Great Book of PHP"

www.mysite.com/book/12345/the-great-book-of-php

"The Greatest !@#$ Book of PHP"

www.mysite.com/book/12345/the-greatest-book-of-php

"Funny title     "

www.mysite.com/book/12345/funny-title

siliconpi asked Oct 21 '10 06:10


5 Answers

Ah, slugification

// This function expects the input to be UTF-8 encoded.
function slugify($text)
{
    // Replace each run of characters that are neither letters (\pL) nor digits (\d) with a single -
    $text = preg_replace('/[^\\pL\d]+/u', '-', $text);

    // Trim out extra -'s
    $text = trim($text, '-');

    // Convert letters that we have left to the closest ASCII representation
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

    // Make text lowercase
    $text = strtolower($text);

    // Strip out anything we haven't been able to convert
    $text = preg_replace('/[^-\w]+/', '', $text);

    return $text;
}

This works fairly well. It first uses the Unicode properties of each character to decide whether it is a letter (\pL) or a digit (\d) and replaces every run of anything else with a -. It then trims leading and trailing -'s, transliterates what is left to ASCII, lowercases it, and finally strips anything that could not be converted. (Fabrik's test returns "arvizturo-tukorfurogep")

I also tend to add a list of stop words ("the", "of", "or", "a", etc.) that are removed from the slug. Don't filter on word length, though, or you'll strip out things like "php".
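The stop-word idea could look roughly like the sketch below. It assumes the slug has already been produced (lowercase, hyphen-separated); the function name remove_stop_words() and the fallback behaviour are my own choices, not part of the answer above.

```php
<?php
// Hypothetical helper: drops stop words from an already-slugified string.
// The default list is just the four words mentioned in the answer.
function remove_stop_words(string $slug, array $stopWords = ['the', 'of', 'or', 'a']): string
{
    $words = explode('-', $slug);

    // Match whole words against the list, never by length, so "php" survives.
    $kept = array_filter($words, function ($word) use ($stopWords) {
        return $word !== '' && !in_array($word, $stopWords, true);
    });

    // If every word was a stop word, keep the original slug instead of returning ''.
    return count($kept) > 0 ? implode('-', $kept) : $slug;
}

echo remove_stop_words('the-great-book-of-php'), "\n"; // great-book-php
```

Filtering after slugification keeps the comparison simple, since case and punctuation are already normalized at that point.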

Mez answered Nov 16 '22 23:11


If “invalid” means non-alphanumeric, you can do this:

function foo($str) {
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($str)), '-');
}

This converts $str to lowercase, replaces every sequence of one or more non-alphanumeric characters with a single hyphen, and then removes leading and trailing hyphens.

var_dump(foo("The Great Book of PHP") === 'the-great-book-of-php');
var_dump(foo("The Greatest !@#$ Book of PHP") === 'the-greatest-book-of-php');
var_dump(foo("Funny title     ") === 'funny-title');
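One thing to keep in mind: because the pattern only accepts a-z and 0-9, accented letters count as "invalid" too, so with UTF-8 input "Café Crème" would come out as "caf-cr-me". If that matters, a small transliteration step before the filter is one option. The sketch below is only an illustration; the name foo_translit() and the contents of the table are assumptions, and a real table would need many more entries.

```php
<?php
// Same approach as foo(), plus a small hand-written transliteration table.
// The function name and the table contents are illustrative assumptions.
function foo_translit(string $str): string
{
    $map = ['é' => 'e', 'è' => 'e', 'É' => 'E', 'ü' => 'u', 'ö' => 'o', 'ä' => 'a', 'ß' => 'ss'];

    // Replace the known accented letters before the ASCII-only filter runs.
    $str = strtr($str, $map);

    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($str)), '-');
}

echo foo_translit('Café Crème'), "\n"; // cafe-creme
```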

Gumbo answered Nov 16 '22 22:11


You can use a simple regular expression for this purpose:

<?php
    function safeurl( $v )
    {
        $v = strtolower( $v );
        $v = preg_replace( "/[^a-z0-9]+/", "-", $v );
        $v = trim( $v, "-" );
        return $v;
    }
    echo "<br>www.mysite.com/book/12345/" . safeurl( "The Great Book of PHP" );
    echo "<br>www.mysite.com/book/12345/" . safeurl( "The Greatest !@#$ Book of PHP" );
    echo "<br>www.mysite.com/book/12345/" . safeurl( "  Funny title  " );
    echo "<br>www.mysite.com/book/12345/" . safeurl( "!!Even Funnier title!!" );
?>

Salman A answered Nov 17 '22 00:11


If you want to allow only letters, digits and underscore (usual word characters) you can do:

$str = strtolower(preg_replace(array('/\W/','/-+/','/^-|-$/'),array('-','-',''),$str));

It first replaces every non-word character (\W) with a -.
Next it collapses any run of consecutive - into a single -.
Finally it deletes a leading or trailing -, and strtolower() lowercases the result.
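For readability, the three steps can also be written out one at a time. This is just an unrolled sketch of the same one-liner; the function name slug_words() is made up for the example.

```php
<?php
// Step-by-step equivalent of the one-liner; the name slug_words() is hypothetical.
function slug_words(string $str): string
{
    $str = preg_replace('/\W/', '-', $str);    // 1. each non-word character becomes '-'
    $str = preg_replace('/-+/', '-', $str);    // 2. collapse runs of '-' into one
    $str = preg_replace('/^-|-$/', '', $str);  // 3. drop a leading and a trailing '-'

    return strtolower($str);
}

echo slug_words('The Greatest !@#$ Book of PHP'), "\n"; // the-greatest-book-of-php
echo slug_words('snake_case title'), "\n";              // snake_case-title
```

Note that the underscore is a word character, so it is kept as-is, as the second call shows.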


codaddict answered Nov 16 '22 22:11


This code comes from CodeIgniter's url helper. It should do the trick.

function url_title($str, $separator = 'dash', $lowercase = FALSE)
    {
        if ($separator == 'dash')
        {
            $search     = '_';
            $replace    = '-';
        }
        else
        {
            $search     = '-';
            $replace    = '_';
        }

        $trans = array(
                        '&\#\d+?;'              => '',
                        '&\S+?;'                => '',
                        '\s+'                   => $replace,
                        '[^a-z0-9\-\._]'        => '',
                        $replace.'+'            => $replace,
                        $replace.'$'            => $replace,
                        '^'.$replace            => $replace,
                        '\.+$'                  => ''
                      );

        $str = strip_tags($str);

        foreach ($trans as $key => $val)
        {
            $str = preg_replace("#".$key."#i", $val, $str);
        }

        if ($lowercase === TRUE)
        {
            $str = strtolower($str);
        }

        return trim(stripslashes($str));
    }
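
Two details are worth checking against the question's examples, going by the function as quoted here (other CodeIgniter versions may differ). $lowercase defaults to FALSE, so TRUE has to be passed as the third argument to get a lowercase slug. Also, the rules $replace.'$' and '^'.$replace replace a separator with itself, and the final trim() only removes whitespace, so "Funny title     " appears to end up with a trailing dash. A minimal sketch of a post-processing step follows; finish_slug() is a hypothetical helper, not part of CodeIgniter.

```php
<?php
// The quoted rule  $replace.'$' => $replace  swaps a trailing dash for a dash,
// so the string is left unchanged:
$slug = preg_replace('#-$#i', '-', 'Funny-title-');
echo $slug, "\n";                 // Funny-title-

// Hypothetical post-processing step; not part of CodeIgniter.
// Trims the separator from both ends and lowercases the result.
function finish_slug(string $slug, string $separator = '-'): string
{
    return strtolower(trim($slug, $separator));
}

echo finish_slug($slug), "\n";    // funny-title
```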

thomaux answered Nov 16 '22 22:11