
Capitalising Extended-Latin multibyte characters in PHP and outputting them as escaped HTML

I've stumbled across a problem in PHP and it's proving much harder to solve than I would have expected.

On the English version of my site, I have a plaintext-fragment:

about-us

which I can straightforwardly change into the capitalised text form:

About Us

using the following:

$Text_Array = explode('-', $Plain_Text_Fragment); // ['about', 'us']

for ($i = 0; $i < count($Text_Array); $i++) {
  $Text_Array[$i] = strtoupper($Text_Array[$i][0]) . substr($Text_Array[$i], 1);
}

$Capitalised_Text = implode(' ', $Text_Array); // 'About Us'

It turns out, it's not nearly so straightforward to turn the plaintext fragment:

über-uns

into the capitalised text form:

&Uuml;ber Uns

TLDR: What's the most straightforward approach in PHP to achieve this?


Problem #1 : Ascertaining whether the first letter is multi-byte

I only need to capitalise the first letter of each word in the plaintext-fragment, so, whilst I can easily tell that the plaintext-fragment contains one or more multibyte characters, using:

strlen('über') === mb_strlen('über') // FALSE

that still doesn't tell me whether the first letter of the plaintext fragment is multibyte or not. (It might be any one or more of the other letters.)

I can't isolate and test $Text_Array[$i][0] because, of course, the 'ü' in 'über' is both $Text_Array[$i][0] and $Text_Array[$i][1].

It also appears that mb_str_split() does not exist in my PHP installation (the function was only added in PHP 7.4).
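One way to inspect only the first character, sketched here rather than offered as the definitive approach, is to extract it with mb_substr() and then count its bytes with strlen(). The helper name first_char_is_multibyte() is hypothetical; the sketch assumes the mbstring extension is loaded and that both the source file and the input are UTF-8.

```php
<?php
// Hypothetical helper: reports whether the first character of $word
// occupies more than one byte when encoded as UTF-8.
function first_char_is_multibyte(string $word): bool
{
    // mb_substr() counts in characters, so this returns the whole first
    // character no matter how many bytes it spans.
    $first = mb_substr($word, 0, 1, 'UTF-8');

    // strlen() counts bytes, so anything above 1 means a multibyte character.
    return strlen($first) > 1;
}

var_dump(first_char_is_multibyte('über')); // bool(true)
var_dump(first_char_is_multibyte('uns'));  // bool(false)
var_dump(first_char_is_multibyte('café')); // bool(false): only the last letter is multibyte
```

The same mb_substr() call also sidesteps the need to know the byte length in advance, since it always returns one complete character.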


Problem #2 : Capitalising 'ü'

Once I am past Problem #1 (having confirmed that the first letter of 'über' is multibyte), it's not clear to me how to capitalise it. I want to use mb_strtoupper(), but I need to apply it to both $Text_Array[$i][0] and $Text_Array[$i][1] together and to no other character (unless there are other multibyte characters in $Text_Array[$i]).

I think I can solve Problem #2 something like this:

$Text_Array[$i] = mb_strtoupper(substr($Text_Array[$i], 0, 2)) . substr($Text_Array[$i], 2);

I have checked this and it works for 'über', although it assumes the first character is exactly two bytes long. One down, two to go.
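The hard-coded 2 in the line above only fits characters that are two bytes long. A more general variant, given as a minimal sketch, slices by character with mb_substr() instead of by byte with substr(). The function name ucfirst_utf8() is made up for illustration, and the sketch assumes mbstring is available and the input is UTF-8.

```php
<?php
// Hypothetical helper: upper-cases the first character of $word and
// leaves the remainder untouched, whatever the character's byte length.
function ucfirst_utf8(string $word): string
{
    $first = mb_substr($word, 0, 1, 'UTF-8');    // first character
    $rest  = mb_substr($word, 1, null, 'UTF-8'); // everything after it

    return mb_strtoupper($first, 'UTF-8') . $rest;
}

// Applied to each word of the plaintext fragment:
$words = array_map('ucfirst_utf8', explode('-', 'über-uns'));
echo implode(' ', $words), "\n"; // Über Uns
```

Because nothing here depends on byte offsets, the same function handles 'about' and 'über' alike, so Problem #1 no longer needs a separate check.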


Problem #3 : Outputting &Uuml; instead of Ü

Although I am working using UTF-8 encoding, I'd much prefer to output the HTML-escape &Uuml; than a raw Ü. I figured there would be a PHP native function to allow me to convert between the two and there is:

htmlentities()

But I really can't tell if htmlentities() is working or not because both my DOM Inspector and my View Source are telling me that they see Ü, not &Uuml;. I appreciate that they might be seeing the latter and they are just trying to be helpful, but I can't be absolutely sure whether the PHP function htmlentities() is working or not.
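A DOM inspector displays the parsed document, so entities always appear decoded there. One way to remove the ambiguity is to examine the string inside PHP before the browser ever sees it. The following is only a sketch: the flag combination ENT_QUOTES | ENT_HTML5 is my assumption, not something taken from the code above.

```php
<?php
// Escape the capitalised text, then inspect the result on the PHP side.
$escaped = htmlentities('Über Uns', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// A strict comparison shows whether the entity was actually produced.
var_dump($escaped === '&Uuml;ber Uns'); // bool(true)

// To make the entity visible in a rendered page, escape the ampersand
// once more so the browser prints the characters "&Uuml;" literally.
echo htmlspecialchars($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // &amp;Uuml;ber Uns
```

Running this from the command line (or logging $escaped with error_log()) keeps the browser out of the picture entirely.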


Question:

What's the most straightforward approach in PHP to convert:

über-uns

into:

&Uuml;ber Uns ?
Rounin - Glory to UKRAINE asked Dec 05 '25 20:12



1 Answer

Try using mb_convert_case() with MB_CASE_TITLE, which capitalises the first letter of each word in a multibyte-safe way:

$string = "über-uns";

$string = str_replace("-", " ", $string);

$capitalised = mb_convert_case($string, MB_CASE_TITLE, "UTF-8");

echo htmlentities($capitalised, ENT_HTML5, "UTF-8");
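If the conversion is needed in more than one place, the three steps above could be wrapped in a small function. This is a sketch under assumptions: the name fragment_to_escaped_title() is hypothetical, and ENT_QUOTES is added to the flags so that quotes are escaped too, which the snippet above does not do.

```php
<?php
// Hypothetical wrapper around the steps above: hyphens to spaces,
// title-case the words, then escape the result for HTML output.
function fragment_to_escaped_title(string $fragment): string
{
    $spaced = str_replace('-', ' ', $fragment);
    $title  = mb_convert_case($spaced, MB_CASE_TITLE, 'UTF-8');

    return htmlentities($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo fragment_to_escaped_title('über-uns'), "\n"; // &Uuml;ber Uns
echo fragment_to_escaped_title('about-us'), "\n"; // About Us
```

One thing to be aware of: MB_CASE_TITLE also lower-cases the remaining letters of each word, so a fragment such as 'FAQ-seite' would come out as 'Faq Seite'.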

András answered Dec 08 '25 11:12



