Correct way to split UTF-8 String

Question

I want to split a UTF-8 string.

I tried StringTokenizer, but it does not give the result I expect.

I expected the title to be "0", but it comes out as "عُدي_صدّام_حُسين".

    import java.util.StringTokenizer;

    String test = "en.m عُدي_صدّام_حُسين 1 0";

    // Split on whitespace and read the first two tokens.
    StringTokenizer stringTokenizer = new StringTokenizer(test);
    String code = stringTokenizer.nextToken();
    String title = stringTokenizer.nextToken();

What is the correct way to split a UTF-8 string?

Andy Turner · Accepted Answer

The problem here is that the Arabic text isn't actually "at the end" of the string.

For example, if I select the contents of the string literal (in Chrome), moving my mouse from left to right, it selects the en.m first, then all of the Arabic text, then the 0 1. The Arabic text only looks as if it is "at the end" because of how the bidirectional text is being rendered.

The string, as specified in your Java source code, actually does have عُدي_صدّام_حُسين as the second token. So you are splitting it correctly; you're just not splitting what you think you're splitting.
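
One way to check this for yourself is to print every token with its index, so you see the order the tokenizer works in rather than the order the text is drawn on screen. The following is only a minimal sketch: the class name TokenOrderDemo and the helper tokens are made up for illustration, and the Arabic title is shortened to a three-letter stand-in written as Unicode escapes so that the editor cannot visually reorder the literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenOrderDemo {

    // Hypothetical helper: collects the tokens in logical (in-memory) order,
    // which is the only order StringTokenizer knows about.
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: a shortened stand-in for the title in the question,
        // spelled with Unicode escapes (Arabic letters ain, dal, yeh) so the
        // source file itself contains no right-to-left text.
        String test = "en.m \u0639\u062F\u064A 1 0";

        List<String> parts = tokens(test);
        for (int i = 0; i < parts.size(); i++) {
            // The index makes the logical position explicit, whatever the
            // console does when it renders the Arabic token.
            System.out.println(i + ": " + parts.get(i));
        }
    }
}
```

With this input the tokens at indexes 0 to 3 are en.m, the Arabic word, 1 and 0, so the Arabic text really is the second token even though it is displayed further to the right.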


Tags: java, string, utf-8, token

Jason
