
Remove all non-numeric characters but keep a specific word

Tags: java, regex

I'm working on a Java program that downloads manga from www.mangafox.me.

Unfortunately, this website doesn't have an API, so I use some archaic ways to get my data. However, it's possible to get an XML feed listing every chapter of a manga. For example: http://mangafox.me/rss/nisekoi.xml.

I parse this XML and use the title tag to get a chapter's number and its associated volume.

For example, I have a string like this: Nisekoi Vol TBD Ch 215, and I want to keep only TBD and 215.

At the moment, I replace all non-numeric characters with spaces and keep every occurrence of TBD by using:

String title = "Nisekoi Vol TBD Ch 215";
title = title.replaceAll("[^0-9.\bTBD\b]+", " ").trim();

title then equals "TBD 215", and I use title.split(" ") to get the volume and the chapter.

This works fine until I do the same with a manga whose title starts with a T. Apparently, the capital T isn't replaced by a space.

I'm not very good at regular expressions, so how do I replace every character that is not a digit, a dot (for decimals), or part of the word "TBD" with a space in Java?

Thanks!

Christian Kula asked Dec 12 '25 21:12


1 Answer

KISS - Keep it stupid simple: grab the number at the end of the title with \\d+$ and concatenate your title afterwards, like TBD + your_number.
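
For context, the original pattern fails because everything inside [...] is a set of single characters: T, B and D are each allowed on their own, and "\b" in a Java string literal is a backspace rather than a word boundary. Below is a minimal sketch of the "match what you want instead of deleting the rest" idea, extended to capture the volume as well. It assumes every title ends in the "Vol <volume> Ch <chapter>" layout shown in the question; the class name TitleParser and the method parse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleParser {
    // Assumed layout: "... Vol <TBD or digits> Ch <digits, optional decimal part>" at the end of the title.
    private static final Pattern VOL_CH =
            Pattern.compile("Vol\\s+(TBD|\\d+)\\s+Ch\\s+(\\d+(?:\\.\\d+)?)\\s*$");

    // Returns {volume, chapter}, or null when the title does not follow the assumed layout.
    static String[] parse(String title) {
        Matcher m = VOL_CH.matcher(title);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] a = parse("Nisekoi Vol TBD Ch 215");
        System.out.println(a[0] + " " + a[1]); // prints "TBD 215"

        // A title starting with a capital T no longer leaks into the result.
        String[] b = parse("Toriko Vol 12 Ch 3.5");
        System.out.println(b[0] + " " + b[1]); // prints "12 3.5"
    }
}
```

Capturing groups avoid the later split(" ") step, and the manga name never reaches the result, whatever letters it contains. If the feed uses other layouts (for example, titles without a volume), the pattern would need to be adjusted.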

Jan answered Dec 15 '25 12:12


