
Parsing date string using SimpleDateFormat with Wildcard support (e.g. *yyyy*MM*dd*hh*mm*ss)

First of all, is there an existing library that is similar to SimpleDateFormat but supports wildcard characters? If not, what is the best approach for this?

I need to match and extract a date from a file name, but I can't find the right approach for this scenario. I admit the scenario below isn't practical for a file name, but I've included it anyway as a "what if".

Scenario

Filename: 19882012ABCseptemberDEF03HIJ12KLM0156_249.zip, Pattern: yyyyMMMddhhmmss'_.zip'

  • Expected Date: September 03, 2012 12:01:56 AM
  • Broken down version: 1988-2012-ABC-september-DEF-03-HIJ-12-KLM-01-56-_249.zip

I see a lot of issues with parsing this (e.g. determining the correct year). I hope you can shed some light on it and point me in the right direction.

Rafael Ibasco asked Dec 03 '25 13:12



1 Answer

There is no such thing that I know of in SimpleDateFormat, but you can check with a regular expression whether the input filename matches and, if it does, extract the matched parts to build your date.

This is a quick regex that validates your criteria:

(.*?)([0-9]{4})([^0-9]*?)([a-z]+)(.*?)([0-9]{2})(.*?)([0-9]{2})(.*?)([0-9]{4})_([^.]+)[.]zip

Which means (it's really not that complicated)

(.*?) // anything 
([0-9]{4}) // followed by 4 digits
([^0-9]*?) // followed by anything except digits
([a-z]+) // followed by a sequence of text in lowercase
(.*?) // followed by anything
([0-9]{2}) // until it finds 2 digits
(.*?) // followed by anything
([0-9]{2}) // until it finds 2 digits again
(.*?) // followed by anything
([0-9]{4}) // until it finds 4 consecutive digits
_([^.]+) // an underscore followed by anything except a dot '.'
[.]zip // the file extension
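
To see what each group actually captures for the sample filename, here is a minimal sketch. The class name RegexGroupDemo and the helper groups are illustrative and not part of the original answer; only the regex and the filename come from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only to illustrate the breakdown above.
class RegexGroupDemo {
    // The regex from the answer, unchanged.
    static final Pattern FILENAME = Pattern.compile(
        "(.*?)([0-9]{4})([^0-9]*?)([a-z]+)(.*?)([0-9]{2})(.*?)([0-9]{2})(.*?)([0-9]{4})_([^.]+)[.]zip");

    // Returns groups 1..n in order, or an empty list if the whole name does not match.
    static List<String> groups(String filename) {
        List<String> out = new ArrayList<>();
        Matcher m = FILENAME.matcher(filename);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> g = groups("19882012ABCseptemberDEF03HIJ12KLM0156_249.zip");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("group(" + (i + 1) + ") = \"" + g.get(i) + "\"");
        }
    }
}
```

Because the first (.*?) is lazy and the year must be followed by a non-digit before the month, the engine backtracks past "1988" and ends up with "2012" in group 2 and "september" in group 4.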

You can use it in Java like this:

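import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String filename = "19882012ABCseptemberDEF03HIJ12KLM0156_249.zip";
String regex = "(.*?)([0-9]{4})([^0-9]*?)([a-z]+)(.*?)([0-9]{2})(.*?)([0-9]{2})(.*?)([0-9]{4})_([^.]+)[.]zip";
Matcher m = Pattern.compile(regex).matcher(filename);
if (m.matches()) {
    // m.group(2): the year
    // m.group(4): the month
    // m.group(6): the day
    // m.group(8): the hour
    // m.group(10): the minutes & seconds
    String dateString = m.group(2) + "-" + m.group(4) + "-" + m.group(6) + " " + m.group(8) + m.group(10);
    // Locale.ENGLISH so that "september" is recognized whatever the default locale is.
    // HH reads the hour on a 24-hour clock (12 = noon); use hh if 12 should mean 12 AM.
    // parse() throws ParseException, which the enclosing method must catch or declare.
    Date date = new SimpleDateFormat("yyyy-MMM-dd HHmmss", Locale.ENGLISH).parse(dateString);
    // here you go with your date
}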

Runnable sample on ideone: http://ideone.com/GBDEJ

Edit: you can avoid capturing what you don't want by removing the parentheses around the parts you don't care about. The regular expression then becomes .*?([0-9]{4})[^0-9]*?([a-z]+).*?([0-9]{2}).*?([0-9]{2}).*?([0-9]{4})_[^.]+[.]zip and the matched groups become:

group(1): the year
group(2): the month
group(3): the day
group(4): the hour
group(5): the minutes & seconds
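
As a rough sketch of how the simplified expression could be used end to end, assuming English month names in the filename: the class name FilenameDateParser and the method parse are made up for illustration, and passing Locale.ENGLISH is an added assumption so that the result does not depend on the machine's default locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the simplified regex from the edit above.
class FilenameDateParser {
    // Only the five parts of interest are captured.
    static final Pattern FILENAME = Pattern.compile(
        ".*?([0-9]{4})[^0-9]*?([a-z]+).*?([0-9]{2}).*?([0-9]{2}).*?([0-9]{4})_[^.]+[.]zip");

    // Returns the parsed date, or null if the filename does not match the regex.
    static Date parse(String filename) {
        Matcher m = FILENAME.matcher(filename);
        if (!m.matches()) {
            return null;
        }
        // year-month-day, then hour followed by minutes and seconds
        String dateString = m.group(1) + "-" + m.group(2) + "-" + m.group(3)
                + " " + m.group(4) + m.group(5);
        try {
            return new SimpleDateFormat("yyyy-MMM-dd HHmmss", Locale.ENGLISH).parse(dateString);
        } catch (ParseException e) {
            // The regex matched but the text is not a valid date (e.g. an unknown month name).
            throw new IllegalArgumentException("Not a valid date: " + dateString, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("19882012ABCseptemberDEF03HIJ12KLM0156_249.zip"));
    }
}
```

Returning null for a non-matching name keeps the sketch short; an Optional or an exception would work just as well, depending on how the caller wants to handle unexpected filenames.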
Alex answered Dec 06 '25 01:12



