What am I doing wrong here? I am trying to extract from this "list"
ARTICLE 11 - Title AA
ARTICLE 22 Title BB
ARTICLE 33
ARTICLE 44 - Title DD
ARTICLE 55 Title EE
all the article numbers and the titles (if any) for each article. The "-" is optional when a title exists.
With this RegEx
(article)(\s*)([^\s]*)((\s*)(-)?(\s*)(.*))
I get only 4 items. Items 33 and 44 are treated as a single article, I suppose because "ARTICLE 33" has no title.
11|Title AA
22|Title BB
33|ARTICLE 44 - Title DD
55|Title EE
Please see the code here: http://jsfiddle.net/Z94wf/
EDIT
What I expect to get is this:
11|Title AA
22|Title BB
33|
44|Title DD
55|Title EE
Thanks
Your second \s* is matching the newline character at the end of the 3rd line, so if you change the pattern to explicitly match only spaces and dashes, as follows
(article)(\s*)([^\s]+)(([ -]*)(.*))
you get the desired result
http://jsfiddle.net/Z94wf/37/
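In case the fiddle is unavailable, here is a minimal sketch of that pattern run against the sample input from the question. The helper name `extractArticles` and the `number|title` output format are only assumptions for illustration, and the `g` and `i` flags are assumed as well, since the pattern uses a lowercase "article".

```javascript
// Sample input from the question, one article per line.
const input = [
  "ARTICLE 11 - Title AA",
  "ARTICLE 22 Title BB",
  "ARTICLE 33",
  "ARTICLE 44 - Title DD",
  "ARTICLE 55 Title EE",
].join("\n");

// [ -]* consumes only spaces and dashes, so it cannot cross a newline.
// "." does not match "\n" either, so a missing title yields an empty string.
const pattern = /(article)(\s*)([^\s]+)(([ -]*)(.*))/gi;

// Hypothetical helper: returns one "number|title" string per article.
function extractArticles(text) {
  const rows = [];
  for (const m of text.matchAll(pattern)) {
    // Group 3 is the article number, group 6 is the title (possibly empty).
    rows.push(m[3] + "|" + m[6]);
  }
  return rows;
}

console.log(extractArticles(input).join("\n"));
// 11|Title AA
// 22|Title BB
// 33|
// 44|Title DD
// 55|Title EE
```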
I can't be sure about all the forms your input takes, but what about something with fewer groups that is a bit more explicit:
ARTICLE\s+(\d+)[ \t-]*(.*)
This matches the literal ARTICLE, then some whitespace, then the number, then an optional run of spaces, tabs, and "-" characters, and finally everything else on the line. The separator class deliberately avoids \s, which would also match the newline after "ARTICLE 33" and pull the next line in as its title.
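A rough sketch of this shorter pattern applied to the whole multi-line input at once. The separator class here is `[ \t-]*` (spaces, tabs, dashes), which keeps the match from running past the line break after a title-less article such as ARTICLE 33. The names `sample`, `articleRe`, and `parsed` are made up for the example.

```javascript
// Same sample input as in the question.
const sample = [
  "ARTICLE 11 - Title AA",
  "ARTICLE 22 Title BB",
  "ARTICLE 33",
  "ARTICLE 44 - Title DD",
  "ARTICLE 55 Title EE",
].join("\n");

// Only two capture groups: the number and the (possibly empty) title.
// [ \t-]* skips spaces, tabs, and dashes but never a newline.
const articleRe = /ARTICLE\s+(\d+)[ \t-]*(.*)/g;

// Collect each match as a small object.
const parsed = Array.from(sample.matchAll(articleRe), (m) => ({
  number: m[1],
  title: m[2],
}));

console.log(parsed.map((a) => a.number + "|" + a.title).join("\n"));
// 11|Title AA
// 22|Title BB
// 33|
// 44|Title DD
// 55|Title EE
```

Add the `i` flag if the input may contain "Article" or "article" in other casings.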