I need to combine two regular expressions into one. The text (userdoc) is:
INPUT :
<user>textxtxtxtx</user>
<unnecessarytag>unwanted info</unnecessarytag>
<info>infoinfoinfo. part 1.....multiline</info>
<unnecessarytag>unwanted info</unnecessarytag>
<info>infoinfoinfo. part 2.....multiline</info>
There will be many similar blocks in the file.
OUTPUT :
<user>textxtxtxtx</user>
<info>infoinfoinfo. part 1.....multiline</info>
<info>infoinfoinfo. part 2.....multiline</info>
Order has to be maintained.
One user can have many info elements. A file contains many userdocs.
The code for this is:
String out = String.join("\n", Files.readAllLines(Paths.get("text.txt")));
Pattern p = Pattern.compile("<user>(.*?)</user>");
Matcher m = p.matcher(out);
Pattern p1 = Pattern.compile("<info>([^<]*)</info>", Pattern.MULTILINE);
Matcher m1 = p1.matcher(out);
I planned on writing:
while (m.find() && m1.find())
{
    String cp = m.group();
    String cp1 = m1.group();
    System.out.println(cp + cp1);
}
But this pairs every user with only one info. How can I combine these two regexes into one pattern that supports the ab^n format, i.e. one user followed by any number of info elements?
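The loop above advances both matchers in lockstep, which is why each user gets exactly one info. For comparison with the XML answer, here is a minimal regex-only sketch (not the approach recommended there) that uses a single alternation, so `find()` visits `<user>` and `<info>` tags in document order, and then attaches each info to the most recent user. Note that `Pattern.MULTILINE` only changes how `^` and `$` behave; it is `Pattern.DOTALL` that lets `.` cross line breaks. The class and method names are hypothetical, and the sketch assumes tags are never nested inside each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserInfoRegex {
    // One alternation matches both tag types, so find() visits them in document order.
    // DOTALL lets .*? cross line breaks; \1 makes the closing tag match the opening one.
    private static final Pattern TAG =
            Pattern.compile("<(user|info)>(.*?)</\\1>", Pattern.DOTALL);

    /** Groups each <info> match under the most recent <user> match; infos before any user are dropped. */
    static List<List<String>> group(String text) {
        List<List<String>> blocks = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            if (m.group(1).equals("user")) {
                List<String> block = new ArrayList<>();
                block.add(m.group()); // the whole <user>...</user> match
                blocks.add(block);
            } else if (!blocks.isEmpty()) {
                blocks.get(blocks.size() - 1).add(m.group()); // attach to the latest user
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "<user>textxtxtxtx</user>",
                "<unnecessarytag>unwanted info</unnecessarytag>",
                "<info>infoinfoinfo. part 1",
                "multiline</info>",
                "<unnecessarytag>unwanted info</unnecessarytag>",
                "<info>infoinfoinfo. part 2</info>");
        for (List<String> block : group(text)) {
            System.out.println(String.join("\n", block));
        }
    }
}
```

A user with n info elements yields one block of n + 1 strings, and `<unnecessarytag>` never matches the alternation, so it is skipped without a separate pattern.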
Hello there, why don't you turn this into XML and parse it with JDOM2, or generally any DOM implementation in Java? Your current approach could prove error-prone. Apart from that, querying the XML will be easier, more readable (in terms of the code needed) and generally more elegant.
To do that, you will need something like the following (I am using JDOM2):
import java.io.File;
import org.jdom2.Document;
import org.jdom2.input.SAXBuilder;

SAXBuilder saxBuilder = new SAXBuilder();
// modelPath is a String derived from the IPath of the file that stores the data
Document originalDoc = saxBuilder.build(new File(modelPath));
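One practical hurdle: the flat file shown in the question has several top-level elements, so an XML parser will reject it as-is. A minimal sketch, assuming the file has no XML declaration and its text contains no unescaped `<` or `&`, wraps the content in a synthetic root (the name `userdocs` is an assumption) and walks the root's children in document order. It swaps in the JDK's built-in `javax.xml` DOM parser instead of JDOM2 so it runs without extra dependencies; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FlatUserDocs {
    /** Returns "tag: text" for every top-level <user> or <info>, in document order. */
    static List<String> keepUserAndInfo(String flatText) {
        // The flat file has many top-level elements, so wrap it in a synthetic root to make it well-formed.
        String wrapped = "<userdocs>" + flatText + "</userdocs>";
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));
        } catch (Exception ex) { // ParserConfigurationException, SAXException or IOException
            throw new IllegalArgumentException("text is not well-formed even after wrapping", ex);
        }
        List<String> kept = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue; // skip whitespace text between the tags
            String tag = ((Element) n).getTagName();
            if (tag.equals("user") || tag.equals("info")) {
                kept.add(tag + ": " + n.getTextContent());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String flat = "<user>textxtxtxtx</user>\n"
                + "<unnecessarytag>unwanted info</unnecessarytag>\n"
                + "<info>infoinfoinfo. part 1</info>\n"
                + "<unnecessarytag>unwanted info</unnecessarytag>\n"
                + "<info>infoinfoinfo. part 2</info>";
        keepUserAndInfo(flat).forEach(System.out::println);
    }
}
```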
Processing the nodes is then pretty easy. You can either use the traditional parent -> children approach or a more generic implementation, based on XPath expressions, that is robust to changes in the model structure. Each approach has pros and cons, which I suggest you investigate and evaluate yourself.
For this to work, your structure should change to something like this:
<?xml version="1.0" encoding="UTF-8"?>
<userdocs>
<user name="textxtxtxtx">
<info>...</info>
<info>...</info>
<info>...</info>
</user>
<user name="test2">
<info>...</info>
<info>...</info>
<info>...</info>
</user>
<!-- etc... -->
</userdocs>
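With this structure, the parent -> children approach is a plain loop over each `<user>` and its direct `<info>` children. A minimal sketch, again using the JDK's built-in DOM API in place of JDOM2 (class and method names are hypothetical), collects each user's info texts keyed by the `name` attribute while keeping document order:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UserDocsTraversal {
    /** Maps each user's name attribute to the text of its direct <info> children, in document order. */
    static Map<String, List<String>> infosByUser(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
        // LinkedHashMap keeps users in document order; a repeated name would overwrite the earlier entry.
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList users = doc.getDocumentElement().getElementsByTagName("user");
        for (int i = 0; i < users.getLength(); i++) {
            Element user = (Element) users.item(i);
            List<String> infos = new ArrayList<>();
            // Parent -> children: look only at this user's direct child elements.
            for (Node c = user.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && ((Element) c).getTagName().equals("info")) {
                    infos.add(c.getTextContent());
                }
            }
            result.put(user.getAttribute("name"), infos);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<userdocs>"
                + "<user name=\"textxtxtxtx\"><info>part 1</info><info>part 2</info></user>"
                + "<user name=\"test2\"><info>other</info></user>"
                + "</userdocs>";
        infosByUser(xml).forEach((name, infos) -> System.out.println(name + " -> " + infos));
    }
}
```

This ties the code to the exact nesting (`userdocs` > `user` > `info`), which is the trade-off the XPath approach avoids.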
Then you can retrieve the elements you want like this:
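import java.util.List;
import org.jdom2.*;               // Document, Element, Namespace
import org.jdom2.filter.Filters;
import org.jdom2.xpath.*;         // XPathFactory, XPathExpression

public static List<Element> getElements(String xpath, Document doc, Namespace ns) {
    XPathFactory xFactory = XPathFactory.instance();
    XPathExpression<Element> expr = xFactory.compile(xpath, Filters.element(), null, ns);
    return expr.evaluate(doc);
}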
// a sample caller of the method
getElements("//user", originalDoc, namespace).forEach(el -> {
    // your processing
});
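If you would rather avoid the JDOM2 dependency, the same generic helper can be sketched with the JDK's `javax.xml.xpath` API. This is a swapped-in technique, not the JDOM2 code above: the method name mirrors `getElements` for readability but is hypothetical, and namespaces are ignored, so it assumes an un-namespaced document like the one shown earlier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkXPathHelper {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    /** Evaluates an XPath expression against doc and returns the element nodes it selects. */
    static List<Element> getElements(String xpath, Document doc) {
        NodeList nodes;
        try {
            nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, ex);
        }
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i) instanceof Element) {
                elements.add((Element) nodes.item(i)); // keep elements, skip attribute/text nodes
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        Document doc = parse("<userdocs><user name=\"textxtxtxtx\"><info>a</info></user>"
                + "<user name=\"test2\"><info>b</info></user></userdocs>");
        getElements("//user", doc).forEach(el -> {
            // your processing
            System.out.println(el.getAttribute("name"));
        });
    }
}
```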
All it takes to retrieve the user `textxtxtxtx` with all of its info children is this expression: //user[@name='textxtxtxtx']
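The predicate can be tried out with the JDK's `javax.xml.xpath` API, again swapped in for JDOM2. Appending `/info` to the expression selects just that user's info children instead of the user element itself. A minimal sketch with a hypothetical class and method name; it assumes the user name contains no apostrophe, because the name is spliced into a single-quoted XPath string literal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UserPredicate {
    /** Returns the text of every <info> child of the user whose name attribute equals userName. */
    static List<String> infoTextsOf(String xml, String userName) {
        // Assumes userName contains no apostrophe, since it is spliced into a single-quoted XPath literal.
        String expr = "//user[@name='" + userName + "']/info";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList infos = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < infos.getLength(); i++) {
                texts.add(infos.item(i).getTextContent());
            }
            return texts;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not evaluate " + expr, ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<userdocs>"
                + "<user name=\"textxtxtxtx\"><info>part 1</info><info>part 2</info></user>"
                + "<user name=\"test2\"><info>other</info></user>"
                + "</userdocs>";
        System.out.println(infoTextsOf(xml, "textxtxtxtx"));
    }
}
```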
A list of XPath expressions and their meanings can be found here: Tester / Evaluator / Samples