I am trying to parse HTML for specific data but am having trouble with return characters, or at least I think that is the problem. I am using a simple substring approach to take the HTML apart, since I know beforehand what I am looking for.
Here is my parse method:
public static void parse(String response, String[] hashItem, String[][] startEnd) throws Exception
{
    for (int i = 0; i < hashItem.length; i++)
    {
        // Everything after the start marker for this item.
        String part = response.substring(response.indexOf(startEnd[i][0]) + startEnd[i][0].length());
        // Everything before the end marker is the value.
        String value = part.substring(0, part.indexOf(startEnd[i][1]));
        DATABASE.setHash(hashItem[i], value);
    }
}
Here is a sample of the HTML that is giving me issues:
<table cellspacing=0 cellpadding=2 class=smallfont>
<tr onclick="lu();" onmouseover="style.cursor='hand'">
<td class=bodybox nowrap> 21,773,177,147 $ </td><td></td>
<td class=bodybox nowrap> 629,991,926 F </td><td></td>
<td class=bodybox nowrap> 24,537 P </td><td></td>
<td class=bodybox nowrap> 0 T </td>
<td></td><td class=bodybox nowrap> RT </td>
There are hidden return characters, but when I try to include them in the strings I am searching for, it works poorly, if at all. Is there a method, or perhaps a better way, to strip hidden characters from the HTML to make it easier to parse? Any help is greatly appreciated, as always.
If you want to make parsing very easy, try Jsoup:
This example downloads the page, parses it, and gets the text of each matching cell.
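Document doc = Jsoup.connect("http://jsoup.org").get();
// Select every <td> element whose class is "bodybox".
Elements tds = doc.select("td.bodybox");
for (Element td : tds) {
    // text() returns the cell's text with whitespace normalized and trimmed.
    String tdText = td.text();
}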
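If adding a library is not an option, one possible way to keep the substring approach from the question is to collapse every run of whitespace (including the hidden \r and \n characters) into a single space before searching. A marker that spans a line break can then be written with an ordinary space. This is only a rough sketch using the standard library: the class name MarkerExtractor and the helpers normalize and between are made up for illustration, and the sample markers are guesses based on the HTML in the question.

```java
import java.util.Optional;

public class MarkerExtractor {

    // Collapse every run of whitespace (spaces, tabs, \r, \n) into one space.
    // Note: \s does not match non-breaking spaces (U+00A0) by default.
    static String normalize(String html) {
        return html.replaceAll("\\s+", " ").trim();
    }

    // Return the trimmed text between the first occurrence of start and the
    // next occurrence of end, or an empty Optional if either marker is missing.
    static Optional<String> between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return Optional.empty();
        }
        from += start.length();
        int to = text.indexOf(end, from);
        if (to < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(from, to).trim());
    }

    public static void main(String[] args) {
        // Sample input modeled on the question, with explicit \r\n line breaks.
        String html = "<tr>\r\n"
                + "<td class=bodybox nowrap> 21,773,177,147 $ </td><td></td>\r\n"
                + "<td class=bodybox nowrap> 629,991,926 F </td><td></td>\r\n"
                + "</tr>";
        String clean = normalize(html);

        // Prints 21,773,177,147
        System.out.println(
                between(clean, "<td class=bodybox nowrap>", "$ </td>").orElse("not found"));

        // The start marker crosses a line break in the raw HTML; after
        // normalizing, the break is a single space. Prints 629,991,926
        System.out.println(
                between(clean, "</td><td></td> <td class=bodybox nowrap>", "F </td>").orElse("not found"));
    }
}
```

Returning an Optional instead of calling substring directly avoids the StringIndexOutOfBoundsException that substring throws when indexOf returns -1 for a marker that is not found.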