I am writing a scraper to extract product data from Amazon, but I am not sure how to normalize the whitespace in the text I pull out of the HTML.
    use scraper::{Html, Selector};

    fn parse_html(html: &str) {
        let fragment = Html::parse_fragment(html);
        // Product titles are matched by this CSS class.
        let product_title = Selector::parse(".s-line-clamp-2").unwrap();
        for title in fragment.select(&product_title) {
            // text() yields every text node under the element, whitespace included.
            let title_txt = title.text().collect::<Vec<_>>();
            println!("{:?}", title_txt);
        }
    }
This works, but the data I get looks like this: ["\n \n \n \n\n\n\n\n", "\n \n \n \n ", "Men\'s Sneakers", "\n \n \n \n \n", "\n\n \n"]
I only want this: ["Men\'s Sneakers"]
You can use trim to remove whitespace from both ends of each string, and filter to drop the strings that are empty after trimming, before collecting into your vector:
    let title_txt = title
        .text()
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
        .collect::<Vec<_>>();
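Note that trim only touches the ends of each string. If a title might also contain runs of spaces or newlines in the middle, one possible variation is to collapse them with split_whitespace from the standard library. The sketch below uses only std so it can be tried without the scraper crate. The helper name normalize_fragments is made up for illustration, and it returns owned Strings rather than borrowed slices:

```rust
/// Hypothetical helper: normalizes a sequence of text fragments.
/// Each fragment has leading/trailing whitespace removed and internal
/// whitespace runs collapsed to a single space; empty results are dropped.
fn normalize_fragments<'a, I>(fragments: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    fragments
        .into_iter()
        // split_whitespace skips all whitespace, so joining the pieces
        // with one space both trims and collapses in a single step.
        .map(|s| s.split_whitespace().collect::<Vec<_>>().join(" "))
        // Fragments that were whitespace-only are now empty strings.
        .filter(|s| !s.is_empty())
        .collect()
}
```

Assuming the same iterator as above, title.text() could be passed straight in, e.g. normalize_fragments(title.text()). For the fragments shown in the question, this should leave a single entry, "Men's Sneakers".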