How to remove all whitespace and newline characters like "\n" from my HTML in Rust?

Tags:

text

rust

I am writing a scraper to extract product data from Amazon, but I am not sure how to normalize the whitespace in the HTML.

```rust
use scraper::{Html, Selector};

fn parse_html(html: &str) {
    let fragment = Html::parse_fragment(html);
    let product_title = Selector::parse(".s-line-clamp-2").unwrap();

    // Print every text node found inside each matching title element.
    for title in fragment.select(&product_title) {
        let title_txt = title.text().collect::<Vec<_>>();
        println!("{:?}", title_txt);
    }
}
```

This works, but the output I get looks like this: ["\n \n \n \n\n\n\n\n", "\n \n \n \n ", "Men's Sneakers", "\n \n \n \n \n", "\n\n \n"]

I only want: ["Men's Sneakers"]

asked Jan 31 '26 03:01 by Liam Seskis


1 Answer

You can use `trim` to strip leading and trailing whitespace (including "\n") from each text fragment, and `filter` to drop the fragments that are empty afterwards, before collecting into the vector:

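```rust
let title_txt = title
    .text()
    .map(|s| s.trim())          // strip leading/trailing whitespace, including "\n"
    .filter(|s| !s.is_empty())  // drop fragments that were only whitespace
    .collect::<Vec<_>>();
```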
answered Feb 03 '26 00:02 by Jmb
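The answer's snippet relies on the `scraper` crate (`Html`, `Selector`, `ElementRef::text`). Below is a minimal, std-only sketch of the same trim-and-filter idea, assuming the text fragments arrive as any iterator of `&str` (a fixed array modelled on the question's output stands in for `title.text()`). `clean_fragments` and `normalized_text` are hypothetical helper names. The second helper goes one step further and also collapses whitespace *inside* the text with `split_whitespace`, which may help if a title contains line breaks in the middle.

```rust
/// Hypothetical helper: trims each fragment and drops the ones left empty.
/// `parts` stands in for the iterator returned by scraper's `title.text()`.
fn clean_fragments<'a, I>(parts: I) -> Vec<&'a str>
where
    I: IntoIterator<Item = &'a str>,
{
    parts
        .into_iter()
        .map(str::trim)              // remove leading/trailing whitespace
        .filter(|s| !s.is_empty())   // skip whitespace-only fragments
        .collect()
}

/// Hypothetical helper: joins all fragments into one string, collapsing
/// every run of whitespace (spaces, "\n", tabs, ...) into a single space.
fn normalized_text<'a, I>(parts: I) -> String
where
    I: IntoIterator<Item = &'a str>,
{
    parts
        .into_iter()
        .flat_map(str::split_whitespace) // split each fragment into words
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Sample fragments modelled on the output shown in the question.
    let parts = [
        "\n \n \n \n\n\n\n\n",
        "\n \n \n \n ",
        "Men's Sneakers",
        "\n \n \n \n \n",
        "\n\n \n",
    ];

    println!("{:?}", clean_fragments(parts));
    println!("{:?}", normalized_text(parts));
}
```

Run on the sample, this prints `["Men's Sneakers"]` followed by `"Men's Sneakers"`. Because `title.text()` already yields `&str` items, either helper could be called directly inside the original loop, e.g. `clean_fragments(title.text())`.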