
lxml -- how to change img src to absolute link

Using lxml, how do you globally replace all src attributes with an absolute link?

Walrus the Cat asked Dec 01 '25 21:12


2 Answers

Here is example code that also covers <a href>:

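from urllib.parse import urljoin  # Python 3; on Python 2 this lived in the urlparse module

from lxml import etree, html


def fix_links(content, absolute_prefix):
    """
    Rewrite relative links to be absolute links based on a given URL.

    @param content: HTML snippet as a string
    @param absolute_prefix: base URL that relative links are resolved against
    """

    # Accept UTF-8 encoded bytes as well as text
    if isinstance(content, bytes):
        content = content.decode("utf-8")

    content = content.strip()

    # Wrap the snippet in a parent <div> so multiple top-level elements parse
    tree = html.fragment_fromstring(content, create_parent=True)

    def join(base, url):
        """
        Join relative URL
        """
        if not (url.startswith("/") or "://" in url):
            return urljoin(base, url)
        else:
            # Root-relative ("/...") or already absolute: left unchanged
            return url

    # Rewrite every src attribute (img, script, iframe, ...)
    for node in tree.xpath('//*[@src]'):
        url = node.get('src')
        url = join(absolute_prefix, url)
        node.set('src', url)

    # Rewrite every href attribute (a, link, ...)
    for node in tree.xpath('//*[@href]'):
        href = node.get('href')
        url = join(absolute_prefix, href)
        node.set('href', url)

    # Serialized as UTF-8 bytes, including the wrapping <div>
    data = etree.tostring(tree, pretty_print=False, encoding="utf-8")

    return data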

The full story is available in the Plone developer documentation.
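Note that the join helper above leaves any URL starting with "/" untouched. For comparison, here is a minimal sketch of what urljoin does by itself with the three kinds of URL. It assumes Python 3, where urljoin lives in urllib.parse, and the base URL and paths are made-up values for illustration only:

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for this illustration
base = "http://localhost/foo/bar.html"

# Relative path: resolved against the directory of the base URL
relative = urljoin(base, "img/cat.png")

# Root-relative path: resolved against the host of the base URL
root_relative = urljoin(base, "/img/cat.png")

# URL that already has a scheme: returned unchanged
absolute = urljoin(base, "https://example.com/a.png")

print(relative)
print(root_relative)
print(absolute)
```

So if you want root-relative src values to gain a host as well, it may be enough to call urljoin unconditionally instead of skipping those URLs.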

Mikko Ohtamaa answered Dec 04 '25 13:12


I'm not sure when this was added, but documents created from lxml.html.fromstring() now have a method called make_links_absolute. From the documentation:

make_links_absolute(base_href, resolve_base_href=True):

This makes all links in the document absolute, assuming that base_href is the URL of the document. So if you pass base_href="http://localhost/foo/bar.html" and there is a link to baz.html that will be rewritten as http://localhost/foo/baz.html.

If resolve_base_href is true, then any <base href> tag will be taken into account (just calling self.resolve_base_href()).
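A minimal sketch of how that might look in practice, reusing the base URL from the quoted documentation. The HTML snippet is a made-up example, and the base URL is passed positionally so the sketch does not depend on the parameter name:

```python
import lxml.html

# Made-up snippet containing a relative img src and a relative link
snippet = '<div><img src="img/cat.png"><a href="baz.html">baz</a></div>'

doc = lxml.html.fromstring(snippet)

# Rewrites links in place, treating the argument as the document's own URL
doc.make_links_absolute("http://localhost/foo/bar.html")

srcs = doc.xpath(".//img/@src")
hrefs = doc.xpath(".//a/@href")

print(srcs)
print(hrefs)
print(lxml.html.tostring(doc, encoding="unicode"))
```

This touches href as well as src, so if you only want img src changed, a targeted XPath loop like the one in the other answer is probably the better fit.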

Mattwmaster58 answered Dec 04 '25 11:12


