Strip URL in Python

Question

I'm quite new to Python. I'm trying to parse a file of URLs and keep only a specific part (shown in bold) of each URL.

Here are some examples of the URLs I am working with:

http://www.mega.pk/**washingmachine**-dawlance/
http://www.mega.pk/**washingmachine**-haier/
http://www.mega.pk/**airconditioners**-acson/
http://www.mega.pk/**airconditioners**-lg/
http://www.mega.pk/**airconditioners**-samsung/

I have tried some regular expressions, but they get very complicated. What I have in mind is to remove "http://www.mega.pk/" from all the URLs, since it is common to them, and then remove everything after the "-", including all slashes. But I don't know how to do it.

narendranathjoshi · Accepted Answer

Use the urllib.parse module (called urlparse in Python 2). It's built specifically for this purpose.

from urllib.parse import urlparse

url = "http://www.mega.pk/washingmachine-dawlance/"

path = urlparse(url).path  # get the path from the URL ("/washingmachine-dawlance/")
path = path[:path.index("-")]  # keep everything before the first '-' (raises ValueError if there is no '-')
path = path[1:]  # remove the '/' at the start of the path (just before 'washing')

The path variable will now hold the value washingmachine.
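Since the question mentions a whole file of URLs, the same steps can be wrapped in a small function and applied to each one. This is only a sketch: the function name category_from_url and the sample list are made up for illustration, and it uses str.partition instead of index so that a URL without a "-" does not raise an error.

```python
from urllib.parse import urlparse


def category_from_url(url):
    """Return the part of the first path segment before the first '-'."""
    # Hypothetical helper, not part of urllib.
    path = urlparse(url.strip()).path        # e.g. "/washingmachine-dawlance/"
    segment = path.strip("/").split("/")[0]  # first path segment, no slashes
    # partition never raises: without a '-', the whole segment is returned
    return segment.partition("-")[0]


# Sample data; in practice these lines would be read from the file of URLs.
urls = [
    "http://www.mega.pk/washingmachine-dawlance/",
    "http://www.mega.pk/airconditioners-lg/",
]
categories = [category_from_url(u) for u in urls]
print(categories)  # ['washingmachine', 'airconditioners']
```

If the URLs live in a text file with one per line, iterating over the open file object and calling the function on each line should work the same way, because the function strips the trailing newline before parsing.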

Cheers!

Tags: python, regex, url, strip

Mansoor Akram
