I have noisy data, something like:
<@ """@$ FSDF >something something <more noise>
Now I just want to extract "something something". 
Is there a way to delete the text between the two delimiters "<" and ">"?
To eliminate text before a given character, type the character preceded by an asterisk (*char). To remove text after a certain character, type the character followed by an asterisk (char*). To delete a substring between two characters, type an asterisk surrounded by the two characters (char*char).
If you want to remove the [] and the () you can use this code:
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub("[\(\[].
Using str.replace(), we can replace a specific character or substring. To remove it instead, replace it with an empty string. By default, str.replace() replaces all occurrences of the given substring.
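As a small illustration of the str.replace() behaviour described above, here is a sketch on a made-up string (the `<tag>` marker and the variable names are assumptions for the example, not part of the question). Note that this only works when the text to remove is fixed and known in advance; it cannot remove "whatever sits between < and >".

```python
# Hypothetical input: the same fixed marker appears twice.
s = "<tag>something something<tag>"

# Replace every occurrence of the marker with an empty string.
cleaned = s.replace("<tag>", "")

# The optional third argument limits how many occurrences are replaced.
first_only = s.replace("<tag>", "", 1)

print(cleaned)
print(first_only)
```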
If you want to replace text that matches a regular expression (regex) rather than an exact string, use sub() from the re module. In re.sub(), pass the regex pattern as the first argument, the replacement string as the second, and the string to be processed as the third.
Use regular expressions:
>>> import re
>>> s = '<@ """@$ FSDF >something something <more noise>'
>>> re.sub('<[^>]+>', '', s)
'something something '
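The result above still carries a trailing space. A minimal sketch of wrapping the same idea in a helper that also strips the leftover whitespace might look like this. The function name `remove_between` and its parameters are hypothetical, and it assumes single-character delimiters, since the negated character class only excludes one closing character.

```python
import re

def remove_between(text, open_delim="<", close_delim=">"):
    """Delete every open_delim...close_delim span, then trim whitespace.

    Assumes each delimiter is a single character (hypothetical helper).
    """
    # re.escape keeps delimiters such as "[" or "(" from being read as regex syntax.
    pattern = (
        re.escape(open_delim)
        + "[^" + re.escape(close_delim) + "]*"
        + re.escape(close_delim)
    )
    return re.sub(pattern, "", text).strip()

s = '<@ """@$ FSDF >something something <more noise>'
result = remove_between(s)
print(repr(result))
```

The same helper can be pointed at other delimiter pairs, for example `remove_between("[x] keep (y)", "[", "]")`.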
[Update]
If you tried a pattern like <.+>, where the dot means any character and the plus sign means one or more, you know it does not work.
>>> re.sub(r'<.+>', '', s)
''
Why? Because regular expression quantifiers are "greedy" by default: .+ grabs as many characters as it can, so the match runs from the first < to the last > in the string, swallowing every > in between, and this is not what we want. We want to match < and stop at the next >, so we use the [^x] pattern, which means "any character but x" (x being >).
The ? operator turns the match "non-greedy", so this has the same effect:
>>> re.sub(r'<.+?>', '', s)
'something something '
The first version is more explicit; this one is less typing. Be aware that ? only means "non-greedy" when it follows a quantifier such as + or *; on its own, x? means zero or one occurrence of x.
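To see what each pattern actually matches, here is a short sketch using re.findall on the string from the question. It also shows one practical difference between the two working patterns: the dot does not match a newline unless re.DOTALL is passed, while [^>] does. The `multiline` string is a made-up example.

```python
import re

s = '<@ """@$ FSDF >something something <more noise>'

greedy = re.findall(r'<.+>', s)      # one match spanning the whole string
lazy = re.findall(r'<.+?>', s)       # stops at the nearest >
negated = re.findall(r'<[^>]+>', s)  # cannot cross a > at all

print(greedy)
print(lazy)
print(negated)

# Hypothetical input with a newline inside the delimiters.
multiline = '<a\nb>text'
lazy_result = re.sub(r'<.+?>', '', multiline)                      # dot stops at \n
negated_result = re.sub(r'<[^>]+>', '', multiline)                 # [^>] matches \n
dotall_result = re.sub(r'<.+?>', '', multiline, flags=re.DOTALL)   # dot matches \n

print(repr(lazy_result), repr(negated_result), repr(dotall_result))
```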
Of course, you can use regular expressions.
import re
s = '<@ """@$ FSDF >something something <more noise>'  # your string here
t = re.sub('<.*?>', '', s)
The above code should do it.
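This answer uses * where the earlier one used +. The two behave the same on the string from the question, but they differ when an empty pair <> appears. A small sketch on a made-up input (the `sample` string is an assumption for illustration):

```python
import re

# Hypothetical input containing an empty <> pair followed by a non-empty one.
sample = 'a<>b<c>d'

# * allows zero characters, so the empty pair is removed on its own.
star_lazy = re.sub(r'<.*?>', '', sample)

# + needs at least one character, so the dot consumes the first > and the
# match stretches to the next one, taking "b" with it.
plus_lazy = re.sub(r'<.+?>', '', sample)

# The negated class with + skips the empty pair and leaves it in place.
plus_negated = re.sub(r'<[^>]+>', '', sample)

print(star_lazy, plus_lazy, plus_negated)
```

If empty pairs can occur in the data, `<.*?>` or `<[^>]*>` is probably the safer choice.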