I have an error: UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 266-266: Non-BMP character not supported in Tk
I'm parsing some data, and some emoji end up in it. data = "this variable contains some emoji'sツ😂" I want: data = "this variable contains some emoji's"
How can I remove these characters from my data, or otherwise handle this situation, in Python 3?
If the goal is just to remove all characters above '\uFFFF', the straightforward approach is to do just that:
data = "this variable contains some emoji'sツ😂"
# Keep only characters in the Basic Multilingual Plane (code points up to U+FFFF).
data = ''.join(c for c in data if c <= '\uFFFF')
It's possible your string is in decomposed form, so you may need to normalize it to composed form first so the non-BMP characters are identifiable:
import unicodedata
data = ''.join(c for c in unicodedata.normalize('NFC', data) if c <= '\uFFFF')
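As a rough sketch of the same idea wrapped in a reusable function: `strip_non_bmp` and its `replacement` parameter are hypothetical names introduced here for illustration, not part of any library. It uses a regular expression instead of the generator expression, which also makes it easy to substitute a placeholder rather than delete the character.

```python
import re
import unicodedata

# Matches any code point outside the Basic Multilingual Plane (above U+FFFF).
_NON_BMP = re.compile(r'[^\u0000-\uFFFF]')

def strip_non_bmp(text, replacement=''):
    """Hypothetical helper: normalize to NFC, then replace non-BMP characters."""
    return _NON_BMP.sub(replacement, unicodedata.normalize('NFC', text))

data = "this variable contains some emoji'sツ😂"
cleaned = strip_non_bmp(data)
print(cleaned)                        # this variable contains some emoji'sツ
print(strip_non_bmp(data, '\uFFFD'))  # same text, with U+FFFD in place of the emoji
```

Note that ツ (U+30C4) lies inside the BMP, so it survives this filter and Tk can display it; only 😂 (U+1F602) is removed. If the katakana character should go as well, a stricter filter would be needed, for example one based on `unicodedata.category` or an ASCII-only check, depending on what the data is expected to contain.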