 

Strange Python regex behavior: maybe connected to Unicode or SQLAlchemy?

I'm trying to search for a pattern in SQLAlchemy results (actually, to filter with 'like' or op('regexp')(pattern), which I believe is implemented with regex somewhere). Both the string and the search string are in Hebrew and, presumably (maybe I'm wrong), Unicode. With r = u'לבן' and c = u'לבן, ורוד, ', re.search(r, c) returns a match object, but then I query the database like this:

f = session.query(classname)
c = f[0].color

and evaluating c at the prompt gives me:

'\xd7\x9c\xd7\x91\xd7\x9f,\xd7\x95\xd7\xa8\xd7\x95\xd7\x93,'

while print(c) shows:

לבן,ורוד,

That is practically the same text, but re.search(r, c) returns None instead of a match object.

Since I suspected a Unicode issue, I tried converting to Unicode with unicode(c), but I get "UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 0: ordinal", which I guess means this is already a Unicode string. So where's the catch? I would prefer to use SQLAlchemy's 'like', but I get the same problem there, even though I know for sure (as my example shows) that the data contains the string.
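Both symptoms can be reproduced without a database. A minimal sketch, assuming Python 3 (the question uses Python 2, where the plain '...' value above is a byte string; Python 3 writes the same bytes as b'...'). The variable names raw and pattern are illustrative only. Note that Python 3 is stricter than Python 2: instead of silently returning None, re refuses to match a text pattern against bytes.

```python
import re

# The value from the question's repr(c), written as a Python 3 bytes literal.
raw = b'\xd7\x9c\xd7\x91\xd7\x9f,\xd7\x95\xd7\xa8\xd7\x95\xd7\x93,'
pattern = "לבן"

# Python 2's unicode(c) without an encoding argument decodes with ASCII.
# The explicit Python 3 equivalent fails on the first non-ASCII byte, 0xd7.
ascii_error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xd7 in position 0: ordinal not in range(128)

# Python 2 compared the pattern's code points with raw byte values and
# simply found no match; Python 3 raises instead of mixing text and bytes.
mix_error = None
try:
    re.search(pattern, raw)
except TypeError as exc:
    mix_error = exc
    print(exc)  # cannot use a string pattern on a bytes-like object
```

So the UnicodeDecodeError is evidence that c holds undecoded bytes, not that it is already Unicode.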

Should I transform the search string or pattern somehow? Is this related to Unicode, or to something else?

The collation of the database table I'm querying is utf8_unicode_ci.

alonisser asked Mar 27 '26 09:03


1 Answer

c = f[0].color

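is not returning a Unicode string (otherwise its repr() would show a u'...' string) but a UTF-8 encoded byte string. That is also why unicode(c) failed: without an explicit encoding, unicode() decodes with the ASCII codec, which cannot handle the byte 0xd7. And re.search() finds nothing because it compares the characters of your Unicode pattern with the individual byte values of c.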
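To check that claim, here is a small Python 3 sketch (an assumption: the question itself is Python 2, so the bytes are written as a b'...' literal copied from the question's repr). It confirms that the stored value is exactly the UTF-8 encoding of the Hebrew text.

```python
# The value from the question's repr(c), as a Python 3 bytes literal.
raw = b'\xd7\x9c\xd7\x91\xd7\x9f,\xd7\x95\xd7\xa8\xd7\x95\xd7\x93,'

# Each of the 7 Hebrew letters here takes two bytes in UTF-8 (lead byte 0xd7),
# and each of the 2 commas takes one byte: 7 * 2 + 2 = 16 bytes.
print(len(raw))              # 16
print(raw.count(b"\xd7"))    # 7

# Encoding the visible text as UTF-8 reproduces exactly these bytes.
print(raw == "לבן,ורוד,".encode("utf-8"))  # True
```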

Try

c = f[0].color.decode("utf-8")

which results in

u'\u05dc\u05d1\u05df,\u05d5\u05e8\u05d5\u05d3,'

or

u'לבן,ורוד,'

if your console can display Hebrew characters.
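Once the value is decoded, the original regex works. A minimal Python 3 sketch of the fix, assuming the same bytes as in the question (raw stands in for f[0].color; no database is involved). The second variant, encoding the pattern instead of decoding the data, is an alternative not mentioned in the answer above.

```python
import re

# Stand-in for f[0].color: UTF-8 bytes, as returned in the question.
raw = b'\xd7\x9c\xd7\x91\xd7\x9f,\xd7\x95\xd7\xa8\xd7\x95\xd7\x93,'
pattern = "לבן"

# Fix 1 (the answer's approach): decode the column value to text first,
# so pattern and subject are both Unicode text.
text = raw.decode("utf-8")
match = re.search(pattern, text)
print(match.group())  # לבן
print(match.span())   # (0, 3), measured in characters

# Fix 2 (alternative): encode the pattern so both sides are UTF-8 bytes.
byte_match = re.search(pattern.encode("utf-8"), raw)
print(byte_match.span())  # (0, 6), measured in bytes
```

Decoding is usually the better choice: with byte patterns, constructs such as . or \w operate on single bytes rather than on Hebrew characters, so only literal patterns behave as expected.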

Tim Pietzcker answered Mar 28 '26 23:03


