example:
>>> uni = u'some text'
>>> print unicode(uni)
some text
>>> print unicode(uni, errors='ignore')
TypeError
Traceback (most recent call last)
----> 1 print unicode(uni, errors='ignore')
TypeError: decoding Unicode is not supported
Why does this blow up only if I pass additional parameters to the constructor?
Looking at the source code,
static PyObject *
unicode_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
PyObject *x = NULL;
static char *kwlist[] = {"object", "encoding", "errors", 0};
char *encoding = NULL;
char *errors = NULL;
if (type != &PyUnicode_Type)
return unicode_subtype_new(type, args, kwds);
if (!PyArg_ParseTupleAndKeywords(args, kwds, "|Oss:str",
kwlist, &x, &encoding, &errors))
return NULL;
if (x == NULL)
_Py_RETURN_UNICODE_EMPTY();
if (encoding == NULL && errors == NULL)
return PyObject_Str(x);
else
return PyUnicode_FromEncodedObject(x, encoding, errors);
}
notice that near the bottom,
if (encoding == NULL && errors == NULL)
return PyObject_Str(x);
else
return PyUnicode_FromEncodedObject(x, encoding, errors);
So when called without the errors parameter, PyObject_Str(x) is called, and this raises no TypeError. But when error and/or encoding is supplied, then PyUnicode_FromEncodedObject is called, and now x must be an encoded string, not a unicode.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With