
Why does setting the logger encoding to UTF-8 write the file with UNIX line endings?

I create a logger that writes to a text file:

import logging

logger_dbg = logging.getLogger("dbg")
logger_dbg.setLevel(logging.DEBUG)
fh_dbg_log = logging.FileHandler('debug.log', mode='w', encoding='utf-8')
fh_dbg_log.setLevel(logging.DEBUG)

# Print time, logger-level and the call's location in a source file.
formatter = logging.Formatter(
    '%(asctime)s-%(levelname)s(%(module)s:%(lineno)d)  %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S')
fh_dbg_log.setFormatter(formatter)

logger_dbg.addHandler(fh_dbg_log)
logger_dbg.propagate = False

Then when I want to log some information I call this logger:

logger_dbg.debug("Closing port...")
logger_dbg.debug("Port closed.")

The problem is that the written log file debug.log uses a single line feed (LF) character as the line ending, even though I am running this program on Windows 7 (64-bit):

2015-11-30 12:39:08-DEBUG(SerialThread:196)  Closing port...
2015-11-30 12:39:08-DEBUG(SerialThread:198)  Port closed.

Strangely, if I create the logger's file handler without the encoding='utf-8' argument, the line ending is correctly written as CR/LF.

Why does setting the encoding to UTF-8 cause Python to use the incorrect newline character?

DBedrenko asked Nov 28 '25 14:11



1 Answer

When you specify an encoding, the Python 2 logging module uses codecs.open() instead of the regular open() call. This function always opens a file in binary mode and implements encoding on top of that. That way it can guarantee that any codec will work, not just ASCII-based codecs. A side effect of this choice is that on Windows, newlines are no longer translated to the platform convention.
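The difference between the two calls can be checked directly. The following is a minimal sketch, not part of the logging module: the helper names written_bytes, open_with_codecs and open_with_io are made up for illustration, and newline="\r\n" is passed to io.open() only to mimic the Windows default so the result is the same on any platform.

```python
import codecs
import io
import os
import tempfile
import warnings


def written_bytes(opener):
    """Write one line through `opener` and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with opener(path) as f:
            f.write(u"Port closed.\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def open_with_codecs(path):
    # codecs.open() always works on a binary file underneath, so "\n" is
    # written as-is. Recent Python 3 releases deprecate it, hence the filter.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return codecs.open(path, "w", encoding="utf-8")


def open_with_io(path):
    # Text mode with newline translation; "\r\n" stands in for the
    # Windows default (assumption made for portability of this demo).
    return io.open(path, "w", encoding="utf-8", newline="\r\n")


codecs_bytes = written_bytes(open_with_codecs)
io_bytes = written_bytes(open_with_io)
print(repr(codecs_bytes))
print(repr(io_bytes))
```

The file written through codecs.open() ends in a bare LF, while the one written through io.open() ends in CR/LF.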

You could file a bug to have this fixed, but a better solution is to use io.open(). The io module is the new Python 3 I/O framework, backported to Python 2, and it handles text modes much better, including translating newlines correctly on Windows.

You can patch the logging.FileHandler._open method to fix this locally:

# Python 2 only: relies on the `unicode` built-in.
import io
from logging import FileHandler

_orig_open = FileHandler._open
_orig_emit = FileHandler.emit

def filehandler_open_patch(self):
    # Use io.open() when an encoding is set, so newlines are translated.
    if self.encoding is not None:
        return io.open(self.baseFilename, self.mode, encoding=self.encoding)
    return _orig_open(self)

def filehandler_emit_patch(self, record):
    # Without an encoding, the original behaviour is fine.
    if not self.encoding:
        return _orig_emit(self, record)
    try:
        msg = self.format(record)
        stream = self.stream
        # io.open() text files accept only unicode, so decode byte strings.
        if not isinstance(msg, unicode):
            msg = msg.decode('ASCII', 'replace')
        ufs = u'%s\n'
        stream.write(ufs % msg)
        self.flush()
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        # Bare except mirrors the standard library's own emit().
        self.handleError(record)

FileHandler._open = filehandler_open_patch
FileHandler.emit = filehandler_emit_patch

The FileHandler.emit() method also needs to be patched, because otherwise Unicode messages are first encoded to UTF-8, and io.open() file objects accept only Unicode objects.
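If patching FileHandler globally is undesirable, the same idea can be expressed as a subclass. This is only a sketch under the assumption that it runs on Python 3, where messages are already text and emit() needs no change. CRLFFileHandler, demo and the logger name crlf_demo are hypothetical names, and newline="\r\n" forces CR/LF on every platform rather than following the platform convention.

```python
import io
import logging
import os
import tempfile


class CRLFFileHandler(logging.FileHandler):
    """Hypothetical handler that always writes CR/LF line endings."""

    def _open(self):
        # Open in text mode; "\n" written by the handler becomes "\r\n".
        return io.open(self.baseFilename, self.mode,
                       encoding=self.encoding, newline="\r\n")


def demo():
    """Log two messages to a temporary file and return its raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    logger = logging.getLogger("crlf_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = CRLFFileHandler(path, mode="w", encoding="utf-8")
    logger.addHandler(handler)
    try:
        logger.debug("Closing port...")
        logger.debug("Port closed.")
    finally:
        logger.removeHandler(handler)
        handler.close()
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data


log_bytes = demo()
print(repr(log_bytes))
```

Only loggers that use this handler are affected, which keeps the change local compared with replacing methods on FileHandler itself.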

Martijn Pieters answered Dec 01 '25 07:12



