Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Why BOM in logging message?

61 views
Skip to first unread message

Roy Smith

unread,
Jan 9, 2013, 5:54:58 PM1/9/13
to
We've got 10 (supposedly) identical servers, all running Ubuntu 12.04,
Python 2.7, Django 1.3. We log to syslog using the logging module and
a custom fomatter.

'formatters': {
'verbose': {
'format': '%(asctime)s [%(process)d]: %(program)s %(session_id)s %(request_id)s %(request_id_digest)s %(remote_addr)s %(name)s %(level\
name)s %(funcName)s() %(message)s',
'()': 'songza.logging.ContextFormatter',
},
},

There's nothing particularly exciting in the formatter code:

class ContextFormatter(logging.Formatter):
def format(self, record):
record.program = context.get_program()
record.request_id = context.get_request_id()
record.request_id_digest = context.get_request_id_digest()
record.session_id = context.get_session_id() or '-'
record.remote_addr = context.get_remote_addr() or '-'
return logging.Formatter.format(self, record)

What's weird is that two of the servers, and only those two, stick a
BOM (Byte Order Mark) in front of the message they log. It shows up
in syslog as:

2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]

The other machines, never put the BOM in. Given that all 10 machines
are ostensibly clones of each other, we're scratching our heads what
might be different about those two which cause the BOMs to appear.
Any ideas?

I suppose it's possible it's a syslog config problem and not a Python
problem, but I figured I'd start at the beginning of the chain.

Chris Angelico

unread,
Jan 9, 2013, 6:07:18 PM1/9/13
to pytho...@python.org
On Thu, Jan 10, 2013 at 9:54 AM, Roy Smith <r...@panix.com> wrote:
> What's weird is that two of the servers, and only those two, stick a
> BOM (Byte Order Mark) in front of the message they log.

Could it be this issue you're looking at?

http://bugs.python.org/issue14452

What are the exact Python versions in use? Are the two different
servers running an older revision of Py 2.7?

ChrisA

John Gordon

unread,
Jan 9, 2013, 6:19:14 PM1/9/13
to
In <kcksg2$bkp$1...@panix2.panix.com> r...@panix.com (Roy Smith) writes:

> What's weird is that two of the servers, and only those two, stick a
> BOM (Byte Order Mark) in front of the message they log. It shows up
> in syslog as:

> 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754 [18979]: [etc...]

I worked on an application that would insert a BOM in syslog messages if
the logged message contained unicode, but not if it was plain ascii.

Not sure if this relates to your issue, but it's similar enough that it
seemed worth mentioning.

--
John Gordon A is for Amy, who fell down the stairs
gor...@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"

Roy Smith

unread,
Jan 9, 2013, 8:17:02 PM1/9/13
to
In article <kcktti$sbg$1...@reader1.panix.com>,
John Gordon <gor...@panix.com> wrote:

> In <kcksg2$bkp$1...@panix2.panix.com> r...@panix.com (Roy Smith) writes:
>
> > What's weird is that two of the servers, and only those two, stick a
> > BOM (Byte Order Mark) in front of the message they log. It shows up
> > in syslog as:
>
> > 2013-01-09T00:00:00+00:00 web5.songza.com <U+FEFF>2013-01-0900:00:00,754
> > [18979]: [etc...]
>
> I worked on an application that would insert a BOM in syslog messages if
> the logged message contained unicode, but not if it was plain ascii.
>
> Not sure if this relates to your issue, but it's similar enough that it
> seemed worth mentioning.

That doesn't seem to be it. All messages from web{2,5} have BOMs, no
message from web{1,3,4,6,7,8,9,10} ever does.

I even tried looking at the output of socket.gethostname() on the
various machines to see if maybe the hostname had some unicode character
in it. No joy.

Roy Smith

unread,
Jan 9, 2013, 8:17:43 PM1/9/13
to
In article <mailman.344.1357772...@python.org>,
It sounds like it might be it, but we're running 2.7.3 on all machines.

Roy Smith

unread,
Jan 10, 2013, 11:06:40 AM1/10/13
to
In article <roy-10B053.2...@news.panix.com>,
Well, this is fascinating. It turns out that while all of our
machines report that they're running 2.7.3, they have two different
versions of /usr/lib/python2.7/logging/handlers.py!

-rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
-rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

The April 24th version has the BOM code in SysLogHander.emit():

| # Message is a string. Convert to bytes as required by RFC 5424
| if type(msg) is unicode:
| msg = msg.encode('utf-8')
| if codecs:
| msg = codecs.BOM_UTF8 + msg
| msg = prio + msg

and the August 1st version doesn't:

| # Message is a string. Convert to bytes as required by RFC 5424
| if type(msg) is unicode:
| msg = msg.encode('utf-8')
| msg = prio + msg

Is it possible there's two different 2.7.3 builds that somehow got
packaged by Ubuntu at different times? The pattern of which machines
have the August code and which have the April code correlates with
when we rolled out each server instance.

Chris Angelico

unread,
Jan 10, 2013, 11:16:09 AM1/10/13
to pytho...@python.org
On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <r...@panix.com> wrote:
> Well, this is fascinating. It turns out that while all of our
> machines report that they're running 2.7.3, they have two different
> versions of /usr/lib/python2.7/logging/handlers.py!
>
> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

Ha, that would do it!

I don't have a corresponding system to compare against, but what
package is that file managed by?

$ dpkg -S /usr/lib/python2.7/logging/handlers.py

See if both systems show it as part of the same package, and if so, if
the package is at the same version on each. On my Maverick desktop, I
have 2.6, and the package is python2.6-minimal.

ChrisA

Roy Smith

unread,
Jan 10, 2013, 11:40:40 AM1/10/13
to
>On Fri, Jan 11, 2013 at 3:06 AM, Roy Smith <r...@panix.com> wrote:
>> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
>> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py

Chris Angelico <ros...@gmail.com> wrote:
>$ dpkg -S /usr/lib/python2.7/logging/handlers.py

Yup, on some machines we've got 2.7.3-0ubuntu3, and on others,
2.7.3-0ubuntu3.1 of python2.7-minimal. Details at:

https://launchpad.net/ubuntu/+source/python2.7/2.7.3-0ubuntu3.1

Well, I guess today is dist-upgrade day :-)

Chris Angelico

unread,
Jan 10, 2013, 11:52:17 AM1/10/13
to pytho...@python.org
I love it when everything adds up! The message even cites the specific
change. It's like a cryptic crossword - once you get the answer, you
KNOW it's the answer because suddenly it all makes sense :)

Thanks for bringing a fun problem to solve! It's (unfortunately) a
refreshing change to read a post from someone who knows how to ask
smart questions.

ChrisA

Terry Reedy

unread,
Jan 10, 2013, 6:45:15 PM1/10/13
to pytho...@python.org
On 1/10/2013 11:06 AM, Roy Smith wrote:

> Well, this is fascinating. It turns out that while all of our
> machines report that they're running 2.7.3, they have two different
> versions of /usr/lib/python2.7/logging/handlers.py!
>
> -rw-r--r-- 1 root root 45076 Aug 1 05:39 /usr/lib/python2.7/logging/handlers.py
> -rw-r--r-- 1 root root 45143 Apr 20 2012 /usr/lib/python2.7/logging/handlers.py
>
> The April 24th version has the BOM code in SysLogHander.emit():

The 'BOM' you are discussing here is not a true 2 or 4 byte
byte-order-mark, but the pseudo-BOM (pseudo because a stream of single
bytes has no byte order within the single bytes) that Micro$tupid adds
(sometimes) to utf-8 encoded bytes to mark their encoding as utf-8
rather than anything else. In otherwords, it is a non-(anti-)standard
U(nicode)E(ncoding)M(ark). It is actually the utf-8 encoding of the
2-byte BOM, and hence not a single mark! It is really confusing when
people use 'BOM' to refer to a UEM sequence instead of a BOM.

> | # Message is a string. Convert to bytes as required by RFC 5424
> | if type(msg) is unicode:
> | msg = msg.encode('utf-8')
> | if codecs:
> | msg = codecs.BOM_UTF8 + msg
> | msg = prio + msg

2.7.3 was released in April.

> and the August 1st version doesn't:
>
> | # Message is a string. Convert to bytes as required by RFC 5424
> | if type(msg) is unicode:
> | msg = msg.encode('utf-8')
> | msg = prio + msg

The issue referenced in an earlier messaged was to remove the UEM where
it did not belong.

> Is it possible there's two different 2.7.3 builds that somehow got
> packaged by Ubuntu at different times?

As you discovered, Ubuntu sometimes updates a release with bug patches
before we release a new official bugfix release. 2.7.4, with many
bugfixes, is still to see the light of day.

> The pattern of which machines
> have the August code and which have the April code correlates with
> when we rolled out each server instance.

Great detective work ;-).

--
Terry Jan Reedy

0 new messages