issue with Backslash plague and re.split

Question

Why is the code described below cutting off the last character of some fields?

I have a string that I need to parse, split up, and then import as key/value pairs into a dict. The issue is that one field may contain multiple embedded comma-separated subfields, but in those cases there are three backslashes in front of the comma. I have the code 99% working, but for some reason the following code (which I think SHOULD work) strips the last character from all the other fields. I think I understand the "backslash plague" in Python regex, and I have tried this several ways, but I cannot find a pattern that both leaves "ConfigChangeData" unsplit and doesn't drop the last character of the other fields.

First, here is the string I start with (in a variable called data):

2015-10-05 18:08:47,186 root         INFO     <181>Oct  5 17:09:10 someservername Administrative_and_Operational_Audit 0000419602 1 0 2015-10-05 17:09:10.841 -05:00 0000006065 52001 NOTICE Configuration-Changes: Changed configuration, Version=someversion.x86_64, ConfigVersionId=150, AdminInterface=GUI, AdminIPAddress=192.168.1.77, AdminSession=46CE916D0502A641592B105FF7CB3B70, AdminName=admin, ConfigChangeData='RADIUS:Shared Secret'='********'\\,'TACACS+:Shared Secret'='********'\\,'IP Address'='127.0.0.91/32', ObjectType=Network Device, ObjectName=testclient, ObjectId=4072, inLocalMode=false,

Here is my code:

## Split the syslog data into comma-separated values in a list.
# Here be dragons: one field, "ConfigChangeData", can have multiple embedded
# subfields. This is indicated by three trailing backslashes.
# The following line needs to split on commas NOT preceded by a backslash.
csvlist=re.split("[^\\\\],", data)
AVPdict=dict()
## Create an attribute/value pair by analysing the CSV values.
## If the CSV value represents an AVP (detected by the presence of an = sign),
## add it to the AVP dict.
for csv in csvlist:
    logger.debug("csv: %s" %(csv))
    if re.search("=", csv):
        csv=csv.strip() # clear out some embedded whitespace
        attribute,value=csv.split("=", 1)
        AVPdict[attribute]=value

Here is output from logging:

2015-10-05 18:08:47,189 root         DEBUG    csv:  Version=someversion.x86_6
2015-10-05 18:08:47,190 root         DEBUG    csv:  ConfigVersionId=15
2015-10-05 18:08:47,190 root         DEBUG    csv:  AdminInterface=GU
2015-10-05 18:08:47,190 root         DEBUG    csv: AdminIPAddress=192.168.7  
2015-10-05 18:08:47,191 root         DEBUG    csv:   AdminSession=46CE916D0502A641592B105FF7CB3B7
2015-10-05 18:08:47,191 root         DEBUG    csv:  AdminName=admi
2015-10-05 18:08:47,191 root         DEBUG    csv:  ConfigChangeData='RADIUS:Shared Secret'='********'\\,'TACACS+:Shared Secret'='********'\\,'IP Address'='127.0.0.91/32
2015-10-05 18:08:47,192 root         DEBUG    csv:  ObjectType=Network Devic
2015-10-05 18:08:47,192 root         DEBUG    csv:  ObjectName=testclien
2015-10-05 18:08:47,192 root         DEBUG    csv:  ObjectId=407
2015-10-05 18:08:47,193 root         DEBUG    csv:  inLocalMode=fals
2015-10-05 18:08:47,193 root         DEBUG    csv:

Blckknght · Accepted Answer

Your regex pattern is consuming the last character before your comma because that character is part of the pattern you're splitting on. It's the character matched by the ugly [^\\] bit of the pattern.
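To make the consumption visible, here is a minimal sketch. The sample string is a shortened, made-up stand-in for the real log line, not the original data. It shows that each match of the pattern covers two characters, the one before the comma and the comma itself, so both vanish from the split result.

```python
import re

# Hypothetical, shortened stand-in for the real syslog line.
sample = "AdminInterface=GUI, AdminName=admin, inLocalMode=false"

# The class [^\\] has to match one real character, so that character
# becomes part of the separator along with the comma.
separators = re.findall(r"[^\\],", sample)
print(separators)

# Everything matched by the separator is dropped from the pieces.
pieces = re.split(r"[^\\],", sample)
print(pieces)
```

The last field keeps its final character here only because no comma follows it in this sample.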

I think you want a negative lookbehind. This will let you check that the preceding character was not a backslash without actually including that character in the match.

csvlist=re.split(r"(?<!\\),", data)

Note that I'm using a raw string, so you only need two backslashes rather than the four you were originally using.
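As a rough end-to-end sketch of the lookbehind approach, the snippet below splits a small made-up string and builds the dict. The sample data and the helper name parse_fields are assumptions for illustration only and do not come from the original code. It also checks that the raw-string and ordinary-string spellings of the pattern are the same text.

```python
import re

# Hypothetical sample: the escaped comma inside B must not split the field.
sample = r"A=1, B='x'\,'y', C=3,"

# A raw string needs two backslashes, an ordinary string needs four;
# both spell the same regex.
raw_pattern = r"(?<!\\),"
plain_pattern = "(?<!\\\\),"
same_pattern = raw_pattern == plain_pattern


def parse_fields(text):
    """Split on commas not preceded by a backslash and build a dict."""
    pairs = {}
    for field in re.split(raw_pattern, text):
        field = field.strip()  # drop whitespace around each field
        if "=" in field:  # skip empty or non key=value pieces
            key, value = field.split("=", 1)
            pairs[key] = value
    return pairs


result = parse_fields(sample)
print(result)
```

Because the lookbehind is zero-width, only the comma itself is removed, so every field keeps its last character and the escaped comma stays inside the B value.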

Tags: python, regex, csv

wvunathans
