Question: Why is the code below cutting off the last character of some fields?
I have a string that I need to parse, split up, and then import as key/value pairs into a dict. The issue is that one field may contain multiple embedded comma-separated subfields, but in those cases there are three backslashes in front of the comma. I have the code 99% working, but for some reason the following code (which I think SHOULD work) strips the last character from all the other fields. I think I understand the "backslash plague" in Python regex, and I tried this several ways, but I cannot find a way that both doesn't split "ConfigChangeData" and doesn't drop the last character of the other fields.
First, here is the string I start with (in a variable called data):
2015-10-05 18:08:47,186 root INFO <181>Oct 5 17:09:10 someservername Administrative_and_Operational_Audit 0000419602 1 0 2015-10-05 17:09:10.841 -05:00 0000006065 52001 NOTICE Configuration-Changes: Changed configuration, Version=someversion.x86_64, ConfigVersionId=150, AdminInterface=GUI, AdminIPAddress=192.168.1.77, AdminSession=46CE916D0502A641592B105FF7CB3B70, AdminName=admin, ConfigChangeData='RADIUS:Shared Secret'='********'\\\,'TACACS+:Shared Secret'='********'\\\,'IP Address'='127.0.0.91/32', ObjectType=Network Device, ObjectName=testclient, ObjectId=4072, inLocalMode=false,
Here is my code:
## Split the syslog data into CSVs in a list
# Here be dragons: one field, "ConfigChangeData", can have multiple embedded
# subfields. This is indicated by three trailing backslashes.
# The following line needs to split on commas NOT preceded by a backslash
csvlist=re.split("[^\\\\],", data)
AVPdict=dict()
## Create an attribute/value pair by analysing the CSV values
## If the CSV value represents an AVP (detected by presence of an = sign)
## add it to the AVP dict
for csv in csvlist:
    logger.debug("csv: %s" %(csv))
    if re.search("=", csv):
        csv=csv.strip() # clear out some embedded whitespace
        attribute,value=csv.split("=", 1)
        AVPdict[attribute]=value
Here is output from logging:
2015-10-05 18:08:47,189 root DEBUG csv: Version=someversion.x86_6
2015-10-05 18:08:47,190 root DEBUG csv: ConfigVersionId=15
2015-10-05 18:08:47,190 root DEBUG csv: AdminInterface=GU
2015-10-05 18:08:47,190 root DEBUG csv: AdminIPAddress=192.168.7
2015-10-05 18:08:47,191 root DEBUG csv: AdminSession=46CE916D0502A641592B105FF7CB3B7
2015-10-05 18:08:47,191 root DEBUG csv: AdminName=admi
2015-10-05 18:08:47,191 root DEBUG csv: ConfigChangeData='RADIUS:Shared Secret'='********'\\\,'TACACS+:Shared Secret'='********'\\\,'IP Address'='127.0.0.91/32
2015-10-05 18:08:47,192 root DEBUG csv: ObjectType=Network Devic
2015-10-05 18:08:47,192 root DEBUG csv: ObjectName=testclien
2015-10-05 18:08:47,192 root DEBUG csv: ObjectId=407
2015-10-05 18:08:47,193 root DEBUG csv: inLocalMode=fals
2015-10-05 18:08:47,193 root DEBUG csv:
Answer: Your regex pattern is consuming the last character before your comma because that character is part of the pattern you're splitting on. It's the character matched by the ugly [^\\\\] bit of the pattern.
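To make the consumed character visible, here is a minimal sketch. It assumes a shortened, made-up sample string rather than the full syslog line. `re.findall` with the same pattern lists exactly what each separator match contains.

```python
import re

# Shortened, illustrative sample (not the full syslog line from the question).
sample = "AdminInterface=GUI, AdminName=admin, inLocalMode=false,"

# "[^\\\\]," is the regex [^\\], : one character that is not a backslash,
# followed by a comma. Both characters belong to the separator.
separators = re.findall("[^\\\\],", sample)
print(separators)  # each match is two characters: the stolen letter and the comma

# Because the separator is discarded by re.split, the letter goes with it.
parts = re.split("[^\\\\],", sample)
print(parts)
```

Each separator is two characters long, so every field loses its final character when the separator is thrown away.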
I think you want a negative lookbehind. This will let you check that the preceding character was not a backslash without actually including that character in the match.
csvlist=re.split(r"(?<!\\),", data)
Note that I'm using a raw string, so you only need two backslashes rather than the four you were originally using.
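As a rough end-to-end sketch of the lookbehind approach, the snippet below splits a shortened version of the log line and builds the dict. The trimmed `data` value and the name `avp` are assumptions made for illustration. The parsing loop follows the question's logic but uses a plain `in` test instead of `re.search`.

```python
import re

# Shortened stand-in for the syslog line in the question (assumption).
# Raw strings keep the three literal backslashes before the embedded comma.
data = (r"Configuration-Changes: Changed configuration, AdminName=admin, "
        r"ConfigChangeData='RADIUS:Shared Secret'='********'\\\,"
        r"'IP Address'='127.0.0.91/32', ObjectId=4072, inLocalMode=false,")

# (?<!\\) is a zero-width check: it asserts that the character before the
# comma is not a backslash, but does not make that character part of the match.
fields = re.split(r"(?<!\\),", data)

avp = {}
for field in fields:
    field = field.strip()      # clear out surrounding whitespace
    if "=" in field:           # only attribute=value fields go into the dict
        attribute, value = field.split("=", 1)
        avp[attribute] = value

for attribute, value in avp.items():
    print(attribute, "->", value)
```

With the lookbehind, the other fields keep their final character and the escaped comma inside ConfigChangeData is not treated as a separator.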