
Dynamically double-quote "keys" in text to form a valid JSON string in Python

I'm working with text stored in JS variables on a webpage. I extract the strings with a regex, then parse them into Python objects with json.loads().

The issue I'm having is the unquoted "keys". Right now, I'm doing a series of replacements (code below) to double-quote each key in each string, but what I want is to dynamically identify any unquoted keys before passing the string into json.loads().

Example 1 with no space after : character

json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:"999-999-9999",lat:99.9999,lng:-99.9999}]'

Example 2 with space after : character

json_data2 = '[{storeName: "testName",address: "12345 Road",address2: "Suite 500",city: "testCity",storeImage: "http://www.testLink.com",state: "testState",phone: "999-999-9999",lat: 99.9999,lng: -99.9999}]'

Example 3 with a space after both the , and : characters

json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: "999-999-9999", lat: 99.9999, lng: -99.9999}]'

Example 4 with space after : character and newlines

json_data4 = '''[
{
    storeName: "testName", 
    address: "12345 Road", 
    address2: "Suite 500", 
    city: "testCity", 
    storeImage: "http://www.testLink.com", 
    state: "testState", 
    phone: "999-999-9999", 
    lat: 99.9999, lng: -99.9999
}]'''

I need a pattern that identifies keys without matching string values that happen to contain similar characters, such as the URL in storeImage. In other words, I want to dynamically find the keys and double-quote them so that json.loads() accepts the string as valid JSON.

I'm currently replacing each key in the text this way:

import re

content = re.sub('storeName:', '"storeName":', content)
content = re.sub('address:', '"address":', content)
content = re.sub('address2:', '"address2":', content)
content = re.sub('city:', '"city":', content)
content = re.sub('storeImage:', '"storeImage":', content)
content = re.sub('state:', '"state":', content)
content = re.sub('phone:', '"phone":', content)
content = re.sub('lat:', '"lat":', content)
content = re.sub('lng:', '"lng":', content)

The result is a string representing valid JSON:

json_data = [{"storeName": "testName", "address": "12345 Road", "address2": "Suite 500", "city": "testCity", "storeImage": "http://www.testLink.com", "state": "testState", "phone": "999-999-9999", "lat": 99.9999, "lng": -99.9999}]

I'm sure there is a better way of doing this, but I haven't been able to find or come up with a regex pattern that handles these cases. Any help is greatly appreciated!

Derrick Brewer asked Sep 07 '25 02:09


2 Answers

That repetition is of course unnecessary. You could put everything into a single regex:

content = re.sub(r"\b(storeName|address2?|city|storeImage|state|phone|lat|lng):", r'"\1":', content)

\1 contains the match within the first (in this case, only) set of parentheses, so "\1": surrounds it with quotes and adds back the colon.

Note the use of a word boundary anchor to make sure we match only those exact words.
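As a minimal sketch of how this single substitution might be used end to end: the helper name quote_known_keys and the shortened sample string are assumptions made for illustration, while the pattern itself is the one given above.

```python
import json
import re

# The alternation of known key names from the answer above.
KEY_PATTERN = re.compile(
    r"\b(storeName|address2?|city|storeImage|state|phone|lat|lng):"
)


def quote_known_keys(text):
    """Wrap each known key in double quotes and put the colon back."""
    return KEY_PATTERN.sub(r'"\1":', text)


# Shortened variant of Example 2 from the question (illustrative only).
sample = (
    '[{storeName: "testName",address2: "Suite 500",'
    'storeImage: "http://www.testLink.com",lat: 99.9999,lng: -99.9999}]'
)

stores = json.loads(quote_known_keys(sample))
print(stores[0]["storeName"])
```

One caveat: the pattern does not know whether it is inside a string value, so a value containing something like city: followed by text would probably be rewritten too. It also only covers keys that are listed in the alternation.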

Tim Pietzcker answered Sep 10 '25 00:09


Something like this should do the job: ([{,]\s*)([^"':]+)(\s*:)

Replace with: \1"\2"\3

Example: https://regex101.com/r/oV0udR/1
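In Python, the pattern and replacement above could be applied roughly as follows. This is a sketch, not a full parser: the helper name quote_keys and the shortened multi-line sample are assumptions for illustration.

```python
import json
import re

# Group 1: a "{" or "," plus any whitespace that follows it.
# Group 2: the bare key (anything up to a quote or colon).
# Group 3: optional whitespace and the colon.
UNQUOTED_KEY = re.compile(r'''([{,]\s*)([^"':]+)(\s*:)''')


def quote_keys(text):
    """Double-quote every bare key that follows "{" or ","."""
    return UNQUOTED_KEY.sub(r'\1"\2"\3', text)


# Shortened variant of Example 4 from the question (illustrative only).
sample = '''[
{
    storeName: "testName",
    storeImage: "http://www.testLink.com",
    lat: 99.9999, lng: -99.9999
}]'''

stores = json.loads(quote_keys(sample))
print(stores[0]["storeImage"])
```

Keys that are already quoted are left alone, because group 2 cannot start with a quote character. The colon in the URL is not touched, since no "{" or "," comes directly before that run of text. A string value that itself contains a comma followed by text and a colon (for example "Suite 500, attn: Bob") would still be rewritten incorrectly, so this approach assumes values do not look like that.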

AdrianEddy answered Sep 10 '25 00:09