By default, the #split method works as follows:
"id,name,title(first_name,last_name)".split(",")
which gives the following output:
["id", "name", "title(first_name", "last_name)"]
But I want the following instead:
["id", "name", "title(first_name,last_name)"]
So I use the following regex (from this answer) with split to get the desired output:
"id,name,title(first_name,last_name)".split(/,(?![^(]*\))/)
But when I use another string, which is my actual input, the logic fails. My actual string is:
"id,name,title(first_name,last_name,address(street,pincode(id,code)))"
and it gives the following output:
["id", "name", "title(first_name", "last_name", "address(street", "pincode(id,code)))"]
rather than
["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
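The reason is visible in the pattern itself: `,(?![^(]*\))` refuses to split on a comma only when a `)` follows it with no `(` in between. In the nested string, every comma except the innermost one has a `(` before the next `)`, so each of them is treated as a split point. A minimal sketch that reproduces both cases (the constant and variable names are illustrative only):

```ruby
# Sketch: reproduce the lookahead-based split on flat and nested input.
LOOKAHEAD_SPLIT = /,(?![^(]*\))/

flat   = "id,name,title(first_name,last_name)"
nested = "id,name,title(first_name,last_name,address(street,pincode(id,code)))"

flat_parts   = flat.split(LOOKAHEAD_SPLIT)
nested_parts = nested.split(LOOKAHEAD_SPLIT)

p flat_parts   # one level of nesting: the inner comma is protected
p nested_parts # deeper nesting: only the innermost comma is protected
```

A single lookahead cannot count how deep the current position is, so the fix has to track nesting depth in some way.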
Updated Answer
Since the earlier answer didn't handle all the cases, as rightly pointed out in the comments, I'm updating the answer with another solution.
This approach replaces each top-level comma (one that is outside all parentheses) with a separator, |, and then splits the string on that separator using String#split. It assumes the input never contains | itself.
class TokenArrayParser
  SPLIT_CHAR = '|'.freeze

  def initialize(str)
    @str = str
  end

  def parse
    separate_on_valid_comma.split(SPLIT_CHAR)
  end

  private

  def separate_on_valid_comma
    dup = @str.dup
    paren_count = 0
    dup.length.times do |idx|
      case dup[idx]
      when '(' then paren_count += 1
      when ')' then paren_count -= 1
      when ',' then dup[idx] = SPLIT_CHAR if paren_count.zero?
      end
    end
    dup
  end
end

%w(
  id,name,title(first_name,last_name)
  id,name,title(first_name,last_name,address(street,pincode(id,code)))
  first_name,last_name,address(street,pincode(id,code)),city(name)
  a,b(c(d),e,f)
  id,name,title(first_name,last_name),pub(name,address)
).each { |str| puts TokenArrayParser.new(str).parse.inspect }
# =>
# ["id", "name", "title(first_name,last_name)"]
# ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
# ["first_name", "last_name", "address(street,pincode(id,code))", "city(name)"]
# ["a", "b(c(d),e,f)"]
# ["id", "name", "title(first_name,last_name)", "pub(name,address)"]
I'm sure this can be optimized more.
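For readers who would rather stay with a regex, as the question does, Ruby's regex engine supports subexpression calls (`\g<name>`), which can match nested parentheses recursively. The following is only a sketch under some assumptions: the names `TOP_LEVEL_TOKEN` and `split_top_level` are made up here, the parentheses are assumed to be balanced (unbalanced input is not detected), and empty fields such as the one in "a,,b" are dropped rather than returned.

```ruby
# Sketch only: assumes balanced parentheses and no empty fields.
# (?<group>...) matches one parenthesized run; \g<group> re-enters the same
# sub-pattern, which lets it match nested parentheses to any depth.
TOP_LEVEL_TOKEN = /(?:[^,()]|(?<group>\((?:[^()]|\g<group>)*\)))+/

def split_top_level(str)
  # String#gsub with a pattern and no block returns an enumerator over the
  # matched substrings; to_a collects them into an array.
  str.gsub(TOP_LEVEL_TOKEN).to_a
end

p split_top_level("id,name,title(first_name,last_name,address(street,pincode(id,code)))")
p split_top_level("first_name,last_name,address(street,pincode(id,code)),city(name)")
```

Instead of describing where to split, the pattern describes what a token looks like: a run of ordinary characters and complete parenthesized groups, so commas inside a group never end a token.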
def doit(str)
  split_here = 0.chr  # NUL character, assumed not to occur in str
  stack = 0           # current parenthesis depth
  s = str.gsub(/./) do |c|
    ret = c
    case c
    when '('
      stack += 1
    when ','
      # Only a comma outside all parentheses becomes a split point.
      ret = split_here if stack.zero?
    when ')'
      raise(RuntimeError, "parens are unbalanced") if stack.zero?
      stack -= 1
    end
    ret
  end
  raise(RuntimeError, "parens are unbalanced, stack at end=#{stack}") if stack > 0
  s.split(split_here)
end
doit "id,name,title(first_name,last_name)"
#=> ["id", "name", "title(first_name,last_name)"]
doit "id,name,title(first_name,last_name,address(street,pincode(id,code)))"
#=> ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
doit "a,b(c(d),e,f)"
#=> ["a", "b(c(d),e,f)"]
doit "id,name,title(first_name,last_name),pub(name,address)"
#=> ["id", "name", "title(first_name,last_name)", "pub(name,address)"]
doit "a,b(c)d),e,f)"
#=> RuntimeError: parens are unbalanced
doit "a,b(c(d),e),f("
#=> RuntimeError: parens are unbalanced, stack at end=1
A comma is split upon if and only if stack (the current parenthesis depth) is zero when the comma is encountered. Such a comma is replaced with a character, split_here, that is assumed not to occur in the string (I used 0.chr). The string is then split on split_here.
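Both answers above rely on a placeholder character (| or 0.chr) that must not appear in the input. If that cannot be guaranteed, the same depth-counting idea can collect the tokens directly. This is a rough sketch, not a drop-in replacement: `split_outside_parens` is a hypothetical name, and unlike String#split it keeps trailing empty fields.

```ruby
# Sketch: split on top-level commas without a placeholder character.
def split_outside_parens(str)
  tokens  = []
  current = +""  # unfrozen buffer for the token being built
  depth   = 0    # current parenthesis depth

  str.each_char do |c|
    case c
    when '('
      depth += 1
    when ')'
      raise ArgumentError, "unbalanced parentheses" if depth.zero?
      depth -= 1
    end

    if c == ',' && depth.zero?
      # A top-level comma ends the current token and is not kept.
      tokens << current
      current = +""
    else
      current << c
    end
  end

  raise ArgumentError, "unbalanced parentheses" unless depth.zero?
  tokens << current
end

p split_outside_parens("id,name,title(first_name,last_name),pub(name,address)")
```

Because nothing is substituted into the string, any character may appear in the input, including | and the NUL character.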