
Ruby String#match fails on a substring copied from the string being matched

I copied a small portion of a large string and tried to match it against the large string. However, match does not return the matched text, and negating the result gives true. Is there something about the match function I am missing, or could there be hidden characters?

times = File.readlines('timesplit')
stringcomp = "created_at : Tue Jul 02 03:30:50 +0000 2013  id : 351905778745094144  id_str : 351905778745094144"
times.each do |t|
 r = t.split('|') 
 timestamp = r[1]
 puts !stringcomp.match(timestamp)
 puts stringcomp.match(timestamp)
end

Below are the contents of timesplit.

Jul_01|created_at : Tue Jul 02 03:30:50 +0000 2013  id :
Jul_02|created_at : Tue Sep 03 05:08:44 +0000 2013  id :
Linus Liang asked Jan 31 '26 17:01


1 Answer

The problem is subtle. String#match expects a regular expression as its parameter and, if it doesn't get one, it tries to turn the parameter into one. From the documentation:

Converts pattern to a Regexp (if it isn’t already one), then invokes its match method on str.

So:

created_at : Tue Jul 02 03:30:50 +0000 2013  id :

is a plain string, not a pattern, when it goes in, so it gets converted to one.

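The problem is the +. In regular expressions, + means one or more of the preceding character, group, or character set. So the " +0000" in the converted pattern means "one or more spaces followed by 0000", which cannot match the literal + that sits between the space and 0000 in stringcomp.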

For the newly created pattern to match your stringcomp literally, it would have to be:

created_at : Tue Jul 02 03:30:50 \+0000 2013  id :

Notice the \+. That means the + is now a literal character, not a quantifier.
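
As a minimal sketch of that difference, the snippet below passes both versions of the pattern to match as plain strings, using the stringcomp from the question. The variable names unescaped_result and escaped_result are only for illustration.

```ruby
stringcomp = "created_at : Tue Jul 02 03:30:50 +0000 2013  id : 351905778745094144  id_str : 351905778745094144"

# Unescaped: " +0000" is read as "one or more spaces, then 0000",
# so the literal "+" in stringcomp is never matched and match returns nil.
unescaped_result = stringcomp.match('created_at : Tue Jul 02 03:30:50 +0000 2013  id :')

# Escaped: "\+" matches a literal plus sign, so match returns MatchData.
# (In a single-quoted Ruby string, \+ stays as a backslash followed by +.)
escaped_result = stringcomp.match('created_at : Tue Jul 02 03:30:50 \+0000 2013  id :')

puts unescaped_result.inspect
puts escaped_result ? escaped_result[0] : 'no match'
```

This also explains the output in the question: nil is falsy, so !nil prints true, and puts nil prints an empty line.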

For visual proof, check these two Rubular tests:

  • Without escaping: http://rubular.com/r/L6Fwrw1ftf
  • Escaped: http://rubular.com/r/SjGAYtzHuS

All that said, the simple fix is to skip match and use a substring search instead. When given a String, String#[] returns that substring if it occurs in the receiver and nil otherwise:

times = [
  'Jul_01|created_at : Tue Jul 02 03:30:50 +0000 2013  id :',
  'Jul_02|created_at : Tue Sep 03 05:08:44 +0000 2013  id :'
]

stringcomp = "created_at : Tue Jul 02 03:30:50 +0000 2013  id : 351905778745094144  id_str : 351905778745094144"
times.each do |t|
  timestamp = t.split('|').last
  puts stringcomp[timestamp] || 'sub-string not found'
end

Which outputs:

created_at : Tue Jul 02 03:30:50 +0000 2013  id :
sub-string not found
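
One caveat when going back to the file from the question: File.readlines keeps the trailing newline on each line by default, so the text after the | would end in "\n" and would not be found in stringcomp either. The sketch below simulates what readlines returns with a hard-coded array (the lines variable is a stand-in for the file, and it assumes the file has no other trailing whitespace) and compares the lookup with and without chomp.

```ruby
# Stand-in for File.readlines('timesplit'): each element keeps its "\n".
lines = [
  "Jul_01|created_at : Tue Jul 02 03:30:50 +0000 2013  id :\n",
  "Jul_02|created_at : Tue Sep 03 05:08:44 +0000 2013  id :\n"
]

stringcomp = "created_at : Tue Jul 02 03:30:50 +0000 2013  id : 351905778745094144  id_str : 351905778745094144"

# Without chomp, the newline is part of the search key, so nothing is found.
raw_hits = lines.map { |line| stringcomp[line.split('|').last] }

# With chomp, the first timestamp is found as a substring.
chomped_hits = lines.map { |line| stringcomp[line.chomp.split('|').last] }

p raw_hits
p chomped_hits
```

On Ruby 2.4 or later, File.readlines('timesplit', chomp: true) strips the line endings while reading, which avoids the explicit chomp call.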

If you want a boolean result instead of the matching substring, you can use:

!!stringcomp[timestamp]

For example:

!!stringcomp['created_at : Tue Jul 02 03:30:50 +0000 2013  id :'] # => true
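
As an alternative to the double negation, String#include? returns true or false directly. This is a small sketch using the two timestamps from the question; found and missing are illustrative names.

```ruby
stringcomp = "created_at : Tue Jul 02 03:30:50 +0000 2013  id : 351905778745094144  id_str : 351905778745094144"

# include? does a plain substring test, so "+" has no special meaning here.
found   = stringcomp.include?('created_at : Tue Jul 02 03:30:50 +0000 2013  id :')
missing = stringcomp.include?('created_at : Tue Sep 03 05:08:44 +0000 2013  id :')

puts found
puts missing
```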

Alternatively, you could call Regexp.escape on your string before passing it to match, but I think that's overkill when a substring search will accomplish what you want.
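
For completeness, here is a rough sketch of the Regexp.escape route, assuming the timestamp has already been split out of the line. Regexp.escape backslash-escapes every character that has special meaning in a regular expression, including the +, so the resulting pattern matches the text literally.

```ruby
stringcomp = "created_at : Tue Jul 02 03:30:50 +0000 2013  id : 351905778745094144  id_str : 351905778745094144"
timestamp  = 'created_at : Tue Jul 02 03:30:50 +0000 2013  id :'

# Escape regex metacharacters so the string is treated as literal text.
escaped_source = Regexp.escape(timestamp)

# match converts the escaped string to a Regexp and now finds the substring.
match = stringcomp.match(escaped_source)

puts match ? match[0] : 'sub-string not found'
```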

the Tin Man answered Feb 02 '26 09:02