Assuming we have a text:
In software, a stack overflow occurs if the call stack pointer exceeds the stack bound. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory.
What I am trying to do is find the two words before and after a specific word (the target). For example, if the target is the word 'start', it should match 'at' 'the' (left) and 'of' 'the' (right). I am using the following method in Ruby, but it returns no matches. Any tips about what to fix in my regex? I have also tried "#{target}" instead of Regexp.escape.
def checkWords(target, text, numLeft = 2, numRight = 2)
  regex = ""
  regex += " (\\S+) " * numLeft
  regex += Regexp.escape(target)
  regex += " (\\S+)" * numRight
  pattern = Regexp.new(regex, Regexp::IGNORECASE)
  matches = pattern.match(text)
  return true if matches
end
Edit:
Regex printed:
(\S+) (\S+) "£52" (\S+) (\S+)
Edit based on Wiktor Stribiżew:
def checkWords(target, text, numLeft = 2, numRight = 2)
  pattern = Regexp.new(/#{"(\\S+) "*numLeft}#{Regexp.escape(target)}#{" (\\S+)"*numRight}/i)
  matches = pattern.match(text)
end
▶ input[/(\S+\s+){,2}start(\s+\S+){,2}/i]
#⇒ "at the start of the"
More generic:
▶ target = 'start'
▶ input[/(\S+\s+){,2}#{Regexp.escape target}(\s+\S+){,2}/i]
#⇒ "at the start of the"
To handle a punctuation mark directly after the target:
▶ target = 'start'
▶ input[/(\S+\s+){,2}#{Regexp.escape target}\p{P}?(\s+\S+){,2}/i]
#⇒ "at the start of the"
Your function might look like:
def checkWords(target, text, numLeft = 2, numRight = 2)
  text =~ /(\S+\s+){,#{numLeft}}#{Regexp.escape target}\p{P}?(\s+\S+){,#{numRight}}/i
end
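As a rough sketch of why the pattern in the question finds nothing, and of one way to read the neighbouring words back individually: repeating " (\\S+) " puts a space on both sides of every group, so the built pattern starts with a space and expects two spaces between the left-hand words. The helper name surrounding_words and its return shape are assumptions made for illustration; they do not come from the question or the answers.

```ruby
# Sentence from the question that contains the target word.
text = "The call stack may consist of a limited amount of address space, " \
       "often determined at the start of the program."

# The question's builder: each left group has a space on both sides, so the
# result begins with a space and has two spaces between the left groups.
broken = " (\\S+) " * 2 + Regexp.escape("start") + " (\\S+)" * 2
puts broken.inspect
p Regexp.new(broken, Regexp::IGNORECASE).match(text) # no match in this text

# Hypothetical helper (an assumption, not the original author's method):
# exactly one whitespace run per word, with one capture group per neighbour.
def surrounding_words(target, text, num_left = 2, num_right = 2)
  left  = "(\\S+)\\s+" * num_left
  right = "\\s+(\\S+)" * num_right
  pattern = Regexp.new(left + Regexp.escape(target) + right, Regexp::IGNORECASE)
  match = pattern.match(text)
  return nil unless match

  words = match.captures
  { left: words.first(num_left), right: words.last(num_right) }
end

p surrounding_words("start", text)
```

Unlike the {,2} quantifier, this variant requires exactly num_left words before and num_right words after the target, and it does not allow punctuation attached to the target, so it returns nil near the edges of the text.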
In the case you're looking at, I think you might be better served by splitting the text on non-word characters and then searching through the splits for your target word. Once you've found it, it's very easy to take the appropriate slices of the array of words in order to get the results you want.
For example:
def check_words(target, text, num_left = 2, num_right = 2)
  # Split the text using the regex /\W+/ (matches non-word characters)
  words = text.split(/\W+/)
  # Iterate over the words in the array
  # Enumerable#each_with_index includes the index, so retrieving the surrounding
  # words is a snap
  words.each_with_index do |word, index|
    if word == target
      # Clamp the start so a target near the beginning of the text does not
      # wrap around to the end of the array through a negative index
      first = [index - num_left, 0].max
      # Make a hash with two Symbol keys and small
      # arrays containing the desired words
      return {
        before: words[first...index],
        # Start at index + 1 so the target itself is not included
        after: words.slice(index + 1, num_right)
      }
    end
  end
  # No match: return nil instead of the words array that each_with_index returns
  nil
end
This can then be called like so:
check_words('start', text)
And it returns a Hash containing the num_left words before and the num_right words after the keyword:
{:before=>["at", "the"], :after=>["of", "the"]}
The {before: ...} syntax is the shorthand introduced in Ruby 1.9 for {:before => ...}; either syntax will work fine.
Also, you may be interested in the Ruby documentation for Regexp, if you've not seen it already.
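If the target can occur more than once, or in a different case, the same split-and-slice idea can be extended. The following is only a sketch under those assumptions; the name all_contexts and the sample sentence are made up for illustration.

```ruby
# Hypothetical helper (an assumption, not part of the answer above): collect
# the context of every occurrence of the target, ignoring case.
def all_contexts(target, text, num_left = 2, num_right = 2)
  # A leading non-word character would produce an empty first element
  words = text.split(/\W+/).reject(&:empty?)
  hits = words.each_index.select { |i| words[i].casecmp?(target) }
  hits.map do |i|
    first = [i - num_left, 0].max # clamp so a negative index never wraps around
    { before: words[first...i], after: words[i + 1, num_right] }
  end
end

sample = "Stack size is set at the start. The stack may grow."
all_contexts("stack", sample).each { |context| p context }
```

One caveat with splitting on /\W+/: symbols such as £ count as non-word characters, so a target like "£52" would never be found this way, and the regex-based approach is the safer choice for such targets.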