Assignment instructions: http://pastebin.com/pxJS4gfR
Objective: Take a collection of documents and generate its inverted index.
My plan
I am using the regular expression /\.I(.*?)\.B/m to grab the text I need from a collection file, as shown here: http://rubular.com/r/mOpfuvRT12
Edit: I have used mudasobwa's suggestion
content = File.read('test.txt')
# deal with content
content.scan(/\.T(.*?)\.B/m) { |mtch|
puts mtch
}
This grabs the text I need. However, I need to place the grabbed text into a Hash to be used later, and I am not sure how to work with the result of String#scan, because it returns an Array of Arrays.
I am basically trying to replicate this example:
puts "Enter something: "
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each do |word|
frequencies[word] += 1
end
frequencies = frequencies.sort_by { |k, v| v }
frequencies.reverse!
frequencies.each do |word, freq|
puts word + " " + freq.to_s
end
You are trying to read the file line by line. In that case the /m multiline modifier makes no sense, because no single line contains both markers. Read the entire file instead and then parse it for whatever you want:
content = File.read('test.txt')
content.scan(/\.T(.*?)\.B/m) { |mtch|
puts mtch
}
UPD
To put the scan results into a hash as in the example, you need either the flatten method of the array:
content = File.read('test.txt')
# flatten the array ⇓⇓⇓⇓⇓⇓⇓
words = content.scan(/\.T(.*?)\.B/m).flatten
words.each …
or a block passed to the scan method:
content = File.read('test.txt')
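# Hash.new(0) makes missing keys start at 0
freqs = Hash.new(0)
content.scan(/\.T(.*?)\.B/m) { |mtch|
# mtch is an array of capture groups; take the captured text itself
freqs[mtch.first] += 1
}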
…
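To see why flatten (or mtch.first) is needed, here is a minimal sketch run against an inline string instead of test.txt. The sample layout (.I, .T and .B markers each on their own line) is an assumption about the collection file, and strip is added only to drop the newlines that the capture group picks up.

```ruby
# Hypothetical sample standing in for the contents of test.txt.
content = <<~TEXT
  .I 1
  .T
  Preliminary Report International
  .B
  CACM December, 1958
  .I 2
  .T
  Fingers or Fists
  .B
  CACM December, 1958
TEXT

# With one capture group, scan returns an array of one-element arrays.
nested = content.scan(/\.T(.*?)\.B/m)

# flatten yields plain strings; strip removes the surrounding newlines.
titles = nested.flatten.map(&:strip)

# The block form receives the same one-element array for each match.
freqs = Hash.new(0)
content.scan(/\.T(.*?)\.B/m) { |mtch| freqs[mtch.first.strip] += 1 }

p nested
p titles
p freqs
```

Both approaches end up with the same strings; the block form just skips the intermediate array.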
UPD2 To split the resulting array of sentences into an array of words:
arr = ["Preliminary Report International", "Fingers or Fists"]
arr.map {|e| e.split(' ')}.flatten.map(&:downcase)
# ⇒ ["preliminary", "report", "international", "fingers", "or", "fists"]
Here the first map iterates over the array elements and transforms each into an array of words, flatten produces a plain array from the resulting array of arrays, and finally downcase is applied because you requested downcased words in your example.
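Putting the pieces together, the frequency count from the question could be sketched as follows. It starts from the two example titles rather than from a file, and uses flat_map as a shorthand for map followed by flatten. The variable names are illustrative only.

```ruby
# Titles as they might come out of scan + flatten (from the example above).
titles = ["Preliminary Report International", "Fingers or Fists"]

# Split each title on whitespace, flatten one level, and downcase.
words = titles.flat_map(&:split).map(&:downcase)

# Count occurrences; Hash.new(0) makes missing keys start at 0.
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }

# Sort by descending count, as in the question's example.
sorted = frequencies.sort_by { |_word, count| -count }
sorted.each { |word, count| puts "#{word} #{count}" }
```

Every word occurs once in this tiny sample, so the order of the printed lines is not meaningful here. With a real collection the most frequent words come first.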
Hope it helps.