I want to compute co-occurring terms using Scala, but I have run into a problem.
This is my code:
val path = "pg100.txt"
val words = sc.textFile(path).map(_.toLowerCase.split("[\\s*$&#/\"'\\,.:;?!\\[\\](){}<>~\\-_]+").map(_.trim).sorted)
val coTerm = words.map { line =>
  for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } {
    ((line(i), line(j)), 1)
  }
}
The expected output should be:
coTerm.collect
res48: Array[Unit] = Array(((word1, word2), 1), ((word1, word3), 1), ((word2, word3), 1)...
But my actual output is the following:
coTerm.collect
res51: Array[Unit] = Array((), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), ()....
I don't understand why I can call println inside .map to print the word pairs, yet the map itself does not emit them as output.
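The cause is that you are not actually returning any records from your map. A for loop without yield runs only for its side effects and evaluates to Unit, which is why println works inside it but every element of the result is ().
Use yield so that the for expression returns the records, as shown below: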
val coTerm = words.map { line =>
  for {
    i <- 0 until line.length
    j <- (i + 1) until line.length
  } yield {
    ((line(i), line(j)), 1)
  }
}
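One further point may be worth checking: with map, each line produces its own collection of pairs, so the result is nested (one collection per line) rather than the flat list of pairs shown in the expected output. The sketch below illustrates the difference using plain Scala collections instead of Spark, so it runs without a cluster. The object name CoOccurrence, the helper pairs, and the sample data are hypothetical and not from the original post. Spark's RDD also provides flatMap, which should behave analogously, but that is an assumption to verify against your Spark version.

```scala
object CoOccurrence {
  // Hypothetical helper (not from the original post):
  // builds ((word_i, word_j), 1) for every index pair i < j in one line.
  def pairs(line: Array[String]): IndexedSeq[((String, String), Int)] =
    for {
      i <- 0 until line.length
      j <- (i + 1) until line.length
    } yield ((line(i), line(j)), 1)

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for the tokenized lines.
    val lines = Seq(Array("a", "b", "c"), Array("x", "y"))

    // Without yield, the for loop is executed only for side effects
    // and its value is Unit, whatever the body computes.
    val noResult: Unit = for (i <- 0 until 3) { (i, i + 1) }

    // map keeps one collection of pairs per input line (nested result).
    val nested = lines.map(pairs)

    // flatMap concatenates the per-line collections into one flat collection.
    val flat = lines.flatMap(pairs)

    println(nested.map(_.size))
    println(flat)
  }
}
```

If a flat collection of pairs is what you need (for example, to count them afterwards), replacing map with flatMap in the corrected code is the likely change.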