I am trying to run a simple MR program using rmr2 in a single node Hadoop cluster. Here is the environment for the setup
Ubuntu 12.04 (32 bit)
R (Ubuntu comes with 2.14.1, so updated to 3.0.2)
Installed the latest rmr2 and rhdfs from here and the corresponding dependencies
Hadoop 1.2.1
Now I am trying to run a simple MR program as
Sys.setenv(HADOOP_HOME="/home/training/Installations/hadoop-1.2.1")
Sys.setenv(HADOOP_CMD="/home/training/Installations/hadoop-1.2.1/bin/hadoop")
library(rmr2)
library(rhdfs)
ints = to.dfs(1:100)
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v))
from.dfs(calc)
The mapreduce job fails with the below error message in hadoop-1.2.1/logs/userlogs/job_201310091055_0001/attempt_201310091055_0001_m_000000_0/stderr
Error in library(functional) : there is no package called ‘functional’
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
But, the sessionInfo()
shows that functional package has been loaded
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i686-pc-linux-gnu (32-bit)
>locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C
>attached base packages:
[1] stats graphics grDevices utils datasets methods base
>other attached packages:
[1] rhdfs_1.0.6 rJava_0.9-4 rmr2_2.3.0 reshape2_1.2.2 plyr_1.8
[6] stringr_0.6.2 **functional_0.4** digest_0.6.3 bitops_1.0-6 RJSONIO_1.0-3
[11] Rcpp_0.10.5
Update : I am able to run a R MR job reading and writing from STDIO without using the rmr2 and the rhdfs libraries as mentioned here. So, for now my guess is that the problem is isolated to rmr2 and the rhdfs packages.
How to get around this problem?
Install the dependencies for rmr2/rhdfs in a system directory instead of a custom directory (~/R/x86_64-pc-linux-gnu-library/3.0). This can be done running R as sudo and then installing the dependencies. Thanks to Antonio for the help in the RHadoop forums.
The most common solution of these kind of problem is re-installation since in sesssionInfo() you are getting
**functional_0.4**
while when i did sessionInfo() i got
functional_0.4
i guess there is some missing dependencies you might be missing so use from your R console
install.packages("functional",dependencies="TRUE")
to fix any problem due to any other packages .
P.S: Choose cloud-0 mirror from the available ones.
If still that does not help i recommend you use r-base-dev as your R version though i don't have a reason to justify this using http://cran.r-project.org/bin/linux/ubuntu/README
sudo apt-get install r-base-dev
Thanks
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With