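- library(rmr2)
- # 50 draws from Binomial(size = 32, prob = 0.4)
- groups <- rbinom(n = 50, size = 32, prob = 0.4)
- groupsdfs <- to.dfs(groups)
- # count how often each value occurs: emit (value, 1), then sum per key
- mapreduceResult <- mapreduce(
-   input = groupsdfs,
-   map = function(., v) keyval(v, 1),
-   reduce = function(k, vv) keyval(k, sum(vv)))
- from.dfs(mapreduceResult)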
- 14/07/24 11:22:59 INFO mapreduce.Job: map 100% reduce 58%
- 14/07/24 11:23:01 INFO mapreduce.Job: Task Id : attempt_1406189659246_0001_r_000016_1, Status : FAILED
- Error: java.lang.RuntimeException: Error in configuring object
- at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
- at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
- at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
- at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
- at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
- at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
- at java.security.AccessController.doPrivileged(Native Method)
- at javax.security.auth.Subject.doAs(Subject.java:415)
- at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
- at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
- Caused by: java.lang.reflect.InvocationTargetException
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
- at java.lang.reflect.Method.invoke(Method.java:606)
- at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
- ... 9 more
- Caused by: java.lang.RuntimeException: configuration exception
- at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
- at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
- ... 14 more
- Caused by: java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
- at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
- at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
- ... 15 more
- Caused by: java.io.IOException: error=2, No such file or directory
- at java.lang.UNIXProcess.forkAndExec(Native Method)
- at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
- at java.lang.ProcessImpl.start(ProcessImpl.java:130)
- at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
- ... 16 more
- 14/07/24 11:23:42 INFO mapreduce.Job: Job job_1406189659246_0001 failed with state FAILED due to: Task failed task_1406189659246_0001_r_000007
- 14/07/24 11:23:42 INFO mapreduce.Job: Counters: 54
- File System Counters
- FILE: Number of bytes read=1631
- FILE: Number of bytes written=2036200
- FILE: Number of read operations=0
- FILE: Number of large read operations=0
- FILE: Number of write operations=0
- HDFS: Number of bytes read=1073
- HDFS: Number of bytes written=5198
- HDFS: Number of read operations=67
- HDFS: Number of large read operations=0
- HDFS: Number of write operations=38
- Job Counters
- Failed map tasks=2
- Failed reduce tasks=28
- Killed reduce tasks=1
- Launched map tasks=4
- Launched reduce tasks=48
- Other local map tasks=2
- Data-local map tasks=2
- Total time spent by all maps in occupied slots (ms)=18216
- Total time spent by all reduces in occupied slots (ms)=194311
- Total time spent by all map tasks (ms)=18216
- Total time spent by all reduce tasks (ms)=194311
- Total vcore-seconds taken by all map tasks=18216
- Total vcore-seconds taken by all reduce tasks=194311
- Total megabyte-seconds taken by all map tasks=18653184
- Total megabyte-seconds taken by all reduce tasks=198974464
- Map-Reduce Framework
- Map input records=3
- Map output records=25
- Map output bytes=2196
- Map output materialized bytes=2266
- Input split bytes=214
- Combine input records=0
- Combine output records=0
- Reduce input groups=10
- Reduce shuffle bytes=1859
- Reduce input records=21
- Reduce output records=30
- Spilled Records=46
- Shuffled Maps =38
- Failed Shuffles=0
- Merged Map outputs=38
- GC time elapsed (ms)=1339
- CPU time spent (ms)=40060
- Physical memory (bytes) snapshot=5958418432
- Virtual memory (bytes) snapshot=33795457024
- Total committed heap usage (bytes)=7176978432
- Shuffle Errors
- BAD_ID=0
- CONNECTION=0
- IO_ERROR=0
- WRONG_LENGTH=0
- WRONG_MAP=0
- WRONG_REDUCE=0
- File Input Format Counters
- Bytes Read=859
- File Output Format Counters
- Bytes Written=5198
- rmr
- reduce calls=10
- 14/07/24 11:23:42 ERROR streaming.StreamJob: Job not Successful!
- Streaming Command Failed!
- Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
- hadoop streaming failed with error code 1
- Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
- Sys.setenv(HADOOP_STREAMING="/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming.jar")
- Sys.setenv(JAVA_HOME="/usr/java/jdk1.7.0_55-cloudera")
- Sys.setenv(HADOOP_COMMON_LIB_NATIVE_DIR="/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hadoop/lib/native")
- Sys.setenv(HADOOP_OPTS="-Djava.library.path=HADOOP_HOME/lib")
- library(rhdfs)
- hdfs.init()
- library(rmr2)
- ## space and word delimiter
- map <- function(k,lines) {
- words.list <- strsplit(lines, '\\s')
- words <- unlist(words.list)
- return( keyval(words, 1) )
- }
- reduce <- function(word, counts) {
- keyval(word, sum(counts))
- }
- wordcount <- function (input, output=NULL) {
- mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
- }
- ## variables
- hdfs.root <- '/user/node'
- hdfs.data <- file.path(hdfs.root, 'data')
- hdfs.out <- file.path(hdfs.root, 'out')
- ## run mapreduce job
- ##out <- wordcount(hdfs.data, hdfs.out)
- system.time(out <- wordcount(hdfs.data, hdfs.out))
- ## fetch results from HDFS
- results <- from.dfs(out)
- results.df <- as.data.frame(results, stringsAsFactors=FALSE)
- colnames(results.df) <- c('word', 'count')
- ##head(results.df)
- ## sorted output TOP10
- head(results.df[order(-results.df$count),],10)