WARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0.
Use ./bin/spark-submit <python file>
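[Note: per the deprecation warning above, the same job can be launched with spark-submit instead. The script path below is the one this log later shows being added to the SparkContext; treat the exact invocation as illustrative.]

    ./bin/spark-submit /home/sid/Downloads/spark/pdsWork/smallCode.py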
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/10/27 03:29:43 WARN Utils: Your hostname, Sid resolves to a loopback address: 127.0.1.1; using 192.168.0.15 instead (on interface wlan0)
14/10/27 03:29:43 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
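[Note: this warning is harmless for a local run; if binding to a particular address actually matters, SPARK_LOCAL_IP can be exported before launching. The value below is just the LAN address reported in the line above, used as an example.]

    export SPARK_LOCAL_IP=192.168.0.15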
14/10/27 03:29:43 INFO SecurityManager: Changing view acls to: sid,
14/10/27 03:29:43 INFO SecurityManager: Changing modify acls to: sid,
14/10/27 03:29:43 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sid, ); users with modify permissions: Set(sid, )
14/10/27 03:29:44 INFO Slf4jLogger: Slf4jLogger started
14/10/27 03:29:44 INFO Remoting: Starting remoting
14/10/27 03:29:44 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.15:38405]
14/10/27 03:29:44 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.0.15:38405]
14/10/27 03:29:44 INFO Utils: Successfully started service 'sparkDriver' on port 38405.
14/10/27 03:29:44 INFO SparkEnv: Registering MapOutputTracker
14/10/27 03:29:44 INFO SparkEnv: Registering BlockManagerMaster
14/10/27 03:29:44 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141027032944-e2cd
14/10/27 03:29:44 INFO Utils: Successfully started service 'Connection manager for block manager' on port 58556.
14/10/27 03:29:44 INFO ConnectionManager: Bound socket to port 58556 with id = ConnectionManagerId(192.168.0.15,58556)
14/10/27 03:29:44 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
14/10/27 03:29:44 INFO BlockManagerMaster: Trying to register BlockManager
14/10/27 03:29:44 INFO BlockManagerMasterActor: Registering block manager 192.168.0.15:58556 with 265.4 MB RAM
14/10/27 03:29:44 INFO BlockManagerMaster: Registered BlockManager
14/10/27 03:29:44 INFO HttpFileServer: HTTP File server directory is /tmp/spark-ed95bc19-7c96-4d6e-8b07-fba7695c2f41
14/10/27 03:29:44 INFO HttpServer: Starting HTTP Server
14/10/27 03:29:44 INFO Utils: Successfully started service 'HTTP file server' on port 42993.
14/10/27 03:29:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/10/27 03:29:45 INFO SparkUI: Started SparkUI at http://192.168.0.15:4040
14/10/27 03:29:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/27 03:29:45 INFO Utils: Copying /home/sid/Downloads/spark/pdsWork/smallCode.py to /tmp/spark-14dbc370-b423-48dd-b498-3798b76af4bb/smallCode.py
14/10/27 03:29:45 INFO SparkContext: Added file file:/home/sid/Downloads/spark/pdsWork/smallCode.py at http://192.168.0.15:42993/files/smallCode.py with timestamp 1414394985793
14/10/27 03:29:45 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.0.15:38405/user/HeartbeatReceiver
14/10/27 03:29:46 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
14/10/27 03:29:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
14/10/27 03:29:46 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/10/27 03:29:46 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/10/27 03:29:46 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/10/27 03:29:46 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/10/27 03:29:46 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/10/27 03:29:46 INFO FileInputFormat: Total input paths to process : 1
14/10/27 03:29:46 INFO SparkContext: Starting job: saveAsTextFile at NativeMethodAccessorImpl.java:-2
14/10/27 03:29:46 INFO DAGScheduler: Got job 0 (saveAsTextFile at NativeMethodAccessorImpl.java:-2) with 1 output partitions (allowLocal=false)
14/10/27 03:29:46 INFO DAGScheduler: Final stage: Stage 0(saveAsTextFile at NativeMethodAccessorImpl.java:-2)
14/10/27 03:29:46 INFO DAGScheduler: Parents of final stage: List()
14/10/27 03:29:46 INFO DAGScheduler: Missing parents: List()
14/10/27 03:29:46 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at NativeMethodAccessorImpl.java:-2), which has no missing parents
14/10/27 03:29:46 INFO MemoryStore: ensureFreeSpace(61848) called with curMem=163705, maxMem=278302556
14/10/27 03:29:46 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 60.4 KB, free 265.2 MB)
14/10/27 03:29:46 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at NativeMethodAccessorImpl.java:-2)
14/10/27 03:29:46 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/27 03:29:46 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1258 bytes)
14/10/27 03:29:46 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/27 03:29:46 INFO Executor: Fetching http://192.168.0.15:42993/files/smallCode.py with timestamp 1414394985793
14/10/27 03:29:46 INFO Utils: Fetching http://192.168.0.15:42993/files/smallCode.py to /tmp/fetchFileTemp8666575365236477394.tmp
14/10/27 03:29:47 INFO CacheManager: Partition rdd_2_0 not found, computing it
14/10/27 03:29:47 INFO HadoopRDD: Input split: file:/home/sid/Downloads/spark/pdsWork/input.txt:0+147355
14/10/27 03:29:47 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/sid/Downloads/spark/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 146, in _read_with_length
    length = read_int(stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 464, in read_int
    raise EOFError
EOFError
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/sid/Downloads/spark/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 128, in dump_stream
    self._write_with_length(obj, stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 138, in _write_with_length
    serialized = self.dumps(obj)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 356, in dumps
    return cPickle.dumps(obj, 2)
PicklingError: Can't pickle __main__.testing: attribute lookup __main__.testing failed
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/10/27 03:29:47 ERROR PythonRDD: This may have been caused by a prior exception:
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/sid/Downloads/spark/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 128, in dump_stream
    self._write_with_length(obj, stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 138, in _write_with_length
    serialized = self.dumps(obj)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 356, in dumps
    return cPickle.dumps(obj, 2)
PicklingError: Can't pickle __main__.testing: attribute lookup __main__.testing failed
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/10/27 03:29:47 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/sid/Downloads/spark/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 128, in dump_stream
    self._write_with_length(obj, stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 138, in _write_with_length
    serialized = self.dumps(obj)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 356, in dumps
    return cPickle.dumps(obj, 2)
PicklingError: Can't pickle __main__.testing: attribute lookup __main__.testing failed
    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/10/27 03:29:47 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/sid/Downloads/spark/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 128, in dump_stream
    self._write_with_length(obj, stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 138, in _write_with_length
    serialized = self.dumps(obj)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 356, in dumps
    return cPickle.dumps(obj, 2)
PicklingError: Can't pickle __main__.testing: attribute lookup __main__.testing failed
        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
14/10/27 03:29:47 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
14/10/27 03:29:47 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/27 03:29:47 INFO TaskSchedulerImpl: Cancelling stage 0
14/10/27 03:29:47 INFO DAGScheduler: Failed to run saveAsTextFile at NativeMethodAccessorImpl.java:-2
Traceback (most recent call last):
  File "/home/sid/Downloads/spark/pdsWork/smallCode.py", line 42, in <module>
    output.saveAsTextFile("output")
  File "/home/sid/Downloads/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/home/sid/Downloads/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/sid/Downloads/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o40.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/sid/Downloads/spark/python/pyspark/worker.py", line 79, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 196, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 128, in dump_stream
    self._write_with_length(obj, stream)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 138, in _write_with_length
    serialized = self.dumps(obj)
  File "/home/sid/Downloads/spark/python/pyspark/serializers.py", line 356, in dumps
    return cPickle.dumps(obj, 2)
PicklingError: Can't pickle __main__.testing: attribute lookup __main__.testing failed
        org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
        org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:154)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:87)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
        org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
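
[Note on the failure: the EOFError at the top of the error output is a secondary symptom (the worker died, so the stream ended early); as the "This may have been caused by a prior exception" line indicates, the root cause is the PicklingError. PySpark ships RDD elements between the JVM and the Python worker with plain cPickle, which pickles class instances by reference, i.e. module name plus attribute name. A class named testing defined in the driver script can still be constructed on the worker, since its definition travels inside the pickled closure, but its instances cannot be pickled back: the worker's __main__ is PySpark's worker process, not smallCode.py, so the lookup of __main__.testing fails. smallCode.py is not part of this paste, so the sketch below is a hypothetical reconstruction: testing, input.txt, and output.saveAsTextFile("output") appear in the log, while sc, the lambdas, and the mytypes.py helper module are assumptions.]

    # Hypothetical reconstruction of the failing pattern; not the actual smallCode.py.
    from pyspark import SparkContext

    class testing(object):              # defined in the driver script, i.e. in __main__
        def __init__(self, line):
            self.line = line

    sc = SparkContext(appName="smallCode")
    rdd = sc.textFile("input.txt").map(lambda l: testing(l))
    rdd.cache()                         # matches the CacheManager line for rdd_2_0
    output = rdd.map(lambda t: t.line)
    output.saveAsTextFile("output")     # worker cannot cPickle __main__.testing instances

    # One fix: move the class into its own module and ship that module to the workers.
    #
    # mytypes.py (assumed helper module):
    #     class testing(object):
    #         def __init__(self, line):
    #             self.line = line
    #
    # smallCode.py:
    #     from mytypes import testing
    #     sc = SparkContext(appName="smallCode")
    #     sc.addPyFile("mytypes.py")    # makes mytypes importable on the workers
    #     sc.textFile("input.txt") \
    #       .map(lambda l: testing(l)) \
    #       .map(lambda t: t.line) \
    #       .saveAsTextFile("output")

An even simpler way to sidestep the error is to keep RDD elements as plain tuples or dicts rather than custom class instances, since built-in types pickle without any module lookup.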