  1. init of cps module called
  2. Creating a new instance of class <class 'cps.settings.manager.SettingsManager'>
  3. -------Starting the COST PER SCAN Analysis---------
  4. Creating a new instance of class <class 'cps.cost_per_scan.CostPerScan'>
  5. Going to initialize Logger
  6. [2019-03-19 08:08:01,623][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][WARNING][root]--Sentry is not configured yet
  7. [2019-03-19 08:08:01,785][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][root]--Root Logger Initialized. CPS Id 37a864f3-eff0-4903-a381-23e37aeb4f60
  8. [2019-03-19 08:08:01,847][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.config]--Conf Path generated is <class 'cps.config.dev.Config'>
  9. [2019-03-19 08:08:01,848][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.config]--Loading configuration from <class 'cps.config.dev.Config'>
  10. [2019-03-19 08:08:01,848][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.settings.manager]--Config to update is {'PACKAGE_PARQUET_BUCKET': 'prod-dwh-express-package-parquet', 'TRIP_RAW_BUCKET': 'prod-dwh-express-trips-kafka-raw', 'AIR_INTER_SERVICE_OUTPUT_BUCKET_KEY': 'air_inter', 'CPS_OUTPUT_BUCKET': 'cps-output-bucket-dev', 'TRIP_ROUTE_AD_JSON_BUCKET': 'prod-integration-s3-thanos-route-kafka', 'FUEL_LM_OUTPUT_BUCKET_KEY': 'fuel_lm', 'TRIP_PARQUET_BUCKET': 'prod-dwh-express-trips-parquet', 'DEBUGGING_DF_OUTPUT': 'HDFS', 'S3_ACCESS_PROTOCOL': 's3://', 'S3_REGION': 'us-east-1', 'END_DATE': datetime.datetime(2019, 2, 28, 0, 0), 'START_DATE': datetime.datetime(2019, 1, 1, 0, 0), 'CPS_JDBC_PASSWORD': 'costperscan123', 'IST_AD_JSON_BUCKET': 'prod-integration-s3-ist-ad-json', 'PACKAGE_AD_JSON_BUCKET': 'prod-integration-s3-package-ad-json', 'IS_DEBUGGING_ENABLED': True, 'ENABLE_AWS_GLUE_CATALOG': True, 'REAL_TIME_JDBC_URL': 'jdbc:postgresql://dwh-real-time-postgres.delhivery.com/dwh', 'S3_JSON_FORMAT': 'json', 'JDBC_DRIVER_NAME': 'org.postgresql.Driver', 'OFFROLL_MANPOWER_LM_OUTPUT_BUCKET_KEY': 'offroll_manpower_lm', 'CPS_CODE_BASE_BUCKET': 'cps-code-base-dev', 'DEFAULT_DATA_ACCESS_SERVICE': 'S3_JSON_DATA_ACCESS_SERVICE', 'ENABLE_DEFAULT_DATA_ACCESS_SERVICE_FOR_HIVE_ENTITIES': False, 'DAS_LM_OUTPUT_BUCKET_KEY': 'das_lm', 'ADHOC_LM_OUTPUT_BUCKET_KEY': 'adhoc_lm', 'ONROLL_MANPOWER_OUTPUT_BUCKET_KEY': 'onroll_manpower', 'CPS_JDBC_USER_NAME': 'cps', 'CPS_JDBC_URL': 'jdbc:postgresql://cps-prod-aurora.ceypiyhweprx.us-east-1.rds.amazonaws.com/cps', 'REAL_TIME_JDBC_PASSWORD': 'datawarehouse987', 'S3_ENDPOINT_URL': 's3.us-east-1.amazonaws.com', 'REAL_TIME_JDBC_USER_NAME': 'datawarehouse'}
  11. [2019-03-19 08:08:01,849][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.cost_per_scan]--AWS Glue Catalog is enabled.
  12. [2019-03-19 08:08:26,329][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.cost_per_scan]--Spark Configuration is [('spark.submit.pyFiles', '/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip,/home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.eventLog.enabled', 'true'), ('spark.app.id', 'application_1552982744164_0001'), ('spark.driver.host', 'ip-172-30-2-246.ec2.internal'), ('spark.driver.extraLibraryPath', '/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native'), ('spark.sql.parquet.output.committer.class', 'com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter'), ('spark.blacklist.decommissioning.timeout', '1h'), ('spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS', '$(hostname -f)'), ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'), ('spark.driver.port', '36115'), ('spark.executor.extraJavaOptions', "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'"), ('spark.sql.hive.caseSensitiveInferenceMode', 'NEVER_INFER'), ('spark.sql.parquet.compression.codec', 'snappy'), ('spark.eventLog.dir', 'hdfs:///var/log/spark/apps'), ('spark.sql.warehouse.dir', 'hdfs:///user/spark/warehouse'), ('hive.metastore.client.factory.class', 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory'), ('spark.driver.memory', '5585M'), ('spark.hadoop.fs.s3n.impl', 'com.amazon.ws.emr.hadoop.fs.EmrFileSystem'), ('spark.sql.hive.metastorePartitionPruning', 'true'), ('spark.history.fs.logDirectory', 'hdfs:///var/log/spark/apps'), ('spark.ui.filters', 'org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter'), ('spark.yarn.historyServer.address', 'ip-172-30-2-246.ec2.internal:18080'), ('spark.hadoop.fs.s3.impl', 'com.amazon.ws.emr.hadoop.fs.EmrFileSystem'), ('spark.sql.parquet.filterPushdown', 'true'), ('spark.sql.parquet.mergeSchema', 'false'), ('spark.hadoop.yarn.timeline-service.enabled', 'false'), ('spark.sql.hive.convertMetastoreParquet', 'true'), ('spark.executor.id', 'driver'), ('spark.sql.shuffle.partitions', '744'), ('spark.jars.packages', 'org.postgresql:postgresql:42.2.5'), ('spark.yarn.dist.jars', 'file:///home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.driver.extraJavaOptions', "-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'"), ('spark.rdd.compress', 'true'), ('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version', '2'), ('spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored', 'true'), ('spark.decommissioning.timeout.threshold', '20'), ('spark.sql.catalogImplementation', 'hive'), ('spark.executor.memory', '27648M'), ('spark.app.name', 'CostPerScan'), ('spark.stage.attempt.ignoreOnDecommissionFetchFailure', 'true'), ('spark.speculation', 'false'), ('spark.executorEnv.PYTHONPATH', '{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip<CPS>{{PWD}}/cps.zip<CPS>{{PWD}}/org.postgresql_postgresql-42.2.5.jar'), ('spark.yarn.secondary.jars', 'org.postgresql_postgresql-42.2.5.jar'), ('spark.executor.extraLibraryPath', '/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native'), ('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem', '2'), ('spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES', 
'http://ip-172-30-2-246.ec2.internal:20888/proxy/application_1552982744164_0001'), ('spark.executor.cores', '4'), ('spark.shuffle.io.maxRetries', '4'), ('spark.default.parallelism', '744'), ('spark.sql.hive.metastore.sharedPrefixes', 'com.amazonaws.services.dynamodbv2'), ('spark.repl.local.jars', 'file:///home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.serializer.objectStreamReset', '100'), ('spark.submit.deployMode', 'client'), ('spark.sql.parquet.fs.optimized.committer.optimization-enabled', 'true'), ('spark.ui.proxyBase', '/proxy/application_1552982744164_0001'), ('spark.driver.extraClassPath', '/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar'), ('spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem', 'true'), ('spark.task.maxFailures', '4'), ('spark.yarn.submit.waitAppCompletion', 'true'), ('spark.history.ui.port', '18080'), ('spark.shuffle.service.enabled', 'true'), ('spark.driver.appUIAddress', 'http://ip-172-30-2-246.ec2.internal:4040'), ('spark.resourceManager.cleanupExpiredHost', 'true'), ('spark.yarn.dist.pyFiles', 's3://cps-code-base-dev/spark_dist/cps.zip,file:///home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.kryo.unsafe', 'true'), ('spark.executor.extraClassPath', '/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar'), ('spark.files.fetchFailure.unRegisterOutputOnHost', 'true'), ('spark.broadcast.compress', 'true'), ('spark.hadoop.parquet.enable.summary-metadata', 'false'), ('spark.master', 'yarn'), ('spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS', 'ip-172-30-2-246.ec2.internal'), ('spark.sql.orc.filterPushdown', 'true'), ('spark.yarn.isPython', 'true'), ('spark.dynamicAllocation.enabled', 'true'), ('spark.blacklist.decommissioning.enabled', 'true')]
  13. [2019-03-19 08:08:26,333][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.cost_per_scan]--Resource setup done. <pyspark.sql.session.SparkSession object at 0x7fb5f37e4b70>
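For reference, a SparkSession with the properties logged above (AWS Glue Data Catalog as the metastore and the Postgres JDBC driver resolved via spark.jars.packages) could be built roughly as below. The actual CPS setup code is not part of this log, so this is only a sketch using standard Spark option names:

    from pyspark.sql import SparkSession

    # Sketch of a session matching the logged configuration; the real CPS
    # code sets many more options (dynamic allocation, Kryo, committers, ...).
    spark = (SparkSession.builder
             .appName("CostPerScan")
             .enableHiveSupport()
             .config("spark.jars.packages", "org.postgresql:postgresql:42.2.5")
             .config("hive.metastore.client.factory.class",
                     "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
             .getOrCreate())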
  14. Initialized a cps instance -9223363260625497649
  15. Creating a new instance of class <class 'cps.data_access.hive.HiveDataAccessService'>
  16. Creating a new instance of class <class 'cps.data_access.hdfs.HdfsDataAccessService'>
  17. Creating a new instance of class <class 'cps.data_access.s3.S3JsonDataAccessService'>
  18. Creating a new instance of class <class 'cps.data_access.s3.S3ParquetDataAccessService'>
  19. Creating a new instance of class <class 'cps.data_access.jdbc.CpsJdbcDataAccessService'>
  20. [2019-03-19 08:08:26,386][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Initializing JdbcDataAccessService for <class 'cps.data_access.jdbc.CpsJdbcDataAccessService'>
  21. Creating a new instance of class <class 'cps.data_access.jdbc.RealTimeJdbcDataAccessService'>
  22. [2019-03-19 08:08:26,387][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Initializing JdbcDataAccessService for <class 'cps.data_access.jdbc.RealTimeJdbcDataAccessService'>
  23. Creating a new instance of class <class 'cps.data_access.csv.CsvDataAccessService'>
  24. Creating a new instance of class <class 'cps.analysis_services.registrar.AnalysisServiceRegistrar'>
  25. Creating a new instance of class <class 'cps.analysis_services.air.AirInterService'>
  26. Creating a new instance of class <class 'cps.analysis_services.fuel.FuelLastMileAnalysisService'>
  27. Creating a new instance of class <class 'cps.analysis_services.das.DasLastMileAnalysisService'>
  28. Creating a new instance of class <class 'cps.analysis_services.adhoc_lm.AdhocLastMileAnalysisService'>
  29. Creating a new instance of class <class 'cps.analysis_services.offroll.OffrollManpowerLmService'>
  30. Creating a new instance of class <class 'cps.analysis_services.onroll.OnrollManpowerService'>
  31. [2019-03-19 08:08:26,603][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.analysis_services.base]--Analysis ID adhoc_lm_2019_03_19_08_08_26_603123 written to file
  32. [2019-03-19 08:08:26,603][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Reading Hive Table express_dwh.package_s3_parquet Partitions from 2019-01-01-00 to 2019-02-28-00
  33. [2019-03-19 08:08:32,416][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Table express_dwh.package_s3_parquet Read for partitions 2019-01-01-00 to 2019-02-28-00
  34. [2019-03-19 08:08:45,411][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Reading Hive Table express_dwh.dispatch_lm_s3_parquet Partitions from 2018-12-30-00 to 2019-03-05-00
  35. [2019-03-19 08:08:45,712][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Table express_dwh.dispatch_lm_s3_parquet Read for partitions 2018-12-30-00 to 2019-03-05-00
  36. [2019-03-19 08:08:47,817][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.analysis_services.adhoc_lm]--Package and Dispatch joined for date 2019-01-01 00:00:00 2019-02-28 00:00:00
  37. [2019-03-19 08:08:47,818][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Reading Jdbc Table facility Partitions from None to None
  38. [2019-03-19 08:08:50,576][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Jdbc Table facility Read for partitions None to None
  39. [2019-03-19 08:08:52,764][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Writing in JDBC DB in table adhoc_lm
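Based on the traceback that follows, write_df_to_jdbc_db in cps/data_access/jdbc.py appears to end in a standard PySpark DataFrameWriter.save() call against the JDBC source. A minimal sketch of such a write, assuming the connection details come from the configuration logged above (function and parameter names here are illustrative, not the actual CPS source):

    def write_df_to_jdbc_db(df, table_name, jdbc_url, user, password,
                            driver="org.postgresql.Driver", mode="append"):
        # Writes a Spark DataFrame to a JDBC table; save() is the action that
        # actually runs the job, so upstream read errors surface at this point.
        (df.write
           .format("jdbc")
           .option("url", jdbc_url)        # e.g. CPS_JDBC_URL from the config above
           .option("dbtable", table_name)  # e.g. "adhoc_lm"
           .option("user", user)
           .option("password", password)
           .option("driver", driver)       # JDBC_DRIVER_NAME from the config above
           .mode(mode)
           .save())

Because Spark is lazy, this save() is the first action on the joined package/dispatch DataFrame, which is why a Parquet read problem only shows up at write time in the traceback below.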
  40. Traceback (most recent call last):
  41. File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/main.py", line 11, in <module>
  42. cps_app.start_analysis(ConfConstant.ADHOC_LM_OUTPUT_BUCKET_KEY.value)
  43. File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/cost_per_scan.py", line 119, in start_analysis
  44. File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/analysis_services/registrar.py", line 44, in start_analysis_service
  45. File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/analysis_services/base.py", line 53, in start
  46. File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/analysis_services/adhoc_lm.py", line 52, in run
  47. File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/data_access/jdbc.py", line 51, in write_df_to_jdbc_db
  48. File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 703, in save
  49. File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  50. File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  51. File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
  52. py4j.protocol.Py4JJavaError: An error occurred while calling o612.save.
  53. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 4.0 failed 4 times, most recent failure: Lost task 44.3 in stage 4.0 (TID 8453, ip-172-30-2-231.ec2.internal, executor 32): org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://prod-dwh-express-dispatch-lm-parquet/ad=2019-02-27/part-00000-62e6fce7-7655-4be6-b18e-50e1099ccf4a.c000.snappy.parquet. Column: [si_earned], Expected: DoubleType, Found: BINARY
  54. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
  55. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  56. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  57. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  58. at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
  59. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  60. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  61. at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
  62. at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  63. at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  64. at org.apache.spark.scheduler.Task.run(Task.scala:109)
  65. at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  66. at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  67. at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  68. at java.lang.Thread.run(Thread.java:748)
  69. Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException
  70. at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.constructConvertNotSupportedException(VectorizedColumnReader.java:245)
  71. at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBinaryBatch(VectorizedColumnReader.java:490)
  72. at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:216)
  73. at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:263)
  74. at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:164)
  75. at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
  76. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  77. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
  78. ... 14 more
  79.  
  80. Driver stacktrace:
  81. at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1803)
  82. at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1791)
  83. at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1790)
  84. at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  85. at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  86. at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1790)
  87. at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
  88. at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
  89. at scala.Option.foreach(Option.scala:257)
  90. at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
  91. at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2024)
  92. at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1973)
  93. at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1962)
  94. at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  95. at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
  96. at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
  97. at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
  98. at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
  99. at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
  100. at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
  101. at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
  102. at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  103. at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  104. at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  105. at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
  106. at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:821)
  107. at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:96)
  108. at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  109. at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  110. at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  111. at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  112. at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  113. at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  114. at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  115. at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  116. at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  117. at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  118. at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  119. at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  120. at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
  121. at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
  122. at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  123. at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
  124. at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
  125. at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
  126. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  127. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  128. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  129. at java.lang.reflect.Method.invoke(Method.java:498)
  130. at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  131. at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  132. at py4j.Gateway.invoke(Gateway.java:282)
  133. at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  134. at py4j.commands.CallCommand.execute(CallCommand.java:79)
  135. at py4j.GatewayConnection.run(GatewayConnection.java:238)
  136. at java.lang.Thread.run(Thread.java:748)
  137. Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://prod-dwh-express-dispatch-lm-parquet/ad=2019-02-27/part-00000-62e6fce7-7655-4be6-b18e-50e1099ccf4a.c000.snappy.parquet. Column: [si_earned], Expected: DoubleType, Found: BINARY
  138. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
  139. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  140. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  141. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  142. at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
  143. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  144. at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  145. at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
  146. at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  147. at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  148. at org.apache.spark.scheduler.Task.run(Task.scala:109)
  149. at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  150. at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  151. at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  152. ... 1 more
  153. Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException
  154. at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.constructConvertNotSupportedException(VectorizedColumnReader.java:245)
  155. at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBinaryBatch(VectorizedColumnReader.java:490)
  156. at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:216)
  157. at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:263)
  158. at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:164)
  159. at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
  160. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  161. at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
  162. ... 14 more
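The job fails while writing adhoc_lm because the Glue/Hive schema for express_dwh.dispatch_lm_s3_parquet declares si_earned as double, while the Parquet file under ad=2019-02-27 stores that column as BINARY (a string), and the vectorized Parquet reader refuses the conversion. One possible workaround, assuming the stored values are numeric strings, is to re-read the offending partition with the schema embedded in the files and cast the column back to the table type. The bucket path and column name below come from the error above; everything else is an assumption, not the fix actually applied:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Partition flagged in the error above; reading the directory directly
    # picks up the schema written in the files, so si_earned comes back as a
    # string here instead of failing the double conversion.
    bad_partition = "s3://prod-dwh-express-dispatch-lm-parquet/ad=2019-02-27/"

    patched = (spark.read.parquet(bad_partition)
                    .withColumn("si_earned", F.col("si_earned").cast("double")))

The durable fix would be to rewrite that partition so si_earned is physically stored as a double, matching the table definition, so that the normal partition-range read in cps.data_access.hive succeeds again.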