- init of cps module called
- Creating a new instance of class <class 'cps.settings.manager.SettingsManager'>
- -------Starting the COST PER SCAN Analysis---------
- Creating a new instance of class <class 'cps.cost_per_scan.CostPerScan'>
- Going to initialize Logger
- [2019-03-19 08:08:01,623][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][WARNING][root]--Sentry is not configured yet
- [2019-03-19 08:08:01,785][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][root]--Root Logger Initialized. CPS Id 37a864f3-eff0-4903-a381-23e37aeb4f60
- [2019-03-19 08:08:01,847][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.config]--Conf Path generated is <class 'cps.config.dev.Config'>
- [2019-03-19 08:08:01,848][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.config]--Loading configuration from <class 'cps.config.dev.Config'>
- [2019-03-19 08:08:01,848][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.settings.manager]--Config to update is {'PACKAGE_PARQUET_BUCKET': 'prod-dwh-express-package-parquet', 'TRIP_RAW_BUCKET': 'prod-dwh-express-trips-kafka-raw', 'AIR_INTER_SERVICE_OUTPUT_BUCKET_KEY': 'air_inter', 'CPS_OUTPUT_BUCKET': 'cps-output-bucket-dev', 'TRIP_ROUTE_AD_JSON_BUCKET': 'prod-integration-s3-thanos-route-kafka', 'FUEL_LM_OUTPUT_BUCKET_KEY': 'fuel_lm', 'TRIP_PARQUET_BUCKET': 'prod-dwh-express-trips-parquet', 'DEBUGGING_DF_OUTPUT': 'HDFS', 'S3_ACCESS_PROTOCOL': 's3://', 'S3_REGION': 'us-east-1', 'END_DATE': datetime.datetime(2019, 2, 28, 0, 0), 'START_DATE': datetime.datetime(2019, 1, 1, 0, 0), 'CPS_JDBC_PASSWORD': 'costperscan123', 'IST_AD_JSON_BUCKET': 'prod-integration-s3-ist-ad-json', 'PACKAGE_AD_JSON_BUCKET': 'prod-integration-s3-package-ad-json', 'IS_DEBUGGING_ENABLED': True, 'ENABLE_AWS_GLUE_CATALOG': True, 'REAL_TIME_JDBC_URL': 'jdbc:postgresql://dwh-real-time-postgres.delhivery.com/dwh', 'S3_JSON_FORMAT': 'json', 'JDBC_DRIVER_NAME': 'org.postgresql.Driver', 'OFFROLL_MANPOWER_LM_OUTPUT_BUCKET_KEY': 'offroll_manpower_lm', 'CPS_CODE_BASE_BUCKET': 'cps-code-base-dev', 'DEFAULT_DATA_ACCESS_SERVICE': 'S3_JSON_DATA_ACCESS_SERVICE', 'ENABLE_DEFAULT_DATA_ACCESS_SERVICE_FOR_HIVE_ENTITIES': False, 'DAS_LM_OUTPUT_BUCKET_KEY': 'das_lm', 'ADHOC_LM_OUTPUT_BUCKET_KEY': 'adhoc_lm', 'ONROLL_MANPOWER_OUTPUT_BUCKET_KEY': 'onroll_manpower', 'CPS_JDBC_USER_NAME': 'cps', 'CPS_JDBC_URL': 'jdbc:postgresql://cps-prod-aurora.ceypiyhweprx.us-east-1.rds.amazonaws.com/cps', 'REAL_TIME_JDBC_PASSWORD': 'datawarehouse987', 'S3_ENDPOINT_URL': 's3.us-east-1.amazonaws.com', 'REAL_TIME_JDBC_USER_NAME': 'datawarehouse'}
- [2019-03-19 08:08:01,849][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.cost_per_scan]--AWS Glue Catalog is enabled.
- [2019-03-19 08:08:26,329][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.cost_per_scan]--Spark Configuration is [('spark.submit.pyFiles', '/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip,/home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.eventLog.enabled', 'true'), ('spark.app.id', 'application_1552982744164_0001'), ('spark.driver.host', 'ip-172-30-2-246.ec2.internal'), ('spark.driver.extraLibraryPath', '/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native'), ('spark.sql.parquet.output.committer.class', 'com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter'), ('spark.blacklist.decommissioning.timeout', '1h'), ('spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS', '$(hostname -f)'), ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'), ('spark.driver.port', '36115'), ('spark.executor.extraJavaOptions', "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'"), ('spark.sql.hive.caseSensitiveInferenceMode', 'NEVER_INFER'), ('spark.sql.parquet.compression.codec', 'snappy'), ('spark.eventLog.dir', 'hdfs:///var/log/spark/apps'), ('spark.sql.warehouse.dir', 'hdfs:///user/spark/warehouse'), ('hive.metastore.client.factory.class', 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory'), ('spark.driver.memory', '5585M'), ('spark.hadoop.fs.s3n.impl', 'com.amazon.ws.emr.hadoop.fs.EmrFileSystem'), ('spark.sql.hive.metastorePartitionPruning', 'true'), ('spark.history.fs.logDirectory', 'hdfs:///var/log/spark/apps'), ('spark.ui.filters', 'org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter'), ('spark.yarn.historyServer.address', 'ip-172-30-2-246.ec2.internal:18080'), ('spark.hadoop.fs.s3.impl', 'com.amazon.ws.emr.hadoop.fs.EmrFileSystem'), ('spark.sql.parquet.filterPushdown', 'true'), ('spark.sql.parquet.mergeSchema', 
'false'), ('spark.hadoop.yarn.timeline-service.enabled', 'false'), ('spark.sql.hive.convertMetastoreParquet', 'true'), ('spark.executor.id', 'driver'), ('spark.sql.shuffle.partitions', '744'), ('spark.jars.packages', 'org.postgresql:postgresql:42.2.5'), ('spark.yarn.dist.jars', 'file:///home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.driver.extraJavaOptions', "-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'"), ('spark.rdd.compress', 'true'), ('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version', '2'), ('spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored', 'true'), ('spark.decommissioning.timeout.threshold', '20'), ('spark.sql.catalogImplementation', 'hive'), ('spark.executor.memory', '27648M'), ('spark.app.name', 'CostPerScan'), ('spark.stage.attempt.ignoreOnDecommissionFetchFailure', 'true'), ('spark.speculation', 'false'), ('spark.executorEnv.PYTHONPATH', '{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.7-src.zip<CPS>{{PWD}}/cps.zip<CPS>{{PWD}}/org.postgresql_postgresql-42.2.5.jar'), ('spark.yarn.secondary.jars', 'org.postgresql_postgresql-42.2.5.jar'), ('spark.executor.extraLibraryPath', '/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native'), ('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem', '2'), ('spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES', 'http://ip-172-30-2-246.ec2.internal:20888/proxy/application_1552982744164_0001'), ('spark.executor.cores', '4'), ('spark.shuffle.io.maxRetries', '4'), ('spark.default.parallelism', '744'), ('spark.sql.hive.metastore.sharedPrefixes', 'com.amazonaws.services.dynamodbv2'), ('spark.repl.local.jars', 'file:///home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.serializer.objectStreamReset', '100'), ('spark.submit.deployMode', 'client'), 
('spark.sql.parquet.fs.optimized.committer.optimization-enabled', 'true'), ('spark.ui.proxyBase', '/proxy/application_1552982744164_0001'), ('spark.driver.extraClassPath', '/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar'), ('spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem', 'true'), ('spark.task.maxFailures', '4'), ('spark.yarn.submit.waitAppCompletion', 'true'), ('spark.history.ui.port', '18080'), ('spark.shuffle.service.enabled', 'true'), ('spark.driver.appUIAddress', 'http://ip-172-30-2-246.ec2.internal:4040'), ('spark.resourceManager.cleanupExpiredHost', 'true'), ('spark.yarn.dist.pyFiles', 's3://cps-code-base-dev/spark_dist/cps.zip,file:///home/hadoop/.ivy2/jars/org.postgresql_postgresql-42.2.5.jar'), ('spark.kryo.unsafe', 'true'), ('spark.executor.extraClassPath', '/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar'), ('spark.files.fetchFailure.unRegisterOutputOnHost', 'true'), ('spark.broadcast.compress', 'true'), 
- ('spark.hadoop.parquet.enable.summary-metadata', 'false'), ('spark.master', 'yarn'), ('spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS', 'ip-172-30-2-246.ec2.internal'), ('spark.sql.orc.filterPushdown', 'true'), ('spark.yarn.isPython', 'true'), ('spark.dynamicAllocation.enabled', 'true'), ('spark.blacklist.decommissioning.enabled', 'true')]
- [2019-03-19 08:08:26,333][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.cost_per_scan]--Resource setup done. <pyspark.sql.session.SparkSession object at 0x7fb5f37e4b70>
- Initialized a cps instance -9223363260625497649
- Creating a new instance of class <class 'cps.data_access.hive.HiveDataAccessService'>
- Creating a new instance of class <class 'cps.data_access.hdfs.HdfsDataAccessService'>
- Creating a new instance of class <class 'cps.data_access.s3.S3JsonDataAccessService'>
- Creating a new instance of class <class 'cps.data_access.s3.S3ParquetDataAccessService'>
- Creating a new instance of class <class 'cps.data_access.jdbc.CpsJdbcDataAccessService'>
- [2019-03-19 08:08:26,386][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Initializing JdbcDataAccessService for <class 'cps.data_access.jdbc.CpsJdbcDataAccessService'>
- Creating a new instance of class <class 'cps.data_access.jdbc.RealTimeJdbcDataAccessService'>
- [2019-03-19 08:08:26,387][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Initializing JdbcDataAccessService for <class 'cps.data_access.jdbc.RealTimeJdbcDataAccessService'>
- Creating a new instance of class <class 'cps.data_access.csv.CsvDataAccessService'>
- Creating a new instance of class <class 'cps.analysis_services.registrar.AnalysisServiceRegistrar'>
- Creating a new instance of class <class 'cps.analysis_services.air.AirInterService'>
- Creating a new instance of class <class 'cps.analysis_services.fuel.FuelLastMileAnalysisService'>
- Creating a new instance of class <class 'cps.analysis_services.das.DasLastMileAnalysisService'>
- Creating a new instance of class <class 'cps.analysis_services.adhoc_lm.AdhocLastMileAnalysisService'>
- Creating a new instance of class <class 'cps.analysis_services.offroll.OffrollManpowerLmService'>
- Creating a new instance of class <class 'cps.analysis_services.onroll.OnrollManpowerService'>
- [2019-03-19 08:08:26,603][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.analysis_services.base]--Analysis ID adhoc_lm_2019_03_19_08_08_26_603123 written to file
- [2019-03-19 08:08:26,603][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Reading Hive Table express_dwh.package_s3_parquet Partitions from 2019-01-01-00 to 2019-02-28-00
- [2019-03-19 08:08:32,416][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Table express_dwh.package_s3_parquet Read for partitions 2019-01-01-00 to 2019-02-28-00
- [2019-03-19 08:08:45,411][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Reading Hive Table express_dwh.dispatch_lm_s3_parquet Partitions from 2018-12-30-00 to 2019-03-05-00
- [2019-03-19 08:08:45,712][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.hive]--Table express_dwh.dispatch_lm_s3_parquet Read for partitions 2018-12-30-00 to 2019-03-05-00
- [2019-03-19 08:08:47,817][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.analysis_services.adhoc_lm]--Package and Dispatch joined for date 2019-01-01 00:00:00 2019-02-28 00:00:00
- [2019-03-19 08:08:47,818][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Reading Jdbc Table facility Partitions from None to None
- [2019-03-19 08:08:50,576][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Jdbc Table facility Read for partitions None to None
- [2019-03-19 08:08:52,764][172.30.2.246][ip-172-30-2-246][PID:13527][TID:140419800852288][INFO][cps.data_access.jdbc]--Writing in JDBC DB in table adhoc_lm
- Traceback (most recent call last):
- File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/main.py", line 11, in <module>
- cps_app.start_analysis(ConfConstant.ADHOC_LM_OUTPUT_BUCKET_KEY.value)
- File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/cost_per_scan.py", line 119, in start_analysis
- File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/analysis_services/registrar.py", line 44, in start_analysis_service
- File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/analysis_services/base.py", line 53, in start
- File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/analysis_services/adhoc_lm.py", line 52, in run
- File "/mnt/tmp/spark-36781a1a-f7a3-4188-bbd6-147cddcb57eb/cps.zip/cps/data_access/jdbc.py", line 51, in write_df_to_jdbc_db
- File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 703, in save
- File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
- File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
- File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
- py4j.protocol.Py4JJavaError: An error occurred while calling o612.save.
- : org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 4.0 failed 4 times, most recent failure: Lost task 44.3 in stage 4.0 (TID 8453, ip-172-30-2-231.ec2.internal, executor 32): org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://prod-dwh-express-dispatch-lm-parquet/ad=2019-02-27/part-00000-62e6fce7-7655-4be6-b18e-50e1099ccf4a.c000.snappy.parquet. Column: [si_earned], Expected: DoubleType, Found: BINARY
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
- at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
- at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
- at org.apache.spark.scheduler.Task.run(Task.scala:109)
- at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
- at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
- at java.lang.Thread.run(Thread.java:748)
- Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.constructConvertNotSupportedException(VectorizedColumnReader.java:245)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBinaryBatch(VectorizedColumnReader.java:490)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:216)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:263)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:164)
- at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
- ... 14 more
- Driver stacktrace:
- at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1803)
- at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1791)
- at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1790)
- at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
- at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
- at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1790)
- at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
- at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
- at scala.Option.foreach(Option.scala:257)
- at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
- at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2024)
- at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1973)
- at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1962)
- at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
- at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
- at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
- at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
- at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
- at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
- at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:935)
- at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:933)
- at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
- at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
- at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
- at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:933)
- at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:821)
- at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:96)
- at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
- at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
- at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
- at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
- at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
- at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
- at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
- at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
- at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
- at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
- at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
- at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
- at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
- at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
- at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
- at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
- at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
- at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
- at java.lang.reflect.Method.invoke(Method.java:498)
- at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
- at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
- at py4j.Gateway.invoke(Gateway.java:282)
- at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
- at py4j.commands.CallCommand.execute(CallCommand.java:79)
- at py4j.GatewayConnection.run(GatewayConnection.java:238)
- at java.lang.Thread.run(Thread.java:748)
- Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file s3://prod-dwh-express-dispatch-lm-parquet/ad=2019-02-27/part-00000-62e6fce7-7655-4be6-b18e-50e1099ccf4a.c000.snappy.parquet. Column: [si_earned], Expected: DoubleType, Found: BINARY
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
- at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
- at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
- at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
- at org.apache.spark.scheduler.Task.run(Task.scala:109)
- at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
- at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
- ... 1 more
- Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.constructConvertNotSupportedException(VectorizedColumnReader.java:245)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBinaryBatch(VectorizedColumnReader.java:490)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:216)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:263)
- at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:164)
- at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
- at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
- ... 14 more
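
The root cause at the bottom of the trace is a Parquet schema mismatch, not a JDBC problem: the table schema for express_dwh.dispatch_lm_s3_parquet declares si_earned as DoubleType, but the partition file s3://prod-dwh-express-dispatch-lm-parquet/ad=2019-02-27/part-00000-62e6fce7-7655-4be6-b18e-50e1099ccf4a.c000.snappy.parquet stored it as BINARY (i.e. a string), so the vectorized Parquet reader aborts the scan. Typical remedies are to rewrite the bad partition with the correct type, or to read the column as a string and cast it (e.g. `df.withColumn("si_earned", col("si_earned").cast("double"))` in PySpark) before the JDBC write. The sketch below is a minimal stdlib-only illustration of that coercion logic, with no Spark dependency; the function name and sample rows are hypothetical, not from the job's code.

```python
# Illustrative sketch only: coerce a mixed string/float column to float,
# mirroring what cast("double") does in Spark SQL, where an unparseable
# string becomes NULL rather than raising.

def normalize_si_earned(value):
    """Return value as a float, or None if it cannot be parsed."""
    if value is None:
        return None
    if isinstance(value, float):
        return value
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# Hypothetical rows standing in for the mixed-type partition data.
rows = [{"si_earned": 12.5}, {"si_earned": "7.25"}, {"si_earned": "n/a"}]
cleaned = [{**r, "si_earned": normalize_si_earned(r["si_earned"])} for r in rows]
```

Applying the cast (in Spark, on the real DataFrame) before `write_df_to_jdbc_db` would let the job complete; fixing the upstream writer that produced the string-typed partition is the durable fix.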