Untitled

a guest Jun 17th, 2019 56 Never
Table A

ID  Day         Name     Description
1   2016-09-01  Sam      Retail
2   2016-01-28  Chris    Retail
3   2016-02-06  ChrisTY  Retail
4   2016-02-26  Christa  Retail
3   2016-12-06  ChrisTu  Retail
4   2016-12-31  Christi  Retail

Table B

ID  SkEY
1   1.1
2   1.2
3   1.3

The following query works, but it takes a long time. The real tables have
around 60 columns (only 3 are shown in the sample). Performance is poor:
processing 20 daily partitions takes 1 hour.
Can you please help me optimise the query?

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support is needed because the output table is a partitioned Hive table.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def UDF_df(i):
    # i is a Row holding a single day value; process that one day.
    print(i[0])
    ABC2 = spark.sql("select * from A where day = '{0}'".format(i[0]))
    joined = ABC2.join(Tab2, ABC2.ID == Tab2.ID)
    (joined
        .select(Tab2.skey, ABC2.Day, ABC2.Name, ABC2.Description)
        .write
        .mode("append")
        .format("parquet")
        .insertInto("Table"))

# The lower bound is written as >= here; the original "<=" looks like a typo.
ABC = spark.sql("select distinct day from A "
                "where day >= '2016-01-01' and day <= '2016-12-31'")
Tab2 = spark.sql("select * from B where day is not null")
# One join and one insert are issued per distinct day.
for i in ABC.collect():
    UDF_df(i)

Above is the PySpark code, which I ran for one month just to test the
total time. It joins A to B on ID and outputs the ID along with the other
columns of A. It takes 1 hour to complete. Is there a better way to optimise
the query when processing either 1 month or 1 year of data? Also, the output
table that the data is inserted into is partitioned on 2 columns, which is
why Hive contexts are used.
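
For comparison, here is a rough sketch of the same join done in a single pass over the whole date range instead of once per day. It is only an illustration under assumptions: the function names day_range_filter and insert_joined_range are made up for this example, B is assumed to be small enough to broadcast, and the filter on B from the code above is left out. It has not been measured against the real tables.

```python
def day_range_filter(start_day, end_day):
    """Build one predicate covering the whole date range (inclusive)."""
    return "day >= '{0}' AND day <= '{1}'".format(start_day, end_day)


def insert_joined_range(spark, start_day, end_day, target="Table"):
    """Join A to B once for the whole range and append the result to target.

    spark is an existing SparkSession with Hive support enabled.
    """
    a = spark.table("A").where(day_range_filter(start_day, end_day))
    # Assumption: B is small enough to broadcast; drop the hint if it is not.
    b = spark.table("B").hint("broadcast")
    joined = a.join(b, on="ID").select("skey", "Day", "Name", "Description")
    # insertInto matches columns by position, so the select order must
    # follow the column order of the target table.
    joined.write.mode("append").insertInto(target)
    return joined
```

It would be called once, e.g. insert_joined_range(spark, "2016-01-01", "2016-12-31"), so Spark plans one join and one write rather than one of each per day.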