Untitled

a guest
Jun 17th, 2019
Table A

ID  Day         Name     Description
1   2016-09-01  Sam      Retail
2   2016-01-28  Chris    Retail
3   2016-02-06  ChrisTY  Retail
4   2016-02-26  Christa  Retail
3   2016-12-06  ChrisTu  Retail
4   2016-12-31  Christi  Retail

Table B

ID  SkEY
1   1.1
2   1.2
3   1.3

The following query works but takes a long time, because the real table has
around 60 columns (only 3 are shown in the sample). Performance is poor: it
takes 1 hour to process 20 daily partitions.
Can you please help me work out why and optimise the query?

from pyspark.sql import SparkSession

# Hive support is required to insert into the partitioned Hive table.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def UDF_df(i):
    # i is a Row holding one distinct day; process that day's data.
    print(i[0])
    ABC2 = spark.sql("select * from A where day = '{0}'".format(i[0]))
    # Join A and B on ID and keep only the output columns.
    Join = (ABC2.join(Tab2, ABC2.ID == Tab2.ID)
            .select(Tab2.skey, ABC2.Day, ABC2.Name, ABC2.Description))
    # Append the day's rows to the existing table named "Table".
    Join.write.mode("append").format("parquet").insertInto("Table")

# Distinct days in the 2016 range to process.
ABC = spark.sql("select distinct day from A "
                "where day >= '2016-01-01' and day <= '2016-12-31'")
Tab2 = spark.sql("select * from B where day is not null")
for i in ABC.collect():
    UDF_df(i)

Above is the PySpark code, which I ran on one month of data just to measure
the total time. It joins A with B on ID and outputs the ID along with the
other columns of A. It takes 1 hour to complete. Is there a better way to
optimise the query, whether for 1 month or 1 year of data? Also, the output
table that the data is inserted into is partitioned on 2 columns, which is
why a Hive context is used.
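
One direction that may be worth testing, offered only as a sketch: instead of collecting the distinct days and running one join per day, build a single query that joins A and B over the whole date range, so Spark can plan it as one job. The helper name build_join_query and its parameters are hypothetical, and the "day is not null" filter on B is left out because the sample of B shows no day column; add it back if B has one. The code below only builds the SQL string, so it runs without a cluster.

```python
def build_join_query(start_day, end_day, a_columns):
    """Return one SELECT that joins A and B for the whole date range.

    Hypothetical helper: table names A and B and the skey column come from
    the question; everything else is an assumption for illustration.
    """
    # List the columns of A explicitly; the real table has about 60 of them.
    select_list = ", ".join(["b.skey"] + [f"a.{c}" for c in a_columns])
    return (
        f"SELECT {select_list} "
        "FROM A a JOIN B b ON a.ID = b.ID "
        f"WHERE a.day >= '{start_day}' AND a.day <= '{end_day}'"
    )


query = build_join_query("2016-01-01", "2016-12-31",
                         ["Day", "Name", "Description"])
print(query)
```

With a live session the string could be passed along as spark.sql(query).write.mode("append").insertInto("Table"). Since insertInto matches columns by position, the select list would need to follow the target table's column order, with the two partition columns last, and Hive dynamic partitioning may need to be enabled (for example hive.exec.dynamic.partition.mode=nonstrict). Whether this is actually faster on the real data would have to be measured.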