Untitled

a guest
Nov 13th, 2018
+---+----+----------------------------+
| id| seq|                  seq_detail|
+---+----+----------------------------+
|  a|   1|               {'a':'apple'}|
|  b|   1|               {'a':'apple'}|
|  c| 1 2| [{'a':'apple'},{'b':'ben'}]|
|  d|   2|                 {'b':'ben'}|
|  e|   1|               {'a':'apple'}|
|  f|   3|                 {'c':'cat'}|
|  g|   1|               {'a':'apple'}|
|  h|   1|               {'a':'apple'}|
+---+----+----------------------------+

+----+-------------------------+-----------+---------+
| seq|                       id| seq_detail| id_count|
+----+-------------------------+-----------+---------+
| 1_1|['a', 'b', 'c', 'e', 'g']|      apple|        5|
| 1_2|                      'h'|      apple|        1|
| 2_1|               ['c', 'd']|        ben|        2|
| 3_1|                      'f'|        cat|        1|
+----+-------------------------+-----------+---------+

from pyspark.sql import SparkSession

# Build a local Spark session; the chained builder calls need enclosing parentheses.
spark = (
    SparkSession.builder
    .master('local[*]')
    .appName("sample")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

# Read the input once instead of on every loop iteration.
source = spark.read.parquet('./parquet')

for i in range(1, 4):
    # Keep rows whose seq string contains the current sequence number.
    df = source.filter(source.seq.contains(str(i)))
    df.show()
    rows = df.collect()
    if rows:
        # Row.asDict() (capital D) converts each Row into a plain dict.
        data = [x.asDict() for x in rows]
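
One possible way to get from the first table to the second, once the rows are available as plain dicts, is sketched below in pure Python. It rests on assumptions that the paste does not state: that ids are grouped per seq value and split into chunks of at most 5 (inferred from seq 1 producing 5 + 1 ids), that seq_detail is already a dict or list of dicts rather than a string, and that the detail entries line up positionally with the space-separated seq values. The names `group_ids_by_seq`, `CHUNK_SIZE`, and the sample `rows` are hypothetical.

```python
from collections import defaultdict

# Assumption: chunk size inferred from the sample output (seq 1 -> 5 ids + 1 id).
CHUNK_SIZE = 5


def group_ids_by_seq(rows, chunk_size=CHUNK_SIZE):
    """Group ids by seq value, then split each group into numbered chunks."""
    ids_by_seq = defaultdict(list)
    detail_by_seq = {}
    for row in rows:
        seqs = row["seq"].split()          # "1 2" -> ["1", "2"]
        details = row["seq_detail"]
        if isinstance(details, dict):      # a single detail dict becomes a one-item list
            details = [details]
        # Assumption: the n-th detail dict belongs to the n-th seq value.
        for seq, detail in zip(seqs, details):
            ids_by_seq[seq].append(row["id"])
            # Each detail dict holds one value, e.g. {'a': 'apple'} -> 'apple'.
            detail_by_seq[seq] = next(iter(detail.values()))

    result = []
    for seq in sorted(ids_by_seq):         # string sort; fine for single-digit seq values
        ids = ids_by_seq[seq]
        for n, start in enumerate(range(0, len(ids), chunk_size), start=1):
            chunk = ids[start:start + chunk_size]
            result.append({
                "seq": f"{seq}_{n}",
                "id": chunk,
                "seq_detail": detail_by_seq[seq],
                "id_count": len(chunk),
            })
    return result


# Sample rows mirroring the first table.
rows = [
    {"id": "a", "seq": "1", "seq_detail": {"a": "apple"}},
    {"id": "b", "seq": "1", "seq_detail": {"a": "apple"}},
    {"id": "c", "seq": "1 2", "seq_detail": [{"a": "apple"}, {"b": "ben"}]},
    {"id": "d", "seq": "2", "seq_detail": {"b": "ben"}},
    {"id": "e", "seq": "1", "seq_detail": {"a": "apple"}},
    {"id": "f", "seq": "3", "seq_detail": {"c": "cat"}},
    {"id": "g", "seq": "1", "seq_detail": {"a": "apple"}},
    {"id": "h", "seq": "1", "seq_detail": {"a": "apple"}},
]

result = group_ids_by_seq(rows)
for entry in result:
    print(entry)
```

Unlike the second table, this sketch always emits the id column as a list, even for a single id such as 'h'. If the stored seq_detail is a string, it would need to be parsed first.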