- +---+----+----------------------------+
- | id| seq|                  seq_detail|
- +---+----+----------------------------+
- |  a|   1|               {'a':'apple'}|
- |  b|   1|               {'a':'apple'}|
- |  c| 1 2| [{'a':'apple'},{'b':'ben'}]|
- |  d|   2|                 {'b':'ben'}|
- |  e|   1|               {'a':'apple'}|
- |  f|   3|                 {'c':'cat'}|
- |  g|   1|               {'a':'apple'}|
- |  h|   1|               {'a':'apple'}|
- +---+----+----------------------------+
- +----+-------------------------+-----------+---------+
- | seq|                       id| seq_detail| id_count|
- +----+-------------------------+-----------+---------+
- | 1_1|['a', 'b', 'c', 'e', 'g']|      apple|        5|
- | 1_2|                      'h'|      apple|        1|
- | 2_1|               ['c', 'd']|        ben|        2|
- | 3_1|                      'f'|        cat|        1|
- +----+-------------------------+-----------+---------+
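- from pyspark.sql import SparkSession
- # Build (or reuse) a local session; the parentheses let the builder chain span lines.
- spark = (
-     SparkSession.builder
-     .master('local[*]')
-     .appName("sample")
-     .config("spark.some.config.option", "some-value")
-     .getOrCreate()
- )
- # Read the parquet data once rather than on every loop iteration.
- source = spark.read.parquet('./parquet')
- for i in range(1, 4):
-     # Keep the rows whose seq string contains the current sequence number.
-     df = source.filter(source.seq.contains(str(i)))
-     df.show()
-     rows = df.collect()
-     if rows:
-         # Row.asDict() (capital D) turns each Row into a plain dict.
-         data = [x.asDict() for x in rows]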
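The loop above filters on each sequence number in turn. As a rough, Spark-free sketch of the same per-sequence grouping, the plain-Python snippet below splits each `seq` value on whitespace and collects the ids per number. The `rows` list is copied from the input table, and `group_ids_by_seq` is a hypothetical helper name, not something from the paste. It does not try to reproduce the `1_1` / `1_2` split in the target table, because the rule behind that split is not stated.

```python
from collections import defaultdict

# (id, seq) pairs copied from the input table; seq holds space-separated numbers.
rows = [
    ("a", "1"), ("b", "1"), ("c", "1 2"), ("d", "2"),
    ("e", "1"), ("f", "3"), ("g", "1"), ("h", "1"),
]

def group_ids_by_seq(rows):
    """Map each individual seq number to the ids whose seq field lists it."""
    groups = defaultdict(list)
    for row_id, seq in rows:
        # split() compares whole tokens, so "1" would not match a seq of "12",
        # unlike a substring test such as contains("1").
        for token in seq.split():
            groups[token].append(row_id)
    return dict(groups)

groups = group_ids_by_seq(rows)
for token, ids in sorted(groups.items()):
    print(token, ids, len(ids))
```

Matching whole tokens rather than substrings is a design choice made for this sketch; with the single-digit values in the sample data both approaches select the same rows.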