spark.range(100, numPartitions=1).write.bucketBy(3, 'id').sortBy('id').saveAsTable('df')

# No need to `repartition`: the bucketed scan already satisfies the required
# hash partitioning, so no Exchange (shuffle) appears in the plan.
spark.table('df').repartition(3, 'id').explain()
# == Physical Plan ==
# *(1) FileScan parquet default.df[id#33620L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 3 out of 3

# Still need `sortWithinPartitions`: `sortBy` only sorts rows inside each
# bucket file, so the reader still inserts a Sort node.
spark.table('df').sortWithinPartitions('id').explain()
# == Physical Plan ==
# *(1) Sort [id#33620L ASC NULLS FIRST], false, 0
# +- *(1) FileScan parquet default.df[id#33620L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 3 out of 3