spark.range(100, numPartitions=1).write.bucketBy(3, 'id').sortBy('id').saveAsTable('df')

# No need to `repartition`: the table is already bucketed by `id` into 3
# buckets, so the plan contains no Exchange, only the bucketed scan.
spark.table('df').repartition(3, 'id').explain()
# == Physical Plan ==
# *(1) FileScan parquet default.df[id#33620L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 3 out of 3
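
# For contrast, a minimal sketch (assuming the same session; `df_plain` is a
# hypothetical table name, not from the original snippet): the same repartition
# on a non-bucketed table does introduce an Exchange, which is the shuffle
# that bucketing avoids.
spark.range(100, numPartitions=1).write.saveAsTable('df_plain')
spark.table('df_plain').repartition(3, 'id').explain()
# Expected shape of the plan (abridged; exact output varies by Spark version):
# Exchange hashpartitioning(id#...L, 3)
# +- *(1) FileScan parquet default.df_plain[id#...L] ...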

# Still need to `sortWithinPartitions`: even though the table was written with
# `sortBy('id')`, Spark still inserts a Sort node when reading it back.
spark.table('df').sortWithinPartitions('id').explain()
# == Physical Plan ==
# *(1) Sort [id#33620L ASC NULLS FIRST], false, 0
# +- *(1) FileScan parquet default.df[id#33620L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[df], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>, SelectedBucketsCount: 3 out of 3
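
# A further sketch of where the bucketing pays off (assuming a sort-merge join
# is chosen; the broadcast threshold is disabled here to force it): a self-join
# on the bucket column needs no Exchange on either side, since both sides are
# already hash-partitioned into the same 3 buckets.
spark.conf.set('spark.sql.autoBroadcastJoinThreshold', '-1')
left = spark.table('df')
right = spark.table('df')
left.join(right, 'id').explain()
# Expected shape of the plan (abridged; varies by version): a SortMergeJoin
# over two bucketed FileScans with no Exchange; Sort nodes may or may not
# appear, depending on whether Spark can reuse the write-side sort order.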