from pyspark.serializers import AutoBatchedSerializer, PickleSerializer


class RDD(object):
    """
    A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
    Represents an immutable, partitioned collection of elements that can be
    operated on in parallel.
    """

    def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSerializer())):
        self._jrdd = jrdd  # handle to the underlying JVM RDD
        self.is_cached = False
        self.is_checkpointed = False
        self.ctx = ctx  # the SparkContext this RDD belongs to
        self._jrdd_deserializer = jrdd_deserializer
        self._id = jrdd.id()  # ID reported by the JVM RDD
        self.partitioner = None  # no partitioner is known at construction time
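
The docstring above describes an RDD as an immutable, partitioned collection whose elements can be processed in parallel. The following is only a toy sketch of that idea in plain Python, not how Spark implements it: `ToyRDD`, `from_iterable`, `map`, and `collect` here are hypothetical names defined for illustration, and a thread pool stands in for a cluster of executors.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical stand-in for illustration only; not part of PySpark."""

    def __init__(self, partitions):
        # Tuples keep every partition, and the list of partitions, immutable.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def from_iterable(cls, data, num_partitions):
        # Split the data into contiguous slices of near-equal size.
        data = list(data)
        size, extra = divmod(len(data), num_partitions)
        parts, start = [], 0
        for i in range(num_partitions):
            end = start + size + (1 if i < extra else 0)
            parts.append(data[start:end])
            start = end
        return cls(parts)

    def map(self, func):
        # Each partition is transformed independently of the others,
        # so the per-partition work can be handed to a pool of workers.
        def apply(part):
            return [func(x) for x in part]

        with ThreadPoolExecutor() as pool:
            new_parts = list(pool.map(apply, self._partitions))
        # A new ToyRDD is returned; the original is left untouched.
        return ToyRDD(new_parts)

    def collect(self):
        # Flatten the partitions back into one list, in partition order.
        return [x for part in self._partitions for x in part]


numbers = ToyRDD.from_iterable(range(10), num_partitions=3)
squares = numbers.map(lambda x: x * x)
print(squares.collect())
print(numbers.collect())
```

Because `map` builds a new object instead of mutating the old one, `numbers` still holds the original values after `squares` is computed, which mirrors the immutability the docstring mentions.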