- VoltDB development was based on real research. The team looked at how computing changed
- in the 21st century and how that change could be leveraged to build a better system. How much faster can we go?
- Memory was the first key insight, because it keeps getting cheaper while operational stores keep
- growing. Although memory is 100 times faster than SSDs and 10,000 times faster than spinning disks,
- in-memory databases weren't correspondingly faster. Regular RDBMSs rebuilt as in-memory databases
- disappointed, with less than 10x performance increases.
- What was holding those systems back? Research showed that traditional databases spent
- less than 10% of their time doing actual work. Most of the time was spent in two places:
- -page buffer management and concurrency management.
- Page Buffer Management
- -The page buffer system assigns database records to fixed-size pages, organizes their placement,
- and manages which pages are loaded into memory and which stay on disk. Pages are tracked as dirty
- or clean. All of this adds nothing to an in-memory system.
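As an illustration, here is a minimal, hypothetical sketch (all names are invented) of the bookkeeping a disk-era page buffer performs: fixed-size pages, dirty/clean tracking, and eviction of the least-recently-used page. In a fully in-memory system this entire layer disappears.

```java
import java.util.*;

// Toy page buffer: fixed-size pages, a dirty flag per page, LRU eviction.
// All names here are invented for illustration.
class BufferPool {
    static class Page { byte[] data = new byte[4096]; boolean dirty; }

    private final int capacity;
    // Access-ordered map, so the first entry is always least recently used.
    private final LinkedHashMap<Integer, Page> pages =
            new LinkedHashMap<>(16, 0.75f, true);

    BufferPool(int capacity) { this.capacity = capacity; }

    Page fetch(int pageId) {
        Page p = pages.get(pageId);
        if (p == null) {                      // miss: would be a disk read
            if (pages.size() >= capacity) evictOne();
            p = new Page();
            pages.put(pageId, p);
        }
        return p;
    }

    void write(int pageId, int offset, byte value) {
        Page p = fetch(pageId);
        p.data[offset] = value;
        p.dirty = true;                       // must be flushed before eviction
    }

    private void evictOne() {
        Iterator<Map.Entry<Integer, Page>> it = pages.entrySet().iterator();
        if (it.hasNext()) {
            Map.Entry<Integer, Page> e = it.next();
            if (e.getValue().dirty) e.getValue().dirty = false; // "flush" to disk
            it.remove();                      // evict least-recently-used page
        }
    }

    int size() { return pages.size(); }
}
```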
- Concurrency Management
- -A database must solve two concurrency problems:
- ---the logical problem: multiple user transactions operating concurrently must not conflict
- ---and must read consistent data; this is solved with high-level locks on tables and rows.
- ---the physical problem: the shared data structures themselves must be thread safe,
- -which adds its own cost.
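A minimal sketch of the traditional lock-based approach (hypothetical names): a per-row lock keeps concurrent updates consistent, and the acquire/release on every access is exactly the overhead being described.

```java
import java.util.concurrent.locks.*;

// Toy example of lock-based row access: every read and write pays for
// a lock acquire/release, even when there is no actual contention.
class LockedRow {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    void deposit(long amount) {
        lock.lock();                  // logical row lock
        try { balance += amount; }
        finally { lock.unlock(); }
    }

    long balance() {
        lock.lock();
        try { return balance; }
        finally { lock.unlock(); }
    }
}
```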
- Horizontal scaling
- -Beyond memory, the second major shift is to scale horizontally. Many small machines
- can be more effective and efficient than one large machine. This is where VoltDB is coming from.
- VoltDB barely reads from disk; most of the traditional disk workload is removed, and the disk I/O
- is almost 100% append-only stream writes. Even spinning disks can sustain high write throughput
- when used like this. The system never blocks on disk synchronization.
- This is achieved with two mechanisms: background snapshots and logical logging. Background snapshots
- are transactional: data is serialized to disk at a single logical point in time, cluster wide,
- without blocking ongoing operational work.
- Logical logging protects data that mutates between snapshots. A logical log of write operations
- is streamed to disk. If the cluster fails, the most recent snapshot is loaded into memory and the
- logical log is replayed. It has an advantage over binary logs: disk I/O can begin even
- before the operational work has started.
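The recovery path can be sketched as follows, assuming a toy key-value state and an invented Write type: load the snapshot, then replay the logical log of write operations in order.

```java
import java.util.*;

// Toy recovery from a snapshot plus a logical log. The snapshot is a
// point-in-time copy of state; the log is an append-only list of writes
// made after that point. Replaying the log reproduces the latest state.
class LogicalRecovery {
    static final class Write {
        final String key;
        final long value;
        Write(String key, long value) { this.key = key; this.value = value; }
    }

    static Map<String, Long> recover(Map<String, Long> snapshot, List<Write> log) {
        Map<String, Long> state = new HashMap<>(snapshot); // load snapshot
        for (Write w : log) state.put(w.key, w.value);     // replay writes in order
        return state;
    }
}
```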
- One thread
- -To go more than 10 times faster, the concurrency costs need to be eliminated. All data operations
- in VoltDB are single threaded; each operation runs to completion before the next one starts.
- Only simple data structures with no thread safety are used. Such a system is also much simpler to test and modify.
- This choice is possible only with a memory-centric design, because the latency of reading and writing
- to disk is usually hidden by shared-memory multithreading; run single threaded, that latency
- would leave the CPU idle most of the time.
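A minimal sketch of this execution model, with invented names: one worker thread runs each queued operation to completion before taking the next, so the underlying map needs no locks or thread-safe wrappers.

```java
import java.util.*;
import java.util.concurrent.*;

// Toy single-threaded pipeline: operations are queued, and a single worker
// thread executes each one fully before starting the next. The data
// structure is a plain HashMap with no synchronization at all.
class SerialPipeline {
    private final Map<String, Long> data = new HashMap<>(); // not thread-safe, and doesn't need to be
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    Future<Long> increment(String key) {
        Callable<Long> op = () -> data.merge(key, 1L, Long::sum); // runs alone, no locks
        return worker.submit(op);
    }

    void shutdown() { worker.shutdown(); }
}
```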
- No waiting on users
- -VoltDB operations are full ACID transactions. If the single-threaded worker is to run continuously,
- waiting on a user mid-transaction must be eliminated. There is no external transaction control;
- stored procedures are used instead.
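This is not the real VoltDB API, but the stored-procedure model can be sketched like this: the client hands the engine the whole transaction body as a single unit, so the engine never pauses mid-transaction waiting for a user's next statement.

```java
import java.util.*;
import java.util.function.*;

// Toy "stored procedure" engine: the entire transaction arrives as one
// function and runs to completion in one call. There is no interactive
// BEGIN ... COMMIT where the engine might sit idle between statements.
class ProcedureEngine {
    private final Map<String, Long> accounts = new HashMap<>();

    <R> R execute(Function<Map<String, Long>, R> procedure) {
        return procedure.apply(accounts); // runs the whole procedure at once
    }
}
```

For example, a funds transfer is shipped as one procedure rather than as a sequence of client-driven statements.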
- Concurrency through scheduling, not shared memory
- -The concurrency problem was solved by doing one thing at a time in a single-threaded pipeline.
- VoltDB can also scale to multiple machines. In this abstraction, each core is treated as a standalone machine and
- gets its own single-threaded pipeline. Next, the pipelines need to be kept full. This is done by partitioning the data,
- which Erik will describe in the following slides. For example, a finance app can be partitioned
- by customer region, and each customer's data can be routed easily to the right pipeline.
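Routing by a partition key can be sketched as follows (hypothetical names): hashing the key selects a partition, so the same key always lands on the same single-threaded pipeline.

```java
// Toy partition router: hash the partition key (e.g. a customer region)
// into one of N partitions, each owning its own single-threaded pipeline.
class PartitionRouter {
    static int partitionFor(String key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```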