- VoltDB development was based on real research. The team looked at how computing changed
- in the 21st century and how that change could be leveraged to build a better system. How much faster can we go?
- Memory was the first key insight, because it keeps getting cheaper while operational stores keep
- growing. Although memory is 100 times faster than SSDs and 10,000 times faster than spinning disks,
- in-memory databases weren't correspondingly faster. Regular RDBMSs rebuilt as in-memory databases
- disappointed, with less than 10x performance increases.
- What was holding those systems back? Research showed that traditional databases spent
- less than 10% of their time doing actual work. Most of the time was spent in two places:
- -page buffer management and concurrency management.
- Page Buffer Management
- -The page buffer system assigns database records to fixed-size pages, organizes their placement,
- and manages which pages are loaded into memory and which stay on disk. Pages are tracked as dirty
- or clean. All of this adds nothing to an in-memory system.
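As an illustration, here is a minimal, hypothetical sketch (all names are invented) of the bookkeeping a disk-era page buffer performs: fixed-size pages, dirty/clean tracking, and eviction of the least-recently-used page. In a fully in-memory system this entire layer disappears.

```java
import java.util.*;

// Toy page buffer: fixed-size pages, a dirty flag per page, LRU eviction.
// All names here are invented for illustration.
class BufferPool {
    static class Page { byte[] data = new byte[4096]; boolean dirty; }

    private final int capacity;
    // Access-ordered map, so the first entry is always least recently used.
    private final LinkedHashMap<Integer, Page> pages =
            new LinkedHashMap<>(16, 0.75f, true);

    BufferPool(int capacity) { this.capacity = capacity; }

    Page fetch(int pageId) {
        Page p = pages.get(pageId);
        if (p == null) {                      // miss: would be a disk read
            if (pages.size() >= capacity) evictOne();
            p = new Page();
            pages.put(pageId, p);
        }
        return p;
    }

    void write(int pageId, int offset, byte value) {
        Page p = fetch(pageId);
        p.data[offset] = value;
        p.dirty = true;                       // must be flushed before eviction
    }

    private void evictOne() {
        Iterator<Map.Entry<Integer, Page>> it = pages.entrySet().iterator();
        if (it.hasNext()) {
            Map.Entry<Integer, Page> e = it.next();
            if (e.getValue().dirty) e.getValue().dirty = false; // "flush" to disk
            it.remove();                      // evict least-recently-used page
        }
    }

    int size() { return pages.size(); }
}
```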
- Concurrency Management
- -A database must solve two concurrency problems:
- ---the logical problem: multiple user transactions operating concurrently must not conflict
- ---and must read consistent data; this is solved with high-level locks on tables and rows.
- ---the physical problem: the shared data structures themselves must be thread safe,
- -which adds its own cost.
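A minimal sketch of the traditional lock-based approach (hypothetical names): a per-row lock keeps concurrent updates consistent, and the acquire/release on every access is exactly the overhead being described.

```java
import java.util.concurrent.locks.*;

// Toy example of lock-based row access: every read and write pays for
// a lock acquire/release, even when there is no actual contention.
class LockedRow {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    void deposit(long amount) {
        lock.lock();                  // logical row lock
        try { balance += amount; }
        finally { lock.unlock(); }
    }

    long balance() {
        lock.lock();
        try { return balance; }
        finally { lock.unlock(); }
    }
}
```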
- Horizontal scaling
- -Beyond memory, the second major shift is to scale horizontally. Many small machines
- can be more effective and efficient than one large machine. This is where VoltDB is coming from.
- VoltDB barely reads from disk; most of the traditional disk workload is removed, and the disk I/O
- is almost 100% append-only stream writes. Even spinning disks can sustain high write throughput
- when used like this. The system never blocks on disk synchronization.
- This is achieved with two mechanisms: background snapshots and logical logging. Background snapshots
- are transactional: data is serialized to disk at a single logical point in time, cluster wide,
- without blocking ongoing operational work.
- Logical logging protects data that mutates between snapshots. A logical log of write operations
- is streamed to disk. If the cluster fails, the most recent snapshot is loaded into memory and the
- logical log is replayed. It has an advantage over binary logs: disk I/O can begin even
- before the operational work has started.
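The recovery path can be sketched as follows, assuming a toy key-value state and an invented Write type: load the snapshot, then replay the logical log of write operations in order.

```java
import java.util.*;

// Toy recovery from a snapshot plus a logical log. The snapshot is a
// point-in-time copy of state; the log is an append-only list of writes
// made after that point. Replaying the log reproduces the latest state.
class LogicalRecovery {
    static final class Write {
        final String key;
        final long value;
        Write(String key, long value) { this.key = key; this.value = value; }
    }

    static Map<String, Long> recover(Map<String, Long> snapshot, List<Write> log) {
        Map<String, Long> state = new HashMap<>(snapshot); // load snapshot
        for (Write w : log) state.put(w.key, w.value);     // replay writes in order
        return state;
    }
}
```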
- One thread
- -To go more than 10 times faster, the concurrency costs need to be eliminated. All data operations
- in VoltDB are single threaded; each operation runs to completion before the next one starts.
- Only simple data structures with no thread safety are used. Such a system is also much simpler to test and modify.
- This choice is possible only with a memory-centric design, because the latency of reading and writing
- to disk is usually hidden by shared-memory multithreading; run single threaded, that latency
- would leave the CPU idle most of the time.
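A minimal sketch of this execution model, with invented names: one worker thread runs each queued operation to completion before taking the next, so the underlying map needs no locks or thread-safe wrappers.

```java
import java.util.*;
import java.util.concurrent.*;

// Toy single-threaded pipeline: operations are queued, and a single worker
// thread executes each one fully before starting the next. The data
// structure is a plain HashMap with no synchronization at all.
class SerialPipeline {
    private final Map<String, Long> data = new HashMap<>(); // not thread-safe, and doesn't need to be
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    Future<Long> increment(String key) {
        Callable<Long> op = () -> data.merge(key, 1L, Long::sum); // runs alone, no locks
        return worker.submit(op);
    }

    void shutdown() { worker.shutdown(); }
}
```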
- No waiting on users
- -VoltDB operations are full ACID transactions. If the single-threaded worker is to run continuously,
- waiting on a user mid-transaction must be eliminated. There is no external transaction control;
- stored procedures are used instead.
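This is not the real VoltDB API, but the stored-procedure model can be sketched like this: the client hands the engine the whole transaction body as a single unit, so the engine never pauses mid-transaction waiting for a user's next statement.

```java
import java.util.*;
import java.util.function.*;

// Toy "stored procedure" engine: the entire transaction arrives as one
// function and runs to completion in one call. There is no interactive
// BEGIN ... COMMIT where the engine might sit idle between statements.
class ProcedureEngine {
    private final Map<String, Long> accounts = new HashMap<>();

    <R> R execute(Function<Map<String, Long>, R> procedure) {
        return procedure.apply(accounts); // runs the whole procedure at once
    }
}
```

For example, a funds transfer is shipped as one procedure rather than as a sequence of client-driven statements.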
- Concurrency through scheduling, not shared memory
- -The concurrency problem was solved by doing one thing at a time in a single-threaded pipeline.
- VoltDB can also scale to multiple machines. In this abstraction, each core is treated as a standalone machine and
- gets its own single-threaded pipeline. Next, the pipelines need to be kept full. This is done by partitioning the data,
- which Erik will describe in the following slides. For example, a finance app can be partitioned
- by customer region, and each customer's data can be routed easily to the right pipeline.
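Routing by a partition key can be sketched as follows (hypothetical names): hashing the key selects a partition, so the same key always lands on the same single-threaded pipeline.

```java
// Toy partition router: hash the partition key (e.g. a customer region)
// into one of N partitions, each owning its own single-threaded pipeline.
class PartitionRouter {
    static int partitionFor(String key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```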