Untitled

- PC-184153-106
- Username: LABS\cloudera
- Password:  cloudera
- Swift <-> S3 <-> HRD (Cloud Datastore)
- "schema on demand" :)
- Column-oriented storage: Apache Parquet (file format), BigQuery; Amazon Dynamo is key-value, not columnar
- CAP theorem
- Eventual consistency, Cassandra/Riak/CouchDB
- What does a lack of consistency mean in things like Mongo/BigTable/Redis?
- Thoughtworks -> Martin Fowler
- Hortonworks (MS Info offering leverages this)
- Read the MapReduce patent: "System and method for efficient large-scale data
  processing"
- FERPA?
- GFS vs HDFS
- Hadoop: Common + HDFS + MapReduce
- Beowulf cluster? :(
- HBase, Spark, and Cloudera Impala bypass MapReduce, so queries are much faster
  (near real time)
- Hadoop: embedded Jetty running in the NameNode
- Storm, Kafka, Spark, "streaming"?
- AWS Elastic MapReduce (EMR)
- Thrift? (like ODBC, JDBC)
- https://en.wikipedia.org/wiki/Bitmap_index
- https://en.wikipedia.org/wiki/CAP_theorem
- https://en.wikipedia.org/wiki/Column-oriented_DBMS
- Avro/ORC/Regex/Thrift
- Java DB (Apache Derby) used as metastore for Hive?
- Cassandra similar to HBase
- HBase explicit lock, Cassandra eventually consistent
- HBase reads, Cassandra writes
- Spark typically added via partnerships with Databricks
- Apache Mesos for clustering
- Mahout on the way out for ML, Spark :)
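
To make the "Hadoop: Common + HDFS + MapReduce" note concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word count. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and a real job would distribute these phases across nodes reading from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, roughly what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map -> shuffle -> reduce over an iterable of text documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, standing in for file splits stored in HDFS.
counts = word_count(["HDFS stores blocks", "MapReduce reads blocks from HDFS"])
```

The shuffle step is the part the framework normally hides; engines such as Spark and Impala avoid writing these intermediate groups to disk between stages, which is one reason they tend to answer queries faster.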