Posted by a guest, Oct 3rd, 2017
- - PC-184153-106
- - Username: LABS\cloudera
- - Password: cloudera
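- - OpenStack Swift <-> Amazon S3 <-> Google HRD (High Replication Datastore, now Cloud Datastore)
- - "schema on demand" (a.k.a. schema on read) :)
- - Column-oriented storage: Apache Parquet (a columnar file format), BigQuery; Amazon Dynamo/DynamoDB is a key-value store rather than columnar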
- - CAP theorem
- - Eventual consistency, Cassandra/Riak/CouchDB
- - What does a lack of consistency mean in things like Mongo/BigTable/Redis?
- - Thoughtworks -> Martin Fowler
- - Hortonworks (Microsoft's HDInsight offering is built on this)
- - Read the MapReduce patent: "System and method for efficient large-scale data processing"
- - FERPA?
- - GFS vs HDFS
- - Hadoop: Common + HDFS + MapReduce (Hadoop 2 adds YARN)
- - Beowulf cluster? :(
- - HBase, Spark, and Cloudera Impala bypass MapReduce, so queries are much faster (near real time)
- - Hadoop: embedded Jetty running in the NameNode (serves the web UI)
- - Storm, Kafka, Spark, "streaming"?
- - AWS Elastic MapReduce (EMR)
- - Thrift? (like ODBC, JDBC)
- - https://en.wikipedia.org/wiki/Bitmap_index
- - https://en.wikipedia.org/wiki/CAP_theorem
- - https://en.wikipedia.org/wiki/Column-oriented_DBMS
- - Avro/ORC/Regex/Thrift
- - JavaDB (Apache Derby) used as the embedded metastore for Hive?
- - Cassandra similar to HBase
- - HBase uses explicit locks; Cassandra is eventually consistent
- - HBase is optimized for reads, Cassandra for writes
- - Spark typically added via partnerships with Databricks
- - Apache Mesos for clustering
- - Mahout is on the way out for ML; Spark is taking over :)
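Column-oriented storage (the Parquet/BigQuery note and the Column-oriented_DBMS link) can be illustrated with a toy pivot from rows to columns. This is a minimal sketch, not any real format's layout: `to_columns` and `run_length_encode` are hypothetical helpers. It shows two points. First, an aggregate over one column reads only that column. Second, runs of repeated values within a column compress well. Real formats such as Parquet combine run-length encoding with other encodings.

```python
from itertools import groupby


def to_columns(rows):
    # Pivot row records into one list per column (assumes every row has the same keys).
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    # Collapse runs of equal adjacent values into (value, run_length) pairs.
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


rows = [
    {"country": "NO", "sales": 10},
    {"country": "NO", "sales": 5},
    {"country": "SE", "sales": 7},
]
columns = to_columns(rows)
print(sum(columns["sales"]))                  # 22: only the sales column is scanned
print(run_length_encode(columns["country"]))  # [('NO', 2), ('SE', 1)]
```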
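The eventual-consistency note (Cassandra/Riak/CouchDB) and the CAP note can be illustrated with a toy Dynamo-style quorum model. This is a minimal in-memory sketch under stated assumptions. `Replica`, `quorum_write`, `quorum_read` and `anti_entropy` are hypothetical names. Conflicts are resolved by timestamp-based last-write-wins; Cassandra resolves conflicts by timestamp, while Riak and CouchDB use other schemes. The model also assumes the first W replicas acknowledge each write. Under these assumptions, it shows the usual rule: a read is guaranteed to see the latest write only when R + W > N. Otherwise, replicas converge later through background repair.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica: maps key -> (timestamp, value)."""
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Assumed conflict policy: last-write-wins, keep the highest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        return self.store.get(key)


def quorums_overlap(n, r, w):
    # Any R replicas and any W replicas out of N must share a node iff R + W > N.
    return r + w > n


def quorum_write(replicas, w, key, value, ts):
    # Simplification: the first W replicas acknowledge; the others miss the write for now.
    for replica in replicas[:w]:
        replica.write(key, value, ts)


def quorum_read(replicas, r, key, start=0):
    # Contact R replicas starting at an arbitrary offset; return the newest version seen.
    n = len(replicas)
    seen = [replicas[(start + i) % n].read(key) for i in range(r)]
    seen = [v for v in seen if v is not None]
    return max(seen, key=lambda v: v[0])[1] if seen else None


def anti_entropy(replicas):
    # Background repair: copy every version to every replica so all replicas converge.
    for src in replicas:
        for key, (ts, value) in list(src.store.items()):
            for dst in replicas:
                dst.write(key, value, ts)


nodes = [Replica() for _ in range(3)]            # N = 3
quorum_write(nodes, 2, "user:1", "alice", ts=1)  # W = 2
print(quorum_read(nodes, 1, "user:1", start=2))  # R = 1, R + W = N: prints None (stale read)
print(quorum_read(nodes, 2, "user:1", start=2))  # R = 2, R + W > N: prints alice
anti_entropy(nodes)
print(quorum_read(nodes, 1, "user:1", start=2))  # after repair: prints alice
```

Real systems add tunable consistency levels, hinted handoff and read repair on top of this basic quorum idea.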
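The MapReduce notes (the patent and "Hadoop: Common + HDFS + MapReduce") describe a map, shuffle/sort and reduce pipeline. The classic word-count example can sketch it in a single process. The sketch assumes whitespace tokenization and in-memory "documents" in place of HDFS input splits. The function names are illustrative only and are not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Mapper: emit (word, 1) per whitespace-separated token; doc_id is the input key (unused here).
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key, sorted by key as Hadoop does before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    # Reducer: fold all values for one key into a single output record.
    return key, sum(values)


def run_job(documents):
    # documents: {doc_id: text}; in Hadoop these would be input splits read from HDFS blocks.
    intermediate = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    return [reduce_phase(k, vs) for k, vs in shuffle(intermediate)]


print(run_job({"d1": "to be or not to be", "d2": "to do"}))
# [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```

Because each key is reduced independently, the framework can spread reducers across machines. Engines such as Spark and Impala avoid MapReduce's per-job disk round trips, which is part of why they feel near real time.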
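The Bitmap_index link describes a common analytic-store technique. Each distinct value of a low-cardinality column gets a bitset with one bit per row. The following is a minimal sketch, assuming a hypothetical `BitmapIndex` class and Python integers as bitsets rather than any particular database's compressed bitmaps. It shows how a multi-predicate `WHERE` clause reduces to bitwise AND/OR.

```python
class BitmapIndex:
    """Bitmap index over one column: one bitset per distinct value, bit i set if row i holds it."""

    def __init__(self, column):
        self.num_rows = len(column)
        self.bitmaps = {}
        for row, value in enumerate(column):
            # Python ints act as arbitrary-length bitsets.
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row)

    def eq(self, value):
        # Bitset of rows where column == value (0 if the value never occurs).
        return self.bitmaps.get(value, 0)

    def rows(self, bitmap):
        # Decode a bitset back into row numbers.
        return [i for i in range(self.num_rows) if (bitmap >> i) & 1]


color = BitmapIndex(["red", "blue", "red", "green"])
size = BitmapIndex(["S", "S", "L", "S"])
# WHERE color = 'red' AND size = 'S' becomes a single bitwise AND.
print(color.rows(color.eq("red") & size.eq("S")))      # [0]
print(color.rows(color.eq("red") | color.eq("blue")))  # [0, 1, 2]
```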