- A data lake is a centralized repository for all data, both structured and unstructured. A data warehouse uses a predefined schema optimized for analytics. In a data lake, the schema is not defined up front, which enables additional types of analytics such as big data analytics, full-text search, real-time analytics, and machine learning.
- __________________________________________________________________________________________________________________________________________
- Data Warehouse
- Data: Relational data from transactional systems, operational databases, and line-of-business applications.
- Schema: Designed prior to the data warehouse implementation (schema-on-write).
- Price/performance: Fastest query results using higher-cost storage.
- Data quality: Highly curated data that serves as the central version of the truth.
- Users: Business analysts, data scientists, and data developers.
- Analytics: Batch reporting, BI, and visualizations.
- Data Lake
- Data: Non-relational and relational data from IoT devices, websites, mobile apps, social media, and corporate applications.
- Schema: Written at the time of analysis (schema-on-read).
- Price/performance: Query results are getting faster while using low-cost storage.
- Data quality: Any data, which may or may not be curated (i.e. raw data).
- Users: Data scientists, data developers, and business analysts (using curated data).
- Analytics: Machine learning, predictive analytics, data discovery, and profiling.
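
The schema-on-write versus schema-on-read distinction above can be illustrated with a small sketch. This is a toy model only, not how any particular warehouse or lake product works: the in-memory lists, the `WAREHOUSE_SCHEMA` dictionary, and the three function names are all hypothetical and chosen just for illustration.

```python
import json

# Hypothetical schema for the warehouse table: column name -> required type.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}

def write_to_warehouse(table, record):
    """Schema-on-write: reject any record that does not match the schema."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[column], expected_type):
            raise ValueError(f"{column} must be {expected_type.__name__}")
    table.append(record)

def write_to_lake(lake, raw_line):
    """Schema-on-read: store the raw text untouched, whatever its shape."""
    lake.append(raw_line)

def read_from_lake(lake, fields):
    """Apply a schema only at analysis time, skipping lines that do not fit."""
    rows = []
    for raw_line in lake:
        try:
            parsed = json.loads(raw_line)
        except json.JSONDecodeError:
            continue  # non-JSON text stays in the lake but is skipped here
        if isinstance(parsed, dict) and all(f in parsed for f in fields):
            rows.append({f: parsed[f] for f in fields})
    return rows

# Toy in-memory stand-ins for the two stores.
warehouse, lake = [], []
write_to_warehouse(warehouse, {"order_id": 1, "amount": 9.5})
write_to_lake(lake, '{"order_id": 1, "amount": 9.5}')
write_to_lake(lake, '{"device": "sensor-7", "temp": 21.4}')
write_to_lake(lake, "free-form log line")
```

In this sketch the warehouse refuses anything outside its predefined columns, while the lake accepts all three inputs and each analysis decides which fields it cares about when it calls `read_from_lake`.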