1. Hadoop Components
1) Hadoop Distributed File System (HDFS)
The most common file system used by Hadoop. It is based on the Google File System and provides a distributed file system designed to run on large clusters (thousands of computers) of small commodity machines in a reliable, fault-tolerant manner.
HDFS consists of a single NameNode, which manages the file system metadata, and one or more slave DataNodes, which store the actual data.
A file in the HDFS namespace is split into several blocks, and those blocks are stored on a set of DataNodes. The NameNode determines the mapping of blocks to DataNodes. The DataNodes handle read and write operations on the file system; they also take care of block creation, deletion, and replication based on instructions from the NameNode.
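The block-splitting and block-to-DataNode mapping described above can be sketched in a few lines of Python. This is a simplified illustration, not the HDFS API: the function names, block size, and round-robin placement policy are all assumptions made for the example (real HDFS uses 128 MB blocks and rack-aware replica placement).

```python
# Hypothetical sketch of the NameNode's bookkeeping: split a file into
# fixed-size blocks and record which DataNodes hold each block's replicas.

BLOCK_SIZE = 8    # bytes per block for this toy example (HDFS default: 128 MB)
REPLICATION = 3   # replicas per block (HDFS default)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    # Split the file contents into fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, datanodes, replication=REPLICATION):
    # NameNode role: choose `replication` DataNodes for each block.
    # Round-robin placement here; real HDFS is rack-aware.
    mapping = {}
    for i in range(len(blocks)):
        mapping[i] = [datanodes[(i + r) % len(datanodes)]
                      for r in range(replication)]
    return mapping

blocks = split_into_blocks(b"hello distributed file system")
mapping = place_blocks(blocks, ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks), mapping[0])  # 4 blocks; block 0 on ['dn1', 'dn2', 'dn3']
```

The DataNodes would then serve reads and writes for the blocks they hold, while the NameNode only keeps this mapping.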
2) MapReduce
A software framework for easily writing applications that process big amounts of data in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. The name refers to the two different tasks that Hadoop programs perform:
A. Map Task: takes input data and converts it into a set of data, where individual elements are broken down into tuples (rows or key/value pairs).
B. Reduce Task: takes the output of the map task as input and combines those data tuples into a smaller set of tuples. This task is always performed after the map task.
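The two tasks above can be sketched as a minimal word count in plain Python. This simulates the map, shuffle, and reduce phases in one process; it is not the Hadoop API, and the function names are invented for the illustration.

```python
from collections import defaultdict

def map_task(line):
    # Map: break a line of input into (key, value) tuples, here (word, 1).
    return [(word, 1) for word in line.split()]

def reduce_task(key, values):
    # Reduce: combine all values for one key into a smaller result.
    return (key, sum(values))

def run_mapreduce(lines):
    # Shuffle step (done by the framework in Hadoop): group the mapped
    # tuples by key, then hand each group to a reduce task.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_task(line):
            grouped[key].append(value)
    return dict(reduce_task(k, v) for k, v in grouped.items())

print(run_mapreduce(["big data big clusters", "big data"]))
# {'big': 3, 'data': 2, 'clusters': 1}
```

In real Hadoop, the map tasks run in parallel across the cluster and the framework shuffles the intermediate tuples to the reduce tasks over the network; only the two user-defined functions look like the ones here.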
3) Sqoop
A tool designed for transferring data from a relational database directly into HDFS or into Hive. After analyzing the table's schema, it automatically generates the classes needed to import data into HDFS; the tables' contents are then read as a parallel MapReduce job.
4) Oozie
Oozie is an open-source tool for handling complex pipelines of data processing. Using Oozie, users can define actions and the dependencies between them, and it will schedule them without any intervention.
5) Flume
A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It is designed to import streaming data flows.
6) HBase
A distributed, column-oriented database management system, modeled on Google's Bigtable, that runs on top of HDFS.
7) RHadoop
RHadoop is a collection of five R packages that allow users to manage and analyze data with Hadoop.
8) RHIPE
An R package that provides a way to use Hadoop from R.
9) R
An open-source programming language and software environment for statistical computing and graphics.
10) Mahout
Its goal is to build an environment for quickly creating scalable, performant machine learning applications. It is divided into four main groups: collaborative filtering, categorization, clustering, and mining of parallel frequent patterns.
11) Crunch
A Java library that provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
12) Cascading
A Java API for defining complex data flows and integrating those flows with back-end systems, and a query planner for mapping and executing logical flows onto a computing platform.
13) Pig
Intended to let people using Hadoop focus more on analyzing large datasets and spend less time writing mapper and reducer programs.
14) Hive
A distributed data warehouse. It enables easy data ETL (Extract, Transform, Load) from HDFS or other data storage such as HBase or traditional DBMSs. It has the advantage of using a SQL-like syntax, HiveQL.
15) Chukwa
A data collection system for monitoring large distributed systems.
2. Definition of Smart City
1) A city performing well in a forward-looking way in economy, people, governance, mobility, environment, and living, built on the smart combination of endowments and activities of self-decisive, independent, and aware citizens. The focus of this definition is that it views a smart city as a futuristic model of collaborative components.
2) A smart city is a very broad concept, which includes not only physical infrastructure but also human and social factors. The focus of this definition is that it includes the social aspects and agrees that a smart city has a broad focus.
3) A city that monitors and integrates the conditions of all of its critical infrastructures (including roads, bridges, tunnels, rails, subways, airports, seaports, communications, water, power, and even major buildings) can better optimize its resources, plan its preventive maintenance activities, and monitor security aspects while maximizing services to its citizens. This definition focuses on the integration of infrastructure and of the systems that monitor and control resources to achieve sustainability as the main aspect of a smart city.
3. Security part of the building blocks
The security stage of the data life cycle describes the security of data, governance bodies, organizations, and agendas. It also clarifies the roles in data stewardship. Therefore, appropriateness in terms of data type and use must be considered in developing the data, systems, tools, policies, and procedures that protect legitimate privacy, confidentiality, and intellectual property.
4.