Many tools are available for working with Big Data, and many of them are open source and Linux-based. This course covers the fundamentals of Big Data, including its place in a historical IT context, the tools available for working with it, the Big Data stack, and, finally, an in-depth look at Apache Hadoop.
Learning Objectives
Big Data in Perspective
- start the course
- put Big Data into the perspective of supercomputing
- describe Big Data in the context of technology waves by comparing it to previous waves
- list the six emerging technologies and relate them to Big Data
Global Data
- define Big Data and describe Gartner's Vectors
- define structured and unstructured data in terms of Gartner's model
- list the standard units used to measure the size of Big Data sets
The Key Contributors
- list the three key contributors to the origins of Big Data
- list the primary Big Data distro companies
The Apache Software Foundation
- describe the Apache Software Foundation
- list projects attributable to the Apache Software Foundation
- describe Cascading and MongoDB
Big Data Stack
- list the layers of the Big Data Stack
- list the common Big Data components
- describe columnar databases and HBase
Hadoop in Detail
- describe solutions for scaling computing
- describe the design principles of Hadoop
- map out the functional view of Hadoop
- describe the architecture of HDFS
- describe the architecture of YARN
- describe the attributes and processes of MapReduce
- describe the architecture of Spark
Practice: Big Data elements and functions
- describe Big Data in a historical context and the tools available for working with Big Data