Apache HBase is a column-oriented NoSQL database that provides big data storage for semi-structured data. It runs on top of HDFS, uses ZooKeeper for coordination, and can be integrated with MapReduce. In a column-oriented database, related columns are grouped into column families, and the data in each family is stored together rather than row by row. The physical architecture follows a master-slave model and distributes the data across the nodes of a cluster. This course shows how to install HBase and discusses the HBase architecture and data modeling design.
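To make the column-family idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: the class `ToyTable`, its methods, and the family names `info` and `metrics` are all hypothetical and exist only for illustration. It shows cells being grouped by column family, and a read returning the most recent version of a cell.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of column-family storage. NOT the HBase API."""

    def __init__(self, families):
        # One separate store per column family. Each store maps
        # (row key, column qualifier) to a list of (timestamp, value) versions.
        self.stores = {family: defaultdict(list) for family in families}

    def put(self, row, family, qualifier, value, timestamp):
        # Cells of the same family land in the same store, whatever their row.
        self.stores[family][(row, qualifier)].append((timestamp, value))

    def get(self, row, family, qualifier):
        # Return the value with the highest timestamp, or None if absent.
        versions = self.stores[family].get((row, qualifier))
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def scan_family(self, family):
        # Reading one family touches only that family's store.
        return sorted(self.stores[family].keys())


table = ToyTable(["info", "metrics"])
table.put("user1", "info", "name", "Ada", timestamp=1)
table.put("user1", "info", "name", "Ada L.", timestamp=2)
table.put("user1", "metrics", "logins", "7", timestamp=1)
table.put("user2", "info", "name", "Grace", timestamp=1)

latest_name = table.get("user1", "info", "name")
info_cells = table.scan_family("info")
print(latest_name)
print(info_cells)
```

In this sketch, scanning the `info` family never reads the `metrics` store, which is the practical benefit of keeping a family's data together.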
Learning Objectives
Installation
- describe HBase and its features
- identify the hardware requirements for HBase
- identify the software requirements for HBase
- describe the filesystems used for HBase
- describe the different HBase installation modes
- install HBase in local mode
- install HBase in fully distributed mode
- access and navigate the web-based management console for HBase
- get started with using the HBase shell
Architecture
- describe the HBase components and their functionalities
- describe the HFile and Region components and their functionalities in the HBase architecture
- describe the functionality of the WAL and MemStore in the HBase architecture
- describe minor and major compaction and region splitting
- describe how data replication is used in HBase
- identify the various methods to access HBase through clients
- secure HBase using authentication and authorization methods
- describe MapReduce and how it is integrated with HBase
Data Modeling
- describe the HBase schema
- identify the considerations and practices that go into designing an HBase table
- design rowkeys for HBase tables
- design the schema to support versions, different datatypes, and joins
- determine which rows and cells to keep after deletion from a table
Practice: Install HBase
- install, configure, and secure HBase