Hadoop Programming on the Hortonworks Data Platform for Managers Training

This training course introduces students to Apache Hadoop and key Hadoop ecosystem projects: Pig, Hive, Sqoop, and Spark. It is appropriate for managers, business analysts, and IT architects.

Location

Public Classes: Delivered live online via WebEx and guaranteed to run. Join from anywhere!

Private Classes: Delivered at your offices, or any other location of your choice.

Goals
  1. Learn the lab environment.
  2. Get started with Apache Ambari.
  3. Learn the Hadoop Distributed File System.
  4. Get started with Apache Pig.
  5. Work with Data Sets in Apache Pig.
  6. Work with the Hive and Beeline Shells.
  7. Work with Hive Data Definition Language.
  8. Work with the Spark Shell.
Outline
  1. MapReduce Overview
    1. The Client-Server Processing Pattern
    2. Distributed Computing Challenges
    3. MapReduce Defined
    4. Google's MapReduce
    5. The Map Phase of MapReduce
    6. The Reduce Phase of MapReduce
    7. MapReduce Explained
    8. MapReduce Word Count Job
    9. MapReduce Shared-Nothing Architecture
    10. Similarity with SQL Aggregation Operations
    11. Example of Map & Reduce Operations using JavaScript
    12. Problems Suitable for Solving with MapReduce
    13. Typical MapReduce Jobs
    14. Fault-tolerance of MapReduce
    15. Distributed Computing Economics
    16. MapReduce Systems
    17. Summary
  2. Hadoop Overview
    1. Apache Hadoop
    2. Apache Hadoop Logo
    3. Typical Hadoop Applications
    4. Hadoop Clusters
    5. Hadoop Design Principles
    6. Hadoop Versions
    7. Hadoop's Main Components
    8. Hadoop Simple Definition
    9. Side-by-Side Comparison: Hadoop 1 and Hadoop 2
    10. Hadoop-based Systems for Data Analysis
    11. Other Hadoop Ecosystem Projects
    12. Hadoop Caveats
    13. Hadoop Distributions
    14. Cloudera Distribution of Hadoop (CDH)
    15. Cloudera Distributions
    16. Hortonworks Data Platform (HDP)
    17. MapR
    18. Summary
  3. Hadoop Distributed File System Overview
    1. Hadoop Distributed File System (HDFS)
    2. HDFS High Availability
    3. HDFS "Fine Print"
    4. Storing Raw Data in HDFS
    5. Hadoop Security
    6. HDFS Rack-awareness
    7. Data Blocks
    8. Data Block Replication Example
    9. HDFS NameNode Directory Diagram
    10. Accessing HDFS
    11. Examples of HDFS Commands
    12. Other Supported File Systems
    13. WebHDFS
    14. Examples of WebHDFS Calls
    15. Client Interactions with HDFS for the Read Operation
    16. Read Operation Sequence Diagram
    17. Client Interactions with HDFS for the Write Operation
    18. Communication inside HDFS
    19. Summary
  4. Apache Pig Scripting Platform
    1. What is Pig?
    2. Pig Latin
    3. Apache Pig Logo
    4. Pig Execution Modes
    5. Local Execution Mode
    6. MapReduce Execution Mode
    7. Running Pig
    8. Running Pig in Batch Mode
    9. What is Grunt?
    10. Pig Latin Statements
    11. Pig Programs
    12. Pig Latin Script Example
    13. SQL Equivalent
    14. Differences between Pig and SQL
    15. Statement Processing in Pig
    16. Comments in Pig
    17. Supported Simple Data Types
    18. Supported Complex Data Types
    19. Arrays
    20. Defining a Relation's Schema
    21. Not Matching the Defined Schema
    22. The bytearray Generic Type
    23. Using Field Delimiters
    24. Loading Data with TextLoader()
    25. Referencing Fields in Relations
    26. Summary
  5. Apache Pig HDFS Interface
    1. The HDFS Interface
    2. FSShell Commands (Short List)
    3. Grunt's Old File System Commands
    4. Summary
  6. Apache Pig Relational and Eval Operators
    1. Pig Relational Operators
    2. Example of Using the JOIN Operator
    3. Example of Using the Order By Operator
    4. Caveats of Using Relational Operators
    5. Pig Eval Functions
    6. Caveats of Using Eval Functions (Operators)
    7. Example of Using Single-column Eval Operations
    8. Example of Using Eval Operators For Global Operations
    9. Summary
  7. Hive
    1. What is Hive?
    2. Apache Hive Logo
    3. Hive's Value Proposition
    4. Who uses Hive?
    5. Hive's Main Sub-Systems
    6. Hive Features
    7. The "Classic" Hive Architecture
    8. The New Hive Architecture
    9. HiveQL
    10. Where are the Hive Tables Located?
    11. Hive Command-line Interface (CLI)
    12. The Beeline Command Shell
    13. Summary
  8. Hive Command-line Interface
    1. Hive Command-line Interface (CLI)
    2. The Hive Interactive Shell
    3. Running Host OS Commands from the Hive Shell
    4. Interfacing with HDFS from the Hive Shell
    5. The Hive in Unattended Mode
    6. The Hive CLI Integration with the OS Shell
    7. Executing HiveQL Scripts
    8. Comments in Hive Scripts
    9. Variables and Properties in Hive CLI
    10. Setting Properties in CLI
    11. Example of Setting Properties in CLI
    12. Hive Namespaces
    13. Using the SET Command
    14. Setting Properties in the Shell
    15. Setting Properties for the New Shell Session
    16. Setting Alternative Hive Execution Engines
    17. The Beeline Shell
    18. Connecting to the Hive Server in Beeline
    19. Beeline Command Switches
    20. Beeline Internal Commands
    21. Summary
  9. Hive Data Definition Language
    1. Hive Data Definition Language
    2. Creating Databases in Hive
    3. Using Databases
    4. Creating Tables in Hive
    5. Supported Data Type Categories
    6. Common Numeric Types
    7. String and Date / Time Types
    8. Miscellaneous Types
    9. Example of the CREATE TABLE Statement
    10. Working with Complex Types
    11. Table Partitioning
    12. Table Partitioning
    13. Table Partitioning on Multiple Columns
    14. Viewing Table Partitions
    15. Row Format
    16. Data Serializers / Deserializers
    17. File Format Storage
    18. File Compression
    19. More on File Formats
    20. The ORC Data Format
    21. Converting Text to ORC Data Format
    22. The EXTERNAL DDL Parameter
    23. Example of Using EXTERNAL
    24. Creating an Empty Table
    25. Dropping a Table
    26. Table / Partition(s) Truncation
    27. Alter Table/Partition/Column
    28. Views
    29. Create View Statement
    30. Why Use Views?
    31. Restricting Amount of Viewable Data
    32. Examples of Restricting Amount of Viewable Data
    33. Creating and Dropping Indexes
    34. Describing Data
    35. Summary
  10. Hive Data Manipulation Language
    1. Hive Data Manipulation Language (DML)
    2. Using the LOAD DATA statement
    3. Example of Loading Data into a Hive Table
    4. Loading Data with the INSERT Statement
    5. Appending and Replacing Data with the INSERT Statement
    6. Examples of Using the INSERT Statement
    7. Multi Table Inserts
    8. Multi Table Inserts Syntax
    9. Multi Table Inserts Example
    10. Summary
  11. Apache Sqoop
    1. What is Sqoop?
    2. Apache Sqoop Logo
    3. Sqoop Import / Export
    4. Sqoop Help
    5. Examples of Using Sqoop Commands
    6. Data Import Example
    7. Fine-tuning Data Import
    8. Controlling the Number of Import Processes
    9. Data Splitting
    10. Helping Sqoop Out
    11. Example of Executing Sqoop Load in Parallel
    12. A Word of Caution: Avoid Complex Free-Form Queries
    13. Using Direct Export from Databases
    14. Example of Using Direct Export from MySQL
    15. More on Direct Mode Import
    16. Changing Data Types
    17. Example of Default Types Overriding
    18. File Formats
    19. The Apache Avro Serialization System
    20. Binary vs Text
    21. More on the SequenceFile Binary Format
    22. Generating the Java Table Record Source Code
    23. Data Export from HDFS
    24. Export Tool Common Arguments
    25. Data Export Control Arguments
    26. Data Export Example
    27. Using a Staging Table
    28. INSERT and UPDATE Statements
    29. INSERT Operations
    30. UPDATE Operations
    31. Example of the Update Operation
    32. Failed Exports
    33. Sqoop2
    34. Sqoop2 Architecture
    35. Summary
  12. Introduction to Apache Spark
    1. What is Apache Spark?
    2. A Short History of Spark
    3. Where to Get Spark?
    4. The Spark Platform
    5. Spark Logo
    6. Common Spark Use Cases
    7. Languages Supported by Spark
    8. Running Spark on a Cluster
    9. The Driver Process
    10. Spark Applications
    11. Spark Shell
    12. The spark-submit Tool
    13. The spark-submit Tool Configuration
    14. The Executor and Worker Processes
    15. The Spark Application Architecture
    16. Interfaces with Data Storage Systems
    17. Limitations of Hadoop's MapReduce
    18. Spark vs MapReduce
    19. Spark as an Alternative to Apache Tez
    20. The Resilient Distributed Dataset (RDD)
    21. Spark Streaming (Micro-batching)
    22. Spark SQL
    23. Example of Spark SQL
    24. Spark Machine Learning Library
    25. GraphX
    26. Spark vs R
    27. Summary
  13. The Spark Shell
    1. The Spark Shell
    2. The Spark Shell UI
    3. Spark Shell Options
    4. Getting Help
    5. The Spark Context (sc) and SQL Context (sqlContext)
    6. The Shell Spark Context
    7. Loading Files
    8. Saving Files
    9. Basic Spark ETL Operations
    10. Summary
Class Materials

Each student in our Live Online and our Onsite classes receives a comprehensive set of materials, including course notes and all the class examples.

Class Prerequisites

Experience in the following is required for this Hadoop class:

  • General knowledge of programming.

Training for Yourself

$1,250.00 or 2 vouchers

Training for your Team

Length: 2 Days
  • Private Class for your Team
  • Online or On-location
  • Customizable
  • Expert Instructors

What people say about our training

"Far superior to other classes we have taken!"
Debbie Kurzhals, Power Engineers

"A semester's worth of class in one solid, easy-to-follow, day!"
Nathan Woolard, University of Central Oklahoma

"The class was exceptional. I was surprised at how effective online training could be."
Clint Sorensen, Entaire Global Companies, Inc.

"Our instructor was fun and engaging and kept the day moving right along."
Gary Bonnell, State Farm Insurance

No cancelation for low enrollment

Certified Microsoft Partner

Registered Education Provider (R.E.P.)

GSA schedule pricing

63,830 students have taken instructor-led training

11,921 organizations trust Webucator for their instructor-led training needs

100% satisfaction guarantee and retake option

Students rated our trainers 9.30 out of 10 based on 29,938 reviews

Contact Us or call 1-877-932-8228