MOC 20775 - Performing Data Engineering on Microsoft HD Insight

This MOC 20775 training class, Performing Data Engineering on Microsoft HD Insight, teaches students to plan and implement big data workflows on HDInsight.

The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

  Microsoft Certified Partner

Webucator is a Microsoft Certified Partner for Learning Solutions (CPLS). This class uses official Microsoft courseware and will be delivered by a Microsoft Certified Trainer (MCT).

Goals
  1. Learn to deploy HDInsight clusters.
  2. Learn to authorize users to access resources.
  3. Learn to load data into HDInsight.
  4. Learn to troubleshoot HDInsight.
  5. Learn to implement batch solutions.
  6. Learn to design batch ETL solutions for big data with Spark.
  7. Learn to analyze data with Spark SQL.
  8. Learn to analyze data with Hive and Phoenix.
  9. Learn to describe Stream Analytics.
  10. Learn to implement Spark streaming using the DStream API.
  11. Learn to develop big data real-time processing solutions with Apache Storm.
  12. Learn to build solutions that use Kafka and HBase.
Outline
  1. Getting Started with HDInsight
    1. What is Big Data?
    2. Introduction to Hadoop
    3. Working with MapReduce Function
    4. Introducing HDInsight
    5. Lab: Working with HDInsight
      1. Provision an HDInsight cluster and run MapReduce jobs
  2. Deploying HDInsight Clusters
    1. Identifying HDInsight cluster types
    2. Managing HDInsight clusters by using the Azure portal
    3. Managing HDInsight Clusters by using Azure PowerShell
    4. Lab: Managing HDInsight clusters with the Azure Portal
      1. Create an HDInsight cluster that uses Data Lake Store storage
      2. Customize HDInsight by using script actions
      3. Delete an HDInsight cluster
  3. Authorizing Users to Access Resources
    1. Non-domain joined clusters
    2. Configuring domain-joined HDInsight clusters
    3. Manage domain-joined HDInsight clusters
    4. Lab: Authorizing Users to Access Resources
      1. Prepare the Lab Environment
      2. Manage a non-domain joined cluster
  4. Loading data into HDInsight
    1. Storing data for HDInsight processing
    2. Using data loading tools
    3. Maximising value from stored data
    4. Lab: Loading Data into your Azure account
      1. Load data for use with HDInsight
  5. Troubleshooting HDInsight
    1. Analyze HDInsight logs
    2. YARN logs
    3. Heap dumps
    4. Operations Management Suite
    5. Lab: Troubleshooting HDInsight
      1. Analyze HDInsight logs
      2. Analyze YARN logs
      3. Monitor resources with Operations Management Suite
  6. Implementing Batch Solutions
    1. Apache Hive storage
    2. HDInsight data queries using Hive and Pig
    3. Operationalize HDInsight
    4. Lab: Implement Batch Solutions
      1. Deploy HDInsight cluster and data storage
      2. Use data transfers with HDInsight clusters
      3. Query HDInsight cluster data
  7. Design Batch ETL solutions for big data with Spark
    1. What is Spark?
    2. ETL with Spark
    3. Spark performance
    4. Lab: Design Batch ETL solutions for big data with Spark
      1. Create an HDInsight cluster with access to Data Lake Store
      2. Use HDInsight Spark cluster to analyze data in Data Lake Store
      3. Analyzing website logs using a custom library with Apache Spark cluster on HDInsight
      4. Managing resources for Apache Spark cluster on Azure HDInsight
  8. Analyze Data with Spark SQL
    1. Implementing iterative and interactive queries
    2. Perform exploratory data analysis
    3. Lab: Performing exploratory data analysis by using iterative and interactive queries
      1. Build a machine learning application
      2. Use Zeppelin for interactive data analysis
      3. View and manage Spark sessions by using Livy
  9. Analyze Data with Hive and Phoenix
    1. Implement interactive queries for big data with interactive Hive
    2. Perform exploratory data analysis by using Hive
    3. Perform interactive processing by using Apache Phoenix
    4. Lab: Analyze data with Hive and Phoenix
      1. Implement interactive queries for big data with interactive Hive
      2. Perform exploratory data analysis by using Hive
      3. Perform interactive processing by using Apache Phoenix
  10. Stream Analytics
    1. Stream Analytics
    2. Process streaming data from Stream Analytics
    3. Managing Stream Analytics jobs
    4. Lab: Implement Stream Analytics
      1. Process streaming data with Stream Analytics
      2. Managing Stream Analytics jobs
  11. Implementing Streaming Solutions with Kafka and HBase
    1. Building and Deploying a Kafka Cluster
    2. Publishing, Consuming, and Processing data using the Kafka Cluster
    3. Using HBase to Store and Query Data
    4. Lab: Implementing Streaming Solutions with Kafka and HBase
      1. Create a virtual network and gateway
      2. Create a Storm cluster for Kafka
      3. Create a Kafka producer
      4. Create a streaming processor client topology
      5. Create a Power BI dashboard and streaming dataset
      6. Create an HBase cluster
      7. Create a streaming processor to write to HBase
  12. Develop big data real-time processing solutions with Apache Storm
    1. Persist long term data
    2. Stream data with Storm
    3. Create Storm topologies
    4. Configure Apache Storm
    5. Lab: Developing big data real-time processing solutions with Apache Storm
      1. Stream data with Storm
      2. Create Storm Topologies
  13. Create Spark Streaming Applications
    1. Working with Spark Streaming
    2. Creating Spark Structured Streaming Applications
    3. Persistence and Visualization
    4. Lab: Building a Spark Streaming Application
      1. Installing Required Software
      2. Building the Azure Infrastructure
      3. Building a Spark Streaming Pipeline
Class Materials

Each student in our Live Online and our Onsite classes receives a comprehensive set of materials, including course notes and all the class examples.

Class Prerequisites

Experience in the following is required for this HDInsight class:

  • Programming experience using R, and familiarity with common R packages.
  • Knowledge of common statistical methods and data analysis best practices.
  • Basic knowledge of the Microsoft Windows operating system and its core functionality.
  • Working knowledge of relational databases.

Training for your Team

Length: 5 Days
  • Private Class for your Team
  • Online or On-location
  • Customizable
  • Expert Instructors
