Machine Learning with Apache Spark Training

This Machine Learning with Apache Spark training class provides an overview of data science algorithms and covers the theoretical and technical aspects of using the Apache Spark platform for machine learning. Hands-on labs throughout the course help attendees reinforce the material.

This course is meant for Data Scientists, Business Analysts, Software Developers, and IT Architects.

Goals
  1. Learn machine learning algorithms.
  2. Obtain an introduction to functional programming.
  3. Obtain an introduction to Apache Spark.
  4. Learn Spark Shell.
  5. Learn about the Spark Machine Learning Library.
  6. Learn text mining.

Outline
  1. Machine Learning Algorithms
    1. Supervised vs Unsupervised Machine Learning
    2. Supervised Machine Learning Algorithms
    3. Unsupervised Machine Learning Algorithms
    4. Choose the Right Algorithm
    5. Life-cycles of Machine Learning Development
    6. Classifying with k-Nearest Neighbors (SL)
    7. k-Nearest Neighbors Algorithm
    8. The Error Rate
    9. Decision Trees (SL)
    10. Random Forests
    11. Unsupervised Learning Type: Clustering
    12. K-Means Clustering (UL)
    13. K-Means Clustering in a Nutshell
    14. Regression Analysis
    15. Logistic Regression
    16. Summary
  2. Introduction to Functional Programming
    1. What is Functional Programming (FP)?
    2. Terminology: Higher-Order Functions
    3. Terminology: Lambda vs Closure
    4. A Short List of Languages that Support FP
    5. FP with Java
    6. FP with JavaScript
    7. Imperative Programming in JavaScript
    8. The JavaScript map (FP) Example
    9. The JavaScript reduce (FP) Example
    10. Using reduce to Flatten an Array of Arrays (FP) Example
    11. The JavaScript filter (FP) Example
    12. Common Higher-Order Functions in Python
    13. Common Higher-Order Functions in Scala
    14. Elements of FP in R
    15. Summary
  3. Introduction to Apache Spark
    1. What is Apache Spark
    2. A Short History of Spark
    3. Where to Get Spark?
    4. The Spark Platform
    5. Spark Logo
    6. Common Spark Use Cases
    7. Languages Supported by Spark
    8. Running Spark on a Cluster
    9. The Driver Process
    10. Spark Applications
    11. Spark Shell
    12. The spark-submit Tool
    13. The spark-submit Tool Configuration
    14. The Executor and Worker Processes
    15. The Spark Application Architecture
    16. Interfaces with Data Storage Systems
    17. Limitations of Hadoop's MapReduce
    18. Spark vs MapReduce
    19. Spark as an Alternative to Apache Tez
    20. The Resilient Distributed Dataset (RDD)
    21. Spark Streaming (Micro-batching)
    22. Spark SQL
    23. Example of Spark SQL
    24. Spark Machine Learning Library
    25. GraphX
    26. Spark vs. R
    27. Summary
  4. The Spark Shell
    1. The Spark Shell
    2. The Spark Shell UI
    3. Spark Shell Options
    4. Getting Help
    5. The Spark Context (sc) and SQL Context (sqlContext)
    6. The Shell Spark Context
    7. Loading Files
    8. Saving Files
    9. Basic Spark ETL Operations
    10. Summary
  5. The Spark Machine Learning Library
    1. What is MLlib?
    2. Supported Languages
    3. MLlib Packages
    4. Dense and Sparse Vectors
    5. Labeled Point
    6. Python Example of Using the LabeledPoint Class
    7. LIBSVM format
    8. An Example of a LIBSVM File
    9. Loading LIBSVM Files
    10. Local Matrices
    11. Example of Creating Matrices in MLlib
    12. Distributed Matrices
    13. Example of Using a Distributed Matrix
    14. Classification and Regression Algorithms
    15. Clustering
    16. Summary
  6. Text Mining
    1. What is Text Mining?
    2. The Common Text Mining Tasks
    3. What is Natural Language Processing (NLP)?
    4. Some of the NLP Use Cases
    5. Machine Learning in Text Mining and NLP
    6. Machine Learning in NLP
    7. TF-IDF
    8. The Feature Hashing Trick
    9. Stemming
    10. Example of Stemming
    11. Stop Words
    12. Popular Text Mining and NLP Libraries and Packages
    13. Summary
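
To give a sense of the functional programming topics listed in the outline (map, filter, reduce, and using reduce to flatten an array of arrays), here is a minimal Python sketch. It is an illustration only, not taken from the course materials; the variable names and sample data are assumptions chosen for the example.

```python
from functools import reduce

# Sample data (hypothetical, for illustration only)
numbers = [1, 2, 3, 4, 5]
nested = [[1, 2], [3], [4, 5]]

# map: apply a function to every element
squares = list(map(lambda n: n * n, numbers))

# filter: keep only the elements that satisfy a predicate
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce: fold a sequence into a single value, starting from 0
total = reduce(lambda acc, n: acc + n, numbers, 0)

# reduce can also flatten a list of lists by concatenating each part
flat = reduce(lambda acc, part: acc + part, nested, [])

print(squares, evens, total, flat)
```

The same three higher-order functions appear under similar names in JavaScript and Scala, and Spark's own collection operations follow the same pattern.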
Class Materials

Each student in our Live Online and our Onsite classes receives a comprehensive set of materials, including course notes and all the class examples.

Class Prerequisites

Experience in the following is required for this Spark class:

  • General knowledge of statistics and programming.

Training for your Team

Length: 1 Day
  • Private Class for your Team
  • Online or On-location
  • Customizable
  • Expert Instructors

What people say about our training

I really enjoyed this class. The instructor was great and extremely knowledgeable. I will definitely take the advanced class for Crystal XI.
Michelle Johnson
PCC Community Wellness Center
The Instructor was great, gave me a different point of view of ADA compliance.
Khanh Tran
Placentia Yorba Linda USD
This was my second course with Webucator and I found the instructor and manuals to be thorough, clear and extremely helpful. Quality and excellence! Thank you!
Yukiko Johnson
Northern Tier Oil Transport
This was an excellent class. The instructor is very knowledgeable with MS Project and provided more insight than the books offered. This class is very much worth it if you want to take an intro class to MS Project.
Ruan Riggs
Black Box Network Services

No cancelation for low enrollment

Certified Microsoft Partner

Registered Education Provider (R.E.P.)

GSA schedule pricing

61,621 students have taken Instructor-led Training

11,779 organizations trust Webucator for their Instructor-led training needs

100% satisfaction guarantee and retake option

Students rated our trainers 9.29 out of 10 based on 28,661 reviews

Contact Us or call 1-877-932-8228