Machine Learning with Apache Spark Training

This Machine Learning with Apache Spark training class gives an overview of data science algorithms and covers the theoretical and technical aspects of using the Apache Spark platform for machine learning. Hands-on labs throughout the course help attendees reinforce the material.

This course is meant for Data Scientists, Business Analysts, Software Developers, and IT Architects.

Goals
  1. Learn machine learning algorithms.
  2. Obtain an introduction to functional programming.
  3. Obtain an introduction to Apache Spark.
  4. Learn Spark Shell.
  5. Learn about the Spark Machine Learning Library.
  6. Learn text mining.
Outline
  1. Machine Learning Algorithms
    1. Supervised vs Unsupervised Machine Learning
    2. Supervised Machine Learning Algorithms
    3. Unsupervised Machine Learning Algorithms
    4. Choose the Right Algorithm
    5. Life-cycles of Machine Learning Development
    6. Classifying with k-Nearest Neighbors (SL)
    7. k-Nearest Neighbors Algorithm
    8. The Error Rate
    9. Decision Trees (SL)
    10. Random Forests
    11. Unsupervised Learning Type: Clustering
    12. K-Means Clustering (UL)
    13. K-Means Clustering in a Nutshell
    14. Regression Analysis
    15. Logistic Regression
    16. Summary
  2. Introduction to Functional Programming
    1. What is Functional Programming (FP)?
    2. Terminology: Higher-Order Functions
    3. Terminology: Lambda vs Closure
    4. A Short List of Languages that Support FP
    5. FP with Java
    6. FP with JavaScript
    7. Imperative Programming in JavaScript
    8. The JavaScript map (FP) Example
    9. The JavaScript reduce (FP) Example
    10. Using reduce to Flatten an Array of Arrays (FP) Example
    11. The JavaScript filter (FP) Example
    12. Common Higher-Order Functions in Python
    13. Common Higher-Order Functions in Scala
    14. Elements of FP in R
    15. Summary
  3. Introduction to Apache Spark
    1. What is Apache Spark?
    2. A Short History of Spark
    3. Where to Get Spark?
    4. The Spark Platform
    5. Spark Logo
    6. Common Spark Use Cases
    7. Languages Supported by Spark
    8. Running Spark on a Cluster
    9. The Driver Process
    10. Spark Applications
    11. Spark Shell
    12. The spark-submit Tool
    13. The spark-submit Tool Configuration
    14. The Executor and Worker Processes
    15. The Spark Application Architecture
    16. Interfaces with Data Storage Systems
    17. Limitations of Hadoop's MapReduce
    18. Spark vs MapReduce
    19. Spark as an Alternative to Apache Tez
    20. The Resilient Distributed Dataset (RDD)
    21. Spark Streaming (Micro-batching)
    22. Spark SQL
    23. Example of Spark SQL
    24. Spark Machine Learning Library
    25. GraphX
    26. Spark vs. R
    27. Summary
  4. The Spark Shell
    1. The Spark Shell
    2. The Spark Shell UI
    3. Spark Shell Options
    4. Getting Help
    5. The Spark Context (sc) and SQL Context (sqlContext)
    6. The Shell Spark Context
    7. Loading Files
    8. Saving Files
    9. Basic Spark ETL Operations
    10. Summary
  5. The Spark Machine Learning Library
    1. What is MLlib?
    2. Supported Languages
    3. MLlib Packages
    4. Dense and Sparse Vectors
    5. Labeled Point
    6. Python Example of Using the LabeledPoint Class
    7. LIBSVM Format
    8. An Example of a LIBSVM File
    9. Loading LIBSVM Files
    10. Local Matrices
    11. Example of Creating Matrices in MLlib
    12. Distributed Matrices
    13. Example of Using a Distributed Matrix
    14. Classification and Regression Algorithms
    15. Clustering
    16. Summary
  6. Text Mining
    1. What is Text Mining?
    2. The Common Text Mining Tasks
    3. What is Natural Language Processing (NLP)?
    4. Some of the NLP Use Cases
    5. Machine Learning in Text Mining and NLP
    6. Machine Learning in NLP
    7. TF-IDF
    8. The Feature Hashing Trick
    9. Stemming
    10. Example of Stemming
    11. Stop Words
    12. Popular Text Mining and NLP Libraries and Packages
    13. Summary
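
As a small taste of the functional style listed in module 2 of the outline, the following is a minimal Python sketch of map, filter, and reduce, including the use of reduce to flatten a list of lists. The sample data and variable names are illustrative assumptions and are not taken from the course materials.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
readings = [3, 8, 1, 10, 5]
nested = [[1, 2], [3], [4, 5, 6]]

# map: apply a function to every element.
squared = list(map(lambda x: x * x, readings))

# filter: keep only the elements that satisfy a predicate.
large = list(filter(lambda x: x > 4, readings))

# reduce: fold the whole sequence into a single value.
total = reduce(lambda acc, x: acc + x, readings, 0)

# reduce can also flatten a list of lists by concatenating as it folds.
flat = reduce(lambda acc, xs: acc + xs, nested, [])

print(squared, large, total, flat)
```

The same three higher-order functions reappear as transformations and actions in Spark, which is one reason the outline introduces functional programming before the Spark modules.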
Class Materials

Each student in our Live Online and our Onsite classes receives a comprehensive set of materials, including course notes and all the class examples.

Class Prerequisites

Experience in the following is required for this Spark class:

  • General knowledge of statistics and programming.

Training for your Team

Length: 1 Day
  • Private Class for your Team
  • Online or On-location
  • Customizable
  • Expert Instructors
