MOC 20773 - Analyzing Big Data with Microsoft R
In this Analyzing Big Data with Microsoft R training class, students learn to use Microsoft R Server to create and run an analysis on a large dataset, and to apply it in big data environments such as a Hadoop or Spark cluster or a SQL Server database.
The primary audience for this course is people who wish to analyze large data sets within a big data environment. The secondary audience is developers who need to integrate R analyses into their solutions.
Microsoft Certified Partner
Webucator is a Microsoft Certified Partner for Learning Solutions (CPLS). This class uses official Microsoft courseware and will be delivered by a Microsoft Certified Trainer (MCT).
- Learn to explain how Microsoft R Server and Microsoft R Client work.
- Learn to use R Client with R Server to explore big data held in different data stores.
- Learn to visualize data by using graphs and plots.
- Learn to transform and clean big data sets.
- Learn to implement options for splitting analysis jobs into parallel tasks.
- Learn to build and evaluate regression models generated from big data.
- Learn to create, score, and deploy partitioning models generated from big data.
- Learn to use R in the SQL Server and Hadoop environments.
- Microsoft R Server and R Client
- What is Microsoft R Server
- Using Microsoft R Client
- The ScaleR functions
- Lab: Exploring Microsoft R Server and Microsoft R Client
- Using R Client in VSTR and RStudio
- Exploring ScaleR functions
- Connecting to a remote server
- Exploring Big Data
- Understanding ScaleR data sources
- Reading data into an XDF object
- Summarizing data in an XDF object
- Lab: Exploring Big Data
- Reading a local CSV file into an XDF file
- Transforming data on input
- Reading data from SQL Server into an XDF file
- Generating summaries over the XDF data
- Visualizing Big Data
- Visualizing in-memory data
- Visualizing big data
- Lab: Visualizing data
- Using ggplot to create a faceted plot with overlays
- Using rxLinePlot and rxHistogram
- Processing Big Data
- Transforming Big Data
- Managing datasets
- Lab: Processing big data
- Transforming big data
- Sorting and merging big data
- Connecting to a remote server
- Parallelizing Analysis Operations
- Using the RxLocalParallel compute context with rxExec
- Using the RevoPemaR package
- Lab: Using rxExec and RevoPemaR to parallelize operations
- Using rxExec to maximize resource use
- Creating and using a PEMA class
- Creating and Evaluating Regression Models
- Clustering Big Data
- Generating regression models and making predictions
- Lab: Creating a linear regression model
- Creating a cluster
- Creating a regression model
- Generating data for making predictions
- Using the models to make predictions and comparing the results
- Creating and Evaluating Partitioning Models
- Creating partitioning models based on decision trees
- Testing partitioning models by making and comparing predictions
- Lab: Creating and evaluating partitioning models
- Splitting the dataset
- Building models
- Running predictions and testing the results
- Comparing results
- Processing Big Data in SQL Server and Hadoop
- Using R in SQL Server
- Using Hadoop MapReduce
- Using Hadoop Spark
- Lab: Processing big data in SQL Server and Hadoop
- Creating a model and predicting outcomes in SQL Server
- Performing an analysis and plotting the results using Hadoop MapReduce
- Integrating a sparklyr script into a ScaleR workflow
Each student in our Live Online and our Onsite classes receives a comprehensive set of materials, including course notes and all the class examples.
Experience in the following is required for this Microsoft Big Data class:
- Programming experience using R, and familiarity with common R packages.
- Knowledge of common statistical methods and data analysis best practices.
- Basic knowledge of the Microsoft Windows operating system and its core functionality.
- Working knowledge of relational databases.