Operations Strategy: What approaches can be used to find errors and bugs in distributed applications, and how can solutions be devised for them?
Lab: Exploring Log Aggregation in Splunk
Making Big Data Secure
What is required to secure Big Data infrastructure?
How can centralized security management software, such as Kerberos and LDAP, be configured as part of a broader security architecture?
What special considerations are there for applications and users who need to access protected resources?
How are permissions and roles managed so that Big Data processing resources, such as Spark applications running on top of YARN or Kubernetes, can access data stored in HDFS or in object storage such as Amazon S3?
Lab: Configuring Secure Access to Big Data Resources
How does DevOps work in a data context?
Infrastructure: Version Control (Git, GitHub), Automation (Jenkins), Processing (Spark, Hadoop, YARN), Data Management (Kafka)
Process Differences: DataOps is more than DevOps applied to data
Lifecycle and Differences
Incorporating Complex Data Infrastructure into Continuous Integration/Deployment
Standardization of the runtime environment using containers
Accounting for Infrastructure Differences within IaC configuration
Incorporating orchestration to deploy and manage supporting components
Statistical Process Control (SPC) to ensure pipeline and model repeatability
GitHub: Source Code Forge
Docker and Jenkins: Continuous Integration
Spinnaker: Continuous Deployment
Lab: Continuous Integration of a Kafka Based Application Using Jenkins
Each student will receive a comprehensive set of materials, including course notes and all the class examples.
Experience in the following is required for this class: