
Introduction to AI, Data Science & Machine Learning with Python (DSC112)

5 days
Python, Data, AI

Learn core data science with Python: data prep and visualization, NLP for unstructured data, ML models (regression, classification, clustering) and ethics.

Register or Request Training

Price per student: $3,283.00
Guaranteed to run
  • Private class for your team
  • Live expert instructor
  • Online or on‑location
  • Customizable agenda
  • Proposal turnaround within 1–2 business days

Course Overview

This foundational course introduces the data science lifecycle and the role of the data scientist in turning business questions into analytics, machine learning (ML), and AI solutions.

You will work with Python and key libraries to import, explore, clean, and visualize data, including handling missing values and standardizing or normalizing features. You will also learn practical approaches for preparing unstructured text for analysis using common natural language processing (NLP) techniques and term-document matrices. Finally, you will explore the concepts and architectures behind foundation models, GPTs, and retrieval-augmented generation (RAG), including how large language models (LLMs) can be integrated into data science work.

The course then covers core ML approaches: linear regression and feature engineering, decision tree classification and evaluation, alternative classifiers (logistic regression and Naive Bayes), neural networks and deep learning concepts, clustering (k-means and hierarchical), association rules, recommender systems, and network analysis. It concludes with big data and cloud approaches, along with the communication and ethics considerations that practicing data scientists face.

Course Benefits

  • Describe the data science role, required skillset, and the data science lifecycle within an organization.
  • Translate business questions into AI/ML approaches and identify relevant data sources.
  • Use Python (including pandas) to import, explore, manipulate, and prepare data for analysis.
  • Clean and preprocess data (duplicates, missing values, rescaling, standardizing, and normalizing).
  • Create effective visualizations for exploration and communication using pandas, matplotlib, and seaborn.
  • Preprocess unstructured text for ML using NLP techniques (e.g., stemming, stop words) and build term-document matrices.
  • Explain key concepts behind foundation models, GPTs, and RAG, and identify ways to integrate LLMs into data science workflows.
  • Build and evaluate ML models in Python, including linear regression (e.g., RMSE), decision trees, logistic regression, and Naive Bayes.
  • Apply clustering (k-means and hierarchical) to segment data, including unstructured text.
  • Create association-rule models, evaluate them with support/confidence/lift, and develop recommenders.
  • Analyze and visualize networks to uncover relationship-based insights.
  • Recognize cloud/big data approaches and ethical considerations in modern AI and data science.

Delivery Methods

Public Class
Live expert-led online training from anywhere. Guaranteed to run.
Private Class
Delivered for your team at your site or online.

Course Outline

  1. The Role of a Data Scientist and the Data Science Lifecycle
    1. Required skillset of a data scientist
    2. Combining technical and non-technical roles
    3. Data scientist vs. data engineer
    4. Lifecycle of data science efforts within an organization
    5. Turning business questions into ML/AI models
    6. Diverse data sources for answering business questions
    7. Concepts behind foundation models, GPTs, and RAG
  2. Python for Data Science: Data Access, Preparation, and Visualization
    1. Python features relevant to data scientists and data engineers
    2. Viewing datasets with pandas
    3. Importing, exporting, and working with data (relational databases to images)
    4. Selecting, filtering, combining, grouping, and applying functions in pandas
    5. Duplicates, missing values, rescaling, standardizing, and normalizing
    6. Visualization with pandas, matplotlib, and seaborn
  3. Unstructured Data and NLP for AI/ML
    1. Preprocessing unstructured data (web adverts, emails, blog posts)
    2. NLP approaches (stemming, stop words)
    3. Building a term-document matrix (TDM)
    4. Architectures of foundation models, GPTs, and RAG
    5. Integrating LLMs into data science work
  4. Regression Modeling with Linear Regression
    1. Expressing business problems (e.g., revenue prediction) as linear regression
    2. Assessing variables as predictors of a target
    3. Evaluating linear regression models in Python (e.g., RMSE)
    4. Feature engineering to improve regression models
  5. Classification with Decision Trees
    1. How classifiers are built and used (e.g., customer churn)
    2. Training, validation, and test data sets
    3. Evaluating a decision tree classifier
  6. Alternative Classification Approaches and Evaluation
    1. Alternative approaches to classification
    2. Activation functions and logistic regression classifiers
    3. Neural network architectures and deep learning concepts
    4. Probability foundations of Naive Bayes classifiers
    5. Measuring classification performance
    6. ROC curves, AUC, precision, recall, and confusion matrices
  7. Clustering and Segmentation
    1. Customer/product/service segmentation with clustering algorithms
    2. Similarity and distance measures
    3. Top-down clustering with scikit-learn k-means
    4. Bottom-up clustering with hierarchical clustering
    5. Clustering unstructured data (tweets, emails, documents)
  8. Association Rules and Recommender Systems
    1. Modeling behaviors/events from logged data with association rules
    2. Support, confidence, and lift
    3. Feature engineering to improve models
    4. Building a recommender unique to your product/service offering
  9. Network Analysis
    1. Organizations and environments as networks of inter-relationships
    2. Visualizing relationships to uncover insights
    3. Ego-centric and socio-centric network analysis
  10. Big Data, Cloud Approaches, Communication, and Ethics
    1. Cloud approaches (Microsoft, Amazon, Google) for big data analytics
    2. Communication and ethics aspects of being a data scientist
    3. Ethical implications of recent developments in AI
    4. Continual learning paths for data scientists
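
For a sense of what module 8 covers, the support, confidence, and lift measures for an association rule can be sketched in a few lines of plain Python. This is an illustrative sketch only, not taken from the course materials: the function name `rule_metrics` and the sample shopping baskets are hypothetical, and the class itself may use different libraries or data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper for illustration; each transaction is a set of items.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= set(t))           # baskets containing the antecedent
    count_c = sum(1 for t in transactions if c <= set(t))           # baskets containing the consequent
    count_both = sum(1 for t in transactions if (a | c) <= set(t))  # baskets containing both
    if n == 0 or count_a == 0 or count_c == 0:
        raise ValueError("rule metrics are undefined for this data")
    support = count_both / n              # how often the full rule appears
    confidence = count_both / count_a     # estimate of P(consequent | antecedent)
    lift = confidence / (count_c / n)     # confidence relative to the consequent's base rate
    return support, confidence, lift


# Made-up sample data: five shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

A lift below 1, as in this made-up sample, suggests the antecedent makes the consequent slightly less likely than its base rate; a lift above 1 suggests a positive association.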

Class Materials

Each student receives a comprehensive set of materials, including course notes and all class examples.

Have questions about this course?

We can help with curriculum details, delivery options, pricing, or anything else. Reach out and we’ll point you in the right direction.
