Course Overview
About Course
AIOps (Artificial Intelligence for IT Operations) applies AI and machine learning to IT monitoring data to automate and enhance operations. It ingests and correlates massive volumes of logs, metrics and events to surface meaningful signals, diagnose root causes, and even trigger automated remediation. For example, IBM explains that an AIOps platform uses AI (ML, NLP, etc.) to “ingest and aggregate” huge IT data volumes, identify significant events, and autonomously resolve issues. Gartner similarly notes that AIOps “combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination”.
This curriculum is aimed at IT professionals, DevOps engineers, SREs and data scientists who need to manage complex infrastructure. It progresses from foundational concepts (big data, machine learning, observability) to advanced tool-specific skills. We train on leading AIOps platforms – Splunk (ITSI and Observability Cloud), Dynatrace (Davis AI), Moogsoft (APEX AIOps) and IBM Watson AIOps (Cloud Pak) – with hands-on labs. Participants will be prepared to pursue vendor certifications (for example, Splunk ITSI Certified Admin, Dynatrace Associate/Professional, IBM Cloud Pak Watson AIOps Administrator) and to apply AIOps to real-world problems.
Benefits: Trainees learn to reduce alert noise and outages by applying AI to monitoring data. AIOps skills support faster incident detection, deeper system visibility, and predictive insights. Organizations adopting AIOps can see fewer outages and lower mean time to resolution (MTTR), while engineers gain in-demand expertise. (For instance, Moogsoft claims its platform “detects incidents before they happen” using ML-based correlation, and Dynatrace touts Davis AI for “precise root-cause analysis” and proactive issue prevention.)
Course Syllabus
Module 1: Introduction to AIOps & Foundations (3h)
Description: This module defines AIOps and its role in modern IT operations. Topics include the history of AIOps, the challenges of legacy monitoring, and how AI/ML solves them. We cover key terms (big data, observability, ML models), AIOps use cases (event correlation, anomaly detection, root cause analysis), and benefits (faster detection, reduced noise). A brief overview of relevant certifications (DevOps Institute AIOps Foundation, vendor certs) is given.
Objectives: By the end, participants will
- Understand what AIOps is and why it matters (AI-driven IT operations).
- Recognize common pain points (alert storms, siloed data) that AIOps addresses.
- Describe core AIOps capabilities (data ingestion, analytics, automation).
- Identify major AIOps tools (Splunk ITSI, Dynatrace, Moogsoft, IBM Watson AIOps).
Module 2: Data & Machine Learning Fundamentals (4h)
Description: AIOps relies on data science. This module covers foundational concepts: data types (logs, metrics, traces), data ingestion and pipelines, and big data technologies (Elasticsearch, Hadoop, time-series DBs). We introduce statistics and ML basics: supervised vs. unsupervised learning, classification, clustering, and anomaly detection algorithms. We discuss selecting the right models for IT data (for example, clustering alerts, predicting load) and how ML supports AIOps tasks.
Objectives: Participants will be able to
- Explain the types and volume of data needed for AIOps (logs, metrics, events).
- Describe basic ML techniques used in AIOps (clustering for alert correlation, time-series forecasting for anomaly detection).
- Demonstrate creating a simple ML model (e.g. anomaly detector) on sample monitoring data.
- Understand how ML underpins alert reduction and predictive insights in AIOps tools.
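As an illustration of the "simple anomaly detector" objective, here is a minimal sketch that flags points deviating sharply from a trailing window of recent values (a rolling z-score). The function name, window size, threshold and sample CPU readings are all assumptions chosen for the example; the lab may use a different technique or library.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilisation samples (percent), one per minute.
cpu_percent = [41, 43, 42, 44, 40, 42, 43, 95, 44, 41]
print(rolling_zscore_anomalies(cpu_percent))  # → [7]
```

Only the spike at index 7 is flagged. Note that the spike then inflates the window's standard deviation for the next few points, which is one reason production detectors often use more robust statistics.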
Module 3: Monitoring, Metrics & Observability Basics (3h)
Description: Before AI can help, data must be collected. This module reviews IT monitoring fundamentals: metrics, logs, events, traces, and service-level indicators. We cover modern observability frameworks and tools (Prometheus, OpenTelemetry, Grafana, ELK, etc.). Participants learn how to instrument applications and infrastructure to feed data into AIOps platforms. We also cover incident management principles (ITIL/DevOps, SRE concepts), since AIOps augments these workflows.
Objectives: After this module trainees will
- Understand observability pillars (logs, metrics, traces) and how to instrument systems.
- Know popular monitoring tools and how they integrate with AIOps platforms.
- Explain how effective observability feeds AIOps analytics (e.g. metrics baselining for anomaly detection).
- Recognize the connection between AIOps and incident management processes (ticketing, RCA).
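To make "metrics baselining" concrete, the sketch below builds a per-hour-of-day baseline from historical samples and checks whether a new reading falls outside a tolerance band around it. This is a simplified illustration, not any vendor's baselining algorithm; the function names, the 50% tolerance and the request-rate figures are assumptions.

```python
from collections import defaultdict
from statistics import median


def hourly_baseline(samples):
    """samples: iterable of (hour_of_day, value). Returns {hour: median value}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: median(vals) for hour, vals in by_hour.items()}


def deviates(baseline, hour, value, tolerance=0.5):
    """True if value differs from that hour's baseline by more than
    `tolerance`, expressed as a fraction of the baseline."""
    expected = baseline[hour]
    return abs(value - expected) > tolerance * expected


# Hypothetical requests-per-second history: busy at 09:00, quiet at 03:00.
history = [(9, 100), (9, 110), (9, 90), (3, 10), (3, 12), (3, 8)]
baseline = hourly_baseline(history)

print(deviates(baseline, 9, 100))  # → False (normal for 09:00)
print(deviates(baseline, 3, 100))  # → True (same value is abnormal at 03:00)
```

The same reading is normal at one hour and anomalous at another, which is why a static threshold tends to either miss problems or generate noise.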
Module 4: Splunk AIOps (ITSI & Observability) (5h)
Description: We dive into Splunk’s AIOps offerings. First we cover Splunk IT Service Intelligence (ITSI), its features for monitoring applications and infrastructure. Topics include configuring ITSI services and entities, using glass tables and deep dives, and built-in ML analytics (episode detection, KPI baselining). We demonstrate anomaly detection and event correlation in ITSI. Next, we explore Splunk Observability Cloud (including Infrastructure Monitoring and AI assistant tools), showing how Splunk ingests logs, metrics and traces for AIOps. Labs involve setting up Splunk data inputs, creating dashboards, and using AI-driven alerts. Throughout, we show how Splunk uses AI/ML to reduce noise and predict incidents.
Objectives: Participants will
- Install/configure Splunk ITSI and Observability; onboard sample data.
- Define services and entities in ITSI and build glass tables.
- Use Splunk’s Machine Learning Toolkit or AI features for anomaly detection and root-cause suggestions.
- Manage alerts and events: use ITSI’s event analytics to group and correlate events.
- Prepare for Splunk certifications (e.g. ITSI Admin) by mastering architecture, searches, and dashboards.
Module 5: Dynatrace Observability & AIOps (Davis AI) (5h)
Description: This module covers Dynatrace, a full-stack observability platform powered by its Davis AI engine. We start with core Dynatrace concepts: OneAgent installation, auto-discovery, and Smartscape topology. Then we explore Davis AI’s capabilities: automatic root-cause analysis (RCA), anomaly detection, and predictive insights. We show how Dynatrace continuously auto-baselines metrics and automatically groups related problems. Participants practice using Davis CoPilot for natural-language querying and generating remediation playbooks. The lab includes deploying Dynatrace on sample environments, viewing the Davis dashboard, and interpreting AI-powered problem tickets. We also discuss Dynatrace APIs and the Dynatrace Query Language (DQL). By the end of the module, attendees understand how Dynatrace uses causal AI to pinpoint issues.
Objectives: By the end of this module, participants will
- Navigate the Dynatrace platform UI and topology graph.
- Interpret Davis AI problem tickets and root-cause analysis reports.
- Configure alert thresholds and problem notifications.
- Use Dynatrace AI features for forecasting (e.g. predictive auto-scaling).
- Prepare for Dynatrace certification exams by learning key topics (architecture, problem resolution).
Module 6: Moogsoft AIOps (Dell APEX Incident Management) (5h)
Description: We explore Moogsoft AIOps (now part of Dell APEX AIOps). Topics include event ingestion, the correlation engine, and the incident management workflow. Participants learn how Moogsoft applies ML to deduplicate and cluster alerts (“noise reduction”) and detect emerging incidents from IT event streams. Hands-on exercises cover configuring integrations (e.g. ingesting alerts from monitoring tools), reviewing the AI-assisted timeline for incidents, and setting up automated response rules. We also demo the new APEX AIOps dashboards for incident prioritization. Emphasis is on Moogsoft’s design as a “single system of engagement” for incidents: it unifies data from Nagios, SolarWinds, Azure, AWS, etc. (Moogsoft provides connectors to dozens of platforms).
Objectives: After this module, participants will
- Understand Moogsoft’s ML-based alert correlation (adaptive thresholding, pattern matching).
- Configure basic incident management workflows in APEX AIOps.
- Use Moogsoft’s UI to investigate incident clusters and drill into contributing alerts.
- Automate simple incident response actions (e.g. auto-ticketing, chatops notifications).
- Appreciate how Moogsoft supports a “Bring Your Own Stack” approach, connecting Splunk, Dynatrace, Datadog, etc.
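The general idea behind alert deduplication and clustering can be sketched in a few lines. This is a generic, rule-based illustration and not Moogsoft's actual correlation engine, which is ML-based; the `Alert` fields, the 120-second gap and the sample alerts are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: int  # seconds since some reference point
    host: str
    check: str


def dedupe(alerts):
    """Keep only the first alert seen for each (host, check) pair."""
    seen = set()
    unique = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def cluster_by_time(alerts, gap_seconds=120):
    """Group alerts so that consecutive alerts in a cluster are at most
    `gap_seconds` apart; a larger gap starts a new cluster."""
    clusters = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if clusters and alert.timestamp - clusters[-1][-1].timestamp <= gap_seconds:
            clusters[-1].append(alert)
        else:
            clusters.append([alert])
    return clusters


# Hypothetical alert stream: a database problem cascading to two apps,
# followed much later by an unrelated certificate warning.
raw = [
    Alert(0, "db1", "disk_full"),
    Alert(30, "db1", "disk_full"),  # repeat of the first alert
    Alert(60, "app1", "db_timeout"),
    Alert(90, "app2", "db_timeout"),
    Alert(900, "web3", "cert_expiry"),
]
incidents = cluster_by_time(dedupe(raw))
print([len(cluster) for cluster in incidents])  # → [3, 1]
```

Five raw alerts collapse into two candidate incidents. Real platforms additionally use topology, text similarity and learned patterns rather than time proximity alone.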
Module 7: IBM Watson AIOps (Cloud Pak) (5h)
Description: This module covers IBM Watson AIOps (Cloud Pak). We explain its architecture: AI Manager, Event Manager, and Metric Manager components. Key topics include topology discovery (pinpointing fault location), event grouping (correlation of alerts), and anomaly detection from metrics. In hands-on labs, learners install or simulate the Cloud Pak environment, feed sample data (logs, Netcool events, custom alerts), and use Watson’s AI dashboard to view incident “stories.” We demonstrate Slack integration: Watson posts incident summaries and suggested fixes via ChatOps. We also show Watson’s “predictive insights” using metrics forecasting. Students practice root cause analysis using Watson’s knowledge base and explore hybrid cloud use cases. Throughout, we highlight the AI: e.g. IBM describes how Watson uses NLP and ML to connect data and recommend actions.
Objectives: Participants will be able to
- Describe the Watson AIOps components (AI Manager, Event Manager, Metric Manager) and their roles.
- Ingest events and metrics into Watson AIOps and visualize them in the topology.
- Interpret Watson’s AI-generated incident summaries (including blast radius and root cause).
- Configure Watson to send alerts into Slack or email (ChatOps workflow).
- Utilize IBM’s pretrained models to get predictive alerts (e.g. capacity forecast) and review automation suggestions.
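A capacity forecast of the kind mentioned above can be approximated with a straight-line trend. The sketch below fits a least-squares line to daily disk-usage readings and estimates the days remaining before a limit is reached. It is a deliberately simple stand-in, not how IBM's pretrained models work; the function names and sample data are assumptions.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


def days_until_limit(usage_percent, limit=100.0):
    """Days after the last sample until the fitted trend reaches `limit`.
    Returns None if usage is flat or shrinking."""
    slope, intercept = fit_line(usage_percent)
    if slope <= 0:
        return None
    last_day = len(usage_percent) - 1
    return (limit - intercept) / slope - last_day


# Hypothetical daily disk usage (percent), growing 2 points per day.
disk_usage = [50, 52, 54, 56, 58]
print(days_until_limit(disk_usage))  # → 21.0
```

A linear trend ignores seasonality and sudden changes in growth, so it is best treated as a first approximation for a predictive alert.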
Module 8: Integration & Automation Workflows (4h)
Description: Having learned individual tools, this module focuses on end-to-end AIOps workflows and automation. We cover integrating multiple data sources (APM, CI/CD, ticketing) into a unified platform. Topics include building data pipelines (Kafka, Logstash, APIs), using OpenTelemetry, and connecting ITSM tools (ServiceNow, JIRA) for closed-loop automation. Students learn to trigger remediation: for instance, tying Dynatrace or Splunk alerts to runbooks or Kubernetes auto-healing. We also demonstrate how to implement MLOps best practices: versioning models, retraining on new data, and measuring ML accuracy in production. The lab includes a small project: students link monitoring alerts to an automated script (e.g. scale up a VM or post an alert to Teams) and observe the feedback loop. We highlight industry best practices from Gartner and use cases of “predictive operations” in Dynatrace.
Objectives: Participants will
- Architect an AIOps data flow combining logs, metrics, and events into one analytics engine.
- Implement a basic automated remediation (e.g. script triggered by an anomaly).
- Explain the concept of continuous learning for AIOps (models retrained as environments change).
- Understand governance concerns (trust, transparency) when automating IT tasks.
- Learn “preventive operations” concepts where ML models proactively adjust resources.
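The "script triggered by an anomaly" objective, together with the governance concerns above, can be sketched as a small dispatcher: known alert types run a registered action, unknown ones are escalated to a human, a cooldown prevents remediation loops, and every decision is recorded. The `Remediator` class, alert type names and cooldown value are hypothetical; a real lab would call a runbook tool or cloud API instead of appending to a list.

```python
import time


class Remediator:
    """Maps alert types to remediation callables, with a per-target cooldown."""

    def __init__(self, cooldown_seconds=300, clock=time.monotonic):
        self.actions = {}
        self.cooldown = cooldown_seconds
        self.clock = clock      # injectable so the logic can be tested
        self.last_run = {}      # (alert_type, target) -> time of last action
        self.audit_log = []     # every decision is recorded for transparency

    def register(self, alert_type, action):
        self.actions[alert_type] = action

    def handle(self, alert_type, target):
        action = self.actions.get(alert_type)
        if action is None:
            # No approved automation for this alert type: hand off to a human.
            self.audit_log.append(("escalated", alert_type, target))
            return "escalated"
        now = self.clock()
        key = (alert_type, target)
        last = self.last_run.get(key)
        if last is not None and now - last < self.cooldown:
            # Recently remediated: skip to avoid a restart loop.
            self.audit_log.append(("skipped", alert_type, target))
            return "skipped"
        self.last_run[key] = now
        action(target)
        self.audit_log.append(("remediated", alert_type, target))
        return "remediated"


# Usage with a fake clock and a stand-in "restart" action.
restarted = []
fake_now = [0.0]
bot = Remediator(cooldown_seconds=300, clock=lambda: fake_now[0])
bot.register("service_down", restarted.append)

first = bot.handle("service_down", "checkout-api")   # runs the action
fake_now[0] = 60.0
second = bot.handle("service_down", "checkout-api")  # within cooldown
third = bot.handle("disk_full", "db1")               # no action registered
print(first, second, third)  # → remediated skipped escalated
```

Keeping an audit log and an explicit allow-list of automated actions is one simple way to address the trust and transparency concerns that come with closed-loop automation.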
Module 9: Use Cases, Best Practices & Certification Prep (6h)
Description: The final module ties everything together. We review AIOps in industry: case studies in finance, healthcare, e-commerce, highlighting how AI improved service uptime. We discuss metrics for success (MTTR reduction, event suppression rate, etc.) and the “people/process” changes needed for AIOps adoption. Instructors present sample scenarios and ask teams to design an AIOps solution using learned tools. Hands-on, participants complete a capstone lab or project: for example, using a sample dataset of alerts to build an end-to-end AIOps pipeline (ingest→analyze→alert→automate). We conclude with exam prep: sample questions and tips for the relevant certifications (splitting content review into categories aligned with cert blueprints). An assessment (quiz or practical project presentation) evaluates the full 40-hour learning.
Objectives: Learners will
- Apply AIOps concepts to real scenarios (choose the right tools/ML methods for given problems).
- Review key product knowledge for certification readiness (Splunk, Dynatrace, etc.).
- Demonstrate an integrated AIOps solution in a lab, showing data ingestion, analysis, and response.
- Summarize best practices in AIOps deployment (incremental adoption, data governance).
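The success metrics discussed in this module are straightforward to compute once incident and event data are available. The sketch below calculates MTTR and an event suppression rate; the function names and all figures are made-up illustrations, not results from any deployment.

```python
def mttr_minutes(incidents):
    """Mean time to resolution.
    incidents: list of (detected_minute, resolved_minute) pairs."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations) / len(durations)


def event_suppression_rate(raw_events, surfaced_incidents):
    """Fraction of raw events that were filtered or merged away
    rather than surfaced as separate incidents."""
    return 1 - surfaced_incidents / raw_events


# Hypothetical incident timings (minutes) before and after an AIOps rollout.
before = [(0, 90), (10, 70), (5, 95)]
after = [(0, 40), (10, 40), (5, 55)]

mttr_before = mttr_minutes(before)
mttr_after = mttr_minutes(after)
improvement = (mttr_before - mttr_after) / mttr_before

print(mttr_before, mttr_after)                     # → 80.0 40.0
print(f"MTTR reduction: {improvement:.0%}")        # → MTTR reduction: 50%
print(f"{event_suppression_rate(1000, 50):.0%}")   # → 95%
```

Tracking these numbers before and after adoption gives teams an objective way to judge whether the AIOps rollout is delivering value.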
Key Features
Delivery Modes: Instructor-led live sessions via web conference, supplemented by self-paced eLearning videos. Sessions are interactive, with Q&A and hands-on demos.
Hands-On Labs: Each module includes lab exercises. Trainees get access to a cloud-based AIOps sandbox environment pre-configured with Splunk, Dynatrace, Moogsoft and IBM AIOps instances (or simulations) and sample data feeds.
Assessments: Quizzes and practical tasks follow each module to reinforce learning. A final capstone project or test evaluates readiness. Successful participants earn completion certificates.
Cert Prep Materials: Official exam blueprints and sample questions are provided. Lab exercises align with certification objectives (e.g. Splunk ITSI Admin, Dynatrace Associate, IBM Watson AIOps).
Support: Dedicated technical mentors are available during labs. Participants can access a discussion forum or Slack channel for help. Course materials and recordings remain available for review.



