Course Overview
About the Course
This training program introduces learners to Dataiku, a unified data and AI platform that “welcomes users with a wide range of skills” and supports the full lifecycle of data projects. Participants – from business analysts to data scientists and engineers – will learn to build data flows, run visual and code-based analyses, and deploy models in production. The course emphasizes guided learning paths and practical exercises, drawing on Dataiku’s “template projects” and certification prep materials. Graduates will be able to collaborate across functions, using Dataiku’s no-code, low-code, and full-code interfaces to tackle real-world problems. The training ultimately empowers learners with in-demand skills (Python, R, SQL, cloud integration, etc.) and prepares them for Dataiku certification.
Course Syllabus
- Module 1: Introduction to Dataiku & Data Science Fundamentals – Overview of the Dataiku platform, interface (projects, Flow, datasets), and collaborative analytics. Introduce data science methodology (CRISP-DM) and roles (analyst, data scientist, engineer, business user). Get hands-on by creating a new project, importing a sample dataset, and exploring the Flow; cover basic concepts like managed datasets and versioning.
- Module 2: Getting Data into Dataiku – Data connections and ingestion. Learn to connect to diverse data sources (files, databases, cloud storage). Hands-on: configure connections (e.g. JDBC for SQL databases, Snowflake, Azure Blob) and import data into Dataiku projects. Explore Dataiku’s Data Catalog and dataset metadata.
- Module 3: Data Preparation & Visual Recipes – Data wrangling and cleaning with visual recipes. Use point-and-click recipes (Join, Split, Filter, Group, Prepare, etc.) to transform data without coding. Perform data profiling, remove duplicates/missing values, normalize and encode fields. Hands-on lab: build a data pipeline using visual recipes and review data lineage.
- Module 4: SQL & Database Integration – Leverage SQL within Dataiku. Use SQL Query and Script recipes to write custom transformations. Practice pushdown execution: Dataiku sends queries directly to the database (e.g. Snowflake) for fast in-database processing. Configure a SQL database connection (MySQL/Postgres/Snowflake). Hands-on: write a SELECT query to join tables in-database and output the result as a new dataset.
- Module 5: Python Scripting in Dataiku – Use Python for advanced analysis. Set up Python code environments and use built-in libraries (pandas, NumPy, scikit-learn, XGBoost). Author Python recipes and Jupyter notebooks to manipulate datasets via the Dataiku API. Demonstrate how to call Dataiku functions from Python for data I/O. Hands-on: write a Python recipe to compute custom features or train a scikit-learn model.
- Module 6: R Programming in Dataiku – Use R for statistical analysis. Activate R integration and use R packages (tidyverse, ggplot2, randomForest, etc.). Create R recipes or notebooks for data exploration and modeling. Example: build an R script to perform linear regression or visualize data with ggplot2. Learn how Dataiku handles R environments and packages.
- Module 7: Machine Learning & Automated Modeling – Introduction to Dataiku’s AutoML. Use the Visual ML (Auto Model) interface to build classification/regression models step by step. Compare algorithms (decision trees, logistic regression, XGBoost, etc.) and tune hyperparameters. Explain built-in feature engineering (categorical encoding, feature derivation). Hands-on: solve a predictive problem (e.g. customer churn) using AutoML and review performance metrics.
- Module 8: Model Evaluation & Interpretability – Evaluate and compare models using metrics (accuracy, RMSE, ROC AUC). Use Dataiku’s charts (confusion matrix, lift charts) and explainability tools (feature importance, SHAP). Practice model validation (cross-validation, holdout sets) and understand overfitting. Hands-on: interpret model results, identify key drivers, and prepare a report for stakeholders.
- Module 9: Automation, MLOps & Deployment – Productionize Dataiku projects. Build Scenarios to automate tasks (scheduling, retraining triggers). Create API services from models and deploy webapps or dashboards. Manage model lifecycle: version control, audit logs, monitoring for drift. Cover Dataiku’s governance features. Scenario labs: set up a scheduled model training pipeline and publish a real-time scoring API.
- Module 10: Cloud & Big-Data Integration – Connect Dataiku to cloud platforms and big-data engines. Snowflake: configure a Snowflake data warehouse connection and use pushdown SQL to query large datasets. Azure: leverage Dataiku on Azure (via Cloud Stack) and connect to Azure services (Synapse/Purview). Learn about Azure Synapse integration (DSS supports visual recipes in Synapse). Also cover Hadoop/Spark integration basics for large-scale processing. Hands-on: load data from Azure Blob/Synapse and run a recipe in-database.
- Module 11: Industry Use Cases & Capstone Project – Apply skills to real-world scenarios. Study industry examples (e.g. retail sales forecasting, credit-card fraud detection, manufacturing maintenance). Final capstone: end-to-end project (data ingestion ▶ preparation ▶ modeling ▶ deployment) on a chosen dataset. Prepare for the Dataiku Core Designer certification by reviewing exam topics and taking practice quizzes. Present project results to the class, emphasizing business impact.
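The in-database join described in Module 4 can be sketched outside Dataiku using Python's built-in sqlite3 module; inside DSS, the same SELECT would run as a SQL Query recipe pushed down to the connected database (e.g. Snowflake). The table and column names here are illustrative, not part of any course dataset.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse such as Snowflake.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 20.0), (12, 2, 45.0);
""")

# The kind of join a SQL Query recipe would push down to the database,
# with the result materialized as a new dataset:
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 119.5), ('Globex', 45.0)]
```

Running the aggregation where the data lives, rather than pulling raw rows into Dataiku first, is the point of pushdown execution.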
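The evaluation metrics from Module 8 are simple enough to compute by hand, which is a useful check on what Dataiku's evaluation screens report. The labels and predictions below are made-up illustrative values.

```python
import math

# Hypothetical holdout-set labels and model predictions (illustrative only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy: fraction of correct predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Confusion-matrix cells, as shown in Dataiku's classification reports.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

# RMSE, the regression metric from Module 8, on illustrative scores.
actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.5, 2.0]
rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual))

print(accuracy, (tp, fp, fn, tn), round(rmse, 3))  # 0.75 (3, 1, 1, 3) 0.408
```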
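The real-time scoring API from Module 9 is consumed over HTTP. A minimal client sketch follows; the hostname, service name, and endpoint name are assumptions, and the actual call is commented out because it needs a deployed API node and an API key.

```python
import json

# Hypothetical feature record for a deployed churn-scoring endpoint.
record = {"age": 42, "plan": "premium", "monthly_spend": 61.5}
payload = json.dumps({"features": record})

# A real call against a Dataiku API node would resemble (names assumed):
#   import urllib.request
#   req = urllib.request.Request(
#       "https://api-node.example.com/public/api/v1/churn/score/predict",
#       data=payload.encode("utf-8"),
#       headers={"Content-Type": "application/json"})
#   response = json.load(urllib.request.urlopen(req))

print(payload)
```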
Key Features
- Hands-On Labs & Exercises: Each module includes guided labs using real datasets (finance, retail, IoT, etc.) so learners can practice Dataiku recipes, notebooks, and cloud connections in context.
- Certification Prep: Content aligns with Dataiku certification (e.g. Core Designer), including sample questions and review of platform concepts. Learners earn badges or mini-certificates after key assessments.
- Use Cases & Projects: Training leverages industry examples (fraud detection, predictive maintenance, sales forecasting) and a capstone project to illustrate how Dataiku solves business problems. For example, a credit-card fraud detection scenario highlights Dataiku’s unified workflow for data wrangling, feature engineering, and ML.
- Multi-Role Collaboration: Emphasis on Dataiku’s cross-functional platform: business users can use no-code visual tools, while analysts and engineers can write code. Dataiku “provides one solution for all stakeholders… No-, low-, and full-code functionality”, enabling IT and business to work together.
- Integrated Tech Stack: Learners practice with Python, R, and SQL inside Dataiku. The course covers connecting to cloud warehouses (Snowflake with pushdown) and Azure services, giving exposure to modern data platforms and big-data processing.
- Progressive Learning Path: Begins with foundational concepts for beginners and advances to complex topics (MLOps, cloud integration). Each section balances theory (principles of data science, model evaluation) with practical workflows, ensuring a deep, hands-on understanding of Dataiku.



