Data Engineering on AWS

  • Length: 3 days
  • Price: NZD 2,550 excl. GST
Course overview
Why study this course

This three-day, intermediate course is designed for professionals seeking a deep dive into data engineering practices and solutions on AWS. Through a balanced combination of theory, practical labs, and activities, participants learn to design, build, optimise, and secure data engineering solutions using AWS services. From foundational concepts to hands-on implementation of data lakes, data warehouses, and both batch and streaming data pipelines, this course equips data professionals with the skills needed to architect and manage modern data solutions at scale.

This course includes presentations, demonstrations, hands-on labs, and group exercises.

What you’ll learn

This course is designed to teach participants how to:

  • Understand the foundational roles and key concepts of data engineering, including data personas, data discovery, and relevant AWS services.

  • Identify and explain the various AWS tools and services crucial for data engineering, encompassing orchestration, security, monitoring, CI/CD, IaC, networking, and cost optimisation.

  • Design and implement a data lake solution on AWS, including storage, data ingestion, transformation, and serving data for consumption.

  • Optimise and secure a data lake solution by implementing open table formats, security measures, and troubleshooting common issues.

  • Design and set up a data warehouse using Amazon Redshift Serverless, understanding its architecture, data ingestion, processing, and serving capabilities.

  • Apply performance optimisation techniques to data warehouses in Amazon Redshift, including monitoring, data optimisation, query optimisation, and orchestration.

  • Manage security and access control for data warehouses in Amazon Redshift, understanding authentication, data security, auditing, and compliance.

  • Design effective batch data pipelines using appropriate AWS services for processing and transforming data.

  • Implement comprehensive strategies for batch data pipelines, covering data processing, transformation, integration, cataloguing, and serving data for consumption.

  • Optimise, orchestrate, and secure batch data pipelines, demonstrating advanced skills in data processing automation and security.

  • Architect streaming data pipelines, understanding various use cases, ingestion, storage, processing, and analysis using AWS services.

  • Optimise and secure streaming data solutions, including compliance considerations and access control.


AWS Partner Logo - Advanced Tier

AWS at Lumify Work

Lumify Work is an official AWS Training Partner for Australia, New Zealand, and the Philippines. Through our Authorised AWS Instructors, we can provide you with a learning path that’s relevant to you and your organisation, so you can get more out of the cloud. We offer virtual and face-to-face classroom-based training to help you build your cloud skills and enable you to achieve industry-recognised AWS Certification.


Who is the course for?

This course is designed for professionals who are interested in designing, building, optimising, and securing data engineering solutions using AWS services.


Course subjects

Module 1: Data Engineering Roles and Key Concepts

  • Role of a Data Engineer

  • Key functions of a Data Engineer

  • Data Personas

  • Data Discovery

  • AWS Data Services

Module 2: AWS Data Engineering Tools and Services

  • Orchestration and Automation

  • Data Engineering Security

  • Monitoring

  • Continuous Integration and Continuous Delivery

  • Infrastructure as Code

  • AWS Serverless Application Model

  • Networking Considerations

  • Cost Optimisation Tools

Module 3: Designing and Implementing Data Lakes

  • Data lake introduction

  • Data lake storage

  • Ingest data into a data lake

  • Catalogue data

  • Transform data

  • Serve data for consumption

  • Hands-on lab: Setting up a Data Lake on AWS

Module 4: Optimising and Securing a Data Lake Solution

  • Open Table Formats

  • Security using AWS Lake Formation

  • Setting permissions with Lake Formation

  • Security and governance

  • Troubleshooting

  • Hands-on lab: Automating Data Lake Creation using AWS Lake Formation Blueprints

Module 5: Data Warehouse Architecture and Design Principles

  • Introduction to data warehouses

  • Amazon Redshift overview

  • Ingesting data into Redshift

  • Processing data

  • Serving data for consumption

  • Hands-on lab: Setting up a Data Warehouse using Amazon Redshift Serverless

Module 6: Performance Optimisation Techniques for Data Warehouses

  • Monitoring and optimisation options

  • Data optimisation in Amazon Redshift

  • Query optimisation in Amazon Redshift

  • Orchestration options

Module 7: Security and Access Control for Data Warehouses

  • Authentication and access control in Amazon Redshift

  • Data security in Amazon Redshift

  • Auditing and compliance in Amazon Redshift

  • Hands-on lab: Managing Access Control in Redshift

Module 8: Designing Batch Data Pipelines

  • Introduction to batch data pipelines

  • Designing a batch data pipeline

  • AWS services for batch data processing

Module 9: Implementing Strategies for Batch Data Pipelines

  • Elements of a batch data pipeline

  • Processing and transforming data

  • Integrating and cataloguing data

  • Serving data for consumption

  • Hands-on lab: A Day in the Life of a Data Engineer

Module 10: Optimising, Orchestrating, and Securing Batch Data Pipelines

  • Optimising the batch data pipeline

  • Orchestrating the batch data pipeline

  • Securing the batch data pipeline

  • Hands-on lab: Orchestrating Data Processing in Spark using AWS Step Functions

Module 11: Streaming Data Architecture Patterns

  • Introduction to streaming data pipelines

  • Ingesting data from stream sources

  • Streaming data ingestion services

  • Storing streaming data

  • Processing streaming data

  • Analysing streaming data with AWS services

  • Hands-on lab: Streaming Analytics with Amazon Managed Service for Apache Flink

Module 12: Optimising and Securing Streaming Solutions

  • Optimising a streaming data solution

  • Securing a streaming data pipeline

  • Compliance considerations

  • Hands-on lab: Access Control with Amazon Managed Streaming for Apache Kafka

Please note: this is an emerging technology course, and the course outline is subject to change as needed.


Prerequisites

  • Familiarity with basic machine learning concepts, such as supervised and unsupervised learning, regression, classification, and clustering algorithms

  • Working knowledge of the Python programming language and common data science libraries such as NumPy, Pandas, and Scikit-learn

  • Basic understanding of cloud computing concepts and familiarity with the AWS platform

  • Familiarity with SQL and relational databases is recommended but not mandatory

  • Experience with version control systems like Git is beneficial but not required


Terms & Conditions

The supply of this course by Lumify Work is governed by our booking terms and conditions. Please read them carefully before enrolling, as enrolment in the course is conditional on acceptance of these terms and conditions.


Request Course Information

Awaiting course schedule

If you would like to receive a notification when this course becomes available, enter your details below.

Personalise your schedule with Lumify USchedule

Interested in a course that we have not yet scheduled? Get in touch, and ask for your preferred date and time. We can work together to make it happen.