Course subjects
Module 1: Data Engineering Roles and Key Concepts
Module 2: AWS Data Engineering Tools and Services
Orchestration and Automation
Data Engineering Security
Monitoring
Continuous Integration and Continuous Delivery
Infrastructure as Code
AWS Serverless Application Model
Networking Considerations
Cost Optimisation Tools
Module 3: Designing and Implementing Data Lakes
Data lake introduction
Data lake storage
Ingest data into a data lake
Catalog data
Transform data
Serve data for consumption
Hands-on lab: Setting up a Data Lake on AWS
Module 4: Optimising and Securing a Data Lake Solution
Open Table Formats
Security using AWS Lake Formation
Setting permissions with Lake Formation
Security and governance
Troubleshooting
Hands-on lab: Automating Data Lake Creation using AWS Lake Formation Blueprints
Module 5: Data Warehouse Architecture and Design Principles
Introduction to data warehouses
Amazon Redshift overview
Ingesting data into Redshift
Processing data
Serving data for consumption
Hands-on lab: Setting up a Data Warehouse using Amazon Redshift Serverless
Module 6: Performance Optimisation Techniques for Data Warehouses
Monitoring and optimisation options
Data optimisation in Amazon Redshift
Query optimisation in Amazon Redshift
Orchestration options
Module 7: Security and Access Control for Data Warehouses
Authentication and access control in Amazon Redshift
Data security in Amazon Redshift
Auditing and compliance in Amazon Redshift
Hands-on lab: Managing Access Control in Redshift
Module 8: Designing Batch Data Pipelines
Introduction to batch data pipelines
Designing a batch data pipeline
AWS services for batch data processing
Module 9: Implementing Strategies for Batch Data Pipelines
Elements of a batch data pipeline
Processing and transforming data
Integrating and cataloging data
Serving data for consumption
Hands-on lab: A Day in the Life of a Data Engineer
Module 10: Optimising, Orchestrating, and Securing Batch Data Pipelines
Optimising the batch data pipeline
Orchestrating the batch data pipeline
Securing the batch data pipeline
Hands-on lab: Orchestrating Data Processing in Spark using AWS Step Functions
Module 11: Streaming Data Architecture Patterns
Introduction to streaming data pipelines
Ingesting data from stream sources
Streaming data ingestion services
Storing streaming data
Processing streaming data
Analysing streaming data with AWS services
Hands-on lab: Streaming Analytics with Amazon Managed Service for Apache Flink
Module 12: Optimising and Securing Streaming Solutions
Optimising a streaming data solution
Securing a streaming data pipeline
Compliance considerations
Hands-on lab: Access Control with Amazon Managed Streaming for Apache Kafka
Please note: This is an emerging technology course. The course outline is subject to change as needed.