
Cloudera Data Analyst Training for Apache Hadoop

  • Length 4 days

Why study this course

This four-day data analyst course is for anyone who wants to access, manipulate, transform, and analyse massive data sets in a Hadoop cluster using SQL and familiar scripting languages. It is the core curriculum in the data analyst learning path.

Cloudera’s Data Analyst Training course focuses on Apache Hive and Apache Impala. You will learn how to apply traditional data analytics and business intelligence skills to big data. Cloudera presents the tools data professionals need to access, manipulate, transform, and analyse complex data sets using SQL and familiar scripting languages.

Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Cloudera environments. Apache Impala enables real-time interactive analysis of the data stored in Hadoop using a native SQL environment. Together, they make multi-structured data accessible to analysts, database administrators, and others without Java® programming expertise.

Request Course Information

By submitting an enquiry, you agree to our privacy policy and to receiving email and other forms of communication from us. You can opt out at any time.


What you’ll learn

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning skills such as:

  • How the open-source ecosystem of big data tools addresses challenges not met by traditional RDBMSs

  • Using Apache Hive and Apache Impala to provide SQL access to data

  • Hive and Impala syntax and data formats, including functions and subqueries

  • Creating, modifying, and deleting tables, views, and databases; loading data; and storing the results of queries

  • Creating and using partitions and different file formats

  • Combining two or more datasets using JOIN or UNION, as appropriate

  • What analytic and windowing functions are, and how to use them

  • Storing and querying complex or nested data structures

  • Processing and analysing semi-structured and unstructured data

  • Techniques for optimising Hive and Impala queries

  • Extending the capabilities of Hive and Impala using parameters, custom file formats and SerDes, and external scripts

  • How to determine whether Hive, Impala, an RDBMS, or a mix of these is best for a given task



Cloudera at Lumify Work

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyse your data, and keep that data secure and protected.


Stay ahead of the technology curve

Don’t let your tech outpace the skills of your people

Quality Instructors and Content

Expert instructors with real-world experience and the latest vendor-approved, in-depth course content.

Partner-Preferred Supplier

Chosen and awarded by the world's leading vendors as a preferred training partner.

Ahead of the Technology Curve

No matter your chosen technologies or platforms, we can help you stay one step ahead.

Who is the course for?

This course is designed for:

  • data analysts

  • business intelligence specialists

  • developers

  • system architects

  • database administrators


Course subjects

Apache Hadoop Fundamentals

  • The Motivation for Hadoop

  • Hadoop Overview

  • Data Storage: HDFS

  • Distributed Data Processing: YARN, MapReduce, and Spark

  • Data Processing and Analysis: Hive and Impala

  • Database Integration: Sqoop

  • Other Hadoop Data Tools

  • Exercise Scenario Explanation

Introduction to Apache Hive and Impala

  • What is Hive?

  • What is Impala?

  • Why Use Hive and Impala?

  • Schema and Data Storage

  • Comparing Hive and Impala to Traditional Databases

  • Use Cases

Querying with Apache Hive and Impala

  • Databases and Tables

  • Basic Hive and Impala Query Language Syntax

  • Data Types

  • Using Hue to Execute Queries

  • Using Beeline (Hive’s Shell)

  • Using the Impala Shell

Common Operators and Built-In Functions

  • Operators

  • Scalar Functions

  • Aggregate Functions

Data Management

  • Data Storage

  • Creating Databases and Tables

  • Loading Data

  • Altering Databases and Tables

  • Simplifying Queries with Views

  • Storing Query Results

Data Storage and Performance

  • Partitioning Tables

  • Loading Data into Partitioned Tables

  • When to Use Partitioning

  • Choosing a File Format

  • Using Avro and Parquet File Formats

Working with Multiple Datasets

  • UNION and Joins

  • Handling NULL Values in Joins

  • Advanced Joins

Analytic Functions and Windowing

  • Using Common Analytic Functions

  • Other Analytic Functions

  • Sliding Windows

Complex Data

  • Complex Data with Hive

  • Complex Data with Impala

Analysing Text

  • Using Regular Expressions with Hive and Impala

  • Processing Text Data with SerDes in Hive

  • Sentiment Analysis and n-grams

Apache Hive Optimisation

  • Understanding Query Performance

  • Bucketing

  • Hive on Spark

Apache Impala Optimisation

  • How Impala Executes Queries

  • Improving Impala Performance

Extending Apache Hive and Impala

  • Custom SerDes and File Formats in Hive

  • Data Transformation with Custom Scripts in Hive

  • User-Defined Functions

  • Parameterised Queries

Choosing the Best Tool for the Job

  • Comparing Hive, Impala, and Relational Databases

  • Which to Choose?


Prerequisites

Some knowledge of SQL is assumed, as is basic familiarity with the Linux command line. Prior knowledge of Apache Hadoop is not required.


Terms & Conditions

The supply of this course by Lumify Work is governed by the booking terms and conditions. Please read the terms and conditions carefully before enrolling in this course, as enrolment in the course is conditional on acceptance of these terms and conditions.

