KMS Healthcare

[VIE] The Lakehouse Architecture


The Topic Covers

Data lakehouse architectures gained popularity around 2020, with the term coined by Databricks. A lakehouse replaces the traditional relational data warehouse with a single repository, a data lake, in your data architecture. All types of data (structured, semi-structured, and unstructured) can be ingested into the lakehouse, and queries and reports run directly against this unified source.

A data pipeline (a form of data processing) is a systematic process that collects, transforms, and moves data from various sources to a destination where it can be analyzed, stored, or used. It typically includes stages such as data extraction, cleansing, transformation, and loading. Data pipelines are designed to ensure data quality, reliability, and efficiency, so that data flows smoothly into insights, reporting, or other business uses.
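As a rough illustration of the stages named above, here is a minimal sketch in plain Python. It is not taken from the presentation: the sample CSV, the column names, and the function names are all hypothetical, and an in-memory list stands in for the destination table.

```python
import csv
import io

# Hypothetical raw input; the column names are illustrative only.
RAW_CSV = """employee_id,name,salary
1, Alice ,5000
2,Bob,
3, Carol ,7000
"""


def extract(text):
    """Extraction: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansing: trim whitespace and drop rows with a missing salary."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["salary"]:
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Transformation: cast string fields to numeric types."""
    return [
        {
            "employee_id": int(row["employee_id"]),
            "name": row["name"],
            "salary": float(row["salary"]),
        }
        for row in rows
    ]


def load(rows, destination):
    """Loading: append records to the destination and return the count."""
    destination.extend(rows)
    return len(rows)


def run_pipeline(text, destination):
    """Chain the four stages in order."""
    return load(transform(cleanse(extract(text))), destination)


# A plain list stands in for a warehouse or lakehouse table (assumption).
warehouse = []
loaded = run_pipeline(RAW_CSV, warehouse)
```

Keeping each stage as a separate function means a stage can be tested or swapped on its own. In a real pipeline the same shape would usually be expressed with an engine such as Spark rather than in-memory lists.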

In this presentation, we will cover the concepts of the Lakehouse architecture and the underlying technology of Delta Lake. We will apply these concepts to our HR Data Warehouse project using tools such as Spark, Hive, Trino, S3, and Kafka, and implement simple data pipelines on AWS with data analytics services such as Glue, EMR, Athena, Kinesis, Lambda, and Redshift.

Language: Vietnamese

Duration: 30 minutes

Our Speaker(s)

Duc Vo

SSE

KMS Healthcare