Title of the talk
Do you need Spark to create a data stack?
Description
Spark has long been considered a mature and reliable data processing framework by data engineers across the world. But as the data engineering landscape has evolved, new tools and frameworks have become available.
This talk will focus on using MinIO for object storage, DuckDB for data warehousing, and dbt for defining transformations. We will also look at Polars for data processing.
The purpose of this talk is not to declare Spark obsolete as a data processing framework, but rather to suggest alternatives that may be better suited to specific situations.
Table of contents
- Introduction
- Background behind Spark
- Current outlook of data engineering
- MinIO as local object store
- DuckDB as data warehouse
- Using dbt to define transforms
Duration (including Q&A)
30-35 mins
Prerequisites
No response
Speaker bio
LinkedIn: https://www.linkedin.com/in/sourav-singh-8124b6267
The talk/workshop speaker agrees to
- Share the slides, code snippets and other material used during the talk
- If the talk is recorded, you grant the permission to release the video on PythonPune's YouTube channel under CC-BY-4.0 license
- Not do any hiring pitches during the talk and follow the Code of Conduct