
Do you need Spark to create a data stack? #187

@souravsingh

Title of the talk

Do you need Spark to create a data stack?

Description

Spark has long been considered a mature and reliable data processing framework by data engineers across the world. But as the data engineering landscape has evolved, new tools and frameworks have become available.

This talk will focus on using MinIO as an object store, DuckDB for data warehousing, and dbt for defining transformations. We will also look at Polars for data processing.

The purpose of this talk is not to declare Spark obsolete as a data processing framework, but rather to suggest alternatives that can be useful and better suited to specific situations.
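To make the comparison concrete, the sketch below wires these pieces together in a few lines of Python. It assumes a local MinIO server on localhost:9000 with the default minioadmin credentials and a bucket named "lake" holding Parquet files; the bucket, table, and column names are placeholders rather than part of any real dataset.

```python
# A minimal sketch of the stack above, assuming a local MinIO server on
# localhost:9000 with the default minioadmin credentials and a bucket named
# "lake" that holds Parquet files. All names here are illustrative.
import duckdb
import polars as pl

con = duckdb.connect("warehouse.duckdb")

# DuckDB can read from any S3-compatible object store through its httpfs extension.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_endpoint = 'localhost:9000'")
con.execute("SET s3_access_key_id = 'minioadmin'")
con.execute("SET s3_secret_access_key = 'minioadmin'")
con.execute("SET s3_use_ssl = false")
con.execute("SET s3_url_style = 'path'")

# Materialise a warehouse table directly from the objects stored in MinIO.
con.execute("""
    CREATE OR REPLACE TABLE events AS
    SELECT * FROM read_parquet('s3://lake/events/*.parquet')
""")

# Hand the table to Polars for further in-memory processing.
events = con.execute("SELECT * FROM events").pl()
daily_counts = events.group_by("event_date").agg(pl.len().alias("event_count"))
print(daily_counts)
```

In the full stack, dbt (via the dbt-duckdb adapter) would own the SQL transforms inside the DuckDB file instead of the ad-hoc CREATE TABLE above, while Polars handles the in-memory steps that do not belong in the warehouse.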

Table of contents

  1. Introduction
  2. Background on Spark
  3. The current data engineering landscape
  4. MinIO as a local object store
  5. DuckDB as a data warehouse
  6. Using dbt to define transforms

Duration (including Q&A)

30-35 mins

Prerequisites

No response

Speaker bio

My LinkedIn profile: https://www.linkedin.com/in/sourav-singh-8124b6267

The talk/workshop speaker agrees to
