DeltaMichael/versatile-data-kit

 
 

Versatile Data Kit

Overview

Versatile Data Kit (VDK) is an open-source framework that enables anyone with basic SQL or Python knowledge to build, run, and manage their own data workflows.

Data processing steps are written as plain-text SQL or Python files, which VDK executes sequentially in alphanumeric order of their file names. This makes it easy to build a data workflow one step at a time.
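To make the ordering rule concrete, here is a minimal sketch in Python. `step_execution_order` and `STEP_SUFFIXES` are hypothetical names, not part of VDK, and the sketch assumes that "alphanumeric order" means a plain lexicographic sort of file names. It only lists the steps; it does not execute them.

```python
import tempfile
from pathlib import Path
from typing import List

# File types treated as data job steps in this sketch.
STEP_SUFFIXES = (".sql", ".py")


def step_execution_order(job_dir: str) -> List[str]:
    """Return the step file names in job_dir in alphanumeric (lexicographic) order.

    Hypothetical helper that models only the ordering rule; it is not part of VDK.
    """
    steps = [
        p.name
        for p in Path(job_dir).iterdir()
        if p.is_file() and p.suffix in STEP_SUFFIXES
    ]
    # Plain string sort: "10_..." sorts before "20_...", and also before "9_...".
    return sorted(steps)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as job_dir:
        for name in ("30_export.py", "10_ingest.py", "20_transform.sql", "README.md"):
            (Path(job_dir) / name).write_text("")
        print(step_execution_order(job_dir))
        # prints ['10_ingest.py', '20_transform.sql', '30_export.py']
```

Because the sort is lexicographic, a file named `10_ingest.py` would run before `9_setup.py`. Zero-padded prefixes such as `09_` and `10_` keep the intended order.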

VDK is built for resiliency: a data job can recover mid-process or restart from the beginning.
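One generic way to get both behaviors is step-level checkpointing: record each completed step, skip recorded steps on the next attempt, and clear the record to restart from scratch. The sketch below illustrates that technique only. It is not how VDK implements recovery, and `run_with_checkpoint` and its JSON checkpoint file are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_with_checkpoint(
    steps: Sequence[Step], checkpoint: Path, restart: bool = False
) -> List[str]:
    """Run named steps in order, skipping any already recorded in `checkpoint`.

    Returns the names of the steps executed during this call.
    Hypothetical helper illustrating checkpointing; not VDK's mechanism.
    """
    if restart and checkpoint.exists():
        checkpoint.unlink()  # forget progress: start again from the first step
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    executed: List[str] = []
    for name, step in steps:
        if name in done:
            continue  # completed during an earlier attempt
        step()  # if this raises, the checkpoint still lists every earlier step
        done.add(name)
        checkpoint.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed


if __name__ == "__main__":
    failed_once = []

    def flaky_transform() -> None:
        # Fails on the first attempt only, simulating a transient error.
        if not failed_once:
            failed_once.append(True)
            raise RuntimeError("simulated failure")

    steps = [("ingest", lambda: None), ("transform", flaky_transform), ("export", lambda: None)]
    with tempfile.TemporaryDirectory() as tmp:
        cp = Path(tmp) / "checkpoint.json"
        try:
            run_with_checkpoint(steps, cp)  # ingest succeeds, transform fails
        except RuntimeError:
            pass
        print(run_with_checkpoint(steps, cp))  # prints ['transform', 'export']
```

Passing `restart=True` discards the checkpoint and reruns every step, which corresponds to restarting from the beginning.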

Data Journey and Versatile Data Kit

With VDK, you can create data processing workflows to:

  • Ingest data (extract)
  • Transform data (transform)
  • Export data (load)
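The three stages above can be sketched in plain Python, assuming a small CSV input, an in-memory SQLite database for the SQL transform, and JSON as the export format. The function names and sample data are hypothetical, and this is not the VDK API. In a VDK data job these stages could be separate step files; here they are plain functions so the example runs without VDK installed.

```python
import csv
import io
import json
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical sample input standing in for a real CSV file.
RAW_CSV = """city,temp_c
Sofia,21.5
Paris,18.0
Sofia,23.5
"""


def extract(raw_csv: str) -> List[Dict[str, str]]:
    """Ingest: parse CSV text into one dict per row (values are strings)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Transform: load rows into in-memory SQLite and aggregate them with SQL."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
        con.executemany(
            "INSERT INTO readings VALUES (?, ?)",
            [(r["city"], float(r["temp_c"])) for r in rows],
        )
        return con.execute(
            "SELECT city, AVG(temp_c) FROM readings GROUP BY city ORDER BY city"
        ).fetchall()
    finally:
        con.close()


def load(aggregates: List[Tuple[str, float]]) -> str:
    """Export: serialize the aggregates as JSON for a downstream consumer."""
    return json.dumps(dict(aggregates))


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))  # prints {"Paris": 18.0, "Sofia": 22.5}
```

Keeping each stage a separate function mirrors the extract, transform, and load split, so any one stage can be swapped (for example, a REST source instead of CSV) without touching the others.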

Figure: Data Journey

Solve common data engineering problems

  • Ingest data from different sources, including CSV files, JSON objects, and data from REST API services.
  • Use Python/SQL and VDK templates to transform data.
  • Ensure data applications are packaged, versioned, and deployed correctly while dealing with credentials, retries, and reconnects.
  • Provide built-in monitoring and smart notification capabilities.
  • Track both code and data modifications and the relationship between them, allowing quicker troubleshooting and version rollback.
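The list above mentions dealing with retries and reconnects, that is, handling transient failures such as a dropped connection to a REST API. The sketch below shows only the generic retry-with-exponential-backoff pattern; it is not VDK's implementation, and `call_with_retries`, its parameters, and its default values are hypothetical choices for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying the listed transient errors with exponential backoff.

    Hypothetical helper for illustration; not part of the VDK API.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Base delay doubles per attempt (0.5 s, 1 s, 2 s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    # Simulated REST call that drops the connection twice before succeeding.
    outcomes = iter([ConnectionError("reset"), ConnectionError("reset"), "payload"])

    def flaky_fetch() -> str:
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(call_with_retries(flaky_fetch, sleep=lambda s: None))  # prints payload
```

The `sleep` parameter is injectable so tests can record delays instead of waiting, and errors outside `retry_on` (for example, bad credentials) fail immediately rather than being retried.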

Figure: Without / With Versatile Data Kit

Versatile Data Kit Components

  • Software Development Kit (SDK):
    • Tools to automate the extraction, transformation, and loading of data.
    • A plugin framework that allows users to extend the framework according to their specific requirements.
  • Control Service: Lets users create, deploy, manage, and execute data jobs in a Kubernetes runtime environment.

Getting Started

VDK installs with a single pip command. See the Getting Started guide to install VDK and create a data job.

Next Steps

Contributing

Create an issue or pull request on GitHub to submit suggestions or changes. If you are interested in contributing as a developer, visit the contributing page.

Contacts

Code of Conduct

Everyone working on the project's source code or engaging in any issue trackers, Slack channels, or mailing lists is expected to be familiar with and follow the Code of Conduct.

About

Build, run and manage your data pipelines with Python or SQL on any cloud

Languages

  • Python 39.3%
  • Java 28.4%
  • TypeScript 25.5%
  • HTML 2.9%
  • SCSS 1.6%
  • Shell 1.0%
  • Other 1.3%