This repository was archived by the owner on Mar 25, 2023. It is now read-only.
content/docs/0000_getting-started/0150_frequently_asked_questions.mdx
22 additions & 8 deletions
@@ -76,28 +76,42 @@ PostgreSQL deployment can also be used on Splitgraph.
 No. Splitgraph can be used in a decentralized way, sharing data between two engines like one would
 with Git. Here's an [example](https://github.com/splitgraph/splitgraph/tree/master/examples/push-to-other-engine) of getting two Splitgraph instances to synchronize with each other.
 
-It is also possible to push data to S3-compatible storage (like [Minio](https://github.com/splitgraph/splitgraph/tree/487c704eb6aba5025708215bfa80399723c530b1/examples/push-to-object-storage)).
+It is also possible to push data to S3-compatible storage (like [Minio](https://github.com/splitgraph/splitgraph/tree/master/examples/push-to-object-storage)).
 
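The Git-like workflow between two engines described above can be sketched with the `sgr` command-line tool. This is a rough sketch, not the linked example verbatim: the repository name is hypothetical, it assumes the two engines are already registered as remotes in the `sgr` configuration, and exact flags may differ between versions.

```
# Sketch only: "example/repo" is a made-up repository name.
sgr init example/repo              # create an empty repository on engine A
sgr commit example/repo            # snapshot its current state as a new image
sgr push example/repo              # push images to the configured upstream (engine B)

# On engine B:
sgr clone example/repo             # pull down the repository's images
sgr checkout example/repo:latest   # materialize the latest image as a schema
```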
 You can use [Splitgraph Cloud](../splitgraph_cloud/introduction) if you wish to
 get or share public data or have a [REST API](../splitgraph_cloud/publish_rest_api) generated for your dataset.
 
 ### Why not just use...
 
-#### dbt, Pachyderm, ...
+#### dbt
 
-There are plenty of great tools around for building datasets and managing ETL pipelines. Firstly,
-they can also work against Splitgraph, since a Splitgraph engine is also a PostgreSQL instance.
-After the dataset is built, one can snapshot the schema it was built in and package it up as a Splitgraph image.
-This enriches the tool by adding version control, packaging and sharing to datasets that it uses and builds.
+dbt is a tool for transforming data inside the data warehouse that allows users to build up
+transformations from reusable and versionable SQL snippets.
 
-We have an example of running [dbt](../integrating_splitgraph/dbt) against Splitgraph, swapping between different versions of the
+dbt is enhanced by Splitgraph: since a Splitgraph engine is also a PostgreSQL instance, dbt can
+work against it, getting benefits like version control, packaging and sharing for the datasets that it uses and builds.
+
+We have an example of running [dbt](../integrating_splitgraph/dbt) in such a way, swapping between different versions of the
 source dataset and looking at their effect on the built dbt model.
 
-Secondly, Splitgraph offers its own method of building datasets: [Splitfiles](../concepts/splitfiles). Splitfiles offer Dockerfile-like caching, provenance tracking, fast dataset rebuilds, joins between datasets and full SQL support.
+Splitgraph also offers its own method of building datasets: [Splitfiles](../concepts/splitfiles). Splitfiles offer Dockerfile-like caching, provenance tracking, fast dataset rebuilds, joins between datasets and full SQL support.
 
 We envision Splitfiles as a replacement for ETL pipelines: instead of a series of processes that transform data between tables in a data warehouse,
 transformations are treated as pure functions between isolated self-contained datasets, allowing one to replay any part of their pipeline at any point in time.
 
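The Splitfile approach described above can be sketched roughly as follows. The repository, table, and column names here are hypothetical; the exact command syntax is documented in the Splitfiles reference.

```
# Import a table from a snapshot of an upstream dataset
# ("noaa/climate" and "rainfall" are made-up names).
FROM noaa/climate:latest IMPORT {SELECT * FROM rainfall WHERE year >= 2010} AS rainfall

# Build a derived table with plain SQL; this step is cached like a
# Dockerfile layer and its provenance is recorded in the output image.
SQL {CREATE TABLE rainfall_summary AS SELECT year, avg(mm) AS avg_mm FROM rainfall GROUP BY year}
```

Because each step's inputs are pinned image hashes, rebuilding the Splitfile against a newer version of the source dataset only re-runs the steps whose inputs changed.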
+#### Pachyderm
+
+Pachyderm is used mostly for managing and running distributed data pipelines on flat files (images,
+genomics data, etc.). By specializing in datasets that can be represented as tables in a database,
+Splitgraph gets benefits like delta compression on changed data or faster querying speeds.
+
+Similarly to Pachyderm, Splitgraph supports [data lineage (or provenance)](../working_with_data/inspecting_provenance) tracking, where the
+commands and source datasets that were used to build a particular dataset are recorded in that
+dataset's metadata, allowing them to be replayed or inspected.
+
+Splitgraph can be integrated with Pachyderm using the same methods one would use [for PostgreSQL](https://docs.pachyderm.com/latest/how-tos/splitting-data/splitting/#ingesting-postgressql-data). This can then be used to run a [Splitfile](../concepts/splitfiles) to build a dataset as a
+Pachyderm stage.
 
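Recorded lineage can also be inspected from the command line. This is a sketch assuming an image built by a Splitfile exists; the repository name is hypothetical, and available options may vary by `sgr` version.

```
# List the source images and commands used to build this image
# ("example/repo" is a made-up repository name).
sgr provenance example/repo:latest
```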
 #### dvc, DataLad, ...
 
 Some tools use [git-annex](https://git-annex.branchable.com/) to version code and data together.