docs/guides/storage_clients.mdx
10 additions & 7 deletions
@@ -28,7 +28,7 @@ Crawlee provides three main storage client implementations:
 - <ApiLink to="class/FileSystemStorageClient">`FileSystemStorageClient`</ApiLink> - Provides persistent file system storage with in-memory caching.
 - <ApiLink to="class/MemoryStorageClient">`MemoryStorageClient`</ApiLink> - Stores data in memory with no persistence.
-- <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> - Provides persistent storage using a SQL database ([SQLite](https://sqlite.org/) or [PostgreSQL](https://www.postgresql.org/)). Requires installing the extra dependency: `crawlee[sql_sqlite]` for SQLite or `crawlee[sql_postgres]` for PostgreSQL.
+- <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> - Provides persistent storage using a SQL database ([SQLite](https://sqlite.org/), [PostgreSQL](https://www.postgresql.org/), [MySQL](https://www.mysql.com/) or [MariaDB](https://mariadb.org/)). Requires installing the extra dependency: `crawlee[sql_sqlite]` for SQLite, `crawlee[sql_postgres]` for PostgreSQL or `crawlee[sql_mysql]` for MySQL and MariaDB.
 - <ApiLink to="class/RedisStorageClient">`RedisStorageClient`</ApiLink> - Provides persistent storage using a [Redis](https://redis.io/) database v8.0+. Requires installing the extra dependency `crawlee[redis]`.
 - [`ApifyStorageClient`](https://docs.apify.com/sdk/python/reference/class/ApifyStorageClient) - Manages storage on the [Apify platform](https://apify.com), implemented in the [Apify SDK](https://github.com/apify/apify-sdk-python).
@@ -144,7 +144,7 @@ The `MemoryStorageClient` does not persist data between runs. All data is lost w
 The `SqlStorageClient` is experimental. Its API and behavior may change in future releases.
 :::

-The <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> provides persistent storage using a SQL database (SQLite by default, or PostgreSQL). It supports all Crawlee storage types and enables concurrent access from multiple independent clients or processes.
+The <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> provides persistent storage using a SQL database (SQLite by default, or PostgreSQL, MySQL, MariaDB). It supports all Crawlee storage types and enables concurrent access from multiple independent clients or processes.

 :::note dependencies
 The <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> is not included in the core Crawlee package.
@@ -154,10 +154,12 @@ To use it, you need to install Crawlee with the appropriate extra dependency:
 <code>pip install 'crawlee[sql_sqlite]'</code>
 - For PostgreSQL support, run:
 <code>pip install 'crawlee[sql_postgres]'</code>
+- For MySQL or MariaDB support, run:
+<code>pip install 'crawlee[sql_mysql]'</code>
 :::

 By default, <ApiLink to="class/SqlStorageClient">SqlStorageClient</ApiLink> uses SQLite.
-To use PostgreSQL instead, just provide a PostgreSQL connection string via the `connection_string` parameter. No other code changes are needed—the same client works for both databases.
+To use a different database, just provide the appropriate connection string via the `connection_string` parameter. No other code changes are needed—the same client works for all supported databases.
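
As the changed paragraph says, switching backends comes down to the connection string. Below is a minimal sketch of that idea, not the PR's own example. It assumes `SqlStorageClient` is importable from `crawlee.storage_clients` and accepts the `connection_string` keyword described in this guide. The `make_client` helper, the two lookup tables, and the placeholder credentials and hosts are hypothetical and only illustrate the mapping.

```python
# Example connection strings, copied from the `connection_string` option in this guide.
# User names, passwords, hosts, and database names are placeholders.
CONNECTION_STRINGS = {
    'sqlite': 'sqlite+aiosqlite:///my.db',
    'postgresql': 'postgresql+asyncpg://user:pass@host/db',
    'mysql': 'mysql+aiomysql://user:pass@host/db',
    'mariadb': 'mariadb+aiomysql://user:pass@host/db',
}

# Extra dependency to install for each backend, as listed in the dependencies note.
PIP_EXTRAS = {
    'sqlite': 'crawlee[sql_sqlite]',
    'postgresql': 'crawlee[sql_postgres]',
    'mysql': 'crawlee[sql_mysql]',
    'mariadb': 'crawlee[sql_mysql]',
}


def make_client(backend: str):
    """Hypothetical helper: build a SqlStorageClient for the chosen backend."""
    # Imported lazily so this module loads even when the extra is not installed.
    # Assumption: this import path matches the installed Crawlee version.
    from crawlee.storage_clients import SqlStorageClient

    return SqlStorageClient(connection_string=CONNECTION_STRINGS[backend])
```

Only the dictionary key changes between backends; the client construction itself stays the same.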
@@ -214,7 +216,6 @@ class dataset_metadata_buffer {
     + id (PK)
     + accessed_at
     + modified_at
-    + dataset_id (FK)
     + delta_item_count
 }
@@ -247,7 +248,6 @@ class key_value_store_metadata_buffer {
     + id (PK)
     + accessed_at
     + modified_at
-    + key_value_store_id (FK)
 }

 %% ========================
@@ -321,7 +321,6 @@ class request_queue_metadata_buffer {
     + id (PK)
     + accessed_at
     + modified_at
-    + request_queue_id (FK)
     + client_id
     + delta_handled_count
     + delta_pending_count
@@ -346,11 +345,15 @@ Configuration options for the <ApiLink to="class/SqlStorageClient">`SqlStorageCl
 Configuration options for the <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> can be set via constructor arguments:

-- **`connection_string`** (default: SQLite in <ApiLink to="class/Configuration">`Configuration`</ApiLink> storage dir) - SQLAlchemy connection string, e.g. `sqlite+aiosqlite:///my.db` or `postgresql+asyncpg://user:pass@host/db`.
+- **`connection_string`** (default: SQLite in <ApiLink to="class/Configuration">`Configuration`</ApiLink> storage dir) - SQLAlchemy connection string, e.g. `sqlite+aiosqlite:///my.db`, `postgresql+asyncpg://user:pass@host/db`, `mysql+aiomysql://user:pass@host/db` or `mariadb+aiomysql://user:pass@host/db`.
 For advanced scenarios, you can configure <ApiLink to="class/SqlStorageClient">`SqlStorageClient`</ApiLink> with a custom SQLAlchemy engine and additional options via the <ApiLink to="class/Configuration">`Configuration`</ApiLink> class. This is useful, for example, when connecting to an external PostgreSQL database or customizing connection pooling.

+:::warning
+If you use MySQL or MariaDB, pass the `isolation_level='READ COMMITTED'` argument to `create_async_engine`. MySQL/MariaDB default to the `REPEATABLE READ` isolation level, which can cause unnecessary locking, deadlocks, or stale reads when multiple Crawlee workers access the same tables concurrently. Using `READ COMMITTED` ensures more predictable row-level locking and visibility semantics for `SqlStorageClient`.
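
The isolation-level advice in the warning can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the PR: `engine_kwargs_for` and `make_engine` are hypothetical helper names, while `create_async_engine` and its `isolation_level` argument come from SQLAlchemy. Actually creating the engine requires SQLAlchemy and the async driver for the chosen backend to be installed.

```python
def engine_kwargs_for(connection_string: str) -> dict[str, str]:
    """Return extra `create_async_engine` keyword arguments for a connection string."""
    # 'mysql+aiomysql://user:pass@host/db' -> scheme 'mysql+aiomysql' -> dialect 'mysql'
    scheme = connection_string.split('://', 1)[0]
    dialect = scheme.split('+', 1)[0]
    if dialect in {'mysql', 'mariadb'}:
        # Override the REPEATABLE READ default, as the warning recommends.
        return {'isolation_level': 'READ COMMITTED'}
    # Other backends keep their default isolation level.
    return {}


def make_engine(connection_string: str):
    """Hypothetical helper: create an async engine with the recommended settings."""
    # Imported lazily so the helper above works without SQLAlchemy installed.
    from sqlalchemy.ext.asyncio import create_async_engine

    return create_async_engine(connection_string, **engine_kwargs_for(connection_string))
```

The resulting engine would then be handed to `SqlStorageClient` as the custom SQLAlchemy engine mentioned above. The exact parameter name (presumably `engine`) is an assumption and should be checked against the API reference.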