-
-
Notifications
You must be signed in to change notification settings - Fork 123
Use ON CONFLICT for upsert #653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tried running the test failures through o3 which cost $1.61 and forgot to return the actual solutions, instead providing a bulleted list of what the fixes did without detailing what those fixes are! https://gist.github.com/simonw/6f4f211fdc2d1bc894193fa4b9a5de4b Gemini 2.5 Flash (thinking) for 8.3 cents was a lot more useful: https://gist.github.com/simonw/d867a1791b40b3a5fbfb66b75417c0a3 |
|
I tried this but it caused more test failures elsewhere, so I dropped it: diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index b960790..b9be730 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -3300,13 +3300,23 @@ class Table(Queryable):
Use ``analyze=True`` to run ``ANALYZE`` after the insert has completed.
"""
+ hash_id = self.value_or_default("hash_id", hash_id)
+
+ if pk is DEFAULT and not hash_id:
+ introspected_pks = self.pks
+ if introspected_pks and introspected_pks != ["rowid"]:
+ pk = (
+ introspected_pks[0]
+ if len(introspected_pks) == 1
+ else tuple(introspected_pks)
+ )
+
pk = self.value_or_default("pk", pk)
foreign_keys = self.value_or_default("foreign_keys", foreign_keys)
column_order = self.value_or_default("column_order", column_order)
not_null = self.value_or_default("not_null", not_null)
defaults = self.value_or_default("defaults", defaults)
batch_size = self.value_or_default("batch_size", batch_size)
- hash_id = self.value_or_default("hash_id", hash_id)
hash_id_columns = self.value_or_default("hash_id_columns", hash_id_columns)
alter = self.value_or_default("alter", alter)
ignore = self.value_or_default("ignore", ignore) |
|
I think I'm fighting with two problems right now:
|
|
I don't understand why the UPSERT refactor has caused these new problems relating to primary keys that weren't an issue before. |
|
OK, just these test failures still to fix: |
|
Trying Gemini 2.5 Flash again for some ideas: pytest -p no:warnings > /tmp/errors.txt
files-to-prompt . -e py -c > /tmp/files.txt
llm -m gemini-2.5-flash-preview-04-17 -f /tmp/errors.txt -f /tmp/files.txt \
-s 'Analyze cause of test failures and suggest fixes'
git diff main | llm -c 'These are the changes I made which caused the tests to fail'https://gist.github.com/simonw/1383565aac316d68cc29f289e33b2e51#response-1 |
|
Abandoning this change too: diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 89648e8..e9825ac 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -3319,6 +3319,40 @@ class Table(Queryable):
if upsert and (not pk and not hash_id):
raise PrimaryKeyRequired("upsert() requires a pk")
+ extracts_resolved = resolve_extracts(
+ self.value_or_default("extracts", extracts)
+ )
+ conversions_resolved = self.value_or_default("conversions", conversions) or {}
+
+ def process_records_with_extracts(records_iterator):
+ for record in records_iterator:
+ modified_record = dict(record) # Work on a copy
+ for original_col, extracted_table_name in extracts_resolved.items():
+ original_value = modified_record.get(original_col)
+ # Handle None/empty strings like Table.extract does (map to None)
+ if original_value is not None and original_value != "":
+ # Use lookup to get/create ID in the extracted table
+ # This handles table creation, unique index, and value insertion
+ extracted_table_obj = self.db[extracted_table_name]
+ # Lookup on the 'value' column is the standard behavior for extracts
+ lookup_id = extracted_table_obj.lookup(
+ {"value": original_value}
+ )
+ # Replace original value with the integer ID
+ modified_record[original_col] = lookup_id
+ else:
+ # Ensure it's None if it was an empty string or None
+ modified_record[original_col] = None
+
+ yield modified_record
+
+ if extracts_resolved:
+ # Process records through the extraction generator
+ records = process_records_with_extracts(iter(records))
+ else:
+ # If no extracts, just use the original iterator
+ records = iter(records)
+
assert not (hash_id and pk), "Use either pk= or hash_id="
if hash_id_columns and (hash_id is None):
hash_id = "id" |
|
Here's the code deleted from the start of the sqlite-utils/sqlite_utils/db.py Lines 2975 to 2993 in 72f6c82
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #653 +/- ##
==========================================
- Coverage 95.64% 95.56% -0.08%
==========================================
Files 8 8
Lines 2867 2911 +44
==========================================
+ Hits 2742 2782 +40
- Misses 125 129 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Getting a bunch of errors on SQLite 3.23.1 - the branch that uses old UPSERT. They look like this: Full list of failing tests: Code at fault: sqlite-utils/sqlite_utils/db.py Lines 3113 to 3118 in efeba1f
|
|
I can actually exercise that code even without installing SQLite 3.23.1 in my development environment by setting the new I'll parametrize that for |
|
New test failure: Looks to me like the temporary table created here was not correctly cleaned up: sqlite-utils/sqlite_utils/db.py Lines 691 to 716 in 41497c3
|
|
Hopefully down to the last round of failures now, which are caused by the error messages differing: |
Refs:
📚 Documentation preview 📚: https://sqlite-utils--653.org.readthedocs.build/en/653/