
Conversation

simonw commented May 8, 2025

simonw added the enhancement label May 8, 2025
simonw commented May 8, 2025

Tried running the test failures through o3, which cost $1.61 and forgot to return the actual solutions, instead providing a bulleted list of what the fixes did without detailing what those fixes were! https://gist.github.com/simonw/6f4f211fdc2d1bc894193fa4b9a5de4b

Gemini 2.5 Flash (thinking) for 8.3 cents was a lot more useful: https://gist.github.com/simonw/d867a1791b40b3a5fbfb66b75417c0a3

simonw commented May 9, 2025

I tried this but it caused more test failures elsewhere, so I dropped it:

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index b960790..b9be730 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -3300,13 +3300,23 @@ class Table(Queryable):
 
         Use ``analyze=True`` to run ``ANALYZE`` after the insert has completed.
         """
+        hash_id = self.value_or_default("hash_id", hash_id)
+
+        if pk is DEFAULT and not hash_id:
+            introspected_pks = self.pks
+            if introspected_pks and introspected_pks != ["rowid"]:
+                pk = (
+                    introspected_pks[0]
+                    if len(introspected_pks) == 1
+                    else tuple(introspected_pks)
+                )
+
         pk = self.value_or_default("pk", pk)
         foreign_keys = self.value_or_default("foreign_keys", foreign_keys)
         column_order = self.value_or_default("column_order", column_order)
         not_null = self.value_or_default("not_null", not_null)
         defaults = self.value_or_default("defaults", defaults)
         batch_size = self.value_or_default("batch_size", batch_size)
-        hash_id = self.value_or_default("hash_id", hash_id)
         hash_id_columns = self.value_or_default("hash_id_columns", hash_id_columns)
         alter = self.value_or_default("alter", alter)
         ignore = self.value_or_default("ignore", ignore)

simonw commented May 9, 2025

I think I'm fighting with two problems right now:

  • Whatever is going on with the indexes
  • The thing where a table doesn't remember what its primary keys are supposed to be

simonw commented May 9, 2025

I don't understand why the UPSERT refactor has caused these new problems relating to primary keys that weren't an issue before.

simonw commented May 9, 2025

OK, just these test failures still to fix:

FAILED tests/test_extracts.py::test_extracts[True-kwargs0-Species] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[True-kwargs1-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[True-kwargs2-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[False-kwargs0-Species] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[False-kwargs1-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[False-kwargs2-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_lookup.py::test_lookup_with_extra_insert_parameters - AssertionError: assert {'convert_to_...0-01-01', ...} == {'convert_to_...0-01-01', ...}

simonw commented May 9, 2025

Trying Gemini 2.5 Flash again for some ideas:

pytest -p no:warnings > /tmp/errors.txt
files-to-prompt . -e py -c > /tmp/files.txt
llm -m gemini-2.5-flash-preview-04-17 -f /tmp/errors.txt -f /tmp/files.txt \
  -s 'Analyze cause of test failures and suggest fixes'

git diff main | llm -c 'These are the changes I made which caused the tests to fail'

https://gist.github.com/simonw/1383565aac316d68cc29f289e33b2e51#response-1

simonw commented May 9, 2025

Abandoning this change too:

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 89648e8..e9825ac 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -3319,6 +3319,40 @@ class Table(Queryable):
         if upsert and (not pk and not hash_id):
             raise PrimaryKeyRequired("upsert() requires a pk")
 
+        extracts_resolved = resolve_extracts(
+            self.value_or_default("extracts", extracts)
+        )
+        conversions_resolved = self.value_or_default("conversions", conversions) or {}
+
+        def process_records_with_extracts(records_iterator):
+            for record in records_iterator:
+                modified_record = dict(record)  # Work on a copy
+                for original_col, extracted_table_name in extracts_resolved.items():
+                    original_value = modified_record.get(original_col)
+                    # Handle None/empty strings like Table.extract does (map to None)
+                    if original_value is not None and original_value != "":
+                        # Use lookup to get/create ID in the extracted table
+                        # This handles table creation, unique index, and value insertion
+                        extracted_table_obj = self.db[extracted_table_name]
+                        # Lookup on the 'value' column is the standard behavior for extracts
+                        lookup_id = extracted_table_obj.lookup(
+                            {"value": original_value}
+                        )
+                        # Replace original value with the integer ID
+                        modified_record[original_col] = lookup_id
+                    else:
+                        # Ensure it's None if it was an empty string or None
+                        modified_record[original_col] = None
+
+                yield modified_record
+
+        if extracts_resolved:
+            # Process records through the extraction generator
+            records = process_records_with_extracts(iter(records))
+        else:
+            # If no extracts, just use the original iterator
+            records = iter(records)
+
         assert not (hash_id and pk), "Use either pk= or hash_id="
         if hash_id_columns and (hash_id is None):
             hash_id = "id"

simonw commented May 9, 2025

Here's the code deleted from the start of the build_insert_queries_and_params() method that used to handle this:

extracts = resolve_extracts(extracts)
for record in chunk:
    record_values = []
    for key in all_columns:
        value = jsonify_if_needed(
            record.get(
                key,
                (
                    None
                    if key != hash_id
                    else hash_record(record, hash_id_columns)
                ),
            )
        )
        if key in extracts:
            extract_table = extracts[key]
            value = self.db[extract_table].lookup({"value": value})
        record_values.append(value)
    values.append(record_values)
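
For reference, Table.lookup() is what made that one-liner work: it creates the extracted table (and a unique index) if necessary, inserts the value if it is new, and returns the integer id either way. A quick illustration, with table and value names chosen purely for demonstration:

from sqlite_utils import Database

db = Database(memory=True)
# First call creates the "species" table plus a unique index on "value"
species_id = db["species"].lookup({"value": "Dog"})
# Repeated lookups of the same value return the same id instead of inserting again
assert species_id == db["species"].lookup({"value": "Dog"})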

codecov bot commented May 9, 2025

Codecov Report

❌ Patch coverage is 93.68421% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.56%. Comparing base (72f6c82) to head (79913af).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
sqlite_utils/db.py 93.68% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #653      +/-   ##
==========================================
- Coverage   95.64%   95.56%   -0.08%     
==========================================
  Files           8        8              
  Lines        2867     2911      +44     
==========================================
+ Hits         2742     2782      +40     
- Misses        125      129       +4     


simonw commented May 9, 2025

Getting a bunch of errors on SQLite 3.23.1 (the branch that uses the old UPSERT). They look like this:

self = <Database <sqlite3.Connection object at 0x7f4822e46a80>>
sql = 'UPDATE [table] AS m SET [a] = v.[a], [b] = v.[b], [c] = v.[c] FROM (VALUES (?, ?, ?, ?)) AS v(custom_id, a, b, c) WHERE m.[custom_id] = v.[custom_id]'
parameters = ['4acc71e0547112eb432f0a36fb1924c4a738cb49', 1, 2, 3]

    def execute(
        self, sql: str, parameters: Optional[Union[Iterable, dict]] = None
    ) -> sqlite3.Cursor:
        """
        Execute SQL query and return a ``sqlite3.Cursor``.
    
        :param sql: SQL query to execute
        :param parameters: Parameters to use in that query - an iterable for ``where id = ?``
          parameters, or a dictionary for ``where id = :id``
        """
        if self._tracer:
            self._tracer(sql, parameters)
        if parameters is not None:
>           return self.conn.execute(sql, parameters)
E           sqlite3.OperationalError: near "AS": syntax error

Full list of failing tests:

FAILED tests/test_conversions.py::test_upsert_conversion - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_conversions.py::test_upsert_all_conversion - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_create.py::test_create_table_with_custom_columns[upsert] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_create.py::test_create_table_with_custom_columns[upsert_all] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_create.py::test_insert_all_analyze[upsert_all] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_all - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_all_not_null - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_with_hash_id - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_with_hash_id_columns[None] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_with_hash_id_columns[custom_id] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_compound_primary_key - sqlite3.OperationalError: near "AS": syntax error

Code at fault:

update_sql = (
    f"UPDATE [{self.name}] AS m SET {assignments_sql} "
    f"FROM (VALUES {row_placeholders_sql}) "
    f"AS v({alias_cols_sql}) "
    f"WHERE " + " AND ".join(f"m.[{c}] = v.[{c}]" for c in pk_cols)
)
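
The UPDATE ... FROM (VALUES ...) AS syntax only arrived in SQLite 3.33.0, which is why 3.23.1 chokes on the AS. One possible fallback is plain per-row UPDATE statements; this is only a sketch reusing pk_cols from the snippet above (non_pk_cols and rows are assumed placeholders), not necessarily what the final fix does:

# Sketch: per-row UPDATEs work on any SQLite version, at the cost of one
# statement per record instead of a single multi-row UPDATE ... FROM.
set_sql = ", ".join(f"[{col}] = ?" for col in non_pk_cols)
where_sql = " AND ".join(f"[{col}] = ?" for col in pk_cols)
update_sql = f"UPDATE [{self.name}] SET {set_sql} WHERE {where_sql}"
for row in rows:
    # Parameters: new values first, then the primary key values for the WHERE clause
    params = [row[col] for col in non_pk_cols] + [row[col] for col in pk_cols]
    self.db.execute(update_sql, params)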

simonw commented May 9, 2025

I can actually exercise that code even without installing SQLite 3.23.1 in my development environment by setting the new use_old_upsert=True argument in the Database() constructor.

I'll parametrize that for test_create_table_with_custom_columns and test_upsert.
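
Roughly this shape of parametrization should exercise both code paths on a modern SQLite build. A sketch only: the use_old_upsert argument comes from this PR, the rest is illustrative:

import pytest
from sqlite_utils import Database


@pytest.mark.parametrize("use_old_upsert", [False, True])
def test_upsert(use_old_upsert):
    # use_old_upsert=True forces the fallback SQL even when ON CONFLICT is available
    db = Database(memory=True, use_old_upsert=use_old_upsert)
    db["dogs"].upsert({"id": 1, "name": "Cleo"}, pk="id")
    db["dogs"].upsert({"id": 1, "name": "Cleopaws"}, pk="id")
    assert list(db["dogs"].rows) == [{"id": 1, "name": "Cleopaws"}]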

simonw commented May 9, 2025

New test failure:

FAILED tests/test_create.py::test_create_table_with_custom_columns[False-upsert] - AssertionError: assert ['dogs'] == ['dogs', 't2c...22ab3c51f194']
  
  Right contains one more item: 't2ca612dba8569c9fc12922ab3c51f194'
  
  Full diff:
    [
        'dogs',
  -     't2ca612dba8569c9fc12922ab3c51f194',
    ]

Looks to me like the temporary table created here was not correctly cleaned up:

@property
def supports_on_conflict(self) -> bool:
    # SQLite's upsert is implemented as INSERT INTO ... ON CONFLICT DO ...
    if not hasattr(self, "_supports_on_conflict"):
        try:
            table_name = "t{}".format(secrets.token_hex(16))
            with self.conn:
                self.conn.execute(
                    "create table {} (id integer primary key, name text)".format(
                        table_name
                    )
                )
                self.conn.execute(
                    "insert into {} (id, name) values (1, 'one')".format(table_name)
                )
                self.conn.execute(
                    (
                        "insert into {} (id, name) values (1, 'two') "
                        "on conflict do update set name = 'two'"
                    ).format(table_name)
                )
                self.conn.execute("drop table {}".format(table_name))
            self._supports_on_conflict = True
        except Exception:
            self._supports_on_conflict = False
    return self._supports_on_conflict
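
One way to make that capability probe impossible to leak, whichever statement fails, would be to run it against a throwaway in-memory connection rather than the real database. A sketch, not necessarily the fix that landed:

import sqlite3


def detect_on_conflict_support() -> bool:
    # Probe a scratch in-memory database so no table can be left behind
    # in the database being wrapped, regardless of where the probe fails.
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute("create table t (id integer primary key, name text)")
        probe.execute("insert into t (id, name) values (1, 'one')")
        probe.execute(
            "insert into t (id, name) values (1, 'two') "
            "on conflict do update set name = 'two'"
        )
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        probe.close()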

simonw commented May 9, 2025

Hopefully down to the last round of failures now, which are caused by the error messages differing:

    def test_upsert_alter(db_path, tmpdir):
        json_path = str(tmpdir / "dogs.json")
        db = Database(db_path)
        insert_dogs = [{"id": 1, "name": "Cleo"}]
        write_json(json_path, insert_dogs)
        result = CliRunner().invoke(
            cli.cli, ["insert", db_path, "dogs", json_path, "--pk", "id"]
        )
        assert result.exit_code == 0, result.output
        # Should fail with error code if no --alter
        upsert_dogs = [{"id": 1, "age": 5}]
        write_json(json_path, upsert_dogs)
        result = CliRunner().invoke(
            cli.cli, ["upsert", db_path, "dogs", json_path, "--pk", "id"]
        )
        assert result.exit_code == 1
>       assert (
            "Error: table dogs has no column named age\n\n"
            "Try using --alter to add additional columns"
        ) == result.output.strip()
E       AssertionError: assert 'Error: table...ional columns' == 'Error: no su...ters = [5, 1]'
E         
E         - Error: no such column: age
E         + Error: table dogs has no column named age
E           
E         + Try using --alter to add additional columns
E         - sql = UPDATE [dogs] SET [age] = ? WHERE [id] = ?
E         - parameters = [5, 1]
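
Judging by the expected output above, the CLI's --alter hint is keyed off the "has no column named" wording the INSERT path produces, while the old-upsert UPDATE surfaces SQLite's raw "no such column: age" instead. One way to line the two paths up would be to translate the error where the fallback UPDATE is executed; a sketch only, with update_sql and params standing in for whatever the fallback path actually builds:

import sqlite3

try:
    cursor = self.db.execute(update_sql, params)
except sqlite3.OperationalError as e:
    message = e.args[0] if e.args else str(e)
    if message.startswith("no such column: "):
        column = message.split(": ", 1)[1]
        # Re-raise with the same wording the INSERT path produces so the
        # existing "Try using --alter" handling keeps working
        raise sqlite3.OperationalError(
            f"table {self.name} has no column named {column}"
        ) from e
    raise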

simonw marked this pull request as ready for review May 9, 2025 03:24
simonw merged commit 8e7d018 into main May 9, 2025
86 checks passed
simonw deleted the upsert branch May 9, 2025 03:37