
Conversation

simonw commented May 8, 2025

simonw added the enhancement label May 8, 2025
simonw commented May 8, 2025

Tried running the test failures through o3, which cost $1.61 and forgot to return the actual solutions, instead providing a bulleted list of what the fixes did without detailing what those fixes were! https://gist.github.com/simonw/6f4f211fdc2d1bc894193fa4b9a5de4b

Gemini 2.5 Flash (thinking) for 8.3 cents was a lot more useful: https://gist.github.com/simonw/d867a1791b40b3a5fbfb66b75417c0a3

simonw commented May 9, 2025

I tried this but it caused more test failures elsewhere, so I dropped it:

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index b960790..b9be730 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -3300,13 +3300,23 @@ class Table(Queryable):
 
         Use ``analyze=True`` to run ``ANALYZE`` after the insert has completed.
         """
+        hash_id = self.value_or_default("hash_id", hash_id)
+
+        if pk is DEFAULT and not hash_id:
+            introspected_pks = self.pks
+            if introspected_pks and introspected_pks != ["rowid"]:
+                pk = (
+                    introspected_pks[0]
+                    if len(introspected_pks) == 1
+                    else tuple(introspected_pks)
+                )
+
         pk = self.value_or_default("pk", pk)
         foreign_keys = self.value_or_default("foreign_keys", foreign_keys)
         column_order = self.value_or_default("column_order", column_order)
         not_null = self.value_or_default("not_null", not_null)
         defaults = self.value_or_default("defaults", defaults)
         batch_size = self.value_or_default("batch_size", batch_size)
-        hash_id = self.value_or_default("hash_id", hash_id)
         hash_id_columns = self.value_or_default("hash_id_columns", hash_id_columns)
         alter = self.value_or_default("alter", alter)
         ignore = self.value_or_default("ignore", ignore)

simonw commented May 9, 2025

I think I'm fighting with two problems right now:

  • Whatever is going on with the indexes
  • The thing where a table doesn't remember what its primary keys are supposed to be

simonw commented May 9, 2025

I don't understand why the UPSERT refactor has caused these new problems relating to primary keys that weren't an issue before.

simonw commented May 9, 2025

OK, just these test failures still to fix:

FAILED tests/test_extracts.py::test_extracts[True-kwargs0-Species] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[True-kwargs1-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[True-kwargs2-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[False-kwargs0-Species] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[False-kwargs1-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_extracts.py::test_extracts[False-kwargs2-species_id] - AssertionError: assert [Index(seq=0,...ns=['value'])] == []
FAILED tests/test_lookup.py::test_lookup_with_extra_insert_parameters - AssertionError: assert {'convert_to_...0-01-01', ...} == {'convert_to_...0-01-01', ...}

simonw commented May 9, 2025

Trying Gemini 2.5 Flash again for some ideas:

pytest -p no:warnings > /tmp/errors.txt
files-to-prompt . -e py -c > /tmp/files.txt
llm -m gemini-2.5-flash-preview-04-17 -f /tmp/errors.txt -f /tmp/files.txt \
  -s 'Analyze cause of test failures and suggest fixes'

git diff main | llm -c 'These are the changes I made which caused the tests to fail'

https://gist.github.com/simonw/1383565aac316d68cc29f289e33b2e51#response-1

simonw commented May 9, 2025

Abandoning this change too:

diff --git a/sqlite_utils/db.py b/sqlite_utils/db.py
index 89648e8..e9825ac 100644
--- a/sqlite_utils/db.py
+++ b/sqlite_utils/db.py
@@ -3319,6 +3319,40 @@ class Table(Queryable):
         if upsert and (not pk and not hash_id):
             raise PrimaryKeyRequired("upsert() requires a pk")
 
+        extracts_resolved = resolve_extracts(
+            self.value_or_default("extracts", extracts)
+        )
+        conversions_resolved = self.value_or_default("conversions", conversions) or {}
+
+        def process_records_with_extracts(records_iterator):
+            for record in records_iterator:
+                modified_record = dict(record)  # Work on a copy
+                for original_col, extracted_table_name in extracts_resolved.items():
+                    original_value = modified_record.get(original_col)
+                    # Handle None/empty strings like Table.extract does (map to None)
+                    if original_value is not None and original_value != "":
+                        # Use lookup to get/create ID in the extracted table
+                        # This handles table creation, unique index, and value insertion
+                        extracted_table_obj = self.db[extracted_table_name]
+                        # Lookup on the 'value' column is the standard behavior for extracts
+                        lookup_id = extracted_table_obj.lookup(
+                            {"value": original_value}
+                        )
+                        # Replace original value with the integer ID
+                        modified_record[original_col] = lookup_id
+                    else:
+                        # Ensure it's None if it was an empty string or None
+                        modified_record[original_col] = None
+
+                yield modified_record
+
+        if extracts_resolved:
+            # Process records through the extraction generator
+            records = process_records_with_extracts(iter(records))
+        else:
+            # If no extracts, just use the original iterator
+            records = iter(records)
+
         assert not (hash_id and pk), "Use either pk= or hash_id="
         if hash_id_columns and (hash_id is None):
             hash_id = "id"

simonw commented May 9, 2025

Here's the code deleted from the start of the build_insert_queries_and_params() method that used to handle this:

extracts = resolve_extracts(extracts)
for record in chunk:
    record_values = []
    for key in all_columns:
        value = jsonify_if_needed(
            record.get(
                key,
                (
                    None
                    if key != hash_id
                    else hash_record(record, hash_id_columns)
                ),
            )
        )
        if key in extracts:
            extract_table = extracts[key]
            value = self.db[extract_table].lookup({"value": value})
        record_values.append(value)
    values.append(record_values)
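
For reference, Table.lookup() is what made that one-liner work: it creates the extracted table (and a unique index) if necessary, inserts the value if it is new, and returns the integer id either way. A quick illustration, with table and value names chosen purely for demonstration:

from sqlite_utils import Database

db = Database(memory=True)
# First call creates the "species" table plus a unique index on "value"
species_id = db["species"].lookup({"value": "Dog"})
# Repeated lookups of the same value return the same id instead of inserting again
assert species_id == db["species"].lookup({"value": "Dog"})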

codecov bot commented May 9, 2025

Codecov Report

❌ Patch coverage is 93.68421% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.56%. Comparing base (72f6c82) to head (79913af).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
sqlite_utils/db.py 93.68% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #653      +/-   ##
==========================================
- Coverage   95.64%   95.56%   -0.08%     
==========================================
  Files           8        8              
  Lines        2867     2911      +44     
==========================================
+ Hits         2742     2782      +40     
- Misses        125      129       +4     


simonw commented May 9, 2025

Getting a bunch of errors on SQLite 3.23.1 (the branch that uses the old UPSERT). They look like this:

self = <Database <sqlite3.Connection object at 0x7f4822e46a80>>
sql = 'UPDATE [table] AS m SET [a] = v.[a], [b] = v.[b], [c] = v.[c] FROM (VALUES (?, ?, ?, ?)) AS v(custom_id, a, b, c) WHERE m.[custom_id] = v.[custom_id]'
parameters = ['4acc71e0547112eb432f0a36fb1924c4a738cb49', 1, 2, 3]

    def execute(
        self, sql: str, parameters: Optional[Union[Iterable, dict]] = None
    ) -> sqlite3.Cursor:
        """
        Execute SQL query and return a ``sqlite3.Cursor``.
    
        :param sql: SQL query to execute
        :param parameters: Parameters to use in that query - an iterable for ``where id = ?``
          parameters, or a dictionary for ``where id = :id``
        """
        if self._tracer:
            self._tracer(sql, parameters)
        if parameters is not None:
>           return self.conn.execute(sql, parameters)
E           sqlite3.OperationalError: near "AS": syntax error

Full list of failing tests:

FAILED tests/test_conversions.py::test_upsert_conversion - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_conversions.py::test_upsert_all_conversion - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_create.py::test_create_table_with_custom_columns[upsert] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_create.py::test_create_table_with_custom_columns[upsert_all] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_create.py::test_insert_all_analyze[upsert_all] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_all - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_all_not_null - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_with_hash_id - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_with_hash_id_columns[None] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_with_hash_id_columns[custom_id] - sqlite3.OperationalError: near "AS": syntax error
FAILED tests/test_upsert.py::test_upsert_compound_primary_key - sqlite3.OperationalError: near "AS": syntax error

Code at fault:

update_sql = (
    f"UPDATE [{self.name}] AS m SET {assignments_sql} "
    f"FROM (VALUES {row_placeholders_sql}) "
    f"AS v({alias_cols_sql}) "
    f"WHERE " + " AND ".join(f"m.[{c}] = v.[{c}]" for c in pk_cols)
)
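
The UPDATE ... FROM (VALUES ...) AS syntax only arrived in SQLite 3.33.0, which is why 3.23.1 chokes on the AS. One possible fallback is plain per-row UPDATE statements; this is only a sketch reusing pk_cols from the snippet above (non_pk_cols and rows are assumed placeholders), not necessarily what the final fix does:

# Sketch: per-row UPDATEs work on any SQLite version, at the cost of one
# statement per record instead of a single multi-row UPDATE ... FROM.
set_sql = ", ".join(f"[{col}] = ?" for col in non_pk_cols)
where_sql = " AND ".join(f"[{col}] = ?" for col in pk_cols)
update_sql = f"UPDATE [{self.name}] SET {set_sql} WHERE {where_sql}"
for row in rows:
    # Parameters: new values first, then the primary key values for the WHERE clause
    params = [row[col] for col in non_pk_cols] + [row[col] for col in pk_cols]
    self.db.execute(update_sql, params)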

simonw commented May 9, 2025

I can actually exercise that code even without installing SQLite 3.23.1 in my development environment by setting the new use_old_upsert=True argument in the Database() constructor.

I'll parametrize that for test_create_table_with_custom_columns and test_upsert.
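
Roughly this shape of parametrization should exercise both code paths on a modern SQLite build. A sketch only: the use_old_upsert argument comes from this PR, the rest is illustrative:

import pytest
from sqlite_utils import Database


@pytest.mark.parametrize("use_old_upsert", [False, True])
def test_upsert(use_old_upsert):
    # use_old_upsert=True forces the fallback SQL even when ON CONFLICT is available
    db = Database(memory=True, use_old_upsert=use_old_upsert)
    db["dogs"].upsert({"id": 1, "name": "Cleo"}, pk="id")
    db["dogs"].upsert({"id": 1, "name": "Cleopaws"}, pk="id")
    assert list(db["dogs"].rows) == [{"id": 1, "name": "Cleopaws"}]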

simonw commented May 9, 2025

New test failure:

FAILED tests/test_create.py::test_create_table_with_custom_columns[False-upsert] - AssertionError: assert ['dogs'] == ['dogs', 't2c...22ab3c51f194']
  
  Right contains one more item: 't2ca612dba8569c9fc12922ab3c51f194'
  
  Full diff:
    [
        'dogs',
  -     't2ca612dba8569c9fc12922ab3c51f194',
    ]

Looks to me like the temporary table created here was not correctly cleaned up:

@property
def supports_on_conflict(self) -> bool:
    # SQLite's upsert is implemented as INSERT INTO ... ON CONFLICT DO ...
    if not hasattr(self, "_supports_on_conflict"):
        try:
            table_name = "t{}".format(secrets.token_hex(16))
            with self.conn:
                self.conn.execute(
                    "create table {} (id integer primary key, name text)".format(
                        table_name
                    )
                )
                self.conn.execute(
                    "insert into {} (id, name) values (1, 'one')".format(table_name)
                )
                self.conn.execute(
                    (
                        "insert into {} (id, name) values (1, 'two') "
                        "on conflict do update set name = 'two'"
                    ).format(table_name)
                )
                self.conn.execute("drop table {}".format(table_name))
            self._supports_on_conflict = True
        except Exception:
            self._supports_on_conflict = False
    return self._supports_on_conflict
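
One way to make that capability probe impossible to leak, whichever statement fails, would be to run it against a throwaway in-memory connection rather than the real database. A sketch, not necessarily the fix that landed:

import sqlite3


def detect_on_conflict_support() -> bool:
    # Probe a scratch in-memory database so no table can be left behind
    # in the database being wrapped, regardless of where the probe fails.
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute("create table t (id integer primary key, name text)")
        probe.execute("insert into t (id, name) values (1, 'one')")
        probe.execute(
            "insert into t (id, name) values (1, 'two') "
            "on conflict do update set name = 'two'"
        )
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        probe.close()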

simonw commented May 9, 2025

Hopefully down to the last round of failures now, which are caused by the error messages differing:

    def test_upsert_alter(db_path, tmpdir):
        json_path = str(tmpdir / "dogs.json")
        db = Database(db_path)
        insert_dogs = [{"id": 1, "name": "Cleo"}]
        write_json(json_path, insert_dogs)
        result = CliRunner().invoke(
            cli.cli, ["insert", db_path, "dogs", json_path, "--pk", "id"]
        )
        assert result.exit_code == 0, result.output
        # Should fail with error code if no --alter
        upsert_dogs = [{"id": 1, "age": 5}]
        write_json(json_path, upsert_dogs)
        result = CliRunner().invoke(
            cli.cli, ["upsert", db_path, "dogs", json_path, "--pk", "id"]
        )
        assert result.exit_code == 1
>       assert (
            "Error: table dogs has no column named age\n\n"
            "Try using --alter to add additional columns"
        ) == result.output.strip()
E       AssertionError: assert 'Error: table...ional columns' == 'Error: no su...ters = [5, 1]'
E         
E         - Error: no such column: age
E         + Error: table dogs has no column named age
E           
E         + Try using --alter to add additional columns
E         - sql = UPDATE [dogs] SET [age] = ? WHERE [id] = ?
E         - parameters = [5, 1]
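
Judging by the expected output above, the CLI's --alter hint is keyed off the "has no column named" wording the INSERT path produces, while the old-upsert UPDATE surfaces SQLite's raw "no such column: age" instead. One way to line the two paths up would be to translate the error where the fallback UPDATE is executed; a sketch only, with update_sql and params standing in for whatever the fallback path actually builds:

import sqlite3

try:
    cursor = self.db.execute(update_sql, params)
except sqlite3.OperationalError as e:
    message = e.args[0] if e.args else str(e)
    if message.startswith("no such column: "):
        column = message.split(": ", 1)[1]
        # Re-raise with the same wording the INSERT path produces so the
        # existing "Try using --alter" handling keeps working
        raise sqlite3.OperationalError(
            f"table {self.name} has no column named {column}"
        ) from e
    raise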

simonw marked this pull request as ready for review May 9, 2025 03:24
simonw merged commit 8e7d018 into main May 9, 2025
86 checks passed
simonw deleted the upsert branch May 9, 2025 03:37