Skip to content

Dictionary key case sensitivity is not respected for parameter bindings #171

@psnelgrove-r7

Description

@psnelgrove-r7

What happens?

If you have a python dictionary with two keys of the same word but with different casing e.g. {"Key": "first", "key": "second"}, the last value is used for both keys in the resulting anonymous struct.

To Reproduce

Test Case

import duckdb
conn = duckdb.connect()
print(conn.execute("SELECT ?", [{"Key": "first", "key": "second"}]).fetchone())
print(conn.execute("SELECT ?", [{"key": "second", "Key": "first"}]).fetchone())

Output

({'Key': 'second', 'key': 'second'},)
({'key': 'first', 'Key': 'first'},)

Analysis

I tracked this down from the ExecuteMany to TransformPreparedParameters to finally TransformDictionaryToStruct. I'm pretty confident that I've got the call path correct given that when you binding parameters LogicalType is LogicalType::UNKNOWN.

https://github.com/duckdb/duckdb-python/blob/main/src/duckdb_py/native/python_conversion.cpp#L78

When reviewing the function you can see that the keys are placed in a case insensitive map.

case_insensitive_map_t<idx_t> key_mapping;
for (idx_t i = 0; i < struct_keys.size(); i++) {
   key_mapping[struct_keys[i]] = i;
}

That mapping is then used to grab the value by index via auto val = TransformPythonValue(dict.values.attr("__getitem__")(value_index). This is why the last value wins because key_mapping always get the last index.

I'll be the first to admit my lack of knowledge around C++ and making python bindings. But the implementation of re-accessing dict.values.attr("__getitem__")(value_index) feels odd to me when the keys and values are always of equal length. This is enforced in the initial assertion of the function.

I suspected it was because the TransformStructKeys function would return a set of keys smaller than the original based on the LogicalType but when I looked it simply unpacks the values. So maybe it used to work this way but not anymore. Based on this it appears the implementation could be simplified to this:

Value TransformDictionaryToStruct(const PyDictionary &dict, const LogicalType &target_type = LogicalType::UNKNOWN) {
	bool struct_target = target_type.id() == LogicalTypeId::STRUCT;
	if (struct_target && dict.len != StructType::GetChildCount(target_type)) {
		throw InvalidInputException("We could not convert the object %s to the desired target type (%s)",
		                            dict.ToString(), target_type.ToString());
	}

	auto struct_keys = TransformStructKeys(dict.keys, dict.len, target_type);

	child_list_t<Value> struct_values;
	for (idx_t i = 0; i < dict.len; i++) {
		auto &key = struct_target ? StructType::GetChildName(target_type, i) : struct_keys[i];
		auto &child_type = struct_target ? StructType::GetChildType(target_type, i) : LogicalType::UNKNOWN;
		auto val = TransformPythonValue(dict.values.attr("__getitem__")(i), child_type);
		struct_values.emplace_back(make_pair(std::move(key), std::move(val)));
	}
	return Value::STRUCT(std::move(struct_values));
}

Probably simplified more if the string conversion in TransformStructKeys were inlined into the main for loop.

OS:

MacOS aarch64

DuckDB Package Version:

1.4.1

Python Version:

3.10.15

Full Name:

Peter Snelgrove

Affiliation:

Rapid7

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions