-
Notifications
You must be signed in to change notification settings - Fork 57
Description
What happens?
If you have a python dictionary with two keys of the same word but with different casing e.g. {"Key": "first", "key": "second"}, the last value is used for both keys in the resulting anonymous struct.
To Reproduce
Test Case
import duckdb
conn = duckdb.connect()
print(conn.execute("SELECT ?", [{"Key": "first", "key": "second"}]).fetchone())
print(conn.execute("SELECT ?", [{"key": "second", "Key": "first"}]).fetchone())Output
({'Key': 'second', 'key': 'second'},)
({'key': 'first', 'Key': 'first'},)
Analysis
I tracked this down from the ExecuteMany to TransformPreparedParameters to finally TransformDictionaryToStruct. I'm pretty confident that I've got the call path correct given that when you binding parameters LogicalType is LogicalType::UNKNOWN.
https://github.com/duckdb/duckdb-python/blob/main/src/duckdb_py/native/python_conversion.cpp#L78
When reviewing the function you can see that the keys are placed in a case insensitive map.
case_insensitive_map_t<idx_t> key_mapping;
for (idx_t i = 0; i < struct_keys.size(); i++) {
key_mapping[struct_keys[i]] = i;
}That mapping is then used to grab the value by index via auto val = TransformPythonValue(dict.values.attr("__getitem__")(value_index). This is why the last value wins because key_mapping always get the last index.
I'll be the first to admit my lack of knowledge around C++ and making python bindings. But the implementation of re-accessing dict.values.attr("__getitem__")(value_index) feels odd to me when the keys and values are always of equal length. This is enforced in the initial assertion of the function.
I suspected it was because the TransformStructKeys function would return a set of keys smaller than the original based on the LogicalType but when I looked it simply unpacks the values. So maybe it used to work this way but not anymore. Based on this it appears the implementation could be simplified to this:
Value TransformDictionaryToStruct(const PyDictionary &dict, const LogicalType &target_type = LogicalType::UNKNOWN) {
bool struct_target = target_type.id() == LogicalTypeId::STRUCT;
if (struct_target && dict.len != StructType::GetChildCount(target_type)) {
throw InvalidInputException("We could not convert the object %s to the desired target type (%s)",
dict.ToString(), target_type.ToString());
}
auto struct_keys = TransformStructKeys(dict.keys, dict.len, target_type);
child_list_t<Value> struct_values;
for (idx_t i = 0; i < dict.len; i++) {
auto &key = struct_target ? StructType::GetChildName(target_type, i) : struct_keys[i];
auto &child_type = struct_target ? StructType::GetChildType(target_type, i) : LogicalType::UNKNOWN;
auto val = TransformPythonValue(dict.values.attr("__getitem__")(i), child_type);
struct_values.emplace_back(make_pair(std::move(key), std::move(val)));
}
return Value::STRUCT(std::move(struct_values));
}Probably simplified more if the string conversion in TransformStructKeys were inlined into the main for loop.
OS:
MacOS aarch64
DuckDB Package Version:
1.4.1
Python Version:
3.10.15
Full Name:
Peter Snelgrove
Affiliation:
Rapid7
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have