
Commit ef72d74

Authored by Jesse
SQLAlchemy 2: Stop skipping some non-type tests (#247)
* Stop skipping TableDDLTest and permanently skip HasIndexTest

  We're now in the territory of features that aren't required for SQLAlchemy 2
  compatibility as of pysql==3.0.0, but we may consider adding them in the future.
  In this case, table comment reflection needs to be manually implemented. Index
  reflection would require hooking into the compiler to reflect the partition strategy.

  test_suite.py::HasIndexTest_databricks+databricks::test_has_index[dialect] SKIPPED (Databricks does not support indexes.)
  test_suite.py::HasIndexTest_databricks+databricks::test_has_index[inspector] SKIPPED (Databricks does not support indexes.)
  test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[dialect] SKIPPED (Databricks does not support indexes.)
  test_suite.py::HasIndexTest_databricks+databricks::test_has_index_schema[inspector] SKIPPED (Databricks does not support indexes.)
  test_suite.py::TableDDLTest_databricks+databricks::test_add_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.)
  test_suite.py::TableDDLTest_databricks+databricks::test_create_index_if_not_exists SKIPPED (Databricks does not support indexes.)
  test_suite.py::TableDDLTest_databricks+databricks::test_create_table PASSED
  test_suite.py::TableDDLTest_databricks+databricks::test_create_table_if_not_exists PASSED
  test_suite.py::TableDDLTest_databricks+databricks::test_create_table_schema PASSED
  test_suite.py::TableDDLTest_databricks+databricks::test_drop_index_if_exists SKIPPED (Databricks does not support indexes.)
  test_suite.py::TableDDLTest_databricks+databricks::test_drop_table PASSED
  test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_comment SKIPPED (Comment reflection is possible but not implemented in this dialect.)
  test_suite.py::TableDDLTest_databricks+databricks::test_drop_table_if_exists PASSED
  test_suite.py::TableDDLTest_databricks+databricks::test_underscore_names PASSED

* Permanently skip QuotedNameArgumentTest with comments

  The fixes to DESCRIBE TABLE and visit_xxx were necessary to get to the point
  where I could even determine that these tests wouldn't pass, but those changes
  are not currently tested in the dialect. If, in the course of reviewing the
  remaining tests in the compliance suite, I find that these visit_xxxx methods
  are not tested anywhere else, then we should extend test_suite.py with our own
  tests to confirm the behaviour for ourselves.

* Move files from base.py to _ddl.py

  The presence of the pytest.ini file is _required_ to establish pytest's
  root_path: https://docs.pytest.org/en/7.1.x/reference/customize.html#finding-the-rootdir
  Without it, the custom pytest plugin from SQLAlchemy can't read the contents
  of setup.cfg, which makes none of the tests runnable.

* Emit a warning for certain constructs

* Stop skipping RowFetchTest

  The Date type work fixed this test failure.

* Revise infer_types logic to never infer a TINYINT

  This allows these SQLAlchemy tests to pass:

  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_bound_limit_offset PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_expr_limit_simple_offset PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_expr_offset PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases0] PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases1] PASSED
  test_suite.py::FetchLimitOffsetTest_databricks+databricks::test_simple_limit_offset[cases2] PASSED

  This partially reverts the change introduced in #246.

* Stop skipping FetchLimitOffsetTest

  I implemented our custom DatabricksStatementCompiler so we can override the
  default rendering of unbounded LIMIT clauses from `LIMIT -1` to `LIMIT ALL`.
  We also explicitly skip the FETCH clause tests since Databricks doesn't
  support this syntax. Blacked all source code here too.

* Stop skipping FutureTableDDLTest

  Add meaningful skip markers for table comment reflection and indexes.

* Stop skipping Identity column tests

  This closes #175.

* Stop skipping HasTableTest

  Adding the @reflection.cache decorator to has_table is necessary to pass
  test_has_table_cache. Caching calls to has_table improves the efficiency
  of the connector.

* Permanently skip LongNameBlowoutTest

  Databricks constraint names are limited to 255 characters.

* Stop skipping ExceptionTest

  Black test_suite.py.

* Permanently skip LastrowidTest

* Implement PRIMARY KEY and FOREIGN KEY reflection and enable tests

* Skip all IdentityColumnTest tests

  Turns out that none of these can pass for the same reason that the first two
  seemed un-runnable in db6f52b.

---------

Signed-off-by: Jesse Whitehouse <jesse.whitehouse@databricks.com>
1 parent 58ecda3 commit ef72d74

File tree

11 files changed: +483 −357 lines changed

examples/sqlalchemy.py

Lines changed: 0 additions & 1 deletion
@@ -39,7 +39,6 @@
 - Constraints: with the addition of information_schema to Unity Catalog, Databricks SQL supports
   foreign key and primary key constraints. This dialect can write these constraints but the ability
   for alembic to reflect and modify them programmatically has not been tested.
-- Delta IDENTITY columns are not yet supported.
 """
 
 import os

src/databricks/sql/utils.py

Lines changed: 8 additions & 2 deletions
@@ -534,9 +534,15 @@ def named_parameters_to_dbsqlparams_v2(parameters: List[Any]):
 
 
 def resolve_databricks_sql_integer_type(integer):
-    """Returns the smallest Databricks SQL integer type that can contain the passed integer"""
+    """Returns DbsqlType.INTEGER unless the passed int() requires a BIGINT.
+
+    Note: TINYINT is never inferred here because it is a rarely used type and clauses like LIMIT and OFFSET
+    cannot accept TINYINT bound parameter values. If you need to bind a TINYINT value, you can explicitly
+    declare its type in a DbsqlParameter object, which will bypass this inference logic."""
     if -128 <= integer <= 127:
-        return DbSqlType.TINYINT
+        # If DBR is ever updated to permit TINYINT values passed to LIMIT and OFFSET
+        # then we can change this line to return DbSqlType.TINYINT
+        return DbSqlType.INTEGER
     elif -2147483648 <= integer <= 2147483647:
        return DbSqlType.INTEGER
    else:
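
For intuition, here is a minimal runnable sketch (not from the commit) of the revised inference rule. The DbSqlType enum below is a stand-in for the connector's real enum; the point is that anything that fits in 32 bits now resolves to INTEGER, and only larger values fall through to BIGINT:

from enum import Enum


class DbSqlType(Enum):
    # Stand-in for the connector's real type enum
    TINYINT = "TINYINT"
    INTEGER = "INT"
    BIGINT = "BIGINT"


def resolve_databricks_sql_integer_type(integer: int) -> DbSqlType:
    # Never infer TINYINT: LIMIT and OFFSET reject TINYINT bound parameters.
    if -2147483648 <= integer <= 2147483647:
        return DbSqlType.INTEGER
    return DbSqlType.BIGINT


assert resolve_databricks_sql_integer_type(100) is DbSqlType.INTEGER   # previously TINYINT
assert resolve_databricks_sql_integer_type(2**40) is DbSqlType.BIGINT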

src/databricks/sqlalchemy/__init__.py

Lines changed: 109 additions & 34 deletions
@@ -1,24 +1,23 @@
-"""This module's layout loosely follows example of SQLAlchemy's postgres dialect
-"""
-
-import decimal, re, datetime
-from dateutil.parser import parse
+import re
+from typing import Any, Optional
 
 import sqlalchemy
-from sqlalchemy import types, event
-from sqlalchemy.engine import default, Engine
+from sqlalchemy import event
+from sqlalchemy.engine import Engine, default, reflection
+from sqlalchemy.engine.interfaces import (
+    ReflectedForeignKeyConstraint,
+    ReflectedPrimaryKeyConstraint,
+)
 from sqlalchemy.exc import DatabaseError, SQLAlchemyError
-from sqlalchemy.engine import reflection
 
-from databricks import sql
+import databricks.sqlalchemy._ddl as dialect_ddl_impl
 
 # This import is required to process our @compiles decorators
 import databricks.sqlalchemy._types as dialect_type_impl
-
-
-from databricks.sqlalchemy.base import (
-    DatabricksDDLCompiler,
-    DatabricksIdentifierPreparer,
+from databricks import sql
+from databricks.sqlalchemy.utils import (
+    extract_identifier_groups_from_string,
+    extract_identifiers_from_string,
 )
 
 try:
@@ -39,13 +38,16 @@ class DatabricksDialect(default.DefaultDialect):
     name: str = "databricks"
     driver: str = "databricks"
     default_schema_name: str = "default"
-    preparer = DatabricksIdentifierPreparer  # type: ignore
-    ddl_compiler = DatabricksDDLCompiler
+    preparer = dialect_ddl_impl.DatabricksIdentifierPreparer  # type: ignore
+    ddl_compiler = dialect_ddl_impl.DatabricksDDLCompiler
+    statement_compiler = dialect_ddl_impl.DatabricksStatementCompiler
     supports_statement_cache: bool = True
     supports_multivalues_insert: bool = True
     supports_native_decimal: bool = True
     supports_sane_rowcount: bool = False
     non_native_boolean_check_constraint: bool = False
+    supports_identity_columns: bool = True
+    supports_schemas: bool = True
     paramstyle: str = "named"
 
     colspecs = {
@@ -149,25 +151,43 @@ def get_columns(self, connection, table_name, schema=None, **kwargs):
 
         return columns
 
-    def get_pk_constraint(self, connection, table_name, schema=None, **kw):
+    @reflection.cache
+    def get_pk_constraint(
+        self,
+        connection,
+        table_name: str,
+        schema: Optional[str] = None,
+        **kw: Any,
+    ) -> ReflectedPrimaryKeyConstraint:
         """Return information about the primary key constraint on
         table_name`.
-
-        Given a :class:`_engine.Connection`, a string
-        `table_name`, and an optional string `schema`, return primary
-        key information as a dictionary with these keys:
-
-        constrained_columns
-          a list of column names that make up the primary key
-
-        name
-          optional name of the primary key constraint.
-
         """
-        # TODO: implement this behaviour
-        return {"constrained_columns": []}
 
-    def get_foreign_keys(self, connection, table_name, schema=None, **kw):
+        with self.get_connection_cursor(connection) as cursor:
+            # DESCRIBE TABLE EXTENDED doesn't support parameterised inputs :(
+            result = cursor.execute(f"DESCRIBE TABLE EXTENDED {table_name}").fetchall()
+
+            # DESCRIBE TABLE EXTENDED doesn't give a deterministic name to the field where
+            # a primary key constraint will be found in its output. So we cycle through its
+            # output looking for a match that includes "PRIMARY KEY". This is brittle. We
+            # could optionally make two roundtrips: the first would query information_schema
+            # for the name of the primary key constraint on this table, and a second to
+            # DESCRIBE TABLE EXTENDED, at which point we would know the name of the constraint.
+            # But for now we instead assume that Python list comprehension is faster than a
+            # network roundtrip.
+            dte_dict = {row["col_name"]: row["data_type"] for row in result}
+            target = [(k, v) for k, v in dte_dict.items() if "PRIMARY KEY" in v]
+            if target:
+                name, _constraint_string = target[0]
+                column_list = extract_identifiers_from_string(_constraint_string)
+            else:
+                name, column_list = None, None
+
+            return {"constrained_columns": column_list, "name": name}
+
+    def get_foreign_keys(
+        self, connection, table_name, schema=None, **kw
+    ) -> ReflectedForeignKeyConstraint:
         """Return information about foreign_keys in `table_name`.
 
         Given a :class:`_engine.Connection`, a string
@@ -190,8 +210,60 @@ def get_foreign_keys(self, connection, table_name, schema=None, **kw):
           a list of column names in the referred table that correspond to
           constrained_columns
         """
-        # TODO: Implement this behaviour
-        return []
+
+        with self.get_connection_cursor(connection) as cursor:
+            # DESCRIBE TABLE EXTENDED doesn't support parameterised inputs :(
+            result = cursor.execute(
+                f"DESCRIBE TABLE EXTENDED {schema + '.' if schema else ''}{table_name}"
+            ).fetchall()
+
+            # DESCRIBE TABLE EXTENDED doesn't give a deterministic name to the field where
+            # a foreign key constraint will be found in its output. So we cycle through its
+            # output looking for a match that includes "FOREIGN KEY". This is brittle. We
+            # could optionally make two roundtrips: the first would query information_schema
+            # for the name of the foreign key constraint on this table, and a second to
+            # DESCRIBE TABLE EXTENDED, at which point we would know the name of the constraint.
+            # But for now we instead assume that Python list comprehension is faster than a
+            # network roundtrip.
+            dte_dict = {row["col_name"]: row["data_type"] for row in result}
+            target = [(k, v) for k, v in dte_dict.items() if "FOREIGN KEY" in v]
+
+            def extract_constraint_dict_from_target(target):
+                if target:
+                    name, _constraint_string = target
+                    _extracted = extract_identifier_groups_from_string(_constraint_string)
+                    constrained_columns_str, referred_columns_str = (
+                        _extracted[0],
+                        _extracted[1],
+                    )
+
+                    constrained_columns = extract_identifiers_from_string(
+                        constrained_columns_str
+                    )
+                    referred_columns = extract_identifiers_from_string(referred_columns_str)
+                    referred_table = str(table_name)
+                else:
+                    name, constrained_columns, referred_columns, referred_table = (
+                        None,
+                        None,
+                        None,
+                        None,
+                    )
+
+                return {
+                    "constrained_columns": constrained_columns,
+                    "name": name,
+                    "referred_table": referred_table,
+                    "referred_columns": referred_columns,
+                }
+
+            if target:
+                return [extract_constraint_dict_from_target(i) for i in target]
+            else:
+                return []
 
     def get_indexes(self, connection, table_name, schema=None, **kw):
         """Return information about indexes in `table_name`.
@@ -238,6 +310,7 @@ def do_rollback(self, dbapi_connection):
         # Databricks SQL Does not support transactions
         pass
 
+    @reflection.cache
     def has_table(
         self, connection, table_name, schema=None, catalog=None, **kwargs
     ) -> bool:
@@ -252,7 +325,9 @@ def has_table(
 
         try:
            res = connection.execute(
-                sqlalchemy.text(f"DESCRIBE TABLE {_catalog}.{_schema}.{table_name}")
+                sqlalchemy.text(
+                    f"DESCRIBE TABLE `{_catalog}`.`{_schema}`.`{table_name}`"
+                )
            )
            return True
        except DatabaseError as e:
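
To make the reflection logic above concrete, here is a small self-contained sketch (not from the commit) of how a DESCRIBE TABLE EXTENDED result is reduced to a primary-key dict. The helper below is a hypothetical stand-in for databricks.sqlalchemy.utils.extract_identifiers_from_string, assumed here to collect backtick-quoted names:

import re


def extract_identifiers_from_string(input_str: str):
    # Hypothetical stand-in: collect every backtick-quoted identifier.
    return re.findall(r"`([^`]+)`", input_str)


# Example DESCRIBE TABLE EXTENDED output, reduced to the two relevant fields.
rows = [
    {"col_name": "id", "data_type": "bigint"},
    {"col_name": "pk_users", "data_type": "PRIMARY KEY (`id`, `region`)"},
]
dte_dict = {row["col_name"]: row["data_type"] for row in rows}
target = [(k, v) for k, v in dte_dict.items() if "PRIMARY KEY" in v]
name, constraint_string = target[0]
print({"name": name, "constrained_columns": extract_identifiers_from_string(constraint_string)})
# {'name': 'pk_users', 'constrained_columns': ['id', 'region']}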

src/databricks/sqlalchemy/_ddl.py

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+import re
+from sqlalchemy.sql import compiler
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class DatabricksIdentifierPreparer(compiler.IdentifierPreparer):
+    """https://docs.databricks.com/en/sql/language-manual/sql-ref-identifiers.html"""
+
+    legal_characters = re.compile(r"^[A-Z0-9_]+$", re.I)
+
+    def __init__(self, dialect):
+        super().__init__(dialect, initial_quote="`")
+
+
+class DatabricksDDLCompiler(compiler.DDLCompiler):
+    def post_create_table(self, table):
+        return " USING DELTA"
+
+    def visit_unique_constraint(self, constraint, **kw):
+        logger.warn("Databricks does not support unique constraints")
+        pass
+
+    def visit_check_constraint(self, constraint, **kw):
+        logger.warn("Databricks does not support check constraints")
+        pass
+
+    def visit_identity_column(self, identity, **kw):
+        """When configuring an Identity() with Databricks, only the always option is supported.
+        All other options are ignored.
+
+        Note: IDENTITY columns must always be defined as BIGINT. An exception will be raised if INT is used.
+
+        https://www.databricks.com/blog/2022/08/08/identity-columns-to-generate-surrogate-keys-are-now-available-in-a-lakehouse-near-you.html
+        """
+        text = "GENERATED %s AS IDENTITY" % (
+            "ALWAYS" if identity.always else "BY DEFAULT",
+        )
+        return text
+
+    def get_column_specification(self, column, **kwargs):
+        """Currently we override this method only to emit a log message if a user attempts to set
+        autoincrement=True on a column. See comments in test_suite.py. We may implement implicit
+        IDENTITY using this feature in the future, similar to the Microsoft SQL Server dialect.
+        """
+        if column is column.table._autoincrement_column or column.autoincrement is True:
+            logger.warn(
+                "Databricks dialect ignores SQLAlchemy's autoincrement semantics. Use explicit Identity() instead."
+            )
+
+        return super().get_column_specification(column, **kwargs)
+
+
+class DatabricksStatementCompiler(compiler.SQLCompiler):
+    def limit_clause(self, select, **kw):
+        """Identical to the default implementation of SQLCompiler.limit_clause except it writes LIMIT ALL instead of LIMIT -1,
+        since Databricks SQL doesn't support the latter.
+
+        https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-limit.html
+        """
+        text = ""
+        if select._limit_clause is not None:
+            text += "\n LIMIT " + self.process(select._limit_clause, **kw)
+        if select._offset_clause is not None:
+            if select._limit_clause is None:
+                text += "\n LIMIT ALL"
+            text += " OFFSET " + self.process(select._offset_clause, **kw)
+        return text
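
To illustrate the override, a short sketch (not part of the commit) of the statement shape that triggers it; the table and column names are made up:

from sqlalchemy import Column, Integer, MetaData, Table, select

users = Table("users", MetaData(), Column("id", Integer))
stmt = select(users).offset(10)  # OFFSET with no LIMIT

# SQLAlchemy's stock compiler renders the missing LIMIT as -1:
print(stmt.compile())
# SELECT users.id FROM users  LIMIT -1 OFFSET :param_1

# Compiled against this dialect, DatabricksStatementCompiler emits instead:
#   SELECT users.id FROM users  LIMIT ALL OFFSET :param_1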

src/databricks/sqlalchemy/base.py

Lines changed: 0 additions & 17 deletions
This file was deleted.

src/databricks/sqlalchemy/pytest.ini

Whitespace-only changes.

src/databricks/sqlalchemy/requirements.py

Lines changed: 50 additions & 0 deletions
@@ -9,6 +9,7 @@
 in test_suite.py with a Databricks-specific reason.
 
 See the special note about the array_type exclusion below.
+See special note about has_temp_table exclusion below.
 """
 
 import sqlalchemy.testing.requirements
@@ -93,4 +94,53 @@ def array_type(self):
         test runner will crash the pytest process due to an AttributeError
         """
 
+        # TODO: Implement array type using inline?
         return sqlalchemy.testing.exclusions.closed()
+
+    @property
+    def table_ddl_if_exists(self):
+        """target platform supports IF NOT EXISTS / IF EXISTS for tables."""
+
+        return sqlalchemy.testing.exclusions.open()
+
+    @property
+    def identity_columns(self):
+        """If a backend supports GENERATED { ALWAYS | BY DEFAULT }
+        AS IDENTITY"""
+        return sqlalchemy.testing.exclusions.open()
+
+    @property
+    def identity_columns_standard(self):
+        """If a backend supports GENERATED { ALWAYS | BY DEFAULT }
+        AS IDENTITY with a standard syntax.
+        This is mainly to exclude MSSql.
+        """
+        return sqlalchemy.testing.exclusions.open()
+
+    @property
+    def has_temp_table(self):
+        """target dialect supports checking a single temp table name
+
+        unfortunately this is not the same as temp_table_names
+
+        SQLAlchemy's HasTableTest is not normalised in such a way that temp table tests
+        are separate from temp view and normal table tests. If those tests were split out,
+        we would just add detailed skip markers in test_suite.py. But since we'd like to
+        run the HasTableTest group for the features we support, we must set this exclusion
+        to closed().
+
+        It would be ideal if there were a separate requirement for has_temp_view. Without it,
+        we're in a bind.
+        """
+        return sqlalchemy.testing.exclusions.closed()
+
+    @property
+    def temporary_views(self):
+        """target database supports temporary views"""
+        return sqlalchemy.testing.exclusions.open()
+
+    @property
+    def views(self):
+        """Target database must support VIEWs."""
+
+        return sqlalchemy.testing.exclusions.open()
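
For context, a minimal sketch (an illustration under assumptions, not the project's actual file) of how these properties gate the compliance suite: SQLAlchemy's test plugin checks each @testing.requires.<name> marker against the dialect's requirements class, and open()/closed() decide whether the marked tests run or are skipped:

import sqlalchemy.testing.requirements
from sqlalchemy.testing import exclusions


class Requirements(sqlalchemy.testing.requirements.SuiteRequirements):
    @property
    def views(self):
        # open(): tests marked @testing.requires.views will run
        return exclusions.open()

    @property
    def has_temp_table(self):
        # closed(): the whole marked group is skipped for this dialect
        return exclusions.closed()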
