-
Notifications
You must be signed in to change notification settings - Fork 413
maint: common catalog integration test suite #2090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
maint: common catalog integration test suite #2090
Conversation
|
@kevinjqliu @Fokko thoughts here? I think would be pretty light work to set up a common test suite for all catalogs Some immediate open questions:
|
|
Good call @jayceslesar 🙌
I'm not sure if I understand the question correctly, but let me give it a try. I'd prefer to avoid mock and rely on the integration tests that we have with the rest and hive image. The rest container is updated daily from the Java repository, and will let us know if something incompatible changes. |
Amazing -- I will take a stab at consolidating the testing of all main catalog behavior in an integration test |
|
Ok check out the latest commit. All integration tests. Looks like unable to run locally because im missing Edit: Also the aws creds but I trust that this is all configured in CI and I just can't see it which is fine, so if we are happy with the little setup I have I can expand it to cover all common behavior |
|
@kevinjqliu can you please trigger tests again? Just want to see everything hopefully work |
|
oops, im late to the party.
+1 for these integration tests, it would be great to use the actual catalog implementations instead of mocks.
thats an implementation detail :)
i think we can start with |
|
@kevinjqliu I have implemented that change! Also, this testing immediately exposes some inconsistencies with the rest catalog...
These both seem like issues with the backend rest catalog and not pyiceberg at first glance |
thats what we want to see :) lets check in this general test suite and then add more tests to fix these specific issues |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! super excited for this.
|
Also just mostly fixed hive tests I believe?? Super unsure what the two errors I am still getting locally are Happy to exclude hive from that test but we should probably see if its fixable Edit: It looks like updating namespace properties (removing properties) on a hive catalog corrupts the namespace??? |
|
Fixed the issue with removing properties from hive catalog! Looks like that has existed since apache/iceberg#5391 All of these new tests are passing except for the two rest catalog issues mentioned above |
|
Yeah the response from the rest catalog is a 400 and it should be a 404 https://github.com/apache/iceberg/blob/77f1f5ba47f0e5250b8af4e16cbe6febc8244ee5/open-api/rest-catalog-open-api.yaml#L580 |
tests/conftest.py
Outdated
| try: | ||
| test_catalog.purge_table(identifier) | ||
| except Exception as e: | ||
| if drop_if_cannot_purge: | ||
| test_catalog.drop_table(identifier) | ||
| else: | ||
| raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just default to drop_table here? and worry about purge_table later. its not yet implemented in pyiceberg's hive client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I will modify to drop here instead of purge -- should be fine as integration tests are transient? I dont think we will end up writing dangling data right?
|
okay actually im pretty confused because running the following for the first time curl -X POST http://localhost:8181/v1/namespaces/new_uncreated_namespace/tables \
-H "Content-Type: application/json" \
-d '{
"name": "some_table",
"schema": {
"type": "struct",
"fields": [
{"id": 1, "name": "foo", "type": "string", "required": false},
{"id": 2, "name": "bar", "type": "int", "required": true},
{"id": 3, "name": "baz", "type": "boolean", "required": false},
{"id": 4, "name": "qux", "type": {"type": "list", "element-id": 8, "element": "string", "element-required": true}, "required": true},
{"id": 5, "name": "quux", "type": {
"type": "map", "key-id": 9, "key": "string", "value-id": 10,
"value": {
"type": "map", "key-id": 11, "key": "string", "value-id": 12,
"value": "int", "value-required": true
},
"value-required": true
}, "required": true},
{"id": 6, "name": "location", "type": {
"type": "list", "element-id": 13,
"element": {
"type": "struct",
"fields": [
{"id": 14, "name": "latitude", "type": "float", "required": false},
{"id": 15, "name": "longitude", "type": "float", "required": false}
]
},
"element-required": true
}, "required": true},
{"id": 7, "name": "person", "type": {
"type": "struct",
"fields": [
{"id": 16, "name": "name", "type": "string", "required": false},
{"id": 17, "name": "age", "type": "int", "required": true}
]
}, "required": false}
],
"schema-id": 0,
"identifier-field-ids": [2]
},
"partition-spec": {
"spec-id": 0,
"fields": []
},
"write-order": {
"order-id": 0,
"fields": []
},
"stage-create": false,
"properties": {}
}'results in the namespace and table being created?? Running rest server via |
|
Looks like we found another inconsistent behavior in the catalogs. This ones in the We can align the behavior by adding this to the we might also want to make the same change in upstream https://github.com/kevinjqliu/iceberg/blob/master/docker/iceberg-rest-fixture/Dockerfile |
|
Thank you for finding and explaining! I think I will make a PR upstream |
|
We can fix it here first to unblock this PR |
Fokko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great @jayceslesar, it already caught some bugs :)
tests/integration/test_catalog.py
Outdated
|
|
||
|
|
||
| @pytest.mark.integration | ||
| @pytest.mark.parametrize( | ||
| "test_catalog", | ||
| [ | ||
| pytest.lazy_fixture("memory_catalog"), | ||
| pytest.lazy_fixture("sqlite_catalog_memory"), | ||
| pytest.lazy_fixture("sqlite_catalog_file"), | ||
| pytest.lazy_fixture("rest_catalog"), | ||
| pytest.lazy_fixture("hive_catalog"), | ||
| ], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving this to a common variable instead of repeating this for each of the tests?
| @pytest.mark.integration | |
| @pytest.mark.parametrize( | |
| "test_catalog", | |
| [ | |
| pytest.lazy_fixture("memory_catalog"), | |
| pytest.lazy_fixture("sqlite_catalog_memory"), | |
| pytest.lazy_fixture("sqlite_catalog_file"), | |
| pytest.lazy_fixture("rest_catalog"), | |
| pytest.lazy_fixture("hive_catalog"), | |
| ], | |
| ) | |
| CATALOGS = [ | |
| pytest.lazy_fixture("memory_catalog"), | |
| pytest.lazy_fixture("sqlite_catalog_memory"), | |
| pytest.lazy_fixture("sqlite_catalog_file"), | |
| pytest.lazy_fixture("rest_catalog"), | |
| pytest.lazy_fixture("hive_catalog"), | |
| ] | |
| @pytest.mark.integration | |
| @pytest.mark.parametrize("test_catalog", CATALOGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, added
is also being fixed upstream in apache/iceberg#13599 |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! We caught 1 bug in the pyiceberg hive client and another in the pyiceberg rest client. And a bug in the apache/iceberg-rest-fixture!
This is going to be really useful. Looking forward to extending this with more tests and more catalogs.
eca9870#diff-7f3dd1244d08ce27c003cd091da10aa049f7bb0c7d5397acb4ec69767036accdR1302 i dont think |
|
Thanks @jayceslesar for the PR and @Fokko for the review |
In pursuit of apache#813 --------- Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
In pursuit of #813