-
Notifications
You must be signed in to change notification settings - Fork 8
Change ld_list according to #439 #445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change ld_list according to #439 #445
Conversation
sdruskat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the syntactic changes - which I personally would roll back, but that's preference - I'd really like to get another pair of eyes on this as I'm currently struggling to reload all the context to understand all changes.
@led02, would you be able to have a look at this?
src/hermes/model/types/__init__.py
Outdated
| (ld_list.is_ld_list, {"ld_container": ld_list}), | ||
| (lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}), | ||
|
|
||
| # pythonize items from lists (expanded set is already handled above) | ||
| (ld_container.is_json_id, dict(python=lambda c, **_: c["@id"])), | ||
| (ld_container.is_typed_json_value, dict(python=ld_container.typed_ld_to_py)), | ||
| (ld_container.is_json_value, dict(python=lambda c, **_: c["@value"])), | ||
| (ld_list.is_container, dict(ld_container=lambda c, **kw: ld_list([c], **kw))), | ||
| (ld_dict.is_json_dict, dict(ld_container=lambda c, **kw: ld_dict([c], **kw))), | ||
| (ld_container.is_json_id, {"python": lambda c, **_: c["@id"]}), | ||
| (ld_container.is_typed_json_value, {"python": lambda c, **kw: ld_container.typed_ld_to_py([c], **kw)}), | ||
| (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}), | ||
| (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}), | ||
| (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}), | ||
| (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}), | ||
|
|
||
| # Convert internal data types to expanded_json | ||
| (lambda c: ld_container.is_json_id(c) or ld_container.is_json_value(c), dict(expanded_json=lambda c, **_: [c])), | ||
| (ld_dict.is_json_dict, dict(expanded_json=lambda c, **kw: ld_dict.from_dict(c, **kw).ld_value)), | ||
| (ld_dict.is_ld_dict, dict(expanded_json=lambda c, **kw: ld_dict.from_dict(c[0], **kw).ld_value)), | ||
| (lambda c: ld_container.is_json_id(c), {"expanded_json": lambda c, **_: c}), | ||
| (lambda c: ld_container.is_ld_id(c), {"expanded_json": lambda c, **_: c[0]}), | ||
| (lambda c: ld_container.is_json_value(c), {"expanded_json": lambda c, **_: [c]}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change in syntax? Why not keep using the dict constructor which - IMHO - reads cleaner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I've tried both and liked None of them (with the dict(...) variant being a bit more readable ... but that's subjective).
If there is a refactoring going on here, do it properly (i.e., providing some class that encapuslates these rules and maybe also allow for better configurability, extensibility, documentabilty, ...). I would really appreciate this, making the connection more explicit without adding to much complexity.
Also note that the introduction of a rule class or similar is most probably required anyways when it comes to configuring the process step.
led02
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restricting the container types of ld_list will certainly break functionality. I fully understand the discussion about the unclear class name and would advise to just rename the class instead of removing alternate container types (e.g., the provenance data will be collected in a graph; I've come across example data that also uses @set).
When changeging the rules, don't stop at syntactic sugar but go the full way for the refactoring (which would be appreciated ;) )
Except for that, this basically looks good but the ordering of the rules took me a bit of fine-tuning back then, hence if there are changes to the filters or order, I would be a good idea to document the reasons for this in the code (even though I did not do this before... I'm not a good example).
The other thing is that developers need to be aware that those classes (in contrast to their counterparts in the merging package), are pure facade - i.e., changes to the contents must be directly reflected in the original expanded JSON-LD. If data is read from these classes, it should be returned in a way that it also delays the actual data extraction to the latest time possible... lazy evaluation. (It is clear that e.g., an integer cannot reflect changes and most probably should not... but this behaviour needs to be clear to the users, too.)
Hence, I would propose adding respective test cases similar to my suggestion in test_list_basics that check for identity and ensure the underlying data structure behaves as expected.
src/hermes/model/types/__init__.py
Outdated
| dict(ld_container=lambda c, **kw: ld_list([{"@list": c}], **kw)) | ||
| ), | ||
| (ld_list.is_ld_list, {"ld_container": ld_list}), | ||
| (lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The side condition is important to my understanding. This rule should invoked upon an expanded JSON-LD list. In fact, the rule should be even more strict ensuring the items in the list are not only dicts but valid JSON-LD values.
Or am I getting something wrong here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When getting a value from an ld_dict it should return an ld_list always. Therefor it is important to know wheter a given value comes from an ld_list or an ld_dict. If the value, where expanded JSON-LD is expected, is not inside a list it previously was inside a list.
I removed the condition because when getting "@type" from {"@type": ["A"]} the value in the list is a string and not a dict. (I found no cases where this goes wrong because everything is valid expanded JSON-LD.)
We should know that the values are valid expanded JSON-LD here because this conversion happens only when converting to the pythonized version of a value that was previously expanded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see... that seems to be related to the removal of the explcit handling of @type in the conversion.
Well, that's fine if you can deal with the implications...
{"@type": "foo"}(which is valid) will always be{"@type": ["foo"]}when converted usingto_python- The value must not be wrapped in an
@listcontainer or individually wrapped in some@valueor@idnode. Yet, the values are subject to IRI expansion / compaction.
| (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}), | ||
| (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}), | ||
| (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}), | ||
| (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule is to generic IMHO, as it will try to compact all strings that are encountered, even those that are explicitly not meant as representing an IRI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't all other string should be known to be a value of some sort i.e. formed like {"@value": _} and therefor filtered by is_json_value because we get valid JSON-LD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you have answers... but I'm totally not sure whether this is really true.
Apart from this, except for @type values, all expansible values should be wrapped in @id nodes... maybe this is a good point to check whether this assumption is correct. The other question is, which kind of expansion is required (i.e., vocab=True or not... see discussion about prefix, namespace, IRIs from the past)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... anyways, there should most proably be a documentation for the conversion rules and the order they are applied in (maybe directly generated from the code... ;) )
src/hermes/model/types/__init__.py
Outdated
| (lambda c: ld_container.is_ld_id(c), {"expanded_json": lambda c, **_: c[0]}), | ||
| (lambda c: ld_container.is_json_value(c), {"expanded_json": lambda c, **_: [c]}), | ||
| (lambda c: ld_container.is_ld_value(c), {"expanded_json": lambda c, **_: c}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the callback wrapped in an additional lambda c here?
However, this seems like a reasonable and important change... maybe an additinal hint in the comment about the side effects of the ordering of especially these rules might be helpful. (I'll also have look into the commit history to better understand the reasons behind this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. The additional lambda c can be removed.
I will add a few comments to all conversions, probably monday.
| elif full_iri == "@type": | ||
| value = [ | ||
| self.ld_proc.compact_iri(self.active_ctx, ld_type) | ||
| for ld_type in ld_value | ||
| ] | ||
| if len(value) == 1: | ||
| value = value[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what by my guts say this will break stuff in an unexpected way. Not sure why (and hence there should be better documentation) but I remember removing and adding this section over and over again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see hint above... this will break conversion of single type classes which is valid but which was already critizised towards me (or maybe there were even real errors... but I doubt it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found anything it breaks yet, e.g. no failing tests, but it took some effort to get this running.
src/hermes/model/types/ld_list.py
Outdated
| if not (self.is_ld_list(data) and "@list" in data[0]): | ||
| raise ValueError("The given data does not represent a ld_list.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments on the issue: Don't restrict it to @list here but better rename the class itself.
And please, bring back the docstring!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't about the name of the class but rather that it didn't work for @set's as their expanded version does not contain @set. Additionally inserting one collection into another has to be treated differently depending on what container type they have and every (python) list could be considered a set which makes all the checks wheter something is a list or not useless and the type conversions even more complicated. But that will be discussed in the issue.
Will do.
src/hermes/model/types/ld_list.py
Outdated
| # Moreover a set inside a list gets automaticaly converted to a list when expanded) | ||
|
|
||
| # FIXME: #439 what happens when a ld_list is put inside another also depends on their container types | ||
| # set your_ld_list[index] to the list [{"@type": "foo", "name": "bar"}] given in non expanded form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it should be considered whether this operation should be allowed in this context. However, in general, it should behave as expected (i.e., set the value to what is passed). What it clearly must not do is setting something in the data property that is not valid expanded JSON. People who really want to do this still can directly access item_list or data...
For this concrete question I see two valid solutions: Do handling as to what is easiest implementable (and document) or if in doubt, raise an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have decided that (for now) it should be treated as if it was meant to be a list containing only one element. Because why would the user put the dict inside a list if not to add other items.
Setting the value to what is passes is not really possible... Either it is added to a new dict containing the passed value as a value of @list because a python list is considered an @list or the outer list of the passed value is removed. (In both cases the values are expanded of course.)
| li = ld_list([{"@list": [0]}]) | ||
| assert li._data == [{"@list": [0]}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| li = ld_list([{"@list": [0]}]) | |
| assert li._data == [{"@list": [0]}] | |
| li_data = [{"@list": [0]}] | |
| li = ld_list(li_data) | |
| assert li._data is li_data |
…ded tests for them
led02
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reaction to answers... thanks for the vivid discussion :)
src/hermes/model/types/__init__.py
Outdated
| dict(ld_container=lambda c, **kw: ld_list([{"@list": c}], **kw)) | ||
| ), | ||
| (ld_list.is_ld_list, {"ld_container": ld_list}), | ||
| (lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see... that seems to be related to the removal of the explcit handling of @type in the conversion.
Well, that's fine if you can deal with the implications...
{"@type": "foo"}(which is valid) will always be{"@type": ["foo"]}when converted usingto_python- The value must not be wrapped in an
@listcontainer or individually wrapped in some@valueor@idnode. Yet, the values are subject to IRI expansion / compaction.
| (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}), | ||
| (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}), | ||
| (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}), | ||
| (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you have answers... but I'm totally not sure whether this is really true.
Apart from this, except for @type values, all expansible values should be wrapped in @id nodes... maybe this is a good point to check whether this assumption is correct. The other question is, which kind of expansion is required (i.e., vocab=True or not... see discussion about prefix, namespace, IRIs from the past)
| (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}), | ||
| (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}), | ||
| (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}), | ||
| (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... anyways, there should most proably be a documentation for the conversion rules and the order they are applied in (maybe directly generated from the code... ;) )
| elif full_iri == "@type": | ||
| value = [ | ||
| self.ld_proc.compact_iri(self.active_ctx, ld_type) | ||
| for ld_type in ld_value | ||
| ] | ||
| if len(value) == 1: | ||
| value = value[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see hint above... this will break conversion of single type classes which is valid but which was already critizised towards me (or maybe there were even real errors... but I doubt it)
|
Thanks for your input so far @led02. P.S.: Creating the documentation for all this will take some time. |
| # while searching build a path such that it leads from the found ld_dicts ld_value to selfs data_dict/ item_list | ||
| parent = self | ||
| path = [] | ||
| while parent.__class__.__name__ != "ld_dict": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An if statement would be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure.
It could work and I see no real reason why it shouldn't but I preferred to use an actually existent ld_dict and not a dummy construct.
What doing it like I implemented would allow us to do though is that we could enforce that every ld_container has an index xor a key (which then actually is the value used the get this object in the parent container).
| # if neither self nor any of its parents is a ld_dict: | ||
| # create a dict with the key of the outer most parent of self and this parents ld_value as a value | ||
| # this dict is stored in an ld_container and simulates the most minimal JSON-LD object possible | ||
| if parent.__class__.__name__ != "ld_dict": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you could append this above
| special_types = (list, dict, ld_container, datetime, date, time) | ||
| while True: | ||
| # check if ready | ||
| if len(key_and_reference_todo_list) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could bring that into the while condition
| # expand value | ||
| expanded_value = self._to_expanded_json([value]) | ||
| # empty list -> no value to check | ||
| if len(expanded_value) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that way with normal contains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It made sense to me that because getting an empty list back indicates that whatever object has been expanded inside self would become nothing - meaning that appending it to self would leave self unchanged - it is contained inside self.
#439 is not yet completely discussed/ the propositions not approved.
This PR contains the changes that seem plausible as of now.