Skip to content

Conversation

@notactuallyfinn
Copy link
Collaborator

#439 is not yet completely discussed/ the propositions not approved.
This PR contains the changes that seem plausible as of now.

@notactuallyfinn notactuallyfinn added the data model Related to the hermes data model label Oct 17, 2025
Copy link
Contributor

@sdruskat sdruskat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the syntactic changes - which I personally would roll back, but that's preference - I'd really like to get another pair of eyes on this as I'm currently struggling to reload all the context to understand all changes.

@led02, would you be able to have a look at this?

Comment on lines 28 to 42
(ld_list.is_ld_list, {"ld_container": ld_list}),
(lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}),

# pythonize items from lists (expanded set is already handled above)
(ld_container.is_json_id, dict(python=lambda c, **_: c["@id"])),
(ld_container.is_typed_json_value, dict(python=ld_container.typed_ld_to_py)),
(ld_container.is_json_value, dict(python=lambda c, **_: c["@value"])),
(ld_list.is_container, dict(ld_container=lambda c, **kw: ld_list([c], **kw))),
(ld_dict.is_json_dict, dict(ld_container=lambda c, **kw: ld_dict([c], **kw))),
(ld_container.is_json_id, {"python": lambda c, **_: c["@id"]}),
(ld_container.is_typed_json_value, {"python": lambda c, **kw: ld_container.typed_ld_to_py([c], **kw)}),
(ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
(ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
(ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
(lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),

# Convert internal data types to expanded_json
(lambda c: ld_container.is_json_id(c) or ld_container.is_json_value(c), dict(expanded_json=lambda c, **_: [c])),
(ld_dict.is_json_dict, dict(expanded_json=lambda c, **kw: ld_dict.from_dict(c, **kw).ld_value)),
(ld_dict.is_ld_dict, dict(expanded_json=lambda c, **kw: ld_dict.from_dict(c[0], **kw).ld_value)),
(lambda c: ld_container.is_json_id(c), {"expanded_json": lambda c, **_: c}),
(lambda c: ld_container.is_ld_id(c), {"expanded_json": lambda c, **_: c[0]}),
(lambda c: ld_container.is_json_value(c), {"expanded_json": lambda c, **_: [c]}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change in syntax? Why not keep using the dict constructor which - IMHO - reads cleaner?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I've tried both and liked None of them (with the dict(...) variant being a bit more readable ... but that's subjective).

If there is a refactoring going on here, do it properly (i.e., providing some class that encapuslates these rules and maybe also allow for better configurability, extensibility, documentabilty, ...). I would really appreciate this, making the connection more explicit without adding to much complexity.

Also note that the introduction of a rule class or similar is most probably required anyways when it comes to configuring the process step.

led02
led02 previously requested changes Nov 7, 2025
Copy link
Member

@led02 led02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restricting the container types of ld_list will certainly break functionality. I fully understand the discussion about the unclear class name and would advise to just rename the class instead of removing alternate container types (e.g., the provenance data will be collected in a graph; I've come across example data that also uses @set).

When changeging the rules, don't stop at syntactic sugar but go the full way for the refactoring (which would be appreciated ;) )

Except for that, this basically looks good but the ordering of the rules took me a bit of fine-tuning back then, hence if there are changes to the filters or order, I would be a good idea to document the reasons for this in the code (even though I did not do this before... I'm not a good example).

The other thing is that developers need to be aware that those classes (in contrast to their counterparts in the merging package), are pure facade - i.e., changes to the contents must be directly reflected in the original expanded JSON-LD. If data is read from these classes, it should be returned in a way that it also delays the actual data extraction to the latest time possible... lazy evaluation. (It is clear that e.g., an integer cannot reflect changes and most probably should not... but this behaviour needs to be clear to the users, too.)
Hence, I would propose adding respective test cases similar to my suggestion in test_list_basics that check for identity and ensure the underlying data structure behaves as expected.

dict(ld_container=lambda c, **kw: ld_list([{"@list": c}], **kw))
),
(ld_list.is_ld_list, {"ld_container": ld_list}),
(lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The side condition is important to my understanding. This rule should invoked upon an expanded JSON-LD list. In fact, the rule should be even more strict ensuring the items in the list are not only dicts but valid JSON-LD values.

Or am I getting something wrong here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When getting a value from an ld_dict it should return an ld_list always. Therefor it is important to know wheter a given value comes from an ld_list or an ld_dict. If the value, where expanded JSON-LD is expected, is not inside a list it previously was inside a list.
I removed the condition because when getting "@type" from {"@type": ["A"]} the value in the list is a string and not a dict. (I found no cases where this goes wrong because everything is valid expanded JSON-LD.)

We should know that the values are valid expanded JSON-LD here because this conversion happens only when converting to the pythonized version of a value that was previously expanded.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see... that seems to be related to the removal of the explcit handling of @type in the conversion.

Well, that's fine if you can deal with the implications...

  • {"@type": "foo"} (which is valid) will always be {"@type": ["foo"]} when converted using to_python
  • The value must not be wrapped in an @list container or individually wrapped in some @value or @id node. Yet, the values are subject to IRI expansion / compaction.

(ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
(ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
(ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
(lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule is to generic IMHO, as it will try to compact all strings that are encountered, even those that are explicitly not meant as representing an IRI.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't all other string should be known to be a value of some sort i.e. formed like {"@value": _} and therefor filtered by is_json_value because we get valid JSON-LD?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have answers... but I'm totally not sure whether this is really true.

Apart from this, except for @type values, all expansible values should be wrapped in @id nodes... maybe this is a good point to check whether this assumption is correct. The other question is, which kind of expansion is required (i.e., vocab=True or not... see discussion about prefix, namespace, IRIs from the past)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... anyways, there should most proably be a documentation for the conversion rules and the order they are applied in (maybe directly generated from the code... ;) )

Comment on lines 41 to 43
(lambda c: ld_container.is_ld_id(c), {"expanded_json": lambda c, **_: c[0]}),
(lambda c: ld_container.is_json_value(c), {"expanded_json": lambda c, **_: [c]}),
(lambda c: ld_container.is_ld_value(c), {"expanded_json": lambda c, **_: c}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the callback wrapped in an additional lambda c here?

However, this seems like a reasonable and important change... maybe an additinal hint in the comment about the side effects of the ordering of especially these rules might be helpful. (I'll also have look into the commit history to better understand the reasons behind this).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. The additional lambda c can be removed.
I will add a few comments to all conversions, probably monday.

Comment on lines -86 to -92
elif full_iri == "@type":
value = [
self.ld_proc.compact_iri(self.active_ctx, ld_type)
for ld_type in ld_value
]
if len(value) == 1:
value = value[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what by my guts say this will break stuff in an unexpected way. Not sure why (and hence there should be better documentation) but I remember removing and adding this section over and over again...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see hint above... this will break conversion of single type classes which is valid but which was already critizised towards me (or maybe there were even real errors... but I doubt it)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found anything it breaks yet, e.g. no failing tests, but it took some effort to get this running.

Comment on lines 14 to 15
if not (self.is_ld_list(data) and "@list" in data[0]):
raise ValueError("The given data does not represent a ld_list.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments on the issue: Don't restrict it to @list here but better rename the class itself.

And please, bring back the docstring!

Copy link
Collaborator Author

@notactuallyfinn notactuallyfinn Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't about the name of the class but rather that it didn't work for @set's as their expanded version does not contain @set. Additionally inserting one collection into another has to be treated differently depending on what container type they have and every (python) list could be considered a set which makes all the checks wheter something is a list or not useless and the type conversions even more complicated. But that will be discussed in the issue.

Will do.

# Moreover a set inside a list gets automaticaly converted to a list when expanded)

# FIXME: #439 what happens when a ld_list is put inside another also depends on their container types
# set your_ld_list[index] to the list [{"@type": "foo", "name": "bar"}] given in non expanded form
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it should be considered whether this operation should be allowed in this context. However, in general, it should behave as expected (i.e., set the value to what is passed). What it clearly must not do is setting something in the data property that is not valid expanded JSON. People who really want to do this still can directly access item_list or data...

For this concrete question I see two valid solutions: Do handling as to what is easiest implementable (and document) or if in doubt, raise an error.

Copy link
Collaborator Author

@notactuallyfinn notactuallyfinn Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have decided that (for now) it should be treated as if it was meant to be a list containing only one element. Because why would the user put the dict inside a list if not to add other items.
Setting the value to what is passes is not really possible... Either it is added to a new dict containing the passed value as a value of @list because a python list is considered an @list or the outer list of the passed value is removed. (In both cases the values are expanded of course.)

Comment on lines 28 to 29
li = ld_list([{"@list": [0]}])
assert li._data == [{"@list": [0]}]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
li = ld_list([{"@list": [0]}])
assert li._data == [{"@list": [0]}]
li_data = [{"@list": [0]}]
li = ld_list(li_data)
assert li._data is li_data

Copy link
Member

@led02 led02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reaction to answers... thanks for the vivid discussion :)

dict(ld_container=lambda c, **kw: ld_list([{"@list": c}], **kw))
),
(ld_list.is_ld_list, {"ld_container": ld_list}),
(lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see... that seems to be related to the removal of the explcit handling of @type in the conversion.

Well, that's fine if you can deal with the implications...

  • {"@type": "foo"} (which is valid) will always be {"@type": ["foo"]} when converted using to_python
  • The value must not be wrapped in an @list container or individually wrapped in some @value or @id node. Yet, the values are subject to IRI expansion / compaction.

(ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
(ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
(ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
(lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have answers... but I'm totally not sure whether this is really true.

Apart from this, except for @type values, all expansible values should be wrapped in @id nodes... maybe this is a good point to check whether this assumption is correct. The other question is, which kind of expansion is required (i.e., vocab=True or not... see discussion about prefix, namespace, IRIs from the past)

(ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
(ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
(ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
(lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... anyways, there should most proably be a documentation for the conversion rules and the order they are applied in (maybe directly generated from the code... ;) )

Comment on lines -86 to -92
elif full_iri == "@type":
value = [
self.ld_proc.compact_iri(self.active_ctx, ld_type)
for ld_type in ld_value
]
if len(value) == 1:
value = value[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see hint above... this will break conversion of single type classes which is valid but which was already critizised towards me (or maybe there were even real errors... but I doubt it)

@notactuallyfinn notactuallyfinn mentioned this pull request Nov 10, 2025
5 tasks
@notactuallyfinn
Copy link
Collaborator Author

notactuallyfinn commented Nov 21, 2025

Thanks for your input so far @led02.
In the last commits I readded the support for "@set" and "@graph".
As you mentioned somewhere ld_list, ld_dict, etc. should not be responsible for the expansion/ compaction of the JSON-LD. But they were through the conversions in __init__.py. I hopefully solved this by rewriting ld_container._to_expanded_json which now uses the already available JSON-LD-processor object. Therefor the conversions in the __init__.py file should only be used for conversion from expanded JSON-LD to the "pythonized" values (ld_list.__contains__ currently uses the old expansion method but that will be changed.).
As a side effect I had to change the default container_type from "@list" to the default used by JSON-LD ("@set") and had to always consider the option that elements of one container won't be contained by it anymore but its parent container.
One thing that I am unsure of how to implement is the comparison of ld_list's. I forgot to implement that "@set"'s are unordered but for example how should an "@set" and "@list" be compared? (Currently all collections are considered to be ordered, and their values compared by their ids and values (preferable the ids but if one value has non they are compared by their values).)
When using ld_list.from_list to create an "@set" with an "@set" as parent (i.e. when the values of the new list get assimilated by the parent) the values of the "child" are appended to the parent. Should also the new context be added to the context of the parent? (If yes see: #430)
It would be great to hear your thoughts on this.

P.S.: Creating the documentation for all this will take some time.

# while searching build a path such that it leads from the found ld_dicts ld_value to selfs data_dict/ item_list
parent = self
path = []
while parent.__class__.__name__ != "ld_dict":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An if statement would be enough?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure.
It could work and I see no real reason why it shouldn't but I preferred to use an actually existent ld_dict and not a dummy construct.
What doing it like I implemented would allow us to do though is that we could enforce that every ld_container has an index xor a key (which then actually is the value used the get this object in the parent container).

# if neither self nor any of its parents is a ld_dict:
# create a dict with the key of the outer most parent of self and this parents ld_value as a value
# this dict is stored in an ld_container and simulates the most minimal JSON-LD object possible
if parent.__class__.__name__ != "ld_dict":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you could append this above

special_types = (list, dict, ld_container, datetime, date, time)
while True:
# check if ready
if len(key_and_reference_todo_list) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could bring that into the while condition

# expand value
expanded_value = self._to_expanded_json([value])
# empty list -> no value to check
if len(expanded_value) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that way with normal contains?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It made sense to me that because getting an empty list back indicates that whatever object has been expanded inside self would become nothing - meaning that appending it to self would leave self unchanged - it is contained inside self.

@SKernchen SKernchen merged commit 5f2214e into refactor/385-test-ld_list Dec 19, 2025
1 of 5 checks passed
@SKernchen SKernchen deleted the refactor/439-restrict-ld_list branch December 19, 2025 08:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data model Related to the hermes data model

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants