Change ld_list according to #439 #445

notactuallyfinn · 2025-10-17T11:54:57Z

#439 is not yet completely discussed/ the propositions not approved.
This PR contains the changes that seem plausible as of now.

…ifferent

@type

…rsion of @type values

sdruskat

Apart from the syntactic changes - which I personally would roll back, but that's preference - I'd really like to get another pair of eyes on this as I'm currently struggling to reload all the context to understand all changes.

@led02, would you be able to have a look at this?

sdruskat · 2025-11-07T08:18:39Z

src/hermes/model/types/__init__.py

+    (ld_list.is_ld_list, {"ld_container": ld_list}),
+    (lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}),

    # pythonize items from lists (expanded set is already handled above)
-    (ld_container.is_json_id, dict(python=lambda c, **_: c["@id"])),
-    (ld_container.is_typed_json_value, dict(python=ld_container.typed_ld_to_py)),
-    (ld_container.is_json_value, dict(python=lambda c, **_: c["@value"])),
-    (ld_list.is_container, dict(ld_container=lambda c, **kw: ld_list([c], **kw))),
-    (ld_dict.is_json_dict, dict(ld_container=lambda c, **kw: ld_dict([c], **kw))),
+    (ld_container.is_json_id, {"python": lambda c, **_: c["@id"]}),
+    (ld_container.is_typed_json_value, {"python": lambda c, **kw: ld_container.typed_ld_to_py([c], **kw)}),
+    (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
+    (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
+    (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
+    (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),

    # Convert internal data types to expanded_json
-    (lambda c: ld_container.is_json_id(c) or ld_container.is_json_value(c), dict(expanded_json=lambda c, **_: [c])),
-    (ld_dict.is_json_dict, dict(expanded_json=lambda c, **kw: ld_dict.from_dict(c, **kw).ld_value)),
-    (ld_dict.is_ld_dict, dict(expanded_json=lambda c, **kw: ld_dict.from_dict(c[0], **kw).ld_value)),
+    (lambda c: ld_container.is_json_id(c), {"expanded_json": lambda c, **_: c}),
+    (lambda c: ld_container.is_ld_id(c), {"expanded_json": lambda c, **_: c[0]}),
+    (lambda c: ld_container.is_json_value(c), {"expanded_json": lambda c, **_: [c]}),


Why the change in syntax? Why not keep using the dict constructor which - IMHO - reads cleaner?

In fact, I've tried both and liked None of them (with the dict(...) variant being a bit more readable ... but that's subjective).

If there is a refactoring going on here, do it properly (i.e., providing some class that encapuslates these rules and maybe also allow for better configurability, extensibility, documentabilty, ...). I would really appreciate this, making the connection more explicit without adding to much complexity.

Also note that the introduction of a rule class or similar is most probably required anyways when it comes to configuring the process step.

led02

Restricting the container types of ld_list will certainly break functionality. I fully understand the discussion about the unclear class name and would advise to just rename the class instead of removing alternate container types (e.g., the provenance data will be collected in a graph; I've come across example data that also uses @set).

When changeging the rules, don't stop at syntactic sugar but go the full way for the refactoring (which would be appreciated ;) )

Except for that, this basically looks good but the ordering of the rules took me a bit of fine-tuning back then, hence if there are changes to the filters or order, I would be a good idea to document the reasons for this in the code (even though I did not do this before... I'm not a good example).

The other thing is that developers need to be aware that those classes (in contrast to their counterparts in the merging package), are pure facade - i.e., changes to the contents must be directly reflected in the original expanded JSON-LD. If data is read from these classes, it should be returned in a way that it also delays the actual data extraction to the latest time possible... lazy evaluation. (It is clear that e.g., an integer cannot reflect changes and most probably should not... but this behaviour needs to be clear to the users, too.)
Hence, I would propose adding respective test cases similar to my suggestion in test_list_basics that check for identity and ensure the underlying data structure behaves as expected.

led02 · 2025-11-07T11:56:20Z

src/hermes/model/types/__init__.py

-     dict(ld_container=lambda c, **kw: ld_list([{"@list": c}], **kw))
-    ),
+    (ld_list.is_ld_list, {"ld_container": ld_list}),
+    (lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}),


The side condition is important to my understanding. This rule should invoked upon an expanded JSON-LD list. In fact, the rule should be even more strict ensuring the items in the list are not only dicts but valid JSON-LD values.

Or am I getting something wrong here?

When getting a value from an ld_dict it should return an ld_list always. Therefor it is important to know wheter a given value comes from an ld_list or an ld_dict. If the value, where expanded JSON-LD is expected, is not inside a list it previously was inside a list.
I removed the condition because when getting "@type" from {"@type": ["A"]} the value in the list is a string and not a dict. (I found no cases where this goes wrong because everything is valid expanded JSON-LD.)

We should know that the values are valid expanded JSON-LD here because this conversion happens only when converting to the pythonized version of a value that was previously expanded.

Okay, I see... that seems to be related to the removal of the explcit handling of @type in the conversion.

Well, that's fine if you can deal with the implications...

{"@type": "foo"} (which is valid) will always be {"@type": ["foo"]} when converted using to_python

The value must not be wrapped in an @list container or individually wrapped in some @value or @id node. Yet, the values are subject to IRI expansion / compaction.

led02 · 2025-11-07T11:58:58Z

src/hermes/model/types/__init__.py

+    (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
+    (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
+    (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
+    (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),


This rule is to generic IMHO, as it will try to compact all strings that are encountered, even those that are explicitly not meant as representing an IRI.

But shouldn't all other string should be known to be a value of some sort i.e. formed like {"@value": _} and therefor filtered by is_json_value because we get valid JSON-LD?

I see you have answers... but I'm totally not sure whether this is really true.

Apart from this, except for @type values, all expansible values should be wrapped in @id nodes... maybe this is a good point to check whether this assumption is correct. The other question is, which kind of expansion is required (i.e., vocab=True or not... see discussion about prefix, namespace, IRIs from the past)

... anyways, there should most proably be a documentation for the conversion rules and the order they are applied in (maybe directly generated from the code... ;) )

led02 · 2025-11-07T12:02:39Z

src/hermes/model/types/__init__.py

+    (lambda c: ld_container.is_ld_id(c), {"expanded_json": lambda c, **_: c[0]}),
+    (lambda c: ld_container.is_json_value(c), {"expanded_json": lambda c, **_: [c]}),
+    (lambda c: ld_container.is_ld_value(c), {"expanded_json": lambda c, **_: c}),


Why is the callback wrapped in an additional lambda c here?

However, this seems like a reasonable and important change... maybe an additinal hint in the comment about the side effects of the ordering of especially these rules might be helpful. (I'll also have look into the commit history to better understand the reasons behind this).

You are right. The additional lambda c can be removed.
I will add a few comments to all conversions, probably monday.

led02 · 2025-11-07T12:07:11Z

src/hermes/model/types/ld_container.py

-        elif full_iri == "@type":
-            value = [
-                self.ld_proc.compact_iri(self.active_ctx, ld_type)
-                for ld_type in ld_value
-            ]
-            if len(value) == 1:
-                value = value[0]


Not sure what by my guts say this will break stuff in an unexpected way. Not sure why (and hence there should be better documentation) but I remember removing and adding this section over and over again...

see hint above... this will break conversion of single type classes which is valid but which was already critizised towards me (or maybe there were even real errors... but I doubt it)

I haven't found anything it breaks yet, e.g. no failing tests, but it took some effort to get this running.

led02 · 2025-11-07T12:07:26Z

src/hermes/model/types/ld_list.py

+        if not (self.is_ld_list(data) and "@list" in data[0]):
+            raise ValueError("The given data does not represent a ld_list.")


See my comments on the issue: Don't restrict it to @list here but better rename the class itself.

And please, bring back the docstring!

It wasn't about the name of the class but rather that it didn't work for @set's as their expanded version does not contain @set. Additionally inserting one collection into another has to be treated differently depending on what container type they have and every (python) list could be considered a set which makes all the checks wheter something is a list or not useless and the type conversions even more complicated. But that will be discussed in the issue.

Will do.

led02 · 2025-11-07T12:13:41Z

src/hermes/model/types/ld_list.py

-        #    Moreover a set inside a list gets automaticaly converted to a list when expanded)
-
-        # FIXME: #439 what happens when a ld_list is put inside another also depends on their container types
+        # set your_ld_list[index] to the list [{"@type": "foo", "name": "bar"}] given in non expanded form


Well, it should be considered whether this operation should be allowed in this context. However, in general, it should behave as expected (i.e., set the value to what is passed). What it clearly must not do is setting something in the data property that is not valid expanded JSON. People who really want to do this still can directly access item_list or data...

For this concrete question I see two valid solutions: Do handling as to what is easiest implementable (and document) or if in doubt, raise an error.

We have decided that (for now) it should be treated as if it was meant to be a list containing only one element. Because why would the user put the dict inside a list if not to add other items.
Setting the value to what is passes is not really possible... Either it is added to a new dict containing the passed value as a value of @list because a python list is considered an @list or the outer list of the passed value is removed. (In both cases the values are expanded of course.)

led02 · 2025-11-07T12:16:21Z

test/hermes_test/model/types/test_ld_list.py

+    li = ld_list([{"@list": [0]}])
+    assert li._data == [{"@list": [0]}]


Suggested change

li = ld_list([{"@list": [0]}])

assert li._data == [{"@list": [0]}]

li_data = [{"@list": [0]}]

li = ld_list(li_data)

assert li._data is li_data

…ded tests for them

led02

Reaction to answers... thanks for the vivid discussion :)

led02 · 2025-11-07T13:18:57Z

src/hermes/model/types/__init__.py

-     dict(ld_container=lambda c, **kw: ld_list([{"@list": c}], **kw))
-    ),
+    (ld_list.is_ld_list, {"ld_container": ld_list}),
+    (lambda c: isinstance(c, list), {"ld_container": lambda c, **kw: ld_list([{"@list": c}], **kw)}),


Okay, I see... that seems to be related to the removal of the explcit handling of @type in the conversion.

Well, that's fine if you can deal with the implications...

{"@type": "foo"} (which is valid) will always be {"@type": ["foo"]} when converted using to_python

The value must not be wrapped in an @list container or individually wrapped in some @value or @id node. Yet, the values are subject to IRI expansion / compaction.

led02 · 2025-11-07T13:23:58Z

src/hermes/model/types/__init__.py

+    (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
+    (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
+    (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
+    (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),


I see you have answers... but I'm totally not sure whether this is really true.

Apart from this, except for @type values, all expansible values should be wrapped in @id nodes... maybe this is a good point to check whether this assumption is correct. The other question is, which kind of expansion is required (i.e., vocab=True or not... see discussion about prefix, namespace, IRIs from the past)

led02 · 2025-11-07T13:25:27Z

src/hermes/model/types/__init__.py

+    (ld_container.is_json_value, {"python": lambda c, **_: c["@value"]}),
+    (ld_list.is_container, {"ld_container": lambda c, **kw: ld_list([c], **kw)}),
+    (ld_dict.is_json_dict, {"ld_container": lambda c, **kw: ld_dict([c], **kw)}),
+    (lambda v: isinstance(v, str), {"python": lambda v, parent, **_: parent.ld_proc.compact_iri(parent.active_ctx, v)}),


... anyways, there should most proably be a documentation for the conversion rules and the order they are applied in (maybe directly generated from the code... ;) )

led02 · 2025-11-07T13:26:55Z

src/hermes/model/types/ld_container.py

-        elif full_iri == "@type":
-            value = [
-                self.ld_proc.compact_iri(self.active_ctx, ld_type)
-                for ld_type in ld_value
-            ]
-            if len(value) == 1:
-                value = value[0]


see hint above... this will break conversion of single type classes which is valid but which was already critizised towards me (or maybe there were even real errors... but I doubt it)

notactuallyfinn · 2025-11-21T12:36:35Z

Thanks for your input so far @led02.
In the last commits I readded the support for "@set" and "@graph".
As you mentioned somewhere ld_list, ld_dict, etc. should not be responsible for the expansion/ compaction of the JSON-LD. But they were through the conversions in __init__.py. I hopefully solved this by rewriting ld_container._to_expanded_json which now uses the already available JSON-LD-processor object. Therefor the conversions in the __init__.py file should only be used for conversion from expanded JSON-LD to the "pythonized" values (ld_list.__contains__ currently uses the old expansion method but that will be changed.).
As a side effect I had to change the default container_type from "@list" to the default used by JSON-LD ("@set") and had to always consider the option that elements of one container won't be contained by it anymore but its parent container.
One thing that I am unsure of how to implement is the comparison of ld_list's. I forgot to implement that "@set"'s are unordered but for example how should an "@set" and "@list" be compared? (Currently all collections are considered to be ordered, and their values compared by their ids and values (preferable the ids but if one value has non they are compared by their values).)
When using ld_list.from_list to create an "@set" with an "@set" as parent (i.e. when the values of the new list get assimilated by the parent) the values of the "child" are appended to the parent. Should also the new context be added to the context of the parent? (If yes see: #430)
It would be great to hear your thoughts on this.

P.S.: Creating the documentation for all this will take some time.

src/hermes/model/types/ld_container.py

SKernchen · 2025-12-10T09:20:25Z

src/hermes/model/types/ld_container.py

+        # while searching build a path such that it leads from the found ld_dicts ld_value to selfs data_dict/ item_list
+        parent = self
+        path = []
+        while parent.__class__.__name__ != "ld_dict":


An if statement would be enough?

I am not sure.
It could work and I see no real reason why it shouldn't but I preferred to use an actually existent ld_dict and not a dummy construct.
What doing it like I implemented would allow us to do though is that we could enforce that every ld_container has an index xor a key (which then actually is the value used the get this object in the parent container).

SKernchen · 2025-12-10T09:21:13Z

src/hermes/model/types/ld_container.py

+        # if neither self nor any of its parents is a ld_dict:
+        # create a dict with the key of the outer most parent of self and this parents ld_value as a value
+        # this dict is stored in an ld_container and simulates the most minimal JSON-LD object possible
+        if parent.__class__.__name__ != "ld_dict":


Then you could append this above

SKernchen · 2025-12-10T09:26:18Z

src/hermes/model/types/ld_container.py

+        special_types = (list, dict, ld_container, datetime, date, time)
+        while True:
+            # check if ready
+            if len(key_and_reference_todo_list) == 0:


Could bring that into the while condition

src/hermes/model/types/ld_container.py

SKernchen · 2025-12-10T10:32:44Z

src/hermes/model/types/ld_list.py

+        # expand value
+        expanded_value = self._to_expanded_json([value])
+        # empty list -> no value to check
+        if len(expanded_value) == 0:


Is it that way with normal contains?

It made sense to me that because getting an empty list back indicates that whatever object has been expanded inside self would become nothing - meaning that appending it to self would leave self unchanged - it is contained inside self.

src/hermes/model/types/ld_list.py

Outdated

changed ld_list according to issue #439

da136c3

notactuallyfinn requested review from SKernchen and sdruskat October 17, 2025 11:54

notactuallyfinn added the data model Related to the hermes data model label Oct 17, 2025

Michael Fritzsche added 5 commits October 20, 2025 10:06

implemented tests for problem 2 of the issue and more tests

03ebd32

added a few missing conversions and formatted the file a little bit d…

d1eb612

…ifferent

removed one conversion, added one and changed _to_python to fix conve…

d28dc10

…rsion of @type values

fixed formatting errors

fb5aa64

fixed ld_container tests, a type conversion and equals of ld_list

5248853

sdruskat reviewed Nov 7, 2025

View reviewed changes

led02 previously requested changes Nov 7, 2025

View reviewed changes

added detailed comparison for ld_list, ld_dict and json_values and ad…

e2c1bba

…ded tests for them

led02 reviewed Nov 7, 2025

View reviewed changes

notactuallyfinn mentioned this pull request Nov 10, 2025

Ld_list issues #439

Open

5 tasks

notactuallyfinn added 5 commits November 21, 2025 10:57

added support for graph and set again and improved expansion of json-ld

aeb88fb

removed unnecessary lambda

007e3a3

fixed set and from_list and added del

3c9ea61

updated tests of ld_container and fixed found bugs

66e03df

pacified flake8

c4aea65

Michael Fritzsche added 10 commits November 24, 2025 09:02

improved ld_list.__contains__ and added tests for it

7d30b16

added one TODO item to ld_list

f8d259f

began adding doc strings, type hints and comments for ld_container

2b98627

improved _to_expanded_json

1958518

added doc-strings and comments and pacified flake8

4acf8de

began commenting and adding doc-strings to ld_list

b751130

fixed bug in _to_expanded_json

3c243f2

added a few test cases to delete of ld_list

e0f0fc9

added more comments to ld_list

1b5656e

added more doc strings

071630c

Michael Fritzsche added 2 commits December 8, 2025 08:39

corrected type hints

d4b34b4

documented the rest of the methods in ld_container

f0f1818

SKernchen reviewed Dec 12, 2025

View reviewed changes

notactuallyfinn added 2 commits December 12, 2025 14:00

implemented list comparison and added tests for it

bcc233d

added tests and fixed small bug and fixed typos

287d37a

SKernchen merged commit 5f2214e into refactor/385-test-ld_list Dec 19, 2025
1 of 5 checks passed

SKernchen deleted the refactor/439-restrict-ld_list branch December 19, 2025 08:04

		if not (self.is_ld_list(data) and "@list" in data[0]):
		raise ValueError("The given data does not represent a ld_list.")

		li = ld_list([{"@list": [0]}])
		assert li._data == [{"@list": [0]}]

-    li = ld_list([{"@list": [0]}])
-    assert li._data == [{"@list": [0]}]
+    li_data = [{"@list": [0]}]
+    li = ld_list(li_data)
+    assert li._data is li_data

Change ld_list according to #439 #445

Change ld_list according to #439 #445

Uh oh!

Conversation

notactuallyfinn commented Oct 17, 2025

Uh oh!

sdruskat left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

led02 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

notactuallyfinn Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

notactuallyfinn Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

led02 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

notactuallyfinn commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

notactuallyfinn Nov 10, 2025 •

edited

Loading

notactuallyfinn Nov 10, 2025 •

edited

Loading

notactuallyfinn commented Nov 21, 2025 •

edited

Loading