Conversation

@led02
Member

led02 commented Dec 9, 2024

No description provided.

@led02
Member Author

led02 commented Jun 16, 2025

Okay, all new data model here.

  • Plugins have been updated to share at least a common base, i.e., all are classes that derive from a common base class. The interface is still sparse but provides access to the all-new provenance enhancements.
  • Data is now expected to be expanded JSON-LD. However, there are nice wrappers in hermes.model.types that allow handling it like compact JSON-LD. Specialized derivatives for handling provenance data exist in hermes.model.prov.
  • For merging data, there is an implementation of the JSON-LD containers in hermes.model.merge that can be configured with strategies that define how certain data is merged.
  • All of this functionality depends on the pyld library, for which some additional tools were built.
    In particular, hermes.model.types.pyld_util.BundledLoader is of interest, as it loads schemas from the package and also supports .ttl files with the help of RDFLib.
  • The caches are re-organized. The directory structure stays the same (.hermes//) but the files are different.
    • 'expanded.json' contains the expanded JSON-LD data. This is used for internal processing.
    • 'context.json' contains the JSON-LD context that should be used to get back to the compact form.
    • 'codemeta.json' contains the compact JSON-LD representation (which should be somewhat CodeMeta compatible).
    • 'prov.json' exists in a separate result directory and contains the PROV-O data as a compact JSON-LD graph.
  • The cache is maintained by hermes.model.context_manager.HermesContext which is an awful name for a cache manager.
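
To illustrate the second point (expanded data with compact-style access), here is a minimal sketch of the idea. The class name `CompactView`, the tiny context, and the sample values are made up for illustration; this is not the actual API of hermes.model.types, and it only handles plain term-to-IRI contexts and literal values.

```python
# Hypothetical sketch: NOT the real hermes.model.types implementation.
# Assumes a flat context that maps compact terms directly to IRIs.
CONTEXT = {
    "name": "http://schema.org/name",
    "version": "http://schema.org/version",
}


class CompactView:
    """Access an expanded JSON-LD node using compact term names."""

    def __init__(self, node, context):
        self.node = node        # expanded node: {iri: [{"@value": ...}, ...]}
        self.context = context  # compact term -> full IRI

    def _iri(self, term):
        # Fall back to the term itself so full IRIs also work as keys.
        return self.context.get(term, term)

    def __getitem__(self, term):
        values = [v.get("@value", v.get("@id")) for v in self.node[self._iri(term)]]
        # Compact JSON-LD unwraps single-element arrays.
        return values[0] if len(values) == 1 else values

    def __setitem__(self, term, value):
        items = value if isinstance(value, list) else [value]
        # Always store the expanded form: a list of value objects.
        self.node[self._iri(term)] = [{"@value": item} for item in items]


expanded = {"http://schema.org/name": [{"@value": "hermes"}]}
view = CompactView(expanded, CONTEXT)
view["version"] = "1.0.0"  # example value only

print(view["name"])  # hermes
```

The underlying dictionary stays in expanded form throughout, so it could be written to 'expanded.json' unchanged while the code working with it uses the short names.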

I think that's the most important stuff for now.
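
For the merge part, the idea of configurable strategies can be sketched roughly like this. All function names, the sample data, and the default behaviour are assumptions for illustration and do not mirror hermes.model.merge; JSON-LD keywords such as `@id` and `@type` are ignored here.

```python
# Hypothetical sketch: NOT the real hermes.model.merge implementation.
# Each strategy receives the existing and the incoming value list of one
# property (both in expanded form) and returns the merged list.

def keep_first(old, new):
    """Keep the values already present and ignore the incoming ones."""
    return old


def replace(old, new):
    """Let the incoming values win."""
    return new


def union(old, new):
    """Append incoming values that are not already present."""
    return old + [item for item in new if item not in old]


def merge_nodes(target, source, strategies, default=union):
    """Merge two expanded JSON-LD nodes property by property."""
    merged = dict(target)
    for iri, new_values in source.items():
        if iri in merged:
            strategy = strategies.get(iri, default)
            merged[iri] = strategy(merged[iri], new_values)
        else:
            merged[iri] = list(new_values)
    return merged


NAME = "http://schema.org/name"
KEYWORDS = "http://schema.org/keywords"

# Example data as two harvesting sources might deliver it.
first = {NAME: [{"@value": "hermes"}], KEYWORDS: [{"@value": "metadata"}]}
second = {
    NAME: [{"@value": "HERMES workflow"}],
    KEYWORDS: [{"@value": "metadata"}, {"@value": "publication"}],
}

# The name of the first source wins; keywords fall back to the union default.
merged = merge_nodes(first, second, {NAME: keep_first})
```

Keeping the strategies as plain callables keyed by property IRI means the merge behaviour can be changed per property without touching the merge loop itself.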

Of course, I did not do any proper testing (except for the obvious dogfooding) and the documentation is rather sparse.
Some spots are still ugly and might need a good refactoring (e.g., provenance integration), but I tried to clean up the mess as well as possible.

Also, the code needs to be annotated so that REUSE is happy.

@led02
Member Author

led02 commented Jun 17, 2025

Tests are still failing. I prefer not to deactivate them (yet) and hope that someone (maybe me) will fix them.

After a chat with @sdruskat, I propose introducing this change step by step:

  1. Import the new data model (i.e., hermes.model.types, hermes.model.merge and hermes.model.context_manager). This is also a good opportunity to rename some of the stuff. ;)
  2. Adapt the plugin implementations to use the new data model. This requires adapting the command classes and the plugins themselves. The implementations from this branch can be used as a template. However, I would not advise importing the implementations directly because the provenance recording is still very sloppy and might need a thorough re-engineering.
  3. Get rid of legacy stuff.
  4. Once the plugins are ported, decisions need to be made:
     • The plugin class model is slightly improved in this branch. However, as the plugin model is meant to be refactored anyway, this can be skipped.
     • A fairly well-tempered (slightly over-engineered) implementation for recording provenance is available in hermes.model.prov. It can be fitted to the plugins so that they emit valuable provenance information.
     • Tests should be adapted and added ... massively!

Happy to discuss this with you.
