Draft
27 commits
6a00059
Ha Ha Hack-A-Thon! 🐱‍💻
led02 Dec 9, 2024
6b73af0
Implment JSON-LD as multi-access and add facades for value and graph …
led02 Dec 9, 2024
d12858e
Small refactoring of data access class(es)
led02 Dec 9, 2024
89c6ebb
Add additinal schemas for caching
led02 Dec 9, 2024
087b229
Document data model (esp. meta-metadata)
led02 Dec 9, 2024
bdd5889
Some additional tests
led02 Dec 9, 2024
6c7c04b
Merge remote-tracking branch 'origin/develop' into feature/153-refact…
led02 Jan 17, 2025
6ce2b22
Last state before deletion
led02 Jun 15, 2025
cc61618
Update pyproject.toml to current standard.
led02 Jun 16, 2025
7710b76
Remove audit logger
led02 Jun 16, 2025
a6d3ceb
Add classes to access expanded JSON-LD dicts.
led02 Jun 16, 2025
de1d23f
Add bundled schemas.
led02 Jun 16, 2025
904b26c
Add classes to handle provenance.
led02 Jun 16, 2025
fd7ac3c
Add basic merge implementation.
led02 Jun 16, 2025
d9a5326
Add missing package __init__ files.
led02 Jun 16, 2025
907679b
Update plugins to have a common class interface and to use the new da…
led02 Jun 16, 2025
75b50d5
Discard legacy.
led02 Jun 16, 2025
a4caa39
Update docs (very basic yet)
led02 Jun 16, 2025
f0256b8
Clean up.
led02 Jun 16, 2025
2c40ecd
Fix style errors.
led02 Jun 17, 2025
f7d6b6e
Fix style errors.
led02 Jun 17, 2025
aaed313
Add SPDX tags for REUSE compliance. Might have missed some still...
led02 Jun 17, 2025
d98236d
More SPDX license information supplied, for the bundled schemas.
led02 Jun 17, 2025
9e681d4
Fixing the last commit... ;)
led02 Jun 17, 2025
80199d9
Fixing the last commit... ;)
led02 Jun 17, 2025
38af83a
Add last missing copyright holder
led02 Jun 17, 2025
3f27d16
Add missing licenses.
led02 Jun 17, 2025
359 changes: 359 additions & 0 deletions LICENSES/CC-BY-SA-3.0.txt

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions LICENSES/W3C-20150513.txt
@@ -0,0 +1,17 @@
This work is being provided by the copyright holders under the following license.

License
By obtaining and/or copying this work, you (the licensee) agree that you have read, understood, and will comply with the following terms and conditions.

Permission to copy, modify, and distribute this work, with or without modification, for any purpose and without fee or royalty is hereby granted, provided that you include the following on ALL copies of the work or portions thereof, including modifications:

• The full text of this NOTICE in a location viewable to users of the redistributed or derivative work.
• Any pre-existing intellectual property disclaimers, notices, or terms and conditions. If none exist, the W3C Software and Document Short Notice should be included.
• Notice of any changes or modifications, through a copyright statement on the new code or document such as "This software or document includes material copied from or derived from [title and URI of the W3C document]. Copyright (c) [YEAR] W3C® (MIT, ERCIM, Keio, Beihang)."

Disclaimers
THIS WORK IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE OR DOCUMENT WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE SOFTWARE OR DOCUMENT.

The name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to the work without specific, written prior permission. Title to copyright in this work will at all times remain with copyright holders.
28 changes: 22 additions & 6 deletions docs/source/dev/data_model.md
@@ -16,12 +16,28 @@ All the data is collected in a directory called `.hermes` located in the root of
You should not need to interact with this data directly.
Instead, use {class}`hermes.model.context.HermesContext` and respective subclasses to access the data in a consistent way.

## Data representation

## Harvest Data
*hermes* operates on expanded JSON-LD datasets.
All internal data must be valid JSON-LD datasets in expanded form.
All internal data must use CodeMeta vocabulary where applicable.
All vocabulary used in internal datasets must be defined by a JSON-LD context.

The data of the harvesters is cached in the sub-directory `.hermes/harvest`.
Each harvester has a separate cache file to allow parallel harvesting.
The cache file is encoded in JSON and stored in `.hermes/harvest/HARVESTER_NAME.json`
where `HARVESTER_NAME` corresponds to the entry point name.
*hermes* provides classes that facilitate the access to the expanded JSON-LD data.
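
To make the "expanded form" requirement concrete, the following sketch expands a small compact document by hand. It is an illustration only: `expand_term` and `expand_simple` are hypothetical helpers that cover just term and prefix lookup for flat string values, not the full JSON-LD expansion algorithm, and the example context and values are assumptions.

```python
# Hypothetical, heavily simplified expansion; real JSON-LD expansion handles far more cases.
CONTEXT = {
    "schema": "http://schema.org/",
    "name": "http://schema.org/name",
}


def expand_term(term, context):
    """Resolve a term or a prefixed name (prefix:suffix) to a full IRI."""
    if term in context:
        return context[term]
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return term


def expand_simple(doc, context):
    """Expand a flat compact document that contains string values only."""
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the context is consumed by expansion and not part of the result
        if key == "@type":
            expanded["@type"] = [expand_term(value, context)]
        else:
            expanded[expand_term(key, context)] = [{"@value": value}]
    return [expanded]


compact = {"@context": CONTEXT, "@type": "schema:SoftwareSourceCode", "name": "hermes"}
expanded = expand_simple(compact, CONTEXT)
```

In the expanded result every key is a full IRI and every value is wrapped in a list, which is what makes datasets from different sources comparable without knowing their original contexts.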

### *hermes* internal processing data

*hermes* collects internal processing information in the `hermes-rt` namespace.

## Data cache

For each processing step there exists a command directory in the `.hermes` directory.
Within this command directory, there is one plugin directory for each plugin.
Within this plugin directory, up to four files are stored:

- `codemeta.json`: The (possibly extended) CodeMeta data representation of the dataset.
This should be valid compact JSON-LD data.
- `expanded.json`: The expanded representation of the dataset. This should be valid expanded JSON-LD data.
- `context.json`: The JSON-LD context that can be used to transform `expanded.json` into `codemeta.json`.
- `prov.json`: A JSON-LD dataset that contains the provenance collected by *hermes* during the run.

{class}`hermes.model.context.HermesHarvestContext` encapsulates these harvester caches.
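
The cache layout described in the list above can be sketched with `pathlib`. The file names come from the documentation; the helper `plugin_cache_files` and the example command and plugin names are hypothetical and not part of the *hermes* API.

```python
from pathlib import Path

# File names taken from the documentation; the helper itself is hypothetical.
CACHE_FILES = ("codemeta.json", "expanded.json", "context.json", "prov.json")


def plugin_cache_files(project_root, command, plugin):
    """Return the expected cache file paths for one plugin of one command."""
    plugin_dir = Path(project_root) / ".hermes" / command / plugin
    return {name: plugin_dir / name for name in CACHE_FILES}


paths = plugin_cache_files(".", "harvest", "cff")
```

Since "up to" four files are stored, callers of such a helper would still need to check which of the paths actually exist.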
2 changes: 1 addition & 1 deletion hermes.toml
@@ -3,7 +3,7 @@
# SPDX-License-Identifier: CC0-1.0

[harvest]
sources = [ "cff", "toml" ] # ordered priority (first one is most important)
sources = [ "cff", "toml", "git" ] # ordered priority (first one is most important)

[deposit]
target = "invenio_rdm"
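
The comment on `sources` says that the first entry is the most important one. One possible reading of that rule is a "first source wins" merge, sketched below. This is an assumption for illustration: `merge_by_priority` and the harvested values are hypothetical, and the actual merge implementation in this pull request may behave differently.

```python
# Order as configured in hermes.toml: first entry has the highest priority.
sources = ["cff", "toml", "git"]

# Hypothetical harvested values per source, for illustration only.
harvested = {
    "git": {"name": "hermes-from-git", "author": "A. Committer"},
    "toml": {"name": "hermes", "version": "0.10.0dev0"},
    "cff": {"name": "hermes"},
}


def merge_by_priority(sources, data):
    """Merge per-source dicts; earlier sources win on conflicting keys."""
    merged = {}
    for source in sources:
        for key, value in data.get(source, {}).items():
            merged.setdefault(key, value)  # keep the value of the higher-priority source
    return merged


merged = merge_by_priority(sources, harvested)
```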
104 changes: 55 additions & 49 deletions pyproject.toml
@@ -6,21 +6,21 @@
# SPDX-FileContributor: Michael Meinel
# SPDX-FileContributor: David Pape

[tool.poetry]
[project]
# Reference at https://python-poetry.org/docs/pyproject/
name = "hermes"
version = "0.8.1"
version = "0.10.0dev0"
description = "Workflow to publish research software with rich metadata"
homepage = "https://software-metadata.pub"
license = "Apache-2.0"
authors = [
"Stephan Druskat <stephan.druskat@dlr.de>",
"Michael Meinel <michael.meinel@dlr.de>",
"Oliver Bertuch <o.bertuch@fz-juelich.de>",
"Jeffrey Kelling <j.kelling@hzdr.de>",
"Oliver Knodel <o.knodel@hzdr.de>",
"David Pape <d.pape@hzdr.de>",
"Sophie Kernchen <sohpie.kernchen@dlr.de>",
{name="Stephan Druskat", email="stephan.druskat@dlr.de"},
{name="Michael Meinel", email="michael.meinel@dlr.de"},
{name="Oliver Bertuch", email="o.bertuch@fz-juelich.de"},
{name="Jeffrey Kelling", email="j.kelling@hzdr.de"},
{name="Oliver Knodel", email="o.knodel@hzdr.de"},
{name="David Pape", email="d.pape@hzdr.de"},
{name="Sophie Kernchen", email="sophie.kernchen@dlr.de"},
]

readme = "README.md"
@@ -39,48 +39,53 @@ packages = [
{ include = "hermes", from = "src" }
]

[tool.poetry.dependencies]
python = "^3.10"
"ruamel.yaml" = "^0.17.21"
jsonschema = "^3.0.0"
pyld = "^2.0.3"
cffconvert = "^2.0.0"
toml = "^0.10.2"
pyparsing = "^3.0.9"
requests = "^2.28.1"
pydantic = "^2.5.1"
pydantic-settings = "^2.1.0"
requests-oauthlib = "^2.0.0"
pynacl = "^1.5.0"

[tool.poetry.group.dev.dependencies]
pytest = "^7.1.1"
pytest-cov = "^3.0.0"
taskipy = "^1.10.3"
flake8 = "^5.0.4"
requests-mock = "^1.10.0"
requires-python = ">=3.10"

dependencies = [
"ruamel.yaml>=0.17.21",
"jsonschema>=3.0.0",
"pyld>=2.0.3",
"cffconvert>=2.0.0",
"toml>=0.10.2",
"pyparsing>=3.0.9",
"requests>=2.28.1",
"pydantic>=2.5.1",
"pydantic-settings>=2.1.0",
"requests-oauthlib>=2.0.0",
"pynacl>=1.5.0",
"schemaorg>=0.1.1",
"jsonpath-ng>=1.7.0",
]

[project.optional-dependencies]
dev = [
"pytest>=7.1.1",
"pytest-cov>=3.0.0",
"taskipy>=1.10.3",
"flake8>=5.0.4",
"requests-mock>=1.10.0",
]

docs = [
"Sphinx>=6.2.1",
# Sphinx - Additional modules
"myst-parser>=2.0.0",
"sphinx-book-theme>=1.0.1",
"sphinx-favicon>=0.2",
"sphinxcontrib-contentui>=0.2.5",
"sphinxcontrib-images>=0.9.4",
"sphinx-icon>=0.1.2",
"sphinx-autobuild>=2021.3.14",
"sphinx-autoapi>=3.0.0",
"sphinxemoji>=0.2.0",
"sphinxext-opengraph>=0.6.3",
"sphinxcontrib-mermaid>=0.8.1",
"sphinx-togglebutton>=0.3.2",
"reuse>=1.1.2",
"sphinxcontrib-datatemplates>=0.11.0",
]
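
The move from Poetry's caret constraints (`^x.y.z`) to plain `>=x.y.z` specifiers drops the upper bound that the caret implied. Whether that loosening is intended is not stated in the diff. For comparison, the sketch below computes the PEP 440 range that corresponds to a caret constraint; `caret_to_pep440` is a hypothetical helper and only handles purely numeric versions.

```python
def caret_to_pep440(constraint):
    """Translate a Poetry caret constraint such as "^0.17.21" to a PEP 440 range."""
    version = constraint.lstrip("^")
    parts = [int(p) for p in version.split(".")]
    # The left-most non-zero component is the one that must not change;
    # if all components are zero, the last given component is bumped instead.
    index = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:index] + [parts[index] + 1] + [0] * (len(parts) - index - 1)
    return f">={version},<{'.'.join(str(p) for p in upper)}"


ruamel_range = caret_to_pep440("^0.17.21")
jsonschema_range = caret_to_pep440("^3.0.0")
```

A dependency entry such as `"jsonschema>=3.0.0,<4.0.0"` would keep the old behaviour, while `"jsonschema>=3.0.0"` also admits later major versions.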

# Packages for developers for creating documentation
[tool.poetry.group.docs]
optional = true

[tool.poetry.group.docs.dependencies]
Sphinx = "^6.2.1"
# Sphinx - Additional modules
myst-parser = "^2.0.0"
sphinx-book-theme = "^1.0.1"
sphinx-favicon = "^0.2"
sphinxcontrib-contentui = "^0.2.5"
sphinxcontrib-images = "^0.9.4"
sphinx-icon = "^0.1.2"
sphinx-autobuild = "^2021.3.14"
sphinx-autoapi = "^3.0.0"
sphinxemoji = "^0.2.0"
sphinxext-opengraph = "^0.6.3"
sphinxcontrib-mermaid="^0.8.1"
sphinx-togglebutton="^0.3.2"
reuse = "^1.1.2"
sphinxcontrib-datatemplates = "^0.11.0"

[tool.poetry.plugins.console_scripts]
hermes = "hermes.commands.cli:main"
@@ -89,6 +94,7 @@ hermes-marketplace = "hermes.commands.marketplace:main"
[tool.poetry.plugins."hermes.harvest"]
cff = "hermes.commands.harvest.cff:CffHarvestPlugin"
codemeta = "hermes.commands.harvest.codemeta:CodeMetaHarvestPlugin"
pyproject = "hermes.commands.harvest.pyproject:PyprojectHarvestPlugin"

[tool.poetry.plugins."hermes.deposit"]
file = "hermes.commands.deposit.file:FileDepositPlugin"
78 changes: 68 additions & 10 deletions src/hermes/commands/base.py
@@ -15,6 +15,10 @@
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

from hermes import utils
from hermes.model.prov.ld_prov import ld_prov
from hermes.model.types import ld_context


class HermesSettings(BaseSettings):
"""Root class for HERMES configuration model."""
@@ -33,15 +37,52 @@ class HermesCommand(abc.ABC):
command_name: str = ""
settings_class: Type = HermesSettings

class prov:
@classmethod
def hermes_software(cls):
return {
"@type": "schema:SoftwareApplication",
"schema:name": utils.hermes_name,
"schema:version": utils.hermes_version,
"schema:url": utils.hermes_homepage,
}

@classmethod
def hermes_command(cls, cmd, app):
return {
"@type": "schema:SoftwareApplication",
"schema:name": cmd.command_name,
"schema:isPartOf": app.ref,
}

@classmethod
def hermes_plugin_run(cls, plugin, command):
return {
"prov:wasStartedBy": command.ref,
}

@classmethod
def hermes_json_data(cls, name, data):
return {
"@type": "schema:PropertyValue",
"schema:name": name,
"schema:value": {"@type": "@json", "@value": data.compact()},
}

def __init__(self, parser: argparse.ArgumentParser):
"""Initialize a new instance of any HERMES command.

:param parser: The command line parser used for reading command line arguments.
"""
self.prov_doc = ld_prov(context=ld_context.ALL_CONTEXTS)
self.app_entity = self.prov_doc.make_node('Entity', self.prov.hermes_software())

self.parser = parser
self.plugins = self.init_plugins()
self.settings = None

self.app_entity.commit()

self.log = logging.getLogger(f"hermes.{self.command_name}")
self.errors = []

@@ -50,17 +91,20 @@ def init_plugins(self):

# Collect all entry points for this group (i.e., all valid plug-ins for the step)
entry_point_group = f"hermes.{self.command_name}"
group_plugins = {
entry_point.name: entry_point.load()
for entry_point in metadata.entry_points(group=entry_point_group)
}
group_plugins = {}
group_settings = {}

for entry_point in metadata.entry_points(group=entry_point_group):
plugin_cls = entry_point.load()
ep_metadata = plugin_cls.get_metadata(entry_point)

plugin_cls.plugin_node = self.app_entity.add_related("schema:hasPart", "Entity", ep_metadata)

# Collect the plug-in specific configurations
self.derive_settings_class({
plugin_name: plugin_class.settings_class
for plugin_name, plugin_class in group_plugins.items()
if hasattr(plugin_class, "settings_class") and plugin_class.settings_class is not None
})
group_plugins[entry_point.name] = plugin_cls
if hasattr(plugin_cls, 'settings_class') and plugin_cls.settings_class is not None:
group_settings[entry_point.name] = plugin_cls.settings_class

self.derive_settings_class(group_settings)

return group_plugins
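
The discovery loop in `init_plugins` can be tried in isolation with stand-in entry points. The sketch below is a simplified assumption of the pattern: it leaves out the provenance bookkeeping (`get_metadata`, `add_related`), and `collect_plugins`, `FakeEntryPoint`, `CffPlugin` and `GitPlugin` are hypothetical names introduced only for the demonstration.

```python
from importlib import metadata


def collect_plugins(command_name, entry_points=None):
    """Load plugin classes for one command and collect their settings classes."""
    if entry_points is None:
        # Real lookup of installed plug-ins for the given step.
        entry_points = metadata.entry_points(group=f"hermes.{command_name}")

    plugins, settings = {}, {}
    for entry_point in entry_points:
        plugin_cls = entry_point.load()
        plugins[entry_point.name] = plugin_cls
        if getattr(plugin_cls, "settings_class", None) is not None:
            settings[entry_point.name] = plugin_cls.settings_class
    return plugins, settings


# Stand-ins for installed entry points (hypothetical, for demonstration).
class FakeEntryPoint:
    def __init__(self, name, target):
        self.name, self._target = name, target

    def load(self):
        return self._target


class CffPlugin:
    settings_class = dict


class GitPlugin:
    settings_class = None


plugins, settings = collect_plugins(
    "harvest", [FakeEntryPoint("cff", CffPlugin), FakeEntryPoint("git", GitPlugin)]
)
```

Collecting plug-ins and settings in a single pass, as the new code does, avoids iterating over the loaded classes a second time.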

@@ -160,8 +204,22 @@ def __call__(self, args: argparse.Namespace):
class HermesPlugin(abc.ABC):
"""Base class for all HERMES plugins."""

plugin_node = None

settings_class: Optional[Type] = None

@classmethod
def get_metadata(cls, entry_point):
cls.entry_point = entry_point

return {
"@type": "schema:EntryPoint",
"schema:name": entry_point.name,
}

def __init__(self, plugin_prov):
self.prov_doc = plugin_prov

@abc.abstractmethod
def __call__(self, command: HermesCommand) -> None:
"""Execute the plugin.
4 changes: 3 additions & 1 deletion src/hermes/commands/clean/base.py
@@ -27,4 +27,6 @@ def __call__(self, args: argparse.Namespace) -> None:
self.log.info("Removing HERMES caches...")

# Naive implementation for now... check errors, validate directory, don't construct the path ourselves, etc.
shutil.rmtree(args.path / '.hermes')
cache_path = args.path / '.hermes'
if cache_path.exists():
shutil.rmtree(cache_path)
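
The effect of the added existence check can be shown with a temporary directory: a second clean run no longer raises `FileNotFoundError`. The function name `remove_cache` and its boolean return value are assumptions for this sketch and not part of the changed code.

```python
import shutil
import tempfile
from pathlib import Path


def remove_cache(project_path):
    """Remove the .hermes cache directory if present; report whether it existed."""
    cache_path = Path(project_path) / ".hermes"
    if cache_path.exists():
        shutil.rmtree(cache_path)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".hermes" / "harvest").mkdir(parents=True)
    first = remove_cache(tmp)   # cache existed and is removed
    second = remove_cache(tmp)  # nothing left to remove, and no exception is raised
```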
2 changes: 1 addition & 1 deletion src/hermes/commands/cli.py
@@ -72,7 +72,7 @@ def main() -> None:

log.info("Run subcommand %s", args.command.command_name)
args.command(args)
except Exception as e:
except RuntimeError as e:
log.error("An error occurred during execution of %s", args.command.command_name)
log.debug("Original exception was: %s", e)
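
Narrowing the handler from `Exception` to `RuntimeError` means that only `RuntimeError` and its subclasses are logged here, while other exceptions now propagate with a traceback. The following sketch demonstrates that difference; `run`, `fails_at_runtime` and `has_a_bug` are hypothetical stand-ins for the command invocation.

```python
import logging

log = logging.getLogger("hermes.sketch")


def run(command):
    """Run a callable, handling RuntimeError only; other exceptions propagate."""
    try:
        command()
        return "ok"
    except RuntimeError as e:
        log.error("An error occurred during execution")
        log.debug("Original exception was: %s", e)
        return "handled"


def fails_at_runtime():
    raise RuntimeError("plugin failed")


def has_a_bug():
    raise KeyError("missing")


handled = run(fails_at_runtime)
try:
    run(has_a_bug)
    propagated = False
except KeyError:
    propagated = True
```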

43 changes: 30 additions & 13 deletions src/hermes/commands/curate/base.py
@@ -5,14 +5,12 @@
# SPDX-FileContributor: Michael Meinel

import argparse
import os
import shutil
import sys

from pydantic import BaseModel

from hermes.commands.base import HermesCommand
from hermes.model.context import CodeMetaContext
from hermes.model.context_manager import HermesContext
from hermes.model.types import ld_dict, ld_list


class CurateSettings(BaseModel):
@@ -34,14 +32,33 @@ def __call__(self, args: argparse.Namespace) -> None:

self.log.info("# Metadata curation")

ctx = CodeMetaContext()
process_output = ctx.hermes_dir / 'process' / (ctx.hermes_name + ".json")
ctx = HermesContext()
ctx.prepare_step("curate")

if not process_output.is_file():
self.log.error(
"No processed metadata found. Please run `hermes process` before curation."
)
sys.exit(1)
ctx.prepare_step("process")
with ctx["result"] as process_ctx:
expanded_data = process_ctx["expanded"]
context_data = process_ctx["context"]
prov_data = process_ctx["prov"]
ctx.finalize_step("process")

os.makedirs(ctx.hermes_dir / 'curate', exist_ok=True)
shutil.copy(process_output, ctx.hermes_dir / 'curate' / (ctx.hermes_name + '.json'))
prov_doc = ld_dict.from_dict({"hermes-rt:graph": prov_data, "@context": prov_data["@context"]})

nodes = {}
edges = {}

for node in prov_doc["hermes-rt:graph"]:
nodes[node["@id"]] = node

for rel in ('schema:isPartOf', "schema:hasPart", "prov:used", "prov:generated", "prov:wasStartedBy"):
if rel in node:
rel_ids = node[rel]
if not isinstance(rel_ids, ld_list):
rel_ids = [rel_ids]
edges[rel] = edges.get(rel, []) + [(node["@id"], rel_id) for rel_id in rel_ids]

with ctx["result"] as curate_ctx:
curate_ctx["expanded"] = expanded_data
curate_ctx["context"] = context_data

ctx.finalize_step("curate")
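
The node and edge collection in the curate step can be reproduced with plain dictionaries. The sketch below assumes the relation loop runs once per node and substitutes `list` for `ld_list`; the graph content, the `@id` values and the helper name `index_graph` are made up for illustration.

```python
# Hypothetical provenance graph in plain dicts; the @id values are made up.
graph = [
    {"@id": "_:app", "schema:hasPart": ["_:cmd", "_:plugin"]},
    {"@id": "_:cmd", "schema:isPartOf": "_:app"},
    {"@id": "_:run", "prov:wasStartedBy": "_:cmd", "prov:used": ["_:plugin"]},
]

RELATIONS = (
    "schema:isPartOf", "schema:hasPart", "prov:used", "prov:generated", "prov:wasStartedBy",
)


def index_graph(graph):
    """Return (nodes by @id, edges by relation) for a list of node dicts."""
    nodes, edges = {}, {}
    for node in graph:
        nodes[node["@id"]] = node
        for rel in RELATIONS:
            if rel in node:
                targets = node[rel]
                if not isinstance(targets, list):
                    targets = [targets]  # normalise single references to a list
                edges.setdefault(rel, []).extend((node["@id"], t) for t in targets)
    return nodes, edges


nodes, edges = index_graph(graph)
```

Using `setdefault(...).extend(...)` builds the same edge lists as `edges.get(rel, []) + [...]` without creating a new list on every iteration.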