Description
At the university where I work, we are receiving more and more requests from researchers who would like to automatically create a Dataverse record based on the state of their GitHub repository. Currently, the dataverse-uploader GitHub action requires the researcher to:
- Create a data record on Dataverse and fill in the metadata manually
- Copy the DOI from Dataverse and add it to the workflow.yml file
- Run the GitHub workflow
- Go back to Dataverse and submit for review
This workflow makes the researcher go back and forth between GitHub and Dataverse many times. I would like to propose the addition of a feature to create a data record upon running the GitHub action for the first time. This would reduce the number of times the researcher needs to switch between Dataverse and GitHub, and it might also help the researcher by automatically filling in as much metadata as possible based on the information in the GitHub repository.
Proposed user interface
The DATAVERSE_DATASET_DOI field could be defined to be optional (not required).
- If the DOI is provided, the action works as it currently does.
- If the DOI is not provided, a new Dataverse record is created using as much information as possible from the GitHub repository to define the metadata, and then the action proceeds as always.
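The two branches above could be sketched roughly as follows. This is only an illustration of the proposed behaviour, not the action's current code: `resolve_dataset_doi` and the `create_dataset` callback are hypothetical names, and the only identifier taken from the action is the `DATAVERSE_DATASET_DOI` field.

```python
from typing import Callable, Mapping


def resolve_dataset_doi(env: Mapping[str, str], create_dataset: Callable[[], str]) -> str:
    """Return the user-provided DOI, or create a new record when none is set.

    `create_dataset` is a hypothetical callback that creates the Dataverse
    record and returns its DOI.
    """
    doi = env.get("DATAVERSE_DATASET_DOI", "").strip()
    if doi:
        # DOI provided: behave exactly as the action does today.
        return doi
    # DOI missing or empty: create a new record and use its DOI instead.
    return create_dataset()
```

In the action this would be called with `os.environ` (or the parsed inputs) and a function that performs the API request described below.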
Proposed implementation
- The "Create a Dataset in a Dataverse Collection" endpoint of the Dataverse API can be used to implement the main feature.
- GitHub Contexts can be used to automatically populate as much metadata as possible for the newly created record on Dataverse.
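As a rough sketch of how the metadata could be derived, the function below builds a minimal citation block from the default environment variables that GitHub Actions sets (`GITHUB_REPOSITORY`, `GITHUB_SERVER_URL`, `GITHUB_ACTOR`). The JSON layout reflects my understanding of the dataset format used by the Dataverse native API and should be checked against the API guide. The helper names, the `contact_email` parameter (GitHub does not expose an email in these variables), and the default subject "Other" are assumptions made for illustration.

```python
from typing import Any, Dict, List, Mapping


def primitive(type_name: str, value: str) -> Dict[str, Any]:
    """Build a single-valued primitive metadata field."""
    return {"typeName": type_name, "typeClass": "primitive", "multiple": False, "value": value}


def compound(type_name: str, children: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a compound field holding one entry made of the given child fields."""
    return {
        "typeName": type_name,
        "typeClass": "compound",
        "multiple": True,
        "value": [{child["typeName"]: child for child in children}],
    }


def build_metadata(env: Mapping[str, str], contact_email: str, subject: str = "Other") -> Dict[str, Any]:
    """Derive minimal dataset metadata from GitHub Actions environment variables."""
    repo = env.get("GITHUB_REPOSITORY", "")  # has the form "owner/name"
    owner, _, name = repo.partition("/")
    server = env.get("GITHUB_SERVER_URL", "https://github.com")
    actor = env.get("GITHUB_ACTOR", owner)  # fall back to the repository owner
    description = f"Dataset generated from {server}/{repo}"
    fields = [
        primitive("title", name or repo),
        compound("author", [primitive("authorName", actor)]),
        compound("datasetContact", [primitive("datasetContactEmail", contact_email)]),
        compound("dsDescription", [primitive("dsDescriptionValue", description)]),
        {"typeName": "subject", "typeClass": "controlledVocabulary", "multiple": True, "value": [subject]},
    ]
    citation = {"fields": fields, "displayName": "Citation Metadata"}
    return {"datasetVersion": {"metadataBlocks": {"citation": citation}}}
```

The result can be written out with `json.dump(build_metadata(os.environ, email), f)` to produce the "metadata.json" file. Richer metadata (license, keywords, a proper description) would probably need the GitHub REST API or a file in the repository, since the environment variables carry little more than the repository name and the triggering user.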
For example, the action could follow these steps:
- Create and populate a temporary "metadata.json" file based on the GitHub context (the API needs this file to create a record with the given metadata)
- Make the API request to create a new record based on a given "metadata.json" file. Something like:
    import os
    import requests

    # API token and JSON content type sent with the request
    headers = {
        'X-Dataverse-key': os.getenv('API_TOKEN', ''),
        'Content-type': 'application/json',
    }

    # Read the metadata file created in the previous step
    with open('metadata.json', 'rb') as f:
        data = f.read()

    # Create the dataset inside the parent Dataverse collection
    response = requests.post(
        'http://' + os.getenv('SERVER_URL', '') + '/api/dataverses/' + os.getenv('PARENT', '') + '/datasets',
        headers=headers,
        data=data,
    )

- Extract the DOI of the newly generated Dataverse record from the response object and use it as if it had been provided by the user in the DATAVERSE_DATASET_DOI field.
- Maybe it would even be possible to have the workflow replace the missing value of the DATAVERSE_DATASET_DOI field with the newly created DOI. This would ensure that no new Dataverse record is created if the action is rerun.
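The DOI extraction step could look something like the sketch below. It assumes that a successful response body has the shape `{"status": "OK", "data": {"persistentId": "doi:..."}}`, which is my understanding of what the Dataverse native API returns on dataset creation and should be verified against the server in use. `extract_doi` is a hypothetical helper name.

```python
from typing import Any, Mapping


def extract_doi(payload: Mapping[str, Any]) -> str:
    """Return the persistent identifier from a parsed create-dataset response.

    Assumes the body looks like {"status": "OK", "data": {"persistentId": ...}}.
    """
    if payload.get("status") != "OK":
        message = payload.get("message", "unknown error")
        raise RuntimeError(f"Dataset creation failed: {message}")
    doi = (payload.get("data") or {}).get("persistentId")
    if not doi:
        raise RuntimeError("Response did not include a persistentId")
    return doi
```

With the request shown earlier, this would be called as `extract_doi(response.json())`, and the returned value would then be used wherever the action currently reads `DATAVERSE_DATASET_DOI`.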