From 7e58814a8dd918d3e08cc0ddd739d9be9f1315ce Mon Sep 17 00:00:00 2001 From: egrace479 Date: Wed, 14 Jan 2026 16:46:19 -0500 Subject: [PATCH 1/8] Clarify license section, include link to policy clear up confusion over need for file or use of MIT license for datasets --- docs/wiki-guide/Hugging-Face-Repo-Guide.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/docs/wiki-guide/Hugging-Face-Repo-Guide.md b/docs/wiki-guide/Hugging-Face-Repo-Guide.md index 126640b..10c6f72 100644 --- a/docs/wiki-guide/Hugging-Face-Repo-Guide.md +++ b/docs/wiki-guide/Hugging-Face-Repo-Guide.md @@ -6,10 +6,10 @@ Need a repository to store your data or model? You've come to the right place! B ### Standard Files -For each repository, include the following files in the root directory as soon as possible; a license can (and should) be instantiated when you create a new repository, and the standard `.gitattributes` will be generated for you. On the [Imageomics HF](https://huggingface.co/imageomics) select `New` and pick which type of repository you need. +For each repository, include the following files and metadata in the root directory as soon as possible; a license can (and should) be instantiated with the Dataset or Model card (`README.md`), and the standard `.gitattributes` will be generated for you. On the [Imageomics HF](https://huggingface.co/imageomics) select `New` and pick which type of repository you need. - [README.md](#readme) -- [LICENSE.md](#license) +- [License](#license) - [.gitignore](#gitignore) - [.gitattributes](#gitattributes) @@ -19,20 +19,25 @@ The README.md file is generally referred to as either a Dataset or Model Card an Once you've created your repo, populate your README (you can do this online by selecting "Create Dataset/Model Card" and pasting in the appropriate Imageomics HF template, then filling in your info). Editing your README in the browser allows you to preview the formatting of the file before committing changes. 
-#### LICENSE +#### License ##### 1. Select a license -Alongside the appropriate stakeholders, select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant. +Alongside the appropriate stakeholders, select a license following the guidelines set forth in the [Imageomics Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)); Models and Spaces should be released under a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant. !!! note "Remember" A public repository on Hugging Face with no license can be viewed and accessed by others, but unless the author associates a license, it is unclear what others are allowed to do with it legally. Adding an OSI license can help others feel comfortable building off your work! -For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al. +Keep in mind that your available license options may also be limited by your data sources or base model. Data should not be republished where not explicitly warranted or required.[^1] -##### 2. Add LICENSE.md to the repository +[^1]: For instance, when working with images aggregated from multiple sources, a catalog of all images used with URLs to access the images and download instructions ([cautious-robot](Helpful-Tools-for-your-Workflow.md#cautious-robot) can help with this) respects the original source data producers interests. However, if you have processed the images in a resource-intensive pipeline and the image licenses allow, the _processed_ images should be published for ease of re-use. 
In this case, it is important to provide the citation for the source data as well. -Once a license has been chosen (if not initialized with one), add the appropriate license label in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card"). +##### 2. Add a license to the repository + +Once a license has been chosen (if not initialized with one), add the appropriate license identifier in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card", [license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses)). + +!!! note + Unlike in GitHub, a `LICENSE.md` file is not required. Instead, the license for the digital object is added through the `yaml` (for ease of API access) and further clarifications can be included in the License Section of the Dataset or Model card. #### gitignore From c47262fe17c426ccd171fb7b3b4e5252050bb9fd Mon Sep 17 00:00:00 2001 From: egrace479 Date: Fri, 23 Jan 2026 18:59:15 -0500 Subject: [PATCH 2/8] Remove 'Imageomics' in front of policy, no need for that specification here --- docs/wiki-guide/Hugging-Face-Repo-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/wiki-guide/Hugging-Face-Repo-Guide.md b/docs/wiki-guide/Hugging-Face-Repo-Guide.md index 10c6f72..f3aff44 100644 --- a/docs/wiki-guide/Hugging-Face-Repo-Guide.md +++ b/docs/wiki-guide/Hugging-Face-Repo-Guide.md @@ -23,7 +23,7 @@ Once you've created your repo, populate your README (you can do this online by s ##### 1. Select a license -Alongside the appropriate stakeholders, select a license following the guidelines set forth in the [Imageomics Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md). 
For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)); Models and Spaces should be released under a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant. +Alongside the appropriate stakeholders, select a license following the guidelines set forth in the [Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)); Models and Spaces should be released under a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant. !!! note "Remember" A public repository on Hugging Face with no license can be viewed and accessed by others, but unless the author associates a license, it is unclear what others are allowed to do with it legally. Adding an OSI license can help others feel comfortable building off your work! From 527576be54eda3d7f82d656b344c9619aad21d67 Mon Sep 17 00:00:00 2001 From: egrace479 Date: Fri, 23 Jan 2026 19:00:04 -0500 Subject: [PATCH 3/8] Add pro-tip to use data/model card checklists --- docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md | 5 ++++- docs/wiki-guide/HF_ModelCard_Template_mkdocs.md | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md b/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md index 5441168..ace1706 100644 --- a/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md +++ b/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md @@ -1,6 +1,9 @@ # Dataset Card Template -Below are the Dataset Card templates for Imageomics and ABC. You can download or copy the appropriate dataset card content and paste it into a new Markdown file to create a README for your dataset. 
+Below are the Dataset Card templates for Imageomics and ABC. You can download or copy the appropriate dataset card content and paste it into a new Markdown file to create a README for your dataset. + +!!! tip "Pro tip" + Use the [Data Card Checklist](Data-Checklist.md) to help keep track of your progress.
Imageomics diff --git a/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md b/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md index f937829..67f5e17 100644 --- a/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md +++ b/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md @@ -1,6 +1,9 @@ # Model Card Template -Below are the Model Card templates for Imageomics and ABC. You can download or copy the appropriate model card content and paste it into a new Markdown file to create a README for your model repo. +Below are the Model Card templates for Imageomics and ABC. You can download or copy the appropriate model card content and paste it into a new Markdown file to create a README for your model repo. + +!!! tip "Pro tip" + Use the [Model Card Checklist](Model-Checklist.md) to help keep track of your progress.
Imageomics From d50629cb8db28fdeae4cd5e50412989eabb8dee1 Mon Sep 17 00:00:00 2001 From: egrace479 Date: Fri, 23 Jan 2026 19:00:48 -0500 Subject: [PATCH 4/8] Clarify license recommendations/references in templates aligns with repo guide page clarification --- docs/wiki-guide/HF_DatasetCard_Template_ABC.md | 5 ++--- docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md | 5 ++--- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/docs/wiki-guide/HF_DatasetCard_Template_ABC.md b/docs/wiki-guide/HF_DatasetCard_Template_ABC.md index e83b113..acb59fc 100644 --- a/docs/wiki-guide/HF_DatasetCard_Template_ABC.md +++ b/docs/wiki-guide/HF_DatasetCard_Template_ABC.md @@ -17,9 +17,8 @@ description: # Add a short description (summary) of your dataset, this will rend NOTE: Add more tags (your particular animal, type of model and use-case, etc.). -As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (eg., your PI, co-authors), select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant. -For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al. -See the [ABC Global Center policy for licensing](https://docs.google.com/document/d/1SlITG-r7kdJB6C8f4FCJ9Z7o7ccwldZoSRJKjhRAWVA/edit#heading=h.c1sxg0wsiqru) for more information. +As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. 
The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (e.g., your PI, co-authors), select a license following the guidelines set forth in the [ABC Digital Products Release and Licensing Policy](https://ABC-Center.github.io/ABC-guide/wiki-guide/Digital-products-release-licensing-policy/). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)). +For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com). List of [HF license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses) (for yaml). See more options for the above information by clicking "edit dataset card" on your repo. diff --git a/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md b/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md index 506d19c..c957253 100644 --- a/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md +++ b/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md @@ -17,9 +17,8 @@ description: # Add a short description (summary) of your dataset, this will rend NOTE: Add more tags (your particular animal, type of model and use-case, etc.). -As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)).
Alongside the appropriate stakeholders (eg., your PI, co-authors), select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant. -For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al. -See the [Imageomics policy for licensing](https://imageomics.github.io/Imageomics-guide/wiki-guide/Digital-products-release-licensing-policy/) for more information. +As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (eg., your PI, co-authors), select a license that is following the guidelines set forth in the [Imageomics Digital Products Release and Licensing Policy]((https://imageomics.github.io/Imageomics-guide/wiki-guide/Digital-products-release-licensing-policy/). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)). +For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com). List of [HF license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses) (for yaml). See more options for the above information by clicking "edit dataset card" on your repo. 
From 6e5155795f61e6370f92a1c82b471e8e2ccc1ed0 Mon Sep 17 00:00:00 2001 From: egrace479 Date: Fri, 23 Jan 2026 19:13:11 -0500 Subject: [PATCH 5/8] Add citation clarification in note under standard files --- docs/wiki-guide/Hugging-Face-Repo-Guide.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/wiki-guide/Hugging-Face-Repo-Guide.md b/docs/wiki-guide/Hugging-Face-Repo-Guide.md index f3aff44..b6a6748 100644 --- a/docs/wiki-guide/Hugging-Face-Repo-Guide.md +++ b/docs/wiki-guide/Hugging-Face-Repo-Guide.md @@ -13,6 +13,9 @@ For each repository, include the following files and metadata in the root direct - [.gitignore](#gitignore) - [.gitattributes](#gitattributes) +!!! note + Hugging Face does not support the use of a `CITATION.cff`. Instead, citation guidance is provided in the Citation Section of the [Dataset](HF_DatasetCard_Template_mkdocs.md) or [Model](HF_ModelCard_Template_mkdocs.md) card. When [generating a DOI on Hugging Face](DOI-Generation.md#1-generate-a-doi-on-hugging-face), author names must be added manually in the intended order for them to be displayed in the DOI "Cite this dataset" link. + #### README The README.md file is generally referred to as either a Dataset or Model Card and is what everyone will notice first when they open your repository on Hugging Face. Choose the appropriate Imageomics-specific HF template ([model](HF_ModelCard_Template_mkdocs.md) or [dataset](HF_DatasetCard_Template_mkdocs.md)) to get started. Be sure to include a brief description and as much information as possible at the beginning. You can update this file as you go, so don't remove the recommended sections prior to completion. The templates include descriptions of many fields, Imageomics grant information, citation formatting, and some notes on HF-flavored markdown to get you started.
From 2feba11c336fb941b657a8997b24486c31d4b114 Mon Sep 17 00:00:00 2001 From: egrace479 Date: Fri, 23 Jan 2026 19:20:57 -0500 Subject: [PATCH 6/8] Add choose-a-license link back in still a good reference for both datasets and models --- docs/wiki-guide/Hugging-Face-Repo-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/wiki-guide/Hugging-Face-Repo-Guide.md b/docs/wiki-guide/Hugging-Face-Repo-Guide.md index b6a6748..7c17866 100644 --- a/docs/wiki-guide/Hugging-Face-Repo-Guide.md +++ b/docs/wiki-guide/Hugging-Face-Repo-Guide.md @@ -31,7 +31,7 @@ Alongside the appropriate stakeholders, select a license following the guideline !!! note "Remember" A public repository on Hugging Face with no license can be viewed and accessed by others, but unless the author associates a license, it is unclear what others are allowed to do with it legally. Adding an OSI license can help others feel comfortable building off your work! -Keep in mind that your available license options may also be limited by your data sources or base model. Data should not be republished where not explicitly warranted or required.[^1] +For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com). Keep in mind that your available license options may also be limited by your data sources or base model. Data should not be republished where not explicitly warranted or required.[^1] [^1]: For instance, when working with images aggregated from multiple sources, a catalog of all images used with URLs to access the images and download instructions ([cautious-robot](Helpful-Tools-for-your-Workflow.md#cautious-robot) can help with this) respects the original source data producers interests. However, if you have processed the images in a resource-intensive pipeline and the image licenses allow, the _processed_ images should be published for ease of re-use. 
In this case, it is important to provide the citation for the source data as well. From 5951c4ac52e071b1082a2ca6474308b00e4ef019 Mon Sep 17 00:00:00 2001 From: egrace479 Date: Mon, 26 Jan 2026 13:56:32 -0500 Subject: [PATCH 7/8] Clarify license not supported, as it's more about how the system works include also the links to the repo card templates --- docs/wiki-guide/Hugging-Face-Repo-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/wiki-guide/Hugging-Face-Repo-Guide.md b/docs/wiki-guide/Hugging-Face-Repo-Guide.md index 7c17866..0f187d6 100644 --- a/docs/wiki-guide/Hugging-Face-Repo-Guide.md +++ b/docs/wiki-guide/Hugging-Face-Repo-Guide.md @@ -40,7 +40,7 @@ For more information on how to choose a license and why it matters, see [Choose Once a license has been chosen (if not initialized with one), add the appropriate license identifier in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card", [license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses)). !!! note - Unlike in GitHub, a `LICENSE.md` file is not required. Instead, the license for the digital object is added through the `yaml` (for ease of API access) and further clarifications can be included in the License Section of the Dataset or Model card. + Unlike in GitHub, a `LICENSE.md` file is not supported. Instead, the license for the digital object is added through the `yaml` (for ease of API access) and further clarifications can be included in the License Section of the [Dataset](HF_DatasetCard_Template_mkdocs.md) or [Model](HF_ModelCard_Template_mkdocs.md) card. 
#### gitignore From 84074e95b3a0f1607ce8f06bd20c7a0996c869b0 Mon Sep 17 00:00:00 2001 From: Graham Taylor Date: Wed, 28 Jan 2026 13:02:25 -0500 Subject: [PATCH 8/8] fix: correct broken link and abbreviation in dataset card template Remove extra opening parenthesis in the Digital Products Release and Licensing Policy link that broke the markdown rendering. Also fix "eg." to "e.g." on the same line for correctness. --- docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md b/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md index c957253..3eae3f2 100644 --- a/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md +++ b/docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md @@ -17,7 +17,7 @@ description: # Add a short description (summary) of your dataset, this will rend NOTE: Add more tags (your particular animal, type of model and use-case, etc.). -As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (eg., your PI, co-authors), select a license that is following the guidelines set forth in the [Imageomics Digital Products Release and Licensing Policy]((https://imageomics.github.io/Imageomics-guide/wiki-guide/Digital-products-release-licensing-policy/). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)). +As with your GitHub Project repo, it is important to choose an appropriate license for your dataset. 
The default license is [CC0](https://creativecommons.org/publicdomain/zero/1.0/) (public domain dedication, see [Dryad's explanation of why to use CC0](https://blog.datadryad.org/2023/05/30/good-data-practices-removing-barriers-to-data-reuse-with-cc0-licensing/)). Alongside the appropriate stakeholders (e.g., your PI, co-authors), select a license following the guidelines set forth in the [Imageomics Digital Products Release and Licensing Policy](https://imageomics.github.io/Imageomics-guide/wiki-guide/Digital-products-release-licensing-policy/). For Datasets, this would be public domain or terms no more restrictive than requiring attribution (e.g., [CC-BY](https://creativecommons.org/licenses/by/4.0/)). For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com). List of [HF license identifiers](https://huggingface.co/docs/hub/en/repositories-licenses) (for yaml). See more options for the above information by clicking "edit dataset card" on your repo.