Privacy Statement #476

Draft
thehabes wants to merge 6 commits into main from 474-privacy-statement

@thehabes
Member

@thehabes thehabes commented Feb 16, 2026

Closes #474

This prompted an audit to find places where TPEN Services and TPEN Interfaces may expose an E-mail address: any endpoint where E-mail strings end up in a response body, and anywhere Interfaces has enough access to an E-mail address that its code could expose it.

This audit was performed by Claude Code, and only once, as a general "let's see what we find" pass. An exhaustive list would require more effort.

Interfaces main branch audit
image

Services development branch audit
image

@github-actions
Contributor

github-actions bot commented Feb 16, 2026

@thehabes thehabes self-assigned this Feb 16, 2026
privacy.html Outdated
<li><strong>api.t-pen.org</strong> &mdash; TPEN3 services API
(<a href="https://api.t-pen.org/API.html" target="_blank"
rel="noopener noreferrer">documentation</a>)</li>
<li><strong>static.t-pen.org</strong> &mdash; Published project manifests</li>

"Published Project Resources" is probably more clear, since it may not just be Manifests and is always a thing we create for a specific project.

privacy.html Outdated
<p>
Our services are available at:
</p>
<ul>

This feels like a table layout would be tidier.

Should this whole thing be a markdown file on three.t-pen.org? It doesn't feel like an interface.

privacy.html Outdated
<h3>A. Account &amp; Authentication Data</h3>
<p>
When you create an account, we collect your <strong>email address</strong> through our authentication
provider, Auth0. Authentication generates a JSON Web Token (JWT) that contains your user ID, an agent

It might be worth rephrasing this: Auth0 is the identity provider for Rerum services, of which this is one. The public agent IRI is created and associated with the login token. T-PEN.org does not display or share user emails through its services.
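
As a rough illustration of what a token like the one described here carries, the payload of a JWT can be read by base64url-decoding its middle segment. This is only a sketch for Node.js: the sample token is built locally, and the claim names (`sub`, `http://example.org/agent`, `exp`) are assumptions for illustration, not the actual Auth0 or TPEN claim names. It decodes the payload and does not verify the signature.

```javascript
// Decode (NOT verify) the payload segment of a JWT. Runs in Node.js.
function decodeJwtPayload(token) {
  const parts = token.split(".")
  if (parts.length !== 3) throw new Error("Not a three-part JWT")
  // The payload is base64url-encoded JSON.
  const json = Buffer.from(parts[1], "base64url").toString("utf8")
  return JSON.parse(json)
}

// Hypothetical token built locally for demonstration; claim names are assumptions.
const b64 = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url")
const sampleToken = [
  b64({ alg: "none", typ: "JWT" }),
  b64({
    sub: "user-123",
    "http://example.org/agent": "https://example.org/id/agent-abc",
    exp: 1771200000
  }),
  "signature-placeholder"
].join(".")

const payload = decodeJwtPayload(sampleToken)
console.log(Object.keys(payload))
```

Anything encoded in the payload is readable by whichever application holds the token, which is why the choice of claims matters for a privacy statement.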

privacy.html Outdated
You may optionally provide the following for your public profile. None of these are required:
</p>
<ul>
<li>Display name</li>

Display name is generated from your email handle by default, so if it is not changed it might be usable as an identifying mark.

privacy.html Outdated
identifier (IRI), and an expiration timestamp. No passwords are stored by TPEN directly.
</p>

<h3>B. Profile Information (Optional)</h3>

More than "optional" this is all opt-in sharing with other TPEN Users.

<h3>C. User-Generated Content</h3>
<ul>
<li><strong>Transcriptions and annotations</strong> &mdash; Stored in RERUM (our linked open data
store) and publicly accessible by design for open scholarship</li>

Is it worth saying that the URLs are obscure but openly available?

store) and publicly accessible by design for open scholarship</li>
<li><strong>Feedback and bug reports</strong> &mdash; Submitted through TPEN and posted as GitHub Issues
(includes your description and the page URL you submitted from)</li>
<li><strong>Transcription drafts</strong> &mdash; Auto-saved in your browser's local storage to prevent

Also the downloaded resources and idToken, if this is the place for that.

privacy.html Outdated

<h3>D. Project &amp; Collaboration Data</h3>
<ul>
<li>Project membership and your assigned role (Owner, Leader, Contributor, or Viewer)</li>

(linked to your public Agent)

privacy.html Outdated
<h3>D. Project &amp; Collaboration Data</h3>
<ul>
<li>Project membership and your assigned role (Owner, Leader, Contributor, or Viewer)</li>
<li>Email addresses of users you invite to collaborate on projects</li>

Do we collect this or just send the emails? I thought we had an "invite code" to reconnect these later?

privacy.html Outdated
<ul>
<li>Project membership and your assigned role (Owner, Leader, Contributor, or Viewer)</li>
<li>Email addresses of users you invite to collaborate on projects</li>
<li>Project modification timestamps</li>

This isn't a data collection thing.

privacy.html Outdated

<h3>E. Activity Data</h3>
<p>
We track limited activity metrics on the server to power features like the "continue working" panel on

"per User metrics" is maybe more specific?

privacy.html Outdated
<td>Legitimate interest (user experience)</td>
</tr>
<tr>
<td>Auto-save transcription drafts</td>

These aren't saved on the server or even sent through the Internet, so I don't think we need to claim them.

privacy.html Outdated
<td>Legitimate interest (prevent data loss)</td>
</tr>
<tr>
<td>Process feedback and bug reports</td>

This is a GitHub submission, as we state above, so I don't think this is User Information.

privacy.html Outdated
<h3>A. Publicly Accessible Information</h3>
<ul>
<li><strong>Transcriptions and annotations</strong> are publicly accessible via
<a href="https://store.rerum.io" target="_blank" rel="noopener noreferrer">RERUM</a> and

I would link rerum.io, as a user-facing site rather than store.rerum.io, which would be confusing.

privacy.html Outdated
<h3>B. Shared with Project Collaborators</h3>
<p>
Members of projects you belong to can see your display name and role within that project. Email
addresses are used for sending invitations but are not displayed to other collaborators through the

I don't think it is available to the interfaces at all, which maybe is worth noting.

privacy.html Outdated
When you submit feedback or a bug report through TPEN, it creates a public GitHub Issue in our
<a href="https://github.com/CenterForDigitalHumanities/TPEN-Static" target="_blank"
rel="noopener noreferrer">TPEN-Static repository</a>. The issue includes your description text and the page URL from which you submitted.
Your email address is not included in the GitHub Issue.

No user information at all is included in the reports. Only what they type and where they came from.

privacy.html Outdated
<strong>RERUM</strong> (store.rerum.io) is our linked open data store for transcriptions and
annotations. RERUM is operated by the Research Computing Group at Saint Louis University. Transcriptions
stored in RERUM are publicly accessible by design to support open scholarship and are attributed to user
agent identifiers.

"public user agent"

</p>

<h3>C. Email Delivery</h3>
<p>

Not quite. We use a mailrelay server, so SLU only sees the address because it isn't encrypted (as far as I know). There is an option for secure mail which we might be using. Point is, I don't think this needs to be disclosed other than to say it is only used for sending invites and is not stored for any purpose.

privacy.html Outdated
rel="noopener noreferrer">jsDelivr Privacy Policy</a></td>
</tr>
<tr>
<td>unpkg.com</td>

Do we really have Chota in the TPEN stack somewhere? I bet we can take that out.

privacy.html Outdated
rel="noopener noreferrer">Pixabay Privacy Policy</a></td>
</tr>
<tr>
<td>OpenCV (docs.opencv.org)</td>

I think libraries used inside of interfaces can be handwaved to say that TPEN3 is built on modular interfaces that we do not completely control and may call on libraries and frameworks served over CDNs.

privacy.html Outdated
rel="noopener noreferrer">OpenCV Privacy Policy</a></td>
</tr>
<tr>
<td>Lucid (corporate-assets.lucid.co)</td>

I'll have to find this one...

privacy.html Outdated
</tbody>
</table>

<h3>F. External IIIF Servers</h3>

I would get rid of IIIF language here. It is the case for all Internet Resources, specifically images and manifest documents.

privacy.html Outdated

<h3>C. What Happens When You Log Out</h3>
<p>
All localStorage data is cleared when you log out, including your authentication token, cached resources,

Is this true? I think we just get rid of the token.

privacy.html Outdated
and transcription drafts.
</p>

<h3>D. What We Do Not Use</h3>

delete

privacy.html Outdated
<ul>
<li><strong>Encrypted transmission</strong> &mdash; All data is transmitted over HTTPS (TLS)</li>
<li><strong>Token-based authentication</strong> &mdash; JWT tokens with automatic expiration</li>
<li><strong>Secure cookie attributes</strong> &mdash; <code>Secure</code> and

This is so minimal I would not keep restating it.

privacy.html Outdated
<li><strong>Open-source codebase</strong> &mdash; Publicly auditable code allows community
security review</li>
<li><strong>Token URL cleanup</strong> &mdash; Authentication tokens are immediately removed from
the browser URL after login to prevent accidental sharing</li>

"browser history"

privacy.html Outdated
on shared computers.
</p>
<p>
In the event of a data breach affecting your personal information, we will notify affected users and

I wouldn't make this promise, since we do not have a good way to notify people.

privacy.html Outdated

<h3>C. Local Storage</h3>
<p>
All browser-stored data (authentication tokens, drafts, cached resources) is cleared immediately when

again, maybe not, but it could be

rights under the General Data Protection Regulation:
</p>
<ul>
<li><strong>Right of Access</strong> (Article 15) &mdash; Request a copy of the personal data we hold

We should ticket this as a simple report we can run for someone, so that eventually users can click to generate it on their own.

<li><strong>Right to Erasure</strong> (Article 17) &mdash; Request deletion of your personal data,
subject to the scholarly archiving exception in Article 17(3)(d) for transcriptions stored in
RERUM</li>
<li><strong>Right to Data Portability</strong> (Article 20) &mdash; Request your personal data in a

same as above

privacy.html Outdated
the sources, purposes, and third parties with whom we share it</li>
<li><strong>Right to Delete</strong> &mdash; Request deletion of your personal information, subject to
exceptions for scholarly research integrity</li>
<li><strong>Right to Correct</strong> &mdash; Request correction of inaccurate personal information</li>

maybe just say that all public information is editable by the user.

privacy.html Outdated
<ul>
<li><a href="https://three.t-pen.org" target="_blank" rel="noopener noreferrer">TPEN3 Home</a></li>
<li><a href="/about">About TPEN</a></li>
<li><a href="https://github.com/CenterForDigitalHumanities/TPEN-interfaces" target="_blank"

maybe just the organization page here?

privacy.html Outdated
<h3>D. Additional Resources</h3>
<ul>
<li><a href="https://three.t-pen.org" target="_blank" rel="noopener noreferrer">TPEN3 Home</a></li>
<li><a href="/about">About TPEN</a></li>

delete.

@thehabes
Member Author

Thanks for the thorough review! All comments have been addressed. Here's the breakdown:


Section 1 — Who We Are

  • "Published Project Resources" — Changed from "Published project manifests." Updated matching references in Sections 4A and 5D too.
  • Table layout — Converted the services bullet list to a Domain / Purpose table.

Section 2 — Information We Collect

  • Auth0 / RERUM identity provider — Rewrote using your suggested framing: Auth0 is the identity provider for RERUM services, public agent IRI is created and associated with the login token, T-PEN.org does not display or share user emails through its services.
  • "Opt-in" not "Optional" — Heading now reads "Profile Information (Opt-In)." Body says "opt in to sharing … with other TPEN users."
  • Display name from email handle — Added note: "defaults to your email handle and may be identifying if not changed."
  • Downloaded resources & idToken — Added a "Cached resources" bullet covering downloaded resources (images, manifest documents) and the idToken in localStorage.
  • "(linked to your public Agent)" — Added to the project membership bullet.
  • Invite emails — collect or just send? — Confirmed the invited email IS stored in MongoDB as a temporary user record. Rephrased to: "stored to connect invite codes to temporary accounts until the invitee accepts."
  • Project modification timestamps — Removed. Not a data collection item.
  • "per User metrics" — Changed from "limited activity metrics."

Section 3 — How We Use Your Information

  • Auto-save drafts row — Removed from table. Drafts never leave the browser; already disclosed in Section 6B (Local Storage).
  • Feedback / bug reports row — Removed from table. This is a GitHub submission; already covered in Section 4C.

Section 4 — Information Sharing

  • Link rerum.io not store.rerum.io — Done.
  • URLs obscure but openly available — Added: "These URLs are obscure but openly available to anyone with the direct link."
  • Email not available to interfaces — Now reads: "Email addresses are not available to the TPEN interfaces at all and are only used server-side for sending invitations."
  • No user info in feedback reports — Simplified to: "No user information is included in the report — only what you type and the page URL you submitted from."
  • "public user agent" — Replaced "user agent identifiers" / "user agent IRI" throughout (Sections 3, 4A, 5B, 8B).

Section 5 — Third-Party Services

  • Email delivery → mailrelay — Now says emails go through a "mailrelay server at Saint Louis University" and the address "is only used for sending the invitation and is not stored by the mail service for any other purpose."
  • CDN table → general statement — Replaced the 11-row table with: "TPEN3 is built on modular interfaces that may load libraries and frameworks served over third-party CDNs (such as jsDelivr, Cloudflare, Skypack, and unpkg)." Google Fonts is still named with a privacy policy link.
  • Chota CSS — Confirmed still in use (css/index.css imports from unpkg). Covered by the general statement.
  • Lucid diagram — Confirmed still in use (quick-guide.html). Covered by the general statement.
  • Drop IIIF language — Section renamed to "External Resource Servers." Rephrased to "manifest documents" and "images." Cleaned up remaining IIIF references in Sections 2C and 2F.

Section 6 — Cookies & Local Storage

  • Logout only removes the token — Confirmed: TPEN.logout() only calls localStorage.removeItem("userToken"). Fixed the false claim in Section 6C and 8C. Corrected Duration columns in the localStorage table for drafts, vault cache, and tpen_redirected.
  • "What We Do Not Use" subsection — Deleted entirely.
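
The difference between the token-only logout confirmed above and a broader cleanup can be sketched as follows. This is an illustration under stated assumptions, not the TPEN implementation: `MemoryStorage` is a stand-in for the browser's `localStorage` so the sketch runs in Node.js, and the `draft:` and `cache:` key prefixes are hypothetical. Only the `userToken` key comes from this thread.

```javascript
// Minimal in-memory stand-in for the browser's localStorage so this runs in Node.js.
class MemoryStorage {
  constructor() { this.map = new Map() }
  setItem(key, value) { this.map.set(key, String(value)) }
  getItem(key) { return this.map.has(key) ? this.map.get(key) : null }
  removeItem(key) { this.map.delete(key) }
  keys() { return [...this.map.keys()] }
}

// Behavior described in this thread: only the token is removed.
function logoutTokenOnly(storage) {
  storage.removeItem("userToken")
}

// Hypothetical alternative: also remove keys under app-owned prefixes.
// The prefixes are assumptions, not actual TPEN key names.
function logoutAndClear(storage, prefixes = ["draft:", "cache:"]) {
  storage.removeItem("userToken")
  for (const key of storage.keys()) {
    if (prefixes.some((p) => key.startsWith(p))) storage.removeItem(key)
  }
}

const store = new MemoryStorage()
store.setItem("userToken", "abc")
store.setItem("draft:page-1", "partial transcription")
logoutTokenOnly(store)
console.log(store.keys()) // the draft key is still present
```

Whichever behavior is chosen, the statement's retention language should describe it exactly, since drafts left behind on a shared computer are the practical risk.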

Section 7 — Security

  • Secure cookie attributes bullet — Removed. Already stated in the cookie table.
  • "browser history" — Changed from "browser URL."
  • Breach notification — Softened to: "we will make reasonable efforts to notify affected users and relevant authorities in accordance with GDPR and applicable US state regulations."
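
The token URL cleanup mentioned in this section could look roughly like the sketch below. The `idToken` parameter name and the example URL are assumptions for illustration; the actual interface code may differ.

```javascript
// Return the same URL with the token query parameter removed.
// "idToken" as the parameter name is an assumption for this sketch.
function stripTokenParam(href, paramName = "idToken") {
  const url = new URL(href)
  url.searchParams.delete(paramName)
  return url.toString()
}

// In a browser, the cleaned URL would replace the current history entry
// so the token does not stay in the address bar:
//   history.replaceState(null, "", stripTokenParam(location.href))

const cleaned = stripTokenParam(
  "https://example.org/project?projectID=42&idToken=abc.def.ghi"
)
console.log(cleaned)
```

Other query parameters are left untouched, so a link copied after login still points at the same page.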

Section 8 — Data Retention

  • Local storage retention — Corrected to: "Your authentication token is removed from browser storage when you log out. Cached resources and transcription drafts may persist until cleared manually or by your browser."

Sections 9–10 — GDPR / CCPA Rights

  • Self-service data report — Noted for a future ticket. Leaving language as-is for now.
  • Right to Correct — Simplified to: "All public information is editable by you through your profile settings."

Section 13 — Changes & Contact

  • "About TPEN" link — Removed.
  • GitHub link → org page — Now links to github.com/CenterForDigitalHumanities instead of the specific repo.

Meta

  • Move to three.t-pen.org? — Agreed this isn't really an interface. Keeping it here to finish the review cycle; will migrate afterward.

@thehabes
Member Author

thehabes commented Feb 18, 2026

The claim "Email addresses are not available to the TPEN interfaces at all" is incorrect. The interfaces can and do access emails in these cases:

  1. Your own email — shown to you on your profile page
  2. Collaborator emails as fallback names — shown on the manage page when a member has no display name set
  3. Invitee's own email — shown on the decline-invitation page
  4. Legacy import emails — shown to the owner during TPEN 2.8 import

Would you like me to correct lines 215 and 340 in privacy.html to reflect this? The manage page fallback (#2) is the most notable — it means other project members can potentially see your email.

It does seem like Classes may be able to get to that information, in which case it could be shown by an interface. I am not sure where the API will respond with those E-mail addresses, but essentially if TPEN Services returns it as part of a response an interface can pick it up.
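
One way to make this kind of audit repeatable is to scan sample response bodies for email-like strings. The following is a minimal sketch, assuming the response has already been parsed as JSON; the sample body and its field names are hypothetical, and the pattern is intentionally loose (a screening aid, not a validator).

```javascript
// Recursively collect the paths of string values that look like email addresses.
const EMAIL_LIKE = /[^\s@]+@[^\s@]+\.[^\s@]+/

function findEmailPaths(value, path = "$", found = []) {
  if (typeof value === "string") {
    if (EMAIL_LIKE.test(value)) found.push(path)
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => findEmailPaths(item, `${path}[${i}]`, found))
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      findEmailPaths(child, `${path}.${key}`, found)
    }
  }
  return found
}

// Hypothetical response body; the field names are assumptions.
const body = {
  label: "Sample project",
  collaborators: [
    { displayName: "reader", roles: ["VIEWER"] },
    { displayName: "someone@example.org", roles: ["OWNER"] }
  ]
}
console.log(findEmailPaths(body)) // [ '$.collaborators[1].displayName' ]
```

Run against recorded responses from each endpoint, a check like this would flag fields such as a display name that silently fell back to an email address.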

@cubap
Member

cubap commented Feb 18, 2026

Okay.

Chota CSS — Confirmed still in use (css/index.css imports from unpkg). Covered by the general statement.
Lucid diagram — Confirmed still in use (quick-guide.html). Covered by the general statement.

These are on their way out, so we can remove them from the specific call-outs.

RE: Emails. I think we should outline how we use these more clearly. TPEN doesn't store them (except invites), but they are encoded in the Rerum User payload from Auth0, so any application you are authenticated with will be able to see them. When you invite another user, they see the email and not just the user who sent the invite (maybe we don't need to do that). I think we definitely don't need to show the email as a fallback if there is no displayName, as we already say the handle could be used. We should just show the pre-@ bit in that case.
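
The "pre-@ bit" fallback suggested here could be sketched like this. The `displayName` and `email` property names and the "Unnamed member" label are assumptions for illustration, not the actual TPEN data model.

```javascript
// Pick a name to show for a member without exposing a full email address.
// The `displayName` and `email` property names are assumptions.
function publicName(member) {
  const name = (member.displayName ?? "").trim()
  if (name) return name
  const email = member.email ?? ""
  const at = email.indexOf("@")
  // Use only the part before "@"; fall back to a generic label if there is none.
  const handle = at > 0 ? email.slice(0, at) : ""
  return handle || "Unnamed member"
}

console.log(publicName({ email: "jane.doe@example.org" })) // jane.doe
console.log(publicName({ displayName: "Jane", email: "jane.doe@example.org" })) // Jane
```

The handle can still be identifying, as noted earlier in the review, but the domain is never shown.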

Development

Successfully merging this pull request may close these issues.

Privacy Statement & Legal Requirements
