Skip to content

Conversation

@evallesp
Copy link
Contributor

@evallesp evallesp commented Oct 30, 2025

We're moving crawl_n_mask to regexp based logic which is able
to mask any type of file (json, yaml, txt, log).

Regexps search for exact words for avoiding false positives (We don't
want to maske SecretName when searching for Secret keyword. This is
the reason of increase PROTECT_KEYS impacts positive in the performance.

New version also capable to mask two secrets in the same log line.

Avoiding masking Ansible headers such Task and Play.
Added multiprocessing support for parallel file masking.

Added file masking integration tests, with temporary files.

AI-Assisted: This change was developed with assistance from Claude
(Anthropic's AI assistant) for code refactoring, test development,
and PEP 8 compliance."

@evallesp evallesp requested a review from a team as a code owner October 30, 2025 10:30
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 30, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c91e6575b2cf45ada9e7e62c677560bc

openstack-k8s-operators-content-provider FAILURE in 14m 33s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ noop SUCCESS in 0s
cifmw-pod-ansible-test FAILURE in 5m 31s
cifmw-pod-pre-commit FAILURE in 6m 31s
✔️ build-push-container-cifmw-client SUCCESS in 17m 38s

@danpawlik
Copy link
Contributor

Wondering how fast it would be on some uni jobs.
Also don't think that the crawl_n_mask is a solution. The data should not be "leaked" into the logs. It means, that the issue is not that in some places we are dumping host_vars, but why we put those vars to be fact and why after use they are just not vanished?

@evallesp evallesp changed the title DNM: Move crawl_n_mask to regexp based logic Move crawl_n_mask to regexp based logic Nov 2, 2025
@evallesp evallesp force-pushed the unify-crawl-n-mask branch 3 times, most recently from 0e187d9 to 8943d08 Compare November 2, 2025 20:48
@evallesp evallesp force-pushed the unify-crawl-n-mask branch 2 times, most recently from 202fa1f to bad1362 Compare November 2, 2025 21:21
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/21e828d0be8f49d88841f346de47d0c4

openstack-k8s-operators-content-provider FAILURE in 12m 59s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 30s
✔️ noop SUCCESS in 0s
cifmw-pod-ansible-test FAILURE in 5m 47s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 45s
✔️ build-push-container-cifmw-client SUCCESS in 16m 02s
✔️ cifmw-molecule-artifacts SUCCESS in 4m 41s

@evallesp
Copy link
Contributor Author

evallesp commented Nov 3, 2025

recheck

@evallesp
Copy link
Contributor Author

evallesp commented Nov 5, 2025

recheck

1 similar comment
@evallesp
Copy link
Contributor Author

evallesp commented Nov 5, 2025

recheck

@evallesp
Copy link
Contributor Author

evallesp commented Nov 5, 2025

recheck

1 similar comment
@evallesp
Copy link
Contributor Author

evallesp commented Nov 6, 2025

recheck

michburk
michburk previously approved these changes Nov 6, 2025
Copy link
Contributor

@michburk michburk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we kept track of a list of the secret strings that actually matter so we could do a simple find and replace for each of those important strings. No matter the file format they appear in, we could mask just those strings and not worry about all of this complicated regex and rebuilding strings to try and preserve the file structure. I imagine this wouldn't be easy to do, though.

That being said, this version seems fast enough to me and masks more secrets than the current version, so I'm fine with it if others are as well!

@evallesp
Copy link
Contributor Author

evallesp commented Nov 7, 2025

It would be nice if we kept track of a list of the secret strings that actually matter so we could do a simple find and replace for each of those important strings. No matter the file format they appear in, we could mask just those strings and not worry about all of this complicated regex and rebuilding strings to try and preserve the file structure. I imagine this wouldn't be easy to do, though.

That being said, this version seems fast enough to me and masks more secrets than the current version, so I'm fine with it if others are as well!

Thanks for the review.
Now next step is refinement that list, and try to involve other teams to identify which we're masking that are not important and which we should mask asap.

About:

we could mask just those strings and not worry about all of this complicated regex and rebuilding strings to try and preserve the file structure

What do you mean?
Thanks for your review!

We're moving crawl_n_mask to regexp based logic which is able
to mask any type of file (json, yaml, txt, log).

Regexps search for exact words for avoiding false positives (We don't
want to maske SecretName when searching for Secret keyword. This is
the reason of increase PROTECT_KEYS impacts positive in the performance.

New version also capable to mask two secrets in the same log line.

Avoiding masking Ansible headers such Task and Play.
Added multiprocessing support for parallel file masking.

Added file masking integration tests, with temporary files.

AI-Assisted: This change was developed with assistance from Claude
(Anthropic's AI assistant) for code refactoring, test development,
and PEP 8 compliance."

Signed-off-by: Enrique Vallespi Gil <evallesp@redhat.com>
@danpawlik
Copy link
Contributor

IMHO we started to go too deeply. Would be interesting what requirement sec team says. 99% things are generated via CI job, find the missing 1% is difficult, but for sure issue is not that we should mask them in logs, but mask them before it goes to the logs - they never should be in the logs.
If required to mask all data, let's go.

@michburk
Copy link
Contributor

michburk commented Nov 7, 2025

About:

we could mask just those strings and not worry about all of this complicated regex and rebuilding strings to try and preserve the file structure

What do you mean? Thanks for your review!

I was mostly just thinking out loud saying it would be nice if we knew the strings for the secrets that actually mattered, so we could find and replace just those strings. A lot of the strings we end up masking are obviously unimportant, like 'password', 'null', or 'MTIzNDU2Nzg=' (which is just 12345678 base64-encoded). If we knew that the string 'importantsecret123' was important, we could do a find + replace for just 'importantsecret123' rather than trying to figure out which fields in which file formats might contain a key that has 'importantsecret123' as the value.

I think Daniel's idea is even better though - if we know some secret string is important, it should never make it to the logs in the first place :)

For now though this approach does seem to increase the number of actually important secrets masked, and the performance penalty doesn't seem too big IMO.

@danpawlik
Copy link
Contributor

let's say: it is fine.

@danpawlik danpawlik merged commit 83592de into openstack-k8s-operators:main Nov 13, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants