-
Notifications
You must be signed in to change notification settings - Fork 144
Move crawl_n_mask to regexp based logic #3447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move crawl_n_mask to regexp based logic #3447
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c91e6575b2cf45ada9e7e62c677560bc ❌ openstack-k8s-operators-content-provider FAILURE in 14m 33s |
|
Wondering how fast it would be on some uni jobs. |
c19610d to
a87b080
Compare
a87b080 to
e1ee687
Compare
e1ee687 to
e8791f5
Compare
0e187d9 to
8943d08
Compare
8943d08 to
ba6f9b6
Compare
202fa1f to
bad1362
Compare
bad1362 to
2bb565a
Compare
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/21e828d0be8f49d88841f346de47d0c4 ❌ openstack-k8s-operators-content-provider FAILURE in 12m 59s |
2bb565a to
eb06caf
Compare
eb06caf to
eba6c94
Compare
|
recheck |
d0d2d6e to
7e93be6
Compare
7e93be6 to
8acc307
Compare
|
recheck |
1 similar comment
|
recheck |
|
recheck |
1 similar comment
|
recheck |
8acc307 to
c446d8f
Compare
michburk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we kept track of a list of the secret strings that actually matter so we could do a simple find and replace for each of those important strings. No matter the file format they appear in, we could mask just those strings and not worry about all of this complicated regex and rebuilding strings to try and preserve the file structure. I imagine this wouldn't be easy to do, though.
That being said, this version seems fast enough to me and masks more secrets than the current version, so I'm fine with it if others are as well!
Thanks for the review. About:
What do you mean? |
c446d8f to
849aa18
Compare
849aa18 to
d7ee813
Compare
We're moving crawl_n_mask to regexp based logic which is able to mask any type of file (json, yaml, txt, log). Regexps search for exact words for avoiding false positives (We don't want to maske SecretName when searching for Secret keyword. This is the reason of increase PROTECT_KEYS impacts positive in the performance. New version also capable to mask two secrets in the same log line. Avoiding masking Ansible headers such Task and Play. Added multiprocessing support for parallel file masking. Added file masking integration tests, with temporary files. AI-Assisted: This change was developed with assistance from Claude (Anthropic's AI assistant) for code refactoring, test development, and PEP 8 compliance." Signed-off-by: Enrique Vallespi Gil <evallesp@redhat.com>
d7ee813 to
b904f90
Compare
|
IMHO we started to go too deeply. Would be interesting what requirement sec team says. 99% things are generated via CI job, find the missing 1% is difficult, but for sure issue is not that we should mask them in logs, but mask them before it goes to the logs - they never should be in the logs. |
I was mostly just thinking out loud saying it would be nice if we knew the strings for the secrets that actually mattered, so we could find and replace just those strings. A lot of the strings we end up masking are obviously unimportant, like 'password', 'null', or 'MTIzNDU2Nzg=' (which is just 12345678 base64-encoded). If we knew that the string 'importantsecret123' was important, we could do a find + replace for just 'importantsecret123' rather than trying to figure out which fields in which file formats might contain a key that has 'importantsecret123' as the value. I think Daniel's idea is even better though - if we know some secret string is important, it should never make it to the logs in the first place :) For now though this approach does seem to increase the number of actually important secrets masked, and the performance penalty doesn't seem too big IMO. |
|
let's say: it is fine. |
We're moving crawl_n_mask to regexp based logic which is able
to mask any type of file (json, yaml, txt, log).
Regexps search for exact words for avoiding false positives (We don't
want to maske SecretName when searching for Secret keyword. This is
the reason of increase PROTECT_KEYS impacts positive in the performance.
New version also capable to mask two secrets in the same log line.
Avoiding masking Ansible headers such Task and Play.
Added multiprocessing support for parallel file masking.
Added file masking integration tests, with temporary files.
AI-Assisted: This change was developed with assistance from Claude
(Anthropic's AI assistant) for code refactoring, test development,
and PEP 8 compliance."