Rust: Add generated models for standard libraries including core #18787
paldepind merged 6 commits into github:main
PR Overview
This PR adds generated models for standard Rust libraries (core, std, alloc, and proc_macro) and updates the associated tests. The key changes are:
- Updating a test annotation in dataflow/strings/main.rs from "hasTaintFlow" to "hasValueFlow".
- Removing several taint and value model entries in lang-core.model.yml to better scale with the new models.
Changes
| File | Description |
|---|---|
| rust/ql/test/library-tests/dataflow/strings/main.rs | Updated test comment annotation to reflect new model expectations. |
| rust/ql/lib/codeql/rust/frameworks/stdlib/lang-core.model.yml | Removed several model entries to adjust for increased model volume. |
Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
rust/ql/test/library-tests/dataflow/strings/main.rs:53
- Please ensure that changing the annotation from 'hasTaintFlow' to 'hasValueFlow' aligns with the updated test expectations and model semantics.
sink(s2); // $ hasValueFlow=36
rust/ql/lib/codeql/rust/frameworks/stdlib/lang-core.model.yml:7
- Review the removal of the model entry for 'crate::hint::must_use' to ensure that tests or taint propagation flows are still adequately covered.
- - ["lang:core", "crate::hint::must_use", "Argument[0]", "ReturnValue", "value", "manual"]
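The distinction behind the annotation change can be illustrated with a small sketch. The functions below are hypothetical and are not taken from `main.rs`. They only show the difference between a step that hands on the same value and a step that produces a new value derived from its input, which is roughly what value flow and taint flow distinguish. Which standard library functions the generated models classify one way or the other is not something this sketch claims.

```rust
// Hypothetical helpers for illustration; not part of the CodeQL test suite.

/// Returns a string with exactly the same contents as the input
/// (the kind of step a "value" summary describes).
fn value_step(s: &str) -> String {
    s.to_owned()
}

/// Returns a new string that merely contains the input
/// (the kind of step a "taint" summary describes).
fn taint_step(s: &str) -> String {
    format!("prefix-{s}")
}

fn main() {
    let source = "secret";
    // The value step yields the same data.
    assert_eq!(value_step(source), source);
    // The taint step yields different data that is still derived from the source.
    assert_ne!(taint_step(source), source);
    assert!(taint_step(source).contains(source));
}
```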
input = "Argument[self]" and
output = "ReturnValue" and
preservesValue = true and
model = "generated"
Previously this had a model of "" and it seemed to be disabled/overwritten by the generated models. The generated models include a model for clone on i64, which caused the test for this method to fail. Changing the model to generated or manual fixed the problem. I just went with generated without worrying too much as this is temporary anyway.
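For context, the summary in the fragment above (from `Argument[self]` to `ReturnValue`, value-preserving) matches what `clone` does on an `i64` at runtime. A minimal sketch, where the helper name is made up for illustration:

```rust
// Illustrative only: `clone_preserves_value` is a hypothetical helper.
fn clone_preserves_value(source: i64) -> i64 {
    // `i64` is `Copy`, so `clone` returns an identical value.
    source.clone()
}

fn main() {
    let source: i64 = 36;
    let copy = clone_preserves_value(source);
    // The receiver and the return value are equal, which is what a
    // value-preserving summary from the receiver to the result expresses.
    assert_eq!(copy, source);
}
```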
  not exists(n.asExpr().getLocation())
}

predicate postWithInFlowExclude(RustDataFlow::Node n) { n instanceof Node::FlowSummaryNode }
This fixes some data flow inconsistencies otherwise introduced by the new models. Ruby and C# have the same, so I think this is appropriate.
  | Macro calls - total | 2 |
  | Macro calls - unresolved | 0 |
- | Taint edges - number of edges | 4 |
+ | Taint edges - number of edges | 1465 |
A 366x increase in taint edges 📈 😃
Sounds great, though I wonder what they all are. I'm assuming hello-project is pretty basic.
DCA shows taint reach going down by 1 on the iced project. That's unexpected, but in the tests things look good, so I don't think there's much to worry about.
Those tests have been starting to irritate me even before we started adding generated models. Thanks for cleaning them up. 👍
This is very minor, but surprising; surprising enough that it might be worth investigating. If you download the database from DCA, you could try to narrow down which taint edges we had before the changes here but not afterwards.
geoffw0
left a comment
Looks really good, a few points to discuss, and I should really review a few more of the models (at random)...
- ["lang:core", "<crate::result::Result>::unwrap_or", "Argument[self].Field[crate::result::Result::Ok(0)]", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_default", "Argument[self].Field[crate::result::Result::Ok(0)]", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[0].ReturnValue", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[self].Field[crate::result::Result::Err(0)].Reference", "ReturnValue", "value", "dfc-generated"]
I don't see why this is true (the described method is here). Though (assuming I'm right) I doubt the model will do much harm anyway.
Nicely spotted! That model is indeed odd, both because the error value is not directly returned and because there are no references involved. The latter might be due to a mistakenly inserted reference read step.
In any case, the implementation is very simple, so I would expect the model to be accurate. I've created an internal issue for me to fix this.
- ["lang:core", "<crate::result::Result>::unwrap_or_default", "Argument[self].Field[crate::result::Result::Ok(0)]", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[0].ReturnValue", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[self].Field[crate::result::Result::Err(0)].Reference", "ReturnValue", "value", "dfc-generated"]
- ["lang:core", "<crate::result::Result>::unwrap_or_else", "Argument[self].Field[crate::result::Result::Err(0)]", "Argument[0].Parameter[0]", "value", "dfc-generated"]
On the other hand this model is perfect and I missed it in the manual models. ✨
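The two observations about `unwrap_or_else` can be checked against its runtime behaviour. The sketch below uses made-up helper names. It shows that the `Err` payload is handed to the closure as its first parameter, and that in the `Err` case the return value comes from the closure's result, not directly from the payload.

```rust
// Hypothetical helpers for illustration only.

/// On `Ok`, the payload is returned as-is. On `Err`, the error is passed
/// to the closure and the closure's result is returned.
fn ok_or_len(input: Result<usize, String>) -> usize {
    input.unwrap_or_else(|err| err.len())
}

/// Here the error does reach the return value, but only because the
/// closure chooses to return its parameter.
fn ok_or_err(input: Result<String, String>) -> String {
    input.unwrap_or_else(|err| err)
}

fn main() {
    // Ok(0) flows to the return value.
    assert_eq!(ok_or_len(Ok(7)), 7);
    // Err(0) flows to the closure parameter; the closure result is returned.
    assert_eq!(ok_or_len(Err("boom".to_string())), 4);
    // Err(0) -> closure parameter -> closure return value -> return value.
    assert_eq!(ok_or_err(Err("boom".to_string())), "boom");
}
```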
Spotting mistakes like the one in |
geoffw0
left a comment
Yep, I agree we should merge this ASAP, but continue discussions about possible follow-up improvements.
I think I just created some merge conflicts by merging #18701; let me know if you need any help untangling what happened there (I expect mostly it's those .expected files that change too often).
pack: codeql/rust-all
extensible: summaryModel
data:
- ["lang:std", "<&[u8] as crate::io::BufRead>::consume", "Argument[self].Element", "Argument[self].Reference.Reference", "value", "dfc-generated"]
This is another weird edge involving references. Based on the description I don't think consume should have any taint flows.
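As a reference point for this discussion, here is a small sketch of what `consume` does on a `&[u8]` reader. The helper name is hypothetical. The call returns nothing and only advances the slice, so any data that remains visible afterwards is visible through the receiver itself. The sketch does not settle how that should be modelled.

```rust
use std::io::BufRead;

// Hypothetical helper for illustration only.
// `amt` must not exceed `data.len()`.
fn remaining_after_consume(data: &[u8], amt: usize) -> Vec<u8> {
    let mut reader: &[u8] = data;
    // `consume` returns `()`; it only moves the start of the slice forward.
    BufRead::consume(&mut reader, amt);
    reader.to_vec()
}

fn main() {
    // After consuming two bytes, the reader refers to the rest of the buffer.
    assert_eq!(remaining_after_consume(b"hello", 2), b"llo".to_vec());
}
```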
This adds generated models for some of the standard Rust libraries: core, std, alloc, and proc_macro.
We had some tests whose .expected output grew with the number of models or the number of taint steps caused by models. That didn't scale well to the new number of models, so I've tweaked those tests.