-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Remove dependency on rust_decimal, remove ignore of RUSTSEC-2026-0001
#19666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rust_decimal, remove ignore of RUSTSEC-2026-0001
| # underlying rkyv is patched, but rustsec database not yet updated | ||
| # Can remove when this is merged: https://github.com/rustsec/advisory-db/pull/2565 | ||
| run: cargo audit --ignore RUSTSEC-2026-0001 | ||
| # Note: you can ignore specific RUSTSEC issues using the `--ignore` flag ,for example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is added in #19657, but unecessary as this PR removes the dependency on rkyv
|
|
||
| let start = Instant::now(); | ||
| let rows = self.get_client().query(sql, &[]).await?; | ||
| let messages = self.get_client().simple_query(sql).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by cleanup to improve getting types -- use a prepared statement rather than running and looking at the first row
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now always makes two queries - one for the types and another for the messages.
Before it was making the second query only if there were no messages.
Wouldn't this increase the time to run the test suite ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right -- I will fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with this more -- the reason there are two queries now is that the "simple_query" API returns only text (no types) but the sqllogictest harness needs to get type information
However, if we use the prepared statement API that returns types, we must also then have code to parse the postgres binary numeric representation (what I think we are using the rust_decimal crate for).
On the balance, I think planning the query twice, while it does have additional overhead, is better than a bunch of custom numeric parsing code. Given the small number of queries that are run in this mode I suspect we won't be able to even tell the difference.
I will add comments explaining this rationale
In case we want to avoid the second planning, here is the code that I wrote with codex (I am not 100% sure it correctly implements numeric parsing):
fn convert_rows(rows: &[Row]) -> Vec<Vec<String>> {
rows.iter()
.map(|row| {
row.columns()
.iter()
.enumerate()
.map(|(idx, column)| cell_to_string(row, column, idx))
.collect::<Vec<String>>()
})
.collect::<Vec<_>>()
}
fn cell_to_string(row: &Row, column: &Column, idx: usize) -> String {
// Decode binary values by Postgres type to keep normalization aligned with
// the DataFusion engine output.
match column.type_().clone() {
Type::CHAR => {
let value: Option<i8> = row.get(idx);
value
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::INT2 => {
let value: Option<i16> = row.get(idx);
value
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::INT4 => {
let value: Option<i32> = row.get(idx);
value
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::INT8 => {
let value: Option<i64> = row.get(idx);
value
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::NUMERIC => {
let value: Option<PgNumeric> = row.get(idx);
match value {
Some(PgNumeric(PgNumericValue::Number(value))) => decimal_to_str(value),
Some(PgNumeric(PgNumericValue::NaN)) => "NaN".to_string(),
None => NULL_STR.to_string(),
}
}
Type::DATE => {
let value: Option<NaiveDate> = row.get(idx);
value
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::TIME => {
let value: Option<NaiveTime> = row.get(idx);
value
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::TIMESTAMP => {
let value: Option<NaiveDateTime> = row.get(idx);
value
.map(|value| format!("{value:?}"))
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::BOOL => {
let value: Option<bool> = row.get(idx);
value.map(bool_to_str).unwrap_or_else(|| NULL_STR.to_string())
}
Type::BPCHAR | Type::VARCHAR | Type::TEXT => {
let value: Option<&str> = row.get(idx);
value
.map(varchar_to_str)
.unwrap_or_else(|| NULL_STR.to_string())
}
Type::FLOAT4 => {
let value: Option<f32> = row.get(idx);
value.map(f32_to_str).unwrap_or_else(|| NULL_STR.to_string())
}
Type::FLOAT8 => {
let value: Option<f64> = row.get(idx);
value.map(f64_to_str).unwrap_or_else(|| NULL_STR.to_string())
}
Type::REGTYPE => {
let value: Option<u32> = row.get(idx);
value
.and_then(Type::from_oid)
.map(|value| value.to_string())
.unwrap_or_else(|| NULL_STR.to_string())
}
_ => unimplemented!("Unsupported type: {}", column.type_().name()),
}
}
/// Minimal `NUMERIC` decoder for Postgres' binary format.
///
/// We use it in sqllogictest to avoid pulling in `rust_decimal` while still
/// converting `NUMERIC` values into `BigDecimal` for normalized output.
///
/// Verification:
/// - The binary layout and field meanings come from the Postgres protocol docs:
/// https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-NUMERIC
/// - You can sanity-check correctness by comparing decoded values against the
/// server's text format, e.g. run `SELECT value::numeric, value::text` in
/// Postgres and ensure this parser produces the same normalized string.
enum PgNumericValue {
Number(BigDecimal),
NaN,
}
/// Wrapper type that implements `FromSql` for Postgres `NUMERIC` values.
///
/// This is intentionally scoped to the sqllogictest engine so we can parse
/// binary `NUMERIC` results in a single query without extra dependencies.
struct PgNumeric(PgNumericValue);
impl<'a> postgres_types::FromSql<'a> for PgNumeric {
fn from_sql(
ty: &Type,
raw: &'a [u8],
) -> Result<Self, Box<dyn std::error::Error + Sync + Send>> {
if *ty != Type::NUMERIC {
return Err("invalid type".into());
}
let mut buf = raw;
let ndigits = read_i16(&mut buf)? as i64;
let weight = read_i16(&mut buf)? as i64;
let sign = read_i16(&mut buf)?;
let _dscale = read_i16(&mut buf)?;
const NUMERIC_POS: i16 = 0x0000;
const NUMERIC_NEG: i16 = 0x4000;
const NUMERIC_NAN: i16 = 0xC000u16 as i16;
if sign == NUMERIC_NAN {
return Ok(PgNumeric(PgNumericValue::NaN));
}
let mut bigint = BigInt::from(0);
for _ in 0..ndigits {
let digit = read_i16(&mut buf)? as i64;
bigint = bigint * 10000 + digit;
}
if sign == NUMERIC_NEG {
bigint = -bigint;
} else if sign != NUMERIC_POS {
return Err(format!("invalid numeric sign {sign}").into());
}
let scale = if ndigits == 0 {
0
} else {
(ndigits - (weight + 1)) * 4
};
let value = BigDecimal::from_bigint(bigint, scale);
Ok(PgNumeric(PgNumericValue::Number(value)))
}
fn accepts(ty: &Type) -> bool {
*ty == Type::NUMERIC
}
}
fn read_i16(buf: &mut &[u8]) -> Result<i16, Box<dyn std::error::Error + Sync + Send>> {
if buf.len() < 2 {
return Err("invalid buffer size".into());
}
let val = i16::from_be_bytes([buf[0], buf[1]]);
*buf = &buf[2..];
Ok(val)
}
fn convert_types(types: Vec<Type>) -> Vec<DFColumnType> {
types
.into_iter()
.map(|t| match t {
Type::BOOL => DFColumnType::Boolean,
Type::INT2 | Type::INT4 | Type::INT8 => DFColumnType::Integer,
Type::BPCHAR | Type::VARCHAR | Type::TEXT => DFColumnType::Text,
Type::FLOAT4 | Type::FLOAT8 | Type::NUMERIC => DFColumnType::Float,
Type::DATE | Type::TIME => DFColumnType::DateTime,
Type::TIMESTAMP => DFColumnType::Timestamp,
_ => DFColumnType::Another,
})
.collect()
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we won't be able to even tell the difference.
👍🏻
If the execution time of the .slt tests is similar as before then there is no need to optimize further.
| .map(|d| format!("{d:?}")) | ||
| .unwrap_or_else(|| "NULL".to_string()) | ||
| fn cell_to_string(row: &SimpleQueryRow, column_type: &Type, idx: usize) -> String { | ||
| // simple_query returns text values, so we parse by Postgres type to keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left in unwrap() so we fail fast in case there is some problem converting
| "syn 2.0.113", | ||
| ] | ||
|
|
||
| [[package]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust_decimal has many dependencies it turns out, so removing it results in quite a few less dependencies in datafusion
mhilton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Follow on to #19666 ## Rationale for this change While working on #19666 I noticed there were `149` dependencies behind the latest ```shell andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion2$ cargo update -p arrow Updating crates.io index Locking 0 packages to latest compatible versions note: pass `--verbose` to see 149 unchanged dependencies behind latest ``` ## What changes are included in this PR? Run `cargo update` and check in the result ## Are these changes tested? By CI ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb
xudong963
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also encoutered RUSTSEC issue when working on the PR #19661.
Thanks for the fix, do you mind cherry-picking the PR to branch-52 after it's merged? Then I'll update the PR19661
|
Thank you again @martin-g for your insightful review |
…001` (apache#19666) - Part of apache#19656 rust_decimal is a one person crate and is released somewhat infrequently * https://github.com/paupino/rust-decimal * https://crates.io/crates/rust_decimal It also uses a non trivial number of dependencies, including https://crates.io/crates/rkyv, some sort of zero copy deserialization framework that was recently subject to a RUSTSEC security advisory, see apache#19656 / apache#19657 Since `rust_decimal` is only used for sqllogictests to parse the results from postgres, we can pretty easily remove the dependency on `rust_decimal` and inline the very small amount functionality we need for sqllogictests This will both decrease the build time and our dependency trail. Removes the `rust_decimal` dependency from DataFusion and inlines the smallest required subset of decimal functionality we need for sqllogictests (which turns out to be pretty small) Yes by CI No, this is all internal testing infrastructure
…STSEC-2026-0001` (#19666) (#19686) ## Which issue does this PR close? - part of #18566 ## Rationale for this change Let's resolve the audit workflow on branch-52 by removing our dependency on the `rust_decimal` library, per @xudong963 's request here #19666 (review) ## What changes are included in this PR? - Backport #19666 to branch-52 ## Are these changes tested? By CI ## Are there any user-facing changes? No, this is an internal dev tool change
Which issue does this PR close?
Rationale for this change
rust_decimal is a one person crate and is released somewhat infrequently
It also uses a non trivial number of dependencies, including https://crates.io/crates/rkyv, some sort of zero copy deserialization framework that was recently subject to a RUSTSEC security advisory, see #19656 / #19657
Since
rust_decimalis only used for sqllogictests to parse the results from postgres, we can pretty easily remove the dependency onrust_decimaland inline the very small amount functionality we need for sqllogictestsThis will both decrease the build time and our dependency trail.
What changes are included in this PR?
Removes the
rust_decimaldependency from DataFusionand inlines the smallest required subset of decimal functionality we need for
sqllogictests (which turns out to be pretty small)
Are these changes tested?
Yes by CI
Are there any user-facing changes?
No, this is all internal testing infrastructure