No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
💤 Files with no reviewable changes (2)
🚧 Files skipped from review as they are similar to previous changes (4)
Walkthrough

Replaces the boolean corrupt-field handling with a sealed CorruptFieldsPolicy (Disabled, Binary, Hex), threads the policy through parameters, schema, iterators, and record extraction, adds an option to emit corrupt-field raw values as uppercase hex, and introduces StringUtils.convertArrayToHex plus related test updates.
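The sealed hierarchy described in the walkthrough can be sketched as a small ADT. This is a hypothetical sketch: the member names (Disabled, Binary, Hex) come from the walkthrough, but the package, structure, and the `derivePolicy` helper with its two boolean options are assumptions for illustration, not the PR's actual API.

```scala
// Hedged sketch of the sealed policy from the walkthrough; the real
// CorruptFieldsPolicy in cobol-parser may differ in structure and location.
sealed trait CorruptFieldsPolicy
object CorruptFieldsPolicy {
  case object Disabled extends CorruptFieldsPolicy // no corrupt-fields column
  case object Binary   extends CorruptFieldsPolicy // raw value kept as bytes
  case object Hex      extends CorruptFieldsPolicy // raw value as uppercase hex
}

// One plausible way a parser could derive the policy from two boolean options
// (the derivation logic here is assumed, not taken from the PR):
def derivePolicy(generateCorruptFields: Boolean, binaryAsHex: Boolean): CorruptFieldsPolicy =
  if (!generateCorruptFields) CorruptFieldsPolicy.Disabled
  else if (binaryAsHex) CorruptFieldsPolicy.Hex
  else CorruptFieldsPolicy.Binary
```

A sealed trait lets the compiler check that every pattern match over the policy is exhaustive, which is the usual motivation for replacing a pair of booleans with an ADT.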
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Parser as CobolParametersParser
    participant ReaderParams as ReaderParameters
    participant Schema as CobolSchema
    participant Iterator as NestedRowIterator
    participant Extractor as RecordExtractors
    participant Utils as StringUtils
    User->>Parser: set options (generate_corrupt_fields, binary_as_hex)
    Parser->>Parser: derive corruptFieldsPolicy (Disabled|Binary|Hex)
    Parser->>ReaderParams: pass corruptFieldsPolicy
    ReaderParams->>Schema: fromReaderParameters(corruptFieldsPolicy)
    Schema->>Iterator: iterator created (policy available)
    Iterator->>Iterator: derive booleans: generateCorruptFields, generateCorruptFieldsAsHex
    Iterator->>Extractor: extractRecord(..., generateCorruptFields, generateCorruptFieldsAsHex)
    alt generateCorruptFields && generateCorruptFieldsAsHex
        Extractor->>Utils: convertArrayToHex(rawBytes)
        Utils-->>Extractor: hexString
        Extractor-->>Iterator: record with corrupt_fields raw_value as String
    else generateCorruptFields && !generateCorruptFieldsAsHex
        Extractor-->>Iterator: record with corrupt_fields raw_value as Binary
    end
```
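The alt branch in the diagram hinges on the two booleans the iterator derives from the policy. A minimal self-contained sketch of that derivation follows; the flag names come from the diagram, while the local redefinition of the policy type and the `deriveFlags` helper are illustrative only.

```scala
// Self-contained sketch; the real sealed trait lives in the cobol-parser
// reader parameters, not here.
sealed trait CorruptFieldsPolicy
case object Disabled extends CorruptFieldsPolicy
case object Binary   extends CorruptFieldsPolicy
case object Hex      extends CorruptFieldsPolicy

// Derive the two flags named in the diagram; an exhaustive match keeps the
// mapping compiler-checked if new policy cases are ever added.
def deriveFlags(policy: CorruptFieldsPolicy): (Boolean, Boolean) = policy match {
  case Disabled => (false, false) // no corrupt-fields column at all
  case Binary   => (true, false)  // raw_value emitted as Binary
  case Hex      => (true, true)   // raw_value emitted as an uppercase hex String
}
```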
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 5 checks passed
JaCoCo code coverage report - 'cobol-parser'
JaCoCo code coverage report - 'spark-cobol'
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`:
- Around lines 523-527: The hex conversion for corruptFields in
RecordExtractors.scala uses the "%02X" format on signed bytes
(CorruptField.rawValue: Array[Byte]), which can produce sign-extended hex such
as FFFFFFD3. Fix this by masking each byte with 0xFF before formatting, so each
element is treated as an unsigned 0-255 value. Update the branch that builds
the hex string (the handler.create call that maps
corruptFields(i).rawValue.map(...).mkString) to map each byte via (b & 0xFF)
prior to formatting, ensuring two-digit hex output for values >= 0x80.
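The sign-extension hazard and the masked fix can be reproduced in isolation. In the sketch below, `convertArrayToHex` is a hypothetical stand-in for the utility named in the walkthrough, not necessarily the PR's implementation; the hazard itself comes from widening the signed JVM byte to Int before formatting.

```scala
val b: Byte = 0xD3.toByte // -45 as a signed JVM byte

// Widening the signed byte to Int before formatting sign-extends the value:
val unmasked = "%02X".format(b.toInt) // "FFFFFFD3" -- eight hex digits, wrong
// Masking with 0xFF first keeps the value in the unsigned 0..255 range:
val masked = "%02X".format(b & 0xFF)  // "D3" -- correct two-digit output

// Hypothetical convertArrayToHex-style helper built on the masked form:
def convertArrayToHex(bytes: Array[Byte]): String =
  bytes.map(v => "%02X".format(v & 0xFF)).mkString
```

Note that `b & 0xFF` yields an Int in 0..255, so `"%02X"` always produces exactly two uppercase hex digits per byte.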
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala`:
- Line 42: The docstring for parameter corruptSchemaPolicy in CobolSchema.scala
contains a typo: change the string '_corrput_fileds' to the correct
'_corrupt_fields' in the comment for corruptSchemaPolicy (locate the doc block
that documents `@param` corruptSchemaPolicy in the CobolSchema.scala file and
replace the misspelled token).
In
`@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala`:
- Line 47: Update the Scaladoc for the parameter corruptFieldsPolicy in
CobolSchema to fix the typo and use the correct field name: replace the
misspelled "_corrput_fileds" (and any occurrences of "_corrupted_fields") with
"_corrupt_fields" so the documentation matches the intended Spark convention
(e.g., "_corrupt_record"); ensure the description still mentions that the field
will be generated when the policy is set.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/FixedLenNestedRowIterator.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/VarLenNestedIterator.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CorruptFieldsPolicy.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/builder/SparkCobolOptionsBuilder.scala
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/CobolSchemaSpec.scala
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/base/impl/DummyCobolSchema.scala
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test41CorruptFieldsSpec.scala
Review threads:
- ...arser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala (resolved)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala (outdated, resolved)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (outdated, resolved)
♻️ Duplicate comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala (1)
47-47: Fix Scaladoc field name typo. The doc still says '_corrput_fields'; it should be '_corrupt_fields' to match convention. Based on learnings: Rename the field from _corrupted_fields to _corrupt_fields to align with Apache Spark's naming convention (e.g., _corrupt_record).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala` at line 47, Scaladoc and code use a misspelled/incorrect field name: update the Scaladoc param for corruptFieldsPolicy to use '_corrupt_fields' (not '_corrput_fields'), and rename any code symbols from _corrupted_fields to _corrupt_fields to match Spark convention; search for usages in CobolSchema (and related references) and refactor the symbol name consistently (Scaladoc tag, field/column constant, and any tests or consumers) so all references use '_corrupt_fields'.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/DebugFieldsAdder.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/DecoderSelector.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedWithRecordLengthExprRawRecordExtractor.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/utils/StringUtils.scala
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/utils/StringUtilsSuite.scala
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/schema/CobolSchema.scala
💤 Files with no reviewable changes (2)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecodersSpec.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/StringDecoders.scala
🚧 Files skipped from review as they are similar to previous changes (1)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala
…high performance implementation across its usages.
970c6ba to 9af73e8 (Compare)
Closes #822
Summary by CodeRabbit
- New Features
- Tests