fix: Image type recognition failure in FileTypeUtils#812
fix: Image type recognition failure in FileTypeUtils#812bengbengbalabalabeng wants to merge 4 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in FileTypeUtils where image type recognition was failing due to incorrect matching logic. The original code used exact string matching on a 56-character hex string against short magic numbers (6-8 characters), which always failed. The fix changes to a prefix-based matching approach using startsWith().
Changes:
- Modified getImageType() to use startsWith() for prefix matching instead of exact HashMap.get() lookup
- Added IMAGE_TYPE_MARK_MIN_LENGTH constant and adjusted length validation logic
- Added comprehensive unit tests for the FileTypeUtils class
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fesod-sheet/src/main/java/org/apache/fesod/sheet/util/FileTypeUtils.java | Fixed image type detection logic by implementing prefix matching and improving input validation |
| fesod-sheet/src/test/java/org/apache/fesod/sheet/util/FileTypeUtilsTest.java | Added comprehensive test coverage for all code paths including edge cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (Map.Entry<String, ImageData.ImageType> entry : FILE_TYPE_MAP.entrySet()) { | ||
| String magicNumber = entry.getKey(); | ||
| if (hexString.startsWith(magicNumber)) { | ||
| return entry.getValue(); | ||
| } | ||
| } |
There was a problem hiding this comment.
The iteration order of HashMap entries is not guaranteed. While not an issue with the current two magic numbers, this could cause incorrect matches if future image types are added with overlapping prefixes. Consider using LinkedHashMap to maintain insertion order or implementing explicit ordering logic (e.g., longest prefix first) to ensure deterministic and correct matching behavior.
| for (Map.Entry<String, ImageData.ImageType> entry : FILE_TYPE_MAP.entrySet()) { | ||
| String magicNumber = entry.getKey(); | ||
| if (hexString.startsWith(magicNumber)) { | ||
| return entry.getValue(); | ||
| } | ||
| } |
There was a problem hiding this comment.
The iteration-based approach with startsWith() has O(n) time complexity for each lookup, compared to O(1) for the previous HashMap.get() approach. For better performance with the new logic, consider using a more efficient data structure like a Trie or maintaining a sorted list by magic number length for early exit optimization.
| private static final char[] DIGITS = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' | ||
| }; | ||
| private static final int IMAGE_TYPE_MARK_LENGTH = 28; | ||
| private static final int IMAGE_TYPE_MARK_MIN_LENGTH = 4; |
There was a problem hiding this comment.
The IMAGE_TYPE_MARK_MIN_LENGTH is set to 4 bytes, but JPEG's magic number "ffd8ff" only requires 3 bytes. The test test_getImageType_tooShort expects null for a 3-byte JPEG magic number, which would incorrectly reject valid JPEG headers. Either IMAGE_TYPE_MARK_MIN_LENGTH should be 3 to accommodate JPEG, or the JPEG magic number definition should be extended to require 4 bytes.
| private static final int IMAGE_TYPE_MARK_MIN_LENGTH = 4; | |
| // Must be at least the length (in bytes) of the shortest magic number (currently JPEG: "ffd8ff" = 3 bytes). | |
| private static final int IMAGE_TYPE_MARK_MIN_LENGTH = 3; |
|
|
||
| @Test | ||
| void test_getImageType_tooShort() { | ||
| byte[] input = new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}; |
There was a problem hiding this comment.
This test creates a 3-byte array with valid JPEG magic bytes (0xFF, 0xD8, 0xFF) and expects null, but JPEG's magic number is "ffd8ff" which is exactly 3 bytes. This test appears to validate incorrect behavior where valid JPEG images would be rejected. The test should either expect PICTURE_TYPE_JPEG or use a different magic number that genuinely requires 4 bytes.
| byte[] input = new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}; | |
| byte[] input = new byte[] {(byte) 0x00, (byte) 0x01}; |
Purpose of the pull request
Related: #811
Incorrect matching logic in FileTypeUtils.
What's changed?
Changes the logic to use
startsWith()to correctly match the file magic numbers.Checklist