Commit 879518a

Merge branch 'fix/remove-image-limits' into 'develop'
Remove image count limits for Bedrock API calls across IDP services

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!443
2 parents ba447ba + 80f8f56 commit 879518a

9 files changed: 38 additions and 48 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ SPDX-License-Identifier: MIT-0
 ## [Unreleased]
 
 ### Changed
-- Increased page image limit from 20 to 100 across all IDP services (classification, extraction, assessment) to support processing of longer document sections with large context models following recent Amazon Bedrock API limit increases
+- Removed page image limit entirely across all IDP services (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation, with info logging to track image counts for monitoring purposes.
 - Resolves #147
 
 ## [0.4.5]

docs/classification.md

Lines changed: 4 additions & 4 deletions
@@ -401,11 +401,11 @@ classification:
 
 ### Multi-Page Documents
 
-For documents with multiple pages, the system automatically handles image limits:
+For documents with multiple pages, the system provides comprehensive image support:
 
-- **Bedrock Limit**: Maximum 100 images per request (automatically enforced)
-- **Warning Logging**: System logs warnings when images are truncated due to limits
-- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
+- **No Image Limits**: All document pages are processed following Bedrock API removal of image count restrictions
+- **Info Logging**: System logs image counts for monitoring and debugging purposes
+- **Automatic Pagination**: Images are processed in page order for all pages
 
 ## Setting Up Few Shot Examples in Pattern 2
 

docs/extraction.md

Lines changed: 4 additions & 4 deletions
@@ -334,9 +334,9 @@ extraction:
 For documents with multiple pages, the system provides robust image management:
 
 - **Automatic Pagination**: Images are processed in page order
-- **Bedrock Compliance**: Maximum 100 images per request (automatically enforced)
-- **Smart Truncation**: Excess images are dropped with warning logs
-- **Performance Optimization**: Large image sets are efficiently handled
+- **No Image Limits**: All document pages are included following Bedrock API removal of image count restrictions
+- **Comprehensive Processing**: The system processes documents of any length without truncation
+- **Performance Optimization**: Efficient handling of large image sets with info logging
 
 ```yaml
 # Example configuration for multi-page invoices
@@ -346,7 +346,7 @@ extraction:
 
 {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
 
-Document pages (up to 100 images):
+Document pages (all pages included):
 {DOCUMENT_IMAGE}
 
 Combined text from all pages:

docs/idp-configuration-best-practices.md

Lines changed: 4 additions & 4 deletions
@@ -1253,12 +1253,12 @@ classification:
 
 ### Multi-Page Document Handling
 
-For documents with multiple pages, the system provides robust image management:
+For documents with multiple pages, the system provides comprehensive image support:
 
 - **Automatic Pagination**: Images are processed in page order
-- **Bedrock Compliance**: Maximum 100 images per request (automatically enforced)
-- **Smart Truncation**: Excess images are dropped with warning logs
-- **Performance Optimization**: Large image sets are efficiently handled
+- **No Image Limits**: All document pages are processed following Bedrock API removal of image count restrictions
+- **Info Logging**: System logs image counts for monitoring purposes
+- **Comprehensive Processing**: Documents of any length are fully processed
 
 ### Best Practices for Image Processing
 

lib/idp_common_pkg/idp_common/assessment/README.md

Lines changed: 2 additions & 1 deletion
@@ -305,8 +305,9 @@ assess each extracted field:
 
 ### Automatic Image Handling
 - Supports both single and multiple document images
-- Automatically limits to 100 images per Bedrock constraints
+- Processes all document pages without image count restrictions
 - Graceful fallback when images are unavailable
+- Info logging for image count monitoring
 
 ## Attribute Types and Assessment Formats
 

lib/idp_common_pkg/idp_common/assessment/granular_service.py

Lines changed: 5 additions & 7 deletions
@@ -472,13 +472,11 @@ def _build_cached_prompt_base(
         # Add the images if available
         if page_images:
             if isinstance(page_images, list):
-                # Multiple images (limit to 100 as per Bedrock constraints)
-                if len(page_images) > 100:
-                    logger.warning(
-                        f"Found {len(page_images)} images, truncating to 100 due to Bedrock constraints. "
-                        f"{len(page_images) - 100} images will be dropped."
-                    )
-                for img in page_images[:100]:
+                # Multiple images - no limit with latest Bedrock API
+                logger.info(
+                    f"Attaching {len(page_images)} images to granular assessment prompt"
+                )
+                for img in page_images:
                     content.append(image.prepare_bedrock_image_attachment(img))
             else:
                 # Single image

lib/idp_common_pkg/idp_common/assessment/service.py

Lines changed: 5 additions & 7 deletions
@@ -428,13 +428,11 @@ def _build_content_with_image_placeholder(
         # Add the image if available
         if image_content:
             if isinstance(image_content, list):
-                # Multiple images (limit to 100 as per Bedrock constraints)
-                if len(image_content) > 100:
-                    logger.warning(
-                        f"Found {len(image_content)} images, truncating to 100 due to Bedrock constraints. "
-                        f"{len(image_content) - 100} images will be dropped."
-                    )
-                for img in image_content[:100]:
+                # Multiple images - no limit with latest Bedrock API
+                logger.info(
+                    f"Attaching {len(image_content)} images to assessment prompt"
+                )
+                for img in image_content:
                     content.append(image.prepare_bedrock_image_attachment(img))
             else:
                 # Single image

lib/idp_common_pkg/idp_common/extraction/agentic_idp.py

Lines changed: 6 additions & 8 deletions
@@ -735,20 +735,18 @@ def _prepare_prompt_content(
     else:
         prompt_content = [ContentBlock(text=str(prompt))]
 
-    # Add page images if provided (limit to 100 as per Bedrock constraints)
+    # Add page images if provided - no limit with latest Bedrock API
     if page_images:
-        if len(page_images) > 100:
-            prompt_content.append(
-                ContentBlock(
-                    text=f"There are {len(page_images)} images, initially you'll see 100 of them, use the view_image tool to see the rest."
-                )
-            )
+        logger.info(
+            "Attaching images to agentic extraction prompt",
+            extra={"image_count": len(page_images)},
+        )
 
         prompt_content += [
             ContentBlock(
                 image=ImageContent(format="png", source=ImageSource(bytes=img_bytes))
             )
-            for img_bytes in page_images[:100]
+            for img_bytes in page_images
         ]
 
     # Add existing data context if provided
# Add existing data context if provided

lib/idp_common_pkg/idp_common/extraction/service.py

Lines changed: 7 additions & 12 deletions
@@ -148,13 +148,12 @@ def _get_default_prompt_content(self) -> list[dict[str, Any]]:
         """
         content = [{"text": task_prompt}]
 
-        # Add image attachments to the content (limit to 100 images as per Bedrock constraints)
+        # Add image attachments to the content - no limit with latest Bedrock API
         if self._page_images:
             logger.info(
-                f"Attaching images to default prompt, for {len(self._page_images)} pages."
+                f"Attaching {len(self._page_images)} images to default extraction prompt"
             )
-            # Limit to 100 images as per Bedrock constraints
-            for img in self._page_images[:100]:
+            for img in self._page_images:
                 content.append(image.prepare_bedrock_image_attachment(img))
 
         return content
@@ -354,7 +353,7 @@ def _build_text_and_image_content(
 
     def _prepare_image_attachments(self, image_content: Any) -> list[dict[str, Any]]:
         """
-        Prepare image attachments for Bedrock, limiting to 100 images.
+        Prepare image attachments for Bedrock - no image limit.
 
         Args:
             image_content: Single image or list of images
@@ -365,13 +364,9 @@ def _prepare_image_attachments(self, image_content: Any) -> list[dict[str, Any]]
         attachments: list[dict[str, Any]] = []
 
         if isinstance(image_content, list):
-            # Multiple images (limit to 100 as per Bedrock constraints)
-            if len(image_content) > 100:
-                logger.warning(
-                    f"Found {len(image_content)} images, truncating to 100 due to Bedrock constraints. "
-                    f"{len(image_content) - 100} images will be dropped."
-                )
-            for img in image_content[:100]:
+            # Multiple images - no limit with latest Bedrock API
+            logger.info(f"Attaching {len(image_content)} images to extraction prompt")
+            for img in image_content:
                 attachments.append(image.prepare_bedrock_image_attachment(img))
         else:
             # Single image
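
For review purposes, the post-change behavior of `_prepare_image_attachments` can be illustrated roughly as follows. This is a standalone sketch under stated assumptions, not the repository code: `prepare_attachment` is a hypothetical stand-in for `image.prepare_bedrock_image_attachment` (its return shape is assumed), and the single-image branch is a guess, since the hunk ends at the `# Single image` comment.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def prepare_attachment(img: bytes) -> dict[str, Any]:
    """Hypothetical stand-in for image.prepare_bedrock_image_attachment."""
    return {"image": {"format": "jpeg", "source": {"bytes": img}}}


def prepare_image_attachments(image_content: Any) -> list[dict[str, Any]]:
    """Return one attachment per image, with no cap on the image count."""
    if isinstance(image_content, list):
        # Multiple images: log the count, then attach every page in order.
        logger.info("Attaching %d images to extraction prompt", len(image_content))
        return [prepare_attachment(img) for img in image_content]
    # Single image (assumed behavior: wrap it in a one-element list).
    return [prepare_attachment(image_content)]


# 150 fake pages: the removed `[:100]` slice would have kept only the first 100.
pages = [f"page-{i}".encode() for i in range(150)]
attachments = prepare_image_attachments(pages)
print(len(attachments))
```

A check along these lines (more than 100 input pages, same number of attachments out) could serve as a regression test for the removed truncation.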
