Commit dad6eb6

Merge remote-tracking branch 'origin/develop' into feature/mcp-server
2 parents 1c97e60 + c620ea2 commit dad6eb6

16 files changed: +294 -121 lines changed

CHANGELOG.md

Lines changed: 9 additions & 1 deletion
@@ -6,9 +6,17 @@ SPDX-License-Identifier: MIT-0
 ## [Unreleased]
 
 ### Changed
-- Increased page image limit from 20 to 100 across all IDP services (classification, extraction, assessment) to support processing of longer document sections with large context models following recent Amazon Bedrock API limit increases
+- Removed page image limit entirely across all IDP services (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation, with info logging to track image counts for monitoring purposes.
 - Resolves #147
 
+### Fixed
+
+- **Document Schema Builder Enum Support** - Fixed enum value handling in schema builder to properly support enumeration constraints for attribute definitions
+- **Agentic Extraction Parameter Passing** - Fixed temperature and top_p parameters now correctly passed to agentic extraction service, enabling proper model behavior control
+- **Document Schema Builder UI Labels** - Enhanced field labels and formats in document schema builder for improved clarity and user experience
+- **Retry Mechanism Improvements** - Enhanced retry logic for more reliable error handling and recovery across document processing workflows
+- **Type Safety Enhancements** - Improved type annotations and fixed undefined items handling to prevent runtime errors
+
 ## [0.4.5]
 
 ### Added

Makefile

Lines changed: 2 additions & 2 deletions
@@ -129,14 +129,14 @@ ui-build:
 
 commit: lint test
 	$(info Generating commit message...)
-	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
+	export COMMIT_MESSAGE="$(shell kiro-cli chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only on a single line." | grep ">" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
 	git add . && \
 	git commit -am "$${COMMIT_MESSAGE}" && \
 	git push
 
 fastcommit: fastlint
 	$(info Generating commit message...)
-	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
+	export COMMIT_MESSAGE="$(shell kiro-cli chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only on a single line." | grep ">" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
 	git add . && \
 	git commit -am "$${COMMIT_MESSAGE}" && \
 	git push
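
The new shell filter in both targets (`grep ">" | tail -n 1 | sed ...`) can be read as three steps: keep lines containing `>`, take the last one, strip ANSI color codes. A minimal Python sketch of the same steps follows; the function name and the sample CLI output are hypothetical and exist only for illustration, since the real `kiro-cli` output format is not shown in this commit.

```python
import re

# Matches ANSI SGR color sequences such as "\x1b[0m" or "\x1b[1;32m",
# the same pattern the Makefile passes to sed.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def extract_commit_message(cli_output: str) -> str:
    """Return the last line containing '>' with ANSI color codes removed."""
    # Equivalent of `grep ">"`: keep only lines that contain the marker.
    candidates = [line for line in cli_output.splitlines() if ">" in line]
    if not candidates:
        # grep would emit nothing, so the message would be empty.
        return ""
    # Equivalent of `tail -n 1` followed by the sed substitution.
    return ANSI_SGR.sub("", candidates[-1])


# Hypothetical CLI output, invented for illustration only.
sample = "Thinking...\n\x1b[32m> Fix enum handling in schema builder\x1b[0m\nDone\n"
print(extract_commit_message(sample))  # > Fix enum handling in schema builder
```

Read this way, the filter keeps the `>` marker itself in the resulting message, and an output with no `>` line yields an empty commit message.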

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.4.6-wip1
+0.4.6-wip2

docs/classification.md

Lines changed: 4 additions & 4 deletions
@@ -401,11 +401,11 @@ classification:
 
 ### Multi-Page Documents
 
-For documents with multiple pages, the system automatically handles image limits:
+For documents with multiple pages, the system provides comprehensive image support:
 
-- **Bedrock Limit**: Maximum 100 images per request (automatically enforced)
-- **Warning Logging**: System logs warnings when images are truncated due to limits
-- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
+- **No Image Limits**: All document pages are processed following Bedrock API removal of image count restrictions
+- **Info Logging**: System logs image counts for monitoring and debugging purposes
+- **Automatic Pagination**: Images are processed in page order for all pages
 
 ## Setting Up Few Shot Examples in Pattern 2
 

docs/extraction.md

Lines changed: 4 additions & 4 deletions
@@ -334,9 +334,9 @@ extraction:
 For documents with multiple pages, the system provides robust image management:
 
 - **Automatic Pagination**: Images are processed in page order
-- **Bedrock Compliance**: Maximum 100 images per request (automatically enforced)
-- **Smart Truncation**: Excess images are dropped with warning logs
-- **Performance Optimization**: Large image sets are efficiently handled
+- **No Image Limits**: All document pages are included following Bedrock API removal of image count restrictions
+- **Comprehensive Processing**: The system processes documents of any length without truncation
+- **Performance Optimization**: Efficient handling of large image sets with info logging
 
 ```yaml
 # Example configuration for multi-page invoices
@@ -346,7 +346,7 @@ extraction:
 
 {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
 
-Document pages (up to 100 images):
+Document pages (all pages included):
 {DOCUMENT_IMAGE}
 
 Combined text from all pages:

docs/idp-configuration-best-practices.md

Lines changed: 4 additions & 4 deletions
@@ -1253,12 +1253,12 @@ classification:
 
 ### Multi-Page Document Handling
 
-For documents with multiple pages, the system provides robust image management:
+For documents with multiple pages, the system provides comprehensive image support:
 
 - **Automatic Pagination**: Images are processed in page order
-- **Bedrock Compliance**: Maximum 100 images per request (automatically enforced)
-- **Smart Truncation**: Excess images are dropped with warning logs
-- **Performance Optimization**: Large image sets are efficiently handled
+- **No Image Limits**: All document pages are processed following Bedrock API removal of image count restrictions
+- **Info Logging**: System logs image counts for monitoring purposes
+- **Comprehensive Processing**: Documents of any length are fully processed
 
 ### Best Practices for Image Processing
 

lib/idp_common_pkg/idp_common/assessment/README.md

Lines changed: 2 additions & 1 deletion
@@ -305,8 +305,9 @@ assess each extracted field:
 
 ### Automatic Image Handling
 - Supports both single and multiple document images
-- Automatically limits to 100 images per Bedrock constraints
+- Processes all document pages without image count restrictions
 - Graceful fallback when images are unavailable
+- Info logging for image count monitoring
 
 ## Attribute Types and Assessment Formats
 

lib/idp_common_pkg/idp_common/assessment/granular_service.py

Lines changed: 8 additions & 10 deletions
@@ -152,7 +152,7 @@ def __init__(
             import boto3
 
             dynamodb = boto3.resource("dynamodb", region_name=self.region)
-            self.cache_table = dynamodb.Table(self.cache_table_name)
+            self.cache_table = dynamodb.Table(self.cache_table_name)  # type: ignore[attr-defined]
             logger.info(
                 f"Granular assessment caching enabled using table: {self.cache_table_name}"
             )
@@ -472,13 +472,11 @@ def _build_cached_prompt_base(
         # Add the images if available
         if page_images:
             if isinstance(page_images, list):
-                # Multiple images (limit to 100 as per Bedrock constraints)
-                if len(page_images) > 100:
-                    logger.warning(
-                        f"Found {len(page_images)} images, truncating to 100 due to Bedrock constraints. "
-                        f"{len(page_images) - 100} images will be dropped."
-                    )
-                for img in page_images[:100]:
+                # Multiple images - no limit with latest Bedrock API
+                logger.info(
+                    f"Attaching {len(page_images)} images to granular assessment prompt"
+                )
+                for img in page_images:
                     content.append(image.prepare_bedrock_image_attachment(img))
             else:
                 # Single image
@@ -1134,8 +1132,8 @@ def _is_throttling_exception(self, exception: Exception) -> bool:
         Returns:
             True if exception indicates throttling, False otherwise
         """
-        if hasattr(exception, "response") and "Error" in exception.response:
-            error_code = exception.response["Error"]["Code"]
+        if hasattr(exception, "response") and "Error" in exception.response:  # type: ignore[attr-defined]
+            error_code = exception.response["Error"]["Code"]  # type: ignore[attr-defined]
             return error_code in self.throttling_exceptions
 
         # Check exception class name and message for throttling indicators
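
For readers who want to try the throttling check outside the service class, here is a minimal standalone sketch. It is not the project's implementation: the set of error codes, the `FakeClientError` stand-in, and the fallback string match are all assumptions, because `self.throttling_exceptions` and the rest of the method body are not visible in this hunk. It also reads `response` through `getattr`, which is one way to avoid the `type: ignore` comments.

```python
# Assumed throttling error codes; the real list lives in
# self.throttling_exceptions and is not shown in this diff.
THROTTLING_CODES = {"ThrottlingException", "TooManyRequestsException"}


class FakeClientError(Exception):
    """Hypothetical stand-in for an SDK error that carries a `response` dict."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def is_throttling_exception(exception: Exception) -> bool:
    """Return True if the exception looks like a throttling error."""
    # getattr avoids attribute-access warnings on the base Exception type.
    response = getattr(exception, "response", None)
    if isinstance(response, dict) and "Error" in response:
        # Structured error: decide on the error code alone.
        return response["Error"].get("Code") in THROTTLING_CODES
    # Assumed fallback: look for a known code in the class name or message.
    text = f"{type(exception).__name__} {exception}"
    return any(code in text for code in THROTTLING_CODES)


print(is_throttling_exception(FakeClientError("ThrottlingException")))  # True
print(is_throttling_exception(ValueError("bad input")))  # False
```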

lib/idp_common_pkg/idp_common/assessment/service.py

Lines changed: 5 additions & 7 deletions
@@ -428,13 +428,11 @@ def _build_content_with_image_placeholder(
         # Add the image if available
         if image_content:
             if isinstance(image_content, list):
-                # Multiple images (limit to 100 as per Bedrock constraints)
-                if len(image_content) > 100:
-                    logger.warning(
-                        f"Found {len(image_content)} images, truncating to 100 due to Bedrock constraints. "
-                        f"{len(image_content) - 100} images will be dropped."
-                    )
-                for img in image_content[:100]:
+                # Multiple images - no limit with latest Bedrock API
+                logger.info(
+                    f"Attaching {len(image_content)} images to assessment prompt"
+                )
+                for img in image_content:
                     content.append(image.prepare_bedrock_image_attachment(img))
             else:
                 # Single image
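
The behavior change in this hunk (every image attached, count logged at info level, nothing dropped) can be exercised in isolation with the sketch below. `prepare_image_attachment` is a hypothetical stand-in for `image.prepare_bedrock_image_attachment`, whose body is not part of this commit; the block shape and the `png` format are assumptions modeled on the `ContentBlock(image=...)` usage visible in `agentic_idp.py`.

```python
from __future__ import annotations

import logging
from typing import Any

logger = logging.getLogger(__name__)


def prepare_image_attachment(image_bytes: bytes) -> dict[str, Any]:
    """Hypothetical stand-in for image.prepare_bedrock_image_attachment."""
    # Assumed block shape and format; the real helper may differ.
    return {"image": {"format": "png", "source": {"bytes": image_bytes}}}


def append_images(
    content: list[dict[str, Any]], images: bytes | list[bytes] | None
) -> None:
    """Append every image to the prompt content without truncation."""
    if not images:
        return
    if isinstance(images, list):
        # Log the count for monitoring instead of warning about dropped pages.
        logger.info("Attaching %d images to assessment prompt", len(images))
        for img in images:
            content.append(prepare_image_attachment(img))
    else:
        # Single image
        content.append(prepare_image_attachment(images))


content: list[dict[str, Any]] = [{"text": "Assess the extracted fields."}]
append_images(content, [b"page"] * 150)
print(len(content))  # 151
```

With the previous slicing to `[:100]`, the same call would have attached only 100 of the 150 pages.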

lib/idp_common_pkg/idp_common/extraction/agentic_idp.py

Lines changed: 58 additions & 11 deletions
@@ -691,6 +691,37 @@ def _build_model_config(
     return model_config
 
 
+def _get_inference_params(temperature: float, top_p: float | None) -> dict[str, float]:
+    """
+    Get inference parameters ensuring temperature and top_p are mutually exclusive.
+
+    Some Bedrock models don't allow both temperature and top_p to be specified.
+    This follows the same logic as bedrock/client.py lines 348-364.
+
+    Args:
+        temperature: Temperature value from config
+        top_p: Top_p value from config (may be None)
+
+    Returns:
+        Dict with only one of temperature or top_p
+    """
+    params = {}
+
+    # Only use top_p if temperature is 0.0
+    if top_p is not None and temperature == 0.0:
+        params["top_p"] = top_p
+        logger.debug(
+            "Using top_p for inference (temperature is 0.0)", extra={"top_p": top_p}
+        )
+    else:
+        params["temperature"] = temperature
+        logger.debug(
+            "Using temperature for inference", extra={"temperature": temperature}
+        )
+
+    return params
+
+
 def _prepare_prompt_content(
     prompt: str | Message | Image.Image,
     page_images: list[bytes] | None,
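
The selection rule in `_get_inference_params` is easiest to see with three inputs. The sketch below restates the rule from the hunk above without logging, under a different name (`get_inference_params`, chosen here only to make clear it is an illustration and not the module's function).

```python
from __future__ import annotations


def get_inference_params(temperature: float, top_p: float | None) -> dict[str, float]:
    """Restatement of the selection rule from the diff, without logging."""
    # top_p is used only when it is set and temperature is exactly 0.0.
    if top_p is not None and temperature == 0.0:
        return {"top_p": top_p}
    # In every other case temperature wins and top_p is dropped.
    return {"temperature": temperature}


print(get_inference_params(0.0, 0.9))   # {'top_p': 0.9}
print(get_inference_params(0.7, 0.9))   # {'temperature': 0.7}
print(get_inference_params(0.0, None))  # {'temperature': 0.0}
```

One consequence worth noting when reviewing configs: a non-zero temperature combined with a configured `top_p` silently discards `top_p`.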
@@ -735,20 +766,18 @@ def _prepare_prompt_content(
     else:
         prompt_content = [ContentBlock(text=str(prompt))]
 
-    # Add page images if provided (limit to 100 as per Bedrock constraints)
+    # Add page images if provided - no limit with latest Bedrock API
     if page_images:
-        if len(page_images) > 100:
-            prompt_content.append(
-                ContentBlock(
-                    text=f"There are {len(page_images)} images, initially you'll see 100 of them, use the view_image tool to see the rest."
-                )
-            )
+        logger.info(
+            "Attaching images to agentic extraction prompt",
+            extra={"image_count": len(page_images)},
+        )
 
         prompt_content += [
             ContentBlock(
                 image=ImageContent(format="png", source=ImageSource(bytes=img_bytes))
             )
-            for img_bytes in page_images[:100]
+            for img_bytes in page_images
         ]
 
     # Add existing data context if provided
@@ -1003,8 +1032,17 @@ async def structured_output_async(
 
     # Track token usage
     token_usage = _initialize_token_usage()
+
+    # Get inference params ensuring temperature and top_p are mutually exclusive
+    inference_params = _get_inference_params(
+        temperature=config.extraction.temperature, top_p=config.extraction.top_p
+    )
+
     agent = Agent(
-        model=BedrockModel(**model_config),  # pyright: ignore[reportArgumentType]
+        model=BedrockModel(
+            **model_config,
+            **inference_params,
+        ),  # pyright: ignore[reportArgumentType]
         tools=tools,
         system_prompt=final_system_prompt,
         state={
@@ -1076,8 +1114,17 @@ async def structured_output_async(
             connect_timeout=connect_timeout,
             read_timeout=read_timeout,
         )
+
+        # Get inference params for review agent ensuring temperature and top_p are mutually exclusive
+        review_inference_params = _get_inference_params(
+            temperature=config.extraction.temperature, top_p=config.extraction.top_p
+        )
+
         agent = Agent(
-            model=BedrockModel(**review_model_config),  # pyright: ignore[reportArgumentType]
+            model=BedrockModel(
+                **review_model_config,
+                **review_inference_params,
+            ),  # pyright: ignore[reportArgumentType]
             tools=tools,
             system_prompt=f"{final_system_prompt}",
             state={
@@ -1094,7 +1141,7 @@ async def structured_output_async(
         )
 
         review_response = await invoke_agent_with_retry(
-            agent=agent, input=review_prompt
+            agent=agent, input=[review_prompt]
         )
         logger.debug("Review response received", extra={"review_completed": True})
 
