Merge remote-tracking branch 'origin/develop' into feature/mcp-server

webarch-ai · webarch-ai · commit dad6eb6bd299 · 2025-12-03T15:58:39.000-05:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,9 +6,17 @@ SPDX-License-Identifier: MIT-0
 ## [Unreleased]
 
 ### Changed
-- Increased page image limit from 20 to 100 across all IDP services (classification, extraction, assessment) to support processing of longer document sections with large context models following recent Amazon Bedrock API limit increases
+- Removed page image limit entirely across all IDP services (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation and logs image counts at info level for monitoring.
   - Resolves #147
 
+### Fixed
+
+- **Document Schema Builder Enum Support** - Fixed enum value handling in schema builder to properly support enumeration constraints for attribute definitions
+- **Agentic Extraction Parameter Passing** - Temperature and top_p are now correctly passed to the agentic extraction service (only one of the two is sent, since some Bedrock models reject both), enabling proper model behavior control
+- **Document Schema Builder UI Labels** - Enhanced field labels and formats in document schema builder for improved clarity and user experience
+- **Retry Mechanism Improvements** - Enhanced retry logic for more reliable error handling and recovery across document processing workflows
+- **Type Safety Enhancements** - Improved type annotations and fixed undefined items handling to prevent runtime errors
+
 ## [0.4.5]
 
 ### Added
diff --git a/Makefile b/Makefile
@@ -129,14 +129,14 @@ ui-build:
 
 commit: lint test
 	$(info Generating commit message...)
-	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
+	export COMMIT_MESSAGE="$(shell kiro-cli chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only on a single line." | grep ">" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
 	git add . && \
 	git commit -am "$${COMMIT_MESSAGE}" && \
 	git push
 
 fastcommit: fastlint
 	$(info Generating commit message...)
-	export COMMIT_MESSAGE="$(shell q chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
+	export COMMIT_MESSAGE="$(shell kiro-cli chat --no-interactive --trust-all-tools "Understand pending local git change and changes to be committed, then infer a commit message. Return this commit message only on a single line." | grep ">" | tail -n 1 | sed 's/\x1b\[[0-9;]*m//g')" && \
 	git add . && \
 	git commit -am "$${COMMIT_MESSAGE}" && \
 	git push
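The `grep ">" | tail -n 1 | sed` pipeline in both targets can be mirrored in Python to see what ends up in `COMMIT_MESSAGE`. This is a minimal sketch, and the sample CLI output below is made up rather than captured from `kiro-cli`. Note that the `sed` expression only strips SGR colour codes, so if the reply line carries a `> ` prompt prefix (the marker `grep` keys on), that prefix is committed too.

```python
import re

# Same pattern as the Makefile's sed expression: ESC "[" digits/semicolons "m"
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def extract_commit_message(cli_output: str) -> str:
    # grep ">": keep only lines containing a ">" character
    matching = [line for line in cli_output.splitlines() if ">" in line]
    if not matching:
        return ""  # the shell pipeline would also yield an empty message
    # tail -n 1 picks the last match; sed then strips the colour codes
    return ANSI_SGR.sub("", matching[-1])


print(extract_commit_message("\x1b[32m> \x1b[0mfix: pass top_p to agentic extraction\n"))
# prints "> fix: pass top_p to agentic extraction"
```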
diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.4.6-wip1
+0.4.6-wip2
diff --git a/docs/classification.md b/docs/classification.md
@@ -401,11 +401,11 @@ classification:
 
 ### Multi-Page Documents
 
-For documents with multiple pages, the system automatically handles image limits:
+For documents with multiple pages, the system handles page images as follows:
 
-- **Bedrock Limit**: Maximum 100 images per request (automatically enforced)
-- **Warning Logging**: System logs warnings when images are truncated due to limits
-- **Smart Handling**: Images are processed in page order, with excess images automatically dropped
+- **No Image Limits**: All document pages are processed following Bedrock API removal of image count restrictions
+- **Info Logging**: System logs image counts for monitoring and debugging purposes
+- **Automatic Pagination**: Images are processed in page order for all pages
 
 ## Setting Up Few Shot Examples in Pattern 2
 
diff --git a/docs/extraction.md b/docs/extraction.md
@@ -334,9 +334,9 @@ extraction:
 For documents with multiple pages, the system provides robust image management:
 
 - **Automatic Pagination**: Images are processed in page order
-- **Bedrock Compliance**: Maximum 100 images per request (automatically enforced)
-- **Smart Truncation**: Excess images are dropped with warning logs
-- **Performance Optimization**: Large image sets are efficiently handled
+- **No Image Limits**: All document pages are included following Bedrock API removal of image count restrictions
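+- **Comprehensive Processing**: Documents of any length are processed without dropping page images, although the model's context window still limits how much content fits in a single request
+- **Performance Monitoring**: Image counts are logged at info level so large image sets can be tracked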
 
 ```yaml
 # Example configuration for multi-page invoices
@@ -346,7 +346,7 @@ extraction:
 
     {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
 
-    Document pages (up to 100 images):
+    Document pages (all pages included):
     {DOCUMENT_IMAGE}
 
     Combined text from all pages:
diff --git a/docs/idp-configuration-best-practices.md b/docs/idp-configuration-best-practices.md
@@ -1253,12 +1253,12 @@ classification:
 
 ### Multi-Page Document Handling
 
-For documents with multiple pages, the system provides robust image management:
+For documents with multiple pages, the system handles page images as follows:
 
 - **Automatic Pagination**: Images are processed in page order
-- **Bedrock Compliance**: Maximum 100 images per request (automatically enforced)
-- **Smart Truncation**: Excess images are dropped with warning logs
-- **Performance Optimization**: Large image sets are efficiently handled
+- **No Image Limits**: All document pages are processed following Bedrock API removal of image count restrictions
+- **Info Logging**: System logs image counts for monitoring purposes
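+- **Comprehensive Processing**: Documents of any length are processed without dropping page images, although the model's context window still limits how much content fits in a single request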
 
 ### Best Practices for Image Processing
 
diff --git a/lib/idp_common_pkg/idp_common/assessment/README.md b/lib/idp_common_pkg/idp_common/assessment/README.md
@@ -305,8 +305,9 @@ assess each extracted field:
 
 ### Automatic Image Handling
 - Supports both single and multiple document images
-- Automatically limits to 100 images per Bedrock constraints
+- Processes all document pages without image count restrictions
 - Graceful fallback when images are unavailable
+- Info logging for image count monitoring
 
 ## Attribute Types and Assessment Formats
 
diff --git a/lib/idp_common_pkg/idp_common/assessment/granular_service.py b/lib/idp_common_pkg/idp_common/assessment/granular_service.py
@@ -152,7 +152,7 @@ def __init__(
             import boto3
 
             dynamodb = boto3.resource("dynamodb", region_name=self.region)
-            self.cache_table = dynamodb.Table(self.cache_table_name)
+            self.cache_table = dynamodb.Table(self.cache_table_name)  # type: ignore[attr-defined]
             logger.info(
                 f"Granular assessment caching enabled using table: {self.cache_table_name}"
             )
@@ -472,13 +472,11 @@ def _build_cached_prompt_base(
             # Add the images if available
             if page_images:
                 if isinstance(page_images, list):
-                    # Multiple images (limit to 100 as per Bedrock constraints)
-                    if len(page_images) > 100:
-                        logger.warning(
-                            f"Found {len(page_images)} images, truncating to 100 due to Bedrock constraints. "
-                            f"{len(page_images) - 100} images will be dropped."
-                        )
-                    for img in page_images[:100]:
+                    # Multiple images - no limit with latest Bedrock API
+                    logger.info(
+                        f"Attaching {len(page_images)} images to granular assessment prompt"
+                    )
+                    for img in page_images:
                         content.append(image.prepare_bedrock_image_attachment(img))
                 else:
                     # Single image
@@ -1134,8 +1132,8 @@ def _is_throttling_exception(self, exception: Exception) -> bool:
         Returns:
             True if exception indicates throttling, False otherwise
         """
-        if hasattr(exception, "response") and "Error" in exception.response:
-            error_code = exception.response["Error"]["Code"]
+        if hasattr(exception, "response") and "Error" in exception.response:  # type: ignore[attr-defined]
+            error_code = exception.response["Error"]["Code"]  # type: ignore[attr-defined]
             return error_code in self.throttling_exceptions
 
         # Check exception class name and message for throttling indicators
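`_is_throttling_exception` needs `# type: ignore[attr-defined]` because `Exception` declares no `response` attribute. A variant that reads the attribute through `getattr()` and checks that it is a dict avoids those ignores. The sketch below assumes the botocore-style shape `{"Error": {"Code": ...}}`; the `THROTTLING_CODES` set and the message-text fallback are illustrative assumptions, not the service's actual `self.throttling_exceptions` list or fallback logic.

```python
from typing import Any

# Assumed codes for illustration; the real list lives in self.throttling_exceptions
THROTTLING_CODES = frozenset(
    {"ThrottlingException", "TooManyRequestsException", "ServiceQuotaExceededException"}
)


def is_throttling_exception(exception: Exception) -> bool:
    """Return True if the exception looks like an AWS throttling error."""
    # getattr() with a default avoids the attr-defined ignores on exception.response
    response: Any = getattr(exception, "response", None)
    if isinstance(response, dict):
        code = response.get("Error", {}).get("Code")
        if code is not None:
            return code in THROTTLING_CODES
    # Hypothetical fallback: look for throttling hints in class name and message
    text = f"{type(exception).__name__} {exception}".lower()
    return "throttl" in text or "too many requests" in text
```

With botocore installed, `botocore.exceptions.ClientError(error_response, operation_name)` exposes the same dict through its `response` attribute, so it takes the first branch.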
diff --git a/lib/idp_common_pkg/idp_common/assessment/service.py b/lib/idp_common_pkg/idp_common/assessment/service.py
@@ -428,13 +428,11 @@ def _build_content_with_image_placeholder(
         # Add the image if available
         if image_content:
             if isinstance(image_content, list):
-                # Multiple images (limit to 100 as per Bedrock constraints)
-                if len(image_content) > 100:
-                    logger.warning(
-                        f"Found {len(image_content)} images, truncating to 100 due to Bedrock constraints. "
-                        f"{len(image_content) - 100} images will be dropped."
-                    )
-                for img in image_content[:100]:
+                # Multiple images - no limit with latest Bedrock API
+                logger.info(
+                    f"Attaching {len(image_content)} images to assessment prompt"
+                )
+                for img in image_content:
                     content.append(image.prepare_bedrock_image_attachment(img))
             else:
                 # Single image
diff --git a/lib/idp_common_pkg/idp_common/extraction/agentic_idp.py b/lib/idp_common_pkg/idp_common/extraction/agentic_idp.py
@@ -691,6 +691,37 @@ def _build_model_config(
     return model_config
 
 
+def _get_inference_params(temperature: float, top_p: float | None) -> dict[str, float]:
+    """
+    Get inference parameters ensuring temperature and top_p are mutually exclusive.
+
+    Some Bedrock models don't allow both temperature and top_p to be specified.
+    This follows the same logic as bedrock/client.py lines 348-364.
+
+    Args:
+        temperature: Temperature value from config
+        top_p: Top_p value from config (may be None)
+
+    Returns:
+        Dict with only one of temperature or top_p
+    """
+    params: dict[str, float] = {}
+
+    # Only use top_p if temperature is 0.0
+    if top_p is not None and temperature == 0.0:
+        params["top_p"] = top_p
+        logger.debug(
+            "Using top_p for inference (temperature is 0.0)", extra={"top_p": top_p}
+        )
+    else:
+        params["temperature"] = temperature
+        logger.debug(
+            "Using temperature for inference", extra={"temperature": temperature}
+        )
+
+    return params
+
+
 def _prepare_prompt_content(
     prompt: str | Message | Image.Image,
     page_images: list[bytes] | None,
@@ -735,20 +766,18 @@ def _prepare_prompt_content(
     else:
         prompt_content = [ContentBlock(text=str(prompt))]
 
-    # Add page images if provided (limit to 100 as per Bedrock constraints)
+    # Add page images if provided - no limit with latest Bedrock API
     if page_images:
-        if len(page_images) > 100:
-            prompt_content.append(
-                ContentBlock(
-                    text=f"There are {len(page_images)} images, initially you'll see 100 of them, use the view_image tool to see the rest."
-                )
-            )
+        logger.info(
+            "Attaching images to agentic extraction prompt",
+            extra={"image_count": len(page_images)},
+        )
 
         prompt_content += [
             ContentBlock(
                 image=ImageContent(format="png", source=ImageSource(bytes=img_bytes))
             )
-            for img_bytes in page_images[:100]
+            for img_bytes in page_images
         ]
 
     # Add existing data context if provided
@@ -1003,8 +1032,17 @@ async def structured_output_async(
 
     # Track token usage
     token_usage = _initialize_token_usage()
+
+    # Get inference params ensuring temperature and top_p are mutually exclusive
+    inference_params = _get_inference_params(
+        temperature=config.extraction.temperature, top_p=config.extraction.top_p
+    )
+
     agent = Agent(
-        model=BedrockModel(**model_config),  # pyright: ignore[reportArgumentType]
+        model=BedrockModel(
+            **model_config,
+            **inference_params,
+        ),  # pyright: ignore[reportArgumentType]
         tools=tools,
         system_prompt=final_system_prompt,
         state={
@@ -1076,8 +1114,17 @@ async def structured_output_async(
             connect_timeout=connect_timeout,
             read_timeout=read_timeout,
         )
+
+        # Get inference params for review agent ensuring temperature and top_p are mutually exclusive
+        review_inference_params = _get_inference_params(
+            temperature=config.extraction.temperature, top_p=config.extraction.top_p
+        )
+
         agent = Agent(
-            model=BedrockModel(**review_model_config),  # pyright: ignore[reportArgumentType]
+            model=BedrockModel(
+                **review_model_config,
+                **review_inference_params,
+            ),  # pyright: ignore[reportArgumentType]
             tools=tools,
             system_prompt=f"{final_system_prompt}",
             state={
@@ -1094,7 +1141,7 @@ async def structured_output_async(
         )
 
         review_response = await invoke_agent_with_retry(
-            agent=agent, input=review_prompt
+            agent=agent, input=[review_prompt]
         )
         logger.debug("Review response received", extra={"review_completed": True})
 
diff --git a/lib/idp_common_pkg/idp_common/extraction/service.py b/lib/idp_common_pkg/idp_common/extraction/service.py
@@ -148,13 +148,12 @@ def _get_default_prompt_content(self) -> list[dict[str, Any]]:
         """
         content = [{"text": task_prompt}]
 
-        # Add image attachments to the content (limit to 100 images as per Bedrock constraints)
+        # Add image attachments to the content - no limit with latest Bedrock API
         if self._page_images:
             logger.info(
-                f"Attaching images to default prompt, for {len(self._page_images)} pages."
+                f"Attaching {len(self._page_images)} images to default extraction prompt"
             )
-            # Limit to 100 images as per Bedrock constraints
-            for img in self._page_images[:100]:
+            for img in self._page_images:
                 content.append(image.prepare_bedrock_image_attachment(img))
 
         return content
@@ -354,7 +353,7 @@ def _build_text_and_image_content(
 
     def _prepare_image_attachments(self, image_content: Any) -> list[dict[str, Any]]:
         """
-        Prepare image attachments for Bedrock, limiting to 100 images.
+        Prepare image attachments for Bedrock - no image limit.
 
         Args:
             image_content: Single image or list of images
@@ -365,13 +364,9 @@ def _prepare_image_attachments(self, image_content: Any) -> list[dict[str, Any]]
         attachments: list[dict[str, Any]] = []
 
         if isinstance(image_content, list):
-            # Multiple images (limit to 100 as per Bedrock constraints)
-            if len(image_content) > 100:
-                logger.warning(
-                    f"Found {len(image_content)} images, truncating to 100 due to Bedrock constraints. "
-                    f"{len(image_content) - 100} images will be dropped."
-                )
-            for img in image_content[:100]:
+            # Multiple images - no limit with latest Bedrock API
+            logger.info(f"Attaching {len(image_content)} images to extraction prompt")
+            for img in image_content:
                 attachments.append(image.prepare_bedrock_image_attachment(img))
         else:
             # Single image
diff --git a/lib/idp_common_pkg/idp_common/utils/bedrock_utils.py b/lib/idp_common_pkg/idp_common/utils/bedrock_utils.py
diff --git a/src/ui/src/components/json-schema-builder/SchemaCanvas.jsx b/src/ui/src/components/json-schema-builder/SchemaCanvas.jsx
diff --git a/src/ui/src/components/json-schema-builder/constraints/StringConstraints.jsx b/src/ui/src/components/json-schema-builder/constraints/StringConstraints.jsx
diff --git a/src/ui/src/components/json-schema-builder/constraints/ValueConstraints.jsx b/src/ui/src/components/json-schema-builder/constraints/ValueConstraints.jsx
diff --git a/src/ui/src/constants/schemaConstants.js b/src/ui/src/constants/schemaConstants.js