Commit aaf8adb

Merge remote-tracking branch 'origin/develop' into feature/mcp-server
2 parents a92ab27 + 59f3738 commit aaf8adb

36 files changed: +1323 -219 lines changed

CHANGELOG.md

Lines changed: 36 additions & 0 deletions
@@ -5,7 +5,43 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+### Added
+
+- **Amazon Nova 2 Lite Model Support**
+- Added support for Amazon Nova 2 Lite model (`us.amazon.nova-2-lite-v1:0`, `eu.amazon.nova-2-lite-v1:0`)
+- Available for configuration across all document processing steps
+- Added to prompt caching supported models list
+
+- **Anthropic Claude Opus 4.5 Model Support**
+- Added support for Claude Opus 4.5 model (`us.anthropic.claude-opus-4-5-20251101-v1:0`, `eu.anthropic.claude-opus-4-5-20251101-v1:0`)
+- Available for configuration across all document processing steps
+- Added to prompt caching supported models list
+
+- **Qwen Model Support**
+- Added support for Qwen 3 VL model (`qwen.qwen3-vl-235b-a22b`)
+- Available for configuration in document processing workflows
+
+- **Configurable Section Splitting Strategies for Enhanced Document Segmentation Control**
+- Added new `sectionSplitting` configuration option to control how classified pages are grouped into document sections
+- **Three Strategies Available**:
+- `disabled`: Entire document treated as single section with first detected class (simplest case)
+- `page`: One section per page preventing automatic joining of same-type documents (deterministic, solves Issue #146)
+- `llm_determined`: Uses LLM boundary detection with "Start"/"Continue" indicators (default, maintains existing behavior)
+- **Key Benefits**: Deterministic splitting for long documents with multiple same-type forms (e.g., multiple W-2s, multiple invoices), eliminates LLM boundary detection failures for critical government form processing, provides flexibility across simple to complex document scenarios
+- Resolves #146
+
 ### Changed
+
+- **Improved Temperature and Top_P Parameter Logic for Deterministic Output**
+- Changed inference parameter selection logic to allow `temperature=0.0` for deterministic output (recommended by Anthropic and other model providers)
+- **New Logic**: Uses `top_p` only when it has a positive value (> 0); otherwise uses `temperature` including `temperature=0.0`
+- **Previous Logic**: Used `top_p` whenever `temperature=0.0`, preventing proper deterministic configuration
+- **Key Benefits**: Enables proper deterministic output with `temperature=0.0`, more intuitive parameter behavior, aligns with model provider best practices (Anthropic recommends `temperature=0` for consistent outputs)
+- **Affected Components**: Bedrock client (`lib/idp_common_pkg/idp_common/bedrock/client.py`), Agentic extraction service (`lib/idp_common_pkg/idp_common/extraction/agentic_idp.py`)
+- **Configuration Guidance**: Set `top_p: 0` to use `temperature` parameter; set `top_p` to positive value to override temperature
+- Set temperature to 0.0 in discovery config for deterministic discovery output (was previously set to 1.0)
+- Set top_p to 0.0 in all repo config files to force use of temperature setting by default.
+
 - Removed page image limit entirely across all IDP services (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation, with info logging to track image counts for monitoring purposes.
 - Resolves #147
 

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 54 additions & 8 deletions
@@ -6,7 +6,7 @@ assessment:
 default_confidence_threshold: '0.8'
 summarization:
 enabled: true
-top_p: '0.1'
+top_p: "0.0"
 max_tokens: '4096'
 top_k: '5'
 task_prompt: >-
@@ -62,14 +62,14 @@ summarization:
 
 Do not include any text, explanations, or notes outside of this JSON
 structure. The JSON must be properly formatted and parseable.
-temperature: '0.0'
+temperature: "0.0"
 model: us.amazon.nova-premier-v1:0
 system_prompt: >-
 You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 evaluation:
 enabled: true
 llm_method:
-top_p: '0.1'
+top_p: "0.0"
 max_tokens: '4096'
 top_k: '5'
 task_prompt: >-
@@ -101,7 +101,7 @@ evaluation:
 "score": 0.0 to 1.0,
 "reason": "Your explanation here"
 }
-temperature: '0.0'
+temperature: "0.0"
 model: us.anthropic.claude-3-haiku-20240307-v1:0
 system_prompt: >-
 You are an evaluator that helps determine if the predicted and expected values match for document attribute extraction. You will consider the context and meaning rather than just exact string matching.
@@ -151,8 +151,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: '0.1'
-temperature: '1.0'
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -179,8 +179,8 @@ discovery:
 field definitions.
 max_tokens: '10000'
 without_ground_truth:
-top_p: '0.1'
-temperature: '1.0'
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.
@@ -336,6 +336,16 @@ pricing:
 price: '2.5E-6'
 - name: outputTokens
 price: '1.25E-5'
+- name: bedrock/us.amazon.nova-2-lite-v1:0
+units:
+- name: inputTokens
+price: '3.0E-7'
+- name: outputTokens
+price: '2.5E-6'
+- name: cacheReadInputTokens
+price: '7.5E-8'
+- name: cacheWriteInputTokens
+price: '3.0E-7'
 - name: bedrock/us.anthropic.claude-3-haiku-20240307-v1:0
 units:
 - name: inputTokens
@@ -442,6 +452,16 @@ pricing:
 price: '1.5E-6'
 - name: cacheWriteInputTokens
 price: '1.875E-5'
+- name: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0
+units:
+- name: inputTokens
+price: '5.0E-06'
+- name: outputTokens
+price: '2.5E-05'
+- name: cacheReadInputTokens
+price: '5.0E-07'
+- name: cacheWriteInputTokens
+price: '6.25E-06'
 # EU model pricing
 - name: bedrock/eu.amazon.nova-lite-v1:0
 units:
@@ -463,6 +483,16 @@ pricing:
 price: '2.6E-7'
 - name: cacheWriteInputTokens
 price: '1.0E-6'
+- name: bedrock/eu.amazon.nova-2-lite-v1:0
+units:
+- name: inputTokens
+price: '3.9E-7'
+- name: outputTokens
+price: '3.27E-6'
+- name: cacheReadInputTokens
+price: '9.75E-8'
+- name: cacheWriteInputTokens
+price: '3.9E-7'
 - name: bedrock/eu.anthropic.claude-3-haiku-20240307-v1:0
 units:
 - name: inputTokens
@@ -529,6 +559,22 @@ pricing:
 price: '6.6E-7'
 - name: cacheWriteInputTokens
 price: '8.25E-6'
+- name: bedrock/eu.anthropic.claude-opus-4-5-20251101-v1:0
+units:
+- name: inputTokens
+price: '5.0E-6'
+- name: outputTokens
+price: '2.5E-5'
+- name: cacheReadInputTokens
+price: '5.0E-7'
+- name: cacheWriteInputTokens
+price: '6.25E-6'
+- name: bedrock/qwen.qwen3-vl-235b-a22b
+units:
+- name: inputTokens
+price: '5.3E-7'
+- name: outputTokens
+price: '2.66E-6'
 # AWS Lambda pricing (US East - N. Virginia)
 - name: lambda/requests
 units:
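
To show how the `pricing` entries added above might be consumed, here is a minimal sketch that multiplies usage counts by the configured unit prices. It assumes each `price` is a cost per single unit (per token); the diff does not state the unit or currency. The `PRICES` dict and `estimate_cost` function are hypothetical and only mirror the name/units/price shape of the YAML. Prices stay as strings and are parsed with `Decimal`, since the config stores them as quoted scientific notation.

```python
from decimal import Decimal
from typing import Dict, Mapping

# Prices copied from the Nova 2 Lite entry in this diff, keyed by unit name.
# The flat dict layout is an illustration, not the project's data model.
PRICES: Dict[str, Dict[str, str]] = {
    "bedrock/us.amazon.nova-2-lite-v1:0": {
        "inputTokens": "3.0E-7",
        "outputTokens": "2.5E-6",
        "cacheReadInputTokens": "7.5E-8",
        "cacheWriteInputTokens": "3.0E-7",
    },
}


def estimate_cost(
    service: str,
    usage: Mapping[str, int],
    prices: Mapping[str, Mapping[str, str]] = PRICES,
) -> Decimal:
    """Sum unit_price * count over every metered unit (hypothetical helper)."""
    unit_prices = prices[service]
    return sum(
        (Decimal(unit_prices[unit]) * count for unit, count in usage.items()),
        Decimal(0),
    )


if __name__ == "__main__":
    usage = {"inputTokens": 1_000_000, "outputTokens": 100_000}
    print(estimate_cost("bedrock/us.amazon.nova-2-lite-v1:0", usage))
```

A unit that is missing from the price table raises `KeyError` here, which makes a missing pricing entry visible rather than silently costing it at zero.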

config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 56 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ classification:
100100
image:
101101
target_height: ""
102102
target_width: ""
103-
top_p: "0.1"
103+
top_p: "0.0"
104104
max_tokens: "4096"
105105
top_k: "5"
106106
task_prompt: >-
@@ -238,11 +238,12 @@ classification:
 system_prompt: >-
 You are a document classification expert who can analyze and classify multiple documents and their page boundaries within a document package from various domains. Your task is to determine the document type based on its content and structure, using the provided document type definitions. Your output must be valid JSON according to the requested format.
 classificationMethod: textbasedHolisticClassification
+sectionSplitting: llm_determined
 extraction:
 image:
 target_height: ""
 target_width: ""
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 task_prompt: >-
@@ -337,7 +338,7 @@ extraction:
 You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
 summarization:
 enabled: true
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -410,7 +411,7 @@ assessment:
 simple_batch_size: "3"
 list_batch_size: "1"
 default_confidence_threshold: "0.8"
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 temperature: "0.0"
@@ -559,7 +560,7 @@ assessment:
 evaluation:
 enabled: true
 llm_method:
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -641,8 +642,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -669,8 +670,8 @@ discovery:
 field definitions.
 max_tokens: "10000"
 without_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.
@@ -842,6 +843,16 @@ pricing:
 price: "2.5E-6"
 - name: outputTokens
 price: "1.25E-5"
+- name: bedrock/us.amazon.nova-2-lite-v1:0
+units:
+- name: inputTokens
+price: '3.0E-7'
+- name: outputTokens
+price: '2.5E-6'
+- name: cacheReadInputTokens
+price: '7.5E-8'
+- name: cacheWriteInputTokens
+price: '3.0E-7'
 - name: bedrock/us.anthropic.claude-3-haiku-20240307-v1:0
 units:
 - name: inputTokens
@@ -948,6 +959,16 @@ pricing:
 price: "1.5E-6"
 - name: cacheWriteInputTokens
 price: "1.875E-5"
+- name: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0
+units:
+- name: inputTokens
+price: '5.0E-06'
+- name: outputTokens
+price: '2.5E-05'
+- name: cacheReadInputTokens
+price: '5.0E-07'
+- name: cacheWriteInputTokens
+price: '6.25E-06'
 # EU model pricing
 - name: bedrock/eu.amazon.nova-lite-v1:0
 units:
@@ -969,6 +990,16 @@ pricing:
 price: "2.6E-7"
 - name: cacheWriteInputTokens
 price: "1.0E-6"
+- name: bedrock/eu.amazon.nova-2-lite-v1:0
+units:
+- name: inputTokens
+price: '3.9E-7'
+- name: outputTokens
+price: '3.27E-6'
+- name: cacheReadInputTokens
+price: '9.75E-8'
+- name: cacheWriteInputTokens
+price: '3.9E-7'
 - name: bedrock/eu.anthropic.claude-3-haiku-20240307-v1:0
 units:
 - name: inputTokens
@@ -1035,6 +1066,22 @@ pricing:
 price: "6.6E-7"
 - name: cacheWriteInputTokens
 price: "8.25E-6"
+- name: bedrock/eu.anthropic.claude-opus-4-5-20251101-v1:0
+units:
+- name: inputTokens
+price: '5.0E-6'
+- name: outputTokens
+price: '2.5E-5'
+- name: cacheReadInputTokens
+price: '5.0E-7'
+- name: cacheWriteInputTokens
+price: '6.25E-6'
+- name: bedrock/qwen.qwen3-vl-235b-a22b
+units:
+- name: inputTokens
+price: '5.3E-7'
+- name: outputTokens
+price: '2.66E-6'
 # AWS Lambda pricing (US East - N. Virginia)
 - name: lambda/requests
 units:
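
The `sectionSplitting: llm_determined` setting added to this config selects one of the three strategies described in the CHANGELOG. The sketch below illustrates how the three strategies could group pages; it is not the project's implementation. The function `split_sections` and the `(class, boundary)` page tuples are hypothetical. For `llm_determined`, it assumes a new section starts on a "Start" indicator or when the class changes; the source only says that "Start"/"Continue" indicators are used.

```python
from typing import List, Sequence, Tuple

Page = Tuple[str, str]            # (document class, "Start" or "Continue")
Section = Tuple[str, List[int]]   # (document class, 1-based page numbers)


def split_sections(pages: Sequence[Page], strategy: str = "llm_determined") -> List[Section]:
    """Group classified pages into sections (hypothetical illustration)."""
    if not pages:
        return []
    if strategy == "disabled":
        # Whole document is one section, labelled with the first detected class.
        return [(pages[0][0], list(range(1, len(pages) + 1)))]
    if strategy == "page":
        # One section per page: same-type neighbours are never joined.
        return [(cls, [number]) for number, (cls, _) in enumerate(pages, start=1)]
    if strategy == "llm_determined":
        sections: List[Section] = []
        for number, (cls, boundary) in enumerate(pages, start=1):
            # Assumption: a class change also opens a new section.
            if not sections or boundary == "Start" or sections[-1][0] != cls:
                sections.append((cls, [number]))
            else:
                sections[-1][1].append(number)
        return sections
    raise ValueError(f"unknown sectionSplitting strategy: {strategy!r}")


if __name__ == "__main__":
    pages = [("W2", "Start"), ("W2", "Continue"), ("W2", "Start"), ("Invoice", "Start")]
    for name in ("disabled", "page", "llm_determined"):
        print(name, split_sections(pages, name))
```

With the sample pages, `page` always yields four single-page sections regardless of the boundary indicators, which is the deterministic behaviour the changelog describes for packets containing several forms of the same type.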
