Commit 7af81bf

Merge branch 'fix/change-temp-top_p-logic' into 'develop'
> Fix temperature and top_p parameter logic to enable deterministic output with temperature=0.0
>
> See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!446
2 parents 52f81f0 + 3332bc4 commit 7af81bf

27 files changed: +150 -128 lines changed

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -17,6 +17,17 @@ SPDX-License-Identifier: MIT-0
 - Resolves #146

 ### Changed
+
+- **Improved Temperature and Top_P Parameter Logic for Deterministic Output**
+- Changed inference parameter selection logic to allow `temperature=0.0` for deterministic output (recommended by Anthropic and other model providers)
+- **New Logic**: Uses `top_p` only when it has a positive value (> 0); otherwise uses `temperature` including `temperature=0.0`
+- **Previous Logic**: Used `top_p` whenever `temperature=0.0`, preventing proper deterministic configuration
+- **Key Benefits**: Enables proper deterministic output with `temperature=0.0`, more intuitive parameter behavior, aligns with model provider best practices (Anthropic recommends `temperature=0` for consistent outputs)
+- **Affected Components**: Bedrock client (`lib/idp_common_pkg/idp_common/bedrock/client.py`), Agentic extraction service (`lib/idp_common_pkg/idp_common/extraction/agentic_idp.py`)
+- **Configuration Guidance**: Set `top_p: 0` to use `temperature` parameter; set `top_p` to positive value to override temperature
+- Set temperature to 0.0 in discovery config for deterministic discovery output (was previously set to 1.0)
+- Set top_p to 0.0 in all repo config files to force use of temperature setting by default.
+
 - Removed page image limit entirely across all IDP services (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation, with info logging to track image counts for monitoring purposes.
 - Resolves #147
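The selection rule described in the changelog entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `client.py` or `agentic_idp.py`: the function names `select_sampling_params` and `legacy_select_sampling_params`, the `_to_float` helper, and the returned dictionary keys are all hypothetical. It also assumes that config values arrive as strings such as `"0.0"`, as they appear in the repo config files.

```python
from typing import Dict, Optional, Union

# Config values in the YAML files are quoted strings, but accept numbers too.
ConfigValue = Optional[Union[str, int, float]]


def _to_float(value: ConfigValue, default: float = 0.0) -> float:
    """Coerce a config value (often a string such as "0.0") to a float."""
    if value is None or value == "":
        return default
    return float(value)


def select_sampling_params(temperature: ConfigValue, top_p: ConfigValue) -> Dict[str, float]:
    """New logic (hypothetical sketch): use top_p only when it is positive."""
    top_p_value = _to_float(top_p)
    if top_p_value > 0:
        return {"top_p": top_p_value}
    # Otherwise fall back to temperature, so a value of 0.0 is honored.
    return {"temperature": _to_float(temperature)}


def legacy_select_sampling_params(temperature: ConfigValue, top_p: ConfigValue) -> Dict[str, float]:
    """Previous logic as the changelog describes it: top_p whenever temperature is 0.0."""
    temperature_value = _to_float(temperature)
    if temperature_value == 0.0:
        return {"top_p": _to_float(top_p)}
    return {"temperature": temperature_value}


if __name__ == "__main__":
    print(select_sampling_params("0.0", "0.0"))         # {'temperature': 0.0}
    print(select_sampling_params("0.0", "0.1"))         # {'top_p': 0.1}
    print(legacy_select_sampling_params("0.0", "0.1"))  # {'top_p': 0.1}
```

With the values this commit sets in the config files (`top_p: "0.0"`, `temperature: "0.0"`), the sketch returns only `temperature`, which matches the configuration guidance in the changelog entry.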

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 8 additions & 8 deletions
@@ -6,7 +6,7 @@ assessment:
 default_confidence_threshold: '0.8'
 summarization:
 enabled: true
-top_p: '0.1'
+top_p: "0.0"
 max_tokens: '4096'
 top_k: '5'
 task_prompt: >-
@@ -62,14 +62,14 @@ summarization:

 Do not include any text, explanations, or notes outside of this JSON
 structure. The JSON must be properly formatted and parseable.
-temperature: '0.0'
+temperature: "0.0"
 model: us.amazon.nova-premier-v1:0
 system_prompt: >-
 You are a document summarization expert who can analyze and summarize documents from various domains including medical, financial, legal, and general business documents. Your task is to create a summary that captures the key information, main points, and important details from the document. Your output must be in valid JSON format. \nSummarization Style: Balanced\\nCreate a balanced summary that provides a moderate level of detail. Include the main points and key supporting information, while maintaining the document's overall structure. Aim for a comprehensive yet concise summary.\n Your output MUST be in valid JSON format with markdown content. You MUST strictly adhere to the output format specified in the instructions.
 evaluation:
 enabled: true
 llm_method:
-top_p: '0.1'
+top_p: "0.0"
 max_tokens: '4096'
 top_k: '5'
 task_prompt: >-
@@ -101,7 +101,7 @@ evaluation:
 "score": 0.0 to 1.0,
 "reason": "Your explanation here"
 }
-temperature: '0.0'
+temperature: "0.0"
 model: us.anthropic.claude-3-haiku-20240307-v1:0
 system_prompt: >-
 You are an evaluator that helps determine if the predicted and expected values match for document attribute extraction. You will consider the context and meaning rather than just exact string matching.
@@ -151,8 +151,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: '0.1'
-temperature: '1.0'
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -179,8 +179,8 @@ discovery:
 field definitions.
 max_tokens: '10000'
 without_ground_truth:
-top_p: '0.1'
-temperature: '1.0'
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.

config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 9 additions & 9 deletions
@@ -100,7 +100,7 @@ classification:
 image:
 target_height: ""
 target_width: ""
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -243,7 +243,7 @@ extraction:
 image:
 target_height: ""
 target_width: ""
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 task_prompt: >-
@@ -338,7 +338,7 @@ extraction:
 You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
 summarization:
 enabled: true
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -411,7 +411,7 @@ assessment:
 simple_batch_size: "3"
 list_batch_size: "1"
 default_confidence_threshold: "0.8"
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 temperature: "0.0"
@@ -560,7 +560,7 @@ assessment:
 evaluation:
 enabled: true
 llm_method:
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -642,8 +642,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -670,8 +670,8 @@ discovery:
 field definitions.
 max_tokens: "10000"
 without_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.

config_library/pattern-2/criteria-validation/config.yaml

Lines changed: 4 additions & 4 deletions
@@ -232,8 +232,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: '0.1'
-temperature: '1.0'
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -260,8 +260,8 @@ discovery:
 field definitions.
 max_tokens: '10000'
 without_ground_truth:
-top_p: '0.1'
-temperature: '1.0'
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.

config_library/pattern-2/lending-package-sample/config.yaml

Lines changed: 9 additions & 9 deletions
@@ -1194,7 +1194,7 @@ classification:
 target_width: ""
 model: us.amazon.nova-pro-v1:0
 temperature: "0.0"
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 system_prompt: >-
@@ -1256,7 +1256,7 @@ extraction:
 image:
 target_width: ""
 target_height: ""
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 task_prompt: >-
@@ -1351,7 +1351,7 @@ extraction:
 You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
 summarization:
 enabled: true
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -1435,7 +1435,7 @@ assessment:
 simple_batch_size: "3"
 list_batch_size: "1"
 default_confidence_threshold: "0.8"
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 temperature: "0.0"
@@ -1583,7 +1583,7 @@ assessment:
 evaluation:
 enabled: true
 llm_method:
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -1665,8 +1665,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -1693,8 +1693,8 @@ discovery:
 field definitions.
 max_tokens: "10000"
 without_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.

config_library/pattern-2/rvl-cdip-package-sample-with-few-shot-examples/config.yaml

Lines changed: 9 additions & 9 deletions
@@ -816,7 +816,7 @@ classification:
 sectionSplitting: llm_determined
 model: us.amazon.nova-pro-v1:0
 temperature: "0.0"
-top_p: "0.1"
+top_p: "0.0"
 top_k: "5"
 max_tokens: "4096"
 system_prompt: >-
@@ -869,7 +869,7 @@ extraction:
 target_width: ""
 model: us.amazon.nova-pro-v1:0
 temperature: "0.0"
-top_p: "0.1"
+top_p: "0.0"
 top_k: "5"
 max_tokens: "4096"
 system_prompt: >
@@ -973,7 +973,7 @@ assessment:
 simple_batch_size: "3"
 list_batch_size: "1"
 default_confidence_threshold: "0.8"
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 temperature: "0.0"
@@ -1121,7 +1121,7 @@ assessment:
 evaluation:
 enabled: true
 llm_method:
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >
@@ -1165,7 +1165,7 @@ evaluation:
 context and meaning rather than just exact string matching.
 summarization:
 enabled: true
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >
@@ -1287,8 +1287,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -1315,8 +1315,8 @@ discovery:
 field definitions.
 max_tokens: "10000"
 without_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.

config_library/pattern-2/rvl-cdip-package-sample/config.yaml

Lines changed: 9 additions & 9 deletions
@@ -767,7 +767,7 @@ classification:
 image:
 target_height: ""
 target_width: ""
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -910,7 +910,7 @@ extraction:
 image:
 target_width: ""
 target_height: ""
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 task_prompt: >-
@@ -1005,7 +1005,7 @@ extraction:
 You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
 summarization:
 enabled: true
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -1077,7 +1077,7 @@ assessment:
 simple_batch_size: "3"
 list_batch_size: "1"
 default_confidence_threshold: "0.8"
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "10000"
 top_k: "5"
 temperature: "0.0"
@@ -1225,7 +1225,7 @@ assessment:
 evaluation:
 enabled: true
 llm_method:
-top_p: "0.1"
+top_p: "0.0"
 max_tokens: "4096"
 top_k: "5"
 task_prompt: >-
@@ -1307,8 +1307,8 @@ discovery:
 ]
 }
 with_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
 <GROUND_TRUTH_REFERENCE>
@@ -1335,8 +1335,8 @@ discovery:
 field definitions.
 max_tokens: "10000"
 without_ground_truth:
-top_p: "0.1"
-temperature: "1.0"
+top_p: "0.0"
+temperature: "0.0"
 user_prompt: >-
 This image contains forms data. Analyze the form line by line.
 Image may contains multiple pages, process all the pages.
