Commit d7e2c61

Author: Bob Strahan
Merge branch 'develop' into feature/add-claude4.5-model
2 parents: 1737130 + 111b085

File tree

40 files changed: +4633, -733 lines


.gitlab-ci.yml

Lines changed: 0 additions & 1 deletion

@@ -51,7 +51,6 @@ developer_tests:
 
 integration_tests:
   stage: integration_tests
-  timeout: 2h
   # variables:
   # # In order to run tests in another account, add a AWS_CREDS_TARGET_ROLE variable to the Gitlab pipeline variables.
   #   AWS_CREDS_TARGET_ROLE: ${AWS_CREDS_TARGET_ROLE}

CHANGELOG.md

Lines changed: 17 additions & 0 deletions

@@ -5,6 +5,23 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+## [0.3.19]
+
+### Added
+
+- **Error Analyzer (Troubleshooting Tool) for AI-Powered Failure Diagnosis**
+  - Introduced an intelligent AI-powered troubleshooting agent that automatically diagnoses document processing failures using Claude Sonnet 4 with the Strands agent framework
+  - **Key Capabilities**: Natural language query interface, intelligent routing between document-specific and system-wide analysis, multi-source data correlation (CloudWatch Logs, DynamoDB, Step Functions), root cause identification with actionable recommendations, evidence-based analysis with collapsible log details
+  - **Web UI Integration**: Accessible via the "Troubleshoot" button on failed documents, with real-time job status, progress tracking, automatic job resumption, and formatted results (Root Cause, Recommendations, Evidence sections)
+  - **Tool Ecosystem**: 8 specialized tools, including analyze_errors (main router), analyze_document_failure, analyze_recent_system_errors, CloudWatch log search tools, DynamoDB integration tools, and Lambda context retrieval; additional tools will be added as the feature evolves
+  - **Configuration**: Configurable via the Web UI, including model selection (Claude Sonnet 4 recommended), system prompt customization, max_log_events (default: 5), and time_range_hours_default (default: 24)
+  - **Documentation**: Comprehensive guide in `docs/error-analyzer.md` with architecture diagrams, usage examples, best practices, and a troubleshooting guide
+
+### Fixed
+- Problem with setting correctly formatted WAF IPv4 CIDR range - #73
+
+
+
 ## [0.3.18]
 
 ### Added
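Editor's note on the new settings above: max_log_events and time_range_hours_default surface in the pattern config under agents.error_analyzer.parameters (see the config.yaml diff below). As a minimal sketch, assuming PyYAML is available and using a hypothetical config path and helper name, a caller could read them with the documented defaults like this:

import yaml  # PyYAML, assumed available

def load_error_analyzer_params(path="config.yaml"):  # hypothetical path
    # Read agents.error_analyzer.parameters, falling back to the
    # defaults documented in the changelog entry above.
    with open(path) as f:
        cfg = yaml.safe_load(f) or {}
    params = cfg.get("agents", {}).get("error_analyzer", {}).get("parameters", {})
    return {
        "max_log_events": params.get("max_log_events", 5),                       # default: 5
        "time_range_hours_default": params.get("time_range_hours_default", 24),  # default: 24
    }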

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 167 additions & 101 deletions

@@ -104,6 +104,173 @@ evaluation:
   model: us.anthropic.claude-3-haiku-20240307-v1:0
   system_prompt: >-
     You are an evaluator that helps determine if the predicted and expected values match for document attribute extraction. You will consider the context and meaning rather than just exact string matching.
+discovery:
+  output_format:
+    sample_json: |-
+      {
+        "document_class" : "Form-1040",
+        "document_description" : "Brief summary of the document",
+        "groups" : [
+          {
+            "name" : "PersonalInformation",
+            "description" : "Personal information of Tax payer",
+            "attributeType" : "group",
+            "groupAttributes" : [
+              {
+                "name": "FirstName",
+                "dataType" : "string",
+                "description" : "First Name of Taxpayer"
+              },
+              {
+                "name": "Age",
+                "dataType" : "number",
+                "description" : "Age of Taxpayer"
+              }
+            ]
+          },
+          {
+            "name" : "Dependents",
+            "description" : "Dependents of taxpayer",
+            "attributeType" : "list",
+            "listItemTemplate": {
+              "itemAttributes" : [
+                {
+                  "name": "FirstName",
+                  "dataType" : "string",
+                  "description" : "Dependent first name"
+                },
+                {
+                  "name": "Age",
+                  "dataType" : "number",
+                  "description" : "Dependent Age"
+                }
+              ]
+            }
+          }
+        ]
+      }
+  with_ground_truth:
+    top_p: '0.1'
+    temperature: '1.0'
+    user_prompt: >-
+      This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
+      <GROUND_TRUTH_REFERENCE>
+      {ground_truth_json}
+      </GROUND_TRUTH_REFERENCE>
+      Ground truth reference JSON has the fields we are interested in extracting from the document/image. Use the ground truth to optimize field extraction. Match field names, data types, and groupings from the reference.
+      Image may contain multiple pages, process all pages.
+      Extract all field names including those without values.
+      Do not change the group name and field name from ground truth in the extracted data json.
+      Add field_description field for every field which will contain instruction to LLM to extract the field data from the image/document. Add data_type field for every field.
+      Add two fields document_class and document_description.
+      For document_class generate a short name based on the document content like W4, I-9, Paystub.
+      For document_description generate a description about the document in less than 50 words.
+      If the group repeats and follows table format, update the attributeType as "list".
+      Do not extract the values.
+      Format the extracted data using the below JSON format:
+      Format the extracted groups and fields using the below JSON format:
+
+    model_id: us.amazon.nova-pro-v1:0
+    system_prompt: >-
+      You are an expert in processing forms. Extracting data from images and
+      documents. Use provided ground truth data as reference to optimize field
+      extraction and ensure consistency with expected document structure and
+      field definitions.
+    max_tokens: '10000'
+  without_ground_truth:
+    top_p: '0.1'
+    temperature: '1.0'
+    user_prompt: >-
+      This image contains forms data. Analyze the form line by line.
+      Image may contains multiple pages, process all the pages.
+      Form may contain multiple name value pair in one line.
+      Extract all the names in the form including the name value pair which doesn't have value.
+      Organize them into groups, extract field_name, data_type and field description
+      Field_name should be less than 60 characters, should not have space use '-' instead of space.
+      field_description is a brief description of the field and the location of the field like box number or line number in the form and section of the form.
+      Field_name should be unique within the group.
+      Add two fields document_class and document_description.
+      For document_class generate a short name based on the document content like W4, I-9, Paystub.
+      For document_description generate a description about the document in less than 50 words.
+
+      Group the fields based on the section they are grouped in the form. Group should have attributeType as "group".
+      If the group repeats and follows table format, update the attributeType as "list".
+      Do not extract the values.
+      Return the extracted data in JSON format.
+      Format the extracted data using the below JSON format:
+      Format the extracted groups and fields using the below JSON format:
+    model_id: us.amazon.nova-pro-v1:0
+    system_prompt: >-
+      You are an expert in processing forms. Extracting data from images and
+      documents. Analyze forms line by line to identify field names, data types,
+      and organizational structure. Focus on creating comprehensive blueprints
+      for document processing without extracting actual values.
+    max_tokens: '10000'
+agents:
+  error_analyzer:
+    model_id: us.anthropic.claude-sonnet-4-20250514-v1:0
+
+    system_prompt: |-
+      You are an intelligent error analysis agent for the GenAI IDP system.
+
+      Use the analyze_errors tool to investigate issues. ALWAYS format your response with exactly these three sections in this order:
+
+      ## Root Cause
+      Identify the specific underlying technical reason why the error occurred. Focus on the primary cause, not symptoms.
+
+      ## Recommendations
+      Provide specific, actionable steps to resolve the issue. Limit to top three recommendations only.
+
+      <details>
+      <summary><strong>Evidence</strong></summary>
+
+      Format log entries with their source information. For each log entry, show:
+      **Log Group:**
+      [full log_group name from tool response]
+
+      **Log Stream:**
+      [full log_stream name from tool response]
+      ```
+      [ERROR] timestamp message (from events data)
+      ```
+
+      </details>
+
+      FORMATTING RULES:
+      - Use the exact three-section structure above
+      - Make Evidence section collapsible using HTML details tags
+      - Extract log_group, log_stream, and events data from tool response
+      - Show complete log group and log stream names without truncation
+      - Present actual log messages from events array in code blocks
+
+      RECOMMENDATION GUIDELINES:
+      For code-related issues or system bugs:
+      - Do not suggest code modifications
+      - Include error details, timestamps, and context
+
+      For configuration-related issues:
+      - Direct users to UI configuration panel
+      - Specify exact configuration section and parameter names
+
+      For operational issues:
+      - Provide immediate troubleshooting steps
+      - Include preventive measures
+
+      TIME RANGE PARSING:
+      - recent/recently: 1 hour
+      - last week: 168 hours
+      - last day/yesterday: 24 hours
+      - No time specified: 24 hours (default)
+
+      SPECIAL CASES:
+      If analysis_type is "document_not_found": explain document cannot be located, focus on verification steps and processing issues.
+
+      DO NOT include code suggestions, technical summaries, or multiple paragraphs of explanation. Keep responses concise and actionable.
+
+      IMPORTANT: Do not include any search quality reflections, search quality scores, or meta-analysis sections in your response. Only provide the three required sections: Root Cause, Recommendations, and Evidence.
+    parameters:
+      max_log_events: 5
+      time_range_hours_default: 24
 pricing:
   - name: bda/documents-custom
     units:
@@ -244,105 +411,4 @@ pricing:
     units:
       - name: gb_seconds
         price: '1.66667E-5' # $0.0000166667 per GB-second ($16.67 per 1M GB-seconds)
-discovery:
-  output_format:
-    sample_json: |-
-      {
-        "document_class" : "Form-1040",
-        "document_description" : "Brief summary of the document",
-        "groups" : [
-          {
-            "name" : "PersonalInformation",
-            "description" : "Personal information of Tax payer",
-            "attributeType" : "group",
-            "groupAttributes" : [
-              {
-                "name": "FirstName",
-                "dataType" : "string",
-                "description" : "First Name of Taxpayer"
-              },
-              {
-                "name": "Age",
-                "dataType" : "number",
-                "description" : "Age of Taxpayer"
-              }
-            ]
-          },
-          {
-            "name" : "Dependents",
-            "description" : "Dependents of taxpayer",
-            "attributeType" : "list",
-            "listItemTemplate": {
-              "itemAttributes" : [
-                {
-                  "name": "FirstName",
-                  "dataType" : "string",
-                  "description" : "Dependent first name"
-                },
-                {
-                  "name": "Age",
-                  "dataType" : "number",
-                  "description" : "Dependent Age"
-                }
-              ]
-            }
-          }
-        ]
-      }
-  with_ground_truth:
-    top_p: '0.1'
-    temperature: '1.0'
-    user_prompt: >-
-      This image contains unstructured data. Analyze the data line by line using the provided ground truth as reference.
-      <GROUND_TRUTH_REFERENCE>
-      {ground_truth_json}
-      </GROUND_TRUTH_REFERENCE>
-      Ground truth reference JSON has the fields we are interested in extracting from the document/image. Use the ground truth to optimize field extraction. Match field names, data types, and groupings from the reference.
-      Image may contain multiple pages, process all pages.
-      Extract all field names including those without values.
-      Do not change the group name and field name from ground truth in the extracted data json.
-      Add field_description field for every field which will contain instruction to LLM to extract the field data from the image/document. Add data_type field for every field.
-      Add two fields document_class and document_description.
-      For document_class generate a short name based on the document content like W4, I-9, Paystub.
-      For document_description generate a description about the document in less than 50 words.
-      If the group repeats and follows table format, update the attributeType as "list".
-      Do not extract the values.
-      Format the extracted data using the below JSON format:
-      Format the extracted groups and fields using the below JSON format:
-
-    model_id: us.amazon.nova-pro-v1:0
-    system_prompt: >-
-      You are an expert in processing forms. Extracting data from images and
-      documents. Use provided ground truth data as reference to optimize field
-      extraction and ensure consistency with expected document structure and
-      field definitions.
-    max_tokens: '10000'
-  without_ground_truth:
-    top_p: '0.1'
-    temperature: '1.0'
-    user_prompt: >-
-      This image contains forms data. Analyze the form line by line.
-      Image may contains multiple pages, process all the pages.
-      Form may contain multiple name value pair in one line.
-      Extract all the names in the form including the name value pair which doesn't have value.
-      Organize them into groups, extract field_name, data_type and field description
-      Field_name should be less than 60 characters, should not have space use '-' instead of space.
-      field_description is a brief description of the field and the location of the field like box number or line number in the form and section of the form.
-      Field_name should be unique within the group.
-      Add two fields document_class and document_description.
-      For document_class generate a short name based on the document content like W4, I-9, Paystub.
-      For document_description generate a description about the document in less than 50 words.
 
-      Group the fields based on the section they are grouped in the form. Group should have attributeType as "group".
-      If the group repeats and follows table format, update the attributeType as "list".
-      Do not extract the values.
-      Return the extracted data in JSON format.
-      Format the extracted data using the below JSON format:
-      Format the extracted groups and fields using the below JSON format:
-    model_id: us.amazon.nova-pro-v1:0
-    system_prompt: >-
-      You are an expert in processing forms. Extracting data from images and
-      documents. Analyze forms line by line to identify field names, data types,
-      and organizational structure. Focus on creating comprehensive blueprints
-      for document processing without extracting actual values.
-    max_tokens: '10000'
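
Editor's note: the discovery sample_json relocated above doubles as a blueprint schema. Entries with attributeType "group" carry groupAttributes, while repeating/tabular entries carry attributeType "list" with a listItemTemplate. A hedged sketch of a structural check over that shape (illustrative only, not part of this commit; the function name is hypothetical):

def check_discovery_output(doc):
    # Validate the shape shown in sample_json: "group" entries need
    # groupAttributes; "list" entries need listItemTemplate.itemAttributes.
    problems = []
    for key in ("document_class", "document_description", "groups"):
        if key not in doc:
            problems.append("missing top-level field: " + key)
    for group in doc.get("groups", []):
        name = group.get("name", "<unnamed>")
        kind = group.get("attributeType")
        if kind == "group":
            if "groupAttributes" not in group:
                problems.append(name + " lacks groupAttributes")
        elif kind == "list":
            if "itemAttributes" not in group.get("listItemTemplate", {}):
                problems.append(name + " lacks listItemTemplate.itemAttributes")
        else:
            problems.append(name + " has unexpected attributeType: " + repr(kind))
    return problems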
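The TIME RANGE PARSING rules in the error_analyzer system prompt are instructions to the model, but the same mapping can be written down deterministically. An illustrative sketch of those documented rules (not repository code), with time_range_hours_default supplying the fallback:

import re

def parse_time_range(query, default_hours=24):
    # Phrase-to-hours mapping taken from the system prompt above.
    q = query.lower()
    if re.search(r"\brecent(ly)?\b", q):
        return 1            # recent/recently: 1 hour
    if "last week" in q:
        return 168          # last week: 168 hours
    if "last day" in q or "yesterday" in q:
        return 24           # last day/yesterday: 24 hours
    return default_hours    # no time specified: 24 hours (default)

# Example: parse_time_range("what failed recently?") returns 1.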
