Commit a2275de

Merge branch 'develop' into feat/multimodal_page_boundary_detection
2 parents e001f78 + 03d576f commit a2275de

89 files changed

+8971
-26
lines changed

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,7 @@ For detailed deployment and testing instructions, see the [Deployment Guide](./d
 - [Architecture](./docs/architecture.md) - Detailed component architecture and data flow
 - [Deployment](./docs/deployment.md) - Build, publish, deploy, and test instructions
 - [Web UI](./docs/web-ui.md) - Web interface features and usage
+- [Agent Analysis](./docs/agent-analysis.md) - Natural language analytics and data visualization feature
 - [Configuration](./docs/configuration.md) - Configuration and customization options
 - [Classification](./docs/classification.md) - Customizing document classification
 - [Extraction](./docs/extraction.md) - Customizing information extraction

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
-0.3.9
+0.3.10-wip

docs/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ This folder contains detailed documentation on various aspects of the GenAI Inte
 - [Architecture](./architecture.md) - Detailed component architecture and data flow
 - [Deployment](./deployment.md) - Build, publish, deploy, and test instructions
 - [Web UI](./web-ui.md) - Web interface features and usage
+- [Agent Analysis](./agent-analysis.md) - Natural language analytics and data visualization feature
 - [Knowledge Base](./knowledge-base.md) - Document knowledge base query feature
 - [Evaluation Framework](./evaluation.md) - Accuracy assessment system
 - [Assessment Feature](./assessment.md) - Extraction confidence evaluation using LLMs

docs/agent-analysis.md

Lines changed: 297 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,297 @@
1+
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2+
SPDX-License-Identifier: MIT-0
3+
4+
# Agent Analysis Feature
5+
6+
The GenAIIDP solution includes an integrated Agent Analysis feature that enables you to interactively query and analyze your processed document data using natural language. This feature leverages AI agents to convert natural language questions into SQL queries, execute them against your document analytics database, and generate visualizations or tables to answer your questions.
7+
8+
## Overview
9+
10+
The Agent Analysis feature provides the following data exploration capabilities:
11+
12+
- **Natural Language Querying**: Ask questions about your document data in plain English
13+
- **Automated SQL Generation**: AI agents convert your questions into optimized SQL queries
14+
- **Interactive Visualizations**: Generate charts, graphs, and tables from query results
15+
- **Real-time Analysis**: Get insights from your processed documents without manual data analysis
16+
- **Secure Code Execution**: Python visualization code runs in isolated Amazon Bedrock AgentCore sandboxes
17+
18+
## Key Features
19+
20+
- **Multi-Modal AI Agent**: Uses advanced language models (Claude 3.7 Sonnet by default) for intelligent query understanding
21+
- **Secure Architecture**: All code execution happens in Amazon Bedrock AgentCore sandboxes, not in Lambda functions
22+
- **Database Schema Discovery**: Agents automatically explore and understand your database structure
23+
- **Flexible Visualization**: Supports multiple chart types including bar charts, line charts, pie charts, and data tables
24+
- **Query History**: Track and manage previous analytics queries through the web interface
25+
- **Real-time Progress**: Live display of agent thought processes and SQL query execution
26+
- **Error Handling**: Intelligent retry logic for failed queries with automatic corrections
27+
28+
## Architecture
29+
30+
### Agent Workflow
31+
32+
1. **Question Processing**: User submits a natural language question through the web UI
33+
2. **Database Discovery**: Agent explores database schema using `get_database_info` tool
34+
3. **SQL Generation**: Agent converts the question into optimized SQL queries with proper column quoting
35+
4. **Query Execution**: SQL queries are executed against Amazon Athena with results stored in S3
36+
5. **Data Processing**: Query results are securely transferred to an Amazon Bedrock AgentCore sandbox
37+
6. **Visualization Generation**: Python code generates charts or tables from the data
38+
7. **Result Display**: Final visualizations are displayed in the web interface
39+
40+
### Security Architecture
41+
42+
The Agent Analysis feature implements a security-first design:
43+
44+
- **Sandboxed Execution**: All Python code runs in Amazon Bedrock AgentCore, completely isolated from the rest of the AWS environment and the internet
45+
- **Secure Data Transfer**: Query results are transferred via S3 and AgentCore APIs, never through direct file system access
46+
- **Session Management**: Code interpreter sessions are properly managed and cleaned up after use
47+
- **Minimal Permissions**: Each component requests only the necessary AWS permissions
48+
- **Audit Trail**: Comprehensive logging and monitoring for security reviews
49+
50+
### Data Flow
51+
52+
```
53+
User Question → Analytics Request Handler → Analytics Processor → Agent Tools:
54+
├── Database Info Tool
55+
├── Athena Query Tool
56+
├── Code Sandbox Tool
57+
└── Python Execution Tool
58+
59+
Results ← Web UI ← AppSync Subscription ← DynamoDB ← Agent Response
60+
```
61+
62+
## Available Tools
63+
64+
The analytics agent has access to four specialized tools:
65+
66+
### 1. Database Information Tool
67+
- **Purpose**: Discovers database schema and table structures
68+
- **Usage**: Automatically called to understand available tables and columns
69+
- **Output**: Table names, column definitions, and data types
70+
71+
### 2. Athena Query Tool
72+
- **Purpose**: Executes SQL queries against the analytics database
73+
- **Features**:
74+
- Automatic column name quoting for Athena compatibility
75+
- Query result storage in S3
76+
- Error handling and retry logic
77+
- Support for both exploratory and final queries
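
The column quoting mentioned above can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes Athena `SELECT` statements, where identifiers are wrapped in double quotes, and the helper names `quote_identifier` and `build_select` as well as the example table and column names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column or table name in double quotes for an Athena SELECT.

    Assumption: an embedded double quote is escaped by doubling it.
    """
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, columns: list[str]) -> str:
    """Build a simple SELECT with every identifier quoted."""
    quoted_columns = ", ".join(quote_identifier(c) for c in columns)
    return f"SELECT {quoted_columns} FROM {quote_identifier(table)}"


# Hypothetical table and column names, chosen to need quoting.
print(build_select("my_table", ["document class", "total-wages"]))
```

Quoting every identifier, rather than only the ones that look unusual, keeps the generated SQL valid when a column name contains spaces, hyphens, or a reserved word.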
78+
79+
### 3. Code Sandbox Tool
80+
- **Purpose**: Securely transfers query results to AgentCore sandbox
81+
- **Security**: Isolated environment with no Lambda file system access
82+
- **Data Format**: CSV files containing query results
83+
84+
### 4. Python Execution Tool
85+
- **Purpose**: Generates visualizations and tables from query data
86+
- **Libraries**: Pandas, Matplotlib, and other standard Python libraries
87+
- **Output**: JSON-formatted charts and tables for web display
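
As a rough sketch of what "JSON-formatted charts" could look like, the snippet below bins numeric values into fixed-width buckets and emits a bar-chart description. The JSON layout (a `type` plus `labels` and `datasets`) is an assumption modeled on Chart.js, not the format the tool is confirmed to use, and `histogram_chart` is a hypothetical helper that uses only the standard library.

```python
import json
from collections import Counter


def histogram_chart(values, bin_width, label="Count"):
    """Bin values into fixed-width buckets and describe them as a bar chart."""
    # Bucket index b covers the half-open range [b * bin_width, (b + 1) * bin_width).
    counts = Counter(int(v // bin_width) for v in values)
    buckets = range(min(counts), max(counts) + 1) if counts else []
    labels = [f"{b * bin_width}-{(b + 1) * bin_width}" for b in buckets]
    data = [counts.get(b, 0) for b in buckets]  # empty buckets are kept as 0
    return {
        "type": "bar",
        "data": {"labels": labels, "datasets": [{"label": label, "data": data}]},
    }


# Made-up earnings figures, binned in 25000-wide buckets.
chart = histogram_chart([10000, 30000, 49000, 75000], 25000)
print(json.dumps(chart, indent=2))
```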
88+
89+
## Using Agent Analysis
90+
91+
### Accessing the Feature
92+
93+
1. Log in to the GenAIIDP Web UI
94+
2. Navigate to the "Document Analytics" section in the main navigation
95+
3. You'll see a chat-like interface for querying your document data
96+
97+
### Asking Questions
98+
99+
The agent can answer various types of questions about your processed documents:
100+
101+
**Document Volume Questions:**
102+
- "How many documents were processed last month?"
103+
- "What's the trend in document processing over time?"
104+
- "Which document types are most common?"
105+
106+
**Processing Performance Questions:**
107+
- "What's the average processing time by document type?"
108+
- "Which documents failed processing and why?"
109+
- "Show me processing success rates by day"
110+
111+
**Content Analysis Questions:**
112+
- "What are the most common vendor names in invoices?"
113+
- "Show me the distribution of invoice amounts"
114+
- "Which documents have the highest confidence scores?"
115+
116+
**Comparative Analysis Questions:**
117+
- "How do confidence scores vary by document type?"
118+
- "What's the relationship between document size and processing time?"
119+
120+
### Sample Queries
121+
122+
Here are some example questions you can ask:
123+
124+
```
125+
"Show me a chart of document processing volume by day for the last 30 days"
126+
127+
"What are the top 10 most common document classifications?"
128+
129+
"Create a table showing average confidence scores by document type"
130+
131+
"Plot the relationship between document page count and processing time"
132+
133+
"Which extraction fields have the lowest average confidence scores?"
134+
```
135+
136+
### Understanding Results
137+
138+
The agent can return three types of results:
139+
140+
1. **Charts/Plots**: Visual representations of data trends and patterns
141+
2. **Tables**: Structured data displays for detailed information
142+
3. **Text Responses**: Direct answers to simple questions
143+
144+
Each result includes:
145+
- The original question
146+
- SQL queries that were executed
147+
- The final visualization or answer
148+
- Agent reasoning and thought process
149+
150+
## Testing with Sample Data
151+
152+
The solution includes sample W2 tax documents for testing the analytics feature:
153+
154+
### Sample Documents Location
155+
- **Path**: `/samples/w2/`
156+
- **Files**: 20 sample W2 documents (W2_XL_input_clean_1000.pdf through W2_XL_input_clean_1019.pdf)
157+
- **Purpose**: Realistic test data for exploring analytics capabilities
158+
- **Source**: Sample W2 documents are from [this Kaggle dataset](https://www.kaggle.com/datasets/mcvishnu1/fake-w2-us-tax-form-dataset) and are 100% synthetic with a [CC0 1.0 public domain license](https://creativecommons.org/publicdomain/zero/1.0/).
159+
160+
### Testing Steps
161+
162+
1. **Upload Sample Documents**:
163+
- Use the Web UI to upload documents from the `/samples/w2/` folder
164+
- Or copy them directly to the S3 input bucket
165+
166+
2. **Wait for Processing**:
167+
- Monitor document processing through the Web UI dashboard
168+
- Ensure all documents complete successfully
169+
170+
3. **Try Sample Queries**:
171+
```
172+
"How many W2 documents have been processed?"
173+
174+
"Make a bar chart histogram of total earnings in all W2s with bins $25000 wide"
175+
176+
"What employee from the state of California paid the most tax?"
177+
178+
"What is the ratio of state tax paid to federal tax paid for the following states: Vermont, Nevada, Indiana, and Oregon?"
179+
```
180+
181+
## Configuration
182+
183+
The Agent Analysis feature is configured through CloudFormation parameters:
184+
185+
### Model Selection
186+
```yaml
DocumentAnalysisAgentModelId:
  Type: String
  Default: "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
  Description: Model to use for Document Analysis Agent (analytics queries)
```
192+
193+
**Supported Models:**
194+
- `us.anthropic.claude-3-7-sonnet-20250219-v1:0` (Default - Recommended)
195+
- `us.anthropic.claude-3-5-sonnet-20241022-v2:0`
196+
- `us.anthropic.claude-3-haiku-20240307-v1:0`
197+
- `us.amazon.nova-pro-v1:0`
198+
- `us.amazon.nova-lite-v1:0`
199+
200+
### Infrastructure Components
201+
202+
The feature automatically creates:
203+
- **DynamoDB Table**: Tracks analytics job status and results
204+
- **Lambda Functions**: Request handler and processor functions
205+
- **AppSync Resolvers**: GraphQL API endpoints for web UI integration
206+
- **IAM Roles**: Minimal permissions for secure operation
207+
208+
### Environment Variables
209+
210+
Key configuration settings:
211+
- `ANALYTICS_TABLE`: DynamoDB table for job tracking
212+
- `ATHENA_DATABASE`: Database containing processed document data
213+
- `ATHENA_OUTPUT_LOCATION`: S3 location for query results
214+
- `DOCUMENT_ANALYSIS_AGENT_MODEL_ID`: AI model for agent processing
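
A Lambda function might read these settings along the following lines. This is a hedged sketch: the four variable names come from the list above, while the `AnalyticsSettings` class, the `load_settings` helper, and the choice to fail fast on missing values are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AnalyticsSettings:
    analytics_table: str
    athena_database: str
    athena_output_location: str
    model_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> AnalyticsSettings:
    """Read the documented variables and fail fast if any are missing or empty."""
    required = [
        "ANALYTICS_TABLE",
        "ATHENA_DATABASE",
        "ATHENA_OUTPUT_LOCATION",
        "DOCUMENT_ANALYSIS_AGENT_MODEL_ID",
    ]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return AnalyticsSettings(
        analytics_table=env["ANALYTICS_TABLE"],
        athena_database=env["ATHENA_DATABASE"],
        athena_output_location=env["ATHENA_OUTPUT_LOCATION"],
        model_id=env["DOCUMENT_ANALYSIS_AGENT_MODEL_ID"],
    )
```

Passing the environment as a mapping makes the loader easy to exercise in tests without touching the real process environment.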
215+
216+
## Best Practices
217+
218+
### Query Optimization
219+
220+
1. **Start Broad**: Begin with general questions before diving into specifics
221+
2. **Be Specific**: Clearly state what information you're looking for
222+
3. **Use Follow-ups**: Build on what you learned from previous answers to explore a topic in depth. Each question is independent and the agent keeps no conversation history, so restate any context a follow-up needs
223+
4. **Check Results**: Verify visualizations make sense for your data
224+
225+
### Security Best Practices
226+
227+
1. **Data Access**: Only authenticated users can access analytics features
228+
2. **Query Isolation**: Each user's queries are isolated and tracked separately
229+
3. **Audit Logging**: All queries and results are logged for security reviews
230+
4. **Sandbox Security**: Python code execution is completely isolated from system resources
231+
232+
## Troubleshooting
233+
234+
### Common Issues
235+
236+
**Agent Not Responding:**
237+
- Check CloudWatch logs for the Analytics Processor Lambda function
238+
- Verify Bedrock model access is enabled for your selected model
239+
- Ensure sufficient Lambda timeout (15 minutes) for complex queries
240+
241+
**SQL Query Errors:**
242+
- Agent automatically retries failed queries up to 5 times
243+
- Check that column names are properly quoted in generated SQL
244+
- Verify database permissions for Athena access
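
The retry behavior described above can be pictured as a loop that feeds the previous error back into SQL generation. The sketch below is illustrative only: `run_with_retries`, `generate_sql`, and `execute` are hypothetical names, and it interprets "up to 5 times" as five attempts in total, which is an assumption.

```python
MAX_ATTEMPTS = 5  # assumption: the documented limit counts total attempts


def run_with_retries(generate_sql, execute, max_attempts=MAX_ATTEMPTS):
    """Generate SQL, run it, and regenerate using the last error on failure.

    generate_sql(last_error) returns a SQL string; last_error is None on the
    first attempt. execute(sql) returns rows or raises an exception.
    """
    last_error = None
    for _ in range(max_attempts):
        sql = generate_sql(last_error)
        try:
            return execute(sql)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = str(exc)
    raise RuntimeError(f"query failed after {max_attempts} attempts: {last_error}")
```

Handing the error text back to the generator is what lets an agent correct a mistake, such as an unquoted column name, instead of resubmitting the same query.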
245+
246+
**Visualization Errors:**
247+
- Check that query results contain expected data types
248+
- Verify Python code generation in AgentCore sandbox
249+
- Review agent messages for detailed error information
250+
251+
**Performance Issues:**
252+
- Consider using simpler queries for large datasets
253+
- Try breaking complex questions into smaller parts
254+
- Monitor Athena query performance and optimize if needed
255+
256+
### Monitoring and Logging
257+
258+
- **CloudWatch Logs**: Detailed logs for both Lambda functions
259+
- **DynamoDB Console**: View job status and results directly
260+
- **Athena Console**: Monitor SQL query execution and performance
261+
- **Agent Messages**: Real-time display of agent reasoning in web UI
262+
263+
## Cost Considerations
264+
265+
The Agent Analysis feature uses several AWS services that incur costs:
266+
267+
- **Amazon Bedrock**: Model inference costs for agent processing
268+
- **Amazon Bedrock AgentCore**: Code interpreter session costs
269+
- **Amazon Athena**: Query execution costs based on data scanned
270+
- **Amazon S3**: Storage costs for query results
271+
- **AWS Lambda**: Function execution costs
272+
- **Amazon DynamoDB**: Storage and request costs for job tracking
273+
274+
To optimize costs:
275+
- Choose appropriate Bedrock models based on accuracy vs. cost requirements
276+
- Monitor usage through AWS Cost Explorer
277+
278+
## Integration with Other Features
279+
280+
Because the Agent Analysis feature has access to _all_ tables that GenAIIDP stores in Athena, it can analyze data produced by other GenAIIDP capabilities:
281+
282+
### Evaluation Framework Integration
283+
- Query evaluation metrics and accuracy scores
284+
- Analyze patterns in document processing quality
285+
- Compare performance across different processing patterns
286+
287+
### Assessment Feature Integration
288+
- Explore confidence scores across document types
289+
- Identify low-confidence extractions requiring review
290+
- Analyze relationships between confidence and accuracy
291+
292+
## Future Enhancements
293+
294+
Planned improvements for the Agent Analysis feature include:
295+
296+
- **Dashboard Creation**: Save and share custom analytics dashboards
297+
- **Possible KB Unification**: Provide a single chat box in the UI that can answer questions from either the knowledge base (with semantic abilities) or the Athena tables.

docs/web-ui.md

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,40 @@ The solution includes a responsive web-based user interface built with React tha
2323
- Document upload from local computer
2424
- Knowledge base querying for document collections
2525
- **Document Process Flow visualization** for detailed workflow execution monitoring and troubleshooting
26+
- **Document Analytics** for querying and visualizing processed document data
27+
28+
## Document Analytics
29+
30+
The Document Analytics feature allows users to query their processed documents using natural language and receive results in various formats including charts, tables, and text responses.
31+
32+
### Key Capabilities
33+
34+
- **Natural Language Queries**: Ask questions about your processed documents in plain English
35+
- **Multiple Response Types**: Results can be displayed as:
36+
- Interactive charts and graphs (using Chart.js)
37+
- Structured data tables with pagination and sorting
38+
- Text-based responses and summaries
39+
- **Real-time Processing**: Query processing status updates with visual indicators
40+
- **Query History**: Track and review previous analytics queries
41+
42+
### Technical Implementation Notes
43+
44+
The analytics feature uses a combination of real-time subscriptions and polling for status updates:
45+
46+
- **Primary Method**: GraphQL subscriptions via AWS AppSync for immediate notifications when queries complete
47+
- **Fallback Method**: Polling every 5 seconds to ensure status updates are received even if subscriptions fail
48+
- **Current Limitation**: The AppSync subscription currently returns a Boolean completion status rather than full job details, requiring a separate query to fetch results when notified
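
The polling fallback can be sketched as a loop that checks job status at a fixed interval until a terminal state is reached. The web UI itself is built with React, so the Python below only illustrates the logic. The status names `COMPLETED` and `FAILED`, the `wait_for_job` helper, and the 900-second timeout are assumptions; only the 5-second interval comes from the text above.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}  # assumed status names


def wait_for_job(fetch_status, interval_seconds=5.0, timeout_seconds=900.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job reaches a terminal status.

    fetch_status is any zero-argument callable returning the current status
    string, for example a wrapper around a GraphQL query.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_seconds)
```

Because the subscription only signals completion, a client following this pattern would still issue one further query to fetch the full job results after either the subscription or the poll reports a terminal status.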
49+
50+
**TODO**: Implement proper AppSync subscriptions that return complete AnalyticsJob objects to eliminate the need for additional queries and improve real-time user experience.
51+
52+
### How to Use
53+
54+
1. Navigate to the "Document Analytics" section in the web UI
55+
2. Enter your question in natural language (e.g., "How many documents were processed last week?")
56+
3. Click "Submit Query" to start processing
57+
4. Monitor the status indicator as your query is processed
58+
5. View results in the appropriate format (chart, table, or text)
59+
6. Use the debug information toggle to inspect raw response data if needed
2660

2761
## Document Process Flow Visualization
2862

lib/idp_common_pkg/idp_common/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ def __getattr__(name):
         "assessment",
         "models",
         "reporting",
+        "agents",
     ]:
         if name not in _submodules:
             _submodules[name] = __import__(f"idp_common.{name}", fromlist=["*"])
@@ -62,6 +63,7 @@ def __getattr__(name):
     "assessment",
     "models",
     "reporting",
+    "agents",
     "get_config",
     "Document",
     "Page",
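
The hunks above register `agents` with the package's lazy-loading `__getattr__`. The general pattern, a module-level `__getattr__` that imports a submodule on first access and caches it, can be sketched as below. This is not the package's code: `make_lazy_getattr` is a hypothetical helper, it uses `importlib.import_module` rather than `__import__`, and the standard-library `json` package stands in for `idp_common` so the example runs anywhere.

```python
import importlib


def make_lazy_getattr(package, submodules):
    """Return a function usable as a package-level __getattr__.

    Submodules are imported on first access and cached afterwards.
    """
    cache = {}

    def lazy_getattr(name):
        if name in submodules:
            if name not in cache:
                cache[name] = importlib.import_module(f"{package}.{name}")
            return cache[name]
        raise AttributeError(f"module {package!r} has no attribute {name!r}")

    return lazy_getattr


# Stand-in demonstration with the standard library's json package.
getattr_for_json = make_lazy_getattr("json", ["decoder", "encoder"])
print(getattr_for_json("decoder").__name__)
```

Deferring the import means that loading the top-level package does not pull in the dependencies of every submodule, only those of the submodules a caller actually uses.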
