- **Key Benefits**: Deterministic splitting for long documents with multiple same-type forms (e.g., multiple W-2s, multiple invoices); eliminates LLM boundary detection failures for critical government form processing; provides flexibility across simple to complex document scenarios
- Resolves #146

### Changed
- Removed page image limit entirely across all IDP services (classification, extraction, assessment) following Amazon Bedrock API removal of image count restrictions. The system now processes all document pages without artificial truncation, with info logging to track image counts for monitoring purposes.
config_library/pattern-2/bank-statement-sample/config.yaml (1 addition, 0 deletions)
@@ -238,6 +238,7 @@ classification:
system_prompt: >-
You are a document classification expert who can analyze and classify multiple documents and their page boundaries within a document package from various domains. Your task is to determine the document type based on its content and structure, using the provided document type definitions. Your output must be valid JSON according to the requested format.
config_library/pattern-2/rvl-cdip-package-sample/config.yaml (1 addition, 0 deletions)
@@ -905,6 +905,7 @@ classification:
system_prompt: >-
You are a document classification expert who can analyze and classify multiple documents and their page boundaries within a document package from various domains. Your task is to determine the document type based on its content and structure, using the provided document type definitions. Your output must be valid JSON according to the requested format.
docs/classification.md (148 additions, 0 deletions)
@@ -148,6 +148,154 @@ Despite its strengths in handling full-document context, this method has several
- Performs multi-modal page-level classification (classifies each page based on OCR data and page image)
- Not configurable inside the GenAIIDP solution

## Section Splitting Strategies

The `sectionSplitting` configuration controls how classified pages are grouped into document sections. This setting works with both classification methods and provides three strategies:

### Available Strategies

#### 1. `disabled` - No Splitting (Entire Document = One Section)

**Behavior:**
- All pages are assigned to a single section
- Uses the first detected document class for the entire document
- Ignores any page-level classification boundaries

**Use Cases:**
- Documents known to be single-type with no internal divisions
- Simplified processing where granular section splitting isn't needed
- When you want to force all pages to be treated as one cohesive document
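
The `disabled` behavior described above can be sketched in a few lines of Python. This is only an illustration: `PageResult`, `Section`, and `split_disabled` are hypothetical names, not the solution's actual types or functions, and the sketch assumes "first detected document class" means the class assigned to the first page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageResult:
    """Hypothetical per-page classification result (not the solution's real type)."""
    page_number: int
    doc_class: str


@dataclass
class Section:
    """Hypothetical section: one document class applied to a group of pages."""
    doc_class: str
    pages: List[int]


def split_disabled(pages: List[PageResult]) -> List[Section]:
    """Put every page into a single section labelled with the first page's class."""
    if not pages:
        return []
    # Assumption: the "first detected" class is the class of the first page.
    first_class = pages[0].doc_class
    # Page-level classes after the first page are deliberately ignored.
    return [Section(doc_class=first_class, pages=[p.page_number for p in pages])]


pages = [PageResult(1, "invoice"), PageResult(2, "invoice"), PageResult(3, "receipt")]
sections = split_disabled(pages)
print(sections)
```

In this sketch the third page was classified as `receipt`, yet it still lands in the single `invoice` section, which is the trade-off the use cases above accept.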
- Document with 10 pages → 10 sections (one per page)
- Each page maintains its individual classification

**GitHub Issue Reference:**
This strategy directly addresses [Issue #146](https://github.com/aws-solutions-library-samples/accelerated-intelligent-document-processing-on-aws/issues/146) where long documents with multiple same-type forms were being incorrectly joined together.
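
To make the failure mode concrete, here is a minimal Python sketch contrasting two groupings of the same page-level classifications. Both functions are hypothetical illustrations, not code from the solution: `split_by_class_change` assumes sections are formed only when the class changes between adjacent pages (one plausible way same-type forms end up joined), while `split_per_page` mirrors the one-section-per-page behavior. The sample input assumes each W-2 is a single page.

```python
from itertools import groupby
from typing import List, Tuple

# (document class, 1-based page numbers); a hypothetical shape for illustration.
Section = Tuple[str, List[int]]


def split_by_class_change(page_classes: List[str]) -> List[Section]:
    """Start a new section only when the class changes between adjacent pages."""
    sections: List[Section] = []
    page = 1
    for doc_class, run in groupby(page_classes):
        count = len(list(run))
        sections.append((doc_class, list(range(page, page + count))))
        page += count
    return sections


def split_per_page(page_classes: List[str]) -> List[Section]:
    """Emit one section per page, keeping each page's own classification."""
    return [(doc_class, [i]) for i, doc_class in enumerate(page_classes, start=1)]


# Three single-page W-2 forms followed by one invoice page (made-up sample input).
page_classes = ["W2", "W2", "W2", "invoice"]
merged = split_by_class_change(page_classes)
per_page = split_per_page(page_classes)
print(merged)
print(per_page)
```

Under the class-change grouping the three W-2 pages collapse into one section, whereas the per-page grouping keeps them as three separate sections without relying on any boundary detection.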