Optimize the system and task prompts for multimodal page-level classification

mmiakashs · mmiakashs · commit 1b087ece4dde · 2025-07-17T14:01:50.000-07:00
diff --git a/config_library/pattern-2/default-lending/config.yaml b/config_library/pattern-2/default-lending/config.yaml
@@ -922,138 +922,67 @@ classification:
   top_k: '5'
   task_prompt: >-
     <task-description>
-
-    You are a document classification system. Your task is to analyze a document package containing multiple pages and identify distinct document segments, classifying each segment according to the predefined document types provided below.
-
+    Analyze the provided document using both its visual layout and textual content to determine its document type. You must classify it into exactly one of the predefined categories.
     </task-description>
 
-
     <document-types>
-
     {CLASS_NAMES_AND_DESCRIPTIONS}
-
     </document-types>
 
-
-    <terminology-definitions>
-
-    Key terms used in this task:
-
-    - ordinal_start_page: The one-based beginning page number of a document segment within the document package
-
-    - ordinal_end_page: The one-based ending page number of a document segment within the document package
-
-    - document_type: The document type code detected for a document segment
-
-    - document segment: A continuous range of pages that form a single, complete document
-
-    </terminology-definitions>
-
-
     <classification-instructions>
-
-    Follow these steps to classify documents:
-
-    1. Read through the entire document package to understand its contents
-
-    2. Identify page ranges that form complete, distinct documents
-
-    3. Match each document segment to ONE of the document types listed in <document-types>
-
-    4. CRITICAL: Only use document types explicitly listed in the <document-types> section
-
-    5. If a document doesn't clearly match any listed type, assign it to the most similar listed type
-
-    6. Pay special attention to adjacent documents of the same type - they must be separated into distinct segments
-
-    7. Record the ordinal_start_page and ordinal_end_page for each identified segment
-
-    8. Provide appropriate reasons and facts for the predicted document type
-
+    Follow these steps to classify the document:
+    1. Examine the visual layout: headers, logos, formatting, structure, and visual organization
+    2. Analyze the textual content: key phrases, terminology, purpose, and information type
+    3. Identify distinctive features that match the document type descriptions
+    4. Consider both visual and textual evidence together to determine the best match
+    5. CRITICAL: Only use document types explicitly listed in the <document-types> section
     </classification-instructions>
 
-
-    <document-boundary-rules>
-
-    Rules for determining document boundaries:
-
-    - Content continuity: Pages with continuing paragraphs, numbered sections, or ongoing narratives belong to the same document
-
-    - Visual consistency: Similar layouts, headers, footers, and styling indicate pages belong together
-
-    - Logical structure: Documents typically have clear beginning, middle, and end sections
-
-    - New document indicators: Title pages, cover sheets, or significantly different subject matter signal a new document
-
-    - Topic coherence: Pages discussing the same subject should be grouped together
-
-    - IMPORTANT: Distinct documents of the same type that are adjacent must be separated into different segments
-
-    </document-boundary-rules>
-
+    <reasoning-guidelines>
+    When determining the document type:
+    - First identify the document's primary purpose and function
+    - Note specific visual elements (letterhead, forms, tables, signatures)
+    - Identify key textual indicators (terminology, phrases, structure)
+    - Consider the document's intended audience and use case
+    - Provide specific evidence from both visual and textual analysis
+    </reasoning-guidelines>
 
     <output-format>
-
     Return your classification as valid JSON following this exact structure:
-
-    ```json
-
     {
-        "segments": [
-            {
-                "ordinal_start_page": 1,
-                "ordinal_end_page": 3,
-                "type": "document_type_from_list",
-                "reason": "facts and reasons to classify as the predicted type",
-            },
-            {
-                "ordinal_start_page": 4,
-                "ordinal_end_page": 7,
-                "type": "document_type_from_list"
-                "reason": "facts and reasons to classify as the predicted type",
-            }
-        ]
+      "classification_reason": "Detailed reasoning including specific visual and textual evidence that led to this classification",
+      "class": "exact_document_type_from_list"
     }
-
-    ```
-
     </output-format>
 
-
     <<CACHEPOINT>>
 
-
-    <document-text>
-
+    <document-ocr-data>
     {DOCUMENT_TEXT}
+    </document-ocr-data>
 
-    </document-text>
-
+    <document-image>
+    {DOCUMENT_IMAGE}
+    </document-image>
 
     <final-instructions>
-
-    Analyze the <document-text> provided above and:
-
-    1. Apply the <classification-instructions> to identify distinct document segments
-
-    2. Use the <document-boundary-rules> to determine where one document ends and another begins
-
-    3. Classify each segment using ONLY the document types from the <document-types> list
-
-    4. Ensure adjacent documents of the same type are separated into distinct segments
-
-    5. Output your classification in the exact JSON format specified in <output-format>
-
-    6. You can get this information from the previous message. Analyze the previous messages to get these instructions.
-
-
-    Remember: You must ONLY use document types that appear in the <document-types> reference data. Do not invent or create new document types.
-
+    Analyze the document above by:
+    1. Applying the <classification-instructions> to examine both visual and textual features
+    2. Following the <reasoning-guidelines> to build your classification rationale
+    3. Selecting ONLY from document types in <document-types>
+    4. Providing clear reasoning with specific evidence before the classification
+    5. Outputting in the exact JSON format specified in <output-format>
     </final-instructions>
   temperature: '0.0'
   model: us.amazon.nova-pro-v1:0
   system_prompt: >-
-    You are a document classification expert who can analyze and classify multiple documents and their page boundaries within a document package from various domains. Your task is to determine the document type based on its content and structure, using the provided document type definitions. Your output must be valid JSON according to the requested format.
+    You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
+
+    <variables>
+    DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
+    DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
+    CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
+    </variables>
   classificationMethod: textbasedHolisticClassification
 extraction:
   image:
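
The new `<output-format>` asks the model for a single JSON object with the keys `classification_reason` and `class`, and the instructions require that `class` be one of the listed document types. A minimal sketch of how a caller might check a response against that contract is shown below. The function name `parse_classification` and the labels in `ALLOWED_CLASSES` are hypothetical and are not part of this commit or of the project's code; the real labels would come from whatever is substituted for `{CLASS_NAMES_AND_DESCRIPTIONS}`.

```python
import json

# Hypothetical labels for illustration only; the real set is whatever the
# config substitutes for {CLASS_NAMES_AND_DESCRIPTIONS}.
ALLOWED_CLASSES = {"Payslip", "Bank-Statement", "W2"}


def parse_classification(raw, allowed):
    """Extract and validate the JSON object described in <output-format>."""
    # Models sometimes wrap the JSON in prose or code fences, so keep only
    # the outermost {...} span before parsing.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch a single exception type for every failure mode here.
    result = json.loads(raw[start:end + 1])

    missing = {"classification_reason", "class"} - set(result)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["class"] not in allowed:
        raise ValueError(f"unknown document type: {result['class']!r}")
    return result


if __name__ == "__main__":
    reply = (
        'Here is the result: {"classification_reason": '
        '"Employer letterhead and pay-period table", "class": "Payslip"}'
    )
    print(parse_classification(reply, ALLOWED_CLASSES)["class"])
```

Rejecting an unlisted `class` in code, rather than relying only on the prompt's "CRITICAL" instruction, gives the caller a place to retry or fall back when the model invents a type.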