Commit 1b087ec

optimize the multimodal page level classification system and task prompt
1 parent 4d90feb commit 1b087ec

1 file changed: +35 -106 lines changed

config_library/pattern-2/default-lending/config.yaml

Lines changed: 35 additions & 106 deletions
@@ -922,138 +922,67 @@ classification:
 top_k: '5'
 task_prompt: >-
 <task-description>
-
-You are a document classification system. Your task is to analyze a document package containing multiple pages and identify distinct document segments, classifying each segment according to the predefined document types provided below.
-
+Analyze the provided document using both its visual layout and textual content to determine its document type. You must classify it into exactly one of the predefined categories.
 </task-description>
 
-
 <document-types>
-
 {CLASS_NAMES_AND_DESCRIPTIONS}
-
 </document-types>
 
-
-<terminology-definitions>
-
-Key terms used in this task:
-
-- ordinal_start_page: The one-based beginning page number of a document segment within the document package
-
-- ordinal_end_page: The one-based ending page number of a document segment within the document package
-
-- document_type: The document type code detected for a document segment
-
-- document segment: A continuous range of pages that form a single, complete document
-
-</terminology-definitions>
-
-
 <classification-instructions>
-
-Follow these steps to classify documents:
-
-1. Read through the entire document package to understand its contents
-
-2. Identify page ranges that form complete, distinct documents
-
-3. Match each document segment to ONE of the document types listed in <document-types>
-
-4. CRITICAL: Only use document types explicitly listed in the <document-types> section
-
-5. If a document doesn't clearly match any listed type, assign it to the most similar listed type
-
-6. Pay special attention to adjacent documents of the same type - they must be separated into distinct segments
-
-7. Record the ordinal_start_page and ordinal_end_page for each identified segment
-
-8. Provide appropriate reasons and facts for the predicted document type
-
+Follow these steps to classify the document:
+1. Examine the visual layout: headers, logos, formatting, structure, and visual organization
+2. Analyze the textual content: key phrases, terminology, purpose, and information type
+3. Identify distinctive features that match the document type descriptions
+4. Consider both visual and textual evidence together to determine the best match
+5. CRITICAL: Only use document types explicitly listed in the <document-types> section
 </classification-instructions>
 
-
-<document-boundary-rules>
-
-Rules for determining document boundaries:
-
-- Content continuity: Pages with continuing paragraphs, numbered sections, or ongoing narratives belong to the same document
-
-- Visual consistency: Similar layouts, headers, footers, and styling indicate pages belong together
-
-- Logical structure: Documents typically have clear beginning, middle, and end sections
-
-- New document indicators: Title pages, cover sheets, or significantly different subject matter signal a new document
-
-- Topic coherence: Pages discussing the same subject should be grouped together
-
-- IMPORTANT: Distinct documents of the same type that are adjacent must be separated into different segments
-
-</document-boundary-rules>
-
+<reasoning-guidelines>
+When determining the document type:
+- First identify the document's primary purpose and function
+- Note specific visual elements (letterhead, forms, tables, signatures)
+- Identify key textual indicators (terminology, phrases, structure)
+- Consider the document's intended audience and use case
+- Provide specific evidence from both visual and textual analysis
+</reasoning-guidelines>
 
 <output-format>
-
 Return your classification as valid JSON following this exact structure:
-
-```json
-
 {
-"segments": [
-{
-"ordinal_start_page": 1,
-"ordinal_end_page": 3,
-"type": "document_type_from_list",
-"reason": "facts and reasons to classify as the predicted type",
-},
-{
-"ordinal_start_page": 4,
-"ordinal_end_page": 7,
-"type": "document_type_from_list"
-"reason": "facts and reasons to classify as the predicted type",
-}
-]
+"classification_reason": "Detailed reasoning including specific visual and textual evidence that led to this classification",
+"class": "exact_document_type_from_list"
 }
-
-```
-
 </output-format>
 
-
 <<CACHEPOINT>>
 
-
-<document-text>
-
+<document-ocr-data>
 {DOCUMENT_TEXT}
+</document-ocr-data>
 
-</document-text>
-
+<document-image>
+{DOCUMENT_IMAGE}
+</document-image>
 
 <final-instructions>
-
-Analyze the <document-text> provided above and:
-
-1. Apply the <classification-instructions> to identify distinct document segments
-
-2. Use the <document-boundary-rules> to determine where one document ends and another begins
-
-3. Classify each segment using ONLY the document types from the <document-types> list
-
-4. Ensure adjacent documents of the same type are separated into distinct segments
-
-5. Output your classification in the exact JSON format specified in <output-format>
-
-6. You can get this information from the previous message. Analyze the previous messages to get these instructions.
-
-
-Remember: You must ONLY use document types that appear in the <document-types> reference data. Do not invent or create new document types.
-
+Analyze the document above by:
+1. Applying the <classification-instructions> to examine both visual and textual features
+2. Following the <reasoning-guidelines> to build your classification rationale
+3. Selecting ONLY from document types in <document-types>
+4. Providing clear reasoning with specific evidence before the classification
+5. Outputting in the exact JSON format specified in <output-format>
 </final-instructions>
 temperature: '0.0'
 model: us.amazon.nova-pro-v1:0
 system_prompt: >-
-You are a document classification expert who can analyze and classify multiple documents and their page boundaries within a document package from various domains. Your task is to determine the document type based on its content and structure, using the provided document type definitions. Your output must be valid JSON according to the requested format.
+You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
+
+<variables>
+DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
+DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
+CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
+</variables>
 classificationMethod: textbasedHolisticClassification
 extraction:
 image:
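
For readers wiring this prompt into their own code, the following is a minimal sketch of how the text placeholders might be filled and how a reply in the new single-object output format might be validated. It is not the project's implementation: the function names `render_prompt` and `parse_classification` and the sample class names are hypothetical. The sketch only handles the two text placeholders, because the diff does not show how `{DOCUMENT_IMAGE}` is attached.

```python
import json

# Placeholder names taken from the <variables> block of the new system prompt.
# DOCUMENT_IMAGE is deliberately left out: the diff does not show how the
# page image is attached, so this sketch does not guess.
TEXT_PLACEHOLDERS = ("CLASS_NAMES_AND_DESCRIPTIONS", "DOCUMENT_TEXT")


def render_prompt(template: str, values: dict) -> str:
    """Fill {NAME} placeholders by literal replacement.

    str.format is avoided because the template contains literal JSON braces
    in its <output-format> section.
    """
    for name in TEXT_PLACEHOLDERS:
        template = template.replace("{" + name + "}", values[name])
    return template


def parse_classification(reply: str, allowed_classes: set) -> dict:
    """Extract the JSON object from a model reply and validate its fields."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for every failure mode here.
    data = json.loads(reply[start:end + 1])
    for key in ("classification_reason", "class"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    if data["class"] not in allowed_classes:
        raise ValueError(f"class not in allowed list: {data['class']}")
    return data


if __name__ == "__main__":
    # Hypothetical class names, for illustration only.
    classes = {"payslip", "bank_statement"}
    template = "<document-types>\n{CLASS_NAMES_AND_DESCRIPTIONS}\n</document-types>\n{DOCUMENT_TEXT}"
    prompt = render_prompt(template, {
        "CLASS_NAMES_AND_DESCRIPTIONS": "payslip: pay stub\nbank_statement: account statement",
        "DOCUMENT_TEXT": "Gross pay 1,000.00",
    })
    reply = '{"classification_reason": "Pay period table", "class": "payslip"}'
    print(parse_classification(reply, classes)["class"])
```

Checking `class` against the allowed list enforces in code the prompt's rule that only types listed in `<document-types>` may be used, rather than relying on the model alone.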
