|
1 | 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. |
2 | 2 | # SPDX-License-Identifier: MIT-0 |
3 | 3 |
|
4 | | -notes: Default settings |
| 4 | +notes: Default settings for the rvl-cdip-sample-with-few-shot-examples config |
5 | 5 | ocr: |
6 | 6 | backend: "textract" # Default to Textract for backward compatibility |
7 | 7 | model_id: "us.anthropic.claude-3-7-sonnet-20250219-v1:0" |
@@ -657,58 +657,49 @@ classification: |
657 | 657 | top_k: '5' |
658 | 658 | max_tokens: '4096' |
659 | 659 | system_prompt: >- |
660 | | - You are a document classification system that analyzes business documents, |
661 | | - forms, and publications. Your sole task is to classify documents into |
662 | | - categories based on their visual layout and textual content. You must: |
| 660 | + You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format. |
663 | 661 |
|
664 | | - 1. Output only a JSON object containing a single "class" field with the |
665 | | - classification label |
666 | | -
|
667 | | - 2. Use exactly one of the predefined categories, using the exact spelling |
668 | | - and case provided |
669 | | -
|
670 | | - 3. Never include explanations, reasoning, or additional text in your |
671 | | - response |
672 | | -
|
673 | | - 4. Respond with nothing but the JSON containing the classification |
674 | | -
|
675 | | -
|
676 | | - Example correct response: |
677 | | -
|
678 | | - {"class": "letter"} |
| 662 | + <variables> |
| 663 | + DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification |
| 664 | + DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information |
| 665 | + CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into |
| 666 | + </variables> |
679 | 667 | task_prompt: >- |
680 | | - Classify this document into exactly one of these categories: |
681 | | -
|
682 | | -
|
683 | | - {CLASS_NAMES_AND_DESCRIPTIONS} |
684 | | -
|
685 | | -
|
686 | | - <few_shot_example_with_multimodal_page_classifications> |
687 | | -
|
688 | | - {few_shot_example_with_multimodal_page_classificationS} |
689 | | -
|
690 | | - </few_shot_example_with_multimodal_page_classifications> |
691 | | -
|
| 668 | + <reasoning-guidelines> |
| 669 | + When determining the document type: |
| 670 | + - First identify the document's primary purpose and function |
| 671 | + - Note specific visual elements (letterhead, forms, tables, signatures) |
| 672 | + - Identify key textual indicators (terminology, phrases, structure) |
| 673 | + - Consider the document's intended audience and use case |
| 674 | + - Provide specific evidence from both visual and textual analysis |
| 675 | + </reasoning-guidelines> |
| 676 | +
|
| 677 | + <output-format> |
| 678 | + Return your classification as valid JSON following this exact structure: |
| 679 | + { |
| 680 | + "classification_reason": "Detailed reasoning including specific visual and textual evidence that led to this classification", |
| 681 | + "class": "exact_document_type_from_list" |
| 682 | + } |
| 683 | + </output-format> |
692 | 684 |
|
693 | 685 | <<CACHEPOINT>> |
694 | 686 |
|
| 687 | + <document-ocr-data> |
| 688 | + {DOCUMENT_TEXT} |
| 689 | + </document-ocr-data> |
695 | 690 |
|
696 | | - <document_ocr_data> |
697 | | -
|
698 | | - {DOCUMENT_TEXT} |
699 | | -
|
700 | | - </document_ocr_data> |
701 | | -
|
702 | | -
|
703 | | - <document_image> |
704 | | -
|
705 | | - {DOCUMENT_IMAGE} |
706 | | -
|
707 | | - </document_image> |
708 | | -
|
| 691 | + <document-image> |
| 692 | + {DOCUMENT_IMAGE} |
| 693 | + </document-image> |
709 | 694 |
|
710 | | - Respond only with a JSON object containing the class label. For example: |
711 | | - {{"class": "letter"}} |
| 695 | + <final-instructions> |
| 696 | + Analyze the document above by: |
| 697 | + 1. Applying the <classification-instructions> to examine both visual and textual features |
| 698 | + 2. Following the <reasoning-guidelines> to build your classification rationale |
| 699 | + 3. Selecting ONLY from document types in <document-types> |
| 700 | + 4. Providing clear reasoning with specific evidence before the classification |
| 701 | + 5. Outputting in the exact JSON format specified in <output-format> |
| 702 | + </final-instructions> |
712 | 703 | extraction: |
713 | 704 | image: |
714 | 705 | target_height: '' |
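The new `<output-format>` block asks the model for a JSON object with two keys, `classification_reason` and `class`, where the old prompt asked for `class` alone. Code that consumes the response may therefore want to check both keys and confirm that the label is one of the configured classes. A minimal sketch of such a check follows. The function name `parse_classification` and the `allowed_classes` argument are hypothetical and not part of this repository.

```python
import json


def parse_classification(raw: str, allowed_classes: set[str]) -> dict:
    """Parse a model reply and check it against the <output-format> shape.

    Hypothetical helper: the real pipeline may parse responses differently.
    """
    # Models sometimes wrap JSON in prose or code fences, so keep only the
    # text between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    data = json.loads(raw[start:end + 1])

    # Both keys named in <output-format> must be present.
    missing = {"classification_reason", "class"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    # The label must match one of the configured classes exactly.
    if data["class"] not in allowed_classes:
        raise ValueError(f"unknown class: {data['class']!r}")
    return data
```

Rejecting unknown labels at parse time keeps a misspelled or invented class from propagating into later stages.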
|
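The `<final-instructions>` block refers to other prompt sections by tag name (`<classification-instructions>`, `<reasoning-guidelines>`, `<document-types>`, `<output-format>`). Within the hunk shown here, only `<reasoning-guidelines>` and `<output-format>` appear as sections, so a small lint over the prompt text can help catch references that point at nothing. The sketch below assumes that a section is opened by a lowercase, hyphenated tag alone on its own line. The function name `undefined_tag_references` is hypothetical.

```python
import re


def undefined_tag_references(prompt: str) -> set[str]:
    """Return tag names mentioned in the prompt that never open a section.

    Assumption: a section opens with "<tag-name>" alone on a line, and tag
    names use lowercase letters and hyphens only.
    """
    # Opening tags that stand alone on a line define a section.
    defined = set(
        re.findall(r"^\s*<([a-z][a-z-]*)>\s*$", prompt, flags=re.MULTILINE)
    )
    # Every "<tag-name>" occurrence, inline or not. Closing tags ("</...>")
    # and uppercase markers such as <<CACHEPOINT>> do not match.
    referenced = set(re.findall(r"<([a-z][a-z-]*)>", prompt))
    return referenced - defined
```

Running a check like this against the rendered `task_prompt` when the config is loaded would list any referenced section that is missing.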