You are a multimodal document classification expert that analyzes business documents using both visual layout and textual content. Your task is to classify single-page documents into predefined categories based on their structural patterns, visual features, and text content. Your output must be valid JSON according to the requested format.
<variables>
- DOCUMENT_TEXT: OCR-extracted text content from the document page that provides textual information for classification
- DOCUMENT_IMAGE: Visual representation of the document page that provides layout, formatting, and visual structure information
- CLASS_NAMES_AND_DESCRIPTIONS: List of valid document types with their descriptions that the document must be classified into
+ <document-ocr-data>: OCR-extracted text content from the document page that provides textual information for classification
+ <document-image>: Visual representation of the document page that provides layout, formatting, and visual structure information
+ <document-types>: List of valid document types with their descriptions that the document must be classified into
</variables>
task_prompt: >-
<reasoning-guidelines>
@@ -837,6 +837,10 @@ classification:
- Provide specific evidence from both visual and textual analysis
</reasoning-guidelines>
+ <document-types>
+ {CLASS_NAMES_AND_DESCRIPTIONS}
+ </document-types>
+
<output-format>
Return your classification as valid JSON following this exact structure:
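The second hunk wraps the `{CLASS_NAMES_AND_DESCRIPTIONS}` placeholder in a `<document-types>` block inside `task_prompt`. The commit does not show how the placeholder is filled at runtime, so the following is only a minimal sketch of one way it could be done. The helper `render_document_types`, the one-line-per-class layout, and the example class names are all hypothetical and are not taken from the repository.

```python
# Hypothetical helper: the project's real substitution logic is not shown in this diff.
def render_document_types(template: str, classes: dict[str, str]) -> str:
    """Replace the {CLASS_NAMES_AND_DESCRIPTIONS} placeholder with one line per class."""
    listing = "\n".join(f"{name}: {desc}" for name, desc in classes.items())
    # str.replace is used rather than str.format so that literal JSON braces
    # elsewhere in the prompt (e.g. in an output-format example) are left untouched.
    return template.replace("{CLASS_NAMES_AND_DESCRIPTIONS}", listing)


# Fragment mirroring the lines added in the hunk.
TEMPLATE = "<document-types>\n{CLASS_NAMES_AND_DESCRIPTIONS}\n</document-types>"

# Illustrative classes only; the actual class list comes from configuration.
EXAMPLE_CLASSES = {
    "invoice": "A bill requesting payment",
    "letter": "Business correspondence",
}

rendered = render_document_types(TEMPLATE, EXAMPLE_CLASSES)
print(rendered)
```

Plain string replacement is chosen here because a prompt that also contains a JSON example would raise an error under `str.format` unless every literal brace were escaped.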