Commit 545907e

Author: Daniel Lorch
feat: add notebook for dynamic few-shot Lambda testing
1 parent 06a94d9 commit 545907e

2 files changed: 521 additions & 0 deletions

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
+# Extraction Service Configuration
+extraction:
+  top_p: '0.1'
+  max_tokens: '4096'
+  top_k: '5'
+  temperature: '0.0'
+  model: us.amazon.nova-pro-v1:0
+  system_prompt: >-
+    You are a document assistant. Respond only with JSON. Never make up data, only provide data found in the document being provided.
+  task_prompt: >-
+    <background>
+
+    You are an expert in document analysis and information extraction.
+    You can understand and extract key information from documents classified as type
+
+    {DOCUMENT_CLASS}.
+
+    </background>
+
+
+    <task>
+
+    Your task is to take the unstructured text provided and convert it into a well-organized table format using JSON. Identify the main entities, attributes, or categories mentioned in the attributes list below and use them as keys in the JSON object.
+    Then, extract the relevant information from the text and populate the corresponding values in the JSON object.
+
+    </task>
+
+
+    <extraction-guidelines>
+
+    Guidelines:
+    1. Ensure that the data is accurately represented and properly formatted within
+    the JSON structure
+    2. Include double quotes around all keys and values
+    3. Do not make up data - only extract information explicitly found in the
+    document
+    4. Do not use \n for new lines, use a space instead
+    5. If a field is not found or if unsure, return null
+    6. All dates should be in MM/DD/YYYY format
+    7. Do not perform calculations or summations unless totals are explicitly given
+    8. If an alias is not found in the document, return null
+    9. Guidelines for checkboxes:
+    9.A. CAREFULLY examine each checkbox, radio button, and selection field:
+    - Look for marks like ✓, ✗, x, filled circles (●), darkened areas, or handwritten checks indicating selection
+    - For checkboxes and multi-select fields, ONLY INCLUDE options that show clear visual evidence of selection
+    - DO NOT list options that have no visible selection mark
+    9.B. For ambiguous or overlapping tick marks:
+    - If a mark overlaps between two or more checkboxes, determine which option contains the majority of the mark
+    - Consider a checkbox selected if the mark is primarily inside the check box or over the option text
+    - When a mark touches multiple options, analyze which option was most likely intended based on position and density. For handwritten checks, the mark typically flows from the selected checkbox outward.
+    - Carefully analyze visual cues and contextual hints. Think from a human perspective, anticipate natural tendencies, and apply thoughtful reasoning to make the best possible judgment.
+    10. Think step by step first and then answer.
+
+    </extraction-guidelines>
+
+    If the attributes section below contains a list of attribute names and
+    descriptions, then output only those attributes, using the provided
+    descriptions as guidance for finding the correct values.
+
+    <attributes>
+
+    {ATTRIBUTE_NAMES_AND_DESCRIPTIONS}
+
+    </attributes>
+
+    <few-shot-examples>
+
+    {FEW_SHOT_EXAMPLES}
+
+    </few-shot-examples>
+
+    <<CACHEPOINT>>
+
+
+    <document-text>
+
+    {DOCUMENT_TEXT}
+
+    </document-text>
+
+
+    <document_image>
+
+    {DOCUMENT_IMAGE}
+
+    </document_image>
+
+
+    <final-instructions>
+
+    Extract key information from the document and return a JSON object with the following key steps:
+    1. Carefully analyze the document text to identify the requested attributes
+    2. Extract only information explicitly found in the document - never make up data
+    3. Format all dates as MM/DD/YYYY and replace newlines with spaces
+    4. For checkboxes, only include options with clear visual selection marks
+    5. Use null for any fields not found in the document
+    6. Ensure the output is properly formatted JSON with quoted keys and values
+    7. Think step by step before finalizing your answer
+
+    </final-instructions>
+
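
Illustrative sketch (not part of this commit): the task_prompt above contains {PLACEHOLDER} tokens and a <<CACHEPOINT>> marker, but the diff does not show the code that renders it. The Python below is one way a caller might fill the placeholders and split the prompt at the marker, on the assumption that the text before the marker is the part intended for prompt caching. The function name build_prompt_parts, the shortened template, and the sample values are all hypothetical. A single-pass regex substitution is used instead of str.format so that JSON braces inside few-shot examples or document text are left alone.

```python
import re

CACHEPOINT = "<<CACHEPOINT>>"
# Matches tokens such as {DOCUMENT_TEXT}; lowercase JSON keys in braces do not match.
_PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def build_prompt_parts(template: str, values: dict[str, str]) -> tuple[str, str]:
    """Return (text before the cache point, text after it) with placeholders filled."""
    # Split before substituting, so inserted document text cannot introduce a marker.
    before, marker, after = template.partition(CACHEPOINT)
    if not marker:
        raise ValueError("template has no <<CACHEPOINT>> marker")

    def fill(text: str) -> str:
        # Unknown placeholders are left as-is rather than raising.
        return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)

    return fill(before).strip(), fill(after).strip()


# Shortened stand-in for the task_prompt in the config above (hypothetical values).
template = (
    "<attributes>\n{ATTRIBUTE_NAMES_AND_DESCRIPTIONS}\n</attributes>\n"
    "<few-shot-examples>\n{FEW_SHOT_EXAMPLES}\n</few-shot-examples>\n"
    "<<CACHEPOINT>>\n"
    "<document-text>\n{DOCUMENT_TEXT}\n</document-text>"
)
values = {
    "ATTRIBUTE_NAMES_AND_DESCRIPTIONS": "invoice_date: date printed on the invoice",
    "FEW_SHOT_EXAMPLES": '{"invoice_date": "01/31/2024"}',
    "DOCUMENT_TEXT": "Invoice dated January 31, 2024",
}
prefix, suffix = build_prompt_parts(template, values)
```

Here prefix holds the attributes and few-shot examples, and suffix holds the per-document text. How the real service maps these two parts onto model requests is not shown in this diff.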

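Illustrative sketch (not part of this commit): guideline 10 asks the model to think step by step before answering, so a reply may carry reasoning text ahead of the JSON object, and guideline 6 requires MM/DD/YYYY dates. Assuming the caller wants to tolerate leading text, the Python below pulls the first decodable JSON object out of a reply and checks a date string against the required format. The helper names extract_json_object and is_mm_dd_yyyy and the sample reply are hypothetical.

```python
import json
import re
from datetime import datetime


def extract_json_object(reply: str) -> dict:
    """Return the first JSON object that can be decoded from the reply text."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", reply):
        try:
            # raw_decode parses one JSON value starting at the given index.
            obj, _end = decoder.raw_decode(reply, match.start())
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; try the next one
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model reply")


def is_mm_dd_yyyy(value: str) -> bool:
    """True only for zero-padded MM/DD/YYYY strings that are real calendar dates."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True


# Hypothetical reply: reasoning first, then the JSON answer.
reply = (
    "The date appears in the header. No purchase order number is present.\n"
    '{"invoice_date": "01/31/2024", "po_number": null}'
)
result = extract_json_object(reply)
```

JSON null decodes to Python None, which matches the prompt's instruction to return null for fields that are not found.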