feat!: deduplicate documents based on ID in MultiQuery components#10496
feat!: deduplicate documents based on ID in MultiQuery components#10496
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
In principle, this PR looks good to me. However, I’m wondering if we should standardize how we deduplicate documents across components. I think we could utilize the |
Now that your PR is merged, I'm happy to switch over to the new utility function! |
|
@bogdankostic I updated both components to use the new utility method! |
| highest_scoring_docs: dict[str, "Document"] = {} | ||
| for doc in documents: | ||
| doc_id = doc.id | ||
| score = doc.score if doc.score is not None else -inf | ||
| best = highest_scoring_docs.get(doc.id) | ||
|
|
||
| if doc_id not in highest_scoring_docs: | ||
| highest_scoring_docs[doc_id] = doc | ||
| else: | ||
| existing_doc = highest_scoring_docs[doc_id] | ||
| existing_score = existing_doc.score if existing_doc.score is not None else -inf | ||
| if score > existing_score: | ||
| highest_scoring_docs[doc_id] = doc | ||
| if best is None or score > (best.score if best.score is not None else -inf): | ||
| highest_scoring_docs[doc.id] = doc |
There was a problem hiding this comment.
@bogdankostic I also slightly compacted the utility method which I think helps with readability and still makes sure not to create intermediate lists
bogdankostic
left a comment
There was a problem hiding this comment.
Looking good! Nice refactoring of _deduplicate_documents making it much more concise :)
Related Issues
Proposed Changes:
from_dictmethods to utilize the newfrom_dictimprovements. Theto_dictmethods were not updateable sincedefault_to_dictdoesn't support serializing init variables that are components.How did you test it?
Existing tests
Notes for the reviewer
Checklist
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:and added!in case the PR includes breaking changes.