fix: stemming #2

stephantul · 2025-07-15T07:57:47Z

The original model applied stemming after the terms had already been indexed. This caused indexing errors whenever two different terms shared the same stem.

Before:

_, y = d.process_document("hello hellos")
print(y)
# {"hello": 2}

Now:

_, y = d.process_document("hello hellos")
print(y)
# {"hello": 1}

This slightly improves scores on NanoBEIR.
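For reviewers who want to see why the order of operations matters, here is a minimal sketch of the failure mode. It is not the code from this repository: `toy_stem`, `index_then_stem`, and `stem_then_index` are hypothetical names, the stemmer is a stand-in that only strips a trailing "s", and the returned mappings are term-to-id vocabularies, which may differ from what `process_document` returns.

```python
def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return term[:-1] if term.endswith("s") and len(term) > 1 else term


def index_then_stem(terms: list[str]) -> dict[str, int]:
    """Problematic order: ids are assigned to raw terms, then keys are stemmed."""
    raw_index: dict[str, int] = {}
    for term in terms:
        raw_index.setdefault(term, len(raw_index))
    # Stemming the keys afterwards lets terms with the same stem collide;
    # the later id silently overwrites the earlier one.
    return {toy_stem(term): idx for term, idx in raw_index.items()}


def stem_then_index(terms: list[str]) -> dict[str, int]:
    """Corrected order: stem first, so each stem receives exactly one id."""
    index: dict[str, int] = {}
    for term in terms:
        index.setdefault(toy_stem(term), len(index))
    return index


terms = "hello hellos world".split()
buggy = index_then_stem(terms)
fixed = stem_then_index(terms)
print(buggy)
print(fixed)
```

In the first variant, "hello" and "hellos" are given separate ids before they collapse onto one key, so one id is orphaned and the remaining ids are no longer contiguous. Stemming before indexing avoids the collision.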

fix: stemming bug

726e0d9

stephantul changed the base branch from master to wandb July 15, 2025 07:58
