@xmedr xmedr commented Jun 11, 2025

Overview

This branch improves the model's ability to identify building names, particularly in addresses that contain both a building name and an apartment or unit.

Demo

  • input:
    • 6906 Donachie Rd TowsonTown Place Apartments, Apt 1203 Baltimore, MD 21239
  • output:
    • ('6906', 'AddressNumber'),
      ('Donachie', 'StreetName'),
      ('Rd', 'StreetNamePostType'),
      ('TowsonTown', 'BuildingName'),
      ('Place', 'BuildingName'),
      ('Apartments,', 'BuildingName'),
      ('Apt', 'OccupancyType'),
      ('1203', 'OccupancyIdentifier'),
      ('Baltimore,', 'PlaceName'),
      ('MD', 'StateName'),
      ('21239', 'ZipCode')
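
As a rough sketch of how a caller might consume this output, the snippet below joins adjacent tokens that share a label, so the three `BuildingName` tokens become a single string. The `merge_labels` helper is hypothetical and not part of this PR; the `parsed` list is copied from the demo output above.

```python
from itertools import groupby

# Token/label pairs copied from the demo output above.
parsed = [
    ("6906", "AddressNumber"),
    ("Donachie", "StreetName"),
    ("Rd", "StreetNamePostType"),
    ("TowsonTown", "BuildingName"),
    ("Place", "BuildingName"),
    ("Apartments,", "BuildingName"),
    ("Apt", "OccupancyType"),
    ("1203", "OccupancyIdentifier"),
    ("Baltimore,", "PlaceName"),
    ("MD", "StateName"),
    ("21239", "ZipCode"),
]


def merge_labels(pairs):
    """Join runs of adjacent tokens that share a label (hypothetical helper)."""
    merged = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        # Join the run with spaces and drop a trailing comma, if any.
        text = " ".join(token for token, _ in run).strip(",")
        merged.append((label, text))
    return merged


components = merge_labels(parsed)
for label, text in components:
    print(f"{label}: {text}")
```

Grouping only adjacent tokens, rather than collecting every token with the same label, keeps the components in address order.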

Notes

This branch adds a good bit of training data, because this was tricky to get right while still passing all of our regression test addresses. Even then it's still imperfect: there are some really ambiguous apartment names out there. It does seem to do well when there's some indication that it's looking at a building name, such as a "The" at the beginning.

Testing Instructions

  • Confirm that the test suite passes
  • Pull down this branch and install this version to spot-check some addresses locally
    • After opening a venv, install and train the model with: pip install -e ".[dev]" -v
    • Example address: 5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207
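
A minimal spot-check sketch, assuming the branch has been installed with the command above. The `building_names` helper and the `ADDRESSES` list are hypothetical; the only library call used is `usaddress.parse`, which returns (token, label) pairs like those in the demo. No expected output is shown, since results depend on the trained model.

```python
# Addresses to spot-check; both are taken from this PR description.
ADDRESSES = [
    "6906 Donachie Rd TowsonTown Place Apartments, Apt 1203 Baltimore, MD 21239",
    "5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207",
]


def building_names(parse, addresses):
    """Map each address to the tokens the parser labeled BuildingName.

    `parse` is any callable that returns (token, label) pairs,
    for example usaddress.parse. Hypothetical helper, not part of this PR.
    """
    results = {}
    for address in addresses:
        tokens = [
            token.rstrip(",")
            for token, label in parse(address)
            if label == "BuildingName"
        ]
        results[address] = " ".join(tokens)
    return results


if __name__ == "__main__":
    try:
        import usaddress  # available after the install step above
    except ImportError:
        print("usaddress is not installed; run the install step first")
    else:
        found = building_names(usaddress.parse, ADDRESSES)
        for address, name in found.items():
            print(f"{name!r:40} <- {address}")
```

Passing the parser in as an argument keeps the helper easy to exercise with a stub when the model has not been trained yet.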

@xmedr xmedr marked this pull request as ready for review June 11, 2025 20:39
@xmedr xmedr requested a review from derekeder June 11, 2025 20:39
@gelodefaultbrain

Hi, just a question: would this PR fix my issue here? It seems to describe the same error as the issue this PR will close. Thank you.

xmedr commented Jul 10, 2025

Hi @gelodefaultbrain! Unfortunately, this PR resolves a different issue than the one you've linked. Here we're specifically improving the parsing of building names, while your issue seems to be more about multiple occupancy identifiers in the same address.

@xmedr xmedr merged commit 0961cef into main Jul 10, 2025
34 checks passed
@xmedr xmedr deleted the patch/buildings_w_apts branch July 10, 2025 14:38

Development

Successfully merging this pull request may close these issues.

Error parsing address containing building name and apartment
