Add data for building names with/without apartments #394
Overview
This branch makes the model a bit better at identifying building names.
Demo
('Donachie', 'StreetName'),
('Rd', 'StreetNamePostType'),
('TowsonTown', 'BuildingName'),
('Place', 'BuildingName'),
('Apartments,', 'BuildingName'),
('Apt', 'OccupancyType'),
('1203', 'OccupancyIdentifier'),
('Baltimore,', 'PlaceName'),
('MD', 'StateName'),
('21239', 'ZipCode')
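
To make the demo easier to read at a glance, the token-level output can be collapsed into one entry per address component. The following is a minimal sketch, not part of this branch: `merge_tokens` is a hypothetical helper, and it assumes the parser returns a list of `(token, label)` pairs shaped like the demo output, with consecutive tokens of the same label belonging to one component.

```python
from itertools import groupby

# (token, label) pairs copied from the demo output above.
tagged = [
    ("Donachie", "StreetName"),
    ("Rd", "StreetNamePostType"),
    ("TowsonTown", "BuildingName"),
    ("Place", "BuildingName"),
    ("Apartments,", "BuildingName"),
    ("Apt", "OccupancyType"),
    ("1203", "OccupancyIdentifier"),
    ("Baltimore,", "PlaceName"),
    ("MD", "StateName"),
    ("21239", "ZipCode"),
]


def merge_tokens(pairs):
    """Hypothetical helper: join consecutive tokens that share a label."""
    components = []
    # groupby only groups adjacent items, which is what we want here:
    # a label that reappears later starts a new component.
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        # Join the tokens and drop a trailing comma left over from the input.
        text = " ".join(token for token, _ in group).rstrip(",")
        components.append((label, text))
    return components


for label, text in merge_tokens(tagged):
    print(f"{label}: {text}")
```

With the demo data, the three `BuildingName` tokens end up as a single "TowsonTown Place Apartments" component, which is the behavior this branch is trying to improve.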
Notes
This PR includes a good bit of training data because building names were tricky to get right while still passing all of our regression test addresses. Even then it's still imperfect: there are some really ambiguous apartment names out there. But it seems to do well when there's some indication that it's looking at a building name, like a "The" at the beginning.
Testing Instructions
pip install -e ".[dev]" -v
5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207