@ecatkins ecatkins commented Apr 5, 2018

Hi,

I've been working with the usaddress library and have added some patterns that I have seen fail in my datasets. This commit includes the XML files for the training set (training/dealstat_addresses_v1.xml) and the test set (measure_performance/test_data/dealstat_tests_v1.xml). The CSV files were excluded by the .gitignore file; I'm not sure whether you require them.
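For anyone who wants to inspect the labeled files, here is a minimal sketch of reading one with the standard library. It assumes the `AddressCollection` / `AddressString` layout that, as far as I know, usaddress uses for labeled XML. The `SAMPLE` string and its labels are illustrative, not copied from the files in this PR, and `labeled_tokens` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the labels here are my assumption of how
# "200 East Main, San Diego California" would be tagged in a training file.
SAMPLE = """<AddressCollection>
  <AddressString><AddressNumber>200</AddressNumber> <StreetNamePreDirectional>East</StreetNamePreDirectional> <StreetName>Main,</StreetName> <PlaceName>San</PlaceName> <PlaceName>Diego</PlaceName> <StateName>California</StateName></AddressString>
</AddressCollection>"""


def labeled_tokens(xml_text):
    """Return one list of (token, label) pairs per AddressString element."""
    root = ET.fromstring(xml_text)
    return [
        [(element.text, element.tag) for element in address]
        for address in root.iter("AddressString")
    ]


for token, label in labeled_tokens(SAMPLE)[0]:
    print(f"{token:12} {label}")
```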

Patterns

  1. Unknown Illinois pattern PlaceName in StateName #221: see the referenced issue. I'm not sure why this was failing.
  2. No StreetNamePostType: common streets are sometimes referenced without a StreetNamePostType, e.g. "200 East Main, San Diego California"
  3. StreetNamePostType = "Grade": I have only come across this once and don't think it is very common, but I included the specific example "19 Hargrove Grade, Palm Coast FL 32137" in the training data (without a corresponding test).
  4. Rhode Island: "Rhode Island" is occasionally picked up as a PlaceName rather than a StateName.
  5. Direction in PlaceName: a direction in the PlaceName is sometimes read as a StreetNamePostDirectional, e.g. "5548 Elmer Avenue, N. Hollywood, CA 91601"
  6. Fort Lauderdale: if the address does not have a StreetNamePostType, "Fort" is read as one rather than as part of the PlaceName, e.g. "225 West Elm, Fort Lauderdale, FL 33301"
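
To make the patterns above easy to spot-check, here is a rough sketch that compares a parser's token labels against the labels I would expect for the example addresses. `CHECKS`, `find_mismatches`, and `check_with_usaddress` are hypothetical names for illustration and are not part of this PR or of usaddress. The only library call used is `usaddress.parse`, which returns a list of `(token, label)` tuples.

```python
from typing import Callable, Dict, List, Tuple

Parser = Callable[[str], List[Tuple[str, str]]]

# Expected labels for the tokens each pattern tends to get wrong.
# These spot checks are illustrative and much smaller than the XML test set.
CHECKS: Dict[str, Dict[str, str]] = {
    "200 East Main, San Diego California": {"California": "StateName"},
    "19 Hargrove Grade, Palm Coast FL 32137": {"Grade": "StreetNamePostType"},
    "5548 Elmer Avenue, N. Hollywood, CA 91601": {"N.": "PlaceName"},
    "225 West Elm, Fort Lauderdale, FL 33301": {"Fort": "PlaceName"},
}


def find_mismatches(
    parse: Parser, checks: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str, str, str]]:
    """Return (address, token, expected, actual) for each unexpected label."""
    mismatches = []
    for address, expected in checks.items():
        for token, label in parse(address):
            # Parsed tokens may keep trailing punctuation such as "Grade,".
            key = token.strip(",;")
            if key in expected and expected[key] != label:
                mismatches.append((address, key, expected[key], label))
    return mismatches


def check_with_usaddress() -> List[Tuple[str, str, str, str]]:
    """Run the spot checks against the installed usaddress model."""
    import usaddress  # third-party: pip install usaddress

    return find_mismatches(usaddress.parse, CHECKS)
```

Calling `check_with_usaddress()` before and after retraining should show which of these tokens change label. An empty list means every spot check matched.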

Both the nose tests and my tests are passing. Let me know how else I can be of assistance. I'm hoping to continue to add new patterns and make pull requests as I work through my datasets.

@ecatkins ecatkins changed the title from "no street post type pattern & unknown illinois pattern" to "Adding various patterns to to training data" on Apr 10, 2018
@xmedr xmedr (Contributor) commented May 2, 2025

Thanks for taking the time to do this @ecatkins! I'm pulling this training/testing data into #390, mostly because of the new testing suite that was added to this repo, so I'll be closing this PR.

@xmedr xmedr closed this May 2, 2025