22 changes: 22 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,22 @@
## Overview

Brief description of what this PR does, and why it is needed.

If this PR closes an issue, make note of it here 👇
Closes #XXX

### Demo

Optional. Screenshots, `curl` examples, etc.

### Notes

Optional. Ancillary topics, caveats, alternative strategies that didn't work out, anything else.

## Testing Instructions

* How to test this PR
* Prefer bulleted description
* Start after checking out this branch
* Include any setup required, such as bundling scripts, restarting services, etc.
* Include test case, and expected output
6 changes: 3 additions & 3 deletions README.md
@@ -68,7 +68,7 @@ Having trouble building the code? [Open an issue](https://github.com/datamade/us

### Adding new training data

-If usaddress is consistently failing on particular address patterns, you can adjust the parser's behavior by adding new training data to the model. [Follow our guide in the training directory](https://github.com/datamade/usaddress/blob/master/training/README.md), and be sure to make a pull request so that we can incorporate your contribution into our next release!
+If usaddress is consistently failing on particular address patterns, you can adjust the parser's behavior by adding new training data to the model. [Follow our guide in the training directory](./training/README.md), and be sure to make a pull request so that we can incorporate your contribution into our next release!

## Important links

@@ -91,7 +91,7 @@ If usaddress is consistently failing on particular address patterns, you can adj

Report issues in the [issue tracker](https://github.com/datamade/usaddress/issues)

-If an address was parsed incorrectly, please let us know! You can either [open an issue](https://github.com/datamade/usaddress/issues/new) or (if you're adventurous) [add new training data to improve the parser's model.](https://github.com/datamade/usaddress/blob/master/training/README.md) When possible, please send over a few real-world examples of similar address patterns, along with some info about the source of the data - this will help us train the parser and improve its performance.
+If an address was parsed incorrectly, please let us know! You can either [open an issue](https://github.com/datamade/usaddress/issues/new) or (if you're adventurous) [add new training data to improve the parser's model.](./training/README.md) When possible, please send over a few real-world examples of similar address patterns, along with some info about the source of the data - this will help us train the parser and improve its performance.

If something in the library is not behaving intuitively, it is a bug, and should be reported.

@@ -103,4 +103,4 @@ If something in the library is not behaving intuitively, it is a bug, and should

## Copyright

-Copyright (c) 2025 Atlanta Journal Constitution. Released under the [MIT License](https://github.com/datamade/usaddress/blob/master/LICENSE).
+Copyright (c) 2025 Atlanta Journal Constitution. Released under the [MIT License](./LICENSE).
30 changes: 30 additions & 0 deletions measure_performance/test_data/labeled.xml
@@ -126,4 +126,34 @@
<AddressString><AddressNumber>150</AddressNumber> <StreetName>Citizens</StreetName> <StreetNamePostType>Circle</StreetNamePostType> <PlaceName>Little</PlaceName> <PlaceName>River,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29566</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>4079</AddressNumber> <StreetNamePreType>U.S.</StreetNamePreType> <StreetName>17</StreetName> <StreetName>Business</StreetName> <PlaceName>Murrells</PlaceName> <PlaceName>Inlet,</PlaceName> <StateName>South</StateName> <StateName>Carolina</StateName> <ZipCode>29576</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><AddressNumber>43</AddressNumber> <StreetNamePreDirectional>South</StreetNamePreDirectional> <StreetName>Broadway</StreetName> <PlaceName>Pitman,</PlaceName> <StateName>New</StateName> <StateName>Jersey</StateName> <ZipCode>08071</ZipCode> <CountryName>United</CountryName> <CountryName>States</CountryName></AddressString>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>2333</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>85</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>284</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>27</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>7326</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>66</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>992</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>88</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupType>R</USPSBoxGroupType> <USPSBoxGroupID>32</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>#</USPSBoxID> <USPSBoxID>e3</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>72</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>1A</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HIGHWAY</USPSBoxGroupType> <USPSBoxGroupType>CONTRACT</USPSBoxGroupType> <USPSBoxGroupType>rte</USPSBoxGroupType> <USPSBoxGroupID>#</USPSBoxGroupID> <USPSBoxGroupID>46</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>#</USPSBoxID> <USPSBoxID>992</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HIGHWAY</USPSBoxGroupType> <USPSBoxGroupType>CONtraCT</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>56</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>45C</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>StaR</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>75</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>5Z</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HCR</USPSBoxGroupType> <USPSBoxGroupID>4e</USPSBoxGroupID> <USPSBoxType>box</USPSBoxType> <USPSBoxID>#</USPSBoxID> <USPSBoxID>32</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HCR</USPSBoxGroupType> <USPSBoxGroupID>88</USPSBoxGroupID> <USPSBoxType>bOX</USPSBoxType> <USPSBoxID>76E</USPSBoxID></AddressString>
<AddressString><USPSBoxGroupType>HWY</USPSBoxGroupType> <USPSBoxGroupType>CONTRACT</USPSBoxGroupType> <USPSBoxGroupType>ROUTE</USPSBoxGroupType> <USPSBoxGroupID>102</USPSBoxGroupID> <USPSBoxType>BOX</USPSBoxType> <USPSBoxID>255A</USPSBoxID></AddressString>
<AddressString><AddressNumber>4510</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>GV,</StreetName> <PlaceName>APPLETON,</PlaceName> <StateName>WI</StateName> <ZipCode>54913</ZipCode></AddressString>
<AddressString><AddressNumber>7575</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>ZZZ,</StreetName> <PlaceName>MILWAUKEE,</PlaceName> <StateName>WI</StateName> <ZipCode>54567</ZipCode></AddressString>
<AddressString><AddressNumber>123A</AddressNumber> <StreetNamePreDirectional>E</StreetNamePreDirectional> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>DV,</StreetName> <PlaceName>WAUPACA,</PlaceName> <StateName>WI</StateName> <ZipCode>54981</ZipCode></AddressString>
<AddressString><AddressNumber>1331</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>AA</StreetName> <StreetNamePostDirectional>NE,</StreetNamePostDirectional> <PlaceName>AMHERST</PlaceName> <PlaceName>JUNCTION,</PlaceName> <StateName>WI</StateName> <ZipCode>54407</ZipCode></AddressString>
<AddressString><AddressNumber>133</AddressNumber> <StreetNamePreDirectional>W</StreetNamePreDirectional> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>LL,</StreetName> <PlaceName>AMHERST,</PlaceName> <StateName>WI</StateName> <ZipCode>54406</ZipCode></AddressString>
<AddressString><AddressNumber>123</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>ABC,</StreetName> <OccupancyType>APT</OccupancyType> <OccupancyIdentifier>12,</OccupancyIdentifier> <PlaceName>IOLA,</PlaceName> <StateName>WI</StateName> <ZipCode>54445</ZipCode></AddressString>
<AddressString><AddressNumber>200</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>ELM,</StreetName> <PlaceName>DENVER,</PlaceName> <StateName>COLORADO</StateName></AddressString>
<AddressString><AddressNumber>55</AddressNumber> <StreetName>WINDSOR</StreetName> <StreetNamePostType>PLACE,</StreetNamePostType> <PlaceName>CHAMPAIGN,</PlaceName> <StateName>ILLINOIS</StateName></AddressString>
<AddressString><AddressNumber>5</AddressNumber> <StreetNamePreDirectional>NORTH</StreetNamePreDirectional> <StreetName>MAIN,</StreetName> <PlaceName>VAN</PlaceName> <PlaceName>NUYS,</PlaceName> <StateName>CALIFORNIA</StateName></AddressString>
<AddressString><AddressNumber>2609</AddressNumber> <StreetName>BAYVIEW,</StreetName> <PlaceName>FORT</PlaceName> <PlaceName>LAUDERDALE,</PlaceName> <StateName>FL</StateName></AddressString>
<AddressString><AddressNumber>12855</AddressNumber> <StreetName>6TH</StreetName> <StreetNamePostType>AVE,</StreetNamePostType> <PlaceName>N.</PlaceName> <PlaceName>MIAMI,</PlaceName> <StateName>FL</StateName> <ZipCode>33161</ZipCode></AddressString>
<AddressString><AddressNumber>783</AddressNumber> <StreetName>HOPE</StreetName> <StreetNamePostType>ST,</StreetNamePostType> <PlaceName>PROVIDENCE,</PlaceName> <StateName>RHODE</StateName> <StateName>ISLAND</StateName> <ZipCode>02906</ZipCode></AddressString>
<AddressString><AddressNumber>200</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>ELM,</StreetName> <PlaceName>DENVER,</PlaceName> <StateName>COLORADO</StateName></AddressString>
<AddressString><AddressNumber>977</AddressNumber> <StreetName>PLEASANT</StreetName> <StreetNamePostType>STREET,</StreetNamePostType> <PlaceName>N.</PlaceName> <PlaceName>ORANGE,</PlaceName> <StateName>NJ</StateName> <ZipCode>07052</ZipCode></AddressString>
<AddressString><AddressNumber>610</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>MAIN</StreetName> <PlaceName>MARION</PlaceName> <StateName>KANSAS</StateName></AddressString>
<AddressString><AddressNumber>10</AddressNumber> <StreetNamePreDirectional>EAST</StreetNamePreDirectional> <StreetName>LAKE,</StreetName> <PlaceName>DENVER,</PlaceName> <StateName>COLORADO</StateName></AddressString>
<AddressString><AddressNumber>2735</AddressNumber> <StreetName>PAWTUCKET</StreetName> <StreetNamePostType>AVE</StreetNamePostType> <PlaceName>EAST</PlaceName> <PlaceName>PROVIDENCE</PlaceName> <StateName>RHODE</StateName> <StateName>ISLAND</StateName> <ZipCode>02914</ZipCode></AddressString>
<AddressString><AddressNumber>5548</AddressNumber> <StreetName>ELMER</StreetName> <StreetNamePostType>AVENUE,</StreetNamePostType> <PlaceName>N.</PlaceName> <PlaceName>HOLLYWOOD,</PlaceName> <StateName>CA</StateName> <ZipCode>91601</ZipCode></AddressString>
</AddressCollection>
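As a reading aid for the labeled data above: each `<AddressString>` holds one address, and each child element is a token whose tag name is its label. The following is a minimal sketch, using only the Python standard library, of how such a file could be turned into `(token, label)` pairs. The helper names `read_labeled` and `raw_address` are hypothetical and not part of usaddress, and the sample reuses two lines from this diff rather than reading the real file.

```python
import xml.etree.ElementTree as ET

# Two labeled examples copied from the diff above, wrapped in the collection's root element.
SAMPLE = """<AddressCollection>
<AddressString><USPSBoxGroupType>HC</USPSBoxGroupType> <USPSBoxGroupID>2333</USPSBoxGroupID> <USPSBoxType>Box</USPSBoxType> <USPSBoxID>85</USPSBoxID></AddressString>
<AddressString><AddressNumber>4510</AddressNumber> <StreetNamePreType>COUNTY</StreetNamePreType> <StreetNamePreType>ROAD</StreetNamePreType> <StreetName>GV,</StreetName> <PlaceName>APPLETON,</PlaceName> <StateName>WI</StateName> <ZipCode>54913</ZipCode></AddressString>
</AddressCollection>"""


def read_labeled(xml_text):
    """Return one list of (token, label) pairs per <AddressString> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        [(child.text, child.tag) for child in address]
        for address in root.iter("AddressString")
    ]


def raw_address(pairs):
    """Rebuild the unlabeled address string by joining tokens with single spaces."""
    return " ".join(token for token, _ in pairs)


examples = read_labeled(SAMPLE)
for pairs in examples:
    print(raw_address(pairs))
    for token, label in pairs:
        print(f"  {token!r:12} {label}")
```

Note that multi-word components such as `COUNTY ROAD` are labeled one token at a time, with the same tag repeated on consecutive tokens.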
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "usaddress"
-version = "0.5.13"
+version = "0.5.14"
description = "Parse US addresses using conditional random fields"
readme = "README.md"
license = {text = "MIT License", url = "http://www.opensource.org/licenses/mit-license.php"}
16 changes: 16 additions & 0 deletions training/README.md
@@ -280,6 +280,22 @@ Congratulations! The model has officially improved. You can safely move on to st

If any of our tests failed, however, things become more complicated. The output will break down the tests that failed, showing you the parse that the model produced (labeled `pred`) and the parse that the test expected (labeled `true`). In this case, jump to step 5a to debug your errors.

If you'd also like to spot-check individual addresses in the Python shell, create a virtual environment, activate it, install your work-in-progress version of this package, and open a shell.
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]" -v
python
# shell starts up
>>>
```

Then import usaddress and start parsing!
```python
>>> import usaddress
>>> usaddress.parse("a funky address")
```
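When spot-checking, it can help to compare a parse against the labels you expected, in the same `pred` versus `true` spirit as the test output. Here is a minimal sketch, assuming the parse is a list of `(token, label)` tuples as returned by `usaddress.parse`. The `mismatches` helper is hypothetical, and the `pred` list below is made up for illustration; it is not real model output.

```python
def mismatches(pred, true):
    """Return (token, predicted_label, expected_label) for each token whose labels differ."""
    if [token for token, _ in pred] != [token for token, _ in true]:
        raise ValueError("token sequences differ; compare parses of the same string")
    return [
        (token, got, expected)
        for (token, got), (_, expected) in zip(pred, true)
        if got != expected
    ]


# Illustrative values only. In the shell, pred would come from usaddress.parse(...).
pred = [
    ("4510", "AddressNumber"),
    ("COUNTY", "StreetName"),
    ("ROAD", "StreetNamePostType"),
    ("GV,", "StreetName"),
]
true = [
    ("4510", "AddressNumber"),
    ("COUNTY", "StreetNamePreType"),
    ("ROAD", "StreetNamePreType"),
    ("GV,", "StreetName"),
]

for token, got, expected in mismatches(pred, true):
    print(f"{token!r}: pred={got} true={expected}")
```

An empty result means the parse agrees with the expected labels token for token.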

**5a. Repeat steps 1-4 until the tests pass.**

If you've arrived at this step, it means that some of your tests failed. Uh oh!