@bbharathrao bbharathrao commented Mar 25, 2019

Adds training data for addresses in which HC is followed by a numeric-only identifier of 3 or 4 digits, with or without leading zeros, and then a box number.
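To pin down the address shape this training data targets, here is a minimal sketch using a regular expression. It is illustrative only: usaddress is a statistical (CRF-based) parser and does not use this pattern, and the names `HC_BOX` and `matches_hc_shape` are hypothetical.

```python
import re

# Illustrative only: this regex is NOT how usaddress parses addresses.
# It just describes the shape: HC, a 3- or 4-digit group ID (leading zeros
# allowed), the word Box, and an alphanumeric box ID.
HC_BOX = re.compile(r"^(HC)\s+(\d{3,4})\s+(Box)\s+(\w+)$", re.IGNORECASE)


def matches_hc_shape(address):
    """Return the four components if the address has the HC shape, else None."""
    match = HC_BOX.match(address.strip())
    return match.groups() if match else None


for example in ["HC 095 Box 23", "HC 235 Box 1A", "HC 2302 Box 65", "HC 0955 Box 12"]:
    print(example, "->", matches_hc_shape(example))
```

All four examples in this PR match the pattern; a 2-digit or 5-digit number after HC does not.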
Example 1:

address1 = usaddress.tag("HC 095 Box 23")

Currently it throws an error:

ORIGINAL STRING: HC 095 Box 23
PARSED TOKENS: [(u'HC', 'USPSBoxType'), (u'095', 'USPSBoxID'), (u'Box', 'USPSBoxType'), (u'23', 'USPSBoxID')]
UNCERTAIN LABEL: USPSBoxType
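For context, the error is consistent with the same label being assigned to two non-adjacent tokens (here `USPSBoxType` for both `HC` and `Box`). The following is a simplified sketch of that kind of check, not the library's actual code; `RepeatedLabel` and `merge_tokens` are hypothetical stand-ins.

```python
from collections import OrderedDict


class RepeatedLabel(Exception):
    """Hypothetical stand-in for the error raised on a repeated label."""


def merge_tokens(parsed_tokens):
    """Merge (token, label) pairs into an ordered label -> text mapping.

    Adjacent tokens with the same label are joined; a label that reappears
    after a different label has intervened is treated as an error.
    """
    tagged = OrderedDict()
    last_label = None
    for token, label in parsed_tokens:
        if label == last_label:
            tagged[label].append(token)
        elif label not in tagged:
            tagged[label] = [token]
        else:
            raise RepeatedLabel(label)
        last_label = label
    return OrderedDict((label, " ".join(parts)) for label, parts in tagged.items())


tokens = [("HC", "USPSBoxType"), ("095", "USPSBoxID"),
          ("Box", "USPSBoxType"), ("23", "USPSBoxID")]
try:
    merge_tokens(tokens)
except RepeatedLabel as exc:
    print("UNCERTAIN LABEL:", exc)
```

With the group labels proposed in this PR, every label is distinct, so the same merge succeeds.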

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '095'),
('USPSBoxType', 'Box'),
('USPSBoxID', '23')]),
'PO Box')

Example 2:

address1 = usaddress.tag("HC 235 Box 1A")

Currently it throws an error:

ORIGINAL STRING: HC 235 Box 1A
PARSED TOKENS: [(u'HC', 'USPSBoxType'), (u'235', 'USPSBoxID'), (u'Box', 'USPSBoxType'), (u'1A', 'USPSBoxID')]
UNCERTAIN LABEL: USPSBoxType

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '235'),
('USPSBoxType', 'Box'),
('USPSBoxID', '1A')]),
'PO Box')

Example 3:

address1 = usaddress.tag("HC 2302 Box 65")

Currently it is parsed as a Street Address:

pprint(address1)
(OrderedDict([('AddressNumber', 'HC'),
('StreetName', '2302'),
('USPSBoxType', 'Box'),
('USPSBoxID', '65')]),
'Street Address')

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '2302'),
('USPSBoxType', 'Box'),
('USPSBoxID', '65')]),
'PO Box')

Example 4:

address1 = usaddress.tag("HC 0955 Box 12")

Currently it is parsed as a Street Address:

pprint(address1)
(OrderedDict([('AddressNumber', 'HC'),
('StreetName', '0955'),
('USPSBoxType', 'Box'),
('USPSBoxID', '12')]),
'Street Address')

It needs to be parsed this way.

pprint(address1)
(OrderedDict([('USPSBoxGroupType', 'HC'),
('USPSBoxGroupID', '0955'),
('USPSBoxType', 'Box'),
('USPSBoxID', '12')]),
'PO Box')

The training XML is located at:
usaddress/training/HC_XXXX.xml

The testing XML is located at:
usaddress/measure_performance/test_data/test_HC_XXXX.xml
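As a rough sketch of what one labeled entry for Example 1 might look like, the snippet below builds it with the standard library. The `AddressCollection`/`AddressString` wrapper elements are an assumption about the training file layout and are not taken from this PR; the label names are the ones proposed above, and `labeled_address` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def labeled_address(pairs):
    """Build one labeled address element from (token, label) pairs.

    Assumption: each token is wrapped in an element named after its label,
    with a single space separating consecutive tokens.
    """
    address = ET.Element("AddressString")
    for index, (token, label) in enumerate(pairs):
        child = ET.SubElement(address, label)
        child.text = token
        if index < len(pairs) - 1:
            child.tail = " "
    return address


entry = labeled_address([
    ("HC", "USPSBoxGroupType"),
    ("095", "USPSBoxGroupID"),
    ("Box", "USPSBoxType"),
    ("23", "USPSBoxID"),
])

collection = ET.Element("AddressCollection")
collection.append(entry)
print(ET.tostring(collection, encoding="unicode"))
```

Joining the text of the entry reproduces the original address string, which is a quick sanity check that no token was dropped while labeling.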

Contributor

xmedr commented Apr 17, 2025

This is great work, @bbharathrao, thank you! We now have this training/testing data in #390, mostly because of the new testing suite that was added to this repo, so I'll be closing this PR.
