
@connortodd21 (Contributor) commented Aug 24, 2025

Overview

For us_counties, change the counties URL to point to the published 2020 version. This version contains a few changes, described here.

Fixes #44

Changelog

  • Makefile
    • add a make_data action. Without it, running make dl failed with a file-not-found error because the data/ directory did not exist yet
    • update the dl action to use make_data
    • update the help action to reference make_data
    • add a clean-data action that removes the data/ directory, and call it from the clean action as well
    • change the data/us_counties.txt action to use the new URL
  • bin/us_counties.py
    • add definitions for what each field index references
    • parse the new counties file
      • use a loop instead of a list comprehension, because each row needs an extra processing step
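For anyone reviewing the parser change, here is a minimal sketch of that loop-based parsing. The PR doesn't show the new file's layout, so I'm assuming the Census Bureau's pipe-delimited 2020 national county list with the header STATE|STATEFP|COUNTYFP|COUNTYNS|COUNTYNAME|CLASSFP|FUNCSTAT. The index constants, the `parse_counties` name, and the "extra step" (skipping the header and joining the state and county FIPS codes) are all hypothetical, not necessarily what bin/us_counties.py does.

```python
import json
from collections.abc import Iterable

# Hypothetical field positions, ASSUMING a pipe-delimited file whose header is
# STATE|STATEFP|COUNTYFP|COUNTYNS|COUNTYNAME|CLASSFP|FUNCSTAT.
IDX_STATE = 0        # USPS state abbreviation, e.g. "AK"
IDX_STATEFP = 1      # two-digit state FIPS code
IDX_COUNTYFP = 2     # three-digit county FIPS code
IDX_COUNTYNAME = 4   # county name with its suffix, e.g. "Census Area"


def parse_counties(lines: Iterable[str]) -> list[dict[str, str]]:
    """Turn raw lines of the county file into fips/name/state records."""
    counties = []
    for line in lines:
        fields = line.rstrip("\r\n").split("|")
        # Skip the header row and blank or too-short lines.
        if fields[IDX_STATE] == "STATE" or len(fields) <= IDX_COUNTYNAME:
            continue
        counties.append({
            # Five-digit county FIPS = state code followed by county code.
            "fips": fields[IDX_STATEFP] + fields[IDX_COUNTYFP],
            "name": fields[IDX_COUNTYNAME],
            "state": fields[IDX_STATE],
        })
    return counties


# Illustrative rows; the COUNTYNS/CLASSFP/FUNCSTAT values are placeholders.
sample = [
    "STATE|STATEFP|COUNTYFP|COUNTYNS|COUNTYNAME|CLASSFP|FUNCSTAT\n",
    "AK|02|158|00000000|Kusilvak Census Area|XX|X\n",
    "NM|35|013|00000000|Doña Ana County|XX|X\n",
]
# ensure_ascii=False keeps "ñ" as-is instead of escaping it to \u00f1.
print(json.dumps(parse_counties(sample), ensure_ascii=False))
```

The records come out in the same fips/name/state shape as the us_counties.json entries shown under Testing.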

Testing

After running make dl && make json, the regenerated us_counties.json contains an entry for each of the counties mentioned in #44. I checked this by manually scanning us_counties.json:

{"fips": "02158", "name": "Kusilvak Census Area", "state": "AK"}
{"fips": "30111", "name": "Yellowstone County", "state": "MT"}
{"fips": "46102", "name": "Oglala Lakota County", "state": "SD"}
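The manual scan could be replaced by a quick check. This is a minimal sketch that assumes us_counties.json is a list of objects with fips, name and state keys, like the entries above. `missing_counties` is a hypothetical helper, and the sample input is inline so the snippet runs on its own.

```python
import json

# The three counties listed above that the regenerated file should contain.
EXPECTED = {
    "02158": "Kusilvak Census Area",
    "30111": "Yellowstone County",
    "46102": "Oglala Lakota County",
}


def missing_counties(records: list[dict], expected: dict[str, str]) -> dict[str, str]:
    """Return the expected fips -> name pairs not found verbatim in records."""
    present = {(r["fips"], r["name"]) for r in records}
    return {fips: name for fips, name in expected.items() if (fips, name) not in present}


# In the repo this would be json.load() on geonamescache/data/us_counties.json;
# an inline sample keeps the sketch self-contained.
sample = json.loads(
    '[{"fips": "02158", "name": "Kusilvak Census Area", "state": "AK"},'
    ' {"fips": "30111", "name": "Yellowstone County", "state": "MT"}]'
)
print(missing_counties(sample, EXPECTED))  # → {'46102': 'Oglala Lakota County'}
```

An empty dict means every expected county is present with exactly that name.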

@connortodd21 (Contributor, Author) commented Aug 25, 2025

I wrote a script to capture the differences between the old and new outputs. Most are due to accented characters (mainly in Puerto Rico municipio names), and a few are census name changes made between 2010 and 2020.

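import pandas as pd
from geonamescache import GeonamesCache

# Old data: the county list bundled with the installed geonamescache package.
gc = GeonamesCache()
old = pd.DataFrame(gc.get_us_counties())

# New data: the JSON regenerated by `make dl && make json`. Read fips as str
# so leading zeros (e.g. "02158") survive and match the old data.
new = pd.read_json("../geonamescache/data/us_counties.json", dtype={"fips": str})

# Left anti-joins on all shared columns (fips, name, state): keep rows whose
# exact combination exists only in the left-hand frame.
extra_in_new = new.merge(old, how="left", indicator=True).query('_merge == "left_only"')
missing_in_new = old.merge(new, how="left", indicator=True).query('_merge == "left_only"')

print("\n Counties in 2020 dataset, but not in previous dataset \n")
print(extra_in_new)

print("\n Counties in previous dataset, but not in 2020 dataset \n")
print(missing_in_new)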
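For a quick check without pandas, the same two anti-joins can be sketched with sets of (fips, name, state) tuples. This assumes both inputs are lists of dicts with those three keys, matching the DataFrame columns used in the script. `diff_counties` is a hypothetical name, and the records are illustrative (Autauga County, AL, 01001, stands in for an unchanged row). As with the merge, a row only matches if all three fields are identical, so an accent change appears on both sides. Unlike the merge, duplicate rows collapse.

```python
def diff_counties(old: list[dict], new: list[dict]) -> tuple[list[tuple], list[tuple]]:
    """Return (rows only in new, rows only in old), each sorted by FIPS."""
    def as_rows(records):
        # One hashable (fips, name, state) tuple per county record.
        return {(r["fips"], r["name"], r["state"]) for r in records}

    old_rows, new_rows = as_rows(old), as_rows(new)
    return sorted(new_rows - old_rows), sorted(old_rows - new_rows)


old = [
    {"fips": "01001", "name": "Autauga County", "state": "AL"},
    {"fips": "35013", "name": "Dona Ana County", "state": "NM"},
]
new = [
    {"fips": "01001", "name": "Autauga County", "state": "AL"},
    {"fips": "35013", "name": "Doña Ana County", "state": "NM"},
]
extra, missing = diff_counties(old, new)
print(extra)    # → [('35013', 'Doña Ana County', 'NM')]
print(missing)  # → [('35013', 'Dona Ana County', 'NM')]
```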

Output of the pandas script is below, with the reason for each change annotated by hand:

 Counties in 2020 dataset, but not in previous dataset 

       fips                      name state     _merge
72    02063       Chugach Census Area    AK  left_only
73    02066  Copper River Census Area    AK  left_only
83    02158      Kusilvak Census Area    AK  left_only
89    02195        Petersburg Borough    AK  left_only
1143  22059            LaSalle Parish    LA  left_only
1803  35013           Doña Ana County    NM  left_only
2413  46102      Oglala Lakota County    SD  left_only
3158  72011          Añasco Municipio    PR  left_only
3163  72021         Bayamón Municipio    PR  left_only
3167  72029       Canóvanas Municipio    PR  left_only
3169  72033          Cataño Municipio    PR  left_only
3175  72045         Comerío Municipio    PR  left_only
3181  72055         Guánica Municipio    PR  left_only
3191  72075      Juana Díaz Municipio    PR  left_only
3195  72083      Las Marías Municipio    PR  left_only
3197  72087           Loíza Municipio    PR  left_only
3199  72091          Manatí Municipio    PR  left_only
3202  72097        Mayagüez Municipio    PR  left_only
3209  72111        Peñuelas Municipio    PR  left_only
3212  72117          Rincón Municipio    PR  left_only
3213  72119      Río Grande Municipio    PR  left_only
3216  72125      San Germán Municipio    PR  left_only
3219  72131   San Sebastián Municipio    PR  left_only

 Counties in previous dataset, but not in 2020 dataset 

       fips                        name state     _merge
86    02195      Petersburg Census Area    AK  left_only   (renamed to Petersburg Borough)
91    02261  Valdez-Cordova Census Area    AK  left_only   (split into Copper River and Chugach Census Areas)
92    02270    Wade Hampton Census Area    AK  left_only   (renamed to Kusilvak Census Area, new FIPS 02158)
1142  22059             La Salle Parish    LA  left_only   (renamed: space removed, now LaSalle Parish)
1802  35013             Dona Ana County    NM  left_only   (accents added in 2020 name)
2417  46113              Shannon County    SD  left_only   (renamed to Oglala Lakota County, new FIPS 46102)
2916  51515                Bedford city    VA  left_only   (merged into Bedford County)
3158  72011            Anasco Municipio    PR  left_only   (accents added in 2020 name)
3163  72021           Bayamon Municipio    PR  left_only   (accents added in 2020 name)
3167  72029         Canovanas Municipio    PR  left_only   (accents added in 2020 name)
3169  72033            Catano Municipio    PR  left_only   (accents added in 2020 name)
3175  72045           Comerio Municipio    PR  left_only   (accents added in 2020 name)
3181  72055           Guanica Municipio    PR  left_only   (accents added in 2020 name)
3191  72075        Juana Diaz Municipio    PR  left_only   (accents added in 2020 name)
3195  72083        Las Marias Municipio    PR  left_only   (accents added in 2020 name)
3197  72087             Loiza Municipio    PR  left_only   (accents added in 2020 name)
3199  72091            Manati Municipio    PR  left_only   (accents added in 2020 name)
3202  72097          Mayaguez Municipio    PR  left_only   (accents added in 2020 name)
3209  72111          Penuelas Municipio    PR  left_only   (accents added in 2020 name)
3212  72117            Rincon Municipio    PR  left_only   (accents added in 2020 name)
3213  72119        Rio Grande Municipio    PR  left_only   (accents added in 2020 name)
3216  72125        San German Municipio    PR  left_only   (accents added in 2020 name)
3219  72131     San Sebastian Municipio    PR  left_only   (accents added in 2020 name)
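Most of the hand annotations above come down to one question: for a FIPS code present on both sides, does the name differ only in diacritics? Here is a minimal sketch of automating that first pass. It assumes {fips: name} maps for each dataset and uses Unicode NFKD decomposition to strip combining marks. `strip_accents`, `classify_change`, and the labels are hypothetical, and the sample rows are taken from the output above.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Remove combining marks after NFKD decomposition ("Doña" -> "Dona")."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify_change(fips: str, old: dict[str, str], new: dict[str, str]) -> str:
    """Label how one FIPS code differs between old and new {fips: name} maps."""
    if fips not in new:
        return "removed"       # code retired, e.g. after a rename with a new FIPS
    if fips not in old:
        return "added"         # code introduced, e.g. for a new or renamed area
    if old[fips] == new[fips]:
        return "unchanged"
    if strip_accents(old[fips]) == strip_accents(new[fips]):
        return "accents only"  # same name apart from diacritics
    return "renamed"           # same code, genuinely different name


# A few rows from the output above, keyed by FIPS.
old = {
    "22059": "La Salle Parish",
    "35013": "Dona Ana County",
    "46113": "Shannon County",
    "72097": "Mayaguez Municipio",
}
new = {
    "22059": "LaSalle Parish",
    "35013": "Doña Ana County",
    "46102": "Oglala Lakota County",
    "72097": "Mayagüez Municipio",
}
for fips in sorted(old.keys() | new.keys()):
    print(fips, classify_change(fips, old, new))
```

A rename that also changes the FIPS code (Shannon to Oglala Lakota) still shows up as one "removed" plus one "added". Pairing those still takes a human and the Census change notes.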

@yaph (Owner) commented Aug 28, 2025

Thanks for the PR, Connor! I merged your code from the command line. Before I create a new release, though, I need a solution for #43.

@yaph (Owner) commented Sep 2, 2025

Thanks for the contribution!

Merged in 74399c9 and 351ded3

@yaph closed this on Sep 2, 2025

Development

Successfully merging this pull request may close these issues.

US counties missing two entries