brianjbuck/us-address-parser

 
 

usaddress

usaddress is a Python library for parsing unstructured address strings into address components, using advanced NLP methods.

What this can do: Using a probabilistic model, it makes (very educated) guesses in identifying address components, even in tricky cases where rule-based parsers typically break down.

What this cannot do: It cannot identify address components with perfect accuracy, nor can it verify that a given address is correct/valid.

We currently support only Python 2.7.

Installation

> pip install usaddress

To build and test the development code:

> pip install -r requirements.txt
> python setup.py develop
> python training/training.py
> nosetests .

Usage

>>> import usaddress
>>> usaddress.parse('123 Main St. Suite 100 Chicago, IL')
[('123', 'AddressNumber'), 
 ('Main', 'StreetName'), 
 ('St.', 'StreetNamePostType'), 
 ('Suite', 'OccupancyType'), 
 ('100', 'OccupancyIdentifier'), 
 ('Chicago,', 'PlaceName'), 
 ('IL', 'StateName')]
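
The list of (token, label) pairs shown above is often easier to work with once tokens that share a label are joined together. The following is a minimal sketch of one way to do that; `group_by_label` is a hypothetical helper written for illustration, not part of usaddress, and the input is simply the output copied from the example above, so the snippet runs without the library installed.

```python
from collections import OrderedDict

# Output copied from the usage example above; in practice this would be
# the return value of usaddress.parse(...).
parsed = [('123', 'AddressNumber'),
          ('Main', 'StreetName'),
          ('St.', 'StreetNamePostType'),
          ('Suite', 'OccupancyType'),
          ('100', 'OccupancyIdentifier'),
          ('Chicago,', 'PlaceName'),
          ('IL', 'StateName')]


def group_by_label(pairs):
    """Hypothetical helper: join the tokens that share a label.

    Labels keep the order in which they first appear. Leading and
    trailing commas are stripped from each token (an assumption about
    what is convenient, since the parser keeps punctuation attached).
    """
    grouped = OrderedDict()
    for token, label in pairs:
        grouped.setdefault(label, []).append(token.strip(','))
    return OrderedDict(
        (label, ' '.join(tokens)) for label, tokens in grouped.items()
    )


components = group_by_label(parsed)
for label, text in components.items():
    print('{0}: {1}'.format(label, text))
```

Because the labels are guesses from a probabilistic model, a mapping built this way inherits any labeling mistakes; it reorganizes the output and does not validate the address.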

Important links

Team

Errors / Bugs

If something is not behaving intuitively, it is a bug and should be reported. Report it here

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Send us a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2014 Atlanta Journal Constitution. Released under the MIT License.
