Segarmakers, Wheelwrights, Merchants: Extracting a Million-Record Dataset from Historical NYC City Directories

City directories are a tantalizing data source for the demographic, occupational, and spatial history of urban environments. New York City’s are no exception: more than 120 years of directories, with over a million entries documenting the city’s inhabitants, are available for public use.

While these directories have been digitized and made publicly available by the New York Public Library and other institutions, extracting the entries for data analysis poses additional challenges, including computer-assisted field detection and language parsing. Come hear updates on this ongoing effort, part of the NYPL’s Space/Time Directory, which is being carried out in collaboration with members of New York University’s Data Services team.

To attend, please RSVP via Meetup.