-
Hi @JerryXin2, I'll show you how to do it on the Linux command line (it's useful to get a bit familiar with it). Assuming you've unpacked the files in a folder named download:

cd download
ls -a
business-licenses.csv
business-owners.csv
cityofchicago_business_licenses_dataset_description.pdf
socrata_metadata_business-licenses.json
socrata_metadata_business-owners.json

We'll use csvkit to look at the data:

sudo apt install csvkit
head business-owners.csv | csvlook
| Account Number | Legal Name | Owner First Name | Owner Middle Initial | Owner Last Name | Suffix | Legal Entity Owner | Title |
| -------------- | --------------------------------------------- | ---------------- | -------------------- | --------------- | ------ | ------------------ | --------------- |
| 373,231 | PROCURIA CONSULTING, LLC | GUY | WILLIAM | SELLARS | | | MANAGING MEMBER |
| 203,002 | NONA LISA FLOWERS & GIFTS | NELCY | | SANTANA | | | PARTNER |
| 338,012 | HAROLDS II BAR AND GRILL INC. | GREGORY | | EDINGBURG | | | SECRETARY |
| 55,221 | CHICAGO PROVINCE SOCIETY OF JE | GEORGE | A. | LANE | | | PRESIDENT |
| 41,354 | WORLDWIDE TRAVEL MANAGEMENT INC | ANTHONY | | VERA | | | SECRETARY |
| 35,593 | FRANCES RENK | FRANCES | | RENK | | | OTHER |
| 454,624 | KIDS PLANET ACADEMY, INC. | STELLA | | NTIM | | | SECRETARY |
| 261,601 | ATLANTIC HEATING AND AIR CONDITIONING COMPANY | ROBERT | B | BISHOP | | | PRESIDENT |
| 203,029 | MIDDLE OF NOWHERE, INC. | Dion | | Antic | | | PRESIDENT |

Everything looks good, so let's extract the second column:

head business-owners.csv | csvcut -c 2
Legal Name
"PROCURIA CONSULTING, LLC"
NONA LISA FLOWERS & GIFTS
HAROLDS II BAR AND GRILL INC.
CHICAGO PROVINCE SOCIETY OF JE
WORLDWIDE TRAVEL MANAGEMENT INC
FRANCES RENK
"KIDS PLANET ACADEMY, INC."
ATLANTIC HEATING AND AIR CONDITIONING COMPANY
"MIDDLE OF NOWHERE, INC."

Looks like the right thing, so now let's take the entire second column and save it to a file:

csvcut -c 2 business-owners.csv > business_names.txt

The result is in business_names.txt.

If you want to remove the quotes around some of the business names, you can do the following, using the
I'll attach the result in a second.
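
One possible way to strip the surrounding quotes is a small sed substitution. This is only a sketch under assumptions: the file names sample_names.txt and sample_names_clean.txt are made up for illustration, and the three sample lines simply mimic the csvcut output shown above. To run it on the real data, point sed at business_names.txt instead.

```shell
# Build a tiny stand-in file (assumed sample; same shape as the csvcut output).
printf '%s\n' \
  'Legal Name' \
  '"PROCURIA CONSULTING, LLC"' \
  'NONA LISA FLOWERS & GIFTS' > sample_names.txt

# Remove one leading and one trailing double quote, but only on lines
# that both start and end with a quote; other lines pass through unchanged.
sed 's/^"\(.*\)"$/\1/' sample_names.txt > sample_names_clean.txt

cat sample_names_clean.txt
```

Note that csvcut adds the quotes because those names contain commas, so the unquoted file is a plain list of names (one per line) rather than valid CSV. Names that contain literal double quotes would still show them doubled, since this sketch only touches the outer pair.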
-
Hi Olivier, I got a list of Chicago business names to use for the address matching case study, but I can't for the life of me isolate the business names into one CSV file.
https://www.kaggle.com/datasets/chicago/chicago-business-licenses-and-owners
Here is the Kaggle link for the dataset. Do you have any tips on how I can convert the data into a CSV file so I can read it into the address matching example?
I tried using RStudio to select "Legal Name" and slice the first 30000 values, but couldn't export the result as a CSV.
I also tried doing it natively in Numbers/Excel, but couldn't get a correct CSV file.
Jerry