Before I learned how to scrape data from websites – I happen to use the BeautifulSoup package – I learned about this brute force method using a Python script to download images from Google Street View. I want to grab all the Google Street View images for a particular search value. I have a great interest in art and culture as social infrastructure so I'm going to query "museum"; but I can just as easily look up "gallery", "theater", or "cinema" etc. You get the idea. The results shown are: MoMA, Guggenheim, Met, New York Historical Society, Children's Museum, and Bronx Museum.
Manually "scrape" Google Maps by typing in the search term ("museum") and selecting all the results on the first page. Copy the selected results and paste into a blank Google Sheet. The only information relevant to this exercise is the latitude and longitude. Sort the results (Data > Sort Sheet by Column A-Z) and delete all the rows without an address.
At this point, all the data is in one column. Split the text into their respective columns and clean off the non-address portions. Highlight the column, split the text (Data > Split text to columns...), change the dropdown options from comma to space since this is the delimiter on which we'll will be splitting the data. Highlight the non-address columns and delete, shift rows to the left.
The address now needs to be concatenated together to include the city and state – in this case, it's New York, New York. To combine the results of several cells in Google Sheets (or MS Excel) paste the following formula into the first empty cell adjacent to your cell data. Include as many cells as you have filled with data and add a spacer (" ") between each.
=CONCATENATE(A1," ",B1," ",C1," ",D1, " New York, New York")
Copy down the formula by selecting the cell that has the formula, place the cursor in the lower-right corner until it turns into a plus sign (+) and drag the fill handle down. Great! Now we have addresses that can be turned latitude and longitude values using a batch geocoder. Copy the addresses and paste them into the Addresses field of the geocoder. Make sure:
- Addresses are in: United States
- Separate text output with: Tabs
- Include these columns in text output: Deselect all checkboxes