Welcome to the final post where we are walking through a data visualization task (read part 1 here and part 2 here). The goal is to create a U.S. map for a company (John Doe Cleans), where each state in which the company has a presence is shaded and the states with no presence are left unshaded.
We’re just about done walking through the data visualization work to produce a U.S. map for a company (John Doe Cleaning) based on the states where the company has a presence. We need to shade each state in which the company has a presence and leave unshaded the states with no presence. The primary challenge–other than a very tight timeline–is that “presence” could hold different meanings. John Doe Cleaning has business locations, which can have territories and vehicles, all three of which have geospatial coordinates, and all three of which can constitute a presence in that area.
Since my deadline for this task was under one business day, my goal was to go for a quick and dirty solution. This means that I used different tools for different purposes: quick data matching using a spreadsheet, more complex joins using R code, reverse geocoding through a service on the web. In Part III, we’ll use another of my favorite tools, Venngage, to create our map for John Doe Cleaning.
To quickly review what we’ve done so far:
- We started with CSV files for Locations, Territories, Vehicles, and Geopoints
- We generated lists from these and concatenated them into one for reverse geocoding
- We got the state for each set of geographical coordinates in the list.
What we need for our map, then, is the list of U.S. states with an indicator of whether there are any “presences” in each state or not. Once we have this, we are all set to make our map.
Reviewing our reverse-geocoded data
At the end of Part II, we had just sent our list of geographical coordinates to the Geocodio service. The spreadsheet we receive back looks something like this.
We have columns for the longitude and latitude that we submitted, with the matching coordinates for the closest street address to that point. There are two accuracy measures, and then the actual street address that is closest to each point. We have city, state, ZIP code, the county, and the source of the data that Geocodio used for its lookup.
Pivoting the data by state
We have all the information we need, but it isn’t yet in the easiest form for making our map. This is a perfect time to use a PivotTable. If you haven’t used PivotTables in your spreadsheet app before, it’s okay; we’ll go one step at a time, and we’ll make a very simple table. I will show Excel instructions here. You can find Google Sheets instructions at the end of this post.
First, select all of the data in the Geocodio spreadsheet. On the Insert ribbon, the first option is PivotTable. The Create PivotTable box gives us two options: the source data to use (which you selected already), and the destination for the PivotTable. I almost always accept the default, which is a new tab in the spreadsheet.
Now we see an empty PivotTable on the left side of the page. At the right side is the PivotTable Fields pane. What I want is a list of states, so I want each Row to represent a state. To do this, I start in the Field Name box at the top and scroll down the list until I find State. Now we just need to click State, drag it down to the Rows block at the bottom left of the pane, and drop it there. You should see State in the Rows block, and if you glance over where the empty PivotTable was, you can see that it now has a row for each state in our data.
Next, I want to add a value to each state to show me how many times that state appears in the list of data. The easiest way to do this is to identify a column from our data that is not numeric, like “Accuracy Type” or “Street.” Then we click on that field in the Field Name box, drag it down to the Values block, and drop it there. The PivotTable Fields pane should now look like this. And at the left of the spreadsheet tab, your PivotTable should now hold a row for each state in your data with the number of locations in that state from our list. The Grand Total should be the same as the number of rows of data that we selected. If the last row in our big data list is 494, then we have 493 rows (not including the heading row), and so the Grand Total should be 493. If it isn’t, there may be a problem in our data, such as a location that could not be matched to a state.
Making our map
Now that we have a list of states where John Doe Cleaning has a presence, we are ready to map!
As mentioned in Part II, R has some great libraries for generating maps from your data. Ordinarily, I’d use R code to generate the map, but I was short on time for this task, so I took a shortcut by using another favorite tool: Venngage.com. Venngage helps in making infographics by providing polished templates, a rich set of icons and illustrations, the ability to add your own images, and a bunch of charts for your data.
We need just a blank background with a US map, so I chose the “Blank” template. In the Maps section of the toolbar on the left, I dragged the United States Map onto the blank page, which gives me a default map with many relaxing shades of blue.
Next, I need to use the list in my PivotTable to fill in the states. Venngage will use each value I give it to produce a chloropleth or heat map. For this purpose, I don’t want to give clues about how many locations I may have in each state; I only want to shade the states that have locations in them. As a result, I will assign a value of one if my state has a value of one or more in the PivotTable, or zero otherwise.
To enter the data, double-click the map on the page, and you will see a panel at right with a mini-spreadsheet; even better, all of the states are already listed, so we won’t forget any of them. Now we just fill in column B with our values of zero or one. If a state isn’t in the PivotTable, then we know it has zero locations of John Doe Cleaning. The map will automatically update as you enter the data.
Now that we’ve entered the data for all the states, the map will have only two shades of blue. My original task was to shade the state if it has a location and leave it unshaded if not. This means I’ll need to adjust the colors as well. At the top of the panel, you’ll see that we’ve worked on the Data tab. The Settings tab has the design and formatting options we need, including colors, fonts, titles, legends, and labels. I had the color palette for my client’s logo and website, so I made a few variations for them.
The Download button gives options for downloading your map as an image or a PDF file, so you can include it in a document or presentation.
Here’s the final map I made as an example while working on this blog post. Each state shaded green is one where our fictional John Doe Cleaning has an office location, a territory, or a vehicle. The states in white are the ones where John Doe Cleaning has no presence at all.
Let’s quickly review what we did for this analysis task. Our goal was to create a map of the U.S., where each state where John Doe Cleaning has a presence is shaded, and the remaining states are left unshaded. The size of each state’s presence did not matter. Based on discussing the question with the client, we learned that John Doe Cleaning has Locations (business offices) as well as Territories and Vehicles, which can also be tied to geolocation. We decided to use the geolocation of each of these, to maximize the number of states we can shade. The client gave us CSV files for Locations, Territories, Vehicles, and Geopoints. Oh yes, and this was needed ASAP, like in 2 or 3 hours.
- We started by pulling the CSV files together into one spreadsheet for some organization.
- We realized that each Location has to fit one of four cases: Location with Territories, Location with Vehicles, Location with both Territories and Vehicles, and Location with neither Territories nor Vehicles. Based on these, we used COUNTIF and AND functions to tell us which case each Location fits. Then we split each case into a separate tab.
- We observed that even though we made a tab for the case where a Location has no Territories and no Vehicles, we still need to look up the Geopoint record for all Locations. So we will use the tabs for Locations (all), Locations with Territories, Locations with Vehicles, and Locations with both. We will not need the tab for Locations with either.
- In RStudio, we read in the data from the original Locations tab and the three tabs that have Locations with Territories and/or Vehicles. We also read in the Geopoints data.
- We used left_join to add the longitude and latitude from Geopoints to each Location, Location-Territory, and Location-Vehicle.
- Then we combined our dataframes into one, which we saved off as a spreadsheet and uploaded to the Geocodio service.
- Geocodio gave us a CSV file that gives the state for each “presence” John Doe Cleaning has in the U.S.
- We used a PivotTable to compress that data down into state and number of presences in each state.
- Using Venngage, we created a U.S. Map and entered one (= a presence) or zero (= no presence) for each state. After adjusting our colors to meet the definition of the initial task, we downloaded the map as an image.
To accomplish this, we used a spreadsheet, like Excel or Google Sheets, RStudio, Geocodio, and Venngage. With the exception of Excel, all of the services could work on a free tier. I know it sounds more complicated to cobble together four different tools and services for the task. You may work more efficiently with a different set, which is just fine. This was the quickest path I saw to my destination.
Google Sheets Pivot Table Instructions
First, select all of the data in the Geocodio spreadsheet. Select Data, Pivot table from the menu. The Create Pivot table box gives us two options: the source data to use (which you selected already), and the destination for the PivotTable. I almost always accept the default, which is a new tab in the spreadsheet.
Now we see an empty PivotTable on the left side of the page. At the right side is the Pivot table editor pane. What I want is a list of states, so I want each Row to represent a state. To do this, I click the Add button next to Rows, and select State from the list. Now you should see State in the Rows block, and if you glance over where the empty Pivot table was, you can see that it now has a row for each state in our data.
Next, I want to add a value to each state to show me how many times that state appears in the list of data. The easiest way to do this is to identify a column from our data that is not numeric, like “Accuracy Type” or “Street.” Then click on the Add button next to Values and select the non-numeric column of your choice. At the left, your Pivot table should now hold a row for each state in your data with the number of locations in that state from our list. The Grand Total should be the same as the number of rows of data that we selected. If the last row in our big data list is 494, then we have 493 rows (not including the heading row), and so the Grand Total should be 493. If it isn’t, there may be a problem in our data, such as a location that could not be matched to a state.
We love data science here at Web Teks. If you have an idea for a project, app, or website, let us know how we can help!