Stamen Design: “Last month, Stamen launched parks.stamen.com, a project we created in partnership with the Electric Roadrunner Lab, with the goal of revealing the diversity of social media activity that happens inside parks and other open spaces in California. If you haven’t already looked at the site, please go visit it now! Find your favorite park, or the parks that are nearest to you, or just stroll between random parks using the wander button. For more background about the goals of the project, read Eric’s blog post: A Conversation About California Parks.
In this post I’d like to describe some of the algorithms we use to collect the social media data that feeds the park pages. Currently we collect data from four social media platforms: Twitter, Foursquare, Flickr, and Instagram. We chose these because they all have public APIs (Application Programming Interfaces) that are easy to work with, and we expect they will provide a view into the different facets of each park, and the diverse communities who enjoy these parks. Each social media service creates its own unique geographies, and its own way of representing these parks. For example, the kinds of photos you upload to Instagram might be different from the photos you upload to Flickr. The way you describe experiences using Twitter might be different from the moments you document by checking into Foursquare. In the future we may add more feeds, but for now there’s a lot we can learn from these four.
While collecting data from these social network services, I also found that each service’s public API imposes its own constraints on our queries, and each set of constraints produces its own intricate pattern. Thus, the quirks of how each API was written result in distinct and fascinating geometries. Also, since we are only interested in parks for this project, the process of culling non-park-related content produces further unusual and interesting patterns. Rural areas have large parks that cover huge areas, while cities have lots of (relatively) tiny parks, and this difference creates its own challenges for how we query the APIs.
Broadly, we followed a similar approach for all the social media services. First, we grab the geocoded data from the APIs, which means we ignore any media that don’t have a latitude and longitude associated with them. On Foursquare, almost all checkins have a latitude and longitude, and on Flickr and Instagram most photos have a location associated with them. On Twitter, however, only around 1% of all tweets have geographic coordinates. But as we will see, even 1% still results in a whole lot of tweets!
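To make the "geocoded only" step concrete, here is a minimal sketch of what dropping media without coordinates might look like. It is not the code behind the site: the record layout and the field names `lat` and `lon` are assumptions made for illustration, and each real API returns its own response format.

```python
from typing import Iterable, List


def has_coordinates(record: dict) -> bool:
    """Return True when a record carries a plausible latitude and longitude.

    The "lat" and "lon" keys are hypothetical; real API responses
    name and nest these fields differently.
    """
    lat = record.get("lat")
    lon = record.get("lon")
    if lat is None or lon is None:
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def keep_geocoded(records: Iterable[dict]) -> List[dict]:
    """Keep only the records that can be placed on a map."""
    return [r for r in records if has_coordinates(r)]


# Made-up sample records: one geocoded, two without a usable location.
sample = [
    {"id": "a", "lat": 37.7694, "lon": -122.4862},
    {"id": "b", "lat": None, "lon": None},
    {"id": "c"},
]
geocoded = keep_geocoded(sample)
```

With the sample above, only record "a" survives. On a real Twitter feed, where roughly 1% of tweets carry coordinates, a filter like this would discard the vast majority of what comes in.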
After grabbing the social media data, we intersect it with the outlines of parks and open spaces in California, using polygons from the California Protected Areas Database maintained by GreenInfo Network. Everything that doesn’t intersect one of these parks, we throw away. The following maps represent the data as it looks before the filtering process.
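The intersection step described above can be sketched as a point-in-polygon test. This is only an illustration under simplifying assumptions, not the project's actual pipeline: each park is a single ring of (longitude, latitude) pairs with no holes, coordinates are treated as flat x/y values, and the park name, outline, and records are invented. With real park polygons and a high volume of points, a GIS library with a spatial index would be the more likely choice.

```python
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]  # (lon, lat) vertices of a simple polygon


def point_in_polygon(lon: float, lat: float, ring: Ring) -> bool:
    """Ray-casting test: count the edges crossed by a ray heading east."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_to_parks(records: List[dict], parks: Dict[str, Ring]) -> List[dict]:
    """Keep records that fall inside a park and tag them with its name."""
    kept = []
    for record in records:
        for name, ring in parks.items():
            if point_in_polygon(record["lon"], record["lat"], ring):
                kept.append({**record, "park": name})
                break  # one matching park is enough
    return kept


# Hypothetical rectangular park outline and two made-up records.
parks = {
    "Example Park": [
        (-122.51, 37.76), (-122.45, 37.76),
        (-122.45, 37.78), (-122.51, 37.78),
    ],
}
records = [
    {"id": "a", "lat": 37.7694, "lon": -122.4862},
    {"id": "b", "lat": 37.8000, "lon": -122.4000},
]
in_parks = filter_to_parks(records, parks)
```

Here record "a" lands inside the rectangle and is kept, while record "b" lies outside every outline and is thrown away.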
But enough talking, let’s look at some maps!”