A Million Walks in the Park

How can “big data” inform the ways we understand and intervene in urban spaces to provide public benefit? In this guest post, Wenfei Xu of CARTO research describes the mapping project “A Million Walks in the Park,” which analyzed mobile phone data to illustrate how people move through New York City’s parks. She argues that such data analyses can inform more traditional techniques of observation in public life studies, which in turn can shape better design and policy related to public spaces. – Gehl Institute.


Sheep Meadow, Central Park.

The vast majority (90%) of park and recreation professionals say that data plays an important role in their strategic and day-to-day decisions. [1] However, data collection and analysis remain difficult and labor-intensive, as data is often gathered through time- and money-intensive surveys of residents, or by paid staff and volunteers manually counting users. This raises the question: How might we measure the use of parks, and develop a more granular understanding of the ways in which people move through them, while reducing the cost of data collection?

At CARTO research, we recently published a spatial data science project called “A Million Walks in the Park”, which demonstrates how data collected from mobile phones can help us better understand patterns of use in New York City’s public parks.

Our goal for this project was twofold: We wanted to showcase the breadth of actionable information for planners, designers, and parks officials that can be derived from big spatial data and data science; further, we wanted to create an interactive tool that makes this analysis accessible and meaningful to a broader community. Overall, we believe that applying spatial data science methods to urban design and planning questions can be complementary to ground-level, observational techniques, such as those pioneered by Jan Gehl and further developed by Gehl Institute.

The three parts of our project are outlined below. First, the “Large”: how we turned a raw dataset into information for understanding the granular movement of people through Central and Prospect Parks in New York City. Second, the “Small”: how this data can be scaled to look at each of the parks in the city. Third, the “Meta”: a comparison across all parks on how the size of parks can relate to the numbers of park visitors they attract, and what this might mean for design and policy.


CARTO dashboard showing spatio-temporal mobility and interaction potential areas in Central Park

Our project began with a raw dataset of over 10 million observations across New York City in May of 2017, provided to us by a digital advertisement company called LiveRamp, which collects data from a wide range of mobile applications. We created a geofence around all of the parks in Manhattan and Brooklyn, which gave us a dataset for each park.
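To make the geofencing step concrete, here is a minimal sketch of how observations could be assigned to parks. It is not our production pipeline: the function names, the observation format, and the rectangular "park" are all hypothetical, and it uses a plain ray-casting point-in-polygon test rather than a spatial database.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: a point is inside if a ray cast to the right of it
    crosses the polygon boundary an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def geofence(observations: List[dict],
             park_polygons: Dict[str, List[Point]]) -> Dict[str, List[dict]]:
    """Assign each observation to the first park whose polygon contains it.
    Observations outside every park are dropped."""
    by_park: Dict[str, List[dict]] = {name: [] for name in park_polygons}
    for obs in observations:
        for name, polygon in park_polygons.items():
            if point_in_polygon(obs["location"], polygon):
                by_park[name].append(obs)
                break
    return by_park


# Hypothetical data: a rectangular stand-in for a park boundary.
parks = {"demo_park": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]}
pings = [
    {"id": 1, "location": (0.5, 0.5)},  # falls inside the rectangle
    {"id": 2, "location": (2.0, 2.0)},  # falls outside every park
]
per_park = geofence(pings, parks)
```

With millions of observations and real park boundaries, a spatial index or a spatial join in a database would be a more practical choice than this double loop.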

Next, we ran a series of analyses on each park, including a count of the number of users, trips, trip durations, weekly mobility patterns, travel modes, the pattern of hotspots within the park, and potential for social interaction. Some of the hotspots in Central Park are centered on well-known destinations such as the Metropolitan Museum of Art or the Delacorte Theater, while others are areas that arise more naturally, perhaps because they were discovered to be attractive picnic or recreational destinations. You can explore our findings with this interactive tool.

We created an “interaction potential” measure that looks at how much overlapping time people share in different clusters of spaces in Central Park and Prospect Park (see map above). For instance, two people lingering and reading in the same space at the same time would have a higher interaction rating than two joggers who pass each other, because the readers share a larger space-time overlap.
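One simple way to express this idea in code is to sum, for every pair of different people in the same cluster, the minutes during which both were present. This is an illustrative sketch only, not the exact measure behind our maps; the function names and the stay format (user id, start minute, end minute) are assumptions.

```python
from itertools import combinations
from typing import List, Tuple

Stay = Tuple[str, float, float]  # (user_id, start_minute, end_minute)


def overlap_minutes(start_a: float, end_a: float,
                    start_b: float, end_b: float) -> float:
    """Length of time two stays coincide; zero if they never overlap."""
    return max(0.0, min(end_a, end_b) - max(start_a, start_b))


def interaction_potential(stays: List[Stay]) -> float:
    """Total pairwise time overlap between different users in one cluster."""
    total = 0.0
    for (user_a, s_a, e_a), (user_b, s_b, e_b) in combinations(stays, 2):
        if user_a != user_b:  # a person does not interact with themselves
            total += overlap_minutes(s_a, e_a, s_b, e_b)
    return total


# Hypothetical stays within a single cluster, in minutes from some start time.
readers = [("a", 0, 60), ("b", 10, 50)]  # both linger; 40 shared minutes
joggers = [("c", 0, 2), ("d", 1, 3)]     # pass each other; 1 shared minute

print(interaction_potential(readers))  # 40.0
print(interaction_potential(joggers))  # 1.0
```

Under this definition, a quiet spot where a few visitors linger together can score higher than a busy path where many people only pass through.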

Chart showing how social interaction potential increases with an increase in the number of unique trips in each of the hotspots in Central Park. Although the results are generally not surprising — the more popular a hotspot is, the more social interaction potential there is — the ‘outliers’ (circled) are interesting indicators of how urban design and program can influence unusually high social interaction.

We expected a positive relationship between the most popular clusters, where there are more people, and a higher interaction potential. However, we discovered that small and less-populated points-of-interest can garner unusually high interaction potential, as highlighted in the chart above. These are places such as the Oak Bridge and the Alice in Wonderland statues in Central Park. They are not the most frequented sites, but when people choose to visit they often do so at similar times and coexist in the space together. What explains their unusually high interaction potential?

View from the Oak Bridge, one of the hotspots within Central Park.

A quick search on a traveler’s website revealed that Oak Bridge offers beautiful views of the lake, which makes it popular among tourists who may follow similar paths through the park.

The Alice in Wonderland statue in Central Park is another hotspot that has an unusually high social interaction potential.

We also discovered that the Alice in Wonderland statues function as a children’s playground, where children and adults gather at similar times after school or on weekends.

Thus, while we are able to highlight spatial regions of interest through our data-driven research, it was an observational awareness that provided us with a context and potential explanation for why certain spaces are unique. When we think about the role of urban data science in design and planning, we should view it as an instrument that can inform the design process, rather than being fully explanatory or determining the design process.


To demonstrate how such analyses can scale with relative ease, we created an interactive catalogue of most Brooklyn and Manhattan parks. This catalogue gives us quick insight into different parks and enables us to compare the behaviors of their users. For instance, we found that neighborhood parks such as Fort Greene Park have regular visit patterns throughout the week, whereas a park that is more programmatically specific, such as the Abe Stark Skating Rink, sees a majority of its visitors on the weekends. Within each park, we could see that most of the walking and running activity took place on paved paths, while most lingering happened at points-of-interest or grassy areas. We also calculated that, across all users, the average park visit lasts about 15 minutes.

Such insights can then be used as evidence for better design, planning, and resource allocation. For instance, an analysis of where and how many people run at night may provide evidence for improved street lights within the park. Or, a study of usage patterns before and after the installation of new playground equipment can help agencies advocate for improvements with relative ease and fewer monetary resources. These metrics might also help us understand the impact of policy changes, new features, seasonal and weather effects, and specific events.

A comparison of the visit and mobility patterns of Abe Stark Skating Rink and Fort Greene Park. The skating rink sees a spike in visits on the weekends, with most people gathered around the entrance of the rink, whereas a neighborhood park like Fort Greene Park has a more regular usage pattern across the week, with the major park paths used for more active pursuits and grassy areas used for more static ones.


The last piece of our project looks at the connection between the physical footprint of parks and the number of visits they receive. Our analysis showed that park visits grow relatively slowly in relation to park size. In other words, smaller parks often host more people relative to their size than bigger parks; the number of visits grows more slowly than one-to-one with the square footage of a park. [2] This raises questions about resource allocation and sets up a possible case for greater per-square-foot funding for smaller, neighborhood parks.
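For readers who want to see what sublinear scaling looks like in practice, the sketch below fits a power law, visits ≈ c × area^β, by ordinary least squares on the logarithms of both quantities. The data are synthetic and generated with an exponent of 0.8 purely for illustration; they are not our park data, and the function name is hypothetical. Fitting on logarithms is a common way to estimate such an exponent, and we are assuming it here rather than describing our exact procedure.

```python
import math
from typing import List, Tuple


def fit_scaling_exponent(areas: List[float],
                         visits: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(visits) = log(c) + beta * log(area).
    Returns (beta, c). A beta below 1 means visits grow sublinearly."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(v) for v in visits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    c = math.exp(mean_y - beta * mean_x)
    return beta, c


# Synthetic parks (square feet) with visits following an exponent of 0.8.
areas = [1e4, 1e5, 1e6, 1e7]
visits = [3.0 * a ** 0.8 for a in areas]

beta, c = fit_scaling_exponent(areas, visits)
print(round(beta, 3), round(c, 3))

# Visits per square foot shrink as parks get larger when beta < 1.
density = [v / a for v, a in zip(visits, areas)]
```

With an exponent of 0.8, a park ten times larger would be expected to receive about 6.3 times as many visits (10^0.8), not ten times as many, which is why visits per square foot fall as park size grows.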

Rather than providing answers, our analysis brought out interesting questions for us about the factors that explain park usage: How does design impact usage? Does residential density impact these numbers? Do green space or programmatic elements such as playgrounds and running paths matter? And how does the age of these facilities factor in? Do small parks receive a lot of in-kind local stewardship support, making up the difference in spending in some areas?

An interactive exploration of how quickly park visits grow with the size of the park, colored by park type. This kind of tool can allow decision-makers to explore larger questions of resource allocation and the factors that explain park usage across different types and sizes of parks.

Increasingly, we live in a world where “big data” is becoming smaller; it is becoming more granular and specific, which enables us to make discoveries in smaller regions in space. On the other end, qualitative data are becoming “bigger”; methods of collecting ground-level data are becoming more streamlined, allowing them to scale more easily. As these oppositely-scaled methods seem to slowly converge, I hope that we have demonstrated that they can act in complementary ways. Data analyses, especially about space and people, need contextual evidence, while ground-level observations can be guided by large-scale computational techniques.



[1]  See: May 2016 report by the National Recreation and Park Association.

[2] For those curious about the more technical details, we looked at the scaling properties of visits per square foot and found that visits scale sublinearly at a slope of approximately 0.8.

CARTO research works at the intersection of data science and geography to create new insight from location data streams. They partner with business, academic, and non-profit organizations to create case studies, develop solutions, and build tools to address questions that are fundamentally spatial in nature.

A note on privacy: In this data, we were not given any demographic information about the users, nor were we told specifically what applications the data came from. Nevertheless, privacy is an important issue, and we are still developing the policies and practices needed to properly anonymize the data we use, including further anonymization and aggregation methods.