06 Nov 2016

Subscribe to Newsletter

I stumbled on a pretty awesome dump of NYC bike stats recently through the NYC OpenData project. Apparently, every year on a particular day in the fall teams go out to specific locations throughout the city and collect data on a bunch of bike related measures - such as how many riders, the split between male and female riders, whether people are wearing helmets, etc. They have been doing this consistently since 2005 in 10 locations throughout the city. With all of those counts, it makes me wonder if I have ever been counted as part of this program! Here are some interesting things that I found.


Ridership Over Time

In general, the data suggests that bike riding is on the rise in NYC. On average, ridership has gone up 10% year on year. I would argue that this is a lower bound, and that the real growth is likely higher. I say this because the growth is a little erratic, and I am guessing that surveying was done in poor weather in some years (like 2014, where there was a 2.35% drop in ridership despite 32% growth in 2013 and 20.84% growth in 2015). Ridership is growing throughout the city, but as you can see below, lower Manhattan has been pretty dominant in the ranking of areas with the most riders.

Ridership Rank Over Time
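
As a rough check on the "erratic growth" point, here is a minimal sketch that compounds the three yearly growth figures quoted above (2013 to 2015). Only those three percentages come from the data; the function name is just for illustration.

```python
# Yearly ridership growth quoted in the post, in percent.
growth_pct = {2013: 32.0, 2014: -2.35, 2015: 20.84}

def compound_growth(rates_pct):
    """Multiply yearly growth rates together to get the total growth factor."""
    factor = 1.0
    for rate in rates_pct:
        factor *= 1 + rate / 100
    return factor

total_factor = compound_growth(growth_pct.values())
# Geometric mean: the constant yearly rate that would give the same total.
annualized = total_factor ** (1 / len(growth_pct)) - 1

print(f"Total growth over {len(growth_pct)} years: {total_factor - 1:.1%}")
print(f"Equivalent steady yearly growth: {annualized:.1%}")
```

Even with the 2014 dip included, these three years compound to well above the 10% long-run average, which is at least consistent with the dip being a one-off.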


It is pretty clear that there are more male riders than female riders in NYC, but it is a little surprising to see just how off balance it is. It is about 80% male versus 20% female, and while that has come down a bit from 2005 (14% female and 86% male), the change is not that significant and women are still vastly under-represented in the bike lanes.

Helmet Usage

Let me just start by saying that there is no good reason not to wear a helmet when riding a bike in NYC. To each their own, but I question anyone’s intelligence and/or sense of self worth when I see them riding around without a lid. You can be the best cyclist in the world, but all it takes is a pedestrian staring at their iPhone stepping into the bike lane, another less coordinated cyclist clipping your wheel, or a passenger opening a door without looking to land you in the hospital. And with a helmet, you are more likely to walk out of that hospital and ride another day. </sermon>

Helmet usage is pretty disappointing in NYC. Based on the stats, it looks like only around 41% of riders wear helmets. Women are better about it than men, with 45% of women versus 40% of men wearing helmets, but even 45% is pretty low. Pro tip: NYC gives away helmets for free; you can find more information here.

Helmet Utilization

Bike Lane Usage

There are some good reasons not to use bike lanes sometimes (like cars parking in them), but in general I would argue that riding in the bike lane is safer than riding in traffic. The hierarchy might look something like: bike lane > traffic > against traffic > biking on the sidewalk. Don’t bike on the sidewalk. Seriously.

Looks like NYC does pretty well by this measure, and is getting even better. Most riders were counted in the bike lanes (78.15%), very few on the sidewalk (0.83%), and not many riding against traffic (3.91%). Back in 2005, 4.09% of riders were counted on the sidewalk, so we can see a big improvement in this regard.

Bike Lane Utilization

And here is a handy link to the laws for cyclists in NYC. No biking on the sidewalks!

nyc bike safety opendata

30 Oct 2016


I made a point in my last post about differentiating between a change in the crime rate and a significant change. I want to elaborate on that point a little more, because this is something that is so often overlooked but is vital to understanding any analysis. If you know all of this already, feel free to skip the next two paragraphs.

When someone quotes a stat, like saying that overall major felonies were down 1.6% in 2015, they are only presenting part of the story. It is not a lie - crime was really down 1.6% in 2015 - but you need to ask whether that is significant, i.e., whether it is something real or could be random noise. This is where a more adroit publication would quote a significance level, or p value, but I personally do not find them intuitive. So here is my explanation in a nutshell. Something like the crime rate is going to have a natural fluctuation. Crime stats are a complex system, a lot of factors move the stats, and this all contributes to what amounts to random noise.

Luckily, we have ways of dealing with this randomness, and of determining the degree to which a stat is part of it or not. We ask the question “what is the chance that this stat is because of noise, versus a real effect” and say something is significant if our confidence that it is a real effect is above a certain threshold. Commonly, that threshold is 95%, but it is important to recognize that it is not binary - the probability exists on a spectrum, even though we use this shorthand of calling something “significant” or not. We do this by calculating the variance of the year-to-year changes, which is a measure of how much they bounce around. Taking the square root of the variance gives us the standard deviation. Back to our example, the standard deviation in the crime rate in NYC from 2001 to 2014 is 3.97%. Using what is called a normal distribution (which is a pretty awesome math thing in its own right), we can judge how likely particular values are to be noise. The normal distribution tells us that at one standard deviation, i.e. a 3.97% rise or fall, there is a 32% chance that what you are measuring is noise. And at two standard deviations, i.e. a 7.94% rise or fall, there is a 5% chance that what you are measuring is noise, or said differently, you are 95% confident in the measure. That 95% is a common cutoff, but as you can see, there is more to it than a simple significant-or-not explanation.

What does this mean for the NYC crime data? The 2015 stats suggest a 1.6% drop in major felonies, however the standard deviation of the crime rate is 3.97%, so this nets out to about a 69% chance that what we are measuring is noise. So that is to say, we cannot really say much. The drop in crime rate for 2015, based on these assumptions, is just not significant.
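
The numbers above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the yearly change is normally distributed around zero with the 3.97% standard deviation quoted above; the function name is just for illustration.

```python
import math

def noise_probability(change_pct, std_pct):
    """Two-sided chance that pure noise produces a change at least this large,
    assuming a normal distribution centered on zero."""
    z = abs(change_pct) / std_pct
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2))

STD_PCT = 3.97  # standard deviation of the yearly change, from the post

for change in (3.97, 7.94, 1.6):
    p = noise_probability(change, STD_PCT)
    print(f"{change:.2f}% change -> {p:.0%} chance of noise")
```

The first two lines of output correspond to the one and two standard deviation cases (32% and 5%), and the last to the 2015 drop (69%).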

There are some precincts though which do have a significant change in crime rate in 2015:

Significant Changes in NYC Crime

The picture does not look great: of the 6 precincts that had a significant change (using the 95% cutoff), 5 took a turn for the worse, and there is some evidence that the one good one is misleading. Here is a quick rundown of what I was able to find for each:

  • 1st Precinct (Tribeca area) - 2.5 stars on Yelp, seems like grand theft auto is on the rise in that area.
  • 32nd and 34th Precincts (Harlem) - looks like they have a good connection with the community, however the head cop transferred out in 2015 (story). It looks like there was a significant rise in murders in these precincts.
  • 40th Precinct (Mott Haven, Bronx) - a 25% rise in crime in 2015 raises some serious questions about what is going on. And it looks like the NYT did some in-depth reporting on this. Also, this precinct was caught juicing their 2014 stats, which makes any numbers coming from this precinct highly suspect (and statistically speaking will make it difficult to make any assertions about crime stats moving forward - so the impact of this will last for years).
  • 105th Precinct (Queens Village) - this is a huge precinct, and it looks like it is split to better serve the community.
  • 104th Precinct (Ridgewood, Queens) - the one darling, with a 10.8% drop in major crimes, however a cursory investigation yields articles like this and this, suggesting that the drop in crime rate has more to do with cops not responding to calls (earning them 1.5 stars on Yelp).
nyc crime opendata

25 Oct 2016


A lot has been made recently about a jump in the crime rate, so I decided to take a look at the major felony rates in NYC, with the help of the NYC OpenData project. There you can find records of all the major felonies reported, going back to 2006, with location and felony classification. Off the bat, it is pretty easy to look at the number of crimes committed over time:

NYC Crimes Committed

As you can see, crime is down overall. But in all honesty, I look at that graph and I see a pretty steady line, dominated by grand larceny (i.e., stealing). But let's dive in and look at the rates of change of individual crimes:

NYC Crimes Rates of Change

Ok - so one conclusion to draw from this is that there was a 5.71% uptick in murders and a 5.83% uptick in rapes in 2015. While these are tragic in their own right, I would argue that those rates, taken in isolation, are a little misleading. If you look at the totality of the history we have, going back to 2001 at the aggregate crime level, one standard deviation for the rate of change is 6.49% for murders and 7.94% for rapes. That is not to say that the upticks are a good thing, because they certainly are not, but they are also not necessarily indicative of a regression in crime rates; rather, they seem to be within the normal fluctuations of the system. Contrast that with burglary rates, which dropped by over 10% in 2015 and have an annual standard deviation of 4.11%. This is more likely a real drop.
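
To make the comparison concrete, here is a minimal sketch that expresses each 2015 change in units of its own standard deviation. The figures are the ones quoted above; the burglary drop is entered as exactly 10%, which is an assumption since the text only says "over 10%", and the two standard deviation cutoff is the usual rough stand-in for 95% confidence.

```python
# (2015 change in percent, historical standard deviation in percent)
changes_2015 = {
    "murder": (5.71, 6.49),
    "rape": (5.83, 7.94),
    "burglary": (-10.0, 4.11),  # assumption: "over 10%" entered as exactly 10%
}

def z_score(change_pct, std_pct):
    """How many standard deviations the change is away from zero."""
    return change_pct / std_pct

def outside_normal_range(change_pct, std_pct, cutoff=2.0):
    """True when the change is larger than `cutoff` standard deviations."""
    return abs(z_score(change_pct, std_pct)) >= cutoff

for crime, (change, std) in changes_2015.items():
    z = z_score(change, std)
    verdict = "outside" if outside_normal_range(change, std) else "within"
    print(f"{crime}: {z:+.2f} standard deviations, {verdict} the normal range")
```

Murders and rapes both land under one standard deviation, while the burglary drop is more than two.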

Worst Areas

NYC Worst Areas

The top three police precincts, ranked by total number of the 7 major felonies, are 75 (East New York, Brooklyn), 14 (Midtown South, Manhattan), and 43 (Sound View, Bronx). If you are surprised by Midtown making the list, it is because of an extremely high count of larcenies - 16% higher than any other precinct.

Changes in Crime Rate

NYC Changes in Crime

Precincts in blue have a drop in crime rate (good), whereas precincts in red have an increase. The top three police precincts, ranked by the rise in total crime rates of the 7 major felonies, are 115 (Jamaica, Brooklyn), 88 (Clinton Hill, Brooklyn), and 108 (Long Island City).


The map plots in this writeup are done using the basemap package from matplotlib. NYC provides useful shapefiles for all of its census tracts, which can be found here, and you can also pull down the crime rate data from NYC OpenData. I am happy to make available any raw data/code used, just ask.

nyc crime opendata

07 Oct 2016


One of the datasets available on the amazing NYC bike data website (available here) is information on accident rates, sorted by police precinct and accident type. They make this information available in the dreaded PDF format, and it took me a little while to get over this, but with the help of Tabula I was able to convert the PDFs into something useful.

There are four different types of crashes, defined by what the bike crashes into: car, pedestrian, themselves, and other bikes. Presumably “themselves” encompasses a miscellaneous assortment of tragic collisions, including street poles, pot holes, and the occasional spontaneous combustion.

In 2015 there were 5270 crashes reported, with collisions between cars and bikes being by far the most common at 83.7% of all accidents. I imagine there is a reporting bias here, in that these types of collisions are more likely to get picked up in a police report than other types. Car collisions are followed by self collisions (8%), pedestrian collisions (6.6%), and lastly collisions with other bikes (1.7%). It is interesting to speculate why car crashes are so much more heavily represented - as a cyclist in NYC I would a priori expect pedestrian crash rates to be about as high if not higher - but I do not see a clear reason in the data.

One might think car crashes are reported more because they cause more damage (both in terms of property damage and personal injury). We cannot judge property damage rates from the data, however the dataset does contain injury rates. For the cyclist they are mostly pretty close - with a car, the bicyclist is injured 50% of the time, with another bike 45%, themselves 42%, and with pedestrians only 4% (although pedestrians suffer some sort of injury in those accidents at a high rate - 52% of the time). So it is unclear why bike on car crashes are so highly represented, unless it is just the reality. What is perhaps more clear is that, as expected, in asymmetrical exchanges the lower mass/more vulnerable party is more likely to get hurt (between bike and car, it is 0.6% for car vs 50% for bike, and between bike and pedestrian it is 4% for bike and 52% for pedestrian).

Since this data is indexed by police precinct, it is possible to plot accident rates by geographic area. Below is a plot of bike accidents with cars in 2015, with accident rates normalized by the 2010 census population in each precinct.

Bike on Car Accidents, 2015

It is a little disheartening to note that my normal commute crosses almost all of the most dangerous precincts. Oh well…


The plots in this writeup are done using the basemap package from matplotlib, and a helpful tutorial here. NYC provides the data on collision rates here. NYC also provides all sorts of useful information by census tract, including population, here. Since the collision rates are reported by precinct and the population is by tract, you need a way to translate census tracts into police precincts, for which I found a static mapping that you can pull down here.
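
The tract-to-precinct translation and the population normalization described above can be sketched roughly as follows. This is not the exact code behind the plot: the function name, the per-10,000 scaling, and all of the sample numbers are made up for illustration, and the real inputs would come from the collision, census, and mapping files linked above.

```python
from collections import defaultdict

def crashes_per_10k(crashes_by_precinct, population_by_tract, tract_to_precinct):
    """Sum tract populations up to precincts, then express each precinct's
    crash count per 10,000 residents."""
    precinct_population = defaultdict(int)
    for tract, population in population_by_tract.items():
        precinct = tract_to_precinct.get(tract)
        if precinct is not None:  # skip tracts missing from the mapping
            precinct_population[precinct] += population

    rates = {}
    for precinct, crashes in crashes_by_precinct.items():
        population = precinct_population.get(precinct, 0)
        if population > 0:  # avoid dividing by zero for unmapped precincts
            rates[precinct] = 10_000 * crashes / population
    return rates

# Hypothetical sample data, not real NYC figures.
population_by_tract = {"tract_a": 3000, "tract_b": 2000, "tract_c": 4000}
tract_to_precinct = {"tract_a": 1, "tract_b": 1, "tract_c": 2}
crashes_by_precinct = {1: 10, 2: 2}

rates = crashes_per_10k(crashes_by_precinct, population_by_tract, tract_to_precinct)
print(rates)
```

With the made-up numbers, precinct 1 has 5,000 residents and 10 crashes, so it comes out at 20 crashes per 10,000 residents, versus 5 for precinct 2.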

nyc biking accidents opendata