New York City’s open data platform is an incredible source of information. All public data collected and generated by the city is mandated by law to be made available through the portal, and it is free for the public to use.
Datasets range from transport, housing, and motor vehicle incidents to a Central Park squirrel census, and even park ranger reports of aggressive turtle encounters.
Geography, infrastructure, and sociology datasets like these represent real-world processes and events. Even if you have no connection to or little interest in NYC or urban areas in general, they give you a chance to work with data that looks a lot more like what you’ll encounter in a professional role than the likes of MNIST or Titanic survivors. Better still, they’re almost as easy to access.
We’re going to run through a demonstration of just how easy these datasets are to use and build some interesting visuals in the process.
To keep the code blocks as succinct as possible, here are the required modules for all the code in this post:
import folium
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import requests
from scipy.stats import gaussian_kde
import seaborn as sns
from shapely.geometry import Point, shape, box, Polygon
Make sure they’re installed if you want to replicate anything yourself.
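If you are not sure which of these are already available in your environment, a quick check along the following lines can help. This is only a minimal sketch: the helper name `missing_modules` is made up for illustration, and the list simply mirrors the top-level packages imported above.

```python
from importlib.util import find_spec

# Top-level packages used by the imports in this post.
REQUIRED = [
    "folium", "geopandas", "matplotlib", "numpy", "pandas",
    "plotly", "requests", "scipy", "seaborn", "shapely",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for illustration; it only checks that each
    top-level package is discoverable, it does not import it.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Anything reported as missing can then be installed with your usual package manager before running the rest of the code.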
This is one of my favorite datasets to play around with. The data includes footprint polygons, ages, and heights for most of the buildings in NYC.
We’ll start with the data pull, kept separate from the visualization code, since we’re using this dataset for a couple of different visuals.
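Before the actual pull, here is a rough sketch of the general pattern: request a fixed number of rows, advance the offset, and stop when a page comes back short or empty. It assumes the endpoint accepts the `$limit` and `$offset` query parameters of the Socrata API that backs the portal. The function names `fetch_all_rows` and `socrata_page` and the `max_rows` cap are illustrative and not part of the code in this post.

```python
API_ENDPOINT = "https://data.cityofnewyork.us/resource/qb5r-6dgf.json"


def fetch_all_rows(fetch_page, limit=1000, max_rows=None):
    """Collect rows page by page until the source runs out.

    fetch_page(limit, offset) must return a list of rows (dicts).
    max_rows is an optional cap, handy for quick test runs.
    """
    rows = []
    offset = 0
    while max_rows is None or offset < max_rows:
        page = fetch_page(limit, offset)
        if not page:
            break  # no rows left
        rows.extend(page)
        if len(page) < limit:
            break  # a short page means this was the last one
        offset += limit
    return rows if max_rows is None else rows[:max_rows]


def socrata_page(limit, offset):
    """Fetch one page from the portal (assumes $limit/$offset paging)."""
    import requests  # imported lazily so the helper above works offline

    response = requests.get(
        API_ENDPOINT,
        params={"$limit": limit, "$offset": offset},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


# Example usage (makes network requests, so it is left commented out):
# import pandas as pd
# df = pd.DataFrame(fetch_all_rows(socrata_page, limit=1000, max_rows=5000))
```

Separating the loop from the request makes it easy to try the paging logic against a small fake source before pointing it at the full dataset, which is large enough that a capped run is worth doing first.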
# Pull data
api_endpoint = 'https://data.cityofnewyork.us/resource/qb5r-6dgf.json'
limit = 1000 # Number of rows per request
offset = 0 # Beginning offset
data_frames = [] # List to hold chunks of data
# Loop to fetch data iteratively
# while offset <= 100000: # uncomment this and comment out while True…