The Drawback:
I wanted to generate a transect each 10 toes throughout each avenue in New York Metropolis, and I needed to do it in a New York Minute. The Generate Transects Alongside Traces instrument constructed into ArcGIS Professional was going to take a number of days to run, so I needed to give you one thing no less than barely sooner.
To resolve this downside, I turned to 2 of my favourite bears, Geopandas and Polars, to make use of spatial algorithms and a splash of linear algebra to make 4.2 million transects in round 60 seconds.
Why Transects? On this circumstance, I might go on to make use of these transects to separate up roadway traces into equal intervals for storm harm evaluation. It’s simpler to separate a line with one other line than it’s to separate a line with a degree, since you’ll be able to assure they are going to intersect and also you don’t have to fret about floating level imprecision. However, transects are super-useful for plenty of utility, akin to sampling river depth profiles and creating an everyday grid from a curvilinear line.
The Course of:
First, I generated factors alongside every line at 10 ft intervals utilizing Geopandas built-in interpolate operate. As soon as I had these factors, I swapped them over to Polars, the place I used to be capable of benefit from Polars’ built-in parallelism and intuitive syntax to rapidly convert these factors into (roughly) perpendicular transects. Lastly, I put the transects right into a GeoDataFrame for additional spatial enjoyable.
It might sound annoying to travel between Geopandas and Polars, however every has its personal benefits — GeoPandas has spatial operations, and Polars is depraved quick and I strongly choose the syntax. The Polars .to_pandas() and .from_pandas() strategies work so effectively that I barely discover the swap. Whereas placing a panda bear and a polar bear into the identical zoo enclosure might have disasterous penalties, on this circumstance they’re able to play collectively fairly properly.
Necessities: This code was developed in a mamba atmosphere with the next packages and has not been examined with different variations
geopandas==1.0.1
pandas==2.2.3
polars==1.18.0
Shapely==2.0.6
The info used on this article comes from NYC Open Knowledge, and is offered for obtain here. On this instance, I’m utilizing the December 24, 2024 launch of the NYC Road Centerlines.
Step 1: Generate factors alongside traces each 10 toes with Geopandas
With the intention to create transects alongside every line, we first want to search out the centerpoint of every transect. Since we would like a transect each 10 ft, we are going to seize a degree each 10 ft alongside each linestring within the dataset. We’re additionally seize an extra level barely offset from every transect level; this can enable us to findthe orientation wanted to make the transect perpendicular to the unique line. Earlier than we get to all that, we have to load the info.
import geopandas as gpdROADS_LOC = r"NYC Road CenterlineNYC_Roads.shp"
ROADS_GDF = (
gpd.read_file(ROADS_LOC)
.reset_index(names="unique_id")
.to_crs("epsg:6539")
)
Along with loading the info, we additionally assign a unique_id to every street based mostly on its index, and make sure the GeoDataFrame is in a projected coordinate system with toes as its base unit — since these street traces are in New York Metropolis, we use New York — Long Island State Plane because the coordinate reference system. The unique_id column will enable us to hyperlink our generated transects again to the unique street phase.
As soon as the info has been loaded, we have to pattern alongside each line at 10 ft intervals. We will accomplish this utilizing the GeoPandas interpolate operate, as beneath:
import pandas as pd
from typing import Tuple, Uniondef interpolate_points(
gdf: gpd.GeoDataFrame, distance: Union[float, int], is_transect: bool
) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
gdf = gdf.loc[gdf.length > distance]
new_points = gdf.interpolate(distance)
new_gdf = gpd.GeoDataFrame(
geometry=new_points,
information={
"unique_id": gdf["unique_id"],
"distance": distance,
"is_transect": is_transect,
},
crs=gdf.crs,
)
return gdf, new_gdf
def get_road_points(roads: gpd.GeoDataFrame, segment_length: int)
-> gpd.GeoDataFrame:
working_roads = roads.copy()
road_points = []
for segment_distance in vary(
segment_length, int(roads.size.max()) + 1, segment_length
):
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance - 1, False
)
road_points.append(new_gdf)
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance, True
)
road_points.append(new_gdf)
return pd.concat(road_points, ignore_index=True)
The operate get_road_points() loops by way of all of the phase distances within the roads GeoDataFrame, given a phase size s. Each iteration i, we choose solely roads that are longer than s * i. We pattern a degree alongside every line at s * i distance. These factors would be the centerpoints of our new transects. Since we need to orient our new transects perpendicular to the road from which they spawned, we additionally pattern a degree at distance (s * i) -1. This offers us the approximate orientation of the road, which we are going to use afterward when setting up the transects. As this technique outputs a GeoSeries, we construct a brand new GeoDataFrame, holding the gap of the purpose alongside the road, whether or not the purpose can be used as a transect, and our beforehand created unique_id column. Lastly, we concatenate all these GeoDataframes collectively, giving us one thing that appears like this:
Lets strive it out with our roads dataset and see if it really works:
import matplotlib.pyplot as pltpoints_10ft = get_road_points(ROADS_GDF, 10)
fig, ax = plt.subplots(figsize=(10, 12))
ROADS_GDF.loc[ROADS_GDF['unique_id'] == 100].plot(ax=ax, coloration='blue')
points_10ft.loc[(points_10ft['unique_id'] == 100) &
(points_10ft['is_transect'])].plot(ax=ax, coloration='purple')
Step 2: Use Polars and rotation matrices to show these factors into transects
Now that now we have the factors we want, we will use Polar’s from_pandas() operate to transform our GeoDataFrame to Polars. Earlier than we do this, we need to save the knowledge from the geometry column — our x and y areas — and drop the geometry column.
import polars as pl def roads_to_polars(road_points: gpd.GeoDataFrame) -> pl.DataFrame:
road_points.loc[:, "x"] = road_points.geometry.x
road_points.loc[:, "y"] = road_points.geometry.y
pl_points = pl.from_pandas(road_points.drop("geometry", axis=1))
return pl_points
Lastly, now we have every part we have to assemble our transects. In essence, all now we have to do is locate the angle of every transect, assemble a transect of the requested size, and rotate and translate the transect so it’s correctly positioned in house. That will sound like a doozy, but when we break it down into steps we’re not doing something overly difficult.
- Align every transect level with the purpose previous to it. This can enable us to orient every transect perpendicular to the unique line by pivoting across the prior level. The truth is, lets name the prior level the ‘pivot level’ to make this extra clear.
- Translate the transect level/pivot level pair in order that the pivot level is at 0, 0. By doing this, we will simply rotate the transect level in order that it sits at y = 0.
- Rotate the transect level in order that it sits at y = 0. By doing this, the mathematics to assemble new factors which is able to type our transect turns into easy addition.
- Assemble new factors above and beneath the rotated transect level in order that now we have a brand new line with distance equal to line size.
- Rotate the brand new factors again to the unique orientation relative to the pivot level.
- Translate the brand new factors in order that the pivot level returns to its unique place.
Lets check out the code to do that. On my previous laptop computer this code takes about 3 seconds to generate 4.2 million transects.
def make_lines(
pl_points: pl.DataFrame, line_length: Union[float, int])
-> pl.DataFrame:
return (
pl_points.kind("unique_id", "distance")
# Pair every transect level with its previous pivot level
# over() teams by distinctive id, and shift(1) strikes
# every column down by 1 row
.with_columns(
origin_x=pl.col("x").shift(1).over(pl.col("unique_id")),
origin_y=pl.col("y").shift(1).over(pl.col("unique_id")),
)
# Take away pivot level rows
.filter(pl.col("is_transect"))
# Translate every level so the pivot is now (0, 0)
.with_columns(
translated_x=pl.col("x") - pl.col("origin_x"),
translated_y=pl.col("y") - pl.col("origin_y"),
)
# Calculate angle theta between translated transect level and x-axis
.with_columns(theta=pl.arctan2(
pl.col("translated_y"), pl.col("translated_x")
)
)
# Calculate phrases in rotation matrix
.with_columns(
theta_cos=pl.col("theta").cos(),
theta_sin=pl.col("theta").sin()
)
# Add y-coordinates above and beneath x axis so now we have a line of desired
# size. Since we all know that x = 1 (the gap between the transect
# and pivot factors), we do not have to explicitly set that time period.
.with_columns(
new_y1=-line_length / 2,
new_y2=line_length / 2,
)
# Apply rotation matrix to factors (1, new_y1) and (1, new_y2)
# and translate again to the unique location
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
# Assemble a Nicely-Identified Textual content illustration of the transects
# for simple conversion to GeoPandas
.with_columns(
wkt=pl.concat_str(
pl.lit("LINESTRING ("),
pl.col("new_x1"),
pl.col("new_y1"),
pl.lit(","),
pl.col("new_x2"),
pl.col("new_y2"),
pl.lit(")"),
separator=" ",
)
)
)
How are we really carrying out the rotation?
We rotate the transect level in regards to the origin utilizing a rotation matrix. It is a 2×2 matrix which, when utilized to a degree in a 2nd-plane, rotates the purpose θ levels across the origin (0, 0).
To make use of it, all we have to do is calculate θ, then multiply the purpose (x, y) and the rotation matrix. This offers us the formulation new_x = x*cos(θ)-y*sin(θ), new_y = x*sin(θ) + y*cos(θ). We calculate the consequence utilizing this code:
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
We’re capable of simplify the calculation, since we all know that x is all the time 1 for our factors (x, y). As a substitute of calculating x*cos(θ)-y*sin(θ) to search out the x worth of a rotated level, we will simply do cos(θ)-y*sin(θ), saving us slightly little bit of computation. After all, this received’t work in case your pivot level and transect level usually are not 1 unit aside, however for our functions it’s simply fantastic. As soon as now we have the rotation accomplished, all now we have to do is translate the newly rotated factors by the gap between the pivot level and (0,0), and growth, we’re completed. If you’re fascinated about studying extra about linear algebra and affine transformations, I extremely reccomend the 3Blue1Brown sequence Essence of Linear Algebra.
Utilizing the above capabilities, we will take our GeoDataFrame, convert it to Polars, and calculate new transects like so:
points_10ft_pl = roads_to_polars(points_10ft)
transects_10ft = make_lines(points_10ft_pl, 10)
transects_10ft.choose(['unique_id', 'wkt'])
Step 3: Convert the transects again to Geopandas
Lastly! We now have the geometric coordinates of the transects we want, however we would like them in a GeoDataFrame so we will return to the consolation of GIS operations. Since we constructed our new transects as WKT Linestrings on the finish of the make_lines() operate, the code to take action is easy:
transects_gdf = gpd.GeoDataFrame(
crs=ROADS_GDF.crs,
information={"unique_id": transects_10ft.choose("unique_id").to_series()},
geometry=gpd.GeoSeries.from_wkt(
transects_10ft.choose("wkt").to_series().to_list()
),
)
And thats it! We now have a GeoDataFrame with 4.2 million transects, every linked to their unique roadway utilizing the unique_id column.
Lets try our new geometry:
ROAD_ID = 98237
fig, ax = plt.subplots(figsize=(10, 12))
ROADS_GDF.loc[ROADS_GDF['unique_id'] == ROAD_ID].plot(ax=ax, coloration='blue')
transects_gdf.loc[transects_gdf['unique_id'] == ROAD_ID].plot(ax=ax, coloration='inexperienced')
ax.set_aspect("equal")
Transects spaced 10ft aside alongside a pedestrian pathLooks good! We’re dropping slightly little bit of precision with the WKT conversion, so there’s most likely a greater means to do this, however it works effectively sufficient for me.
Placing all of it Collectively
To your comfort, I’ve mixed the code to generate the transects beneath. This can solely work with information projected to a planar coordinate system, and if you wish to use a pivot level that’s additional than 1 unit from a transect level, you will want to change the code.
from typing import Tuple, Unionimport geopandas as gpd
import pandas as pd
import polars as pl
def interpolate_points(
gdf: gpd.GeoDataFrame, distance: Union[float, int], is_transect: bool
) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
gdf = gdf.loc[gdf.length > distance]
new_points = gdf.interpolate(distance)
new_gdf = gpd.GeoDataFrame(
geometry=new_points,
information={
"unique_id": gdf["unique_id"],
"distance": distance,
"is_transect": is_transect,
},
crs=gdf.crs,
)
return gdf, new_gdf
def get_road_points(roads: gpd.GeoDataFrame, segment_length: int) -> gpd.GeoDataFrame:
working_roads = roads.copy()
road_points = []
for segment_distance in vary(
segment_length, int(roads.size.max()) + 1, segment_length
):
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance - 1, False
)
road_points.append(new_gdf)
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance, True
)
road_points.append(new_gdf)
return pd.concat(road_points, ignore_index=True)
def roads_to_polars(road_points: gpd.GeoDataFrame) -> pl.DataFrame:
road_points.loc[:, "x"] = road_points.geometry.x
road_points.loc[:, "y"] = road_points.geometry.y
pl_points = pl.from_pandas(road_points.drop("geometry", axis=1))
return pl_points
def make_lines(pl_points: pl.DataFrame, line_length: Union[float, int]) -> pl.DataFrame:
return (
pl_points.kind("unique_id", "distance")
.with_columns(
origin_x=pl.col("x").shift(1).over(pl.col("unique_id")),
origin_y=pl.col("y").shift(1).over(pl.col("unique_id")),
)
.filter(pl.col("is_transect"))
.with_columns(
translated_x=pl.col("x") - pl.col("origin_x"),
translated_y=pl.col("y") - pl.col("origin_y"),
)
.with_columns(theta=pl.arctan2(pl.col("translated_y"), pl.col("translated_x")))
.with_columns(theta_cos=pl.col("theta").cos(), theta_sin=pl.col("theta").sin())
.with_columns(
new_y1=-line_length / 2,
new_y2=line_length / 2,
)
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
.with_columns(
wkt=pl.concat_str(
pl.lit("LINESTRING ("),
pl.col("new_x1"),
pl.col("new_y1"),
pl.lit(","),
pl.col("new_x2"),
pl.col("new_y2"),
pl.lit(")"),
separator=" ",
)
)
)
ROADS_LOC = r"NYC Road CenterlineNYC_Roads.shp"
ROADS_GDF = gpd.read_file(ROADS_LOC).reset_index(names="unique_id").to_crs("epsg:6539")points_10ft = get_road_points(roads=ROADS_GDF, segment_length=10)
points_10ft_pl = roads_to_polars(road_points=points_10ft)
transects_10ft = make_lines(pl_points=points_10ft_pl, line_length=10)
transects_gdf = gpd.GeoDataFrame(
crs=ROADS_GDF.crs,
information={"unique_id": transects_10ft.choose("unique_id").to_series()},
geometry=gpd.GeoSeries.from_wkt(transects_10ft.choose("wkt").to_series().to_list()),
)
Thanks for studying! There are most likely much more methods this code might be optimized or made extra versatile, however it solved an enormous downside for me and lower down compute time from days to seconds. With slightly little bit of concept and understanding when to make use of what instrument, issues that beforehand appear insurmountable change into shockingly straightforward. It solely took me about half-hour and slightly wikipedia linear algebra referesher to write down the code for this put up, saving me tons of time and, extra importantly, defending me from having to elucidate to a shopper why a challenge was pointlessly delayed. Let me know if this was useful to you!
Except in any other case famous, all photographs are by the writer.