How to Perform a Spatial Join in Python (GeoPandas)
How to perform a spatial join in Python with GeoPandas sjoin(), with examples for point-in-polygon and other predicates.
Problem statement
A spatial join lets you transfer attributes from one spatial layer to another based on location instead of a shared ID field.
Typical GIS tasks include:
- assigning district names to customer points
- joining addresses to census tracts
- attaching flood zone attributes to parcel polygons
- matching building footprints to administrative boundaries
In GeoPandas, this means taking two vector layers and joining them using a spatial relationship such as intersects, within, or contains.
This page focuses only on vector spatial joins with GeoPandas.
Quick answer
Use geopandas.sjoin() with two GeoDataFrames that use the same CRS, then choose the correct spatial predicate for your workflow.
import geopandas as gpd
points = gpd.read_file("data/customer_points.geojson")
polygons = gpd.read_file("data/city_districts.gpkg")
if points.crs != polygons.crs:
polygons = polygons.to_crs(points.crs)
joined = gpd.sjoin(points, polygons, how="left", predicate="within")
print(joined.head())
This keeps all points and adds polygon attributes where each point falls within a polygon.
Step-by-step solution
Load the two spatial layers
Read both datasets into GeoDataFrames.
import geopandas as gpd
points = gpd.read_file("data/customer_points.geojson")
districts = gpd.read_file("data/city_districts.shp")
print(type(points))
print(type(districts))
Both objects should be geopandas.geodataframe.GeoDataFrame.
GeoPandas can read shapefiles, GeoJSON, and GeoPackage files with read_file().
Check geometry types
Geometry type affects which spatial predicate makes sense.
print(points.geom_type.value_counts())
print(districts.geom_type.value_counts())
Typical join patterns include:
- points to polygons
- polygons to polygons
- lines to polygons
If you want to assign polygon attributes to points, within is usually the clearest choice.
Make sure both layers use the same CRS
Spatial joins require both layers to use the same coordinate reference system.
print(points.crs)
print(districts.crs)
If they do not match, reproject one layer before joining:
districts = districts.to_crs(points.crs)
If you skip this step, the join may return no matches or incorrect matches.
Choose the right spatial predicate
Common predicates include:
intersects: features touch or overlap in any waywithin: left geometry is completely inside right geometrycontains: left geometry completely contains right geometry
Common choices:
- points inside polygons: usually
within - overlapping polygons: usually
intersects - lines crossing polygons: usually
intersects
If you want to inspect which predicates are available in your environment, check the spatial index first:
if points.has_sindex:
print(points.sindex.valid_query_predicates)
This depends on your installed spatial index backend, so treat it as a useful check rather than a required step.
Run the spatial join
Use gpd.sjoin() to join the left layer to the right layer based on geometry.
joined = gpd.sjoin(
points,
districts,
how="left",
predicate="within"
)
Join types:
left: keep all rows from the left GeoDataFrameinner: keep only matched rowsright: available in some workflows, but verify behavior in your GeoPandas version before relying on it
In most point-in-polygon workflows, the left layer is the layer you want to preserve.
Review the output columns
The result includes:
- original columns from the left layer
- columns from the right layer, unless you subset the right GeoDataFrame before joining
- an
index_rightcolumn showing the matched feature index from the right layer
print(joined.columns)
print(joined.head())
If both layers contain the same column name, GeoPandas adds suffixes to distinguish them.
Save the joined result
Export the result to a new file.
joined.to_file("output/customer_points_with_districts.gpkg", driver="GPKG")
Other formats:
joined.to_file("output/customer_points_with_districts.shp")
joined.to_file("output/customer_points_with_districts.geojson", driver="GeoJSON")
Shapefiles have stricter limits on field names and data types than GeoPackage.
Code examples
Example 1: Join points to polygons
Assign each customer point to a city district.
import geopandas as gpd
customers = gpd.read_file("data/customer_locations.geojson")
districts = gpd.read_file("data/city_districts.gpkg")
if customers.crs != districts.crs:
districts = districts.to_crs(customers.crs)
result = gpd.sjoin(
customers,
districts[["district_name", "geometry"]],
how="left",
predicate="within"
)
print(result[["customer_id", "district_name"]].head())
Example 2: Keep all points even when no match is found
Use a left join so unmatched points remain in the output.
result = gpd.sjoin(
customers,
districts[["district_name", "geometry"]],
how="left",
predicate="within"
)
unmatched = result[result["district_name"].isna()]
print(f"Unmatched points: {len(unmatched)}")
Unmatched features will have NaN in the joined columns.
Example 3: Join overlapping polygons
Join flood zones to parcel polygons where they intersect.
import geopandas as gpd
parcels = gpd.read_file("data/parcels.shp")
flood_zones = gpd.read_file("data/flood_zones.shp")
if parcels.crs != flood_zones.crs:
flood_zones = flood_zones.to_crs(parcels.crs)
parcel_flood = gpd.sjoin(
parcels,
flood_zones[["zone_type", "geometry"]],
how="inner",
predicate="intersects"
)
print(parcel_flood[["parcel_id", "zone_type"]].head())
One parcel can produce multiple rows if it intersects more than one flood polygon.
Example 4: Reproject before joining
Fix a CRS mismatch before running the join.
import geopandas as gpd
sites = gpd.read_file("data/sites.geojson")
boundaries = gpd.read_file("data/admin_boundaries.shp")
print(sites.crs)
print(boundaries.crs)
if sites.crs != boundaries.crs:
boundaries = boundaries.to_crs(sites.crs)
joined = gpd.sjoin(
sites,
boundaries[["admin_name", "geometry"]],
how="left",
predicate="within"
)
print(joined.head())
Explanation
A spatial join is different from a normal attribute join.
- An attribute join uses a shared key such as
parcel_id - A spatial join uses geometry relationships such as overlap, containment, or intersection
In GeoPandas, sjoin() compares the geometry in the left GeoDataFrame with the geometry in the right GeoDataFrame using the predicate you choose.
Examples include:
- point within polygon
- polygon intersects polygon
- line intersects polygon
GeoPandas uses spatial indexing when available, which reduces the number of geometry comparisons and helps performance on larger datasets.
A spatial join can also increase row counts. One feature may match multiple features, especially when:
- polygons overlap
- a line crosses several polygons
- boundary features match more than one geometry
- source administrative boundaries or zone layers are not cleanly non-overlapping
That is why the output may contain more rows than the left input layer.
Edge cases and notes
Duplicate matches
One feature can match more than one feature.
This often happens with:
- overlapping polygon layers
- polygon-to-polygon joins
- line-to-polygon joins
- boundary cases depending on the predicate
If you need exactly one match per feature, you will need additional cleanup logic after the join.
CRS mismatches
Always check CRS before every spatial join.
print(left_gdf.crs)
print(right_gdf.crs)
If needed:
right_gdf = right_gdf.to_crs(left_gdf.crs)
Missing or invalid geometries
Null or invalid geometries can cause missing matches or errors.
left_gdf = left_gdf[left_gdf.geometry.notna()]
right_gdf = right_gdf[right_gdf.geometry.notna()]
right_gdf = right_gdf[right_gdf.is_valid]
In some workflows, you may need to repair invalid geometry before joining.
Boundary behavior
Points or lines on polygon boundaries can produce unexpected results.
withinrequires the left geometry to be in the interior of the right geometrycontainsis also strict and does not treat boundary-only contact as containedintersectsis more inclusive and matches touching geometries
If features on edges are not matching as expected, test intersects and compare the output.
Performance on large datasets
Large joins can be slow.
To reduce overhead:
- keep only the columns you need before joining
- make sure a spatial index is available
- filter either dataset first if you can reduce the search area
- avoid unnecessary reprojection or repeated file reads inside loops
Internal links
If you need the underlying concept first, read What Is a Spatial Join in GIS?.
Related tasks:
If your join returns empty or unexpected output, see Why Is My GeoPandas Spatial Join Returning No Results?.
FAQ
How do I do a spatial join in GeoPandas?
Use gpd.sjoin(left_gdf, right_gdf, how=..., predicate=...). Both inputs must be GeoDataFrames and both must use the same CRS.
joined = gpd.sjoin(points, polygons, how="left", predicate="within")
What is the difference between intersects, within, and contains in a spatial join?
intersects: geometries touch or overlap in any waywithin: left geometry is inside the right geometrycontains: left geometry contains the right geometry
For joining points to polygons, within is usually the best starting point.
Why does my spatial join return empty results?
Common causes include:
- CRS mismatch
- wrong predicate
- invalid or null geometries
- empty input layers
Check CRS first, then inspect geometry validity and test a different predicate.
Why do I get multiple rows for one feature after a spatial join?
Because one feature can match multiple features in the other layer. This is common with overlapping polygons and polygon intersection joins.