API Reference
popexposure provides simple, consistent methods for population exposure analysis. These are available through the `PopEstimator` class, which prepares data and performs exposure estimation.
Key Methods
- `est_exposed_pop()`: Calculate population exposure to hazards.
- `est_total_pop()`: Estimate total population in administrative areas.
Data Requirements:
- Hazard data: `GeoJSON` or `GeoParquet`; must contain a string `ID_hazard` column with unique hazard IDs, `buffer_dist_*` numeric columns, and a `geometry` column with geometry objects.
- Admin units: `GeoJSON` or `GeoParquet`; must contain a string `ID_admin_unit` column with unique admin IDs and a `geometry` column with geometry objects.
- Population raster: Any format supported by rasterio, in any CRS.
See tutorials and API docs below for more detailed information on data requirements and examples.
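The hazard column requirements listed above can be checked before a file is passed to the estimator. The following is a minimal sketch in plain Python that works on a list of column names; `check_hazard_columns` is a hypothetical helper written for illustration and is not part of popexposure.

```python
def check_hazard_columns(columns):
    """Return a list of problems found in a hazard table's column names.

    Hypothetical helper for illustration; not part of popexposure.
    """
    problems = []
    if "ID_hazard" not in columns:
        problems.append("missing ID_hazard column")
    if "geometry" not in columns:
        problems.append("missing geometry column")
    # At least one buffer distance column is needed to define a buffer.
    if not any(name.startswith("buffer_dist_") for name in columns):
        problems.append("no buffer_dist_* column")
    return problems


# A table with all required columns produces no problems.
print(check_hazard_columns(["ID_hazard", "buffer_dist_500", "geometry"]))
# A table without any buffer distance column is flagged.
print(check_hazard_columns(["ID_hazard", "geometry"]))
```

The check only looks at names; it does not verify that IDs are unique strings or that buffer distances are numeric.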
PopEstimator(pop_data: str | Path, admin_data: str | Path | gpd.GeoDataFrame | None = None)
Estimate population exposure to environmental hazards using geospatial analysis.
PopEstimator provides a complete workflow for calculating how many people live within specified buffer distances of environmental hazards (e.g., wildfires, oil wells, toxic sites) using gridded population data. The class handles data loading, geometry processing, buffering operations, and raster value extraction to produce exposure estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `pop_data` | str or Path | Path to the population raster file. Any raster format supported by rasterio is acceptable (e.g., GeoTIFF, NetCDF). The raster can be in any coordinate reference system. | required |
| `admin_data` | str, pathlib.Path, geopandas.GeoDataFrame, or None | Administrative unit boundaries for breaking down exposure estimates. Can be a file path (str or Path) to vector data (GeoJSON, Shapefile, GeoParquet, etc.), a preprocessed GeoDataFrame with admin boundaries, or None (default) for exposure estimates without administrative breakdowns. If provided as a file path or unprocessed GeoDataFrame, the data must contain a string column with "ID" in the name for unique admin unit identifiers and a geometry column with valid geometric objects. The data will be automatically processed (cleaned, reprojected to WGS84). If provided as a preprocessed GeoDataFrame that meets all requirements, processing is skipped for better performance. | None |
Key Features
- Flexible hazard data: Works with point, line, polygon, multipolygon, or geometry collection hazards.
- Multiple buffer distances: Calculate exposure at different distances simultaneously.
- Administrative breakdowns: Get exposure counts by census tracts, ZIP codes, etc.
- Hazard-specific or combined estimates: Choose individual hazard impacts or cumulative exposure (see est_exposed_pop).
- Automatic geometry processing: Handles CRS transformations, invalid geometries, and projections seamlessly.
- Partial pixel extraction: Uses area-weighted raster sampling for accurate population counts.
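To illustrate what area-weighted sampling means, the sketch below sums pixel values scaled by the fraction of each pixel that falls inside a geometry. It assumes those coverage fractions are already known; how popexposure computes them internally is not shown in this reference, and `area_weighted_sum` is a hypothetical name.

```python
def area_weighted_sum(values, coverage):
    """Sum cell values, each weighted by the fraction of the cell covered.

    Illustrative only: coverage fractions are assumed inputs here.
    """
    return sum(value * fraction for value, fraction in zip(values, coverage))


# Three pixels: fully inside, half inside, and outside the buffer.
cell_population = [100.0, 200.0, 50.0]
covered_fraction = [1.0, 0.5, 0.0]

print(area_weighted_sum(cell_population, covered_fraction))  # 200.0
```

A pixel that is only half covered contributes half of its population, which avoids the over- or under-counting that comes from treating pixels as all-or-nothing.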
Workflow
- Construct an estimator containing population and administrative data.
- Calculate exposure with `est_exposed_pop`.
- Get total administrative unit populations with `est_total_pop` (optional).
Examples:
Basic exposure analysis without admin data:
>>> from popexposure.pop_estimator import PopEstimator
>>>
>>> # Initialize with only population raster (no admin data)
>>> estimator = PopEstimator(
... pop_data="data/population.tif"
... )
>>> # Estimate population exposure to hazards (e.g., wildfires)
>>> exposure = estimator.est_exposed_pop(
... hazard_data="data/wildfire_perimeters.geojson",
... hazard_specific=True
... )
>>> print(exposure.head())
ID_hazard exposed_500 exposed_1000
0 fire_001 1234 2345
1 fire_002 567 890
Exposure analysis with admin data:
>>> from popexposure.pop_estimator import PopEstimator
>>> # Initialize with population raster and admin boundaries
>>> estimator = PopEstimator(
... pop_data="data/population.tif",
... admin_data="data/admin_units.geojson"
... )
>>> # Estimate population exposure to hazards (e.g., wildfires)
>>> exposure = estimator.est_exposed_pop(
... hazard_data="data/wildfire_perimeters.geojson",
... hazard_specific=True
... )
>>> print(exposure.head())
ID_hazard ID_admin_unit exposed_500 exposed_1000
0 fire_001 06001 1234 2345
1 fire_002 06013 567 890
>>> # Estimate total population in each admin unit
>>> total_pop = estimator.est_total_pop()
>>> print(total_pop.head())
ID_admin_unit population
0 06001 100000
1 06013 150000
Notes
Data Requirements:
- Hazard data: `GeoJSON` or `GeoParquet`; must contain a string `ID_hazard` column with unique hazard IDs, `buffer_dist_*` numeric columns, and a `geometry` column with geometry objects.
- Admin units: `GeoJSON` or `GeoParquet`; must contain a string `ID_admin_unit` column with unique admin IDs and a `geometry` column with geometry objects.
- Population raster: Any format supported by rasterio, in any CRS.
Buffer Distance Naming:
- Column `buffer_dist_500` creates `buffered_hazard_500` and `exposed_500`
- Column `buffer_dist_main` creates `buffered_hazard_main` and `exposed_main`
- Distances are in meters and can vary by hazard
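The naming rule can be expressed as a small string transformation. This is a sketch of the convention described above, not the library's own code; `output_names` is a hypothetical helper.

```python
def output_names(buffer_column):
    """Map a buffer distance column name to its derived column names.

    Hypothetical helper mirroring the documented naming convention.
    """
    prefix = "buffer_dist_"
    if not buffer_column.startswith(prefix):
        raise ValueError(f"expected a column starting with {prefix!r}")
    suffix = buffer_column[len(prefix):]
    return f"buffered_hazard_{suffix}", f"exposed_{suffix}"


print(output_names("buffer_dist_500"))   # ('buffered_hazard_500', 'exposed_500')
print(output_names("buffer_dist_main"))  # ('buffered_hazard_main', 'exposed_main')
```

The suffix is carried through unchanged, so it can be a distance or any label that distinguishes one buffer column from another.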
Coordinate Reference Systems:
- Input data can use any CRS
- Buffering uses optimal UTM projections for accuracy
- Population raster CRS is automatically handled
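Buffering in meters requires a projected CRS, which is why a UTM projection is used. As an illustration, the sketch below picks a WGS84 UTM EPSG code from a longitude and latitude with the standard zone formula. It is an assumption about how such a choice can be made, not a description of popexposure's internals, and it ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon, lat):
    """Return the EPSG code of the WGS84 UTM zone containing (lon, lat).

    Illustrative sketch using the standard 6-degree zone formula.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    # EPSG 326xx covers the northern hemisphere, 327xx the southern.
    base = 32600 if lat >= 0 else 32700
    return base + zone


print(utm_epsg(-122.27, 37.80))  # 32610 (UTM zone 10N)
print(utm_epsg(151.20, -33.90))  # 32756 (UTM zone 56S)
```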
See Also
- est_exposed_pop : Calculate population exposure to hazards
- est_total_pop : Calculate total population in administrative units
Source code in popexposure/pop_estimator.py
est_exposed_pop(hazard_data: str | Path | gpd.GeoDataFrame, hazard_specific: bool = True, stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame
Estimate the number of people living within a buffer distance of environmental hazard(s) using a gridded population raster.
This method calculates the population exposed to hazards by summing
raster values within buffered hazard geometries, or within the
intersection of these buffers and administrative geographies (if
provided to the class). Users can choose between hazard-specific
counts (population exposed to each individual hazard) or cumulative
counts (population exposed to any hazard, without double counting).
Exposure can be estimated for multiple buffer distances simultaneously,
as specified by the buffered hazard columns in the input data. If
the admin_data attribute is not None, results
are further broken down by these geographies (e.g., census tracts or
ZIP codes). At least one buffered hazard column must be present in the
hazard data; additional columns allow for exposure estimates at multiple
distances.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hazard_specific` | bool | If True, exposure is calculated for each hazard individually (hazard-specific estimates). If False, geometries are combined before exposure is calculated, producing a single cumulative estimate. | True |
| `hazard_data` | str, Path, or GeoDataFrame | A GeoDataFrame with a coordinate reference system containing a string column called … | required |
| `stat` | str | Statistic to calculate from raster values. Options: "sum", the total population within the geometry (default); "mean", the average raster (population) value within the geometry. | "sum" |
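The difference between the two `stat` options can be seen on a handful of pixel values. The sketch below uses an unweighted sum and mean for simplicity; whether the library weights the mean by pixel coverage is not stated here, and `summarize` is a hypothetical function.

```python
def summarize(values, stat="sum"):
    """Reduce pixel values with the chosen statistic (unweighted sketch)."""
    if stat == "sum":
        return sum(values)
    if stat == "mean":
        return sum(values) / len(values)
    raise ValueError("stat must be 'sum' or 'mean'")


pixel_values = [10, 20, 30]
print(summarize(pixel_values, "sum"))   # 60
print(summarize(pixel_values, "mean"))  # 20.0
```

For a raster of population counts, "sum" gives people within the geometry, while "mean" gives the average count per pixel and is not a population total.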
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame with one `exposed` column per buffered hazard column. The number of rows in the output DataFrame depends on the method arguments, as described in the Notes. |
Notes
There are four ways to use this method:
- Hazard-specific exposure, no additional administrative geographies (`hazard_specific=True`, no `admin_data`): Calculates the exposed population for each buffered hazard geometry. Returns a DataFrame with one row per hazard and one `exposed` column per buffered hazard column. People who live within the buffer distance of more than one hazard are included in the exposure count for each hazard they are near.
- Combined hazards, no additional administrative geographies (`hazard_specific=False`, no `admin_data`): All buffered hazard geometries in each buffered hazard column are merged into a single geometry, and the method calculates the total exposed population for the union of those buffered hazards. Returns a DataFrame with a single row and one `exposed` column for each buffered hazard column. People who are close to more than one hazard in the hazard set are counted once.
- Hazard-specific exposure within admin units (`hazard_specific=True`, `admin_data` provided): Calculates the exposed population for the intersection of each buffered hazard geometry with each admin unit. Returns a DataFrame with one row per buffered hazard-admin unit pair and one `exposed` column per buffered hazard column. People who live within the buffer distance of more than one hazard are included in the exposure count of their admin unit-hazard combination for each hazard they are near.
- Combined hazards within admin units (`hazard_specific=False`, `admin_data` provided): All buffered hazard geometries in the same column are merged into a single geometry. Calculates the exposed population for the intersection of the combined buffered hazard geometry with each admin unit. Returns a DataFrame with one row per admin unit and one `exposed` column per buffered hazard column. People who are close to more than one hazard in the hazard set are counted once.
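The difference between hazard-specific and combined counting can be shown with a toy grid. In this sketch each buffer is reduced to the set of grid cells it covers, which is a simplification of the geometric operations the method performs; all names and numbers are made up for illustration.

```python
# Population in each grid cell (toy values).
cell_population = {"a": 10, "b": 20, "c": 30}

# Cells covered by each hazard's buffer; cell "b" is near both hazards.
buffer_cells = {
    "fire_001": {"a", "b"},
    "fire_002": {"b", "c"},
}


def hazard_specific_counts(buffers, population):
    """One count per hazard; people near two hazards appear in both counts."""
    return {
        hazard: sum(population[cell] for cell in cells)
        for hazard, cells in buffers.items()
    }


def combined_count(buffers, population):
    """One count for the union of all buffers; nobody is counted twice."""
    union = set().union(*buffers.values())
    return sum(population[cell] for cell in union)


print(hazard_specific_counts(buffer_cells, cell_population))
# {'fire_001': 30, 'fire_002': 50}
print(combined_count(buffer_cells, cell_population))
# 60
```

The hazard-specific counts add up to 80 because the 20 people in cell "b" appear under both hazards, while the combined count of 60 includes them once.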
Source code in popexposure/pop_estimator.py
est_total_pop(stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame
Estimate the total population residing within administrative geographies using a gridded population raster.
This method estimates the total population residing within administrative
geographies (e.g., ZCTAs, census tracts) encoded in the admin_data attribute,
according to the gridded population raster encoded in the pop_data attribute.
This method is meant to be used with the same population
raster as est_exposed_pop to provide denominators for the total population
in each administrative geography, allowing the user to compute the
percentage of people exposed to hazards in each admin unit. est_total_pop
calculates the sum of raster values within the boundaries of each
administrative geography geometry provided.
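As a sketch of how the denominators might be used, the snippet below computes the percentage of each admin unit's population that is exposed. It uses plain dictionaries keyed by admin unit ID with made-up numbers, and assumes the exposure counts hold one value per admin unit (as with `hazard_specific=False`); `percent_exposed` is a hypothetical helper.

```python
def percent_exposed(exposed, total):
    """Return percent exposed per admin unit, skipping units with no population."""
    return {
        unit: 100.0 * exposed[unit] / total[unit]
        for unit in exposed
        if total.get(unit)  # skips missing units and zero totals
    }


# Illustrative values only: exposed counts and total population per admin unit.
exposed_by_unit = {"06001": 1234, "06013": 567}
total_by_unit = {"06001": 100000, "06013": 150000}

for unit, pct in percent_exposed(exposed_by_unit, total_by_unit).items():
    print(unit, round(pct, 3))
```

With DataFrame outputs, the same calculation would typically be a merge on the admin unit ID column followed by a column division.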
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `stat` | str | Statistic to calculate from raster values. Options: "sum", the total population within the geometry (default); "mean", the average raster value within the geometry. | "sum" |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame with an … |