API Reference¶

popexposure provides simple, consistent methods for population exposure analysis. Users can access this through the PopEstimator class, which provides methods that prepare data and perform exposure estimation.

Key Methods

est_exposed_pop(): Calculate population exposure to hazards.
est_total_pop(): Estimate total population in administrative areas.

Data Requirements:

Hazard data: GeoJSON or GeoParquet, must contain string ID_hazard column with unique hazard IDs, buffer_dist_* numeric columns, and geometry column with geometry objects.
Admin units: GeoJSON or GeoParquet, must contain string ID_admin_unit column with unique admin IDs, and geometry column with geometry objects.
Population raster: Any format supported by rasterio with any CRS.

See tutorials and API docs below for more detailed information on data requirements and examples.

`PopEstimator(pop_data: str | Path, admin_data: str | Path | gpd.GeoDataFrame | None = None)` ¶

Estimate population exposure to environmental hazards using geospatial analysis.

PopEstimator provides a complete workflow for calculating how many people live within specified buffer distances of environmental hazards (e.g., wildfires, oil wells, toxic sites) using gridded population data. The class handles data loading, geometry processing, buffering operations, and raster value extraction to produce exposure estimates.

Estimate population exposure to environmental hazards using geospatial analysis.

PopEstimator provides a complete workflow for calculating how many people live within specified buffer distances of environmental hazards (e.g., wildfires, oil wells, toxic sites) using gridded population data. The class handles data loading, geometry processing, buffering operations, and raster value extraction to produce exposure estimates.

Parameters:

Name	Type	Description	Default
`pop_data`	`str or Path`	Path to the population raster file. Any raster format supported by rasterio is acceptable (e.g., GeoTIFF, NetCDF). The raster can be in any coordinate reference system.	required
`admin_data`	`str, pathlib.Path, geopandas.GeoDataFrame, or None`	Administrative unit boundaries for breaking down exposure estimates. Can be: - File path (str or Path) to vector data (GeoJSON, Shapefile, GeoParquet, etc.) - Preprocessed GeoDataFrame with admin boundaries - None (default) for exposure estimates without administrative breakdowns If provided as a file path or unprocessed GeoDataFrame, the data must contain: - A string column with "ID" in the name for unique admin unit identifiers - A geometry column with valid geometric objects The data will be automatically processed (cleaned, reprojected to WGS84). If provided as a preprocessed GeoDataFrame that meets all requirements, processing will be skipped for better performance.	`None`

Key Features

Flexible hazard data: Works with point, line, polygon, multipolygon, or geometry collection hazards.
Multiple buffer distances: Calculate exposure at different distances simultaneously.
Administrative breakdowns: Get exposure counts by census tracts, ZIP codes, etc.
Hazard-specific or combined estimates: Choose individual hazard impacts or cumulative exposure (see est_exposed_pop).
Automatic geometry processing: Handles CRS transformations, invalid geometries, and projections seamlessly.
Partial pixel extraction: Uses area-weighted raster sampling for accurate population counts.

Workflow

**Construct an estimator containing population and administrative data.
Calculate exposure with :meth:est_exposed_pop.
Get total administrative unit populations with :meth:est_total_pop (optional).

Examples:

Basic exposure analysis without admin data:

>>> import popexposure
>>>
>>> # Initialize with only population raster (no admin data)
>>> estimator = PopEstimator(
...     pop_data="data/population.tif"
... )
>>> # Estimate population exposure to hazards (e.g., wildfires)
>>> exposure = estimator.est_exposed_pop(
...     hazard_data="data/wildfire_perimeters.geojson",
...     hazard_specific=True
... )
>>> print(exposure.head())
ID_hazard  exposed_500  exposed_1000
0   fire_001        1234          2345
1   fire_002         567           890

Exposure analysis with admin data:

>>> import popexposure
>>> # Initialize with population raster and admin boundaries
>>> estimator = PopEstimator(
...     pop_data="data/population.tif",
...     admin_data="data/admin_units.geojson"
... )
>>> # Estimate population exposure to hazards (e.g., wildfires)
>>> exposure = estimator.est_exposed_pop(
...     hazard_data="data/wildfire_perimeters.geojson",
...     hazard_specific=True
... )
>>> print(exposure.head())
ID_hazard  ID_admin_unit  exposed_500  exposed_1000
0   fire_001         06001        1234          2345
1   fire_002         06013         567           890

>>> # Estimate total population in each admin unit
>>> total_pop = estimator.est_total_pop()
>>> print(total_pop.head())
ID_admin_unit  population
0         06001      100000
1         06013      150000

Notes

Data Requirements:

Hazard data: GeoJSON or GeoParquet, must contain string ID_hazard column with unique hazard IDs, buffer_dist_* numeric columns, and geometry column with geometry objects.
Admin units: GeoJSON or GeoParquet, must contain string ID_admin_unit column with unique admin IDs, and geometry column with geometry objects.
Population raster: Any format supported by rasterio with any CRS.

Buffer Distance Naming:

Column buffer_dist_500 creates buffered_hazard_500 and exposed_500
Column buffer_dist_main creates buffered_hazard_main and exposed_main
Distances are in meters and can vary by hazard

Coordinate Reference Systems:

Input data can use any CRS
Buffering uses optimal UTM projections for accuracy
Population raster CRS is automatically handled

See Also

est_exposed_pop : Calculate population exposure to hazards
est_total_pop : Calculate total population in administrative units

Source code in popexposure/pop_estimator.py

def __init__(
    self,
    pop_data: str | Path,
    admin_data: str | Path | gpd.GeoDataFrame | None = None,
):
    """
    Estimate population exposure to environmental hazards using geospatial analysis.

    PopEstimator provides a complete workflow for calculating how many people live
    within specified buffer distances of environmental hazards (e.g., wildfires,
    oil wells, toxic sites) using gridded population data. The class handles data
    loading, geometry processing, buffering operations, and raster value extraction
    to produce exposure estimates.

    Parameters
    ----------
    pop_data : str or pathlib.Path
        Path to the population raster file. Any raster format supported by
        rasterio is acceptable (e.g., GeoTIFF, NetCDF). The raster can be in
        any coordinate reference system.
    admin_data : str, pathlib.Path, geopandas.GeoDataFrame, or None, optional
        Administrative unit boundaries for breaking down exposure estimates.
        Can be:
        - File path (str or Path) to vector data (GeoJSON, Shapefile, GeoParquet, etc.)
        - Preprocessed GeoDataFrame with admin boundaries
        - None (default) for exposure estimates without administrative breakdowns

        If provided as a file path or unprocessed GeoDataFrame, the data must contain:
        - A string column with "ID" in the name for unique admin unit identifiers
        - A geometry column with valid geometric objects

        The data will be automatically processed (cleaned, reprojected to WGS84).
        If provided as a preprocessed GeoDataFrame that meets all requirements,
        processing will be skipped for better performance.

    Key Features
    ------------
    - **Flexible hazard data**: Works with point, line, polygon, multipolygon, or geometry collection hazards.
    - **Multiple buffer distances**: Calculate exposure at different distances simultaneously.
    - **Administrative breakdowns**: Get exposure counts by census tracts, ZIP codes, etc.
    - **Hazard-specific or combined estimates**: Choose individual hazard impacts or cumulative exposure (see est_exposed_pop).
    - **Automatic geometry processing**: Handles CRS transformations, invalid geometries, and projections seamlessly.
    - **Partial pixel extraction**: Uses area-weighted raster sampling for accurate population counts.

    Workflow
    --------
    1. **Construct an estimator containing population and administrative data.
    2. **Calculate exposure** with :meth:`est_exposed_pop`.
    3. **Get total administrative unit populations** with :meth:`est_total_pop` (optional).

    Examples
    --------
    Basic exposure analysis without admin data:

    >>> import popexposure
    >>>
    >>> # Initialize with only population raster (no admin data)
    >>> estimator = PopEstimator(
    ...     pop_data="data/population.tif"
    ... )
    >>> # Estimate population exposure to hazards (e.g., wildfires)
    >>> exposure = estimator.est_exposed_pop(
    ...     hazard_data="data/wildfire_perimeters.geojson",
    ...     hazard_specific=True
    ... )
    >>> print(exposure.head())
    ID_hazard  exposed_500  exposed_1000
    0   fire_001        1234          2345
    1   fire_002         567           890


    Exposure analysis with admin data:

    >>> import popexposure
    >>> # Initialize with population raster and admin boundaries
    >>> estimator = PopEstimator(
    ...     pop_data="data/population.tif",
    ...     admin_data="data/admin_units.geojson"
    ... )
    >>> # Estimate population exposure to hazards (e.g., wildfires)
    >>> exposure = estimator.est_exposed_pop(
    ...     hazard_data="data/wildfire_perimeters.geojson",
    ...     hazard_specific=True
    ... )
    >>> print(exposure.head())
    ID_hazard  ID_admin_unit  exposed_500  exposed_1000
    0   fire_001         06001        1234          2345
    1   fire_002         06013         567           890

    >>> # Estimate total population in each admin unit
    >>> total_pop = estimator.est_total_pop()
    >>> print(total_pop.head())
    ID_admin_unit  population
    0         06001      100000
    1         06013      150000

    Notes
    -----
    **Data Requirements:**

    - **Hazard data**: `GeoJSON` or `GeoParquet`, must contain string ``ID_hazard`` column with unique hazard IDs, ``buffer_dist_*`` numeric columns, and ``geometry`` column with geometry objects.
    - **Admin units**: `GeoJSON` or `GeoParquet`, must contain string ``ID_admin_unit`` column with unique admin IDs, and ``geometry`` column with geometry objects.
    - **Population raster**: Any format supported by rasterio with any CRS.

    **Buffer Distance Naming:**

    - Column ``buffer_dist_500`` creates ``buffered_hazard_500`` and ``exposed_500``
    - Column ``buffer_dist_main`` creates ``buffered_hazard_main`` and ``exposed_main``
    - Distances are in meters and can vary by hazard

    **Coordinate Reference Systems:**

    - Input data can use any CRS
    - Buffering uses optimal UTM projections for accuracy
    - Population raster CRS is automatically handled

    See Also
    --------
    - est_exposed_pop : Calculate population exposure to hazards
    - est_total_pop : Calculate total population in administrative units
    """
    self.pop_data = pop_data
    if admin_data is not None:
        if isinstance(admin_data, gpd.GeoDataFrame) and self._is_admin_data_prepped(
            admin_data
        ):
            self.admin_data = admin_data.copy()
        else:
            self.admin_data = self._process_admin_data(admin_data)
    else:
        self.admin_data = None

`est_exposed_pop(hazard_data: str | Path | gpd.GeoDataFrame, hazard_specific: bool = True, stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame` ¶

Estimate the number of people living within a buffer distance of environmental hazard(s) using a gridded population raster.

This method calculates the population exposed to hazards by summing raster values within buffered hazard geometries, or within the intersection of these buffers and administrative geographies (if provided to the class). Users can choose between hazard-specific counts (population exposed to each individual hazard) or cumulative counts (population exposed to any hazard, without double counting). Exposure can be estimated for multiple buffer distances simultaneously, as specified by the buffered hazard columns in the input data. If the admin_data attribute is not None, results are further broken down by these geographies (e.g., census tracts or ZIP codes). At least one buffered hazard column must be present in the hazard data; additional columns allow for exposure estimates at multiple distances.

Parameters:

Name	Type	Description	Default
`hazard_specific`	`bool`	If True, exposure is calculated for each hazard individually (hazard-specific estimates). If False, geometries are combined before exposure is calculated, producing a single cumulative estimate.	`True`
`hazards`	`GeoDataFrame`	A GeoDataFrame with a coordinate reference system containing a string column called `ID_hazard` with unique hazard IDs, and one or more geometry columns starting with `buffered_hazard` containing buffered hazard geometries. `buffered_hazard` columns must each have a unique suffix (e.g., `buffered_hazard_10`, `buffered_hazard_100`, `buffered_hazard_1000`).	required
`stat`	`str`	Statistic to calculate from raster values. Options: - "sum": Total population within geometry (default) - "mean": Average raster value/population value within geometry	`"sum"`

Returns:

Type Description

DataFrame

A DataFrame with the following columns:

ID_hazard: Always included.
ID_admin_unit: Included only if admin units were provided.
One or more exposed columns: Each corresponds to a buffered hazard column (e.g., if the input had columns buffered_hazard_10, buffered_hazard_100, and buffered_hazard_1000, the output will have exposed_10, exposed_100, and exposed_1000). Each exposed column contains the statistic (sum or mean) of raster values (population) within the relevant buffered hazard geometry or buffered hazard geometry and admin unit intersection.

The number of rows in the output DataFrame depends on the method arguments:

If hazard_specific is True, the DataFrame contains one row per hazard or per hazard-admin unit pair, if admin units are provided.
If hazard_specific is False, the DataFrame contains a single row or one row per admin unit, if admin units are provided, with each exposed column representing the total population in the union of all buffered hazard geometries in that buffered hazard column.

Notes

There are four ways to use this method:

Hazard-specific exposure, no additional administrative geographies (hazard_specific=True, admin_units=None): Calculates the exposed population for each buffered hazard geometry. Returns a DataFrame with one row per hazard and one exposed column per buffered hazard column. If people lived within the buffer distance of more than one hazard, they are included in the exposure counts for each hazard they are near.
Combined hazards, no additional administrative geographies (hazard_specific=False, admin_units=None): All buffered hazard geometries in each buffered hazard column are merged into a single geometry, and the method calculates the total exposed population for the union of those buffered hazards. Returns a DataFrame with a single row and one exposed column for each buffered hazard column. If people were close to more than one hazard in the hazard set, they are counted once.
Hazard-specific exposure within admin units (hazard_specific=True, admin_units provided): Calculates the exposed population for each intersection of each buffered hazard geometry and each admin unit. Returns a DataFrame with one row per buffered hazard-admin unit pair and one exposed column per buffered hazard column. If people lived within the buffer distance of more than one hazard, they are included in the exposure counts for their admin unit-hazard combination for each hazard they are near.
Combined hazards within admin units (hazard_specific=False, admin_units provided): All buffered hazard geometries in the same column are merged into a single geometry. Calculates the exposed population for the intersection of each buffered hazard combined geometry with each admin unit. Returns a DataFrame with one row per admin unit and one exposed column per buffered hazard column. If people were close to more than one hazard in the hazard set, they are counted once.

Source code in popexposure/pop_estimator.py

def est_exposed_pop(
    self,
    hazard_data: str | Path | gpd.GeoDataFrame,
    hazard_specific: bool = True,
    stat: Literal["sum", "mean"] = "sum",
) -> pd.DataFrame:
    """
    Estimate the number of people living within a buffer distance of
    environmental hazard(s) using a gridded population raster.

    This method calculates the population exposed to hazards by summing
    raster values within buffered hazard geometries, or within the
    intersection of these buffers and administrative geographies (if
    provided to the class). Users can choose between hazard-specific
    counts (population exposed to each individual hazard) or cumulative
    counts (population exposed to any hazard, without double counting).
    Exposure can be estimated for multiple buffer distances simultaneously,
    as specified by the buffered hazard columns in the input data. If
    the ``admin_data`` attribute is not None, results
    are further broken down by these geographies (e.g., census tracts or
    ZIP codes). At least one buffered hazard column must be present in the
    hazard data; additional columns allow for exposure estimates at multiple
    distances.

    Parameters
    ----------
    hazard_specific : bool
        If True, exposure is calculated for each hazard individually
        (hazard-specific estimates). If False, geometries are combined before
        exposure is calculated, producing a single cumulative estimate.
    hazards : geopandas.GeoDataFrame
        A GeoDataFrame with a coordinate reference system containing a
        string column called ``ID_hazard`` with unique hazard IDs, and one
        or more geometry columns starting with ``buffered_hazard``
        containing buffered hazard geometries. ``buffered_hazard`` columns
        must each have a unique suffix (e.g., ``buffered_hazard_10``,
        ``buffered_hazard_100``, ``buffered_hazard_1000``).
    stat : str, default "sum"
        Statistic to calculate from raster values. Options:
        - "sum": Total population within geometry (default)
        - "mean": Average raster value/population value within geometry

    Returns
    -------
    pandas.DataFrame
        A DataFrame with the following columns:

        - `ID_hazard`: Always included.
        - `ID_admin_unit`: Included only if admin units were provided.
        - One or more `exposed` columns: Each corresponds to a buffered
          hazard column (e.g., if the input had columns `buffered_hazard_10`,
          `buffered_hazard_100`, and `buffered_hazard_1000`, the output
          will have `exposed_10`, `exposed_100`, and `exposed_1000`).
          Each `exposed` column contains the statistic (sum or mean) of raster
          values (population) within the relevant buffered hazard geometry or
          buffered hazard geometry and admin unit intersection.

        The number of rows in the output DataFrame depends on the method
        arguments:

        - If `hazard_specific` is True, the DataFrame contains one row per
          hazard or per hazard-admin unit pair, if admin units are provided.
        - If `hazard_specific` is False, the DataFrame contains a single
          row or one row per admin unit, if admin units are provided, with
          each `exposed` column representing the total population in the
          union of all buffered hazard geometries in that buffered hazard column.

    Notes
    -----
    There are four ways to use this method:

    1. Hazard-specific exposure, no additional administrative geographies
    (``hazard_specific=True, admin_units=None``):
       Calculates the exposed population for each buffered hazard geometry.
       Returns a DataFrame with one row per hazard and one ``exposed``
       column per buffered hazard column. If people lived within the buffer
       distance of more than one hazard, they are included in the exposure
       counts for each hazard they are near.

    2. Combined hazards, no additional administrative geographies
    (``hazard_specific=False, admin_units=None``):
       All buffered hazard geometries in each buffered hazard column are
       merged into a single geometry, and the method calculates the total
       exposed population for the union of those buffered hazards. Returns a
       DataFrame with a single row and one ``exposed`` column for each
       buffered hazard column. If people were close to more than one hazard
       in the hazard set, they are counted once.

    3. Hazard-specific exposure within admin units
    (``hazard_specific=True, admin_units`` provided):
       Calculates the exposed population for each intersection of each
       buffered hazard geometry and each admin unit. Returns a DataFrame
       with one row per buffered hazard-admin unit pair and one ``exposed``
       column per buffered hazard column. If people lived within the buffer
       distance of more than one hazard, they are included in the exposure
       counts for their admin unit-hazard combination for each hazard
       they are near.

    4. Combined hazards within admin units
    (``hazard_specific=False, admin_units`` provided):
       All buffered hazard geometries in the same column are merged into a
       single geometry. Calculates the exposed population for the
       intersection of each buffered hazard combined geometry with each admin
       unit. Returns a DataFrame with one row per admin unit and one
       ``exposed`` column per buffered hazard column. If people were close
       to more than one hazard in the hazard set, they are counted once.
    """
    if isinstance(hazard_data, gpd.GeoDataFrame) and self._is_hazard_data_prepped(
        hazard_data
    ):
        hazard_data = hazard_data.copy()
    elif hazard_data is None:
        return None
    else:
        hazard_data = self._process_hazard_data(hazard_data)

    if not hazard_specific:
        hazard_data = go.combine_geometries_by_column(hazard_data)

    if self.admin_data is not None:
        hazard_data = go.get_geometry_intersections(
            hazards_gdf=hazard_data, admin_units_gdf=self.admin_data
        )

    exposed = mask_raster_partial_pixel(
        hazard_data, raster_path=self.pop_data, stat=stat
    )

    return exposed

`est_total_pop(stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame` ¶

Estimate the total population residing within administrative geographies using a gridded population raster.

This method estimates the total population residing within administrative geographies (e.g., ZCTAs, census tracts) encoded in the admin_data attribute, according to the gridded population raster encoded in the pop_data attribute. This method is meant to be used with the same population raster as est_exposed_pop to provide denominators for the total population in each administrative geography, allowing the user to compute the percentage of people exposed to hazards in each admin unit. est_total_pop calculates the sum of raster values within the boundaries of each administrative geography geometry provided.

Parameters:

Name	Type	Description	Default
`stat`	`str`	Statistic to calculate from raster values. Options: - "sum": Total population within geometry (default) - "mean": Average raster value within geometry	`"sum"`

Returns:

Type	Description
`DataFrame`	DataFrame with an `ID_admin_unit` column matching the input and a `population` column, where each value is the specified statistic (sum or mean) of raster values within the corresponding admin unit geometry.

Source code in popexposure/pop_estimator.py

def est_total_pop(self, stat: Literal["sum", "mean"] = "sum") -> pd.DataFrame:
    """
    Estimate the total population residing within administrative geographies
    using a gridded population raster.

    This method estimates the total population residing within administrative
    geographies (e.g., ZCTAs, census tracts) encoded in the ``admin_data`` attribute,
    according to the gridded population raster encoded in the ``pop_data`` attribute.
    This method is meant to be used with the same population
    raster as ``est_exposed_pop`` to provide denominators for the total population
    in each administrative geography, allowing the user to compute the
    percentage of people exposed to hazards in each admin unit. ``est_total_pop``
    calculates the sum of raster values within the boundaries of each
    administrative geography geometry provided.

    Parameters
    ----------
    stat : str, default "sum"
        Statistic to calculate from raster values. Options:
        - "sum": Total population within geometry (default)
        - "mean": Average raster value within geometry

    Returns
    -------
    pandas.DataFrame
        DataFrame with an ``ID_admin_unit`` column matching the input and a
        ``population`` column, where each value is the specified statistic
        (sum or mean) of raster values within the corresponding admin unit geometry.
    """
    if self.admin_data is None:
        raise TypeError(
            "The 'admin_data' attribute cannot be None when calling est_total_pop."
        )

    residing = mask_raster_partial_pixel(
        self.admin_data, raster_path=self.pop_data, stat=stat
    )
    residing = residing.rename(
        columns=lambda c: c.replace("exposedgeometry", "population")
    )
    return residing

API Reference¶

PopEstimator(pop_data: str | Path, admin_data: str | Path | gpd.GeoDataFrame | None = None) ¶

est_exposed_pop(hazard_data: str | Path | gpd.GeoDataFrame, hazard_specific: bool = True, stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame ¶

est_total_pop(stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame ¶

`PopEstimator(pop_data: str | Path, admin_data: str | Path | gpd.GeoDataFrame | None = None)` ¶

`est_exposed_pop(hazard_data: str | Path | gpd.GeoDataFrame, hazard_specific: bool = True, stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame` ¶

`est_total_pop(stat: Literal['sum', 'mean'] = 'sum') -> pd.DataFrame` ¶