Inverse Distance Weighting Interpolation in Python

Pareekshith Katti
6 min readAug 18, 2023

Geospatial interpolation is a process used to estimate values for unknown points in a geographical area using known values.

Inverse Distance Weighting, or IDW for short, is one of the most popular methods used for geospatial interpolation. This article will teach us how to do IDW interpolation in Python.

The IDW interpolation method assumes that the closer values are more related than the farther ones. IDW estimates the value at an unknown point by using the known values weighted by their distance from the point whose value is to be estimated.

We can also use another variable called “power” in IDW to control the influence of known values. Higher power will increase the influence of nearby values.

Here is a simple DIY implementation of IDW in Python

import numpy as np

def inverse_distance_weighting(distances, values, power):
# Calculate the weighted sum of known values based on distances
numerator = np.sum(values / (distances ** power))

# Calculate the sum of weights (inverse of distances)
weights = np.sum(1 / (distances ** power))

# Calculate the interpolated value using Inverse Distance Weighting (IDW) formula
interpolated_value = numerator / weights

return interpolated_value

# Example data: distances and corresponding values
distances = np.array([1.38, 3.04, 7.36, 7.59, 9.67])
values = np.array([17.7, 15.7, 34.3, 6.3, 23.3])

# Power parameter to control the influence of known values (try different values)
power = 1

# Calculate and print the interpolated value using the IDW algorithm
interpolated_value = inverse_distance_weighting(distances, values, power)
print("Interpolated Value:", interpolated_value)

In the IDW algorithm:

- The numerator adds up the known values, dividing each by its distance raised to a power. This emphasizes the impact of closer values on the estimate for the unknown point.

- The denominator sums up the inverses of distances, also raised to the same power. This normalizes the influence of neighboring values, ensuring the estimate considers varying distances.

The power parameter:

- Controls the strength of nearby values’ impact. Higher values make closer values very important, while lower values spread influence more evenly among neighbors.

Using Built-in Libraries to Perform IDW Interpolation

In the above example, we used precalculated distances, but in most practical use cases, we have to calculate the distance ourselves. We can use the haversine distance to calculate it, but when we have a lot of points, this can be cumbersome. This is where we can use preexisting libraries that calculate the distance and perform the interpolation.

In our example, we will be interpolating PM2.5 values from air quality data for the city of Bengaluru.

grid_space = 0.01
grid_lon = np.arange(np.amin(lons), np.amax(lons), grid_space)
grid_lat = np.arange(np.amin(lats), np.amax(lats), grid_space)

Let us first generate a grid of points for which we need to estimate values. We’ve set approximately 1 km of grid space.

“lons” contains a list of longitudes and “lats” contains a list of latitudes. We’ve generated the grid using the min and max of longitudes and latitudes.

all_lats = np.meshgrid(grid_lon, grid_lat)[1].ravel()
all_lons = np.meshgrid(grid_lon, grid_lat)[0].ravel()
itrp = pd.DataFrame()
itrp['lat'] = all_lats
itrp['lng'] = all_lons

In the above code, we created a data frame containing the pairs of latitude and longitude for which we need to estimate values. The code does the following:

  1. It uses `np.meshgrid` to create a grid of latitude and longitude values from the `grid_lon` and `grid_lat` arrays. The `[1]` index extracts the latitude values, and the `[0]` index extracts the longitude values.
  2. The `.ravel()` method flattens the arrays, converting them into one-dimensional arrays containing all latitude and longitude values.
  3. It creates an empty DataFrame named `itrp`.
  4. It adds the flattened latitude array to the DataFrame as a column named `’lat’`.
  5. It adds the flattened longitude array to the DataFrame as a column named `’lng’`.

We can use Sklearn’s KNN implementation to simulate IDW. The code to do it is given below.

x = sample[['lat', 'lng']]
y = sample['PM2.5']
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor(algorithm='kd_tree', n_neighbors=8, weights='distance').fit(x, y)

The “sample” dataframe contains the station air quality data for a single timestamp. We give latitude and longitude as explanatory variables and PM2.5 as the variable that needs to be interpolated. We should use “kd_tree” as the algorithm and set “n_neighbors” as the number of stations, which in this case is eight. We should also set “weights” as the distance to perform IDW.

pred=model.predict(itrp[['lat','lng']])

We will use the predict method to estimate the values for our grid points, which are stored in the itrp dataframe.

We’ll now load some shapefiles to help us visualize the interpolation results.

# Read shapefile into the data GeoDataFrame
data = gpd.read_file('Taluk_Boundary.shp')

# Filter data to select rows with KGISDistri = 20 and limit to the first 4 rows
data = data[data['KGISDistri'] == 20].iloc[0:4]

# Convert itrp's latitude and longitude columns to a geometry column
itrp = gpd.GeoDataFrame(itrp, geometry=gpd.points_from_xy(itrp.lng, itrp.lat))

# Convert stn's latitude and longitude columns to a geometry column
stn = gpd.GeoDataFrame(stn, geometry=gpd.points_from_xy(stn.lng, stn.lat))

# Convert sample's latitude and longitude columns to a geometry column
sample = gpd.GeoDataFrame(sample, geometry=gpd.points_from_xy(sample.lng, sample.lat))

# Set CRS of sample GeoDataFrame to EPSG:4326 (WGS 84)
sample.crs = {'init': 'epsg:4326'}

# Convert CRS of sample GeoDataFrame to match the CRS of the data GeoDataFrame
sample = sample.to_crs(data.crs)

# Set CRS of stn GeoDataFrame to EPSG:4326
stn.crs = {'init': 'epsg:4326'}

# Convert CRS of stn GeoDataFrame to match the CRS of the data GeoDataFrame
stn = stn.to_crs(data.crs)

# Set CRS of itrp GeoDataFrame to EPSG:4326
itrp.crs = {'init': 'epsg:4326'}

# Convert CRS of itrp GeoDataFrame to match the CRS of the data GeoDataFrame
itrp = itrp.to_crs(data.crs)

The “data” contains the shapefile for the city of Bengaluru.

We converted “itrp,” “sample,” and “stn” to GeoDataFrame to plot the points.

Lastly, we set the coordinate reference system (CRS, for short) for all the newly created geodataframes. This should be the same as our shapefile “data”.

The following is the result of the interpolation.

import geopandas as gpd
import matplotlib.pyplot as plt

# Create a plot of the 'data' GeoDataFrame with white background, black edges
ax = data.plot(color='white', edgecolor='black', figsize=(25, 30))

# Plot 'itrp' GeoDataFrame with PM2.5 values as colors and legend
itrp.plot(ax=ax, column='PM2.5', markersize=50, figsize=(25, 30), legend=True)

# Plot 'sample' GeoDataFrame with markers, PM2.5 values as colors, and labels
sample.plot(ax=ax, marker='o', column='PM2.5', markersize=50, figsize=(25, 30), label='SiteName')

# Annotate points in the 'sample' GeoDataFrame with labels
for x, y, label in zip(sample.geometry.x, sample.geometry.y, sample.SiteName):
ax.annotate(label.split(',')[0], xy=(x, y), xytext=(10, 10), textcoords="offset points")

# Show the plot
plt.show()

We can also use other IDW implementations, one of which is shown below.

from photutils.utils import ShepardIDWInterpolator as idw

# Create an instance of ShepardIDWInterpolator with sample lat/lng and PM2.5 data
vals = idw(sample[['lat', 'lng']].to_numpy(), sample['PM2.5'].to_numpy())

# Interpolate PM2.5 values for itrp using the previously created instance
itrp['PM2.5'] = vals(itrp[['lat', 'lng']])

Let us plot and see the results.

import geopandas as gpd
import matplotlib.pyplot as plt

# Create a plot of the 'data' GeoDataFrame with white background, black edges
ax = data.plot(color='white', edgecolor='black', figsize=(25, 30))

# Plot 'itrp' GeoDataFrame with PM2.5 values as colors and legend
itrp.plot(ax=ax, column='PM2.5', markersize=50, figsize=(25, 30), legend=True)

# Plot 'sample' GeoDataFrame with markers, PM2.5 values as colors, and labels
sample.plot(ax=ax, marker='o', column='PM2.5', markersize=50, figsize=(25, 30), label='SiteName')

# Annotate points in the 'sample' GeoDataFrame with labels
for x, y, label in zip(sample.geometry.x, sample.geometry.y, sample.SiteName):
ax.annotate(label.split(',')[0], xy=(x, y), xytext=(10, 10), textcoords="offset points")

# Show the plot
plt.show()

As we can see, both methods’ results are the same.

And that’s the end of the tutorial. In this tutorial, we learned the basics of IDW by manually implementing IDW from scratch and using built-in libraries to perform IDW. Although there are a lot of other methods to perform interpolation, IDW is one of the easiest-to-understand and most robust methods. Hence, it’s one of the most popular methods.

--

--