Harnessing the Power of GIS and Python for Property Value Analysis at Scale

ODSC - Open Data Science
6 min read · Mar 4, 2024

Editor’s note: Mike Dezube is a speaker for ODSC East this April 23–25. Be sure to check out his talk, “Unlocking Insights in Home Values: A Multimillion-Row Journey with Polars,” there!

Tremendous amounts (petabytes) of public information are available via the Census Bureau and local municipalities, and we at Charles River Data love exploring this data and the stories it tells, both for our clients and for the public good! In this tutorial we'll cover how to acquire this data, process it, and understand it at scale, starting with all 2.5M property records in Massachusetts.

We'll be doing a deep dive on Massachusetts home property values at the ODSC conference in April, exploring how they have changed over time due to factors such as the advent of COVID, proximity to Boston, and population density. In the final section we'll show how to extend this analysis to any state.

But as a preview for this talk, let’s get our feet wet with some explorations to understand the data at hand.

Average home property value analysis by town in Massachusetts. A visual we’ll build together at ODSC this year.

Loading all property values in Massachusetts

Per mass.gov, there are about 2.5M properties in Massachusetts. In the code below we'll load a 2024 export from mass.gov and explore the 180,312 properties in Boston. This data is available for every state, and as a full national set as well; contact us for details.

Let’s get started.

# See https://github.com/mdezube/property-assessments/blob/main/README.md for
# conda install directions
import geopandas as gpd

# Load in the geometric boundaries of all properties in eastern MA. We could readily
# load western too and merge, but for this quick tutorial we'll start with eastern.
# Download from https://www.mass.gov/forms/massgis-request-statewide-parcel-data
EASTERN_MA_SHP_FILE = "<location on your machine>/L3_TAXPAR_POLY_ASSESS_EAST.shp"
eastern_ma_parcels = gpd.read_file(
    EASTERN_MA_SHP_FILE,
    engine="pyogrio",
    use_arrow=True,
)

# Normalize common abbreviations of state-owned parcels so owners group cleanly later.
eastern_ma_parcels["OWNER1"] = eastern_ma_parcels["OWNER1"].str.replace(
    r"COMMONWLTH\b|COMMWLTH\b", "COMMONWEALTH", regex=True
)
eastern_ma_parcels["OWNER1"] = eastern_ma_parcels["OWNER1"].str.replace(
    r"MASS\b", "MASSACHUSETTS", regex=True
)

boston_parcels = eastern_ma_parcels[eastern_ma_parcels["CITY"].str.upper() == "BOSTON"]
print(
    f"{eastern_ma_parcels.shape[0]:,} properties – {boston_parcels.shape[0]:,} in Boston."
)

1,879,297 properties — 180,312 in Boston.

Let's view a random sample of 10 to get a feel for the data. There are >40 columns describing each property, but we'll focus on just a handful in this tutorial. The data tell us quite a bit: the exact address, the value of the building and of the land, the style, the use type (see full details here), and even who owns it (a column we'll explore later).

# A reproducible random sample of 10 parcels.
boston_parcels[[
    "CITY", "ZIP", "SITE_ADDR", "TOTAL_VAL", "BLDG_VAL", "LAND_VAL", "RES_AREA", "STYLE",
    "USE_CODE"
]].sample(10, random_state=0)

What are the most expensive properties in Boston?

Now that we have the properties, we can start interrogating them a bit. Let's start with the simplest of questions: what are the top 10 most expensive?

boston_parcels[[
    "SITE_ADDR", "TOTAL_VAL", "BLDG_VAL", "LAND_VAL", "LOT_SIZE", "OWNER1", "STYLE"
]].set_index("SITE_ADDR").sort_values(by="TOTAL_VAL", ascending=False).head(10)

Not terribly surprising: the most expensive properties are massive entities well known in Boston, such as the Hancock Building (#1) and Brigham & Women's Hospital (#2). There are, however, two interesting examples here that buck the trend, either by being massive in size or by lacking size information. These are #4, UMass Boston, which stands out by owning a stunning 170 acres in Boston, and part of Harvard University (#6), the largest building Harvard has ever built, which per the tax records hasn't had its land value assessed yet. This may seem odd, but as a nonprofit, taxes aren't always due, and hence the tax-assessed land value isn't as critical. Or perhaps the land underneath the building is owned by another entity (a potential fun exploration for the reader).

Perhaps more interesting, we can take the same data but filter to a residential USE_CODE, so we can see the most expensive homes in Boston and who owns them, or, as you'll see for #3 and #4 below, the ones that have yet to sell. Note we omitted LOT_SIZE, which is less interesting for apartments (it's 0), but added RES_AREA, the square footage of the unit.

boston_parcels[boston_parcels["USE_CODE"].str.startswith("10", na=False)][[
    "SITE_ADDR", "TOTAL_VAL", "BLDG_VAL", "LAND_VAL", "RES_AREA", "OWNER1", "STYLE"
]].set_index("SITE_ADDR").sort_values(by="TOTAL_VAL", ascending=False).head(10)

Although quite large, these values are accurate (as a potential follow-up, you can explore these properties on Zillow to get a sense of what a $34 million apartment looks like).

Who owns most of Boston?

Circling back to non-residential, how much of Boston do these large entities own?

Below we see that the City of Boston itself owns the largest share, which isn't terribly surprising, but what comes next is a bit more of a shock: Boston University owns 1.5% of the total property value in Boston, almost as much as the Commonwealth of Massachusetts (at 2.0%) and more than the federal government (at 0.6%). Harvard is at 1.2%, but that jumps dramatically if we also count the properties it owns in Cambridge.

import seaborn as sns

# Share of Boston's total assessed value held by each owner.
df = boston_parcels.groupby("OWNER1")["TOTAL_VAL"].sum().to_frame().reset_index()
df["PERCENT_OF_TOTAL"] = df["TOTAL_VAL"] / df["TOTAL_VAL"].sum() * 100
df = df.sort_values(by="PERCENT_OF_TOTAL", ascending=False)

ax = sns.barplot(
    data=df[:10], x="PERCENT_OF_TOTAL", y="OWNER1", palette="flare", hue="PERCENT_OF_TOTAL"
)
ax.set(xlabel="Percent of Boston Owned", ylabel="OWNER1")
for i in range(10):
    ax.bar_label(ax.containers[i], fontsize=11, fmt=" %0.1f%%")

If we ask the question another way, who owns the most land in Boston, the numbers change drastically, with the city owning 13.0% given its large ownership of parks.
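As a quick sketch of that variant, we can swap assessed value for land area; this assumes LOT_SIZE holds each parcel's land area in the MassGIS export:

# Rank owners by total land area instead of assessed value.
# Assumes LOT_SIZE is each parcel's land area per the MassGIS export.
land = boston_parcels.groupby("OWNER1")["LOT_SIZE"].sum().to_frame().reset_index()
land["PERCENT_OF_LAND"] = land["LOT_SIZE"] / land["LOT_SIZE"].sum() * 100
land.sort_values(by="PERCENT_OF_LAND", ascending=False).head(10)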

Wrap up

This exploration scratches the surface of what's possible with GIS, and it can be accelerated quite a bit by using https://pola.rs/, an important step that drops the analysis calls from tens of seconds to milliseconds (a minimal sketch follows the list below). Attend the talk on Unlocking Insights in Home Values: A Multimillion Row Journey with Polars to dive deeper across the full state: we'll look at home parcel definitions, town trends, and the impacts of COVID, and you'll learn how to continue exploring and expanding on your own using code we make readily available. The visual at the start of this post is one we'll build together, and here are a few more questions to think about, all answerable from this dataset:

  • Most expensive homes on each street / in each zip
  • Most expensive styles of homes, and which held their value the best
  • Homes that changed owners the most / least recently
  • Areas where commercial is high %, vs. mixed residential, vs. only residential
  • How to extend to other states and other nations
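To give a taste of the Polars speed-up, here's a minimal sketch (not the talk's exact code) of the owner aggregation from earlier, rewritten in Polars. It assumes the geometry column is dropped first, since Polars has no geometry dtype; the other column names come from the MassGIS export.

import polars as pl

# Move the parcel attributes into Polars; geometry is dropped because Polars
# has no geometry dtype.
parcels = pl.from_pandas(boston_parcels.drop(columns="geometry"))

# Total assessed value per owner, top 10. On millions of rows this kind of
# aggregation typically runs in milliseconds.
(
    parcels.group_by("OWNER1")
    .agg(pl.col("TOTAL_VAL").sum())
    .sort("TOTAL_VAL", descending=True)
    .head(10)
)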

Questions? Want to apply these techniques to your own data? Reach out at https://www.charlesriverdata.com/get-started

About the Author:

Mike Dezube is the Founder & CEO of Charles River Data, a Boston-based data science consulting firm. Charles River Data helps its clients solve complex problems through the use of advanced data science and machine learning. More than just engineering talent, Charles River has recruited from Google, Amazon, Meta, BCG, Jefferies, and JP Morgan, bringing experience across retail, operations, GIS, defense, healthcare tech, hospitals, insurance (health and other perils), digital marketing, finance, banking, and private equity.

Dezube has leveraged his 7+ years of experience at Google to form this all-star team, seed it with a wealth of industry experience, and attract seed funding to position Charles River Data for growth.

Mike also performs academic research at Mass General Brigham on cancer, opioid reduction, and improving post-surgical recovery, and acts as a GIS consultant for Blue Hills, the largest conservation land area in the Greater Boston Area.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event? Learn more about our upcoming events here.
