Data Filtering Using Python Pandas

Uzair Adamjee
4 min readAug 27, 2023
Photo by Nathan Dumlao on Unsplash

When working with large datasets in Python, efficient data filtering is crucial to extract the specific information you need. Python’s Pandas library offers an array of powerful tools for data manipulation, and filtering is no exception. In this article, we’ll explore five effective ways to filter data using Python Pandas, along with relevant examples for each technique.

1. Filtering with Greater Than (>) or Equal To (==)

Filtering data based on numerical conditions like greater than or equal to can be achieved effortlessly with Pandas. This technique allows you to narrow down your dataset based on specific numeric values.

Example: Suppose you have a dataset of sales records, and you want to extract all entries where the sales amount is greater than $1000.

import pandas as pd

# Creating a sample DataFrame
data = {'Product': ['A', 'B', 'C', 'D'],
'SalesAmount': [1200, 800, 1500, 600]}
df = pd.DataFrame(data)

# Filtering data where SalesAmount is greater than 1000
filtered_data = df[df['SalesAmount'] > 1000]
print(filtered_data)

2. Filtering with .isin()

The .isin() function in Pandas allows you to filter data based on a list of predefined values. This is particularly useful when you want to extract rows that match specific categorical values.

Example: Assume you have a dataset of customer information, and you want to extract entries for customers located in New York or Los Angeles.

import pandas as pd

# Creating a sample DataFrame
data = {'CustomerID': ['A', 'B', 'C', 'D'],
'Location': ['New York', 'Los Angeles', 'Chicago', 'Los Angeles', 'Miami']}
df = pd.DataFrame(data)

# Filtering data for customers in New York or Los Angeles
cities_to_filter = ['New York', 'Los Angeles']
filtered_data = df[df['Location'].isin(cities_to_filter)]
print(filtered_data)

3. Filtering with .str.contains()

When dealing with text data, the .str.contains() function enables you to filter rows containing specific substrings. This technique is valuable for scenarios where you're interested in certain patterns within your text-based dataset.

Example: Imagine you have a dataset of product names, and you want to extract products that contain the word ‘premium’ in their names.

import pandas as pd

# Creating a sample DataFrame
data = {'ProductID': [1, 2, 3, 4, 5],
'ProductName': ['Premium Laptop', 'Tablet', 'Premium Smartphone', 'Keyboard', 'Budget Headphone']}
df = pd.DataFrame(data)

# Filtering products with 'premium' in their names
filtered_data = df[df['ProductName'].str.contains('premium', case=False)]
print(filtered_data)

# Filtering products with 'phones' in their names
filtered_data = df[df['ProductName'].str.contains('phone', case=False)]
print(filtered_data)

4. SQL-like Filtering with .query()

Pandas provides a .query() method that allows you to perform SQL-like filtering on your DataFrame. This can simplify complex filtering conditions and make your code more readable.

Example: Suppose you have a dataset of students’ exam scores, and you want to extract entries where both math and physics scores are greater than 80.

import pandas as pd

# Creating a sample DataFrame
data = {'StudentID': [1, 2, 3, 4, 5],
'MathScore': [90, 75, 85, 95, 70],
'PhysicsScore': [85, 88, 92, 76, 90]}
df = pd.DataFrame(data)

# SQL-like filtering using .query()
filtered_data = df.query('MathScore > 80 and PhysicsScore > 80')
print(filtered_data)

5. Filtering with .where()

The .where() function in Pandas allows you to create a new DataFrame with the same shape as the original, where rows not satisfying the condition are replaced with NaN values.

Example: Assume you have a dataset of temperature records, and you want to keep only the rows where the temperature is above 25°C.

import pandas as pd

# Creating a sample DataFrame
data = {'Date': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04'],
'Temperature': [26.5, 24.8, 27.2, 23.0]}
df = pd.DataFrame(data)

# Filtering using .where() to keep rows with temperature above 25°C
filtered_data = df.where(df['Temperature'] > 25)
print(filtered_data)

Conclusion

In the world of data manipulation, Python Pandas shines as a versatile toolset, and filtering data is a breeze with its powerful techniques. We’ve delved into five methods, each with its own superpower:

  1. Numeric Filtering: Easily extract data based on numerical conditions. It’s like picking the juiciest apples from the orchard.
  2. .isin() Magic: Select rows matching specific categories effortlessly. It's like sorting your favorite books from the library.
  3. Text Treasure Hunt: Filter out rows containing certain text patterns. It’s like finding hidden gems in a treasure chest.
  4. SQL Simplified: Apply SQL-like filtering using .query(). It's like crafting a secret code to unlock your data's secrets.
  5. NaN Handling with .where(): Create a new view of your data with conditional replacement. It's like keeping only the sunniest days from a weather report.

With these techniques in your toolkit, you’re ready to filter data like a pro. By honing these skills, you’ll uncover valuable insights from your datasets, making your data-driven decisions even smarter. So go ahead, filter, and discover the power of precise data manipulation!

Thank you for reading!

Let’s Connect:

WRITER at MLearning.ai // AI Video // Multimodal Machine Learning

--

--

Uzair Adamjee

Unleash the power of data with my insights! Join me on a journey to explore the untold stories hidden in numbers. Let's dive in together.