Implementing Sklearn-like Transformers in Keras: A Custom Preprocessing Layer Example

Fernando Nieuwveldt
4 min read · Apr 22, 2023

In this article, we look at implementing a Sklearn-like transformer as a Keras preprocessing layer, using the MinMax scaler as the example.

Keras is a powerful deep learning library for building and training neural network models. Before training any model, however, the data usually needs to be preprocessed. Sklearn provides a variety of transformers for this, including MinMaxScaler, which scales each feature to a given range. In this blog post, we implement a custom Keras preprocessing layer, MinMaxScalerLayer, that behaves like Sklearn’s MinMaxScaler, and then apply both to the same data to check that their results match.
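For reference, this is the mapping both scalers apply. It is written per feature (column), with (a, b) = feature_range and X_min, X_max the minimum and maximum of that feature in the data the scaler was fitted on:

```latex
X_{\mathrm{std}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}, \qquad
X_{\mathrm{scaled}} = a + X_{\mathrm{std}}\,(b - a)
```

For example, with X_min = 2, X_max = 10 and the default range (0, 1), the value 4 maps to (4 - 2) / 8 = 0.25.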

The MinMaxScalerLayer

Our custom preprocessing layer, MinMaxScalerLayer, scales each feature to the range given by feature_range. It subclasses the public tf.keras.layers.Layer class and adds its own adapt method, so it can be placed in a Keras model like any built-in layer. The main components of the layer are:

  • __init__ method: Stores the target feature_range and initializes the state, which stays empty until adapt is called.
  • adapt method: Computes the minimum and maximum of each feature in the input data. These two tensors are the scaler’s state, the counterpart of fit in Sklearn.
  • call method: Applies the scaling transformation to the input data, the counterpart of transform.
  • get_config method: Returns the layer configuration (here, feature_range) for serialization. The adapted minimum and maximum are not part of the config, so a layer recreated from its config must be adapted again.

Here is the implementation of an equivalent MinMax scaler as a Keras layer:

import tensorflow as tf


class MinMaxScalerLayer(tf.keras.layers.Layer):
    def __init__(self, feature_range=(0, 1), **kwargs):
        super().__init__(**kwargs)
        self.feature_range = feature_range
        # Scaler state, filled in by adapt()
        self.data_min = None
        self.data_max = None

    def adapt(self, data):
        # Per-feature (column-wise) minimum and maximum, computed in float32
        data = tf.cast(tf.convert_to_tensor(data), tf.float32)
        self.data_min = tf.math.reduce_min(data, axis=0)
        self.data_max = tf.math.reduce_max(data, axis=0)

    def call(self, inputs):
        if self.data_min is None or self.data_max is None:
            raise RuntimeError("The layer has not been adapted. Call 'adapt' before using the layer.")

        inputs = tf.cast(inputs, tf.float32)
        data_range = self.data_max - self.data_min
        # A constant feature has a zero range; like Sklearn, treat it as 1 to avoid division by zero
        data_range = tf.where(tf.equal(data_range, 0.0), tf.ones_like(data_range), data_range)
        scaled_data = (inputs - self.data_min) / data_range
        low, high = self.feature_range
        return low + scaled_data * (high - low)

    def get_config(self):
        config = super().get_config()
        config.update({"feature_range": self.feature_range})
        return config
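Since adapt and call only use per-column reductions and elementwise arithmetic, the same logic can be checked without TensorFlow. The class below is a hypothetical NumPy-only sketch of the adapt/call split (the name NumpyMinMaxScaler is an illustration, not part of any library). It also follows Sklearn’s convention of treating a zero range as 1, so a constant feature maps to the lower end of feature_range instead of producing NaN:

```python
import numpy as np


class NumpyMinMaxScaler:
    """Hypothetical NumPy-only sketch of the layer's adapt/call logic."""

    def __init__(self, feature_range=(0.0, 1.0)):
        self.feature_range = feature_range
        self.data_min = None  # set by adapt()
        self.data_max = None  # set by adapt()

    def adapt(self, data):
        data = np.asarray(data, dtype=np.float32)
        # Per-feature (column-wise) statistics form the scaler's state
        self.data_min = data.min(axis=0)
        self.data_max = data.max(axis=0)

    def __call__(self, inputs):
        if self.data_min is None or self.data_max is None:
            raise RuntimeError("Call 'adapt' before using the scaler.")
        inputs = np.asarray(inputs, dtype=np.float32)
        data_range = self.data_max - self.data_min
        # Treat a zero range (constant feature) as 1 to avoid dividing by zero
        data_range = np.where(data_range == 0, 1.0, data_range)
        low, high = self.feature_range
        return low + (inputs - self.data_min) / data_range * (high - low)


scaler = NumpyMinMaxScaler(feature_range=(-1.0, 1.0))
# The second column is constant, so its range is zero
scaler.adapt([[0.0, 5.0], [2.0, 5.0], [4.0, 5.0]])
print(scaler([[2.0, 5.0]]).tolist())  # → [[0.0, -1.0]]
```

Keeping the statistics in adapt and the arithmetic in call is what lets the Keras version compute its state once, outside training, and then reuse it inside the model.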

Comparing MinMaxScalerLayer with Sklearn’s MinMaxScaler

To demonstrate the similarities between our custom preprocessing layer and Sklearn’s MinMaxScaler, we will use the same data for both methods and compare the results.

First, let’s generate a random dataset (the labels are not used in this comparison):

import numpy as np

data = np.random.randn(100, 10).astype(np.float32)
labels = np.random.randn(100, 1)

Now, let’s preprocess the data using Sklearn’s MinMaxScaler:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
data_scaled_sklearn = scaler.fit_transform(data)

Next, we will preprocess the data using our custom MinMaxScalerLayer:

minmax_scaler_layer = MinMaxScalerLayer(feature_range=(0, 1))
minmax_scaler_layer.adapt(data)
data_scaled_keras = minmax_scaler_layer(data)

Finally, we can compare the results:

print("Sklearn MinMaxScaler result:")
print(data_scaled_sklearn)

print("Keras MinMaxScalerLayer result:")
print(data_scaled_keras.numpy())

print("Difference between Sklearn and Keras results:")
print(np.abs(data_scaled_sklearn - data_scaled_keras.numpy()))
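The printed difference may not be exactly zero, because Sklearn’s MinMaxScaler applies the same mapping in a different arithmetic order: during fit it precomputes scale_ = (b - a) / (X_max - X_min) and min_ = a - X_min * scale_, and transform returns X * scale_ + min_. The NumPy sketch below reproduces both orderings on float32 data (the random data and the 1e-6 tolerance are assumptions chosen for illustration) and shows they agree to within float32 rounding, which is why a tolerance check such as np.allclose is more informative than exact equality:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 10)).astype(np.float32)
low, high = 0.0, 1.0

data_min = data.min(axis=0)
data_max = data.max(axis=0)
data_range = data_max - data_min

# Ordering used by the Keras layer: subtract, divide, then stretch and shift
layer_style = low + (data - data_min) / data_range * (high - low)

# Ordering used by Sklearn's MinMaxScaler: precomputed scale_ and min_
scale = (high - low) / data_range
offset = low - data_min * scale
sklearn_style = data * scale + offset

max_abs_diff = np.abs(layer_style - sklearn_style).max()
print(f"max |difference|: {max_abs_diff:.2e}")
print(np.allclose(layer_style, sklearn_style, atol=1e-6))
```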

Preparing Real-World Data

In this section, we use the custom MinMaxScalerLayer on a real-world dataset: the red-wine subset of the “Wine Quality” dataset from the UCI Machine Learning Repository, which contains physicochemical properties of wines along with their quality ratings.

First, let’s load the dataset and split it into features and labels:

import pandas as pd
from sklearn.model_selection import train_test_split

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data = pd.read_csv(url, delimiter=";")
features = data.drop("quality", axis=1).values.astype(np.float32)
labels = data["quality"].values.reshape(-1, 1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

Building and Training a Keras Model

We will now build a simple Keras model using the custom MinMaxScalerLayer and train it on the Wine Quality dataset. The performance of the model will be evaluated on the test set.

import tensorflow as tf

# Instantiate and adapt the custom preprocessing layer
minmax_scaler_layer = MinMaxScalerLayer(feature_range=(0, 1))
minmax_scaler_layer.adapt(X_train)

# Create and compile a Keras model using the custom preprocessing layer
model = tf.keras.Sequential([
    minmax_scaler_layer,
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=2)

# Evaluate the model on the test set
test_loss, test_mae = model.evaluate(X_test, y_test, verbose=2)
print(f"Test MAE: {test_mae:.4f}")

This example shows the custom MinMaxScalerLayer in a Keras model on a real-world dataset. The layer is adapted on the training split only, so no statistics from the test set leak into the scaling, and because it is part of the model, the same scaling is applied automatically whenever the model is trained, evaluated or used for prediction.
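Because adapt only sees X_train, test values that fall outside the training minimum and maximum are mapped outside feature_range rather than clipped; neither this layer nor Sklearn’s MinMaxScaler (with its default clip=False) clips them. A small NumPy illustration with made-up single-feature values (the helper scale_with_train_stats is hypothetical):

```python
import numpy as np

# Made-up single-feature values, for illustration only
x_train = np.array([[4.0], [6.0], [8.0]], dtype=np.float32)
x_test = np.array([[3.0], [9.0]], dtype=np.float32)

# Statistics come from the training split only
train_min = x_train.min(axis=0)
train_max = x_train.max(axis=0)


def scale_with_train_stats(x, low=0.0, high=1.0):
    # Hypothetical helper: min-max scaling with the training statistics
    return low + (x - train_min) / (train_max - train_min) * (high - low)


print(scale_with_train_stats(x_train).ravel().tolist())  # → [0.0, 0.5, 1.0]
print(scale_with_train_stats(x_test).ravel().tolist())   # → [-0.25, 1.25]
```

For a network this is usually harmless, but it is worth remembering when the downstream code assumes inputs strictly inside feature_range.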

Conclusion

In this blog post, we demonstrated how to implement a custom preprocessing layer in Keras that imitates Sklearn’s MinMaxScaler. The MinMaxScalerLayer can be integrated into a Keras model so that feature scaling becomes part of the model itself. Comparing its output with Sklearn’s MinMaxScaler on the same data showed that the two agree up to floating-point rounding, highlighting how flexible Keras is for building custom transformers. The same pattern, computing state in adapt and applying it in call, extends to other preprocessing tasks, making it easier to bring Sklearn-like transformers directly into Keras models.


Fernando Nieuwveldt

I am an ML Engineer | Data scientist with interests in Deep learning and building systems and software for ML. https://www.linkedin.com/in/fernandonieuwveldt