A collection of custom PySpark ML transformers that demonstrate how to extend Spark's ML pipeline capabilities with custom transformation logic.
This library includes three example transformers of increasing complexity:
- CustomImputer: a basic transformer that handles missing values (see the first sketch after this list)
  - Replaces nulls in string columns with 'none'
  - Replaces nulls in numeric columns with -99
  - Demonstrates basic transformation without fitting
- CustomAdder: an intermediate transformer for feature engineering (second sketch below)
  - Combines multiple numeric columns through addition
  - Shows how to work with multiple input columns
  - Creates derived features
- TargetEncoder: an advanced transformer implementing the full estimator-transformer pattern (third sketch below)
  - Learns per-category means of the target column during fit()
  - Applies the learned encodings during transform()
  - Demonstrates stateful transformations
  - Includes a separate Model class for serialization
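The real implementations live in the `custom_spark_transformers` package. The three sketches below are simplified illustrations only: the `Sketch*` class names and their internals are assumed for this README, not taken from the library. A stateless transformer in the style of CustomImputer just subclasses `Transformer` and overrides `_transform()`:

```python
from pyspark.ml import Transformer
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
from pyspark.sql.types import NumericType, StringType


class SketchImputer(Transformer, DefaultParamsReadable, DefaultParamsWritable):
    """Hypothetical stand-in for CustomImputer."""

    def _transform(self, df):
        # The fill values are fixed, so no fit() step is needed:
        # a plain Transformer with an overridden _transform() suffices.
        for field in df.schema.fields:
            if isinstance(field.dataType, StringType):
                df = df.fillna("none", subset=[field.name])
            elif isinstance(field.dataType, NumericType):
                df = df.fillna(-99, subset=[field.name])
        return df
```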
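A CustomAdder-style transformer additionally declares its parameters through PySpark's shared param mixins (`HasInputCols`, `HasOutputCol`), so it can be configured like any built-in stage (again, an assumed sketch):

```python
from functools import reduce
from operator import add

from pyspark import keyword_only
from pyspark.ml import Transformer
from pyspark.ml.param.shared import HasInputCols, HasOutputCol
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
from pyspark.sql import functions as F


class SketchAdder(Transformer, HasInputCols, HasOutputCol,
                  DefaultParamsReadable, DefaultParamsWritable):
    """Hypothetical stand-in for CustomAdder."""

    @keyword_only
    def __init__(self, inputCols=None, outputCol=None):
        super().__init__()
        # keyword_only stashes the constructor kwargs in self._input_kwargs.
        self._set(**self._input_kwargs)

    def _transform(self, df):
        # Element-wise sum of the configured input columns.
        total = reduce(add, [F.col(c) for c in self.getInputCols()])
        return df.withColumn(self.getOutputCol(), total)
```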
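The estimator-transformer pattern splits a TargetEncoder-style stage in two: an `Estimator` that learns state in `_fit()` and a separate `Model` that applies it in `_transform()`. The sketch below is likewise hypothetical; keeping the learned per-category means as a small DataFrame and applying them with a join is a common way to scale this kind of lookup:

```python
from pyspark import keyword_only
from pyspark.ml import Estimator, Model
from pyspark.ml.param import Param, Params
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.sql import functions as F


class SketchTargetEncoder(Estimator, HasInputCol, HasOutputCol):
    """Hypothetical stand-in for TargetEncoder."""

    targetCol = Param(Params._dummy(), "targetCol",
                      "numeric column whose per-category mean is learned")

    @keyword_only
    def __init__(self, inputCol=None, targetCol=None, outputCol=None):
        super().__init__()
        self._set(**self._input_kwargs)

    def _fit(self, df):
        # fit(): learn the mean of the target for each category value.
        means = (df.groupBy(self.getInputCol())
                   .agg(F.mean(self.getOrDefault(self.targetCol))
                         .alias("__target_mean__")))
        model = SketchTargetEncoderModel(means)
        model._set(inputCol=self.getInputCol(), outputCol=self.getOutputCol())
        return model


class SketchTargetEncoderModel(Model, HasInputCol, HasOutputCol):
    """Separate model class holding the learned state; a real one would
    also implement ML read/write so the table survives serialization."""

    def __init__(self, means=None):
        super().__init__()
        self.means = means  # small DataFrame: category -> learned mean

    def _transform(self, df):
        # transform(): apply the learned encodings via a left join.
        lookup = self.means.withColumnRenamed("__target_mean__",
                                              self.getOutputCol())
        return df.join(lookup, on=self.getInputCol(), how="left")
```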
Example usage:

```python
from pyspark.ml import Pipeline

from custom_spark_transformers import CustomImputer, CustomAdder, TargetEncoder

# Create the transformers
imputer = CustomImputer()
adder = CustomAdder(inputCols=["col1", "col2"], outputCol="sum")
encoder = TargetEncoder(inputCol="category", targetCol="y", outputCol="encoded")

# Build the pipeline
pipeline = Pipeline(stages=[imputer, adder, encoder])

# Fit on the training data, then transform the test data
model = pipeline.fit(train_df)
result_df = model.transform(test_df)
```
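The snippet above assumes `train_df` and `test_df` already exist. For a quick local check, toy frames reusing the same column names could be built like this (illustrative data only):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Toy frames matching the column names used in the pipeline above.
train_df = spark.createDataFrame(
    [(1.0, 2.0, "a", 1.0), (3.0, None, "b", 0.0), (None, 4.0, "a", 1.0)],
    ["col1", "col2", "category", "y"],
)
test_df = spark.createDataFrame(
    [(5.0, 6.0, "b", 0.0)],
    ["col1", "col2", "category", "y"],
)
```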
The library requires:

- PySpark 3.x
- Python 3.7+
To install, clone this repository:

```bash
git clone https://github.com/yourusername/custom_spark_transformers.git
```
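Assuming the repository ships packaging metadata (a setup.py or pyproject.toml, not shown here), it can then be installed in editable mode:

```bash
# Hypothetical install step; requires packaging metadata in the repo.
pip install -e ./custom_spark_transformers
```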
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.