The SSOM class implements a Self-Organizing Map (SOM), a type of artificial neural network used for unsupervised learning. This manual explains each method and argument in the class.
som = SSOM(grid_size=10, num_iterations=100, max_learning_rate=0.1,
learning_decay=0.25, neighborhood_decay=0.5, metric="euclidean",
grid_shape="2D", log_interval=None)
- grid_size
  - Specifies the size of the SOM grid.
  - For grid_shape="1D", it is an integer (the number of nodes in the 1D grid).
  - For grid_shape="2D", it is a tuple (rows, cols) specifying the grid dimensions.
- num_iterations
  - Total number of iterations during training.
  - Determines how many steps the SOM will take to adjust to the data.
- max_learning_rate
  - The initial learning rate for updating the weights of the SOM.
  - The learning rate decreases over iterations based on learning_decay.
- learning_decay
  - Controls how quickly the learning rate decreases over time.
  - Higher values mean faster decay.
- neighborhood_decay
  - Controls how quickly the neighborhood radius decreases over time.
  - The radius affects which nodes in the SOM are influenced by a given data point.
- metric
  - Specifies the distance metric used to find the Best Matching Unit (BMU):
    - "euclidean": Uses Euclidean distance.
    - "ssim": Uses the Structural Similarity Index (SSIM) for image data.
- grid_shape
  - "1D": Creates a 1D SOM grid (linear topology).
  - "2D": Creates a 2D SOM grid (grid topology).
- log_interval
  - Controls how frequently the training progress is logged and recorded.
  - If not specified, it is set automatically so that no more than 100 logs are saved.
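To make the decay and logging parameters concrete, here is a minimal sketch. The manual does not state the exact schedule SSOM uses, so the exponential form and the helper names decayed_value and default_log_interval are assumptions made for illustration only, not the class's actual code.

```python
import math

def decayed_value(initial, decay, step, num_iterations):
    """Hypothetical exponential decay: a larger `decay` shrinks the value faster."""
    return initial * math.exp(-decay * step / num_iterations)

def default_log_interval(num_iterations, max_logs=100):
    """Hypothetical rule: smallest interval that yields at most `max_logs` logs."""
    return max(1, math.ceil(num_iterations / max_logs))

# Learning rate at the start, middle, and end of a 1000-iteration run
for step in (0, 500, 1000):
    print(step, round(decayed_value(0.5, 0.25, step, 1000), 4))

print(default_log_interval(1000))  # 10
print(default_log_interval(50))    # 1
```

Under this assumed schedule, doubling the decay value lowers the final learning rate, which matches the "higher values mean faster decay" description above.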
See test/ for more examples of how to run SSOM.
Assume we have the following two-dimensional sample data:
# Generate synthetic data
import numpy as np
x = np.random.normal((-0.5, -0.5), 0.2, (200, 2))
x = np.append(x, np.random.normal((-0.25, 1), 0.2, (100, 2)), axis=0)
x = np.append(x, np.random.normal((0.5, 0.5), 0.2, (150, 2)), axis=0)
input_data = x
Initialize SSOM class (2-D map topology)
from ssom import SSOM  # assumes the ssom module is installed and importable
# Initialize a 2D SOM with 10x10 grid
som = SSOM(grid_size=(10, 10), num_iterations=1000, max_learning_rate=0.5, metric="euclidean")
Initialize SSOM class (1-D ring topology)
# Initialize the SSOM with a 1D grid of size 20
som = SSOM(grid_size=20, num_iterations=5000, grid_shape="1D")
som.train(input_data)
- Trains the SOM on the input data (input_data).
- Training adjusts the weights of the SOM nodes to match the input data distribution.
- input_data: A NumPy array of shape (samples, dimensions). Each row is a data sample.
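For readers unfamiliar with how SOM training adjusts node weights, the loop below is a generic sketch for a 1D grid. It is not the SSOM implementation: the function name train_1d_som, the decay constants, and the Gaussian neighborhood are all assumptions chosen for illustration.

```python
import numpy as np

def train_1d_som(data, n_nodes=20, num_iterations=2000, max_lr=0.5, seed=0):
    """Illustrative 1D SOM training loop; not the SSOM implementation."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen samples so the nodes begin inside the data range
    weights = data[rng.integers(0, len(data), size=n_nodes)].astype(float)
    node_pos = np.arange(n_nodes)
    for step in range(num_iterations):
        frac = step / num_iterations
        lr = max_lr * np.exp(-3.0 * frac)                      # assumed schedule
        radius = max(n_nodes / 2 * np.exp(-3.0 * frac), 0.5)   # assumed schedule
        sample = data[rng.integers(len(data))]
        # Best Matching Unit: the node whose weights are closest to the sample
        bmu = np.argmin(np.linalg.norm(weights - sample, axis=1))
        # Gaussian neighborhood: nodes near the BMU on the grid move more
        influence = np.exp(-((node_pos - bmu) ** 2) / (2 * radius ** 2))
        weights += lr * influence[:, None] * (sample - weights)
    return weights

data = np.random.default_rng(1).normal((0.5, 0.5), 0.2, (150, 2))
weights = train_1d_som(data)
print(weights.shape)  # (20, 2)
```

Each update pulls the BMU and its grid neighbors a fraction of the way toward the sampled point, so over many iterations the nodes spread out over the regions where the data lies.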
The som_grid and sample_bmu_mapping attributes are critical for understanding the result of the SOM training:
- som_grid: Represents the learned weights of the SOM nodes and can be visualized to interpret the clustering.
- sample_bmu_mapping: Links each input data point to its nearest SOM node, showing how the SOM clusters the data.
- The som_grid attribute represents the weights of the SOM nodes. It stores the learned representation of the input data.
- Its shape depends on the topology of the SOM:
  - For grid_shape="1D", it is a 2D array of shape (grid_size, num_input_dims).
  - For grid_shape="2D", it is a 3D array of shape (grid_rows, grid_cols, num_input_dims).
- After training, som_grid contains the trained SOM nodes, which approximate the distribution of the input data.
# Access the trained SOM grid
print("Trained SOM Grid:")
print(som.som_grid)
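Because the shape of som_grid differs between the two topologies, it can be convenient to view the nodes as a flat list of weight vectors, for example before plotting them over the input data. The helper below is a small sketch; flatten_som_grid is a hypothetical name, and the zero arrays only stand in for a trained grid.

```python
import numpy as np

def flatten_som_grid(som_grid):
    """Return node weights as (num_nodes, num_input_dims) for 1D or 2D grids."""
    som_grid = np.asarray(som_grid)
    return som_grid.reshape(-1, som_grid.shape[-1])

grid_1d = np.zeros((20, 2))       # stand-in for a trained 1D som_grid
grid_2d = np.zeros((10, 10, 2))   # stand-in for a trained 2D som_grid
print(flatten_som_grid(grid_1d).shape)  # (20, 2)
print(flatten_som_grid(grid_2d).shape)  # (100, 2)
```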
- sample_bmu_mapping is a list of tuples, each containing:
  - The index of a sample in the input data.
  - The index of the Best Matching Unit (BMU) for that sample.
- This mapping shows how each input sample is clustered within the SOM grid.
- Use this attribute to understand the clustering of the input data points within the SOM.
# Access the sample-to-BMU mapping
print("Sample to BMU Mapping:")
for sample_index, bmu_index in som.sample_bmu_mapping:
    print(f"Sample {sample_index} maps to BMU at index {bmu_index}")
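One way to summarize the mapping is to count how many samples land on each node. The sketch below uses a hand-written stand-in list rather than a real som.sample_bmu_mapping, and it assumes each BMU index is hashable (an int or a tuple); if the indices are NumPy arrays, convert them with tuple() first.

```python
from collections import Counter

# Stand-in for som.sample_bmu_mapping on a 2D grid: (sample_index, bmu_index)
sample_bmu_mapping = [(0, (1, 2)), (1, (1, 2)), (2, (0, 0)), (3, (4, 3))]

# Count how many samples landed on each node
hits = Counter(bmu_index for _, bmu_index in sample_bmu_mapping)
for bmu_index, count in hits.most_common():
    print(f"Node {bmu_index}: {count} sample(s)")
```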
grid_indices = self._initialize_grid_indices()
- Initializes the indices of the SOM grid.
- For grid_shape="2D", creates a (rows, cols, 2) array where each element stores its row and column index.
# Create 2D grid indices for a 5x5 SOM
som = SSOM(grid_size=(5, 5), grid_shape="2D")
grid_indices = som._initialize_grid_indices()
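To show what a (rows, cols, 2) index array looks like, here is a standalone NumPy sketch. The function name make_grid_indices is hypothetical and this is not necessarily how _initialize_grid_indices builds the array internally.

```python
import numpy as np

def make_grid_indices(rows, cols):
    """Build a (rows, cols, 2) array whose [r, c] entry is [r, c]."""
    row_idx, col_idx = np.indices((rows, cols))
    return np.stack([row_idx, col_idx], axis=-1)

grid_indices = make_grid_indices(5, 5)
print(grid_indices.shape)   # (5, 5, 2)
print(grid_indices[2, 3])   # [2 3]
```

An array like this makes it straightforward to compute the grid distance from every node to the BMU in a single vectorized operation.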
self._initialize_grid(num_input_dims)
- Randomly initializes the weights of the SOM grid.
- For grid_shape="1D", creates a (grid_size, num_input_dims) array.
- For grid_shape="2D", creates a (rows, cols, num_input_dims) array.
num_input_dims
- The number of dimensions in the input data (e.g., 3 for RGB).
# Initialize a 1D SOM grid with 10 nodes and 3 input dimensions
som = SSOM(grid_size=10, grid_shape="1D")
som._initialize_grid(3)
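The resulting shapes can be reproduced with a few lines of NumPy. This is only a sketch: random_grid is a hypothetical helper, and drawing weights uniformly from [0, 1) is an assumption, since the manual does not say which distribution SSOM samples from.

```python
import numpy as np

def random_grid(grid_size, num_input_dims, seed=None):
    """Random weights for a 1D (int grid_size) or 2D (tuple grid_size) SOM."""
    rng = np.random.default_rng(seed)
    shape = (grid_size,) if isinstance(grid_size, int) else tuple(grid_size)
    return rng.random(shape + (num_input_dims,))

print(random_grid(10, 3).shape)       # (10, 3)
print(random_grid((5, 5), 3).shape)   # (5, 5, 3)
```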
self.ssim(x, y, C1=0.01**2, C2=0.03**2)
- Computes the Structural Similarity Index (SSIM) between two data points.
- Useful for comparing image-like data.
x, y
Input arrays to compare.
C1, C2
Small constants that stabilize the SSIM computation.
# Compare two RGB patches (patch1 and patch2 are NumPy arrays of the same shape)
ssim_value = som.ssim(patch1, patch2)
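For reference, the standard single-window SSIM formula with the same default constants can be written as below. This is a sketch of the textbook formula, not necessarily what SSOM.ssim computes internally; global_ssim is a hypothetical name, and the defaults for C1 and C2 implicitly assume data scaled to the range [0, 1].

```python
import numpy as np

def global_ssim(x, y, C1=0.01**2, C2=0.03**2):
    """Single-window SSIM over all elements of x and y (same shape)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    denominator = (mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2)
    return numerator / denominator

rng = np.random.default_rng(0)
patch1 = rng.random((8, 8, 3))
patch2 = rng.random((8, 8, 3))
print(global_ssim(patch1, patch1))  # identical patches: approximately 1.0
print(global_ssim(patch1, patch2))  # different patches: below 1.0
```

Note that SSIM is a similarity rather than a distance, so a BMU search based on it would look for the node with the highest value instead of the lowest.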
bmu_index = self.find_bmu(input_patch)
- Finds the Best Matching Unit (BMU) for a given input patch based on the specified metric ("euclidean" or "ssim").
input_patch
A data point from the input dataset.
- The index of the BMU:
  - For grid_shape="1D", an integer index.
  - For grid_shape="2D", a tuple (row, col).
# Find BMU for an input point
bmu = som.find_bmu(np.array([0.5, 0.5]))
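The Euclidean case of the BMU search can be sketched in plain NumPy as follows. find_bmu_euclidean is a hypothetical standalone function written for illustration; it mirrors the return types described above but is not taken from the SSOM source.

```python
import numpy as np

def find_bmu_euclidean(som_grid, input_patch):
    """Index of the node closest to input_patch (int for 1D, tuple for 2D)."""
    distances = np.linalg.norm(som_grid - input_patch, axis=-1)
    flat_index = int(np.argmin(distances))
    if distances.ndim == 1:
        return flat_index
    return tuple(int(i) for i in np.unravel_index(flat_index, distances.shape))

grid_2d = np.zeros((3, 3, 2))
grid_2d[1, 2] = [0.5, 0.5]          # place one node exactly on the query point
print(find_bmu_euclidean(grid_2d, np.array([0.5, 0.5])))  # (1, 2)

grid_1d = np.array([[0.0, 0.0], [1.0, 1.0], [0.4, 0.6]])
print(find_bmu_euclidean(grid_1d, np.array([0.5, 0.5])))  # 2
```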
# Train the SOM with 2D input data
data = np.random.rand(100, 2)
som.train(data)
The following attributes store historical data during training:
- som_grid_history: A list of SOM grid states logged at each log_interval.
- neighborhood_influence_history: A list of neighborhood influence values for each iteration.
- learning_rate_history: A list of learning rates at each log_interval.
- bmu_history: A list of BMU indices for selected data points at each log_interval.
- step_history: A list of iteration steps corresponding to the logs.
- selected_data_point_history: A list of input data points selected at each log_interval.
# Access learning rate history after training
print(som.learning_rate_history)
You can visualize the SOM training progress and final results using matplotlib.
import matplotlib.pyplot as plt

# Plot learning rate decay against the logged iteration steps
plt.plot(som.step_history, som.learning_rate_history)
plt.title("Learning Rate Decay")
plt.xlabel("Iteration Step")
plt.ylabel("Learning Rate")
plt.grid()
plt.show()