
Modules


The pbdl package consists of three modules, each designed for specific use cases in physics-based deep learning:

  • pbdl.loader: Provides basic dataset access using NumPy arrays.
  • pbdl.torch.loader: Supports dataset loading for training models in PyTorch.
  • pbdl.torch.phi.loader: Supports dataset loading for training models in PyTorch with an integrated solver.

pbdl.loader

This module is suitable for loading datasets when no training is involved and NumPy arrays are sufficient.

A Dataloader instance requires at least two arguments:

  • dataset name (positional): The name of the dataset to be loaded.
  • time_steps: The interval between input and target frame. If set to None, this interval is maximal (number of frames in the simulation minus one).

Additionally, it accepts the following keyword arguments:

  • sel_sims: Select specific simulations. By default, all simulations are included.
  • trim_start/trim_end: Discard the initial or final sequence of frames, which may be uninteresting.
  • step_size: Use every k-th frame (thinning out datasets with many frames). By default the step size is 1.
  • normalize_data/normalize_const: Choose from the available normalization strategies. By default normalization is disabled.
  • batch_size: Define the number of samples in each batch.
  • shuffle: Determine whether the samples should be provided in a random order.
  • intermediate_time_steps: If enabled, not only the initial and target frames are supplied but also all intermediate frames. Useful for computing accumulated errors over multiple time steps.

For a convenient way to use all simulation frames, set the all_time_steps flag. Note that this flag also controls the related settings time_steps, step_size, and intermediate_time_steps.
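
As a sketch of how these options combine, a loader that uses the maximal interval between input and target frame and thins out the frames in between could be set up as follows (the dataset name is taken from the example below; the parameter values, and the assumption that "std" is among the available normalization strategies, are only illustrative):

from pbdl.loader import Dataloader

loader = Dataloader(
    "incompressible-wake-flow-tiny",
    time_steps=None,        # maximal interval between input and target frame
    step_size=2,            # use every 2nd frame
    normalize_data="std",   # assuming "std" is one of the available strategies
    batch_size=4,
    shuffle=False,
)

inputs, targets = next(iter(loader))  # NumPy arrays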

The following code provides a minimal example:

from pbdl.loader import Dataloader
import matplotlib.pyplot as plt

loader = Dataloader(
    "incompressible-wake-flow-tiny",
    time_steps=10,  # interval between input and target frame
    sel_sims=[0],  # select first simulation
    batch_size=3,
    shuffle=True,
)

inputs, targets = next(iter(loader))

for i in range(len(inputs)):
    plt.subplot(2, len(inputs), i + 1)
    plt.imshow(inputs[i][0])  # display field at index 0
    plt.axis("off")
    plt.title("input {}".format(i + 1))

for i in range(len(targets)):
    plt.subplot(2, len(targets), len(targets) + i + 1)
    plt.imshow(targets[i][0])  # display field at index 0
    plt.axis("off")
    plt.title("target {}".format(i + 1))

plt.show()

pbdl.torch.loader

This module is suitable for loading datasets for training with PyTorch. Unlike the dataloader in the previous module, the dataloader from pbdl.torch.loader returns a pair (input tensor, target tensor), where both elements are PyTorch tensors. Each layer of the tensors represents a physical field or constant.
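
For instance, the layer layout of a batch can be inspected directly (a sketch; the exact number of field and constant layers and the spatial resolution depend on the dataset):

from pbdl.torch.loader import Dataloader

loader = Dataloader("transonic-cylinder-flow-tiny", time_steps=10, batch_size=3)

# Each batch is a pair of PyTorch tensors; every layer along the channel
# dimension corresponds to a physical field or a constant.
inputs, targets = next(iter(loader))
print(inputs.shape)   # (batch, layers, spatial dims ...)
print(targets.shape)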

The following code provides a minimal example:

import torch
import numpy as np
from pbdl.torch.loader import Dataloader
import examples.tcf.net_small as net_small

loader = Dataloader(
    "transonic-cylinder-flow-tiny",
    time_steps=10,
    sel_sims=[0, 1],
    step_size=3,
    normalize_data="std",
    batch_size=3,
    shuffle=True,
)

net = net_small.NetworkSmall()
criterionL2 = torch.nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.0001, weight_decay=0.0)

for epoch in range(5):
    for i, (input, target) in enumerate(loader):

        net.zero_grad()
        output = net(input)

        loss = criterionL2(output, target)
        loss.backward()
        optimizer.step()

    print(f"epoch {epoch}, loss {loss.item()}")

pbdl.torch.phi.loader

This module is suitable if you want to integrate a (PhiFlow) solver into the training loop of your PyTorch program. It introduces new features that must be enabled using the following parameters:

  • batch_by_const: A list of indices representing constants. It ensures that all samples in a batch share the same constant values. This is useful when using a solver function that requires a batch of samples but only one scalar value for each constant.
  • ret_batch_const: When enabled, the loader also returns the non-normalized constants for the batch. This option is only available if batching by constants is enabled.

Additionally, the module provides auxiliary functions for converting tensors between PyTorch and PhiFlow:

  • to_phiflow(t): Converts network input to solver input by removing constant layers.
  • from_phiflow(t): Converts solver output to match network output format.
  • cat_constants(t,l): Concatenates constant layers from tensor l onto tensor t. This is useful because the network output does not include the constant layers required for the network input in the next iteration.
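
Taken together, these helpers make a single hybrid network-plus-solver step look roughly as follows (a sketch only: solver_step is a placeholder for an arbitrary differentiable solver call, and loader, net, and input refer to the names used in the full example below):

# Sketch of one corrected solver step.
solver_in = loader.to_phiflow(input)                    # strip constant layers for the solver
solver_out = solver_step(solver_in)                     # advance the physics by one step
output = loader.from_phiflow(solver_out) + net(input)   # add the learned correction
next_input = loader.cat_constants(output, input)        # re-attach constant layers for the next step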

The following code provides a minimal example:

import torch
from pbdl.torch.phi.loader import Dataloader
from examples.ks.ks_networks import ConvResNet1D
from examples.ks.ks_solver import DifferentiableKS

# solver parameters
DOMAIN_SIZE_BASE = 8
PREDHORZ = 5

device = "cuda:0" if torch.cuda.is_available() else "cpu"

diff_ks = DifferentiableKS(resolution=48, dt=0.5)

loader = Dataloader(
    "ks-dataset",
    PREDHORZ,
    step_size=20,
    intermediate_time_steps=True,
    batch_size=16,
    batch_by_const=[0],
    ret_batch_const=True,
)

net = ConvResNet1D(16, 3, device=device)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss = torch.nn.MSELoss()

for epoch in range(4):
    for i, (input, targets, const) in enumerate(loader):

        input = input.to(device)
        targets = targets.to(device)

        optimizer.zero_grad()
        domain_size = const[0]

        inputs = [input]
        outputs = []

        for _ in range(PREDHORZ):
            output_solver = diff_ks.etd1(
                loader.to_phiflow(inputs[-1]), DOMAIN_SIZE_BASE * domain_size
            )

            correction = diff_ks.dt * net(inputs[-1])
            output_combined = loader.from_phiflow(output_solver) + correction

            outputs.append(output_combined)
            inputs.append(loader.cat_constants(outputs[-1], inputs[0]))

        outputs = torch.stack(outputs, dim=1)

        loss_value = loss(outputs, targets)
        loss_value.backward()
        optimizer.step()

    print(f"epoch {epoch}, loss {loss_value.item() * 10000.0:.3f}")