3.0 Fitting an MCMSEM model

Fitting an MCM model

Now that we have set up our MCMSEM model, it is time to fit it to our data. For the purposes of this document we will continue with the following model:

base_model <- MCMmodel(data_summary, n_latent=2)

MCMfit

Since everything required to fit our MCM model is already stored in our MCMmodel object, all you need to provide to MCMfit is the model and the data.

my_result <- MCMfit(base_model, data_summary)

Note: We could store your data in the MCMmodel object, but that would result in unnecessary copies, hence we don't.

This will produce an MCM result object which when printed displays the parameters and their standard errors:

      a1   a2 b1_2 b1_3 b1_4 b1_5 b2_1 b2_3 ...
est 0.00 0.07 0.06 0.27 0.05 0.11 0.24 0.25 ...
se  3.42 0.19 0.03 0.02 0.03 0.03 0.03 0.03 ...

Note that, although it looks like one when printed, the MCM result object is not a dataframe. To obtain the estimate and SE dataframe that is printed, use as.data.frame(my_result). More on what to do with a result object can be found in Wiki pages 4 and 5.
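For example, a minimal sketch of converting the result to a dataframe for further processing (the write.csv step is purely illustrative):

result_df <- as.data.frame(my_result)      # estimate and SE dataframe, as printed above
write.csv(result_df, "mcm_estimates.csv")  # e.g. save the estimates and standard errors to a file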

Standard errors

By default MCMfit will include the calculation of standard errors, however, this may not be ideal, e.g. when simply testing the optimization of larger models. You can turn off this computation by setting compute_se to FALSE.

my_result <- MCMfit(base_model, data_summary, compute_se=FALSE)

MCMSEM includes two options for calculating standard errors: (1) asymptotic approximation and (2) bootstrap. Asymptotic approximation is the default:

my_result <- MCMfit(base_model, data_summary, se_type='asymptotic')

For bootstrap there are two options: one-step or two-step. In the two-step bootstrap, chunks of the data are bootstrapped rather than the entire dataset, which gives a significant boost in performance; two-step is therefore generally recommended.

my_result <- MCMfit(base_model, data_summary, se_type='one-step') # "Standard" bootstrap
my_result <- MCMfit(base_model, data_summary, se_type='two-step') # Chunked bootstrap

The number of bootstrap iterations can be adjusted via bootstrap_iter (defaults to 200), and the number of chunks in the two-step bootstrap can be adjusted via bootstrap_chunks (defaults to 1000).

my_result <- MCMfit(base_model, data_summary, se_type='two-step', bootstrap_iter=100, bootstrap_chunks=500)

Optimizer & loss

MCMSEM by default uses RPROP and LBFGS optimizers in sequence. RPROP serves to find a good starting point, LBFGS to find the optimal solution given RPROP's starting point. By default MCMSEM will run RPROP for 50 iterations and LBFGS for 12; this can be changed via optim_iters.

my_result <- MCMfit(base_model, data_summary, optim_iters=c(100, 25))

The optimizers can be changed by passing lowercase name(s) to the optimizers argument. Note that learning rates (see below) can behave differently across optimizers, so you will likely want to change the learning rate in addition to the optimizer(s). Passing a vector of multiple optimizers will result in them being run in sequence (as described for the default RPROP and LBFGS above).

my_result <- MCMfit(base_model, data_summary, optimizers='sgd', learning_rate=0.1, optim_iters=100)   # Single optimizer
my_result <- MCMfit(base_model, data_summary, optimizers=c('sgd', 'lbfgs'), learning_rate=c(0.1, 0.9), optim_iters=c(100, 25))   # Multiple optimizers in sequence

The following optimizers are implemented, though note that only rprop has been thoroughly tested: adadelta, adagrad, adam, asgd, rmsprop, rprop, sgd

By default the learning rate for the first optimizer is 0.02, and the learning rate for LBFGS is 1.0. The learning rate for LBFGS is so high because it should already start close to an optimal solution (i.e. gradients should be low) thanks to RPROP. The learning rates can be adjusted via the learning_rate argument. If your estimates all return NA, decreasing the learning rate is a good place to start.

my_result <- MCMfit(base_model, data_summary, learning_rate=c(0.01, 0.5))

By default, loss is calculated as mean squared error (mse); alternatively you can use a smooth_l1 loss by specifying the loss_type argument.

my_result <- MCMfit(base_model, data_summary, loss_type='smooth_l1')

If you want to keep an eye on the loss as the optimizer runs, you can set debug to TRUE, which will make MCMfit print progress, as well as the loss at every step.

my_result <- MCMfit(base_model, data_summary, debug=TRUE)

Note on LBFGS
Due to the nature of LBFGS it is very likely to end up with NaN gradients when used as the first optimizer. Additionally, it is unlikely that much more than 25 iterations of LBFGS will lead to better estimates. So if you want to use LBFGS, only use it after another optimizer (or several), and increase the number of iterations of the other optimizer(s) first, as in the sketch below.
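For example, a minimal sketch of such a setup (the specific iteration counts are illustrative, not recommendations):

my_result <- MCMfit(base_model, data_summary,
                    optimizers=c('rprop', 'lbfgs'),   # LBFGS only after RPROP
                    learning_rate=c(0.02, 1.0),       # defaults: low for RPROP, high for LBFGS
                    optim_iters=c(150, 25))           # more RPROP iterations, LBFGS capped at 25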

Bounds

Due to our use of optimizers that were developed for machine learning applications, we are unable to use hard bounds in these optimizers; therefore the default is to not use bounds at all. If you do want to use bounds, you can use the work-around described below by setting use_bounds to TRUE, and optionally change the outofbounds_penalty value.

my_result <- MCMfit(base_model, data_summary, use_bounds=TRUE, outofbounds_penalty=1.0)

As a work-around we have made the loss scale exponentially as estimates get further out of bounds, to nudge the optimizer toward an in-bound solution without destroying the gradients on which the optimizers heavily rely. As a result, it is still possible that MCMfit finds a solution where parameters are slightly out of bounds even when you have set use_bounds to TRUE. To combat this, you can increase the severity of the out-of-bounds penalty by increasing its scaling via the outofbounds_penalty argument. In pseudo-code:

lbound_check = ( estimates <= lbounds ) # 0 if parameter is in bounds, 1 if parameter is below lower bound
ubound_check = ( estimates >= ubounds ) # 0 if parameter is in bounds, 1 if parameter is above upper bound
lbound_dist  = ( estimates - lbounds ) ^ 2  # Distance from  lower bounds
ubound_dist  = ( estimates - ubounds ) ^ 2  # Distance from  upper bounds
lbound_dist_sum = sum( sqrt( lbound_dist * lbound_check ) )  # Sum of distances from lower bounds
ubound_dist_sum = sum( sqrt( ubound_dist * ubound_check ) )  # Sum of distances from upper bounds
pow = (lbound_dist_sum + ubound_dist_sum) * outofbounds_penalty
loss = loss * ( 2 ^ pow ) 
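
The same logic as a runnable R sketch, using made-up toy values for the estimates, bounds, loss, and penalty:

estimates <- c(0.5, 1.3, -0.2)   # toy parameter estimates
lbounds   <- c(-1, -1,  0)       # toy lower bounds
ubounds   <- c( 1,  1,  1)       # toy upper bounds
loss      <- 0.04                # toy loss before the penalty
outofbounds_penalty <- 1.0

lbound_check <- (estimates <= lbounds)                    # TRUE if a parameter is below its lower bound
ubound_check <- (estimates >= ubounds)                    # TRUE if a parameter is above its upper bound
lbound_dist  <- (estimates - lbounds)^2                   # squared distance from lower bounds
ubound_dist  <- (estimates - ubounds)^2                   # squared distance from upper bounds
lbound_dist_sum <- sum(sqrt(lbound_dist * lbound_check))  # total distance below lower bounds
ubound_dist_sum <- sum(sqrt(ubound_dist * ubound_check))  # total distance above upper bounds
pow  <- (lbound_dist_sum + ubound_dist_sum) * outofbounds_penalty
loss <- loss * (2 ^ pow)                                  # loss grows exponentially with the violation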

Skewness and kurtosis

By default MCMfit will fit the full model including both skewness and kurtosis. You can disable either one of these by setting use_skewness or use_kurtosis to FALSE.

my_result_no_skew <- MCMfit(base_model, data_summary, use_skewness=FALSE)
my_result_no_kurt <- MCMfit(base_model, data_summary, use_kurtosis=FALSE)

When testing larger (>20 variable) models we highly recommend setting use_kurtosis to FALSE during the testing phase, as this significantly improves performance. Once you have a model you are satisfied with, set use_kurtosis to TRUE to obtain your final estimates.
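A minimal sketch of that workflow (using compute_se=FALSE in the testing phase is just an additional shortcut, not required):

test_result  <- MCMfit(base_model, data_summary, use_kurtosis=FALSE, compute_se=FALSE)  # fast testing fit
final_result <- MCMfit(base_model, data_summary)                                        # full model for final estimates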

Monitor gradients

By default MCMfit will only store the gradient of each parameter at the last iteration. To store the entire gradient history, you can include the monitor_grads argument. This argument will also enable early stopping when NaN gradients are encountered.

my_result <- MCMfit(base_model, data_summary, monitor_grads=TRUE)

Jacobian method

For asymptotic calculation of standard errors, a Jacobian matrix is computed with numDeriv::jacobian(). If you would like to change the method argument passed to numDeriv::jacobian() (defaults to simple for optimal performance), you can do so via the jacobian_method argument.

my_result <- MCMfit(base_model, data_summary, jacobian_method='Richardson')

Fitting on a GPU

In order to fit your MCMSEM model on a GPU you must (1) meet all the requirements and (2) have a CUDA-enabled installation of torch, see 1. Installing MCMSEM for more detail.

To run MCMfit() on a GPU, first set up a CUDA device and pass it to MCMfit(). This will cause all underlying matrices to be stored on the GPU instead, in turn running optimization on the GPU, potentially improving performance significantly.

cuda_device <- torch_device("cuda")
my_result <- MCMfit(base_model, data_summary, device=cuda_device)

CUDA out of memory

If you encounter this, my first advice would be to figure out at what stage it happens; run your model with a low number of iterations and debug enabled:

cuda_device <- torch_device("cuda")
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, debug=TRUE)

If the optimization is successful and the out-of-memory error occurs only during calculation of the standard errors, I recommend changing the device used for standard errors:

cuda_device <- torch_device("cuda")
cpu_device <- torch_device("cpu")
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, device_se=cpu_device)

When running into CUDA out of memory errors during optimization, first try increasing low_memory from 1 to 4 iteratively. This will run increasingly VRAM-friendly versions of the optimization at the cost of runtime. Note this cost gets VERY significant with large (>30 variables) models, but it will still be faster than CPU.

my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=1)
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=2)
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=3)
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=4)

If after setting low_memory to 4 you still run into CUDA out of memory errors, see 1. Installing MCMSEM for your options.

Other arguments with GPU

All other arguments to MCMfit work with a CUDA device, so you do not need to change anything else about your code when moving from CPU to GPU.
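For example, a minimal sketch combining a CUDA device with arguments shown earlier on this page (the specific values are illustrative):

cuda_device <- torch_device("cuda")
my_result <- MCMfit(base_model, data_summary,
                    device=cuda_device,       # run optimization on the GPU
                    se_type='two-step',       # chunked bootstrap standard errors
                    use_kurtosis=FALSE,       # skip kurtosis while testing
                    optim_iters=c(100, 25))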