3.0 Fitting an MCMSEM model
Now that we have set up our MCMSEM model it is time to fit it to our data. For the purposes of this document we will continue with the following:
base_model <- MCMmodel(data_summary, n_latent=2)
Since everything required to fit our MCM model is already stored in our MCMmodel object, all you need to provide to MCMfit is the model and the data.
my_result <- MCMfit(base_model, data_summary)
Note: We could store your data in the MCMmodel object, but that would result in unnecessary copies, hence we don't.
This will produce an MCM result object which, when printed, displays the parameters and their standard errors:
a1 a2 b1_2 b1_3 b1_4 b1_5 b2_1 b2_3 ...
est 0.00 0.07 0.06 0.27 0.05 0.11 0.24 0.25 ...
se 3.42 0.19 0.03 0.02 0.03 0.03 0.03 0.03 ...
Note that, even though it looks like one when printed, the MCM result object is not a dataframe. To obtain the printed dataframe of estimates and standard errors, use as.data.frame(my_result). More on what to do with a result object can be found in Wiki pages 4 and 5.
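For example, assuming my_result comes from the MCMfit call above, converting and inspecting the result could look like this (a minimal sketch; the row and column names are assumed to match the printed output shown above):
result_df <- as.data.frame(my_result)  # dataframe with one row of estimates and one row of standard errors
print(result_df)
result_df['est', ]  # point estimates only (assuming row names 'est' and 'se' as printed)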
By default MCMfit will include the calculation of standard errors. However, this may not always be desirable, for example when simply testing the optimization of larger models. You can turn off this computation by setting compute_se to FALSE.
my_result <- MCMfit(base_model, data_summary, compute_se=FALSE)
MCMSEM includes two options for calculating standard errors: (1) asymptotic approximation and (2) bootstrap. Asymptotic approximation is the default:
my_result <- MCMfit(base_model, data_summary, se_type='asymptotic')
For bootstrap there are two options: one-step or two-step. In the two-step bootstrap, chunks of the data are bootstrapped rather than the entire dataset, which gives a significant boost in performance; two-step is therefore generally recommended.
my_result <- MCMfit(base_model, data_summary, se_type='one-step') # "Standard" bootstrap
my_result <- MCMfit(base_model, data_summary, se_type='two-step') # Chunked bootstrap
The number of bootstrap iterations can be adjusted via bootstrap_iter (defaults to 200), and the number of chunks in the two-step bootstrap can be adjusted via bootstrap_chunks (defaults to 1000).
my_result <- MCMfit(base_model, data_summary, se_type='two-step', bootstrap_iter=100, bootstrap_chunks=500)
MCMSEM by default uses RPROP and LBFGS optimizers in sequence. RPROP serves to find a good starting point, LBFGS to find the optimal solution given RPROP's starting point. By default MCMSEM will run RPROP for 50 iterations and LBFGS for 12; this can be changed via optim_iters.
my_result <- MCMfit(base_model, data_summary, optim_iters=c(100, 25))
The optimizers can be changed by passing lowercase name(s) to the optimizers argument. Note that learning rates (see below) can behave differently across different optimizers, so you will likely want to change the learning rate in addition to the optimizer(s). Passing a vector of multiple optimizers will result in them being run in sequence (as described for the default RPROP and LBFGS above).
my_result <- MCMfit(base_model, data_summary, optimizers='sgd', learning_rate=0.1, optim_iters=100) # Single optimizer
my_result <- MCMfit(base_model, data_summary, optimizers=c('sgd', 'lbfgs'), learning_rate=c(0.1, 0.9), optim_iters=c(100, 25)) # Two optimizers in sequence
The following optimizers are implemented, but note that, aside from rprop, they have not been as thoroughly tested: adadelta, adagrad, adam, asgd, rmsprop, rprop, sgd
By default the learning rate for the first optimizer is 0.02, and the learning rate for LBFGS is 1.0. The learning rate for LBFGS is so high because it should already start close to an optimal solution (i.e. gradients should be low) thanks to RPROP.
The learning rates can be adjusted via the learning_rate argument. If your estimates all return NA, the learning rate is a good place to start troubleshooting; in that case it should be decreased.
my_result <- MCMfit(base_model, data_summary, learning_rate=c(0.01, 0.5))
By default the loss is calculated as mean squared error (mse); alternatively, you can use a smooth_l1 loss by specifying the loss_type argument.
my_result <- MCMfit(base_model, data_summary, loss_type='smooth_l1')
If you want to keep an eye on the loss as the optimizer runs, you can set debug to TRUE, which will make MCMfit print its progress, as well as the loss at every step.
my_result <- MCMfit(base_model, data_summary, debug=TRUE)
Note on LBFGS
Due to the nature of LBFGS it is very likely to end up with NaN gradients when used as the first optimizer. Additionally, it is unlikely that much more than 25 iterations of LBFGS will lead to better estimates. So if you want to use LBFGS, only use it after another optimizer (or multiple), and increase the number of iterations of the other optimizer(s) first.
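For instance, a sketch of a fit that follows this advice (LBFGS kept as the second optimizer, with extra iterations given to the first optimizer) might look like the following; the learning rates simply mirror the defaults described above:
my_result <- MCMfit(base_model, data_summary, optimizers=c('rprop', 'lbfgs'), learning_rate=c(0.02, 1.0), optim_iters=c(150, 25))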
Because the optimizers we use were developed for machine learning applications, we are unable to use hard bounds in these optimizers; the default is therefore to not use bounds at all. If you do want to use bounds, you can use the work-around described below by setting use_bounds to TRUE, and optionally change the outofbounds_penalty value.
my_result <- MCMfit(base_model, data_summary, use_bounds=TRUE, outofbounds_penalty=1.0)
As a work-around we have made the loss scale exponentially as estimates get further out of bounds, to nudge the optimizer toward an in-bound solution without destroying the gradients on which the optimizers heavily rely. As a result it is still possible that MCMfit finds a solution where parameters are slightly out of bounds even when you have set use_bounds to TRUE. To combat this you can try to increase the severity of the out-of-bounds penalty by increasing its scaling via the outofbounds_penalty argument. In pseudo-code:
lbound_check = ( estimates <= lbounds ) # 0 if parameter is in bounds, 1 if parameter is below lower bound
ubound_check = ( estimates >= ubounds ) # 0 if parameter is in bounds, 1 if parameter is above upper bound
lbound_dist = ( estimates - lbounds ) ^ 2 # Distance from lower bounds
ubound_dist = ( estimates - ubounds ) ^ 2 # Distance from upper bounds
lbound_dist_sum = sum( sqrt( lbound_dist * lbound_check ) ) # Sum of distances from lower bounds
ubound_dist_sum = sum( sqrt( ubound_dist * ubound_check ) ) # Sum of distances from upper bounds
pow = (lbound_dist_sum + ubound_dist_sum) * outofbounds_penalty
loss = loss * ( 2 ^ pow )
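Purely as an illustration, the pseudo-code above could be written in plain R roughly as follows. This is a sketch of the idea only, not the actual torch-based implementation inside MCMfit, and the function name is hypothetical:
apply_bounds_penalty <- function(loss, estimates, lbounds, ubounds, outofbounds_penalty=1.0) {
  lbound_check <- as.numeric(estimates <= lbounds)  # 1 if a parameter is below its lower bound, 0 otherwise
  ubound_check <- as.numeric(estimates >= ubounds)  # 1 if a parameter is above its upper bound, 0 otherwise
  lbound_dist <- (estimates - lbounds)^2            # squared distance from the lower bounds
  ubound_dist <- (estimates - ubounds)^2            # squared distance from the upper bounds
  lbound_dist_sum <- sum(sqrt(lbound_dist * lbound_check))  # total distance below the lower bounds
  ubound_dist_sum <- sum(sqrt(ubound_dist * ubound_check))  # total distance above the upper bounds
  pow <- (lbound_dist_sum + ubound_dist_sum) * outofbounds_penalty
  loss * (2 ^ pow)  # unchanged when everything is in bounds (pow = 0), scales exponentially otherwise
}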
By default MCMfit will fit the full model including both skewness and kurtosis. You can disable either of these by setting use_skewness or use_kurtosis to FALSE:
my_result_no_skew <- MCMfit(base_model, data_summary, use_skewness=FALSE)
my_result_no_kurt <- MCMfit(base_model, data_summary, use_kurtosis=FALSE)
When testing larger (>20 variable) models we highly recommend setting use_kurtosis to FALSE in the testing phase, as this significantly improves performance. Once you have a model you are satisfied with, set use_kurtosis back to TRUE to obtain your final estimates.
By default MCMfit will only store the gradient of each parameter at the last iteration. To store the entire gradient history, you can include the monitor_grads argument. This argument will also enable early stopping when NaN gradients are encountered.
my_result <- MCMfit(base_model, data_summary, monitor_grads=TRUE)
For asymptotic calculation of standard errors, a Jacobian matrix is computed with numDeriv::jacobian(). If you would like to change the method argument passed to numDeriv::jacobian() (defaults to simple for optimal performance), you can do so via the jacobian_method argument.
my_result <- MCMfit(base_model, data_summary, jacobian_method='Richardson')
In order to fit your MCMSEM model on a GPU you must (1) meet all the requirements and (2) have a CUDA-enabled installation of torch; see 1. Installing MCMSEM for more detail.
To run MCMfit() on a GPU, first set up a CUDA device and pass it to MCMfit(). This will cause all underlying matrices to be stored on the GPU, in turn running the optimization on the GPU, which can improve performance significantly.
cuda_device <- torch_device("cuda")
my_result <- MCMfit(base_model, data_summary, device=cuda_device)
If you encounter CUDA out-of-memory errors, my first advice would be to try and figure out at what stage this happens by running your model with a low number of iterations using debug:
cuda_device <- torch_device("cuda")
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, debug=TRUE)
If the optimization is successful and the out-of-memory error only occurs during calculation of the standard errors, I recommend changing the device used for standard errors:
cuda_device <- torch_device("cuda")
cpu_device <- torch_device("cpu")
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, device_se=cpu_device)
When running into CUDA out-of-memory errors during optimization, first try increasing low_memory from 1 to 4 iteratively. This will run increasingly VRAM-friendly versions of the optimization at the cost of runtime. Note this cost gets VERY significant with large (>30 variables) models, but it will still be faster than CPU.
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=1)
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=2)
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=3)
my_result <- MCMfit(base_model, data_summary, optim_iters=c(5, 5), device=cuda_device, low_memory=4)
If, after setting low_memory to 4, you still run into CUDA out-of-memory errors, see 1. Installing MCMSEM for your options.
All other arguments to MCMfit work with a CUDA device, so you do not need to change anything about your code when moving from CPU to GPU.
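To illustrate, a minimal sketch (assuming the CUDA requirements above are met): the same call can be moved from CPU to GPU by changing only the device argument. Passing a CPU device explicitly here is an assumption; omitting the device argument should also run on the CPU.
cpu_device <- torch_device("cpu")
cuda_device <- torch_device("cuda")
my_result_cpu <- MCMfit(base_model, data_summary, device=cpu_device)   # runs on the CPU
my_result_gpu <- MCMfit(base_model, data_summary, device=cuda_device)  # identical call, runs on the GPU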