Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse state matrices given to the various solvers lead to poor performance - we should either provide a warning or automatically call dense #379

Open
NominHanggai opened this issue Dec 28, 2023 · 3 comments

Comments

@NominHanggai
Copy link

NominHanggai commented Dec 28, 2023

I (Stefan @Krastanov ) edited this issue to give some context. @mulliken reported below a performance issue due to using a sparse initial state matrix in timeevolution.master. A large array of sparse (datastructure) matrices was created for matrices that are not at all sparse, causing enormous performance penalties. We should either warn or automatically call dense in these functions.

Original report:

Hello QuantumOptics.jl community!

I have a question about the performance of QuantumOptics.jl, stated as follows. This is the first time I wrote code in Julia. Please forgive me if I made silly mistakes.

Issue Description

I am experiencing a performance issue with a simulation script in QuantumOptics.jl, specifically for a three-level atom coupled to two modes (5 states each). The two modes are subject to some Lindblad dissipations. The script, named three-level.jl, simulates a system with a Hilbert space dimension of (3*25, 3*25) and is set to run for 500 steps.

Expected Behavior

Based on the simplicity of the simulation, I anticipated that the execution would take less than 1 minute.

Actual Behavior

The script took over 10 minutes and still would not stop my Windows PC, which is considerably longer than expected. A similar simulation written in Python using QuTiP (three-level.py) completes in just a few seconds.

Environment

  • OS: Windows 11
  • Julia version: v1.9.4
  • QuantumOptics.jl version: v1.0.15

Comparison with QuTiP/Python

I've also implemented a comparable simulation in Python using the QuTiP library, which runs significantly faster. I'm attaching the Python script (three-level.py) and Julia script for reference.

Request

Are there any known issues, optimizations, or alternative approaches that could improve the execution time of my script?

Thank you for your assistance and looking forward to any suggestions or insights you can provide.

Julia script three-level.jl

using QuantumOptics
using PyPlot

# Parameters
const N_cutoff = 5

const ω1 = 50
const ω2 = 50
const s1 = 400
const s2 = s1

const ω = 3000 - s1^2/ω1

# Bases
const b_fock = FockBasis(N_cutoff)
const b_ele = NLevelBasis(3)

# Fundamental operators
const a = destroy(b_fock)
const at = create(b_fock)
const n = number(b_fock)

const proj_d  = transition(b_ele, 1, 1)
const proj_a1 = transition(b_ele, 2, 2)
const proj_a2 = transition(b_ele, 3, 3)
const v_da1  = 100*(transition(b_ele, 1, 2) + transition(b_ele, 2, 1))
const v_da2  = 100*(transition(b_ele, 1, 3) + transition(b_ele, 3, 1))
const v_a1a2 = 100*(transition(b_ele, 2, 3) + transition(b_ele, 3, 2))

# Hamiltonian
const Hele = ω * proj_d + v_da1 + v_da2 + v_a1a2
println(Hele)
# exit()
const Hvib1 = ω1*n
const Hvib2 = ω2*n
const Hint = s1 * proj_a1⊗(a+at)⊗one(b_fock) + s2 * proj_a2 ⊗ one(b_fock) ⊗ (a+at)
const H = Hele⊗one(b_fock)⊗one(b_fock) + one(b_ele)⊗Hvib1⊗one(b_fock) + one(b_ele)⊗one(b_fock)⊗Hvib1 + Hint

# Initial state
const ρ0 = proj_d ⊗ dm(fockstate(b_fock, 1)) ⊗ dm(fockstate(b_fock, 1))

# Integration time
const t_max = 0.5
const T = [0:0.001:t_max;]

const γ = 100
const J = [sqrt(γ)*one(b_ele)⊗ a ⊗ one(b_fock), sqrt(γ)*one(b_ele) ⊗ one(b_fock) ⊗ a]

# Master
const tout, ρt = timeevolution.master(T, ρ0, H, J)
const exp_d_master = real(expect(proj_d ⊗ one(b_fock) ⊗ one(b_fock), ρt))

figure(figsize=(3,3))
subplot(1,1,1)
ylim([0, 1])
xlim([0, t_max])
plot(T, exp_d_master);
xlabel(L"T")
ylabel(L"\langle P_D \rangle")

tight_layout()
show()

Python (qutip) script three-level.py

import numpy as np
import matplotlib.pyplot as plt
from qutip import *

# Parameters
N_cutoff = 5

ω1 = 50
ω2 = 50
s1 = 400
s2 = s1
ω = 3000 -  s1**2 / ω1

# Bases
# b_fock = Fock(N_cutoff)
b_ele = qutip.qeye(3)

# Fundamental operators
a = destroy(N_cutoff)
at = create(N_cutoff)
n = num(N_cutoff)

# Projection operators
proj_d = basis(3, 0) * basis(3, 0).dag()
proj_a1 = basis(3, 1) * basis(3, 1).dag()
proj_a2 = basis(3, 2) * basis(3, 2).dag()

def transition(m,n, N=3):
    return basis(N, m) * basis(N, n).dag() + basis(N, n) * basis(N, m).dag()

v_da1 = 100 * transition(0,1)
v_da2 = 100 * transition(0,2)
v_a1a2 = 100 * transition(2,1)

# Hamiltonian
Hele = ω * proj_d + v_da1 + v_da2 + v_a1a2

print(Hele)
Hvib1 = ω1 * n
Hvib2 = ω2 * n
Hint = s1 * tensor(proj_a1, a + at, qeye(N_cutoff)) + s2 * tensor(proj_a2, qeye(N_cutoff), a + at)
H = tensor(Hele, qeye(N_cutoff), qeye(N_cutoff)) + tensor(qeye(3), Hvib1, qeye(N_cutoff)) + tensor(qeye(3), qeye(N_cutoff), Hvib1) + Hint

# Initial state
ρ0 = tensor(basis(3,0), basis(N_cutoff, 0), basis(N_cutoff, 0)).unit()

# Integration time
t_max = 0.5
T = np.linspace(0, t_max, 500)

γ = 100
c_ops = [np.sqrt(γ) * tensor(qeye(3), a, qeye(N_cutoff)), np.sqrt(γ) * tensor(qeye(3), qeye(N_cutoff), a)]

# Master equation
result = mesolve(H, ρ0, T, c_ops, [tensor(proj_d, qeye(N_cutoff), qeye(N_cutoff))])
exp_d_master = np.real(result.expect[0])

# Plotting
plt.figure(figsize=(3, 3))
plt.subplot(1, 1, 1)
plt.ylim([0, 1])
plt.xlim([0, t_max])
plt.plot(T, exp_d_master)
plt.xlabel("Time")
plt.ylabel(r"$\langle P_D \rangle$")
plt.tight_layout()
plt.show()
@Krastanov
Copy link
Collaborator

Krastanov commented Dec 28, 2023

Hi there!

For debugging, it helps to make the example smaller and independent from plotting and other distractions. It also helps significantly if the example is shortened so that it can evaluate fast. Lastly, in julia it is valuable to use BenchmarkTools.@benchmark when evaluating the performance of a piece of code (and in ipython that would be that %timeit magic).

I changed to t_max=0.002 in order to evaluate your code more rapidly. Then we get these results:

julia> @benchmark timeevolution.master(T, ρ0, H, J)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.514 s …  2.516 s  ┊ GC (min … max): 4.99% … 4.99%
 Time  (median):     2.515 s             ┊ GC (median):    4.99%
 Time  (mean ± σ):   2.515 s ± 1.667 ms  ┊ GC (mean ± σ):  4.99% ± 0.00%

 Memory estimate: 3.20 GiB, allocs estimate: 105701576.

That immense amount of allocations is a very big red flag. The function seems to be spending more time reserving and unreserving memory than actually running computation.

In comparison, in python you get:

In [3]: %timeit mesolve(H, ρ0, T, c_ops, [tensor(proj_d, qeye(N_cutoff), qeye(N_cutoff))])
6.98 ms ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Way faster in python. But what caused all of this? Check the output of each of these functions. In julia you will notice that your ρt is an array of sparse matrices, a terrible choice for dynamics that do not have a reason to preserve sparsity (but maybe a good choice for iterative steady-state solvers).

Anyway, fixing this in julia by changing your input to a dense matrix:

julia> @benchmark timeevolution.master(T, dense(ρ0), H, J)
BenchmarkTools.Trial: 1095 samples with 1 evaluation.
 Range (min … max):  4.218 ms …   5.520 ms  ┊ GC (min … max): 0.00% … 16.30%
 Time  (median):     4.553 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.566 ms ± 206.563 μs  ┊ GC (mean ± σ):  0.98% ±  3.33%

 Memory estimate: 4.02 MiB, allocs estimate: 245.

A bit faster than python's qutip... Not too bad. And probably it can be made another order of magnitude faster if you play with the diffeq solver settings.

Thanks for bringing this up! We should probably print a warning or something in situations like this (beeing given a sparse matrix for the state when the evolutions should use a dense matrix).

Edit: the %timeit for qutip was incorrect in the original version of this post (I forgot to change the number of steps)

@Krastanov Krastanov changed the title Performance Issue with a Three-Level Atom Simulation in QuantumOptics.jl Sparse state matrices given to the various solvers lead to poor performance - we should either provide a warning or automatically call dense Dec 28, 2023
@NominHanggai
Copy link
Author

NominHanggai commented Dec 29, 2023

Hi Krastanov,

Thank you for your help in pinpointing the issue and directing me to the benchmarking tools! Should I come across any other problems, I'll make sure to provide minimal working examples for clarity. Thank you once again for your support!

@david-pl
Copy link
Member

david-pl commented Jan 3, 2024

Just one comment here: if we opt for the autoconversion to dense, there might be memory issues. A valid use-case for sparse states is when a user knows the evolution will not populate many entries in a density matrix and the system is very large. Granted, this is quite a special case, but we somehow have to offer the possibility of keeping things sparse, i.e. the current behaviour should still be available to users. Yet, I agree that a sensible default is to make everything dense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants