Thermodynamic Algorithms for Linear Algebra
Thermodynamic Algorithms for Linear Algebra
Linear algebraic primitives are at the core of many modern algorithms in engineering, science, and
machine learning. Hence, accelerating these primitives with novel computing hardware would have
tremendous economic impact. Quantum computing has been proposed for this purpose, although
the resource requirements are far beyond current technological capabilities, so this approach remains
arXiv:2308.05660v1 [[Link]-mech] 10 Aug 2023
I. Introduction
Basic linear algebra primitives such as solving a linear system of the form Ax = b and obtaining the
inverse of a matrix are present in many modern algorithms. Such primitives are relevant to a multitude
of applications, including for example optimal control of dynamic systems and resource allocation. They
are also a common subroutine of many artificial intelligence (AI) algorithms, and account for a substantial
portion of the time and energy costs in some cases.
The most common method to perform these primitives is LU decomposition, whose time-complexity
scales as O(d3 ). Many proposals have been made to accelerate such primitives, for example using iterative
methods such as the conjugate gradient method. In the last decade, these primitives have been accelerated
by hardware improvements, notably by their implementation on graphical processing units (GPUs), fueling
massive parallelization. However, the scaling of these methods is still a prohibitive factor, and obtaining
a good approximate solution to a dense matrix of more than a few tens of thousand dimensions remains
challenging.
Exploiting physics to solve mathematical problems is a deep idea, with much focus on solving optimization
problems [1–3]. In the context of linear algebra, much attention has been paid to quantum computers [4],
since the mathematics of discrete-variable quantum mechanics matches that of linear algebra. A quantum
algorithm [5] to solve linear systems has been proposed, which for sparse and well-conditioned matrices
scales as log d. However, the resource requirements [6] for this algorithm are far beyond current hardware
capabilities. More generally building large-scale quantum hardware has remained difficult [7], and variational
quantum algorithms for linear algebra [8–10] have battled with vanishing gradient issues [11–13].
Therefore, the search for alternative hardware proposals that can exploit physical dynamics to accelerate
linear algebra primitives has been ongoing. Notably, memristor crossbar arrays have been of interest for
accelerating matrix-vector multiplications [14, 15]. Solving linear systems has also been the subject of
analog computing approaches [16].
Recently, we defined a new class of hardware, built from stochastic, analog building blocks, which is
ultimately thermodynamic in nature [17]. (See also probabilistic-bit computers [18–20] and thermodynamic
neural networks [21–24] for alternative approaches to thermodynamic computing [25]). AI applications like
generative modeling are a natural fit for this thermodynamic hardware, where stochastic fluctuations are
exploited to generate novel samples.
In this work, we surprisingly show that the same thermodynamic hardware from Ref. [17] can also be used
to accelerate key primitives in linear algebra. Thermodynamics is not typically associated with linear algebra,
and connecting these two fields is therefore non-trivial. Here, we exploit the fact that the mathematics of
harmonic oscillator systems is inherently affine (i.e., linear), and hence we can map linear algebraic primitives
onto such systems. (See also Ref. [26] for a discussion of harmonic oscillators in the context of quantum
computing speedups.) We show that simply by sampling from the thermal equilibrium distribution of coupled
harmonic oscillators, one can solve a variety of linear algebra problems.
2
TABLE I. Comparison of asymptotic complexities of linear algebra algorithms. Here, d is the matrix
dimension, κ is the condition number, and ϵ is the error. For our thermodynamic algorithms, the complexity depends
on the dynamical regime, i.e., whether the dynamics are overdamped and underdamped. For the digital SOTA
case, the complexity of solving symmetric, positive definite linear systems, matrix inverse, Lyapunov equation, and
matrix determinant problems are respectively for algorithms based on: conjugate gradient method [28], fast matrix
multiplication/inverse [29], Bartels-Stewart algorithm [30], and Cholesky decomposition [31]. ω ≈ 2.3 denotes the
matrix multiplication constant.
Specifically we develop thermodynamic algorithms for the following linear algebraic primitives: (i) solving
a linear system Ax = b, (ii) estimating a matrix inverse A−1 , (iii) solving Lyapunov equations [27] of the
form AΣ + ΣA⊺ = 1 and (iv) estimating the determinant of a symmetric positive definite matrix A. We
show that if implemented on thermodynamic hardware, these methods scale favorably with problem size
compared to digital algorithms. Our numerical simulations corroborate our analytical scaling results and
also provide evidence of the fast convergence of these primitives with the wall-clock time, with the speedup
relative to digital methods getting more pronounced with increasing dimension and condition number.
II. Results
A. Algorithmic Scaling
In Table I, we summarize the asymptotic scaling results for our thermodynamic algorithms as compared
to the best state-of-the-art (SOTA) digital methods for dense symmetric positive-definite matrices. The
derivations of these results can be found in the Supplemental Information, and are based on bounds obtained
for physical thermodynamic quantities, including correlation times, equilibration times, and free energy
differences (see Methods section for elaboration on the relevant thermodynamic quantities, as well as the
assumptions employed in our scaling analysis).
As one can see from Table I, an asymptotic speedup is predicted for our thermodynamic algorithms relative
to the digital SOTA algorithms. Specifically, a speedup that is linear in d is expected for each of the linear
algebraic primitives (ignoring a possible dependence of κ on d). Assumptions made to obtain these bounds
are detailed in the Methods section.
In what follows, we systematically present our thermodynamic algorithms for various linear algebraic
primitives.
Ax = b, (1)
given some invertible matrix A ∈ Rd×d and nonzero b ∈ Rd . We may assume without loss of generality that
the matrix A in Eq. (1) is symmetric and positive definite (SPD); if A is not SPD, then we may consider
the system
A⊺ Ax = A⊺ b, (2)
3
FIG. 1. Diagram of our thermodynamic algorithm for solving linear systems and inverse estimation.
The system of linear equations, or the matrix A, is encoded into the thermodynamic hardware, the system is then
allowed to evolve until the stationary distribution has been reached, when the trajectory is then integrated to estimate
the sample mean or covariance. This gives estimates of the solution of the linear system or the inverse of A respectively.
whose solution x = A−1 b is also the solution of Ax = b. Note that this will affect the total runtime, but
still allows for asymptotic scaling improvements with respect to digital methods, in some cases1 . In what
follows, we will therefore assume that A is SPD.
Now let us connect this problem to thermodynamics. We consider a macroscopic device with d degrees of
freedom, described by classical physics. Suppose the device has potential energy function:
1 ⊺
V (x) = x Ax − b⊺ x, (3)
2
where A ∈ SPDd (R). Note that this is a quadratic potential that can be physically realized with a system
of harmonic oscillators, where the coupling between the oscillators is determined by the matrix A, and the
b vector describes a constant force on each individual oscillator. (We remark that while Figure 1 depicts
mechanical oscillators, from a practical perspective, one can build the device from electrical oscillators such
as RLC circuits.)
Suppose that we allow this device to come to thermal equilibrium with its environment, whose inverse
temperature is β = 1/kB T . At thermal equilibrium, the Boltzmann distribution describes the probability
for the oscillators to have a given spatial coordinate: f (x) ∝ exp(−βV (x)). Because V (x) is a quadratic
form, f (x) corresponds to a multivariate Gaussian distribution. Thus at thermal equilibrium, the spatial
coordinate x is a Gaussian random variable
The key observation is that the unique minimum of V (x) occurs where Ax−b = 0, which also corresponds to
the unique maximum of f (x). For a Gaussian distribution, the maximum of f (x) is also the first moment ⟨x⟩.
Thus, we have that, at thermal equilibrium, the first moment is the solution to the linear system of equations:
From this analysis, we can construct a simple thermodynamic protocol for solving linear systems, which is
depicted in Figure 1. Namely, the protocol involves realizing the potential in Eq. (3), waiting for the system
to come to equilibrium, and then sampling x to estimate the mean ⟨x⟩ of the distribution. This mean can
be approximated using a time-average, defined as
Z
1 t0 +τ ′ ′
⟨x⟩ ≈ x̄(τ ) = dt x(t ), (6)
τ t0
where t0 must be sufficiently large to allow for equilibration and τ must be sufficiently large for the average
to converge to a desired degree of precision. The eventual convergence of this time average to the mean
is the content of the ergodic hypothesis [32, 33], which is often assumed for quite generic thermodynamic
systems. It should be mentioned that the mean could also be approximated as the average of a sequence
1 Constructing an SPD system from a generic one in this way results in the squaring of the condition number, which influences
performance.
4
x2
Time
0 0 0.4
−1 −1
0.2
−2 −2
−3 −3 0.0
−2 0 2 4 6 −2 0 2 4 6
x1 x1
FIG. 2. Equilibration of the thermodynamic system. The process of equilibration is depicted on the single-
trajectory level (left) and on the distribution level (right). The trajectory dynamics are described by the overdamped
Langevin equation and the distributional dynamics by the Fokker-Planck equation [34] The system displays ergodicity,
as the time average of a single trajectory (blue curve, left) approaches the ensemble average (dots, right) in the long-
time limit. Time and the coordinate vector (x1 , x2 ) are in arbitrary units.
of samples; however the integration approach has the advantage that it can conveniently be implemented
in a completely analog way (for example, using an integrator electrical circuit), which obviates the need for
transferring data from the physical device until the end of the protocol.
Figure 2 shows the equilibration process for both a single trajectory (left panel) and the overall distribution
(right panel). One can see the ergodic principle illustrated in this figure, since the time dynamics of a single
trajectory at thermal equilibrium are representative of the overall ensemble.
The overall protocol can be summarized as follows.
t0 ⩾ b
t0 , (8)
where b
t0 is computed from the system’s physical properties or using heuristic methods based
on Eqs. (12), (14). Allow the system to evolve under its dynamics until t = t0 , which ensures
that ⟨x⟩ − A−1 b /∥A−1 b∥ ⩽ εµ0 and Σ − β −1 A−1 /∥β −1 A−1 ∥ ⩽ εΣ0 .
3. Choose error tolerance parameter εx and success probability Pε , and choose the integration
time
τ ⩾ τb, (9)
where τb is computed from the system’s physical properties, Eq. (12) or (14). Use an analog
integrator to measure the the time average
Z
1 t0 +τ
x̄ = dt x(t), (10)
τ t0
In order to implement the protocol above, the necessary values of bt0 and τb must be identified, which requires
a more quantitative description of equilibration and ergodicity. To obtain such a description, a model of the
system’s microscopic dynamics may be introduced. Given that the system under consideration is composed
of harmonic oscillators in contact with a heat bath, it is natural to allow for damping (i.e., energy loss to the
bath) and stochastic thermal noise, which always accompanies damping due to the fluctuation-dissipation
theorem [35, 36]. The Langevin equation accounts for these effects, and specifically we consider two common
formulations, the overdamped Langevin (ODL) equation and the underdamped Langevin (UDL) equations
(see the Methods section for additional details on ODL and UDL dynamics).
The ODL equation for this system is
1
dx = − (Ax − b)dt + N 0, 2γ −1 β −1 dt , (11)
γ
where γ > 0 is called the damping constant and β = 1/kB T is the inverse temperature of the environment.
The system has a physical timescale (which is clear from dimensional analysis) that we call the relaxation
time τr = γ/∥A∥. The condition number of A is κ = αmax /αmin , where α1 . . . αd are the eigenvalues of A.
With these definitions, we arrive (see Supplemental Information for derivation) at the following formulas for
tb0 and τb in the overdamped case, which can be used in the above protocol:
−1
1 −1
2κ2 d ∥A∥
b
t0 = max κτr ln κεµ0 , κτr ln 2κεΣ0 , τb = τr . (12)
2 β∥b∥2 ε2x (1 − Pε )
1 γ
dx = p dt, dp = −(Ax − b) dt − p dt + N [0, 2γβ −1 Idt]. (13)
M M
p
We define ξ = γ/2M , ωj = αj /M , and ζj = ξ/ωj . Moreover, a timescale τr(UD) can be identified for
the underdamped system which is analogous to the quantity τr associated with the overdamped system. In
particular, we define τr(UD) = ξ −1 . We introduce a dimensionless quantity χ as well, which is χ = (1 +
ξ/ωmin )1/2 (1 − ξ/ωmin )−1/2 . With these definitions, we arrive (see Supplemental Information for derivation)
at the following formulas for the timing parameters in the underdamped case:
1 √
1 2 κχd∥A∥
t0 = max τr(UD) ln κ1/2 χε−1
b µ0 , τr(UD) ln χ2 3/2 −1
κ ε Σ0 2
+ 1 , b
τ = τr(UD) .
2 4ζmax β∥b∥2 ε2x (1 − Pε )
(14)
An important distinction between the ODL and UDL regimes is that the random variable x undergoes
a Markov stochastic process in the ODL case, but is non-Markovian in the UDL case [37, 38]. The simple
interpretation of this non-Markovianity is that the underdamped system exhibits inertia, which is a form
of memory-dependence. This inertia has a non-trivial (and sometimes beneficial) effect on the algorithm’s
overall performance, which is apparent from the scaling results in Table I.
2. Energy-Time Tradeoff
Note the appearance of the ratio ∥A∥/∥b∥2 in the time required to solve a linear system given by Eqs.
(12) and (14). It is tempting to imagine that one might solve the system faster simply by multiplying b by
some constant c. Then, the time required to solve the system is apparently reduced by a factor of c2 , and
the solution to original problem is obtained (up to a factor of c). A similar approach would be to multiply A
by a small number; of course, in practice it is not possible to solve linear systems of equations in vanishingly
short periods of time, which is reflected in an energy-time tradeoff. As we explain in the Methods and
Supplemental Information, there is an energy cost associated with initializing the system proportional to
b⊺ A−1 b, and this results in a re-formulation of Eqs. (12) and (14) as lower bounds on the product of energy
and time. If E is the energy required to solve the system Ax = b, and τ is the necessary time, then we have,
in the overdamped case:
2κ2 d
Eτ ⩾ β −1 τr (15)
ε2x (1 − Pε )
6
10−1
kx − A−1 bk
200000
150000
tC (µs)
100000
50000
250 500 750 1000
d
10−2
102 103 104 105 106
Time (µs)
FIG. 3. Error of the linear system thermodynamic algorithm as a function of the analog integration
time for different dimensions. Matrices A are drawn from a Wishart distribution with 2d degrees of freedom.
Vertical dashed lines are the times tC at which error goes below a threshold (horizontal dashed line). Inset: Crossing
time tC as a function of dimension d, showing the linear relationship between dimension and the analog dynamics
runtime.
We now present several numerical experiments to corroborate our analytical results. Figure 3 displays the
convergence of the error as a function of time for our thermodynamic linear systems algorithm. This plot
shows that the expected convergence time to reach a given error is linearly proportional to the dimension of
the system, which is in agreement with the analytical bounds that we presented above.
Another question of key importance is how the thermodynamic algorithm is expected to perform in
practical scenarios, i.e., when being run on real thermodynamic hardware. Due to the hardware being
analog in nature, this involves additional digital-to-analog compilation steps. To investigate this question,
we consider a timing model for the thermodynamic algorithm, based on the hardware proposal described
Ref. [17] (See Appendix A 5 of the Supplemental Information for a brief summary of this hardware, whose
dynamics correspond to the overdamped regime as in Eq. (11)). This model includes all the digital, digital-
to-analog and analog operations needed to solve the problem, starting with a matrix A stored on a digital
device, and sending back the solution x from the thermodynamic system to the digital device. Note that
this includes a compilation step that scales as O(d2 ), which is absent for the digital methods2 . Assumptions
about this model are detailed in the Methods section. Note that analog imprecision is not taken into account
in these experiments, and is the subject of further investigations.
2 Cholesky and conjugate gradients are run on a digital computer, and the initial matrix is stored on that same computer,
hence there is no transfer cost, unlike for the thermodynamic algorithm.
7
FIG. 4. Comparison of the error εx of the thermodynamic algorithm (TA) to solve linear systems with
the conjugate gradient method and Cholesky decomposition as a function of total runtime. Panels a)-c):
the TA is shown for different values of kB T (units of 1/γ) for each dimension in {100, 1000, 5000}. Random matrices
are drawn from the Wishart distribution and then mixed with the identity such that their condition numbers are
respectively 120, 1189, 5995. Panels d)-f): same quantities with a fixed condition number κ, respectively 199, 1190,
and 7880 for fixed dimension d = 1000. Calculations were performed on an Nvidia Tesla A10 GPU.
where ||.||2 denotes the 2-norm, as a function of time for the thermodynamic algorithm (TA) as well as the
conjugate gradient (CG) method and the Cholesky decomposition, which is exact. In panels a) - c) we explore
how the methods converge with varying κ and d. While at low dimensions our method performs poorly with
respect to the Cholesky decomposition and only slightly better than CG, it becomes very competitive for
dimensions d = 1000 and d = 5000. Additionally, in panels d) - f), we show the error as a function of
time for different condition numbers, at fixed dimension. One can see that as κ grows (as conditioning
is worse) our method becomes more competitive with CG. This suggests that, even in practical scenarios
where we account for realistic computational overhead issues, our thermodynamic linear systems algorithm
can outperform SOTA digital methods, especially for large d and large κ.
Figure 4 also shows that the thermodynamic algorithm performs significantly better than the CG method
at early times, although the CG method ultimately achieves a higher quality result for later times. This
suggests that the thermodynamic algorithm is ideally suited to providing an approximate solution in a short
amount of time. Nevertheless, we note that the effective temperature of the thermodynamic hardware is
an important parameter, and one can lower this temperature to achieve higher precision solutions from the
thermodynamic hardware, as can be seen from the curves in Figure 4.
8
Overall, these numerical experiments highlight the potential utility of thermodynamic hardware by show-
ing the opportunity for speedup over SOTA digital methods, based on a simulated timing model of the
thermodynamic device.
The results of the previous section rely on estimating the mean of x, but make no use of the fluctuations
in x at equilibrium. By using the second moments of the equilibrium distribution, we can go beyond solving
linear systems. For example it is possible to find the inverse of a symmetric positive definite matrix A. As
mentioned, the stationary distribution of x is N [A−1 b, β −1 A−1 ], meaning the inverse of A can be obtained
by evaluating the covariance matrix of x. This can be accomplished in an entirely analog way, using a
combination of analog multipliers and integrators. By setting b = 0 for this protocol, we ensure that ⟨x⟩ = 0,
so the stationary covariance matrix is, by definition
Σs = lim ⟨x(t)x⊺ (t)⟩ . (18)
t→∞
In order to estimate this, we again perform time averages after allowing the system to come to equilibrium
Z
1 t0 +τ
Σs ≈ xx⊺ = dt x(t)x⊺ (t). (19)
τ t0
It is therefore necessary to have an analog component which evaluates the product xi (t)xj (t) for each pair
(i, j), resulting in d2 analog multiplier components. Each of these products is then fed into an analog
integrator component, which computes one element of the time-averaged covariance matrix
Z
1 t0 +τ
Σs,ij ≈ dt xi (t)xj (t). (20)
τ t0
While the equilibration time is the same as for the linear system protocol, the integration time is different,
because in general the covariance matrix is slower to converge than the mean. We now give a detailed descrip-
tion of the inverse estimation protocol, assuming ODL dynamics (the corresponding results for underdamped
dynamics can be found in Appendix B).
10
d = 64
d = 128
2
105 d d = 256
d = 512
tC
d = 1024
6 103
d
102 103
εÃ−1
d
4
2
103 104
Time (µs)
FIG. 5. Error of the inverse estimation thermodynamic algorithm as a function of the analog integration
time for different dimensions. Matrices A are drawn from a Wishart distribution with 2d degrees of freedom.
Vertical dashed lines are the times tC at which error goes below a threshold (horizontal dashed line). Inset: Crossing
time tC as a function of dimension d.
The timing parameters for the inverse estimation protocol (as derived in the Supplemental Information)
are, for the overdamped case,
1 4κd(d + 1)
t0 = κτr ln 2κε−1
b Σ0 , τb = τr , (25)
2 (1 − Pε )ε2Σ
Similar to our analysis for the linear systems protocol, let us now examine the performance of the inverse
estimation protocol in practical settings. To do so, we consider the error on the inverse, defined as
∥Ã − A∥F
εà = , (27)
∥A∥F
d = 5000
d = 1000
d = 100
100
Ã−1
10−1
10−3 10−2
Time (s)
FIG. 6. Comparison of the error εÃ−1 of the thermodynamic algorithm (TA) to solve linear systems
with the Cholesky decomposition as a function of total runtime. Dimensions are d = 100, 1000, 5000,
respectively in light green, light blue, and purple are shown, as well as the corresponding Cholesky decomposition
times as dashed lines. Here the condition numbers are respectively {120, 1189, 5995}. Calculations were performed
on an Nvidia Tesla A10 GPU.
In this section we assume we have access to a device with a controllable noise source such that the
covariance matrix of the noise term may be chosen to be an arbitrary symmetric positive definite matrix.
We do not include the linear b⊺ x term in the potential, and therefore obtain the following overdamped
Langevin equation
1 2
dx = − Axdt + N 0, R dt , (28)
γ γβ
where R is symmetric and positive definite. In this case, the stationary distribution has mean zero and
covariance matrix Σs , which is a solution to the Lyapunov equation
AΣs + Σs A⊺ = 2β −1 R. (29)
We propose the following protocol for solving the Lyapunov equation.
1. Given two symmetric positive definite matrices A and R, set the potential of the device to
1 ⊺
V (x) = x Ax, (30)
2
and the noise term in the overdamped Langevin equation to N 0, 2γ −1 β −1 Rdt at time t = 0.
That is, the system evolves under the dynamics of Eq. (28).
2. Choose equilibration tolerance parameter εΣ0 ∈ R+ , and choose the equilibration time
t0 ⩾ b
t0 , (31)
where bt0 is computed from the system’s physical properties, Eq. (25) or (26). Allow the system
to evolve under its dynamics until t = t0 , which ensures that Σ − β −1 A−1 b /∥β −1 A−1 ∥ ⩽ εΣ .
3. Choose error tolerance parameter δΣ and success probability Pδ , and choose the integration
time
τ ⩾ τb, (32)
11
where τb is computed from the system’s physical properties, Eq. (25) or (26). Use analog
multipliers and integrators to measure the the time averages
Z τ
1
xi xj = 2 dt xi (t)xj (t), (33)
τ t0
The timing parameters for the Lyapunov equation protocol (as derived in the Supplemental Information)
are, for the overdamped case,
b 1 ∥Σ0 − Σs ∥ + κβ −1 ∥A∥−1 4κd(d + 1)
t0 = κτr ln , τb = τr , (34)
2 δΣ0 (1 − Pε )ε2Σ
For the underdamped case, tb0 would be somewhat different, but τb would be the same, because the behavior of
the equilibrium correlation function does not depend on the noise (see Methods), so the same result derived
for the matrix inverse protocol is applicable. Note that Eq. (34) only vaguely determines the equilibration
time, as the target covariance matrix Σs is not known beforehand. The corresponding equilibration time tb0
for an underdamped system could also be evaluated in principle; however, this would only result in a similarly
vague expression, which is anyway not necessary to determine the asymptotic time-complexity scaling of the
algorithm, so it is not pursued here. Moreover, the relative error cannot be bounded as straightforwardly
as was done for the other protocols given that there is no explicit formula for the target covariance matrix.
For this reason, we have used absolute error as the error tolerance in the above protocol.
The determinant of the covariance matrix appears in the normalization factor of a multivariate normal
distribution, whose density function is
−1/2 1
fµ;Σ (x) = (2π)−d/2 |Σ| exp − x⊺ Σ−1 x , (35)
2
and it is therefore natural to wonder whether hardware which is capable of preparing a Gaussian distri-
bution may be used to somehow estimate the determinant of a matrix. This can in fact be done, as the
problem is equivalent to the estimation of free energy differences, an important application of stochastic
thermodynamics. Recall that the difference in free energy between equilibrium states of potentials V1 and
V2 is [39]
R !
−βV2 (x)
dx e
∆F = F2 − F1 = −β −1 ln R . (36)
dx e−βV1 (x)
Suppose the potentials are quadratic, with V1 (x) = x⊺ A1 x and V2 (x) = x⊺ A2 x. Then each integral simplifies
to the inverse of a Gaussian normalization factor,
Z q
dx e−βVj (x) = (2π)d/2 β −1 A−1 j , (37)
so
s ! s !
−1 A−1
2 −1 |A1 |
∆F = −β ln = −β ln . (38)
A−1
1 |A2 |
This suggests that the determinant of a matrix A1 can found by comparing the free energies of the equilibrium
states with potentials V1 and V2 (where A2 has known determinant), and then computing
Fortunately, the free energy difference ∆F can be found, assuming we have the ability to measure the work
which is done on the system as the potential V (x) is changed from V1 to V2 . According to the Jarzynski
equality [40], the free energy difference between the (equilibrium) states in the initial and final potential is
where ⟨·⟩ denotes an average over all possible trajectories of the system between time t = 0 and time t = τ ,
weighed by their respective probabilities. This may be approximated by an average over N repeated trials,
N
1 X −βWj
e−β∆F ≈ e−βW ≡ e . (41)
N j=1
However, while Jarzynski’s relation may be applied directly to estimate the free energy difference, this
estimator has large bias and is slow to converge. Far more well-behaved estimators have been found based
on work measurements. For simplicity, we here provide the expression based on Jarzynski’s estimator, while
in the Methods section and the Supplemental Information we refer to more suitable estimators. In summary,
the determinant of A1 is approximated by
2
|A1 | ≈ e−βW |A2 | . (42)
In practice we will generally be interested in the log determinant to avoid computational overflow. This is
ln (|A1 |) ≈ 2 ln e−βW + ln (|A2 |) . (43)
It is shown in the Supplemental Information that to estimate the log determinant to within (absolute) error
δLD with probability at least Pδ , the total amount of time required is roughly
d ln(κ)2 1
τ≈ 2 ln χ2 κ3/2 ε−1
Σ0 2
+ 1 τr(UD) = O(d ln(κ)3 ). (44)
δLD (1 − Pδ ) 4ζmax
We also present numerical simulations of a protocol for determinant estimation that does not include directly
measuring the work in Appendix C 1 of the Supplemental Information.
III. Discussion
Various types of physics-based computers have been devised, which are supposed to expedite calculations
by using physical processes to evaluate expensive functions [4, 7, 41–43]. These devices (which include
quantum computers and a number of distinct analog architectures) have been shown to offer theoretical
advantages for solving certain problems, including linear systems of equations, but they have not found
common use commercially. A key obstacle to harnessing the power of physical computing is that fluctuations
in the system’s state tend to cause errors that compound over time, and which cannot be corrected in a
straightforward way [44] (as can be done for digital computers).
For this reason, we have considered thermodynamic algorithms, which treat the naturally-present fluctu-
ations as a resource, or at the very least are indifferent to them. In fact, we have introduced three distinct
classes of thermodynamic algorithms: first-moment based, second-moment based, and all-moment based
algorithms. Other thermodynamic algorithms will likely be discovered making use of third and higher mo-
ments, implying that such methods form a hierarchy. In some sense, using higher moments allows us to
solve “harder” problems, for example inverting a matrix (which uses the second moments) is harder than
solving a linear system of equations (which uses the first moment). Whether a precise relationship can be
found between computational hardness and the hierarchy of thermodynamic algorithms is currently an open
question.
Another open question concerns the optimality of these new thermodynamic algorithms. Our analysis
implies that, while the time and energy costs of linear-algebraic primitives are negotiable, the product of time
and energy necessary for a computation is fundamentally constrained. It is therefore of interest to search for
thermodynamic algorithms which achieve lower values of the energy-time product for these computations, and
also to see whether such constraints may apply to other problems as well. We anticipate that non-equilibrium
thermodynamics will be a crucial tool in exploring such resource tradeoffs for computation. For example,
we have used the fact that a thermodynamic distance may be defined between equilibrium configurations
13
of a system, and this distance determines the minimal amount of dissipated energy necessary to transition
from one configuration to another in a finite time [45–49]; the shorter the time of transition, the more
dissipation must occur. Perhaps, then, the search for algorithms which have minimal energy-time product
may be framed as a variational problem of minimizing length on the thermodynamic manifold. Although
proofs of optimal algorithmic performance are notoriously hard to find in digital computing paradigms [50],
unavoidable resource tradeoffs are relatively mundane in thermodynamic analyses [51, 52], suggesting that
computational cost may be fruitfully studied within the thermodynamic computing paradigm.
Aside from the theoretical questions mentioned, clearly the task of actually demonstrating the methods
described here remains an important one. An obvious priority is to run our linear systems algorithm on a
hardware prototype to verify that the time necessary scales linearly with dimension, thereby demonstrating
an advantage over the best known classical algorithms. In order of increasing difficulty, similar tests can
then be conducted for the matrix inversion, Lyapunov equation, and determinant algorithms, although the
latter will likely require very specialized hardware. Once this has been accomplished, we anticipate that the
methods will be appealing alternatives to digital algorithms, particularly in settings where it is desirable to
trade some accuracy for better time and energy scaling.
In addition to our work’s direct impact, the broader impact is laying the theoretical, mathematical founda-
tions for the emerging paradigm of thermodynamic computing [25]. Our work provides the first mathematical
analysis, as well as the first numerical benchmarks, of potential speedups for thermodynamic hardware. Thus
we have taken the somewhat vague notion of thermodynamic computing and made it concrete and precise,
with a clear set of applications. Moving forward, we expect new applications to be discovered, beyond linear
algebra, since one can simply modify the potential energy function V (x) to solve, e.g., non-linear algebraic
problems. There is also the exciting prospect of running multiple applications, such as the linear algebra ones
here and the probabilistic AI ones discussed in Ref. [17], on the same thermodynamic hardware, providing
the user with a flexible programming experience. One can envision that much of the amazing technolog-
ical developments (compilers, simulators, programming languages, etc.) that have happened in quantum
computing will likely happen for thermodynamic computing in the near future.
IV. Methods
A. Thermodynamic Framework
The algorithms detailed above were developed using simple thermodynamic arguments, and the analysis
of their performance makes use of ideas from the field of stochastic thermodynamics. We provide a brief
summary of the relevant concepts here.
1. Equilibrium
Suppose that a system’s state can be completely described (on a microscopic level) by a vector of gener-
alized coordinates x = (x1 . . . xd )⊺ and canonically conjugate momenta p = (p1 . . . pd )⊺ , and that the energy
of the system is given by a Hamiltonian function H(x, p)
A statistical ensemble is an imaginary collection of copies of this system, each of which has its own coordinates
(x, p) and energy E. The relative density of copies in different parts of the coordinate
R space is described
by a non-negative density function f (x, p), satisfying a normalization constraint drf (x, p) = 1. Ensembles
describe macroscopic states (macrostates) of complex systems, where the microscopic state (microstate) is
not precisely known. An observable quantity which depends on the microstate O(r) has an ensemble average
⟨O⟩, defined as
Z
⟨O⟩ = dx dp f (x, p)O(x, p). (46)
One such ensemble, called the canonical ensemble, has the density function [53]
1
fβ (r) = exp (−βH(x, p)) , (47)
Z
14
The canonical ensemble is the macrostate of a system in thermal equilibrium with an environment at tem-
perature T , which is large and does not change in response to changes in the system, a heat reservoir.
In nature, we do not encounter anywhere the infinite collection of copies implied by the statistical ensemble;
instead we have access to one particular instance of a system at different moments in time. The usage of the
ensemble concept is justified by the property of ergodicity observed empirically in many systems. A system
is ergodic insofar as its observable quantities may be averaged over time to yield the appropriate ensemble
averages, that is [54]
Z
1 τ
O= dt O(t) = ⟨O⟩ (49)
τ t0
for any t0 and sufficiently long τ . This characterization of ergodicity should not be taken as a definition, but
more of a template; in practice, we will only claim (and prove) that some particular observable O is ergodic
when averaged over a sufficiently long length of time, which depends on the observable.
Of interest in this work are Hamiltonians of the form
1 ⊺
H(x, p) = V (x) + p p, (50)
2M
where M ∈ R+ and the potential is of the form
1 ⊺
V (x) = x Ax − b⊺ x, (51)
2
for a symmetric, positive-definite A ∈ Rd×d and b ∈ Rd . The above can be written
1 1 1 ⊺
H(x, p) = (x − A−1 b)⊺ A(x − A−1 b) − b⊺ A−1 b + p p. (52)
2 2 2M
As a rule, the Hamiltonian may be increased or decreased by a constant without any observable effect, and
equivalently any term appearing in the Hamiltonian that doesn’t depend on x or p may be absorbed into
the partition function. In this case, we may take
1 1 ⊺
H(x, p) = (x − A−1 b)⊺ A(x − A−1 b) + p p, (53)
2 2M
so the density function of the canonical ensemble is
1 β −1 ⊺ −1 1 ⊺
f (x, p) = exp − (x − A b) A(x − A b) + p p , (54)
Z 2 2M
which we often write as
x ∼ N [A−1 b, β −1 A−1 ], p ∼ N 0, β −1 M I . (55)
2. Stochasticity
The property of ergodicity outlined previously is very strict, and is generally not valid on small timescales,
where the system may display rich behavior which is not captured by the long time averages. This behavior
is the province of stochastic thermodynamics. Due to the largeness of the heat reservoir, its interactions with
the system result in effectively random displacements of the coordinate vector, which we model by Brownian
motion. In the absence of any other dynamics, (anisotropic) Brownian motion is given by the equation [55]
dx = N [0, Bdt], (56)
where B ∈ Rd×d is symmetric and positive definite, and we abuse notation by writing N [µ, Σ] to denote a
vector drawn from this distribution. Equation (56) is self-consistent in that subdividing the time step dt
into two equal parts gives
x(t + dt) = x(t) + N [0, B dt/2] + N [0, B dt/2], (57)
15
and because adding two random normal vectors can be accomplished by summing their means and covariance
matrices, we recover Eq. (56). If x(0) is a random normal vector with mean µ0 and covariance matrix Σ0
then, under Eq. (56), the distribution at a later time t is
Note that this implies that the limit limt→∞ Σ(t) does not exist, meaning there is no stationary distribution
for the pure Brownian process. The overdamped Langevin (ODL) equation is a useful generalization of
Brownian motion which includes a drift term determined by the potential [56, 57]:
1 1
dx = − ∇V (x)dt + N [0, Bdt], (59)
γ γ
where γ ∈ R+ is called the damping constant. When V is given by Eq. (51), the ODL equation can be put
in a particularly simple form by making a change of variables y = x − A−1 b, in terms of which the potential
is V (y) = 12 y ⊺ Ay. We also define A = γ −1 A, as well as B = γ −2 B, and now Eq. (59) reads
which is the defining equation of the Ornstein-Uhlenbeck (OU) process (with the requirement that A is
positive definite). The OU process is a Gaussian process [58], meaning that if y is initially a Gaussian
random variable then it will be Gaussian at all times. Suppose this is the case, and x0 ∼ N [µ0 , Σ0 ]. The
mean µ(t) = ⟨y(t)⟩ is given by
The second moments of y(t) can also be found explicitly. Because the process is indexed both by the
components of y and by the time, the covariances are collected in a matrix-valued function of two time
parameters,
The OU process has a unique stationary distribution, which is a Gaussian N [0, Σs ], where Σs is the unique
solution to the equation
AΣs + Σs A⊺ = B. (65)
Notice that if B = γ −2 B = 2γ −1 β −1 I, then Eq. (65) implies that the stationary solution to the ODL
equation is
agreeing with Eq. (55). When the system is in its stationary distribution, the correlation function can be
simplified
when s ⩾ t, and Gs (t, s) = Gs (s, t)⊺ . These results are quite useful due to the wide variety of cases where the
OU process is applicable. For example, Eq. (59) may be modified to account for inertial effects, resulting in
the underdamped Langevin (UDL) equations, which are
1
dx = p dt, (68)
M
16
γ
dp = −∇V dt − p dt + N [0, Bdt], (69)
M
where M, γ ∈ R+ . Taking the potential in (51), Eq. (69) reads
γ
dp = −(Ax − b) dt − p dt + N [0, Bdt]. (70)
M
In order to put the UDL equations into the form of an OU process, we first define dimensionless coordinates
r r
γ2β β
ỹ = y, p̃ = p, (71)
M M
and define the vector r as the concatenation of the dimensionless position and momentum vectors
⊺
r = (ỹ1 , . . . ỹd , . . . p̃1 , . . . p̃d ) . (72)
Then we have
dr = −Ar dt + N [0, Bdt], (73)
where the matrices A and B may be written as block matrices, each with four d × d blocks
! !
0 −γM −1 I 0 0
A= , B= . (74)
γ −1 A γM −1 I 0 βM −1 B
3. Thermodynamics
Thermodynamics is concerned with the relationships between macroscopically observable quantities which
are well-characterized by their equilibrium ensemble averages. For example, the average energy is defined as
Z
⟨E⟩ = dxH(x, p)fβ (x, p). (76)
Xk Xk
∂H dλi
dW = dt = − Xi λ̇i , (79)
i=1
∂λi dt i=1
17
F = −β −1 ln(Z), (83)
and can be interpreted as the amount of energy in the system that can be converted to work at constant
temperature β −1 . For the Hamiltonian given in Eq. (52), we get
1 1 d β
F = − b⊺ A−1 b + ln (|A|) − ln √ , (84)
2 2β β 2π M
and so two states with the same temperature have free energy difference
1 ⊺ −1 1 |A2 |
∆F = F2 − F1 = (b A b1 − b⊺2 A−1
2 b2 ) + ln . (85)
2 1 1 2β |A1 |
The second law reads
⟨W ⟩ ⩾ ∆F. (86)
Whereas Eq. (86) is an inequality, Jarzynski identifies the following equality [40]
and it follows from Jensen’s inequality [60] that the latter implies Eq. (86). In the limit of an infinitely slow
(quasistatic) process we would have ⟨W ⟩ = W = ∆F , but in general there will be some excess (dissipated)
work,
Intuitively, there is more dissipation when the system is driven between very different states in a short period
of time, as friction-like effects are more significant when the system changes quickly. This notion is made
rigorous by introducing the thermodynamic metric tensor g [45, 61]
where the average is over the canonical ensemble with inverse temperature β and control parameters λ. The
thermodynamic length L of a trajectory in the control parameter space λ(t) is then given by
Z τ q
L= dt λ̇⊺ g λ̇. (90)
0
Suppose that a system is driven from one state to another, and then back again by the reverse process, and
that it is allowed to come to equilibrium N times during both the forward and reverse process. It has been
shown [45] that the excess work during the combined forward and reverse process (called the hysteresis) is
lower bounded as
L2
Wex ⩾ . (91)
N
18
Interestingly, the thermodynamic length is also equivalent to the Fisher-Rao distance, which is [62]
s
Z τ Z
f˙(x)2
L=β dt dx . (92)
0 f (x)
The above relations imply that the more distinguishable the initial and final distributions, and the less time
allowed to evolve between them, the more work will be dissipated.
Equation (87) suggests that the free energy difference can be estimated by repeatedly transforming one
potential V1 into another V2 and measuring the work done in the process (nJ times, say), then taking an
average. The estimate of the free energy obtained in this way is called the Jarzynski estimator ∆F̂J
nJ
1 X
∆F ≈ ∆F̂J = e−βWj . (93)
nJ j=1
The Jarzynski estimator is theoretically sound, but in practice it can be very slow to converge as it is
dominated by terms with large negative work, and there are very few such terms in the average because the
occurrence of large negative work is unlikely [63]. This problem is mitigated somewhat by using an estimator
which makes use of both the forward and reverse process. The Bennett Acceptance Ratio (BAR) estimator
∆F̂BAR is defined implicitly as the solution to [64, 65]
N
X N
X
1 1
= , (94)
βW(f)i −β∆F̂BAR −βW(r)i +β∆F̂BAR
i=1 1+e i=1 1+e
where it is assumed that there are N realizations of both the forward and reverse process, resulting in work
measurements of W(f)i and W(r)i . This estimator can be generalized to the Multistate Bennett Acceptance
Ratio [66] (MBAR), where intermediate equilibrium states are used between the initial and final states [66].
Interestingly, the variance of any unbiased estimator of the free energy difference is lower bounded in terms
of the thermodynamic length between the terminal states and the total number of observations n, which
corresponds to the number of times that equilibrium must be reached during the protocol [61]
β −2 L2
Var(∆F̂ ) ⩾ . (95)
n
B. Assumptions
A number of assumptions are made in the analytical derivations of the findings presented in the Results
section. Certain aspects of the problem have been idealized in order to reveal the fundamental performance
characteristics of the thermodynamic algorithms. The main assumptions are the following
• The dynamics of the system may be described by the ODL equation or the UDL equations.
• The potential function V (x), and in particular the matrix A and vector b, can be switched instanta-
neously between different values.
• The potential energy function V (x) can be implemented to arbitrary accuracy.
• Before a protocol begins, the system may be taken to be in an equilibrium distribution N [0, β −1 ∥A∥−1 I].
C. Numerical simulations
We outline the methods used for our numerical simulations of the thermodynamic algorithms. In general,
we simulated the overdamped system dynamics because the performance is similar to the underdamped case,
and the overdamped system is more numerically stable. The ODL Langevin equation (Eq. (60)) is often
written using an Itô integral [67]:
Z t
′
x(t) = e−At x0 + dt′ e−A(t−t ) LdWt , (96)
0
19
where LL⊺ = B, and in this form it is apparent that the deterministic and stochastic parts of the evolution
(the first and second terms above) can be evaluated separately. As this is a Gaussian process, the corre-
sponding Fokker-Planck dynamics are fully captured by the behavior of the first and second moments, which
can be evaluated directly using Eqs. (61) and (63). The JAX [68] library was used to simulate the system
at high dimensions, leveraging efficient implementations of matrix exponentials, diagonalization, and convo-
lution to evaluate the various terms in Eqs. (61), (63), and (96). The goal of these numerical simulations
is to examine the performance of the thermodynamic solver in practical settings. To do so, we examine
the convergence of the solutions with respect to the number of samples, as well as timing estimates for an
implementation using electrical circuits.
1. Timing model
To obtain the comparisons to other digital methods, we considered the following procedure to run the TA
on electrical hardware. For more details on our model for the hardware implementation we refer the reader to
Appendix A 5. We take the RC = 1/γ = 1µs, which sets the characteristic timescale of our thermodynamic
device. The determinant estimation procedure is excluded here for clarity, as it involves directly measuring
work, which may involve a more complicated hardware proposal.
1. Compute the values of the resistors {Rij , Ri′ } entering the matrix J that encodes the A matrix.
2. Digital-to-analog (DAC) conversion of the J matrix and the b vector with a given bit-precision.
3. Let the dynamics run for t0 (the equilibration time). Note that for simulations this time was chosen
heuristically by exploring convergence in the solutions of the problem of interest.
4. Switch on the integrators (and multipliers for the inverse estimation) and let the system evolve for
time τ .
5. Analog-to-digital (ADC) conversion of the solution outputted from the integrators sent back to the
digital device.
For step 1), we measured the time for the digital operation to be performed, and for the other steps we
estimated the time, based on the following assumptions:
• 16 bit-precision
• 5000 ADC/DAC channels with a sampling rate: 250 Msamples/s.
• R = 103 Ω, C = 1 nF, which means RC = 1µs is the characteristic timescale of the system.
Finally, note that in all cases that were investigated, the dominant contribution to the total runtime was
the digital compilation step that includes O(d2 ) operations. Namely the conversion from the matrix A to
J , which is detailed in Appendix A 5. Hence some assumptions about the DAC/ADC may be relaxed and
the total thermodynamic run time would be similar. The RC time constant may also be reduced to make
the algorithm faster.
[1] Sri Krishna Vadlamani, Tianyao Patrick Xiao, and Eli Yablonovitch, “Physics successfully implements lagrange
multiplier optimization,” Proc. Natl. Acad. U.S.A. 117, 26639–26650 (2020).
[2] Naeimeh Mohseni, Peter L. McMahon, and Tim Byrnes, “Ising machines as hardware solvers of combinatorial
optimization problems,” Nat. Rev. Phys. 4, 363–379 (2022).
[3] Takahiro Inagaki, Yoshitaka Haribara, Koji Igarashi, Tomohiro Sonobe, Shuhei Tamate, Toshimori Honjo, Alireza
Marandi, Peter L. McMahon, Takeshi Umeki, Koji Enbutsu, Osamu Tadanaga, Hirokazu Takenouchi, Kazuyuki
Aihara, Ken-ichi Kawarabayashi, Kyo Inoue, Shoko Utsunomiya, and Hiroki Takesue, “A coherent ising machine
for 2000-node optimization problems,” Science 354, 603–606 (2016).
[4] Richard P. Feynman, “Simulating physics with computers,” Int. J. Theor. Phys. 21, 467–488 (1982).
[5] Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd, “Quantum algorithm for linear systems of equations,”
Phys. Rev. Lett. 103, 150502 (2009).
[6] Artur Scherer, Benoît Valiron, Siun-Chuon Mau, Scott Alexander, Eric Van den Berg, and Thomas E. Chapu-
ran, “Concrete resource analysis of the quantum linear-system algorithm used to compute the electromagnetic
scattering cross section of a 2D target,” Quantum Inf. Process. 16, 1–65 (2017).
20
[7] John Preskill, “Quantum computing in the nisq era and beyond,” Quantum 2, 79 (2018).
[8] Carlos Bravo-Prieto, Ryan LaRose, M. Cerezo, Yigit Subasi, Lukasz Cincio, and Patrick Coles, “Variational
quantum linear solver,” arXiv:1909.05820 (2019).
[9] Xiaosi Xu, Jinzhao Sun, Suguru Endo, Ying Li, Simon C. Benjamin, and Xiao Yuan, “Variational algorithms
for linear algebra,” Sci. Bull. 66, 2181–2188 (2021).
[10] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R.
McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J. Coles, “Variational quantum algorithms,”
Nat. Rev. Phys. 3, 625–644 (2021).
[11] Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, and Hartmut Neven, “Barren plateaus
in quantum neural network training landscapes,” Nat. Commun. 9, 1–6 (2018).
[12] M. Cerezo, Akira Sone, Tyler Volkoff, Lukasz Cincio, and Patrick J. Coles, “Cost function dependent barren
plateaus in shallow parametrized quantum circuits,” Nat. Commun. 12, 1–12 (2021).
[13] Samson Wang, Enrico Fontana, M. Cerezo, Kunal Sharma, Akira Sone, Lukasz Cincio, and Patrick J. Coles,
“Noise-induced barren plateaus in variational quantum algorithms,” Nat. Commun. 12, 1–11 (2021).
[14] Can Li, Miao Hu, Yunning Li, Hao Jiang, Ning Ge, Eric Montgomery, Jiaming Zhang, Wenhao Song, Noraica
Dávila, Catherine E. Graves, Zhiyong Li, John Paul Strachan, Peng Lin, Zhongrui Wang, Mark Barnell, Qing
Wu, R. Stanley Williams, and J. Joshua Yang, “Analogue signal and image processing with large memristor
crossbars,” Nat. Electron. 1, 52–59 (2018).
[15] Su-in Yi, Jack D. Kendall, R. Stanley Williams, and Suhas Kumar, “Activity-difference training of deep neural
networks using memristor crossbars,” Nat. Electron. 6, 45–51 (2023).
[16] Yipeng Huang, Ning Guo, Mingoo Seok, Yannis Tsividis, and Simha Sethumadhavan, “Evaluation of an analog
accelerator for linear algebra,” Comput. Archit. News 44, 570–582 (2016).
[17] Patrick J. Coles, Collin Szczepanski, Denis Melanson, Kaelan Donatella, Antonio J. Martinez, and Faris Sbahi,
“Thermodynamic AI and the fluctuation frontier,” (2023), arXiv:2302.06584 [[Link]].
[18] Navid Anjum Aadit, Andrea Grimaldi, Mario Carpentieri, Luke Theogarajan, John M. Martinis, Giovanni
Finocchio, and Kerem Y. Camsari, “Massively parallel probabilistic computing with sparse Ising machines,”
Nat. Electron. 5, 460–468 (2022).
[19] Kerem Y. Camsari, Brian M. Sutton, and Supriyo Datta, “p-bits for probabilistic spin logic,” Appl. Phys. Rev.
6, 011305 (2019).
[20] J. Kaiser, S. Datta, and B. Behin-Aein, “Life is probabilistic—why should all our computers be deterministic?
computing with p-bits: Ising solvers and beyond,” in 2022 International Electron Devices Meeting (IEDM) (IEEE,
2022) pp. 21–4.
[21] Todd Hylton, “Thermodynamic neural network,” Entropy 22, 256 (2020).
[22] Todd Hylton, “Thermodynamic state machine network,” Entropy 24, 744 (2022).
[23] Natesh Ganesh, “Rebooting neuromorphic design-a complexity engineering approach,” in 2020 International
Conference on Rebooting Computing (ICRC) (IEEE, 2020) pp. 80–89.
[24] Natesh Ganesh, “A thermodynamic treatment of intelligent systems,” in 2017 IEEE International Conference
on Rebooting Computing (ICRC) (2017) pp. 1–4.
[25] Tom Conte, Erik DeBenedictis, Natesh Ganesh, Todd Hylton, John Paul Strachan, R Stanley Williams, Alexan-
der Alemi, Lee Altenberg, Gavin E. Crooks, James Crutchfield, et al., “Thermodynamic computing,” arXiv
preprint arXiv:1911.01968 (2019).
[26] Ryan Babbush, Dominic W Berry, Robin Kothari, Rolando D Somma, and Nathan Wiebe, “Exponential quan-
tum speedup in simulating coupled classical oscillators,” arXiv preprint arXiv:2303.13012 (2023).
[27] Patrick C Parks, “Am lyapunov’s stability theory—100 years on,” IMA journal of Mathematical Control and
Information 9, 275–303 (1992).
[28] Jonathan Richard Shewchuk et al., “An introduction to the conjugate gradient method without the agonizing
pain,” (1994).
[29] Sara Robinson, “Toward an optimal algorithm for matrix multiplication,” SIAM news 38, 1–3 (2005).
[30] Richard H. Bartels and George W Stewart, “Solution of the matrix equation ax+ xb= c [f4],” Communications
of the ACM 15, 820–826 (1972).
[31] Dariusz Dereniowski and Marek Kubale, “Cholesky factorization of matrices in parallel and ranking of graphs,” in
Parallel Processing and Applied Mathematics: 5th International Conference, PPAM 2003, Czestochowa, Poland,
September 7-10, 2003. Revised Papers 5 (Springer, 2004) pp. 985–992.
[32] Giovanni Gallavotti, “Ergodicity, ensembles, irreversibility in boltzmann and beyond,” Journal of Statistical
Physics 78, 1571–1589 (1995).
[33] Yakov Grigor’evich Sinai, “On the foundations of the ergodic hypothesis for a dynamical system of statistical
mechanics,” in Doklady Akademii Nauk, Vol. 153 (Russian Academy of Sciences, 1963) pp. 1261–1264.
[34] Adriaan Daniël Fokker, “Die mittlere energie rotierender elektrischer dipole im strahlungsfeld,” Annalen der
Physik 348, 810–820 (1914).
[35] Rep Kubo, “The fluctuation-dissipation theorem,” Reports on progress in physics 29, 255 (1966).
[36] J Weber, “Fluctuation dissipation theorem,” Physical Review 101, 1620 (1956).
[37] Timo J Doerries, Sarah AM Loos, and Sabine HL Klapp, “Correlation functions of non-markovian systems
out of equilibrium: Analytical expressions beyond single-exponential memory,” Journal of Statistical Mechanics:
Theory and Experiment 2021, 033202 (2021).
21
[38] Lutz H’walisz, Peter Jung, Peter Hänggi, Peter Talkner, and Lutz Schimansky-Geier, “Colored noise driven
systems with inertia,” Zeitschrift für Physik B Condensed Matter 77, 471–483 (1989).
[39] Clara D Christ, Alan E Mark, and Wilfred F Van Gunsteren, “Basic ingredients of free energy calculations: a
review,” Journal of computational chemistry 31, 1569–1582 (2010).
[40] Christopher Jarzynski, “Nonequilibrium equality for free energy differences,” Physical Review Letters 78, 2690
(1997).
[41] Wilfried Haensch, Tayfun Gokmen, and Ruchir Puri, “The next generation of deep learning hardware: Analog
computing,” Proceedings of the IEEE 107, 108–122 (2018).
[42] James S Small, “General-purpose electronic analog computing: 1945-1965,” IEEE Annals of the History of Com-
puting 15, 8–18 (1993).
[43] Michael A. Nielsen and Isaac L. Chuang, Quantum Computation and Quantum Information (Cambridge Uni-
versity Press, Cambridge, 2000).
[44] Yipeng Huang, Ning Guo, Mingoo Seok, Yannis Tsividis, and Simha Sethumadhavan, “Analog computing in a
modern context: A linear algebra accelerator case study,” IEEE Micro 37, 30–38 (2017).
[45] Gavin E. Crooks, “Measuring thermodynamic length,” Phys. Rev. Lett. 99, 100602 (2007).
[46] Carlo Cafaro, Orlando Luongo, Stefano Mancini, and Hernando Quevedo, “Thermodynamic length, geometric
efficiency and legendre invariance,” Physica A: Statistical Mechanics and its Applications 590, 126740 (2022).
[47] Hernando Quevedo, “Geometrothermodynamics,” Journal of Mathematical Physics 48 (2007).
[48] Bjarne Andresen, “Finite-time thermodynamics and thermodynamic length,” Revue générale de thermique 35,
647–650 (1996).
[49] Jin-Fu Chen, CP Sun, and Hui Dong, “Extrapolating the thermodynamic length with finite-time measurements,”
Physical Review E 104, 034117 (2021).
[50] Ramprasad Saptharishi, “A survey of lower bounds in arithmetic circuit complexity,” Github survey 95 (2015).
[51] Giulio Chiribella, Fei Meng, Renato Renner, and Man-Hong Yung, “The nonequilibrium cost of accurate infor-
mation processing,” Nature Communications 13, 7155 (2022).
[52] Paul M Riechers, “Transforming metastable memories: The nonequilibrium thermodynamics of computation,”
arXiv preprint arXiv:1808.03429 (2018).
[53] Herbert B Callen, “Thermodynamics and an introduction to thermostatistics,” (1998).
[54] Robert Zwanzig, Nonequilibrium statistical mechanics (Oxford university press, 2001).
[55] Joseph L Doob, “The brownian movement and stochastic equations,” Annals of Mathematics , 351–369 (1942).
[56] Don S Lemons and Anthony Gythiel, “Paul langevin’s 1908 paper “on the theory of brownian motion”[“sur la
théorie du mouvement brownien,” cr acad. sci.(paris) 146, 530–533 (1908)],” American Journal of Physics 65,
1079–1081 (1997).
[57] Takahiro Hatano and Shin-ichi Sasa, “Steady-state thermodynamics of langevin systems,” Physical review letters
86, 3463 (2001).
[58] David JC MacKay et al., “Introduction to gaussian processes,” NATO ASI series F computer and systems sciences
168, 133–166 (1998).
[59] Udo Seifert, “Stochastic thermodynamics: principles and perspectives,” The European Physical Journal B 64,
423–431 (2008).
[60] Christopher Jarzynski, “Equalities and inequalities: Irreversibility and the second law of thermodynamics at the
nanoscale,” Annu. Rev. Condens. Matter Phys. 2, 329–351 (2011).
[61] Daniel K. Shenfeld, Huafeng Xu, Michael P. Eastwood, Ron O. Dror, and David E. Shaw, “Minimizing ther-
modynamic length to select intermediate states for free-energy calculations and replica-exchange simulations,”
Phys. Rev. E 80, 046705 (2009).
[62] Shun-ichi Amari and Hiroshi Nagaoka, Methods of information geometry, Vol. 191 (American Mathematical Soc.,
2000).
[63] Christopher Jarzynski, “Rare events and the convergence of exponentially averaged work values,” Physical Review
E 73, 046105 (2006).
[64] Michael R. Shirts, Eric Bair, Giles Hooker, and Vijay S. Pande, “Equilibrium free energies from nonequilibrium
measurements using maximum-likelihood methods,” Phys. Rev. Lett. 91, 140601 (2003).
[65] Charles H Bennett, “Efficient estimation of free energy differences from monte carlo data,” Journal of Computa-
tional Physics 22, 245–268 (1976).
[66] Michael R Shirts and John D Chodera, “Statistically optimal analysis of samples from multiple equilibrium
states,” The Journal of chemical physics 129 (2008).
[67] Crispin W. Gardiner, Handbook of stochastic methods, Vol. 3 (springer Berlin, 1985).
[68] Roy Frostig, Matthew James Johnson, and Chris Leary, “Compiling machine learning programs via high-level
tracing,” Systems for Machine Learning 4 (2018).
[69] Lambert H Koopmans, The spectral analysis of time series (Elsevier, 1995).
[70] Harald Cramér, Mathematical methods of statistics, Vol. 43 (Princeton university press, 1999).
[71] Frank Nielsen, “A simple approximation method for the Fisher–Rao distance between multivariate normal dis-
tributions,” Entropy 25, 654 (2023).
22
A. Overdamped Dynamics
1. Stationary Distribution
We set B = 2γβ −1 I (see Methods), where β = (kB T )−1 is the inverse temperature of the environment. We
then change variables to y = x − A−1 b, and have the transformed equation
1 2
dy = − Aydt + N 0, I dt , (A2)
γ βγ
where A = γ −1 A and B = 2γ −1 β −1 I. The stationary distribution for y then has mean zero (which follows
from Eq. (61)) and its variance Σs satisfies Eq. (65),
AΣs + Σs A = B. (A4)
Σs = β −1 A−1 . (A6)
The uniqueness of this solution is guaranteed, because Eq. (A6) always has a unique solution when A is
positive definite. Transforming back to the original coordinates, we see that at equilibrium x is distributed
as
Alternatively we may take B = 2γβ −1 R (or B = 2γ −1 β −1 R) for some symmetric positive-definite matrix R.
Then we would have stationary covariance matrix satisfying
AΣs + Σs A = 2β −1 R, (A8)
which is a Lyapunov equation. In the latter case, the stationary mean of x is still A−1 b.
We assume that the initial distribution is Gaussian. After some time has passed, the system will be in the
equilibrium distribution described by Eq. (A7). Here we quantify the amount of time one should wait (called
23
the equilibration time) to ensure that the distribution is arbitrarily close to the equilibrium distribution.
Note that because A is symmetric positive definite, the spectral norm of e−At/γ is
Given δµ0 > 0, in order to have ⟨x(t0 )⟩ − A−1 b ⩽ δµ0 , it is sufficient to require
!
γ ⟨x(0)⟩ − A−1 b
t0 ⩾ ln . (A12)
αmin δµ0
We assume the initial distribution has mean zero, ⟨x(0)⟩ = 0. Defining the condition number κ = αmax /αmin ,
and the relative error tolerance εµ0 = δµ0 /∥A−1 b∥ it is sufficient to require
t0 ⩾ κγ∥A∥−1 ln κε−1 µ0 . (A13)
The correlation matrix is defined by
G(t, s) = ⟨[x(t) − ⟨x(t)⟩][x(s) − ⟨x(s)⟩]⊺ ⟩ (A14)
For t = s, the correlation matrix reduces to the covariance matrix
G(t, t) = ⟨[x(t) − ⟨x(t)⟩][x(t) − ⟨x(t)⟩]⊺ ⟩ = Σ(t) (A15)
For an OU process, the dynamics of the correlation matrix (and the covariance matrix in particular) can be
expressed in terms of the following propagator function
⊺
Pt (X) = e−At/γ Xe−A t/γ
. (A16)
The covariance matrix for y is then given by
Z t
Σ(t) = Pt (Σ0 ) + dt′ Pt′ (B), (A17)
0
which is given by
Gs (τ ) = e−Aτ /γ Σs . (A36)
Therefore
∥Gs (τ )∥ ⩽ ∥e−At/γ ∥ · ∥Σs ∥ ⩽ e−αmin τ /γ ∥Σs ∥ . (A37)
Suppose we sample once per interval τc . To obtain samples with bounded correlation ∥Gs ∥ ⩽ δG , we need
γ ∥Σs ∥
τc ⩾ ln . (A38)
αmin δG
Defining εG = δG /∥Σs ∥, we get
τc ⩾ κτr ln κε−1
G . (A39)
Interestingly, the correlation time is dimension-independent, ignoring any incidental dependence of condition
number on dimension.
25
3. Ergodicity of Mean
We assume that t0 is large enough that, to a good approximation, the system has reached equilibrium by
time t0 . Therefore
Z
1 t0 +τ
⟨ȳ⟩ = dt ⟨y(t)⟩ = 0. (A41)
τ t0
We now bound the covariance matrix of ȳ
Z t0 +τ Z t0 +τ
1
∥⟨ȳ ȳ ⊺ ⟩∥ = dt′′ dt′ ⟨y(t′ )y(t′′ )⊺ ⟩ (A42)
τ2 t0 t0
Z t0 +τ Z t0 +τ
1
= dt′′ dt′ Gs (t′′ − t′ ) (A43)
τ2 t0 t0
Z t0 +τ Z t′′
2 ′′
= dt dt′ Gs (t′′ − t′ ) (A44)
τ 2 t0 t0
Z t0 +τ Z t′′
2
⩽ dt′′ dt′ ∥Gs (t′′ − t′ )∥ (A45)
τ 2 t0 t0
Z t0 +τ Z t′′
2 ′′ ′′ ′
⩽ dt dt′ e−αmin (t −t )/γ ∥Σs ∥ (A46)
τ 2 t0 t0
Z t0 +τ
2γ ′′
= ∥Σs ∥ dt′′ 1 − e−αmin (t −t0 )/γ (A47)
αmin τ 2 t0
Z t0 +τ
2γ
⩽ ∥Σs ∥ dt′′ 1 (A48)
αmin τ 2 t0
2γ
= ∥Σs ∥ (A49)
αmin τ
2γκ2
= (A50)
β∥A∥2 τ
2κ2 τr
= . (A51)
β∥A∥τ
According to Chebyshev’s inequality
−1 d
P (y ⊺ ⟨ȳ ȳ ⊺ ⟩ y > k2 ) ⩽ . (A52)
k2
−1
Now note that the eigenvalues of ⟨ȳ ȳ ⊺ ⟩ are the inverses of the eigenvalues of ⟨ȳ ȳ ⊺ ⟩, which are the same as
the singular values as the covariance matrix is positive definite. Therefore because the eigenvalues of ⟨ȳ ȳ ⊺ ⟩
−1 −1
are at most ∥⟨ȳ ȳ ⊺ ⟩∥, the eigenvalues of ⟨ȳ ȳ ⊺ ⟩ are at least ∥⟨ȳ ȳ ⊺ ⟩∥ . This implies
−1 β∥A∥2 τ
y ⊺ ⟨ȳ ȳ ⊺ ⟩ y ⩾ ∥y∥2 , (A53)
2γκ2
and so we have the following proposition
2γκ2 2 −1
∥y∥2 > k ⇒ y ⊺ ⟨ȳ ȳ ⊺ ⟩ y > k 2 . (A54)
β∥A∥2 τ
Moreover, ∥Ay∥2 ⩽ ∥A∥2 ∥y∥2 so the following also holds
2γκ2 2 −1
∥Ay∥2 > k ⇒ y ⊺ ⟨ȳ ȳ ⊺ ⟩ y > k 2 . (A55)
βτ
26
Now let
s
2γκ2
εy = k, (A57)
β∥b∥2 τ
p
so k = β∥b∥2 τ /2γκ2 δy . Then we have
∥Ay∥ d 2γκ2 d
P ⩾ εy ⩽ 2 = . (A58)
∥b∥ k β∥b∥2 τ ε2y
with probability at least Pε , then we can require the integration time is at least
2γκ2 d
τ⩾ (A60)
β∥b∥2 ε2x (1 − Pε )
2κ2 d ∥A∥τr
= . (A61)
β∥b∥2 ε2x (1 − Pε )
Here, we note that the energy cost E can be estimated as E ⩾ b⊺ A−1 b ⩾ ∥A∥−1 ∥b∥2 , which is the depth of
the bottom of the potential well relative to the point x = 0. Therefore E −1 ⩽ ∥A∥∥b∥−2 , allowing us to write
2κ2 d
Eτ ⩾ β −1 τr . (A62)
ε2x (1 − Pε )
Z t0 +τ Z t0 +τ
1 ′′
2
⟨yi yj ⟩ = 2 dt dt′ ⟨yi (t′ )yj (t′ )yi (t′′ )yj (t′′ )⟩ . (A65)
τ t0 t0
⟨yi (t′ )yj (t′ )yi (t′′ )yj (t′′ )⟩ = ⟨yi (t′ )yj (t′ )⟩ ⟨yi (t′′ )yj (t′′ )⟩ + ⟨yi (t′ )yi (t′′ )⟩ ⟨yj (t′ )yj (t′′ )⟩ (A66)
+ ⟨yi (t′ )yj (t′′ )⟩ ⟨yj (t′ )yi (t′′ )⟩ (A67)
= (Σs,ij )2 + Gs,ii (t′′ − t′ )Gs,jj (t′′ − t′ ) + Gs,ij (t′′ − t′ )2 , (A68)
so
Z Z ′′
4d(d + 1) t0 +τ ′′ t
= dt dt′ ∥Gs (t′′ − t′ )∥2 (A74)
τ2 t0 t0
Z Z ′′
4d(d + 1) t0 +τ ′′ t ′′ ′ 2
⩽ dt dt′ e−2αmin (t −t )/γ ∥Σs ∥ (A75)
τ2 t0 t0
Z t0 +τ −2αmin (t′′ −t0 )/γ
4d(d + 1) 2 ′′ γ − γe
= ∥Σ s ∥ dt (A76)
τ2 t0 2αmin
Z t0 +τ
4d(d + 1) 2 γ
⩽ ∥Σ s ∥ dt′′ (A77)
τ2 t0 2α min
2
4d(d + 1) γ ∥Σs ∥
= . (A78)
τ αmin
(A79)
5. Hardware implementation
Here we describe an electronic device comprised of d coupled RC cells that maps to the overdamped
Langevin process described previously. The equation of motion for voltages v = (v1 , v2 , . . . , vd ) across the
capacitor of each cell is given by
dv = C−1 (−Jvdt + R−1 dw) (A93)
where C = diag(C1 , C2 , . . . , Cd ), R = diag(R1 , R2 , . . . , Rd ), w is uncorrelated Brownian motion and the
elements of J are given by
(
Jij = − R1ij if i ̸= j
(A94)
Jij = R1i + R1′ + R1ij if i = j.
i
Hence there are two in-cell resistors, allowing for freedom on the diagonal elements of the J matrix indepen-
dently of the R matrix, and one resistor coupling each cell. One can set J = JR, and γ = 1/RC as R, C
are diagonal matrices. By denoting x = v and choosing J = A, this leads to:
1
dx = (−Axdt + dw) (A95)
γ
which reduces to overdamped dynamics of the linear systems solver provided the mean of the noise is γb,
and its variance is 2γ/β. For more details on this implementation, see ref. [17].
Note that to implement this in hardware the values of the in-cell and out-of-cell resistors must be computed
based on the A matrix, therefore this incurs a cost of O(d2 ) operations at the initialization. For electrical
√
hardware, the effective temperature is related to Johnson-Nyquist noise as vn2 = 4kB T R∆f , with ∆f the
bandwidth of the system. For a bandwidth of 1MHz, we obtain v¯n2 = 0.1µV for T = 300K.
B. Underdamped Dynamics
1. Stationary Distribution
γ
dp = −(Ax − b) dt − p dt + N [0, 2γβ −1 Idt], (B2)
M
where M, γ, β ∈ R+ . As before, define y = x − A−1 b, resulting in the transformed equations
1
dy = p dt, (B3)
M
29
γ
dp = −Ay dt − p dt + N [0, 2γβ −1 Idt]. (B4)
M
Equations (B3) and (B4) can be made dimensionless by defining x̃ and p̃ as follows
r r
γ2β β
ỹ = y, p̃ = p. (B5)
M M
˙ The underdamped Langevin equations can now be
and note that these choices imply that p̃ = M γ −1 x̃.
formulated as a single dimensionless equation by concatenating the vectors ỹ and p̃ into a single vector r,
⊺
r = (ỹ1 , . . . ỹd , . . . p̃1 , . . . p̃d ) , (B6)
As in the Methods section, we write the UDL equations as a single matrix equation
with
! !
0 −γM −1 I 0 0
A= , B= . (B8)
γ −1 A γM −1 I 0 βM −1 B
Following the analysis from the Methods section further, we find that the stationary distribution, in the
original coordinates, is
x ∼ N [A−1 b, β −1 A−1 ], p ∼ N 0, β −1 M I . (B9)
If instead we had B = 2γβ −1 R for an SPD matrix R, the stationary covariance matrix for x would be the
solution to the Lyapunov equation
AΣs + Σs A = 2β −1 R, (B10)
2. Transient Behavior
We review some results about the transient behavior of a system of coupled damped harmonic oscillators.
Consider Eqs. (B3) and (B4) with the noise term removed. These can be combined into a single second
order equation
M ÿ + γ ẏ + Ay = 0. (B11)
The system may be decoupled into d independent equations by expanding y in the (orthonormal) eigenbasis
of A,
X
y= cj aj , (B12)
j
where aj are the eigenvectors of A with eigenvalues αj . Now we have the set of uncoupled equations
γ αj
c̈j + ċj + cj = 0. (B13)
M M
p
We define ωj = αj /M , ξ = γ/2M , and ζj = ξ/ωj . The general solution to Eq. (B13) is
where
q
λj± = −ξ ± ξ 2 − ωj2 . (B15)
30
If ξ < ωj , the square root in Eq. (B15) is imaginary, in which case the system is said to be underdamped,
and when ξ > ωj the system is overdamped. For ξ = ωj Eq. (B14) is invalid, which is the critically damped
case. In what follows, we assume that ξ < ωj . In this case, the solution is
When regarded as p
a function of the state vector r ∈ R2d , the square root of the energy is a norm, which we
denote by ∥r∥E = βE(r).
β ⊺ β ⊺
∥r∥2E = y Ay + p p (B22)
2 2M
M 1
= 2 ỹ ⊺ Aỹ + p̃⊺ p̃ (B23)
2γ 2
Note that y ⊺ Ay ⩾ αmin ∥y∥2 , so the energy norm gives the following upper bound on the Euclidean norm of
the vector y
r r
y ⊺ Ay 2
∥y∥ ⩽ ⩽ ∥r∥E . (B24)
αmin βαmin
We may also consider the matrix norm induced by the energy norm, which we also denote ∥ · ∥E
s
βE(Or)
∥O∥E = max . (B25)
r∈R2d βE(r)
It is instructive to look at the upper left d × d submatrix of O, which we call Oy . In particular, observe that
s
βE(Or)
∥O∥E = max (B26)
r∈R 2d βE(r)
s
(Oy y)⊺ A(Oy y)
⩾ max (B27)
y∈Rd y ⊺ Ay
s
αmin (Oy y)⊺ (Oy y)
⩾ max (B28)
y∈Rd αmax y ⊺ y
= κ−1/2 ∥Oy ∥, (B29)
31
so
√
∥Oy ∥ ⩽ κ∥O∥E , (B30)
a fact which will be used later. The earlier derivation shows that in the underdamped limit,
1 + ξ/ωmin −2ξt
E e−At r ⩽ e E(r), (B31)
1 − ξ/ωmin
We proceed in much the same way as in the overdamped case. We now have a bound on the energy norm
of e−At ,
e−At E
⩽ χe−ξt . (B34)
where E0 is the energy of the state ⟨r0 ⟩. Define the relative error tolerance as εµ0 > ∥Ax−b∥/∥b∥ = ∥Ay∥/∥b∥.
Also assume that ⟨x(0)⟩ = ⟨p(0)⟩ = 0, so E(⟨r(0)⟩) = 21 b⊺ A−1 b. We then see that
s
∥Ay∥ ∥A∥ 2κE(⟨r(0)⟩) −ξt
⩽ χe (B40)
∥b∥ ∥b∥ ∥A∥
s
2∥A∥κE(⟨r(0)⟩) −ξt
= χe (B41)
∥b∥2
√
⩽ κχe−ξt (B42)
(B43)
In order to ensure that the relative error is less than εµ0 , we require
√
1 κχ
t0 ⩾ ln . (B44)
ξ εµ0
32
Defining the underdamped relaxation time τr(UD) = 1/ξ, we have which may be written
√
t0 ⩾ τr(UD) ln κχε−1µ0 . (B45)
As in the overdamped case, we proceed to determine the convergence rate of the (dimensionless) covariance
matrix. The covariance matrix for r obeys
Z ∞
2 ′ 2
∥Σs − Σ(t0 )∥E ⩽ e−At · ∥Σ0 − Σs ∥E + ∥B∥E dt′ e−At (B46)
t0 E
Z ∞
′
⩽ χ2 e−2ξt0 ∥Σ0 − Σs ∥E + ∥B∥E dt′ χ2 e−2ξt (B47)
t0
2
χ −2ξt0
= χ2 e−2ξt0 ∥Σ0 − Σs ∥E + e ∥B∥E (B48)
2ξ
γ
= χ2 e−2ξt0 ∥Σ0 − Σs ∥E + (B49)
Mξ
= χ2 e−2ξt0 (∥Σ0 − Σs ∥E + 2) (B50)
Now we assume the initial covariance matrix in the dimensionless coordinates is Σ0,r = γ 2 M −1 ∥A−1 ∥I ⊕ I,
which corresponds to a potential V0 (x) = ∥A∥x⊺ x. Therefore, Σs,r − Σ0,r = γ 2 M −1 (A−1 − ∥A−1 ∥I) ⊕ 0,
which implies
∥Σs,r − Σ0,r ∥E ⩽ ∥γ 2 M −1 A−1 ⊕ 0∥E (B51)
r
−1 v ⊺ A−1 AA−1 v
2
= γ M max (B52)
v v ⊺ Av
−1 −1
2
⩽ γ M ∥A∥ κ. (B53)
This implies that the spectral norm error of the covariance matrix for ỹ is bounded by
√
∥Σs,ỹ − Σỹ (t0 )∥ ⩽ κχ2 e−2ξt0 (∥Σ0 − Σs ∥E + 2) (B54)
√
⩽ κχ2 e−2ξt0 γ 2 M −1 ∥A∥−1 κ + 2 (B55)
which gives the corresponding result for the covariance matrix of y
√
∥Σs,y − Σy (t0 )∥ ⩽ β −1 κχ2 e−2ξt0 κ∥A∥−1 + M γ −2 . (B56)
We want the relative error to be less than tolerance εΣ0
∥Σs,y − Σy (t0 )∥
⩽ εΣ0 , (B57)
β −1 ∥A−1 ∥
which is satisfied if we require
1 h i
t0 ⩾ τr(UD) ln χ2 ε−1
Σ0 κ
3/2
+ κ1/2 M γ −2 ∥A∥ , (B58)
2
or
1 1
t0 ⩾ τr(UD) ln χ2 κ3/2 ε−1
Σ0 2
+ 1 . (B59)
2 4ζmax
Combining the equilibration times for mean and variance gives
1
1
t0 ⩾ max τr(UD) ln κ1/2 χε−1µ0 , τr(UD) ln χ2 3/2 −1
κ ε Σ0 2
+ 1 . (B60)
2 4ζmax
4. Ergodicity of Mean
Once equilibrium has been reached, we may consider the correlation matrix of the stationary system
Gs (τ ) = lim G(t, t + τ ), (B61)
t→∞
33
which is given by
Gs (τ ) = e−Aτ Σs . (B62)
We define the A-norm of a vector as = v Av, which induces an A-norm on operators. It clearly follows
∥v∥2A ⊺
from the definitions that for all vectors r = (u, v)⊺ , ∥u∥A ⩽ ∥r∥E , and similarly if Xy is the upper left d × 2
submatrix of a 2d × 2d matrix X, we have ∥Xy ∥A ⩽ ∥X∥E . Therefore
∥Gs,ỹ (τ )∥A ⩽ ∥Gs (τ )∥E (B63)
−At
⩽ ∥e ∥E · ∥Σs ∥E (B64)
−ξτ
⩽ χe ∥Σs ∥E . (B65)
The quantity ∥Σs ∥E can be evaluated by the following arguments: after changing to the dimensionless
coordinates, the covariance matrix is
!
γ 2 M −1 A−1 0
Σs = . (B66)
0 I
By definition,
M γ −2 y ⊺ Σs,ỹ AΣs,ỹ y + p⊺ p
∥Σs ∥2E = max (B67)
y,p M γ −2 y ⊺ Ay + p⊺ p
y ⊺ A−1 y + p⊺ p
= max 2 −4 ⊺ (B68)
y,p M γ y Ay + p⊺ p
= max 1, M γ ∥A−1 ∥2A .
−2 4
(B69)
We then see that
−1
∥Σs ∥E = max{1, γ 2 M −1 αmin }. (B70)
−1
By the assumption that the system is in the underdamped regime, 2M γ −2 αmin > 1, so we finally obtain
−1
∥Σs ∥E = γ 2 M −1 αmin = γ 2 M −1 κ∥A∥−1 . (B71)
By the same manipulations as in the overdamped case, we arrive at
√ Z Z ′′
2M κ t0 +τ ′′ t
∥⟨ȳ ȳ ⊺ ⟩∥ ⩽ 2 2 dt dt′ ∥Gs (t′′ − t′ )∥E (B72)
τ γ β t0 t0
√ Z t0 +τ Z t′′
2M κ ′′ ′′ ′
⩽ 2 2 χ ∥Σs ∥E dt dt′ e−ξ(t −t ) (B73)
τ γ β t0 t0
√ Z t0 +τ
2 κ ′′
= 2 χ dt′′ 1 − e−ξ(t −t0 ) (B74)
ξτ β∥A∥ t0
√ Z t0 +τ
2 κ
⩽ 2 χ dt′′ 1 (B75)
ξτ β∥A∥ t0
√
2 κχτr(UD)
= (B76)
β∥A∥τ
The exact same reasoning employed in the overdamped case now gives the following statement: In order to
have
∥Ax̄ − b∥ ⩽ εx ∥b∥ (B77)
with probability at least Pε , it is sufficient to require that
√
2 κχ∥A∥d
τ⩾ τr(UD) . (B78)
β∥b∥2 ε2x (1 − Pε )
We note that by the same reasoning as was applied in the overdamped case, we may state the above as an
energy-time tradeoff
√
2 κχd −1
Eτ ⩾ 2 β τr(UD) . (B79)
εx (1 − Pε )
34
X Z t0 +τ Z t0 +τ
2d(d + 1)
var(yi yj ) ⩽ dt′′ dt′ ∥Gs,y (t′′ − t′ )∥2 (B80)
ij
τ2 t0 t0
Z t0 +τ Z t′′
4d(d + 1)M 2 κ ′′
= dt dt′ ∥Gs (t′′ − t′ )∥2 (B81)
γ4β2τ 2 t0 t0
Z Z ′′
4d(d + 1)M 2 κ t0 +τ ′′ t ′′ ′ 2
⩽ dt dt′ e−2ξ(t −t ) ∥Σs ∥E (B82)
γ4β2τ 2 t0 t0
3 Z t0 +τ Z t′′
4d(d + 1)κ ′′ ′
= 2 dt′′ dt′ e−2ξ(t −t ) (B83)
β ∥A∥2 τ 2 t0 t0
Z
4d(d + 1)κ3 t0 +τ ′′ ′′
= 2 2 2
dt 1 − e−2ξ(t −t0 ) (B84)
β ∥A∥ τ ξ t0
4d(d + 1)κ3
⩽ (B85)
β 2 ∥A∥2 τ ξ
4d(d + 1)κ∥Σs,y ∥2 τr(UD)
= (B86)
τ
Then repeating the same steps as in the overdamped, we find that in order to ensure that ∥yy ⊺ −
Σs ∥F /∥Σs ∥F ⩽ εΣ , it is sufficient to require that
4κd(d + 1)
τ⩾ τr(UD) . (B87)
(1 − Pε )ε2Σ
C. Determinant Estimation
Suppose we are given a series of work measurements W1 . . . WN resulting from changing a potential from
V1 to V2 . Importantly, the system must be allowed to come to equilibrium once per measurement (so N
times in total). It has been shown, via the Cramér-Rao bound [70], that if ∆F̂ is any asymptotically
unbiased estimator of the difference in equilibrium free energies between the two potentials, then its variance
is bounded below [61]
L2
var ∆F̂ ⩾ , (C1)
N
where L is the Fisher-Rao length of the path in parameter space between the two equilibrium states. Ex-
plicitly, the Fisher-Rao length of a path through the space of probability distributions is
s
Z Z
τ
f˙t (x)2
L= dt dx . (C2)
0 ft (x)
For Gaussian distributions sharing the same mean, an explicit formula has been found for the minimal Fisher-
Rao length between two distributions. In particular, if one of the Gaussian distributions has covariance
matrix β −1 A−1 and the other has covariance matrix β −1 ∥A−1 ∥I, and both share the same mean, then the
minimal Fisher-Rao distance is given by [71]
s
1X
Lmin = ln(λj (A∥A∥−1 ))2 , (C3)
2 j=1
35
which is bounded by
r
1 2
Lmin ⩽ d |ln(κ−1 )| (C4)
2
r
1
⩽ d ln(κ)2 (C5)
2
r
d
= ln(κ). (C6)
2
Therefore the minimal variance of an unbiased estimator is
d ln(κ)2
var ∆F̂ ⩾ , (C7)
2N β 2
Moreover, this lower bound can be attained in the limit of large N by the Bennett Acceptance Ratio (BAR)
estimator or its multistate generalization (MBAR). The free energy difference between equilibrium states of
the device is
1 ⊺ −1 ⊺ −1 1 |A2 |
∆F = F2 − F1 = (b1 A1 b1 − b2 A2 b2 ) + ln . (C8)
2 2β |A1 |
Therefore the variance of an unbiased estimator of the log determinant ln(|A2 |) is lower bounded as follows
c 2 )) ⩾ 2d ln(κ)2
var(LD(A (C10)
N
where we write LD for the log determinant and LD c for the estimator of the log determinant. To achieve
absolute error of at most δLD with probability Pδ , according to Chebyshev’s inequality we need a number of
samples given by
c 2 )) ⩽ δ 2 (1 − Pδ ),
var(LD(A (C11)
LD
2d ln(κ)2
N> 2 (1 . (C12)
δLD − Pδ )
As the equilibration does not require a change of the mean of the distribution but just the covariance matrix,
the equilibration time is, in the overdamped case
1
κτr ln 2κε−1
Σ0 , (C13)
2
for some sufficiently small εΣ0 . The total time of the optimal protocol would then be approximated by
d κ ln(κ)2 ln(2κε−1
Σ0 )
τ≈ 2 (1 − P ) τr = O(d κ ln(κ)3 ). (C14)
δLD δ
102
101
log(|A|),log(|A|)
g
100
Dimension
10−1 10
50
10−2 100
FIG. 7. Convergence of determinant estimation using the Jarzynski estimator. This shows relative error
of the determinant estimate with increasing numbers of samples used in the Jarzynski estimator. Solid lines show
the median and the error bars show the interquartile range over 10 different A matrix realizations.
1. Numerical Results
To explore the performance of our thermodynamic algorithm for estimating the log determinant of a matrix
we numerically explore using a Jarzynski estimator of the free energy difference in the simplest case. Here the
potential is initialized using an identity matrix and this is instantaneously switched to the matrix for which
we desire to evaluate the log determinant. We explore the performance of the estimated log determinant
when using a Jarzynski estimator with different numbers of samples. Recall the free energy can be estimated
using
N
1 X −βWj
e−β∆F ≈ e−βW ≡ e . (C17)
N j=1
The effectiveness of thermodynamic algorithms in solving linear algebraic primitives is influenced by their use of stochastic methods and the inherently affine nature of harmonic oscillator systems. These factors allow these algorithms to efficiently map linear algebra tasks onto physical systems. Additionally, their ability to leverage thermal equilibrium distributions and stochastic fluctuations ensures faster convergence and scalability, surpassing traditional digital methods especially for large matrix dimensions and condition numbers .
In the thermodynamic framework, a linear system involving a symmetric positive definite matrix is approached by encoding the problem into potential energy functions that can be realized with harmonic oscillators. The system evolves until it reaches a stationary distribution from which estimates for solutions, like the matrix inverse, can be derived. This method offers an asymptotic improvement in scaling compared to digital algorithms by exploiting the physical realizability and evolution properties of harmonic oscillator systems .
The transitional energy requirements in thermodynamic systems are determined by examining the minimal energy-time product needed for state transitions. The task involves minimizing the length on the thermodynamic manifold to understand these requirements. This approach suggests an inherent tradeoff between time and energy consumption, prompting consideration of variational problems to minimize resource usage not typically addressed in digital paradigms, highlighting a unique benefit of thermodynamic computing .
Thermodynamic algorithms offer significant asymptotic speedups over state-of-the-art digital algorithms for linear algebra tasks. This efficiency is primarily attributed to their favorable scaling properties, which become more pronounced with increasing problem dimensions and condition numbers. Such algorithms thrive by treating problems in a stochastic, thermodynamic framework, often offering energy and time efficiency unattainable by classical digital methods, which are typically bounded by the resource-intense requirements of digital computations .
The research sets a foundational framework for thermodynamic computing, similar to early stages seen in quantum computing. It provides mathematical analysis and numerical benchmarks that reveal potential speedups, setting the stage for concrete applications in linear algebra and AI. The insights gained suggest that thermodynamic computing could experience rapid technological advancements in compilers, simulators, and programming languages, paralleling those in quantum computing, thus broadening its appeal and applicability .
Thermodynamic computing could provide advantages over digital methods in scenarios where there is a need to trade some accuracy for faster time and energy scaling. This is particularly relevant in high-dimensional problems or applications requiring fast convergence, such as large-scale linear algebra tasks or probabilistic AI applications. By exploiting physical dynamics and equilibria, thermodynamic systems offer a viable alternative for resource-intensive computations .
Harmonic oscillators serve as a physical analogy to apply thermodynamic principles to linear algebra computations. Their mathematical description is inherently affine, which aligns with linear algebraic structures, allowing for the mapping of linear algebraic primitives onto these oscillatory systems. By sampling from the thermal equilibrium distribution of coupled harmonic oscillators, thermodynamic systems can efficiently solve linear algebra problems, providing a unique intersection of physical theory and computational applications .
The underdamped thermodynamic regime differs notably from the overdamped regime in its approach to computations, offering more efficient handling of linear algebra tasks by leveraging higher oscillation dynamics. This results in distinct energy-time trade-offs that optimize the performance under different conditions. The efficiency improvements stem from utilizing detailed knowledge of the system's energy landscape and dynamics, adjusting strategies to cater to specific computational requirements of the regime .
Thermodynamic hardware leverages stochastic fluctuations by utilizing systems built from stochastic, analog building blocks. These systems sample from thermal equilibrium distributions of coupled harmonic oscillators, inherently affine and suitable for linear algebra problems. This sampling process allows for solving various linear algebra problems such as linear systems, matrix inverses, and determinant estimation more efficiently than classical digital methods .
Developing large-scale quantum hardware presents challenges primarily due to the demanding resource requirements that exceed current hardware capabilities. Quantum algorithms, while promising for linear algebra, often face issues like vanishing gradients in variational methods, making alternative hardware solutions necessary. These alternatives, such as memristor crossbar arrays and analog computing, show potential in accelerating computations like matrix-vector multiplications and linear systems, but remain complex to implement at scale .