0% found this document useful (0 votes)
2 views4 pages

GPU-Accelerated Model Predictive Control

Uploaded by

jbmccavalcanti
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views4 pages

GPU-Accelerated Model Predictive Control

Uploaded by

jbmccavalcanti
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

2012 2nd International Conference on Computer Science and Network Technology

Acceleration of MPC Using Graphic Processing Unit

Yang Gang Liu Mingguang


School of Electrical Engineering School of Electrical Engineering
Beijing Jiaotong University Beijing Jiaotong University
Beijing, China Beijing, China
07117325@[Link] mgliu@[Link]

Abstract—Model Predictive control (MPC) algorithm owns good


performance but causes heavy computational burden. This II. MPC ALGORITHM
disadvantage makes it only applied for small systems or slow The discrete model of linear time invariant systems is stated
processes. Due to great calculation capability of Graphic as follow.
Processing Unit (GPU), this paper proposes a parallel calculation
method for the acceleration of MPC based on CUDA (Compute ì
ï
ï x (k + 1) = Ax (k ) + Bu(k )
United Device Architecture): using CPU to organize and control í (1)
the procedure and using GPU to calculate matrixes and to solve
ï
î y (k ) = Cx (k )
ï
variables. Simulations give calculation time and speedup of the
proposed method as well as the serial. It indicates that this x ( k ) , y (k ) and u(k ) are the state vector, output vector and
method can greatly deduce the calculation time and broadens the input vector respectively. A , B and C are the corresponding
application range of MPC. constant matrixes. In each control interval MPC needs a system
model to predict system’s future states x (k + i | k ) in
Keywords-GPU; CUDA; Model predictive control; Parallel
computing
predictive horizon N based on the current measured data and
system states. And finds out proper input vector u(k + i | k )
I. INTRODUCTION which makes system’s output archive the desired reference.
The cost function is often selected as follow:
MPC algorithm [1][2] rose in 70s with the development of
{ y ( k + i | k ) - yr ( k + i )
N
computer technology. Firstly, MPC was applied in the slow J = min å
2

processes and small systems as its disadvantage of i =0


P
(2)
computational complexity. MPC was used in petroleum
industry and chemical industry early, and was proved to
+ u( k + i | k )
2
Q
+ Du(k + i | k )
2
R }
maintain excellent performance and to reduce the cost of
production. Recently, MPC tends to be applied to more fields s.t. ymin £ y (k + i | k ) £ ymax
as computer’s computing speed increases greatly. For all that, umin £ u(k + i | k ) £ umax
real-time problem is still encountered while MPC applied to |Du(k + i | k ) |£ Dumax
large systems or quick processes. On one hand, many MPC
variants were developed, such as explicit MPC [3][4], moving Where, Du(k + i | k ) = u(k + i | k ) - u(k + i -1| k ) denotes the
blocking MPC [5]. On the other hand, special architecture of 2
hardware was used to speed up the process of solution, such as increment vector of input; u( k + i | k ) Q
multiple CPUs [6] or FPGA [7][8]. Explicit MPC calculates the
control law offline and chooses the proper control law online represents uT (k + i | k )Qu(k + i | k ) ; Q is a diagonal matrix;
according to the state of system. But it demands exponential yr (k + i ) is the reference trajectory. ymin , ymax , umin , umax
memory to store data with the increase of number of system and Dumax are the output lower limit, output upper limit, input
state variables. Using special hardware architecture is limited
lower limit, input upper limit and limit of input vector
by the complexity and cost of control systems. The special
respectively. For convenience, define
hardware also obtains less flexibility and adaptability. T
X = [u(k ) u(k + 1)  u(k + N )] and represent (2) as a
This paper realizes the acceleration of MPC based on quadratic programming (QP) problem:
NVIDIA GPU. This solution takes advantage of standardized
video card installed in general computers, which means no 1 T
special hardware is involved. Great computing power of GPU min f ( X ) = X GX + r T X (3)
2
reduces the time of solution of MPC. This paper firstly
analyzes the whole procedure of MPC and finds out which s.t. aiT X £ bi ,i Î E
steps demand much time. Parallel manner is proposed to
deduce computation time. In the end, simulations are aiT X = bi ,i Î I
conducted to show the effectiveness.

978-1-4673-2964-4/12/$31.00 ©2012 IEEE 1001 CHANGCHUN, CHINA

Authorized licensed use limited to: University of Michigan Library. Downloaded on March 05,2025 at [Link] UTC from IEEE Xplore. Restrictions apply.
Active Set and Interior Point [9] are two effective methods é B ù
to solve QP problems. The first is adopted to realize (3). In é x (k + 1| k ) ù é A ù ê ú
ê ú ê 2ú ê 1 ú
ê å i =0
i
each iteration the main purpose of Active Set is to find a ê x (k + 2 | k ) ú ê A ú A B ú
feasible point at which the effective constraints are used as ê ú ê ú ê ú
ê x (k + 3 | k ) ú = ê A3 ú x (k )+ ê 2 Ai B ú u(k -1)
equality constraints and the ineffective constraints are deleted. ê ú ê ú ê å i =0 ú
To find a feasible point, an equality constrained QP problem is ê  ú ê  ú ê ú
ê ú ê ú ê  ú
solved. There are 4 steps as follows: ê x ( k + N | k )ú ê A N ú ê ú
ëê ûú ëê ûú ê -1 i ú

ëê å i =0
N
A B
Step 1. Let k = 1 and find an initial feasible point X (1) ûú
which satisfies:
é B  0 ù
ê ú
T
a X (1)
£ bi ,i Î E ê AB + B  0 ú é Du(k | k ) ù
i
(4) ê ú ê ú
T
a X (1)
= bi ,i Î I ê A2 B + AB + B  0 ú ê ú (8)
i +ê ú ê 
ê ú ú
At point X (1) the active set is I ( X (1) ) = {i | aiT X (1) = bi ,i Î I } . ê    ú êëDu(k + N -1| k )úû
ê N -1 N -m ú
ê å Ai B  åi=0 Ai Bú
Step 2. Find searching direction d ( k ) by solving the ë i =0 û
equality constrained QP problem: The predictive output on horizon N is
1 é y (k + 1| k ) ù éC 0  0 ù é x (k + 1 | k ) ù
min d T Gd + f ( X ( k ) )T d ê ú ê úê ú
2 (5) ê y(k + 2 | k ) ú ê 0 C  0 ú ê x (k + 2 | k ) ú
ê ú=ê úê ú (9)
s.t. S T d = 0 ê  ú ê    úú êê  ú
ê ú ê ú
ê y (k + N | k )ú ê 0 0  C úûú êëê x (k + N | k )úûú
Here, S = [a1 a2  am ] , n > m and rank( S ) = m ; G is a ëê ûú ëê
n ´ n positive semi-definite matrix. d * , l * are the globe Obviously, the number of system states and the length of
solution and related Lagrange multiplier respectively by prediction horizon are the main factors leading to
solving: computational burden. For example, implementing the
procedure above in MATLAB, take a small system (see (11))
é G S ù é d ù é-f ( X ( k ) )T ù with N=100 and shows the computation time given by Profiler
ê T úê ú=ê ú (6)
êS
ë 0 úû êël úû êë 0 ú
û
in MATLAB in detail as Table 1.
Step 3. Let d ( k ) = d * . If d ( k ) = 0 , find the Lagrange
TABLE I. CLASSICAL COMPUTATION TIME OF QP IN MPC
multiplier l ( k ) . Then if l ( k ) ³ 0 , X ( k ) is the optimal solution.
Otherwise find lq( k ) = min{li( k ) | i Î I ( X ( k ) )} and let Function name Calls Total [s] Self time [s]
( k +1) (k ) ( k +1) MPC_Control 1 2.619 0.141
X = X . The active set I ( X ) is replaced by
MPC_QP 1 2.478 0.076
I ( X ( k ) ) excluding from the equality constraints with respect
MPC_QP>Solve_L 17 2.117 2.117
to q. Let k = k + 1 , jump to Step 2.
MPC_QP>FindAciveSet 16 0.141 0.141
Step 4. If d ( k ) ¹ 0 , calculate the step size in search: MPC_QP>GetNewX 16 0.126 0.126
ì-a X
ï T (k )
ïü MPC_QP>FindInitPoint 1 0.016 0.016
aˆk = min ïí i
| aiT d ( k ) > 0, i Ï I ( X ( k ) )ïý The function Solve_L taking much time is to solve (6) by
ï
î a d
ï
T (k )
i
ïïþ
Gauss elimination method which has a complexity of O(n3 ) .
-a Tp X ( k ) When N=100, the dimension of coefficient matrix of predicted
= (7) output is 300´300 . One reason for low real time peculiarity
a Tp d ( k )
of MPC is to solve a large linear equation. Another is the call
Compute X ( k +1) = X ( k ) + ak d ( k ) , ak = min{aˆk ,1} ; If ak = aˆk , times of the solver.
add new constraints with respect to p to the active set, or let
III. PARALLEL COMPUTING ARCHITECTURE
I ( X ( k +1) ) = I ( X ( k ) ) , k = k + 1 and jump to Step 2.
In last chapter Gauss elimination method is implemented in
Before solving QP problem, the related matrixes need to be serial mode (running on a CPU) in MATLAB. This manner
compute first. In N horizon, the system model becomes takes much time to solve large equations. In this section, a
equality constraints like this: parallel one is presented based on a NVIDIA GPU.
All work is done in CUDA environment [10]. In CUDA,
the code contains two parts, the serial running on a host (CPU)
and the parallel running on a device (GPU). The former aims to
tasks’ collaboration and the later to acceleration of solution.

1002

Authorized licensed use limited to: University of Michigan Library. Downloaded on March 05,2025 at [Link] UTC from IEEE Xplore. Restrictions apply.
The parallel code running on device is called kernel function 453478569234567589 3
which is executed identically by all light threads. One kernel is 66789436457892148 4
with a corresponding grid which is made of thread blocks. 4785692345675893
234585696758938
5
3
Thread blocks are made of threads which are the basic 81569422634567 2
computational units of a device. The dimension of a grid and 4228156945675
694228155675
7
8
thread blocks can be one, two or three. A two dimension 75366942281 9
example is demonstrated in Fig. 1. Before the device starts to 2815366942
692836421
0
6
parallel compute, a host allocates the device memory for the 96283642 4
9
grid and copies the initial data from host to device. In the end, a 9628364
357283 6
host copies the results form device. 65483 0
4532 4
687 5
63 7
7 2

(c)
Figure 2. Demonstration of Gauss elimination method in parallel mode

Stage 2. Row level operation. This kernel is to eliminate the


elements below kth row. The operation is
ak +i , k + j = ak +i , k + j - rj ⋅ ak +i , k as Fig. 2 (b) shows.

Stage 3. Back substitution. As Fig. 2 (c) shows, the part in


red rectangle is operated by CPU. GPU eliminates the part in
green rectangle. Assume that the width of red rectangle is m
Figure 1. The relation of kernel, grids, blocks and threads in 2 dimensions. and the upper triangular matrix is n´ n . t1 is the time for host
performing one operation, and t2 responding to device.
Gauss elimination method in parallel mode may reduce the Transferring a element between host and device needs time t3.
computation time. As known, Gauss elimination method So the host needs t1m 2 to finish the calculation to elements in
consists of two operation, forward elimination and back
substitution. There are three stages in parallel mode of Gauss the red rectangle. And time for transferring the elements in the
elimination. green rectangle is t3 [m(n - m)] ; The device calculating one
row elements of the green rectangle needs t2 (4m + 1) , and
Stage 1. Calculate the ratios. This kernel gets the kth column
from the augmented matrix, and calculates the ratio array using returning the results need t3 (n - m) . The total time for back
the sub-diagonal element ai,i. Fig. 2 (a) is the demonstration. substitution is
n/ m n/ m n/ m
é a0,0 a0,1  a0, n ù é a0,0 a0,1  a0,n ù Ts (m) = å (t1m 2 ) + t3 (m + 1)å (n - i ⋅ m) + å (t2 (4m + 1))
ê ú ê ú
ê a1,0 a1,1  a1, n ú ê a1,0 a1,1  a1,n ú
i =1 i =1 i =1

ê ú ê ú (10)
ê  ê    úú
ê    úú ê 
êa  an ,n úûú It is obvious that when m = (4t2 + 2t3 n) / (4k1 + t3 ) the
êa an ,1  an,n úúû a
ëê n ,0 n ,1
êë n ,0 total time is the minimum.

 The flowchart of QP solver containing with acceleration is
 showed as Figure 3.


a0,0 / a0,0 a1,0 / a0,0 a2,0 / a0,0  an,0 / a0,0
r0,0 r1,0 r2,0 rn,0

(a) (b)

1003

Authorized licensed use limited to: University of Michigan Library. Downloaded on March 05,2025 at [Link] UTC from IEEE Xplore. Restrictions apply.
10 0.0476 0.2447 -
21 0.1263 0.2812 -
42 0.4861 0.3241 1.4998
85 1.7052 0.5923 2.8789
170 8.9164 1.0620 8.3959
341 53.3854 2.1834 24.4506
682 402.1035 13.3325 30.1596
X ( k+1) 1365 3054.5284 88.6587 35.6593
For N < 42 , the acceleration is invalid. Because the
matrixes transferred to device is small. And the transferring
li( k ) < 0 time occupies much proportion than computing time.
For N ³ 42 , the effect of acceleration is displayed.
l (k ) ³ 0 ak For N = 682 , traditional serial mode needs 402.1035 s for the
solution of QP problem, and the parallel mode needs only
13.3325 s. Notice that the proposed method is a dense solver.
d (k ) = 0 ?
The computing power of the PC used in this paper is weaker
than the newest PC, so it is certain that large speedup and less
time will be achieved if a new PC is used.

V. CONCLUSIONS
This paper analyzes the computational burden caused by
Figure 3. Flowchart of QP solver accelerated by GPU MPC, and proposes a parallel method to accelerate the solution
of MPC based on GPU. The essential of this method is to
implement Gauss elimination in parallel mode. And simulation
IV. SIMULATIONS results show that the parallel method deduces much time for
Simulations are conducted on a PC. Hardware consists of calculation. The speedup reaches 30 for N = 682 . That allows
AMD Athlon64 3000+, DDR400 1 GB and NVIDIA Geforce MPC to be applied to much larger systems.
8800GT. Software environment consists of VC2005 and
CUDA 1.0. This GPU peak capability of single precision REFERENCES
floating point arithmetic is 336 GFLOPS, and the CPU’s is [1] J.B. Rawlings, “Tutorial overview of model predictive control. ieee
2.3346 GFLOPS. 8800GT consists of 16 SMs (Stream control systems magazine,” 20(3):38-52. 2000.
Multiprocessor). Each SM can process 786 threads. All the [2] D.Q. Mayne, J.B. Rawlings, C.V. Rao, et al., “Constrained model
registers accessed by threads of a SM are limited to 8192 predictive control: stability and optimality,” Automatica, 36(6):789-814,
words. For sake of convenience, the dimension of matrixes or 2000.
vectors to be calculated is augmented to the nth power of 2. [3] T.A. Johansen, W. Jackson, R. Schreiber, et al., “Hardware synthesis of
The elements added are set to zero. The length of prediction explicit model predictive controllers,” IEEE Trans. Control System
Technology, 15(1):191-197, 2007.
horizon is properly set to make dimension of the related
constant matrixes get close to 2n. This paper takes a model [4] P. Tondel, T.A. Johansen, A. Bemporad., “An algorithm for multi-
parametric quadratic programming and explicit MPC solutions,”
below to test the proposed method. Automatica, 39(3):489-497, 2003.
é x1 ù é 0 1 0 ù é x1 ù é 0 ù [5] Raphael Cagienard, Pascal Grieder, Eric C, Kerrigan, et al., “Moving
ê ú ê úê ú ê ú blocking strategies in receding horizon control,” 43rd IEEE Conference
ê x 2 ú = ê 0 -1.0860 4.287925ú ê x2 ú + ê 0 ú u on Decision and Control, 2004.
ê ú ê úê ú ê ú
ê x ú ê 0 -1.0909 -1.4545 ú ê x ú ê3.6363ú [6] Stefano Longo, Eric C. Kerrigan, Kecj Voonling, et al., “Parallel move
ë 3û ë û ë 3û ë û (11) blocking model predictive control,” 50th IEEE CDC-ECC, 2011.
[7] Adrian G. Wills, Geoff Knagge, Brett Ninness., “Fast linear model
x1 is system output. The serial and the parallel methods are predictive control via custom integrated circuit architecture,” IEEE
performed under different length of N. The calculation time Trans. Control Systems Techology 20(1): 59-71, 2012.
and speedup is given in Table 2. [8] J. L. Jerez, G. A. Constantinides, E. C. Kerrigan, et al., “Parallel MPC
for realtime FPGA-based implementation,” Proc. 18th IFAC World
Congress, 2011.
TABLE II. COMPARISON OF COMPUTING TIME FOR MPC [9] R. A. Bartlett, A. Wachter, L. T. Biegler., “Active Set vs. Interior Point
strategies for model predictive control,” Proceedings of the American
Serial Parallel Control Conference, 2000.
N Speedup
mode [s] mode [s]
[10] NVIDIA CUDA Programming Guide 2.0. NVIDIA Co., 2008.
5 0.0315 0.2138 -

1004

Authorized licensed use limited to: University of Michigan Library. Downloaded on March 05,2025 at [Link] UTC from IEEE Xplore. Restrictions apply.

You might also like