Stabilization of Rotational Double Pendulum
Thesis
Submitted to
UNIVERSITY OF DAYTON
The Degree of
By
Bo Li
UNIVERSITY OF DAYTON
Dayton, Ohio
August, 2013
ROTATIONAL DOUBLE INVERTED PENDULUM
Name: Li, Bo
APPROVED BY:
© Copyright by
Bo Li
2013
ABSTRACT
The thesis deals with the stabilization control of the Rotational Double Inverted Pendulum (RDIP) system. The RDIP is an extremely nonlinear, unstable, underactuated system of high order. A mathematical model of the RDIP is built with the Euler-Lagrange (E-L) equation. A Linear Quadratic Regulator (LQR) controller is designed for this system, and its stability is analyzed with the Lyapunov method. We re-develop the Direct Adaptive Fuzzy Control (DAFC) method for our case in order to explore whether it can improve the performance of the LQR control of the system. The simulation results of these two control schemes and their comparative analysis show that the DAFC is able to enhance the LQR controller by increasing its robustness in controlling the RDIP.
For my family
ACKNOWLEDGMENTS
I would like to express my special gratitude to my advisor Dr. Raúl Ordóñez for his tremendous support and help throughout the learning process of this master's thesis. Without his guidance and persistent help this project would not have been possible within the limited time frame. Furthermore, I would like to thank my committee members, Dr. Asari and Dr. Barrera, who willingly shared their precious time and provided me with useful comments, remarks and engagement on the thesis. In addition, I thank all the members of lab KL302, whose companionship and suggestions supported me and helped me a lot during this thesis. Last but not least, I would like to thank my loved ones for their endless love and support. I will be
TABLE OF CONTENTS
ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
I. INTRODUCTION
3.1 Introduction
3.2 Implementation
3.3 LQR Tuning
3.4 Stability Analysis
4.1 Introduction
4.2 Takagi-Sugeno Fuzzy System
4.3 Theory
4.3.1 Bounding Control
4.3.2 Adaptation Algorithm
4.3.3 Sliding-mode Control
4.4 Implementation
V. SIMULATION RESULTS
VI. CONCLUSION
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
CHAPTER I
INTRODUCTION
An inverted pendulum is a pendulum whose links rotate above its pivot point. It is often implemented either with the pivot point connected to a base arm that can rotate horizontally (described in [1]) or mounted on a cart that can move along a fixed horizontal line (introduced in [2]). The links of the pendulum are usually limited to one degree of freedom by affixing them to an axis of rotation. Whereas a normal pendulum is stable when hanging downward, an inverted pendulum is inherently unstable and must be actively balanced in order to remain upright. This can be done by applying a torque at the pivot point, for the rotational inverted pendulum considered in this thesis, or by moving the pivot point horizontally, for an inverted pendulum on a cart. A simple demonstration of moving the pivot point to control the pendulum is balancing an upturned broomstick on the end of one's finger. Inverted pendulum control is a classic problem in dynamics and control theory and is used to verify the performance and
The Rotational Double Inverted Pendulum (RDIP) takes the classic rotational single pendulum problem to the next level of complexity. The RDIP is composed of a rotary base arm attached to a servo system, which applies a torque to the arm to control the whole system; a short bottom rod connected to the arm; and a long top rod. It is an underactuated system (i.e., it has fewer inputs than degrees of freedom) and is extremely nonlinear and unstable because of the gravitational forces and the coupling arising from the Coriolis and centripetal forces. Since the RDIP presents considerable control-design challenges, it is an attractive platform for developing different control techniques and testing their performance. Related applications include stabilizing the take-off of a multi-stage
Nearly all work on pendulum control concentrates on two problems: stabilization of inverted pendulums and swing-up control design. The first topic is concerned with designing a controller that maintains the pendulum in the upright position. In the RDIP case, controllers are designed to balance the two vertical rods by manipulating the angle of the base arm. The second refers to an algorithm that swings the pendulum up from its stable equilibrium, the downward position, to the upright position [2]. In this thesis we concentrate on the balancing control of the pendulum.
A variety of works are devoted to the control design of the RDIP. A simple mathematical model of the RDIP has been built in [3], which takes the angles and angular velocities of the base arm and the two pendulums as the system outputs and ignores all the friction terms of the rotational joints and the DC motor. It also presents an alternative based on least squares theory to design a controller that provides a domain of convergence for the pendulum. A more precise system model is developed in this thesis using the modeling method of [4] with the help of the E-L equation. Another control structure, proposed in [5], compensates individually for the multiple loop delays, which makes it suitable for a networked control system environment. In this thesis, we seek to balance the RDIP with the LQR and the DAFC. We will discuss the details
This thesis serves two main purposes. First, the simulation and experimental results in [6] indicate that, even with a non-minimum-phase plant, the adaptive fuzzy controller can still achieve good control performance for the single inverted pendulum: the angle of the pendulum converges to the origin, although the base arm tends to rotate with a constant angular velocity, which, in control terms, is another stable state. This thesis explores whether this finding carries over to a more complicated case, the RDIP. Furthermore, assuming that it does, we try to improve the control performance so that all the system states converge to zero. Second, this thesis provides a theoretical basis for control experiments with the SV02+DBIP double inverted pendulum kit from the Quanser Company in lab KL302.
This thesis is organized as follows. In Chapter II, we present a description of the RDIP and develop a mathematical model of the system. In Chapters III and IV, the LQR and the DAFC, respectively, are introduced and developed in detail for the RDIP. Chapter V shows the simulation results of these two controllers and presents an analysis of them. In Chapter VI, the concluding remarks, we summarize the overall results, provide a broad assessment of the apparent advantages and disadvantages of the LQR and DAFC control techniques, and provide some future research directions which help to
CHAPTER II
SYSTEM DESCRIPTION
In this chapter we focus on building a mathematical model of the RDIP. The more we know about a dynamic system, the more accurate a mathematical model we can obtain. With accurate mathematical models, faster, more accurate and more effective controllers can be designed, since such models allow controllers to be designed, tested and developed with the help of some powerful
The RDIP platform used in our simulations consists of a horizontal base arm (denoted as Link 1) driven by a servo motor and two vertical pendulums (denoted as Links 2 and 3) that move freely in the plane perpendicular to Link 1, as shown in Figure 2.1. Since we focus on the stabilization of the pendulums, it is convenient to set up the coordinate system as in Figure 2.1. In this thesis, the mathematical model of the RDIP is developed with the Euler-Lagrange (E-L) equation. A simple mathematical model has been presented in [3], which assumes that the acceleration of the base arm can be manipulated directly and therefore chooses it as the system control input. In this thesis, a more practical assumption is made: the torque applied by the motor to the base arm is the control signal. Moreover, the pivot friction is taken into account in order to build a more precise model and to simulate the real system in our lab.
Figure 2.1: Rotational double inverted pendulum schematic.
We also make some additional basic assumptions about the system, similar to those in [3]:
• All the link angles and angular velocities are accessible at each time step, since these data can be obtained from the encoders on the links and high rate of data
• The viscous friction of the arm and the two pendulums is considered, while the static friction
• The apparatus is lightweight and has low inertia, resulting in a structure with low stiffness
Figure 2.1 shows the basic configuration of the RDIP. The arrows on the arcs show the positive direction of rotation of the links. The straight dashed lines denote the zero reference of the link angles. For example, when the horizontal Link 1 is centered and the vertical Links 2 and 3 are in the upright position, all of the position variables are zero. The state variables of
θ̈2 Angular acceleration of Link 2.
Ji Moment of inertia of Link i. J1 is taken about its pivot, while Ji is taken about its center of mass Pci for i = 2, 3.
mi Mass of Link i, i = 1, 2, 3.
g Gravitational acceleration, g = 9.81 m/s², directed toward the center of the Earth.
Li Length of Link i, i = 1, 2, 3.
2.2 Euler-Lagrange Equation
The E-L method, introduced in detail in [7], is used to derive the equations of motion of the RDIP, since the Newtonian approach of applying Newton's laws of motion is highly complicated in this case. The solutions to the E-L equation for the action of a system describe the evolution of a physical system according to Hamilton's principle of stationary
but it has the advantage that it takes the same form in any system of generalized coordinates, and it
The E-L equation is an equation satisfied by a function q of a real argument t, which is a stationary
$q(a) = x_a$ and $q(b) = x_b$; $\dot{q}$ is the derivative of q, satisfying $\dot{q} : [a, b] \to T_{q(t)}X$, $t \mapsto \upsilon = \dot{q}(t)$, where $T_{q(t)}X$ denotes the tangent space of X at q(t); L is a real-valued function with continuous first partial derivatives, $L : [a, b] \times TX \to \mathbb{R}$, $(t, x, \upsilon) \mapsto L(t, x, \upsilon)$, with TX being the tangent bundle of X defined by
$$TX = \bigcup_{x \in X} \{x\} \times T_x X.$$
The E-L equation, then, is given by
$$L_x(t, q(t), \dot{q}(t)) - \frac{d}{dt} L_\upsilon(t, q(t), \dot{q}(t)) = 0, \tag{2.2}$$
where $L_x$ and $L_\upsilon$ denote the partial derivatives of L with respect to x and υ, respectively.
To determine the equations of motion of the system, we take the following steps:
1. Determine the kinetic energy K and the potential energy P.
2. Form the Lagrangian
$$L = K - P. \tag{2.3}$$
3. Compute $\partial L / \partial q$.
4. Compute $\partial L / \partial \dot{q}$ and, from it, $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}}$. It is important that $\dot{q}$ be treated as an independent variable rather than as a derivative.
5. Solve the E-L equation revised to include the generalized forces,
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = Q_q, \tag{2.4}$$
where $Q_q$ are the generalized forces and q are the generalized coordinates.
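As a small illustration of Steps 1 to 5 (not part of the RDIP derivation itself), consider a hypothetical single rotational link with mass m, pivot-to-center-of-mass distance l, moment of inertia J about its pivot, viscous friction coefficient b and applied torque τ, with θ measured from the upright position as in this thesis. The symbols m, l, J and b are assumptions made only for this example:

```latex
\begin{aligned}
&\text{1.}\quad K = \tfrac{1}{2} J \dot{\theta}^2, \qquad P = m g l \cos\theta \\
&\text{2.}\quad L = K - P = \tfrac{1}{2} J \dot{\theta}^2 - m g l \cos\theta \\
&\text{3.}\quad \frac{\partial L}{\partial \theta} = m g l \sin\theta \\
&\text{4.}\quad \frac{\partial L}{\partial \dot{\theta}} = J \dot{\theta}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = J \ddot{\theta} \\
&\text{5.}\quad J \ddot{\theta} - m g l \sin\theta = \tau - b \dot{\theta}
\end{aligned}
```

The term −mgl sin θ plays the same role as the gravity terms h7 sin θ2 and h8 sin θ3 of the RDIP model: near θ = 0 it acts as a negative stiffness, which is why the upright equilibrium is unstable.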
2.3 Modeling
The RDIP works as follows. The arm on the base, Link 1, moves in the x-o-z plane, rotating about the y axis. The other two links move in a vertical plane perpendicular to Link 1. Link 1 is driven by a DC motor, described in [8], which generates a torque to control the system; the servo system itself is not discussed here. Therefore, the control input of the RDIP is the torque applied to Link 1. The control objective is to maintain the pendulums, Links 2 and 3, in the upright position with Link 1 at its origin position.
Figure 2.2: Velocity analysis for Link 2.
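The total kinetic energy of each link in our system is the sum of its translational (moving) kinetic energy and its rotational kinetic energy,
$$K_i = \frac{1}{2} m_i v^2 + \frac{1}{2} J_i \dot{\theta}_i^2, \tag{2.5}$$
where v and θ̇i are, respectively, the translational velocity of the link's center of mass and its rotational angular velocity. Figure 2.2 shows the velocity analysis of Link 2; the other two links can be analyzed with the same method. The potential energy is straightforward to obtain, so we do not discuss it further. In total, the kinetic energy of the whole system is
$$K = \frac{1}{2}J_1\dot{\theta}_1^2 + \frac{1}{2}J_2\dot{\theta}_2^2 + \frac{1}{2}J_3\dot{\theta}_3^2 + \frac{1}{2}m_2\left[\left(L_1\dot{\theta}_1 + l_2\dot{\theta}_2\cos\theta_2\right)^2 + \left(-l_2\dot{\theta}_2\sin\theta_2\right)^2\right] + \frac{1}{2}m_3\left[\left(L_1\dot{\theta}_1 + L_2\dot{\theta}_2\cos\theta_2 + l_3\dot{\theta}_3\cos\theta_3\right)^2 + \left(-L_2\dot{\theta}_2\sin\theta_2 - l_3\dot{\theta}_3\sin\theta_3\right)^2\right]. \tag{2.6}$$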
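Now we can obtain the Lagrangian by applying (2.6) and (2.7) to (2.3),
L = K − P.
Applying the E-L equation (2.4) to this Lagrangian results in three coupled nonlinear equations. For Link 1,
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}_1} - \frac{\partial L}{\partial \theta_1} = \tau - b_1\dot{\theta}_1 \tag{2.8}$$
becomes
$$\tau = \left(J_1 + L_1^2(m_2 + m_3)\right)\ddot{\theta}_1 + L_1(m_2 l_2 + m_3 L_2)\cos\theta_2\,\ddot{\theta}_2 + L_1 m_3 l_3\cos\theta_3\,\ddot{\theta}_3 + b_1\dot{\theta}_1 - L_1(m_2 l_2 + m_3 L_2)\dot{\theta}_2^2\sin\theta_2 - L_1 m_3 l_3\dot{\theta}_3^2\sin\theta_3. \tag{2.9}$$
For Link 2,
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}_2} - \frac{\partial L}{\partial \theta_2} = -b_2\dot{\theta}_2 \tag{2.10}$$
becomes
$$0 = -L_1(m_2 l_2 + m_3 L_2)\cos\theta_2\,\ddot{\theta}_1 - \left(J_2 + L_2^2 m_3 + l_2^2 m_2\right)\ddot{\theta}_2 - L_2 m_3 l_3\cos(\theta_2 - \theta_3)\,\ddot{\theta}_3 - b_2\dot{\theta}_2 - L_2 m_3 l_3\dot{\theta}_3^2\sin(\theta_2 - \theta_3) + (m_2 l_2 + m_3 L_2)g\sin\theta_2. \tag{2.11}$$
And for Link 3,
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}_3} - \frac{\partial L}{\partial \theta_3} = -b_3\dot{\theta}_3 \tag{2.12}$$
becomes
$$0 = -L_1 m_3 l_3\cos\theta_3\,\ddot{\theta}_1 - L_2 m_3 l_3\cos(\theta_2 - \theta_3)\,\ddot{\theta}_2 - \left(J_3 + l_3^2 m_3\right)\ddot{\theta}_3 - b_3\dot{\theta}_3 + L_2 m_3 l_3\dot{\theta}_2^2\sin(\theta_2 - \theta_3) + m_3 l_3 g\sin\theta_3. \tag{2.13}$$
If the equations are parameterized, they reduce to a more manageable form. Define h1, h2, h3, h4, h5, h6, h7 and h8 as
$$\begin{aligned}
h_1 &= J_1 + L_1^2(m_2 + m_3), \\
h_2 &= L_1(m_2 l_2 + m_3 L_2), \\
h_3 &= L_1 m_3 l_3, \\
h_4 &= J_2 + L_2^2 m_3 + l_2^2 m_2, \\
h_5 &= L_2 m_3 l_3, \\
h_6 &= J_3 + l_3^2 m_3, \\
h_7 &= (m_2 l_2 + m_3 L_2)g, \\
h_8 &= m_3 l_3 g.
\end{aligned} \tag{2.14}$$
The dynamic equations then reduce to
$$\begin{aligned}
\tau &= h_1\ddot{\theta}_1 + h_2\cos\theta_2\,\ddot{\theta}_2 + h_3\cos\theta_3\,\ddot{\theta}_3 + b_1\dot{\theta}_1 - h_2\dot{\theta}_2^2\sin\theta_2 - h_3\dot{\theta}_3^2\sin\theta_3, \\
0 &= -h_2\cos\theta_2\,\ddot{\theta}_1 - h_4\ddot{\theta}_2 - h_5\cos(\theta_2 - \theta_3)\,\ddot{\theta}_3 - b_2\dot{\theta}_2 - h_5\dot{\theta}_3^2\sin(\theta_2 - \theta_3) + h_7\sin\theta_2, \\
0 &= -h_3\cos\theta_3\,\ddot{\theta}_1 - h_5\cos(\theta_2 - \theta_3)\,\ddot{\theta}_2 - h_6\ddot{\theta}_3 - b_3\dot{\theta}_3 + h_5\dot{\theta}_2^2\sin(\theta_2 - \theta_3) + h_8\sin\theta_3.
\end{aligned} \tag{2.15}$$
To make the system dynamics more accessible, define θ = [θ1, θ2, θ3]ᵀ, θ̇ = [θ̇1, θ̇2, θ̇3]ᵀ and θ̈ = [θ̈1, θ̈2, θ̈3]ᵀ. Then (2.15) can be rewritten in the same form as the equations in [3],
$$F(\theta)\ddot{\theta} + G(\theta, \dot{\theta}) + V(\theta) = b(\theta)\tau, \tag{2.16}$$
where
$$F(\theta) = \begin{bmatrix} h_1 & h_2\cos\theta_2 & h_3\cos\theta_3 \\ -h_2\cos\theta_2 & -h_4 & -h_5\cos(\theta_2 - \theta_3) \\ -h_3\cos\theta_3 & -h_5\cos(\theta_2 - \theta_3) & -h_6 \end{bmatrix}, \tag{2.17}$$
$$G(\theta, \dot{\theta}) = \begin{bmatrix} b_1\dot{\theta}_1 - h_2\dot{\theta}_2^2\sin\theta_2 - h_3\dot{\theta}_3^2\sin\theta_3 \\ -b_2\dot{\theta}_2 - h_5\dot{\theta}_3^2\sin(\theta_2 - \theta_3) \\ -b_3\dot{\theta}_3 + h_5\dot{\theta}_2^2\sin(\theta_2 - \theta_3) \end{bmatrix}, \tag{2.18}$$
$$V(\theta) = \begin{bmatrix} 0 \\ h_7\sin\theta_2 \\ h_8\sin\theta_3 \end{bmatrix}, \tag{2.19}$$
$$b(\theta) = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}. \tag{2.20}$$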
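To make the structure of the model F(θ)θ̈ + G(θ, θ̇) + V(θ) = b(θ)τ with F, G, V and b from (2.17) to (2.20) concrete, here is a minimal Python/NumPy sketch that assembles these terms and solves for θ̈. The parameter values in PARAMS are hypothetical placeholders, not the identified values of the SV02+DBIP kit, and l2 and l3 are assumed to be the pivot-to-center-of-mass distances of Links 2 and 3. The first row of G uses the centripetal term h2θ̇2² sin θ2, which is what differentiating h2 cos θ2 θ̇2 (the θ2 part of ∂L/∂θ̇1) produces. The function names are illustrative.

```python
import numpy as np

# Hypothetical physical parameters (SI units) for illustration only; these are
# NOT identified values of the lab hardware. l2 and l3 are assumed to be the
# pivot-to-center-of-mass distances of Links 2 and 3.
PARAMS = {
    "J1": 0.005, "J2": 1e-4, "J3": 2e-4,
    "m2": 0.1, "m3": 0.15,
    "L1": 0.2, "L2": 0.15, "l2": 0.075, "l3": 0.15,
    "b1": 1e-3, "b2": 1e-4, "b3": 1e-4,
    "g": 9.81,
}


def h_params(p):
    """Lumped parameters h1..h8 as defined in (2.14)."""
    return (
        p["J1"] + p["L1"] ** 2 * (p["m2"] + p["m3"]),
        p["L1"] * (p["m2"] * p["l2"] + p["m3"] * p["L2"]),
        p["L1"] * p["m3"] * p["l3"],
        p["J2"] + p["L2"] ** 2 * p["m3"] + p["l2"] ** 2 * p["m2"],
        p["L2"] * p["m3"] * p["l3"],
        p["J3"] + p["l3"] ** 2 * p["m3"],
        (p["m2"] * p["l2"] + p["m3"] * p["L2"]) * p["g"],
        p["m3"] * p["l3"] * p["g"],
    )


def rdip_accel(theta, dtheta, tau, p=PARAMS):
    """Solve F(theta) theta_dd + G(theta, dtheta) + V(theta) = b tau for theta_dd."""
    h1, h2, h3, h4, h5, h6, h7, h8 = h_params(p)
    _, t2, t3 = theta
    w1, w2, w3 = dtheta
    c23, s23 = np.cos(t2 - t3), np.sin(t2 - t3)
    F = np.array([
        [h1, h2 * np.cos(t2), h3 * np.cos(t3)],
        [-h2 * np.cos(t2), -h4, -h5 * c23],
        [-h3 * np.cos(t3), -h5 * c23, -h6],
    ])
    # Velocity-dependent terms (viscous friction and centripetal), one row per link
    G = np.array([
        p["b1"] * w1 - h2 * w2 ** 2 * np.sin(t2) - h3 * w3 ** 2 * np.sin(t3),
        -p["b2"] * w2 - h5 * w3 ** 2 * s23,
        -p["b3"] * w3 + h5 * w2 ** 2 * s23,
    ])
    V = np.array([0.0, h7 * np.sin(t2), h8 * np.sin(t3)])  # gravity terms
    b = np.array([1.0, 0.0, 0.0])                           # torque acts on Link 1 only
    return np.linalg.solve(F, b * tau - G - V)


# The upright equilibrium gives zero acceleration; a small tilt of Link 2
# accelerates it further away from upright (open-loop instability).
print(rdip_accel([0, 0, 0], [0, 0, 0], 0.0))
print(rdip_accel([0, 0.01, 0], [0, 0, 0], 0.0))
```

The same function can be handed to any ODE integrator to reproduce open-loop behavior such as falling from a slightly perturbed upright state.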
From the mathematical model, we can draw conclusions about the natural characteristics of the system's equilibria. A small disturbance will drive the open-loop system away from the upright equilibrium, and the pendulums will fall to the downward position, which is the stable equilibrium of the system. This characteristic can be seen in the MATLAB simulation result in Figure 5.1, where the model is started from some initial states without any control signal.
• Coupling Characteristic: According to the mathematical model, the state variables of the pendulum are strongly coupled. This can be observed by rewriting the dynamics as a state differential equation, as shown later in Section 3.2.
So far, a nonlinear mathematical model has been built for the RDIP. In the following two chapters, we will design controllers for the RDIP based on this model. In Chapter V, we will validate this model by simulation and analyze its behavior from some initial states without control.
In attempting to further develop the mathematical model for the RDIP, several challenges present
• The emergence of vibrational modes associated with unmodeled dynamics related to the elas-
• The model of the motor system and its incorporation with the RDIP model.
CHAPTER III
3.1 Introduction
The Linear Quadratic Regulator (LQR) is one of the main results of optimal control theory, which is concerned with operating a system at minimum cost. In this theory, system dynamics are usually described by a set of linear differential equations and the cost in the control process is
ẋ = Ax + Bu, (3.1)
where x is the state vector, u is the control input, and A and B are known matrices. The goal is to find a feedback gain K to be applied in the linear control law
u = −Kx, (3.2)
so as to minimize the cost function V, expressed as the integral of a quadratic form in the state x and the control input u,
$$V = \frac{1}{2}\int_0^{\infty} \left(x^{T} Q x + u^{T} R u\right) dt, \tag{3.3}$$
where Q is a positive semi-definite symmetric matrix and R is a positive definite symmetric matrix. With this assumption, at each time t the first term xᵀQx is always positive or zero, and the second term uᵀRu is positive whenever u ≠ 0. This guarantees that V is well defined and nonnegative. Usually, Q and R are selected to be diagonal for convenience; thus the diagonal of Q contains positive entries and possibly some zeros, while all the diagonal entries of R must be positive. Note that R is therefore invertible.
The cost function V is a performance index of the cost of the whole control process, and it can be interpreted as an energy function. The magnitude of the control action itself is included in the cost function so that the control effort is also kept limited. Since both the state x and the control input u are weighted in V, if V is small, then both x and u are kept small. Furthermore, if V is minimized, then it is finite, and since it is an infinite integral of a quadratic form in x, this implies that x goes to zero as t goes to infinity (under standard conditions such as (A, B) being stabilizable and (A, Q^{1/2}) being detectable), which guarantees that the closed-loop system is stable.
The plant is linear and the cost function V is quadratic. For this reason, the problem of determining the state feedback control that regulates the states to zero while minimizing V is called the Linear Quadratic Regulator (LQR) problem.
By solving the algebraic Riccati equation (ARE)
$$A^{T}P + PA - PBR^{-1}B^{T}P + Q = 0$$
for P, the LQR gain is obtained as K = R⁻¹BᵀP. The minimal value of the performance criterion V using this gain is given by
$$V(x_0) = \frac{1}{2}x_0^{T} P x_0, \tag{3.6}$$
which depends only on the initial condition x0. This means that the cost of using the LQR gain can be computed from the initial conditions before the control is ever applied to the system.
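Because (3.3) carries the factor 1/2, the minimal cost equals one half of x0ᵀPx0 when P solves the standard ARE AᵀP + PA − PBR⁻¹BᵀP + Q = 0. The Python/NumPy sketch below checks this numerically by integrating the cost along the closed-loop trajectory. It uses a textbook double integrator (chosen, as an assumption, because its ARE solution is known in closed form) rather than the RDIP, and the function name simulated_cost is hypothetical.

```python
import numpy as np

# Textbook double integrator (an illustrative assumption, not the RDIP):
# x1' = x2, x2' = u, with Q = I and R = 1. Its ARE solution is known in closed
# form, P = [[sqrt(3), 1], [1, sqrt(3)]], giving K = R^-1 B^T P = [1, sqrt(3)].
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P = np.array([[np.sqrt(3.0), 1.0], [1.0, np.sqrt(3.0)]])
K = np.linalg.solve(R, B.T @ P)


def simulated_cost(x0, t_end=30.0, dt=1e-2):
    """RK4-integrate V = 1/2 * int (x'Qx + u'Ru) dt along the closed loop u = -Kx."""
    Ac = A - B @ K

    def rhs(z):
        x = z[:2]
        u = -K @ x
        running_cost = 0.5 * (x @ Q @ x + u @ R @ u)
        return np.append(Ac @ x, running_cost)

    z = np.append(np.asarray(x0, dtype=float), 0.0)  # last entry accumulates V
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[-1]


x0 = np.array([1.0, 0.0])
print("simulated V:", simulated_cost(x0))
print("1/2 x0' P x0:", 0.5 * x0 @ P @ x0)
```

The two printed values agree to several decimal places; dropping the 1/2 in either place would make them differ by a factor of two.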
Note that Q and R are set by control engineers. In effect, the LQR algorithm takes care of the tedious work of optimizing the controller. However, one still needs to specify the weighting matrices Q and R and compare the results with the specified design goals. Often this means an iterative process in which engineers judge the resulting "optimal" controllers through simulation and then adjust the weighting matrices to obtain a controller that better meets the design goals. A clear link between the adjusted matrices and the resulting changes in control behavior is hard to find, which limits the application of LQR-based controller synthesis.
3. Find the LQR gain using K = R⁻¹BᵀP.
The MATLAB routine “lqr(A,B,Q,R)” is used here to perform the numerical procedure for solving
the ARE.
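For readers without MATLAB, the following Python/NumPy sketch mirrors what "lqr(A,B,Q,R)" returns: it solves the ARE for P through the stable eigenvectors of the Hamiltonian matrix and then forms K = R⁻¹BᵀP. The function names are hypothetical, and the eigenvector method is one standard choice, not necessarily what MATLAB does internally; if SciPy is available, scipy.linalg.solve_continuous_are(A, B, Q, R) is a more robust way to obtain P.

```python
import numpy as np


def care_hamiltonian(A, B, Q, R):
    """Solve A'P + PA - P B R^-1 B' P + Q = 0 from the stable invariant
    subspace of the Hamiltonian matrix (simple eigenvector method)."""
    n = A.shape[0]
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0.0]  # n eigenvectors with Re(s) < 0
    if stable.shape[1] != n:
        raise ValueError("no stabilizing solution (eigenvalues on the imaginary axis)")
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)                   # remove round-off asymmetry


def lqr(A, B, Q, R):
    """Return the LQR gain K = R^-1 B' P together with the ARE solution P."""
    P = care_hamiltonian(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return K, P


# Double-integrator check with Q = I, R = 1 (closed-form answer K = [1, sqrt(3)]).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr(A, B, np.eye(2), np.array([[1.0]]))
print(K, np.linalg.eigvals(A - B @ K))
```

The closed-loop eigenvalues of A − BK all have negative real parts, as the LQR guarantees for a stabilizable and detectable problem.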
3.2 Implementation
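For the LQR control design, we need to linearize the RDIP dynamics at the upright equilibrium. By defining x as
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \dot{\theta}_1 \\ \dot{\theta}_2 \\ \dot{\theta}_3 \end{bmatrix}, \tag{3.7}$$
we can rewrite (2.15) as
$$\dot{x} = f(x) + g(x)u, \tag{3.8}$$
where f(x) ∈ ℝ⁶ and g(x) ∈ ℝ⁶ (note that vectors in this thesis are column vectors by default),
$$f(x) = \begin{bmatrix} x_4 \\ x_5 \\ x_6 \\ f_4(x) \\ f_5(x) \\ f_6(x) \end{bmatrix}, \tag{3.9}$$
$$g(x) = \begin{bmatrix} 0 \\ 0 \\ 0 \\ g_4(x) \\ g_5(x) \\ g_6(x) \end{bmatrix}. \tag{3.10}$$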
We do not expand the terms fi(x) and gi(x) for i = 4, 5, 6 because their complete forms are too long to reproduce here; readers can derive them from our dynamics with the help of a computer algebra system. From the expanded form of (3.8), we find that the terms fi(x) and gi(x) involve the states x2 to x6. This indicates a strong coupling relationship between the three links which
$$B = \frac{1}{T}\begin{bmatrix} 0 \\ 0 \\ 0 \\ h_4 h_6 - h_5^2 \\ h_3 h_5 - h_2 h_6 \\ h_2 h_5 - h_3 h_4 \end{bmatrix}, \tag{3.14}$$
where T = h1(h4h6 − h5²) + h2(h3h5 − h2h6) + h3(h2h5 − h3h4) is the determinant of F(0) from (2.17).
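Another simple approach, used in [2], linearizes the system model with the approximations
$$\sin\theta_i \approx \theta_i, \qquad \cos\theta_i \approx 1, \tag{3.16}$$
and likewise cos(θ2 − θ3) ≈ 1. We use this method to linearize our system model in the form (2.16). Thus we have
$$\bar{F}\ddot{\theta} + \bar{G}(\dot{\theta}) + \bar{V}(\theta) = b\,\tau, \tag{3.17}$$
where
$$\bar{F} = \begin{bmatrix} h_1 & h_2 & h_3 \\ -h_2 & -h_4 & -h_5 \\ -h_3 & -h_5 & -h_6 \end{bmatrix}, \tag{3.18}$$
$$\bar{G}(\dot{\theta}) = \begin{bmatrix} b_1\dot{\theta}_1 \\ -b_2\dot{\theta}_2 \\ -b_3\dot{\theta}_3 \end{bmatrix}, \tag{3.19}$$
and V̄(θ) = [0, h7θ2, h8θ3]ᵀ. Here we ignore all the high-order terms of θi in both of these methods. High-order terms contain at least quadratic quantities of θi (or θ̇i); since these quantities are small, their squares are even smaller, so the high-order terms can be neglected.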
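The small-angle approximation (3.16) applied to (2.16) gives a linear model F̄θ̈ + Dθ̇ + K_Vθ = bτ, and solving for θ̈ yields the state-space matrices A = [[0, I], [−F̄⁻¹K_V, −F̄⁻¹D]] and B = [0; F̄⁻¹b] for x = [θ; θ̇] as in (3.7). The Python/NumPy sketch below builds them; the names D (linear part of the friction terms, diag(b1, −b2, −b3)) and K_V (linearized gravity terms, diag(0, h7, h8)) and the numerical h and b values are hypothetical placeholders, not the parameters of the lab hardware.

```python
import numpy as np


def linearize_rdip(h, b):
    """State-space matrices of the small-angle RDIP model with x = [theta; theta_dot].

    F_bar theta_dd + D theta_d + Kv theta = e1 tau, where D = diag(b1, -b2, -b3)
    is the linear friction part and Kv = diag(0, h7, h8) the linearized gravity part.
    """
    h1, h2, h3, h4, h5, h6, h7, h8 = h
    b1, b2, b3 = b
    F_bar = np.array([[h1, h2, h3],
                      [-h2, -h4, -h5],
                      [-h3, -h5, -h6]])
    D = np.diag([b1, -b2, -b3])
    Kv = np.diag([0.0, h7, h8])
    e1 = np.array([[1.0], [0.0], [0.0]])
    F_inv = np.linalg.inv(F_bar)
    A = np.block([[np.zeros((3, 3)), np.eye(3)],
                  [-F_inv @ Kv, -F_inv @ D]])
    B = np.vstack([np.zeros((3, 1)), F_inv @ e1])
    return A, B


# Hypothetical lumped parameters (placeholders, not hardware values).
h = (0.015, 0.006, 0.0045, 0.0040375, 0.003375, 0.003575, 0.2943, 0.220725)
b = (1e-3, 1e-4, 1e-4)
A, B = linearize_rdip(h, b)
print(np.round(np.linalg.eigvals(A), 3))  # open loop: some eigenvalues have Re > 0
print(B.ravel())
```

The last three entries of B are the first column of F̄⁻¹; a positive torque accelerates the arm forward while tipping the lower pendulum backward, which shows up as B entries of opposite sign.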
With the linear state matrices A and B, we can check the controllability of the system with the controllability matrix
$$\mathcal{C} = \begin{bmatrix} B & AB & A^2B & A^3B & A^4B & A^5B \end{bmatrix}.$$
The rank of $\mathcal{C}$ is 6; therefore, the linearized system is controllable. We first choose Q = I6 and R = 1 to obtain an initial stabilizing gain K with the MATLAB command "lqr(A,B,Q,R)" and then adjust the entries of these two matrices to optimize the performance of the controller.
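The rank test above can be reproduced with a short Python/NumPy sketch that builds [B, AB, ..., A^{n−1}B] (the counterpart of MATLAB's ctrb) and checks its rank. The helper names are hypothetical; for badly scaled models such as the RDIP, the rank decision depends on the tolerance passed to numpy.linalg.matrix_rank, so it is worth inspecting the singular values as well.

```python
import numpy as np


def ctrb(A, B):
    """Controllability matrix [B, AB, A^2 B, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)


def is_controllable(A, B, tol=None):
    """Full-rank test; `tol` is passed through to numpy.linalg.matrix_rank."""
    return np.linalg.matrix_rank(ctrb(A, B), tol=tol) == A.shape[0]


# Double integrator: controllable. Two decoupled modes with the input
# entering only the first one: not controllable.
A1, B1 = np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [1.0]])
A2, B2 = np.diag([-1.0, -2.0]), np.array([[1.0], [0.0]])
print(is_controllable(A1, B1), is_controllable(A2, B2))
```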
As discussed in Section 3.1, Q and R are selected by the design engineer. Different choices of these design parameters lead to different control performance of the closed-loop system. Generally speaking, a large Q means that, to keep V small, the state x must be smaller, which places the poles of the closed-loop system matrix Ac = A − BK further left in the s-plane so that the states converge to zero faster. On the other hand, selecting a large R means that the control input u must be smaller to keep V small, which implies less control effort. In this case the poles
A reasonably simple choice of the matrices Q and R is given by Bryson's rule [11]. Select Q and R as diagonal matrices with
$$Q_{ii} = \frac{1}{\text{maximum acceptable value of } x_i^2}, \quad i = 1, \ldots, l, \tag{3.23}$$
$$R_{jj} = \frac{1}{\text{maximum acceptable value of } u_j^2}, \quad j = 1, \ldots, k. \tag{3.24}$$
In this way, Bryson's rule scales the variables that appear in V so that the maximum acceptable value of each term is one. This is especially important when the units used for the different components of u and x make the values of the variables numerically very different from each other. Although Bryson's rule usually gives good results, it is only the starting point of a trial-and-error iterative design procedure aimed at obtaining a controller more in line with the desirable properties for
Applying different Q and R from different initial states to construct the LQR controller in the MATLAB simulation, we find that the magnitudes of the three angular velocities are around 3, the magnitudes of the three angles are around π/3, and the magnitude of the control input is around 1. We therefore choose the following Q and R, according to Bryson's rule, to start our tuning:
$$Q = \operatorname{diag}(0.9119,\ 0.9119,\ 0.9119,\ 0.1111,\ 0.1111,\ 0.1111), \qquad R = 1, \tag{3.25}$$
and after some fine tuning we finally obtain the optimized result
$$Q = \operatorname{diag}(1,\ 1,\ 1,\ 0.1667,\ 0.1667,\ 0.1667), \qquad R = 0.8. \tag{3.26}$$
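The starting weights (3.25) follow directly from Bryson's rule with the magnitudes quoted above (angles around π/3, angular velocities around 3, input around 1). A minimal Python/NumPy sketch, where the helper name bryson is hypothetical:

```python
import numpy as np


def bryson(max_x, max_u):
    """Bryson's rule: Q_ii = 1 / max(x_i)^2 and R_jj = 1 / max(u_j)^2."""
    Q = np.diag(1.0 / np.asarray(max_x, dtype=float) ** 2)
    R = np.diag(1.0 / np.asarray(max_u, dtype=float) ** 2)
    return Q, R


# Magnitudes reported above: angles ~ pi/3, angular velocities ~ 3, input ~ 1.
Q0, R0 = bryson([np.pi / 3] * 3 + [3.0] * 3, [1.0])
print(np.round(np.diag(Q0), 4), R0)
```

The diagonal of Q0 rounds to 0.9119 for the angles (9/π²) and 0.1111 for the angular velocities (1/9), matching (3.25); the fine-tuned values in (3.26) came from further trial and error, which this sketch does not attempt to reproduce.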
In fact, the entries of Q and R can be adjusted separately, which might give an even better result. Here we use only two parameters, q1 and q2, to adjust all the entries, since we find that the three angles are similar in magnitude, as are the three angular velocities. In addition, this makes it more convenient to show how the weights are adjusted and to compare the tuning results of different combinations. Actually, we find that this setting has been
The tuning effect is shown in Section 5.2 by comparing the control results of the LQR with identity weight matrices and of the tuned LQR. We can see that tuning improves the RDIP control system. However, the effect of this optimization method is limited. For instance, from the initial state [0 −0.1 0.1 0 0 0]ᵀ, which we discuss in detail in Chapters IV and V, it is impossible to control the system with the LQR no matter how we tune the weight matrices. Therefore, we need another way to improve the LQR. The DAFC scheme
Although it is possible to optimize the LQR controller by adjusting the matrices Q and R according to Bryson's rule, we still know nothing about the stability of the RDIP system near the upright position. In fact, stability plays a central role in the analysis of any control method. Here we try to find the Region of Attraction (R.O.A) of the closed-loop RDIP system to analyze its stability (the notion of the R.O.A is defined later in this section, in the introduction of Lyapunov stability). Once the R.O.A of the system is accessible,
method providing a larger R.O.A makes the system stable for a larger set of initial states. We check the stability near the upright position, one of the two equilibrium points. An equilibrium point is stable if all solutions starting at nearby points stay nearby; otherwise it is unstable. It is asymptotically stable if all solutions starting at nearby points not only stay nearby but also tend to the equilibrium point as time approaches infinity. In the case of the RDIP, the downward equilibrium is asymptotically stable, while the upright equilibrium is unstable. Stability of equilibrium points is usually characterized in the sense of Lyapunov. Lyapunov's method helps to prove stability without requiring knowledge of the true physical energy, provided that a Lyapunov function satisfying the following constraints can be found. Note that the R.O.A is often not the only concern when deciding which control method to select for a given system. For example, we may need a fast-response controller that forces the system to converge to the origin as quickly as possible. The Lyapunov stability analysis used here also has a limitation: it cannot directly estimate the performance of a controller that updates itself online, such as the DAFC method in the next chapter.
Here we use Lyapunov's second method for stability; for more details, readers can refer to [2]. Suppose we have a system with an equilibrium point at x = 0. A Lyapunov
that contains the origin such that
$$\dot{V}(x) = \frac{dV(x)}{dt} \le 0 \quad \text{in } D. \tag{3.28}$$
In this case the trajectory of the system states is confined to a region it can never escape from, so the origin is stable.
Moreover, if
$$\dot{V}(x) = \frac{dV(x)}{dt} < 0 \quad \text{in } D \setminus \{0\}, \tag{3.29}$$
the system is asymptotically stable, since the trajectory of the states approaches the origin as time
order to conclude global asymptotic stability. In general, the origin is stable if there is a continuously differentiable positive definite function V(x) such that V̇(x) is negative semidefinite, and it is
To understand this method of stability analysis, we can view the Lyapunov function as the energy of a physical system. If no energy is supplied to the system, it loses energy over time (due to vibration, friction or other factors) and finally stops in some resting state. This final state is called the attractor. A system to be controlled may admit many Lyapunov functions, but finding an appropriate Lyapunov function to support the stability
Usually we use a class of scalar functions of quadratic form, for which sign definiteness can be easily checked,
$$V(x) = x^{T}Px = \sum_{i=1}^{n}\sum_{j=1}^{n} p_{ij}x_i x_j, \tag{3.30}$$
where P is a real symmetric positive definite matrix, in which case V(x) is positive definite and therefore a valid Lyapunov candidate. Here we reuse the Q matrix that we picked for the LQR control to
where ẋ = [ẋ1, ẋ2, ẋ3, ẋ4, ẋ5, ẋ6]ᵀ and Q has already been selected as a positive definite diagonal matrix.
and then
At this point, we are able to find the R.O.A of the closed-loop system with the LQR controller using
2. Estimate the value of V̇(x) for these samples according to (3.34).
3. Find the sample point xp nearest to the origin that satisfies
$$\dot{V}(x_p) \ge 0. \tag{3.35}$$
4. If no such point exists, enlarge the sample set and repeat Steps 1 to 3 until a point xp is found.
Note that the approximation sign is used here because the R.O.A should be a region that excludes xp. Since xp is the nearest point satisfying (3.35), it lies on the boundary between the R.O.A and the area outside: every sampled point closer to the origin than xp has V̇(x) < 0. If there were another point x′p inside this boundary with V̇(x′p) ≥ 0, the R.O.A would be smaller than the given one; choosing xp as the nearest such point ensures that the statement holds. To make the identification more precise, we can sample the test points at smaller intervals. Note also that we use the Euclidean norm to express the distance between a
$$\|x\| = \|x\|_2 = \sqrt{x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 + x_6^2}. \tag{3.37}$$
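Steps 1 to 4 can be sketched in Python/NumPy as below. This is a minimal sketch under stated assumptions: V(x) = xᵀPx with symmetric positive definite P, a user-supplied closed-loop vector field xdot(x) (for the RDIP this would be f(x) − g(x)Kx from (3.8) with u = −Kx, which is not reproduced here), and uniform random samples in a box, which is our assumption about how the samples are drawn. The function name and the toy system are hypothetical.

```python
import numpy as np


def estimate_roa_radius(xdot, P, radius, n_samples=20000, seed=0):
    """Sampling estimate of the ball ||x|| < r on which Vdot(x) < 0, V = x'Px.

    Returns the Euclidean norm of the nearest sampled point with Vdot >= 0
    (the point x_p), or None if no such point was sampled, in which case
    `radius` should be enlarged and the search repeated.
    """
    np.linalg.cholesky(P)  # raises LinAlgError unless P is positive definite
    rng = np.random.default_rng(seed)
    X = rng.uniform(-radius, radius, size=(n_samples, P.shape[0]))
    # Vdot = xdot'Px + x'P xdot = 2 x'P xdot for symmetric P
    vdot = np.array([2.0 * x @ P @ xdot(x) for x in X])
    violating = X[vdot >= 0.0]
    if violating.shape[0] == 0:
        return None
    return float(np.min(np.linalg.norm(violating, axis=1)))


def toy(x):
    """Toy closed loop x' = -x + x^3 (illustration only, not the RDIP)."""
    return -x + x ** 3


# With V = x'x, Vdot = 2 * sum(x_i^2 (x_i^2 - 1)) < 0 whenever 0 < ||x|| < 1,
# so the estimate should land just above 1.
print(estimate_roa_radius(toy, np.eye(2), radius=2.0))
```

A sampling estimate can only overestimate the true radius slightly (the nearest violating point may be missed), which is why the text recommends refining the sampling near the boundary.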
Table 3.1: Estimated R.O.A for q1 = π with different q2 and R
q2 \ R   100      10       5        4        3        2        1
1        0.0361   0.0490   0.0447   0.0436   0.0447   0.0480   0.0458
10       0.0387   0.0490   0.0566   0.0648   0.0632   0.0616   0.0592
100      0.0387   0.0490   0.0566   0.0648   0.0632   0.0616   0.0592
1000     0.0387   0.0490   0.0566   0.0648   0.0632   0.0616   0.0592

Table 3.2: Largest estimated R.O.A for different q1, with the corresponding q2 and R
q1      R.O.A    q2   R
π       0.0648   10   4
π/2     0.0721   10   1
π/4     0.0755   10   5
π/8     0.0700   10   10
For the convenience of the simulation, we simply set the matrix Q in the same way as in Section 3.2,
$$Q = \operatorname{diag}(q_1,\ q_1,\ q_1,\ q_2,\ q_2,\ q_2), \tag{3.38}$$
where q1 and q2 weight the angles and the angular velocities of the links, respectively. Note that we adjust only the two parameters q1 and q2 for the same reason as in Section 3.2, although the six states could be weighted separately. Table 3.1 shows one set of search results from the simulation. Here, we fix q1 = π and vary q2 and R over some representative values. We find that, with q1 fixed, the system attains its largest R.O.A when q2 = 10 and R = 4 (larger values of q2 give the same R.O.A). Table 3.2 lists the largest R.O.A of the system for different values of q1; for each q1, the corresponding q2 and R values are also provided.
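The search over q2 and R behind Table 3.1 is an exhaustive grid search. The Python sketch below shows the bookkeeping; the evaluate callback is hypothetical (in a real run it would design the LQR with Q from (3.38) and estimate the R.O.A of the closed loop), and here it merely replays the values of Table 3.1 so that the selection logic can be checked.

```python
# R.O.A values from Table 3.1 (q1 = pi), keyed by q2; columns follow R_VALUES.
R_VALUES = [100, 10, 5, 4, 3, 2, 1]
ROW_Q2_10_PLUS = [0.0387, 0.0490, 0.0566, 0.0648, 0.0632, 0.0616, 0.0592]
TABLE_3_1 = {
    1: [0.0361, 0.0490, 0.0447, 0.0436, 0.0447, 0.0480, 0.0458],
    10: ROW_Q2_10_PLUS,
    100: ROW_Q2_10_PLUS,
    1000: ROW_Q2_10_PLUS,
}


def best_setting(evaluate, q2_values, r_values):
    """Exhaustive search returning (q2, R, roa) with the largest R.O.A.

    `evaluate(q2, r)` is a hypothetical callback returning an R.O.A estimate.
    Ties keep the first setting encountered (strict comparison).
    """
    best = None
    for q2 in q2_values:
        for r in r_values:
            roa = evaluate(q2, r)
            if best is None or roa > best[2]:
                best = (q2, r, roa)
    return best


def table_lookup(q2, r):
    """Stand-in evaluator that replays Table 3.1."""
    return TABLE_3_1[q2][R_VALUES.index(r)]


print(best_setting(table_lookup, [1, 10, 100, 1000], R_VALUES))
```

It returns (10, 4, 0.0648); the larger q2 values tie at the same R.O.A, and the strict comparison keeps the first setting found.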
27
From Table 3.2, we can see that when q1 = π/4, q2 = 10 and R = 5, the LQR reaches its largest R.O.A, 0.0755. This means that if ||x|| < 0.0755, the derivative of the Lyapunov function is negative, and therefore the closed-loop system is asymptotically stable. Beyond the R.O.A, this analysis cannot predict the behavior of the RDIP system. There are still some initial points from which the states of the system converge to the origin under the control, although this convergence is not guaranteed. Interestingly, although the R.O.A is very small, the LQR can work in a region well beyond it. This may be because other Lyapunov candidates could provide a better estimate of the R.O.A than the Lyapunov function we defined here. We will leave this topic for the RDIP system
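The R.O.A radius discussed above is the largest r such that the Lyapunov derivative is negative for all 0 < ||x|| < r. The sketch below estimates such a radius by sampling. It assumes a quadratic Lyapunov function V(x) = xᵀPx and a closed-loop vector field f(x) supplied by the user. Both are assumptions made for illustration, because the Lyapunov function of this chapter is not restated here. Sampling can only estimate the region, not certify it, and this is not necessarily the procedure used to produce Table 3.1.

```python
import numpy as np

def roa_radius(f, P, r_max=1.0, n_r=200, n_dir=500, seed=0):
    """Sampling estimate of the largest r with Vdot(x) < 0 for 0 < ||x|| <= r.

    f : closed-loop vector field, x -> xdot (hypothetical, supplied by the user)
    P : symmetric positive definite matrix of V(x) = x^T P x
    Only sampled points are checked, so the result is an estimate, not a proof.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    dirs = rng.standard_normal((n_dir, n))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)     # unit directions
    r_ok = 0.0
    for r in np.linspace(r_max / n_r, r_max, n_r):           # growing spheres
        vdot = [2.0 * (r * d) @ P @ f(r * d) for d in dirs]  # Vdot = 2 x^T P f(x)
        if max(vdot) >= 0.0:
            break                                             # first violation found
        r_ok = float(r)
    return r_ok
```

For a linear, stable closed loop the estimate saturates at `r_max`. For a nonlinear field it stops at the first sphere where a sampled point violates the Lyapunov condition.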
CHAPTER IV
ADAPTIVE CONTROL
4.1 Introduction
Since the LQR works only in a small region, we intend to improve the performance of the LQR controller. We will develop the LQR controller into a Direct Adaptive Fuzzy Control (DAFC) system as described in [6] and [12]. From the conclusion of [9], we know that a fuzzy controller can be used to control the RDIP. A theoretical analysis of the stability and design of a fuzzy control system using the Takagi-Sugeno (T-S) fuzzy model is given in [13]. However, this approach has some problems. While non-adaptive fuzzy control has proven its value in applications, it is difficult to specify the rule base for some plants, and the need could arise to tune the rule-base parameters if the plant changes. For the RDIP system, it is very hard to gather heuristic knowledge about how to control the RDIP to make it stand upright. Since heuristics do not provide enough information to specify all the parameters of the fuzzy controller a priori, adaptive schemes that use data gathered during the on-line operation of the controller can be used to improve the fuzzy system by making it learn the parameters automatically, so that the performance objectives are met.
Several adaptive control schemes have been applied in system control. The linguistic self-organizing controller, the first adaptive fuzzy controller, is introduced in [14]. Another successful method, the so-called “fuzzy model reference learning controller”, is introduced in [15]-[16]. However, while these appear to be practical heuristic approaches to adaptive fuzzy control, there is no proof that they result in a stable closed-loop system. Here, we optimize the LQR controller with the DAFC, which has been provided with a stability analysis in [12] and experimental validation in [6]. Therefore the stability requirement can be met for a safety-critical system such as the RDIP experimental plant in the lab. We will make a
The DAFC attempts to directly adjust the parameters of a fuzzy or neural controller to achieve asymptotic tracking of a reference input. The DAFC has several advantages:
• The stability result for this controller applies to systems with a state-dependent input gain.
• The DAFC method works for plants with minimum-phase zero dynamics; however, it appears also to work for some plants with non-minimum-phase zero dynamics.
• The direct adaptive controller allows for T-S fuzzy systems, standard fuzzy systems, or neural networks.
• The direct adaptive technique presented here allows for the inclusion of a known controller uk so that it may be used to either enhance the performance of some pre-specified controller
In what follows in this chapter, the notation of [12] will be used. In the next section, we first introduce the T-S fuzzy system; readers may consult [17] and [18] for a full treatment of this kind of fuzzy system. Then we describe the DAFC and specify the DAFC scheme for the RDIP, with the LQR controller obtained earlier as the “known part” of the controller.
This section largely follows [6] to introduce the Takagi-Sugeno (T-S) fuzzy system. Readers can refer to [19] for more details about the T-S fuzzy system. To briefly present the notation, consider a fuzzy system denoted by $\tilde f(x)$. Then
$$\tilde f(x) = \frac{\sum_{i=1}^{p} c_i \mu_i}{\sum_{i=1}^{p} \mu_i}.$$
Here, singleton fuzzification of the input $x = [x_1\; x_2\; \dots\; x_n]^\top$ is assumed; the fuzzy system has p rules, and µi is the value of the membership function for the antecedent of the ith rule given the input x. It is assumed that the fuzzy system is constructed in such a way that $\sum_{i=1}^{p} \mu_i \neq 0$ for all $x \in \mathbb{R}^n$. The parameter ci is the consequent of the ith rule which, in this thesis, will be taken
Figure 4.1: Membership function with c = 1 and σ = 0.25.
Then, the nonlinear equation that describes the fuzzy system can be written as $\tilde f(x) = z^\top A \zeta$. Figure 4.1 shows one of the membership functions we use in the DAFC design for our system.
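To make the notation concrete, the sketch below evaluates a T-S fuzzy system in the form $\tilde f(x) = z^\top A \zeta$. It rests on three assumptions, which are common for T-S systems but are not restated in this section. First, z collects the regressor (for example z = [1, x1, ..., xn]ᵀ). Second, column i of A holds the consequent parameters of rule i, so that ci is the ith entry of zᵀA. Third, ζ is the vector of normalized premise values ζi = µi / Σj µj.

```python
import numpy as np

def ts_fuzzy(z, A, mu):
    """Takagi-Sugeno output f~(x) = sum_i c_i mu_i / sum_i mu_i = z^T A zeta.

    z  : regressor vector (assumed z = [1, x_1, ..., x_n]), shape (m,)
    A  : consequent parameter matrix, one column per rule, shape (m, p)
    mu : premise membership values for the p rules, shape (p,)
    """
    mu = np.asarray(mu, dtype=float)
    total = mu.sum()
    if total == 0.0:
        raise ValueError("the sum of membership values must be nonzero")
    zeta = mu / total                                          # normalized firing strengths
    c = np.asarray(z, dtype=float) @ np.asarray(A, dtype=float)  # consequents c_i
    return float(c @ zeta)                                     # equals z^T A zeta
```

For z = [1, 0.5]ᵀ, A = [[1, 2], [0, 4]] and µ = [1, 3], the consequents are c = [1, 4]. The output is (1·1 + 4·3)/(1 + 3) = 3.25, the same value as zᵀAζ.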
4.3 Theory
A DAFC controller directly adjusts the parameters of a controller to meet some performance specifications. In [12], the authors developed the adaptive control method under the assumption that $0 < \beta_0 \le \beta(x) \le \beta_1 < \infty$. For the RDIP, simulating the RDIP dynamics in MATLAB shows instead that $-\infty < \beta_1 \le \beta(x) \le \beta_0 < 0$. Here we re-develop the direct adaptive scheme for our case and at the same time provide readers with a description of the method.
$f(x), g(x) \in \mathbb{R}^n$, and $h(x) \in \mathbb{R}$ are smooth. The system has a “strong relative degree” r as
$$\begin{aligned} \dot y &= L_f h(x),\\ \ddot y &= L_f^2 h(x),\\ &\;\;\vdots\\ y^{(r)} &= L_f^r h(x) + L_g L_f^{r-1} h(x)\,u, \end{aligned} \qquad (4.3)$$
where $L_f^r h(x)$ is the rth Lie derivative of h(x) with respect to f and $L_g^r h(x)$ is the rth Lie derivative
depend on the states) or known exogenous time-dependent signals and that α(t) and β(t) represent nonlinear dynamics of the plant that are unknown. It is also assumed that if x is a bounded state vector, then $\alpha_k(t)$ and $\beta_k(t)$ are bounded signals. Throughout the analysis to follow, both $\alpha_k(t)$ and
1. The plant is of relative degree $1 \le r < n$ with the zero dynamics exponentially attractive and
of ψ(ξ, π), the plants satisfying this assumption have bounded states [20].
3. We require that $\beta_k(t) = 0$ for $t \ge 0$ and that there exists some function $B(x) \ge 0$ such that $|\dot\beta(x)| = |(\partial\beta/\partial x)\dot x| \le B(x)$.
Although Assumption 1 is not guaranteed in our case, since the zero dynamics of our system are undetermined, we derive the control scheme for our system regardless. From [6] we know that this method works for some simple nonlinear systems that are non-minimum-phase, such as the rotational single inverted pendulum. We try to implement this method in a more complicated case and verify whether it still works. Assumption 1 also requires that the controller gain β(x) be bounded by a constant β0 from above and a constant β1 from below. The third restriction requires that $|\dot\beta(x)| \le B(x)$ for some $B(x) > 0$. If $\|\partial\beta/\partial x\|$ and $\|\dot x\|$ are bounded, then some B(x) may be found. If the controller gain of the system is finite, $\|\partial\beta/\partial x\|$ is bounded. If $y^{(i)}$ is bounded for $i = 0, \dots, r$, then for plants with no zero dynamics $\|\dot x\|$ is guaranteed to be bounded, since the states can be expressed in terms of the outputs $y^{(i)}$. For a plant with zero dynamics, if β(x) does not depend on the zero dynamics, then once again $|\dot\beta(x)|$ is bounded. In [6], a function α1(x) of x is found to meet Assumption 4. In this thesis, we use a constant as our global bound for α(x) to simplify the choice of this function. The constant is obtained from observation of the RDIP simulation results.
Using feedback linearization theory in [2], we assume that there exists some ideal controller
$$u^* = \frac{1}{\beta(x)}\left[-\alpha(x) + \nu(t)\right], \qquad (4.6)$$
where ν(t) is a free parameter. We may express u* in terms of a T-S fuzzy model, so that
$$u^* = z_u^\top A_u^* \zeta_u + u_k + d_u(x), \qquad (4.7)$$
where $z_u \in \mathbb{R}^{m_u}$, $\zeta_u \in \mathbb{R}^{p_u}$, and $A_u^* \in \mathbb{R}^{m_u \times p_u}$ is the matrix of ideal direct control parameters
$$A_u^* := \arg\min_{A_u \in \Omega_u}\left[\sup_{x \in S_x,\, \nu \in S_m} \left|z_u^\top A_u \zeta_u - (u^* - u_k)\right|\right]. \qquad (4.8)$$
Here $d_u(x)$ is an approximation error that arises when u* is represented by a fuzzy system. We assume that $|d_u(x)| \le D_u(x)$, where $D_u(x)$ is a known bound on the error in representing the ideal controller with a fuzzy system. If $|d_u(x)|$ is to be small, then our fuzzy controller will require x and ν to be available, either through the input membership functions or through $z_u^\top$. The term $u_k$ is a known part of the controller. The DAFC attempts to directly determine a controller, so within this chapter we allow for a known part of the controller that is perhaps specified via heuristics or past experience with the application of conventional direct control (in our case, LQR). The approximation of the desired control is
$$\hat u = z_u^\top A_u \zeta_u + u_k, \qquad (4.9)$$
is used to define the difference between the parameters of the current controller and the desired
In general, the DAFC comprises a bounding control term $u_{bd}$, a sliding-mode control term $u_{sd}$, and an adaptive control term $\hat u$. Here we define $\nu := y_m^{(r)} + \eta e_s + \bar e_s - \alpha_k(t)$, with $e_0$, $e_s$ and $\bar e_s$ defined as
$$\begin{aligned} e_0 &= y_m - y_p,\\ e_s &= \left[e_0\;\; \dots\;\; e_0^{(r-1)}\right]\left[k_0\;\; \dots\;\; k_{r-2}\;\; 1\right]^\top,\\ \bar e_s &= \dot e_s - e_0^{(r)}, \end{aligned} \qquad (4.12)$$
where $\hat L(s) := s^{r-1} + k_{r-2}s^{r-2} + \dots + k_1 s + k_0$ has its roots in the open left-half plane.
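As a small illustration of (4.12), the sketch below forms e_s from the tracking error and its first r − 1 derivatives. It then checks that the chosen coefficients k0, ..., k_{r−2} make $\hat L(s)$ Hurwitz. The function names are hypothetical, and the derivatives of e0 are assumed to be available, whether measured or estimated.

```python
import numpy as np

def sliding_error(e0_derivs, k):
    """e_s = [e0, e0', ..., e0^(r-1)] . [k0, ..., k_{r-2}, 1]^T, as in (4.12).

    e0_derivs : [e0, e0_dot, ..., e0^(r-1)]  (length r)
    k         : [k0, ..., k_{r-2}]           (length r-1)
    """
    e0_derivs = np.asarray(e0_derivs, dtype=float)
    coeffs = np.append(np.asarray(k, dtype=float), 1.0)
    if coeffs.size != e0_derivs.size:
        raise ValueError("need r error derivatives and r-1 coefficients")
    return float(e0_derivs @ coeffs)

def is_hurwitz(k):
    """True if L(s) = s^(r-1) + k_{r-2} s^(r-2) + ... + k0 has all roots in Re(s) < 0."""
    poly = np.append(1.0, np.asarray(k, dtype=float)[::-1])  # highest power first
    return bool(np.all(np.roots(poly).real < 0))
```

With r = 2 and k = [5], this reduces to e_s = 5e0 + ė0, the choice used later for the RDIP.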
4.3.1 Bounding Control
We now define the bounding control term $u_{bd}$ of the DAFC. The bounding control term is determined by considering
$$V_s = \frac{1}{2}e_s^2. \qquad (4.14)$$
We do not explicitly know u*; however, the bounding controller can be implemented using $\alpha_1(x) \ge |\alpha(x)|$ as
$$u_{bd} = \begin{cases} \left(-|\hat u| - |u_{sd}| + \dfrac{\alpha_1(x) + |\nu|}{\beta_0}\right)\operatorname{sgn}(e_s) & \text{if } |e_s| > M_e,\\ 0 & \text{otherwise}. \end{cases} \qquad (4.16)$$
Thus it is ensured that if there exists a time $t_0$ such that $|e_s(t_0)| > M_e$, then for $t > t_0$, $|e_s(t)|$
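The bounding term (4.16) can be written as a short function. This sketch follows the sign convention of (4.16) for the case β(x) ≤ β0 < 0. The argument names are hypothetical, and the sliding term u_sd and the adaptive term û are assumed to be computed elsewhere.

```python
import numpy as np

def bounding_control(e_s, u_hat, u_sd, nu, alpha1, beta0, M_e):
    """Bounding term u_bd of (4.16), for an input gain with beta(x) <= beta0 < 0.

    Zero inside |e_s| <= M_e. Outside, the bracket below is non-positive
    (because beta0 < 0), so u_bd never has the same sign as e_s.
    """
    if beta0 >= 0:
        raise ValueError("this form assumes beta0 < 0")
    if abs(e_s) <= M_e:
        return 0.0
    bracket = -abs(u_hat) - abs(u_sd) + (alpha1 + abs(nu)) / beta0
    return float(bracket * np.sign(e_s))
```

With the values chosen in Section 4.4 (α1 = 20, β0 = −100, M_e = 8), the inputs e_s = 10, û = 1, u_sd = 0.5 and ν = 2 give u_bd = −1.72.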
$$V_d = -\frac{1}{2\beta(x)}e_s^2 + \frac{1}{2}\operatorname{tr}\left(\phi_u^\top Q_u \phi_u\right), \qquad (4.18)$$
where $Q_u \in \mathbb{R}^{m_u \times m_u}$ is positive definite and diagonal, and $\phi_u = A_u - A_u^*$. Since $-\infty < \beta_1 \le \beta(x) \le \beta_0 < 0$, $V_d$ is positive definite and radially unbounded. Here tr(·) is the trace operator. The Lyapunov candidate $V_d$ describes both the tracking error and the error between the desired controller and the current controller. If $V_d \to 0$, then both the tracking and learning objectives have been fulfilled. Taking the time derivative of $V_d$, we obtain
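$$\dot V_d = -\frac{e_s}{\beta(x)}\,\dot e_s + \operatorname{tr}\left(\phi_u^\top Q_u \dot\phi_u\right) + \frac{\dot\beta(x)\,e_s^2}{2\beta^2(x)}. \qquad (4.19)$$
$$\dot V_d = -\frac{e_s}{\beta(x)}\left[-\eta e_s - \beta(x)(\hat u - u^*) - \beta(x)(u_{sd} + u_{bd})\right] + \operatorname{tr}\left(\phi_u^\top Q_u \dot\phi_u\right) + \frac{\dot\beta(x)\,e_s^2}{2\beta^2(x)}. \qquad (4.20)$$
$$\dot V_d = \frac{\eta}{\beta(x)}e_s^2 + \left[z_u^\top \phi_u \zeta_u - d_u + u_{sd} + u_{bd}\right]e_s - \operatorname{tr}\left(z_u^\top \phi_u \zeta_u\right)e_s + \frac{\dot\beta(x)\,e_s^2}{2\beta^2(x)}. \qquad (4.22)$$
The projection algorithm mentioned in [12] is used to ensure that $A_u$ remains in $\Omega_u$. Thus we have
$$\dot V_d \le \frac{\eta}{\beta(x)}e_s^2 + \left[z_u^\top \phi_u \zeta_u - d_u + u_{sd} + u_{bd}\right]e_s - \operatorname{tr}\left(z_u^\top \phi_u \zeta_u\right)e_s + \frac{\dot\beta(x)\,e_s^2}{2\beta^2(x)}. \qquad (4.23)$$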
4.4 Implementation
Although the theoretical analysis in [12] assumes that the unknown control law u* that the DAFC tries to identify is a feedback linearizing law, it was found experimentally in [6] that this is not necessarily the case. If the adaptation mechanism is initialized appropriately in accordance with a known controller such as the LQR, the adaptation algorithm will converge to a controller that might behave in a very different manner. This is because the mechanism seems to find the locally optimal controller closest to its starting point in the search space and, in our case, an optimized
This finding is very important when the control design involves a non-minimum-phase plant or a system with internal dynamics that are hard to identify. If a non-adaptive controller is available that can control the system regardless of whether the system is minimum-phase, then the desirable boundedness characteristics of this controller may be incorporated into the DAFC design and enhanced by the robustness that the adaptive method provides.
$$y = x_3 \qquad (4.27)$$
Then we have
$$\begin{aligned} \dot y &= x_6,\\ \ddot y &= f_6(x) + g_6(x)\,u. \end{aligned} \qquad (4.28)$$
At this point, we find that the zero dynamics of the system are very hard to identify. But we already know that the LQR controller works for the RDIP system, therefore a DAFC controller is possible
First we specify, for the DAFC, the conditions introduced in the previous section. We set the known bound $D_u(x)$ on the approximation error to 0.01. In practice it is often hard to have a concrete idea about the magnitude of $D_u(x)$, because the relation between u* and its fuzzy representation might be difficult to characterize. However, it is much easier to begin with a rough, intuitive idea about this bound, and then iterate the design process and adjust it until the performance of the controller indicates that one is close to the right value. For the simulation, we found that $D_u(x) = 0.01$ gives good results. A small $D_u(x)$ corresponds to assuming that the fuzzy system can represent the ideal controller very accurately.
We search for u* using (4.9), where $\zeta_u \in \mathbb{R}^{2187}$, with the membership functions shown in Figure 4.1. The mathematical description of the membership functions is provided in Section 5.3. We choose the number of rules $p = 3^7 = 2187$, and the matrix $A_u(t) \in \mathbb{R}^{7 \times 2187}$ is adaptively
The fuzzy system uses $3^7$ rules, and each $c_i(x)$ is an element of the row vector $z^\top A_u(t)$. We initialize the fuzzy system approximation by letting $A_u = 0$, since we know nothing about the optimal controller.
The DAFC control law is given by $u_d = \hat u + u_{sd} + u_{bd}$, as discussed before. The sliding term is given by (4.26). In simulation we find that β(x) lies between −135 and −128; thus we choose $\beta_0 = -100$. We also choose $B(x) = 250$ for safety. The bounding term requires that α(x) be bounded, with $|\alpha(x)| \le \alpha_1(x)$. We find that |α(x)| always stays below 14.6, so we safely choose $\alpha_1(x) = 20$. Then we have $u_{bd}$ as defined in (4.16). For simulation we use $M_e = 8$, because the simulation results show that $|e_s|$ always stays below 4.6 when the system works. In effect, the parameter $M_e$ defines a bounded, closed subset of the $e_s$ error-state space within which the error is guaranteed to stay. ν is defined as in the section above. Here we select η = 1 and $k_0 = 5$ for $e_s = k_0 e_0 + \dot e_0$. With this choice, the poles of the error transfer function are at s = −1 and s = −5, which yield a short error settling time.
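The pole locations quoted above can be checked numerically. In the ideal case û = u* with u_sd = u_bd = 0, (4.20) gives ė_s = −η e_s. With r = 2 we also have e_s = ė0 + k0 e0, so the tracking error obeys (s + η)(s + k0)E0(s) = 0. The short sketch below multiplies the two factors and computes the roots.

```python
import numpy as np

eta, k0 = 1.0, 5.0                                 # values chosen in the text
# (s + eta)(s + k0) = s^2 + (eta + k0) s + eta*k0
char_poly = np.polymul([1.0, eta], [1.0, k0])
poles = np.sort(np.roots(char_poly).real)          # roots are real for these values
print(char_poly, poles)
```

Raising k0 or η moves the corresponding pole further left and shortens the settling time, at the cost of a more aggressive control signal.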
The last part of the DAFC mechanism is the adaptation law, which is chosen so that the output error converges asymptotically to zero and the parameter error remains at least bounded. For this law we choose $Q_u = 1.2 I_7$. With this choice the algorithm is able to adapt and estimate the control law $\hat u$ fast enough to perform well and compensate for disturbances, but without inducing
CHAPTER V
SIMULATION RESULTS
We use the MATLAB programming environment for all the simulations in this thesis. The MATLAB solver “ode45” is used in all cases to solve the initial value problems for the ordinary differential equations. It is important to note that both the LQR and DAFC controllers are continuous-time techniques. To implement them we use a digital computer, and thus we are forced to implicitly use a discrete-time approximation of the controller. It is reasonable to expect that a proof of stability still applies when a continuous-time technique is discretized, but such a study is
The initial conditions are set to $x(0) = [1\;\; 0.1\;\; 0.1\;\; 0\;\; 0\;\; 0]^\top$, which means the pendulums are nearly in the upright position at the starting point. Note that the initial states are expressed in rad or rad/s, while all figures show the states in deg and deg/s for easier observation. The performance of the RDIP model without control is shown in Figure 5.1. All the simulation results of the link angles of the RDIP are plotted on a scale from −100° to 250° for the convenience
Figure 5.1: Open-loop simulation in the initial states θ1 = 57.2958°, θ2 = 5.7296° and θ3 = 5.7296°.
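For readers reproducing the simulations outside MATLAB, a rough Python counterpart of the ode45 workflow is sketched below. SciPy's `solve_ivp` with `method="RK45"` uses an explicit Dormand-Prince 4(5) pair, as ode45 does. The RDIP model is not restated in this section, so the usage line passes a hypothetical stand-in right-hand side. The degree conversion reproduces the initial angles listed for Figure 5.1.

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate(rhs, x0, t_final=10.0):
    """Integrate xdot = rhs(t, x) from x0 with an explicit Dormand-Prince RK45 solver."""
    sol = solve_ivp(rhs, (0.0, t_final), x0, method="RK45", rtol=1e-6, atol=1e-9)
    if not sol.success:
        raise RuntimeError(sol.message)
    return sol.t, sol.y            # sol.y has shape (n_states, n_times)

x0 = np.array([1.0, 0.1, 0.1, 0.0, 0.0, 0.0])   # initial state in rad and rad/s
angles_deg = np.degrees(x0[:3])                 # 57.2958, 5.7296, 5.7296 after rounding

# Hypothetical stand-in dynamics (NOT the RDIP model): xdot = -x
t, y = simulate(lambda t, x: -x, x0, t_final=1.0)
```

Replacing the lambda with the closed-loop RDIP right-hand side (model plus controller) would give the analogue of the simulations in this chapter.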
As we can see from Figure 5.1, when there is no control input to the system, the pendulums fall directly to their stable equilibrium in the downward position, where the RDIP is in its lowest total-energy state. We can also see that the angle of Link 1 converges to a position near its initial state. Since there is no input torque and only a small viscous friction acts on the base arm, the base arm can be treated as a nearly conservative system in the horizontal plane. Therefore the base arm finally returns close to its starting point. So far, the model we built seems acceptable for simulating the behavior of the RDIP system. During the simulation, a 3-D dynamical model is also built to provide direct insight into what the pendulum is doing, using Euclidean geometry to calculate the position of each link's end relative to its pivot.
In MATLAB, the LQR gain can be computed directly once the parameters Q and R are selected; the MATLAB command “lqr” performs this calculation. We will set the
$$K = \begin{bmatrix} -1.0000 & 211.7112 & -120.5868 & -2.2287 & 56.0199 & -5.3027 \end{bmatrix}. \qquad (5.2)$$
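A Python analogue of MATLAB's `lqr` can be sketched with SciPy's continuous-time algebraic Riccati solver. This is an illustration only. The RDIP linearization (A, B) is not restated in this section, so the usage line applies the function to a hypothetical double-integrator stand-in, for which the gain with Q = I and R = 1 is known in closed form, K = [1, √3]. The convention u = −Kx matches MATLAB's `lqr`; the signs in (5.2) depend on the convention used earlier in the thesis.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr(A, B, Q, R):
    """Continuous-time LQR gain K for u = -K x, analogous to MATLAB's lqr(A, B, Q, R)."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    Q, R = np.atleast_2d(Q), np.atleast_2d(R)
    P = solve_continuous_are(A, B, Q, R)   # solves A'P + PA - P B R^-1 B'P + Q = 0
    K = np.linalg.solve(R, B.T @ P)        # K = R^-1 B' P
    return K, P

# Hypothetical stand-in plant (double integrator), NOT the RDIP linearization
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K, P = lqr(A, B, np.eye(2), np.array([[1.0]]))
```

Passing the RDIP (A, B) together with the Q of (3.38) and a scalar R would give the six-element gain row used in this chapter.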
The same initial states as in Figure 5.1 are used. The simulation result for the link angles is shown in Figure 5.2. The control input is shown, together with that of the tuned system, in Figure 5.4. We can see that the LQR controls the system very well. All the links converge to zero, and the control input is relatively small compared with the control signal we will obtain later from the DAFC simulation. That means the LQR controller will save the energy used to control the
Figure 5.2: LQR simulation result with Q = $I_6$ and R = 1.
Then we adjust the parameters of the LQR to optimize the control performance. After many simulation trials from different initial states with different Q and R, we obtained an intuitive understanding of the scale of the system states. The maximum absolute values of the six states and of the input are around 1.5, 0.6, 0.5, 5, 1, 5 and 1, respectively. Thus, following Bryson's rule, we can use this information to set Q and R as a starting point for tuning. Then after some
$$K = \begin{bmatrix} -1.1180 & 135.6454 & -70.5044 & -1.8397 & 36.1255 & -2.7697 \end{bmatrix}. \qquad (5.4)$$
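Bryson's rule, used above as the starting point for tuning, sets each diagonal weight to the inverse square of the largest acceptable (here, observed) value of the corresponding variable: Q_ii = 1/x_i,max² and R = 1/u_max². A minimal sketch with the maxima listed above follows. The tuned Q and R that led to (5.4) were obtained afterwards by further tuning and are not reproduced here.

```python
import numpy as np

def bryson_weights(x_max, u_max):
    """Diagonal Q and R from Bryson's rule: weight = 1 / (max magnitude)^2."""
    x_max = np.asarray(x_max, dtype=float)
    u_max = np.atleast_1d(np.asarray(u_max, dtype=float))
    Q = np.diag(1.0 / x_max**2)
    R = np.diag(1.0 / u_max**2)
    return Q, R

# Approximate maxima of |x_1|..|x_6| and |u| observed in the LQR simulations
Q0, R0 = bryson_weights([1.5, 0.6, 0.5, 5.0, 1.0, 5.0], 1.0)
```

The rule normalizes each variable by its typical size, so that no state dominates the cost simply because of its units.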
Figure 5.3: Tuned LQR simulation result.
The simulation result is shown in Figure 5.3. Clearly, tuning improves the performance of the LQR. The negative peak of the Link 1 angle is nearly 50° in the tuned system, while it is 25° larger in the original system. Besides, at around 3.5 s the tuned system has already
As mentioned above, the control signals for the two cases are shown in Figure 5.4. Here the plot range of the control signals is limited to [−2, 2], which truncates the large initial negative peaks of both the LQR and its tuned version. For the LQR, the initial peak is −5.3961, and for the tuned one it changes to −8.1124. We can see that the tuned control signal reacts more quickly than the original one. Although the positive peak of the tuned LQR is higher than that of the original, it decays faster, so the total control energy of the tuned system is not large compared with the original one. In fact, using (3.6) to calculate the total control energy for these two cases, we find that the control energy cost of the tuned LQR is 5.8730, much smaller than the 19.7358 of the original LQR. Note that there is an abrupt change in both control signals at the beginning of the control process. We will see the same phenomenon in the DAFC simulation.
Figure 5.4: Control signals of the LQR and its tuned version.
First, we present a case in which the LQR fails to control the system, even with its parameters tuned. For the initial states $x(0) = [0\;\; -0.1\;\; 0.1\;\; 0\;\; 0\;\; 0]^\top$, the system states diverge over time. We will see that the RDIP system can be controlled with the adaptive LQR control scheme.
Table 5.1: Membership function parameters for each term of z.

zi    c      σ
1     1      0.25
x1    1      0.25
x2    0.1    0.025
x3    0.2    0.05
x4    4      1
x5    0.6    2
x6    3      0.75
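A Gaussian membership function of the following form is used here; it is shown in Figure 4.1.
Left:
$$\mu(x) = \begin{cases} 1 & \text{if } x \le -c,\\ e^{-\left(\frac{x+c}{2\sigma}\right)^2} & \text{if } x > -c. \end{cases} \qquad (5.5)$$
Center:
$$\mu(x) = e^{-\left(\frac{x}{2\sigma}\right)^2}. \qquad (5.6)$$
Right:
$$\mu(x) = \begin{cases} e^{-\left(\frac{x-c}{2\sigma}\right)^2} & \text{if } x < c,\\ 1 & \text{if } x \ge c. \end{cases} \qquad (5.7)$$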
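The membership functions (5.5)-(5.7) and the 3^7 = 2187 rule premises can be sketched as follows. Two assumptions are made that the text does not state explicitly. First, rule premises are combined with the product t-norm. Second, the seven premise inputs are matched in order to the rows of Table 5.1, which lists a first row labeled 1 before x1 to x6. Each input gets a left, a center and a right membership function, so every combination of one function per input defines one rule.

```python
import itertools
import numpy as np

def mf_left(x, c, sigma):
    """(5.5): equal to 1 for x <= -c, Gaussian decay centred at -c for x > -c."""
    return 1.0 if x <= -c else float(np.exp(-((x + c) / (2.0 * sigma)) ** 2))

def mf_center(x, c, sigma):
    """(5.6): Gaussian centred at 0; c is unused but kept for a uniform signature."""
    return float(np.exp(-(x / (2.0 * sigma)) ** 2))

def mf_right(x, c, sigma):
    """(5.7): Gaussian rise centred at +c for x < c, equal to 1 for x >= c."""
    return float(np.exp(-((x - c) / (2.0 * sigma)) ** 2)) if x < c else 1.0

# (c, sigma) pairs in the row order of Table 5.1
TABLE_5_1 = [(1.0, 0.25), (1.0, 0.25), (0.1, 0.025), (0.2, 0.05),
             (4.0, 1.0), (0.6, 2.0), (3.0, 0.75)]

def rule_strengths(w, params=TABLE_5_1):
    """Normalized premise values zeta (length 3**len(w)), product t-norm assumed."""
    if len(w) != len(params):
        raise ValueError("one (c, sigma) pair is needed per premise input")
    per_input = [[mf(wj, c, s) for mf in (mf_left, mf_center, mf_right)]
                 for wj, (c, s) in zip(w, params)]
    mu = np.array([np.prod(combo) for combo in itertools.product(*per_input)])
    return mu / mu.sum()
```

For every input value at least one of the three functions is strictly positive, so the normalizing sum is nonzero, as the T-S construction requires.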
Figure 5.5: DAFC simulation result.
With the membership functions (5.5)-(5.7), we set the membership function for each term of the z matrix as in Table 5.1. We also set the other parameters in the bounding term, the sliding term and the central equivalent term of the control scheme to the values chosen in Chapter IV. Note that we need to take care of not only the parameters of the adaptive fuzzy system, but also the LQR parameters Q and R, which must match the adaptive law. Q and R are chosen here as
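$$Q = \begin{bmatrix} 0.5 & 0 & 0 & 0 & 0 & 0\\ 0 & 0.5 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.5 & 0 & 0 & 0\\ 0 & 0 & 0 & 1.5 & 0 & 0\\ 0 & 0 & 0 & 0 & 1.5 & 0\\ 0 & 0 & 0 & 0 & 0 & 1.5 \end{bmatrix}, \qquad R = 1. \qquad (5.8)$$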
The resulting DAFC response is shown in Figure 5.5 and the DAFC control signal in Figure 5.6. There the control signal is plotted on a scale from −20 to 20 for observation, which truncates the initial positive peak of 58.2286.
Figure 5.6: DAFC control signal.
In Figure 5.5 we find that the adaptive LQR indeed works for the given initial states: the system finally converges to zero. In [6], the state x1 of a single inverted pendulum converges to a stable state of rotation at constant speed. If the RDIP could be shown to have non-minimum-phase zero dynamics, then, combining the findings of [6] with ours, we would conclude that the DAFC can work on a non-minimum-phase system and that the system states converge to a stable state. Further analysis is still needed to provide theoretical evidence of the effectiveness of the DAFC
From Figure 5.6 we can see that the control signal of the DAFC method oscillates more strongly than that of the conventional LQR. There is a sudden change at the beginning of the control process, which also happens in the original LQR case. As mentioned before, the DAFC tries to follow the known part of the controller and optimize it, so it may inherit some characteristics of the known controller. The DAFC control signal has a much larger magnitude than that of the LQR. Even so, the DAFC does improve on the LQR by stabilizing the system from the given initial states, where the LQR fails. Since high feedback gains may lead to torque saturation, noise amplification, and other
From the DC motor characteristics for the SV02+DBIP experiment given in the “Quanser Systems and Procedures” technical document in lab KL302, we find that the motor can provide the system with a torque of no less than 83.8856 N·m. This is large enough for our DAFC control scheme, since the largest control signal peak we obtained is 58.2286.
In Figure 5.7, we compare the R.O.A of the original LQR and the DAFC. Here we sample the initial states of the system in the range θi ∈ [−0.1, 0.1] for i = 1, 2, 3, and set the three angular velocities (x4, x5, x6) to zero, to simplify the comparison and make it possible to show the result on a 3-D plot. We sample θ1 and θ2 at an interval of 0.02 and θ3 at an interval of 0.01. The blank region within the cube is where both the LQR and the DAFC work. The blue points show the initial states from which the DAFC still works while the original LQR no longer makes the system converge to the zero point. We can see clearly that the DAFC actually
Figure 5.7: LQR vs. DAFC.
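The sampling used for Figure 5.7 can be sketched as below. θ1 and θ2 take 11 values each (step 0.02) and θ3 takes 21 values (step 0.01) in [−0.1, 0.1], with the angular velocities set to zero, which gives 11 × 11 × 21 = 2541 initial states. The two predicates `converges_lqr` and `converges_dafc` are hypothetical placeholders for running each closed-loop simulation and testing convergence; they are not defined in the thesis.

```python
import itertools
import numpy as np

def initial_state_grid():
    """Initial states for the LQR/DAFC comparison: angles on a grid, velocities zero."""
    th12 = np.linspace(-0.1, 0.1, 11)    # step 0.02 for theta1 and theta2
    th3 = np.linspace(-0.1, 0.1, 21)     # step 0.01 for theta3
    return [np.array([a, b, c, 0.0, 0.0, 0.0])
            for a, b, c in itertools.product(th12, th12, th3)]

def dafc_only_points(grid, converges_lqr, converges_dafc):
    """Initial states from which the DAFC converges but the LQR does not."""
    return [x0 for x0 in grid if converges_dafc(x0) and not converges_lqr(x0)]
```

The points returned by `dafc_only_points` correspond to the blue markers in Figure 5.7.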
CHAPTER VI
CONCLUSION
We have studied two control approaches for the RDIP. First of all, a mathematical model is built with the E-L method. The rotary frictions of the links are considered in model building while we
Next, we developed an LQR controller for the system after linearizing the system with two methods that are equivalent to each other. The LQR shows adequate behavior on the plant with respect to our basic control objective of balancing the pendulum. We introduced Bryson's rule as a starting point for tuning the LQR controller, and then improved the performance of the LQR with some fine tuning. To explore the stability of the LQR, we discussed the Lyapunov stability of the LQR controller and then obtained the R.O.A of the system. We find that beyond the R.O.A there is still a large region in which the LQR controller works, including the initial states we used for the original LQR and its tuning test. But we cannot claim that there exists a neighborhood outside the R.O.A where the LQR control always works. For example, from the initial states $[0\;\; -0.05\;\; 0.1\;\; 0\;\; 0\;\; 0]^\top$ the LQR controller fails. This point is much closer to the origin than the point $[1\;\; 0.1\;\; 0.1\;\; 0\;\; 0\;\; 0]^\top$, from which the LQR was shown in simulation to stabilize the system.
The R.O.A of the RDIP system is still limited; one can continue to adjust the parameters of the
Since the performance of the LQR is limited, we tried to optimize the LQR with the DAFC. This method directly approximates the ideal controller using a T-S fuzzy system, and it increases robustness compared with the LQR. We find that with this method the good characteristics of the LQR can be retained while the benefits of adaptation are added. Following the simulation result in [6], we applied the adaptive fuzzy control scheme to a more complex system and obtained a better control result. In our result all the states of the system converge to zero, while in [6] the base arm converges to another stable state, a constant-speed rotational movement. This indicates that the DAFC is able to improve the LQR by adaptively increasing its robustness. It is possible to design a DAFC that gives bounded states in spite of the marginal stability of the zero dynamics. However, we provided no theoretical justification of the fact that this
The DAFC can also be improved in several ways. One approach is to incorporate heuristics about the inverse plant dynamics to speed up the adaptation. An inverse plant is a fuzzy system that is heuristically designed to roughly approximate the plant’s inverse dynamics. The
One must be careful in evaluating these results. It is probably not fair to say that the LQR failed and the DAFC succeeded, given that the pendulum does not satisfy the zero-dynamics assumption of the DAFC method. However, our experience indicates that, at least in some cases, the adaptive fuzzy method we have investigated has an advantage over the conventional LQR method: it allows for more design flexibility. This is clearly illustrated by our adaptive design. The DAFC using an LQR as the known part of the controller displays improved behavior in comparison with the conventional LQR technique. Apparently, using our knowledge of what the control law should be helps to increase the robustness of the algorithm. We manage to obtain an improvement
Although the results we have obtained seem to indicate that the adaptive LQR can improve the original LQR and even work with systems with non-minimum-phase zero dynamics, it is still necessary to evaluate the performance of the DAFC under a greater variety of conditions. It remains to be investigated how robust the controllers are against many different types of disturbances; for instance, we did not study how the adaptive fuzzy controllers react to a “white noise” disturbance
Another point worth mentioning is that, as Figure 5.6 shows, there is a large peak in the control signal at the beginning of the control process. It might be due to the zero initial values of $A_u$. One can try to reinitialize the fuzzy system parameters to reduce the magnitude of the control input.
As mentioned before, this thesis is a theoretical preparation for the RDIP experiment in lab KL302. One can implement these control schemes experimentally to verify the simulation results. This will be a great challenge, since in the experiment there may be some disturbance, unknown
BIBLIOGRAPHY
[1] V. Sukonatanakarn and M. Parnichkun, “Real-time optimal control for rotary inverted pendu-
lum,” American Journal of Applied Sciences, 2009.
[2] H. K. Khalil, Nonlinear Systems. Upper Saddle River, NJ: Prentice Hall, 2002.
[3] R. W. Brockett and H. Li, “A light weight rotary double pendulum: Maximizing the domain of
attraction,” in IEEE Decision and Control Conference, (Maui, Hawaii), pp. 3299–3304, Dec. 2003.
[4] J. Driver and D. Thorpe, “Design, build and control of a single/ double rotational inverted
pendulum,” tech. rep., School of Mechanical Engineering, The University of Adelaide, 2004.
[5] V. Casanova, J. Salt, R. Piza, and A. Cuenca, “Controlling the double rotary inverted pendu-
lum with multiple feedback delays,” International Journal of Computers Communications and
Control, vol. 7, pp. 20–38, Mar. 2012.
[6] R. Ordonez, J. Zumberge, J. T. Spooner, and K. M. Passino, “Adaptive fuzzy control: Exper-
iments and comparative analysis,” IEEE Transactions on Fuzzy Systems, vol. 5, pp. 167–188,
May 1997.
[7] C. Fox, An Introduction to the Calculus of Variations. New York, NY: Dover Publications,
2010.
[9] Y. Wang, “Rotation double inverted pendulum,” tech. rep., School of Electrical and Computer
Engineering, The University of Dayton, Apr. 2012.
[10] Z. Gajic, Linear Dynamic Systems and Signals. Upper Saddle River, NJ: Prentice Hall, 2002.
[12] J. T. Spooner and K. M. Passino, “Stable adaptive control using fuzzy system and neural
networks,” IEEE Transactions on Fuzzy Systems, vol. 4, pp. 339–359, Aug. 1996.
[13] K. Tanaka and M. Sugeno, “Stability analysis and design of fuzzy control systems,” Fuzzy Sets
and Systems, vol. 45, pp. 135–156, Jan. 1992.
[15] J. R. Layne and K. M. Passino, “Fuzzy model reference learning control,” J. Intell. Fuzzy Syst.,
vol. 4, no. 1, pp. 33–47, 1996.
[16] J. R. Layne and K. M. Passino, “Fuzzy model reference learning control for cargo ship steering,” IEEE Control Systems Magazine, pp. 23–24, Dec. 1993.
[17] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling
and control,” TSMC, 1985.
[18] J. T. Spooner, M. Maggiore, R. Ordonez, and K. M. Passino, Stable Adaptive Control and
Estimation for Nonlinear Systems. New York, NY: John Wiley and Sons, 2002.
[19] T. Takagi and M. Sugeno, “Fuzzy identification of systems and its applications to modeling
and control,” TSMC, vol. 15, pp. 116–132, Jan. 1985.
[20] S. Sastry and M. Bodson, Adaptive Control: Stability, Convergence and Robustness. Engle-
wood Cliffs, NJ: Prentice Hall, 1989.