Understanding SOLID Principles in Software
SOLID Principles
Here, SOLID stands for:
S – Single Responsibility Principle (SRP)
Here, a class Shapes is trying to represent different shapes, calculate different properties, etc.
Although simple, we can see the type of complexity and maintenance issues this will cause
once the number of shapes handled by the class starts increasing. To prevent code from
falling into such pitfalls, various methodologies were created, including SOLID.
As depicted above, our class Shapes is trying to do a lot of things, and it is easy to see that it
has multiple responsibilities. Let's apply the SRP to Shapes.
The class diagram depicts the scenario after applying SRP. Our Shapes class was handling
all shapes as well as printing. Now, we created an IShapes interface that defines the
operations a shape must perform and it is implemented by individual shape classes. So now,
class Circle is responsible for representing only a circle and its operations. Thus, the only
reason for a Circle class to change is a change related to a circle. The same is true for
classes Triangle, Rectangle and any other shape class that may be added subsequently. Also,
printing a value is not a responsibility of a shape class and can be given to a
separate ConsolePrinter class, which prints the values on the console.
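The split described above can be sketched in Python as follows. This is only an illustration: the method names calculate_area and print_value, and the constructor parameters, are assumptions and do not come from the class diagram.

```python
import math
from abc import ABC, abstractmethod


class IShapes(ABC):
    """Operations every shape must provide (method name is an assumption)."""

    @abstractmethod
    def calculate_area(self) -> float:
        ...


class Circle(IShapes):
    """Represents only a circle; changes only when circle rules change."""

    def __init__(self, radius: float) -> None:
        self.radius = radius

    def calculate_area(self) -> float:
        return math.pi * self.radius ** 2


class Rectangle(IShapes):
    """Represents only a rectangle."""

    def __init__(self, length: float, breadth: float) -> None:
        self.length = length
        self.breadth = breadth

    def calculate_area(self) -> float:
        return self.length * self.breadth


class ConsolePrinter:
    """Printing is a separate responsibility, kept out of the shapes."""

    def print_value(self, label: str, value: float) -> None:
        print(f"{label}: {value:.2f}")


printer = ConsolePrinter()
printer.print_value("Rectangle area", Rectangle(3, 2).calculate_area())
printer.print_value("Circle area", Circle(1).calculate_area())
```

Adding a Triangle class would not touch Circle, Rectangle or ConsolePrinter, which is the point of the principle.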
The Liskov Substitution Principle (LSP) means that every subclass/derived class should be
substitutable for its base/parent class. This substitution should be complete, i.e. no
functionality or invariant of the base class should be broken. Invariants can be assumptions
of behaviour made by clients, or any preconditions or postconditions. A client should not
need to know the actual type of the object it is using, and should be able to use an object of
a derived type in the same way it would use an object of the base class.
Let's try to further understand LSP with our shapes example. We know that a square is a
type of rectangle, so we can say that square is-a rectangle and add a class Square that
derives from our Rectangle class.
1. l = 3, b = 2 – Here we initialise rectangle as a new Rectangle & get the expected result 6
2. l = 3, b = 3 – Here we initialise rectangle as a new Square & get the expected result 9
3. l = 3, b = 4 – Here we initialise rectangle as a new Square & get the result 9 instead of
the expected result 12.
So, if both the second and third scenarios use an object of type Square, why is the behaviour
different? Actually the behaviour is not different. As we can see, the CalculateArea method
of Square uses only Length to calculate the area, since for a square l = b. So, even if we set
a value for Breadth, Square ignores it. Scenario 2 works because we are
setting both Length and Breadth to 3, so the rectangle we are trying to create with
a Square object is actually a square (l = b). In scenario 3, however, we are initialising a
rectangle that is not a square (l = 3, b = 4) as an object of type Square. We are
essentially substituting Square for the class Rectangle, since Square inherits from (is-a)
Rectangle, and calling the CalculateArea method of Square in the expectation that it will
work for a Rectangle, which it does not.
So, as per the Liskov Substitution Principle, if we want to add a class for square to our
application, it should not derive from Rectangle but from the interface IShapes (OCP).
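The three scenarios above can be reproduced with a short Python sketch. The class and method names follow the text; the helper area_of, which plays the role of the client code, is a hypothetical addition.

```python
class Rectangle:
    def __init__(self, length=0, breadth=0):
        self.length = length
        self.breadth = breadth

    def calculate_area(self):
        return self.length * self.breadth


class Square(Rectangle):
    def calculate_area(self):
        # A square assumes l = b, so breadth is silently ignored.
        return self.length * self.length


def area_of(rectangle, length, breadth):
    """Client code written against Rectangle only (hypothetical helper)."""
    rectangle.length = length
    rectangle.breadth = breadth
    return rectangle.calculate_area()


scenario_1 = area_of(Rectangle(), 3, 2)  # 6, as expected
scenario_2 = area_of(Square(), 3, 3)     # 9, as expected
scenario_3 = area_of(Square(), 3, 4)     # 9, although the client expects 12
print(scenario_1, scenario_2, scenario_3)
```

The client cannot use Square wherever it uses Rectangle without getting a wrong result, so the substitution is not complete.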
Object Oriented Analysis and Design is still a very popular way of designing and
implementing software. Object-oriented metrics focus on class and design characteristics and
can be used to measure the quality of an object-oriented design. Metrics help in attaining more
accurate estimates of project milestones and in developing software with minimal faults.
Metrics also reflect the quality of software design.
McCabe Cyclomatic Complexity (CC) – This complexity metric is probably the most
popular one and derives essential information about the stability and maintainability of a
software system from its source code. It gives insight into the complexity of a method. The
idea is to view the control flow of a program as a graph containing all possible paths. The
complexity is determined as "connections (edges) - nodes + 2", which gives a numerical value
denoting how complex the method is. A large complexity number indicates a higher possibility
of errors; therefore, an excessively high McCabe number should be avoided.
Example:
IF A = 10 THEN
    IF B > C THEN
        A = B
    ELSE
        A = C
    ENDIF
ENDIF
Print A
Print B
Print C
FlowGraph:
The cyclomatic complexity is calculated from the above control flow diagram, which shows
seven nodes (shapes) and eight edges (lines); hence the cyclomatic complexity is
8 - 7 + 2 = 3.
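The calculation can be checked with a small Python sketch. The node names and the exact layout of the graph below are assumptions, chosen so that the example program yields the seven nodes and eight edges mentioned above (the three Print statements are grouped into one node).

```python
def cyclomatic_complexity(edges):
    """Return edges - nodes + 2 for a single connected control flow graph."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Assumed control flow graph for the example program.
edges = [
    ("if_a_eq_10", "if_b_gt_c"),     # outer condition true
    ("if_a_eq_10", "endif_outer"),   # outer condition false
    ("if_b_gt_c", "a_eq_b"),         # inner condition true
    ("if_b_gt_c", "a_eq_c"),         # inner condition false
    ("a_eq_b", "endif_inner"),
    ("a_eq_c", "endif_inner"),
    ("endif_inner", "endif_outer"),
    ("endif_outer", "print"),
]

print(cyclomatic_complexity(edges))  # 8 - 7 + 2 = 3
```

The result also matches the shortcut of counting decision points plus one: the program has two IF statements, giving 2 + 1 = 3.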
Weighted Method per Class (WMC) – This metric indicates the complexity of a class. One
way of calculating the complexity of a class is by using cyclomatic complexities of its
methods. One should aim for a class with lower value of WMC as a higher value indicates
that the class is more complex.
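As a minimal sketch of this calculation, WMC can be computed as the sum of per-method cyclomatic complexities. The method names and complexity values below are made up for illustration.

```python
def wmc(method_complexities):
    """Sum the cyclomatic complexities of all methods of one class."""
    return sum(method_complexities.values())


# Hypothetical complexities for the methods of a Shapes-like class.
shapes_methods = {
    "calculate_area": 3,
    "calculate_perimeter": 2,
    "print_value": 1,
}

print(wmc(shapes_methods))  # 6
```

If every method is given a weight of one instead, WMC reduces to the number of methods, here len(shapes_methods) = 3.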
Depth of Inheritance Tree (DIT) – DIT measures the maximum path length from a node to the
root of the tree. This metric indicates how far down a class is declared in the inheritance
hierarchy. The following figure shows the DIT value for a simple class hierarchy.
DIT can be used as a measure of the complexity of the behavior of a class, the complexity of
the design of a class and its potential for reuse.
Number of Children (NOC) – This metric indicates how many sub-classes are going to
inherit the methods of the parent class. As shown in the above figure, class C2 has 2 children,
subclasses C21 and C22. The value of NOC indicates the level of reuse in an application. If
NOC increases, it means reuse increases.
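Both DIT and NOC can be derived from a simple child-to-parent map. In the sketch below, only C2, C21 and C22 come from the text; the root class C, the sibling C1 and the convention that the root has DIT 0 are assumptions.

```python
# Child -> parent map; None marks the root (hierarchy partly assumed).
parents = {
    "C": None,
    "C1": "C",
    "C2": "C",
    "C21": "C2",
    "C22": "C2",
}


def dit(cls):
    """Number of inheritance steps from cls up to the root."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth


def noc(cls):
    """Number of immediate child classes of cls."""
    return sum(1 for parent in parents.values() if parent == cls)


print(dit("C21"), noc("C2"))  # 2 2
```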
Coupling between Objects (CBO) – The rationale behind this metric is that an object is
coupled to another object if the two objects act upon each other. If a class uses the methods
of another class, then the two are coupled. An increase in CBO indicates an increase in the
responsibilities of a class. Hence, the CBO value for classes should be kept as low as
possible.
Lack of Cohesion in Methods (LCOM) – LCOM can be used to measure the degree of
cohesiveness present. It reflects on how well a system is designed and how complex a class
is. LCOM is calculated by ascertaining the number of method pairs whose similarity is zero,
minus the count of method pairs whose similarity is not zero.
Method Hiding Factor (MHF) – MHF is defined as the ratio of sum of the invisibilities of
all methods defined in all classes to the total number of methods defined in the system. The
invisibility of a method is the percentage of the total classes from which this method is not
visible.
Attribute Hiding Factor (AHF) – AHF is calculated as the ratio of the sum of the
invisibilities of all attributes defined in all classes to the total number of attributes defined in
the system.
Method Inheritance Factor (MIF) – MIF measures the ratio of the sum of the inherited
methods in all classes of the system to the total number of available methods for all classes.
Attribute Inheritance Factor (AIF) – AIF measures the ratio of sum of inherited attributes
in all classes of the system under consideration to the total number of available attributes.
Class – The class is the fundamental unit of object-oriented design, so class-level metrics
can be a good judge of the quality of a design. WMC measures the complexity of the methods.
LCOM reflects the cohesiveness within a class.
Attribute – Attributes define the properties of a data object. AHF and LCOM are
crucial metrics that reflect the usage of attributes.
Method – Methods are invoked to perform operations. WMC and LCOM reflect the
design and usage of methods.
Coupling and Cohesion – The most essential metrics are the ones related to coupling and
cohesion. They not only measure a system's structural complexity but are also used to assess
its design. A class is coupled with one or more classes if the methods of one class use the
methods or attributes of the other classes. Many maintenance challenges can be directly
attributed to a high level of coupling and a low level of cohesion.
Despite its long name, WMC in its simplest form (every method weighted as one) is simply the
method count for a class.
WMC = number of methods defined in class
Keep WMC down. A high WMC has been found to lead to more faults. Classes with many
methods are likely to be more application specific, limiting the possibility of reuse. WMC is
a predictor of how much time and effort is required to develop and maintain the class. A
large number of methods also means a greater potential impact on derived classes, since the
derived classes inherit (some of) the methods of the base class. Search for high WMC values
to spot classes that could be restructured into several smaller classes.
What is a good WMC? Different limits have been defined. One way is to limit the number of
methods in a class to, say, 20 or 50. Another way is to specify that a maximum of 10% of
classes can have more than 24 methods. This allows large classes but most classes should be
small.
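The second kind of limit can be checked mechanically. The following Python sketch assumes the rule quoted above (at most 10% of classes with more than 24 methods); the function name and the sample method counts are hypothetical.

```python
def exceeds_wmc_policy(method_counts, limit=24, max_share=0.10):
    """True if more than max_share of the classes have more than limit methods."""
    if not method_counts:
        return False
    large = sum(1 for count in method_counts if count > limit)
    return large / len(method_counts) > max_share


# Hypothetical per-class method counts for two systems.
mostly_small = [5, 10, 30, 8, 12, 7, 9, 11, 6, 4]  # 1 of 10 classes is large
too_many_large = [30, 40, 5, 5]                    # 2 of 4 classes are large

print(exceeds_wmc_policy(mostly_small))    # False
print(exceeds_wmc_policy(too_many_large))  # True
```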
The deeper a class is in the hierarchy, the more methods and variables it is likely to inherit,
making it more complex. Deep trees as such indicate greater design complexity. Inheritance
is really a tool to manage complexity, not to increase it. As a positive factor, deep trees
promote reuse because of method inheritance.
A high DIT has been found to increase faults. However, it’s not necessarily the classes
deepest in the class hierarchy that have the most faults. Glasberg et al. have found out that
the most fault-prone classes are the ones in the middle of the tree. According to them, root
and deepest classes are consulted often, and due to familiarity, they have low fault-proneness
compared to classes in the middle.
A recommended DIT is 5 or less. The Visual Studio .NET documentation recommends that
DIT <= 5 because excessively deep class hierarchies are complex to develop. Some sources
allow up to 8.
NOC equals the number of immediate child classes derived from a base class. In Visual
Basic .NET one uses the Inherits statement to derive sub-classes. In classic Visual Basic
inheritance is not available and thus NOC is always zero.
NOC measures the breadth of a class hierarchy, where maximum DIT measures the depth.
Depth is generally better than breadth, since it promotes reuse of methods through
inheritance. NOC and DIT are closely related. Inheritance levels can be added to increase the
depth and reduce the breadth.
A high NOC, a large number of child classes, can indicate several things:
High NOC has been found to indicate fewer faults. This may be due to high reuse, which is
desirable.
A class with a high NOC and a high WMC indicates complexity at the top of the class
hierarchy. The class is potentially influencing a large number of descendant classes. This can
be a sign of poor design. A redesign may be required.
Not all classes should have the same number of sub-classes. Classes higher up in the
hierarchy should have more sub-classes than those lower down.
Two classes are coupled when methods declared in one class use methods or instance
variables defined by the other class. The uses relationship can go either way: both uses and
used-by relationships are taken into account, but only once.
Multiple accesses to the same class are counted as one access. Only method calls and
variable references are counted. Other types of reference, such as use of constants, calls to
API declares, handling of events, use of user-defined types, and object instantiations are
ignored. If a method call is polymorphic (either because of Overrides or Overloads), all the
classes to which the call can go are included in the coupled count.
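The counting rule (uses and used-by both count, but each coupled class only once) can be sketched in Python as follows. The class names and the dependency map are hypothetical.

```python
# Class -> set of classes whose methods or variables it uses (hypothetical).
uses = {
    "Order": {"Customer", "Invoice"},
    "Invoice": {"Order"},
    "Customer": set(),
    "Report": {"Order"},
}


def cbo(cls):
    """Count distinct classes coupled to cls in either direction."""
    coupled = set(uses.get(cls, ()))  # classes that cls uses
    coupled |= {other for other, targets in uses.items() if cls in targets}  # used-by
    coupled.discard(cls)  # a class is not coupled to itself
    return len(coupled)


print(cbo("Order"))    # 3: Customer, Invoice, Report
print(cbo("Invoice"))  # 1: Order uses it and is used by it, counted once
```

Because the coupled classes are collected in a set, multiple accesses to the same class and mutual uses are counted only once.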
A useful insight into the 'object-orientedness' of the design can be gained from the
system-wide distribution of the class fan-out values. For example, a system in which a single
class has very high fan-out and all other classes have low or zero fan-outs is really a
structured system, not an object-oriented one.
The response set of a class is the set of methods that can potentially be executed in response
to a message received by an object of that class. RFC (Response for a Class) is simply the
number of methods in the set.
RFC = M + R (First-step measure)
RFC’ = M + R’ (Full measure)
M = number of methods in the class
R = number of remote methods directly called by methods of the class
R’ = number of remote methods called, recursively through the entire call tree
A given method is counted only once in R (and R’) even if it is executed by several of the M
methods.
Since RFC specifically includes methods outside the class that are called from it, it is also
a measure of the potential communication between the class and other classes.
A large RFC has been found to indicate more faults. Classes with a high RFC are more
complex and harder to understand, and testing and debugging them is complicated. A worst-case
value for possible responses will assist in appropriate allocation of testing time.
RFC is the original definition of the measure. It counts only the first level of calls outside
of the class. RFC’ measures the full response set, including methods called by the called
methods, recursively, until no new remote methods can be found. If the called method is
polymorphic, all the possible remote methods executed are included in R and R’.
The use of RFC’ should be preferred over RFC. RFC was originally defined as a first-level
metric because it was not practical to consider the full call tree in manual calculation. With
an automated code analysis tool, getting RFC’ values is no longer problematic. As RFC’
considers the entire call tree and not just the first level of it, it provides a more thorough
measurement of the code executed.
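The difference between RFC and RFC’ can be illustrated with a small Python sketch over a call graph. The call graph, the "Class.method" naming scheme and the function names are assumptions made for this example only.

```python
# Method -> set of methods it calls, named "Class.method" (hypothetical).
calls = {
    "A.run": {"A.helper", "B.load"},
    "A.helper": {"C.log"},
    "B.load": {"C.log", "D.read"},
    "C.log": set(),
    "D.read": set(),
}


def owner(method):
    """Class part of a 'Class.method' name."""
    return method.split(".")[0]


def rfc(cls, full=False):
    """M + R for full=False (first step), M + R' for full=True."""
    own = {m for m in calls if owner(m) == cls}
    remote, stack = set(), list(own)
    while stack:
        for callee in calls.get(stack.pop(), ()):
            if owner(callee) != cls and callee not in remote:
                remote.add(callee)        # each remote method counts once
                if full:
                    stack.append(callee)  # follow the call tree further
    return len(own) + len(remote)


print(rfc("A"))             # 4: M = 2, R = {B.load, C.log}
print(rfc("A", full=True))  # 5: R' additionally contains D.read
```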
The 6th metric in the Chidamber & Kemerer metrics suite is LCOM (or LOCOM), the lack
of cohesion of methods. This metric has received a great deal of criticism and several
alternatives have been developed.
The original version, called LCOM1 here, is calculated as follows:
Take each pair of methods in the class. If they access disjoint sets of instance variables,
increase P by one. If they share at least one variable access, increase Q by one.
LCOM1 = P − Q, if P > Q
LCOM1 = 0 otherwise
LCOM1 = 0 indicates a cohesive class.
LCOM1 > 0 indicates that the class needs to be, or can be, split into two or more classes,
since its variables belong in disjoint sets.
Classes with a high LCOM1 have been found to be fault-prone.
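The pair counting behind LCOM1 can be sketched in Python. The input is a map from each method to the set of instance variables it accesses; the method and variable names are hypothetical.

```python
from itertools import combinations


def lcom1(accesses):
    """LCOM1 = P - Q if P > Q, else 0, over all pairs of methods."""
    p = q = 0
    for vars_a, vars_b in combinations(accesses.values(), 2):
        if vars_a & vars_b:
            q += 1  # the pair shares at least one variable
        else:
            p += 1  # the pair accesses disjoint sets of variables
    return max(p - q, 0)


# Hypothetical class mixing geometry and printing concerns.
mixed = {
    "area": {"length", "breadth"},
    "perimeter": {"length", "breadth"},
    "print_header": {"title"},
    "print_footer": {"footer"},
}
cohesive = {
    "area": {"length", "breadth"},
    "perimeter": {"length", "breadth"},
}

print(lcom1(mixed))     # 4: P = 5, Q = 1
print(lcom1(cohesive))  # 0: P = 0, Q = 1
```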
A high LCOM1 value indicates disparateness in the functionality provided by the class. This
metric can be used to identify classes that are attempting to achieve many different
objectives, and consequently are likely to behave in less predictable ways than classes that
have lower LCOM1 values. Such classes could be more error-prone and more difficult to test,
and could possibly be disaggregated into two or more classes that are better defined in
their behavior. The LCOM1 metric can be used by senior designers and project managers as
a relatively simple way to track whether the cohesion principle is adhered to in the design of
an application and to advise changes.
LCOM1 critique
LCOM1 has received its share of criticism. It has been shown to have a number of drawbacks,
so it should be used with caution.
First, LCOM1 gives a value of zero for very different classes. To overcome that problem,
new metrics, LCOM2 and LCOM3, have been suggested (see below).
Second, Gupta suggests that LCOM1 is not a valid way to measure cohesiveness of a class.
That’s because its definition is based on method-data interaction, which may not be a correct
way to define cohesiveness in the object-oriented world. Moreover, very different classes
may have an equal LCOM1.
Third, as LCOM1 is defined on variable access, it's not well suited for classes that internally
access their data via properties. A class that gets/sets its own internal data via its own
properties, and not via direct variable read/write, may show a high LCOM1. This is not an
indication of a problematic class. LCOM1 is not suitable for measuring such classes.
To overcome the problems of LCOM1, two additional metrics have been proposed: LCOM2
and LCOM3.
A low value of LCOM2 or LCOM3 indicates high cohesion and a well-designed class. It is
likely that the system has good class subdivision implying simplicity and high reusability. A
cohesive class will tend to provide a high degree of encapsulation. A higher value of
LCOM2 or LCOM3 indicates decreased encapsulation and increased complexity, thereby
increasing the likelihood of errors.
Which one to choose, LCOM2 or LCOM3? This is a matter of taste. LCOM2 and LCOM3
are similar measures with different formulae. LCOM2 varies in the range [0,1] while
LCOM3 is in the range [0,2]. LCOM3>=1 indicates a very problematic class. LCOM2 has
no single threshold value.
It is a good idea to remove any dead variables before interpreting the values of LCOM2 or
LCOM3. Dead variables can lead to high values of LCOM2 and LCOM3, thus leading to
wrong interpretations of what should be done.
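Definitions used for LCOM2 and LCOM3
m = number of methods in the class
a = number of variables (attributes) in the class
mA = number of methods that access a given variable (attribute)
sum(mA) = sum of mA over all attributes of the class
Implementation details. m is equal to WMC. a contains all variables whether Shared or not.
All accesses to a variable are counted.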
LCOM2
LCOM2 = 1 − sum(mA) / (m * a)
LCOM2 equals the percentage of methods that do not access a specific attribute averaged
over all attributes in the class. If the number of methods or attributes is zero, LCOM2 is
undefined and displayed as zero.
LCOM3
LCOM3 = (m − sum(mA)/a) / (m − 1)
LCOM3 varies between 0 and 2. Values 1..2 are considered alarming.
In a normal class whose methods access the class's own variables, LCOM3 varies between 0
(high cohesion) and 1 (no cohesion). When LCOM3=0, each method accesses all variables.
This indicates the highest possible cohesion. LCOM3=1 indicates extreme lack of cohesion.
In this case, the class should be split.
When there are variables that are not accessed by any of the class's methods, 1 < LCOM3 <=
2. This happens if the variables are dead or they are only accessed outside the class. Both
cases represent a design flaw. The class is a candidate for rewriting as a module.
Alternatively, the class variables should be encapsulated with accessor methods or
properties. There may also be some dead variables to remove.
If there is no more than one method in a class, LCOM3 is undefined. If there are no
variables in a class, LCOM3 is undefined. An undefined LCOM3 is displayed as zero.
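Both formulas can be tried out with a short Python sketch. It takes a map from each method to the set of variables it accesses, plus the list of all variables of the class; the function name and the sample classes are hypothetical, and undefined values are returned as zero, as described above.

```python
def lcom2_lcom3(accesses, attributes):
    """Return (LCOM2, LCOM3); undefined values are reported as 0.0."""
    m, a = len(accesses), len(attributes)
    if m == 0 or a == 0:
        return 0.0, 0.0
    # sum(mA): for each attribute, count the methods that access it.
    sum_ma = sum(
        sum(1 for used in accesses.values() if attr in used)
        for attr in attributes
    )
    lcom2 = 1 - sum_ma / (m * a)
    lcom3 = (m - sum_ma / a) / (m - 1) if m > 1 else 0.0
    return lcom2, lcom3


attrs = ["x", "y"]
print(lcom2_lcom3({"f": {"x", "y"}, "g": {"x", "y"}}, attrs))  # (0.0, 0.0)
print(lcom2_lcom3({"f": {"x"}, "g": {"y"}}, attrs))            # (0.5, 1.0)
print(lcom2_lcom3({"f": set(), "g": set()}, attrs))            # (1.0, 2.0)
```

The third case shows how dead variables push LCOM3 above 1, which is why they should be removed before the values are interpreted.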
Martin’s metrics
Efferent Coupling (Ce)
This metric counts outgoing dependencies. In Pic. 1 it can be seen that class A has outgoing
dependencies to 3 other classes, which is why metric Ce for this class is 3.
A high value of the metric (Ce > 20) indicates instability of a package: a change in any of
the numerous external classes can cause the need for changes to the package. Preferred values
for the metric Ce are in the range of 0 to 20; higher values cause problems with the
maintenance and development of code.
Afferent Coupling (Ca)
This metric is an addition to metric Ce and is used to measure another type of dependencies
between packages, i.e. incoming dependencies. It enables us to measure the sensitivity of
remaining packages to changes in the analysed package.
In Pic. 2 it can be seen that class A has only 1 incoming dependency (from class X), which
is why the value of metric Ca equals 1.
High values of metric Ca usually suggest high component stability. This is due to the fact
that many other classes depend on the class. Therefore, it cannot be modified significantly
because, in this case, the probability of spreading such changes increases.
Instability (I) is used to measure the relative susceptibility of a class to changes.
According to the definition, instability is the ratio of outgoing dependencies to all package
dependencies, and it takes values from 0 to 1.
Pic. 3 – Instability
In Pic. 3 it can be seen that class A has 3 outgoing and 1 incoming dependencies;
therefore, according to the formula, I = Ce / (Ce + Ca) = 3 / (3 + 1) = 0.75.
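The example can be reproduced with a Python sketch that derives Ce, Ca and I from a dependency map. Class A and its incoming dependency from X follow the pictures described above; the names B, C and D for the three classes A depends on are assumptions.

```python
# Class -> set of classes it depends on (B, C, D are assumed names).
depends_on = {
    "A": {"B", "C", "D"},
    "X": {"A"},
    "B": set(),
    "C": set(),
    "D": set(),
}


def efferent(cls):
    """Ce: number of outgoing dependencies."""
    return len(depends_on[cls])


def afferent(cls):
    """Ca: number of incoming dependencies."""
    return sum(1 for targets in depends_on.values() if cls in targets)


def instability(cls):
    """I = Ce / (Ce + Ca); reported as 0.0 when there are no dependencies."""
    ce, ca = efferent(cls), afferent(cls)
    return ce / (ce + ca) if ce + ca else 0.0


print(efferent("A"), afferent("A"), instability("A"))  # 3 1 0.75
```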
Abstractness (A) is used to measure the degree of abstraction of the package and is somewhat
similar to instability. According to the definition, abstractness is the ratio of the number
of abstract classes in the package to the number of all classes.
Preferred values for the metric A are extreme values close to 0 or 1. Packages that
are stable (metric I close to 0), which means they depend very little on other
packages, should also be abstract (metric A close to 1). In turn, very unstable packages
(metric I close to 1) should consist of concrete classes (metric A close to 0).
Additionally, it is worth mentioning that combining abstractness and stability enabled Martin
to formulate a thesis about the existence of a main sequence (Pic. 4).
In the optimal case, the instability of the class is compensated by its abstractness, so that
I + A = 1. Well-designed classes should group themselves around the end points of this
graph, along the main sequence.
Pic. 4 – Main sequence
This metric is used to measure the balance between stability and abstractness and is
calculated using the following formula:
The value of metric D may be interpreted in the following way: if we put a given class on a
graph of the main sequence (Pic. 5), its distance from the main sequence will be proportional
to the value of D.
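As a sketch, and assuming Martin's commonly used normalized form D = |A + I - 1| (which is consistent with the main sequence I + A = 1 described above, but is not stated in this text), abstractness and distance can be computed as follows. The function names and sample values are hypothetical.

```python
def abstractness(abstract_classes, total_classes):
    """A = abstract classes / all classes; 0.0 for an empty package."""
    return abstract_classes / total_classes if total_classes else 0.0


def distance(a, i):
    """Assumed normalized distance from the main sequence: |A + I - 1|."""
    return abs(a + i - 1)


# A package with 1 abstract class out of 4 and instability 0.75
# lies exactly on the main sequence.
print(distance(abstractness(1, 4), 0.75))  # 0.0

# A stable (I = 0) but fully concrete (A = 0) package is as far from
# the main sequence as possible.
print(distance(abstractness(0, 4), 0.0))   # 1.0
```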