Software Quality Assurance Fundamentals
The Role of SQA, SQA Plan, SQA Considerations, SQA People, Quality
Management, Software Configuration Management.
Software Quality Assurance (SQA) consists of a means of monitoring the software engineering
processes and methods used to ensure quality. It does this by means of audits of the quality
management system under which the software system is created. These audits are backed by one
or more standards, usually ISO 9000.
It is distinct from software quality control which includes reviewing requirements documents,
and software testing. SQA encompasses the entire software development process, which includes
processes such as software design, coding, source code control, code reviews, change
management, configuration management, and release management. Whereas software quality
control is a control of products, software quality assurance is a control of processes.
Software quality assurance is related to the practice of quality assurance in product
manufacturing. There are, however, some notable differences between software and a
manufactured product. These differences stem from the fact that the manufactured product is
physical and can be seen whereas the software product is not visible. Therefore its function,
benefit and costs are not as easily measured. What's more, when a manufactured product rolls off
the assembly line, it is essentially a complete, finished product, whereas software is never
finished. Software lives, grows, evolves, and metamorphoses, unlike its tangible counterparts.
Therefore, the processes and methods to manage, monitor, and measure its ongoing quality are as
fluid and sometimes elusive as are the defects that they are meant to keep in check.
Tasks
Quality Engineering
The activity consisting of the cohesive collection of all tasks that are primarily performed to
ensure and help continually improve the quality of an endeavor's process and work products.
Goals
The typical goals of quality engineering are to:
Objectives
The typical objectives of quality engineering are to:
Define what quality means on the endeavor in terms of a quality model defining quality
factors and quality sub factors.
Plan the quality tasks including helping the requirements team determine and specify the
quality requirements and associated quality factors (attributes) and quality metrics.
Assure the quality of the process used by the endeavor.
Thus, quality assurance is concerned with fulfilling the quality requirements and
achieving the quality factors of the endeavor's process.
Are we building the products right?
Control the quality of the work products delivered during the endeavor.
Thus, quality control is concerned with fulfilling the quality requirements and achieving
the quality factors of the endeavor's work products.
Are we building the right products?
Examples
Examples of quality engineering based on scope include:
Preconditions
Quality engineering typically may begin when the following preconditions hold:
Completion Criteria
Quality engineering is typically complete when the following post conditions hold:
Tasks
process.
The following diagram illustrates the relationships between the quality tasks:
Plan
The purpose of this Software Quality Assurance Plan (SQAP) is to define the techniques,
procedures, and methodologies that will be used at the Center for Space Research (CSR) to
assure timely delivery of the software that meets specified requirements within project resources.
The use of this plan will help assure the following: (1) That software development, evaluation
and acceptance standards are developed, documented and followed. (2) That the results of
software quality review and audits will be given to appropriate management within CSR. This
provides feedback as to how well the development effort is conforming to various CSR
development standards. (3) That test results adhere to acceptance standards.
Teams
The SQA team shall check that quality is maintained during the project and that the proper
quality procedures are being followed. Discovered problems are reported to Project
Management. The members of the project team must work according to the part(s) of the SQAP
that apply to their specific task.
The URD
The SPMP
The SQA team must check whether the goals of the project are clearly described. A life
cycle approach for the project must be defined. The SQA team must ensure that the
SPMP is realistic by checking:
o the assumptions made during the planning of the project;
o restrictions with respect to plan (e.g. availability of members);
o external problems (e.g. delivery of PCs, interface card and drivers).
The SCMP
With respect to the SCMP, the SQA team has to check whether the document provides
procedures concerning:
o CI identification
o CI storage
o CI change control
o CI status indication
All documents must have a unique identifier and backups must be made at least once
every three days.
The SQAP
With respect to the SQAP, the SQA team must check whether the SQAP contains:
o Project standards
o Review procedures
o Problem reporting procedures
o Responsibilities of the project members with respect to quality assurance
For the second phase of the project (SR), the SQA team must see to it that the following
documents are properly reviewed internally before they are submitted for an external review.
The SRD
The SQA team must check whether the SRD:
o contains requirements on the software to be developed; these requirements must
be based on the user requirements stated in the URD;
o contains constraints on the software to be developed; these constraints must be
based on the constraints stated in the URD;
o contains a priority list of the requirements;
o contains a traceability matrix.
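The traceability check in the list above can be sketched in a few lines. This is only an illustration: the requirement identifiers (UR-1, SR-1, and so on) and the function names `untraced` and `orphans` are hypothetical and do not come from any particular SQAP.

```python
def untraced(user_reqs, matrix):
    """Return user requirements that no software requirement traces back to.

    matrix maps each software requirement ID (SRD) to the set of user
    requirement IDs (URD) it is based on.
    """
    covered = set()
    for sources in matrix.values():
        covered |= set(sources)
    return sorted(set(user_reqs) - covered)


def orphans(user_reqs, matrix):
    """Return software requirements that cite no known user requirement."""
    known = set(user_reqs)
    return sorted(sr for sr, sources in matrix.items()
                  if not (set(sources) & known))


# Hypothetical data, for illustration only.
urd = ["UR-1", "UR-2", "UR-3"]
srd_matrix = {"SR-1": {"UR-1"}, "SR-2": {"UR-1", "UR-2"}, "SR-3": set()}

missing = untraced(urd, srd_matrix)    # user requirements not covered
unfounded = orphans(urd, srd_matrix)   # software requirements without a source
```

A reviewer would expect both lists to be empty before the SRD goes to external review.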
The SPMP-SR
The SQA team must ensure that the SPMP is realistic by checking:
o the assumptions made during the planning;
o restrictions with respect to the planning (e.g. availability of members);
o external problems (e.g. external software/code).
The SCMP-SR
With respect to the SCMP, the SQA team must check whether the SCMP contains:
o The additional baselines.
The SQAP-SR
With respect to the SQAP, the SQA team must check whether the SQAP contains:
o the tasks of the SQA team during the SR phase.
For the third phase of the project (AD), the SQA team must see to it that the following
documents are properly reviewed internally before they are submitted for an external review.
The ADD
The SQA team must check whether the ADD:
o contains an architectural design of the software to be developed; this design must
describe a logical model and the interfaces between the different classes;
o contains pre- and postconditions of the methods in the logical model;
o contains a traceability matrix in which the design is checked against the software
requirements in the SRD.
The SPMP-AD
The SQA team must ensure that the SPMP is realistic by checking:
o the assumptions made during the planning;
o restrictions with respect to the planning (e.g. availability of members);
o external problems.
The SCMP-AD
With respect to the SCMP, the SQA team must check whether the SCMP contains:
o the additional baselines.
Documentation:
Project documentation may include many kinds of documents (e.g., plans, task reports,
development products, problem reports, phase summary reports). Project size, criticality (i.e., the
severity of the consequence of failure of the system), and complexity are some features that may
affect the amount of documentation a project should need. For example, the design
documentation may consist of a single document describing both the system architecture and the
detailed modules or it may consist of separate documents for the architecture and subsystems.
The purpose of this section is not to specify how many documents should be required. Rather,
this section identifies the information content needed for any project and the timeliness of
requirements so that the information can be used by the vendor, the utility, and the NRC
reviewers. Because the NRC reviewers cannot determine the characteristics of the software
product without substantial technical specifications, project plans, and reports, NRC should
specify the technical products of the vendor that the utility must provide NRC.
Review:
The reviewers will also need to evaluate the installation package, which consists of installation
procedures, installation medium (e.g., magnetic tape), test case data used to verify installation,
and expected output from the test cases. In some instances, the product may already be installed
in the utility. NRC should request documentation on the results of installation and acceptance
testing.
1. Software Quality
1.1. Definition
Software quality is defined as conformance to explicitly stated functional
and performance requirements, explicitly documented development standards,
and implicit characteristics expected of all professionally developed software.
Important points:
Product revision:
- maintainability - can I fix it?
- flexibility - can I change it?
- testability - can I test it?
Product transition:
- portability - will I be able to use it on another machine?
- reusability - will I be able to reuse some of the software?
- interoperability - will I be able to interface it with another system?
1.2 Metrics for Grading the Software Quality factors
- auditability - the ease with which conformance to standards can be checked
- accuracy - the precision of computations and control
- communication commonality - the degree to which standard interfaces are used
- completeness - the degree to which the implementation has been achieved
- conciseness - the compactness of the program in terms of lines of code
- consistency - the use of uniform design and documentation techniques
- data commonality - the use of standard data structures and types
- error tolerance - the damage that occurs when the program encounters an error
- execution efficiency - the run-time performance of the program
- expandability - the degree to which the design can be extended
- generality - the breadth of potential application of program components
This chapter discusses software tools for automating the functions introduced above. The basis of
all tools is representation, so we develop a model for representing multi-version/
multi-configuration systems. Section 2 establishes basic terminology, while Sections 3 and 4
introduce versions. Later sections on version selection, software manufacture and modification
requests can be read in any order. Background material and manual CM procedures can be found
in References [3, 5, 7].
This section defines the basic elements of a data base for software configuration management.
The data base stores all software objects produced during the life-cycle of a project.
A software object is any kind of identifiable, machine-readable document generated during the
course of a project. The document must be stored on-line to be fully controllable by an SCM
system. Examples of software objects are requirements documents, design documents,
specifications, interface descriptions, program code, test programs, test data, test output, binary
code, user manuals, or VLSI designs.
Every software object has a unique identifier and a body containing the actual information. A set
of attributes associated with software objects and a facility for linking objects via various
relations are also needed. For example, attributes record time of creation and last read access,
and relations link objects to their revisions and variants. The set of attributes and relations must
be extensible; later sections will introduce a basic set. We also need a facility to describe
subclasses or subtypes of the general software object. For instance, the subclass may fix the
language in which the body is written, or the structure editor used to compose the body, or
whether the object represents an interface or an implementation. The subclass also defines the
set of operations available on objects of that class, such as compiling, configuring, printing, etc.
The body of a software object is immutable, that is, once the body has been completed, it can
only be read. Any "change" of a body actually creates a new software object with the changed
body. Immutability is important for configuration management, because it prevents
misidentification: an object identifier is associated with one and only one constant body, and not
with several different versions. Most other attributes and relations of software objects remain
changeable, however, so new information can be added.
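The object model described above (unique identifier, immutable body, extensible attributes and relations) can be sketched as follows. The class name `SoftwareObject`, the method `changed`, and the integer identifier scheme are assumptions made for this illustration; the text does not prescribe any of them.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical source of unique identifiers


@dataclass
class SoftwareObject:
    body: bytes
    oid: int = field(default_factory=lambda: next(_ids))
    attributes: dict = field(default_factory=dict)   # extensible attribute set
    relations: list = field(default_factory=list)    # (relation name, target oid)

    def __setattr__(self, name, value):
        # Identifier and body are write-once; attributes and relations
        # stay changeable so new information can be added.
        if name in ("oid", "body") and name in self.__dict__:
            raise AttributeError(f"{name} is immutable")
        super().__setattr__(name, value)

    def changed(self, new_body: bytes) -> "SoftwareObject":
        """A "change" creates a new object linked to the original."""
        new = SoftwareObject(new_body)
        new.relations.append(("revision-of", self.oid))
        return new


v1 = SoftwareObject(b"int main() { return 0; }")
v2 = v1.changed(b"int main() { return 1; }")
v1.attributes["last-read"] = "today"   # attributes remain mutable
```

Because the body can never be overwritten, an identifier such as `v1.oid` always denotes the same content, which is the property the text relies on to prevent misidentification.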
Software objects have two orthogonal refinements, one according to how they were created, the
other according to the structure of their body. For creation, we distinguish source and derived
objects. For internal structure, we distinguish atomic objects and configurations.
2.1 Creation of Software Objects
A source object is a software object that is composed manually, for instance with an interactive
editor. Creating a source object requires human action; it cannot be produced automatically.
A derived object is generated fully automatically by a program, usually from other software
objects. A program that produces derived objects is called a deriver. Examples of derivers are
compilers, linkers, document formatters, pretty printers, cross referencers, and call graph
generators. Normally, derived objects need not be stored, since they can be regenerated,
provided both the deriver and the input are available or can be received. To reduce the delay
caused by regeneration, a smart configuration management system maintains a cache of derived
objects that are likely to be reused.
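A cache of derived objects might look like the sketch below. It assumes that a derivation is fully determined by the deriver, its parameters, and the (immutable) input bodies; the class `DerivedCache`, the key scheme, and the toy deriver are hypothetical and only illustrate the idea.

```python
import hashlib


class DerivedCache:
    """Cache derived objects, keyed by deriver, parameters, and inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def derive(self, deriver, params, inputs):
        """Return deriver(params, inputs), reusing a cached result if any.

        Assumption: the deriver is identified by its function name and is
        deterministic; inputs is a tuple of bytes objects.
        """
        h = hashlib.sha256()
        h.update(deriver.__name__.encode())
        h.update(repr(params).encode())
        for body in inputs:
            h.update(hashlib.sha256(body).digest())
        key = h.hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = deriver(params, inputs)
        return self._store[key]

    def evict_all(self):
        # Safe for ordinary derived objects: they can be regenerated on demand.
        self._store.clear()


def toy_compile(params, inputs):
    """Stand-in for a real deriver such as a compiler."""
    return b"".join(inputs).upper()


cache = DerivedCache()
first = cache.derive(toy_compile, ("-O",), (b"ab", b"cd"))
second = cache.derive(toy_compile, ("-O",), (b"ab", b"cd"))  # served from cache
```

Eviction loses nothing as long as the deriver and the inputs remain available, which is exactly the condition stated in the text.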
Unlike derived objects, which can be deleted to make room, source objects are "sacred", because
deleting them may cause irreparable damage or at least significant delay until they are
reconstructed. However, derived objects may also become "sacred", i.e., they must not be deleted
merely to make room, if it is impossible or time consuming to reproduce them. For instance,
derived objects that are imported from other sites, especially vendor supplied programs, must not
be deleted, even though they are derived in most cases. Other examples are derived objects for
which the original derivers have stopped working (if they have not been ported to new
hardware, say), or for which the corresponding input objects have been lost.
A special case is derived objects that are modified manually. Examples are automatically
generated program skeletons and templates that are fleshed out by hand, or object code that is
patched manually. In principle, these manual modifications produce new source objects.
However, the SCM system should store a traceability link that records the dependency between
the two objects. This link can be used for generating a reminder to update the source object if the
derived object changes. Traceability links should also be recorded among dependent source
objects, for example between a specification and its implementation, or a program and its
documentation. In fact, most source objects in an SCM system depend on one or more other
objects, except perhaps the initial requirements specification. Traceability information is
extremely valuable for automatically producing update reminders, for reviewing completeness
of changes, and for informing maintainers what information they need to consider when
preparing a change.
2.2 Structure of Software Objects
The body of a software object is either atomic or structured. An atomic object, or atom, has a
body that is not decomposable for SCM; its body is an opaque data structure with a set of generic
operations such as copying, deletion, renaming, and editing.
An atomic object may consist of a program written in some language, a syntax tree produced by
a structure editor, a data structure generated by a WYSIWYG word processor, or an object code
module produced by a compiler.
A configuration has a body that consists of sub-objects, which may themselves have sub-objects,
and so on. Configurations have two subclasses: composites and sequences. A composite object,
or simply composite, is a record structure composed of fields. Each field consists of a field
identifier and a field value. A field value is either an object identifier or a version group
identifier. An example of a composite object is a software package consisting of a program, a
user's manual, and an installation procedure. Another example is a regression test object,
consisting of a test program, input data, expected output data, and a comparator for comparing
expected and actual output. Thus, fields may contain data as well as operations.
A sequence is a list of object and version group identifiers. Sequences represent ordered
multisets of objects. They are used for combining sub-objects that are of the same class, or when
the number of sub-objects is indeterminate. In contrast to composites, the individual elements of
a sequence fulfill identical roles and are treated in the same way for SCM purposes. An example
is the list of object code modules constituting a library.
Note that the above definitions permit version group identifiers in composites and sequences. A
version group is a set of related source or derived objects that can replace each other under
certain assumptions (see Sections 3 and 4 for details). The purpose of version groups here is to
permit compact representations of multiple software objects with the same structure. By using a
version group identifier instead of an object identifier, configurations need not be updated if new
versions are added to the groups. On the other hand, a version selection process must decide
which versions to choose when processing such configurations.
Because of the need to distinguish between "precise" and "loose" configurations, we introduce
the following terms. A generic composite is a composite with at least one field value that is either
a version group identifier or a generic configuration (i.e., a generic composite or a generic
sequence). The opposite of a generic composite is a baseline composite, which is a composite
whose field values are atomic objects, baseline composites, or baseline sequences. The
subclasses generic sequence and baseline sequence are defined analogously. Finally, a generic
configuration, also called a system model, is a generic composite or a generic sequence. A
baseline configuration, or simply baseline, is a baseline composite or baseline sequence.
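The distinction between generic and baseline configurations is recursive, which a short sketch can make concrete. The class names below (`Atom`, `VersionGroupRef`, `Composite`, `Sequence`) and the helper `needs_selection` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    oid: str


@dataclass(frozen=True)
class VersionGroupRef:
    gid: str            # identifies a version group, not a single object


@dataclass(frozen=True)
class Composite:
    fields: tuple       # pairs of (field identifier, field value)


@dataclass(frozen=True)
class Sequence:
    items: tuple


def needs_selection(obj) -> bool:
    """True if a version group identifier occurs anywhere inside obj."""
    if isinstance(obj, VersionGroupRef):
        return True
    if isinstance(obj, Composite):
        return any(needs_selection(value) for _, value in obj.fields)
    if isinstance(obj, Sequence):
        return any(needs_selection(item) for item in obj.items)
    return False        # atoms are fully determined


def is_generic_configuration(obj) -> bool:
    return isinstance(obj, (Composite, Sequence)) and needs_selection(obj)


def is_baseline(obj) -> bool:
    return isinstance(obj, (Composite, Sequence)) and not needs_selection(obj)


library = Sequence((Atom("a.o"), Atom("b.o")))
package = Composite((("program", Atom("prog-1.3")),
                     ("manual", Atom("manual-1.3")),
                     ("library", library)))
system_model = Composite((("program", VersionGroupRef("prog")),
                          ("library", library)))
```

Here `package` is a baseline because every leaf is an atom, while `system_model` is generic because one field names a version group and so requires version selection.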
For clarity, we should point out some uses of the above definitions. Suppose a software house
delivers a single, binary program to a customer. This program is a single, derived object. In most
cases, this object was generated from a baseline configuration recorded at the software house.
The purpose of the baseline is to guarantee that the derived object can be reproduced when
needed. The software house may also deliver a configuration, perhaps a composite that consists
of one or more binaries and a manual. The delivered configuration may also contain source
programs, because the programs will be interpreted, or because the customer wishes to compile
source locally. The customer may also need to adapt the source code to local needs. Thus,
depending on how much the customer expects to do, a more or less complete SCM system must
be available at the customer site to take over portions of the software house's SCM functions.
3 Source Versions
SCM systems have to cope with constant change. Corrective, adaptive, and perfective
maintenance activities produce a steady stream of updates. Since most changes are incremental,
they are best viewed as producing related versions of objects rather than separate, unrelated
objects. This section deals with versions of source objects; versions produced by derivers will be
treated in Section 4.
3.1 Source Version Groups
An important concept for dealing with multiple versions is the source version group. A source
version group is a set of source objects that are connected via the relations revision-of,
variant-of, and their subtypes. These relations are defined below. Note, however, that source
version groups may contain atoms, composites, sequences, and even mixtures of those.
y revision-of x: This relation holds if and only if x and y are source objects and y was
produced by changing a copy of x. Thus, revision-of records the development history of
source objects. The subtypes of this relation, correction-of, adaptation-of, and
enhancement-of, capture the nature of the change. It is possible for several of these
subtypes to hold simultaneously between a pair of objects.
1. The term "parametric" is sometimes used as a synonym for "generic".
The relation revision-of and its subtypes are transitive, antisymmetric, and reflexive. Objects of
a version group that are transitively related by revision-of etc. are simply called revisions.
1. y variant-of x: This relation holds if and only if x and y are source objects that are
to these relations. In other words, no revision-of or variant-of link may cross version group
boundaries.
FIGURE 1. [Figure not reproduced. Node labels: 1.1, 1.2, 1.3, 2.1, 2.2, 3.1, par.1, par.2,
par.3, con.1, fix.1. Edge types: revision-of, variant-of. Legend: con.1 = conflict at a change,
par.1 = parallel version 1, fix.1 = bugfix 1.]
Revisions 3.1 and con.1 illustrate conflicting updates. This situation arises when two
programmers wish to update the same revision (here: 2.2) simultaneously, and neither can wait
for the other to finish. This situation is undesirable, yet cannot always be prevented in practice.
SCM should warn programmers in this case, but allow work to proceed by forming a temporary
side branch for later merging. Note that such conflicts can only occur at branch tips. Reference
[39] discusses a range of strategies for dealing with these conflicts.
Revision fix.1 illustrates the handling of temporary fixes. Suppose the need to correct revision
1.3 arises after 2.1 and 2.2 have been completed. To reflect the actual development history, SCM
places the correction on a side branch starting at revision 1.3. The correction is later merged with
2.2, resulting in 3.1.
3.3 Operations on Source Version Groups
Virtually all SCM systems in use today use some form of a check-out/edit/check-in cycle for
adding revisions to source version groups. The check-out operation creates a copy of the revision
to be modified and reserves it for the user. Check-out also links the new copy to its original with
the revision-of relation. The user can then update the copy with an arbitrary editor. As long as the
copy remains checked out, it remains inaccessible to others. Any subsequent check-out of the
same original revision causes a branch to form, with a warning stating that a merge operation
will be necessary later. The check-in operation signals the completion of the changes. This
operation makes the (modified) copy visible to other users. Before a revision is checked in, it
should satisfy some quality control criterion, such as a successful test, to make sure it is usable
by other team members.
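The check-out/edit/check-in cycle described above can be sketched as a small class. The names (`VersionGroup`, `check_out`, `edit`, `check_in`), the integer revision identifiers, and the form of the warning are assumptions for this illustration and do not reproduce any particular SCM system.

```python
class VersionGroup:
    def __init__(self, first_body):
        self.bodies = {1: first_body}   # revision id -> body
        self.revision_of = {}           # new revision id -> original id
        self.checked_out = {}           # working revision id -> user
        self.visible = {1}              # revisions other users can see
        self._next = 2

    def check_out(self, original, user):
        """Copy a revision and reserve the copy; return (id, warning)."""
        already_revised = any(o == original for o in self.revision_of.values())
        warning = ("branch formed; a merge will be needed later"
                   if already_revised else None)
        rid = self._next
        self._next += 1
        self.bodies[rid] = self.bodies[original]
        self.revision_of[rid] = original     # link the copy to its original
        self.checked_out[rid] = user
        return rid, warning

    def edit(self, rid, user, new_body):
        if self.checked_out.get(rid) != user:
            raise PermissionError("revision is not checked out by this user")
        self.bodies[rid] = new_body

    def check_in(self, rid, user, passes_quality_check):
        if self.checked_out.get(rid) != user:
            raise PermissionError("revision is not checked out by this user")
        if not passes_quality_check(self.bodies[rid]):
            raise ValueError("quality control criterion not satisfied")
        del self.checked_out[rid]
        self.visible.add(rid)               # now visible to other users


group = VersionGroup("v1 text")
r2, warn_a = group.check_out(1, "ann")      # first check-out: no warning
r3, warn_b = group.check_out(1, "bob")      # same original: branch warning
group.edit(r2, "ann", "v2 text")
group.check_in(r2, "ann", lambda body: "text" in body)
```

The quality check is passed in as a function so that a project can plug in its own criterion, such as a successful test run.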
In the period between check-out and check-in, a revision may actually go through several
successive edit cycles, until the change is acceptable. Whenever the editor writes out an object, a
new revision is created. All of these revisions, except for the latest one, are called minor
revisions. Minor revisions are deleted upon check-in of the latest revision. They are needed for
short-term backup purposes, in case of machine crashes or inadvertent, disastrous deletes during
editing. Most programming environments limit the number of minor revisions to one or two. For
instance, EMACS [36] saves one minor revision and periodically writes a checkpoint as another.
Three-way revision merging is important for combining parallel lines of development. A
three-way merge first identifies the commonalities among a base version and two of its parallel
revisions, and then integrates the changes. The merge process also detects conflicting changes.
These must be resolved manually. In practice, the merging process works well, provided changed
segments are well separated from each other by unchanged ones. Examples of three-way text
mergers are diff3 and rcsmerge [39]. These programs are based on the algorithms that compute
deltas, i.e., the differences among revisions (see Section 3.4). Recently, Reps et al. [33] have
made some progress towards improved merge conflict resolution using data flow information.
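The decision rule behind a three-way merge can be shown on a deliberately simplified representation. The sketch below merges dictionaries that map a segment name (for example, a procedure name) to its text; it is not the line-based algorithm used by diff3 or rcsmerge, and the function name `merge3` is hypothetical.

```python
def merge3(base, left, right):
    """Three-way merge of dicts mapping segment names to text.

    Returns (merged, conflicts). A missing key means the segment is absent
    in that version, so additions and deletions are handled like edits.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(key), left.get(key), right.get(key)
        if l == r:            # both sides agree (possibly both deleted)
            value = l
        elif l == b:          # only the right revision changed the segment
            value = r
        elif r == b:          # only the left revision changed the segment
            value = l
        else:                 # both changed it differently: manual resolution
            conflicts.append(key)
            continue
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"f": "f-1", "g": "g-1", "h": "h-1"}
left = {"f": "f-2", "g": "g-1", "h": "h-2"}
right = {"f": "f-1", "g": "g-3", "h": "h-3", "k": "k-1"}

merged, conflicts = merge3(base, left, right)
```

The changes to `f` and `g` and the addition of `k` are integrated automatically, while `h`, changed differently on both sides, is reported as a conflict.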
A consistent revision numbering scheme is important for version selection. Most SCM systems
use a Dewey decimal notation, with revisions on the main branch numbered by a pair of the form
(release-number, level-number). Some systems extend this notation to branches in such a way
that the structure of the revision graph is reflected by the numbering. Unfortunately, this notation
becomes clumsy as the number of branches increases. A better approach is to simply select a
unique, symbolic identifier for each branch and to number revisions on each branch with a single
number or a pair. The relation revision-of can be consulted to determine the lineage of a revision.
While revision numbers together with attributes such as check-in date, author, and state are
sufficient for selecting revisions, additional, descriptive attributes are needed for differentiating
and selecting variants. An adequate approach is to let variant attributes take on subsets of values
from enumerated types. For instance, one may wish to provide an attribute that indicates the
target operating systems on which a certain variant can run. This attribute would have as value a
subset of an enumerated type listing all relevant operating systems. All revisions of a variant
would have the same variant attributes; changing them creates a new variant. Clearly, the attributes and types for describing variants must be user-definable.
To support change tracking, every object in a source version group carries a state attribute and a
log entry. The state attribute indicates the status of a revision. For example, check-out and
check-in set the attribute to in-preparation and experimental, respectively. A revision can later
be promoted to a higher state, for example stable or released. The set of states should be
extensible. To allow for effective tracing, the attribute should not just show the current value, but
actually log all state changes with date and person responsible for the change.
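A state attribute that logs every change, rather than storing only the current value, might be sketched as follows. The class name `StateLog` and its methods are hypothetical; the four initial states are the ones named in the text.

```python
from datetime import datetime, timezone


class StateLog:
    """State attribute that records every state change with person and date."""

    def __init__(self, states=("in-preparation", "experimental",
                               "stable", "released")):
        self.states = list(states)
        self.history = []            # entries of (state, who, when)

    def add_state(self, name):
        # The set of states is extensible.
        if name not in self.states:
            self.states.append(name)

    def set(self, state, who, when=None):
        if state not in self.states:
            raise ValueError(f"unknown state: {state}")
        stamp = when or datetime.now(timezone.utc)
        self.history.append((state, who, stamp))

    @property
    def current(self):
        return self.history[-1][0] if self.history else None


log = StateLog()
log.set("in-preparation", "ann")     # set by check-out
log.set("experimental", "ann")       # set by check-in
log.add_state("frozen")              # project-specific extension
log.set("frozen", "bob")
```

The full history, not just `log.current`, is what allows a later reader to trace who promoted a revision and when.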
The log entry is extremely important for change tracking. It stores a commentary requested
during check-in, describing the changes completed. Browsing the log messages helps determine
what happened to a software object over time, and sometimes prevents attempting changes that
had earlier been abandoned as unsuccessful. Because of the usefulness of the log entry, the
Crystal SCM system [2] actually requests a log message during check-out, for recording the
programmer's intentions. A check-out log helps determine what changes are in preparation.
Check-in returns this message to the user, who can then edit it into the final, permanent log entry.
3.4 Implementation of Source Version Groups
Source version groups and the objects in them must be represented as persistent objects in an
object base. The object base has traditionally been implemented with hierarchical file systems,
by either placing the objects and relations in separate files in a special directory, or by encoding
this information in a single file. These implementations provide sufficient reliability, but
recovery, consistency control, access synchronization, and authorization are realized in an ad hoc
manner.
Building the object base on top of a full-fledged data base management system seems to be an
attractive alternative, because a DBMS would provide high reliability and systematic
There are two efficient algorithms for computing deltas in batch mode. One is based on isolating
a longest common subsequence [18], the other one on identifying block moves [42]. A delta
based on a longest common subsequence is not necessarily minimal, because it cannot detect
crossing block moves.
Crossing block moves arise if two or more segments (e.g., procedures) appear in a different
order in two revisions. An edit script derived from a longest common subsequence first deletes the
shorter of the two segments, and then reinserts it. Tichy's block move algorithm [42] detects
such permutations and is guaranteed to produce a minimal delta.
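The effect of crossing block moves on a delta derived from a longest common subsequence can be reproduced with a small sketch. The function `lcs_delta` below is a textbook dynamic-programming formulation written for this illustration; it is not the algorithm of reference [18] or [42], and the sample lines are hypothetical.

```python
def lcs_delta(old, new):
    """Line-based edit script derived from a longest common subsequence."""
    n, m = len(old), len(new)
    # table[i][j] = length of an LCS of old[i:] and new[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    script, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:                       # common line: keep it
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:   # dropping old[i] loses nothing
            script.append(("delete", old[i]))
            i += 1
        else:
            script.append(("insert", new[j]))
            j += 1
    script += [("delete", line) for line in old[i:]]
    script += [("insert", line) for line in new[j:]]
    return script


# Procedure P (three lines) and procedure Q (two lines) swap places.
old = ["P1", "P2", "P3", "Q1", "Q2"]
new = ["Q1", "Q2", "P1", "P2", "P3"]
script = lcs_delta(old, new)
```

The resulting script keeps the longer segment P in place and records the two lines of Q once as insertions and once as deletions, whereas a block-move delta could express the same change as a single move.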
Most deltas used in practice are line-based, i.e., the unit for comparison is the line. Two lines are
considered different if they differ by a single character. Clearly, a byte- or word-based delta
would be smaller, but computing it would require many more comparisons and therefore much
more time.
1. Blank compression saves space if a significant fraction of an object's size is due to indentation.
Obst [29] reports that with special heuristics, a character-based block-move algorithm runs in the
same time as a line-based one, and produces deltas that are on average 30 per cent smaller. The
heuristic is specifically oriented towards block moves and does not seem applicable to longest
common subsequences.
For objects that consist of a representation other than text, the existing delta algorithms are easily
adapted by choosing an appropriate unit for comparison and converting the representation into a
linear sequence. For example, the difference between two syntax trees can be computed by
comparing prefix representations of the trees at the level of individual nodes.
4 Derived Versions
Handling derived versions is much simpler than handling source versions, since they are
computed fully automatically and no human actions need be observed or supported. A derived
version group is a set of derived objects that were generated from the same set of software
objects by varying derivation parameters or derivers. For example, a compiler may be able to
produce code for different target machines, optimized code, non-optimized code, code with
runtime checks, code with debugging hooks, etc. There may also be several compiler versions
available. Conditional compilation falls in this class also. The term derived variants is used for
those objects in a derived version group that offer identical functional specifications to their
client programs.
Derivers may also be able to produce information quite different from intermediate or binary
code. There exist derivers to generate call graphs, pretty-printed listings, cross reference tables,
or indexes. These transformations are not called variants, because they do not preserve the
semantic content as compilers do. However, both these transformations and the derived variants
are collected into a derived version group, as long as they were generated from the same input.
The relations revision-of, variant-of and their subtypes are defined on source objects, but extend
naturally to derived objects. For example, if two source objects are revisions of each other, then
so are their derived objects, provided the derived objects were produced with the same deriver
and parameters. By definition, these two derived objects would be in different derived version
groups. A minor difficulty here is that derived objects are often generated from several source
objects. When stating that two derived objects are variants or revisions of each other, it is
therefore useful to qualify this statement with respect to the source object(s) involved.
Section 6 discusses the details of how to generate and keep track of derived objects.
FIGURE 2. [Figure not reproduced. Only the version labels survive: 1.0 and 2.0, and 1.1
through 1.4.]
An example of an AND/OR graph appears in FIGURE 2. Nodes A, B, and C are version groups
of atomic objects, while S and R are version groups containing configurations. AND-nodes are
depicted graphically by arcs connecting their offspring. Labels on the out-arcs of AND-nodes
distinguish composites from sequences. For example, version 1.0 of R is a composite. Note that
by searching the graph starting with version 2.0 of node S, we reach no OR-nodes. Such a start
node identifies a baseline, because it unambiguously specifies a set of nodes making up a
configuration. Establishing a baseline is important at release time. In a large project, where
multiple changes are carried out concurrently, a baseline is an important point of reference.
Updates usually are relative to a baseline. A private baseline is created whenever an actual
system instance is generated. It may contain revisions that are not yet checked in. It is handled
like a minor revision in that only a few of them are stored per user. A public baseline must not
contain checked-out revisions, is itself checked into a version group, and should satisfy
established quality control criteria. Quality control is a subject beyond the scope of this survey.
An AND-node that leads to one or more OR-nodes represents a generic configuration, since
some selection will be necessary when constructing an actual system instance. Generic
configurations are important for compactly representing a large set of possible baselines,
without having to enumerate all combinations. Without generic configurations, SCM requires
the maintenance of bulky configuration tables. The problem with these tables is that they are
difficult to keep up to date in a large project. For instance, the addition of an upward compatible
version of a pervasively used module may cause such tables to double in size because the new
version can be used wherever the old one was permitted.
Version selection is currently an active research area within SCM. The general approach is to
associate constraints with generic configurations. The constraints are conditions on attributes of
software objects that select appropriate variants, revisions, and derived versions. Attributes
usable for revision selection are revision number and state, creation date, author, and the
relation revision-of with its subclasses. With these attributes it is possible to express the
following example constraint:
For all version groups where the invoker has a revision checked out, select that revision;
otherwise use the most recent revision that is checked in and has state stable.
Constraints of this sort are called "configuration threads" in DSEE [25]. By adding a cut-off
constraint for the creation date (a maximum date), a configuration can be regenerated as it
would have been produced at a certain date.
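The example constraint and the cut-off date can be sketched as a selection function over one version group. This is an illustrative Python sketch only and does not show how DSEE implements configuration threads. The Revision record, its field names, and the function select_revision are assumptions, as is the choice to apply the cut-off to checked-out revisions as well.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Revision:
    number: str
    state: str                            # e.g. "stable" or "experimental"
    created: date
    checked_out_by: Optional[str] = None  # None means checked in


def select_revision(group, invoker, cutoff=None):
    """Apply the example constraint to one version group."""
    # Optional cut-off: ignore revisions created after the given date.
    candidates = [r for r in group if cutoff is None or r.created <= cutoff]
    # Rule 1: a revision the invoker has checked out is selected.
    for rev in candidates:
        if rev.checked_out_by == invoker:
            return rev
    # Rule 2: otherwise the most recent checked-in revision with state stable.
    stable = [r for r in candidates
              if r.checked_out_by is None and r.state == "stable"]
    return max(stable, key=lambda r: r.created, default=None)


# Invented sample data for illustration.
GROUP = [
    Revision("1.1", "stable", date(1988, 1, 10)),
    Revision("1.2", "stable", date(1988, 3, 5)),
    Revision("1.3", "experimental", date(1988, 4, 1), checked_out_by="ann"),
]
```

With this sample group, the invoker "ann" obtains revision 1.3, any other user obtains 1.2, and a cut-off of 1 February 1988 yields 1.1 for everyone.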
Variants should be selected based upon the relation variant-of and user-defined variant
attributes, as described in Section 3.2. For example, one may want to choose a variant on the
basis of the hardware processors on which it can run. Note that a variant attribute may be
single-valued or set-valued. Using the previous example, a variant may actually run on several
processors. Single-valued attributes for differentiating variants were used in INTERCOL and
RCS [41, 38]. The Adele and Nomade configuration managers [12, 13] use sophisticated
constraints on attributes, including negation and conditionals. The latter can be used to specify
preferences, that is, if a certain constraint cannot be met, then some secondary choice may be
acceptable. A similar approach to preferences, based on a relational database for describing
generic configurations, is due to Bernard et al. [4]. Winkler [44] discusses set-valued attributes
and introduces constraints expressed as functions over attribute values.
Additional selection criteria can be based on modification requests (see Section 7). For
instance, a constraint of the following sort would configure a new release:
Select the previous baseline. Let O be the set of objects in this baseline that have modification
requests to be addressed in the current release, and have a corrected revision for each request.
Replace the elements in O with the corrected revisions.
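A rough Python sketch of this release constraint follows. The data layout (dictionaries keyed by object name and MR identifier), the function name configure_release, and the sample values are hypothetical. The sketch also assumes that the highest-numbered corrected revision of an object incorporates the corrections for its other requests; the text does not state this.

```python
def configure_release(previous_baseline, open_requests, corrections):
    """Derive a new baseline from the previous one.

    previous_baseline: object name -> revision number, e.g. (1, 2)
    open_requests:     object name -> set of MR ids for the current release
    corrections:       (object name, MR id) -> corrected revision number
    """
    new_baseline = dict(previous_baseline)
    for obj, requests in open_requests.items():
        if obj not in previous_baseline or not requests:
            continue
        # The object belongs to the set O only if every one of its
        # requests has a corrected revision.
        if all((obj, mr) in corrections for mr in requests):
            # Assumption: the latest corrected revision subsumes the others.
            new_baseline[obj] = max(corrections[(obj, mr)] for mr in requests)
    return new_baseline


# Invented sample data for illustration.
PREVIOUS = {"parser": (1, 2), "lexer": (1, 1), "emitter": (2, 0)}
REQUESTS = {"parser": {"MR7", "MR9"}, "lexer": {"MR8"}}
CORRECTIONS = {("parser", "MR7"): (1, 3), ("parser", "MR9"): (1, 4)}
```

Here the parser advances to revision 1.4, while the lexer stays at 1.1 because its request has no corrected revision yet.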
Parameters for the derivers finally select derived versions. An additional degree of freedom is
available here: If a certain parameter is left unspecified, the SCM system can make its own
choice. For instance, if the user does not care whether certain subconfigurations have been
compiled with optimization on or off, SCM can choose whatever is available and save
derivation time that way.
If constraint-based version selection is available, it is straightforward to provide an automatic
function for constructing baselines. This function simply runs the selection process and records
the outcome. Recording the outcome involves creating new revisions in the visited
configuration groups. For example, revision 2.0 of S and R in Figure 2 could have been
generated automatically. It is convenient to store the constraints used to produce a baseline
along with it, in order to document the intent behind the baseline. Saving the constraints
permits a similar selection to be repeated at the next release time.
Module interconnection languages (MILs) take a different approach to version selection. They
concentrate on the interfaces among software modules. Type checking the interfaces assures that
only type-safe configurations are constructed. DeRemer and Kron [10] originated the concept of
a MIL, as a language separate from the programming language. Prieto-Diaz [32] gives an
extensive survey of the MILs developed since then. Most MILs suffer from not treating
interfaces as first class software objects. Thus, it is difficult to represent versions of interfaces.
This is a serious limitation, even though versions of interfaces do not arise as frequently as
versions of the implementing programs. Exceptions are the programming languages Mesa and
Cedar [28, 37]. Both provide a common sub-language for describing configurations, called
C-Mesa. A key aspect is the distinction between interfaces and implementations. An interface
contains the types, variables, subprogram headers, etc., visible to clients of the interface,
whereas an implementation of an interface provides the subprogram bodies and data structures
invisible to clients. C-Mesa programs represent not only configurations, but also record the
relations has-implementation and has-client. The first relation holds between interfaces and
corresponding implementations; the second between interfaces and their clients. Both can be
viewed as subtypes of the general traceability relation, because the change of an interface must
trigger changes in affected implementations and clients. A serious limitation of C-Mesa is that
its version scheme distinguishes only two revisions, the current one and its predecessor.
Ada and Modula also separate interfaces and implementations, and the relations
has-implementation and has-client make dependencies traceable. Ada and Modula do not provide a
separate configuration language. The implicit configurations and unnecessarily strict
recompilation rules in both languages make treatment of versions difficult.
6. Software Manufacture
Software manufacture is the process of generating derived objects. Using the AND/OR graph,
software manufacture operates on a baseline and produces a mirror image of that baseline
containing only derived objects. The nodes in that mirror image are connected to the corresponding
nodes in the input baseline, showing the derivation history. In Figure 2, consider what must be
produced by compiling and linking revision 2.0 of S.
To speed up the derivation process, an SCM system must manage a cache containing derived
objects which are likely to be reused. Make [14] is a widely used program that uses a simple form
of such a cache. It is based on a time-stamp mechanism for deciding when to update the cache: If a
derived object is older than its input objects, then rederivation is necessary. Make also uses
simple rules to process objects based on their types. One such rule describes how to produce
machine code from C source code. Make can be combined with SCCS or RCS to provide a limited
versioning capability.
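The time-stamp rule can be written down in a few lines. The Python below is a sketch of the rule as stated above, not of Make's actual implementation; the function names and the convention of passing None for a missing target are assumptions.

```python
import os


def needs_rederivation(target_mtime, input_mtimes):
    """Time-stamp rule: rederive if the target is missing (None)
    or older than at least one of its inputs."""
    if target_mtime is None:
        return True
    return any(mtime > target_mtime for mtime in input_mtimes)


def file_needs_rederivation(target, inputs):
    """Apply the rule to files, using modification times from the file system."""
    target_mtime = os.path.getmtime(target) if os.path.exists(target) else None
    return needs_rederivation(target_mtime, [os.path.getmtime(p) for p in inputs])
```

Note that the decision depends only on modification times. Nothing in it identifies which versions of the inputs were used or which parameters were passed to the deriver.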
Despite its popularity, Make has a number of serious drawbacks for large-scale SCM. The
time-stamping mechanism is inappropriate for determining whether a derived object can be reused. When
there are multiple versions, a time stamp is insufficient for deciding from which versions of input
objects a derived object was generated. Another problem is that Make does not record the
parameter settings on derivers. For example, it is impossible to decide whether a given machine
code module was produced with optimization turned on or off. Make also handles derivation
processes with intermediate objects inefficiently, because it always rederives a target object if its
intermediate objects have been deleted, regardless of whether the target is up-to-date. Finally,
Make provides derivation rules for atomic objects only; processing of configurations must be
programmed explicitly.
DSEE's handling of derived objects is more reliable. Each derived object carries a history attribute
that describes precisely how the object was produced, including version identifiers and parameter
settings. For high speed processing, DSEE performs parallel manufacture on idle workstations. A
remaining drawback is that DSEE provides no general rule for processing configurations; the
individual steps have to be programmed explicitly for every configuration.
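The idea of a history attribute can be illustrated with a cache of derived objects that is keyed by everything that determines the result: the deriver, its parameter settings, and the versions of the inputs. The following Python is a sketch of that idea under stated assumptions and is not DSEE's design; the class DerivedObjectCache, the function derivation_key, and the use of a SHA-256 digest as the key are inventions of this example.

```python
import hashlib
import json


def derivation_key(deriver, parameters, input_versions):
    """Build a cache key from the complete derivation history."""
    history = {
        "deriver": deriver,                         # e.g. compiler and its version
        "parameters": list(parameters),             # e.g. ["-O"]
        "inputs": sorted(input_versions.items()),   # (object, revision) pairs
    }
    encoded = json.dumps(history, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class DerivedObjectCache:
    """Maps a derivation history to the object it produced."""

    def __init__(self):
        self._store = {}

    def record(self, deriver, parameters, input_versions, derived):
        key = derivation_key(deriver, parameters, input_versions)
        self._store[key] = derived

    def lookup(self, deriver, parameters, input_versions):
        """Return the cached object, or None if it must be derived."""
        key = derivation_key(deriver, parameters, input_versions)
        return self._store.get(key)
```

Unlike a time stamp, such a key distinguishes a module compiled with optimization from one compiled without it, and a module built from revision 1.2 of a source object from one built from revision 1.3.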
In opportunistic manufacture, derivers start up automatically as soon as new source object
versions are created. By running derivations in parallel with the programmers' activities,
opportunistic manufacture attempts to have derived objects ready ahead of time. This approach
reduces programmer idle time. However, it is difficult to limit the combinatorial explosion of
derivations caused by multiple versions.
Without a specific target configuration, almost all of the derivation runs after a change could be
useless.
Odin is a flexible system for managing derived objects. Similar to Make, it uses an extensible set
of rules that form a derivation graph for object types. Unlike Make, Odin's rule language covers
derived configurations as well as atoms, and distinguishes sequences and composites. (Make only
has sequences.) Users need only indicate the objects to be combined in configurations, and Odin
determines how to process them, based on their types. For instance, it is not necessary to always
redescribe how configurations are linked, or how documents consisting of several parts are
processed. Furthermore, composites handle derivation processes with more than one output
cleanly.
Odin also provides facilities for including quality control tests, such as regression tests, as part of
the derivation. In its cache of derived objects, Odin stores a full history attribute, including the
parameters used during derivation. Unfortunately, support for versioning is poor.
Automatic system manufacture guarantees that the correct derived objects are produced when
necessary. However, the cost of the processing involved may be too high. In large system families,
changing a single line in an object with shared declarations may trigger massive recompilations.
Many of these recompilations may be redundant, because the change may actually affect only a
small fraction of the compilation units. Selective recompilation mechanisms, such as smart
recompilation [40], reduce the number of redundant derivations. These mechanisms analyze
changes for their effect and prevent redundant compilations when, for example, an unused
declaration is deleted, a new declaration is added, or a comment is changed.
Hood et al. [17] generalize smart recompilation to recursive interface dependencies. Smarter
recompilation reduces the number of recompilations further by allowing harmless inconsistencies
to remain. As an example, consider a type declaration T used in a set S of source objects. Assume
we change T into T', and update a few source objects to be compatible with T'. Suppose
furthermore we can partition S into a subset S1 in which only T is used, and a subset S2 in which
only T' is used. If there are no interactions among S1 and S2 that depend in any way on T or T',
then recompilation of S1 is not necessary and smarter recompilation will suppress it. More
important than saving the recompilations is perhaps the fact that programmers can delay the work
of making the source objects in S1 compatible with T'. Thus, programmers can test their changes
without having to wait for others to bring their modules up-to-date. Without a mechanism for
managing inconsistencies in this manner, programmers have to resort to the unsafe practice of
subverting the type checking and manufacturing system to get their work done.
7. Modification Requests
A modification request (MR) is a change proposal. General configuration management is
MR-driven, that is, every change is initiated by one or more MRs. Tracking of modification requests
makes it possible to answer questions about past, current, and future capabilities of a system
family, as well as providing important management data about project status. There is no reason
why SCM should not be MR-driven as well, yet few tools for managing software modification
requests exist. The author is aware of only two published tools: MRCS [23], a control system
running on Unix, and Crystal [2], an SCM system that integrates version control, MR tracking, and
project management.
Modification requests propose to correct errors, to modify existing system capabilities, or to extend
or contract capabilities. An MR may address any set of source objects in the software lifecycle:
requirements documents, design documents, interfaces, program code, test data, documentation,
etc. An MR should be machine-readable and is itself a source object.
Versions of MRs do not seem necessary, but each MR has an attribute that reflects its state.
A useful set of states is submitted, rejected, accepted, delayed, in progress, and completed. When
an MR is first entered, it has state submitted. A review decides whether to accept or reject the MR.
A rejected MR is not discarded, but filed with a note describing the reason for rejection. A third
alternative is to delay an MR, which means that it will assume the state submitted again at a later
time for reconsideration. Once the work involved in an MR is assigned to a person, then the MR
assumes state in progress. State completed indicates the modifications required by the MR have
been performed and tested. To allow for effective tracing of an MR, the state attribute should not
just have the current value, but should actually log all previous states, including the times when the
state changes occurred. That way, it is easy to determine the history of MRs and to find MRs that
have fallen behind schedule.
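The state attribute with its full log can be sketched as a small state machine. The Python below is illustrative only. The class ModificationRequest, its method names, and in particular the transition table are assumptions read off the description above; the text does not prescribe exactly which transitions are legal.

```python
from datetime import datetime

# Assumed transition table, inferred from the prose description.
TRANSITIONS = {
    "submitted": {"accepted", "rejected", "delayed"},
    "delayed": {"submitted"},        # resubmitted later for reconsideration
    "accepted": {"in progress"},     # work assigned to a person
    "in progress": {"completed"},    # modifications performed and tested
    "rejected": set(),               # filed with a note, not discarded
    "completed": set(),
}


class ModificationRequest:
    def __init__(self, title, now=None):
        self.title = title
        # The log keeps every state with its time stamp and an optional note.
        self.log = [("submitted", now or datetime.now(), "")]

    @property
    def state(self):
        """The current state is the most recent log entry."""
        return self.log[-1][0]

    def move_to(self, new_state, note="", now=None):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        self.log.append((new_state, now or datetime.now(), note))

    def history(self):
        """Return the sequence of states the MR has passed through."""
        return [state for state, _, _ in self.log]
```

Because each log entry carries a time stamp, finding MRs that have fallen behind schedule reduces to comparing the time of the latest entry with the current date.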
Usually, each programmer is responsible for a set of related MRs. This set can be represented
naturally by a configuration. Configurations of MRs are often called tasks, and are associated with
a workspace for managing temporary objects.
Additional useful data items associated with an MR are the relations has-MR and has-change. The
first links an object with its MRs, the second an MR with the updates it caused. These relations
support MR-based selection, as illustrated in the second query in Section 5. The submitter or
reviewer of an MR establishes has-MR, while has-change is entered during check-in. To simplify
entry and prevent errors, check-in should allow selection from a menu of relevant MRs. This set
can easily be derived from the MR configurations in the workspace.
Crystal implements the above relations. A simple experiment with bug reports on a medium-sized
system showed that software engineers can attach their MRs to the affected objects in an
unfamiliar system with high accuracy, provided the overall system architecture is explained with a
few sentences per object. Crystal therefore presents the submitter of an MR with a sophisticated
browser for locating relevant objects. This browser shows system configurations graphically and
lets the user read documentation as well as the existing MRs (to avoid duplication). As a heuristic
to speed up the search process, the browser even highlights "suspect" components, i.e., those that
changed relative to the last baseline. Once the relation has-MR has been entered, it opens several
possibilities for project management support. The history of the objects can be inspected to
identify competent programmers for carrying out the changes. The history can also yield a rough
estimate for the time required for the change, by averaging past periods between check-out and
check-in. In Crystal, this information is used to update a PERT chart of maintenance activities.
[...] is to accommodate versions of operations, for example versions of compilers. The benefits of
semantic modeling are greater conceptual clarity, direct representation of the model for machine
interpretation, more sophisticated operations and queries, and simplified implementation. Finally,
an interesting topic is building a maintainer's assistant, i.e., a program that helps with carrying
out changes in complex software systems. The maintainer initiates a change, while the assistant
provides decision support and takes over the task of bringing the system back into a consistent
state. For example, the assistant detects all places that are affected by a given change and presents
them to the programmer for update. It proposes corrections and perhaps even derives corrections
by observing the programmer. This approach is, of course, not limited to programs; it is just as
applicable to updating specifications or other formal representations consistently. Intensive
research in smart editing systems will be needed to achieve the goal of automating consistency
maintenance.