Overview of Embedded Systems Basics
Embedded System
Sensor – It measures a physical quantity and converts it to an electrical signal that can be read by an
observer or by an electronic instrument such as an A-D converter. A sensor stores the measured quantity in
memory.
A-D Converter – An analog-to-digital converter converts the analog signal sent by the sensor into a digital
signal.
Processor & ASICs – Processors process the data to measure the output and store it to the memory.
D-A Converter – A digital-to-analog converter converts the digital data fed by the processor to
analog data.
Actuator – An actuator compares the output given by the D-A Converter to the actual (expected) output
stored in it and stores the approved output.
Components of Embedded System:
The basics of embedded systems include the components of embedded system hardware, embedded system
types and several characteristics. An embedded system has three main components: embedded system
hardware, embedded system software and an operating system.
The control unit (CU) includes a fetch unit for fetching instructions from memory. The execution
unit (EU) has circuits that implement the instructions pertaining to data transfer operations and
data conversion from one form to another.
The EU includes the Arithmetic and Logic Unit (ALU) and also the circuits that execute
instructions for program control tasks such as interrupts or jumps to another set of instructions.
A processor runs fetch-and-execute cycles and executes the instructions in the same sequence as
they are fetched from memory.
Types of Processors:
General Purpose Processor (GPP): A GPP processes signals from input to output by
controlling the operation of the system bus, address bus and data bus inside an embedded system. It
provides hardwired circuitry for memory management, i.e. it supports on-chip DMA and
cache. It contains the common circuitry for the arithmetic and logical computations used in
everyday applications, i.e. it includes a powerful ALU. It uses a large instruction set and
a pipelined structure for instruction execution to speed up processing. Types of general
purpose processor are:
Microprocessor
Microcontroller
Embedded Processor
Digital Signal Processor
Microprocessor
A microprocessor is a single VLSI chip having a CPU. In addition, it may also have other units
such as caches, a floating-point arithmetic unit, and pipelining units that help in faster
processing of instructions.
Microcontroller
Microcontrollers are particularly used in embedded systems for real-time control applications
with on-chip program memory and devices. Some of the examples are: Intel 8032, 8051, 8052,
AVR ATMEGA 328 etc.
Embedded Processor
Digital Signal Processor
A digital signal processor (DSP) is an integrated circuit designed for high-speed data
manipulations, and is used in audio, communications, image manipulation, and other data
acquisition and data-control applications. For example: PAC, TMS320XX series, Zed-broad etc.
The constraints in the embedded systems design are imposed by external as well as internal
specifications. Design metrics are introduced to measure the cost function taking into account
the technical as well as economic considerations.
A design metric is a measurable feature of the system’s performance, cost, time for
implementation, safety, etc. Most of these are conflicting requirements, i.e. optimizing one
does not optimize the others: e.g. a cheaper processor may perform poorly as far as
speed and throughput are concerned. The following metrics are generally taken into account while
designing embedded systems:
Non-Recurring Engineering (NRE) Cost:
It is the one-time cost of designing the system. Once the system is designed, any number of units
can be manufactured without incurring any additional design cost; hence the term nonrecurring.
Suppose three technologies are available for use in a particular product. Assume that
implementing the product using technology ‘A’ would result in an NRE cost of $2,000 and unit
cost of $100, that technology B would have an NRE cost of $30,000 and unit cost of $30, and
that technology C would have an NRE cost of $100,000 and unit cost of $2. Ignoring all other
design metrics, like time-to-market, the best technology choice will depend on the number of
units we plan to produce.
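The dependence on volume can be made concrete with the figures above: total cost is the NRE cost plus
the unit cost times the number of units. The sketch below (the function names are illustrative and do
not come from the text) picks the cheapest technology for a given volume. With these numbers,
technology A is cheapest up to 400 units, B from 400 to 2,500 units, and C beyond that.

```c
/* Total cost of producing `units` copies: one-time NRE cost plus per-unit cost. */
long total_cost(long nre, long unit_cost, long units)
{
    return nre + unit_cost * units;
}

/* Returns 'A', 'B' or 'C': the cheapest technology for the given volume,
   using the NRE and unit costs quoted in the text. Ties go to the
   technology with the lower NRE cost. */
char cheapest_technology(long units)
{
    long a = total_cost(2000, 100, units);
    long b = total_cost(30000, 30, units);
    long c = total_cost(100000, 2, units);

    if (a <= b && a <= c) {
        return 'A';
    }
    if (b <= c) {
        return 'B';
    }
    return 'C';
}
```

The break-even points follow from equating the totals: 2,000 + 100n = 30,000 + 30n gives n = 400, and
30,000 + 30n = 100,000 + 2n gives n = 2,500.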
Unit Cost:
The monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size:
The physical space required by the system, often measured in bytes for software, and gates
or transistors for hardware.
Performance:
The execution time or throughput of the system.
Power Consumption:
It is the amount of power consumed by the system, which may determine the lifetime of a
battery, or the cooling requirements of the IC, since more power means more heat.
Flexibility:
The ability to change the functionality of the system without incurring heavy NRE cost. Software
is typically considered very flexible.
Time-to-prototype:
The time needed to build a working version of the system, which may be bigger or more
expensive than the final system implementation, but it can be used to verify the system’s
usefulness and correctness and to refine the system’s functionality.
Time-to-market:
The time required to develop a system to the point that it can be released and sold to customers.
The main contributors are design time, manufacturing time, and testing time. This metric has
become especially demanding in recent years. Introducing an embedded system to the
marketplace early can make a big difference in the system’s profitability.
Maintainability:
It is the ability to modify the system after its initial release, especially by designers who did not
originally design the system.
Correctness:
This is the measure of the confidence that we have implemented the system’s functionality
correctly. We can check the functionality throughout the process of designing the system, and
we can insert test circuitry to check that manufacturing was correct.
A single purpose processor is a digital circuit designed to execute exactly one program. An
embedded system designer may obtain several benefits by choosing to use a custom single
purpose processor to implement a computation task.
A basic processor consists of a controller and a data path. The data path stores and manipulates a
system’s data. The data path contains register units, functional units and connection units such as
wires and multiplexers. The data path can be configured to read data from particular registers, feed
that data through functional units configured to carry out particular operations such as add or shift,
and store the operation results back into particular registers. The controller carries out such
configuration of the data path. It sets the data path control inputs, such as register load and
multiplexer select signals, of the register units, functional units and connection units to obtain
the desired configuration at a particular time.
It monitors external control inputs as well as data path control outputs, known as status signals,
coming from functional units, and it sets external control outputs as well. Digital system
design techniques such as combinational and sequential logic design, both synchronous and
asynchronous, can be applied to build a controller and a data path.
Benefits of Custom Single Purpose Processor:
- Performance may be faster, due to fewer clock cycles resulting from a customized data
path and due to shorter clock cycles resulting from the simpler controller logic.
- Size may be smaller due to a simpler data path and no program memory.
- Power consumption may be less due to more efficient computation.
However, cost could be higher because of high NRE cost. Also, time-to-market may be longer.
Embedded systems have different applications. A few select applications of embedded systems
are smart cards, telecommunications, satellites, missiles, digital consumer electronics, computer
networking, etc.
A real-time system is defined as a data processing system in which the time interval required to process
and respond to inputs is so small that it controls the environment. The time taken by the system to
respond to an input and display the required updated information is termed the response time. In
this method, the response time is much shorter than in online processing.
Real-time systems are used when there are rigid time requirements on the operation of a processor or the
flow of data; they can be used as a control device in a dedicated application. A real-time
operating system must have well-defined, fixed time constraints, otherwise the system will fail.
Examples include scientific experiments, medical imaging systems, industrial control systems, weapon
systems, robots, and air traffic control systems.
- Idle (Created) State: The task has been created and memory allotted to its structure. However, it
is not ready and is not schedulable by the kernel.
- Ready (Active) State: The created task is ready and is schedulable by the kernel but is not running
at present because another, higher priority task is scheduled to run and has the system resources at
this instant.
- Running State: The task is executing its code and has the system resources at this instant. It runs
until it needs some IPC (input), waits for an event, or is preempted by a higher
priority task.
- Blocked (Waiting) State: Execution of the task's code is suspended after the needed parameters are
saved into its context. The task needs some IPC (input), or it needs to wait for an event or for a
higher priority task to block, before it can run again.
- Deleted (Finished) State: The memory allotted to the task's structure is deallocated, i.e. the task
is deleted and its memory is freed.
Task Control Block
- A data structure holding the information with which the OS controls the process state.
- The task information in the TCB is:
TaskID: The unique identifier used to identify a task. For example, with an 8-bit ID, a number between 0
and 255 is used as the TaskID.
Task Context: It includes the current values of the program counter, stack pointer, CPU registers and
status register.
Task priority: It stores the priority level of parent as well as child task available in Task List. The priority
is a number used as the identifier.
Task Context_init: It is a pointer to the processor memory that stores the following information:
- Allocated program memory address blocks in physical memory and in secondary (virtual)
memory for the task's code.
- Allocated task-specific data address blocks.
- Allocated task-stack addresses for the functions called while the process runs.
- Allocated addresses of the CPU register-save area, where the task context is held as saved CPU
registers, including the program counter and stack pointer.
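The fields listed above can be gathered into a C structure. This is only a sketch: the type names,
field widths, and the eight-register save area are assumptions made for illustration, and a real kernel
lays out its TCB to suit the target processor.

```c
#include <stdint.h>
#include <string.h>

/* Task states, mirroring the five states described earlier. */
typedef enum {
    TASK_IDLE,      /* created, not yet schedulable */
    TASK_READY,     /* schedulable, waiting for the CPU */
    TASK_RUNNING,   /* currently executing */
    TASK_BLOCKED,   /* waiting for IPC or an event */
    TASK_DELETED    /* memory released */
} task_state_t;

/* Saved CPU state of a task (hypothetical 32-bit CPU with 8 registers). */
typedef struct {
    uint32_t pc;        /* program counter */
    uint32_t sp;        /* stack pointer */
    uint32_t status;    /* status register */
    uint32_t regs[8];   /* general purpose registers */
} task_context_t;

/* Task control block: everything the OS needs to manage one task. */
typedef struct {
    uint8_t        id;          /* TaskID: 0..255 for an 8-bit ID */
    task_state_t   state;
    uint8_t        priority;
    task_context_t context;     /* register-save area */
    void          *code_base;   /* allocated program memory */
    void          *data_base;   /* task-specific data */
    void          *stack_base;  /* task stack */
} tcb_t;

/* Initializes a TCB for a newly created task: cleared context, Idle state. */
void tcb_init(tcb_t *tcb, uint8_t id, uint8_t priority)
{
    memset(tcb, 0, sizeof *tcb);
    tcb->id = id;
    tcb->priority = priority;
    tcb->state = TASK_IDLE;
}
```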
Context Switch
When the multithreading kernel decides to run a different thread, it simply saves the current thread’s
context (CPU registers) in the current thread’s context storage area (the thread control block, or TCB).
Once this operation is performed, the new thread’s context is restored from its TCB and the CPU resumes
execution of the new thread’s code. This process is called a context switch. Context switching adds
overhead to the application.
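The save and restore steps can be illustrated with a small simulation. The CPU here is just a structure
and the register set is invented for the example; in a real kernel this routine is typically written in
assembly for the target processor.

```c
#include <stdint.h>

/* Simulated CPU register file (hypothetical layout). */
typedef struct {
    uint32_t pc;
    uint32_t sp;
    uint32_t regs[4];
} cpu_context_t;

/* Minimal thread control block holding the saved context. */
typedef struct {
    int           id;
    cpu_context_t saved;
} thread_tcb_t;

/* Saves the running thread's registers into its TCB, then loads the
   registers of the next thread so that execution resumes there. */
void context_switch(cpu_context_t *cpu,
                    thread_tcb_t *current,
                    const thread_tcb_t *next)
{
    current->saved = *cpu;   /* save current thread's context */
    *cpu = next->saved;      /* restore new thread's context */
}
```

The two structure copies are the overhead the text refers to: the more registers a processor has, the
longer each switch takes.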
Task Management:
The task management function comprises the following operations:
- Creation of a new task with its TCB.
- Task termination: remove the TCB.
- Change priority: modify the TCB.
- State inquiry: read the TCB.
The major challenges for task management in an RTOS kernel are:
- When an RT task is created, it has to get memory without delay: this is difficult because memory has
to be allocated and many data structures and code segments must be copied or initialized.
- Changing run-time priorities is dangerous: it may change the run-time behavior and predictability
of the whole system.
Interrupt Handling:
An interrupt is a hardware mechanism used to inform the CPU that an asynchronous event has occurred.
When an interrupt is recognized, the CPU saves all of its context (i.e., registers) and jumps to a special
subroutine called an Interrupt Service Routine, or ISR. The ISR processes the event, and upon completion
of the ISR, the program returns to:
- the background for a foreground / background system,
- the interrupted thread for a non-preemptive kernel, or
- the highest priority thread ready to run for a preemptive kernel.
Interrupts allow a microprocessor to process events when they occur. This prevents the microprocessor
from continuously polling an event to see if it has occurred. Microprocessors allow interrupts to be
ignored and recognized through the use of two special instructions: disable interrupts and enable
interrupts, respectively.
The interrupt handler handles an interrupt generated by an external device as follows:
- The current context of the task is saved on the stack.
- The task is blocked, program control branches to the beginning address of the ISR, and
the ISR executes to serve the interrupt.
- The interrupt routine terminates and the context of the blocked task is restored.
In a real-time environment, interrupts should be disabled as little as possible. Disabling interrupts affects
interrupt latency and may cause interrupts to be missed. Processors generally allow interrupts to be nested.
This means that while servicing an interrupt, the processor will recognize and service other (more
important) interrupts, as shown in Figure below.
Figure – Interrupt nesting
Interrupt Latency
Probably the most important specification of a real-time kernel is the amount of time interrupts are
disabled. All real-time systems disable interrupts to manipulate critical sections of code and re-enable
interrupts when the critical section has executed. The longer interrupts are disabled, the higher the
interrupt latency. Interrupt latency is given by
Interrupt latency = Maximum amount of time interrupts are disabled + Time to start executing the first
instruction in the ISR
Interrupt Response
Interrupt response is defined as the time between the reception of the interrupt and the start of the user
code that handles the interrupt. The interrupt response time accounts for all the overhead involved in
handling an interrupt.
For a foreground / background system, the user ISR code is executed immediately after the processor's
context is saved. The response time is given by
Interrupt response time = Interrupt latency + Time to save the CPU's context
Interrupt Recovery
Interrupt recovery is defined as the time required for the processor to return to the interrupted code.
Interrupt recovery in a foreground / background system simply involves restoring the processor's context
and returning to the interrupted thread. Interrupt recovery is given by:
Interrupt recovery time = Time to restore the CPU's context + Time to execute the return from interrupt instruction
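The three timing figures for a foreground / background system can be worked through numerically. The
sketch below assumes that interrupt response is the interrupt latency plus the time to save the CPU's
context, and that recovery covers both restoring the context and executing the return instruction. The
structure, field names, and microsecond units are illustrative assumptions.

```c
/* Timing contributions, in microseconds (hypothetical measured values). */
typedef struct {
    unsigned max_disabled_us;     /* longest time interrupts are disabled */
    unsigned isr_entry_us;        /* time to start the first ISR instruction */
    unsigned save_context_us;     /* time to save the CPU's context */
    unsigned restore_context_us;  /* time to restore the CPU's context */
    unsigned return_us;           /* time to execute return-from-interrupt */
} irq_timing_t;

/* Latency = maximum time disabled + time to start the first ISR instruction. */
unsigned interrupt_latency_us(const irq_timing_t *t)
{
    return t->max_disabled_us + t->isr_entry_us;
}

/* Response = latency + time to save the context (assumed formula). */
unsigned interrupt_response_us(const irq_timing_t *t)
{
    return interrupt_latency_us(t) + t->save_context_us;
}

/* Recovery = time to restore the context + time to return from interrupt. */
unsigned interrupt_recovery_us(const irq_timing_t *t)
{
    return t->restore_context_us + t->return_us;
}
```

For example, with interrupts disabled for at most 10 us, 2 us to reach the ISR, 3 us each to save and
restore the context, and 1 us to return, the latency is 12 us, the response 15 us, and the recovery 4 us.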
ISR Processing Time
Although ISRs should be as short as possible, there are no absolute limits on the amount of time for an
ISR. One cannot say that an ISR must always be less than 100 µs, 500 µs, or 1 ms. If the ISR code is the
most important code that needs to run at any given time, it could be as long as it needs to be. In most
cases, however, the ISR should recognize the interrupt, obtain data or a status from the interrupting
device, and signal a thread to perform the actual processing.
Scheduler
The scheduler is the part of the kernel responsible for determining which thread will run next. Most real-
time kernels are priority based. Each thread is assigned a priority based on its importance. Establishing
the priority for each thread is application specific. In a priority-based kernel, control of the CPU will
always be given to the highest priority thread ready to run. In a preemptive kernel, when a thread makes a
higher priority thread ready to run, the current thread is pre-empted (suspended) and the higher priority
thread is immediately given control of the CPU. If an interrupt service routine (ISR) makes a higher
priority thread ready, then when the ISR is completed the interrupted thread is suspended and the new
higher priority thread is resumed.
With a preemptive kernel, execution of the highest priority thread is deterministic; you can determine
when the highest priority thread will get control of the CPU.
Application code using a preemptive kernel should not use non-reentrant functions, unless exclusive
access to these functions is ensured through the use of mutual exclusion semaphores, because both a low-
and a high-priority thread can use a common function. Corruption of data may occur if the higher priority
thread preempts a lower priority thread that is using the function.
To summarize, a preemptive kernel always executes the highest priority thread that is ready to run. An
interrupt preempts a thread. Upon completion of an ISR, the kernel resumes execution of the highest
priority thread ready to run (not necessarily the interrupted thread). Thread-level response is optimum
and deterministic.
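The core decision of a priority-based scheduler can be sketched as a scan over the thread table. The
structure and function names are hypothetical, and the sketch assumes that a larger number means a
higher priority; some kernels use the opposite convention.

```c
#include <stddef.h>
#include <stdbool.h>

/* One entry in the scheduler's thread table (illustrative layout). */
typedef struct {
    int      id;
    unsigned priority;   /* assumption: larger value = higher priority */
    bool     ready;      /* true if the thread is ready to run */
} sched_thread_t;

/* Returns the highest priority ready thread, or NULL if none is ready.
   Among equal priorities the earliest table entry wins. */
sched_thread_t *pick_next(sched_thread_t *threads, size_t count)
{
    sched_thread_t *best = NULL;

    for (size_t i = 0; i < count; i++) {
        if (!threads[i].ready) {
            continue;
        }
        if (best == NULL || threads[i].priority > best->priority) {
            best = &threads[i];
        }
    }
    return best;
}
```

A preemptive kernel would run this selection whenever a thread or an ISR makes another thread ready,
and switch context if the result differs from the running thread.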
Reentrancy
A reentrant function can be used by more than one thread without fear of data corruption. A reentrant
function can be interrupted at any time and resumed at a later time without loss of data. Reentrant
functions either use local variables (i.e., CPU registers or variables on the stack) or protect data when
global variables are used. An example of a reentrant function is shown below:
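The listing does not appear in this copy of the text. As a stand-in, here is a minimal sketch of a
reentrant string copy modelled on strcpy(); the name copy_string() is an assumption, chosen so that the
example does not clash with the C library.

```c
/* Copies the NUL-terminated string src into dest and returns dest.
   Only the arguments and one local pointer are used, so every caller
   works on its own copies and no shared state is touched. */
char *copy_string(char *dest, const char *src)
{
    char *out = dest;   /* local variable: lives on the calling thread's stack */

    while ((*out++ = *src++) != '\0') {
        /* the copy happens in the loop condition */
    }
    return dest;
}
```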
Since copies of the arguments to strcpy() are placed on the thread's stack, and the local variable is created
on the thread’s stack, strcpy() can be invoked by multiple threads without fear that the threads will corrupt
each other's pointers.
An example of a non-reentrant function is shown below:
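The listing does not appear in this copy of the text. The sketch below reconstructs a swap() of the kind
described, using a global Temp, and adds a variant with a local variable for contrast; the name
swap_reentrant() is an assumption.

```c
int Temp;   /* global: shared by every thread that calls swap() */

/* Non-reentrant: a preemption after the first line lets another
   caller overwrite Temp before this caller reads it back. */
void swap(int *x, int *y)
{
    Temp = *x;
    *x = *y;
    *y = Temp;
}

/* Reentrant variant: the temporary is local, so each caller has its own. */
void swap_reentrant(int *x, int *y)
{
    int temp = *x;

    *x = *y;
    *y = temp;
}
```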
swap() is a simple function that swaps the contents of its two arguments. Since Temp is a global
variable, if the swap() function gets preempted after the first line by a higher priority thread which also
uses the swap() function, then when the low priority thread resumes it will use the Temp value that was
used by the high priority thread.
We can make swap() reentrant with one of the following techniques:
- Declare Temp local to swap().
- Disable interrupts before the operation and enable them afterwards.
- Use a semaphore.
Thread Priority
A priority is assigned to each thread. The more important the thread, the higher the priority given to it.
- Static Priorities
Thread priorities are said to be static when the priority of each thread does not change during the
application's execution. Each thread is thus given a fixed priority at compile time. All the threads
and their timing constraints are known at compile time in a system where priorities are static
- Dynamic Priorities
Thread priorities are said to be dynamic if the priority of threads can be changed during the
application's execution; each thread can change its priority at run time. This is a desirable feature
to have in a real-time kernel to avoid priority inversions.
- Priority Inversions
Priority inversion is a problem in real-time systems and occurs mostly when you use a real-time
kernel. Priority inversion is any situation in which a low priority thread holds a resource while a
higher priority thread is ready to use it. In this situation the low priority thread prevents the high
priority thread from executing until it releases the resource.
To avoid priority inversion, a multithreading kernel should automatically raise the priority of the
low priority thread that holds the resource to that of the waiting high priority thread until the
resource is released. This is called priority inheritance.
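Priority inheritance can be sketched with a lock that records its owner. This is a single-threaded model
rather than a working kernel primitive: the names are hypothetical, a larger number is assumed to mean a
higher priority, and a return value of 0 stands for the point at which a real kernel would block the
caller.

```c
#include <stddef.h>

typedef struct {
    int      id;
    unsigned base_priority;  /* priority assigned by the application */
    unsigned priority;       /* current (possibly inherited) priority */
} pi_thread_t;

typedef struct {
    pi_thread_t *owner;      /* NULL when the resource is free */
} pi_mutex_t;

/* Returns 1 if the caller acquired the resource, 0 if it must wait.
   When a higher priority caller has to wait, the owner inherits the
   caller's priority so that it can finish and release sooner. */
int pi_mutex_lock(pi_mutex_t *m, pi_thread_t *caller)
{
    if (m->owner == NULL) {
        m->owner = caller;
        return 1;
    }
    if (caller->priority > m->owner->priority) {
        m->owner->priority = caller->priority;
    }
    return 0;
}

/* Releases the resource and drops the owner back to its base priority. */
void pi_mutex_unlock(pi_mutex_t *m)
{
    if (m->owner != NULL) {
        m->owner->priority = m->owner->base_priority;
        m->owner = NULL;
    }
}
```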
Mutual Exclusion
The easiest way for threads to communicate with each other is through shared data structures. This is
especially easy when all threads exist in a single address space and can reference global variables,
pointers, buffers, linked lists, FIFOs, etc. Although sharing data simplifies the exchange of information,
we must ensure that each thread has exclusive access to the data to avoid contention and data corruption.
The most common methods of obtaining exclusive access to shared resources are:
- disabling interrupts,
- performing test-and-set operations,
- disabling scheduling, and
- using semaphores.
Semaphores
The semaphore was invented by Edsger Dijkstra in the mid-1960s. It is a protocol mechanism offered by
most multithreading kernels. Semaphores are used to:
- control access to a shared resource (mutual exclusion),
- signal the occurrence of an event, and
- allow two threads to synchronize their activities.
A semaphore is a key that code acquires in order to continue execution. If the semaphore is already in use,
the requesting thread is suspended until the semaphore is released by its current owner. In other words, the
requesting thread says: ''Give me the key. If someone else is using it, I am willing to wait for it!" There
are two types of semaphores: binary semaphores and counting semaphores. As its name implies, a binary
semaphore can only take two values: 0 or 1. A counting semaphore allows values between 0 and 255,
65535, or 4294967295, depending on whether the semaphore mechanism is implemented using 8, 16, or
32 bits, respectively. The actual size depends on the kernel used. Along with the semaphore's value, the
kernel also needs to keep track of threads waiting for the semaphore's availability.
Generally, only three operations can be performed on a semaphore: Create (), Wait (), and Signal (). The
initial value of the semaphore must be provided when the semaphore is initialized. The waiting list of
threads is always initially empty.
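The three operations can be modelled in a few lines of C, following the wait and signal rules described
in this section. This is a single-threaded model for illustration only: the names are hypothetical, the
waiting list is reduced to a counter, and a real kernel would suspend and resume threads instead of
returning a flag.

```c
#include <stdbool.h>

typedef struct {
    unsigned count;     /* semaphore value */
    unsigned waiting;   /* number of threads in the waiting list */
} sem_model_t;

/* Create: set the initial value; the waiting list starts empty. */
void sem_model_create(sem_model_t *s, unsigned initial)
{
    s->count = initial;
    s->waiting = 0;
}

/* Wait: returns true if the caller continues, false if it joins the
   waiting list because the semaphore value is 0. */
bool sem_model_wait(sem_model_t *s)
{
    if (s->count > 0) {
        s->count--;
        return true;
    }
    s->waiting++;
    return false;
}

/* Signal: returns true if a waiting thread was made ready (the value is
   not incremented), false if the value was simply incremented. */
bool sem_model_signal(sem_model_t *s)
{
    if (s->waiting > 0) {
        s->waiting--;
        return true;
    }
    s->count++;
    return false;
}
```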
A thread desiring the semaphore will perform a Wait () operation. If the semaphore is available (the
semaphore value is greater than 0), the semaphore value is decremented and the thread continues
execution. If the semaphore's value is 0, the thread performing a Wait () on the semaphore is placed in a
waiting list. Most kernels allow you to specify a timeout; if the semaphore is not available within a certain
amount of time, the requesting thread is made ready to run and an error code (indicating that a timeout has
occurred) is returned to the caller.
A thread releases a semaphore by performing a Signal () operation. If no thread is waiting for the
semaphore, the semaphore value is simply incremented. If any thread is waiting for the semaphore,
however, one of the threads is made ready to run and the semaphore value is not incremented; the key is
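given to one of the threads waiting for it. Depending on the kernel, the thread that receives the semaphore
is either:
- the highest priority thread waiting for the semaphore, or
- the first thread that requested the semaphore (first in, first out).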
The following listing shows how you can share data using a semaphore. Any thread needing access to the
same shared data calls OS_SemaphoreWait(), and when the thread is done with the data, the thread calls
OS_SemaphoreSignal(). Both of these functions are described later. You should note that a semaphore is
an object that needs to be initialized before it is used; for mutual exclusion, a semaphore is initialized to a
value of 1. Using a semaphore to access shared data doesn't affect interrupt latency. If an ISR or the
current thread makes a higher priority thread ready to run while accessing shared data, the higher priority
thread executes immediately.
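As a stand-in for that listing, the sketch below shows the acquire, access, release pattern. The
functions demo_sem_wait() and demo_sem_signal() are hypothetical placeholders for the kernel's
OS_SemaphoreWait() and OS_SemaphoreSignal(), whose real signatures are not given here; in this
single-threaded model a wait on an unavailable semaphore returns false where a real kernel would suspend
the caller.

```c
#include <stdbool.h>

typedef struct {
    unsigned count;
} demo_sem_t;

/* Placeholder for OS_SemaphoreWait(): takes the key if it is available. */
bool demo_sem_wait(demo_sem_t *sem)
{
    if (sem->count == 0) {
        return false;   /* a real kernel would block the caller here */
    }
    sem->count--;
    return true;
}

/* Placeholder for OS_SemaphoreSignal(): returns the key. */
void demo_sem_signal(demo_sem_t *sem)
{
    sem->count++;
}

demo_sem_t shared_data_sem = { 1 };   /* initialized to 1 for mutual exclusion */
int shared_counter;                   /* the shared data being protected */

/* Updates the shared data only while holding the semaphore. */
bool update_shared_counter(int delta)
{
    if (!demo_sem_wait(&shared_data_sem)) {
        return false;
    }
    shared_counter += delta;          /* exclusive access to shared data */
    demo_sem_signal(&shared_data_sem);
    return true;
}
```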
Semaphores are especially useful when threads share I/O devices. Imagine what would happen if two
threads were allowed to send characters to a printer at the same time. The printer would contain
interleaved data from each thread. For instance, the printout from Thread 1 printing "I am Thread 1!"
and Thread 2 printing "I am Thread 2!" could result in:
“I Ia amm T Threahread d1 !2!”
In this case, use a semaphore and initialize it to 1 (i.e., a binary semaphore). The rule is simple: to access
the printer each thread first must obtain the resource's semaphore.
The figure below shows threads competing for a semaphore to gain exclusive access to the printer. Note that
the semaphore is represented symbolically by a key, indicating that each thread must obtain this key to
use the printer.
Figure – Using a semaphore to get permission to access a printer
The above example implies that each thread must know about the existence of the semaphore in order to
access the resource. There are situations when it is better to encapsulate the semaphore. Each thread
would thus not know that it is actually acquiring a semaphore when accessing the resource. For example,
the UART port may be used by multiple threads to send commands and receive responses from a PC:
Note that, in this case, the semaphore is drawn as a flag to indicate that it is used to signal the occurrence
of an event (rather than to ensure mutual exclusion, in which case it would be drawn as a key). When used
as a synchronization mechanism, the semaphore is initialized to 0. Using a semaphore for this type of
synchronization is called a unilateral rendezvous. A thread initiates an I/O operation and waits for the
semaphore. When the I/O operation is complete, an ISR (or another thread) signals the semaphore and the
thread is resumed.
If the kernel supports counting semaphores, the semaphore would accumulate events that have not yet
been processed. Note that more than one thread can be waiting for an event to occur. In this case, the
kernel could signal the occurrence of the event either to:
- the highest priority thread waiting for the event to occur or
- the first thread waiting for the event.
Depending on the application, more than one ISR or thread could signal the occurrence of the event.
Two threads can synchronize their activities by using two semaphores, as shown in the figure below. This is called
a bilateral rendezvous. A bilateral rendezvous is similar to a unilateral rendezvous, except both threads
must synchronize with one another before proceeding.
Figure – Threads synchronizing their activities
For example, two threads are executing as shown in Listing below. When the first thread reaches a certain
point, it signals the second thread (1) then waits for a return signal (2). Similarly, when the second thread
reaches a certain point, it signals the first thread (3) and waits for a return signal (4). At this point, both
threads are synchronized with each other. A bilateral rendezvous cannot be performed between a thread
and an ISR because an ISR cannot wait on a semaphore:
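The listing does not appear in this copy of the text. The sketch below models the four steps with two
semaphores initialized to 0. It is a single-threaded model with hypothetical names: each arrive function
performs the signal and then attempts the wait, returning false where a real thread would block until
the other side signals.

```c
#include <stdbool.h>

typedef struct {
    unsigned count;
} flag_sem_t;

void flag_signal(flag_sem_t *s)
{
    s->count++;
}

/* Returns true if the event was pending and has been consumed. */
bool flag_try_wait(flag_sem_t *s)
{
    if (s->count == 0) {
        return false;
    }
    s->count--;
    return true;
}

flag_sem_t sem_for_thread1 = { 0 };   /* signalled by thread 2 */
flag_sem_t sem_for_thread2 = { 0 };   /* signalled by thread 1 */

/* Thread 1 reaches the rendezvous: (1) signal thread 2, (2) wait for it. */
bool thread1_arrive(void)
{
    flag_signal(&sem_for_thread2);
    return flag_try_wait(&sem_for_thread1);
}

/* Thread 2 reaches the rendezvous: (3) signal thread 1, (4) wait for it. */
bool thread2_arrive(void)
{
    flag_signal(&sem_for_thread1);
    return flag_try_wait(&sem_for_thread2);
}
```

If thread 1 arrives first, its wait fails until thread 2 arrives and signals; thread 2's own wait then
succeeds at once because thread 1 has already signalled.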
Interthread Communication
It is sometimes necessary for a thread or an ISR to communicate information to another thread. This
information transfer is called interthread communication. Information may be communicated between
threads in two ways: through global data or by sending messages.
When using global variables, each thread or ISR must ensure that it has exclusive access to the variables.
If an ISR is involved, the only way to ensure exclusive access to the common variables is to disable
interrupts. If two threads are sharing data, each can gain exclusive access to the variables either by
disabling and enabling interrupts or with the use of a semaphore (as we have seen). Note that a thread can
only communicate information to an ISR by using global variables. A thread is not aware when a global
variable is changed by an ISR, unless the ISR signals the thread by using a semaphore or unless the thread
polls the contents of the variable periodically.
To correct this situation, we should consider using either a message mailbox or a message queue.
Semaphores are useful either for synchronizing execution of multiple tasks or for coordinating access to a
shared resource. The following examples and general discussions illustrate using different types of
semaphores to address common synchronization design requirements effectively, as listed:
wait-and-signal synchronization,
multiple-task wait-and-signal synchronization,
credit-tracking synchronization,
single shared-resource-access synchronization,
recursive shared-resource-access synchronization, and
multiple shared-resource-access synchronization.
Note that, for the sake of simplicity, not all uses of semaphores are listed here. Also, later chapters of this
book contain more advanced discussions on the different ways that mutex semaphores can handle priority
inversion.
Wait-and-Signal Synchronization
Two tasks can communicate for the purpose of synchronization without exchanging data. For example, a
binary semaphore can be used between two tasks to coordinate the transfer of execution control, as shown
in figure below.
Multiple-Task Wait-and-Signal Synchronization
When coordinating the synchronization of more than two tasks, use the flush operation on the task-waiting list of a
binary semaphore, as shown in the figure below.
Figure: Wait-and-signal synchronization between multiple tasks.
As in the previous case, the binary semaphore is initially unavailable (value of 0). The higher priority tWaitTasks
1, 2, and 3 all do some processing; when they are done, they try to acquire the unavailable semaphore and, as a
result, block. This action gives tSignalTask a chance to complete its processing and execute a flush command on
the semaphore, effectively unblocking the three tWaitTasks.
Credit-Tracking Synchronization
Sometimes the rate at which the signaling task executes is higher than that of the signaled task. In this case, a
mechanism is needed to count each signaling occurrence. The counting semaphore provides just this facility. With a
counting semaphore, the signaling task can continue to execute and increment a count at its own pace, while the wait
task, when unblocked, executes at its own pace, as shown in figure below.
Again, the counting semaphore's count is initially 0, making it unavailable. The lower priority tWaitTask tries to
acquire this semaphore but blocks until tSignalTask makes the semaphore available by performing a release on it.
Even then, tWaitTask waits in the ready state until the higher priority tSignalTask eventually relinquishes
the CPU by making a blocking call or delaying itself.
Single Shared-Resource-Access Synchronization
One of the more common uses of semaphores is to provide for mutually exclusive access to a shared resource. A
shared resource might be a memory location, a data structure, or an I/O device, essentially anything that might have
to be shared between two or more concurrent threads of execution. A semaphore can be used to serialize access to a
shared resource, as shown in figure below.
Recursive Shared-Resource-Access Synchronization
Sometimes a developer might want a task to access a shared resource recursively. This situation might
exist if tAccessTask calls Routine A that calls Routine B, and all three need access to the same shared
resource, as shown in the figure below.
If a semaphore were used in this scenario, the task would end up blocking, causing a deadlock. When a routine is
called from a task, the routine effectively becomes a part of the task. When Routine A runs, therefore, it is running as
a part of tAccessTask. Routine A trying to acquire the semaphore is effectively the same as tAccessTask trying to
acquire the same semaphore. In this case, tAccessTask would end up blocking while waiting for the unavailable
semaphore that it already has.
One solution to this situation is to use a recursive mutex. After tAccessTask locks the mutex, the task owns it.
Additional attempts from the task itself or from routines that it calls to lock the mutex succeed. As a result, when
Routines A and B attempt to lock the mutex, they succeed without blocking.
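The ownership rule behind a recursive mutex can be sketched with an owner field and a lock count. This
is a single-threaded model with hypothetical names; a return value of false marks the point at which a
real kernel would block the calling task.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

typedef struct {
    int      owner;   /* task ID of the owner, or NO_OWNER */
    unsigned depth;   /* how many times the owner has locked it */
} recursive_mutex_t;

/* Succeeds if the mutex is free or already owned by the same task. */
bool rmutex_lock(recursive_mutex_t *m, int task_id)
{
    if (m->depth == 0) {
        m->owner = task_id;
        m->depth = 1;
        return true;
    }
    if (m->owner == task_id) {
        m->depth++;          /* nested lock by the owner: no blocking */
        return true;
    }
    return false;            /* owned by another task: caller would block */
}

/* Only the owner may unlock; the mutex is free once depth reaches 0. */
bool rmutex_unlock(recursive_mutex_t *m, int task_id)
{
    if (m->depth == 0 || m->owner != task_id) {
        return false;
    }
    m->depth--;
    if (m->depth == 0) {
        m->owner = NO_OWNER;
    }
    return true;
}
```

In the scenario above, tAccessTask, Routine A and Routine B each lock and unlock once, so the mutex is
released to other tasks only after the third unlock.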
Multiple Shared-Resource-Access Synchronization
For cases in which multiple equivalent shared resources are used, a counting semaphore comes in handy, as shown
in the figure below.
Note that this scenario does not work if the shared resources are not equivalent. The counting semaphore's count is
initially set to the number of equivalent shared resources: in this example, 2. As a result, the first two tasks
requesting a semaphore token are successful. However, the third task ends up blocking until one of the previous two
tasks releases a semaphore token.
Memory Management
Embedded systems developers commonly implement custom memory-management facilities on top of
what the underlying RTOS provides. Understanding memory management is therefore an important
aspect of developing for embedded systems.
Knowing the capability of the memory management system can aid application design and help avoid
pitfalls. For example, in many existing embedded applications, the dynamic memory allocation
routine, malloc, is called often. It can create an undesirable side effect called memory fragmentation. This
generic memory allocation routine, depending on its implementation, might impact an application's
performance. In addition, it might not support the allocation behavior required by the application.
Many embedded devices (such as PDAs, cell phones, and digital cameras) have a limited number of
applications (tasks) that can run in parallel at any given time, but these devices have small amounts of
physical memory onboard. Larger embedded devices (such as network routers and web servers) have
more physical memory installed, but these embedded systems also tend to operate in a more dynamic
environment, therefore making more demands on memory. Regardless of the type of embedded system,
the common requirements placed on a memory management system are minimal fragmentation, minimal
management overhead, and deterministic allocation time.
Dynamic Memory Allocation in Embedded Systems
It is known that the program code, program data, and system stack occupy the physical memory after
program initialization completes. Either the RTOS or the kernel typically uses the remaining physical
memory for dynamic memory allocation. This memory area is called the heap. Memory management in
the context of this chapter refers to the management of a contiguous block of physical memory, although
the concepts introduced in this chapter apply to the management of non-contiguous memory blocks as well. These
concepts also apply to the management of various types of physical memory. In general, a memory
management facility maintains internal information for a heap in a reserved memory area called the
control block. Typical internal information includes:
the starting address of the physical memory block used for dynamic memory allocation,
the overall size of this physical memory block, and
the allocation table that indicates which memory areas are in use, which memory areas are free,
and the size of each free region.
Memory Fragmentation and Compaction
In the example implementation, the heap is broken into small, fixed-size blocks. Each block has a unit
size that is a power of two to ease translating a requested size into the corresponding required number of
units. In this example, the unit size is 32 bytes. The dynamic memory allocation function, malloc, has an
input parameter that specifies the size of the allocation request in bytes. malloc allocates a larger block,
which is made up of one or more of the smaller, fixed-size blocks. The size of this larger memory block is
at least as large as the requested size; it is the smallest multiple of the unit size that can hold the request.
For example, if the allocation requests 100 bytes, the returned block has a size of 128 bytes
(4 units x 32 bytes/unit). As a result, the requestor does not use 28 bytes of the allocated memory. This
unused space is called memory fragmentation. This specific form of fragmentation is called internal
fragmentation because it is internal to the allocated block.
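The rounding arithmetic in this example can be written out in C. This is a small sketch under the stated assumption of a 32-byte unit; the function names are chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT_SIZE 32u  /* unit size from the example; must be a power of two */

/* Number of fixed-size units needed to hold a request of the given size. */
size_t units_needed(size_t request)
{
    return (request + UNIT_SIZE - 1u) / UNIT_SIZE;
}

/* Size of the block actually handed out: the request rounded up to whole units. */
size_t allocated_size(size_t request)
{
    return units_needed(request) * UNIT_SIZE;
}

/* Bytes inside the allocated block that the requestor does not use. */
size_t internal_fragmentation(size_t request)
{
    return allocated_size(request) - request;
}
```

For a 100-byte request these functions give 4 units, a 128-byte block and 28 bytes of internal fragmentation, matching the figures in the text.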
The allocation table can be represented as a bitmap, in which each bit represents a 32-byte unit. The figure
shows the states of the allocation table after a series of invocations of the malloc and free functions. In this
example, the heap is 256 bytes.
Figure: States of a memory allocation map.
Step 6 shows two free blocks of 32 bytes each. Step 7, instead of maintaining three separate free blocks,
shows that all three blocks are combined to form a 128-byte block. Because these blocks have been
combined, a future allocation request for 96 bytes should succeed.
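A bitmap allocation table of this kind can be sketched in C. The sketch assumes the 256-byte heap and 32-byte units of the example, uses a first-fit search, and asks the caller to pass the size back when freeing; these are simplifying assumptions, since a real malloc/free pair records the block size itself. The names bm_alloc and bm_free are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UNIT_SIZE 32u
#define NUM_UNITS 8u   /* 256-byte heap / 32-byte units */

static uint8_t alloc_map;  /* bit i set means unit i is in use */

static unsigned units_for(unsigned bytes)
{
    return (bytes + UNIT_SIZE - 1u) / UNIT_SIZE;
}

/* First fit: returns the index of the first unit of the block, or -1 on failure. */
int bm_alloc(unsigned bytes)
{
    unsigned need = units_for(bytes);
    if (need == 0 || need > NUM_UNITS)
        return -1;
    for (unsigned start = 0; start + need <= NUM_UNITS; start++) {
        uint8_t mask = (uint8_t)(((1u << need) - 1u) << start);
        if ((alloc_map & mask) == 0) {  /* every unit in the run is free */
            alloc_map |= mask;
            return (int)start;
        }
    }
    return -1;
}

/* Clears the bits of a block. Adjacent free units merge automatically,
 * because a free region is simply a run of zero bits. */
void bm_free(int start, unsigned bytes)
{
    unsigned need = units_for(bytes);
    assert(start >= 0 && (unsigned)start + need <= NUM_UNITS);
    uint8_t mask = (uint8_t)(((1u << need) - 1u) << (unsigned)start);
    alloc_map &= (uint8_t)~mask;
}
```

One consequence of the bitmap representation is that combining neighboring free blocks costs nothing extra at free time; the price is a linear scan of the map on every allocation.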
The figure below shows another example of the state of an allocation table. Note that two free 32-byte blocks
are shown. One block is at address 0x10080, and the other at address 0x101C0. Neither can be used for
any memory allocation request larger than 32 bytes. Because these isolated blocks do not contribute to
the contiguous free space needed for a large allocation request, their existence makes it more likely that a
large request will fail or take too long. The existence of these two trapped blocks is considered external
fragmentation because the fragmentation exists in the table, not within the blocks themselves. One way to
eliminate this type of fragmentation is to compact the area adjacent to these two blocks. The range of
memory content from address 0x100A0 (immediately following the first free block) to address 0x101BF
(immediately preceding the second free block) is shifted 32 bytes lower in memory, to the new range of
0x10080 to 0x1019F, which effectively combines the two free blocks into one 64-byte block. This new
free block is still considered memory fragmentation if future allocations are potentially larger than 64
bytes. Therefore, memory compaction continues until all of the free blocks are combined into one large
chunk.
Memory compaction is allowed if the tasks that own those memory blocks reference the blocks using
virtual addresses. Memory compaction is not permitted if tasks hold physical addresses to the allocated
memory blocks.
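Why indirection makes compaction safe can be illustrated with a handle table in C. This is only a sketch of the idea, not the virtual-address mechanism of any particular system: tasks hold an index into a hypothetical handles array rather than a raw pointer, so the manager can move a block and update one table entry. The sketch assumes the table entries are kept in ascending address order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE   256u
#define MAX_HANDLES 4u

static unsigned char heap[HEAP_SIZE];

/* One entry per block; tasks keep the index, never the address. */
typedef struct {
    size_t offset;  /* current position of the block inside heap[] */
    size_t size;    /* block size in bytes */
    int    in_use;  /* 0 means the block is free */
} handle_t;

static handle_t handles[MAX_HANDLES];  /* assumed sorted by ascending offset */

/* Translates a handle into the block's current address. */
void *handle_deref(unsigned h)
{
    assert(h < MAX_HANDLES && handles[h].in_use);
    return &heap[handles[h].offset];
}

/* Slides every in-use block toward the start of the heap and returns the
 * offset at which the single remaining free region begins. */
size_t heap_compact(void)
{
    size_t next = 0;
    for (unsigned i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].in_use)
            continue;
        if (handles[i].offset != next) {
            memmove(&heap[next], &heap[handles[i].offset], handles[i].size);
            handles[i].offset = next;  /* only the table entry changes */
        }
        next += handles[i].size;
    }
    return next;
}
```

A task that had cached the pointer returned by handle_deref before compaction would be left with a stale address, which is the situation the text rules out for tasks holding physical addresses.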
In many cases, memory management systems should also be concerned with architecture-specific
memory alignment requirements. Memory alignment refers to architecture-specific constraints imposed
on the address of a data item in memory. Many embedded processor architectures cannot access
multi-byte data items at an arbitrary address. For example, some architectures require multi-byte data
items, such as integers and long integers, to be allocated at addresses that are a multiple of a power of two
(typically the size of the item). On such architectures, unaligned memory accesses result in bus errors and
are a source of memory access exceptions.
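The alignment constraint can be checked and enforced with simple bit arithmetic, sketched below in C. The helper names are illustrative assumptions, and the helpers are valid only when the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when x is a power of two (1, 2, 4, 8, ...). */
int is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* True when addr is a multiple of alignment (alignment must be a power of two). */
int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(is_power_of_two(alignment));
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}

/* Rounds addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(is_power_of_two(alignment));
    return (addr + (uintptr_t)(alignment - 1)) & ~(uintptr_t)(alignment - 1);
}
```

An allocator that hands out blocks in 32-byte units from a 32-byte-aligned heap start satisfies any alignment requirement up to 32 bytes without extra work, which is one practical reason for choosing a power-of-two unit size.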
Some conclusions can be drawn from this example. An efficient memory manager needs to perform the
following chores quickly:
Determine if a free block that is large enough exists to satisfy the allocation request. This
work is part of the malloc operation.
Update the internal management information. This work is part of both
the malloc and free operations.
Determine if the just-freed block can be combined with its neighboring free blocks to form a
larger piece. This work is part of the free operation.
The structure of the allocation table is the key to efficient memory management because the structure
determines how the operations listed earlier must be implemented. The allocation table is part of the
overhead because it occupies memory space that is excluded from application use. Consequently, one
other requirement is to minimize the management overhead.
Chapter 4
VHDL stands for Very High-Speed Integrated Circuit Hardware Description Language. It is a hardware
description language used to model a digital system in the dataflow, behavioral and structural styles of
modeling. This language was first introduced in 1981 for the Department of Defense (DoD) under the VHSIC program.
- The language not only defines the syntax but also defines very clear simulation
semantics for each language construct.
- Quick Time-to-Market
- Concurrency.
- Supports Hierarchies.
Concurrency
- To ensure that the design is correct as per the specifications, the designer has to write
another program known as a "test bench".
- It generates a set of test vectors and sends them to the design under test (DUT).
- It also checks the responses made by the DUT against the specification to
verify the functionality.
- Example:
Supports Hierarchies:
- Example :
Levels of Abstraction:
- In this style of modeling the flow of data through the entity is expressed using
concurrent signal assignment statements.
Structural level
Behavioral level.
- This style of modeling specifies the behavior of an entity as a set of statements that are
executed sequentially in the specified order.
VHDL Identifiers:
- A basic identifier may contain only the letters 'A'-'Z' and 'a'-'z', the digits '0'-'9' and the underscore
character '_'.
Objects:
Type
Major Types
- Major types
- Composite Types
Scalar Types
Integer
- The maximum range of integer is tool dependent: type integer is range
implementation_defined.
- For example:
Floating Point:
Physical
Enumeration
- Example:
Composite Types
Array:
- The synthesis of multidimensional array depends upon the synthesizer being used.
Record:
type std_logic is ('U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-');
'U' uninitialized
'X' unknown
Alias:
- Syntax :
- Examples:
Signal Array:
- A set of signals may also be declared as a signal array which is a concatenated set of
signals.
- Example:
Subtype
- Useful for range checking and for imposing additional constraints on types.
Syntax:
Operators
2. relational operators:
3. shift operators:
4. adding operators:
6. multiplying operators:
7. miscellaneous operators:
Multi-Dimensional Arrays
Syntax
For example:
For synthesizers which do not accept multidimensional arrays, one can declare two
one-dimensional arrays.
For example:
Dataflow Level
8. A Dataflow model specifies the functionality of the entity without explicitly specifying
its structure.
9. This functionality shows the flow of information through the entity, which is expressed
primarily using concurrent signal assignment statements and block statements.
10. The primary mechanism for modeling the dataflow behavior of an entity is using the
concurrent signal assignment statement.
Entity
12. The interconnections of the design unit with the external world are enumerated.
entity <entity_name> is
port (
…………………….
);
end <entity_name>;
15. These modes describe the different kinds of interconnections that the port can have with
the external circuitry.
entity andgate is
port (
a : in bit;
b : in bit
);
end andgate;
Architecture:
entity andgate is
port (
x, y : in std_logic;
z : out std_logic
);
end andgate;
architecture arc_andgate of andgate is
begin
z <= x and y;
end arc_andgate;
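# Write the VHDL code for half adder circuit
Library ieee;
use ieee.std_logic_1164.all;
entity half_adder is
port (
a, b : in std_logic;
c, s : out std_logic
);
end half_adder;
architecture arc_half_adder of half_adder is
begin
-- The sum is the XOR of the inputs and the carry is their AND.
s <= a xor b;
c <= a and b;
end arc_half_adder;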
Signals
23. Syntax: signal <list_of_signal_names> : <type> := <initial_value>;
24. Equivalent to wires.
25. Connect design entities together and communicate changes in values within a design.
26. A computed value is assigned to a signal after the specified delay, or after an infinitesimal delay
called the delta delay if no delay is specified.
27. Signals can be declared in an entity (visible to all of its architectures), in an
architecture (local to the architecture), in a package (globally available to the users of the
package) or as a parameter of a subprogram (i.e. a function or procedure).
28. Signals have three properties attached to them:
type and type attributes, value, and time (a signal has a history).
29. Signal assignment is done using the assignment operator '<='.
30. Signal assignment is concurrent outside a process and sequential within a process.
# Write the VHDL code for full adder circuit.
Library ieee;
use ieee.std_logic_1164.all;
Entity full_adder is
Port(
a, b, c: in std_logic;
carry, sum : out std_logic
);
End full_adder;
architecture arc_full_adder of full_adder is
signal x, y, z : std_logic;
begin
x<= a xor b;
sum<= x xor c;
y<= x and c;
z<= a and b;
carry <= y or z;
end arc_full_adder;
Structural Modeling:
31. An entity is modeled as a set of components connected by signals, that is, as a netlist.
32. The behavior of the entity is not explicitly apparent from its model.
33. The component instantiation statement is the primary mechanism used for describing such a
model of an entity.
34. A component instantiated in a structural description must first be declared using a component
declaration.
35. A larger design entity can call a smaller design unit in it.
36. This forms a hierarchical structure.
37. This is allowed by a feature of VHDL called component instantiation.
38. A component is a design entity in itself which is instantiated in the larger entity.
40. Syntax:
component <component_name >
port (
<port_name>: <mode> <type>;
…………………………………
);
end component;
Library ieee;
use ieee.std_logic_1164.all;
entity and3gate is
port
(
o : out std_logic;
i1 : in std_logic;
i2 : in std_logic;
i3 : in std_logic
);
end and3gate;
Library ieee;
use ieee.std_logic_1164.all;
entity andgate is
port
( c : out std_logic;
a : in std_logic;
b : in std_logic
);
end andgate;
architecture arch_andgate of andgate is
begin
c <= a and b;
end arch_andgate;
Library ieee;
use ieee.std_logic_1164.all;
entity xorgate is
port
( c : out std_logic;
a : in std_logic;
b : in std_logic
);
end xorgate;
architecture arch_xorgate of xorgate is
begin
c <= a xor b;
end arch_xorgate;
Library ieee;
use ieee.std_logic_1164.all;
entity halfadder is
port
( carry : out std_logic;
sum : out std_logic;
a : in std_logic;
b : in std_logic
);
end halfadder;
architecture arch_halfadder of halfadder is
component andgate
port
(
a, b : in std_logic;
c : out std_logic
);
end component;
component xorgate
port
(
a, b : in std_logic;
c : out std_logic
);
end component;
begin
-- Named association avoids depending on the port order of the components.
U0: andgate port map (a => a, b => b, c => carry);
U1: xorgate port map (a => a, b => b, c => sum);
end arch_halfadder;
# Write the VHDL code to design the full adder circuit using gates as the component.
#Write a VHDL code to implement the full adder using two half adder.
Library ieee;
use ieee.std_logic_1164.all;
entity halfadder is
port
( s, c : out std_logic;
x : in std_logic;
y : in std_logic
);
end halfadder;
architecture arch_halfadder of halfadder is
begin
s <= x xor y;
c <= x and y;
end arch_halfadder;
Library ieee;
use ieee.std_logic_1164.all;
entity fulladder is
port
( a, b, c : in std_logic;
sum, carry : out std_logic
);
end fulladder;
architecture arch_fulladder of fulladder is
component halfadder
port
( x, y : in std_logic;
s, c : out std_logic
);
end component;
signal b1, b2, b3 : std_logic;
begin
-- U1 adds b and c: partial sum b2, carry b1.
U1: halfadder port map (b, c, b2, b1);
-- U2 adds the partial sum and a: final sum, carry b3.
U2: halfadder port map (b2, a, sum, b3);
-- A carry from either half adder produces the final carry.
carry <= b1 or b3;
end arch_fulladder;
# Write the VHDL code for 4-bit binary parallel adder.
Library ieee;
use ieee.std_logic_1164.all;
entity fulladder is
port
( sum, carry : out std_logic;
a, b, c : in std_logic
);
end fulladder;
architecture arc_fulladder of fulladder is
signal x, y, z : std_logic;
begin
x <= a xor b;
sum <= x xor c;
y <= x and c;
z <= a and b;
carry <= y or z;
end arc_fulladder;
Library ieee;
use ieee.std_logic_1164.all;
entity BPA4 is
port
( A, B : in std_logic_vector(3 downto 0);
Sum_bpa : out std_logic_vector(3 downto 0);
Cin : in std_logic;
Cout : out std_logic
);
end BPA4;
architecture arc_BPA4 of BPA4 is
component fulladder
port
( sum, carry : out std_logic;
a, b, c : in std_logic
);
end component;
signal c1, c2, c3 : std_logic; -- carries between adjacent stages
begin
-- Positional order follows the component declaration: sum, carry, a, b, c.
U0: fulladder port map (Sum_bpa(0), c1, A(0), B(0), Cin);
U1: fulladder port map (Sum_bpa(1), c2, A(1), B(1), c1);
U2: fulladder port map (Sum_bpa(2), c3, A(2), B(2), c2);
U3: fulladder port map (Sum_bpa(3), Cout, A(3), B(3), c3);
end arc_BPA4;
Concatenation
43. This is the process of combining two signals into a single set which can be individually addressed.
44. The concatenation operator is ‘&’.
45. A concatenated signal’s value is written in double quotes whereas the value of a single bit signal
is written in single quotes.
Decision-making statements:
If statements:
if (expression) then
S1
elsif (expression) then
S2
elsif (expression) then
S3
elsif (expression) then
S4
………………..
…………………
…………………
elsif (expression) then
Sn
else
Sn+1
end if;
With-Select
49. Example:
entity mux2 is
port
( i0, i1 : in bit_vector(1 downto 0);
y : out bit_vector(1 downto 0);
sel : in bit
);
end mux2;
architecture behaviour of mux2 is
begin
with sel select y <= i0 when '0',
i1 when '1';
end behaviour;
When-Else
50. syntax :
Signal_name<= expression1 when condition1
else expression2 when condition2
else expression3;
51. Example:
Library ieee;
use ieee.std_logic_1164.all;
entity tri_state is
port
( a, enable : in std_logic;
b : out std_logic
);
end tri_state;
architecture beh of tri_state is
begin
b <= a when enable = '1'
else 'Z';
end beh;
Library ieee;
use ieee.std_logic_1164.all;
entity decoder is
port
( SW : in std_logic_vector (1 downto 0);
Q : out std_logic_vector (3 downto 0)
);
end decoder;
architecture arc_decoder of decoder is
begin
-- An if statement is sequential, so it must be placed inside a process.
process (SW)
begin
if (SW = "00") then
Q <= "0001";
elsif (SW = "01") then
Q <= "0010";
elsif (SW = "10") then
Q <= "0100";
else
Q <= "1000";
end if;
end process;
end arc_decoder;
Library ieee;
use ieee.std_logic_1164.all;
entity encoder is
port
( Q : out std_logic_vector (1 downto 0);
D : in std_logic_vector (3 downto 0)
);
end encoder;
architecture arc_encoder of encoder is
begin
-- Each one-hot input pattern is mapped to the index of its set bit.
process (D)
begin
if (D = "0001") then
Q <= "00";
elsif (D = "0010") then
Q <= "01";
elsif (D = "0100") then
Q <= "10";
else
Q <= "11";
end if;
end process;
end arc_encoder;
Library ieee;
use ieee.std_logic_1164.all;
entity MUX is
port
( Q : out std_logic;
I0, I1, I2, I3 : in std_logic;
SL : in std_logic_vector (1 downto 0)
);
end MUX;
architecture arc_MUX of MUX is
begin
-- The process is sensitive to the select lines and to every data input.
process (SL, I0, I1, I2, I3)
begin
if (SL = "00") then
Q <= I0;
elsif (SL = "01") then
Q <= I1;
elsif (SL = "10") then
Q <= I2;
else
Q <= I3;
end if;
end process;
end arc_MUX;
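Library ieee;
use ieee.std_logic_1164.all;
entity DMUX is
port
( Din : in std_logic;
Y0, Y1, Y2, Y3 : out std_logic;
SL : in std_logic_vector (1 downto 0)
);
end DMUX;
architecture arc_DMUX of DMUX is
begin
process (SL, Din)
begin
-- Default values keep the unselected outputs at '0' and avoid inferred latches.
Y0 <= '0';
Y1 <= '0';
Y2 <= '0';
Y3 <= '0';
if (SL = "00") then
Y0 <= Din;
elsif (SL = "01") then
Y1 <= Din;
elsif (SL = "10") then
Y2 <= Din;
else
Y3 <= Din;
end if;
end process;
end arc_DMUX;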
Do it yourself
Process Statement:
[label:] process [(sensitivity_list)]
process_declarations
begin
sequential_statements
end process [label];
- The process statement represents the behavior of some portion of the design. It consists of
sequential statements that are executed in the order defined by the user.
- Each process can be assigned an optional label.
- The process declarative part defines local items for the process and may contain declarations of:
subprograms, types, subtypes, constants, variables, files, aliases, attributes, use clauses and group
declarations. It is not allowed to declare signals or shared variables inside processes.
- The statements, which describe the behavior in a process, are executed sequentially, in the order
in which the designer specifies them. The execution of statements, however, does not terminate
with the last statement in the process, but is repeated in an infinite loop. The loop
can be suspended and resumed with wait statements. When the next statement to be
executed is a wait statement, the process suspends its execution until a condition supporting the
wait statement is met. See respective topics for details.
- A process statement may contain an optional sensitivity list. The list contains identifiers of signals
to which the process is sensitive. A change of a value of any of those signals causes the
suspended process to resume. A sensitivity list is a full equivalent of a wait on sensitivity_list
statement at the end of the process. It is not allowed, however, to use wait statements and
sensitivity list in the same process. In addition, if a process with a sensitivity list calls a
procedure, then the procedure cannot contain any wait statements.
Sequential Circuits-Gated D Latch
Positive-edge-triggered D Flip-Flop
VHDL Code for a D Flip-Flop with Asynchronous Reset
VHDL Code for a T Flip Flop