Assembly language fundamentals
Chapter four
Outline
Introduction to Assembly Language
Basic Elements of Assembly Language
Assembly, Machine, and High-Level Languages
Defining Data Types
Assembly Directives
Assembly Language Programming Tools
Introduction
Levels of Programming Languages
1) Machine Language
Consists of individual instructions that are executed by the CPU one at a time
2) Assembly Language (Low Level Language)
Designed for a specific family of processors (different processor families have different assembly languages)
Consists of symbolic instructions that correspond one-for-one to machine language instructions and are assembled into machine language.
3) High Level Languages
e.g., C, C++, and Visual Basic
Designed to eliminate the technicalities of a particular computer.
A single statement in a high-level language typically compiles into many low-level instructions.
HLL programs are largely machine independent.
They are easy to learn, easy to use, and convenient for managing complex tasks.
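To make the one-for-one relationship between assembly and machine language concrete, here is a small Python sketch that encodes a single IA-32 instruction form, mov r32, imm32. Its encoding is opcode B8h plus the register number, followed by the 32-bit immediate in little-endian order. The helper names encode_mov_r32_imm32 and hex_bytes are hypothetical. The sketch assumes 32-bit code and handles only this one form, so it is not an assembler.

```python
# Register numbers used in the "B8h + register" opcode of mov r32, imm32
REG32_CODES = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
               "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

def encode_mov_r32_imm32(register: str, value: int) -> bytes:
    """Encode `mov r32, imm32` (32-bit code only, hypothetical helper)."""
    opcode = 0xB8 + REG32_CODES[register.upper()]
    # The immediate operand follows the opcode, least significant byte first
    return bytes([opcode]) + (value & 0xFFFFFFFF).to_bytes(4, "little")

def hex_bytes(data: bytes) -> str:
    """Format machine code bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# One assembly statement becomes exactly one machine instruction
print(hex_bytes(encode_mov_r32_imm32("eax", 0x10000)))  # B8 00 00 01 00
```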
Advantages of Assembly Language
1. Shows how a program interfaces with the processor, operating system, and BIOS (Basic Input/Output System).
2. Shows how data is represented and stored in memory and on external devices.
3. Clarifies how the processor accesses and executes instructions and how instructions access and process data.
4. Clarifies how a program accesses external devices.
Reasons for using Assembly Language
1. A program written in assembly language can require considerably less memory and execution time than one written in a high-level language.
2. Assembly language is useful for implementing system software and for small embedded-system applications.
3. Assembly language gives a programmer the ability to perform highly technical tasks that would be difficult, if not impossible, in a high-level language.
4. Although most software specialists develop new applications in high-level languages, which are easier to write and maintain, a common practice is to recode in assembly language those sections that are time-critical.
5. Resident programs (that reside in memory while
Assembly vs HLL
Basic Elements of Assembly Language
Integer constants
Integer expressions
Character and string constants
Reserved words and identifiers
Directives and instructions
Labels
Mnemonics and Operands
Comments
Examples
Integer Constants
Optional leading + or – sign
binary, decimal, hexadecimal, or octal digits
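Common radix characters (decimal is assumed when none is given):
h – hexadecimal
d – decimal
b – binary
q or o – octal
r – encoded real
Examples: 30d, 6Ah, 42, 1101b
A hexadecimal constant beginning with a letter needs a leading zero, as in 0A5h; otherwise the assembler reads it as an identifier.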
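The radix rules above can be checked with a short Python sketch that converts MASM-style integer constants to values. parse_masm_int is a hypothetical helper. It assumes the default radix is decimal and ignores encoded reals (the r suffix).

```python
# Radix suffixes understood by this sketch (q and o both mean octal)
RADIX_SUFFIXES = {"h": 16, "d": 10, "b": 2, "q": 8, "o": 8}

def parse_masm_int(token: str) -> int:
    """Convert a MASM-style integer constant such as 30d, 6Ah, 42 or 1101b."""
    text = token.strip()
    sign = 1
    if text and text[0] in "+-":          # optional leading sign
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if not text:
        raise ValueError(f"{token!r}: empty constant")
    suffix = text[-1].lower()
    if suffix in RADIX_SUFFIXES:
        digits, radix = text[:-1], RADIX_SUFFIXES[suffix]
    else:
        digits, radix = text, 10          # no suffix: assumed default radix
    allowed = "0123456789abcdef"[:radix]
    if not digits or any(c not in allowed for c in digits.lower()):
        raise ValueError(f"{token!r}: invalid digits for radix {radix}")
    if not digits[0].isdigit():
        # A5h would be read as an identifier, so a leading 0 is required
        raise ValueError(f"{token!r}: constant must begin with a digit")
    return sign * int(digits, radix)

for constant in ("30d", "6Ah", "42", "1101b", "0A5h"):
    print(constant, "=", parse_masm_int(constant))
```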
Integer Expressions
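Operators and precedence levels (1 = highest):
( ) parentheses: level 1
+, – unary plus, minus: level 2
*, / multiply, divide: level 3
MOD modulus: level 3
+, – add, subtract: level 4
Examples:
16 / 5 = 3
-(3 + 4) * (6 - 1) = -35
-3 + 4 * 6 - 1 = 20
25 mod 3 = 1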
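The way the assembler applies operator precedence to an integer expression can be sketched as a small recursive-descent evaluator in Python. It uses the simplified four-level table commonly taught for MASM: parentheses, then unary + and -, then *, / and MOD, then binary + and -. The sketch accepts decimal literals only. It also assumes that / and MOD truncate toward zero, C-style, and that assumption matters only for negative operands. The names tokenize, Parser and evaluate are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|MOD\b|[-+*/()])", re.IGNORECASE)

def tokenize(expr):
    tokens, pos, expr = [], 0, expr.rstrip()
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected text: {expr[pos:]!r}")
        tokens.append(match.group(1).upper())
        pos = match.end()
    return tokens

def trunc_div(a, b):
    # Assumption: integer division truncates toward zero (C-style)
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.i += 1
        return token

    def expression(self):            # level 4 (lowest): binary + and -
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):                  # level 3: *, / and MOD
        value = self.unary()
        while self.peek() in ("*", "/", "MOD"):
            op = self.take()
            rhs = self.unary()
            if op == "*":
                value *= rhs
            elif op == "/":
                value = trunc_div(value, rhs)
            else:                    # MOD: remainder of truncating division
                value -= rhs * trunc_div(value, rhs)
        return value

    def unary(self):                 # level 2: unary + and -
        if self.peek() in ("+", "-"):
            op = self.take()
            value = self.unary()
            return -value if op == "-" else value
        return self.primary()

    def primary(self):               # level 1 (highest): parentheses, literals
        token = self.take()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise ValueError("expected )")
            return value
        if token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token {token!r}")

def evaluate(expr):
    parser = Parser(tokenize(expr))
    value = parser.expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value

for text in ("16 / 5", "-(3 + 4) * (6 - 1)", "-3 + 4 * 6 - 1", "25 mod 3"):
    print(f"{text:<20} = {evaluate(text)}")
```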
Character and String Constants
Enclose a character in single or double quotes
'A', "x"
ASCII character = 1 byte
Enclose strings in single or double quotes
"ABC"
'xyz'
Each character occupies a single byte
Embedded quotes:
'Say "Goodnight," Gracie'
Reserved Words and Identifiers
Reserved words cannot be used as identifiers
Instruction mnemonics (such as MOV, ADD, and MUL), directives, type attributes, operators, predefined symbols
Identifiers
1-247 characters, including digits
case insensitive (by default)
first character must be a letter, _, @, or $
Examples: var1, Count, $first, _main, MAX ,
open_file, xVal
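The identifier rules above translate directly into a regular expression. This Python sketch checks a name against them. The reserved-word set is a small illustrative sample, not MASM's full list, and is_valid_identifier is a hypothetical helper.

```python
import re

# First character: letter, _, @ or $; later characters may also be digits.
# {0,246} limits the total length to 247 characters.
IDENTIFIER_RE = re.compile(r"[A-Za-z_@$][A-Za-z0-9_@$]{0,246}")

# Illustrative subset only; MASM reserves many more words.
SAMPLE_RESERVED_WORDS = {"MOV", "ADD", "MUL", "DWORD", "BYTE",
                         "PROC", "ENDP", "END", "EAX", "AX"}

def is_valid_identifier(name: str) -> bool:
    if not IDENTIFIER_RE.fullmatch(name):
        return False
    # Compare case-insensitively, as MASM does by default
    return name.upper() not in SAMPLE_RESERVED_WORDS

for name in ("var1", "Count", "$first", "_main", "1var", "mov"):
    print(name, is_valid_identifier(name))
```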
Directives
Commands that are recognized and acted upon by the
assembler
Not part of the Intel instruction set
Used to declare code, data areas, select memory model,
declare procedures, etc.
E.g., myVar DWORD 26 ; DWORD directive
mov eax,myVar ; MOV instruction
Different assemblers have different directives
NASM's directives differ from MASM's, for example
Directives
In MASM, directives are case insensitive.
Different types of directives
Defining Segments: one important function of assembler directives is to define program sections, or segments.
.DATA directive identifies the area of a program containing variables:
.data
.CODE directive identifies the area of a program containing
instructions:
.code
.STACK directive identifies the area of a program holding the runtime stack, setting its
size:
.stack 1000h
Directives
PROC: directive identifies the beginning of a procedure.
ENDP: directive marks the end of the procedure.
END: directive marks the last line of the program to be assembled. It identifies the name of the program's startup procedure (the procedure that starts program execution). Procedure main is the startup procedure.
TITLE: directive marks the entire line as a comment.
.MODEL: directive selects the memory model. In .model flat, stdcall, FLAT instructs the assembler to generate code for a protected-mode program, and STDCALL enables the calling of MS-Windows functions.
Memory models include flat and small.
Instructions
An instruction is a statement that becomes executable when a program is assembled.
We use the Intel IA-32 instruction set.
Instructions are translated by the assembler into machine language bytes, which are loaded and executed by the CPU at run time.
An instruction contains: Label, Mnemonic, Operand, Comment
Syntax:
[label] mnemonic(opcode) operand(s) [;comment]
label: optional
instruction mnemonic: required, such as MOV, ADD, SUB, MUL
operands: usually required
comment: optional
The two major fields are:
Opcode field, which stands for operation code and specifies the particular operation to be performed. Each operation has its unique opcode.
Operand fields, which specify where to get the source and destination operands for the operation specified by the opcode. The source/destination of an operand can be a constant, memory, or one of the general-purpose registers.
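The statement syntax [label] mnemonic operand(s) [;comment] can be illustrated with a Python sketch that splits one source line into those fields. split_statement is a hypothetical helper with several simplifications. It recognizes only code labels (an identifier followed by a colon), so a data label such as count in "count DWORD 100" would be reported as the mnemonic. It also assumes that no ';' or ',' appears inside a quoted string.

```python
import re
from typing import List, NamedTuple, Optional

class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: Optional[str]
    operands: List[str]
    comment: Optional[str]

# A code label is an identifier followed by a colon, e.g. "L1:"
CODE_LABEL_RE = re.compile(r"([A-Za-z_@$][\w@$]*)\s*:(.*)")

def split_statement(line: str) -> Statement:
    code, has_comment, comment = line.partition(";")   # comment starts at ';'
    code = code.strip()
    label = None
    match = CODE_LABEL_RE.match(code)
    if match:
        label, code = match.group(1), match.group(2).strip()
    parts = code.split(None, 1)                        # mnemonic, then operands
    mnemonic = parts[0] if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    return Statement(label, mnemonic, operands,
                     comment.strip() if has_comment else None)

print(split_statement("L1: mov ax,bx ; copy BX into AX"))
print(split_statement("add eax,36 * 25"))
```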
Labels
Act as place markers
mark the address (offset) of code and data
Follow identifier rules
Data label
must be unique
example: count DWORD 100
(not followed by colon)
Code label
target of jump and loop instructions
example: L1: (followed by colon)
L1: MOV ax, bx
...
JMP L1
Mnemonics and Operands
Instruction Mnemonics
a "reminder" (memory aid) for what the instruction does
examples: MOV, ADD, SUB, MUL, INC, DEC
Operands
constant (immediate value, e.g., 4)
constant expression (2*4)
register (eax, ax)
memory (data label)
Comments
Comments are good!
explain the program's purpose
when it was written, and by whom
revision information
tricky coding techniques
application-specific explanations
Single-line comments
begin with semicolon (;)
Multi-line comments
begin with COMMENT directive and a programmer-chosen character
end with the same programmer-chosen character
Example:
COMMENT ! This is a comment.
This line is also a comment. !
Instruction Format Examples
No operands
stc ; set Carry flag
One operand
inc eax ; register
inc myByte ; memory
Two operands
add ebx,ecx ; register, register
sub myByte,25 ; memory, constant
add eax,36 * 25 ; register, constant-expression
Suggested Coding Standards
Some approaches to capitalization
capitalize nothing
capitalize everything
capitalize all reserved words, including instruction mnemonics and
register names
capitalize only directives and operators
Other suggestions
descriptive identifier names
spaces surrounding arithmetic operators
blank lines between procedures
Suggested Coding Standards
Indentation and spacing
code and data labels – no indentation
executable instructions – indent 4-5 spaces
comments: begin at column 40-45, aligned vertically
1-3 spaces between instruction and its operands
ex: mov ax,bx
1-2 blank lines between procedures
Program Template
TITLE Program Template ([Link])
; Program Description:
; Author:
; Creation Date:
; Revisions:
; Date:              Modified by:

INCLUDE [Link]

.data
    ; (insert variables here)

.code
main PROC
    ; (insert executable instructions here)
    exit                    ; exit to operating system
main ENDP
    ; (insert additional procedures here)
END main
Example: Adding and Subtracting Integers
TITLE Add and Subtract ([Link])
; This program adds and subtracts 32-bit integers.

INCLUDE [Link]

.code
main PROC
    mov eax,10000h          ; EAX = 10000h
    add eax,40000h          ; EAX = 50000h
    sub eax,20000h          ; EAX = 30000h
    call DumpRegs           ; display registers (EAX=00030000)
    exit
main ENDP
END main
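To trace what this program does to EAX, here is a minimal Python model of a 32-bit register executing mov, add and sub. Results wrap modulo 2^32 as a 32-bit register does, and the result is printed in DumpRegs-style 8-digit hex. The function run and the tuple-based program format are hypothetical, and CPU flags are not modelled.

```python
MASK32 = 0xFFFFFFFF  # EAX holds 32 bits; results wrap around modulo 2**32

def run(program):
    """Execute (mnemonic, immediate) pairs against a single EAX register."""
    eax = 0
    for mnemonic, operand in program:
        if mnemonic == "mov":
            eax = operand & MASK32
        elif mnemonic == "add":
            eax = (eax + operand) & MASK32
        elif mnemonic == "sub":
            eax = (eax - operand) & MASK32
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return eax

add_sub = [("mov", 0x10000), ("add", 0x40000), ("sub", 0x20000)]
print(f"EAX={run(add_sub):08X}")  # EAX=00030000
```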
Assembly Language Programming Tools
Software tools are needed for editing, assembling, linking, and
debugging assembly language programs
An assembler is a program that converts source-code
programs written in assembly language into object files in
machine language
Popular assemblers include:
TASM (Turbo Assembler from Borland)
NASM (Netwide Assembler, for both Windows and Linux)
GNU assembler (GAS), distributed by the Free Software Foundation
MASM (Microsoft Macro Assembler)
Linker and Link Libraries
You need a linker program to produce executable files
It combines your program's object file created by the assembler with
other object files and link libraries, and produces a single executable
program
[Link] is the linker program provided with the MASM distribution
for linking 32-bit programs
We will also use a link library for input and output, called [Link], developed by Kip Irvine
Works in Win32 console mode under MS-Windows
Assemble and Link Process
Figure: Source file → Assembler → Object file (once for each source file); all object files + link libraries → Linker → Executable file
A project may consist of multiple source files
Assembler translates each source file separately into an object file
Linker links all object files together with link libraries
Debugger
Allows you to trace the execution of a program
Allows you to view code, memory, registers, etc.
Example: 32-bit Windows debugger
Editor
Allows you to create assembly language source files
Some editors provide syntax highlighting features and
can be customized as a programming environment
Examples: Notepad, Visual Studio 2010 C++ Express
Defining Data
Intrinsic Data Types
Data Definition Statement
Defining BYTE and SBYTE Data
Defining WORD and SWORD Data
Defining DWORD and SDWORD Data
Defining QWORD Data
Defining TBYTE Data
Defining Real Number Data
Little Endian Order
Declaring Uninitialized Data
Intrinsic Data Types
BYTE, SBYTE: 8-bit unsigned integer; 8-bit signed integer
WORD, SWORD: 16-bit unsigned & signed integer
DWORD, SDWORD: 32-bit unsigned & signed integer
QWORD: 64-bit integer
TBYTE: 80-bit integer
REAL4: 4-byte IEEE short real
REAL8: 8-byte IEEE long real
REAL10: 10-byte IEEE extended real
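The storage size of each type determines the range of integers it can hold. This Python sketch records the sizes from the list above and computes the ranges of the signed and unsigned integer types. integer_range is a hypothetical helper, and QWORD, TBYTE and the REAL types are deliberately left out of the range calculation.

```python
# Storage size in bytes for each intrinsic type
TYPE_SIZES = {
    "BYTE": 1, "SBYTE": 1, "WORD": 2, "SWORD": 2,
    "DWORD": 4, "SDWORD": 4, "QWORD": 8, "TBYTE": 10,
    "REAL4": 4, "REAL8": 8, "REAL10": 10,
}
SIGNED = {"SBYTE", "SWORD", "SDWORD"}
UNSIGNED = {"BYTE", "WORD", "DWORD"}

def integer_range(type_name):
    """Return (smallest, largest) value for an 8/16/32-bit integer type."""
    name = type_name.upper()
    if name not in SIGNED | UNSIGNED:
        raise ValueError(f"{type_name}: range not modelled here")
    bits = 8 * TYPE_SIZES[name]
    if name in SIGNED:                       # two's complement range
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

for t in ("BYTE", "SBYTE", "WORD", "SWORD", "DWORD", "SDWORD"):
    low, high = integer_range(t)
    print(f"{t:<7}{TYPE_SIZES[t]} bytes  {low} .. {high}")
```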
Data Definition Statement
A data definition statement sets aside storage in memory for a variable.
May optionally assign a name (label) to the data
Syntax:
[name] directive initializer [,initializer] ...
Example: value1 BYTE 10
All initializers become binary data in memory
Defining BYTE and SBYTE Data
Defining Byte Arrays
Examples: use multiple initializers
list1 BYTE 10, 20, 30, 40
Defining Strings
A string is implemented as an array of characters
For convenience, it is usually enclosed in quotation marks
It is often null-terminated (ends with a 0 byte). Strings of this type are used in C and C++ programs.
Examples:
str1 BYTE "Enter your name", 0
str2 BYTE 'Error: halting program', 0
str3 BYTE 'A','E','I','O','U'
greeting BYTE "Welcome to the Encryption program "
         BYTE "created by Kip Irvine.", 0
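To see what these string definitions place in memory, here is a Python sketch that models a BYTE definition as a byte sequence. It also includes a C-style length function that scans for the terminating 0. define_bytes and c_strlen are hypothetical helpers, and integer initializers are assumed to lie in 0..255.

```python
def define_bytes(*initializers) -> bytes:
    """Model a BYTE definition: a string adds one byte per character,
    an integer (assumed 0..255) adds one byte."""
    memory = bytearray()
    for item in initializers:
        if isinstance(item, str):
            memory += item.encode("ascii")
        else:
            memory.append(item)          # raises ValueError outside 0..255
    return bytes(memory)

def c_strlen(memory: bytes, offset: int = 0) -> int:
    """Count bytes up to (not including) the first 0 byte, like C's strlen."""
    return memory.index(0, offset) - offset

str1 = define_bytes("Enter your name", 0)
str3 = define_bytes('A', 'E', 'I', 'O', 'U')
# Two consecutive BYTE lines occupy consecutive memory, forming one string
greeting = (define_bytes("Welcome to the Encryption program ")
            + define_bytes("created by Kip Irvine.", 0))
print(len(str1), c_strlen(str1))   # 16 15
print(str3)                        # b'AEIOU'
print(c_strlen(greeting))
```

Because str3 has no terminating 0, c_strlen(str3) raises ValueError here, just as a real scan would run past the end of the array.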
Using the DUP Operator
Use DUP to allocate (create space for) an array or string.
Syntax:
counter DUP ( argument )
Counter and argument must be constants or constant
expressions
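Examples:
BYTE 20 DUP(0)        ; 20 bytes, all equal to zero
BYTE 20 DUP(?)        ; 20 bytes, uninitialized
BYTE 4 DUP("STACK")   ; 20 bytes: "STACKSTACKSTACKSTACK"
BYTE 10,3 DUP(0),20   ; 5 bytes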
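The DUP operator simply repeats its argument's bytes counter times. A Python model makes the resulting storage size easy to check. expand and dup are hypothetical helpers, and None stands in for the uninitialized ? initializer.

```python
UNINITIALIZED = None  # stands for the ? initializer

def expand(*initializers) -> list:
    """Flatten initializers into one list of byte values (None = '?')."""
    values = []
    for item in initializers:
        if isinstance(item, str):
            values.extend(item.encode("ascii"))   # one byte per character
        elif isinstance(item, list):
            values.extend(item)                   # result of a nested DUP
        else:
            values.append(item)
    return values

def dup(count: int, *argument) -> list:
    """Model `count DUP(argument)`: repeat the argument's bytes count times."""
    if count < 0:
        raise ValueError("DUP count must not be negative")
    return expand(*argument) * count

zeros = dup(20, 0)                   # BYTE 20 DUP(0)
blank = dup(20, UNINITIALIZED)       # BYTE 20 DUP(?)
stack = dup(4, "STACK")              # BYTE 4 DUP("STACK")
mixed = expand(10, dup(3, 0), 20)    # BYTE 10,3 DUP(0),20
print(len(zeros), len(blank), len(stack), len(mixed))   # 20 20 20 5
print(bytes(stack))                  # b'STACKSTACKSTACKSTACK'
```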
Defining WORD and SWORD Data
Define storage for 16-bit integers: a single value or multiple values
Defining DWORD and SDWORD Data
Storage definitions for signed and unsigned 32-bit integers
Defining QWORD, TBYTE, Real Number Data
Storage definitions for quadwords, tenbyte
values, and real numbers
Little Endian Order
All data types larger than a byte store their individual bytes in reverse order: the least significant byte occurs at the first (lowest) memory address.
Example:
val1 DWORD 12345678h
Little endian order (offsets 0, 1, 2, 3): 78h 56h 34h 12h
Big endian order (offsets 0, 1, 2, 3): 12h 34h 56h 78h
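Python's int.to_bytes and int.from_bytes can show both byte orders for val1 directly. This sketch lists the bytes as they would appear at increasing addresses. memory_layout is a hypothetical helper.

```python
def memory_layout(value: int, size: int, order: str) -> list:
    """Return the bytes of `value` in the order they occupy increasing addresses."""
    return list(value.to_bytes(size, order))

val1 = 0x12345678                        # val1 DWORD 12345678h
little = memory_layout(val1, 4, "little")
big = memory_layout(val1, 4, "big")
print(" ".join(f"{b:02X}h" for b in little))   # 78h 56h 34h 12h
print(" ".join(f"{b:02X}h" for b in big))      # 12h 34h 56h 78h

# Reading the little-endian bytes back reassembles the original value
assert int.from_bytes(bytes(little), "little") == val1
```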
Symbolic Constants
Associate an identifier (a symbol) with an integer expression or some text
Symbols do not reserve storage
Used only by the assembler when scanning a program
Cannot change at run time
Equal-Sign Directive
Syntax: name = expression
expression is a 32-bit integer (expression or constant)
may be redefined
name is called a symbolic constant
good programming style to use symbols
Example:
COUNT = 500
. . .
mov ax,COUNT
EQU Directive
Define a symbol as either an integer or text expression.
Cannot be redefined
Syntax:
name EQU expression ; integer expression
name EQU symbol ; existing symbol name
name EQU <text> ; any text
Example:
matrix EQU 10 * 10
PI EQU <3.1416>
pressKey EQU <"Press any key to continue...",0>
.data
prompt BYTE pressKey
MI WORD matrix
TEXTEQU Directive
Define a symbol as either an integer or text expression.
Called a text macro
Can be redefined
Example:
continueMsg TEXTEQU <"Do you wish to continue (Y/N)?">
rowSize = 5
.data
prompt1 BYTE continueMsg
count TEXTEQU %(rowSize * 2) ; evaluates the expression
setupAL TEXTEQU <mov al,count>
.code
setupAL ; generates: "mov al,10"
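Symbols of all three kinds exist only while the assembler scans the source: they rewrite text and reserve no storage. This toy Python symbol table models the redefinition rules stated for =, EQU and TEXTEQU and repeats text substitution until nothing changes. SymbolTable is a hypothetical class, not a MASM interface. Values are passed in already evaluated, so %(rowSize * 2) is computed by the caller. Rejecting redefinition across different directives is a simplifying assumption.

```python
import re

# A symbol name not preceded by another name character
NAME_RE = re.compile(r"(?<![\w@$])[A-Za-z_@$][\w@$]*")

class SymbolTable:
    """Toy model of assembly-time symbols (hypothetical, not a MASM API)."""

    def __init__(self):
        self._symbols = {}  # NAME -> (directive, value)

    def _define(self, name, directive, value):
        key = name.upper()   # names are case-insensitive by default
        if key in self._symbols:
            old_directive = self._symbols[key][0]
            # '=' and TEXTEQU symbols may be redefined by the same directive;
            # EQU symbols may not. Mixing directives is rejected (simplification).
            if directive == "EQU" or old_directive != directive:
                raise ValueError(f"{name}: redefinition not allowed")
        self._symbols[key] = (directive, value)

    def equal_sign(self, name, value):   # name = expression
        self._define(name, "=", value)

    def equ(self, name, value):          # name EQU expression
        self._define(name, "EQU", value)

    def textequ(self, name, text):       # name TEXTEQU <text>
        self._define(name, "TEXTEQU", text)

    def value(self, name):
        return self._symbols[name.upper()][1]

    def substitute(self, line, max_passes=10):
        """Replace symbol names with their values until nothing changes."""
        def replace(match):
            entry = self._symbols.get(match.group(0).upper())
            return match.group(0) if entry is None else str(entry[1])
        for _ in range(max_passes):
            new_line = NAME_RE.sub(replace, line)
            if new_line == line:
                return line
            line = new_line
        raise ValueError("text macro expansion did not terminate")

consts = SymbolTable()
consts.equal_sign("COUNT", 500)
print(consts.substitute("mov ax,COUNT"))   # mov ax,500
consts.equal_sign("COUNT", 600)            # '=' symbols may be redefined

macros = SymbolTable()
macros.equal_sign("rowSize", 5)
macros.textequ("count", str(macros.value("rowSize") * 2))  # like %(rowSize * 2)
macros.textequ("setupAL", "mov al,count")
print(macros.substitute("setupAL"))        # mov al,10
```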
Outline (Lab)
Tools and setup
Notepad, Notepad++, or any other editor
Assembler (MASM)
Linker
Assembling and linking
Steps in execution
Registers and memory
Tools
DOSBox: download DOSBox 0.74 and install it.
Notepad: we can use the Notepad editor; other editors, such as Visual Studio C++ Express, also work.
Assembler (MASM): assembles programs written in assembly language, generating an object file for each source file.
Alternatively, you can download an 8086 MASM package that contains all of these tools.
Linker: links the object file with the link library.
Debug: runs the program step by step so that you can inspect registers and memory.
How to run?
First, write the code in Notepad, for simplicity.
Open DOSBox.
To mount the directory that contains your file, type:
mount c c:\   or mount your folder directly: mount c c:\foldername
Then type c: to switch to that drive; you are now in your directory.
To assemble, type masm [Link] and press Enter at each prompt until c:\> appears again.
To link, type link filename
Finally, use the debug (or afdebug) command to execute the program:
C:\>debug [Link]
-t ; trace: single-step execution, one instruction at a time
-p ; proceed: during stepwise execution, execute an interrupt or procedure call as one step
-g ; go: execute the whole program at once
-l ; reload the program to restart execution
-d ; display the data segment
-d ds:start end ; display data in the memory locations from start to end
-q ; quit debug
Alternatively, type [Link] and press Enter, then type ? and press Enter to list the available commands.
Example
You can write code directly at the [Link] command prompt
E.g., addition of two numbers
Push, pop and xchg
Decrement, subtraction and increment
First, once you have mounted the drive: c:\>8086> [Link]
Other way
E.g., displaying the text "hello world"
Addition of two numbers
Interrupt instructions
MS-DOS uses INT 21H for its main API functions, which provide a low-level interface to devices: reading input from the keyboard, writing to the terminal, creating/reading/writing files and directories, etc. MS-DOS uses other interrupts to provide other services.
INT is an assembly language instruction for x86 processors that generates a software interrupt. ... For example, INT 21H generates software interrupt 0x21 (33 in decimal), causing the function pointed to by the 34th vector in the interrupt table (vectors are numbered from 0) to be executed, which is typically an MS-DOS API call.
INT 03H: Breakpoint Interrupt. The INT 03H vector is used by debugging utilities to intercept execution when it reaches a user-selected address. The opcode for INT 03H is one byte (CCh), so it can be written over the first byte of any CPU instruction without any chance of overwriting the code that follows it.
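The reason a one-byte opcode matters can be shown by patching a code buffer in Python. In this sketch, setting a breakpoint saves the first byte of an instruction and replaces it with CCh, and clearing the breakpoint restores that byte. The machine code used is mov ah,4Ch (B4 4C) followed by int 21h (CD 21). set_breakpoint and clear_breakpoint are hypothetical helpers, not a real debugger interface.

```python
INT3_OPCODE = 0xCC  # one-byte encoding of INT 03H

def set_breakpoint(code: bytearray, offset: int) -> int:
    """Overwrite the first byte of the instruction at `offset` with INT 03H
    and return the original byte so it can be restored later."""
    original = code[offset]
    code[offset] = INT3_OPCODE
    return original

def clear_breakpoint(code: bytearray, offset: int, original: int) -> None:
    """Put the saved byte back, restoring the original instruction."""
    code[offset] = original

# mov ah,4Ch (B4 4C) followed by int 21h (CD 21)
program = bytearray([0xB4, 0x4C, 0xCD, 0x21])
saved = set_breakpoint(program, 2)              # break before int 21h
print(" ".join(f"{b:02X}" for b in program))    # B4 4C CC 21
```

Only the byte at offset 2 changes. The following byte (21h) and the preceding instruction are left intact, which is why a one-byte breakpoint opcode is safe.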
mov ah,4ch is the first line of assembler code: it stores the hexadecimal value 4Ch in register AH. int 21h is the second line: it calls software interrupt 21h. When AH holds 4Ch (as is the case here), this interrupt causes the program to exit immediately.
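Because INT 21H selects its service from the value in AH, it can be modelled as a dispatch on AH. This Python sketch handles three well-known MS-DOS functions: 02h displays the character in DL, 09h displays a '$'-terminated string at DS:DX, and 4Ch terminates with the return code in AL. int21h and ProgramExit are hypothetical names. The sketch also simplifies by ignoring DS and treating DX as an offset into a byte buffer.

```python
class ProgramExit(Exception):
    """Raised by the model when a program requests termination."""
    def __init__(self, return_code: int):
        super().__init__(return_code)
        self.return_code = return_code

def int21h(regs: dict, memory: bytes, output: list) -> None:
    """Handle a few INT 21h functions, selected by the value in AH."""
    ah = regs["ah"]
    if ah == 0x02:                        # display the character in DL
        output.append(chr(regs["dl"]))
    elif ah == 0x09:                      # display '$'-terminated string at DS:DX
        start = regs["dx"]                # simplification: DS ignored
        end = memory.index(b"$", start)
        output.append(memory[start:end].decode("ascii"))
    elif ah == 0x4C:                      # terminate, return code in AL
        raise ProgramExit(regs["al"])
    else:
        raise NotImplementedError(f"INT 21h function {ah:02X}h not modelled")

memory = b"hello world$"
output = []
int21h({"ah": 0x09, "dx": 0}, memory, output)
try:
    int21h({"ah": 0x4C, "al": 0}, memory, output)   # mov ah,4ch / int 21h
except ProgramExit as done:
    print("".join(output), "-> exit code", done.return_code)
```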