0% found this document useful (0 votes)
41 views6 pages

GPU: Revolutionizing Computational Speed

Uploaded by

rahul.j.lawan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views6 pages

GPU: Revolutionizing Computational Speed

Uploaded by

rahul.j.lawan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

An Introduction to

Graphical Processing Unit


(Department of Computer Applications ,Chandigarh University)India

ABSTRACT
Today’s world requires maximum computing
speed. The progress that the CPU has achieved unit (GPU) is rapidly gaining maturity as a powerful
over the past 2 decades, though tremendous, has engine for computationally demanding applications.
now reached a point of stagnation. To overcome GPU hides latency with computation not with cache!
this, a new highly parallel and multithreading The GPU‟s performance and potential offer a great
processor optimized for high degree of deal of promise for future computing systems. One of
computation was introduced, which was named as the most important challenges for GPU computing is
the Graphics Processing Unit (GPU) by NVIDIA to connect with the mainstream fields of processor
or the Visual Processing Unit (VPU). A Graphics architecture and programming systems, as well as
Processing Unit (GPU) is a single-chip processor learn from the parallel computing experts of the past.
primarily used to manage and boost the
The GPU is a chip that functions on the
performance of video and graphics. This paper
same principle as the CPU with the one important
talks about the reasons for choosing GPU to
difference that it has nothing to do with any part of
accelerate the computation. This paper also states
the system that is not part of the graphics package on
where GPU will work more efficiently than the
the computer. GPU is essentially a CPU that is
CPU.
specifically designed and dedicated to the control of
graphics. The end result is an easier to control
Keywords – CPU, data parallelism, GFLOPS, graphics package and better response time based on
GPU, SPMD computer commands. Games with intensive graphics
end up running a lot quicker and multimedia that you
I. INTRODUCTION find at online websites tend to be a lot better as well.
CPU frequency growth is now limited by physical The advantages of having a GPU are therefore quite
matters and high power consumption. Their easy to notice from those outcomes and that is why
performance is often raised by increasing the number people are now clamoring to have GPU devices
of cores. Present day processors may contain up to installed into their computers.
four cores (further growth will not be fast), and they
are designed for common applications, they use When we compare GPUs with CPUs over
MIMD (multiple instructions / multiple data). Each the last decade in terms of Floating point operations
core works independently of the others, executing (FLOPs), we see that GPUs appear to be far ahead of
various instructions for various processes the CPUs as shown in Fig.1.
The GPU is a specialized processor efficient GPUs came into existence with only image
at manipulating and displaying computer graphics. and graphics computation in mind. But now GPUs
The term was defined and popularized by Nvidia as has evolved into an extremely flexible and powerful
“a single chip processor with integrated transform, processor in terms of
lighting, triangle setup/clipping, and rendering
engines that is capable of processing a minimum 10 • Programmability
million per seconds”[1]. There are mathematically- • Precision
intensive tasks, complex algorithms which would put • Performance
quite a strain on the CPU. GPU lifts this burden from
So GPUs are well suited for fast, efficient, non
the CPU and frees up cycles that can be used for graphical computing too.
other jobs. The highly parallel graphics processing
good thing and this is why the GPU has become very
popular in recent years. GPU computing is on the rise
and continuing to grow in popularity and that makes
the future very friendly for it indeed.

In GPU computing the CPU calculations are


replaced by Graphics Processing Units. Migrating
large scale algorithms and entire kernel onto the GPU
co-processors help in arriving at the answer much
faster and thus decreases the processing time. GPUs
are never a completed replacement for CPUs but
complementary. Parallel operation of CPU and GPU
has found to increase the performance. CPUs offload
the tasks which are better performed by GPU leading
to high performance computing. GPU excel CPUs in
certain computational tasks.

Fig.1: CPU GPU Performance growth [2]

II. CPU GPU COMPARISON


The CPU or Central Processing Unit is where all the
program instructions are executed in order to derive
the necessary data. The advancement in modern day
CPUs have allowed it to crunch more numbers than
ever before, but the advancement in software Fig.2: CPU GPU Comparison [3]
technology meant that CPUs are still trying to catch
up. A Graphics Processing Unit or GPU is meant to Whether it is CPU or GPU every processing
alleviate the load of the CPU by handling all the unit has its own memory (cache) and shared memory
advanced computations necessary to project the final (DRAM). Since it is very hard to transfer data
display on the monitor. between these structures one should avoid using
complex data structures and messaging in their
Originally, CPUs handle all of the parallel algorithms. As it can be seen in Fig. 2, GPU
computations and instructions in the whole computer, has many ALU (Arithmetic Logic Unit) as compared
thus the use of the word „central‟. But as technology to the CPU. So, it is able to perform multiple
progressed, it became more advantageous to take out instructions execution at the same time. This provides
some of the responsibilities from the CPU and have it the GPU with a high degree of parallelism which aids
performed by other microprocessors. efficient computation. But it lacks the cache space.
Owing to these structure differences we can deduce
The GPU is a device that is beneficial
that an algorithm implemented for parallel structures
primarily to people that has intensive graphical
may still work with better performance on a multi-
functions on their computer. In other words, if you
core CPU when it could not be efficiently
just use Microsoft Office and the e-mail page of your
parallelized on GPU.
browser when you are on the computer, chances are
very good that the GPU will not add that much to
your computing experience. However, if you play III. WORKING OF GPU
video games and look at videos on the internet on a The programmable units of the GPU follow a single
frequent basis, what you will discover is that program multiple-data (SPMD) programming model.
installing a GPU onto your computer will greatly For efficiency, the GPU processes many elements
improve the performance you get out of the entire (vertices or fragments) in parallel using the same
thing. Improving computer performance is always a program. Each element is independent from the other
elements, and in the base programming model, some rearrangement of data layout or data access
elements cannot communicate with each other. All patterns.
GPU programs must be structured in this way: Many
parallel elements each processed in parallel by a The GPU offers multiple memory spaces
single program. Each element can operate on 32-bit that can be used to exploit common data-access
integer or floating point data with a reasonably patterns: in addition to the global memory, there are
complete general-purpose instruction set. Elements constant memory (read-only, cached), texture
can read data from a shared global memory and, with memory (read-only, cached, optimized for
the newest GPUs, also write back to arbitrary neighboring regions of an array), and per-block
locations in shared global memory. shared memory (a fast memory space within each
warp processor, managed explicitly by the
A graphics processing unit is a dedicated programmer).
graphics rendering device for a personal computer,
workstation, or game console. Modern GPUs are very IV. CPU GPU WORK SHARING
efficient at manipulating and displaying computer For efficient use of the GPU one must find areas in
graphics. But it is also used in general purpose the execution path of the program to send to the
computation in various computation intensive GPU, instead of offloading the entire code to GPU.
algorithms. These algorithms can harness the high This leads to better resource utilization. In short we
multiplicity of the GPU to improve their must use the CPU for operations involving memory
performance. Mapping the general purpose references and logical statements, while the
computations on the GPU is very much similar to computation intensive part of the code must be sent
manipulating the computer graphics. GPU computing to the GPU.
applications are structured in the following way [4]:
1. The programmer directly defines the Since the GPU is a coprocessor on a
computation domain of interest as a separate PCI-Express card, data must first be
structured grid of threads. explicitly copied from the system memory to the
2. An SPMD general-purpose program memory on the GPU board.
computes the value of each thread.
3. The value for each thread is computed by a As shown in Fig.3, the CPU has an input
combination of math operations and both data stream, from where it receives the data to be
read accesses from and write accesses to processed. It has a global memory through which it
global memory. Unlike in the previous two references the tasks to be performed. Whenever a
methods, the same buffer can be used for specific task is selected to be sent to the GPU, the
both reading and writing, allowing more CPU checks for an available unit of the GPU to
flexible algorithms (for example, in-place which it can assign the task. On completion of the
algorithms that use less memory). task the GPU signal the CPU and processing
4. The resulting buffer in global memory can resumes. Since the GPU has multiple such units
then be used as an input in future capable of a high degree of computation, parallelism
computation. is achieved and computation speeds up.

The GPU is organized as multiple SIMD


(Single instruction, multiple data) groups. Within
one SIMD group, all the processing elements execute
the same instruction in synchronization. A set of
threads that execute in this way is called a “warp”.
Branching is allowed, but if threads within a single
warp follow different execution paths, there may be
some performance loss.

Memory interfaces are wide and achieve


highest bandwidth when that access width is fully
utilized. For applications that are memory bound,
this means that all threads in a warp should access Fig.3: Data transfer between GPU and CPU
adjacent data elements when possible. For example,
neighboring threads in a warp should access
neighboring elements in an array. This may require
V. BENEFITS AND LIMITATIONS watts for the GeForce GTX 590. AMD recommends
a 750W or higher power supply with two 150W
Benefits: power connectors for its graphics cards. A 1200W or
1. Application that require large number of parallel higher power supply is recommended for a
threads work well. CrossFireX tandem built out of two Radeon HD
2. Allows use of per-block shared memory. 6990s. NVidia has the following recommendations:
3. Allows use of “data parallelism” in applications. 700 and 1000-watt power supplies for a single
4. Easier to calculate reciprocal and reciprocal GeForce GTX 590 and a SLI tandem, respectively.
square root. Each card has a single connector for building multi-
5. Can perform large amount of computation per GPU configurations. It is located in the top front part
data element. of the PCB.
6. If the synchronization is infrequent then GPU
can handle them. Differences:
Limitations: The Radeon HD 6990 carries two full-featured
1. Applications with limited concurrency do not Cayman GPUs. They are indeed full-featured because
fully utilize the potential of the GPU. dual-processor cards used to be equipped with cut-
2. Even with many threads, if all the threads are down versions of GPUs in the past. The GPU
doing different work then GPU is not utilized frequency of the Radeon HD 6990 is only 50 MHz
fully. lower than that of the Radeon HD 6970 and equals
3. Frequent global synchronization requires an 830 MHz. However, there is a high-speed mode you
expensive global barrier. can trigger by means of the abovementioned switch
4. If there is high degree of point-to-point near the CrossFireX connector. The card's GPU
synchronization among random threads the GPU frequency is 880 MHz in that mode, but AMD says
does not work well. that turning that switch on will make your warranty
5. Frequent communication between CPU and void. The card‟s GPU frequency is lowered to 150
GPU hampers the performance of GPU. MHz in 2D applications to save power.
6. Applications which require small amount of
computation do not work well on GPU. As opposed to AMD, NVidia equips its
GPUs with such caps. The company didn‟t disable
any subunits in the GPUs of its GeForce GTX 590,
VI. COMPARATIVE STUDY OF DIFFERENT
either. Each of the card's GPUs has 512 unified
GPU’S shader processors, 64 texture-mapping units and 48
There is always a constant and tough struggle for raster operators. In other words, we've got two
supremacy among the manufacturers of computer GeForce GTX 580 processors on a single PCB here.
components like CPUs, graphics cards, system Their frequencies are lowered more than those of the
memory modules, coolers, etc. There is fierce AMD Radeon HD 6990, though. The GeForce GTX
competition in each price category, especially among 590 clocks its GPUs at 607/1215 MHz, which is
top-end products. The graphics card market is a vivid 21.4% lower than the clock rates of the GeForce
example of that. Having the world's fastest graphics GTX 580 (772/1544 MHz). The reason for this
card under one‟s belt is not only prestigious but also reduction is clear enough. If NVidia used the clock
profitable because it proves the manufacturer's rates of the GTX 580 for the GTX 590, the latter's
technical superiority and promotes its sales in other heat dissipation and power consumption would be
price sectors. beyond all reasonable limits. The GeForce GTX 590
drops its GPU clock rates to 51/101 MHz in 2D mode
Up to this moment the Nvidia GeForce GTX
as a power-saving measure.
580 has been the fastest single-GPU graphics card
although AMD could offer its dual-processor Radeon The GPU-Z tool is a tool that detects the
HD 5970 as an alternative. On March 8, 2011, AMD CPU, RAM, motherboard chipset, and other
released an even faster dual-GPU product, Radeon hardware features of a modern personal computer,
HD 6990. NVidia hasn‟t taken long to respond and and presents the information in one window. It
has just rolled out its own dual-processor GeForce reports about the two cards as shown in fig4 and fig5.
GTX 590.

Similarities:
They do not differ much in terms of the peak power
draw: 375 watts for the Radeon HD 6990 and 365
frequency of 5000 MHz. The card's memory
frequency is 5000 MHz, too, but is lowered to 600
MHz in 2D mode. The memory bus is 256 bits wide.
The GeForce GTX 590 comes
with Samsung chips that have a rated access time of
0.4 nanoseconds and a rated frequency of 5000 MHz.
However, the card clocks them at 3414 MHz only,
which is 15% lower than the memory frequency of
the GeForce GTX 580 and 10% lower than that of the
GTX 570. The memory frequency is lowered to 270
MHz in 2D mode. The memory bus is 384 bits wide.

VII. FUTURE WORK


GPU has gained over CPUs because they are more
powerful and cheaper compared to CPUs. The main
area of work in future for the GPU will be to reduce
the cost and provide a huge multi-processing
capability to the users all around. The future of GPU
rests in making it as a co-processor which performs
much of the computation for the CPU. This, as of
now, is difficult to do since the languages and
software tools available for combining GPU process
Fig.4: ATI Radeon HD6990 Specifications [5]
with CPU are still in their preliminary stage of
development. So, a major scope for future
improvement is to develop new languages and
software tools specifically to take advantage of high
level of parallelism.

The other thing that can be improved with


engineering is GPU communication with the CPU or
the NIC (Network Interface Card). At present it takes
a longer and slower route to copy contents from the
CPU memory to the GPU blocks and vice-versa. This
uses up a lot of time of computation. But with
engineering we can develop better architectures to
improve the communication speed of the GPU,
reducing altogether the time required for
computation.

Fig.5: NVidia GTX 590 Specifications [5]


The Radeon HD 6990 carries a total of 4
gigabytes of graphics memory (2 gigabytes per each
GPU) whereas the GeForce GTX 590 has 1.5
gigabytes of onboard memory for each GPU or 3
gigabytes in total. As usual, AMD installs Hynix
chips on its Cayman-based reference cards. These
chips have a voltage of 1.5 volts and a rated
VIII. CONCLUSION
Thus we can conclude the following advantages of
GPU over CPU such as
1. GPU‟s contain much larger number of dedicated
ALU‟s then CPU.
2. GPU‟s also contain extensive support of stream
processing paradigm. It is related to SIMD
processing. [4]
3. Each processing unit on GPU contains local
memory that improves data manipulation and
reduces fetch time.
The Graphical Processing Unit is visually
and visibly changing the course of general purpose
computing. The future of the GPU certainly has much
more promise on the horizon than the general-
purpose CPU. Although the GPU will not overtake
the CPU as the main processor, we do think that the
GPU has much more potential for expanding our
computing experience. The CPU has a large amount
of logic dedicated to branch prediction, whereas
stream processing does not require as much of this
type of logic. The GPU is much better at parallelism
than the CPU and as the gap in the transistor rate
expansion continues to grow the GPU parallelism
performance will also continue to grow. The power
of solving these highly parallel problems has
immense implications to the scientific community
because we are the able to change the evolution of
scientific computation from the CPU growth curve
that double every 18 months to GPU growth curve
that doubles about every 6 months.

REFERENCES
[1] Graphics Processing Unit – Wikipedia
([Link]
g_unit)
[2] CPU GPU Trends over time.
([Link]
u-and-gpu-trends-over-time/)
[3] GPGPU: OPENCL vs CUDA vs ArBB
([Link]
vs-cuda-vs-arbb/)
[4] Peter Zalutski – “CUDA–Supercomputing for
masses”
[5] Sergey Lepilov –“Equilibrium: AMD Radeon HD
6990 vs. NVidia GeForce GTX 590.”

You might also like