System-Level Early Power Estimation for Memory Subsystem in Embedded
Systems
Conference Paper · November 2008
DOI: 10.1109/SEC.2008.48 · Source: IEEE Xplore
Fifth IEEE International Symposium on Embedded Computing
System-level Early Power Estimation for Memory Subsystem in Embedded
Systems
Jinsong Ji (1,2)+, Chao Wang (1,2), Xuehai Zhou (1,2)
(1) Dept. of Computer Science, University of Science and Technology of China, Hefei, Anhui, 230027, China
(2) Embedded System Laboratory, Suzhou Institute for Advanced Study, USTC, Suzhou, Jiangsu, 215223, China
jsji@[Link], saintwc@[Link], xhzhou@[Link]
Abstract
Early power estimation is important to guide architectural design, especially for embedded systems. Since the power consumption of the memory subsystem dominates, and DRAM and NAND Flash are the two main storage media nowadays, we analyze the power model of DRAM and propose a power model for NAND Flash that considers its system-level behaviors. Experimental results show that the accuracy of the proposed models can reach 95%.
Figure 1. Typical Memory Hierarchy
1. Introduction
Power consumption is now one of the leading constraints in designing embedded systems [8], and the memory system is a major factor in the total power consumption [12, 7]. A share of up to 80% of the total power has been attributed to the memory subsystem in the signal processing domain [2]. System designers are therefore paying more and more attention to memory subsystem power.

To create power-efficient designs, accurate power budgets for the memory subsystem are essential, whether for calculating battery life, planning the cooling system, or determining the power supply.

Several power estimation techniques [6, 9, 15, 16, 3, 12, 11] and related tools exist, such as SimplePower [14], Wattch [1], JouleTrack [13], and VPR [10], but most of them focus on micro-architecture level power estimation. System designers usually cannot use these techniques or tools in the early design stage, because they may not have enough micro-architecture information, or enough time to run a whole simulation of the design. Another problem is that most of the tools focus on processors or controllers; memory subsystems are ignored, to say nothing of memory hierarchies.

In this paper, we propose a mathematical power model for the typical memory subsystem in embedded systems. It can help designers obtain accurate power budgets in the early stage of architecture design, requiring only knowledge of system-level behaviors and related medium parameters. We first analyze the main components of the modern memory hierarchy, then describe the power model of DRAM provided by Micron [5] and propose our model of NAND Flash in detail; after that we show the experimental results that verify the models presented.

2 System-level Power Models

As shown in Fig. 1, the typical memory hierarchy of today's embedded systems consists of three or more types of memory: Cache, RAM, Flash, etc. The power consumption of Cache is usually calculated within controllers or processors. We therefore focus on RAM and Flash in the following sections. As to RAM, though there are several kinds of RAM, such as SRAM, DRAM, VRAM, etc.,
978-0-7695-3348-3/08 $25.00 © 2008 IEEE
most of them work on more or less the same mechanism, and DRAMs are the most prevalent, so we analyze the DRAM power model as a representative. As for Flash, many current designs are moving towards NAND Flash to take advantage of its higher density and lower cost, so we choose to model NAND Flash instead of NOR Flash.

Figure 2. Typical State Machine of DRAM [4].

2.1 DRAM Power Model

According to the working mechanisms of DRAM, its power consists of three key components: background power, activate power, and read/write power.

2.1.1 Background Power

As shown in Fig. 2, the CKE signal is the master on-off switch of DRAM. When CKE is LOW (CKEL), the DRAM goes into the PowerDown state; when it is HIGH (CKEH), the DRAM goes into the Active state. The current changes during these transitions even without read/write access, so the background powers can be calculated easily:

$$
\begin{aligned}
P_{\text{pre-pdn}} &= I_{\text{pre-pdn}} \times V_{DD} \\
P_{\text{act-pdn}} &= I_{\text{act-pdn}} \times V_{DD} \\
P_{\text{pre-stby}} &= I_{\text{pre-stby}} \times V_{DD} \\
P_{\text{act-stby}} &= I_{\text{act-stby}} \times V_{DD}
\end{aligned}
$$

Meanwhile, DRAM has to refresh in the background. We assume that the device is in the precharge power-down state most of the time, except when the actual REFRESH commands are executed. Thus, the average extra power for the refresh command is:

$$P_{REF} = (I_{REF} - I_{\text{pre-pdn}}) \times V_{DD}$$

2.1.2 Activate Power

All DRAM banks have to be activated (ACT) before read/write access. After the ACT command, a large amount of current is used to decode the command/address and transfer data from the DRAM array to the sense amplifiers. Once this is complete, the DRAM is maintained in an active state and draws $I_{\text{act-pre}}$ until a PRE command is issued. The PRE command restores the data from the sense amplifiers into the memory array and resets the bank for the next ACT command. Once this is complete, the device is returned to the precharge state. This cycle is then repeated at $t_{RC}$ intervals between ACT commands. Because CKE is always held HIGH during this period, we have to subtract the background current it draws. Therefore, the activate power is:

$$P_{ACT} = (I_{\text{act-pre}} - I_{\text{act-stby}}) \times V_{DD}$$

However, as DRAM may work at a lower clock frequency or in bank-interleave mode, the interval between two ACT commands is not always $t_{RC}$. Fortunately, it is easy to scale the ACT current for other modes of operation:

$$P_{ACT} = (I_{\text{act-pre}} - I_{\text{act-stby}}) \times \frac{n_{READ}}{n_{ACT}} \times V_{DD}$$

2.1.3 Read/Write Power

During read/write access, assume that the currents are $I_{READ}$ and $I_{WRITE}$ respectively. The read/write powers can then be calculated as follows:

$$
\begin{aligned}
P_{READ} &= (I_{READ} - I_{\text{act-stby}}) \times \frac{n_{READ}}{n_{ACT}} \times V_{DD} \\
P_{WRITE} &= (I_{WRITE} - I_{\text{act-stby}}) \times \frac{n_{WRITE}}{n_{ACT}} \times V_{DD}
\end{aligned}
$$

But the equations above are not the complete answer. To drive the outputs, additional DQ currents are required:

$$
\begin{aligned}
P_{perDQ} &= V_{OUT} \times I_{OUT} \\
P_{DQ} &= P_{perDQ} \times (n_{DQ} + n_{DQS}) \times \frac{n_{READ}}{n_{ACT}}
\end{aligned}
$$

2.1.4 Total Power of DRAM

Having described the basic components above, we can now calculate the total power. Considering different usage conditions, the parameters in Table 1 are needed as input.

Table 1. Parameters for DRAM Model

Symbol        Comments                                         Typical Value
$p_{pre}$     Percentage of time all banks are precharged      40%
$p_{ckpre}$   Percentage of the PRE time when CKE is LOW       50%
$p_{ckact}$   Percentage of the ACT time when CKE is LOW       0%
$t_{ACT}$     The average time between ACT commands            120 ns
$p_{READ}$    Percentage of CK cycles that output read data    1%
$p_{WRITE}$   Percentage of CK cycles that input write data    24%

The total power of DRAM is:

$$P_{total} = P_{bg} + P_{ACT} + P_{R/W}$$
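As an illustration only, and not the authors' implementation, the following Python sketch shows how the component equations of Section 2.1 could be combined into this total. The class and function names and the placeholder currents in the usage example are assumptions. The ratios $n_{READ}/n_{ACT}$ and $n_{WRITE}/n_{ACT}$ are passed in as plain inputs (`read_ratio`, `write_ratio`), and the weighting of the background and read/write terms follows the expansions of $P_{bg}$ and $P_{R/W}$ given in this section.

```python
from dataclasses import dataclass


@dataclass
class DramCurrents:
    """Datasheet currents in amperes (field names are illustrative)."""
    i_pre_pdn: float   # precharge power-down
    i_pre_stby: float  # precharge standby
    i_act_pdn: float   # active power-down
    i_act_stby: float  # active standby
    i_ref: float       # refresh
    i_act_pre: float   # activate-to-precharge
    i_read: float      # read access
    i_write: float     # write access


@dataclass
class DramUsage:
    """Usage fractions of Table 1, expressed in [0, 1]."""
    p_pre: float = 0.40
    p_ckpre: float = 0.50
    p_ckact: float = 0.00
    p_read: float = 0.01
    p_write: float = 0.24


def background_power(c, u, v_dd):
    """P_bg: the four state powers weighted by residency, plus refresh."""
    p_ref = (c.i_ref - c.i_pre_pdn) * v_dd
    return (c.i_pre_pdn * v_dd * u.p_pre * u.p_ckpre
            + c.i_pre_stby * v_dd * u.p_pre * (1 - u.p_ckpre)
            + c.i_act_pdn * v_dd * (1 - u.p_pre) * u.p_ckact
            + c.i_act_stby * v_dd * (1 - u.p_pre) * (1 - u.p_ckact)
            + p_ref)


def total_power(c, u, v_dd, p_per_dq, n_dq, n_dqs,
                read_ratio=1.0, write_ratio=1.0):
    """P_total = P_bg + P_ACT + P_R/W, all in watts.

    read_ratio and write_ratio stand for n_READ/n_ACT and n_WRITE/n_ACT.
    """
    p_act = (c.i_act_pre - c.i_act_stby) * read_ratio * v_dd
    p_read = (c.i_read - c.i_act_stby) * read_ratio * v_dd
    p_write = (c.i_write - c.i_act_stby) * write_ratio * v_dd
    p_dq = p_per_dq * (n_dq + n_dqs) * read_ratio
    p_rw = (p_read + p_dq) * u.p_read + p_write * u.p_write
    return background_power(c, u, v_dd) + p_act + p_rw


if __name__ == "__main__":
    # Placeholder currents, not taken from any datasheet.
    currents = DramCurrents(0.01, 0.03, 0.02, 0.04, 0.11, 0.09, 0.14, 0.14)
    watts = total_power(currents, DramUsage(), v_dd=2.5,
                        p_per_dq=0.005, n_dq=8, n_dqs=1)
    print(f"Estimated DRAM power: {watts * 1000:.1f} mW")
```

Keeping the datasheet currents and the usage fractions in separate records mirrors the split the model relies on: the first comes from the vendor, the second from early system-level estimates.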
Here, the background and read/write terms are:

$$
\begin{aligned}
P_{bg} &= P_{\text{pre-pdn}} \times p_{pre} \times p_{ckpre} \\
&+ P_{\text{pre-stby}} \times p_{pre} \times (1 - p_{ckpre}) \\
&+ P_{\text{act-pdn}} \times (1 - p_{pre}) \times p_{ckact} \\
&+ P_{\text{act-stby}} \times (1 - p_{pre}) \times (1 - p_{ckact}) \\
&+ P_{REF} \\
P_{R/W} &= (P_{READ} + P_{DQ}) \times p_{READ} + P_{WRITE} \times p_{WRITE}
\end{aligned}
$$

Most of the parameters in the equations can be found in the DRAM datasheet; others, such as the read/write percentages, can be easily estimated in the early design stage.

2.2 NAND Flash Power Model

Compared to the DRAM model, the NAND Flash model is simpler. In the standby state, the Flash cannot be accessed, and the current $I^*_{standby}$ is usually small enough to be ignored. In the operation state, as shown in Fig. 3, a typical flash operation includes at least three stages: command, address, and data. Because the command and address stages are almost the same for all operations (READ/PROGRAM/ERASE), except for different command codes, we denote them together as the cmd+addr stage.

1 We consider only the typical interface defined by ONFI (Open NAND Flash Interface).

Figure 3. Flash Operation Diagram

2.2.1 Read Power

The read operation consists of only two major stages, so the average power can be calculated according to their percentages:

$$P^*_{READ} = P^*_{READ,cmd+addr} \times p^*_{READ,cmd+addr} + P^*_{READ,data} \times p^*_{READ,data}$$

where

$$
\begin{aligned}
P^*_{READ,cmd+addr} &= I^*_{WRITE} \times V^*_{DD} \\
P^*_{READ,data} &= I^*_{READ} \times V^*_{DD} + P^*_{perDQ} \times (n^*_{DQ} + n^*_{DQS}) \\
P^*_{perDQ} &= V^*_{OUT} \times I^*_{OUT}
\end{aligned}
$$

Here, $V^*_{DD}$ is the maximum voltage, which can be found in the datasheet. $I^*_{WRITE}$, $I^*_{READ}$, $V^*_{OUT}$, and $I^*_{OUT}$ are also provided by the vendor. $p^*_{READ,cmd+addr}$ and $p^*_{READ,data}$ are the percentages of time consumed in each stage; they can be calculated easily from the timing diagrams and are almost constant. $n^*_{DQ}$ and $n^*_{DQS}$ are the numbers of output pins and their strobes.

2.2.2 Program Power

For PROGRAM and ERASE operations, we have to include the status stage, in which specific bits of the status registers are read to confirm the success of the operation:

$$
\begin{aligned}
P^*_{PROG} &= P^*_{PROG,cmd+addr} \times p^*_{PROG,cmd+addr} \\
&+ P^*_{PROG,data} \times p^*_{PROG,data} \\
&+ P^*_{PROG,status} \times p^*_{PROG,status}
\end{aligned}
$$

where

$$
\begin{aligned}
P^*_{PROG,cmd+addr} &= I^*_{WRITE} \times V^*_{DD} \\
P^*_{PROG,data} &= I^*_{WRITE} \times V^*_{DD} + P^*_{perDQ} \times (n^*_{DQ} + n^*_{DQS}) \\
P^*_{PROG,status} &= I^*_{READ} \times V^*_{DD} + P^*_{READ} \times p^*_{polling}
\end{aligned}
$$
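The stage-weighted READ and PROGRAM equations of Sections 2.2.1 and 2.2.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `NandDevice` record, its field names, and the function names are hypothetical. The stage percentages and the polling ratio $p^*_{polling}$ are taken as inputs here.

```python
from dataclasses import dataclass


@dataclass
class NandDevice:
    """Vendor-provided parameters (field names are illustrative)."""
    i_read: float    # I*_READ in amperes
    i_write: float   # I*_WRITE in amperes
    v_dd: float      # V*_DD, maximum supply voltage in volts
    p_per_dq: float  # P*_perDQ = V*_OUT x I*_OUT, watts per pin
    n_dq: int        # number of output pins
    n_dqs: int       # number of strobe pins

    def dq_power(self):
        """Power drawn by all output pins and strobes together."""
        return self.p_per_dq * (self.n_dq + self.n_dqs)


def read_power(d, p_cmd_addr, p_data):
    """P*_READ: cmd+addr and data stages weighted by their time shares."""
    cmd_addr = d.i_write * d.v_dd
    data = d.i_read * d.v_dd + d.dq_power()
    return cmd_addr * p_cmd_addr + data * p_data


def program_power(d, p_read_avg, p_polling, p_cmd_addr, p_data, p_status):
    """P*_PROG: adds the status stage, which depends on P*_READ and polling."""
    cmd_addr = d.i_write * d.v_dd
    data = d.i_write * d.v_dd + d.dq_power()
    status = d.i_read * d.v_dd + p_read_avg * p_polling
    return cmd_addr * p_cmd_addr + data * p_data + status * p_status


if __name__ == "__main__":
    # Placeholder device values, not taken from any datasheet.
    dev = NandDevice(i_read=0.02, i_write=0.03, v_dd=3.0,
                     p_per_dq=0.001, n_dq=8, n_dqs=0)
    p_read = read_power(dev, p_cmd_addr=0.1, p_data=0.9)
    p_prog = program_power(dev, p_read, p_polling=0.5,
                           p_cmd_addr=0.1, p_data=0.5, p_status=0.4)
    print(f"READ: {p_read * 1000:.2f} mW, PROGRAM: {p_prog * 1000:.2f} mW")
```

Passing the average read power into `program_power` makes explicit that the status stage reuses $P^*_{READ}$, so the read model has to be evaluated first.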
In the PROGRAM equations of Section 2.2.2, all parameters have the same meanings as in the READ operation, except for $p^*_{polling}$. $p^*_{polling}$ is the ratio of data polling during the whole programming process, and it depends on the software implementation. If the user uses the R/B pins to determine the end of the operation, there is only one polling operation; otherwise the user has to read the status register again and again to check whether programming is still in progress. The maximum number of pollings is $k = t^*_{status}/t^*_{polling}$, where $t^*_{status}$ is the total time of status checking and $t^*_{polling}$ is the time of one single data polling. So,

$$p^*_{polling} = \frac{t^*_{polling} \times k}{t^*_{status}}, \quad \text{where } k \in [1,\ t^*_{status}/t^*_{polling}].$$

2.2.3 Erase Power

The ERASE operation is mostly the same as PROGRAM, except that ERASE does not have a data stage. So,

$$P^*_{ERASE} = P^*_{ERASE,cmd+addr} \times p^*_{ERASE,cmd+addr} + P^*_{ERASE,status} \times p^*_{ERASE,status}$$

where

$$
\begin{aligned}
P^*_{ERASE,cmd+addr} &= I^*_{WRITE} \times V^*_{DD} \\
P^*_{ERASE,status} &= I^*_{READ} \times V^*_{DD} + P^*_{READ} \times p^*_{polling}
\end{aligned}
$$

2.2.4 Overall NAND Flash Power

Considering the parameters in Table 2 for input, we have:

$$P^*_{total} = P^*_{READ} \times p^*_{READ} + P^*_{PROG} \times p^*_{PROG} + P^*_{ERASE} \times p^*_{ERASE}$$

Table 2. Parameters for NAND Flash Model

Symbol          Comments                           Typical Value
$p^*_{READ}$    Percentage of READ operation       1%
$p^*_{PROG}$    Percentage of PROGRAM operation    24%
$p^*_{ERASE}$   Percentage of ERASE operation      24%

Now we can estimate the total power of NAND Flash with only the knowledge of operation percentages.

What is more, with these two key component models, we can estimate the power of the whole memory subsystem, requiring only the hierarchy and system-level information.

3 Experimental Results

For evaluation, we use Xilinx's Virtex-II Pro Development Board for the DRAM model and Altera's Cyclone Development Board for the NAND Flash model.

3.1 DRAM Model

We developed benchmarks in assembly within Xilinx EDK, debugged them with XMD, and measured the current. First, we downloaded the bootloop program into the embedded CPU and measured the current of the board without the DRAM module to obtain the base power of all other modules on the board. Then we installed the DRAM module and downloaded a benchmark with a single loop that does not access DRAM, to measure the background power of DRAM. After that, we downloaded benchmarks reading and writing DRAM to obtain the related powers. The results are given in Table 3, and the related powers are calculated below:

Table 3. Measured Power Data of DDR DRAM (a)

Operations         Assembly Code     Binary Code   Current (A)   Power (W)
Only Bootloop                                      0.87          4.35
No Memory Access   BRI 0             B8000000      0.9           4.5
Write all "0"      ADDIK r3,r0,0     30600000      1.21          6.05
                   IMM 10240         B0002800
                   SWI r3,r0,0       F8600000
                   BRI -8            B800FFF8
Write all "1"      ADDIK r3,r0,-1    3060FFFF      1.16          5.8
                   IMM 10240         B0002800
                   SWI r3,r0,0       F8600000
                   BRI -8            B800FFF8
Read               IMM 10240         B0002800      1.35          6.75
                   LWI r3,r0,0       E8600000
                   BRI -8            B800FFF8

(a) The working voltage of the Development Board is 5 V.

$$
\begin{aligned}
P_{bg} &= (4.5 - 4.35)/8\ \text{W} = 18.75\ \text{mW} \\
P_{READ} &= (6.75 - 4.5)/8\ \text{W} = 281.25\ \text{mW} \\
P_{WRITE} &= ((6.05 + 5.8)/2 - 4.5)/8\ \text{W} = 178.125\ \text{mW}
\end{aligned}
$$

Meanwhile, we got 12.9 mW, 229 mW and 187.6 mW respectively from the DRAM model. Fig. 4 shows the comparison of the two data sets, indicating that the accuracy of the DRAM model can reach 95%.

3.2 NAND Flash Model

To verify the NAND Flash power model, we also developed benchmarks in the NIOS II IDE. We used the C APIs provided by Altera's flash driver to access the flash, collected the current data in Table 4, and calculated the powers from it, starting with the read power:

$$P^*_{READ} = (3.78 - 3.69)\ \text{W} = 90\ \text{mW}$$
Table 4. Measured Power Data of NAND Flash (a)

Operations         Current (A)   Power (W)
Only Bootloop      0.37          3.33
No Memory Access   0.41          3.69
Read               0.42          3.78
Erase              0.44          3.96
Program            0.44          3.96

(a) The working voltage of the Development Board is 9 V.
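The power column of Table 4 is the measured board current multiplied by the 9 V working voltage, and each operation's power is its difference from the "No Memory Access" row. The Python sketch below reproduces that arithmetic, together with the power step implied by the 0.01 A instrument precision reported in this section. The constant and function names are illustrative assumptions.

```python
BOARD_VOLTAGE_V = 9.0        # working voltage stated under Table 4
CURRENT_RESOLUTION_A = 0.01  # instrument precision reported in Section 3.2

# Measured board currents from Table 4, in amperes.
MEASURED_CURRENT_A = {
    "Only Bootloop": 0.37,
    "No Memory Access": 0.41,
    "Read": 0.42,
    "Erase": 0.44,
    "Program": 0.44,
}


def board_power_w(current_a, voltage_v=BOARD_VOLTAGE_V):
    """Whole-board power in watts for one measured current."""
    return current_a * voltage_v


def operation_power_mw(operation):
    """Power attributed to one flash operation, relative to the idle flash."""
    baseline_w = board_power_w(MEASURED_CURRENT_A["No Memory Access"])
    return (board_power_w(MEASURED_CURRENT_A[operation]) - baseline_w) * 1000.0


def power_step_mw():
    """Smallest power difference the current measurement can resolve."""
    return CURRENT_RESOLUTION_A * BOARD_VOLTAGE_V * 1000.0


if __name__ == "__main__":
    for op in ("Read", "Program", "Erase"):
        print(f"{op}: {operation_power_mw(op):.0f} mW")
    print(f"Resolution: {power_step_mw():.0f} mW per 0.01 A step")
```

Because every measured power is a multiple of the step returned by `power_step_mw()`, differences smaller than one step cannot be observed on this board.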
The program and erase powers follow in the same way:

$$
\begin{aligned}
P^*_{WRITE} &= (3.96 - 3.69)\ \text{W} = 270\ \text{mW} \\
P^*_{ERASE} &= (3.96 - 3.69)\ \text{W} = 270\ \text{mW}
\end{aligned}
$$

Meanwhile, we got 142.3 mW, 217.3 mW and 217.3 mW from the NAND Flash model. The agreement with the measured data is not as good as for the DRAM model, but the accuracy is still above 80%. After examining the data carefully, we found that the accuracy is affected by the precision of our measuring instruments and the working voltage of the Development Board: the working voltage is 9 V, while the precision of the current measuring instruments is 0.01 A, so the measured power is resolved only in steps of 90 mW and we cannot get power data between 180 mW and 270 mW. Another reason may be that the software driver in the NIOS II IDE is too complicated and may cause extra power consumption during execution.

Figure 4. Comparison of DRAM Power

4 Summary and Conclusions

This paper analyzes a mathematical power model of DRAM and presents another one for NAND Flash. These models estimate the total power of the memory subsystem, requiring only the percentages of different operations and the parameters provided by memory medium datasheets. Experimental results show that their accuracies are 95% and 80%, respectively. The accuracy of the Flash model was affected by the precision of our measuring instruments. Compared with other power estimation techniques and tools, the models are simpler but accurate enough. With these two models, system designers can make an early power estimation easily and accurately without too much effort. Refining the NAND Flash model by considering more of the micro-operations is part of our future work.

Figure 5. Model Screenshot

ACKNOWLEDGEMENTS

This article was supported by the 863 Research and Development Program. We would like to thank Xilinx Inc. for providing the Development Board and Mr. Wang Hengcai for providing the measuring devices.

References

[1] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In ISCA, pages 83–94, 2000.
[2] F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, and H. D. Man. Global communication and memory optimizing transformations for low-power signal processing systems. In VLSI Signal Processing Workshop, pages 178–187, Oct. 1994.
[3] J. Coburn, S. Ravi, and A. Raghunathan. Power emulation: a new paradigm for power estimation. page 700, 2005.
[4] H. Corp. DDR2 SDRAM device operation and timing diagram. Datasheet, 2005.
[5] J. Janzen. Calculating memory system power for DDR SDRAM. Designline, 10(2), 2001.
[6] K. Kang, J. Kim, H. Shim, and C.-M. Kyung. Software power estimation using ipi (inter-prefetch interval) power model for advanced off-the-shelf processor. In GLSVLSI '07: Proceedings of the 17th Great Lakes Symposium on VLSI, pages 594–599, New York, NY, USA, 2007. ACM Press.
[7] L. Kruse, E. Schmidt, G. Jochens, A. Stammermann, A. Schulz, E. Macii, and W. Nebel. Estimation of lower and upper bounds on the power consumption from scheduled data flow graphs. IEEE Trans. Very Large Scale Integr. Syst., 9(1):3–15, 2001.
[8] T. N. Mudge. Power: A first class design constraint for future architecture and automation. In HiPC, pages 215–224, 2000.
[9] M. Onouchi, T. Yamada, K. Morikawa, I. Mochizuki, and H. Sekine. A system-level power-estimation methodology based on ip-level modeling, power-level adjustment, and power accumulation. In ASP-DAC '06: Proceedings of the 2006 conference on Asia South Pacific design automation, pages 547–550, New York, NY, USA, 2006. ACM Press.
[10] K. K. W. Poon, S. J. E. Wilton, and A. Yan. A detailed power model for field-programmable gate arrays. ACM Trans. Des. Autom. Electron. Syst., 10(2):279–302, 2005.
[11] E. Schmidt, G. Jochens, L. Kruse, F. Theeuwen, and W. Nebel. Automatic nonlinear memory power modelling. In DATE '01: Proceedings of the conference on Design, automation and test in Europe, page 808, Piscataway, NJ, USA, 2001. IEEE Press.
[12] E. Schmidt, G. von Cölln, L. Kruse, F. Theeuwen, and W. Nebel. Memory power models for multilevel power estimation and optimization. IEEE Trans. Very Large Scale Integr. Syst., 10(2):106–108, 2002.
[13] A. Sinha and A. Chandrakasan. Jouletrack - a web based tool for software energy profiling. In Design Automation Conference, pages 220–225, 2001.
[14] N. Vijaykrishnan, M. T. Kandemir, M. J. Irwin, H. S. Kim, and W. Ye. Energy-driven integrated hardware-software optimizations using simplepower. In ISCA, pages 95–106, 2000.
[15] W. Wu, L. Jin, J. Yang, P. Liu, and S. X.-D. Tan. A systematic method for functional unit power estimation in microprocessors. In DAC '06: Proceedings of the 43rd annual conference on Design automation, pages 554–557, New York, NY, USA, 2006. ACM Press.
[16] L. Zhong, S. Ravi, A. Raghunathan, and N. K. Jha. Rtl-aware cycle-accurate functional power estimation. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 25(10):2103, 2006. 0278-0070.