Speed and Size¶

约 2625 个字 46 张图片预计阅读时间 13 分钟

处理器速度 <= 高度相关 => 寄存器速度

Memory Technology¶

SRAM¶

value is stored on a pair of inverting gates 两个反相器
very fast but takes up more space than DRAM

alt text

DRAM¶

Value is stored as a charge on capacitor 电容器
Very small but slower than SRAM (factor of 5 to 10)
Must periodically be refreshed 必须定期刷新
Read contents and write back (destructive read) 破坏性读取

Bits in a DRAM are organized as a rectangular array

DRAM accesses an entire row

Burst mode: supply successive words from a row with reduced latency (SDRAM)

从一行中连续读取比较快

数字逻辑

高速的口一般是串行的，接口很少 <== 保证多条线同时到达目标

Flash¶

Nonvolatile semiconductor storage 非易失半导体

只能从1写到0，擦除代价大

NOR flash: bit cell like a NOR gate
- Random read/write access
- Used for instruction memory in embedded systems
NAND flash: bit cell like a NAND gate
- Denser (bits/area), but block-at-a-time access
- Cheaper per GB
- Used for USB keys, media storage, …
Flash bits wears out after 1000’s of accesses
- Not suitable for direct RAM or disk replacement
- Wear leveling: remap data to less used blocks

Disk¶

访问时间的计算：

每次读取一个sector扇区的数据

Memory Hierarchy Introduction¶

locality

Temporal locality 时间局部性
- Items accessed recently are likely to be accessed again soon
- e.g., instructions in a loop, induction variables
Spatial locality 空间局部性
- Items near those accessed recently are likely to be accessed soon
- E.g., sequential instruction access, array data
Copy recently accessed (and nearby) items from disk to smaller DRAM memory Main memory 主存
Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory Cache memory attached to CPU 缓冲

cache 在CPU内部

概念

alt text

Block : unit of copying 搬运的最小单位可能是一个word或者多个word
Hit : If accessed data is present in upper level 两层：从硬盘到内存，从内存到cache
- Hit Time : The time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss.
- Hit ratio: hits/accesses
- Miss : block copied from lower level
- Miss ratio : misses/accesses = 1 – hit ratio
- Miss penalty : The time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to the processor. 把数据从底层拿到高层，之后再传给processor

alt text

The basic of Cache¶

SRAM and DRAM (main memory)

Direct mapped¶

alt text

(Block address) modulo (Number of blocks in the cache)

固定的位置 $\text{(Block address) \% (Number of blocks in the cache)} $

Tag 区分存放的哪个数据

Store block address as well as the data
Actually, only need the high-order bits

Valid Bits 这个Block是否为空

Valid bit: 1 = present, 0 = not present 最初设置为0

Cache Block Address

alt text

example

alt text

理解cache中的各个位

alt text

上面的部分是 Memory address（Tag + Index)，下面的才是cache中的(这里Cache Address就是它的索引Index)。这里每一个Block有四个word，需要两位来定位word，每个word有四个字节，需要两位来定位byte，故而Byte offset = 4

index 在cache中寻找block，其大小由cache size决定

Offset 用于去在data中寻址。注意寻址方式，一般是字节寻址，但是很多题目会是word寻址，同时也看清楚给的地址是以字节址还是Word。比如，一个 4-word blocks，就需要 index 先去寻找block，再由offset分别确定word和byte，故需要4bit

一个cache slot实际的size是 data size of a block + Tag + Valid，但是提及某个cache大小时，都是只计算Data部分

Tag 由 memory address 减去 Index、Byte Offset计算

Total cache size: Block num(2^n) * (data size + tag size + valid bit)

计算

cache的命名不考虑标签和有效位的大小，只考虑数据的大小

example

alt text

Handling Cache reads hit and Misses

Read
- Read Miss 暂停处理器，直到从内存中返回数据
  - instruction cache miss
  - data cache miss

detail

1. Send the original PC value (current PC-4) to the memory.

2. Instruct main memory to perform a read and wait for the memory to complete its access. (in multiple cycles)

3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.

4. Restart the instruction execution at the first step, which will refetch the instruction again, this time finding it in the cache.

Write hits 不同的写策略详细内容见后
- write-back: Cause Inconsistent Wrote the data into only the data cache
- write-through: Ensuring Consistent
Write Miss

Deep Concept¶

引入

alt text

1-word block

alt text

Block Placement¶

alt text

Fully Associative¶

Block can go anywhere in cache

Set Associative¶

Block can go in one of a set of places in the cache.

A set is a group of blocks in the cache.

Block address MOD Number of sets in the cache 也就是针对上面的index在这里会是set index

If sets have n blocks, the cache is said to be n-way set associative. 注意这里是一个组有n个Block则称之为N-way set而非 n 个set

Block identification¶

基本相同

Block replacement¶

Random replacement
- randomly pick any block
Least-recently used (LRU)
- pick the block in the set which was least recently accessed

Assumed more recently accessed blocks more likely to be referenced again

This requires extra bits in the cache to keep track of accesses.

First in,first out(FIFO)
- Choose a block from the set which was first came into the cache

Write Strategy¶

write hit

If the data are written to memory, the cache is called a write-through cache
- Can always discard cached data - most up-to-date data is in memory 在Disk中总是最新的数据
- Cache control bit: only a valid bit
- memory (or other processors) always have latest data
- 读取miss不会影响write
If the data are NOT written to memory, the cache is called a write-back cache
- dirty page 存在,不能直接discard
- Cache control bits: both valid and dirty bits 多了Dirty bit
- much lower bandwidth, since data often overwritten multiple times

这两种策略，都会使用Buffer。

Write-Through 使用Buffer是为了降低不断写回下一层的cost，使用的是下面的 Write Buffer，数据写入Cache的同时，也写入buffer

Write-Back 也有写缓冲，Buffer

Write stall

When the CPU must wait for writes to complete during write through

Write buffers

A small cache that can hold a few values waiting to go to main memory.
This buffer helps when writes are clustered.
如果一次写入的量很大，很可能超过buffer的容量

alt text

增加了数据访问的复杂度，需要检测数据是否在write buffer中，否则并行的IC数据访问读取的是旧数据

Write miss

看是否写回到cache中去

**Write allocate **

The block is loaded into the cache on a miss before anything else occurs.

**Write around (no write allocate) **
- The block is only written to main memory
- It is not stored in the cache.
In general, write-back caches use write-allocate , and write-through caches use write-around.

Designing the Memory system to Support Cache¶

alt text

对主存的加速。单纯的One-word-wide架构，由于对DRAM取数据很耗时，会造成很多时间浪费

概念

Bandwidth 带宽，是一个时钟周期能访问的Bytes数目 $Bandwidth = \frac{Bytes}{CLK}$

miss penalty 移动一个Block的代价

example

增大 Memory 位宽

如果位宽增加，单位时间内传输的Block数目增加

增加Memory数量（类似并行

4 banks Interleaved Memory

但是bus的位宽没有变化，导致通过bus传输数据需要等待

Measuring and improving cache performance¶

The main contents are the following:

Measuring cache performance
Reducing cache misses by more flexible placement of blocks
Reducing the miss penalty using multilevel caches

$$ \text{Average Memory Assess Time (AMAT) = hit time + miss time}\ \text{= hit rate × Cache time + miss rate ×memory time} $$

Measuring¶

我们使用CPU Time来衡量 how good it works

$$ {CPU Time} = \text{CPI * I * CPU cycle time} \ =\text{CPU Excution Time} + \text{CPU Memory-stall Time} $$

$$ \text{CPU Memory-Stall Time} = \text{Read-stall+Write-stall}\ = #of instructions\text{ * miss ratio * miss penalty} $$

For Read

$\text{Read-stall} = \frac{Read}{Program}*\text{Read miss rate * Read miss penalty}$

For Write

$Write-stall = \frac{Write}{Program}*\text{Write miss rate * Write miss penalty} + Write Buffer$

If the write buffer stalls are small, we can safely ignore them .
If the cache block size is one word, the write miss penalty is 0.
通常情况下都可以忽略 <== 设计的较好的CPU一般不会满

一定要注意Write的类型，针对write hit

Write-through Write-back

还有miss的应对

Write-allocate Write-around

计算

Miss
- Instruction miss
- Data miss

Improving¶

提高命中率
减少 miss 代价

映射 Block Placement¶

相联度和容量不是相互独立的

Tag计算

主存中的Block Address = Index + Tag

前面已经写的比较详细了

Decreasing miss penalty with multilevel caches¶

Add a second level cache:

often primary cache is on the same chip as the processor 一级缓存在处理器的芯片上
use SRAMs to add another cache above primary memory (DRAM)
miss penalty goes down if data is in 2^nd level cache

一级和二级的cache关注的目标不同：

一级 hit time，降低失效代价
二级 miss rate 相联度更高

如果第一级的cache失效，就会访问第二级的cache，如果成功，那么第一级的penalty就是第二级的访问时间；... (这里有点疑问)

例题

Main Memory 的失效代价：访问Main Memory 的时间
总CPI = 基准CPI + 各级代价注意计算的是什么单位

Local/Global miss rate

实际做题的时候,有时候算L2-cache失效的CPI需要在第一级失效的基础上进行计算,那时候第二级的miss rate实际上就是local miss rate

Virtual Memory¶

Main Memory act as a “Cache” for the secondary storage.

Translation of a program’s address space to physical address

Motivation:

Efficient and safe sharing of memory among multiple programs.（多个Va映射到同一个Pa）
Remove the programming burdens of a small, limited amount of main memory.

Larger number of virtual pages than physical pages 现如今不再是这样

个人设备中，通常是DRAM和闪存充当两级存储层次结构
illusion of having more physical memory
program relocation
protection

Page faults | 缺页¶

The data is not in memory, retrieve it from disk

huge miss penalty, thus pages should be fairly large (e.g., 4KB)
reducing page faults is important (LRU is worth the price)
can handle the faults in software instead of hardware 通过软件解决缺页问题 Cache是硬件
using write-through is too expensive so we use write back 同时写到硬盘和主存代价太大

Page Table | 页表¶

Page Table 包含了所有的对应关系,不需要Tag

Page offset 计算：由整个Page size决定， = log(Page Size)

页表指针 Page Table Register

在Memory中存储 indexed by the virtual page number
Each Entry in the table contains the physical page number for that virtual pages if the page is current in memory, 即使不在Main Memory中,也会保留一个物理地址(In Disk)
Page table, Program counter and the page table register, specifies the state of the program. Each process has one page table. 如果转换进程，要保存当前进程的状态 包括页表、程序计数器、寄存器
操作系统只分配物理内存地址，不保存页表。
Each program has its own page table 每一个进程有自己的页表。
Size
- Virtual Address - Page Offset = #bits(Virtual Page Number)

操作系统 OS

When the OS creates a process, it usually creates the space on disk for all the pages of a process.
- When a page fault occurs, the OS will be given control through exception mechanism.
- The OS will find the page in the disk by the page table.
- the OS will bring the requested page into main memory. If all the pages in main memory are in use, the OS will use LRU strategy to choose a page to replace

Virtual Address

virtual page number
Page offset

Write

use write-back strategy. To do so, the machines need add a dirty bit to the entry of page table.
The dirty bit is set when a page is first written. If the dirty bit of a page is set, the page must be written back to disk before being replaced.

TLB¶

相当于Page Table的一个cache

Without a TLB, almost every memory access would require two accesses to RAM: An access to the page table, followed by an access to the requested data.

使用Page Table中的Valid标记是否在主存中
TLB一般采用全相联
硬件/软件处理失效，大多数MIPS使用软件

Combination of Miss

alt text

例题¶

Note

大部分地址都是从00开始的，别忘了0的地址。

计算
- cache
  - cache的size
  - 多重cache的performance measure
- Virtual Memory
  - Page Table、TLB 是如何work的，可能要给出流程
  - Page Table 的size

Write back / through概念¶

If all blocks in the higher level cache are also present in the lower level cache, then the lower level cache is said to be inclusive of the higher level cache.
If the lower level cache contains only blocks that are not present in the higher level cache, then the lower level cache is said to be exclusive of the higher level cache.