Fully associative cache: just park wherever, we will help you find your car.
In a direct-mapped cache, each piece of data can go in only one location.
Assume that the cache has $2^n$ entries.
$2^n$ is small compared to the size of main memory.
From an Intel page:
L1 Data: 32KB per core
L1 Instruction: 32KB per core
For a 4-core chip, that is 256KB of L1 total.
Generally caches follow the memory triangle (larger is slower and cheaper).
Notice the different caches for instruction and for data.
Let's assume an address of 10 bits and a cache size of $2^4$.
This means that there are 16 cache locations (0000 through 1111)
We will use the bottom 4 bits of an address to determine where an item goes in cache.
We will record the top 6 bits as a tag.
Bits 9 through 4: TAG
Bits 3 through 0: cache address (index)
A tag is a field in a table used for the memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.
The example continues
Where would the data from address 0x24F be found?
Where would the data from address 0x032 be found?
Where would the data from address 0x03F be found?
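The tag/index split from the example can be sketched in a few lines of Python (the function name `split_address` is mine, not from the notes):

```python
# 10-bit address, 16-entry direct-mapped cache:
# bits 3..0 are the cache index, bits 9..4 are the tag.
INDEX_BITS = 4

def split_address(addr):
    """Split a 10-bit address into (tag, index)."""
    index = addr & ((1 << INDEX_BITS) - 1)  # bottom 4 bits
    tag = addr >> INDEX_BITS                # top 6 bits
    return tag, index

for addr in (0x24F, 0x032, 0x03F):
    tag, index = split_address(addr)
    print(f"0x{addr:03X}: index=0x{index:X}, tag=0x{tag:02X}")
# 0x24F -> index 0xF, tag 0x24
# 0x032 -> index 0x2, tag 0x03
# 0x03F -> index 0xF, tag 0x03
```

Note that 0x24F and 0x03F land at the same index (0xF); only the tag tells them apart.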
When looking to use the cache, we need to know if the data in the cache location is valid or not.
To do this, we use a valid bit.
If the bit is set the entry is valid.
We initialize the cache with all valid bits set to false.
When data is requested from cache (given a memory address)
Calculate the cache index and tag.
if not cache[index].valid
fail
else if cache[index].tag == tag
return cache[index].value
else
fail
What do we do on a fail (a cache miss)?
Retrieve the data from memory
Store it in the cache.
And return it to the CPU.
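Putting the lookup and the miss handling together, a minimal direct-mapped cache simulation might look like this (the names `CacheLine`, `read`, and `MEMORY`, and the fake backing store, are all illustrative assumptions):

```python
# Minimal direct-mapped cache read path: check valid bit and tag,
# and on a miss fetch from memory, fill the line, and return the value.
INDEX_BITS = 4

class CacheLine:
    def __init__(self):
        self.valid = False
        self.tag = 0
        self.value = None

cache = [CacheLine() for _ in range(1 << INDEX_BITS)]
MEMORY = {addr: addr * 10 for addr in range(1 << 10)}  # fake backing store

def read(addr):
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    line = cache[index]
    if line.valid and line.tag == tag:
        return line.value, "hit"
    # fail: retrieve from memory, store it in the cache, return it
    line.valid, line.tag, line.value = True, tag, MEMORY[addr]
    return line.value, "miss"

print(read(0x032))  # first access: a miss that fills the line
print(read(0x032))  # same address again: a hit
```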
They note that in MIPS the memory is byte addressable, but all memory transactions use word addresses.
So with a 1024 entry cache
10 bits are used for the index.
2 bits can be ignored as they are always 0 (the byte offset within a word)
20 bits are used for the tag.
And 1 bit for the valid bit
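That field breakdown (20-bit tag, 10-bit index, 2-bit byte offset) can be checked with a short sketch; the function name `fields` and the sample addresses are mine:

```python
# 32-bit byte address, 1024-entry cache, one word per line:
# bits 1..0 byte offset (always 0), bits 11..2 index, bits 31..12 tag.
def fields(byte_addr):
    index = (byte_addr >> 2) & 0x3FF  # 10 index bits
    tag = byte_addr >> 12             # 20 tag bits
    return tag, index

# Two addresses 4KB apart share an index but differ in tag:
print(fields(0x1004))  # tag 1, index 1
print(fields(0x2004))  # tag 2, index 1
```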
Look at the picture on page 407.
This is good, as it avoids fetching the same data twice, but we can do better.
If we take the next 2 bits of the address (just above the byte offset) to use as a word offset into the cache line,
we can store 4 adjacent words in each line of cache.
The address now breaks down as: TAG | INDEX | OFFSET | 00 (byte offset)
When we fetch any word in the block, we fetch the entire block.
Then spatial locality says we will probably use the three words adjacent to the word we fetched.
So if we use the word at address 0x??????40, we will also use the words at 0x??????44, 0x??????48, and 0x??????4c.
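A quick sketch of the block arithmetic (function names are my own): with 4-word (16-byte) blocks, clearing the bottom 4 bits of an address gives the start of its block, and bits 3..2 pick the word within it.

```python
# 4-word blocks: bits 1..0 are the byte offset, bits 3..2 the word offset.
def block_base(byte_addr):
    return byte_addr & ~0xF           # clear bottom 4 bits: start of block

def word_offset(byte_addr):
    return (byte_addr >> 2) & 0x3     # which of the 4 words in the block

# All four words in the block starting at 0x40 share a base address:
for addr in (0x40, 0x44, 0x48, 0x4C):
    print(hex(block_base(addr)), word_offset(addr))
# 0x40 with offsets 0, 1, 2, 3
```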
See figure 5.12 on page 414.
What happens when we write to cache?
There are two schemes
write through - write to the cache and to memory
write back - write to the cache, and only write to memory when the cache block is evicted (replaced).
Each scheme has speed advantages and disadvantages.
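The contrast can be sketched on a single cache line. Everything here (the `Line` class, the dirty bit, the function names) is my own illustration; the notes don't spell out the dirty-bit mechanism, but real write-back caches use one so that only modified blocks are written back on eviction.

```python
# Write-through updates memory on every store; write-back defers the
# memory update until the (dirty) line is evicted.
memory = {0x10: 0}

class Line:
    def __init__(self):
        self.valid, self.dirty, self.tag, self.value = False, False, 0, 0

def write_through(line, addr, value):
    line.valid, line.tag, line.value = True, addr >> 4, value
    memory[addr] = value              # memory updated on every write

def write_back(line, addr, value):
    line.valid, line.dirty, line.tag, line.value = True, True, addr >> 4, value
    # memory is NOT touched here

def evict(line, addr):
    if line.valid and line.dirty:
        memory[addr] = line.value     # the delayed write finally happens
    line.valid = line.dirty = False

line = Line()
write_back(line, 0x10, 99)
print(memory[0x10])  # still the old value: 0
evict(line, 0x10)
print(memory[0x10])  # now 99
```

Write-through keeps memory simple and consistent but pays the memory latency on every store; write-back batches writes at the cost of tracking dirty state.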