Memory
- Read Page 343 for all of the 4 letter words.
- I don't expect you to memorize these, just read them through once
- And remember in 4 years it is most likely that they will all be different!
- Cache
- Terms
- Hit - we find the item we are looking for in the cache
- Miss - we do not find the item we are looking for in cache
- Hit Rate - percentage of times we have a cache hit (80% or even higher)
- Miss Rate - 1 minus the Hit Rate
- Hit Time - or access time - the time to get an entry from cache
- Miss Penalty - the time to get an entry from memory
- These terms are actually applicable at lower levels too (see picture page 236),
but we usually use different terms
- Page Fault, etc for memory to disk
- Off Line, etc for disk to other storage
- Principle of locality
- This applies to both data and instructions
- This applies to both time and space
- If we used something, we are likely to use it again soon, or to use something near it
- Temporal locality - in time
- Spatial Locality - in space
- We tend to write programs that loop, or go in predictable patterns
- We tend to access the same, or predictable memory locations (arrays)
- Cache can exploit these patterns to attempt to keep the right thing available
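- A tiny illustration of both kinds (hypothetical Python, not from the book): summing an array front to back reuses the same variables every iteration (temporal) and touches neighboring memory locations in order (spatial)

      # Temporal locality: 'total' and 'i' are reused on every iteration.
      # Spatial locality: a[0], a[1], a[2], ... sit next to each other in
      # memory, so once a miss pulls in a block, the next accesses are hits.
      a = list(range(1000))
      total = 0
      for i in range(len(a)):
          total += a[i]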
- Mapping Schemes
- The problem is we want to make a tiny bit of memory look like all of memory
- So we can't simply index it by full addresses (like memory does)
- Two and a half schemes
- Direct Mapped Cache
- Assume we have 32 words of main memory 00000 - 11111
- Assume we have 4 words of cache memory 00 - 11
- Direct mapped caching uses the bottom two (or top two or middle two, but let's be reasonable) bits of the actual address
to determine where the data will be stored
- For example 00010, 01010, and 11110 will all be stored at cache location 10
- And 00000, 01000, and 11100 will all be stored at cache location 00
- To tell which one, we store the top three bits with it in a field called a tag
- On top of that we have a single bit, called the valid bit, to tell if
the cache entry has something stored in it.
- To check for a word tttmm
- Check the tag at cache location mm to see if it is ttt
- Check the valid bit at cache location mm to see if it is 1
- If both conditions are met, return the data,
- Otherwise fetch the data from memory location tttmm and
- Store it at cache location mm
- Return it to the CPU
- In this case, if we look for something in cache and find a different
address's data already stored there, it is called a collision - two addresses
map to the same cache location (this kind of cache miss is a conflict miss)
- If we are looking for data that has never been brought into the cache, we have
encountered a compulsory miss (no way around this one)
- Trace the fetch of memory addresses 00000, 00001, 11010, 00010, 11100, 00000 (the sketch below runs this trace)
- In this scheme, every memory location is mapped to a single cache entry
- But each cache entry can be mapped to from multiple memory locations
- We know where to look, but it might not be there
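- A minimal Python sketch of the tttmm lookup above, assuming 5-bit addresses, a 4-word cache, and a list standing in for main memory (all names illustrative):

      # Direct-mapped cache: 4 slots, each holding a valid bit, tag, and data.
      # Address layout: top 3 bits = tag (ttt), bottom 2 bits = index (mm).
      memory = [f"word{i}" for i in range(32)]   # 32 words of "RAM"
      cache = [{"valid": False, "tag": None, "data": None} for _ in range(4)]

      def read(addr):
          index = addr & 0b11            # mm: which cache slot to check
          tag = addr >> 2                # ttt: which address maps there
          entry = cache[index]
          if entry["valid"] and entry["tag"] == tag:
              return entry["data"], "hit"
          # Miss: fetch from memory and overwrite whatever was in this slot.
          entry["valid"], entry["tag"], entry["data"] = True, tag, memory[addr]
          return entry["data"], "miss"

      # The trace from the notes: every access misses, because 00010/11010
      # collide at slot 10 and 00000/11100 collide at slot 00.
      for addr in [0b00000, 0b00001, 0b11010, 0b00010, 0b11100, 0b00000]:
          print(f"{addr:05b}: {read(addr)[1]}")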
- Fully Associative Cache
- Collisions are a problem with direct mapped cache.
- What if we are accessing a set of instructions at 0000xxx
- And a set of data at 1111xxx?
- We would have collisions even though cache was mostly unused
- So how about placing things anywhere in cache?
- The tag field needs to become the full size of an address
- We still need a data field and a valid bit
- To check for something
- Search all of the tags to see if they match the address we are looking for
- If not, fetch the data from memory, replace one of the entries with it, and send it to the CPU
- In this case, we might not have room for everything we want to store
in cache, and this is called a capacity miss.
- Replacement Schemes
- A victim block must be chosen to be replaced with the new data
- Multiple schemes exist
- We want to eliminate the one that will never be used again, or at least is needed furthest in the future
- This is hard (impossible without running the program to see what it is)
- So we can pick the least recently used (but this requires us to keep a time stamp on cache entries and search the time stamps when looking for an entry to replace)
- Or use First In First Out
- Or random (which is just about as good as the others, but is really inexpensive to implement)
- Trace the fetch of memory addresses 00000, 00001, 11010, 00010, 11100, 00000 (run in the sketch below)
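- A sketch of a fully associative version with FIFO replacement (one of the schemes above), same 4-word cache, the full address serving as the tag; Python's dict and deque stand in for the hardware's parallel tag search and replacement bookkeeping:

      from collections import deque

      memory = [f"word{i}" for i in range(32)]
      CACHE_SIZE = 4
      cache = {}          # tag (full address) -> data; any entry can go anywhere
      fifo = deque()      # insertion order, for picking the victim block

      def read(addr):
          if addr in cache:                  # "search all of the tags"
              return cache[addr], "hit"
          if len(cache) == CACHE_SIZE:       # full: must evict a victim
              victim = fifo.popleft()        # FIFO picks the oldest entry
              del cache[victim]
          cache[addr] = memory[addr]
          fifo.append(addr)
          return cache[addr], "miss"

      # The final 00000 still misses here: it was the oldest entry,
      # so FIFO evicted it when 11100 arrived and the cache was full.
      for addr in [0b00000, 0b00001, 0b11010, 0b00010, 0b11100, 0b00000]:
          print(f"{addr:05b}: {read(addr)[1]}")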
- N-Way Set Associative Cache
- Fully associative cache requires searching every tag on every access
- So N-way set associative cache can be formed as a compromise
- In this case, assume we have 8 words in our cache.
- We will arrange them in 4 groups of two words
- Each word will still have a tag and a valid bit
- But words ending in 00 can be stored in location 000 or 100
- This is 2-way set associative.
- It is subject to all of the above miss problems, but not as bad
- It is subject to replacement policy problems above, but not as bad
- Trace the fetch of memory addresses 00000, 00001, 11010, 00010, 11100, 00000 (run in the sketch below)
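- A sketch of the 2-way case above (hypothetical Python): 8 cache words arranged as 4 sets of 2, the bottom two address bits picking the set, LRU within each set (cheap when a set has only two ways):

      memory = [f"word{i}" for i in range(32)]
      sets = [[] for _ in range(4)]   # each set: up to 2 (tag, data) pairs, LRU first

      def read(addr):
          index = addr & 0b11                  # bottom 2 bits pick the set
          tag = addr >> 2
          ways = sets[index]
          for i, (t, d) in enumerate(ways):
              if t == tag:                     # only this set's 2 tags are searched
                  ways.append(ways.pop(i))     # move to most-recently-used position
                  return d, "hit"
          if len(ways) == 2:                   # set full: evict the LRU way
              ways.pop(0)
          ways.append((tag, memory[addr]))
          return memory[addr], "miss"

      # Same trace: the final 00000 is now a hit, because 00000 and 11100
      # share set 00 instead of colliding over a single slot.
      for addr in [0b00000, 0b00001, 0b11010, 0b00010, 0b11100, 0b00000]:
          print(f"{addr:05b}: {read(addr)[1]}")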
- These schemes can work with larger chunks of memory (multi-word blocks) too.
- Writing to Cache
- Two schemes
- Write Through
- Each time we write to cache, write through to memory as well
- No bookkeeping
- Is slower
- And we might perform unnecessary memory writes
- Write Back
- Only write to memory when the entry is to be removed from the cache
- Needs a dirty bit to indicate that it needs to be written
- It suffers a double penalty - write old data, read new data
- But does not perform unnecessary writes when variables are changing often.
- No matter what, you can write a program that messes up cache if you
really try. But they are getting smarter.
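- A sketch of the write-back bookkeeping (hypothetical Python, direct-mapped slots holding [tag, data, dirty]); write-through would instead be a single memory[addr] = value on every write:

      memory = [0] * 32
      cache = [None] * 4              # each slot: [tag, data, dirty] or None

      def write(addr, value):
          index, tag = addr & 0b11, addr >> 2
          entry = cache[index]
          if entry is None or entry[0] != tag:
              fill(index, tag, addr)  # may have to write old data back first
              entry = cache[index]
          entry[1], entry[2] = value, True   # update the cache only; mark dirty

      def fill(index, tag, addr):
          old = cache[index]
          if old is not None and old[2]:         # dirty victim: the double penalty
              old_addr = (old[0] << 2) | index   # rebuild the victim's address
              memory[old_addr] = old[1]          # 1) write the old data back
          cache[index] = [tag, memory[addr], False]  # 2) read the new data in

      write(0b00010, 7)        # miss, fill slot 10, the write stays in cache
      write(0b11110, 9)        # same slot: 7 is written back to memory first
      print(memory[0b00010])   # 7 - it only reached memory on eviction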
- Access Time = Hit Rate * Hit Time + (1 - Hit Rate) * Miss Penalty
- This gets worse when considering writes.
- And you like math so well so ...
- Cache Hit rate of 85%, Access time 5 ns, Miss Penalty 25 ns
- AT = .85*5 + .15*25 = 4.25 + 3.75 = 8 ns
- But it could be placed into our model for processor performance.
- And we would use the 8ns as our memory speed with cache
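- The same arithmetic in Python (values from the example above), handy for plugging in other hit rates:

      hit_rate, hit_time, miss_penalty = 0.85, 5, 25    # times in ns
      access_time = hit_rate * hit_time + (1 - hit_rate) * miss_penalty
      print(access_time)   # 4.25 + 3.75 = 8.0 ns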
- Everyone read 6.0 - 6.4.5
- Zimmer's class read 6.5 - end