Instruction Sequencing
- memory latency problem
- Wide Memories
- Remember from Chapter 3 that we could use multiple banks of memory to decrease memory access time.
- In our simple machine, what if we fetch 32 bits at a time?
- How about 64?
- Especially useful for instruction fetch.
- Not bad for sequential processing of an array.
- We do not improve the time to fetch a single instruction,
- But we do improve the instruction bandwidth (see the sketch below).
- But jumps are a problem.
- There are some problems, but this has been implemented in the IBM
RS/6000
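- A minimal back-of-the-envelope sketch in C, assuming a made-up 60 ns access time and a 16-bit instruction width (neither figure is from these notes): the latency of one fetch stays the same, but a wider memory returns more instructions per fetch, so bandwidth goes up.

    #include <stdio.h>

    int main(void)
    {
        const double access_ns  = 60.0;              /* assumed time for one memory access */
        const int    instr_bits = 16;                /* assumed instruction width */
        const int    widths[]   = { 16, 32, 64 };    /* memory widths to compare */

        for (int i = 0; i < 3; i++) {
            /* Instructions returned by a single access of this width. */
            int per_access = widths[i] / instr_bits;
            printf("width %2d bits: latency %.0f ns, bandwidth %.1f instr/us\n",
                   widths[i], access_ns, per_access * 1000.0 / access_ns);
        }
        return 0;
    }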
- Interleaving
- How about multiple independent banks of memory?
- Each bank holds 1/n of the words (bank selection is sketched below).
- This can work for both data and instructions
- In fact, how about data in one and instructions in another?
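- A minimal sketch in C of low-order interleaving, assuming 4 banks (an arbitrary example value): word address a lives in bank a mod n at index a div n, so sequential addresses land in different banks and their accesses can overlap.

    #include <stdio.h>

    int main(void)
    {
        const unsigned n_banks = 4;                  /* assumed number of banks */

        for (unsigned addr = 0; addr < 8; addr++) {
            unsigned bank  = addr % n_banks;         /* which bank holds this word */
            unsigned index = addr / n_banks;         /* the word's index inside that bank */
            printf("address %u -> bank %u, index %u\n", addr, bank, index);
        }
        return 0;
    }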
- Instruction Prefetching
- We have already discussed this one.
- Possibly partially decode instructions before they reach the CPU
- This will help with unconditional branches (see the sketch below).
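- A toy sketch in C of a one-entry prefetch buffer; the instruction encoding and program are invented purely for illustration. While the current instruction "executes", the prefetcher already fetches the next one, and if it recognizes an unconditional jump it fetches from the target instead of the fall-through address.

    #include <stdio.h>

    enum { OP_ADD, OP_JMP, OP_HALT };

    struct instr { int op; int target; };

    int main(void)
    {
        struct instr mem[] = {              /* tiny made-up program */
            { OP_ADD,  0 },                 /* 0 */
            { OP_JMP,  4 },                 /* 1: unconditional jump to 4 */
            { OP_ADD,  0 },                 /* 2: skipped */
            { OP_ADD,  0 },                 /* 3: skipped */
            { OP_HALT, 0 },                 /* 4 */
        };

        int pc = 0;
        struct instr prefetched = mem[pc];  /* buffer holds the next instruction */

        while (prefetched.op != OP_HALT) {
            struct instr cur = prefetched;

            /* Prefetcher peeks at the opcode: follow an unconditional
               jump, otherwise fall through to pc + 1.  This fetch would
               overlap with executing cur. */
            int next_pc = (cur.op == OP_JMP) ? cur.target : pc + 1;
            prefetched  = mem[next_pc];

            printf("executed instruction at %d\n", pc);
            pc = next_pc;
        }
        printf("halted at %d\n", pc);
        return 0;
    }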
- Cache
- Small memory, for instructions only, built into the processor
- Same speed as the processor
- Principle of locality - we will execute instructions close to
the last instruction we executed.
- This works with linear code
- But also with loops
- A cache hit occurs when an instruction we want is in the cache
- A cache miss occurs otherwise.
- Cache miss is expensive
- We must go to main memory to get the instruction
- This will cause wait states, or a processor stall
- Caches can be set up in a number of ways
- Fully Associative - the block's address (tag) and data are stored together, so a block can go in any cache slot
- Direct Mapped - each memory address maps to exactly one slot in the cache
- N-way set associative - each address can go to one of
N slots.
- Each of these has different performance characteristics; a direct-mapped lookup is sketched below.
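- A minimal sketch in C of a direct-mapped lookup, with arbitrary example sizes (64 lines of 16 bytes): the address splits into tag / index / offset, each address maps to exactly one line, and a hit means the stored tag matches. Note how the last reference misses again because 0x4000 evicted it (a conflict miss a set-associative cache could avoid).

    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_BYTES 16u                    /* example block size */
    #define NUM_LINES  64u                    /* example number of cache lines */

    struct line { bool valid; uint32_t tag; };

    static struct line cache[NUM_LINES];

    /* Returns true on a hit; on a miss, installs the new tag (the trip to
       main memory is where the wait states / stall would occur). */
    static bool access_cache(uint32_t addr)
    {
        uint32_t index = (addr / LINE_BYTES) % NUM_LINES;   /* the one slot it can use */
        uint32_t tag   = (addr / LINE_BYTES) / NUM_LINES;

        if (cache[index].valid && cache[index].tag == tag)
            return true;                      /* hit: data is already on chip */

        cache[index].valid = true;            /* miss: fetch block, then keep it */
        cache[index].tag   = tag;
        return false;
    }

    int main(void)
    {
        uint32_t trace[] = { 0x0000, 0x0004, 0x0008, 0x4000, 0x0000 };
        for (int i = 0; i < 5; i++)
            printf("0x%04x -> %s\n", trace[i],
                   access_cache(trace[i]) ? "hit" : "miss");
        return 0;
    }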
- We also need to worry about when to change entries in the cache
- LRU - least recently used
- FIFO
- Random
- These differ in performance, overhead, and ease of implementation (an LRU sketch follows below).
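- A minimal sketch in C of LRU within a single N-way set (4 ways, an arbitrary example): each way carries a last-used timestamp, a hit refreshes it, and a miss replaces the way with the oldest timestamp.

    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define WAYS 4                            /* example associativity */

    struct way { bool valid; uint32_t tag; unsigned last_used; };

    static struct way set[WAYS];
    static unsigned now;                      /* logical time, bumped per access */

    static bool access_set(uint32_t tag)
    {
        int victim = 0;
        now++;
        for (int i = 0; i < WAYS; i++) {
            if (set[i].valid && set[i].tag == tag) {
                set[i].last_used = now;       /* hit: refresh its timestamp */
                return true;
            }
            if (!set[i].valid || set[i].last_used < set[victim].last_used)
                victim = i;                   /* remember the empty or least recently used way */
        }
        set[victim] = (struct way){ true, tag, now };   /* miss: replace the LRU way */
        return false;
    }

    int main(void)
    {
        uint32_t tags[] = { 1, 2, 3, 4, 1, 5, 2 };      /* example reference string */
        for (int i = 0; i < 7; i++)
            printf("tag %u -> %s\n", tags[i], access_set(tags[i]) ? "hit" : "miss");
        return 0;
    }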
- Data Caching
- Like instruction cache
- But we need to worry about writes (the two common policies are compared in the sketch below)
- Write Through - data is written to cache and to memory
- Write Back - data is written to cache only, and only written
to memory when the cache block is replaced.
- Again there are a number of considerations here, but we won't
worry about them too much. (Until next Fall)
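- A minimal sketch in C contrasting the two write policies on the same sequence of stores to one cached block (the counts are illustrative only): write-through sends every store to memory, while write-back only dirties the block and writes it out once, when the block is replaced.

    #include <stdio.h>
    #include <stdbool.h>

    int main(void)
    {
        const int stores_to_block = 10;   /* example: 10 stores hit the same block */

        /* Write-through: every store also goes to main memory. */
        int wt_mem_writes = stores_to_block;

        /* Write-back: stores only set a dirty bit; memory sees one write,
           when the dirty block is finally replaced. */
        bool dirty = false;
        int wb_mem_writes = 0;
        for (int i = 0; i < stores_to_block; i++)
            dirty = true;                 /* store hits in the cache, no memory traffic */
        if (dirty)
            wb_mem_writes++;              /* eviction writes the whole block back once */

        printf("write-through: %d memory writes\n", wt_mem_writes);
        printf("write-back:    %d memory write(s), at eviction\n", wb_mem_writes);
        return 0;
    }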
- Caches can be set up at different levels and speeds.
- The further from the processor, the bigger and slower the cache (a rough multi-level access-time estimate is sketched below).
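- A rough sketch in C of the average access time across two cache levels; all latencies and miss rates are assumed example numbers, not figures for the processors listed below.

    #include <stdio.h>

    int main(void)
    {
        double l1_time = 1.0,  l1_miss = 0.05;   /* small, fast, next to the CPU */
        double l2_time = 10.0, l2_miss = 0.20;   /* bigger and slower */
        double mem_time = 100.0;                 /* main memory, slower still */

        /* average = L1 hit time + L1 miss rate * (L2 time + L2 miss rate * memory time) */
        double avg = l1_time + l1_miss * (l2_time + l2_miss * mem_time);
        printf("average access time: %.2f cycles\n", avg);   /* 1 + 0.05 * 30 = 2.50 */
        return 0;
    }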
- Some processor information
- P4 - 1.5 to 2.4 GHz
- 512 KB cache
- 8-way set associative
- 20-stage pipeline
- UltraSparc III
- 700 to 900 MHz
- 32 KB instruction cache
- 64 KB data cache
- 8 MB external cache
- HP PA-8700
- 750 MHz
- 0.75 MB data cache
- 1.5 MB instruction cache
Machine        | Processor clock | SPECint 2000 | SPECfp 2000 |
Sun Blade 2050 | 900 MHz         | 537          | 610         |
HP 3700        | 750 MHz         | 568          | 604         |
Dell 340       | 2.2 GHz         | 790          | 811         |
- Newer (2003) cache information
- 3.06 GHz Xeon
- I don't completely trust the source for this one.
- I-cache: execution trace cache (12K micro-ops)
- L2: 512 KB, 8-way
- L1 D-cache: 8 KB, 4-way
- SPECint 2000: 1099
- SPECfp 2000: 1053
- 1 GHz Alpha
- 64 KB I-cache, 64 KB D-cache
- 8 MB L2
- SPECint 2000: 795
- SPECfp 2000: 1124
- 1.2 GHz UltraSparc III Cu
- 32 KB I-cache
- 64 KB D-cache
- 8 MB L2
- SPECint 2000: 642
- SPECfp 2000: 953