Pipelining

Let's look at the execution of an instruction
- Fetch the instruction
- Decode the instruction
- Fetch data
- Execute the instruction
- Store data
What parts of our simple machine are in use durring any phase?
If we made a change in our instruction set this would change somewhat
- The pc is independant of all other operations
- We do have some bus contention, but that can be worked out.
- We only use the ALU once
- Perhpas the registers twice
- But the memory is used multiple times
- Increase the number of registers
- The only instructions that interact with memory are load and store instructions
  - Load R1 R2, R3 R1<-M[R2+R3]
  - Store R1, R2, R3 M[R2+R3]<-R1
  - We want two registers for array access, base plus offset
- All other operations are register register
  - Add R1, R2, R3 R₁<-R₂+R₃
- We would eliminate one of the memory access if we did this
  - Fetch the Instruction
  - Decode the instruction
  - Execute the instruction
  - Memory Operation
  - Write Back
- Write back is for a load
  - Fetch the load
  - Decide it is a load
  - Compute the address of the data
  - load the data
  - Store it in the registers
- Store
  - F, D
  - Compute the Address
  - Save the Data
  - Nothing
- Add R1, R2, R3
  - F, D
  - Compute R2+R3
  - Nothing
  - Save the result in R1
- Jump addres
  - F, D,
  - Compute Offset of jump
  - Change PC
  - Nothing
Timing diagrams
- Instruction, time period: plot stage
- Stage, time period : plot instruction
- instruction, stage: plot time period
  - Trace each way
  - If we have a k-stage pipeline
    - The clock rate is t_p
    - It can execute a single stage in t_p
    - It can execute a single instruction in k*t_p
    - It can execute two instructions in k*t_p + t_p
    - This is really (k+n-1) * t_p
    - How would this work without a pipeline? n*k*t_p
    - One way to compute speedup is old time divided by new time.
    - This would be ((k+n-1)*t_p) / n*k*t_p
    - take out a t_p, and we ale left with nk/(k+n-1).
    - Take a limit as n goes to infinity, and we have k
    - This is idealized, and we will see why it doesn't work in the end.