Pipelining
- Let's look at the execution of an instruction
- Fetch the instruction
- Decode the instruction
- Fetch data
- Execute the instruction
- Store data
- What parts of our simple machine are in use during each phase?
- If we changed our instruction set, this would change somewhat
- The PC is independent of all other operations
- We do have some bus contention, but that can be worked out.
- We only use the ALU once
- Perhaps the registers twice
- But the memory is used multiple times
- Increase the number of registers
- The only instructions that interact with memory are load and store instructions
- Load R1, R2, R3: R1 <- M[R2+R3]
- Store R1, R2, R3: M[R2+R3] <- R1
- We want two registers for array access: base plus offset
- All other operations are register-register
- This would eliminate one of the memory accesses
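To make the load/store split concrete, here is a minimal sketch of such a machine. The class name `Machine`, the register and memory sizes, and setting registers directly from Python (the notes describe no load-immediate instruction) are all assumptions for illustration. Only `load` and `store` touch memory; `add` works purely on registers.

```python
class Machine:
    """Toy register-register machine; 8 registers and 64 memory words are
    arbitrary sizes chosen for illustration."""

    def __init__(self, num_regs: int = 8, mem_words: int = 64) -> None:
        self.regs = [0] * num_regs
        self.mem = [0] * mem_words

    def load(self, r1: int, r2: int, r3: int) -> None:
        # Load R1, R2, R3: R1 <- M[R2+R3]  (R2 = base, R3 = offset)
        self.regs[r1] = self.mem[self.regs[r2] + self.regs[r3]]

    def store(self, r1: int, r2: int, r3: int) -> None:
        # Store R1, R2, R3: M[R2+R3] <- R1
        self.mem[self.regs[r2] + self.regs[r3]] = self.regs[r1]

    def add(self, r1: int, r2: int, r3: int) -> None:
        # Add R1, R2, R3: R1 <- R2 + R3  (register-register, no memory access)
        self.regs[r1] = self.regs[r2] + self.regs[r3]


# Usage: a[2] = a[0] + a[1] for an array whose base address is 10.
m = Machine()
m.mem[10], m.mem[11] = 5, 7
m.regs[2] = 10      # R2 = base address (set directly; hypothetical setup step)
m.regs[3] = 0       # R3 = offset 0
m.load(4, 2, 3)     # R4 <- M[10]
m.regs[3] = 1       # R3 = offset 1
m.load(5, 2, 3)     # R5 <- M[11]
m.add(6, 4, 5)      # R6 <- R4 + R5  (registers only)
m.regs[3] = 2       # R3 = offset 2
m.store(6, 2, 3)    # M[12] <- R6
```

The add in the middle never touches `mem`, which is exactly the memory access the register-register design removes.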
- Fetch the instruction
- Decode the instruction
- Execute the instruction
- Memory operation
- Write back
- Load (uses all five stages, including write back)
- Fetch the load
- Decode: decide it is a load
- Compute the address of the data
- Load the data from memory
- Write it into the destination register
- Store
- F, D
- Compute the address
- Save the data
- Nothing
- Add R1, R2, R3
- F, D
- Compute R2+R3
- Nothing
- Save the result in R1
- Jump address
- F, D
- Compute the offset of the jump
- Change PC
- Nothing
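The four walk-throughs above (Load, Store, Add, Jump) can be collected into one table, one row per instruction and one column per stage. This is a sketch: the stage names follow the five-stage list, the action strings paraphrase the notes, and `None` stands for "Nothing". The names `STAGES`, `STAGE_ACTIONS`, and `trace` are hypothetical.

```python
# Five stages, in order, as listed in the notes.
STAGES = ("Fetch", "Decode", "Execute", "Memory", "Write back")

# Action per stage for each instruction; None means the stage does nothing.
STAGE_ACTIONS = {
    "Load":  ("fetch", "decode", "compute R2+R3 (address)", "read M[address]", "write R1"),
    "Store": ("fetch", "decode", "compute R2+R3 (address)", "write R1 to M[address]", None),
    "Add":   ("fetch", "decode", "compute R2+R3", None, "write R1"),
    "Jump":  ("fetch", "decode", "compute jump offset", "change PC", None),
}


def trace(op: str) -> list[str]:
    """Return one 'Stage: action' line per stage for the given instruction."""
    return [
        f"{stage}: {action if action is not None else 'nothing'}"
        for stage, action in zip(STAGES, STAGE_ACTIONS[op])
    ]


if __name__ == "__main__":
    for op in STAGE_ACTIONS:
        print(op)
        for line in trace(op):
            print("  " + line)
```

Reading down a column shows which instructions actually use that stage; for example, Add leaves the memory stage idle and Store leaves write back idle.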
- Timing diagrams
- Instruction, time period: plot stage
- Stage, time period : plot instruction
- instruction, stage: plot time period
- Trace each way
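Assuming an ideal pipeline with no stalls, instruction i is in stage s during time period i + s (counting from 0), so all three views of the timing diagram can be generated from that one rule. The letters F, D, E, M, W are shorthand for the five stages, and instructions and time periods are printed 1-based; the function names are hypothetical.

```python
# Shorthand for Fetch, Decode, Execute, Memory operation, Write back.
STAGES = ["F", "D", "E", "M", "W"]
K = len(STAGES)


def instr_by_time(n: int) -> list[list[str]]:
    """Instruction vs. time period; each cell is the stage, or '.' if idle."""
    return [[STAGES[t - i] if 0 <= t - i < K else "." for t in range(n + K - 1)]
            for i in range(n)]


def stage_by_time(n: int) -> list[list[str]]:
    """Stage vs. time period; each cell is the instruction number (1-based), or '.'."""
    return [[str(t - s + 1) if 0 <= t - s < n else "." for t in range(n + K - 1)]
            for s in range(K)]


def instr_by_stage(n: int) -> list[list[int]]:
    """Instruction vs. stage; each cell is the time period (1-based)."""
    return [[i + s + 1 for s in range(K)] for i in range(n)]


if __name__ == "__main__":
    for i, row in enumerate(instr_by_time(3), start=1):
        print(f"I{i}: " + " ".join(row))
    # I1: F D E M W . .
    # I2: . F D E M W .
    # I3: . . F D E M W
```

All three views hold the same information; each one simply puts a different pair of (instruction, stage, time period) on the axes.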
- If we have a k-stage pipeline
- The clock period (cycle time) is tp
- It can execute a single stage in tp
- It can execute a single instruction in k*tp
- It can execute two instructions in k*tp + tp
- In general, it can execute n instructions in (k+n-1) * tp
- How would this work without a pipeline? n instructions would take n*k*tp
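The two timing formulas written as functions. This is a sketch of the ideal model only (no stalls); tp can be in any time unit, and k = 5, tp = 2.0 in the usage lines are arbitrary values.

```python
def pipelined_time(n: int, k: int, tp: float) -> float:
    """Ideal k-stage pipeline: the first instruction takes k*tp, and each of
    the remaining n - 1 instructions completes one period tp later."""
    return (k + n - 1) * tp


def unpipelined_time(n: int, k: int, tp: float) -> float:
    """No pipeline: each instruction passes through all k stages before the next starts."""
    return n * k * tp


# Usage with arbitrary values k = 5 stages, tp = 2.0 time units.
print(pipelined_time(1, 5, 2.0))     # 10.0  (k*tp)
print(pipelined_time(2, 5, 2.0))     # 12.0  (k*tp + tp)
print(unpipelined_time(2, 5, 2.0))   # 20.0
```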
- One way to compute speedup is old time divided by new time.
- This would be (n*k*tp) / ((k+n-1)*tp)
- Cancel the tp, and we are left with nk/(k+n-1).
- Take the limit as n goes to infinity, and the speedup approaches k
- This is idealized; we will see later why real pipelines do not achieve it.
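A quick numerical check of the speedup nk/(k+n-1), still under the ideal no-stall assumption. `Fraction` keeps the ratio exact so the cancellation of tp is visible; k = 5 is an arbitrary stage count.

```python
from fractions import Fraction


def speedup(n: int, k: int) -> Fraction:
    """Old time / new time = (n*k*tp) / ((k+n-1)*tp) = nk/(k+n-1); tp cancels."""
    return Fraction(n * k, k + n - 1)


k = 5  # arbitrary stage count for illustration
for n in (1, 10, 100, 1000, 10**6):
    print(f"n = {n:>7}: speedup = {float(speedup(n, k)):.4f}")
```

The speedup is exactly 1 for a single instruction and climbs toward k but never reaches it, because the k - 1 periods spent filling the pipeline are only amortized away as n grows.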