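Pipelined CPU Lab Report
1. Lab Requirements
Implement a pipelined CPU on the basis of the single-cycle design,
and handle data hazards, control hazards, and so on.
The lab proceeds in incremental stages:
- Pipelined CPU (hazards not considered)
- Pipelined CPU (EXE-stage data hazard 1)
- Pipelined CPU (EXE-stage data hazard 2)
- Pipelined CPU (EXE-stage data hazard 3)
- Pipelined CPU (J, JAL, JR control hazards)
- Pipelined CPU (BEQ control hazard)
- Pipelined CPU (ID-stage data hazard introduced by BEQ)
2. Lab Procedure
2.1 Basic Modules
Modules carried over from the single-cycle CPU were already described in detail in that part, so they are not repeated here.
The modules below are the ones added for the pipelined CPU.
2.1.1 IF_ID pipeline register
Basic function
Holds the fields of the current instruction and the control signals generated for it.
Module signals
This register carries many signals, and its inputs and outputs largely mirror each other. Signals identical to the single-cycle design are omitted. The registers that follow are similar and, as time was short, are not described in detail.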
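Signal | Description |
---|---|
s_data_write | Register write-data select |
s_ext | Extension control signal |
s_b | ALU operand mux select |
aluop | ALU control signal |
s_num_write | Write register number |
mem_write | Memory write enable |
reg_write | Register write enable |
s_npc | Next-address select |
mem_read | Memory read signal |
rd | rd field |
rt | rt field |
rs | rs field |
Imm26 | 26-bit immediate |
Imm16 | 16-bit immediate |
pc_out | PC output |
Module code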
module if_id (
input if_id_flush,
input [31:0] pc_4,
input [1:0] s_npc_in,
output reg [1:0] s_npc_out,
input reg_write_in,
input s_ext_in,
input [1:0] s_num_write_in,
input [1:0] s_data_write_in,
input mem_write_in,
input mem_read_in,
output reg mem_read_out,
output reg reg_write_out,
output reg s_ext_out,
output reg [1:0] s_num_write_out,
output reg [1:0] s_data_write_out,
output reg mem_write_out,
input [3:0] aluop_in,
output reg [3:0] aluop_out,
input s_b_in,
output reg s_b_out,
input [31:0] instruction,
output reg [31:0] pc_4_out,
output reg [15:0] Imm_16,
output reg [25:0] Imm_26,
output reg [4:0] rs,
output reg [4:0] rd,
output reg [4:0] rt,
input clock,
input reset,
input if_id_write);
    // Clocked register: use non-blocking assignments throughout.
    // Mixing "=" and "<=" on the same outputs can cause simulation races.
    always @(posedge clock or negedge reset)
    begin
        if (!reset)
        begin
            // asynchronous, active-low reset
            pc_4_out        <= 32'h0000_0000;
            Imm_16          <= 16'h0000;
            Imm_26          <= {26{1'b0}};
            rs              <= 5'b00000;
            rt              <= 5'b00000;
            rd              <= 5'b00000;
            mem_read_out    <= 0;
            s_npc_out       <= 2'b01;
            reg_write_out   <= 1;
            mem_write_out   <= 0;
            s_num_write_out <= 2'b00;
        end
        else if (if_id_flush)
        begin
            // discard the instruction fetched behind a jump
            pc_4_out        <= 32'h0000_0000;
            Imm_16          <= 16'h0000;
            Imm_26          <= {26{1'b0}};
            rs              <= 5'b00000;
            rt              <= 5'b00000;
            rd              <= 5'b00000; // also cleared here, as on reset
            s_num_write_out <= 2'b00;
            mem_read_out    <= 0;
            s_npc_out       <= 2'b01;
            reg_write_out   <= 1;
            mem_write_out   <= 0;
        end
        else if (if_id_write)
        begin
            // normal operation: latch control signals and instruction fields
            s_npc_out        <= s_npc_in;
            mem_read_out     <= mem_read_in;
            aluop_out        <= aluop_in;
            s_b_out          <= s_b_in;
            reg_write_out    <= reg_write_in;
            s_ext_out        <= s_ext_in;
            s_num_write_out  <= s_num_write_in;
            s_data_write_out <= s_data_write_in;
            mem_write_out    <= mem_write_in;
            pc_4_out         <= pc_4;
            Imm_16           <= instruction[15:0];
            Imm_26           <= instruction[25:0];
            rs               <= instruction[25:21];
            rt               <= instruction[20:16];
            rd               <= instruction[15:11];
        end
        // if_id_write == 0: keep the current contents (stall)
    end
endmodule //if_id
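The priority that IF_ID applies (reset, then flush, then write, otherwise hold) can be shown on a much smaller register. The following is only a minimal sketch and not part of the original design: the module name `pipe_reg`, the parameters `WIDTH` and `BUBBLE`, and the ports `flush`, `write`, `d` and `q` are all hypothetical.

```verilog
// Hypothetical helper: one pipeline register with flush and stall.
module pipe_reg #(
    parameter WIDTH = 32,
    parameter [WIDTH-1:0] BUBBLE = {WIDTH{1'b0}}  // value that encodes a NOP
) (
    input                  clock,
    input                  reset,  // active low, asynchronous
    input                  flush,  // 1: replace contents with a bubble
    input                  write,  // 0: stall, keep current contents
    input      [WIDTH-1:0] d,
    output reg [WIDTH-1:0] q
);
    always @(posedge clock or negedge reset) begin
        if (!reset)
            q <= BUBBLE;
        else if (flush)
            q <= BUBBLE;      // flush wins over write
        else if (write)
            q <= d;
        // otherwise hold q (stall)
    end
endmodule
```

Giving flush priority over write matches the order of the `if_id_flush` and `if_id_write` checks in the module above.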
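2.1.2 ID_EXE pipeline register
Basic function
Latches the signals and data output by the ID stage
and presents them to the EXE stage.
Signals consumed in the ID stage do not need to enter this register; signals still needed by later stages do.
Module signals
Signal | Description |
---|---|
data_1_out | First value read |
data_2_out | Second value read |
Module code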
`include "ctrl_encode_def.v"
module id_exe (input [31:0] pc_4,
input [3:0] aluop_in,
input s_b_in,
output reg s_b_out,
input reg_write_in,
output reg reg_write_out,
output reg [3:0] aluop_out,
input [1:0] s_data_write_in,
output reg [1:0] s_data_write_out,
input mem_write_in,
output reg mem_write_out,
input mem_read_in,
output reg mem_read_out,
input [31:0] data_1,
input [31:0] data_2,
input [31:0] Imm_32,
input [4:0] num_write_in,
output reg [31:0] pc_4_out,
output reg [31:0] data_1_out,
output reg [31:0] data_2_out,
output reg [31:0] Imm_32_out,
output reg [4:0] num_write_out,
input [4:0] rs_in,
input [4:0] rt_in,
output reg [4:0] rs_out,
output reg [4:0] rt_out,
input clock,
input reset,
input id_exe_flush);
    // Clocked register: non-blocking assignments throughout.
    always @(posedge clock or negedge reset)
        if (!reset)
        begin
            pc_4_out         <= {32{1'b0}};
            data_1_out       <= {32{1'b0}};
            data_2_out       <= {32{1'b0}};
            Imm_32_out       <= {32{1'b0}};
            num_write_out    <= {5{1'b0}};
            reg_write_out    <= 0;
            mem_write_out    <= 0;
            mem_read_out     <= 0; // added: known value for the load-use check
            s_data_write_out <= 2'b00;
        end
        else if (id_exe_flush)
        begin
            // insert a bubble: no memory access and no register write
            mem_read_out  <= 0;
            reg_write_out <= 0;
            // s_b_out<=1;
            // s_data_write_out<=2'b00;
            aluop_out     <= `ALUOp_NOP;
            rs_out        <= {5{1'b0}};
            rt_out        <= {5{1'b0}};
            pc_4_out      <= {32{1'b0}};
            data_1_out    <= {32{1'b0}};
            data_2_out    <= {32{1'b0}};
            Imm_32_out    <= {32{1'b0}};
            num_write_out <= {5{1'b0}};
            mem_write_out <= 0;
        end
        else
        begin
            // normal operation: pass ID-stage values on to EXE
            mem_read_out     <= mem_read_in;
            rs_out           <= rs_in;
            rt_out           <= rt_in;
            reg_write_out    <= reg_write_in;
            s_b_out          <= s_b_in;
            s_data_write_out <= s_data_write_in;
            mem_write_out    <= mem_write_in;
            pc_4_out         <= pc_4;
            data_1_out       <= data_1;
            data_2_out       <= data_2;
            Imm_32_out       <= Imm_32;
            num_write_out    <= num_write_in;
            aluop_out        <= aluop_in;
        end
endmodule //id_exe
2.1.3 EXE_MEM pipeline register
Basic function
Much the same as the previous registers
Module signals
Essentially the same as before
Module code
module exe_mem (input [31:0] pc_4,
input [31:0] alu_in,
input [31:0] data_in,
input [1:0] s_data_write_in,
input mem_write_in,
input reg_write_in,
output reg reg_write_out,
output reg [1:0] s_data_write_out,
output reg mem_write_out,
input mem_read_in,
output reg mem_read_out,
input [4:0] num_write_in,
output reg [31:0] pc_4_out,
output reg [31:0] alu_out,
output reg [31:0] data_out,
output reg [4:0] num_write_out,
input clock,
input reset);
    // Clocked register: non-blocking assignments throughout.
    always @(posedge clock or negedge reset)
        if (!reset)
        begin
            pc_4_out         <= {32{1'b0}};
            alu_out          <= {32{1'b0}};
            data_out         <= {32{1'b0}};
            num_write_out    <= {5{1'b0}};
            mem_read_out     <= 0;
            mem_write_out    <= 0;
            reg_write_out    <= 0;
            s_data_write_out <= 2'b00;
        end
        else
        begin
            // pass EXE-stage values on to MEM
            mem_read_out     <= mem_read_in;
            reg_write_out    <= reg_write_in;
            s_data_write_out <= s_data_write_in;
            mem_write_out    <= mem_write_in;
            pc_4_out         <= pc_4;
            alu_out          <= alu_in;
            data_out         <= data_in;
            num_write_out    <= num_write_in;
        end
endmodule //exe_mem
2.1.4 MEM_WB pipeline register
Basic function
Same as before
Module signals
Essentially the same
Module code
module mem_wb (input [31:0] pc_4,
input [31:0] alu_in,
input [31:0] mem_in,
input [4:0] num_write_in,
input [1:0] s_data_write_in,
input reg_write_in,
output reg reg_write_out,
output reg [1:0] s_data_write_out,
output reg [31:0] pc_4_out,
output reg [31:0] alu_out,
output reg [31:0] mem_out,
output reg [4:0] num_write_out,
input clock,
input reset);
    // Clocked register: non-blocking assignments throughout.
    always @(posedge clock or negedge reset)
    begin
        if (!reset)
        begin
            pc_4_out         <= {32{1'b0}};
            alu_out          <= {32{1'b0}};
            mem_out          <= {32{1'b0}};
            num_write_out    <= {5{1'b0}};
            reg_write_out    <= 0;
            s_data_write_out <= 2'b00;
        end
        else
        begin
            // pass MEM-stage values on to WB
            reg_write_out    <= reg_write_in;
            s_data_write_out <= s_data_write_in;
            pc_4_out         <= pc_4;
            alu_out          <= alu_in;
            mem_out          <= mem_in;
            num_write_out    <= num_write_in;
        end
    end
endmodule //mem_wb
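2.1.5 SIDE_ROAD bypass signal unit
Basic function
Outputs the bypass (forwarding) select signals so that the correct source data is chosen
Module signals
Signal | Description |
---|---|
s_forwardA2 | Source select for read data A |
s_forwardB2 | Source select for read data B |
s_forwardA3 | ALU bypass source select for operand A |
s_forwardB3 | ALU bypass source select for operand B |
Module code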
`include "ctrl_encode_def.v"
module side_road (input [4:0] rs,
input [4:0] rt,
input [4:0] rs_in,
input [4:0] rt_in,
input [4:0] rd_exe_mem,
input [4:0] rd_mem_wb,
input [4:0] num_write_out,
input reg_write_exe_mem,
input reg_write_mem_wb,
input reg_write_id_exe,
output reg [1:0] s_forwardA3,
output reg [1:0] s_forwardB3,
output reg [1:0] s_forwardA2,
output reg [1:0] s_forwardB2);
wire flag_rs_exe_mem;
wire flag_rs_mem_wb;
wire flag_rt_exe_mem;
wire flag_rt_mem_wb;
wire flag_rt_write;
wire flag_rs_write;
assign flag_rs_exe_mem = (reg_write_exe_mem
&& (rd_exe_mem !== 5'b00000)
&& (rd_exe_mem === rs));
assign flag_rs_mem_wb = (reg_write_mem_wb
&& (rd_mem_wb!== 5'b00000)
&& rd_mem_wb === rs);
assign flag_rt_exe_mem = (reg_write_exe_mem
&& (rd_exe_mem !== 5'b00000)
&& (rd_exe_mem === rt));
assign flag_rt_mem_wb = (reg_write_mem_wb
&& (rd_mem_wb!== 5'b00000)
&& rd_mem_wb === rt);
assign flag_rt_write=(reg_write_id_exe
&&(num_write_out!==5'b00000)
&& num_write_out===rt_in);
assign flag_rs_write=(reg_write_id_exe
&&(num_write_out!==5'b00000)
&& num_write_out===rs_in);
always@(*)
begin
s_forwardA3=(flag_rs_mem_wb)?
(flag_rs_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_MEM_WB):
(flag_rs_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_GPR);
s_forwardB3=(flag_rt_mem_wb)?
(flag_rt_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_MEM_WB):
(flag_rt_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_GPR);
s_forwardA2=(flag_rs_write)?`SIDE_ROAD_MEM_WB:`SIDE_ROAD_GPR;
s_forwardB2=(flag_rt_write)?`SIDE_ROAD_MEM_WB:`SIDE_ROAD_GPR;
end
endmodule //side_road
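2.1.6 HAZARD stall signal generation unit
Basic function
Generates the stall signals that hold the pipeline so that every data selection is correct and data hazards are avoided
Module signals
Signal | Description |
---|---|
pc_write | PC write enable |
if_id_write | IF_ID write enable |
id_exe_flush | ID_EXE flush signal (inserts a bubble during a stall) |
if_id_flush | IF_ID flush signal |
Module code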
module hazard(input mem_read,
input [4:0] rs,
input [4:0] rt,
input [4:0] rt_id_exe,
output reg pc_write,
output reg if_id_write,
output reg id_exe_flush,
input reset,
input clock,
input [1:0] s_npc,
output reg if_id_flush);
    wire flag;    // load-use hazard: the load in EXE writes a register read in ID
    wire flag_j;  // s_npc selects a jump target
    assign flag   = mem_read && (rt_id_exe == rs || rt_id_exe == rt) && (rt_id_exe != 5'b00000);
    assign flag_j = (s_npc == 2'b00) || (s_npc == 2'b10);
    always @(*)
        if (!reset)
        begin
            pc_write     = 1;
            if_id_write  = 1;
            id_exe_flush = 0;
            if_id_flush  = 0;
        end
        else
        begin
            if (flag)
            begin
                // stall: freeze PC and IF_ID, insert a bubble into ID_EXE
                pc_write     = 0;
                if_id_write  = 0;
                id_exe_flush = 1;
            end
            else
            begin
                pc_write     = 1;
                if_id_write  = 1;
                id_exe_flush = 0;
            end
            // discard the instruction fetched behind a jump
            if (flag_j)
                if_id_flush = 1;
            else
                if_id_flush = 0;
        end
endmodule
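One way to check the stall condition in isolation is a small testbench around the `hazard` module above. This is a sketch under stated assumptions: the testbench name `tb_hazard` and the stimulus values are made up for illustration, and only the port names are taken from the module. It also assumes that `s_npc = 2'b01` means sequential execution, as the IF_ID reset value suggests.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; requires the hazard module from this section.
module tb_hazard;
    reg        mem_read, reset, clock;
    reg  [4:0] rs, rt, rt_id_exe;
    reg  [1:0] s_npc;
    wire       pc_write, if_id_write, id_exe_flush, if_id_flush;

    hazard dut (
        .mem_read(mem_read), .rs(rs), .rt(rt), .rt_id_exe(rt_id_exe),
        .pc_write(pc_write), .if_id_write(if_id_write),
        .id_exe_flush(id_exe_flush), .reset(reset), .clock(clock),
        .s_npc(s_npc), .if_id_flush(if_id_flush)
    );

    initial begin
        clock = 0;
        reset = 1;          // reset is active low, so 1 means "running"
        s_npc = 2'b01;      // assumed: sequential execution

        // lw $5,8($6) in EXE, sub $8,$9,$5 in ID: rt of sub matches
        mem_read = 1; rt_id_exe = 5'd5; rs = 5'd9; rt = 5'd5;
        #1;
        if (pc_write !== 1'b0 || if_id_write !== 1'b0 || id_exe_flush !== 1'b1)
            $display("FAIL: load-use stall not raised");
        else
            $display("PASS: load-use stall raised");

        // no load in EXE: the pipeline must run freely
        mem_read = 0;
        #1;
        if (pc_write !== 1'b1 || if_id_write !== 1'b1 || id_exe_flush !== 1'b0)
            $display("FAIL: unexpected stall");
        else
            $display("PASS: no stall without a load");

        // a jump in ID must flush IF_ID
        s_npc = 2'b00;
        #1;
        if (if_id_flush !== 1'b1)
            $display("FAIL: jump did not flush IF_ID");
        else
            $display("PASS: jump flushes IF_ID");

        $finish;
    end
endmodule
```

The first stimulus mirrors the `lw`/`sub` pair used later as the test case for EXE data hazard 2.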
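2.2 Hazard Handling
2.2.1 EXE data hazard 1
Cause
The EXE stage works on values read from the register file; if a value read is not the latest one, a data hazard occurs
Solution
Add bypass paths and select the appropriate ALU inputs
The bypass conditions are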
assign flag_rs_exe_mem = (reg_write_exe_mem
&& (rd_exe_mem !== 5'b00000)
&& (rd_exe_mem === rs));
assign flag_rs_mem_wb = (reg_write_mem_wb
&& (rd_mem_wb!== 5'b00000)
&& rd_mem_wb === rs);
assign flag_rt_exe_mem = (reg_write_exe_mem
&& (rd_exe_mem !== 5'b00000)
&& (rd_exe_mem === rt));
assign flag_rt_mem_wb = (reg_write_mem_wb
&& (rd_mem_wb!== 5'b00000)
&& rd_mem_wb === rt);
s_forwardA3=(flag_rs_mem_wb)?
(flag_rs_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_MEM_WB):
(flag_rs_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_GPR);
s_forwardB3=(flag_rt_mem_wb)?
(flag_rt_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_MEM_WB):
(flag_rt_exe_mem?`SIDE_ROAD_EXE_MEM:`SIDE_ROAD_GPR);
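The select signal then drives a mux in front of the ALU. The following is a minimal sketch of such a mux for operand A. The module name, the data port names and the three encodings are assumptions; the real values of `SIDE_ROAD_GPR`, `SIDE_ROAD_EXE_MEM` and `SIDE_ROAD_MEM_WB` are defined in `ctrl_encode_def.v`, which is not shown in this report.

```verilog
// Hypothetical ALU operand-A mux driven by s_forwardA3.
module alu_src_a_mux (
    input      [1:0]  s_forwardA3,
    input      [31:0] data_id_exe,  // value read from the register file
    input      [31:0] alu_exe_mem,  // ALU result held in EXE_MEM
    input      [31:0] data_mem_wb,  // write-back value held in MEM_WB
    output reg [31:0] alu_a
);
    // Assumed encodings, for illustration only.
    localparam GPR     = 2'b00;
    localparam EXE_MEM = 2'b01;
    localparam MEM_WB  = 2'b10;

    always @(*) begin
        case (s_forwardA3)
            EXE_MEM: alu_a = alu_exe_mem;  // newest value
            MEM_WB:  alu_a = data_mem_wb;  // one instruction older
            default: alu_a = data_id_exe;  // no hazard
        endcase
    end
endmodule
```

When both stages hold a match, the conditions above pick EXE_MEM, because that stage holds the more recent write to the register.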
Test results and simulation waveform
The test code is
addu $5,$6,$7
sub $8,$9,$5
or $2,$5,$3
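The sub and or instructions both read $5, which addu has not yet written back, so a data hazard occurs
Bypassing takes place in clock cycles 3 and 4, selecting the EXE-stage output and the MEM-stage output respectively
The test results are computed correctly
2.2.2 EXE data hazard 2
Cause
The new register value is data read from DM, so it is not available until the MEM stage and bypassing alone cannot supply it in time
Solution
Generate a stall signal that holds the pipeline for one cycle, then continue
The stall signal freezes the PC and IF_ID (pc_write = 0, if_id_write = 0) and clears the ID_EXE control signals (id_exe_flush = 1)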
        else if (id_exe_flush)
        begin
            // insert a bubble: no memory access and no register write
            mem_read_out  <= 0;
            reg_write_out <= 0;
            // s_b_out<=1;
            // s_data_write_out<=2'b00;
            aluop_out     <= `ALUOp_NOP;
            rs_out        <= {5{1'b0}};
            rt_out        <= {5{1'b0}};
            pc_4_out      <= {32{1'b0}};
            data_1_out    <= {32{1'b0}};
            data_2_out    <= {32{1'b0}};
            Imm_32_out    <= {32{1'b0}};
            num_write_out <= {5{1'b0}};
            mem_write_out <= 0;
        end
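The stall also depends on the PC keeping its value while `pc_write` is 0. A minimal sketch of such a PC register follows. The module name `pc_reg`, the port name `npc` and the reset address of zero are assumptions, since the report does not show the PC module.

```verilog
// Hypothetical PC register with a write enable.
module pc_reg (
    input             clock,
    input             reset,     // active low, asynchronous
    input             pc_write,  // 0 while the hazard unit stalls
    input      [31:0] npc,       // next PC chosen by the next-address logic
    output reg [31:0] pc
);
    always @(posedge clock or negedge reset) begin
        if (!reset)
            pc <= 32'h0000_0000;  // assumed reset address
        else if (pc_write)
            pc <= npc;
        // pc_write == 0: hold, so the same instruction is fetched again
    end
endmodule
```

Because IF_ID is held in the same cycle, the instruction in ID is simply decoded again one cycle later, after the bubble has moved into EXE.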
Test results and simulation waveform
The test example is
lw $5,8($6)
sub $8,$9,$5
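The sub instruction has a data hazard
The stall occurs in the third clock cycle
2.2.3 EXE data hazard 3
Cause
A data hazard arises because a register is only written at the end of the WB cycle
1. The idealized register file in the textbook writes in the first half of a cycle and reads in the second half, so this hazard does not exist there.
2. In the actual implementation the register file is written at the end of the WB stage, so the add instruction reads the old value of register $5 in clock 5, which is a data hazard.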
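For comparison, the textbook behaviour in point 1 can be approximated by a register file that passes the value being written straight through to its read ports. This is an alternative technique shown only for illustration; it is not the approach taken in this report, whose own fix is described next. The module and port names below are hypothetical.

```verilog
// Hypothetical register file with internal write-through.
module gpr_write_through (
    input         clock,
    input         reg_write,
    input  [4:0]  num_write,
    input  [31:0] data_write,
    input  [4:0]  rs,
    input  [4:0]  rt,
    output [31:0] data_1,
    output [31:0] data_2
);
    reg [31:0] regs [0:31];
    integer i;

    // Simulation-only initialisation so that reads never return X.
    initial begin
        for (i = 0; i < 32; i = i + 1)
            regs[i] = 32'h0000_0000;
    end

    // The write itself still happens on the clock edge that ends WB.
    // Register 0 is never written, so it always reads as zero.
    always @(posedge clock) begin
        if (reg_write && num_write != 5'b00000)
            regs[num_write] <= data_write;
    end

    // Write-through: reading the register being written returns the new value.
    wire hit_1 = reg_write && (num_write != 5'b00000) && (num_write == rs);
    wire hit_2 = reg_write && (num_write != 5'b00000) && (num_write == rt);

    assign data_1 = hit_1 ? data_write : regs[rs];
    assign data_2 = hit_2 ? data_write : regs[rt];
endmodule
```

The comparison performed by `hit_1` and `hit_2` is the same kind of check as a bypass condition, only placed inside the register file instead of in a separate unit.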
Solution
Generate two bypass signals in the ID stage to select the correct data source
The conditions are
assign flag_rt_write=(reg_write_id_exe
&&(num_write_out!==5'b00000)
&& num_write_out===rt_in);
assign flag_rs_write=(reg_write_id_exe
&&(num_write_out!==5'b00000)
&& num_write_out===rs_in);
s_forwardA2=(flag_rs_write)?`SIDE_ROAD_MEM_WB:`SIDE_ROAD_GPR;
s_forwardB2=(flag_rt_write)?`SIDE_ROAD_MEM_WB:`SIDE_ROAD_GPR;
Test results and simulation waveform
Test program
addu $5,$6,$7
sub $8,$9,$10
or $2,$1,$3
add $11,$12,$5
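In the fifth clock cycle the bypass takes effect, and the second read operand is taken from the value being written to the GPR
The data is now selected correctly
and the correct result is obtained
2.2.4 J, JAL, JR control hazards
Cause
By the time a jump instruction has determined its target address, the following instruction has already been fetched, so a flush signal must be added to discard it
Solution
Add a flush signal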
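        if (if_id_flush)
        begin
            // discard the instruction fetched behind a jump
            pc_4_out        <= 32'h0000_0000;
            Imm_16          <= 16'h0000;
            Imm_26          <= {26{1'b0}};
            rs              <= 5'b00000;
            rt              <= 5'b00000;
            rd              <= 5'b00000; // also cleared here, as on reset
            s_num_write_out <= 2'b00;
            mem_read_out    <= 0;
            s_npc_out       <= 2'b01;
            reg_write_out   <= 1;
            mem_write_out   <= 0;
        end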
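While the fetched instruction is discarded, the next PC has to be redirected to the jump target. The sketch below shows one possible next-PC mux. The encodings `2'b00` for J/JAL and `2'b10` for JR are assumptions: the report only shows that `2'b01` is the default value and that `2'b00` and `2'b10` trigger the flush. The module and port names are hypothetical as well.

```verilog
// Hypothetical next-PC mux for J, JAL and JR.
module npc_mux (
    input      [1:0]  s_npc,
    input      [31:0] pc_4_if,  // PC + 4 of the instruction being fetched
    input      [31:0] pc_4_id,  // PC + 4 of the jump in ID (pc_4_out of IF_ID)
    input      [25:0] Imm_26,   // jump index field
    input      [31:0] data_rs,  // register value used by JR
    output reg [31:0] npc
);
    always @(*) begin
        case (s_npc)
            // MIPS J/JAL: upper 4 bits of PC + 4, then the index shifted left by 2
            2'b00:   npc = {pc_4_id[31:28], Imm_26, 2'b00};  // assumed: J / JAL
            2'b10:   npc = data_rs;                          // assumed: JR
            default: npc = pc_4_if;                          // sequential
        endcase
    end
endmodule
```

The branch case (`2'b11`) is left out here and falls through to the sequential address.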
Test results and simulation waveform
The test example is
add $4,$8,$9
j abc
or $6,$1,$3
abc:
add $11,$12,$5
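In the third clock cycle a flush signal is generated
It clears the instruction that has already been fetched
Finally, the PC value can be seen to be correct, with the jump landing on the intended address
This matches the result in MARS
2.2.5 BEQ control hazard
Cause
beq determines its next address in the ID stage, so when the branch is taken the instruction already fetched must be cleared
Solution
Only one more condition needs to be added
When the two register values are equal, zero is 1 and the next address is the one computed by beq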
assign flag_j=(s_npc==2'b00)||(s_npc==2'b10)||(s_npc==2'b11&&zero==1);
The flush must also take stalls into account: it is raised only when no stall (flag) is active
if(flag_j&&!flag)
begin
if_id_flush=1;
end
else
begin
if_id_flush=0;
end
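The `zero` signal and the branch target can both be produced in the ID stage. The following is a minimal sketch assuming standard MIPS `beq` semantics (target = PC + 4 + sign-extended offset shifted left by 2). The module name `branch_unit` and the port `branch_target` are hypothetical; `data_1`, `data_2`, `pc_4` and `Imm_16` reuse names from the pipeline registers above.

```verilog
// Hypothetical ID-stage branch logic for beq.
module branch_unit (
    input  [31:0] data_1,        // first register operand, after any ID bypass
    input  [31:0] data_2,        // second register operand, after any ID bypass
    input  [31:0] pc_4,          // PC + 4 of the beq (from IF_ID)
    input  [15:0] Imm_16,        // branch offset in words
    output        zero,
    output [31:0] branch_target
);
    // zero is 1 when the two operands are equal
    assign zero = (data_1 == data_2);

    // sign-extend the offset to 30 bits and append 2'b00 (shift left by 2)
    assign branch_target = pc_4 + {{14{Imm_16[15]}}, Imm_16, 2'b00};
endmodule
```

Comparing in ID rather than in the ALU is what limits the penalty of a taken branch to the single flushed instruction.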
Test results and simulation waveform
Test example
Loop:
add $4,$4,$1
beq $4,$7,Loop_end
sub $8,$8,$8
j Loop
Loop_end:
add $5,$4,$1
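This is a simple assembly program I wrote myself: it increments register $4 (by the value in $1), compares the result with register $7, leaves the loop if they are equal, and then stores $4 plus one in register $5
When beq is executed, the pipeline stalls and a flush signal is generated
If the register values differ, the PC takes pc+4
If they are equal, npc is computed in the same clock cycle
The final register values are
2.2.6 ID-stage data hazard introduced by BEQ
Cause
Resolving beq early, in the ID stage, introduces a new data hazard
Solution
Add bypass select signals in the ID stage and implement the corresponding stall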
assign flag = (mem_read)&& (rt_id_exe == rs || rt_id_exe == rt)&&(rt_id_exe!={5{1'b0}})
||(s_npc==2'b11&&(rs_id_exe==rs||rs_id_exe==rt));
assign flag_rs_id_exe=(!id_exe_flush&&s_npc==2'b11
&&(num_write_out_exe_mem!=5'b00000)
&& num_write_out_exe_mem==rs_in);
assign flag_rt_id_exe=(!id_exe_flush&&s_npc==2'b11
&&(num_write_out_exe_mem!=5'b00000)
&& num_write_out_exe_mem==rt_in);
s_forwardA2=(flag_rs_id_exe)?`SIDE_ROAD_EXE_MEM:
(flag_rs_write?`SIDE_ROAD_MEM_WB:`SIDE_ROAD_GPR);
s_forwardB2=(flag_rt_id_exe)?`SIDE_ROAD_EXE_MEM:
(flag_rt_write?`SIDE_ROAD_MEM_WB:`SIDE_ROAD_GPR);
Test results and simulation waveform
The simulated program computes a third-order Fibonacci sequence
.text
addi $t5,$t5,40 # $t5 = $t5 + 40 (sequence size)
li $t2, 1 # $t2 = 1
sw $t2, 0($t0) # store F[0] with 1
sw $t2, 4($t0) # store F[1] with 1
sw $t2, 8($t0) # store F[2] with 1
ori $t6, $zero, 3 # $t6 = 3
subu $t1, $t5, $t6 # the number of loop is (size-3)
ori $t7, $zero, 1 # the last loop: $t7 = 1
addi $t0, $t0, 12 # point to F[3]
Loop:
slt $t4, $t1, $t7 # $t4 = ($t1 < 1) ? 1 : 0
beq $t4, $t7, Loop_End # repeat if not finished yet
lw $a0, -12($t0) # $a0 = F[n-3]
lw $a1, -8($t0) # $a1 = F[n-2]
lw $a2, -4($t0) # $a2 = F[n-1]
jal fibonacci # F[n] = fibonacci( F[n-3], F[n-2], F[n-1] )
sw $v0, 0($t0) # store F[n]
addi $t0, $t0, 4 # $t0 point to next element
addi $t1, $t1, -1 # loop counter decreased by 1
j Loop
Loop_End:
lui $t6, 0xABCD # $t6 = 0xABCD0000
sw $t6, 0($t0) # *$t0 = $t6
Loop_Forever:
j Loop_Forever # loop forever
fibonacci :
addu $v0, $a0, $a1 # $v0 = x + y
addu $v0, $v0, $a2 # $v0 = x + y + z
jr $ra # return
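The simulation shows that the jumps are performed correctly and that every register holds the correct value
3. Lab Summary
What I learned
I found by chance that ModelSim appears to offer a direct way to view dataflow, which can be used to observe the data flow and how the modules are connected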
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/IF_ID
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/ID_EXE
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/EXE_MEM
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/MEM_WB
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/SIDE_ROAD
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/HAZARD
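The design can then be inspected under View -> Dataflow, which is very convenient
The top-level module can of course also be added directly to view the whole design, with no need to open Quartus for analysis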
add dataflow sim:/tb_pipeline_cpu/PIPELINE_CPU/*
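Problems encountered and how they were solved
At first I could not make sense of the test cases provided on the lab platform. I kept trying to read the machine code on the platform and work out the corresponding instructions. When I asked the teacher for help, I was told that I could write my own assembly test cases. I did not quite understand this at first. Later I realised that no decoder for the machine code was provided because there is no need for one: machine code is meant for the computer to execute and does not have to be readable by people
So I learned to test with MARS and to write my own tests that produce the expected results,
and how to make the tests progressively more complex while debugging the program
With exams approaching there is a great deal to do and the revision load is heavy; we need to persevere and keep moving forward under that load!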