CSE x25 Lab Assignment 3
Welcome to CSE 125/225! Each lab is formed from one or more parts. Each part is
relatively independent and parts can normally be completed in any order. Each part will teach a
concept by implementing multiple modules.
Each lab will be graded on multiple categories: correctness, style/lint, git hygiene, and
demonstration. Correctness will be assessed by our autograder. Lint will be assessed by
Verilator (make lint). Style and hygiene will be graded by the TAs.
To run the test scripts in this lab, run make test from one of the module directories.
This will run (many) tests against your solution in two simulators: Icarus Verilog, and Verilator.
Both will generate waveform files (.fst) in a directory: run/<test_name and parameter
values>/<simulator name>. You will need to run make extraclean after make test
to clean your previous output files.
You may use any waveform viewer you would like. The codespace has Surfer installed.
It is also an excellent web-based viewer. You can also download GTKWave, which is a bit more
finicky.
Each Part will have a demonstration component that must be shown to a TA or instructor
for credit. We may manually grade style/lint after the assignment deadline. Style checking and
linting can feel pedantic, but they are standard practice in industry.
At any time you can run make help from inside one of the module directories to find all
of the commands you can run.
When you have questions, please ask. Otherwise, happy hardware hacking!
Assignment 3 Repository Link: https://classroom.github.com/a/jgWdhG0R
Part 1
Part 2
Part 3
Part 4
Part 1: Memories as LUTs as Programmable Logic
Déjà vu anyone?
We like to treat FPGAs as a “sea of gates”, but they are not. They are actually made up
of discrete elements, like look-up-tables (LUTs) (Xilinx/AMD, Lattice), multiplexers (Xilinx/AMD),
multipliers (Xilinx/AMD, Lattice) and memories (Xilinx/AMD). The job of the Electronic Design
Automation (EDA) toolchain is to synthesize SystemVerilog into these discrete elements and
then program them.
In this part, the objective is to use actual FPGA primitives to re-create the logic functions
you completed in Lab 1 and 2. In effect, do synthesis by hand.
The following is the instantiation template for an AMD/Xilinx LUT6 module:
module LUT6
#(parameter [63:0] INIT = 64'h0000000000000000)
(output O
,input I0
,input I1
,input I2
,input I3
,input I4
,input I5);
The Look-up-Table (LUT) operates by using I0 through I5 as a 6-bit address that indexes
the bits in the INIT parameter to produce O. For example, if I5 - I0 have the values {1'b0, 1'b0,
1'b0, 1'b1, 1'b0, 1'b0}, O will be the value at index 4 in INIT (0 in this example). Edit/Update:
Remember that index 0 of the bit string 2'b01 is 1 (not 0); index 0 is the rightmost bit.
We do not use the Xilinx LUT6 in this lab; we use the SB_LUT4 on your ICE40 FPGA.
This is the SB_LUT4 definition:
module SB_LUT4 (
output O,
input I0,
input I1,
input I2,
input I3
);
parameter [15:0] LUT_INIT = 0;
Like in the Xilinx example, LUT_INIT is the LUT initialization string. The Look-up-Table
(LUT) operates by using I0 through I3 as a 4-bit address that indexes the bits in the LUT_INIT
parameter to produce O.
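As a minimal sketch of how an initialization string is derived (an illustration only, not one of the graded modules), the block below builds a 2-input AND gate. Here lut4_model is a hypothetical behavioral stand-in for a 4-input LUT, not the provided SB_LUT4 primitive. The names and2_lut, init_p, addr_i, a_i, b_i, and c_o are assumptions made for this example.

```systemverilog
// Hypothetical behavioral stand-in for a 4-input LUT (NOT the vendor primitive).
module lut4_model
  #(parameter [15:0] init_p = 16'h0000)
  (input  [3:0] addr_i
  ,output       o);
  // The four inputs form an address; the output is that bit of init_p.
  wire [15:0] table_w = init_p;
  assign o = table_w[addr_i];
endmodule

// 2-input AND built from the stand-in LUT (assumed names throughout).
// Truth table with the two unused inputs tied low:
//   addr 0 (b=0,a=0) -> 0
//   addr 1 (b=0,a=1) -> 0
//   addr 2 (b=1,a=0) -> 0
//   addr 3 (b=1,a=1) -> 1
// Only bit 3 of the init string is set, so init_p = 16'h0008.
module and2_lut
  (input  a_i
  ,input  b_i
  ,output c_o);
  lut4_model #(.init_p(16'h0008)) lut_i
    (.addr_i({1'b0, 1'b0, b_i, a_i})
    ,.o(c_o));
endmodule
```

The same recipe applies to any function: write out the truth table in address order, then read the output column from the highest address down to address 0 to get the bits of the init string.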
Please complete the following parts using the primitives dictated. All of the FPGA
primitives are available in the provided folder.
● xor2: Using only the Lattice SB_LUT4 module, create a 2-input Exclusive-Or module.
● xnor2: Using only the Lattice SB_LUT4 module, create a 2-input Exclusive-Nor module.
● mux2: Using only the Lattice SB_LUT4 module, create a 2-input multiplexer module.
● full_add: Using only the Lattice ICESTORM_LC module, create a full_add module.
The documentation for this module is here (Page 2-2). Another good reference is here.
You will need to produce sum_o using the internal Look-up-Table and inputs I1, I2, and
CIN. These inputs also connect to “hardened” carry logic (It looks like a mux in the
diagram). These are the relevant lines in the provided module:
wire mux_cin = CIN_CONST ? CIN_SET : CIN;
assign COUT = CARRY_ENABLE ? (I1_pd && I2_pd) || ((I1_pd || I2_pd) &&
mux_cin) : 1'bx;
LUT_INIT, CIN_SET, and CARRY_ENABLE are the three key parameters to set. All the
remaining parameters can be ignored.
● We are removing this module to simplify Lab 3.
adder: Using only the Xilinx CARRY4 module, create a parameterized adder in
adder.sv. You should use a generative for-loop. You will need to handle arbitrary
width_p values. This document has a good (English) description of the ports on Page 44.
CARRY4 adds the inputs S[3:0] and DI[3:0], and produces O[3:0]. However, not all
adds are 4 bits, so the carry output from each bitwise addition is in CO[3:0]. If you are
chaining two CARRY4s together, you will use the MSB of CO (CO[3]) in the first CARRY4
as the input to CI of the second CARRY4. If you are adding two 3-bit values, you will
take CO[2] of the first CARRY4, to get the MSB (bit 3) of the addition.
Here is an example of using the CARRY4 (Hopefully, this is a big hint):
CARRY4
CARRY4_i
(.CO(wCarryOut[4*i+4-1:4*i]), // 4-bit carry out
.O(wResult[4*i+4-1:4*i]), // 4-bit carry chain sum
.CI(wCascadeIn[i]), // 1-bit carry cascade input
.CYINIT(1'b0), // 1-bit carry initialization
.DI(wInputB[4*i+4-1:4*i]), // 4-bit carry-MUX data in
.S(wInputA[4*i+4-1:4*i])); // 4-bit carry-MUX select input
● triadder: Sometimes, ripple-carry-adders (chaining the c_o from one Full-Adder to c_i
in the next Full-Adder) are suboptimal. For example, adding three numbers together
requires two ripple carry adders – the longest path through the circuit would be through
all the carry bits. Fortunately, there’s a “faster” approach.
Implement a 3-way adder without using the Verilog + operator more than once. You
should use the full_add module (either the one above, or the one from the previous
lab). The key technique here is to use a 3:2 compressor, and then add the resulting two
outputs (sum and carry) using an adder. This is a good reference: link
TL;DR: Use the full_add module as a 3:2 compressor, and then add the resulting sum and
carry to get the final result. This technique generalizes to N inputs if you read further in the
link above.
● shift: Using only the Lattice ICESTORM_LC module, create a parameterized shift
register identical to Lab 1. The shift register should shift left, and shift in d_i to the
low-order-bit, on the positive edge of clk_i, when enable_i == 1.
The documentation for the ICESTORM_LC module is here (Page 2-2). Another good
reference is here.
These are the key lines from the provided module:
always @(posedge polarized_clk)
if (CEN_pu)
o_reg <= SR_pd ? SET_NORESET : lut_o;
assign O = DFF_ENABLE ? ASYNC_SR ? o_reg_async : o_reg : lut_o;
The key ports for this module are I0, O, CLK, CEN and SR. The key parameters for this
module are: LUT_INIT (How do you use the LUT_INIT to pass through I0, unmodified?
How do you use it to make a mux, to select the d_i input?), SET_NORESET (Related to
reset_val_p), and DFF_ENABLE (parameter for enabling the D-Flip-Flop). NEG_CLK and
ASYNC_SR must be left at their default values.
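For the triadder item above, the 3:2 compression idea can be sketched with bitwise operators on a fixed 4-bit width. This is only an illustration of the arithmetic, not the required structure: the assignment asks for full_add instances and parameterized widths. The module name csa3_sketch and its port names are assumptions.

```systemverilog
// Sketch of 3:2 compression followed by a single addition (assumed names).
module csa3_sketch
  (input  [3:0] a_i
  ,input  [3:0] b_i
  ,input  [3:0] c_i
  ,output [5:0] sum_o);

  // Per-bit full-adder equations applied across the whole word.
  // No carry ripples between bit positions in this step.
  wire [3:0] partial_sum_w = a_i ^ b_i ^ c_i;
  wire [3:0] carry_w       = (a_i & b_i) | (a_i & c_i) | (b_i & c_i);

  // Each carry bit has twice the weight of its sum bit, so it is shifted
  // left by one before the only use of the + operator.
  assign sum_o = {2'b00, partial_sum_w} + {1'b0, carry_w, 1'b0};
endmodule
```

Worked check: with a_i = b_i = c_i = 4'd15, partial_sum_w is 15 and carry_w is 15, so sum_o = 15 + 30 = 45, which matches 15 + 15 + 15. Only the final addition has a carry chain, which is why this arrangement shortens the longest path.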
Demonstration (All Students):
There is no demonstration for this part.
Part 2: Asynchronous Memories in SystemVerilog
Let’s get some practice with memories. Instead of using the LUTs in the FPGA, let’s use
verilog to describe memories. Since these memories are asynchronous-read, they are
synthesized to registers in the actual fabric.
● ram_1r1w_async: Using behavioral SystemVerilog, create a read-priority (aka read-first)
asynchronous memory with 1 write port and 1 read port. It must implement the
parameters width_p and depth_p. Read priority means that reads get the old write
data when there is an address collision (i.e. the read happens first).
Your asynchronous memory should initialize using the function $readmemh.
When simulating, you will see a warning like this in Icarus: FST warning: array
word ram_1r1w_async.ram[10] will conflict with an escaped identifier. This is OK.
● hex2ssd: Using your ram_1r1w_async, create a module that converts a 4-bit
hexadecimal number into a seven-segment display encoding.
Commit your memory initialization file along with your solution.
● kpyd2hex: Using your ram_1r1w_async memory, create a module that converts from a
keypad (Row, Column) output to a hexadecimal value.
kpyd_i is the one-hot encoding of the row value in the high-order bits, and the column
value in the low-order bits. I think the Icebreaker PMOD pin definitions are swapped
relative to the Digilent PMOD definitions. My solution treats Column 1 as 0001, corresponding
to the column with 1/4/7/0, and Row 1 as 0001, corresponding to the row 1/2/3/A.
Commit your memory initialization file along with your solution. You will need to copy
both .hex files into this directory for it to compile to your FPGA.
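To make the read-priority and $readmemh requirements above concrete, here is a rough sketch of an asynchronous-read memory. It is not the reference solution. The port names, the module name ram_async_sketch, and the file name memory_init_file.hex are assumptions, so check the provided module skeleton for the real interface.

```systemverilog
// Sketch only: port names and the init file name are hypothetical.
module ram_async_sketch
  #(parameter width_p    = 8
   ,parameter depth_p    = 16
   ,parameter filename_p = "memory_init_file.hex")
  (input                        clk_i
  ,input                        wr_valid_i
  ,input  [width_p-1:0]         wr_data_i
  ,input  [$clog2(depth_p)-1:0] wr_addr_i
  ,input  [$clog2(depth_p)-1:0] rd_addr_i
  ,output [width_p-1:0]         rd_data_o);

  logic [width_p-1:0] ram [depth_p-1:0];

  // Load initial contents from a text file of hex words,
  // one word per line, filling addresses upward from 0.
  initial $readmemh(filename_p, ram);

  // Writes are committed on the positive clock edge.
  always_ff @(posedge clk_i)
    if (wr_valid_i) ram[wr_addr_i] <= wr_data_i;

  // Asynchronous read: purely combinational. When the read and write
  // addresses collide, the reader sees the old word until the clock
  // edge commits the write, which is the read-priority behavior.
  assign rd_data_o = ram[rd_addr_i];
endmodule
```

For a lookup table such as hex2ssd, the write port would simply be tied off and the init file would hold one encoded word per input value.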
Demonstration (All Students):
Demonstrate your working Keypad to Seven-Segment Display module on the FPGA by
instantiating your modules in top.sv. Use your keypad to show which button is being pressed on
the seven segment display.
10/23 Notes (Contributed by Raphael):
● You will need to iteratively select columns to determine which column has a button being
pressed. (Do this with a 1-hot shift register/ring counter!)
● You will need to drive the column pins on the keypad, and it will respond with the row
being pressed within that column.
● The kpyd2hex module takes rows and columns as one-hot values (e.g. 00010001).
However, the keypad columns are zero-hot with pull-up resistors. Therefore, the rows
are also zero-hot (see datasheet for more info). For example, if button “1” is being
pressed and we send 1110 to the keypad, it will respond with 1110.
● Finally, the keypad glitches if you send too many requests, so you need to slow the
12MHz clock.
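The notes above can be sketched as a one-hot ring counter whose inverted value drives the zero-hot column pins, with the zero-hot rows inverted back to one-hot. This is a rough outline under assumptions: the module and port names are hypothetical, step_i is assumed to be a slow tick from a clock divider that you provide, and the sketch ignores how long the rows take to settle after a column changes.

```systemverilog
// Sketch of column scanning (assumed names; not the provided top.sv).
module keypad_scan_sketch
  (input        clk_i
  ,input        reset_i
  ,input        step_i     // slow tick from a clock divider (assumed)
  ,input  [3:0] rows_n_i   // zero-hot rows read from the keypad
  ,output [3:0] cols_n_o   // zero-hot columns driven to the keypad
  ,output [7:0] kpyd_o);   // {one-hot row, one-hot column}

  // One-hot ring counter: exactly one column is selected at a time.
  logic [3:0] col_r;

  always_ff @(posedge clk_i)
    if (reset_i)     col_r <= 4'b0001;
    else if (step_i) col_r <= {col_r[2:0], col_r[3]};

  // The keypad is zero-hot, so invert in both directions.
  assign cols_n_o = ~col_r;

  // With no button pressed in the selected column, rows_n_i is 4'b1111
  // and the row field below is 4'b0000.
  assign kpyd_o = {~rows_n_i, col_r};
endmodule
```

With column 0001 selected (1110 on the pins) and button 1 pressed, the keypad answers 1110, and kpyd_o becomes 00010001, matching the example in the notes.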
Old Notes:
● You do not need to debounce or edge-detect the buttons.
● You will need to iteratively select columns to determine which column has a button being
pressed. (Do this with a 1-hot shift register!)
● The kpyd2hex module takes rows and columns as one-hot values (e.g. 00010001).
However, the keypad columns are zero-hot with pull-up resistors. Therefore, the rows
are also zero-hot.
● You will need to handle the case where no button is pressed.
● It is safe to assume we will only press one button at a time in a column.
● You can use persistence of vision.
Part 3: Elastic Pipelines and FIFOs
We are working our way up to pipelines. There are two types of pipelines: inelastic, and
elastic (we will cover these in class). You can always wrap inelastic pipelines to create elastic
ones. In this lab, you will write an inelastic pipeline stage. Next, you will create an elastic
pipeline stage. Finally, you will use your memory (from above) to create a FIFO.
You may use whatever operators and behavioral description you prefer, except you may
not use always@(*). You are encouraged to reuse whatever modules you see fit from previous
labs or this lab.
● inelastic: Write an inelastic pipeline stage. When en_i is 1, it should save the data.
Otherwise, it should not. When datapath_reset_p == 1, data_o should be reset to 0
if reset_i ==1 at the positive edge of the clock.
You can use /* verilator lint_off WIDTHTRUNC */ around
datapath_reset_p to clear the lint warnings.
Note: This should look a lot like writing a DFF.
● elastic: Write a Mealy elastic pipeline stage. You can think of this as a 1-element FIFO,
with a Mealy state machine to improve throughput. The module must be Ready-Valid-&
on the input/consumer interface (ready_o and valid_i) and Ready-Valid-& (valid_o
and ready_i) on the output/producer interface.
When datapath_reset_p == 1, data_o should be reset to 0 if reset_i ==1 at the
positive edge of the clock.
When datapath_gate_p ==1, data_o should only be updated when valid_i == 1.
Otherwise, data_o should be updated whenever ready_o == 1. This is a very simple
form of “Data Gating”, and is the missing “bit” from the class lecture slides.
10/23 Note: A potentially better way to say above: When datapath_gate_p == 1,
data_o should only be updated when (valid_i & ready_o) == 1. Otherwise, data_o
should be updated whenever ready_o ==1.
● fifo_1r1w: Using behavioral SystemVerilog, your ram_1r1w_async module, and any
other module you have written, write a First-in-First-Out (FIFO) module. The module
must be Ready-Valid-& on the input/consumer interface (ready_o and valid_i) and
Ready-Valid-& (valid_o and ready_i) on the output/producer interface. This paper and this
Google Doc have good breakdowns of the interface types.
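The data-gating rule described for the elastic stage can be sketched as a data register with a parameterized load enable. This covers only the data path: the Mealy state machine that produces ready_o and valid_o is deliberately left out. The module name and the port accept_i (standing in for the stage's own ready_o) are assumptions for this example.

```systemverilog
// Sketch of the data register only (assumed names; no handshake logic).
module gated_data_reg_sketch
  #(parameter width_p          = 8
   ,parameter datapath_gate_p  = 0
   ,parameter datapath_reset_p = 0)
  (input                      clk_i
  ,input                      reset_i
  ,input                      valid_i
  ,input                      accept_i  // stands in for the stage's ready_o
  ,input        [width_p-1:0] data_i
  ,output logic [width_p-1:0] data_o);

  // Gated: load only on a real transfer (valid_i & ready_o).
  // Ungated: load whenever the stage is ready, even if data_i is not valid.
  wire load_w = (datapath_gate_p != 0) ? (valid_i & accept_i) : accept_i;

  always_ff @(posedge clk_i)
    if (reset_i && (datapath_reset_p != 0)) data_o <= '0;
    else if (load_w)                        data_o <= data_i;
endmodule
```

The ungated form saves a gate on the enable path, while the gated form avoids toggling the data register on cycles that carry no valid data.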
Demonstration (All Students):
Demonstrate your working FIFO by using it to connect between audio input and output
on your FPGA board. You will plug the PMOD I2S2 into PMOD Port B on your board, and then
use 3.5mm cables to connect to the Audio I/O ports to/from your computer/speaker. You should
set your FIFO to a very small depth (e.g. 2) because the Lattice boards do not have memories
that support the ram_1r1w_async pattern. The output must sound the same as the original
audio for credit.
What is the maximum value for depth_p that you can use on your FPGA, before the
toolchains fail to compile?
Part 4: Sinusoid / Fixed Point Representation
Have you ever heard anyone complain about how complicated IEEE 754 floating point
is? The problem is that it’s easy to use (in software), until it isn’t: List of Failures from IEEE 754.
For this reason, floating point arithmetic isn’t used in many safety critical applications. For the
same reasons, floating point numbers aren’t used in signal processing. Fixed point operations
are vastly less complicated than floating point operations, require vastly less area, and are
numerically stable.
Fixed point arithmetic follows the same rules as normal two’s-complement arithmetic. In
that sense, you already know the basics. The difference is that when two fixed-point numbers
are multiplied, the number of fractional digits/bits increases. For example, .5 * .5, which is
representable with one fractional digit, produces .25, which needs two fractional digits to
represent. In fixed point, the fractional digits represent ½ (.5), ¼ (.25), ⅛ (.125), etc. In the
example above, .5 is represented in binary by .1. When you multiply 0.1 and 0.1, the result will
be two bits, 0.01 (binary), or .25 (decimal).
I like to handle fractional bits by declaring the fractional bits in the negative range of the
bus. For example, wire [11:-4] foo has 12 integer bits and 4 fractional bits. When foo is
multiplied by itself, it produces 24 integer bits and 8 fractional bits, or [23:-8]. However, if
a [-1:-4] bus is multiplied by a [11:-4] bus, the result is only a [11:-8] bus.
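The negative-index convention above can be sketched for the [-1:-4] by [11:-4] case. This example treats both operands as unsigned, which is an assumption, since signed values need signed declarations. The module and port names are hypothetical, and a linter may ask for explicit width extension on the operands.

```systemverilog
// Sketch of an unsigned fixed-point multiply (assumed names).
module fixed_mul_sketch
  (input  [-1:-4] frac_i   // 0 integer bits, 4 fractional bits
  ,input  [11:-4] val_i    // 12 integer bits, 4 fractional bits
  ,output [11:-8] prod_o); // 12 integer bits, 8 fractional bits

  // In an assignment, both operands are extended to the 20-bit width
  // of prod_o before the multiply, so no product bits are lost.
  assign prod_o = frac_i * val_i;
endmodule
```

Worked check: frac_i = 4'b1000 is 0.5 and val_i = 16'h0030 is 3.0. The raw product is 8 * 48 = 384 = 20'h00180, and with 8 fractional bits that is 384 / 256 = 1.5, as expected.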
Here are a few good tutorials:
From Berkeley: https://inst.eecs.berkeley.edu/~cs61c/sp06/handout/fixedpt.html
From UW: https://courses.cs.washington.edu/courses/cse467/08au/labs/l5/fp.pdf
● sinusoid: Using your ram_1r1w_async memory, create a module that generates a sine
wave, turning hexadecimal indices into (signed) 12-bit values. See the demo below for
more information.
Since this is an audio challenge, there is no testbench for this part. If you need
accommodations, please see the instructor. Commit your memory initialization file (in
hex format) along with your solution.
Demonstration (All Students):
Using your counter module from Lab Assignment 1/2 and your sinusoid
module above, play a Tuning-A tone on the speakers in the lab with the PMOD I2S2 module.
You need to figure out how to generate a tone at 440 Hz, given that the PLL clock runs at
22.591MHz, and the I2S2 accepts a Left channel and a Right channel output at approximately
44.1 kHz. The interface to the I2S2 module is Ready-Valid-&. (Note: Do not use the output of
your counter as your clock. All of your logic should run at 22.591MHz.). Implement your solution
in sinusoid/top.sv. We will use this link (or similar) to determine if you have succeeded.
The clock frequency in this lab has changed. The signal from the PLL is faster,
22.591MHz, and called clk_o.
In both demonstration folders, top.sv instantiates an I2S2-to-AXI-Streaming module,
which drives the I2S2 PMOD. The input and output of this module uses a ready/valid
handshake. The left and right audio channels are separate wires, but you can concatenate them
if you would like (for your FIFO). Drive both, for your sinusoid.
top.sv “works out of the box”. You can test that your setup works by connecting your
computer to the audio input and connecting the audio output into amplified speakers, i.e. those
with a power cable. You will need to instantiate your logic between the interfaces for the demo.
WARNING WARNING WARNING
DO NOT PLUG YOUR HEADPHONES INTO THE AUDIO OUTPUT WHILE THEY ARE IN
YOUR EARS. PLAY MUSIC FIRST, ADJUST VOLUME, THEN PUT IN EARS.
Use make bitstream to build the FPGA bitstream (configuration file) and program the
FPGA. Your FPGA will need to be plugged into a USB port.
Grading:
1. Push your completed assignment to your git repository. Only push your modified files!
2. Submit your assignment through gradescope, and confirm that the autograder runs.
3. Demonstrate each part to a TA.
This lab will be graded on the following criteria. Weights are available in Canvas.
1. Correctness: Is the code in git correct? Does it pass the checks in Gradescope?
2. Lint and Style: Does the solution pass the Verilog Lint Checker run by Gradescope? Are
variable names consistent with what is being taught in class? This may seem pedantic,
but in industry and open source projects this is standard practice.
Hint: use the make lint command to check your code
3. Demonstration: Was the code demonstrated to a TA or instructor before the deadline?
The following will also be considered in your final grade:
4. Language Features: Does the solution use allowed language features? (i.e. Structural vs
Behavioral Verilog). Maximum 50% deduction.
5. Git Hygiene: Does the assignment submission only contain files that are relevant to the
assignment? Please, please, please don’t check in files that aren’t part of the submission.
Maximum 20% deduction.
Finally, modifying any parts of the test/grading infrastructure without permission will
result in a zero on the entire part.