What is the market for PLD?
$6 billion in 2017, expected to grow to $10 billion in 2020
FPGAs are primarily made up of ________?
-OR Gates
-AND Gates
-Lookup Tables (LUTs)
-CPLDs
LUTs
PROM
Programmable Read Only Memory
Invented 1956
Commercially available 1969
EPROM
Erasable PROM
Invented 1971
PLA
Programmable Logic Arrays
Invented 1975
PALs
Programmable Array Logic
Invented in 1978
What does PROM architecture consist of?
Fixed AND plane for address decoding logic, OR plane programmable through the change of memory contents
What does PAL architecture consist of?
OR plane is fixed and AND plane is programmable. This led to CPLDs.
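A minimal sketch of the PAL idea, assuming a toy model in which each product term is a dict of required input values: the AND plane is the programmable part, and the OR plane is a fixed OR of the terms. The names `and_plane`, `pal_output`, and `xor_terms` are hypothetical, chosen for illustration only.

```python
from itertools import product

def and_plane(inputs, term):
    """Evaluate one programmable product term.

    `term` maps an input index to the value it must have (0 or 1).
    Inputs not listed are treated as unconnected (don't care).
    """
    return int(all(inputs[i] == v for i, v in term.items()))

def pal_output(inputs, terms):
    """Fixed OR plane: the output is the OR of all product terms."""
    return int(any(and_plane(inputs, t) for t in terms))

# Hypothetical programming: XOR of inputs 0 and 1, written as a'b + ab'
xor_terms = [{0: 0, 1: 1}, {0: 1, 1: 0}]
truth_table = [pal_output(bits, xor_terms) for bits in product((0, 1), repeat=2)]
print(truth_table)  # [0, 1, 1, 0]
```

Changing the function means changing only the entries in `xor_terms`; the OR step never changes, which mirrors the fixed OR plane on the card.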
CPLD
Complex Programmable Logic Device
Devices with multiple PALs in same package with registered outputs and interconnecting programmable fabric
Built in CMOS to allow reprogrammability. A collection of macrocells, each combining a fully reprogrammable AND/OR array with a register, performs combinational and sequential logic
FPGA
Field Programmable Gate Array
Consists of wires, gates, and flip flops (registers)
ASIC
Application Specific Integrated Circuit
ASSP
Application Specific Standard Product
-> Marketed for multiple customers for a specific market
SoC
IC that integrates all components of a computer or electronic system into a single chip
LAB
Logic Array Block
Consists of 16 macrocells
PIA
Programmable Interconnect Array
CPLD Pros/Cons
Pros:
Easy generation of wide input functions, very predictable nearly deterministic timing, still a good choice for glue logic applications today
Cons:
Does not work well for designs with many registers, data transfers, and bus interfaces
LUTS and FPGA Architecture
FPGAs were designed to imitate gate arrays, creating a flexible general purpose logic device
Implemented with LUTs
First FPGAs used 3-input LUTs
Logic Element (LE) consists of LUT, full adder, and D flip flop typically
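A rough sketch of how a LUT implements logic, assuming a k-input LUT is simply a 2^k-entry table of configuration bits addressed by its inputs. The helper names `make_lut` and `lut_read` are made up for this example, and the majority function is just a sample 3-input function.

```python
def make_lut(func, k):
    """Precompute the 2**k configuration bits for a k-input boolean function."""
    table = []
    for addr in range(2 ** k):
        bits = [(addr >> i) & 1 for i in range(k)]  # bit i of addr drives input i
        table.append(int(func(*bits)))
    return table

def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs form an address into the stored table."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]

# Sample 3-input function: majority vote
majority = make_lut(lambda a, b, c: (a + b + c) >= 2, 3)
print(majority)                     # [0, 0, 0, 1, 0, 1, 1, 1]
print(lut_read(majority, 1, 0, 1))  # 1
```

Any 3-input function fits in the same 8 bits; only the table contents differ.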
Config Memory Differences
Antifuse: highly reliable, one time programmable, expensive, but most deterministic and high speed.
FLASH: highly reliable, reprogrammable, more expensive than SRAM. Often low power, especially Microsemi devices
SRAM: reprogrammable, highest density, lowest cost
Why CPLD over FPGA?
-Save money
-Design requires little logic
-more I/O in small space
-Meet very tight and deterministic timing reqs
Total delay in an adder?
T_adder(n) = (n-1)*Tc + Ts = (n-1)*3D + 2D
D - gate delay; Tc = 3D is the carry delay per stage, Ts = 2D is the sum delay
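The adder delay formula can be checked with a few lines of Python. This is a sketch that takes the card's figures (3 gate delays per carry stage, 2 for the final sum) as defaults; the function name `ripple_adder_delay` is an assumption for illustration.

```python
def ripple_adder_delay(n, gate_delay=1.0, carry_gates=3, sum_gates=2):
    """Total delay of an n-bit adder: (n-1) carry stages plus the final sum."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return ((n - 1) * carry_gates + sum_gates) * gate_delay

for bits in (4, 16, 32):
    print(bits, ripple_adder_delay(bits))  # 4 -> 11.0, 16 -> 47.0, 32 -> 95.0 (in units of D)
```

The delay grows linearly with the word width, which is why wide adders benefit from dedicated carry logic.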
Multipliers in Digital Logic
Gates:
Array Multipliers made of partial products, gates increase as n^2
In FPGA:
Combinational Circuits - Fast but big
Sequential shift and add (state machine approach)
Specialty algorithms, like Booth's, Dadda, or Wallace tree multipliers
Memory (LUTs) - simple but doesn't scale
Some combination
Hard multiplier blocks built into the FPGA - usually best
Sequential shift and add
Smaller in area than combinational multipliers beyond 4x4, but slower, as 2n clock cycles are required for shifting and adding
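A behavioral sketch of sequential shift and add for unsigned operands, assuming the state machine spends one clock on the conditional add and one on the shift for each of the n multiplier bits (which gives the 2n cycles on the card). The function name `shift_add_multiply` is hypothetical.

```python
def shift_add_multiply(a, b, n):
    """Multiply two unsigned n-bit values, counting clock cycles.

    Assumes one cycle for each add state and one for each shift state.
    """
    mask = (1 << n) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in n bits")
    product = 0
    multiplicand, multiplier = a, b
    cycles = 0
    for _ in range(n):
        # Add state: accumulate the multiplicand if the current multiplier bit is 1
        if multiplier & 1:
            product += multiplicand
        cycles += 1
        # Shift state: move on to the next bit position
        multiplicand <<= 1
        multiplier >>= 1
        cycles += 1
    return product, cycles

print(shift_add_multiply(13, 11, 4))  # (143, 8)
```

Only one adder and a few registers are reused every cycle, which is where the area saving over a combinational multiplier comes from.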
Memories for multiplication
4x4 multiplication only needs 256 memory locations
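The 256 figure follows from the address width: two 4-bit operands form an 8-bit address, and 2^8 = 256. A small sketch, assuming the operands are simply concatenated to form the address (`ROM`, `rom_multiply`, and `rom_size` are names invented for this example):

```python
N = 4  # operand width in bits

# One entry per (a, b) pair; the address is a in the high bits, b in the low bits
ROM = [(addr >> N) * (addr & ((1 << N) - 1)) for addr in range(1 << (2 * N))]

def rom_multiply(a, b):
    """Look up the product instead of computing it."""
    return ROM[(a << N) | b]

def rom_size(n):
    """Number of memory locations needed for an n x n lookup multiplier."""
    return 1 << (2 * n)

print(len(ROM))              # 256
print(rom_multiply(13, 11))  # 143
print(rom_size(8))           # 65536
```

Each extra operand bit multiplies the table size by four, which is why the memory approach is simple but does not scale.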
Hard multipliers
Provide the greatest speed of any FPGA implementation, running at 200 MHz or more. Typical FPGAs have dozens to hundreds of 18x18 multiplier circuits.
PLD Selection Criteria
1. Size or Logic Density (gates, LEs, slices, ALMs, etc)
2. Cost per logic Gate
3. Speed
4. Power Consumption
5. Reprogrammability
6. Cost per I/O
7. Hard IP available on chip
8. Deterministic Timing
9. Reliability (FIT rate)
10. Endurance (# of programming cycles and years of retention)
11. Design and Data Security
Xilinx XC9500XL
CPLD
5 ns pin-to-pin delay
5V I/O
Reprogrammable FLASH
10,000 program/erase cycles
20 year retention
Xilinx CoolRunner-II
Low power design
Deterministic Timing 5ns delay
Nonvolatile config SRAM
1000 programming cycles
20 year retention
Connected through advanced interconnect matrix
40 inputs to macrocell
Xilinx Spartan 3AN
Small
FLASH
350 MHz
100K program/erase cycles
20 years flash retention
2x 4 input LUTs, 2 flip flops
Xilinx Spartan-6
Small
4x 6 input LUTs, 8 flip flops
FLASH
Xilinx Artix-7
Large SRAM
628 MHz, transceivers can do 6.6 Gb/s
48-328 mA
200,000 logic cells, 500 inputs
4x 6 input LUTS, 8 flip flops
Xilinx Kintex-7
Large SRAM
~470K logic cells, 500 inputs
741 MHz clock, 1818 MHz toggle
Transceivers go to 12.5Gb/s
4x 6 input LUTS, 8 flip flops
Xilinx Virtex-7
SRAM
2M logic cells, 1100 inputs
741 MHz, 1818 MHz toggle freq
Transceivers up to 28.05 Gb/s
up to 100W
4x 6 input LUTS, 8 flip flops
Kintex Ultrascale
SRAM
1.4M logic cells, 676 I/O pins
850 MHz
Transceivers at 16.4 Gb/s
Adds GigE
Virtex Ultrascale
5M logic cells, 1400 I/O pins
850 MHz
Transceivers at 30.5 Gbps, 3.66 terabyte/sec aggregate
8x 6 input LUTs, 16 flip flops
Altera MAX V
CPLD
FLASH config, but routing based on SRAM
Low power
100 programming cycles
deterministic 7 ns
1x 4 input LUT, 1 flip flop
Altera MAX 10
Small FPGA
FLASH configuration, routing on SRAM
50K logic elements, 500 I/O
450 MHz, 5W limit
1x 4 input LUT, 1 flip flop
Altera Cyclone V
Small FPGA
SRAM reprogrammable
300K LEs, 480 I/O
ALM has an 8 input LUT, 2 full adders, and 4 flip flops
Can make 4 bit adder with 3 ALMs
Altera Arria V
SRAM
500K LEs, 704 I/O
625 MHz clock
6.6 Gbps transceivers
ALM is made of 2x 6 input LUTs, 2 full adders, 4 flip flops
Altera Stratix V
SRAM
950K LEs, 840 I/O
717 MHz
28.05 Gbps transceivers
ALM is the same, 2x 6 input LUTs, 2 full adders, 4 flip flops
Altera Arria 10
SRAM
1M LEs, 840 I/O
644 MHz
17.4 Gbps transceivers
Adaptive 8 input LUT, 2 full adders, 4 flip flops
Altera Stratix 10
SRAM
5.5M LEs, 1640 I/O pins
1100 MHz (fastest FPGA at the time)
Has hyperflex: registers are everywhere, core clocking, and hyper aware design flow
Same ALM, 2x 6 inputs LUTs, 2 full adders, 4 flip flops
Microsemi Advantages
true FLASH tech
Live at power up
Integration provides lower total cost
Security
(low power) across board
best security
(superior reliability), flash are immune to radiation effects
FPGA Interconnect Technologies
SRAM - reprogrammed and volatile, must be loaded each time it is powered
FLASH- reprogrammed and non-volatile, stays until reprogrammed
AntiFuse- One time programmable, connections melted in place when programmed
FLASH/SRAM Leakage
SRAM high static current with lots of leakage due to many transistors connected to power/ground
For FLASH there are fewer paths, with 1000x lower leakage current.
The FLASH process allows the device to be suspended, a mode called Flash*Freeze. Very little power is lost as heat, hence the name IGLOO.
Microsemi IGLOO nano
6K flip flop cells
250 MHz clock
low microwatt power in Flash*Freeze mode
Small I/O comparable to CPLD
Microsemi IGLOO 2
160K LEs
similar to other IGLOOs
574 I/O pins
many added hard IP compared to last IGLOO
5 Gbps transceivers
4 input LUT, 1 flip flop
Microsemi Axcelerator
Antifuse
21K flip flops
870 MHz clock
less than 50mW static
684 I/O pins
C and R cell LEs
Microsemi RTAX Rad-Hard
Antifuse
21K flip flops
870 MHz
<50 mW static
684 I/O
C and R cell logic elements
Lattice ECP5
SRAM
84K LEs
400 MHz
250 mW
365 I/O
2x 4 input LUT, 2 flip flops
Lattice ECP3
SRAM
149K LEs
500 MHz
450 mW
586 I/O pins
2x 4 input LUTs, 2 Flip Flops
Lattice MACHXO2
SRAM, but non-volatile config storage
6.8K LEs
200 microW
334 I/O
2x 4 input LUT, 2 flip flops
Lattice MACHXO3
SRAM, but non-volatile config storage
9.4K LEs
20 mW
384 I/O
2x 4 input LUT, 2 flip flops
Lattice ICE40 Ultra
SRAM, non volatile config on chip
3.5K LE
95 microW static
39 I/O
1x 4 input LUT, 1 FF
Lattice Advantages
Low cost, power, and many hard IP
Flash advantages over SRAM?
Lower Power
Higher Reliability
Better Security
RTL View is used to view designs in the simplest schematic perspective to verify logical design. True/False?
True
Post-fitting power estimation, using actual placement and routing information, is done with which of the following tools in Quartus Prime?
Quartus Prime PowerPlay Power Analyzer
RTL Viewer
Early Power Estimator
None of the above
Quartus Prime PowerPlay Power Analyzer
In digital logic design, Static Hazards can be removed by ...
Adding additional logic to cover transitions
OR
Using flip-flops and synchronous design
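As an illustration of "adding additional logic to cover transitions", take the 2:1 mux f = a·s' + b·s (an example chosen here, not from the cards). With a = b = 1, switching s hands the output from one product term to the other, and the output can glitch low in between. The sketch below checks that adding the consensus term a·b leaves the function unchanged while giving one term that stays at 1 across the transition. All names are hypothetical.

```python
from itertools import product

def f_original(a, b, s):
    return (a and not s) or (b and s)

def f_covered(a, b, s):
    # Same function plus the redundant consensus term a.b
    return (a and not s) or (b and s) or (a and b)

original_terms = {
    "a.s'": lambda a, b, s: a and not s,
    "b.s": lambda a, b, s: b and s,
}
covered_terms = {**original_terms, "a.b": lambda a, b, s: a and b}

def terms_held_across_s(terms, a, b):
    """Product terms that stay at 1 for both values of s (no hand-off needed)."""
    return [name for name, t in terms.items() if t(a, b, False) and t(a, b, True)]

# The extra term is logically redundant: both versions have the same truth table
same = all(f_original(*v) == f_covered(*v) for v in product((False, True), repeat=3))
print(same)                                             # True
print(terms_held_across_s(original_terms, True, True))  # []
print(terms_held_across_s(covered_terms, True, True))   # ['a.b']
```

This only checks coverage logically; it does not model gate delays.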
Setup Time
The minimum time the data signal must be stable before the clock edge
Data arrival time
The amount of time for data to arrive at a destination register's D input from the common clock edge. Think of this as the time from where the clock starts (launch edge) to the data endpoint on the destination register. It includes the clock path to the source register, the source register's clock-to-output delay, and the time for data to travel from that register to the next.
launch edge (0) + TclkA + Tco +Tdata
Clock Arrival Time
Time for the clock to arrive at a destination register's clock input from the common clock edge
Latch edge (1 period usually) + TclkB
Data Required Time (Setup)
The minimum time required before the latch edge to get latched in destination register.
Think of this as the time for clock to reach the front of the register.
Clock arrival time - Tsu
Clock period + TclkB - Tsu
Setup Slack
The margin by which the setup timing is met and calculated as data required time - data arrival time
Clock Period + TclkB-Tsu-TclkA-Tco-Tdata
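The setup slack expression can be worked through with numbers. This is a sketch with made-up delays in ns, assuming the launch edge is at 0 and the latch edge is one clock period later; the function name `setup_slack` and every value below are illustrative only.

```python
def setup_slack(period, t_clk_a, t_clk_b, t_co, t_data, t_su, launch_edge=0.0):
    """Setup slack = data required time - data arrival time (all values in ns)."""
    data_arrival = launch_edge + t_clk_a + t_co + t_data
    clock_arrival = launch_edge + period + t_clk_b  # latch edge + TclkB
    data_required = clock_arrival - t_su
    return data_required - data_arrival

# Hypothetical path: 10 ns clock, 6 ns of data delay
print(setup_slack(period=10.0, t_clk_a=1.0, t_clk_b=1.25,
                  t_co=0.5, t_data=6.0, t_su=0.25))  # 3.5 (timing met)

# Same path with a 6 ns clock
print(setup_slack(period=6.0, t_clk_a=1.0, t_clk_b=1.25,
                  t_co=0.5, t_data=6.0, t_su=0.25))  # -0.5 (setup violation)
```

Positive slack means the data arrives with margin before it is required; negative slack is a violation.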
Cause of timing violations?
Long data paths
Incorrect analysis of requirements
Large clock skew
RTL used to view?
Logic and sequencing
Hold Slack
Margin by which the hold timing requirement is met. Meaning new data arrives only after the previous data has been held stable long enough past the latch edge.
Data Arrival time - Data required time
TclkA+Tco+Tdata-TclkB-Th
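A numeric sketch of the hold slack expression, with made-up delays in ns. It assumes, as the formula above implies, that launch and latch happen on the same clock edge, so no clock period appears. The function name `hold_slack` and the values are hypothetical.

```python
def hold_slack(t_clk_a, t_clk_b, t_co, t_data, t_h):
    """Hold slack = data arrival time - data required time (all values in ns)."""
    data_arrival = t_clk_a + t_co + t_data
    data_required = t_clk_b + t_h  # clock arrival at destination + hold time
    return data_arrival - data_required

# Short data path with small skew
print(hold_slack(t_clk_a=1.0, t_clk_b=1.25, t_co=0.5, t_data=0.25, t_h=0.25))  # 0.25 (hold met)

# Same path, but the destination clock arrives much later (large skew)
print(hold_slack(t_clk_a=1.0, t_clk_b=2.0, t_co=0.5, t_data=0.25, t_h=0.25))   # -0.5 (hold violation)
```

Unlike setup, slowing the clock does not help here; hold problems come from short data paths and clock skew.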
Hold time (T_h)
Amount of time the data signal must be stable after the clock edge
Data Required Time (Hold)
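The minimum time data must stay stable to be latched into the destination register.
Clock Arrival Time + Th
Latch edge + TclkB + Th (for the hold check the latch edge coincides with the launch edge at 0, so this is TclkB + Th, consistent with the hold slack formula)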
FPGA Design Flow Steps
Design Entry
Functional Sim
Synthesis or Mapping
Place and Route or Fitting
Simulation
Programming
Test and Integration
Release
Design Entry Methods
Schematic Capture
Import IP blocks
HDL text
State Machine entry
Import EDIF
Design Analysis and Synthesis
checks for errors
builds database
synthesis and optimized logic design
Design Fitting
Fitting design in the smallest possible part is a principal design challenge in FPGA design
Quartus fitting approaches:
Balanced, high performance (speed), low power, small area
Timing Analysis
Synchronization is fundamental to reliable FPGA design
Static timing analysis determines violations
Quartus uses TimeQuest, which uses a set of equations to calculate slack and Fmax
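One way to see how Fmax falls out of the slack equations: set the setup slack (Clock Period + TclkB - Tsu - TclkA - Tco - Tdata) to zero and solve for the period. The sketch below does this for a single path with made-up delays in ns; it assumes that path is the slowest in the design, and it is not a description of how TimeQuest is implemented. The function name `fmax_mhz` is hypothetical.

```python
def fmax_mhz(t_clk_a, t_clk_b, t_co, t_data, t_su):
    """Fmax in MHz for one register-to-register path (delays in ns).

    Minimum period is the clock period at which setup slack is exactly zero.
    """
    min_period_ns = t_clk_a + t_co + t_data + t_su - t_clk_b
    if min_period_ns <= 0:
        raise ValueError("path delays give a non-positive minimum period")
    return 1000.0 / min_period_ns

print(fmax_mhz(t_clk_a=1.0, t_clk_b=1.25, t_co=0.5, t_data=7.5, t_su=0.25))  # 125.0
```

Here the minimum period is 8 ns, so the path supports clocks up to 125 MHz.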
Timing Netlist Definitions
Cell - basic device building block
Pin - input or output of a cell
Net - connection between pins
Port - top level device pins
Types of programming files
.sof - SRAM object file to configure SRAM over JTAG
.pof - Programming object file to program FLASH
.jam/.jbc - Jam STAPL files used to program with JTAG (.jam is ASCII, .jbc is the byte-code form)
.jic - JTAG indirect configuration file, used to program an EPCS device through the serial interface (these devices don't have JTAG)
Programming Modes (Altera Specific)
JTAG
Active Serial
Passive Serial
Fast Passive Parallel
Configuration via Protocol (CvP)