How is a 32-bit float structured?
1 bit for sign (+/-)
8-bit biased exponent => range ≈ [10^-38, 10^38].
23-bit fraction (24-bit significand with an implicit leading 1) => ~7 decimal digits of precision.
=> 32 bits per weight takes too much memory for modern LLMs.
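The bit layout above can be inspected directly. A minimal Python sketch (function name is my own) that splits a float32 into the three fields:

```python
import struct

def float32_bits(x: float) -> tuple[int, int, int]:
    """Split a float32 into its sign, biased exponent, and fraction fields."""
    # Pack as a big-endian float32, then reinterpret as a 32-bit unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 stored bits (implicit leading 1)
    return sign, exponent, fraction

# -1.5 = (-1)^1 * 1.1b * 2^0 -> sign=1, exponent=127 (bias 127), fraction=0b100...0
print(float32_bits(-1.5))
```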
How is a 16-bit float structured?
1 bit for sign (+/-)
5-bit biased exponent => range ≈ [10^-4, 10^4].
10-bit fraction => ~3 decimal digits of precision.
=> Range is too small for LLMs (gradients/activations can over- or underflow).
How is a bfloat16 structured?
Idea: fewer bits for precision, more for range.
1 bit for sign (+/-)
8-bit biased exponent => range ≈ [10^-38, 10^38] (same as float32).
7-bit fraction => ~2 decimal digits of precision.
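Because bfloat16 keeps float32's sign and exponent fields, a bfloat16 is just the top 16 bits of a float32. A small sketch (helper names are my own) showing the truncation and the resulting ~2-digit precision:

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16  # sign (1) + exponent (8) + fraction (7)

def from_bfloat16_bits(b: int) -> float:
    """Widen bfloat16 bits back to a float32 value (low 16 bits are zero)."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# 3.14159 survives only to ~2 decimal digits after the round trip.
print(from_bfloat16_bits(to_bfloat16_bits(3.14159)))
```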
What is the idea behind quantizing weights?
Map float values to int8 (256 distinct values; symmetric schemes typically use [-127, 127]).
Uses half as much space as bfloat16.
int8 operations can be computed much faster (hardware acceleration).
→ Introduces some rounding error, but little difference in model quality.
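A minimal sketch of the mapping (my own simplified version, not a production scheme): scale each weight by the maximum absolute value so it lands on [-127, 127], round to an integer, and keep the scale for dequantization.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: scale by max |value| onto [-127, 127]."""
    scale = max(abs(v) for v in values) / 127
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes and the scale."""
    return [v * scale for v in q]

weights = [0.3, -1.2, 0.05, 0.9]
q, scale = quantize_int8(weights)
# The largest-magnitude weight (-1.2) maps to -127; the rest round in between,
# so each dequantized value is within one scale step of the original.
print(q, dequantize(q, scale))
```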
Symmetric Quantization vs. Asymmetric Quantization
Symmetric Quantization:
Zero points of the float and quantized ranges coincide (float 0.0 maps to int 0).
Min / max of the quantized range are negatives of each other.
Asymmetric Quantization:
Zero points do not coincide; a zero-point offset is stored.
Uses the full int range, so more precision for skewed value distributions.
=> Both struggle with outliers, which stretch the scale. Can be mitigated by clipping weights to a pre-determined range.
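The asymmetric scheme with clipping can be sketched as follows (function and parameter names are my own; this maps onto an unsigned 0..255 range with a stored zero point):

```python
def quantize_asymmetric(values, clip_min=None, clip_max=None):
    """Asymmetric 8-bit quantization onto [0, 255] with optional outlier clipping."""
    lo = clip_min if clip_min is not None else min(values)
    hi = clip_max if clip_max is not None else max(values)
    scale = (hi - lo) / 255
    zero_point = round(-lo / scale)  # integer code where float 0.0 lands
    # Values outside [lo, hi] (outliers) are clamped to the edges of the range.
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

# Without clipping, one outlier (40.0) stretches the scale and crushes precision;
# clipping to a pre-determined range keeps resolution for the typical weights.
print(quantize_asymmetric([-0.5, 0.0, 1.5, 40.0]))
print(quantize_asymmetric([-0.5, 0.0, 1.5, 40.0], clip_min=-0.5, clip_max=1.5))
```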