3. Efficient Inference with Transformers

5 Terms

1

How is a 32-bit float structured?

  • 1 bit for sign (+,-)

  • 8-bit biased exponent => [10^-38, 10^38].

  • 23-bit fraction (24-bit significand with the implicit leading bit) => ~7 decimal digits of precision.

=> 32 bits per weight is too large for modern LLMs

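As an illustration (not part of the original card), here is a minimal Python sketch that unpacks these three fields from a 32-bit float using only the standard library; the helper name is made up for this example.

```python
# Minimal sketch: split a float32 bit pattern into sign, exponent, and fraction.
import struct

def float32_fields(x: float):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                  # 1 sign bit
    exponent = (bits >> 23) & 0xFF     # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF         # 23 stored fraction bits
    return sign, exponent, fraction

print(float32_fields(-6.25))  # (1, 129, 4718592): -1.5625 * 2^(129 - 127)
```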
2

How is a 16-bit float structured?

  • 1 bit for sign (+,-)

  • 5-bit biased exponent => [10^-4, 10^4].

  • 10-bit fraction => ~3 decimal digits of precision.

=> Range is too small for LLMs

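A quick check of why the small exponent range hurts (assuming NumPy is available): values that are routine in float32 overflow or underflow once cast to float16.

```python
# Minimal sketch: float16 overflows above ~65504 and underflows near ~6e-8.
import numpy as np

print(np.float32(70000.0))   # 70000.0 -- fine in float32
print(np.float16(70000.0))   # inf     -- exceeds float16's maximum
print(np.float16(1e-8))      # 0.0     -- below float16's smallest subnormal
```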
3

How is a bfloat16 structured?

Idea: fewer bits for precision, more bits for range

  • 1 bit for sign (+,-)

  • 8-bit biased exponent => [10^-38, 10^38].

  • 7-bit fraction => ~2 decimal digits of precision.

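A short illustration of the trade-off (assuming PyTorch is installed): the float32-sized exponent survives the cast to bfloat16, but the low-order digits of the fraction are rounded away.

```python
# Minimal sketch: bfloat16 keeps the 8-bit exponent but only a 7-bit fraction.
import torch

x = torch.tensor([3.0e38, 1.2345678], dtype=torch.float32)
y = x.to(torch.bfloat16)
print(y[0].item())  # ~3e38: still representable (float16 would give inf)
print(y[1].item())  # ~1.234375: trailing digits of 1.2345678 are lost
```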
4

What is the idea behind quantizing weights?

  • Map float values to int8 (256 distinct values).

  • Uses half as much space as bfloat16.

  • int8 operations can be computed much faster (hardware acceleration).

→ Introduces some quantization error, but no big difference in model performance.

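A rough sketch of this idea in NumPy (variable names are made up for the example, not a specific library's API): rescale the weights into int8, store them at a quarter of the float32 size, and dequantize on the fly.

```python
# Minimal sketch: round-trip float32 weights through int8 and check the error.
import numpy as np

w = np.random.randn(1000).astype(np.float32)      # stand-in for a weight matrix
scale = np.abs(w).max() / 127.0                   # map the largest |w| to 127
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = w_int8.astype(np.float32) * scale         # dequantize for comparison

print(w.nbytes, w_int8.nbytes)   # 4000 vs 1000 bytes (1/4 of float32, 1/2 of bfloat16)
print(np.abs(w - w_deq).max())   # small error, bounded by scale / 2
```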
5

Symmetric Quantization vs. Asymmetric Quantization

Symmetric Quantization:

  • Zero points of the original and quantized ranges match (float 0 maps to int 0).

  • Min / max of the range are negatives of each other.

Asymmetric Quantization:

  • Zero points do not match (a zero-point offset is stored).

  • Uses the full quantized range, so more precision than symmetric.

=> Both have problems with outliers. This can be mitigated by clipping weights to a pre-determined range.

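The sketch below (helper names are illustrative, not taken from the cards) implements both schemes with NumPy: symmetric quantization reuses the scale-only mapping from the previous card, while asymmetric quantization adds a zero point so a skewed range such as [-0.2, 1.0] is covered more tightly.

```python
# Minimal sketch: symmetric (scale only) vs. asymmetric (scale + zero point).
import numpy as np

def quantize_symmetric(x):
    scale = np.abs(x).max() / 127.0               # range is +/- max|x|
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale                               # dequantize: q * scale

def quantize_asymmetric(x):
    scale = (x.max() - x.min()) / 255.0           # cover exactly [min, max]
    zero_point = np.round(-x.min() / scale)       # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point                   # dequantize: (q - zp) * scale

x = np.random.uniform(-0.2, 1.0, size=1000).astype(np.float32)   # skewed range
q_s, s_s = quantize_symmetric(x)
q_a, s_a, zp = quantize_asymmetric(x)
print(np.abs(x - q_s.astype(np.float32) * s_s).max())             # symmetric error
print(np.abs(x - (q_a.astype(np.float32) - zp) * s_a).max())      # usually smaller
```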