SIMD Instruction sets

24 Terms

1

Arm SIMD

Implemented through coprocessor extensions, i.e. configurable interfaces for adding further processing blocks to the core.

Advanced SIMD (NEON) and SVE (Scalable Vector Extension) are SIMD extensions that Arm designed.

2

SIMD Concepts

Vector registers group multiple data items together.

Items are processed in lanes (e.g. a pixel with four 8-bit components occupies 4 lanes of 8 bits each).

A single instruction computes on the whole vector at once (typically in one cycle) instead of on each item separately.

This makes better use of memory bandwidth and ALUs.
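To make the lane idea concrete, here is a minimal software sketch (SWAR, "SIMD within a register") of adding four 8-bit lanes, such as the components of an RGBA pixel, with one 32-bit addition. It is an illustration only, not Arm code: real SIMD hardware has separate per-lane adders, whereas here masking stops carries from crossing lane boundaries. The function names `add_u8x4` and `add_u8x4_scalar` are hypothetical.

```cpp
#include <cstdint>

// Adds four 8-bit lanes packed into one 32-bit word, lane by lane, modulo 256.
// One 32-bit addition processes all four lanes at once.
std::uint32_t add_u8x4(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t low7 = 0x7F7F7F7Fu;  // low 7 bits of every lane
    const std::uint32_t high = 0x80808080u;  // top bit of every lane
    // Adding only the low 7 bits gives at most 0xFE per lane,
    // so no carry spills into the neighbouring lane.
    std::uint32_t sum = (a & low7) + (b & low7);
    // Each lane's top bit is top(a) XOR top(b) XOR the carry out of bit 6.
    return sum ^ ((a ^ b) & high);
}

// Reference version: the same work done one lane at a time.
std::uint32_t add_u8x4_scalar(std::uint32_t a, std::uint32_t b) {
    std::uint32_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const int shift = 8 * lane;
        std::uint32_t x = (a >> shift) & 0xFFu;
        std::uint32_t y = (b >> shift) & 0xFFu;
        result |= ((x + y) & 0xFFu) << shift;  // wrap within the lane
    }
    return result;
}
```

For example, lanes 0xFF + 0x01 wrap to 0x00 without disturbing the lane above, exactly as four independent 8-bit additions would.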

3

Arm SIMD Registers / VFP (vector floating point)

64-bit (D) or 128-bit (Q) SIMD registers; in AArch32 each Q register overlaps a pair of D registers (Q0 = D0 and D1)

4

Arm SIMD support

Supports integer, fixed-point, and floating-point data. Fixed-point is useful for particular signal-processing applications.
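A common fixed-point format in signal processing is Q15, where an `int16_t` value x stands for x / 32768. The sketch below is a scalar model of a rounding, saturating Q15 multiply, assuming round-to-nearest and clamping; NEON's VQRDMULH.S16 performs a similar operation on every lane. The names `q15_mul` and `q15_to_double` are hypothetical.

```cpp
#include <cstdint>

// Q15 fixed point: an int16_t x represents the real value x / 32768,
// covering [-1, 1 - 2^-15].
std::int16_t q15_mul(std::int16_t a, std::int16_t b) {
    // The full product is a Q30 value that fits in 32 bits.
    std::int32_t product = static_cast<std::int32_t>(a) * b;
    // Round to nearest, then drop 15 fractional bits to return to Q15.
    std::int32_t rounded = (product + (1 << 14)) >> 15;
    // -1.0 * -1.0 = +1.0 is not representable in Q15, so saturate.
    if (rounded > INT16_MAX) rounded = INT16_MAX;
    if (rounded < INT16_MIN) rounded = INT16_MIN;
    return static_cast<std::int16_t>(rounded);
}

// Converts a Q15 value back to a real number, e.g. for checking results.
double q15_to_double(std::int16_t x) { return x / 32768.0; }
```

With 16384 representing 0.5, `q15_mul(16384, 16384)` gives 8192, i.e. 0.25, using only integer hardware.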

5

Arm SIMD uses

Integer workloads, multimedia, and signal processing (video, graphics, voice processing, image processing)

VFP provides floating-point support suited to 3D graphics, games consoles, etc.

6

Arm SIMD extras

Optional, so it is only included when needed; supports unaligned data access; has powerful load/store instructions that can interleave and de-interleave data.
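As an illustration of a de-interleaving load, NEON's VLD3 can read packed RGB pixels and split them into separate R, G and B registers in one instruction. The sketch below is a scalar model of that behaviour for 8 pixels, assuming 8-bit channels; `RgbPlanes` and `load3_u8x8` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Planar result: one array per colour channel, like the three
// destination registers of a 3-way de-interleaving load.
struct RgbPlanes {
    std::array<std::uint8_t, 8> r, g, b;
};

// Scalar model of a de-interleaving structure load: reads 8 packed RGB
// pixels (24 bytes: r0 g0 b0 r1 g1 b1 ...) and splits them by channel.
RgbPlanes load3_u8x8(const std::uint8_t* packed) {
    RgbPlanes out{};
    for (std::size_t i = 0; i < 8; ++i) {
        out.r[i] = packed[3 * i + 0];
        out.g[i] = packed[3 * i + 1];
        out.b[i] = packed[3 * i + 2];
    }
    return out;
}
```

Once the channels are in separate registers, one vector instruction can process the same channel of all 8 pixels.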

7

SIMD mnemonics

Data-type suffixes on instruction mnemonics indicate what type of data a SIMD register holds, e.g. VADD.I16 adds 16-bit integer lanes.

Instructions whose output is a different size from their input are also supported, e.g.

VMULL.S16 Q0, D2, D3

multiplies signed 16-bit lanes. Each product can need up to 32 bits, so the results are written to a wider (128-bit Q) register.
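A scalar sketch of what a widening multiply such as VMULL.S16 computes: four signed 16-bit lanes from each 64-bit source produce four signed 32-bit lanes, which fill a 128-bit destination. The function name `widening_mul_s16` is hypothetical; the corresponding NEON intrinsic is `vmull_s16`.

```cpp
#include <array>
#include <cstdint>

// Four int16 lanes in (64 bits per source), four int32 lanes out (128 bits).
std::array<std::int32_t, 4> widening_mul_s16(const std::array<std::int16_t, 4>& a,
                                             const std::array<std::int16_t, 4>& b) {
    std::array<std::int32_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        // Widen before multiplying so the product keeps all of its bits:
        // 30000 * 30000 = 900000000 would not fit back into 16 bits.
        out[i] = static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return out;
}
```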

8

Common SIMD instructions

Conversion, comparison, arithmetic, Newton-Raphson reciprocal estimation, square root, saturating arithmetic, polynomial arithmetic, and specialised decoding operations
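Two of these are easy to model per lane. Saturating addition (as in VQADD.S8) clamps on overflow instead of wrapping, and Newton-Raphson refinement improves a rough reciprocal estimate x of 1/d with x' = x(2 - dx); NEON pairs an estimate instruction (VRECPE) with a step instruction (VRECPS) that computes (2 - dx). These are scalar sketches; `sat_add_s8` and `refine_reciprocal` are hypothetical names.

```cpp
#include <cstdint>

// Saturating signed 8-bit addition for one lane: out-of-range results
// are clamped to the type's limits instead of wrapping around.
std::int8_t sat_add_s8(std::int8_t a, std::int8_t b) {
    int sum = static_cast<int>(a) + static_cast<int>(b);  // cannot overflow int
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return static_cast<std::int8_t>(sum);
}

// Newton-Raphson refinement of a reciprocal estimate x ~= 1/d:
//     x' = x * (2 - d * x)
// Each step roughly squares the relative error.
float refine_reciprocal(float d, float x, int steps) {
    for (int i = 0; i < steps; ++i) {
        x = x * (2.0f - d * x);
    }
    return x;
}
```

Starting from 0.3 as an estimate of 1/3 (10% relative error), three steps shrink the error to roughly 1e-8, below single-precision resolution.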

9

How to use SIMD in programs

Either through intrinsics or automatically (compiler auto-vectorisation)

10

Intrinsics

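Functions in a high-level language (e.g. C/C++) that specify SIMD behaviour directly, each mapping closely to a particular SIMD instruction

They tell the compiler which SIMD instructions to use, so it doesn't need to guess

In C++, vector types can be wrapped in classes with operator overloading, so SIMD code reads like ordinary arithmetic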
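A minimal sketch of the operator-overloading style, assuming a hypothetical 4-lane float type `F32x4`. This portable version loops over a plain array; an Arm build of such a wrapper would typically store a `float32x4_t` and call NEON intrinsics such as `vaddq_f32` inside the operators.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane float vector with overloaded operators.
struct F32x4 {
    std::array<float, 4> lane;
};

// Lane-wise addition (one vector add on real SIMD hardware).
inline F32x4 operator+(const F32x4& a, const F32x4& b) {
    F32x4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Lane-wise multiplication (one vector multiply on real SIMD hardware).
inline F32x4 operator*(const F32x4& a, const F32x4& b) {
    F32x4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Reads like scalar code, but each operator works on all four lanes.
inline F32x4 axpy(const F32x4& a, const F32x4& x, const F32x4& y) {
    return a * x + y;
}
```

The design choice is that call sites stay readable while the wrapper decides which instructions are emitted.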

11

Automatic

Compilers can detect vectorisable loops. They often need hints, though, so they know vectorisation is safe (e.g. that two arrays do not overlap)

These hints don't change what the program computes; they only give the compiler extra information
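A small example of a loop written to be auto-vectorised, with two common hints: `__restrict__` (a GCC/Clang extension promising the pointers don't alias) and `#pragma omp simd` (OpenMP 4.0+, honoured only when compiling with `-fopenmp` or `-fopenmp-simd`, otherwise ignored). The function `scaled_add` is a hypothetical example; whether the loop is actually vectorised depends on the compiler and flags.

```cpp
#include <cstddef>

// dst[i] = a[i] * scale + b[i]. The hints assert that dst, a and b do not
// overlap and that iterations are independent; the result is the same
// whether or not the compiler vectorises the loop.
void scaled_add(float* __restrict__ dst, const float* __restrict__ a,
                const float* __restrict__ b, float scale, std::size_t n) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * scale + b[i];
    }
}
```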

12

Arm SVE

Vectors are 128 to 2048 bits long, in 128-bit increments

Vector-length agnostic: the same code runs on any supported vector length

Designed to be a good compiler target

Extensive support for predication

Can vectorise loops whose trip count is not an exact multiple of the vector width without needing peel loops; predicates mark which lanes are inactive in the final iteration.
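A scalar C++ emulation of the predicated-loop idea (not real SVE code). `while_lt` imitates the WHILELT instruction, and `vl` is a hypothetical stand-in for the number of lanes, which real SVE code reads from the hardware at run time (e.g. with the ACLE intrinsic `svcntw()`). `predicated_add` is likewise a hypothetical name; `vl` must be at least 1.

```cpp
#include <cstddef>
#include <vector>

// Predicate for lanes base..base+vl-1: lane i is active while base + i < n.
std::vector<bool> while_lt(std::size_t base, std::size_t n, std::size_t vl) {
    std::vector<bool> pred(vl);
    for (std::size_t i = 0; i < vl; ++i) pred[i] = (base + i < n);
    return pred;
}

// dst[i] = a[i] + b[i] with no scalar tail ("peel") loop: the final
// iteration simply runs with the out-of-range lanes switched off.
void predicated_add(int* dst, const int* a, const int* b, std::size_t n,
                    std::size_t vl) {
    std::size_t base = 0;
    std::vector<bool> pred = while_lt(base, n, vl);
    while (pred[0]) {  // keep looping while the first lane is active
        for (std::size_t lane = 0; lane < vl; ++lane) {
            if (pred[lane]) {  // inactive lanes neither load nor store
                dst[base + lane] = a[base + lane] + b[base + lane];
            }
        }
        base += vl;
        pred = while_lt(base, n, vl);
    }
}
```

With n = 10 and vl = 4 the last iteration has only two active lanes; running the same function with vl = 16 gives the same result, which is the point of vector-length agnostic code.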

13

Vector registers

32 vector registers (Z0-Z31), each LEN x 128 bits long

Double- and single-precision floating point

64-, 32-, 16- and 8-bit integers

14

Predicate registers

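8 predicate registers (P0-P7) used as lane masks, each LEN x 16 bits (one bit per byte of a vector register)

8 more predicate registers (P8-P15) for predicate manipulation

FFR (first-fault register): records which lanes of a first-faulting (speculative) load were loaded successfully, so software can tell where a fault would have occurred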
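The register sizes on these two cards follow from LEN alone. A small sketch of the arithmetic, with hypothetical helper names: a vector is LEN x 128 bits, the lane count is that divided by the element size, and a predicate needs one bit per vector byte, giving LEN x 16 bits.

```cpp
#include <cstddef>

// Vector length in bits for LEN = 1..16 (128 to 2048 bits).
constexpr std::size_t vector_bits(std::size_t len) { return len * 128; }

// Number of lanes for a given element size (8, 16, 32 or 64 bits).
constexpr std::size_t lanes(std::size_t len, std::size_t elem_bits) {
    return vector_bits(len) / elem_bits;
}

// One predicate bit per vector byte: (LEN x 128) / 8 = LEN x 16 bits.
// Wider elements use only one of every (elem_bits / 8) predicate bits.
constexpr std::size_t predicate_bits(std::size_t len) {
    return vector_bits(len) / 8;
}

static_assert(predicate_bits(1) == 16, "128-bit vectors need 16 predicate bits");
static_assert(lanes(1, 32) == 4, "a 128-bit vector holds four 32-bit lanes");
static_assert(lanes(16, 8) == 256, "a 2048-bit vector holds 256 byte lanes");
```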

15

Control registers

The ZCR_ELx registers control the vector length, one per privilege (exception) level

16

SVE Predicates

Used to drive loop control.

Reuses (overloads) the usual NZCV condition flags:

N = first element is active

Z = no element is active

C = last element is not active

V = scalarised loop state, else zero
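A simplified model of how a predicate result maps onto the flags, assuming every lane is governed (an all-true governing predicate), so "first" and "last" are just lane 0 and the final lane. `Flags` and `predicate_flags` are hypothetical names; V is always zero here because scalarised loops are not modelled.

```cpp
#include <vector>

// Condition flags as set by an SVE predicate-generating instruction.
struct Flags {
    bool n, z, c, v;
};

// pred must be non-empty (one entry per lane).
Flags predicate_flags(const std::vector<bool>& pred) {
    bool any_active = false;
    for (bool lane : pred) any_active = any_active || lane;
    Flags f{};
    f.n = pred.front();  // N: first element is active
    f.z = !any_active;   // Z: no element is active
    f.c = !pred.back();  // C: last element is not active
    f.v = false;         // V: zero outside scalarised loops
    return f;
}
```

For a WHILELT-style predicate {1, 1, 0, 0} this gives N = 1, Z = 0, C = 1: the iteration still runs, but it is the last, partially filled one.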

17

Use of predicates

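When the predicate for the next iteration is generated, Z set (no lanes active) means the loop should stop; C set (last lane inactive) means that iteration is the final, partially filled one

Conditional branches can be taken directly on these predicate conditions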

18

Vector partitioning

Use predication to allow speculative vectorisation

  • operate on a partition of elements that are “safe” according to dynamic conditions and the predicate

  • partitions are inherited by nested conditions and loops

19

Uncounted loops, data-dependent exits

  • operations with side-effects following a break must not be architecturally performed

  • operate on a before-break partition, then exit loop if break is detected
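A scalar sketch of a data-dependent exit: copy values until the first negative one, vl lanes at a time. `break_before` models the "lanes before the break" partition (in the spirit of SVE's BRKB instruction), and stores, the side effects, happen only in that partition. All names are hypothetical, and the sketch assumes src contains a negative sentinel and is readable in whole vl-lane chunks (the next card covers loads that might fault).

```cpp
#include <cstddef>
#include <vector>

// Lanes strictly before the first lane where cond is true stay active;
// the break lane and every lane after it become inactive.
std::vector<bool> break_before(const std::vector<bool>& cond) {
    std::vector<bool> part(cond.size(), false);
    for (std::size_t i = 0; i < cond.size() && !cond[i]; ++i) part[i] = true;
    return part;
}

// Copies src into dst up to (not including) the first negative value and
// returns the number of elements copied.
std::size_t copy_until_negative(const int* src, int* dst, std::size_t vl) {
    std::size_t base = 0;
    for (;;) {
        std::vector<bool> brk(vl);
        for (std::size_t i = 0; i < vl; ++i) brk[i] = src[base + i] < 0;
        std::vector<bool> part = break_before(brk);
        // Side effects only in the before-break partition.
        for (std::size_t i = 0; i < vl; ++i) {
            if (part[i]) dst[base + i] = src[base + i];
        }
        bool found = false;
        for (bool b : brk) found = found || b;
        if (found) {  // break detected: exit after finishing this partition
            std::size_t active = 0;
            for (bool p : part) active += p ? 1 : 0;
            return base + active;
        }
        base += vl;
    }
}
```

Elements after the sentinel are loaded but never stored, so the result matches a scalar loop with a `break`.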

20

Speculative load errors

  • loads required to detect break condition may fault

  • operate on a before-fault partition, then iterate until a break is detected
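A scalar sketch of first-faulting loads, assuming a toy memory where every address at or beyond `readable_size` "faults". As with SVE's first-faulting loads, a fault on the first lane is taken normally (here it throws), while a fault on a later lane is suppressed and the FFR-like mask is cleared from that lane onwards. The loop then works on the before-fault partition and iterates until the break (a NUL byte) is found. `ToyMemory`, `load_first_fault` and `ff_strlen` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy memory model: addresses >= readable_size would fault.
struct ToyMemory {
    const char* data;
    std::size_t readable_size;
};

// Loads vl bytes at base into out; returns an FFR-like mask of loaded lanes.
std::vector<bool> load_first_fault(const ToyMemory& mem, std::size_t base,
                                   std::size_t vl, std::vector<char>& out) {
    std::vector<bool> ffr(vl, true);
    out.assign(vl, '\0');
    for (std::size_t i = 0; i < vl; ++i) {
        if (base + i >= mem.readable_size) {
            if (i == 0) throw std::out_of_range("fault on first element");
            for (std::size_t j = i; j < vl; ++j) ffr[j] = false;  // suppressed
            break;
        }
        out[i] = mem.data[base + i];
    }
    return ffr;
}

// strlen over speculative loads: only lanes marked in the FFR are examined.
std::size_t ff_strlen(const ToyMemory& mem, std::size_t vl) {
    std::size_t base = 0;
    std::vector<char> chunk;
    for (;;) {
        std::vector<bool> ffr = load_first_fault(mem, base, vl, chunk);
        std::size_t loaded = 0;
        for (std::size_t i = 0; i < vl && ffr[i]; ++i) {
            if (chunk[i] == '\0') return base + i;  // break detected
            ++loaded;
        }
        base += loaded;  // resume at the first lane that was not loaded
    }
}
```

Reading past the terminator never faults here, because the only lanes that could fault are suppressed rather than trapping.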

21

Length agnosticism

  • also uses partitions

  • partition is defined by dynamic vector length

22

SVE speedup

Not in all benchmarks, but in some cases the speedups over NEON are very large

23

x86 SIMD

Not as prevalent, but there are a wide range of instructions

256-bit vectors (AVX/AVX2) of 8- to 64-bit integers and floats

String manipulation, CRC, popcount, unaligned loads, AI/ML operations
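For the CRC entry: SSE4.2 added a `crc32` instruction (intrinsics `_mm_crc32_u8`, `_mm_crc32_u32`, etc.) that accumulates the CRC-32C (Castagnoli) checksum. The sketch below is a bitwise software version of that CRC, assuming the usual convention of initial value 0xFFFFFFFF and a final inversion, which callers of the intrinsics also apply themselves. `crc32c` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C: reflected polynomial 0x82F63B78,
// initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
std::uint32_t crc32c(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for "123456789" is 0xE3069283; the hardware instruction reaches the same result while processing up to 8 bytes per instruction instead of one bit per loop step.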

24

AVX Intel

From Skylake onwards (initially the Skylake-SP/X server and high-end desktop parts), AVX-512 provides 512-bit vectors