## CS 461/561 Computer Architecture

__Introduction__

A complex number consists of a real and an imaginary component and is usually written in the form a + bi, where a and b are either integer or floating-point values and i (the imaginary value) is √(-1). Sometimes in engineering, the letter j is used in place of i because i is used for other values.

Two complex numbers are multiplied by applying the FOIL (Firsts, Outers, Inners, Lasts) method, the same method used to multiply binomials. For example, multiplying (a + bi)(c + di) is accomplished as follows:

Firsts: a * c

Outers: a * di

Inners: bi * c

Lasts: bi * di

This produces (a + bi)(c + di) = ac + adi + bci + bdi^{2}. Because i^{2} = -1, the terms combine into a product that is back in the form a + bi: (ac - bd) + (ad + bc)i.

An example using actual values: (2.5 + 3i)(4.0 + 2i)

Firsts: 2.5 * 4.0

Outers: 2.5 * 2i

Inners: 3i * 4.0

Lasts: 3i * 2i

This produces 10 + 5i + 12i + 6i^{2} = 10 + 17i + 6(-1) = 4 + 17i.
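The FOIL steps above can be checked with a few lines of Python. This is a minimal sketch, not part of the assignment: the function name `foil_multiply` is hypothetical, and Python's built-in `complex` type is used only to cross-check the result.

```python
def foil_multiply(a, b, c, d):
    """Multiply (a + bi)(c + di) with FOIL; return (real, imag)."""
    firsts = a * c   # real term
    outers = a * d   # coefficient of i
    inners = b * c   # coefficient of i
    lasts = b * d    # coefficient of i^2
    real = firsts - lasts   # i^2 = -1 flips the sign of the Lasts term
    imag = outers + inners
    return real, imag


real, imag = foil_multiply(2.5, 3.0, 4.0, 2.0)
print(real, imag)  # 4.0 17.0

# Cross-check against Python's native complex multiplication.
assert complex(real, imag) == complex(2.5, 3.0) * complex(4.0, 2.0)
```

Note that one complex multiplication costs four real multiplications, one subtraction and one addition.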

Some programming languages support complex numbers natively (Python, MATLAB, Fortran). Newer revisions of some older languages have added support for complex numbers (C, starting with C99). Some programming languages have no native support for complex numbers.

__Assignment Definition__

Consider the following high-level language code which multiplies two vectors that contain single-precision complex numbers:

a, b and c are vectors; the _re suffix denotes the real component and the _im suffix denotes the imaginary component of each element in a vector.
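As a point of reference, an element-wise complex multiply over separate real and imaginary arrays might look like the Python sketch below. It is an illustration under assumptions, not the assignment's own listing: the array names `a_re`, `a_im`, `b_re`, `b_im`, `c_re` and `c_im` are inferred from the naming convention described above, the function name is hypothetical, and the vector length is simply taken from the input. Python floats are double precision, so the sketch does not model single-precision arithmetic.

```python
def complex_vector_multiply(a_re, a_im, b_re, b_im):
    """Element-wise product c[i] = a[i] * b[i] for complex vectors
    stored as separate real (_re) and imaginary (_im) arrays."""
    n = len(a_re)  # vector length: an assumption, taken from the input
    c_re = [0.0] * n
    c_im = [0.0] * n
    for i in range(n):
        # Firsts minus Lasts gives the real part (i^2 = -1).
        c_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i]
        # Outers plus Inners gives the imaginary part.
        c_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i]
    return c_re, c_im


c_re, c_im = complex_vector_multiply([2.5, 1.0], [3.0, 0.0],
                                     [4.0, 0.0], [2.0, 1.0])
print(c_re, c_im)  # [4.0, 0.0] [17.0, 1.0]
```

Each loop iteration reads four values and writes two, which is worth keeping in mind when counting vector loads and stores.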

- Convert this loop into pseudo RV64V assembly code using strip mining assuming the following architectural features:

Register s0 = loop counter & array index [i]

Vector registers: v0 – v31

MVL (maximum vector length) = 64

Instructions:

vld (vector load)

vst (vector store)

vadd (vector add)

vsub (vector subtract)

vmul (vector multiply)

bne (branch if not equal)*

blt (branch if less than)*

j (unconditional jump)*

addi (integer add immediate)*

ori (logical or immediate)*

Note: instructions marked with an asterisk are used only for setting the initial index value, incrementing the index, and loop control.
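Strip mining splits a loop over n elements into pieces that each fit within the maximum vector length. The Python sketch below illustrates only the index arithmetic, under stated assumptions: `strip_bounds` is a hypothetical helper, the length 150 is an arbitrary example rather than the assignment's vector length, and full-size strips are processed first with the short remainder last (some textbook versions process the odd-size strip first instead).

```python
MVL = 64  # maximum vector length, as given in the assignment


def strip_bounds(n, mvl=MVL):
    """Return (start, vl) pairs covering indices 0..n-1,
    where each strip holds at most mvl elements."""
    strips = []
    start = 0
    while start < n:
        vl = min(mvl, n - start)  # last strip may be shorter than mvl
        strips.append((start, vl))
        start += vl
    return strips


# Hypothetical vector length of 150 elements.
print(strip_bounds(150))  # [(0, 64), (64, 64), (128, 22)]
```

In the vector code, each `(start, vl)` pair corresponds to one pass through the loop body: the vector length is set to `vl`, and the loads, arithmetic and stores operate on elements `start` through `start + vl - 1`.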

- If the vector processor implements chaining with two lanes and has a single vector load/store unit, show how convoys would be constructed to execute the pseudo assembly code from the first question in the vector pipeline. How many chimes are required to execute the convoys?

- Assume the functional units in the vector processor have the following startup overheads: load/store unit, 12 cycles; multiply unit, 7 cycles; add/subtract unit, 6 cycles. How many clock cycles are required for each iteration of the loop, including startup overhead?

- How many iterations are required to complete processing the vectors?

__Instruction Formats__

vld (vector load): vld v_{D}, vec_ref

vst (vector store): vst v_{S1}, vec_ref

vadd (vector add): vadd v_{D}, v_{S1}, v_{S2}

vsub (vector subtract): vsub v_{D}, v_{S1}, v_{S2}

vmul (vector multiply): vmul v_{D}, v_{S1}, v_{S2}

bne (branch if not equal): bne x_{1}, x_{2}, target_label

blt (branch if less than): blt x_{1}, x_{2}, target_label

j (unconditional jump): j target_label

addi (integer add immediate): addi x_{D}, x_{S1}, const

ori (logical or immediate): ori x_{D}, x_{S1}, const

__Format Definitions__

v_{D} = destination vector register

v_{S1} = first source vector register

v_{S2} = second source vector register

vec_ref = vector reference (name)

x_{1} = first general purpose register for comparison

x_{2} = second general purpose register for comparison

x_{D} = destination general purpose register

x_{S1} = first source general purpose register

x_{S2} = second source general purpose register

target_label = label of the target instruction for a branch or jump

const = an integer constant