Registers take up resources, so there is a trade-off between the convenience of having lots of general-purpose registers and the limited number of LUTs available on an FPGA.
If our target FPGA has a small amount of RAM, we would like to keep instruction size limited. Most of the instructions refer to three registers. With just 16 registers we need 4 bits to encode a reference to a register, and with our three-operand design that keeps us just within sixteen bits: 3 fields of 4 bits, plus 4 bits for the opcode, together make 16 bits.
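The bit budget above can be sketched as a small pack/unpack routine. This is only an illustration: the text fixes the field sizes (four fields of 4 bits), but the field order used here (opcode in the top nibble, then R2, R1, R0) and the opcode value in the example are assumptions.

```python
# Hypothetical layout: opcode | R2 | R1 | R0, each 4 bits wide.
# Only the field widths come from the text; the order is assumed.

def encode(opcode, r2, r1, r0):
    """Pack four 4-bit fields into one 16-bit instruction word."""
    for field in (opcode, r2, r1, r0):
        if not 0 <= field <= 15:
            raise ValueError("each field must fit in 4 bits")
    return (opcode << 12) | (r2 << 8) | (r1 << 4) | r0


def decode(word):
    """Split a 16-bit instruction word back into its four fields."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


# The opcode value 0x1 is made up for this example.
word = encode(0x1, 2, 1, 0)
print(f"{word:04x}")
```

With 16 registers every register field uses its full 4-bit range, so going to 32 registers would push a three-operand instruction past 16 bits.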
The complete list is fairly short:
|MOVE||R2,R1,R0||R2 ← R1 + R0|
|ALU||R2,R1,R0||R2 ← R1 op R0|
|LOAD||R2,R1,R0||R2 ← (R1 + R0), byte|
|LOADL||R2,R1,R0||R2 ← (R1 + R0), 4-byte long word|
|STOR||R2,R1,R0||R2 → (R1 + R0), byte|
|STORL||R2,R1,R0||R2 → (R1 + R0), 4-byte long word|
|JAL||R2,R1,R0||R2 ← PC, PC ← R1 + R0|
|MOVER||R2,R1,n||R2 ← R1 + 4 * n|
|SETBRA||cond,R1||R1 ← cond?1:0, PC += cond? offset + 2: 2|
|LOADI||R2,n||R2 ← n|
|LOADIL||R2,n||R2 ← (PC), PC += 4|
|PUSH||R2||SP -= 4, (SP) ← R2|
|POP||R2||(SP) → R2, SP += 4|
|MARK||R2||R2 ← counter|
Note the ALU instruction though: the operation it performs is determined by the low byte of the flags register (register 13). This implies that most arithmetical and logical operations are in fact two instructions: one to load the lower byte of the flags register followed by an ALU instruction.
We do not have to reload the operation if we want to perform the same operation on multiple sets of operands, though, and the addition of two registers can be done with MOVE R2,R1,R0 as well.
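A toy model may make the two-step pattern clearer. The operation codes below are invented for the illustration (the text does not list the real values), and the sketch ignores the zero and negative flags that a real ALU instruction would also update.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide
FLAGS = 13         # the flags register, as stated in the text

# Hypothetical operation codes; the real encodings are not given here.
OPS = {
    0: lambda a, b: (a + b) & MASK,
    1: lambda a, b: (a - b) & MASK,
    2: lambda a, b: a & b,
    3: lambda a, b: a | b,
}


def alu(regs, r2, r1, r0):
    """R2 <- R1 op R0, where op is selected by the low byte of the flags register."""
    op = regs[FLAGS] & 0xFF
    regs[r2] = OPS[op](regs[r1], regs[r0])


regs = [0] * 16
regs[1], regs[2], regs[3], regs[4] = 10, 3, 7, 7

# Step 1: select the (made-up) "subtract" code once.
regs[FLAGS] = (regs[FLAGS] & 0xFFFFFF00) | 1
# Step 2: reuse it for several operand pairs without reloading.
alu(regs, 5, 1, 2)  # r5 = r1 - r2
alu(regs, 6, 3, 4)  # r6 = r3 - r4
print(regs[5], regs[6])
```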
The UP5K on the iCEbreaker board has both block RAM and single-port RAM. Both can be configured to address memory as 16-bit words, but because we want to address individual bytes quite often and keep the addressing logic as simple as possible, we opt to do all memory access in byte-sized chunks.
Even if we load 4-byte long words, we do this byte by byte. Storage order is big-endian.
As with many RISC designs, memory access is mostly limited to load/store. In this implementation, both operations calculate the sum of the two source registers and use it as the memory address. This address is then used to store or load a byte or a long word.
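The byte-wise, big-endian long-word access and the register-sum addressing can be modelled as follows. This is a behavioural sketch, not the hardware implementation; the function names and the use of a Python `bytearray` as memory are assumptions made for the example.

```python
def loadl(regs, mem, r2, r1, r0):
    """R2 <- long word at (R1 + R0), read one byte at a time, big-endian."""
    addr = regs[r1] + regs[r0]
    value = 0
    for i in range(4):
        value = (value << 8) | mem[addr + i]  # most significant byte first
    regs[r2] = value


def storl(regs, mem, r2, r1, r0):
    """Long word in R2 -> (R1 + R0), written one byte at a time, big-endian."""
    addr = regs[r1] + regs[r0]
    for i in range(4):
        mem[addr + i] = (regs[r2] >> (8 * (3 - i))) & 0xFF


mem = bytearray(16)
regs = [0] * 16
regs[1], regs[2], regs[3] = 4, 4, 0x12345678

storl(regs, mem, 3, 1, 2)      # store r3 at address r1 + r2 = 8
loadl(regs, mem, 5, 1, 2)      # read it back into r5
print(mem[8:12].hex(), hex(regs[5]))
```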
We do not have dedicated I/O instructions of any kind. If the CPU is part of a SoC, the SoC design will have to implement memory-mapped I/O.
The design uses 32-bit words throughout, except for loading and storing bytes. This also means we have no facilities to manipulate 16-bit words whatsoever.
Originally there were no POP and PUSH instructions either. Every stack operation therefore consisted of two instructions: one LOADL or STORL instruction and an instruction to increment or decrement the register that was used as the stack pointer (typically register 14), for example MOVER r14,r14,1 to add 4 to register 14. This resulted in rather low code density when used with our C compiler, so dedicated POP and PUSH instructions were implemented.
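As a behavioural sketch, the dedicated stack instructions fold the pointer adjustment and the long-word access into one step. The model below assumes that register 14 is the stack pointer and that POP mirrors PUSH by adding 4 back after the load; the helper names and the `bytearray` memory are illustrative only.

```python
MASK = 0xFFFFFFFF
SP = 14  # register typically used as the stack pointer, per the text


def push(regs, mem, r2):
    """SP -= 4, then store R2 as a big-endian long word at (SP)."""
    regs[SP] = (regs[SP] - 4) & MASK
    sp = regs[SP]
    mem[sp:sp + 4] = regs[r2].to_bytes(4, "big")


def pop(regs, mem, r2):
    """Load the long word at (SP) into R2, then SP += 4 (assumed mirror of PUSH)."""
    sp = regs[SP]
    regs[r2] = int.from_bytes(mem[sp:sp + 4], "big")
    regs[SP] = (sp + 4) & MASK


mem = bytearray(64)
regs = [0] * 16
regs[SP] = 64            # stack grows downwards from the top of memory
regs[11] = 0xCAFEBABE

push(regs, mem, 11)      # one instruction instead of MOVER + STORL
sp_after_push = regs[SP]
regs[11] = 0             # clobber the register
pop(regs, mem, 11)       # one instruction instead of LOADL + MOVER
print(hex(regs[11]), regs[SP])
```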
We did not implement similar call and return instructions however as these are less frequent. A call to a known address can be implemented as
    PUSH r11              ; make sure we can restore this later
    LOADIL r11,#function
    JAL r11,r11,r0        ; jump to the address in r11, storing pc in r11
    POP r11
and a return simply as
    JAL r0,r11,r0         ; jump to address in r11, ignoring the link because r0 is always 0
The rather peculiarly named SETBRA instruction can be used for two things: to set a register to 1 or 0 depending on whether a condition is met, or to branch conditionally to another location using a 16-bit signed offset. The two can even be combined to set a register and branch in one instruction. The Robin CPU provides 3 flags that can be tested: zero and negative, which are both set by ALU operations, and always, which by definition is a flag that is always set. Typically an assembler will provide macros with common names to easily implement just a conditional branch or just a set-register operation.
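The combined behaviour follows the table entry R1 ← cond?1:0, PC += cond? offset + 2 : 2 and can be sketched as a small function. The function signature, the dictionary of flags and the string names for the conditions are assumptions made for this example; only the three conditions and the 16-bit signed offset come from the text.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def setbra(pc, flags, cond, offset16):
    """Return (value for R1, new PC) for one SETBRA instruction.

    cond is "zero", "negative" or "always"; flags maps the first two to booleans.
    """
    taken = True if cond == "always" else flags[cond]
    reg_value = 1 if taken else 0
    new_pc = pc + 2 + (sign_extend16(offset16) if taken else 0)
    return reg_value, new_pc


flags = {"zero": True, "negative": False}
print(setbra(100, flags, "zero", 0xFFFC))      # condition met, offset is -4
print(setbra(100, flags, "negative", 0xFFFC))  # condition not met, fall through
print(setbra(100, flags, "always", 8))         # unconditional branch
```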