Home page Wiki Blog Code


A risc cpu in verilog targeted at the iCEbreaker board

Counting leading zeros

In many scenarios we need a function to calculate the number of leading zeros in a binary number. This plays a role in division (where you might need to shift the first bit in the the divisor to align with the dividend) or in floating point operations where for example multiplication requires mantissas that align in order to get the maximum accuracy.

So the code below shows a modular implementation.

module clz(
	input [31:0] a,
	output [31:0] c

	wire  [7:0] ai;
	wire  [15:0] z;

	genvar i;

	for(i=7; i>=0 ; i=i-1) begin
		assign ai[i    ] = ~|a[i*4+3:i*4];
		assign  z[i*2+1] = ~(a[i*4+3]|a[i*4+2]);
		assign  z[i*2  ] = ~((~a[i*4+2] & a[i*4+1]) | a[i*4+3]);

	wire [5:0] y;

	assign y = 	ai[7] ? (	// leftmost nibble all zeros?
			ai[6] ? (	// 2nd nibble all zeros?
			ai[5] ? (	// 3rd nibble all zeros?
			ai[4] ? (
			ai[3] ? (
			ai[2] ? (
			ai[1] ? (
			ai[0] ? ( 6'b100000              // all 32 bits are zero
				) : {4'b0111, z[ 1: 0]}
				) : {4'b0110, z[ 3: 2]}
				) : {4'b0101, z[ 5: 4]}
				) : {4'b0100, z[ 7: 6]}
				) : {4'b0011, z[ 9: 8]}
				) : {4'b0010, z[11:10]}  // count in 3rd nibble from left
				) : {4'b0001, z[13:12]}  // count in 2nd nibble from left
				) : {4'b0000, z[15:14]}; // count in leftmost nibble

	assign c = {26'b0, y};


The code implements a count leading zeros module with a 32 bit input. The for loop instantiates 8 identical pieces of wiring, each piece addressing a 4 bit nibble in the input. Each ai[] wire signals whether the whole nibble consists of zeros, while each duo of z[] wires counts the leading zeros in the nibble. This count can be 0, 1, 2 or 3 (for all zeros the ai[] wire will be set.)

With this information for every nibble available we proceed with what is in essence a kind of priority encoder: we first check the leftmost nibble and if it is not all zeros the count will be z[15:14], i.e. the count for the leftmost nibble.

If the left most nibble is all zeros, then we check the 2nd nibble and if that 2nd nibble contains ones, we return the count in that 2nd nibble (z[13:12]) and prepend 4'b0001, that is, 4, because the first nibble was all zeros. Then we repeat this scheme all the way up to the rightmost nibble.

If the rightmost nibble is all zeros too, we return 6'b100000 (32)

Propagation delays

This fully combinatorial implementation works for me on the iCEbreaker at 12 MHz but it is not the fastest implementation possible. The priority encoder contains quite few layers, each with their own delays and with a faster clock this might be too slow. As far as I can tell the longest chain is thru 6 LUTs (as verified with

yosys -p "read_verilog clz.v; hierarchy -libdir . ; synth_ice40 -dsp -flatten -json clz.json; show"

but I am not sure what this actually means in terms of delays other than that nextpnr is fine with it for the iCEbreaker.