Type-casting literals

Imagine you are working on a memory-constrained embedded application and want to keep the binary as small as possible. The program has the following shift-left statement:

res = val << 1;

What if you type-cast the literal 1 to the smallest possible data type that can hold it? For example:

res = val << (uint8_t) 1;

Do you think this explicit casting of 1 to uint8_t is worthwhile? One would hope it saves binary size by reserving only 8 bits to represent 1.

It turns out the answer is NO. This is not helpful, for the following reasons:

  • Integer (operand) promotion: in C/C++, the operands of shift and arithmetic expressions are promoted at compile time to at least an int. So in the example above, the literal 1 still takes part in the shift as an int, with or without the cast (see the short sketch after this list). You can read more about integer promotions here: https://www.geeksforgeeks.org/integer-promotions-in-c/
  • The cast itself buys nothing here: converting a compile-time constant is resolved by the compiler, and the resulting uint8_t value is immediately promoted back to int for the shift. Therefore, it doesn’t help with our stated goal of saving binary size, which is dictated by the int operand mentioned in the previous bullet.
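
To see the promotion in action, here is a minimal C sketch (not from the original post); on a typical platform with a 32-bit int, all three lines print 4, showing that the shift expression has type int whether or not the literal is cast:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t val = 0x40;

    /* Both shift expressions have type int: the uint8_t operand (and the
     * literal, cast or not) is promoted to int before the shift happens. */
    printf("sizeof(val << 1)          = %zu\n", sizeof(val << 1));
    printf("sizeof(val << (uint8_t)1) = %zu\n", sizeof(val << (uint8_t)1));
    printf("sizeof(int)               = %zu\n", sizeof(int));

    return 0;
}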

So in conclusion: it’s not recommended to explicitly type-cast literals to smaller data types if the goal is to optimize binary size. Note that there could be other reasons why you might still want to do it (e.g. to maintain sign agreement).

What is a chicken bit?

It is a configuration bit that can be used to enable or disable certain functional blocks inside the chip after tapeout. Sometimes there is new functionality that the designer is not completely confident about, or that has not been fully verified across all corners. So, to reduce the post-tapeout bug risk, a config bit, called a chicken bit, is left in the design; it can be toggled to disable the new function and revert to the legacy functionality. Chickens are known for their frightened and uncertain nature (think of the phrase “chicken out”), and hence this bit gets its interesting name: the engineer can chicken out in the face of unexpected observations post-tapeout.

For example, let’s suppose that a digital processing block was designed on the assumption that the input coming from some external comparator cannot glitch. However, during lab validation it is found that under certain corner conditions the comparator can indeed glitch and cause serious issues in the downstream logic. As a precaution, the designer had added a glitch filter on the input stage but left it disabled by default, because the filter added some undesirable latency. After running into the glitch issue in the lab, she can set the chicken bit that enables the glitch filter, avoiding a potential chip respin (assuming the filter latency is still acceptable).
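
A minimal RTL sketch of how such a chicken bit might be wired up (all names below are made up for illustration; the real bit would typically come from an OTP/config register, as discussed next):

module comp_input_stage (
  input  logic clk,
  input  logic rst_n,
  input  logic comp_in,             // raw comparator input
  input  logic chicken_glitch_filt, // chicken bit: 1 = enable the new glitch filter
  output logic comp_out
);

  logic comp_in_d1, comp_in_d2, comp_filt;

  // Simple stability filter: only pass a value once it has been stable for
  // two consecutive clock cycles (this is what adds the undesirable latency).
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      comp_in_d1 <= 1'b0;
      comp_in_d2 <= 1'b0;
      comp_filt  <= 1'b0;
    end else begin
      comp_in_d1 <= comp_in;
      comp_in_d2 <= comp_in_d1;
      if (comp_in_d1 == comp_in_d2)
        comp_filt <= comp_in_d1;
    end
  end

  // The chicken bit selects between the new (filtered) path and the legacy (raw) path.
  assign comp_out = chicken_glitch_filt ? comp_filt : comp_in;

endmodule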

Normally these bits are OTP (one-time-programmable) and they are not accessible by the end user. These bits are programmed during production on the chip tester and then they get locked against further modification.

Feel free to ask in comments if there are any questions regarding chicken bits, or share your experience if you had to ever make use of them in your career!

break statement in synthesizable Verilog

Generally speaking, break/continue statements are used for simulation purposes, notably in building testbenches. However, modern synthesis tools allow you to use them in your design as well! The following is a really cool example of how to use break to describe a priority multiplexer. Let’s begin with the truth table:

  s0   s1   s2   ...  s(N-1) |  out
   1    x    x   ...    x    |  x0
   0    1    x   ...    x    |  x1
   0    0    1   ...    x    |  x2
  ...
   0    0    0   ...    0    |  xN
Truth table for a Priority Multiplexer

The corresponding schematic looks as follows:

Priority Multiplexer schematic

Now comes the cool part: the following RTL snippet, cleverly using a break statement, realizes the above schematic.

always_comb begin
  out = x[N];                     // default: no select asserted
  for (int i = 0; i < N; i++) begin
    if (s[i]) begin
      out = x[i];                 // first asserted select wins
      break;                      // stop at the highest-priority hit
    end
  end
end
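
For completeness, here is one way the snippet could sit inside a module; the parameter and port declarations below are my own assumptions, not part of the original snippet:

module priority_mux #(
  parameter int N  = 4,   // number of select lines
  parameter int DW = 8    // data width
) (
  input  logic [N-1:0]        s,    // s[0] has the highest priority
  input  logic [N:0][DW-1:0]  x,    // x[N] is the default input
  output logic [DW-1:0]       out
);

  always_comb begin
    out = x[N];                     // default: no select asserted
    for (int i = 0; i < N; i++) begin
      if (s[i]) begin
        out = x[i];
        break;                      // first (highest-priority) hit wins
      end
    end
  end

endmodule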

Use divided clock for timers

Timers are a common feature/requirement in many digital design applications. Fundamentally, timers are implemented as counters that increment/decrement at the rate of the provided clock. It’s very common to find a large number of different timers distributed throughout a design, each ticking with the base clock.

There’s a nice opportunity for area savings in such cases. The idea is very simple; let me illustrate with an example:

Imagine your clock’s base frequency is 4 MHz (a period of 0.25 us). The design requires three timers: 1 us, 2 us and 16 us. If you implemented these timers by clocking them at 4 MHz, the counters would need to count to 4, 8 and 64 respectively, i.e. 14 (3+4+7) bits in total.

Here’s the idea: divide the base 4 MHz clock down to 1 MHz, and implement the timers on the divided clock. Now the counters only count to 1, 2 and 16, i.e. 8 (1+2+5) bits. Of course, we also need some additional bits for the clock division itself (in this example, 2 more).
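
Here is a minimal sketch of the idea for the 16 us timer (signal names are mine). For simplicity it uses a clock enable (tick) produced by the divide-by-4 counter, which gives the same counter-width saving without creating a new clock domain; an actual divided clock, as described above, would additionally save clock-tree switching power.

module timer_16us (
  input  logic clk_4mhz,     // base clock, 4 MHz
  input  logic rst_n,
  output logic expired_16us
);

  // Divide-by-4: generates one tick_1mhz pulse every 4 base-clock cycles (2 bits).
  logic [1:0] div_cnt;
  logic       tick_1mhz;

  always_ff @(posedge clk_4mhz or negedge rst_n) begin
    if (!rst_n) div_cnt <= '0;
    else        div_cnt <= div_cnt + 2'd1;
  end
  assign tick_1mhz = (div_cnt == 2'd3);

  // 16 us timer: counts to 16 at the 1 MHz rate, so only 5 bits wide
  // (it would need 7 bits if it counted to 64 at the 4 MHz rate).
  logic [4:0] timer_cnt;

  always_ff @(posedge clk_4mhz or negedge rst_n) begin
    if (!rst_n)         timer_cnt <= '0;
    else if (tick_1mhz) timer_cnt <= (timer_cnt == 5'd16) ? '0 : timer_cnt + 5'd1;
  end

  assign expired_16us = (timer_cnt == 5'd16);

endmodule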

The area savings can be significant if a large number of timers can be brought into this divided-clock domain, especially if the base frequency is high. Also, consider the power savings from reduced switching activity. So, whenever possible, it is good practice to identify as many timers as possible that can be moved under a shared divided-clock domain.

Simulation stuck – no progress in simulation time?!

To understand why a simulation might get stuck, let’s first understand what the simulator is doing at every time step. Whenever some signal changes value, simulation time is frozen. With time still frozen, the simulator executes a “delta cycle” and updates the values of all other signals affected by this change. If any signal changed, the simulator executes another “delta cycle”, again checking which signals are affected, and the process continues in this manner until all signal values have settled. Only then is simulation time advanced.

So, based on the above description, the root cause of a simulation getting stuck is that infinitely many “delta cycles” keep being inserted. There are (at least) two reasons I know of why this might happen: i) combinational loops, or ii) HDL coding style. Let’s go over both in detail.

Combinational Loops

A combinational loop in the design may cause continuous updates of signal values (think ring oscillator), and hence the simulator keeps inserting delta cycles.

Note that running simulations is not the way to detect combinational loops in a design, because not all combinational loops oscillate! (Think about the back-to-back inverter configuration, e.g. an SRAM cell.) So, it is possible to simulate designs containing combinational loops if all nodes eventually reach stable values when new inputs are presented.

HDL coding style

This one is more annoying, as it is not caused by a design issue but rather by how the code is written and which simulator is being used. Here is an example:

// Design
module stuck (
  input  logic     in,
  output logic     out
);
  
  logic net_a, net_b;
  
  always_comb begin
    net_a   = 1'b0; // default assignment
    net_a   = in;
    out = net_b;
  end
  
  always_comb begin
    net_b = 1'b0; // default assignment
    net_b = net_a; 
  end
  
endmodule

There is no combinational loop here, and of course it is just a simple buffer which we could have written as a direct assignment of out = in. However, that’s not the point. What is shown above is a very common coding style: in an always block we start out with a default assignment and then follow it with subsequent assignments to the same variables. Think of next-state logic, for example, where a frequently used coding style is to assign the current state at the top of the always block and then make subsequent assignments depending on the inputs (a small sketch of that style follows below).
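
As a reminder, here is a small sketch of that default-assignment next-state style (a made-up FSM, not related to the module above):

module fsm_example (
  input  logic clk, rst_n,
  input  logic start, finish,
  output logic done
);

  typedef enum logic [1:0] {IDLE, RUN, DONE} state_t;
  state_t state, state_nxt;

  // Next-state logic: default assignment first, then conditional overrides.
  always_comb begin
    state_nxt = state;              // default: hold the current state
    case (state)
      IDLE:    if (start)  state_nxt = RUN;
      RUN:     if (finish) state_nxt = DONE;
      DONE:                state_nxt = IDLE;
      default:             state_nxt = IDLE;
    endcase
  end

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) state <= IDLE;
    else        state <= state_nxt;
  end

  assign done = (state == DONE);

endmodule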

With certain simulators (at least I observed it with Cadence Xcelium 19), the simulation will hang. Let’s say the current state of the variables is in=net_a=net_b=out=1’b0. When in transitions to 1’b1, the following sequence of events will happen:

  1. in is within the sensitivity list of top always_comb and so this block will get triggered.
  2. net_a is assigned the default value 1’b0
  3. net_a is assigned new value of in i.e. 1’b1
  4. out is assigned the current value of net_b i.e. 1’b0
  5. Because of events 2-3, there was an update to net_a, and as net_a is in the sensitivity list of bottom always_comb, this block gets triggered
  6. net_b is assigned the default value 1’b0
  7. net_b is assigned the new value of net_a i.e. 1’b1
  8. Because of events 6-7, there was an update to net_b, and as net_b is in the sensitivity list of the top always_comb, that block gets triggered
  9. net_a is assigned the default value 1’b0
  10. net_a is assigned value of in i.e. 1’b1
  11. out is assigned the new value of net_b i.e. 1’b1
  12. Because net_a glitched from 1’b1->1’b0->1’b1 during events 9-10, this glitch will trigger the bottom always_comb block
  13. net_b is assigned the default value 1’b0
  14. net_b is assigned the new value of net_a i.e. 1’b1
  15. Because net_b glitched from 1’b1->1’b0->1’b1 during events 13-14, this glitch will trigger the top always_comb block
  16. The back-and-forth triggering of the two always_comb blocks continues infinitely, and so the simulation hangs at the time at which in transitioned to 1’b1.

The best takeaway recommendation here is to avoid combinational blocks that feed back into each other, even if the signals involved are unrelated. Although there is nothing wrong with the design, it may be necessary to adjust the HDL code (while maintaining logical equivalence) so that your simulator does not get stuck in infinite delta cycles. If the code cannot be reformatted, check whether the simulator provides a switch (e.g. -delay_trigger in the case of Xcelium) that makes the blocks insensitive to “zero-width glitches”.

The -delay_trigger switch tells the simulator to wait until it has evaluated the entire always block before deciding whether there has been an event on any particular variable, and thus whether it should be added to the event queue. Setting this flag prevents the simulator from hanging in this case, because variables that don’t change after a complete pass of the always block are not added to the event queue.
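
If you would rather adjust the code than rely on a simulator switch, here is one possible restructuring (just a sketch, logically equivalent to the original module): fold everything into a single always_comb, so that there is no second block to re-trigger on a zero-width glitch.

// One possible rewrite of the 'stuck' module: a single combinational block.
module not_stuck (
  input  logic in,
  output logic out
);

  logic net_a, net_b;

  always_comb begin
    net_a = in;
    net_b = net_a;
    out   = net_b;
  end

endmodule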

You can try out what we discussed at EDA Playground via this link: https://edaplayground.com/x/43ce. If you have access, run the Xcelium simulator and experiment with the -delay_trigger switch.

I want to leave you with a question: the sequence of events in the above analysis was initiated by in transitioning to 1’b1. What do you think would happen if in were assigned 1’b0 at the start of simulation (check out the testbench in the linked playground)? Would the simulation still hang? The default assignment is equal to the value of in, so surely there are no glitches?

Why a flip-flop needs Setup Time?

Let’s begin with the definition: setup time is the minimum time before the active clock edge by which the input data of the flip-flop must be stable at its new value.

Why do flip-flops need a setup time in the first place, and what’s actually going on during this time? Flip-flops (and other logic-based memory elements) usually have a feedback loop to lock the value in; this provides retention even after the input has changed. In short, it’s the settling delays of the nodes feeding these loops that dictate the setup time.

I’ll try to illustrate the setup time requirement with a concrete example, using a Master-Slave D flip-flop. The circuit schematic is shown in Figure 1.

Figure 1. A transmission gate based Master-Slave D Flip-Flop

While CLK=0 i.e. a positive edge is about to occur, the flip-flop transmission gates are in the state shown in Figure 2.

Figure 2. How a D flip-flop looks during CLK=0

Before the clock goes to 1, the D input needs to propagate through the two inverters and nodes a, b. This path is what determines the setup time of a flip-flop, and is shown highlighted in Figure 3.

Figure 3. The path in red needs to settle before positive edge. This path determines the setup time of flip-flop

Now, let us consider an example of a setup time failure. Imagine enough setup time was not allowed, and node b is still at its old value when the positive clock edge arrives. This situation is shown in Figure 4; notice that the second inverter on the path has not yet driven node b to its new value.

Figure 4. The value of different nodes in the circuit just before positive clock edge arrives

When the positive clock edge arrives, the Master Latch can go metastable, as node b attempts to drive the old value and contends with node a. This metastability can propagate out on the Q of the flip-flop. Figure 5 shows a possible situation after the positive clock edge.

Figure 5. Metastability caused by setup time failure after clock goes to 1

A quick follow-up question: do you think a setup time failure is dangerous if the D of the flip-flop was already equal to Q?

During an interview, I also like to test using various other flavors and implementations of a flip-flop.

Floating nets in your schematic

There are some pins, or more generally nets, in your schematic which you don’t plan to use. Would it be okay to leave them floating?

What is a ‘floating net’ anyway? The following discussion is based on terminology where I define a ‘floating input net’ as an undriven but loaded net, while a ‘floating output net’ is a driven but unloaded net. Let’s take separately the situation where a floating net is an input to some logic, and where it is an output (quick question: can’t it be both?).

For the case of an input: it is a big NO! Never assume that an input you left floating is automatically at ground voltage. It can easily ‘float’ to any voltage in the range and affect the functionality of the circuit. Another disadvantage is power consumption: in the worst case, the net’s voltage hovers around the threshold voltage of the gates it is driving. This leads to power draw not only in the logic directly connected to this net, but also in the downstream circuit, because the switching activity propagates onward.

The recommended approach is to cleanly tie all floating input nets to a known voltage level, as sketched below.
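
At the RTL/netlist level, the same idea simply means tying unused inputs off when instantiating a block; in a schematic you would do the same by wiring the pin to a supply/ground symbol or a tie cell. A minimal sketch (all module and port names below are hypothetical):

// Hypothetical IP block with inputs we don't intend to use in this design.
module some_ip (
  input  logic       clk,
  input  logic       d,
  input  logic       test_en,
  input  logic       scan_in,
  input  logic [1:0] mode_sel,
  output logic       q
);
  always_ff @(posedge clk) q <= d;
endmodule

module top (
  input  logic clk,
  input  logic data_in,
  output logic data_out
);

  // Unused inputs are tied to known levels rather than left floating.
  some_ip u_some_ip (
    .clk      (clk),
    .d        (data_in),
    .q        (data_out),
    .test_en  (1'b0),      // tie unused enable low
    .scan_in  (1'b0),      // tie unused scan input low
    .mode_sel (2'b00)      // tie unused mode select to a known value
  );

endmodule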

Finally, I leave you to think about nets that are purely outputs: is it okay to leave them floating?

Hello world!

This site will be mostly about VLSI interview questions for entry and intermediate level job positions. Along the way, I will also share some general tips, best practices and pitfalls…