IO, throughput and blocking: a simple model
A simple model
Continuing with oversimplification, here I’ll present a model that describes throughput for blocking and non-blocking IO processes where messages have to be sent across a communication channel. This channel is characterized by two parameters, bandwidth and latency, which together with a third parameter, message size, complete the model. Thus
m = message size: the size of the messages to be sent
b = bandwidth: the channel’s capacity, the amount of data that can be sent per unit time
l = latency: round trip time for a ping message
and what we want to identify is the throughput, as a function of these parameters
t = throughput: messages processed per unit time = f(m, b, l)
Throughput is defined as the number of message replies arriving at the source per unit time, which is equivalent to the rate of messages arriving at the target under the assumption that message replies are not constrained by bandwidth. This throughput is what defines the performance of the IO process. We’ll use arbitrary units for the quantities, but you can think of message size, bandwidth and latency in the usual units (e.g. Kb, Kb/s, ms) if that makes it clearer.
First, the non-blocking throughput is simply the bandwidth divided by the message size
tn = b / m
this results from the assumption that non-blocking message sending will achieve 100% bandwidth utilization; the number of messages per unit time is directly given by the bandwidth.
With blocking, each message will only be sent once a reply has been received for the previous one; previous messages block subsequent ones. As a result, the number of message replies that arrive at the source per unit time is constrained not by bandwidth alone, but by bandwidth and latency together. The net effect is to add a time overhead to each message’s total processing time. Since this overhead occurs per message, we first calculate the transmission time for one message using the available bandwidth, which is
transmission time = message size / bandwidth
the overhead is half the latency: latency is a round trip time, so half of it is the time it takes for a message reply to arrive, triggering the next message send. So the total time per message is
total time = transmission time + return delay = m / b + l / 2
because this is the total time required for each message send, the throughput is the reciprocal
tb = 1 / (m / b + l / 2)
This is the function we were looking for. As a sanity check, we can see that if the latency is zero, the above equation reduces to the non-blocking case. Since l / 2 is never negative, this also shows that tb ≤ tn for all values.
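The two formulas and the sanity check can be tried out with a minimal Python sketch. Python is used here only for illustration, and the function names and sample values are my own assumptions, not part of the model.

```python
def nonblocking_throughput(m, b):
    """tn = b / m: messages per unit time at full bandwidth utilization."""
    return b / m


def blocking_throughput(m, b, l):
    """tb = 1 / (m / b + l / 2): one message per transmission time plus return delay."""
    return 1.0 / (m / b + l / 2.0)


# Example values in arbitrary units, chosen only for illustration.
m, b = 5.0, 100.0

tn = nonblocking_throughput(m, b)          # 100 / 5 = 20 messages per unit time
tb_zero = blocking_throughput(m, b, 0.0)   # zero latency: should match tn
tb_half = blocking_throughput(m, b, 0.5)   # 1 / (0.05 + 0.25)

print(tn, tb_zero, tb_half)
```

With zero latency the blocking value coincides with the non-blocking one, and any positive latency pushes it below.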
Note that this throughput corresponds both to that of message replies and to that of messages arriving at the destination. The important point is that the latency blocks further sends, besides adding time to the round trip reply. Finally, if delays corresponding to message processing were variable, the term l / 2 would have to be substituted with the average processing time plus half the latency to yield an average throughput.
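The substitution for variable processing delays can be sketched as follows. This is only an illustration that follows the sentence above literally: the helper name `average_blocking_throughput` and the sample delays are hypothetical.

```python
def average_blocking_throughput(m, b, l, processing_times):
    """Average tb when each message also incurs a variable processing delay."""
    avg_processing = sum(processing_times) / len(processing_times)
    # Per-message overhead: average processing time plus half the latency.
    overhead = avg_processing + l / 2.0
    return 1.0 / (m / b + overhead)


# Hypothetical per-message processing delays (arbitrary time units).
delays = [0.1, 0.3, 0.2]
print(average_blocking_throughput(5.0, 100.0, 0.5, delays))
```

With all delays set to zero, this reduces to the tb formula given earlier.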
Let’s plot f using some example values to see what happens. We will use the following
message size = 5
bandwidth = 1-100
latency = 0.01-1.0
To compare performance with and without blocking, we’ll define a quantity
r = ratio of blocking throughput to non-blocking throughput = tb / tn
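Dividing the two formulas gives r = (1 / (m / b + l / 2)) / (b / m) = 1 / (1 + b l / (2 m)), so the ratio depends only on the product of bandwidth and latency relative to message size. A minimal Python sketch (function names are my own, values are illustrative) checks that the simplified form agrees with the direct quotient:

```python
def ratio_direct(m, b, l):
    """r = tb / tn computed from the two throughput formulas."""
    tb = 1.0 / (m / b + l / 2.0)
    tn = b / m
    return tb / tn


def ratio_simplified(m, b, l):
    """Same ratio after simplification: r = 1 / (1 + b * l / (2 * m))."""
    return 1.0 / (1.0 + b * l / (2.0 * m))


# Sample (bandwidth, latency) pairs with message size 5.
for b, l in [(1.0, 0.01), (50.0, 0.5), (100.0, 1.0)]:
    print(b, l, ratio_direct(5.0, b, l), ratio_simplified(5.0, b, l))
```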
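We will use Octave to generate 3D plots for this data, using this simple script:
message = 5;                            % fixed message size
bw = 1:100;                             % bandwidth values
lat = 0.01:0.01:1;                      % latency values
[x, y] = meshgrid(bw, lat);             % x = bandwidth grid, y = latency grid
tn = x ./ message;                      % non-blocking throughput
tb = 1 ./ ((message ./ x) + y ./ 2);    % blocking throughput
r = tb ./ tn;                           % ratio of blocking to non-blocking
figure; surf(x, y, tn);                 % plot 1: non-blocking throughput
figure; surf(x, y, tb);                 % plot 2: blocking throughput
figure; surf(x, y, r);                  % plot 3: ratio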
which generates three plots, one for non-blocking throughput, one for blocking throughput, and one for the ratio. The vertical axis shows the throughput (or the ratio, in the third plot), with the two horizontal axes corresponding to bandwidth (left) and latency (right). Message size was fixed at 5 units.
Non-blocking throughput (tn)
Nothing special going on here: throughput scales linearly with bandwidth. Because there is no blocking, latency has no effect, and the surface is a plane.
Blocking throughput (tb)
The interesting thing to note here is how latency (bottom right axis) controls the degree to which throughput scales with bandwidth (bottom left axis). With high latency, throughput barely scales; the curve rises but is almost flat. At the other extreme, with zero latency the scaling is the same as in the previous graph, a straight line with the same gradient rising up to the red region. The transition from no scaling to scaling occurs in the middle of the graph, as the shape of the surface changes. Overall, throughput is of course reduced in comparison to non-blocking.
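The flattening at high latency follows directly from the formula: as bandwidth grows, m / b shrinks towards zero and tb approaches 2 / l, a ceiling set by latency alone. A small Python sketch (the helper name and sample values are my own) makes this concrete for the message size of 5 used in the plots:

```python
def blocking_throughput(m, b, l):
    """tb = 1 / (m / b + l / 2)."""
    return 1.0 / (m / b + l / 2.0)


m = 5.0
for l in (1.0, 0.01):
    low = blocking_throughput(m, 1.0, l)     # bandwidth at the low end of the plot
    high = blocking_throughput(m, 100.0, l)  # bandwidth at the high end of the plot
    ceiling = 2.0 / l                        # limit of tb as bandwidth grows without bound
    print(f"l={l}: tb(b=1)={low:.3f}, tb(b=100)={high:.3f}, ceiling={ceiling:.1f}")
```

With l = 1, a hundredfold increase in bandwidth yields only a tenfold increase in throughput (about 0.18 to about 1.82), already close to the ceiling of 2. With l = 0.01 the same increase takes throughput from about 0.20 to about 18.2, close to the non-blocking value of 20.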
Ratio (r = tb / tn)
A ratio of 1.0, equal throughput, occurs with zero latency: this is the peak at the top right that continues along the ridge at the top of the graph. But what is interesting is the other ridge that extends towards the bottom. In fact, the trend that we can see from the blue area towards the latency axis is that lower bandwidth produces higher ratios, closer to 1.0, equal performance. After a moment’s thought, this makes sense: the lack of bandwidth negates the effects of blocking. Or in other words, blocking still manages to utilize nearly 100% of the bandwidth when it is scarce.
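One way to read this is that the ratio is the fraction of each blocking send cycle actually spent transmitting, (m / b) / (m / b + l / 2), which is the bandwidth utilization. The Python sketch below illustrates that reading; the helper name and the latency value of 0.1 are my own choices for illustration.

```python
def bandwidth_utilization(m, b, l):
    """Fraction of each blocking send cycle spent transmitting."""
    transmission_time = m / b
    return transmission_time / (transmission_time + l / 2.0)


m, l = 5.0, 0.1  # message size from the plots; latency picked for illustration
for b in (1.0, 10.0, 100.0):
    print(f"b={b:>5}: utilization={bandwidth_utilization(m, b, l):.3f}")
```

When bandwidth is scarce, transmission time dominates the cycle and the idle wait for the reply barely matters; when bandwidth is plentiful, the channel sits idle for much of each cycle.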