Is there a bias when doing subtraction with borrowing?
As a professional ‘numbers guy,’ I often find myself drawn to the tactile satisfaction of doing things by hand, just as we all learned to do in school. Even with the convenience of calculators and computer apps, there’s something about the process of manual calculations that I can’t resist.
Over the years, however, I noticed an odd phenomenon when subtracting this way: I almost always need to use the borrowing technique. In this article, I will describe the borrowing technique and explain why it seemed odd to me that I just about always have to use it. Specifically, I’ll discuss the probability of needing to borrow given that the numbers I am subtracting seem random to me (hint: they are not). I’ll show that there can be a bias mathematically, but that this bias suggests I should use borrowing less frequently, not more. I then come up with a solution to the puzzle that has nothing to do with complex probability calculations. But first, what is subtraction with (and without) borrowing?
What is hand subtraction with borrowing?
Subtraction with borrowing, or regrouping, is a nifty little technique we use to subtract one number from another. It’s a bit like addition with regrouping, where we carry over. The idea is to use place value to our advantage, allowing us to borrow from the next column to the left, whether that’s the tens, hundreds, or beyond. This method is useful when dealing with double-digit and larger numbers, especially when subtracting a smaller number from a larger one. I’ll explain this in more detail later.
Let’s walk through a simple example: subtracting 3 from 29. First, we place the 3 under the 9 of 29, aligning the ‘ones’ column values. Then, we subtract the 3 from 9 and get 6 for the ones column. Next, we move to the tens column and subtract zero (the space is empty) from 2, and place the 2 underneath. And voila, we get 26 as our result. Kid’s stuff.
Now, let’s subtract 9 from 23. We can’t do it directly, as 9 is larger than 3. So, we borrow 1 from the tens column and add it to the 3 in the ones column, making it 13. Then, we subtract 9 from 13, which gives us 4. Now we deal with the tens column. We borrowed a 10 to subtract 9 from 13, so we must take that borrowed 10 away again. The tens column of 23 holds a 2, so 2 (tens) minus the 1 (ten) we borrowed leaves 1 (ten). We put the 1 next to the 4 and get 14: 23 - 9 = 14.
It’s a simple process once you get the hang of it. I do this one to five times a week rather than reach for a calculator.
You don’t always need to borrow
The example above borrows from the tens column, but this doesn’t always have to happen. Consider the first example of 29 - 3. This is so easy that you can do it in your head. Even more complicated examples like 55 - 23 can be done without borrowing. We subtract 3 from 5, giving 2. We then subtract 2 from 5 (20 from 50) and get 3 (30) for the final result of 32. So sometimes we need to borrow, and other times we do not.
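If code makes the procedure clearer, here is a small illustrative Python sketch of the column-by-column method described above; the function name subtract_two_digit is my own invention for this post, and it simply reports whether a borrow was needed along with the result.

```python
def subtract_two_digit(top, bottom):
    """Subtract bottom from top (both between 0 and 99, with top >= bottom),
    column by column, and report whether a borrow from the tens column was needed."""
    top_tens, top_ones = divmod(top, 10)
    bottom_tens, bottom_ones = divmod(bottom, 10)

    borrow = top_ones < bottom_ones  # the ones column can't be subtracted directly
    if borrow:
        top_ones += 10               # borrow 10 into the ones column...
        top_tens -= 1                # ...and take it away from the tens column

    result = 10 * (top_tens - bottom_tens) + (top_ones - bottom_ones)
    return result, borrow

print(subtract_two_digit(29, 3))   # (26, False) - no borrowing needed
print(subtract_two_digit(23, 9))   # (14, True)  - borrowing needed
print(subtract_two_digit(55, 23))  # (32, False) - no borrowing needed
```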
The Puzzle
Here’s a puzzle for you: I almost always need to borrow when I do these subtractions by hand. Why is that? It’s intriguing, isn’t it? There seems to be just as much chance of the top digit in the ones column being larger than the bottom digit as there is of the reverse. But this doesn’t seem to be the case. Just about always, the digit on the bottom is larger than the digit on the top. What gives? Why do my hand subtractions have this bias?
The following analysis may be tedious, so skip it if figuring out probability questions is not your thing, and join us after the table below. For the rest, this is the logic used to calculate the precise probability of getting a top ones digit of a higher value than the bottom ones digit if the digits are randomly drawn. We will use some symbolic notation for double-digit numbers to make things generic: the top number has digits \(m_1 m_2\) (tens digit \(m_1\), ones digit \(m_2\)), and the bottom number has digits \(n_1 n_2\).
So, for example, if we were subtracting 32 from 54, \(m_1\) would be 5, \(m_2\) would be 4, \(n_1\) would be 3, and \(n_2\) would be 2. We can proceed with hand subtraction easily when \(m_2\) is larger than \(n_2\) or if they are equal. Otherwise, we need to use borrowing with hand subtraction.
OK, let’s start by considering the probability that the top ones digit \(m_2\) takes a particular value. To be concrete, we will ask the probability that it is 9. The options are 0 through 9, giving 10 different options, and it can only be one of them, so the probability is 1/10, or \(p(m_2) = 1/10\), which is true for every option of \(m_2\). The options and their probabilities are listed in the first two columns of the table below.
The third column lists the probability that \(m_2\) takes the given value and that \(m_2 < n_2\). When \(m_2\) is 9, there is no chance that \(n_2\) is greater than it, so \(p(m_2 < n_2)\) is zero. If \(m_2\) is 8, then there is a 1/10 chance that \(n_2\) is greater than \(m_2\), namely when \(n_2\) is 9. The probability that \(m_2\) takes the given value and that \(m_2 < n_2\) is the product of those probabilities: \(p(m_2) \cdot p(m_2 < n_2)\). In fancy mathematical terms, we can write this as \(p(m_2 \wedge m_2 < n_2)\).
In the fourth column, we consider the probability of \(m_2 = n_2\), which for a given value of \(m_2\) is 1/10. That is, once we select a value of \(m_2\) there is a 1/10 chance that \(n_2\) will match it. This is true for all values of \(m_2\). But again, the chance of selecting \(m_2\) is 1/10, so the product is 1/100.
In the fifth column, we consider the opposite of column three, and the values in this column should be self-evident by now.
Now, to get the total probability for the three cases: 1) \(m_2 > n_2\), 2) \(m_2 = n_2\), and 3) \(m_2 < n_2\), we sum over all the options for \(m_2\), or sum down the column. These values are given in the final row of the table.
| \(m_2\) | \(p(m_2)\) | \(p(m_2 \wedge m_2 < n_2)\) | \(p(m_2 \wedge m_2 = n_2)\) | \(p(m_2 \wedge m_2 > n_2)\) |
|---|---|---|---|---|
| 9 | 1/10 | 1/10 \(\cdot\) 0/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 9/10 |
| 8 | 1/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 8/10 |
| 7 | 1/10 | 1/10 \(\cdot\) 2/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 7/10 |
| 6 | 1/10 | 1/10 \(\cdot\) 3/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 6/10 |
| 5 | 1/10 | 1/10 \(\cdot\) 4/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 5/10 |
| 4 | 1/10 | 1/10 \(\cdot\) 5/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 4/10 |
| 3 | 1/10 | 1/10 \(\cdot\) 6/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 3/10 |
| 2 | 1/10 | 1/10 \(\cdot\) 7/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 2/10 |
| 1 | 1/10 | 1/10 \(\cdot\) 8/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 1/10 |
| 0 | 1/10 | 1/10 \(\cdot\) 9/10 | 1/10 \(\cdot\) 1/10 | 1/10 \(\cdot\) 0/10 |
| \(\Sigma\) | 10/10 | 45/100 | 10/100 | 45/100 |
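As a quick check on that final row, summing the third column gives \(\sum_{k=0}^{9} \frac{1}{10} \cdot \frac{k}{10} = \frac{0 + 1 + \cdots + 9}{100} = \frac{45}{100}\), the fourth column sums to \(10 \cdot \frac{1}{100} = \frac{10}{100}\), and by symmetry the fifth column also sums to \(\frac{45}{100}\).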
So now we have the chance that \(m_2\) is larger than \(n_2\), and it is 45/100. The opposite is also 45/100, with a 10/100 chance that the numbers are equal. The chance that \(m_2\) is larger than \(n_2\) is the same as the chance that it is smaller. Therefore, the chance we need to use borrowing is evenly split - there is no apparent bias! This probably seems quite obvious in the end, but the apparent bias I see made me do the calculation to check.
Checking my logic with computer code
This model is purely symbolic logic and algebra. We can check it numerically using Python code if we make some assumptions about fair random number generation. Below is a short code snippet that counts how often one random digit between 0 and 9 is greater than, less than, or equal to another over 1 million repeats.
```python
import numpy as np

m2 = np.random.randint(0, 10, 1_000_000)  # generates 1,000,000 integers between 0 and 9
n2 = np.random.randint(0, 10, 1_000_000)  # generates 1,000,000 integers between 0 and 9

gt = 0; lt = 0; eq = 0  # set up counters for greater than, less than, and equal

for i in range(len(m2)):  # iterate over all values in array m2 by index number
    top = m2[i]
    bottom = n2[i]
    if top > bottom:   # top greater than bottom
        gt += 1
    if top < bottom:   # top less than bottom
        lt += 1
    if top == bottom:  # they are equal
        eq += 1

print(f'Greater than = {gt}, Less than = {lt}, Equal = {eq}')
print(f'p(m2 > n2) = {gt/1_000_000}, p(m2 < n2) = {lt/1_000_000}, p(m2 = n2) = {eq/1_000_000}')
```
```
Greater than = 450030, Less than = 450411, Equal = 99559
p(m2 > n2) = 0.45003, p(m2 < n2) = 0.450411, p(m2 = n2) = 0.099559
```
Those numbers are indeed close to what we calculated using logic and algebra.
Where Is My Bias Coming From?
Selection Bias In Choosing Which Numbers to Subtract (Part 1)
So, the following occurred to me. When doing subtractions like this, I usually do a subtraction where the first number is larger than the second. Otherwise, I reach for a calculator to handle the negative number that would result. This is a form of selection bias because I only choose to do subtractions this way when I know the result is a positive number. Actually, the borrowing technique only works when the result is a positive number anyway. Does this selection bias introduce an overall bias that means borrowing is more likely?
The Consequences of having \(m_1m_2\) greater than \(n_1n_2\)
To study this, I immediately went with a coded solution. Writing up some Python code is often easier than using logic and algebra, and the logic/algebra approach needed is a bit complicated for this blog post. But I will sketch the reasoning behind what seems like a paradoxical result in addition to letting code answer questions for us.
First, I’d like to explain what we are doing here. I will select 1 million pairs of 2-digit numbers between 00 and 99. Why? These numbers represent 1 million samplings of subtractions of two numbers I may encounter and need to do. But before I consider the pairs for subtraction, I need to ‘correct’ the pairings to make sure they are subtractions I would choose to do. I do this by considering each pair, and if the first number is less than the second number, I swap them around to get a positive number on subtraction. I would assume whether I need to use borrowing during the subtraction then is completely random - spoiler alert: it isn’t! Swapping numbers around is introducing bias!
Let’s do a code block to show this so far:
```python
import numpy as np

m1m2 = np.random.randint(0, 100, 1_000_000)  # one million random numbers between 0 and 99
n1n2 = np.random.randint(0, 100, 1_000_000)  # another one million random numbers between 0 and 99

for i in range(len(m1m2)):   # loop over all the numbers in each array (m1m2 and n1n2)
    if m1m2[i] < n1n2[i]:    # if the first number is less than the second, we want to swap
        t = n1n2[i]          # we
        n1n2[i] = m1m2[i]    # do the
        m1m2[i] = t          # swap
```
Now, every number in m1m2 is greater than or equal to the corresponding number in n1n2.
To determine whether I would use borrowing during the hand subtraction, it is only necessary to count how often \(m_2\) is less than \(n_2\). For example, if the numbers are 45 - 29, 5 is less than 9, and I need to borrow. I use the modulo operation % 10 to compute the last digit of each pair and then count how often the first digit is greater than, less than, or equal to the second.
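The counting code itself isn’t shown above, so here is a minimal sketch of how that step might look; it assumes the m1m2 and n1n2 arrays from the previous snippet (after the swaps) and uses NumPy comparisons rather than an explicit loop.

```python
# A sketch of the counting step (assumes m1m2 and n1n2 from the snippet above, after the swaps)
m2 = m1m2 % 10  # ones digit of each top number
n2 = n1n2 % 10  # ones digit of each bottom number

gt = int(np.sum(m2 > n2))   # top ones digit larger: no borrowing needed
lt = int(np.sum(m2 < n2))   # top ones digit smaller: borrowing needed
eq = int(np.sum(m2 == n2))  # ones digits equal

print(f'Greater than = {gt}, Less than = {lt}, Equal = {eq}')
print(f'p(m2 > n2) = {gt/1_000_000}, p(m2 < n2) = {lt/1_000_000}, p(m2 = n2) = {eq/1_000_000}')
```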
```
Greater than = 494990, Less than = 404267, Equal = 100743
p(m2 > n2) = 0.49499, p(m2 < n2) = 0.404267, p(m2 = n2) = 0.100743
```
Bias! But the Wrong Bias!
Unsurprisingly, the chance that \(m_2 = n_2\) is 1/10. This aligns with our intuition. However, the chance that \(m_2\) is greater than \(n_2\) is unexpectedly higher than the reverse, at about 0.495 to 0.405. This indicates the presence of a bias. But wait, it is also in the opposite direction of what I initially anticipated! If we set up our subtractions to subtract a smaller number from a larger one, I should expect to borrow less frequently, since \(p(m_2 > n_2)\) is greater than \(p(m_2 < n_2)\). Yet, experience tells me I usually need to borrow. So, how does this counterintuitive bias occur? The answer lies in the swap operation in the code above.
Swaps occur when \(m_1 m_2\) is less than \(n_1 n_2\). When \(m_1\) and \(n_1\) differ, which happens 9 times out of 10, the tens digits alone decide whether we swap; the values of \(m_2\) and \(n_2\) are irrelevant, and no bias in their values is introduced. However, when \(m_1 = n_1\), which is 1/10 of the time, a swap occurs exactly when \(m_2\) is less than \(n_2\). For instance, we would swap 54 and 59 but not 58 and 52. This selective swapping creates the bias! We swap these numbers to make the top number larger, but in this case doing so always eliminates the need for borrowing during subtraction. One time out of ten, the swap guarantees we don’t need to borrow, and nothing is left up to chance. Hence, we have bias.
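A quick back-of-the-envelope check that this explanation accounts for the simulated numbers: when \(m_1 \ne n_1\) (9 times out of 10) the ones digits behave exactly as in the unbiased case, and when \(m_1 = n_1\) (1 time out of 10) the swap forces \(m_2 \ge n_2\). That gives

\(p(m_2 > n_2) = 0.9 \cdot 0.45 + 0.1 \cdot 0.9 = 0.495\),

\(p(m_2 < n_2) = 0.9 \cdot 0.45 + 0.1 \cdot 0 = 0.405\), and

\(p(m_2 = n_2) = 0.9 \cdot 0.1 + 0.1 \cdot 0.1 = 0.1\),

which matches the simulation above almost exactly.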
This is a fascinating demonstration of how bias can enter numerical analysis. At first glance, it may seem like we are being completely fair. However, by setting up a specific scenario that we desire - subtracting a smaller number from a larger one - we have unwittingly introduced a significant bias.
Of course, this doesn’t get us any closer to understanding why, in my experience, I just about always have to borrow. It turns out the reason is not based on mathematics and probability at all but laziness. Read on…
Selection Bias In Choosing Which Numbers to Subtract (Part 2!)
So how can it be that randomly being assigned two-digit numbers to subtract, and then ensuring that their subtraction gives a positive number (by swapping if need be), seems to bias subtraction towards numbers that don’t require borrowing, and yet in practice I always seem to be borrowing? Borrowing requires more labour and thought, so is the universe conspiring against me? The answer is a resounding nope. It turns out I am conspiring against myself.
I have dedicated significant time to meticulously going through the mathematics of the probabilities involved here. I did this myself before writing anything because I genuinely thought there must be a hidden bias that somehow leads to \(m_2\) being smaller than \(n_2\). Some quirk of random numbers that turns out to have a hidden bias. However, the only bias I could find was a selection bias that generated a bias against borrowing. Only when I turned my thinking towards other sources of bias did I have the ‘Aha!’ moment. The bias had nothing to do with how random numbers tend to play out, but rather in my choice to perform the hand subtraction in the first place. Consider the following subtraction:
55 - 33
This fulfills all the regular requirements for when I would do hand subtraction: \(m_1 m_2\) is larger than \(n_1 n_2\), and \(m_2\) is larger than \(n_2\). But think about this for a moment. While I might need to subtract numbers like this, the kind of pairing where borrowing could in principle come up, this is a case where I don’t even need to write out the subtraction by hand. The answer is 22, which is obvious to anyone who handles numbers regularly. What about:
78 - 55
This one is a little trickier, but it still fulfills all the usual requirements of when I would do hand subtraction. But again, I don’t actually need to. The answer is 23. It’s fairly easy to compute the 3 in the ones position and then the 2 for the tens position. The fact I don’t need to borrow means I don’t even need to write down the subtraction.
So, the bias is definitional. Why do I often have to use the borrowing technique when I hand subtract? Because I am more likely to hand subtract when the subtraction is difficult enough that borrowing is required. It’s really that simple. It’s also kind of anticlimactic.
Summary
So, what does all this mean? Well, I started out with an odd conundrum. Why is it that when I subtract numbers by hand, I always have to use the difficult borrowing technique? Why is the subtraction not more frequently easy when I write it out and do it by hand? In my usual overintellectualizing way, I figured that the reason must be some weird bias in the numbers when they are posed as subtractions. But there is nothing in the numbers themselves that ends up biased. It is simply my choice of when to actually do the hand subtraction. I mostly do it when the subtraction is hard - that is, when I need to use the borrowing technique. So, obviously, I need to use the borrowing technique more often. It really is that simple.
The point is that the reason for biases is often initially pretty obscure. The tendency to look for reasons in the numbers themselves was wrong. The reason was simple. It was not the numbers per se, but when I needed to actually do the hand subtraction and the implications that had for borrowing or not. The answer was not in the numbers but in the choices that were made because of the numbers.