title | date | tags | favorite
---|---|---|---
How Many Values Does a Boolean Have? | 2020-08-21T23:05:55-07:00 | | true
A friend of mine recently had an interview for a software engineering position. They later recounted to me the content of the technical questions that they had been asked. Some had been pretty standard:
- "What's the difference between concurrency and parallelism?" -- a reasonable question given that Go was the company's language of choice.
- "What's the difference between a method and a function?" -- a little more strange, in my opinion, since the difference is of little practical use.
But then, they recounted a rather interesting question:
How many values does a bool have?
Innocuous at first, isn't it? Probably a bit simpler, in fact, than the questions about methods and functions, concurrency and parallelism. It's plausible that a candidate has not done much concurrent or parallel programming in their life, or that they came from a language in which functions were rare and methods were ubiquitous. It's not plausible, on the other hand, that a candidate applying to a software engineering position has not encountered booleans.
If you're genuinely unsure about the answer to the question, I think there's no reason for me to mess with you. The simple answer to the question -- as far as I know -- is that a boolean has two values. They are `true` and `false` in Java, or `True` and `False` in Haskell, and `1` and `0` in C. A boolean value is either true or false.
So, what's there to think about? There are a few things, ackshually. Let's explore them, starting from the theoretical perspective.
Types, Values, and Expressions
Boolean, or `bool`, is a type. Broadly speaking, a type is a property of something that defines what the something means and what you can do with it. That something can be several things; for our purposes, it can either be an expression in a programming language (like those in the form `fact(n)`) or a value in that same programming language (like `5`).

Dealing with values is rather simple. Most languages have fixed-width integers, usually with $2^{32}$ possible values, which have type `int`, `i32`, or something in a similar vein. Most languages also have strings, of which there are as many as you have memory to contain, and which have the type `string`, `String`, or occasionally the more confusing `char*`. Most languages also have booleans, as we discussed above.
The deal with expressions is more interesting. Presumably, expressions evaluate to values, and the type of an expression is then the type of the values it can yield. Consider the following snippet in C++:
```cpp
int square(int x) {
    return x * x;
}
```
Here, the expression `x` is known to have type `int` from the type signature provided by the programmer. Multiplication of integers yields an integer, so `x*x` is also of type `int`. Since `square(x)` returns `x*x`, it is of type `int` as well. So far, so good.
Okay, how about this:
```cpp
int meaningOfLife() {
    return meaningOfLife();
}
```
No, wait, don't say "stack overflow" just yet. That's no fun. And anyway, this is technically a tail call, so maybe our C++ compiler can avoid growing the stack. And indeed, flicking on the `-O2` flag in this Compiler Explorer example, we can see that no stack growth is necessary: it's just an infinite loop. But `meaningOfLife` will never return a value. One could say this computation diverges.
Well, if it diverges, just throw the expression out of the window! That's no `int`! We only want real `int`s!

And here, we can do that. But what about the following:
```cpp
// inf_int: a fictitious integer type of unbounded size
inf_int collatz(inf_int x) {
    if(x == 1) return 1;
    if(x % 2 == 0) return collatz(x/2);
    return collatz(x * 3 + 1);
}
```
Notice that I've used the fictitious type `inf_int` to represent integers that can hold arbitrarily large values, not just 32-bit ones. That is important for this example, and I'll explain why shortly.

The code in the example is a simulation of the process described in the Collatz conjecture. Given an input number `x`, if the number is even, it's divided in half, and the process continues with the halved number. If, on the other hand, the number is odd, it's multiplied by 3, 1 is added to it, and the process continues with that number. The only way for the process to terminate is for the computation to reach the value 1.
Why is the unbounded type important? Because as of right now, nobody knows whether or not the process terminates for all possible input numbers. We have a strong hunch that it does; we've checked a lot of numbers and found that the process terminates for them. That is also why 32-bit integers are not truly sufficient for this example: every 32-bit input lies within the range that has already been checked, so we know empirically that the function terminates for them.
But why does this matter? Well, it matters because we don't know whether or not this function will diverge, and thus, we can't 'throw it out of the window' like we wanted to with `meaningOfLife`! In general, no algorithm can decide whether an arbitrary program terminates; that is the halting problem.
So, what do we do?
It turns out to be convenient -- formally -- to treat the result of a diverging computation as its own value. This value is usually called 'bottom', and written as $\bot$.

Since in most programming languages you can write a nonterminating expression or function of any type, this 'bottom' is included in all types. So in fact, the possible values of `unsigned int` are $\bot, 0, 1, 2, \ldots$, up to the type's maximum.

As you may have guessed by now, the same is true for a boolean: we have $\bot$, `true`, and `false`.
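In denotational semantics, a type extended this way is usually presented as a "lifted" (or flat) domain: $\bot$ is ordered below, that is, it is less defined than, every proper value, while the proper values are incomparable with one another. For booleans, that reads:

$$
\mathbf{Bool}_\bot = \{\bot,\ \mathit{true},\ \mathit{false}\}, \qquad \bot \sqsubseteq \mathit{true}, \qquad \bot \sqsubseteq \mathit{false}
$$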
Haskell and Bottom
You may be thinking:
Now he's done it; he's gone off the deep end with all that programming language theory. Tell me, Daniel, where the heck have you ever encountered $\bot$ in code? This question was for a software engineering interview, after all!
You're right; I haven't specifically seen the symbol $\bot$ in my time programming. But I have frequently used an equivalent notation for the same idea: `undefined`. In fact, here's a possible definition of `undefined` in Haskell:
```haskell
-- Illustrative only: GHC's Prelude version raises an error rather than looping.
undefined = undefined
```
Just like `meaningOfLife`, this is a divergent computation! What's more, the type of this computation is, in Haskell, `a`. More explicitly -- and retreating to more mathematical notation -- we can write this type as $\forall \alpha . \alpha$. That is, for any type $\alpha$, `undefined` has that type! This means `undefined` can take on any type, and so, we can write:
```haskell
myTrue :: Bool
myTrue = True

myFalse :: Bool
myFalse = False

myBool :: Bool
myBool = undefined
```
In Haskell, this is quite useful. For instance, if one's in the middle of writing a complicated function and wants to check their work so far, they can put `undefined` in place of the part of the function they haven't written. They can then compile their program; the typechecker will find any mistakes they've made so far, but, since the type of `undefined` can be anything, that part of the program will be accepted without a second thought.
The language Idris extends this practice with the idea of typed holes, where you can leave fragments of your program unwritten, and ask the compiler what kind of thing you need to write to fill that hole.
Java and null
Now you may be thinking:
This whole deal with Haskell's `undefined` is beside the point; it doesn't really count as a value, since it's just a nonterminating expression. What you're doing is a kind of academic autofellatio.
Alright, I can accept this criticism. Perhaps just calling a nonterminating function a value is far-fetched (even though in denotational semantics we do extend types with $\bot$). But denotational semantics is not the only place where types are implicitly extended with an extra value; let's look at Java.
In Java, we have `null`. At the core language level, any function or method that accepts an object can also be passed `null`; if `null` is not to that function or method's liking, it has to explicitly check for it using `if(x == null)`.

This `null` value does not at first interact with booleans. After all, Java's `boolean` is a primitive type, not a class. Unlike objects, which you have to allocate using `new`, you can just throw around `true` and `false` as you see fit. Also unlike objects, you simply can't assign `null` to a `boolean` variable.
The trouble is, the parts of Java dealing with generics, which allow you to write polymorphic functions, can't handle 'primitives' like `boolean`. If you want to have an `ArrayList` of something, that something must be a class. But what if you really do want an `ArrayList` of booleans? Java solves this problem by introducing 'boxed' booleans: primitives wrapped in a class called `Boolean`. This class can then be used with generics.

But see, this is where `null` has snuck in again. By allowing `Boolean` to be a class (thereby granting it access to generics), we've also given it the ability to be `null`.

This example is made especially compelling because Java supports something it calls autoboxing: you can directly assign a primitive to a variable of the corresponding boxed type. Consider the example:
```java
Boolean myTrue = true;
Boolean myFalse = false;
Boolean myBool = null;
```
Beautiful, isn't it? Better yet, unlike Haskell, where you can't really check if your `Bool` is `undefined` (because you can't, in general, tell whether a computation will ever finish), you can very easily check if your `Boolean` is `true`, `false`, or `null`:
```java
// Java skips assert statements unless run with -ea (enable assertions).
assert myTrue != myFalse;
assert myFalse != myBool;
assert myTrue != myBool;
```
We're okay to use `!=` here, instead of `equals`, because autoboxing a `boolean` always yields one of two shared objects, `Boolean.TRUE` or `Boolean.FALSE`; every boxed `true` is the very same object, and likewise for `false`.

In fact, this means that a `Boolean` variable can have exactly 3 values!
C and Integers
Oh, the luxury of having a type representing booleans in your language! It's almost overly indulgent compared to the spartan minimalism of C. In C, boolean conditions are represented as numbers. You can perhaps get away with throwing around `char` or `short int`, but even then, these types allow far more values than two!
```c
unsigned char test = 255;
while(test) test -= 1;
```
This loop will run 255 times, thereby demonstrating that C has at least 255 values that can be used to represent the boolean `true`.
There are other languages with this notion of 'truthy' and 'falsey' values, in which something not exactly `true` or `false` can be used as a condition. However, some of them differ from C in that they also extend this idea to equality. In JavaScript:
```javascript
console.assert(true == 1)
console.assert(false == 0)
```
Then, modulo `==`, at least `1` and `0` collapse into `true` and `false`. No such luck in C, though! We have 256 values that fit in an `unsigned char`, all of which are distinct modulo `==`. Our boolean variable can contain all of these values. And there is no respite to be found with `enum`s, either. We could try to define:
```c
enum bool { FALSE, TRUE }; /* FALSE is 0, TRUE is 1 */
```
Unfortunately, all this does is define `enum bool` to be a numeric type that can hold at least 2 distinct values, and define the numeric constants `TRUE` and `FALSE`. So in fact, you can still write the following code:
```c
enum bool b1 = TRUE;
enum bool b2 = FALSE;
enum bool b3 = 15;
```
And so, no matter how hard you try, your 'boolean' variable can have many, many values!
Conclusion
I think that 'how many values does a boolean have' is a strange question. Its purpose can be one of two things:
- The interviewer expected a long-form response such as this one. This is a weird expectation for a software engineering candidate: how does knowing about $\bot$, `undefined`, or `null` help in creating software, especially if this information is irrelevant to the company's language of choice?
- The interviewer expected the simple answer. In that case, my previous observation applies: what software engineering candidate has not seen a boolean in their time programming? Surely candidates are better screened before they are offered an interview?
Despite the question's weirdness, I think that the resulting investigation of the matter -- outside of the interview setting -- is useful, and perhaps, in a way, enlightening. It may help one understand the design choices made in their language of choice, and how those choices shape the code that they write.
That's all I have! I hope that you found it interesting.