313 lines
13 KiB
Markdown
313 lines
13 KiB
Markdown
---
|
|
title: "How Many Values Does a Boolean Have?"
|
|
date: 2020-08-21T23:05:55-07:00
|
|
tags: ["Java", "Haskell", "C", "C++"]
|
|
favorite: true
|
|
---
|
|
|
|
A friend of mine recently had an interview for a software
|
|
engineering position. They later recounted to me the content
|
|
of the technical questions that they had been asked. Some had
|
|
been pretty standard:
|
|
|
|
* __"What's the difference between concurrency
|
|
and parallelism?"__ -- a reasonable question given that Go was
|
|
the company's language of choice.
|
|
* __"What's the difference between a method and a function?"__ --
|
|
a little more strange, in my opinion, since the difference
|
|
is of little _practical_ use.
|
|
|
|
But then, they recounted a rather interesting question:
|
|
|
|
> How many values does a bool have?
|
|
|
|
Innocuous at first, isn't it? Probably a bit simpler, in fact,
|
|
than the questions about methods and functions, concurrency
|
|
and parallelism. It's plausible that a candidate
|
|
has not done much concurrent or parallel programming in their
|
|
life, or that they came from a language in which functions
|
|
were rare and methods were ubiquitous. It's not plausible,
|
|
on the other hand, that a candidate applying to a software
|
|
engineering position has not encountered booleans.
|
|
|
|
If you're genuinely unsure about the answer to the question,
|
|
I think there's no reason for me to mess with you. The
|
|
simple answer to the question -- as far as I know -- is that a boolean
|
|
has two values. They are `true` and `false` in Java, or `True` and `False`
|
|
in Haskell, and `1` and `0` in C. A boolean value is either true or false.
|
|
|
|
So, what's there to think about? There are a few things, _ackshually_.
|
|
Let's explore them, starting from the theoretical perspective.
|
|
|
|
### Types, Values, and Expressions
|
|
Boolean, or `bool`, is a type. Broadly speaking, a type
|
|
is a property of _something_ that defines what the _something_
|
|
means and what you can do with it. That _something_ can be
|
|
several things; for our purposes, it can either be an
|
|
_expression_ in a programming language (like those in the form `fact(n)`)
|
|
or a value in that same programming language (like `5`).
|
|
|
|
Dealing with values is rather simple. Most languages have finite numbers,
|
|
usually with \(2^{32}\) values, which have type `int`,
|
|
`i32`, or something in a similar vein. Most languages also have
|
|
strings, of which there are as many as you have memory to contain,
|
|
and which have the type `string`, `String`, or occasionally
|
|
the more confusing `char*`. Most languages also have booleans,
|
|
as we discussed above.
|
|
|
|
The deal with expressions is a more interesting. Presumably
|
|
expressions evaluate to values, and the type of an expression
|
|
is then the type of values it can yield. Consider the following
|
|
snippet in C++:
|
|
|
|
```C
|
|
int square(int x) {
|
|
return x * x;
|
|
}
|
|
```
|
|
|
|
Here, the expression `x` is known to have type `int` from
|
|
the type signature provided by the user. Multiplication
|
|
of integers yields an integer, and so the type of `x*x` is also
|
|
of type `int`. Since `square(x)` returns `x*x`, it is also
|
|
of type `int`. So far, so good.
|
|
|
|
Okay, how about this:
|
|
|
|
```C++
|
|
int meaningOfLife() {
|
|
return meaningOfLife();
|
|
}
|
|
```
|
|
|
|
No, wait, doesn't say "stack overflow" just yet. That's no fun.
|
|
And anyway, this is technically a tail call, so maybe our
|
|
C++ compiler can avoid growing the stack. And indeed,
|
|
flicking on the `-O2` flag in this [compiler explorer example](https://godbolt.org/z/9cv4nY),
|
|
we can see that no stack growth is necessary: it's just
|
|
an infinite loop. But `meaningOfLife` will never return a value. One could say
|
|
this computation _diverges_.
|
|
|
|
Well, if it diverges, just throw the expression out of the window! That's
|
|
no `int`! We only want _real_ `int`s!
|
|
|
|
And here, we can do that. But what about the following:
|
|
|
|
```C++
|
|
inf_int collatz(inf_int x) {
|
|
if(x == 1) return 1;
|
|
if(x % 2 == 0) return collatz(x/2);
|
|
return collatz(x * 3 + 1);
|
|
}
|
|
```
|
|
|
|
Notice that I've used the fictitious type
|
|
`inf_int` to represent integers that can hold
|
|
arbitrarily large integer values, not just the 32-bit ones.
|
|
That is important for this example, and I'll explain why shortly.
|
|
|
|
The code in the example is a simulation of the process described
|
|
in the [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture).
|
|
Given an input number `x`, if the number is even, it's divided in half,
|
|
and the process continues with the halved number. If, on the other
|
|
hand, the number is odd, it's multiplied by 3, 1 is added to it,
|
|
and the process continues with _that_ number. The only way for the
|
|
process to terminate is for the computation to reach the value 1.
|
|
|
|
Why does this matter? Because as of right now, __nobody knows__
|
|
whether or not the process terminates for all possible input numbers.
|
|
We have a strong hunch that it does; we've checked a __lot__
|
|
of numbers and found that the process terminates for them.
|
|
This is why 32-bit integers are not truly sufficient for this example;
|
|
we know empirically that the function will terminate for them.
|
|
|
|
But why does _this_ matter? Well, it matters because we don't know
|
|
whether or not this function will diverge, and thus, we can't
|
|
'throw it out of the window' like we wanted to with `meaningOfLife`!
|
|
In general, it's _impossible to tell_ whether or not a program will
|
|
terminate; that is the [halting problem](https://en.wikipedia.org/wiki/Halting_problem).
|
|
So, what do we do?
|
|
|
|
It turns out to be convenient -- formally -- to treat the result of a diverging computation
|
|
as its own value. This value is usually called 'bottom', and written as \(\bot\).
|
|
Since in most programming languages, you can write a nonterminating expression or
|
|
function of any type, this 'bottom' is included in _all_ types. So in fact, the
|
|
possible values of `unsigned int` are \(\bot, 0, 1, 2, ...\) and so on.
|
|
As you may have by now guessed, the same is true for a boolean: we have \(\bot\), `true`, and `false`.
|
|
|
|
### Haskell and Bottom
|
|
You may be thinking:
|
|
|
|
> Now he's done it; he's gone off the deep end with all that programming language
|
|
theory. Tell me, Daniel, where the heck have you ever encountered \(\bot\) in
|
|
code? This question was for a software engineering interview, after all!
|
|
|
|
You're right; I haven't _specifically_ seen the symbol \(\bot\) in my time
|
|
programming. But I have frequently used an equivalent notation for the same idea:
|
|
`undefined`. In fact, here's a possible definition of `undefined` in Haskell:
|
|
|
|
```
|
|
undefined = undefined
|
|
```
|
|
|
|
Just like `meaningOfLife`, this is a divergent computation! What's more is that
|
|
the type of this computation is, in Haskell, `a`. More explicitly -- and retreating
|
|
to more mathematical notation -- we can write this type as: \(\forall \alpha . \alpha\).
|
|
That is, for any type \(\alpha\), `undefined` has that type! This means
|
|
`undefined` can take on _any_ type, and so, we can write:
|
|
|
|
```Haskell
|
|
myTrue :: Bool
|
|
myTrue = True
|
|
|
|
myFalse :: Bool
|
|
myFalse = False
|
|
|
|
myBool :: Bool
|
|
myBool = undefined
|
|
```
|
|
|
|
In Haskell, this is quite useful. For instance, if one's in the middle
|
|
of writing a complicated function, and wants to check their work so far,
|
|
they can put 'undefined' for the part of the function they haven't written.
|
|
They can then compile their program; the typechecker will find any mistakes
|
|
they've made so far, but, since the type of `undefined` can be _anything_,
|
|
that part of the program will be accepted without second thought.
|
|
|
|
The language Idris extends this practice with the idea of typed holes,
|
|
where you can leave fragments of your program unwritten, and ask the
|
|
compiler what kind of _thing_ you need to write to fill that hole.
|
|
|
|
### Java and `null`
|
|
Now you may be thinking:
|
|
|
|
> This whole deal with Haskell's `undefined` is beside the point; it doesn't
|
|
really count as a value, since it's just a nonterminating
|
|
expression. What you're doing is a kind of academic autofellatio.
|
|
|
|
Alright, I can accept this criticism. Perhaps just calling a nonterminating
|
|
function a value _is_ far-fetched (even though in [denotational semantics](https://en.wikipedia.org/wiki/Denotational_semantics)
|
|
we _do_ extend types with \(\bot\)). But denotational semantics are not
|
|
the only place where types are implicitly extend with an extra value;
|
|
let's look at Java.
|
|
|
|
In Java, we have `null`. At the
|
|
core language level, any function or method that accepts a class can also take `null`;
|
|
if `null` is not to that function or method's liking, it has to
|
|
explicitly check for it using `if(x == null)`.
|
|
|
|
This `null` value does not at first interact with booleans.
|
|
After all, Java's booleans are not classes. Unlike classes, which you have
|
|
to allocate using `new`, you can just throw around `true` and `false` as you see
|
|
fit. Also unlike classes, you simply can't assign `null` to a boolean value.
|
|
|
|
The trouble is, the parts of Java dealing with _generics_, which allow you to write
|
|
polymorphic functions, can't handle 'primitives' like `bool`. If you want to have an `ArrayList`
|
|
of something, that something _must_ be a class.
|
|
But what if you really _do_ want an `ArrayList` of booleans? Java solves this problem by introducing
|
|
'boxed' booleans: they're primitives wrapped in a class, called `Boolean`. This class
|
|
can then be used for generics.
|
|
|
|
But see, this is where `null` has snuck in again. By allowing `Boolean` to be a class
|
|
(thereby granting it access to generics), we've also given it the ability to be null.
|
|
This example is made especially compelling because Java supports something
|
|
they call [autoboxing](https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html):
|
|
you can directly assign a primitive to a variable of the corresponding boxed type.
|
|
Consider the example:
|
|
|
|
```Java
|
|
Boolean myTrue = true;
|
|
Boolean myFalse = false;
|
|
Boolean myBool = null;
|
|
```
|
|
|
|
Beautiful, isn't it? Better yet, unlike Haskell, where you can't _really_
|
|
check if your `Bool` is `undefined` (because you can't tell whether
|
|
a non-terminating computation is as such), you can very easily
|
|
check if your `Boolean` is `true`, `false`, or `null`:
|
|
|
|
```Java
|
|
assert myTrue != myFalse;
|
|
assert myFalse != myBool;
|
|
assert myTrue != myBool;
|
|
```
|
|
|
|
We're okay to use `!=` here, instead of `equals`, because it so happens
|
|
each boxed instance of a `boolean` value
|
|
[refers to the same `Boolean` object](https://stackoverflow.com/questions/28636738/equality-of-boxed-boolean).
|
|
In fact, this means that a `Boolean` variable can have __exactly__ 3 values!
|
|
|
|
### C and Integers
|
|
Oh the luxury of having a type representing booleans in your language!
|
|
It's almost overly indulgent compared to the spartan minimalism of C.
|
|
In C, boolean conditions are represented as numbers. You can perhaps get
|
|
away with throwing around `char` or `short int`, but even then,
|
|
these types allow far more values than two!
|
|
|
|
```C
|
|
unsigned char test = 255;
|
|
while(test) test -= 1;
|
|
```
|
|
|
|
This loop will run 255 times, thereby demonstrating
|
|
that C has at least 255 values that can be used
|
|
to represent the boolean `true`.
|
|
|
|
There are other languages
|
|
with this notion of 'truthy' and 'falsey' values, in which
|
|
something not exactly `true` or `false` can be used as a condition. However,
|
|
some of them differ from C in that they also extend this idea
|
|
to equality. In JavaScript:
|
|
|
|
```JavaScript
|
|
console.assert(true == 1)
|
|
console.assert(false == 0)
|
|
```
|
|
|
|
Then, there are still exactly two distinct boolean values
|
|
modulo `==`. No such luck in C, though! We have 256 values that fit in `unsigned char`,
|
|
all of which are also distinct modulo `==`. Our boolean
|
|
variable can contain all of these values. And there is no
|
|
respite to be found with `enum`s, either. We could try define:
|
|
|
|
```C
|
|
enum bool { TRUE, FALSE };
|
|
```
|
|
|
|
Unfortunately, all this does is define `bool` to be a numeric
|
|
type that can hold at least 2 distinct values, and define
|
|
numeric constants `TRUE` and `FALSE`. So in fact, you can
|
|
_still_ write the following code:
|
|
|
|
```C
|
|
enum bool b1 = TRUE;
|
|
enum bool b2 = FALSE;
|
|
enum bool b3 = 15;
|
|
```
|
|
|
|
And so, no matter how hard you try, your 'boolean'
|
|
variable can have many, many values!
|
|
|
|
### Conclusion
|
|
I think that 'how many values does a boolean have' is a strange
|
|
question. Its purpose can be one of two things:
|
|
|
|
* The interviewer expected a long-form response such as this one.
|
|
This is a weird expectation for a software engineering candidate -
|
|
how does knowing about \(\bot\), `undefined`, or `null` help in
|
|
creating software, especially if this information is irrelevant
|
|
to the company's language of choice?
|
|
* The interviewer expected the simple answer. In that case,
|
|
my previous observation applies: what software engineering
|
|
candidate has _not_ seen a boolean in their time programming?
|
|
Surely candidates are better screened before they are offered
|
|
an interview?
|
|
|
|
Despite the question's weirdness, I think that the resulting
|
|
investigation of the matter -- outside of the interview setting --
|
|
is useful, and perhaps, in a way, enlightening. It may help
|
|
one understand the design choices made in _their_ language of choice,
|
|
and how those choices shape the code that they write.
|
|
|
|
That's all I have! I hope that you found it interesting.
|