313 lines
13 KiB
Markdown
313 lines
13 KiB
Markdown



title: "How Many Values Does a Boolean Have?"


date: 20200821T23:05:5507:00


tags: ["Java", "Haskell", "C", "C++"]


favorite: true







A friend of mine recently had an interview for a software


engineering position. They later recounted to me the content


of the technical questions that they had been asked. Some had


been pretty standard:




* __"What's the difference between concurrency


and parallelism?"__  a reasonable question given that Go was


the company's language of choice.


* __"What's the difference between a method and a function?"__ 


a little more strange, in my opinion, since the difference


is of little _practical_ use.




But then, they recounted a rather interesting question:




> How many values does a bool have?




Innocuous at first, isn't it? Probably a bit simpler, in fact,


than the questions about methods and functions, concurrency


and parallelism. It's plausible that a candidate


has not done much concurrent or parallel programming in their


life, or that they came from a language in which functions


were rare and methods were ubiquitous. It's not plausible,


on the other hand, that a candidate applying to a software


engineering position has not encountered booleans.




If you're genuinely unsure about the answer to the question,


I think there's no reason for me to mess with you. The


simple answer to the question  as far as I know  is that a boolean


has two values. They are `true` and `false` in Java, or `True` and `False`


in Haskell, and `1` and `0` in C. A boolean value is either true or false.




So, what's there to think about? There are a few things, _ackshually_.


Let's explore them, starting from the theoretical perspective.




### Types, Values, and Expressions


Boolean, or `bool`, is a type. Broadly speaking, a type


is a property of _something_ that defines what the _something_


means and what you can do with it. That _something_ can be


several things; for our purposes, it can either be an


_expression_ in a programming language (like those in the form `fact(n)`)


or a value in that same programming language (like `5`).




Dealing with values is rather simple. Most languages have finite numbers,


usually with \(2^{32}\) values, which have type `int`,


`i32`, or something in a similar vein. Most languages also have


strings, of which there are as many as you have memory to contain,


and which have the type `string`, `String`, or occasionally


the more confusing `char*`. Most languages also have booleans,


as we discussed above.




The deal with expressions is a more interesting. Presumably


expressions evaluate to values, and the type of an expression


is then the type of values it can yield. Consider the following


snippet in C++:




```C


int square(int x) {


return x * x;


}


```




Here, the expression `x` is known to have type `int` from


the type signature provided by the user. Multiplication


of integers yields an integer, and so the type of `x*x` is also


of type `int`. Since `square(x)` returns `x*x`, it is also


of type `int`. So far, so good.




Okay, how about this:




```C++


int meaningOfLife() {


return meaningOfLife();


}


```




No, wait, doesn't say "stack overflow" just yet. That's no fun.


And anyway, this is technically a tail call, so maybe our


C++ compiler can avoid growing the stack. And indeed,


flicking on the `O2` flag in this [compiler explorer example](https://godbolt.org/z/9cv4nY),


we can see that no stack growth is necessary: it's just


an infinite loop. But `meaningOfLife` will never return a value. One could say


this computation _diverges_.




Well, if it diverges, just throw the expression out of the window! That's


no `int`! We only want _real_ `int`s!




And here, we can do that. But what about the following:




```C++


inf_int collatz(inf_int x) {


if(x == 1) return 1;


if(x % 2 == 0) return collatz(x/2);


return collatz(x * 3 + 1);


}


```




Notice that I've used the fictitious type


`inf_int` to represent integers that can hold


arbitrarily large integer values, not just the 32bit ones.


That is important for this example, and I'll explain why shortly.




The code in the example is a simulation of the process described


in the [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture).


Given an input number `x`, if the number is even, it's divided in half,


and the process continues with the halved number. If, on the other


hand, the number is odd, it's multiplied by 3, 1 is added to it,


and the process continues with _that_ number. The only way for the


process to terminate is for the computation to reach the value 1.




Why does this matter? Because as of right now, __nobody knows__


whether or not the process terminates for all possible input numbers.


We have a strong hunch that it does; we've checked a __lot__


of numbers and found that the process terminates for them.


This is why 32bit integers are not truly sufficient for this example;


we know empirically that the function will terminate for them.




But why does _this_ matter? Well, it matters because we don't know


whether or not this function will diverge, and thus, we can't


'throw it out of the window' like we wanted to with `meaningOfLife`!


In general, it's _impossible to tell_ whether or not a program will


terminate; that is the [halting problem](https://en.wikipedia.org/wiki/Halting_problem).


So, what do we do?




It turns out to be convenient  formally  to treat the result of a diverging computation


as its own value. This value is usually called 'bottom', and written as \(\bot\).


Since in most programming languages, you can write a nonterminating expression or


function of any type, this 'bottom' is included in _all_ types. So in fact, the


possible values of `unsigned int` are \(\bot, 0, 1, 2, ...\) and so on.


As you may have by now guessed, the same is true for a boolean: we have \(\bot\), `true`, and `false`.




### Haskell and Bottom


You may be thinking:




> Now he's done it; he's gone off the deep end with all that programming language


theory. Tell me, Daniel, where the heck have you ever encountered \(\bot\) in


code? This question was for a software engineering interview, after all!




You're right; I haven't _specifically_ seen the symbol \(\bot\) in my time


programming. But I have frequently used an equivalent notation for the same idea:


`undefined`. In fact, here's a possible definition of `undefined` in Haskell:




```


undefined = undefined


```




Just like `meaningOfLife`, this is a divergent computation! What's more is that


the type of this computation is, in Haskell, `a`. More explicitly  and retreating


to more mathematical notation  we can write this type as: \(\forall \alpha . \alpha\).


That is, for any type \(\alpha\), `undefined` has that type! This means


`undefined` can take on _any_ type, and so, we can write:




```Haskell


myTrue :: Bool


myTrue = True




myFalse :: Bool


myFalse = False




myBool :: Bool


myBool = undefined


```




In Haskell, this is quite useful. For instance, if one's in the middle


of writing a complicated function, and wants to check their work so far,


they can put 'undefined' for the part of the function they haven't written.


They can then compile their program; the typechecker will find any mistakes


they've made so far, but, since the type of `undefined` can be _anything_,


that part of the program will be accepted without second thought.




The language Idris extends this practice with the idea of typed holes,


where you can leave fragments of your program unwritten, and ask the


compiler what kind of _thing_ you need to write to fill that hole.




### Java and `null`


Now you may be thinking:




> This whole deal with Haskell's `undefined` is beside the point; it doesn't


really count as a value, since it's just a nonterminating


expression. What you're doing is a kind of academic autofellatio.




Alright, I can accept this criticism. Perhaps just calling a nonterminating


function a value _is_ farfetched (even though in [denotational semantics](https://en.wikipedia.org/wiki/Denotational_semantics)


we _do_ extend types with \(\bot\)). But denotational semantics are not


the only place where types are implicitly extend with an extra value;


let's look at Java.




In Java, we have `null`. At the


core language level, any function or method that accepts a class can also take `null`;


if `null` is not to that function or method's liking, it has to


explicitly check for it using `if(x == null)`.




This `null` value does not at first interact with booleans.


After all, Java's booleans are not classes. Unlike classes, which you have


to allocate using `new`, you can just throw around `true` and `false` as you see


fit. Also unlike classes, you simply can't assign `null` to a boolean value.




The trouble is, the parts of Java dealing with _generics_, which allow you to write


polymorphic functions, can't handle 'primitives' like `bool`. If you want to have an `ArrayList`


of something, that something _must_ be a class.


But what if you really _do_ want an `ArrayList` of booleans? Java solves this problem by introducing


'boxed' booleans: they're primitives wrapped in a class, called `Boolean`. This class


can then be used for generics.




But see, this is where `null` has snuck in again. By allowing `Boolean` to be a class


(thereby granting it access to generics), we've also given it the ability to be null.


This example is made especially compelling because Java supports something


they call [autoboxing](https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html):


you can directly assign a primitive to a variable of the corresponding boxed type.


Consider the example:




```Java


Boolean myTrue = true;


Boolean myFalse = false;


Boolean myBool = null;


```




Beautiful, isn't it? Better yet, unlike Haskell, where you can't _really_


check if your `Bool` is `undefined` (because you can't tell whether


a nonterminating computation is as such), you can very easily


check if your `Boolean` is `true`, `false`, or `null`:




```Java


assert myTrue != myFalse;


assert myFalse != myBool;


assert myTrue != myBool;


```




We're okay to use `!=` here, instead of `equals`, because it so happens


each boxed instance of a `boolean` value


[refers to the same `Boolean` object](https://stackoverflow.com/questions/28636738/equalityofboxedboolean).


In fact, this means that a `Boolean` variable can have __exactly__ 3 values!




### C and Integers


Oh the luxury of having a type representing booleans in your language!


It's almost overly indulgent compared to the spartan minimalism of C.


In C, boolean conditions are represented as numbers. You can perhaps get


away with throwing around `char` or `short int`, but even then,


these types allow far more values than two!




```C


unsigned char test = 255;


while(test) test = 1;


```




This loop will run 255 times, thereby demonstrating


that C has at least 255 values that can be used


to represent the boolean `true`.




There are other languages


with this notion of 'truthy' and 'falsey' values, in which


something not exactly `true` or `false` can be used as a condition. However,


some of them differ from C in that they also extend this idea


to equality. In JavaScript:




```JavaScript


console.assert(true == 1)


console.assert(false == 0)


```




Then, there are still exactly two distinct boolean values


modulo `==`. No such luck in C, though! We have 256 values that fit in `unsigned char`,


all of which are also distinct modulo `==`. Our boolean


variable can contain all of these values. And there is no


respite to be found with `enum`s, either. We could try define:




```C


enum bool { TRUE, FALSE };


```




Unfortunately, all this does is define `bool` to be a numeric


type that can hold at least 2 distinct values, and define


numeric constants `TRUE` and `FALSE`. So in fact, you can


_still_ write the following code:




```C


enum bool b1 = TRUE;


enum bool b2 = FALSE;


enum bool b3 = 15;


```




And so, no matter how hard you try, your 'boolean'


variable can have many, many values!




### Conclusion


I think that 'how many values does a boolean have' is a strange


question. Its purpose can be one of two things:




* The interviewer expected a longform response such as this one.


This is a weird expectation for a software engineering candidate 


how does knowing about \(\bot\), `undefined`, or `null` help in


creating software, especially if this information is irrelevant


to the company's language of choice?


* The interviewer expected the simple answer. In that case,


my previous observation applies: what software engineering


candidate has _not_ seen a boolean in their time programming?


Surely candidates are better screened before they are offered


an interview?




Despite the question's weirdness, I think that the resulting


investigation of the matter  outside of the interview setting 


is useful, and perhaps, in a way, enlightening. It may help


one understand the design choices made in _their_ language of choice,


and how those choices shape the code that they write.




That's all I have! I hope that you found it interesting.
