diff --git a/content/blog/boolean_values.md b/content/blog/boolean_values.md
new file mode 100644
index 0000000..ef7ef47
--- /dev/null
+++ b/content/blog/boolean_values.md
@@ -0,0 +1,220 @@
+---
+title: "How Many Values Does a Boolean Have?"
+date: 2020-08-20T18:37:50-07:00
+tags: ["Java", "Haskell"]
+---
+
+A friend of mine recently had an interview for a software
+engineering position. They later recounted to me the content
+of the technical questions that they had been asked. Some had
+been pretty standard:
+
+* __"What's the difference between concurrency
+and parallelism?"__ -- a reasonable question given that Go was
+the company's language of choice.
+* __"What's the difference between a method and a function?"__ --
+a little more strange, in my opinion, since the difference
+is of little _practical_ use.
+
+But then, they mentioned a rather interesting question:
+
+> How many values does a bool have?
+
+Innocuous at first, isn't it? Probably a bit simpler, in fact,
+than the questions about methods and functions, concurrency
+and parallelism. It's plausible that a programmer
+has not done much concurrent or parallel programming in their
+life, or that they came from a language in which functions
+were rare and methods were ubiquitous. It's not plausible,
+on the other hand, that a candidate applying for a software
+engineering position has not encountered booleans.
+
+If you're genuinely unsure about the answer to the question,
+I think there's no reason for me to mess with you. The
+simple answer to the question -- as far as I know -- is that a boolean
+has two values. They are `true` and `false` in Java, `True` and `False`
+in Haskell, and `1` and `0` in C. A boolean value is either true or false.
+
+So, what's there to think about? There are a few things, _ackshually_.
+Let's explore them, starting from the theoretical perspective.
+
+### What's a Type, Anyway?
+Boolean, or `bool`, is a type.
+Broadly speaking, a type
+is a property of _something_ that defines what the _something_
+means and what you can do with it. That _something_ can be
+several things; for our purposes, it can be either an
+_expression_ in a programming language (such as `fact(n)`)
+or a value in that same programming language (like `5`).
+
+Dealing with values is rather simple. Most languages have fixed-width integers,
+usually with \\(2^{32}\\) values, which have type `int`,
+`i32`, or something in a similar vein. Most languages also have
+strings, of which there are as many as you have memory to contain,
+and which have the type `string`, `String`, or occasionally
+the more confusing `char*`. Most languages also have booleans,
+as we discussed above.
+
+The deal with expressions is more interesting. Presumably
+expressions evaluate to values, and the type of an expression
+is then the type of values it can yield. Consider the following
+snippet in C++:
+
+```C++
+int square(int x) {
+    return x * x;
+}
+```
+
+Here, the expression `x` is known to have type `int` from
+the type signature provided by the user. Multiplication
+of integers yields an integer, and so `x*x` is also
+of type `int`. Since `square(x)` returns `x*x`, it is also
+of type `int`. So far, so good.
+
+Okay, how about this:
+
+```C++
+int meaningOfLife() {
+    return meaningOfLife();
+}
+```
+
+No, wait, don't say "stack overflow" just yet. That's no fun.
+And anyway, this is technically a tail call, so maybe our
+C++ compiler can avoid growing the stack. And indeed,
+flicking on the `-O2` flag in this [compiler explorer example](https://godbolt.org/z/9cv4nY),
+we can see that no stack growth is necessary: it's just
+an infinite loop. But `meaningOfLife` will never return a value. One could say
+this computation _diverges_.
+
+Well, if it diverges, just throw the expression out of the window! That's
+no `int`! We only want _real_ `int`s!
+
+And here, we can do that.
+But what about the following:
+
+```C++
+inf_int collatz(inf_int x) {
+    if(x == 1) return 1;
+    if(x % 2 == 0) return collatz(x/2);
+    return collatz(x * 3 + 1);
+}
+```
+
+Notice that I've used the fictitious type
+`inf_int` to represent integers that can hold
+arbitrarily large values, not just the 32-bit ones.
+That is important for this example, and I'll explain why shortly.
+
+The code in the example is a simulation of the process described
+in the [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture).
+Given an input number `x`, if the number is even, it's divided in half,
+and the process continues with the halved number. If, on the other
+hand, the number is odd, it's multiplied by 3, 1 is added to it,
+and the process continues with _that_ number. The only way for the
+process to terminate is for the computation to reach the value 1.
+
+Why does this matter? Because as of right now, __nobody knows__
+whether or not the process terminates for all possible input numbers.
+We have a strong hunch that it does; we've checked a __lot__
+of numbers and found that the process terminates for them.
+This is why 32-bit integers are not truly sufficient for this example;
+we know empirically that the function will terminate for them.
+
+But why does _this_ matter? Well, it matters because we don't know
+whether or not this function will diverge, and thus, we can't
+'throw it out of the window' like we wanted to with `meaningOfLife`!
+In general, it's _impossible to tell_ whether or not a program will
+terminate; that is the [halting problem](https://en.wikipedia.org/wiki/Halting_problem).
+So, what do we do?
+
+It turns out to be convenient -- formally -- to treat the result of a diverging computation
+as its own value. This value is usually called 'bottom', and written as \\(\\bot\\).
+Since in most programming languages, you can write a nonterminating expression or
+function of any type, this 'bottom' is included in _all_ types.
+So in fact, the
+set of possible values for `unsigned int` is \\(\\bot, 0, 1, 2, \\ldots\\)
+As you may have by now guessed, the same is true for a boolean: we have \\(\\bot\\), `true`, and `false`.
+
+### Haskell and Bottom
+You may be thinking:
+
+> Now he's done it; he's gone off the deep end with all that programming language
+theory. Tell me, Daniel, where the heck have you ever encountered \\(\\bot\\) in
+code? This question was for a software engineering interview, after all!
+
+You're right; I haven't _specifically_ seen the symbol \\(\\bot\\) in my time
+programming. But I have frequently used an equivalent notation for the same idea:
+`undefined`. In fact, here's a possible definition of `undefined` in Haskell:
+
+```Haskell
+undefined = undefined
+```
+
+Just like `meaningOfLife`, this is a divergent computation! What's more,
+the type of this computation is, in Haskell, `a`. More explicitly -- and retreating
+to more mathematical notation -- we can write this type as: \\(\\forall \\alpha . \\alpha\\).
+That is, for any type \\(\\alpha\\), `undefined` has that type! This means
+`undefined` can take on _any_ type, and so, we can write:
+
+```Haskell
+myTrue :: Bool
+myTrue = True
+
+myFalse :: Bool
+myFalse = False
+
+myBool :: Bool
+myBool = undefined
+```
+
+In Haskell, this is quite useful. For instance, if one's in the middle
+of writing a complicated function, and wants to check their work so far,
+they can put `undefined` for the part of the function they haven't written.
+They can then compile their program; the typechecker will find any mistakes
+they've made so far, but, since the type of `undefined` can be _anything_,
+that part of the program will be accepted without second thought.
+
+The language Idris extends this practice with the idea of typed holes,
+where you can leave fragments of your program unwritten, and ask the
+compiler what kind of _thing_ you need to write to fill that hole.
+
+### Java and `null`
+Now you may be thinking:
+
+> This whole deal with Haskell's `undefined` is beside the point; it doesn't
+really count as a value, since it's just a nonterminating
+expression. What you're doing is a kind of academic autofellatio.
+
+Alright, I can accept this criticism. Perhaps just calling a nonterminating
+function a value _is_ far-fetched (even though denotational semantics
+_does_ extend types with \\(\\bot\\)). But denotational semantics is not
+the only place where types are implicitly extended with an extra value;
+let's look at Java.
+
+In Java, we have `null`. At the
+core language level, any function or method that accepts an instance of a class can also take `null`;
+if `null` is not to that function or method's liking, it has to
+explicitly check for it using `if(x == null)`.
+
+Java's booleans are not, at first glance, classes. Unlike instances of classes, which you have
+to allocate using `new`, you can just throw around `true` and `false` as you see
+fit. Also unlike class instances, a `boolean` variable can't be assigned `null`.
+The trouble is, the _generics_ part of Java, which allows you to write
+polymorphic functions, can't handle 'primitives' like `boolean`. If you want to have an `ArrayList`
+of something, that something _must_ be a class.
+
+But what if you really _do_ want an `ArrayList` of booleans? Java solves this problem by introducing
+'boxed' booleans: they're primitives wrapped in a class, called `Boolean`. This class
+can then be used for generics.
+
+But see, this is where `null` has snuck in again. By allowing `Boolean` to be a class
+(thereby granting it access to generics), we've also given it the ability to be null.
+This example is made especially compelling because Java supports something
+they call [autoboxing](https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html):
+you can directly assign a primitive to a variable of the corresponding boxed type.
+Consider the example:
+
+```Java
+Boolean myTrue = true;
+Boolean myFalse = false;
+Boolean myBool = null;
+```
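To make the three states of a boxed `Boolean` a little more concrete, here is a minimal sketch. The class name `BoxedBooleanDemo` and its two helper methods are made up for illustration; only `Boolean`, `ArrayList`, and `NullPointerException` come from the standard library. It shows that a list of `Boolean` happily stores `null`, and that the trouble only surfaces when the `null` is auto-unboxed back into a primitive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class BoxedBooleanDemo {
    // Reports which of the three possible states a boxed Boolean is in.
    static String describe(Boolean b) {
        if (b == null) return "null";   // the extra, third "value"
        return b ? "true" : "false";    // safe to unbox: b is not null here
    }

    // Returns true if auto-unboxing the argument throws a NullPointerException.
    static boolean unboxingFails(Boolean b) {
        try {
            boolean primitive = b;      // auto-unboxing calls b.booleanValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Boolean> values = new ArrayList<>();
        values.add(true);               // autoboxed to Boolean.TRUE
        values.add(false);              // autoboxed to Boolean.FALSE
        values.add(null);               // accepted at compile time and at runtime

        for (Boolean b : values) {
            System.out.println(describe(b) + " -> unboxing fails: " + unboxingFails(b));
        }
    }
}
```

Only the `null` entry should report that unboxing fails. The compiler accepts all three assignments without complaint, which is exactly why the third value is easy to overlook.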