Add draft of boolean values post.

2020-08-20 21:19:47 -07:00
parent 8368283a3e
commit 6f0667bb28
1 changed files with 220 additions and 0 deletions
--- a/content/blog/boolean_values.md
+++ b/content/blog/boolean_values.md
@@ -0,0 +1,220 @@
 ---
 title: "How Many Values Does a Boolean Have?"
 date: 2020-08-20T18:37:50-07:00
 draft: true
 tags: ["Java", "Haskell"]
 ---
 A friend of mine recently had an interview for a software
 engineering position. They later recounted to me the content
 of the technical questions that they had been asked. Some had
 been pretty standard:
 * __"What's the difference between concurrency
 and parallelism?"__ -- a reasonable question given that Go was
 the company's language of choice.
 * __"What's the difference between a method and a function?"__ --
 a little more strange, in my opinion, since the difference
 is of little _practical_ use.
 But then, they recounted a rather interesting question:
 > How many values does a bool have?
 Innocuous at first, isn't it? Probably a bit simpler, in fact,
 than the questions about methods and functions, concurrency
 and parallelism. It's plausible that a programmer
 has not done much concurrent or parallel programming in their
 life, or that they came from a language in which functions
 were rare and methods were ubiquitous. It's not plausible,
 on the other hand, that a candidate applying to a software
 engineering position has not encountered booleans.
 If you're genuinely unsure about the answer to the question,
 I think there's no reason for me to mess with you. The
 simple answer to the question -- as far as I know -- is that a boolean
 has two values. They are `true` and `false` in Java, or `True` and `False`
 in Haskell, and `1` and `0` in C. A boolean value is either true or false.
 So, what's there to think about? There are a few things, _ackshually_. 
 Let's explore them, starting from the theoretical perspective.
 ### What's a Type, Anyway?
 Boolean, or `bool`, is a type. Broadly speaking, a type
 is a property of _something_ that defines what the _something_
 means and what you can do with it. That _something_ can be
 several things; for our purposes, it can either be an
 _expression_ in a programming language (such as `fact(n)`)
 or a value in that same programming language (like `5`).
 Dealing with values is rather simple. Most languages have fixed-width integers,
 usually with \\(2^{32}\\) values, which have type `int`,
 `i32`, or something in a similar vein. Most languages also have
 strings, of which there are as many as you have memory to contain,
 and which have the type `string`, `String`, or occasionally
 the more confusing `char*`. Most languages also have booleans,
 as we discussed above.
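 To make the \\(2^{32}\\) figure concrete, here is a small sketch in Java. The class name `IntValueCount` is made up for illustration. It counts the values of a 32-bit `int` by widening to `long`, so that the subtraction itself cannot overflow:
 ```java
 // Hypothetical helper class, used only for illustration.
 class IntValueCount {
     // Number of distinct values an int can hold, computed in long
     // so that the arithmetic does not overflow.
     static long countIntValues() {
         return (long) Integer.MAX_VALUE - (long) Integer.MIN_VALUE + 1;
     }

     public static void main(String[] args) {
         System.out.println(countIntValues() == (1L << 32)); // true
     }
 }
 ```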
 The deal with expressions is a bit more interesting. Presumably
 expressions evaluate to values, and the type of an expression
 is then the type of values it can yield. Consider the following
 snippet in C++:
 ```C++
 int square(int x) {
    return x * x;
 }
 ```
 Here, the expression `x` is known to have type `int` from
 the type signature provided by the user. Multiplication
 of integers yields an integer, and so `x*x` also has
 type `int`. Since `square(x)` returns `x*x`, it is also
 of type `int`. So far, so good.
 Okay, how about this:
 ```C++
 int meaningOfLife() {
    return meaningOfLife();
 }
 ```
 No, wait, don't say "stack overflow" just yet. That's no fun.
 And anyway, this is technically a tail call, so maybe our
 C++ compiler can avoid growing the stack. And indeed,
 flicking on the `-O2` flag in this [compiler explorer example](https://godbolt.org/z/9cv4nY),
 we can see that no stack growth is necessary: it's just
 an infinite loop. But `meaningOfLife` will never return a value. One could say
 this computation _diverges_.
 Well, if it diverges, just throw the expression out of the window! That's
 no `int`! We only want _real_ `int`s!
 And here, we can do that. But what about the following:
 ```C++
 inf_int collatz(inf_int x) {
    if(x == 1) return 1;
    if(x % 2 == 0) return collatz(x/2);
    return collatz(x * 3 + 1);
 }
 ```
 Notice that I've used the fictitious type
 `inf_int` to represent integers that can hold
 arbitrarily large integer values, not just the 32-bit ones.
 That is important for this example, and I'll explain why shortly.
 The code in the example is a simulation of the process described
 in the [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture).
 Given an input number `x`, if the number is even, it's divided in half,
 and the process continues with the halved number. If, on the other
 hand, the number is odd, it's multiplied by 3, 1 is added to it,
 and the process continues with _that_ number. The only way for the
 process to terminate is for the computation to reach the value 1.
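 For a version that actually runs, Java's `java.math.BigInteger` can stand in for the fictitious `inf_int`. The following is a sketch rather than a faithful translation of the snippet above: it uses a loop instead of recursion, it counts steps instead of returning 1, and the names `Collatz` and `steps` are invented here. It assumes the input is at least 1.
 ```java
 import java.math.BigInteger;

 // Hypothetical sketch; BigInteger plays the role of the fictitious inf_int.
 class Collatz {
     private static final BigInteger TWO = BigInteger.valueOf(2);
     private static final BigInteger THREE = BigInteger.valueOf(3);

     // Counts the steps needed to reach 1. Assumes x >= 1.
     // The loop only ends when the value reaches 1.
     static long steps(BigInteger x) {
         long count = 0;
         while (!x.equals(BigInteger.ONE)) {
             if (x.mod(TWO).signum() == 0) {
                 x = x.divide(TWO);                         // even: halve
             } else {
                 x = x.multiply(THREE).add(BigInteger.ONE); // odd: 3x + 1
             }
             count++;
         }
         return count;
     }

     public static void main(String[] args) {
         System.out.println(steps(BigInteger.valueOf(6)));  // 8
         System.out.println(steps(BigInteger.valueOf(27))); // 111
     }
 }
 ```
 For example, 6 goes through 3, 10, 5, 16, 8, 4, 2, and finally 1, which is eight steps.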
 Why does this matter? Because as of right now, __nobody knows__
 whether or not the process terminates for all possible input numbers.
 We have a strong hunch that it does; we've checked a __lot__
 of numbers and found that the process terminates for them.
 This is why 32-bit integers are not truly sufficient for this example;
 we know empirically that the function will terminate for them.
 But why does _this_ matter? Well, it matters because we don't know
 whether or not this function will diverge, and thus, we can't
 'throw it out of the window' like we wanted to with `meaningOfLife`!
 In general, it's _impossible to tell_ whether or not a program will
 terminate; that is the [halting problem](https://en.wikipedia.org/wiki/Halting_problem).
 So, what do we do?
 It turns out to be convenient -- formally -- to treat the result of a diverging computation
 as its own value. This value is usually called 'bottom', and written as \\(\\bot\\).
 Since in most programming languages, you can write a nonterminating expression or
 function of any type, this 'bottom' is included in _all_ types. So in fact, the
 set of possible values for `unsigned int` is \\(\\bot, 0, 1, 2, ...\\), and so on.
 As you may have by now guessed, the same is true for a boolean: we have \\(\\bot\\), `true`, and `false`.
 ### Haskell and Bottom
 You may be thinking:
 > Now he's done it; he's gone off the deep end with all that programming language
 theory. Tell me, Daniel, where the heck have you ever encountered \\(\\bot\\) in
 code? This question was for a software engineering interview, after all!
 You're right; I haven't _specifically_ seen the symbol \\(\\bot\\) in my time
 programming. But I have frequently used an equivalent notation for the same idea:
 `undefined`. In fact, here's a possible definition of `undefined` in Haskell:
 ```Haskell
 undefined = undefined
 ```
 Just like `meaningOfLife`, this is a divergent computation! What's more,
 the type of this computation is, in Haskell, `a`. More explicitly -- and retreating
 to more mathematical notation -- we can write this type as: \\(\\forall \\alpha . \\alpha\\).
 That is, for any type \\(\\alpha\\), `undefined` has that type! This means
 `undefined` can take on _any_ type, and so, we can write:
 ```Haskell
 myTrue :: Bool
 myTrue = True
 myFalse :: Bool
 myFalse = False
 myBool :: Bool
 myBool = undefined
 ```
 In Haskell, this is quite useful. For instance, if one's in the middle
 of writing a complicated function, and wants to check their work so far,
 they can put `undefined` for the part of the function they haven't written.
 They can then compile their program; the typechecker will find any mistakes
 they've made so far, but, since the type of `undefined` can be _anything_,
 that part of the program will be accepted without second thought.
 The language Idris extends this practice with the idea of typed holes,
 where you can leave fragments of your program unwritten, and ask the
 compiler what kind of _thing_ you need to write to fill that hole.
 ### Java and `null`
 Now you may be thinking:
 > This whole deal with Haskell's `undefined` is beside the point; it doesn't
 really count as a value, since it's just a nonterminating
 expression. What you're doing is a kind of academic autofellatio.
 Alright, I can accept this criticism. Perhaps just calling a nonterminating
 function a value _is_ far-fetched (even though denotational semantics
 _does_ extend types with \\(\\bot\\)). But denotational semantics is not
 the only place where types are implicitly extended with an extra value;
 let's look at Java.
 In Java, we have `null`. At the
 core language level, any function or method that accepts an object of a class type can also take `null`;
 if `null` is not to that function or method's liking, it has to 
 explicitly check for it using `if(x == null)`. 
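 A minimal sketch of that explicit check follows. The class `Greeter` and its method `greet` are hypothetical names used only for illustration; the point is that the parameter type `String` admits `null` alongside every real string.
 ```java
 // Hypothetical example; the names are not from any real API.
 class Greeter {
     // name is declared as a String, yet callers may still pass null.
     static String greet(String name) {
         if (name == null) {
             return "Hello, stranger!";
         }
         return "Hello, " + name + "!";
     }

     public static void main(String[] args) {
         System.out.println(greet("Daniel")); // Hello, Daniel!
         System.out.println(greet(null));     // Hello, stranger!
     }
 }
 ```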
 Java's booleans are not, at first glance, classes. Unlike classes, which you have
 to allocate using `new`, you can just throw around `true` and `false` as you see
 fit. Also unlike classes, you can't assign `null` to a `boolean` variable.
 The trouble is, the _generics_ part of Java, which allows you to write
 polymorphic functions, can't handle 'primitives' like `boolean`. If you want to have an `ArrayList`
 of something, that something _must_ be a class.
 But what if you really _do_ want an `ArrayList` of booleans? Java solves this problem by introducing
 'boxed' booleans: they're primitives wrapped in a class, called `Boolean`. This class
 can then be used for generics.
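 As a small sketch (the class name `BoxedList` is made up), here is what using the boxed type with generics looks like. It builds the boxed values explicitly through the constants `Boolean.TRUE` and `Boolean.FALSE`:
 ```java
 import java.util.ArrayList;
 import java.util.List;

 // Hypothetical example class, used only for illustration.
 class BoxedList {
     static List<Boolean> makeFlags() {
         // ArrayList<boolean> does not compile: type arguments must be reference types.
         List<Boolean> flags = new ArrayList<>();
         flags.add(Boolean.TRUE);
         flags.add(Boolean.FALSE);
         return flags;
     }

     public static void main(String[] args) {
         System.out.println(makeFlags()); // [true, false]
     }
 }
 ```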
 But see, this is where `null` has snuck in again. By allowing `Boolean` to be a class
 (thereby granting it access to generics), we've also given it the ability to be null.
 This example is made especially compelling because Java supports something
 they call [autoboxing](https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html):
 you can directly assign a primitive to a variable of the corresponding boxed type. 
 Consider the example:
 ```Java
 Boolean myTrue = true;
 Boolean myFalse = false;
 Boolean myBool = null;
 ```
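 To see that these really are three distinguishable states, here is a hedged sketch; the class `ThreeStates` and the method `describe` are invented for illustration. It checks for `null` before touching the value, because converting a `null` `Boolean` to a primitive `boolean` throws a `NullPointerException`.
 ```java
 // Hypothetical example class, used only for illustration.
 class ThreeStates {
     static String describe(Boolean b) {
         if (b == null) {
             return "null"; // checked first: unboxing null would throw
         }
         return b ? "true" : "false"; // b is auto-unboxed to boolean here
     }

     public static void main(String[] args) {
         Boolean myTrue = true;
         Boolean myFalse = false;
         Boolean myBool = null;
         System.out.println(describe(myTrue));  // true
         System.out.println(describe(myFalse)); // false
         System.out.println(describe(myBool));  // null
     }
 }
 ```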