Man or boy test

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 76.23.187.206 (talk) at 05:56, 5 November 2008 (→‎Perl). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The man or boy test was proposed by computer scientist Donald Knuth as a means of evaluating implementations of the ALGOL 60 programming language. The aim of the test was to distinguish compilers that correctly implemented "recursion and non-local references" from those that did not.

I have written the following simple routine, which may separate the "man-compilers" from the "boy-compilers" - Donald Knuth[1].

Knuth's example

begin
  real procedure A (k, x1, x2, x3, x4, x5);
  value k; integer k;
  begin
    real procedure B;
    begin k:= k - 1;
          B:= A := A (k, B, x1, x2, x3, x4);
    end;
    if k <= 0 then A:= x4 + x5 else B;
  end;
  outreal (A (10, 1, -1, -1, 1, 0));
end;

This creates a tree of B call frames that refer to each other and to the containing A call frames, each of which has its own copy of k that changes every time the associated B is called. Trying to work it through on paper is probably fruitless, but the correct answer is −67, despite the fact that in the original paper Knuth conjectured it to be −121. The survey paper by Charles H. Lindsey mentioned in the references contains a table for different starting values. Even modern machines quickly run out of stack space for larger values of k (the following values come from the D language version below, the last term requires a stack of more than 1.6 GB):

k 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
A 1 0 -2 0 1 0 1 -1 -10 -30 -67 -138 -291 -642 -1446 -3250 -7244 -16065 -35601 -78985 -175416
k 21 22 23 24 25
A -389695 -865609 -1922362 -4268854 -9479595

Explanation

There are three Algol features used in this program that can be nontrivial to reproduce in other languages:

  1. Nested function definitions: Since B is being defined in the local context of A, the body of B has access to symbols that are local to A — most notably k which it modifies, but also x1, x2, x3, x4, and x5. This is straightforward in the Algol descendant Pascal, but not possible in the other major Algol descendant C (although C's address-of operator & makes it quite possible to pass around pointers to local variables in arbitrary functions, so this can be worked around).
  2. Function references: The B in the recursive call A(k,B,x1,x2,x3,x4) is not a call to B, but a reference to B, which will be called only when it appears as x4 or x5 in the statement A:=x4+x5. This is conversely straightforward in C, but not possible in many dialects of Pascal (although the standard supports function parameters). When the set of functions that may be referenced is known beforehand (in this program it is only B), this can be worked around.
  3. Constant/function dualism: The x1 through x5 parameters of A may be numeric constants or references to the function B — the x4+x5 expression must be prepared to handle both cases as if the formal parameters x4 and x5 had been replaced by the corresponding actual parameter (call by name). This is probably more of a problem in statically typed languages than in dynamically typed languages, but the standard work-around is to reinterpret the constants 1, 0, and −1 in the main call to A as functions without arguments that return these values.

These things are however not what the test is about; they're merely prerequisites for the test to at all be meaningful. What the test is about is whether the different references to B resolve to the correct instance of B — one which has access to the same A-local symbols as the B which created the reference. A "boy" compiler might for example instead compile the program so that B always accesses the topmost A call frame.

Other languages

Charles H. Lindsey implemented the algorithm in ALGOL 68, and - as call by name is not necessary - the same algorithm can be implemented in many languages including Pascal and PL/I[2]. Here are implementations in C (which requires simulating closures), D, Haskell, Lisp, Python, Smalltalk[3], Ruby.

ALGOL 68

BEGIN
 PROC a = (REAL in k, PROC REAL xl, x2, x3, x4, x5) REAL:
 BEGIN
   REAL k := in k;
   PROC b = REAL:
   BEGIN k := k - 1;
         a(k, b, xl, x2, x3, x4)
   END;
   IF k<=0 THEN x4 + x5 ELSE b FI
 END;
 printf(($+2d.8d$, a(10, REAL:1, REAL:-1, REAL:-1, REAL:1, REAL:0)))
END

C

Even if closures are not available in a language, their effect can be simulated.

#include "stdio.h"

typedef struct S {
    int k;
    int(*f)(struct S* ref);
    struct S *x1, *x2, *x3, *x4;
} *PS;

int B(PS ref);

int A(int k, PS x1, PS x2, PS x3, PS x4, PS x5) {
    struct S x;

    x.k = k;
    x.f = B;
    x.x1 = x1;
    x.x2 = x2;
    x.x3 = x3;
    x.x4 = x4;

    return x.k <= 0 ? x4->f(x4) + x5->f(x5) : B(&x);
}

int B(PS ref) {
    (ref->k)--;
    return A(ref->k, ref, ref->x1, ref->x2, ref->x3, ref->x4);
}

int x(PS ref) {
    return ref->k;
}

int main() {
    struct S p;
    struct S m;
    struct S z;

    p.k = 1;
    p.f = x;
    m.k = -1;
    m.f = x;
    z.k = 0;
    z.f = x;

    printf("value is %d\n ", A(10, &p, &m, &m, &p, &z));
    return 0;
}

C++

Uses "shared_ptr" smart pointers from Boost / TR1 to automatically deallocate objects. Since we have an object which needs to pass a pointer to itself to another function, we need to use "enable_shared_from_this".

#include <iostream>
#include <tr1/memory>
using std::tr1::shared_ptr;
using std::tr1::enable_shared_from_this;

struct Arg {
  virtual int run() = 0;
  virtual ~Arg() { };
};

int A(int, shared_ptr<Arg>, shared_ptr<Arg>, shared_ptr<Arg>,
      shared_ptr<Arg>, shared_ptr<Arg>);

class B : public Arg, public enable_shared_from_this<B> {
private:
  int k;
  const shared_ptr<Arg> x1, x2, x3, x4;

public:
  B(int _k, shared_ptr<Arg> _x1, shared_ptr<Arg> _x2, shared_ptr<Arg> _x3,
    shared_ptr<Arg> _x4)
    : k(_k), x1(_x1), x2(_x2), x3(_x3), x4(_x4) { }
  int run() {
    return A(--k, shared_from_this(), x1, x2, x3, x4);
  }
};

class Const : public Arg {
private:
  const int x;
public:
  Const(int _x) : x(_x) { }
  int run () { return x; }
};

int A(int k, shared_ptr<Arg> x1, shared_ptr<Arg> x2, shared_ptr<Arg> x3,
      shared_ptr<Arg> x4, shared_ptr<Arg> x5) {
  if (k <= 0)
    return x4->run() + x5->run();
  else {
    shared_ptr<Arg> b(new B(k, x1, x2, x3, x4));
    return b->run();
  }
}

int main() {
  std::cout << A(10, shared_ptr<Arg>(new Const(1)),
		 shared_ptr<Arg>(new Const(-1)),
		 shared_ptr<Arg>(new Const(-1)),
		 shared_ptr<Arg>(new Const(1)),
		 shared_ptr<Arg>(new Const(0))) << std::endl;
  return 0;
}

C# 2.0

C# 2.0 supports anonymous methods which are used in the implementation below:

using System;

delegate T Func<T>();

class ManOrBoy
{
    static void Main()
    {
        Console.WriteLine(A(10, C(1), C(-1), C(-1), C(1), C(0)));
    }

    static Func<int> C(int i)
    {
        return delegate { return i; };
    }

    static int A(int k, Func<int> x1, Func<int> x2, Func<int> x3, Func<int> x4, Func<int> x5)
    {
        Func<int> b = null;
        b = delegate { k--; return A(k, b, x1, x2, x3, x4); };
        return k <= 0 ? x4() + x5() : b();
    }
}

C# 3.0

C# 3.0 supports lambda expressions which are used in the implementation below:

using System;

class ManOrBoy
{
    static void Main()
    {
        Console.WriteLine(A(10, () => 1, () => -1, () => -1, () => 1, () => 0));
    }

    static int A(int k, Func<int> x1, Func<int> x2, Func<int> x3, Func<int> x4, Func<int> x5)
    {
        Func<int> b = null;
        b = () => { k--; return A(k, b, x1, x2, x3, x4); };
        return k <= 0 ? x4() + x5() : b();
    }
}

D

D supports lazy argument evaluation, closures, and lambdas:

import std.stdio: writefln;

int a(int k, lazy int x1, lazy int x2, lazy int x3, lazy int x4, lazy int x5) {
    int delegate() b;
    b = {
        k--;
        return a(k, b(), x1, x2, x3, x4);
    };
    return k <= 0 ? x4 + x5 : b();
}

void main() {
    const int N = 10;
    writefln(a(N, 1, -1, -1, 1, 0));
}

To compute the result with quite larger N you may have to increase the stack size, this is for the DMD compiler (the value is in bytes):

-L/STACK:100000000

Erlang

Erlang variables cannot be changed after binding, so k is decremented by sending a message to a process.

kloop(K) ->
    receive
        {decr,Pid} -> Pid ! K-1, kloop(K-1);
        _          -> ok
    end.


a(K, X1, X2, X3, X4, X5) ->
    Kproc = spawn(fun() -> kloop(K) end),
    B = fun (B) -> 
                Kproc ! {decr, self()},
                receive Kdecr ->
                        a(Kdecr, fun() -> B(B) end, X1, X2, X3, X4)
                end
        end,
    if
        K =< 0  -> Kproc ! X4() + X5();
        true    -> Kproc ! B(B)
    end.


manorboy(N) ->                
     a(N, fun() -> 1 end, fun() -> -1 end, fun() -> -1 end, fun() -> 1 end, fun() -> 0 end ).

Haskell

Haskell is a pure language, so the impure effects of updating k must be wrapped in a state monad.

import Control.Monad.ST
import Data.STRef

type S s = ST s Integer

a :: Integer -> S s -> S s -> S s -> S s -> S s -> S s
a k x1 x2 x3 x4 x5 = a' where
  a' | k <= 0    = do { x4' <- x4; x5' <- x5; return (x4' + x5') }
     | otherwise = do { kr <- newSTRef k; b kr }
  b kr = do
    k <- readSTRef kr
    let k' = k - 1
    writeSTRef kr k'
    a k' (b kr) x1 x2 x3 x4

run k =
  runST (a k (return 1) (return (-1)) (return (-1)) (return 1) (return 0))

Java

We use anonymous classes to represent closures.

public class ManOrBoy {
    interface Arg {
        int run();
    }

    static int A(final int k, final Arg x1, final Arg x2, final Arg x3, final Arg x4, final Arg x5) {
        if (k <= 0) return x4.run() + x5.run();
        else {
            Arg b = new Arg() {
                int m = k;
                public int run() {
                    return A(--m, this, x1, x2, x3, x4);
                }
            };
            return b.run();
        }
    }

    public static void main(String[] args) {
        System.out.println(A(10, new Arg() { public int run() { return 1; } },
                             new Arg() { public int run() { return -1; } },
                             new Arg() { public int run() { return -1; } },
                             new Arg() { public int run() { return 1; } },
                             new Arg() { public int run() { return 0; } }));
    }
}

JavaScript

function A( k, x1, x2, x3, x4, x5 ) {
  function B() {
    k -= 1;
    return A( k, B, x1, x2, x3, x4 );
  }
  if( k <= 0 ) return x4() + x5();
  return B();
}
	
function lambda( value ) {
  return function() { return value };
}
	
alert( A( 10, lambda(1), lambda(-1), lambda(-1), lambda(1), lambda(0) ) );

Lisp

The clarity of the Common Lisp version of the program suffers because Common Lisp differentiates between a functional and a non-functional value for its variables (that's why function and funcall are needed). On the other hand, in Scheme, it being simpler and its invention more strongly influenced by Algol60, the solution is straight-forward.

Common Lisp

(defun a (k x1 x2 x3 x4 x5)
  (labels ((b ()
             (setq k (- k 1))
             (a k (function b) x1 x2 x3 x4)))
    (if (<= k 0)
      (+ (funcall x4) (funcall x5))
      (b))))

(a 10 (lambda () 1) (lambda () -1) (lambda () -1) (lambda () 1) (lambda () 0))

Scheme

(define (A k x1 x2 x3 x4 x5)
  (define (B)
    (set! k (- k 1))
    (A k B x1 x2 x3 x4))
  (if (<= k 0)
      (+ (x4) (x5))
      (B)))

(A 10 (lambda () 1) (lambda () -1) (lambda () -1) (lambda () 1) (lambda () 0))

Mathematica

This Mathematica code was derived from the Ruby example appearing below.

$RecursionLimit = 1665; (* anything less fails for k0 = 10 *)

a[k0_, x1_, x2_, x3_, x4_, x5_] :=
  Module[{k, b},
    k = k0;
    b = (k--; a[k, b, x1, x2, x3, x4]) &;
    If[k <= 0, x4[] + x5[], b[]]]

a[10, 1 &, -1 &, -1 &, 1 &, 0 &] (* => -67 *)

OCaml

OCaml variables are not mutable, so "k" is wrapped in a mutable object, which we access through a reference type called "ref".

let rec a k x1 x2 x3 x4 x5 =
  if k <= 0 then
    x4 () + x5 ()
  else
    let m = ref k in
    let rec b () =
      decr m;
      a !m b x1 x2 x3 x4
    in
    b ()

let () =
  Printf.printf "%d\n" (a 10 (fun () -> 1) (fun () -> -1) (fun () -> -1) (fun () -> 1) (fun () -> 0))

PL/I

morb: proc options (main) reorder;
 dcl sysprint file;

 put skip list(a((10), lambda1, lambda2, lambda3, lambda4, lambda5));

 a: proc(k, x1, x2, x3, x4, x5) returns(fixed bin (31)) recursive;
   dcl k                    fixed bin (31);
   dcl (x1, x2, x3, x4, x5) entry returns(fixed bin (31));

   b: proc returns(fixed bin(31)) recursive;
     k = k - 1;
     return(a((k), b, x1, x2, x3, x4));
   end b;

   if k <= 0 then
     return(x4 + x5);
   else
     return(b);
 end a;

 lambda1: proc returns(fixed bin (31)); return(1);  end lambda1;
 lambda2: proc returns(fixed bin (31)); return(-1); end lambda2;
 lambda3: proc returns(fixed bin (31)); return(-1); end lambda3;
 lambda4: proc returns(fixed bin (31)); return(1);  end lambda4;
 lambda5: proc returns(fixed bin (31)); return(0);  end lambda5;
end morb;

Python

import sys

def A(lk, x1, x2, x3, x4, x5):
    k = [lk]
    def B():
        k[0] -= 1
        return A(k[0], B, x1, x2, x3, x4)
    return x4() + x5() if k[0] <= 0 else B()

sys.setrecursionlimit(1027)
print A(10, lambda:1, lambda:-1, lambda:-1, lambda:1, lambda:0)

Perl

sub A {
    my ($k, $x1, $x2, $x3, $x4, $x5) = @_;
    my $B;
    $B = sub { A(--$k, $B, $x1, $x2, $x3, $x4) };
    return &$x4 + &$x5 if $k <= 0;
    return &$B;
}

print A(10, sub{1}, sub {-1}, sub{-1}, sub{1}, sub{0} ) . "\n";

Ruby

Note: the lambda call can be replaced with Proc.new and still work.

def a(k, x1, x2, x3, x4, x5)
  b = lambda { k -= 1; a(k, b, x1, x2, x3, x4) }
  k <= 0 ? x4[] + x5[] : b[]
end

puts a(10, lambda {1}, lambda {-1}, lambda {-1}, lambda {1}, lambda {0})

Scala

def A(in_k: Int, x1: =>Int, x2: =>Int, x3: =>Int, x4: =>Int, x5: =>Int): Int = {
    var k = in_k
    def B: Int = {
        k = k-1
        A(k, B, x1, x2, x3, x4)
    }
    if (k<=0) x4+x5 else B
}
println(A(10, 1, -1, -1, 1, 0))

Smalltalk

 Number>>x1: x1 x2: x2 x3: x3 x4: x4 x5: x5
    | b k |
    k := self.
    b := [ k := k - 1. k x1: b x2: x1 x3: x2 x4: x3 x5: x4 ].
    ^k <= 0 ifTrue: [ x4 value + x5 value ] ifFalse: b
 
 10 x1: [1] x2: [-1] x3: [-1] x4: [1] x5: [0]

Tcl

There are two nontrivial features in the "man or boy" test. One is that the parameters x1 though x5 are in general going to be function calls that don't get evaluated until their values are needed for the addition in procedure A, which means that these in Tcl are going to be scripts, and therefore it is necessary to introduce a helper procedure C that returns a constant value. The other is that procedure B needs to refer to variables in the local context of its "parent" instance of procedure A. This is precisely what the upvar core command does, but the absolute target level needs to be embedded into the script that performs the delayed call to procedure B (upvar is more often used with relative levels).

proc A {k x1 x2 x3 x4 x5} {
    expr {$k<=0 ? [eval $x4]+[eval $x5] : [B \#[info level]]}
}
proc B {level} {
    upvar $level k k x1 x1 x2 x2 x3 x3 x4 x4
    incr k -1
    A $k [info level 0] $x1 $x2 $x3 $x4
}
proc C {val} {return $val}
interp recursionlimit {} 1157
A 10 {C 1} {C -1} {C -1} {C 1} {C 0}

The [info level 0] here is a sort of "self" idiom; it returns the command (with arguments) that called the current procedure.

Since the values of x1 through x4 are never modified, it is also possible to embed these as parameters of B, thereby slightly purifying the program:

proc AP {k x1 x2 x3 x4 x5} {expr {$k<=0 ? [eval $x4]+[eval $x5] : [BP \#[info level] $x1 $x2 $x3 $x4]}}
proc BP {level x1 x2 x3 x4} {AP [uplevel $level {incr k -1}] [info level 0] $x1 $x2 $x3 $x4}
proc C {val} {return $val}
interp recursionlimit {} 1157
AP 10 {C 1} {C -1} {C -1} {C 1} {C 0}

See also

References

  1. ^ Donald Knuth (1964). "Man or boy?". {{cite web}}: Unknown parameter |accessmonthday= ignored (help); Unknown parameter |accessyear= ignored (|access-date= suggested) (help); Unknown parameter |month= ignored (help)
  2. ^ Charles H. Lindsey (1988). "Block Structure and Environments". {{cite web}}: Unknown parameter |accessmonthday= ignored (help); Unknown parameter |accessyear= ignored (|access-date= suggested) (help); Unknown parameter |month= ignored (help)
  3. ^ ""Man or boy" test".

External links