Scala For Quants: Compositional Models Versus Bayesian Networks

I was pleased to see that Radim Jirousek published a foundational paper bringing together and tidying up results on compositional models, an alternative formulation to Bayesian networks (at least for perfect sequences).

I'll get around to tidying up the code presentation (and code) below but it seems to me that most of the theorems could be quickly checked using ScalaCheck. In the meantime I'm posting this so that someone can tell me how to nicely present scala code in blogger, or suggest a better way of dealing with functions defined on an infinite domain - which seemed to be a nice way to implement the probabilistic models that get composed. So far I have implemented Jirousek's left and right composition, though as a fine point, the composition should really use a terniary operator to resolve 0*0/0=0 as per Jirousek's convention.

The output of the test is:

Compare Table 3. on page 630
x1   x2   x3         PiNu(x)     NuPi(x)
false-false-false Some(0.25)      Some(0.125)
false-false-true Some(0.25)      Some(0.125)
false-true-false Some(0.0)      None
false-true-true Some(0.0)      None
true-false-false Some(0.25)      Some(0.125)
true-false-true Some(0.25)      Some(0.125)
true-true-false Some(0.0)      None
true-true-true Some(0.0)      None

object Compositional {

type FunctionOfList[T1] = List[T1] => Option[Double]
def convertFunctionOfListToFunctionInf[T1](f: FunctionOfList[T1], support: Set[Integer]): (Stream[T1] => Option[Double]) = {
    (xs: Stream[T1]) =>
      {
        val variablesThatMatter = support.toList.map(i => xs(i))
        val fx = f(variablesThatMatter)
        fx
      }
}

class FunctionInf[T1](val f: Stream[T1] => Option[Double])(val support: Set[Int]) {
    /* T1 FunctionInf is T1 real valued function with an infinite number of arguments (represented as T1 stream)
     that only depends on finitely many arguments (called the support, in a slight abuse of language)
     */

    def apply(x: Stream[T1]): Option[Double] = this.f(x)
    type doubleOperator = (Double, Double) => Double
    type optionOperator = (Option[Double], Option[Double]) => Option[Double]

    // Need to define an extended field of reals/undefined ...
    def divide(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
      case (Some(0.0), Some(0.0)) => None // May want to modify this to None
      case (Some(x: Double), Some(0.0)) => None
      case (Some(x: Double), Some(y: Double)) => Some(x / y)
      case _ => None
    }
    def add(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
      case (Some(x: Double), Some(y: Double)) => Some(x + y)
      case _ => None
    }
    def multiply(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
      case (Some(x: Double), Some(y: Double)) => Some(x * y)
      case _ => None
    }
    def subtract(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
      case (Some(x: Double), Some(y: Double)) => Some(x - y)
      case _ => None
    }
    // ... so as to define binary operations on FunctionInf
    def opply(that: FunctionInf[T1], op: optionOperator): FunctionInf[T1] = {
      val supportUnion = this.support.union(that.support)
      val f = (x: Stream[T1]) => op(this.apply(x), that.apply(x))
      new FunctionInf[T1](f)(supportUnion)
    }
    def +(that: FunctionInf[T1]) = opply(that, add)
    def -(that: FunctionInf[T1]) = opply(that, subtract)
    def *(that: FunctionInf[T1]) = opply(that, multiply)
    def /(that: FunctionInf[T1]) = opply(that, divide)

    // Margins
    def project(J: Set[Int])(implicit values: List[T1]): FunctionInf[T1] = {
       // J,K is the same notation as page 626 of Jirousek: J is a subset of K
      val K = this.support
      if (J.isEmpty)
        new FunctionInf((x:Stream[T1])=>Some(1.0))(Set()) // We define projection onto empty set this way, as per page 636 Jirousek
      else {
        val M = (K.diff(J)).toList // M is the set of variables we integrate out
        val marginal = (xs: Stream[T1]) => {
          val ss = subspace(xs, M, values)
          val fx: List[Option[Double]] = ss.map(x => this.apply(x)) // Evaluate f at all the points in the subspace
          val theMarginalAtXs:Option[Double] = fx.foldLeft(Some(0.0): Option[Double])((c, d) => add(c, d))
          theMarginalAtXs
        }
        new FunctionInf(marginal)(K)
      }
    }

    // Right composition
    def |>(that: FunctionInf[T1])(implicit values: List[T1]): FunctionInf[T1] = {
      val supportIntersection = this.support & that.support
      val denominator = that.project(supportIntersection)(values)
      val numerator = this * that
      val theRightComposition = numerator / denominator
      theRightComposition
    }

    // Left composition
    def <|(that: FunctionInf[T1])(implicit values: List[T1]): FunctionInf[T1] = {
      val supportIntersection = this.support & that.support
      val denominator = this.project(supportIntersection)(values)
      val numerator = this * that
      val theRightComposition = numerator / denominator
      theRightComposition
    }
}

// Helper: converts a "sparse" stream representation (i.e. T1 list of coordinates we care about into one of many commensurate streams)
def sparseToStream[T1](l: List[(Int, T1)]): Stream[T1] = {
    if (l.isEmpty)
      Stream.empty[T1]
    else {
      val nextPos = l(1)._1
      val nextVal = l(1)._2
      val hd = List(1 to nextPos).map(i => nextVal).toStream
      hd #::: sparseToStream(l.tail)
    }
}

// Helper: All the perturbations of a point when coordinates in a set M are allowed to spin over all possible values ...
    def subspace[T1](xs: Stream[T1], M: List[Int], values: List[T1]): List[Stream[T1]] = {
      if (M.isEmpty)
        List(xs)
      else {
        val subsub = subspace[T1](xs, M.tail, values)
        val listing = for (b <- values; x <- subsub) yield ((x.take(M.head) :+ b) #::: x.drop(M.head + 1)) // replace the m'th coordinate
        //println("first guy in list is "+listing(0)(0)+listing(0)(1)+listing(0)(2))
        listing
      }
    }

}

object CompositionTest extends App {

// Define Pi as page page 629 of Jirousek - ostensibly depends on 1st and 2nd coordinates
def pi(x:Stream[Boolean]):Option[Double] = {x match {
    case (a:Boolean) #:: false #:: (rest:Stream[Boolean]) => Some(0.5)
    case _ => Some(0)
    }
}
val Pi = new Compositional.FunctionInf[Boolean](pi)(Set(0,1))

// Define Kappa as page page 629 of Jirousek - ostensibly depends on 2nd and 3rd coordinates
def kappa(x:Stream[Boolean]):Option[Double] = {x match {
    case (a:Boolean) #:: true #:: (rest:Stream[Boolean]) => Some(0.5)
    case _ => Some(0)
    }
}
val Kappa = new Compositional.FunctionInf[Boolean](kappa)(Set(1,2))

   // Define nu - ostensibly depends on 1st and 2nd coordinates
val Nu = new Compositional.FunctionInf[Boolean]((x:Stream[Boolean])=>Some(0.25))(Set(1,2))

implicit val booleanValues: List[Boolean] = List(false, true)

// Point-wise operations
val PiTimesNu = Pi * Nu
val PiOnKappa = Pi / Kappa

// Projection on to X1 and X2
val Pi_x1 = Pi.project(Set(0))
val Pi_x2 = Pi.project(Set(1))
val K_x2 = Kappa.project(Set(1))
val K_x3 = Kappa.project(Set(2))

// Composition
val PiNu = Pi |> Nu
val NuPi = Nu |> Pi

val tf = List(false, true)
val X3: List[Stream[Boolean]] = for (b0 <- tf; b1 <- tf; b2 <- tf) yield (b0 #:: b1 #:: b2 #:: Stream.empty[Boolean])

// Check composition
println("Compare Table 3. on page 630")
println("x1   x2   x3         PiNu(x)     NuPi(x)")
for (x <- X3) {
    println(x(0) + "-" + x(1) + "-" + x(2) +" "+ PiNu(x)+"      " + NuPi(x))
}
}

3 comments:

UnknownMay 31, 2012 at 2:51 PM
Here's something you might find helpful for your code formatting http://onesviewfromabove.blogspot.co.uk/2012/05/test-of-scala-code-formatting-on.html .

I've used your code as a test. Of course, if you'd rather I remove your code from my entry then let me know.

Fascinating blog, BTW.
UnknownJune 4, 2012 at 6:40 AM
Hi,I am just starting learning Scala.

Reading your code for a while, OMG, your code is so cool. I find is's very impressive of your way to have created the subspace function.

But, I also think that it seems like that your def subspace function can only list out the one dimension quotient subspace,
which is he set of the points having one coordinate listed in the M list and having that coordinate valued at some fixed value.

I find that paper also do a test for the multi-dimentional subspace marginal distribution. I am interested to have a try in scala.

Anyway, that paper is cool, too. I am missing my days attending some Ergodic theory course and seminar....
UnknownJune 4, 2012 at 6:48 PM
Excuse me, what I mean in my last reply is:

For the n-dimensional space(x1,x2,....xn), your subspace function can list any n-1 dimension subspace (any,any,...,xj,any,...) that xj takes a specific value and j is in the set M. But I am not sure how you deal with the n-2 dimension subspace like (any,any,...xj,...xk,...any,any) that xj,and xk take specific values.

BTW, your using 'foldleft' function to do the summation(Integration) is so cool!

I am still reading Jirousek's paper and trying to understand why do you say you want to 'define a function on a infinite domain'....

Wednesday, January 4, 2012

Compositional Models Versus Bayesian Networks

3 comments: