Miking DPPL is a framework for developing probabilistic programming languages (PPLs) using Miking. Currently, the framework includes the PPLs CorePPL and RootPPL.
Building and Installing
The main binary for development in Miking DPPL is called
cppl. Currently, this binary is built by running
make in the project root, and installed to
make install (uninstall is also possible through
make uninstall). You must have Miking installed (the
mi command must be globally available) to build
make test executes a set of tests for various components in the Miking DPPL framework.
CorePPL extends MExpr (see Miking) with probabilistic constructs for defining random variables and for likelihood updating. For example, the program
let x = assume (Beta 10.0 5.0) in
observe true (Bernoulli x);
encodes a simple distribution for the bias of a coin, by setting a Beta prior for the probability of observing heads (i.e.,
true) for a single flip of the coin.
Defining random variables in CorePPL is done by providing a probability distribution to the
Currently, there is no generated documentation for available distributions.
Likelihood updating is done through the
weight, the logarithm of the likelihood is updated directly (e.g.,
weight (log 2) multiplies the likelihood with 2).
observe, the likelihood is instead updated with the value of the pmf or pdf for the given distribution at the given observation.
observe true (Bernoulli 0.5) updates the likelihood with a factor of 0.5.
Compiling CorePPL to MExpr
The default option for inferring the distribution encoded by a CorePPL program is to compile it to MExpr (which then compiles to OCaml).
You compile a CorePPL program
cpplprog.mc using the command
cppl -m <method> cpplprog.mc, where
<method> is an inference algorithm prefixed with
mexpr- (run the command
cppl to see the current list of available algorithms).
cppl -m mexpr-importance cpplprog.mc compiles
cpplprog.mc to a binary file
out which you can subsequently run to produce samples from the distribution encoded by
$ ./out 10
The argument is the number of samples.
The first row prints the log of the normalizing constant, and the subsequent rows the samples.
The first column is the sample, and the second its log-weight.
For more help and options, run the
cppl command without any arguments.
Implementing New Inference Algorithms
The CorePPL to MExpr compiler is organized to make it easy to implement new inference algorithms.
Currently, the best way to understand the framework is to look at the source code.
There are also some slides available at
Compiling CorePPL to RootPPL
Another option to run inference over a CorePPL program is to compile it to RootPPL.
This option potentially results in more efficient inference (due to the efficiency of RootPPL), but also has quite a lot of limitations (for example, RootPPL does not support automatic memory management or higher-order functions).
A CorePPL program in a file
cpplprog.mc can be compiled to a RootPPL file
out.cu using the command
cppl -m rootppl-smc cpplprog.mc.
out.cu can then be compiled using, e.g.,
rootppl out.cu --stack_size 10000.
The stack size option is mandatory for compiling RootPPL programs compiled from CorePPL.
More information about RootPPL can be found at RootPPL.
RootPPL is an intermediate language for representing probabilistic models and comes with a framework that performs inference on the GPU in these models. See examples in the folder
rootppl/models. The idea is that high-level Miking probabilistic programming languages should be compiled to this intermediate language.
The instructions below are tested on Ubuntu 18.04 but should work for other Unix-like systems.
Before building RootPPL programs, a C++/CUDA compiler is required. RootPPL works on CPU and Nvidia GPU:s. For the CPU version, a C++ compiler should suffice, e.g. g++. In order to build for GPU, CUDA must be installed. See CUDA installation guides: Linux, Mac.
To run the CPU version in parallel, OpenMP must be installed. OpenMP comes with recent gcc versions. However, OpenMP is not necessary if one does not want to execute programs in parallel on the CPU.
rootppl for the current user, first clone this repository and change directory to the
rootppl folder. Then run:
This will install
$HOME/.local/bin and inference engine sources to
$HOME/.local/src/rootppl. Some systems, e.g. Mac OS, will require manually adding
To compile a model and build an executable:
This will compile the model along with the inference framework for CPU. The inference framework build products are placed under
rootppl command wraps
nvcc, and you may supply options used for
These options will be used when compiling the supplied models, but not when compiling the inference framework.
One limitation is that it is currently only possible to supply source file(s) to
Object/library files are not supported (for this, you must write your own custom build script).
To compile for GPU, add your GPU:s compute capability to the
You can find your GPU:s compute capability in the Wikipedia table.
Here is an example that will compile the airplane example for a GPU with a minimum compute capability of 7.5
--target sm_75 to compile for CPU):
rootppl models/airplane/airplane.cu --target sm_75 -j 5
The optional argument
-j x speeds up the compilation process by spawning
x jobs, allowing for parallel compilation.
The corresponding parallel CPU (OpenMP) example is:
rootppl models/airplane/airplane.cu --target omp -j 5
Alternatively, the c++ compiler can be specified with CXX. This is often required on Mac OS to enable OpenMP, by using g++ instead of the default clang. On Mac OS, g++ can be installed with e.g.
brew install gcc. Then, assuming the version installed was
rootppl models/airplane/airplane.cu --target omp -j 5 --cxx g++-10
The first build will compile the entire inference framework and can take 20 seconds or so when building for GPU. (Run
rootppl with the
-j num_threads to use multiple threads and speed up the build). The inference framework is recompiled whenever options such as
--target are modified between runs of
Any of the above commands should generate an executable named
-o <exec_name> to name it differently). Execute it with
./a.out num_particles. For example:
An example output of this:
Num particles close to target: 98.9%, MinX: 56.2349, MaxX: 91.1572
First is the command that is executed, it executes the executable
a.out with argument
The first row is the normalizing constant. The second row is a print statement within the model.
You can supply a second argument to
a.out, which is the number of sweeps:
./a.out 1000 3
Num particles close to target: 89.7%, MinX: 51.5005, MaxX: 88.2039
Num particles close to target: 90.5%, MinX: 53.0577, MaxX: 90.0298
Num particles close to target: 88.7%, MinX: 48.1773, MaxX: 89.0957
Creating a simple model
Models are divided into fragments to enable pausing the execution within models.
These fragments are functions referred to as basic blocks (
To control the program execution flow, a program counter (
PC) can be modified.
If it remains unchanged in when the basic block returns, resampling will be done, and then
the same block will be executed again. The program counter corresponds to the index of the basic block
to be executed. So, incrementing it means that the next block will be executed after resampling. An
example of this can be seen in the example below. However, if no following blocks are defined,
the inference will terminate as the model program has been executed.
Any interesting model will contain random choices, i.e. sampling from distributions.
Below is a coin flip example which flips a biased coin.
SAMPLE is a macro that can take variable number of arguments.
First is the distribution name, followed by the distribution specific arguments.
Sampling is done differently on the CPU and the GPU,but these differences are hidden in the model with the help
SAMPLE macro. All of these RootPPL keywords are macros similar to
SAMPLE, they provide higher-level
constructs that can be used in the same way regardless of if the model will be executed on the CPU or the GPU.
int coinSample = SAMPLE(bernoulli, 0.6);
In the coin flip example above, there is one problem, however. The sample is only stored in a local
variable and is never used. To store data that remains when the next block is executed, the program
PSTATE) should be used. Before defining the blocks, the model must be
initialized with the macro
INIT_MODEL that takes two arguments. First the type of the program
state (this could be any type, e.g.
int or a structure), then the number of basic blocks in
the program. So, adding it to the above example:
PSTATE = SAMPLE(bernoulli, 0.6);
Now to run this program, only one thing remains to be defined, the main function.
This is done with the
MAIN macro, taking a code block as argument.
The code below shows what we need in order to run the program and perform SMC.
First, the block must be added to the array of blocks to be executed. The order in which
blocks are added with
ADD_BLOCK, defines their order in the array and thus
defines the order of execution together with the program counter. Secondly, the
SMC macro (its parameter is explained below) starts the inference.
Now the model can be compiled and executed! However, the result of the coin flips is
stored, but never used. To aggregate the results of all the particles' samples,
a callback function (
CALLBACK) can be used. The callback will be called after
inference, but before clean up of the particles. This way, the results can be used to
generate desired distributions or values before particles are deleted.
CALLBACK macro, the array of program states is accessed
PSTATES and the number of particles is accessed with the
N (which is hidden within the macro). Also, the log-weights array can be accessed with
WEIGHTS macro. These weights are normalised (so that the exponent of these log-weights sum to 1).
Here is an example callback, called "sampleMean" that calculates and prints the mean
of the samples.
double sum = 0;
for(int i = 0; i < N; i++)
sum += PSTATES[i];
double mean = sum / N;
printf("Sample mean: %f\n", mean);
For this to be used, the
SMC macro call in main must be
This example can be found in
and, being in the
rootppl directory, compiled with:
Then it can be executed with the executable followed by the
number of particles, for example:
An example output is then:
Sample mean: 0.608000
log normalization constant = 0.000000
First we see our callback function's output. Then on the next line, is the logarithm of the normalization constant approximated by the inference. This is simply 0 here, since the model contains no statements that alter the weights of the particles.
Below follows some examples, these are all models that are defined within one single block.
Coin Flip Posterior
In this model, a bias for a coin is sampled from the prior beta distribution. Then we observe that the coin flip is true. This model thus infers the posterior distribution of the bias, conditioned on the observation.
double x = SAMPLE(beta, 2, 2);
OBSERVE(bernoulli, x, true);
PSTATE = x;
Gaussian Mixture Model
This model demonstrates an example of stochastic branching, meaning that different code is executed depending on the outcome of the sample.
x = SAMPLE(normal, -2, 1);
x = SAMPLE(normal, 3, 1);
PSTATE = x;
Geometric Distribution (Recursive)
This model combines stochastic branching with recursion. Basic blocks do not fully support recursion themselves, as they take no custom arguments or return values. Instead, a helper function is used to express the recursive model:
return BBLOCK_CALL(geometricRecursive, p) + 1;
}, int, double p)
PSTATE = BBLOCK_CALL(geometricRecursive, 0.6);
Note that the helper function takes its return value and parameters comma-separated after the function body.
While recursive functions is supported by CUDA, iterative solutions are encouraged. Below is the same model, implemented with a loop instead.
Geometric Distribution (Iterative)
int numFlips = 1;
while(! SAMPLE(bernoulli, 0.6))
PSTATE = numFlips;
More sophisticated models can be found in the phylogenetics directory. These probabilistic models to inference on observed phylogenetic trees. These observed trees can be found in phylogenetics/tree-utils/trees.cuh. The correct phylogenetic models contain a link in the top of the file to the WebPPL source code used as reference when implementing them. These models contain a number of new things, e.g.:
- Multiple basic blocks
- Structures as program states
- Recursive simulations
- Global data used by the model
- Resampling throughout the traversal of the observed tree
In phylogenetics/crbd, the constant-rate birth-death models can be found. The most interesting file here is crbd.cu as it is a correct model used by evolutionary biologists. This model uses a pre-processed DFS traversal path over the observed tree, rather than using a call stack.
Further phylogenetic models
Further phylogenetic models are found in the package
The RootPPL Architecture
The architecture's main goal is to decouple the inference from the probabilistic models, this is the case for probabilistic
All inference engine code is located under
inference directory, the code for inference methods can be found. Currently, only SMC is supported.
smc directory, the
resampling directory contain resampling strategies. Currently only Systematic Resampling is supported. The resampling has
a parallel implementation for the GPU, and a sequential and parallel implementation for the CPU.
distsdirectory contains distributions and scoring functions. Both models and inference can rely on these.
macrosdirectory contains all the macros/constructs that are mostly used by models but also in inference code.
utilsdirectory contains code that RootPPL uses, but can be used by models as well.
modelsdirectory contains example models that use the inference.
Compile with the built-in stack
To compile with the
progStateStack_t program state, the macro
INIT_MODEL_STACK(num_bblocks) should be used instead of
Then the stack size is specified in bytes with
--stack_size num_bytes, e.g.
rootppl my_stack_model.cu --stack_size 10000.
Building a more interesting model
TODO, stuff not yet demonstrated explicitly in README:
- WEIGHT macro
- Multiple BBLOCK models
- Global data accessible by bblocks on GPU
- Program States containing more than a primitive datatype, e.g. structs.