--- title: "Introduction to Jaatha" author: "Paul Staab" date: "jaatha `r packageVersion('jaatha')`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to Jaatha} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Jaatha is an estimation method that can use computer simulations to produce maximum-likelihood estimates even when the likelihood function can not be evaluated directly. It can be applied whenever it is feasible to conduct many simulations, but works best when the data is at least approximately Poisson distributed. Jaatha was originally designed for demographic inference in evolutionary biology. Please also refer to the vignette ```{r eval=FALSE} vignette("jaatha-evolution", package = jaatha) ``` if you are interested in this application. Before we start, we need to load the package and set a seed to ensure that jaatha's results are reproducible: ```{r load_jaatha} library(jaatha) set.seed(112233) ``` The "real" data --------------- Imagine that we have observed the following data ```{r data_obs} data_obs <- c(2, 8, 0, 6, 1, 3, 2, 2, 0, 7) data_obs ``` and we assume that the data are independent samples from two Poisson distributions with parameters p1 and p2, respectively. The odd positions of the vector are samples from the first distribution, and the even positions are samples taken from the second distribution. In order to run jaatha, we need first formalize this model and convert the data into a format that jaatha can work with. Creating a model ---------------- The usual way to describe the data generating model for jaatha is trough a simulation function. The function takes model parameters as input, and simulates data according to the model. In our example of the mixed samples from Poisson distributions, we can use the function ```{r sim_func} sim_func <- function(x) rpois(10, x) sim_func(c(p1 = 1, p2 = 10)) ``` Simulation functions for jaatha must have exactly one argument, which is a vector of model parameters for which the simulation is conducted. There are no requirements on the return format of a simulation function from jaatha's site, any R objects work with Jaatha. Here, the function returns an vector of ten integers. Jaatha does not use the simulated data directly, but works on a number of summary statistics instead. Summary statistics are transformations of the data that capture an aspect of the data. The transformation is again described by a function that takes the simulation results as input, and returns a fix number of Poisson distributed values. Here, this already applies to the simulation results, and we can just use them: ```{r sum_stats} sum_stats <- list(create_jaatha_stat("id", function(x, opts) x)) ``` Note that we create a list containing our statistic. In our example, we'll use just one statistic, but it is possible to add more than one statistic to this list. Please refer to the documentation for `create_jaatha_stat` for additional information, in particular if you can not generate Poisson distributed statistics from the simulation results easily. Next, we need to define the possible values for the model parameters. These range should cover all reasonable values that the parameters can take: ```{r par_ranges} par_ranges <- matrix(c(0.1, 0.1, 10, 10), 2, 2) rownames(par_ranges) <- c("x", "y") colnames(par_ranges) <- c("min", "max") par_ranges ``` This three components -- a simulation function, parameter ranges and a list of summary statistics -- are required to describe an probabilistic framework within witch jaatha can fit parameters. Since we have the pieces together now, we can use the `create_jaatha_model` function to combine them into a formal model that we can pass to the `jaatha` function later: ```{r create_model} jaatha_model <- create_jaatha_model(sim_func, par_ranges, sum_stats) ``` The function performs a test simulation to ensure that the components fit together. Again, the documentation for `create_jaatha_model` provides additional details. Preparing the Data ------------------ Use the `create_jaatha_data` function to prepare the observed data for being used in Jaatha. The function expected the data to be in the format that is also returned by simulation function. In our example, `sim_func` returns a numeric vector of length `10`, and `data_obs` already has the same format. So we can import it ```{r create_data} jaatha_data <- create_jaatha_data(data_obs, jaatha_model) ``` Running Jaatha -------------- Now that we have prepared model and data, we can use the `jaatha` function to estimate parameters: ```{r execute_jaatha} estimates <- jaatha(jaatha_model, jaatha_data, sim = 50, repetitions = 2, verbose = FALSE) ``` Here, were are conducting `50` simulations in each step of the optimization procedure and repeat the complete optimization two times from different starting positions. For real applications, higher values are recommended. In this simple toy example, the above values work quite well: ```{r print_estimates} estimates ```