---
title: "A Gentle Introduction to Modeling with gips"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{A Gentle Introduction to Modeling with gips}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk[["set"]](
  collapse = TRUE,
  comment = "#>",
  cache = FALSE
)

old_options <- options(scipen = 999) # turn off scientific notation
```


# The problem

Quite often, we have too little data to perform valid inference. Consider a
multivariate Gaussian distribution for which we have few observations
compared to the number of variables. This is the case, for example, for
graphical models used in biology or medicine. In such a setting, the usual way
of estimating the covariance matrix (the maximum likelihood method) is not
applicable: with fewer observations than variables, the sample covariance
matrix is singular. What now?


# Invariance by permutation

Sometimes, interchanging variables in a random vector does not change its distribution.
In the multivariate Gaussian case, this means that the interchanged variables have the same variances and the same covariances with the respective other variables. For instance, in the following covariance matrix, variables X1 and X3 are interchangeable, meaning that vectors (X1, X2, X3) and (X3, X2, X1) have the same distribution.

```{r symvariant_matrix, echo=FALSE}
X <- matrix(c(
  1, 2, 3,
  2, 4, 2,
  3, 2, 1
), byrow = TRUE, ncol = 3, dimnames = list(
  c("X1", "X2", "X3"),
  c("X1", "X2", "X3")
))
heatmap(X, Rowv = NA, Colv = NA, main = "", symm = TRUE)
```

Now, we can state this interchangeability property in terms of permutations. In our case, the distribution of (X1, X2, X3) is **invariant by permutation** ($1\mapsto3$, $3\mapsto1$), or in cyclic form $(1,3)(2)$. This is equivalent to saying that swapping the first and third rows and then swapping the first and third columns of the covariance matrix results in the same matrix. We then say that the covariance matrix itself is **invariant by permutation**.
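
The row-and-column swap described above can be checked directly in base R. This is a minimal sketch; the index vector `perm` is just one way to encode the permutation $(1,3)(2)$, and the variable names are chosen for illustration.

```r
# The covariance matrix shown in the heatmap above
X <- matrix(c(
  1, 2, 3,
  2, 4, 2,
  3, 2, 1
), byrow = TRUE, ncol = 3)

# The permutation (1,3)(2) as an index vector: 1 -> 3, 2 -> 2, 3 -> 1
perm <- c(3, 2, 1)

# Reorder rows and columns by the same permutation
X_permuted <- X[perm, perm]
identical(X_permuted, X) # the matrix is unchanged

# For comparison, the permutation (1,2)(3) does change the matrix
other_perm <- c(2, 1, 3)
identical(X[other_perm, other_perm], X)
```

The first comparison returns `TRUE` and the second `FALSE`, so this matrix is invariant by $(1,3)(2)$ but not by $(1,2)(3)$.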

Of course, in the samples collected in the real world, no perfect equalities will be observed. 
Still, if the respective values in the (poorly) estimated covariance matrix were close, adopting
a particular assumption about invariance by permutation would be a reasonable step.


# Package `gips`

We propose imposing constraints on the covariance matrix so that the maximum likelihood method can be used. The constraint we consider is none other than invariance by permutation.

This package provides a way to find a *reasonable* permutation to be used as a constraint in covariance matrix estimation. In this case, *reasonable* means maximizing the Bayesian posterior distribution when using a Wishart-like distribution on symmetric, positive definite matrices as a prior. The idea, exact formulas, and algorithm sketch are explored in another vignette that can be accessed by `vignette("Theory", package="gips")` or on its [pkgdown page](https://przechoj.github.io/gips/articles/Theory.html).

For an in-depth analysis of the package performance, capabilities, and comparison with other packages, see the article "Learning permutation symmetries with gips in R" by `gips`' developers Adam Chojecki, Paweł Morgen, and Bartosz Kołodziejek, available on [arXiv:2307.00790](https://arxiv.org/abs/2307.00790).


# Practical example

Let's examine the thickness, height, and breadth of 12 books:
```{r change_D_matrix_example0, include=FALSE}
plot_cosmetic_modifications <- function(gg_plot_object) {
  my_col_names <- names(DAAG::oddbooks[, c(1, 2, 3)])

  suppressMessages( # message from ggplot2
    out <- gg_plot_object +
      ggplot2::scale_x_continuous(
        labels = my_col_names,
        breaks = 1:3
      ) +
      ggplot2::scale_y_reverse(
        labels = my_col_names,
        breaks = 1:3
      ) +
      ggplot2::theme(
        title = ggplot2::element_text(face = "bold", size = 18),
        axis.text.y = ggplot2::element_text(face = "bold", size = 17),
        axis.text.x = ggplot2::element_text(face = "bold", size = 17)
      ) +
      ggplot2::scale_fill_gradient2(
        low = "#F0EA3E", mid = "#A41836", high = "#95E956",
        midpoint = 1.239099
      )
  )

  out +
    ggplot2::geom_text(ggplot2::aes(label = round(covariance, 1)),
      fontface = "bold", size = 8
    ) +
    ggplot2::theme(legend.position = "none")
}
```

```{r change_D_matrix_example1}
library(gips)

Z <- DAAG::oddbooks[, c(1, 2, 3)]
```

We suspect books from this dataset were printed with a $\sqrt{2}$ aspect ratio, as in the popular [A-series paper size](https://en.wikipedia.org/wiki/Paper_size#A_series). Therefore, we can use this expert knowledge in the analysis and put the height and breadth data on the same scale:

```{r change_D_matrix_example1_1}
Z$height <- Z$height / sqrt(2)
```
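
To see why dividing by $\sqrt{2}$ puts height on the same scale as breadth, consider an A4 sheet, which measures 210 mm by 297 mm. The snippet below is only an arithmetic illustration and does not use the books data.

```r
# A4 sheet dimensions in millimetres
a4_breadth <- 210
a4_height <- 297

a4_height / a4_breadth # aspect ratio, close to sqrt(2)
a4_height / sqrt(2)    # rescaled height, close to the breadth
```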

Now, let's compute and plot the sample covariance matrix:

```{r change_D_matrix_example1_2}
number_of_observations <- nrow(Z) # 12
p <- ncol(Z) # 3

S <- cov(Z)
round(S, 1)
g <- gips(S, number_of_observations)
plot_cosmetic_modifications(plot(g, type = "heatmap")) +
  ggplot2::ggtitle("Standard, MLE estimator\nof a covariance matrix")
```

We can see similarities between columns 2 and 3, which represent the books' height and breadth. In particular, the covariance at [1,2] is very similar to the one at [1,3], and the variance of variable 2 is similar to the variance of variable 3. This is not surprising, given the interpretation of the data (after the rescaling of height that we did).

Let's see what `gips` tells us about this data:

```{r change_D_matrix_example2}
g_map <- find_MAP(g,
  optimizer = "brute_force",
  return_probabilities = TRUE, save_all_perms = TRUE
)

g_map
get_probabilities_from_gips(g_map)
```
`find_MAP` found the symmetry represented by permutation (2,3).

```{r change_D_matrix_example4}
plot_cosmetic_modifications(plot(g_map, type = "heatmap"))
round(project_matrix(S, g_map), 1)
```
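
One way to picture the projection onto matrices invariant by $(2,3)$ is as an average over the cyclic group generated by the permutation, which here has only two elements. The sketch below does this by hand in base R on a made-up matrix; `S_demo` and `perm_23` are hypothetical names, and this is an illustration of the idea rather than the package's internal code.

```r
# Hypothetical sample covariance matrix (made-up numbers, not the books data)
S_demo <- matrix(c(
  4.0, 1.0, 1.4,
  1.0, 2.0, 0.6,
  1.4, 0.6, 3.0
), byrow = TRUE, ncol = 3)

perm_23 <- c(1, 3, 2) # the transposition (2,3) as an index vector

# The group generated by (2,3) consists of the identity and (2,3) itself,
# so the average has two terms
S_projected_demo <- (S_demo + S_demo[perm_23, perm_23]) / 2
S_projected_demo
```

After averaging, the variances of variables 2 and 3 are equal, and so are their covariances with variable 1.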

The result depends on two input parameters, `delta` and `D_matrix`. By default, they are set to `3` and `diag(p) * d`, respectively, where `d = mean(diag(S))`. The method is not scale-invariant, so we recommend running `gips` for different values of `D_matrix` of the form `D_matrix = d * diag(p)`, where `d` $\in \mathbb{R}^+$. An analysis of their impact can be found in [2], in section *3.2. Hyperparameter’s influence*.
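
A sensitivity check along these lines could look as follows. This is a sketch under assumptions: it uses a made-up covariance matrix and number of observations rather than the books data, the grid of `d` values is arbitrary, and it assumes `delta` and `D_matrix` are passed to `gips()` and that `find_MAP()` accepts `show_progress_bar`.

```r
library(gips)

# Hypothetical 3x3 sample covariance matrix, for illustration only
S_demo <- matrix(c(
  4.0, 1.0, 1.4,
  1.0, 2.0, 0.6,
  1.4, 0.6, 3.0
), byrow = TRUE, ncol = 3)
n_demo <- 12 # hypothetical number of observations
p_demo <- ncol(S_demo)

# Try several scales d for D_matrix = d * diag(p) and compare the results
for (d in c(0.1, 1, 10, 100)) {
  g_d <- gips(S_demo, n_demo, delta = 3, D_matrix = d * diag(p_demo))
  g_d_map <- find_MAP(g_d, optimizer = "brute_force", show_progress_bar = FALSE)
  cat("d =", d, "\n")
  print(g_d_map)
}
```

If the permutation found changes with `d`, the conclusion is sensitive to the prior scale and deserves a closer look.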


# Theoretic example

<div id="spoiler1" style="display:none">
```{r toy_example_data_making, include = TRUE}
p <- 5
number_of_observations <- 4
mu <- runif(p, -10, 10) # Assume we don't know the mean
sigma_matrix <- matrix(c(
  8.4, 4.1, 1.9, 1.9, 1.9,
  4.1, 3.5, 0.3, 0.3, 0.3,
  1.9, 0.3, 1,   0.8, 0.8,
  1.9, 0.3, 0.8, 1,   0.8,
  1.9, 0.3, 0.8, 0.8, 1
), ncol = p)
# sigma_matrix is a matrix invariant under permutation (3,4,5)
toy_example_data <- withr::with_seed(2022,
  code = MASS::mvrnorm(number_of_observations,
    mu = mu, Sigma = sigma_matrix
  )
)
```
</div>

<button title="Click to show the data preparation" type="button"
   onclick="if(document.getElementById('spoiler1') .style.display=='none')
              {document.getElementById('spoiler1') .style.display=''}
            else{document.getElementById('spoiler1') .style.display='none'}">
  Show/hide data preparation
</button>

```{r toy_example_data_show1}
library(gips)

toy_example_data

dim(toy_example_data)
number_of_observations <- nrow(toy_example_data) # 4
p <- ncol(toy_example_data) # 5

S <- cov(toy_example_data)

sum(eigen(S)$values > 0.00000001)
```
Note that the rank of the `S` matrix is 3, despite the `number_of_observations` being 4. This is because `cov()` estimates the mean of every column to compute `S`, which costs one degree of freedom.
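
The loss of one rank to mean estimation can be reproduced on any small data set. The sketch below uses simulated data with an arbitrary seed (not `toy_example_data`) and compares `cov()` with an estimator that treats the mean as known and equal to zero; all names are chosen for illustration.

```r
set.seed(1) # arbitrary seed
n_demo <- 4
p_demo <- 5
X_demo <- matrix(rnorm(n_demo * p_demo), nrow = n_demo)

# Count eigenvalues that are numerically positive
numeric_rank <- function(M) sum(eigen(M, symmetric = TRUE)$values > 1e-8)

S_mean_estimated <- cov(X_demo)             # centres every column first
S_mean_known <- crossprod(X_demo) / n_demo  # assumes the true mean is zero

numeric_rank(S_mean_estimated) # n - 1
numeric_rank(S_mean_known)     # n
```

Centring the columns removes one dimension, so the first rank is `n_demo - 1 = 3` and the second is `n_demo = 4`.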

We want to find reasonable additional assumptions on `S` to make it easier to estimate.

```{r toy_example_data_show2}
g <- gips(S, number_of_observations)

plot(g, type = "heatmap")
```

Looking at the plot, one can see the similarities between columns 3, 4, and 5. They have similar variances and similar covariances with each other. Columns 3 and 5 have similar covariances with columns 1 and 2, and column 4 is close as well.

Let's see if `gips` will find the relationship:

```{r toy_example_data_show3}
g_map <- find_MAP(g,
  optimizer = "brute_force",
  return_probabilities = TRUE, save_all_perms = TRUE
)

plot(g_map, type = "heatmap")
```

`gips` decided that $(3,4,5)$ was the most reasonable assumption. Let's see how much better it is:

```{r toy_example_data_show4}
g_map
```
This assumption is over 3 times more believable than making no assumption. Let's examine how reasonable other possible assumptions are:

```{r toy_example_data_show5}
get_probabilities_from_gips(g_map)
```
We see that assumption $(3,4,5)$ is the most likely with a $6.2\%$ posterior probability. 21 possible permutations are more likely than the identity permutation (id).

Remember that `n0`, the minimal number of observations needed for the maximum likelihood estimator to exist under a given assumption, could still be too big for your data. In this example, the assumptions with transpositions (like $(3,5)$) would yield `n0` $= 5$, which is more than the 4 observations we have, so we could not estimate the covariance correctly. The assumption $(3,4,5)$ is just right:

```{r toy_example_data_show6}
summary(g_map)$n0 # n0 = 4 <= 4 = number_of_observations

S_projected <- project_matrix(S, g_map)
S_projected
sum(eigen(S_projected)$values > 0.00000001)
```
Now, the estimated covariance matrix is of full rank (5).
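
The effect of the two assumptions on the rank can also be checked by hand. The sketch below averages a sample covariance matrix over the cyclic group generated by a permutation, using simulated data with an arbitrary seed. The helper `average_over_cyclic_group()` is hypothetical, written only for this illustration, and is not part of `gips`.

```r
# Average a symmetric matrix over the cyclic group generated by `perm`,
# where perm[i] is the image of i (hypothetical helper, not part of gips)
average_over_cyclic_group <- function(S, perm) {
  identity_perm <- seq_len(ncol(S))
  idx <- identity_perm
  total <- matrix(0, nrow(S), ncol(S))
  group_order <- 0
  repeat {
    total <- total + S[idx, idx]
    group_order <- group_order + 1
    idx <- perm[idx] # move on to the next power of the permutation
    if (all(idx == identity_perm)) break
  }
  total / group_order
}

# Count eigenvalues that are numerically positive
numeric_rank <- function(M) sum(eigen(M, symmetric = TRUE)$values > 1e-8)

set.seed(2022) # arbitrary seed
X_demo <- matrix(rnorm(4 * 5), nrow = 4) # 4 observations of 5 variables
S_demo <- cov(X_demo)

perm_345 <- c(1, 2, 4, 5, 3) # the 3-cycle (3,4,5)
perm_35 <- c(1, 2, 5, 4, 3)  # the transposition (3,5)

numeric_rank(S_demo)
numeric_rank(average_over_cyclic_group(S_demo, perm_345))
numeric_rank(average_over_cyclic_group(S_demo, perm_35))
```

With 4 observations, the unconstrained estimate has rank 3, averaging over $(3,4,5)$ gives full rank 5, and averaging over $(3,5)$ gives only rank 4, which matches the `n0` values discussed above.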


# Further reading

1. To learn more about the available optimizers in `find_MAP()` and how to use those, see `vignette("Optimizers", package="gips")` or its [pkgdown page](https://przechoj.github.io/gips/articles/Optimizers.html).
2. To learn more about the math and theory behind the `gips` package, see `vignette("Theory", package="gips")` or its [pkgdown page](https://przechoj.github.io/gips/articles/Theory.html).
3. For an in-depth analysis of the package performance, capabilities, and comparison with other packages, see the article "Learning permutation symmetries with gips in R" by `gips` developers Adam Chojecki, Paweł Morgen, and Bartosz Kołodziejek, available on [arXiv:2307.00790](https://arxiv.org/abs/2307.00790).

```{r options_back, include = FALSE}
options(old_options) # back to the original options
```