Using Harris Inequality to Bound Intractable Integrals

When modelling the properties of a system, you may find yourself needing to integrate over an impractically large set. For example, $f\colon \{0, 1\}^n \to \{0, 1\}$ may map the status of every node in a system to a single “up” or “down” signal, and we want to compute uptime by integrating $f$ over $\{0,1\}^n$ with respect to a measure $\mu\colon \{0, 1\}^n \to [0, 1]$.

To avoid this, we can find alternatives to $f$ and $\mu$ that let us compute uptime over a set of size $n + 1$ (the possible numbers of “up” nodes) rather than one of size $2^n$. The new functions $f^*(m)$ and $\mu^*(m)$ approximate, respectively, the probability that the system is “up” given that $m$ nodes are up, and the probability that $m$ nodes are up.

$$\int_{\{0,1\}^n} f(s)\,d\mu(s) \quad\longrightarrow\quad \int_0^n f^*(m)\,d\mu^*(m)$$
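To make the reduction concrete, here is a minimal Python sketch under two assumptions that are mine, not the post's: every node is up independently with the same probability `p`, and the system's status depends only on how many nodes are up (a hypothetical majority rule). Under those assumptions $\mu^*$ is the Binomial$(n, p)$ mass function and the reduction is exact, so summing over $m = 0, \dots, n$ (the discrete analogue of the integral above) matches the brute-force sum over all $2^n$ states. In general, $f^*$ and $\mu^*$ are only approximations.

```python
from itertools import product
from math import comb


def brute_force_uptime(f, n, p):
    """Sum f(s) * mu(s) over all 2**n states; only feasible for small n."""
    total = 0.0
    for s in product((0, 1), repeat=n):
        up = sum(s)
        # Assumption: nodes are independent and each is up with probability p.
        weight = p**up * (1 - p) ** (n - up)
        total += f(s) * weight
    return total


def reduced_uptime(f_star, n, p):
    """Sum f*(m) * mu*(m) over m = 0..n, with mu* the Binomial(n, p) pmf."""
    return sum(f_star(m) * comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1))


# Hypothetical system: "up" exactly when a strict majority of nodes is up.
n, p = 12, 0.9


def f(s):
    return 1 if 2 * sum(s) > n else 0


def f_star(m):
    return 1 if 2 * m > n else 0


print(brute_force_uptime(f, n, p), reduced_uptime(f_star, n, p))
```

For `n = 12` the brute-force loop visits 4096 states while the reduced sum visits 13, and the gap grows exponentially with `n`.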

Computationally this should be a massive improvement, but the reduction doesn’t guarantee that $f^*$ is “easy” to integrate with respect to $\mu^*$. I recently ran into such a situation and found the Harris inequality to be a useful tool for bounding the quantity I was after. For non-decreasing functions $f$ and $g$ and a probability measure $\mu$ on $\mathbb{R}$:

$$\int_{\mathbb{R}} f(x)\,g(x)\,d\mu(x) \;\geq\; \int_{\mathbb{R}} f(x)\,d\mu(x) \int_{\mathbb{R}} g(x)\,d\mu(x)$$
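A quick numerical sanity check of the inequality, using a discrete probability measure (random points with normalised random weights) as a stand-in for $\mu$. For such a measure the gap is $\mathbb{E}[fg] - \mathbb{E}[f]\,\mathbb{E}[g] = \tfrac{1}{2}\sum_{i,j} p_i p_j \big(f(x_i) - f(x_j)\big)\big(g(x_i) - g(x_j)\big)$, which is non-negative whenever $f$ and $g$ move in the same direction. The functions `f` and `g` are arbitrary non-decreasing examples; negating `g` makes it non-increasing, which flips the inequality.

```python
import random


def harris_sides(f, g, points, weights):
    """Return (E[fg], E[f] * E[g]) under the discrete measure with weight w_i at x_i."""
    total = sum(weights)
    probs = [w / total for w in weights]  # normalise so mu is a probability measure
    e_fg = sum(q * f(x) * g(x) for q, x in zip(probs, points))
    e_f = sum(q * f(x) for q, x in zip(probs, points))
    e_g = sum(q * g(x) for q, x in zip(probs, points))
    return e_fg, e_f * e_g


random.seed(0)
points = [random.uniform(-5, 5) for _ in range(1000)]
weights = [random.random() for _ in points]


def f(x):
    return x**3  # non-decreasing


def g(x):
    return min(x, 1.0)  # non-decreasing


lhs, rhs = harris_sides(f, g, points, weights)
print(lhs >= rhs)  # both non-decreasing: E[fg] >= E[f]E[g]

lhs_rev, rhs_rev = harris_sides(f, lambda x: -g(x), points, weights)
print(lhs_rev <= rhs_rev)  # one non-increasing: the inequality reverses
```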

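Applying this to our setting gives two useful bounds. In what follows I assume that $f^*$ is non-decreasing, that $\mu^*$ has a density (which I also write $\mu^*$, so $d\mu^*(x) = \mu^*(x)\,dx$), and that this density is increasing on $[0, cn)$ and decreasing on $(cn, n]$. On each piece, apply the inequality with $\mu$ taken to be the uniform probability measure on that interval. On $[0, cn]$ both $f^*$ and $\mu^*$ are non-decreasing, which gives a lower bound; on $[cn, n]$ the density is non-increasing, so the inequality reverses and gives an upper bound.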

$$\int_{0}^{cn} f^*(x)\,d\mu^*(x) \;\geq\; \frac{1}{cn}\left(\int_{0}^{cn} f^*(x)\,dx\right)\left(\int_{0}^{cn} \mu^*(x)\,dx\right)$$

$$\int_{cn}^{n} f^*(x)\,d\mu^*(x) \;\leq\; \frac{1}{n - cn}\left(\int_{cn}^{n} f^*(x)\,dx\right)\left(\int_{cn}^{n} \mu^*(x)\,dx\right)$$
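The two bounds can be checked numerically with a midpoint Riemann sum. The shapes below are hypothetical (a logistic `f_star` and an unnormalised Gaussian-shaped `mu_star` peaking at $x = cn$), picked only because they satisfy the monotonicity assumptions above; `n = 100` and `c = 0.6` are likewise arbitrary. Normalising `mu_star` would scale both sides of each inequality by the same constant, so it is skipped.

```python
import math


def integrate(h, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / steps
    return width * sum(h(a + (i + 0.5) * width) for i in range(steps))


# Hypothetical shapes, chosen only to satisfy the stated assumptions.
n, c = 100, 0.6
peak = c * n


def f_star(x):
    return 1 / (1 + math.exp(-(x - n / 2) / 5))  # non-decreasing


def mu_star(x):
    return math.exp(-((x - peak) ** 2) / (2 * 15**2))  # rises until x = cn, then falls


# Left piece: both non-decreasing, so the product of averages is a lower bound.
left_exact = integrate(lambda x: f_star(x) * mu_star(x), 0, peak)
left_bound = integrate(f_star, 0, peak) * integrate(mu_star, 0, peak) / peak

# Right piece: f* non-decreasing, mu* non-increasing, so it is an upper bound.
right_exact = integrate(lambda x: f_star(x) * mu_star(x), peak, n)
right_bound = integrate(f_star, peak, n) * integrate(mu_star, peak, n) / (n - peak)

print(left_bound <= left_exact, right_exact <= right_bound)
```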

For illustration’s sake (Desmos), consider a model of uptime approximated by the following equations (notice that both are increasing!). Don’t worry about where they come from; I am pulling them out of thin air. For a $k$-node system, the probability that $x$ nodes are up depends on some constant $c$. To check the uptime conditional on $k/2$ or more nodes being up, we can apply the result above to our uptime function and probability measure.
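The sketch below uses hypothetical stand-ins for $f^*$ and $\mu^*$, not the equations from the Desmos graph. It rests on one consequence of the first bound: if $f^*$ and $\mu^*$ are both non-decreasing on $[k/2, k]$, the same argument applied on that interval and divided by $\int_{k/2}^{k} \mu^*(x)\,dx$ gives

$$\frac{\int_{k/2}^{k} f^*(x)\,d\mu^*(x)}{\int_{k/2}^{k} \mu^*(x)\,dx} \;\geq\; \frac{2}{k}\int_{k/2}^{k} f^*(x)\,dx,$$

so the conditional uptime is at least the plain average of $f^*$ over $[k/2, k]$, and $\mu^*$ drops out of the bound entirely. The names `conditional_uptime` and `harris_lower_bound`, and the specific forms of `f_star` and `mu_star`, are illustrative assumptions.

```python
import math


def integrate(h, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / steps
    return width * sum(h(a + (i + 0.5) * width) for i in range(steps))


def conditional_uptime(f_star, mu_star, k):
    """Approximate P(system up | at least k/2 nodes up), treating mu_star as a density."""
    lo, hi = k / 2, k
    return integrate(lambda x: f_star(x) * mu_star(x), lo, hi) / integrate(mu_star, lo, hi)


def harris_lower_bound(f_star, k):
    """Average of f* over [k/2, k]; a valid lower bound when f* and mu* are non-decreasing there."""
    lo, hi = k / 2, k
    return integrate(f_star, lo, hi) / (hi - lo)


# Hypothetical stand-ins, both non-decreasing on [k/2, k].
k, c = 40, 3.0


def f_star(x):
    return 1 - math.exp(-x / 10)


def mu_star(x):
    return math.exp(c * x / k)


print(conditional_uptime(f_star, mu_star, k), harris_lower_bound(f_star, k))
```

The bound only needs one integral of $f^*$, which is the point when $\mu^*$ is the awkward part of the model.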

With all of this said, it’s unlikely that computing $\int f$ would be significantly less challenging for the computer than computing $\int fg$. I don’t really know; maybe one (1) dot product is just too many.