
\documentclass[11pt]{article}
%Gummi|065|=)
\title{\textbf{Mask Statistical Analysis}}
\author{s243a\\
        No One Else}
\date{}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}

\usepackage{hyperref}

\usepackage[outputdir=/tmp]{minted}

\usepackage[most]{tcolorbox}

\newtcblisting{commandshell}{colback=black,colupper=white,colframe=yellow!75!black,
listing only,listing options={language=sh},
every listing line={\textcolor{red}{\small\ttfamily\bfseries DeathStar \$> }}}

\begin{document}

\maketitle

\section{First Section}

The Guardian gave the following absolute risks for contracting COVID-19:

13\% at $<1$\,m (no physical distancing)

3\% at $>1$\,m (physical distancing) \newline


The Guardian cited the following short literature review: \newline


\emph{C Raina MacIntyre, \href{https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31183-1/fulltext}{Physical distancing, face masks, and eye protection for prevention of COVID-19} - (\href{https://www.thelancet.com/action/showPdf?pii=S0140-6736\%2820\%2931183-1}{pdf}), \href{https://www.pearltrees.com/s243a/distancing-protection/id33469256}{pt} }  \newline


MacIntyre, in turn, cited the following paper:\newline

\emph{Derek K Chu, \href{https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31142-9/fulltext}{Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis} (\href{https://www.thelancet.com/action/showPdf?pii=S0140-6736\%2820\%2931142-9}{pdf}) (\href{https://www.pearltrees.com/s243a/distancing-transmission/id32761118}{pt})} \newline

The exact numbers from Chu were:


12.8\% at $<1$\,m (no physical distancing)

2.6\% at $>1$\,m (physical distancing) \newline


\textbf{Note:} in the paper these numbers are written with a raised decimal point, as 12·8\% and 2·6\% respectively.
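
As a rough orientation, and only as a sketch, the two quoted absolute risks can be compared directly. The ratio and difference below are crude figures computed from the rounded percentages above, not pooled estimates taken from Chu's paper:
\begin{equation}
\frac{2.6\%}{12.8\%} \approx 0.20, \qquad 12.8\% - 2.6\% = 10.2 \mbox{ percentage points}
\end{equation}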

The paper notes that the duration of exposure was not taken into account, although for most studies the duration was at least 1\,h. This is problematic if one is basing risk estimates on the independent action hypothesis (IAH), under which the probability of infection depends on the dose received and therefore, presumably, on how long the exposure lasts. I also doubt that anyone stayed at a fixed distance for over an hour, so I wonder how the distance values were assigned. Chu's paper is a meta-analysis, and some of the studies to which Chu assigned a distance of 0 may be better thought of as close contacts. For example, reference~46 of Chu's paper contains the following statement:

\begin{quote}``On inspection of the living quarters, the field team found that most of the windows in the bedrooms were closed and sealed and that ventilation within the bedrooms was poor. Initial open-ended interviews with some residents informed the study team that residents shared the same kitchen and dining room within the villa but did not typically eat together or share food at mealtimes. There were no designated social spaces; however, residents reported gathering around laptops to watch movies together.''\end{quote}

\url{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6759265/} \newline

I suppose that we can expect a bit of hand-waving in these types of observational studies, because of the ethical issues that would arise in trying to obtain this data experimentally. For now, therefore, let's accept the distance assignments given in ``Figure 2'' of Chu's paper, which is actually tabular data that gives:

1. the distance measures for mitigating against spread,

2. both the sample size and the number of people infected at the shorter distance (i.e. the baseline),

3. both the sample size and the number of people infected at the longer distance (i.e. the intervention, aka the mitigating measure),

4. the relative risk (RR).


The first study in this table seems to have ``0'' given for the ``further distance'', which doesn't make sense to me. For each study the calculation of the risk ratio is transparent. For instance, on the second line of the table, for the study ``Arwady et al (2016) [35]'', we have:

Events, further distance (n/N) = 1/10

Events, shorter distance (n/N) = 8/20


and the relative risk is $RR = (1/10)/(8/20) = 0.25$.


What is less apparent is how a relative risk is calculated for a combination of studies. Each study appears to have a weight, called the ``random weight'', and presumably these weights can be used to obtain a combined value for the relative risk (RR). Two ways that one might try to use these weights are as follows:

MATLAB Code
\definecolor{bg}{rgb}{0.95,0.95,0.95}
\begin{minted}[linenos=true,bgcolor=bg]{matlab}
% Risk ratios for the MERS physical distancing studies
RR=[0.05 0.25 0.72 0.59];
% Corresponding random weights for "RR"
W=[5.5 2.6 3.2 1.6];
% If the RR values were simply averaged using the weights (sum(W) is 12.9)
RR1=W*RR'/sum(W)
% If the weights were for "weighted least mean squares"
RR2=sqrt(W*RR'/(W*W'))
\end{minted}

Output
\begin{commandshell}
RR1 =  0.32349
RR2 =  0.28944
\end{commandshell}

Neither of these methods of calculating the combined risk ratio (RR) for the MERS studies reproduces the value of 0.23 from Chu's paper, although the method akin to weighted least squares comes closest. With weights equal to the inverse variances, weighted least squares is the minimum-variance linear unbiased estimator; in practice, however, the weights are not known a priori and must be estimated, which can introduce bias. In weighted least squares the weights are: \newline

1. the reciprocal of the variance for uncorrelated events or

2. the inverse of the covariance matrix  \newline
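
A third possibility, offered here only as a sketch, is to pool on the log scale, which is the usual form of an inverse-variance weighted average for ratio measures. It is an assumption that the ``random weight'' column can be used directly as the weights $w_i$; Chu's paper may compute its subgroup estimates differently:
\begin{equation}
\log(RR_{pooled}) = \frac{\sum_i w_i \log(RR_i)}{\sum_i w_i}
\end{equation}
With the four risk ratios and weights used in the MATLAB code above:
\begin{equation}
\log(RR_{pooled}) \approx \frac{5.5(-2.996) + 2.6(-1.386) + 3.2(-0.329) + 1.6(-0.528)}{12.9} \approx -1.70
\end{equation}
so that $RR_{pooled} \approx e^{-1.70} \approx 0.18$, which does not reproduce the value of 0.23 either.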


For each study the paper calculates a confidence interval for the risk ratio. Wikipedia gives the following formula.

\begin{equation}
CI_{1 - \alpha}(\log(RR)) = \log(RR)\pm SE(\log(RR))\times z_\alpha
\end{equation}

where:
\begin{equation}
SE(\log(RR)) = \sqrt{\frac{IN}{IE(IE + IN)} + \frac{CN}{CE(CE + CN)}}
\end{equation}

$IE$ = events in the intervention group

$IN$ = non-events in the intervention group

$CE$ = events in the control group

$CN$ = non-events in the control group \newline
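
As a worked example (a sketch, assuming the further-distance group is the intervention group and the shorter-distance group is the control group), the Arwady et al counts quoted earlier, 1/10 and 8/20, give $IE = 1$, $IN = 9$, $CE = 8$ and $CN = 12$, so that:
\begin{equation}
SE(\log(RR)) = \sqrt{\frac{9}{1 \times 10} + \frac{12}{8 \times 20}} = \sqrt{0.9 + 0.075} \approx 0.987
\end{equation}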

$z_\alpha$ is the standard score, given by:

\begin{equation}
z = {x- \bar{x} \over S}
\end{equation}

where:

$\bar{x}$ is the mean of the sample.

$S$ is the standard deviation of the sample. \newline

For a normal distribution the $z$ score of a 95\% confidence interval is approximately $\pm 2$ (more precisely $\pm 1.96$).

In general the form of the confidence interval (CI) given by Wikipedia looks correct to me, but the expression for the standard error (SE) didn't seem to reproduce the confidence intervals in Chu's paper. The formula given by Wikipedia is supposedly derived via the delta method; the source given for it is linked after the next equation.

The delta method provides the following rule for the variance:

\begin{equation}
Var(G(X)) = G'(\mu) Var(X)G'(\mu)^T
\end{equation}
\url{www.stata.com/support/faqs/statistics/delta-method/}

and the following confidence interval for a transformed variable:
\begin{equation}
\left[g^{-1}\left(g(B) - z \cdot se(g(B))\right),\; g^{-1}\left(g(B) + z \cdot se(g(B))\right)\right]
\end{equation}

\url{https://www.stata.com/support/faqs/statistics/delta-rule/}\newline

where $z$ is the ``standard score'' and denotes how many standard deviations are required to obtain a given confidence level.\newline
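
The two rules above can be tied back to the risk ratio with a brief sketch. It assumes a binomial model for each group and that the two groups are independent. Take $G(X) = \log(X)$, so that $G'(\mu) = 1/\mu$, and apply the variance rule to an observed proportion $\hat{p} = n/N$ with $\mu = p$ and $Var(\hat{p}) = p(1-p)/N$:
\begin{equation}
Var(\log(\hat{p})) \approx \frac{1}{p^2}\,\frac{p(1-p)}{N} = \frac{1-p}{Np}
\end{equation}
Substituting the estimate $p \approx n/N$ gives $(N-n)/(nN)$, which for the intervention group is $IN/(IE(IE+IN))$. Since $\log(RR)$ is the difference of the two log proportions, the variances of the two groups add, which yields the two terms under the square root in the expression for $SE(\log(RR))$.

For the interval itself, take $g = \log$, $B = RR$ and $z \approx 1.96$ for a 95\% interval. With the Arwady et al counts quoted earlier (1/10 and 8/20), $\log(0.25) \approx -1.386$ and the Wikipedia formula gives $se = \sqrt{0.975} \approx 0.987$, so that:
\begin{equation}
\left[e^{-1.386 - 1.96 \times 0.987},\; e^{-1.386 + 1.96 \times 0.987}\right] \approx \left[0.036,\; 1.73\right]
\end{equation}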

\textbf{TODO:} make this an image:

\url{https://en.wikipedia.org/wiki/File:The_Normal_Distribution.svg}

The motivation behind the transformation is that $\log(RR)$ is supposed to be closer to a normal distribution than the risk ratio itself. The mean and variance of the risk ratio (RR) are given respectively by:
\begin{equation}
E[RR] = \int {(n_1/N_1) \over (n_2/N_2)}B(n_1,p_1)B(n_2,p_2)\, dn_1\, dn_2
\end{equation}
\begin{equation}
Var(RR) = \int \left({(n_1/N_1) \over (n_2/N_2)}-E\left[{(n_1/N_1) \over (n_2/N_2)} \right] \right)^2B(n_1,p_1)B(n_2,p_2)\, dn_1\, dn_2
\end{equation}

where $B(n_1,p_1)$ and $B(n_2,p_2)$ are the binomial distributions for the intervention case and the baseline case respectively.\newline

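Each of these binomial distributions, for a group of $N$ people with event probability $p$ and $q = 1 - p$, has the following mean and variance:
\begin{equation}
\mu=Np
\end{equation}

\begin{equation}
\sigma^2=Npq
\end{equation}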
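
For illustration only, treating the observed proportion in the shorter-distance group of Arwady et al (8/20) as if it were the true event probability, which is an assumption, a group of 20 people with $p = 0.4$ would have:
\begin{equation}
\mu = 20 \times 0.4 = 8, \qquad \sigma^2 = 20 \times 0.4 \times 0.6 = 4.8
\end{equation}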

We can consider a variable substitution so that $n_1=\exp(x_1)$, $n_2=\exp(x_2)$ and $x_3=x_1-x_2$, which gives $\ln(RR) = x_3 + \ln(N_2/N_1)$.

\end{document}