
\documentclass[11pt]{article}
%Gummi|065|=)
\title{\textbf{Draft: Notes on the Pareto distribution}}
\author{John Creighton\\
        Nobody else}
\date{}
\usepackage{amsmath}
\usepackage[pdftex]{graphicx}
\usepackage[utf8]{inputenc}
\usepackage{hyperref}
\begin{document}

\maketitle

\section{Introduction}

The Pareto distribution is a well-known model for the distribution of wealth and income, and it is sometimes used as a model for variations in productivity. However, for it to be an accurate model of productivity, income in the tail of the distribution (described by the survival function) must be proportional to productivity. This is a highly contentious claim, given that differences in income vastly exceed what could reasonably be attributed to individual productivity.

\section{The Survival Function and the Definition of the Pareto Distribution}

The principal characteristic of a Pareto distribution is that its survival function,

\begin{equation}
S(t)=P(\{T>t\})=\int _{t}^{\infty }f(u)\,du=1-F(t),
\end{equation}

is asymptotically a power law. For the Pareto distribution itself the power law holds exactly above a minimum value:

\begin{equation}
{\displaystyle {\overline {F}}(x)=\Pr(X>x)={\begin{cases}\left({\frac {x_{\mathrm {m} }}{x}}\right)^{\alpha }&x\geq x_{\mathrm {m} },\\1&x<x_{\mathrm {m} },\end{cases}}}
\end{equation}

where $x_\mathrm{m}$ is the minimum value of $x$ and acts as a scale parameter, and $\alpha$ is known as the ``Pareto index''. $\alpha$ is a ``shape parameter''%
\footnote{From \href{https://en.wikipedia.org/w/index.php?title=Shape_parameter&oldid=952709345}{Wikipedia} \cite{WikipediaShapeParameter}: ``Specifically, a shape parameter is any parameter of a probability distribution that is neither a location parameter nor a scale parameter (nor a function of either or both of these only, such as a rate parameter). Such a parameter must affect the shape of a distribution rather than simply shifting it (as a location parameter does) or stretching/shrinking it (as a scale parameter does).''}
\cite{WikipediaParetoDistribution}.


\includegraphics[width=0.9\textwidth]{Probability_density_function_of_Pareto_distribution.png}

\subsection{Applications of Power Law Distributions, the 80:20 Rule and the Matthew Effect}

 Power law distributions have applications in many fields, such as epidemiology (e.g. superspreaders
 \footnote{\href{https://en.wikipedia.org/w/index.php?title=Pareto_principle&oldid=967358099\#Other_applications}{Wikipedia(Pareto Principle)\#Other\_Applications} \cite{WikipediaParetoPrinciple}. Superspreaders: approximately 20\% of infected individuals are responsible for 80\% of transmissions \cite{Galvani2005}.})
 and networking
 \footnote{\href{https://en.wikipedia.org/w/index.php?title=Pareto_distribution&oldid=967068451\#Occurrence_and_applications}{Wikipedia(Pareto\_Distribution)\#Occurrence\_and\_applications}. File size distribution in TCP traffic \cite{Reed2004}.}
 (e.g. scale-free networks \footnote{Preferential attachment: ``\href{https://arxiv.org/abs/cond-mat/9910332}{Emergence of scaling in random networks}'' \cite{Barabasi1999}. Square root of time growth: \cite{Perc2014}.} \cite{Crovella1997} \cite{Guadamuz2011}).

 Such distributions are often characterized by the \href{https://en.wikipedia.org/w/index.php?title=Pareto_principle&oldid=964003639}{80:20} rule (also known as the Pareto principle \cite{WikipediaParetoPrinciple}), and the forces which lead to these types of distributions might be described as a ``\href{https://en.wikipedia.org/w/index.php?title=Matthew_effect&oldid=962195433}{Matthew effect}'', which is analogous to the principle ``the rich get richer and the poor get poorer''.

 Examples of the ``Matthew effect'' can be found in fields such as
 sociology (e.g. science funding
 \footnote{\href{https://en.wikipedia.org/w/index.php?title=Matthew_effect&oldid=962195433\#Examples}{Wikipedia(Matthew Effect)\#Examples} \cite{WikipediaMathewEffect}; ``\href{http://www.pnas.org/content/pnas/115/19/4887.full.pdf}{The Matthew Effect in Science Funding}'' \cite{Bol2018}.})
  and education (e.g. reading development \footnote{\href{https://en.wikipedia.org/w/index.php?title=Matthew_effect&oldid=962195433\#Examples}{Wikipedia(Matthew Effect)\#Examples} \cite{WikipediaMathewEffect}; ``Are There any Matthew Effects in Literacy and Cognitive Development?'' \cite{Kempe2011}; ``Beginning to Read: Thinking and Learning about Print'' \cite{Adams1990}.}).

\subsection{The CDF of the Pareto Distribution}

The survival function is the complement of the CDF (cumulative distribution function), and so represents its non-constant part. From it we can directly obtain the CDF of the Pareto distribution (\href{https://en.wikipedia.org/w/index.php?title=Survival_function&oldid=920310453#Definition}{wikipedia} \cite{wikipediaSurvivalFunction} \href{https://www.facebook.com/groups/280538506628903/permalink/281646936518060/}{fb}).

\begin{equation}
F_X(x) = \begin{cases}
1-\left(\frac{x_\mathrm{m}}{x}\right)^\alpha & x \ge x_\mathrm{m}, \\
0 & x < x_\mathrm{m}.\label{eq:CDF_pareto}
\end{cases}
\end{equation}

However, the survival function is more suitable for curve fitting than the CDF, both because it is linear in log space and because the constant part of the CDF is exact. In fact, the law was originally developed by linear regression of the wealth distribution in log space \cite{JulietteFournierSep2015} \cite{Persky1992} (verify).
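
To make the linearity explicit (a short worked step added for illustration, using the survival function given above): for $x \ge x_\mathrm{m}$, taking logarithms of $\overline{F}(x)=(x_\mathrm{m}/x)^{\alpha}$ gives
\begin{equation}
\log \overline{F}(x) = \alpha \log x_\mathrm{m} - \alpha \log x ,
\end{equation}
which is a straight line in $\log x$ with slope $-\alpha$ and intercept $\alpha \log x_\mathrm{m}$. A least-squares fit of $\log \overline{F}$ against $\log x$ would therefore give a rough estimate of $\alpha$; this is only a sketch of the idea, and other estimators may behave better in practice.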

\subsection{Empirical Derivation of the Pareto Principle}

In log space the Pareto distribution can be written in the following empirical form:

\begin{equation}
\log N = A - a \log x \label{eg:LogSpaceParetoDistribution}
\end{equation}
where $N$ is the number of households whose income is greater than $x$, and the slope $a$ corresponds to the Pareto index $\alpha$. If we are using the natural logarithm, then for notational convenience we define:

\begin{equation}
C=e^A
\end{equation}

$C$ is related to the scale parameter by normalizing%
\footnote{To normalize a distribution we divide by the total quantity (e.g. $N(x_\mathrm{m})$) in order to obtain either a probability distribution or an estimate of the probability distribution.}
equation \eqref{eg:LogSpaceParetoDistribution} and solving for $x_\mathrm{m}$. This yields:

\begin{equation}
{C \over N(x_\mathrm{m})}=\left(x_\mathrm{m}\right)^{\alpha}
\end{equation}

where $x_\mathrm{m}$ is the minimum value of $X$ and is typically called the ``scale parameter'' \cite{wikipediaSurvivalFunction}.
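
As a sketch of the intermediate steps (assuming the natural logarithm and identifying the empirical slope $a$ with $\alpha$): exponentiating equation \eqref{eg:LogSpaceParetoDistribution} gives the count as a power law, and requiring the normalized count to equal the Pareto survival function fixes the constant,
\begin{equation}
\begin{aligned}
N(x) &= e^{A} x^{-a} = C\,x^{-a}, \\
\frac{N(x)}{N(x_\mathrm{m})} &= \frac{C}{N(x_\mathrm{m})}\,x^{-\alpha} = \left(\frac{x_\mathrm{m}}{x}\right)^{\alpha}
\quad\Longrightarrow\quad \frac{C}{N(x_\mathrm{m})} = \left(x_\mathrm{m}\right)^{\alpha}.
\end{aligned}
\end{equation}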

\subsection{Interpretation of the Scale Parameter ($X_m$) }

In the case where the Pareto distribution is used to model income, $x_\mathrm{m}$ might be thought of as the minimum income required to sustain a worker, and should include indirect income such as gifts, charity and subsidies.

Such indirect income could take monetary form, or the form of some physical good or service (e.g. parental labour). For wealth, one would need to convert the above effective income into a present value based on the time value of money. This is somewhat problematic, because we don't know what a worker will earn in the future and the time value of money varies with context (e.g. credit score and earning potential).

\section{Is the Pareto Distribution "Tail Heavy"?}

\subsection{The Moments of the Pareto Distribution}

The Pareto distribution is defined for $\alpha>0$. This can be seen from its probability density function:

\begin{equation}
f_X(x)= \begin{cases} \frac{\alpha x_\mathrm{m}^\alpha}{x^{\alpha+1}} & x \ge x_\mathrm{m}, \\ 0 & x < x_\mathrm{m}. \end{cases}
\end{equation}

which is a power law distribution. If we try to integrate this pdf (probability density function), convergence at infinity requires $\alpha>0$, while convergence at the origin would require $\alpha<0$.

These two conditions cannot be simultaneously true, and this is why the Pareto distribution is defined only above the lower limit $x_\mathrm{m}$.
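
The role of the lower limit can be checked directly (a worked step added for illustration). Integrating the density from $x_\mathrm{m}$ upward gives
\begin{equation}
\int_{x_\mathrm{m}}^{\infty} \frac{\alpha x_\mathrm{m}^{\alpha}}{x^{\alpha+1}}\,dx = x_\mathrm{m}^{\alpha}\left[-x^{-\alpha}\right]_{x_\mathrm{m}}^{\infty} = 1 \quad \text{for } \alpha>0,
\end{equation}
whereas extending the same power law down to a small cutoff $\epsilon>0$ gives
\begin{equation}
\int_{\epsilon}^{x_\mathrm{m}} \frac{dx}{x^{\alpha+1}} = \frac{\epsilon^{-\alpha}-x_\mathrm{m}^{-\alpha}}{\alpha},
\end{equation}
which grows without bound as $\epsilon \to 0$ when $\alpha>0$.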

Moreover, for the $m$-th moment to be well defined we need $m < \alpha$ \cite{WikipediaPowerLaw}, which means that for the mean to be well defined we need $\alpha>1$.

In general the moments of the Pareto distribution are expressed as follows (the exponent $m$ is the order of the moment and is unrelated to the subscript in $x_\mathrm{m}$):

\begin{equation}
\langle x^{m} \rangle = \int_{x_\mathrm{m}}^{\infty} x^{m} f_X(x)\, dx = { \alpha \over \alpha - m } \left(x_\mathrm{m}\right)^m, \quad \text{where } m<\alpha
\end{equation}

From which one can derive:

\begin{equation}
{\displaystyle \operatorname {E} (X)={\begin{cases}\infty &\alpha \leq 1,\\{\frac {\alpha x_{\mathrm {m} }}{\alpha -1}}&\alpha >1.\end{cases}}}
\end{equation}

and
\begin{equation}
{\displaystyle \operatorname {Var} (X)={\begin{cases}\infty &\alpha \in (1,2],\\\left({\frac {x_{\mathrm {m} }}{\alpha -1}}\right)^{2}{\frac {\alpha }{\alpha -2}}&\alpha >2.\end{cases}}}
\end{equation}
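
The variance can be recovered from the general moment formula above (a consistency check added for illustration, valid for $\alpha>2$). Using the second moment $\alpha x_\mathrm{m}^{2}/(\alpha-2)$ and the mean $\alpha x_\mathrm{m}/(\alpha-1)$:
\begin{equation}
\operatorname{Var}(X) = \frac{\alpha x_\mathrm{m}^{2}}{\alpha-2} - \frac{\alpha^{2} x_\mathrm{m}^{2}}{(\alpha-1)^{2}}
= \alpha x_\mathrm{m}^{2}\,\frac{(\alpha-1)^{2}-\alpha(\alpha-2)}{(\alpha-2)(\alpha-1)^{2}}
= \left(\frac{x_\mathrm{m}}{\alpha-1}\right)^{2}\frac{\alpha}{\alpha-2},
\end{equation}
since $(\alpha-1)^{2}-\alpha(\alpha-2)=1$.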

\subsection{Typical values of $\alpha$ and the 80-20 rule}

Recall from the previous section that for the Pareto distribution to be normalizable we need $\alpha>0$, for the mean to be well defined we need $\alpha>1$, and for the variance to be well defined we need $\alpha>2$. Without some further limiting factor (e.g. an exponential cutoff), the Pareto distribution will always be somewhat tail heavy, because only moments of order less than $\alpha$ are defined.

Furthermore, not only are the higher-order moments not guaranteed to be defined, but even the first-order moment is heavily tail dependent. This is sometimes known as ``breaking the curve'', where a few exceptionally high values can play a large role in the mean.

The rule of thumb for a Pareto distribution is that 20\% of all people receive 80\% of all income. As given on Wikipedia with this rule we have,
\begin{equation}
{\displaystyle \alpha_{(80-20)} =\log _{4}5={\cfrac {\log _{10}5}{\log _{10}4}}\approx 1.161}
\end{equation}

As we will show later, this number is close to what one might infer from Oxfam data for wealth.
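
This value can be derived from the Lorenz curve of equation \eqref{eq:Lorenz_Curve} (a worked step added for illustration, assuming the rule is read as: the bottom 80\% of people receive 20\% of all income). Setting $L(0.8)=0.2$ gives
\begin{equation}
(0.2)^{1-\frac{1}{\alpha}} = 0.8
\quad\Longrightarrow\quad
1-\frac{1}{\alpha} = \frac{\ln 0.8}{\ln 0.2} = 1-\frac{\ln 4}{\ln 5}
\quad\Longrightarrow\quad
\alpha = \frac{\ln 5}{\ln 4} = \log_{4} 5 .
\end{equation}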


\subsection{Kurtosis}

Kurtosis plays an important role in determining how quickly various estimates of central moments converge, and is also a measure of how tail heavy a distribution is. The excess kurtosis of the Pareto distribution is (\href{https://en.wikipedia.org/w/index.php?title=Pareto_distribution&oldid=965348211#Relation_to_the_\%22Pareto_principle\%22}{from wikipedia} \cite{WikipediaParetoDistribution}):

\begin{equation}
\text{Excess kurtosis}=\frac{6(\alpha^3+\alpha^2-6\alpha-2)}{\alpha(\alpha-3)(\alpha-4)}\text{ for }\alpha>4
\end{equation}

The Pareto distribution has an excess kurtosis which is positive wherever it is defined. This type of distribution is referred to as \href{https://en.wikipedia.org/w/index.php?title=Kurtosis&oldid=965968832#Leptokurtic}{leptokurtic} \cite{WikipediaKurtosis} and is characterized by a fatter tail. Other examples of such distributions are Student's t-distribution, the Rayleigh distribution, the Laplace distribution, the Poisson distribution and the logistic distribution.
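
As a numerical illustration (the value $\alpha=5$ is chosen arbitrarily, only to satisfy $\alpha>4$), the formula above gives
\begin{equation}
\text{Excess kurtosis}\Big|_{\alpha=5}=\frac{6(125+25-30-2)}{5\cdot 2\cdot 1}=\frac{708}{10}=70.8 ,
\end{equation}
and for large $\alpha$ the ratio of the two cubic polynomials tends to one, so the excess kurtosis tends to $6$.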

\section{Lorenz curve} \label{LorenzCurve}

The Lorenz curve provides a good way to visualize inequality (\href{https://www.facebook.com/groups/280538506628903/permalink/285154029500684/}{facebook})(\href{https://en.wikipedia.org/w/index.php?title=Pareto_distribution&oldid=967068451#Lorenz_curve_and_Gini_coefficient}{wikipedia} \cite{WikipediaParetoDistribution} \cite{WikipediaLorenzCurve}).

\includegraphics[width=0.9\textwidth]{ParetoLorenzSVG.png}

The formal definition is:

\begin{equation}
L(F)=\frac{\int_{x_\mathrm{m}}^{x(F)}xf(x)\,dx}{\int_{x_\mathrm{m}}^\infty xf(x)\,dx} =\frac{\int_0^F x(F')\,dF'}{\int_0^1 x(F')\,dF'}
\end{equation}
where $x(F)$ is the inverse of the CDF. The inverse of a CDF is known as a quantile function \cite{WikipediaQuantileFunction} (see Section \ref{Sec_QuantileFunction}).

The CDF is given in equation \eqref{eq:CDF_pareto} and has the following inverse.

\begin{equation}
x(F)=\frac{x_\mathrm{m}}{(1-F)^{\frac{1}{\alpha}}} \label{eq:inv_cdf_pareto}
\end{equation}
where $F(x)=P(X<x)$, and the Lorenz curve is therefore:

\begin{equation}
L(F) = 1-(1-F)^{1-\frac{1}{\alpha}}. \label{eq:Lorenz_Curve}
\end{equation}

Rearranging:

\begin{equation}
F=1-(1-L(F))^{\alpha \over (\alpha-1)}
\end{equation}

From this we see that $1-F$ follows a power law in the variable $Y=1-L(F)$. We can also differentiate:

\begin{equation}
f(L)=dF/dL={\alpha \over (\alpha-1)}(1-L(F))^{1 \over (\alpha-1)}  \label{eq:Lorenz_Curveasdf}
\end{equation}

to get something like a pdf in terms of the Lorenz value.
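
The closed form \eqref{eq:Lorenz_Curve} follows from the definition by integrating the quantile function \eqref{eq:inv_cdf_pareto} (a sketch of the omitted step, valid for $\alpha>1$ so that the mean is finite):
\begin{equation}
\int_0^F x(F')\,dF' = x_\mathrm{m}\int_0^F (1-F')^{-\frac{1}{\alpha}}\,dF' = \frac{\alpha x_\mathrm{m}}{\alpha-1}\left[1-(1-F)^{1-\frac{1}{\alpha}}\right].
\end{equation}
Setting $F=1$ gives the denominator $\alpha x_\mathrm{m}/(\alpha-1)$, which is the mean, and the ratio of the two integrals is $1-(1-F)^{1-\frac{1}{\alpha}}$.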

\section{The log-logistic Function and the Type II Pareto Distribution}

\subsection{Pareto Distributions Type I to Type IV}
The Pareto distribution is a statement of the survival function. In this respect various functions are asymptotically equivalent to the Pareto function but differ for low values of the random variable.

\subsubsection{Type II Pareto Distributions and the Lomax Distribution}

Pareto considered several of these. The first is the Type II Pareto distribution (\href{https://en.wikipedia.org/w/index.php?title=Pareto_distribution&oldid=967068451#Pareto_types_I\%E2\%80\%93IV}{wikipedia} \cite{WikipediaParetoDistribution})(\href{https://www.facebook.com/groups/280538506628903/permalink/281084956574258/?comment_id=281090623240358}{fb}), which reduces to the Lomax distribution (\href{https://en.wikipedia.org/w/index.php?title=Lomax_distribution&oldid=958008294#Characterization}{wikipedia} \cite{WikipediaLomax})(\href{https://www.facebook.com/groups/280538506628903/permalink/281084956574258/}{fb}) when the parameter $\mu$ is zero.

\begin{equation}
{\displaystyle {\overline {F}}(x)=\Pr(X>x)=1-F(x) = \left[1+{\frac {x-\mu }{\sigma }}\right]^{-\alpha }}
\end{equation}

The Lomax distribution was referred to by Johnson \& Kotz (1970) \cite{Johnson1970} as a Pareto distribution of the second kind (\href{https://www.facebook.com/groups/280538506628903/permalink/281081976574556/}{facebook} \cite{Clark1999}). It has the following probability density function, in which the scale is written $\lambda$ rather than $\sigma$:

\begin{equation}
{\displaystyle {\displaystyle p(x)={{\alpha \lambda ^{\alpha }} \over {(x+\lambda )^{\alpha +1}}}}={\alpha  \over \lambda }\left[{1+{x \over \lambda }}\right]^{-(\alpha +1)},\qquad x\geq 0,}
\end{equation}

For $\mu=0$ and $\alpha>1$ the mean of the Type II Pareto distribution is given by:

\begin{equation}
 E[X]=\frac{ \sigma }{\alpha-1}
\end{equation}

and in general the raw moments (again for $\mu=0$, and for $-1<\delta<\alpha$) are:

\begin{equation}
 E[X^\delta]= \frac{ \sigma^\delta \Gamma(\alpha-\delta)\Gamma(1+\delta)}{\Gamma(\alpha)}
\end{equation}


where for positive integers (\href{https://en.wikipedia.org/w/index.php?title=Gamma_function&oldid=962235242}{wikipedia} \cite{WikipediaGammaFn}) the gamma function can be expressed as a factorial.
\begin{equation}
{\displaystyle \Gamma (n)=(n-1)!\ .}
\end{equation}
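
As a check of the moment formula (a worked step added for illustration, using the recurrence $\Gamma(z+1)=z\,\Gamma(z)$), setting $\delta=1$ and $\delta=2$ gives
\begin{equation}
E[X]=\frac{\sigma\,\Gamma(\alpha-1)\Gamma(2)}{\Gamma(\alpha)}=\frac{\sigma}{\alpha-1},
\qquad
E[X^{2}]=\frac{\sigma^{2}\,\Gamma(\alpha-2)\Gamma(3)}{\Gamma(\alpha)}=\frac{2\sigma^{2}}{(\alpha-1)(\alpha-2)} .
\end{equation}
The first agrees with the mean quoted above; the second requires $\alpha>2$.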

\subsubsection{Type III \& IV Pareto Distributions}

The Type IV Pareto distribution generalizes Types I through III as follows:

\begin{equation}
P(IV)(\sigma, \sigma, 1, \alpha) = P(I)(\sigma, \alpha),
\end{equation}
\begin{equation}
P(IV)(\mu, \sigma, 1, \alpha) = P(II)(\mu, \sigma, \alpha),
\end{equation}
\begin{equation}
P(IV)(\mu, \sigma, \gamma, 1) = P(III)(\mu, \sigma, \gamma).
\end{equation}

The survival function for the Type IV Pareto distribution is:

\begin{equation}
{\displaystyle {\overline {F}}(x)=P(X>x)=1-F(x) = \left[1+\left({\frac {x-\mu }{\sigma }}\right)^{1/\gamma }\right]^{-\alpha }} \label{eq:TypeIVParetoSurvival}
\end{equation}
where ${\displaystyle x\geq \mu }$, $\mu \in R$ and $\sigma, \gamma, \alpha > 0$.

It has the following raw moments (for $\mu=0$):

\begin{equation}
E[X^\delta]= \frac{\sigma^\delta\Gamma(\alpha-\gamma \delta)\Gamma(1+\gamma \delta)}{\Gamma(\alpha)}
\end{equation}

where $\alpha$ is the tail index, $\mu$ is the location, $\sigma$ is the scale and $\gamma$ is an inequality parameter.

\subsection{The Log-Logistic Distribution} \label{LogLogisticDist}

The cumulative distribution for the Type IV Pareto distribution can be written as:

\begin{equation}
{\displaystyle F(x)=1-{\overline F}(x)=P(X<x)= 1-\left[1+\left({\frac {x-\mu }{\sigma }}\right)^{1/\gamma }\right]^{-\alpha }} \label{eq:GenLogLogDist}
\end{equation}
rearranging
\begin{equation}
{\displaystyle F(x)= \frac{\left[ \sigma^{1/\gamma} + \left(x-\mu\right)^{1/\gamma } \right]^{\alpha}}{\left[ \sigma^{1/\gamma} + \left(x-\mu\right)^{1/\gamma } \right]^{\alpha}}- {\sigma^{\alpha / \gamma} \over \left[ \sigma^{1/\gamma} + \left(x-\mu\right)^{1/\gamma } \right]^{\alpha} } }
\end{equation}

and when $\alpha$ is close to one this is approximately

\begin{equation}
{\displaystyle F(x)= \frac{\left(x-\mu\right)^{\beta }}{ \sigma^{\beta} + \left(x-\mu\right)^{\beta }}} \label{eq:LogLogisticPRIII_CDF}
\end{equation}
where $\beta=\alpha/\gamma$. When $\mu=0$ this is the log-logistic distribution (\href{https://en.wikipedia.org/w/index.php?title=Log-logistic_distribution&oldid=943797276}{wikipedia} \cite{WikipediaLogLogisticDistribution}).

\subsubsection{The Case $\alpha=1$ Is Exact for the Type III Pareto Distribution}

Setting $\alpha=1$ above is not an approximation for a ``Type III'' Pareto distribution, whose survival function is exactly:

\begin{equation}
 \overline{F}(x)=1-F(x)={\displaystyle \left[1+\left({\frac {x-\mu }{\sigma }}\right)^{1/\gamma }\right]^{-1}}
\end{equation}

Also, for large values of $x$ the ``\emph{Type IV Pareto distribution}'' (equation \eqref{eq:TypeIVParetoSurvival}) reduces to a ``\emph{Type I Pareto distribution}''. With $\alpha=1$, $\gamma$ remains a free parameter to fit the tail behavior, which has the same power-law form for Pareto distributions of Types I to IV.

When $\gamma=1$, $\sigma=1$ and $\mu=0$ we get a special case of the Lomax distribution%
\footnote{If $X$ has a Lomax distribution with scale $\sigma$ and shape $\alpha$, then $\frac {X}{\sigma }\sim \beta ^{\prime }(1,\alpha )$ (\href{https://en.wikipedia.org/w/index.php?title=Lomax_distribution&oldid=958008294\#Relation_to_the_beta_prime_distribution}{Wikipedia} \cite{WikipediaLomax}).},
of the \href{https://en.wikipedia.org/wiki/Beta_prime_distribution}{beta prime distribution} and of the $F(2,2)$ distribution%
\footnote{The Lomax distribution with shape parameter $\alpha = 1$ and scale parameter $\sigma = 1$ has density $f(x)=\frac{1}{(1+x)^{2}}$, the same distribution as an \href{https://en.wikipedia.org/wiki/F-distribution}{F(2,2) distribution} (\href{https://en.wikipedia.org/w/index.php?title=Lomax_distribution&oldid=958008294\#Relation_to_the_F_distribution}{wikipedia} \cite{WikipediaLomax}).}.


\subsubsection{The Moments of the Log-Logistic Distribution}

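The log-logistic distribution has the following mean and variance. Here $\alpha$ denotes the scale parameter of the log-logistic distribution (written $\sigma$ in equation \eqref{eq:LogLogisticPRIII_CDF}), not the Pareto tail index: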

\begin{equation}
\operatorname {E}(X)=\alpha b/\sin b,\quad \beta >1, \; b=\pi /\beta
\end{equation}
\begin{equation}
\operatorname {Var}(X)=\alpha ^{2}\left(2b/\sin 2b-b^{2}/\sin ^{2}b\right),\quad \beta >2,  \; b=\pi /\beta
\end{equation}
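
As a quick illustration of the mean formula (the value $\beta=2$ is chosen only because it makes the arithmetic simple): with $\beta=2$ we have $b=\pi/2$ and $\sin b=1$, so
\begin{equation}
\operatorname{E}(X)=\frac{\alpha\,\pi}{2}\approx 1.571\,\alpha ,
\end{equation}
while the variance is not defined, since it requires $\beta>2$. In the opposite limit $\beta\to\infty$ we have $b\to 0$ and $b/\sin b\to 1$, so the mean approaches the scale parameter $\alpha$.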

\section{Derivation of Log-Type Distributions}

The log-logistic distribution is the probability distribution of a random variable whose logarithm has a logistic distribution. In general we can consider a random variable of the form:
\begin{equation}
X=e^{\mu +\sigma Z}
\end{equation}

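where $Z$ is a standardized random variable of a given type (e.g. logistic or normal) and $X$ is a variable whose logarithm has a distribution of that type, with location $\mu$ and scale $\sigma$. Stated formally, writing $\Phi$ for the CDF of $Z$ and $\varphi$ for its density:

\begin{equation}
{\displaystyle \ln(X)=\mu +\sigma Z,\qquad \Pr(Z\leq z)=\Phi (z),}
\end{equation}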

and it follows:

\begin{equation}
{\displaystyle {\begin{aligned}f_{X}(x)&={\frac {\rm {d}}{{\rm {d}}x}}\Pr(X\leq x)={\frac {\rm {d}}{{\rm {d}}x}}\Pr(\ln X\leq \ln x)={\frac {\rm {d}}{{\rm {d}}x}}\Phi \left({\frac {\ln x-\mu }{\sigma }}\right)\\[6pt]&=\varphi \left({\frac {\ln x-\mu }{\sigma }}\right){\frac {\rm {d}}{{\rm {d}}x}}\left({\frac {\ln x-\mu }{\sigma }}\right)=\varphi \left({\frac {\ln x-\mu }{\sigma }}\right){\frac {1}{\sigma x}}.\end{aligned}}}
\end{equation}

and if $\varphi$ is the standard normal density then
\begin{equation}
f_{X}(x)={\frac {1}{x}}\cdot {\frac {1}{\sigma {\sqrt {2\pi \,}}}}\exp \left(-{\frac {(\ln x-\mu )^{2}}{2\sigma ^{2}}}\right)
\end{equation}

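Alternatively, if $\varphi$ is the standard logistic density then, with scale $\alpha=e^{\mu}$ and shape $\beta=1/\sigma$ (here $\alpha$ is not the Pareto tail index),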

\begin{equation}
f(x;\alpha ,\beta )={\frac  {(\beta /\alpha )(x/\alpha )^{{\beta -1}}}{\left(1+(x/\alpha )^{{\beta }}\right)^{2}}}
\end{equation}
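
The logistic case can be verified as follows (a sketch added for illustration, assuming $Z$ is standard logistic with density $\varphi(z)=e^{z}/(1+e^{z})^{2}$, and writing $\alpha=e^{\mu}$ and $\beta=1/\sigma$, so that $\alpha$ here is a scale and not the Pareto index). With $z=(\ln x-\mu)/\sigma$ we have $e^{z}=(x/\alpha)^{\beta}$, and so
\begin{equation}
f_X(x)=\varphi(z)\,\frac{1}{\sigma x}
=\frac{\beta}{x}\,\frac{(x/\alpha)^{\beta}}{\left(1+(x/\alpha)^{\beta}\right)^{2}}
=\frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left(1+(x/\alpha)^{\beta}\right)^{2}} .
\end{equation}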

\section{The Quantile Function} \label{Sec_QuantileFunction}

\subsection{The Type III Pareto is Essentially log-logistic but slightly more general}
In Section \ref{LogLogisticDist} we showed that the Type III Pareto distribution is equivalent to the log-logistic distribution (equation \eqref{eq:LogLogisticPRIII_CDF}) when $\mu=0$. Also, when $\mu=0$ the Type IV Pareto distribution (equation \eqref{eq:TypeIVParetoSurvival}) will converge to the log-logistic distribution as $\alpha$ approaches 1.

All types (i.e. Types I to IV), as well as the log-logistic distribution, will converge to the ``\emph{Type I Pareto distribution}'' for large values of $X$. The Type III Pareto distribution can be obtained by setting $\alpha=1$ in the Type IV Pareto distribution, and one can pick $\gamma$ to match the tail behavior of any Type I Pareto distribution.

\subsection{The Quantile Function for the Type III Pareto Distribution}

For the following analysis we work with the cumulative distribution function given in equation \eqref{eq:LogLogisticPRIII_CDF}, since it is more general than the log-logistic distribution. We can invert this equation by solving for $x$ in terms of the value of the cumulative distribution function.

\begin{equation}
x=\mu + \left[ {\sigma^\beta F(x) \over 1 - F(x) } \right]^{\left( 1/\beta\right) }=\mu + \sigma \left[ {F(x) \over 1 - F(x) } \right]^{\left( 1/\beta\right) } \label{eq:inv_cdf_paretoIII}
\end{equation}

The result is the Quantile Function for the Type III Pareto distribution and if we set $\mu=0$ this is the Quantile Function for the log-logistic distribution (\href{https://en.wikipedia.org/w/index.php?title=Logistic_distribution&oldid=955853904#Quantile_function}{wikipedia} \cite{WikipediaQuantileFunction} )(\href{https://www.facebook.com/groups/280538506628903/permalink/280655966617157/}{facebook}).
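
The inversion can be spelled out as follows (a sketch of the algebra added for illustration, starting from equation \eqref{eq:LogLogisticPRIII_CDF} and writing $F$ for $F(x)$):
\begin{equation}
F\left[\sigma^{\beta}+(x-\mu)^{\beta}\right]=(x-\mu)^{\beta}
\quad\Longrightarrow\quad
(x-\mu)^{\beta}\,(1-F)=\sigma^{\beta}F
\quad\Longrightarrow\quad
x=\mu+\sigma\left[\frac{F}{1-F}\right]^{1/\beta} .
\end{equation}
In particular, setting $F=1/2$ gives the median $x=\mu+\sigma$.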

\subsection{The Asymptotic Quantile Function For Types I \& III Pareto Distributions}

The form of the Type III Pareto distribution quantile function \eqref{eq:inv_cdf_paretoIII} is noticeably different from the Type I Pareto distribution quantile function (i.e. equation \eqref{eq:inv_cdf_pareto}). The main distinguishing factor is that the $F(x)$ in the numerator is not present in the Type I version of the quantile function.

The asymptotic similarity can be shown by using a new variable $\epsilon = 1-F(x)$. With this substitution \eqref{eq:inv_cdf_paretoIII} becomes:

\begin{equation}
x=\mu + \sigma \left[ \frac{1}{\epsilon} -1 \right]^{\left( 1/\beta\right) }\cong \mu + \sigma\left[ \frac{1}{\epsilon} \right]^{\left( 1/\beta\right) } \label{eq:inv_cdf_paretoIIIapprox}
\end{equation}

When $\mu=0$, $\sigma=x_\mathrm{m}$ and $\beta=\alpha$ we get the same asymptotic result for both the Type I and Type III Pareto distributions.
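
The size of the neglected term can be estimated with a first-order expansion (added for illustration, valid for small $\epsilon$):
\begin{equation}
\left[\frac{1}{\epsilon}-1\right]^{1/\beta}=\epsilon^{-1/\beta}\,(1-\epsilon)^{1/\beta}\approx \epsilon^{-1/\beta}\left(1-\frac{\epsilon}{\beta}\right),
\end{equation}
so the relative error of the approximation is roughly $\epsilon/\beta$, which vanishes in the upper tail as $\epsilon\to 0$.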

\section{The Isograph}

If we rearrange the Quantile Function for the type III Pareto distribution

\begin{equation}
x^\prime=\frac{x-\mu}{\sigma}= \left[ {F(x) \over 1 - F(x) } \right]^{\left( 1/\beta\right)} \label{eq:inv_cdf_LogLogistic}
\end{equation}

we get what looks like the Quantile function of the log-logistic distribution in terms of a new variable $x^\prime$. The parameter $\mu$ can be thought of as a minimum value and $\sigma$ can be thought of as a scale parameter.

If we take the natural logarithm of each side we get:

\begin{equation}
\ln(x^\prime)=\ln \left( \frac{x-\mu}{\sigma} \right)= \ln \left( \left[ {F(x) \over 1 - F(x) } \right]^{\left( 1/\beta\right)} \right)=\left( \frac{1}{\beta}\right)\ln \left( {F(x) \over 1 - F(x) } \right) \label{eq:inv_cdf_LogLogistic_logspace}
\end{equation}

Using the notation of Chauvel (2014) \cite{Chauvel2014}, we can write this as:
\begin{equation}
\begin{aligned}
& \ln(m_i)=\alpha \ln(p_i/(1-p_i)) \\
& \text{or} \; M_i=\alpha X_i
\end{aligned}
\label{eq:LogitRankParetoChauvel}
\end{equation}
where $M_i=\ln(m_i)$, $X_i=\ln(p_i/(1-p_i))$ is the logit of the rank $p_i$, and the slope $\alpha$ in this notation plays the role of $1/\beta$ above.

This is still the log-space representation of the log-logistic quantile function and, indirectly (via a change of variables), of the Type III Pareto distribution. This log-space form of the distribution is known as the Champernowne-I–Fisk (CF) distribution, which can be written as $CF_\alpha$ or CF-I. It is a special case of the Champernowne-II (1937) four-parameter distribution \cite{Chauvel2014} (Fisk (1961) \cite{Chauvel1961}).

Chauvel (2014) \cite{Chauvel2014} replaced $\alpha$ by a function of $X_i$, so that for both very large and very small values of $X_i$ (i.e. the tails) the curve fits a Pareto distribution, and in the middle it matches the median. Stated mathematically,
\begin{equation}
M_i=ISO(X_i) X_i
\end{equation}
and by definition the ISO graph is the average slope (i.e. $\alpha=ISO(X_i)=\frac{M_i}{X_i}$) of \eqref{eq:LogitRankParetoChauvel}. When the ISO graph is interpolated via equation \eqref{eq:AGBISO}, we will call this the ABG curve (for alpha, beta, gamma) to be consistent with Chauvel (2014) \cite{Chauvel2014}.

To provide a smooth transition between the tails and the median, Chauvel (2014) \cite{Chauvel2014} used hyperbolic tangent functions:
\begin{equation}
\theta_1(X)=\tanh(X/2) \; \text{and} \; \theta_2(X)=\tanh^2(X/2)
\end{equation}

\includegraphics[width=0.7\textwidth]{Theta1and2_hyperbolicTangents.png}

Both of these functions gradually increase in magnitude from zero. Their half-sum and half-difference behave similarly, but only on one side of the $y$ axis, and tend to zero on the opposite side. This can be used to gradually phase in a function on opposite sides of the $y$ axis by defining the following functions.
\begin{equation}
B(X)=\frac{\theta_1(X)+\theta_2(X)}{2} \; and \; G(X)=\frac{-\theta_1(X)+\theta_2(X)}{2}
\end{equation}
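
Because $X$ is a logit, these functions take a simple form in terms of the rank (a worked identity added for illustration, assuming $X=\ln(p/(1-p))$ for a fractional rank $p$, as in the log-logit equation above). Since $\tanh(X/2)=(e^{X}-1)/(e^{X}+1)=2p-1$, we have
\begin{equation}
B=\frac{(2p-1)+(2p-1)^{2}}{2}=p\,(2p-1),
\qquad
G=\frac{-(2p-1)+(2p-1)^{2}}{2}=(1-p)(1-2p) .
\end{equation}
Both vanish at the median ($p=1/2$); $B\to 1$ and $G\to 0$ as $p\to 1$, while $B\to 0$ and $G\to 1$ as $p\to 0$.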

We can now define an interpolating function for $ISO(X_i)$ as a linear combination of $1$, $B(X_i)$ and $G(X_i)$ as follows:

\begin{equation}
ISO(X_i)=\alpha+\beta B(X_i)+\gamma G(X_i) \label{eq:AGBISO}
\end{equation}
which is a fit to our three interpolation points (i.e. the median and both tails) using a smooth transition.
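
The three interpolation points can be read off from the limits of $B$ and $G$ (a short check added for illustration, using $\tanh(0)=0$ and $\tanh(\pm\infty)=\pm 1$):
\begin{equation}
ISO(0)=\alpha,
\qquad
\lim_{X\to+\infty}ISO(X)=\alpha+\beta,
\qquad
\lim_{X\to-\infty}ISO(X)=\alpha+\gamma .
\end{equation}
Under this reading, $\alpha$ is the slope at the median, while $\beta$ and $\gamma$ measure the additional slope in the upper and lower tails respectively.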


\subsection{The Relation of B(X) and G(X) to AI Techniques}

 $B(X)$ and $G(X)$ are like fuzzy numbers which represent wealth greater than or less than the median, respectively. The smooth transitional properties of the hyperbolic tangent also make it a popular choice for a sigmoid function in neural networks.

\subsection{The Log-Logit Rank and the Median}

Equation \eqref{eq:inv_cdf_LogLogistic_logspace}, or equivalently equation \eqref{eq:LogitRankParetoChauvel} in Chauvel's notation, maps the logit-rank of a variable to the natural logarithm of a related variable. When constructing the ISO graph, this related variable is typically the original variable divided by the median.

The median provides a common point at which to compare curves. Dividing by the median also has the convenient property that the logarithm changes sign at the median, which facilitates the previously discussed interpolation.

\subsection{Example: Wealth Inequality Using the Log-Logit Rank}

A rank is simply a way of sorting data. The log-logit rank provides a one-to-one continuous mapping from percentiles, in a way that is linear for the Pareto distribution. For example, we can use it to compare the wealth distribution in America between 1992 and 2013. The steeper slope in 2013 indicates more inequality.

\includegraphics[width=1.2\textwidth]{LogitWealth.png}

\subsection{Example: ISO Graphs}

The linearity in the previous graph shows that a Pareto-type distribution is a good fit, but it does not show well the level of inequality at each income level. Such differences in inequality are better shown with the ISO graph, whose plots also show the differences between the tails and the median more clearly than we would get by trying to fit the exponent of a Pareto distribution. The following figure is from Chauvel (2014) \cite{Chauvel2014}.

\includegraphics[width=1.2\textwidth]{ISOGraph_Examples}

\subsection{Constructing the Isograph}


The isograph can be obtained by first subtracting $\mu$ from each side of equation \eqref{eq:inv_cdf_paretoIII} and then dividing by the median to yield a log-logit transformation of the Type III Pareto distribution.

This should be a straight line with slope $(1/\beta)$ and intercept $\frac{\sigma}{\sigma_{median}}=\sigma^\prime$.

As can be seen, this log-logit transformation produces a fairly straight line \cite{isographslide}.


Dividing the $Y$ variable by the logit rank \cite{Mishra2017} \cite{Chauvel2018} we get the isograph, which is a constant when the curve is a Type III Pareto distribution. The deviation from this constant represents unexpected inequality within a given group. Typically the logarithm used for this graph is the natural logarithm. The isograph essentially shows where the Type III Pareto distribution is not a good fit.



\begin{thebibliography}{9}
\bibitem{wikipediaSurvivalFunction}
Wikipedia, Survival function,
\url{https://en.wikipedia.org/w/index.php?title=Survival_function&oldid=920310453#Definition} (\href{https://www.facebook.com/groups/280538506628903/permalink/281646936518060/}{Facebook:281646936518060})

\bibitem{JulietteFournierSep2015}
Juliette Fournier (Sep 2015), Generalized Pareto curves: Theory and application using income and inheritance tabulations for France 1901-2012, \url{http://piketty.pse.ens.fr/files/Fournier2015.pdf} (\href{https://www.facebook.com/groups/280538506628903/permalink/281658899850197/}{Facebook:281658899850197})


\bibitem{Persky1992}
Persky, J. (1992). Retrospectives: Pareto’s Law. Journal of Economic Perspectives, 6(2):181–192.

\bibitem{WikipediaPowerLaw}
Wikipedia, Power law,
\url{https://en.wikipedia.org/w/index.php?title=Power_law&oldid=965150788#Power-law_probability_distributions}

\bibitem{WikipediaParetoDistribution}
Wikipedia, Pareto distribution,
\url{https://en.wikipedia.org/w/index.php?title=Pareto_distribution&oldid=965348211#Relation_to_the_\%22Pareto_principle\%22}

\bibitem{WikipediaKurtosis}
Wikipedia, Kurtosis,
\url{https://en.wikipedia.org/w/index.php?title=Kurtosis&oldid=965968832#Leptokurtic}

\bibitem{WikipediaParetoPrinciple}
Wikipedia, Pareto principle,
\url{https://en.wikipedia.org/w/index.php?title=Pareto_principle&oldid=964003639}

\bibitem{WikipediaLorenzCurve}
Wikipedia, Lorenz curve,
\url{https://en.wikipedia.org/w/index.php?title=Lorenz_curve&oldid=946106761}

\bibitem{WikipediaQuantileFunction}
Wikipedia, Logistic distribution (Quantile function),
\url{https://en.wikipedia.org/w/index.php?title=Logistic_distribution&oldid=955853904#Quantile_function}

\bibitem{WikipediaLomax}
Wikipedia, Lomax distribution,
\url{https://en.wikipedia.org/w/index.php?title=Lomax_distribution&oldid=958008294#Characterization}

\bibitem{Clark1999}
R. M. Clark (1999), Generalizations of power-law distributions applicable to sampled fault-trace lengths: model choice, parameter estimation and caveats, Geophys. J. Int. 136, 357--372

\bibitem{Johnson1970}
Johnson, N.L. and Kotz (1970). Continuous Univariate Distributions I, Houghton Mi, New York

\bibitem{WikipediaGammaFn}
Wikipedia, Gamma function,
\url{https://en.wikipedia.org/w/index.php?title=Gamma_function&oldid=962235242}

\bibitem{isographslide}
Slide 18 of \url{https://slideplayer.com/slide/14345914/}, Income and wealth distribution

\bibitem{Mishra2017}
Aswini Kumar Mishra, EXAMINING CHANGES IN THE LEVEL AND SHAPE OF INCOME DISTRIBUTIONS IN INDIA, 2005-2012
pg 5 of \url{https://pdfs.semanticscholar.org/58c2/c262230b2a150a1d4c68d1806d9cbe5e7212.pdf} for isograph definition

\bibitem{Chauvel2018}
Louis Chauvel (2018), Wealth as an Increasing Source of Inequality and Distortion in Income Groups
pg 5 of \url{http://www.iariw.org/copenhagen/chauvel.pdf} for isograph definition

\bibitem{Chauvel2014}
Louis Chauvel (2014), THE INTENSITY AND SHAPE OF INEQUALITY: THE ABG METHOD OF DISTRIBUTIONAL ANALYSIS
\url{https://www.gc.cuny.edu/CUNY_GC/media/LISCenter/2019\%20Inequality\%20by\%20the\%20Numbers/Instructor\%20Readings/Chauvel-1.pdf}

\bibitem{Chauvel1961}
Fisk, P. R., ``The Graduation of Income Distributions,'' Econometrica, 29, 171–85, 1961.

\bibitem{WikipediaMathewEffect}
Wikipedia, Matthew effect,
\url{https://en.wikipedia.org/w/index.php?title=Matthew_effect&oldid=962195433}

\bibitem{Crovella1997}
Crovella, Mark E.; Bestavros, Azer (December 1997). \href{https://www.cs.bu.edu/~crovella/paper-archive/self-sim/journal-version.pdf}{Self-Similarity in World Wide Web Traffic}: Evidence and Possible Causes (PDF). IEEE/ACM Transactions on Networking. 5. pp. 835–846.

\bibitem{Galvani2005}
Galvani, Alison P.; May, Robert M. (2005). "\href{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7095140/}{Epidemiology: Dimensions of superspreading}". Nature. 438 (7066): 293–295. Bibcode:2005Natur.438..293G. doi:10.1038/438293a. PMC 7095140. PMID 16292292.

\bibitem{Bol2018}
Bol, T.; de Vaan, M.; van de Rijt, A. (2018). "\href{http://www.pnas.org/content/pnas/115/19/4887.full.pdf}{The Matthew Effect in Science Funding}" (PDF). PNAS. 115 (19): 4887–4890. doi:10.1073/pnas.1719557115. PMC 5948972. PMID 29686094.

\bibitem{Reed2004}
Reed, William J.; et al. (2004). "The Double Pareto-Lognormal Distribution – A New Parametric Model for Size Distributions". Communications in Statistics – Theory and Methods. 33 (8): 1733–53. CiteSeerX 10.1.1.70.4555. doi:10.1081/sta-120037438.

\bibitem{Barabasi1999}
Barabási, A-L; Albert, R (1999). "Emergence of scaling in random networks". Science. 286 (5439): 509–512. arXiv:\href{https://arxiv.org/abs/cond-mat/9910332}{cond-mat/9910332}. Bibcode:1999Sci...286..509B. doi:10.1126/science.286.5439.509. PMID 10521342.

\bibitem{Perc2014}
Perc, Matjaž (2014). "The Matthew effect in empirical data". Journal of the Royal Society Interface. 12 (104): 20140378. arXiv:1408.5124. Bibcode:2014arXiv1408.5124P. doi:10.1098/rsif.2014.0378. PMC 4233686. PMID 24990288.

\bibitem{Guadamuz2011}
Guadamuz, Andres (2011). Networks, Complexity And Internet Regulation – Scale-Free Law. Edward Elgar. ISBN 9781848443105.

\bibitem{Kempe2011}
Kempe, C., Eriksson-Gustavsson, A. L., \& Samuelsson, S (2011). "Are There any Matthew Effects in Literacy and Cognitive Development?". Scandinavian Journal of Educational Research. 55 (2): 181–196. doi:10.1080/00313831.2011.554699.

\bibitem{Adams1990}
Adams, Marilyn J. (1990). Beginning to Read: Thinking and Learning about Print. Cambridge, MA: MIT Press. pp. 59–60.

\bibitem{WikipediaShapeParameter}
Wikipedia, Shape parameter,
\url{https://en.wikipedia.org/w/index.php?title=Shape_parameter&oldid=952709345}


\bibitem{WikipediaLogLogisticDistribution}
Wikipedia, Log-logistic distribution,
\url{https://en.wikipedia.org/w/index.php?title=Log-logistic_distribution&oldid=943797276}

\end{thebibliography}


\end{document}