Random Variables

$$F(x) = \mathbb{P}\{X \leq x\}, \quad x \in \mathbb{R}$$

\begin{aligned} \mathbb{P}\left\{x_{1}<X \leqslant x_{2}\right\} &=\mathbb{P}\left\{X \leqslant x_{2}\right\}-\mathbb{P}\left\{X \leqslant x_{1}\right\} \\ &=F\left(x_{2}\right)-F\left(x_{1}\right) . \end{aligned}

Cumulative Distribution Function (CDF)

$$F_X(x) = P(X \leq x)$$

A CDF must satisfy the following conditions:

1. $F$ is non-decreasing.
2. $F$ is normalized: $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to\infty} F(x) = 1$.
3. $F$ is right-continuous.

Probability Mass Function (PMF)

$f_X(x) = \mathbb{P}(X = x)$

Relationship between the PMF and the CDF:

$F_{X}(x)=\mathbb{P}(X \leq x)=\sum_{x_{i} \leq x} f_{X}\left(x_{i}\right)$

For a continuous random variable $X$ with density $f_X$,

$$\mathbb{P}(a<X<b)=\int_{a}^{b} f_{X}(x) \mathrm{d}x$$

$$F_{X}(x)=\int_{-\infty}^{x} f_{X}(t) \mathrm{d}t$$

[Example] The Uniform(0,1) distribution:

PDF:

$$f_{X}(x)= \begin{cases}1 & \text { for } 0 \leq x \leq 1 \\ 0 & \text { otherwise. }\end{cases}$$

CDF:

$$F_{X}(x)= \begin{cases}0 & x<0 \\ x & 0 \leq x \leq 1 \\ 1 & x>1\end{cases}$$


[Lemma] Let $F$ be the CDF of $X$. Then:

1. $\mathbb{P}(X=x)=F(x)-F\left(x^{-}\right)$, where $F\left(x^{-}\right)=\lim_{y \uparrow x} F(y)$.
2. $\mathbb{P}(x<X \leqslant y)=F(y)-F(x)$.
3. $\mathbb{P}(X>x)=1-F(x)$.
4. If $X$ is continuous, then
\begin{aligned} F(b)-F(a) &=\mathbb{P}(a<X<b)=\mathbb{P}(a \leqslant X<b) \\ &=\mathbb{P}(a<X \leqslant b)=\mathbb{P}(a \leqslant X \leqslant b) . \end{aligned}
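As a quick numerical sanity check of items 2 and 3, the following sketch (Python with numpy/scipy; the standard normal is an arbitrary choice) compares Monte Carlo estimates against the CDF formulas:

```python
import numpy as np
from scipy import stats

# Use a standard normal as an arbitrary continuous distribution.
F = stats.norm(loc=0, scale=1).cdf
a, b = -0.5, 1.2

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)

# Item 2: P(a < X <= b) = F(b) - F(a).
print(np.mean((samples > a) & (samples <= b)), F(b) - F(a))

# Item 3: P(X > a) = 1 - F(a).
print(np.mean(samples > a), 1 - F(a))
```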

[Definition] Let $F$ be the CDF of $X$. The inverse CDF, or quantile function, is

$$F^{-1}(q)=\inf \{x: F(x)>q\}$$

for $q \in [0,1]$, where $\inf$ denotes the infimum (greatest lower bound).

• $q = \frac{1}{4}$: first quartile
• $q = \frac{1}{2}$: second quartile (the median)
• $q = \frac{3}{4}$: third quartile
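In scipy.stats, the quantile function is exposed as the ppf method (percent point function); a minimal sketch:

```python
from scipy import stats

# ppf is scipy's name for the quantile function F^{-1}.
X = stats.norm(loc=0, scale=1)
for q in (0.25, 0.5, 0.75):
    print(q, X.ppf(q))
# The median of N(0,1) is 0; the quartiles are roughly -0.674 and 0.674.
```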

$X$ and $Y$ are equal in distribution if $F_X(x) = F_Y(x)$ for all $x$, written

$$X \stackrel{d}{=} Y$$

Equality in distribution does not imply $X = Y$. For example, if $X \sim N(0,1)$ and $Y = -X$, then $X \stackrel{d}{=} Y$ by symmetry, yet $\mathbb{P}(X = Y) = 0$.

Probability Density Function (PDF)

$$F_{X}(a)=\int_{-\infty}^{a} f_{X}(x)\, \mathrm{d}x, \quad \forall\, a \in \mathbb{R}$$

$$f_{X}(x)=\dfrac{\mathrm{d} F_{X}}{\mathrm{d} x}(x)$$

• $x$: a particular value in the range of the random variable $X$
• $F_X(x)$: the CDF of $X$ evaluated at $x$

Independent Random Variables

Two random variables $X$ and $Y$ are independent if, for every $A$ and $B$,

$$\mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A) \mathbb{P}(Y \in B)$$

We write $X \perp\!\!\!\perp Y$.

In principle, to check whether $X$ and $Y$ are independent we need to check the equation above for all subsets $A$ and $B$. Fortunately, we have the following result which we state for continuous random variables though it is true for discrete random variables too.

Theorem 3.30. Let $X$ and $Y$ have joint PDF $f_{X, Y}$. Then $X \perp\!\!\!\perp Y$ if and only if $f_{X, Y}(x, y) = f_X(x) f_Y(y)$ for all values $x$ and $y$.

The statement is not rigorous because the density is defined only up to sets of measure 0.

The following result is helpful for verifying independence.

Theorem 3.33. Suppose that the range of $X$ and $Y$ is a (possibly infinite) rectangle. If $f(x, y) = g(x) h(y)$ for some functions $g$ and $h$ (not necessarily probability density functions) then $X$ and $Y$ are independent.
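A small Monte Carlo sketch of the defining product rule (illustrating the definition above rather than the theorem itself): $X$ and $Y$ are drawn as independent exponentials, and the events $A$ and $B$ are arbitrary intervals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Independent draws, so P(X in A, Y in B) should factor.
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=1.0, size=n)

in_A = x < 1.0   # A = (-inf, 1)
in_B = y > 2.0   # B = (2, inf)

print(np.mean(in_A & in_B))           # P(X in A, Y in B)
print(np.mean(in_A) * np.mean(in_B))  # P(X in A) P(Y in B)
```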

Multivariate Distributions and IID Samples

Let $X = (X_1, \dots, X_n)$ be a random vector with PDF $f(x_1, \dots, x_n)$. Marginals, conditionals, etc. can be defined in much the same way as in the bivariate case. We say that $X_1, \dots, X_n$ are independent if, for every $A_1, \dots, A_n$,

$$\mathbb{P}(X_1 \in A_1, \dots, X_n \in A_n) = \prod_{i=1}^n \mathbb{P}(X_i \in A_i)$$

If, in addition, each $X_i$ has the same marginal CDF $F$ (with PDF $f$), we say that $X_1, \dots, X_n$ are IID (independent and identically distributed) and write $X_1, \dots, X_n \sim F$ or, in terms of the density, $X_1, \dots, X_n \sim f$. This means that $X_1, \dots, X_n$ are independent draws from the same distribution. We also call $X_1, \dots, X_n$ a random sample from $F$.
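Drawing a random sample is a one-liner in numpy; a sketch with $F$ taken to be Uniform(0,1):

```python
import numpy as np

rng = np.random.default_rng(42)

# X_1, ..., X_10 ~ Uniform(0, 1), IID: independent draws
# from the same distribution.
sample = rng.uniform(low=0.0, high=1.0, size=10)
print(sample)
```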

Common Distributions

Point Mass Distribution

$X$ has a point mass distribution at $a$, written $X \sim \delta_{a}$, if

$$\mathbb{P}(X = a) = 1$$

CDF:

$$F(x)= \begin{cases}0, & x<a \\ 1, & x \geqslant a .\end{cases}$$

Discrete Uniform Distribution

$X$ has a Discrete Uniform distribution on $\{1, \ldots, k\}$ if

$$f(x)= \begin{cases}1 / k, & x=1, \cdots, k \\ 0, & \text { elsewhere. }\end{cases}$$

Bernoulli Distribution (0-1 Distribution)

$X$ represents the outcome of a coin flip: $\mathbb{P}(X=1)=p$ and $\mathbb{P}(X=0)=1-p$ for some $p \in [0,1]$.

| $x$ | $0$ | $1$ |
| --- | --- | --- |
| $\mathbb{P}(X=x)$ | $1-p$ | $p$ |

Binomial Distribution

$X$ has a Binomial distribution with parameters $n$ and $p$, written $X \sim \operatorname{Binomial}(n, p)$, if its PMF $f(x)=\mathbb{P}(X=x)$ is

$$f(x)= \begin{cases}\binom{n}{x} p^{x}(1-p)^{n-x} & \text { for } x=0, \ldots, n \\ 0 & \text { otherwise }\end{cases}$$

If $X_{1} \sim \operatorname{Binomial}\left(n_{1}, p\right)$ and $X_{2} \sim \operatorname{Binomial}\left(n_{2}, p\right)$ are independent, then $X_{1}+X_{2} \sim \operatorname{Binomial}\left(n_{1}+n_{2}, p\right)$.

$$\mathbb{E}(X) = np \quad \mathbb{V}(X) = np(1-p)$$
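A simulation sketch of the additivity property and the moment formulas (parameters chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 10, 15, 0.3
size = 1_000_000

# Sum of independent Binomial(n1, p) and Binomial(n2, p) draws ...
s = rng.binomial(n1, p, size) + rng.binomial(n2, p, size)

# ... behaves like Binomial(n1 + n2, p):
print(s.mean(), (n1 + n2) * p)            # E(X) = np
print(s.var(), (n1 + n2) * p * (1 - p))   # V(X) = np(1 - p)
```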

Geometric Distribution

$X$ has a geometric distribution with parameter $p \in(0,1)$, written $X \sim \operatorname{Geom}(p)$, if

$$\mathbb{P}(X=k)=p(1-p)^{k-1}, \quad k \geq 1$$

We have that

$$\sum_{k=1}^{\infty} \mathbb{P}(X=k)=p \sum_{k=1}^{\infty}(1-p)^{k-1}=\frac{p}{1-(1-p)}=1$$

Think of $X$ as the number of flips needed until the first head when flipping a coin.

$\mathbb{E}(X) = \dfrac{1}{p}$
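A simulation sketch of $\mathbb{E}(X) = 1/p$; note that numpy's geometric sampler uses the same "trials until first success" convention as the PMF above:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.2

# Number of flips up to and including the first head,
# i.e. P(X = k) = p(1-p)^{k-1} for k >= 1.
draws = rng.geometric(p, size=1_000_000)
print(draws.mean(), 1 / p)   # both close to 5
```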

Poisson Distribution

$X$ has a Poisson distribution with parameter $\lambda$, written $X \sim \operatorname{Poisson}(\lambda)$, if

$$f(x)=\mathbb{P}(X=x)=e^{-\lambda} \frac{\lambda^{x}}{x !}, \quad x=0,1,2, \ldots$$

Note that

$$\sum_{x=0}^{\infty} f(x)=e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^{x}}{x !}=e^{-\lambda} e^{\lambda}=1$$

The Poisson is often used as a model for counts of rare events like radioactive decay and traffic accidents. If $X_{1} \sim \operatorname{Poisson}\left(\lambda_{1}\right)$ and $X_{2} \sim \operatorname{Poisson}\left(\lambda_{2}\right)$ are independent, then $X_{1}+X_{2} \sim \operatorname{Poisson}\left(\lambda_{1}+\lambda_{2}\right)$.
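A simulation sketch of the additivity property (rates chosen arbitrarily):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam1, lam2 = 2.0, 3.0
size = 1_000_000

# Sum of independent Poisson(2) and Poisson(3) draws.
s = rng.poisson(lam1, size) + rng.poisson(lam2, size)

# Empirical frequency of {S = 4} vs. the Poisson(5) PMF.
print(np.mean(s == 4), stats.poisson.pmf(4, lam1 + lam2))
```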

Uniform Distribution

$X$ has a Uniform $(a, b)$ distribution, written $X \sim$ Uniform $(a, b)$, if

$$f(x)= \begin{cases}\frac{1}{b-a} & \text { for } x \in[a, b] \\ 0 & \text { otherwise }\end{cases}$$

where $a<b$.

The distribution function is

$$F(x)= \begin{cases}0 & x<a \\ \frac{x-a}{b-a} & x \in[a, b] \\ 1 & x>b\end{cases}$$

Normal (Gaussian) Distribution

$X$ has a Normal (or Gaussian) distribution with parameters $\mu$ and $\sigma$, denoted by $X \sim N\left(\mu, \sigma^{2}\right)$, if

$$f(x)=\frac{1}{\sigma \sqrt{2 \pi}} \exp \left\{-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}\right\}, \quad x \in \mathbb{R}$$

(i) If $X \sim N\left(\mu, \sigma^{2}\right)$, then $Z=(X-\mu) / \sigma \sim N(0,1)$.
(ii) If $Z \sim N(0,1)$, then $X=\mu+\sigma Z \sim N\left(\mu, \sigma^{2}\right)$.
(iii) If $X_{i} \sim N\left(\mu_{i}, \sigma_{i}^{2}\right), i=1, \ldots, n$ are independent, then

$$\sum_{i=1}^{n} X_{i} \sim N\left(\sum_{i=1}^{n} \mu_{i}, \sum_{i=1}^{n} \sigma_{i}^{2}\right)$$

It follows from (i) that if $X \sim N\left(\mu, \sigma^{2}\right)$, then

\begin{aligned} \mathbb{P}(a<X<b) &=\mathbb{P}\left(\frac{a-\mu}{\sigma}<Z<\frac{b-\mu}{\sigma}\right) \\ &=\Phi\left(\frac{b-\mu}{\sigma}\right)-\Phi\left(\frac{a-\mu}{\sigma}\right) \end{aligned}
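The same computation in scipy, where $\Phi$ is norm.cdf (the parameters below are arbitrary):

```python
from scipy import stats

mu, sigma = 5.0, 2.0
a, b = 4.0, 8.0

# Directly: P(a < X < b) for X ~ N(mu, sigma^2).
direct = stats.norm.cdf(b, loc=mu, scale=sigma) - stats.norm.cdf(a, loc=mu, scale=sigma)

# Via standardization: Phi((b-mu)/sigma) - Phi((a-mu)/sigma).
phi = stats.norm.cdf
standardized = phi((b - mu) / sigma) - phi((a - mu) / sigma)

print(direct, standardized)   # identical
```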

Gamma Distribution

For $\alpha>0$, the Gamma function is defined as

$$\Gamma(\alpha)=\int_{0}^{\infty} x^{\alpha-1} e^{-x}\, \mathrm{d}x$$

Interpretation: the Gamma distribution models the waiting time until $n$ IID random events have all occurred.

Notation: $X \sim \Gamma(\alpha, \beta)$ (scale form) or $X \sim \Gamma(\alpha, \lambda)$ (rate form), where $\lambda = \frac{1}{\beta}$.

$$f(x)=\frac{x^{\alpha-1} \lambda^{\alpha} e^{-\lambda x}}{\Gamma(\alpha)}=\frac{x^{\alpha-1} e^{-x / \beta}}{\beta^{\alpha} \Gamma(\alpha)}, \quad x>0$$

The rate $\lambda = 1/\beta$ plays the same role as the rate $\lambda$ of a Poisson process, so $\beta$ is the scale parameter; $\alpha$ is called the shape parameter.
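In scipy, gamma takes the shape $\alpha$ as a and the scale $\beta$ via scale= (so a rate $\lambda$ must be passed as scale=1/λ); a sketch:

```python
from scipy import stats

alpha, lam = 3.0, 2.0    # shape and rate
beta = 1.0 / lam         # scipy expects the scale, not the rate

X = stats.gamma(a=alpha, scale=beta)
print(X.mean())    # E(X) = alpha * beta = 1.5
print(X.pdf(1.0))  # density at x = 1
```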

Exponential Distribution

$X$ has an Exponential distribution with rate $\lambda>0$, written $X \sim \operatorname{Exp}(\lambda)$, if

$$f(x)=\lambda e^{-\lambda x}, \quad x>0$$

The Exponential is the Gamma distribution with shape $\alpha=1$, i.e. $\operatorname{Exp}(\lambda)=\Gamma(1, \lambda)$ in the rate form.

Beta Distribution

$X$ has a Beta distribution with parameters $\alpha>0$ and $\beta>0$, denoted by $X \sim \operatorname{Beta}(\alpha, \beta)$, if

$$f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}, \quad 0<x<1$$
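A numeric sanity check that this density integrates to 1, using scipy.special.gamma for the normalizing constant ($\alpha$ and $\beta$ arbitrary):

```python
from scipy.integrate import quad
from scipy.special import gamma

alpha, beta = 2.5, 4.0

def f(x):
    # Beta(alpha, beta) density on (0, 1).
    const = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return const * x ** (alpha - 1) * (1 - x) ** (beta - 1)

integral, _ = quad(f, 0, 1)
print(integral)   # ~1.0
```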

$t$ and Cauchy Distribution

$X$ has a $t$ distribution with $\nu$ degrees of freedom, written $X \sim t_{\nu}$, if

$$f(x)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\left(1+\frac{x^{2}}{\nu}\right)^{(\nu+1) / 2}}$$

The $t$ distribution is similar to a Normal but it has thicker tails. In fact, the Normal corresponds to a $t$ with $\nu=\infty$. The Cauchy distribution is a special case of the $t$ distribution corresponding to $\nu=1$. The density is

$$f(x)=\frac{1}{\pi\left(1+x^{2}\right)}$$

To see that this is indeed a density:

\begin{aligned} \int_{-\infty}^{\infty} f(x)\, d x &=\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{d x}{1+x^{2}}=\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{d \tan ^{-1}(x)}{d x}\, d x \\ &=\frac{1}{\pi}\left[\tan ^{-1}(\infty)-\tan ^{-1}(-\infty)\right]=\frac{1}{\pi}\left[\frac{\pi}{2}-\left(-\frac{\pi}{2}\right)\right]=1 \end{aligned}
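The same check numerically, with scipy.integrate.quad over the whole real line:

```python
import numpy as np
from scipy.integrate import quad

# Cauchy density.
def f(x):
    return 1.0 / (np.pi * (1.0 + x ** 2))

integral, _ = quad(f, -np.inf, np.inf)
print(integral)   # ~1.0
```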

$\chi^2$ Distribution

$X$ has a $\chi^{2}$ distribution with $p$ degrees of freedom, written $X \sim \chi_{p}^{2}$, if

$$f(x)=\frac{1}{\Gamma(p / 2) 2^{p / 2}} x^{(p / 2)-1} e^{-x / 2}, \quad x>0$$

If $Z_{1}, \ldots, Z_{p}$ are independent standard Normal random variables, then $\sum_{i=1}^{p} Z_{i}^{2} \sim \chi_{p}^{2}$.
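A simulation sketch of this fact ($p = 3$ chosen arbitrarily):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = 3

# Sum of squares of p independent standard normals ...
z = rng.standard_normal((1_000_000, p))
s = (z ** 2).sum(axis=1)

# ... matches chi^2 with p degrees of freedom:
print(s.mean(), p)                                   # E(chi^2_p) = p
print(np.mean(s <= 2.0), stats.chi2.cdf(2.0, df=p))  # CDF at x = 2
```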