Shakespeare's complete works contain \( 884{,}421 \) [www.opensourceshakespeare.org/stats/] words. Assume that the average English word length is \( 5.1 \) characters, so the total number of characters is \( ≈ 4510547 \). Let a monkey be typing on a keyboard randomly (independent keystrokes). The keyboard has \( 30 \) characters, and the event that the \( n \)th block of \( 4510547 \) characters replicates Shakespeare's works is denoted \( E_n \). Clearly, \( ℙ(E_n) = 30^{-4510547} \), which is a positive constant, so \( ∑_n ℙ(E_n) = ∞ \). The blocks are disjoint, so the events \( E_n \) are independent, and by BC2, \( ℙ\{ E_n \text{ i.o.} \} = 1 \). That is, the monkey will almost surely replicate Shakespeare's works infinitely often!
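The same BC2 mechanism can be checked numerically at a toy scale (a sketch with hypothetical small parameters of my choosing — a 4-letter alphabet and a 3-character target, since \( 30^{-4510547} \) is hopelessly small to simulate): disjoint blocks are independent and each matches with the same positive probability, so matches keep accumulating.

```python
import random

random.seed(0)
alphabet = "abcd"          # toy stand-in for the 30-character keyboard
target = "abc"             # toy stand-in for Shakespeare's complete works
p = len(alphabet) ** -len(target)   # P(E_n) for each disjoint block

matches = 0
n_blocks = 100_000
for _ in range(n_blocks):
    block = "".join(random.choice(alphabet) for _ in range(len(target)))
    matches += block == target   # independent blocks, constant probability

# Since sum of P(E_n) diverges, matches keep occurring (BC2): expect ~ n * p.
print(f"matches: {matches}, expected ≈ {n_blocks * p:.0f}")
```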
Modes of convergence
- Sure convergence or pointwise convergence (pointless!)
- Complete convergence
- A.s. convergence
- \( L^p \) convergence
- Convergence in probability
- Weak* convergence, or convergence in distribution
- Vague convergence
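How these modes relate (the standard implication chain; none of the converses hold in general):
\[ \text{complete} ⟹ \text{a.s.} ⟹ \text{in probability} ⟹ \text{in distribution}, \qquad L^p \ (p ≥ 1) ⟹ \text{in probability}. \]
(Sure convergence trivially implies a.s. convergence, and convergence in distribution to a constant upgrades back to convergence in probability.)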
Important theorems
- (Markov inequality) For \( X ≥ 0 \) and \( a > 0 \), \( ℙ(X ≥ a) ≤ \frac{𝔼(X)}{a} \).
- Borel-Cantelli Lemmas (see above).
- (Laws of Large Numbers) Let \( (X_n) \) be a sequence of IID RVs with \( 𝔼|X_1| < ∞ \), and \( S_n = ∑_{i = 1}^n X_i \). Then
- (WLLN) \( S_n / n → 𝔼(X_1) \) in probability as \( n → ∞ \).
- (SLLN) \( S_n / n → 𝔼(X_1) \) a.s. as \( n → ∞ \).
- Example of the WLLN: Bernstein polynomials uniformly approximate continuous functions on \( [0, 1] \) (a probabilistic proof of the Weierstrass Approximation Theorem); see the sketch after this list.
- (Kolmogorov's 0-1 Law) Let \( (X_n) \) be a sequence of independent RVs. If \( E_T \) is a tail event \( (E_T ∈ ℱ_T = ⋂_{n ∈ ℕ} σ(X_n, X_{n+1}, …)) \), then \( ℙ(E_T) = 0 \) or \( ℙ(E_T) = 1 \).
- (Central Limit Theorem) Let \( (X_n) \) be a sequence of IID RVs with \( 𝔼(X_1) = μ \) and \( 𝕍(X_1) = σ^2 < ∞ \), and let \( S_n = ∑_{i = 1}^n X_i \). Then \( \frac{S_n - nμ}{σ \sqrt{n}} → N(0, 1) \) in distribution as \( n → ∞ \); a numerical check follows after this list.
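A minimal sketch of the Bernstein-polynomial example above (the function `bernstein` and the test function are my own illustration): \( B_n f(x) = 𝔼\, f(S_n / n) \) where \( S_n ∼ \mathrm{Bin}(n, x) \), and the WLLN makes \( S_n / n \) concentrate at \( x \), forcing \( B_n f → f \) uniformly on \( [0, 1] \).

```python
import math

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f at x in [0, 1]:
    B_n f(x) = E[f(S_n / n)] with S_n ~ Binomial(n, x)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

f = lambda t: abs(t - 0.5)  # continuous but not smooth at 1/2
for n in (10, 100, 1000):
    grid = [i / 200 for i in range(201)]
    err = max(abs(bernstein(f, n, t) - f(t)) for t in grid)
    print(f"n = {n:4d}   sup-norm error ≈ {err:.4f}")  # shrinks with n
```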
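And the numerical CLT check promised above (a sketch; the Uniform(0, 1) summands and sample sizes are arbitrary choices of mine): standardize \( S_n \) as \( (S_n - nμ)/(σ\sqrt{n}) \) and compare the sample with \( N(0, 1) \).

```python
import random
import statistics

random.seed(0)
n, trials = 200, 10_000
mu, sigma = 0.5, (1 / 12) ** 0.5   # mean and stdev of Uniform(0, 1)

# One standardized sum (S_n - n*mu) / (sigma * sqrt(n)) per trial.
z = [
    (sum(random.random() for _ in range(n)) - n * mu) / (sigma * n**0.5)
    for _ in range(trials)
]
print("sample mean  ≈", round(statistics.mean(z), 3))   # should be ≈ 0
print("sample stdev ≈", round(statistics.stdev(z), 3))  # should be ≈ 1
# Compare with Phi(1) - Phi(-1) ≈ 0.6827 for a standard normal.
print("P(|Z| <= 1)  ≈", sum(abs(v) <= 1 for v in z) / trials)
```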
Conditioning
- Motivation: \( ℙ(B | A) = \frac{ℙ(B ∩ A)}{ℙ(A)} \). But \( ℙ(A) \) may be zero!
- (Conditional expectation) Let \( (Ω, ℱ, ℙ) \) be a complete probability space (complete means that all subsets of measure-\( 0 \) sets are in \( ℱ \)), let \( X ∈ L_+^1(Ω, ℱ, ℙ) \) be a nonnegative integrable random variable, and let \( 𝒢 ⊆ ℱ \) be a sub-σ-algebra. On \( 𝒢 \), we define the measure induced by \( X \) as \( ℚ(A) = 𝔼(X 𝟙_A) \ ∀ A ∈ 𝒢 \). Then \( ℚ ≪ ℙ \), so by the Radon–Nikodym theorem there exists a \( 𝒢 \)-measurable function \( Y \), unique up to a.s. equality, such that \( 𝔼(Y 𝟙_A) = 𝔼(X 𝟙_A) \ ∀ A ∈ 𝒢 \). We denote \( 𝔼(X | 𝒢) = Y \). The general case \( X ∈ L^1(Ω, ℱ, ℙ) \) is handled by writing \( X = X_+ - X_- \).
- (Conditional probability) Define \( ℙ(E | 𝒢) = 𝔼(𝟙_E | 𝒢) \).
- Note: the conditional expectation (and hence the conditional probability) is a RV. Heuristically, it is the \( 𝒢 \)-measurable RV “closest” to the original RV; in this sense, the conditional expectation is like a projection. This can be seen from the idempotence property \( 𝔼(𝔼(X | 𝒢) | 𝒢) = 𝔼(X | 𝒢) \). In fact, if \( X ∈ L^2(Ω, ℱ, ℙ) \), then \( 𝔼(X | 𝒢) \) is indeed the orthogonal projection of \( X \) onto the subspace \( L^2(Ω, 𝒢, ℙ) \).
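A concrete special case worth recording (standard; sketched here for reference): if \( 𝒢 = σ(A_1, A_2, …) \) is generated by a countable partition of \( Ω \) with \( ℙ(A_i) > 0 \), then
\[ 𝔼(X | 𝒢) = ∑_i \frac{𝔼(X 𝟙_{A_i})}{ℙ(A_i)} \, 𝟙_{A_i} \quad \text{a.s.}, \]
and taking \( X = 𝟙_B \) recovers the elementary \( ℙ(B | A_i) = \frac{ℙ(B ∩ A_i)}{ℙ(A_i)} \) on each cell, connecting the general definition back to the motivation above.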
Stochastic processes
- A set of RVs indexed by an ordered set (e.g. \( ℕ \) or \( ℝ \)) is called a stochastic process.
- (Martingale) Let \( (Ω, ℱ, (ℱ_n), ℙ) \) be a filtered probability space (so \( ℱ_n ⊆ ℱ_{n+1} ⊆ ℱ \ ∀ n \)), and let \( (X_n) \) be a stochastic process such that \( X_n \) is \( ℱ_n \)-measurable \( ∀ n \) (i.e. \( (X_n) \) is adapted). Then \( (X_n) \) is called a martingale if \( ∀ n, X_n ∈ L^1 \) and \( 𝔼(X_{n+1} | ℱ_n) = X_n \) a.s.
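The simplest example (standard, recorded as a quick check of the definition): let \( (ξ_n) \) be IID with \( ℙ(ξ_n = 1) = ℙ(ξ_n = -1) = 1/2 \), let \( ℱ_n = σ(ξ_1, …, ξ_n) \), and let \( X_n = ∑_{i = 1}^n ξ_i \) (the symmetric simple random walk). Then \( X_n ∈ L^1 \), \( X_n \) is \( ℱ_n \)-measurable, and
\[ 𝔼(X_{n+1} | ℱ_n) = X_n + 𝔼(ξ_{n+1} | ℱ_n) = X_n + 𝔼(ξ_{n+1}) = X_n, \]
using that \( ξ_{n+1} \) is independent of \( ℱ_n \). So \( (X_n) \) is a martingale.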
Last update: 2020-07-21 17:41:22