AlphaCheatSheet: CFA Level I: Quantitative Methods - Statistical Concepts and Market Returns (Part 2)

In part 2 of this topic, we are going to cover the following items:
- Calculations with quantiles
- Measures of dispersion (range, mean absolute deviation, variance, standard deviation) and Chebyshev's inequality
- Coefficient of variation and the Sharpe ratio
- Skewness and kurtosis of distributions

1. Calculations with quantiles

Quantile is the general term for a value at or below which a stated proportion of the data in a distribution lies.

Quartile - the distribution is divided into quarters
Quintile - the distribution is divided into fifths
Decile - the distribution is divided into tenths
Percentile - the distribution is divided into hundredths (percents)

The position of the observation at a given percentile y, with n data points sorted in ascending order, is:

L_y = (n + 1)y/100

The following example, taken from the CFA Level I curriculum (2011), illustrates the concepts above. The table lists the dividend yields of 50 companies, sorted in ascending order.

No.	Company	Div Yield (%)	No.	Company	Div Yield (%)
1	AstraZeneca	0.00	26	UBS	2.65
2	BP	0.00	27	Tesco	2.95
3	Deutsche Telekom	0.00	28	Total	3.11
4	HSBC Holdings	0.00	29	GlaxoSmithKline	3.31
5	Credit Suisse Group	0.26	30	BT Group	3.34
6	L’Oreal	1.09	31	Unilever	3.53
7	SwissRe	1.27	32	BASF	3.59
8	Roche Holding	1.33	33	Santander Central Hispano	3.66
9	Munich Re Group	1.36	34	Banco Bilbao Vizcaya Argentaria	3.67
10	Assicurazioni Generali	1.39	35	Diageo	3.68
11	Vodafone Group	1.41	36	HBOS	3.78
12	Carrefour	1.51	37	E.ON	3.87
13	Nokia	1.75	38	Shell Transport and Co.	3.88
14	Novartis	1.81	39	Barclays	4.06
15	Allianz	1.92	40	Royal Dutch Petroleum Co.	4.27
16	Koninklijke Philips Electronics	2.01	41	Fortis	4.28
17	Siemens	2.16	42	Bayer	4.45
18	Deutsche Bank	2.27	43	DaimlerChrysler	4.68
19	Telecom Italia	2.27	44	Suez	5.13
20	AXA	2.39	45	Aviva	5.15
21	Telefonica	2.49	46	Eni	5.66
22	Nestle	2.55	47	ING Group	6.16
23	Royal Bank of Scotland Group	2.60	48	Prudential	6.43
24	ABN-AMRO Holding	2.65	49	Lloyds TSB	7.68
25	BNP Paribas	2.65	50	AEGON	8.14

a. Calculate the 10th and 90th percentiles.
b. Calculate the first, second, and third quartiles.
c. Find the median.

Answers

a. In this example, n = 50. We use the equation L_y = (n + 1)y/100 for the position of the yth percentile (P_y).

For the 10th percentile: L₁₀ = (50 + 1)(10/100) = 5.1

L₁₀ is between the 5th and 6th observations, with values X₅ = 0.26 (Credit Suisse Group) and X₆ = 1.09 (L’Oreal). The estimate of the 10th percentile (first decile) for the dividend yield is

P₁₀ ≈ X₅ + (L₁₀ – 5)(X₆ – X₅) = 0.26 + (5.1 – 5)(1.09 – 0.26) = 0.34%

For the 90th percentile: L₉₀ = (50 + 1)(90/100) = 45.9

L₉₀ is between the 45th and 46th observations, with values X₄₅ = 5.15 and X₄₆ = 5.66. The estimate of the 90th percentile is

P₉₀ ≈ X₄₅ + (L₉₀ – 45)(X₄₆ – X₄₅) = 5.15 + (45.9 – 45)(5.66 – 5.15) = 5.61%

Note: In the calculations above, L₁₀ = 5.1 means that the 10th percentile lies (5.1 – 5) = 0.1, or 10%, of the way from the 5th to the 6th observation. The distance between the 5th and 6th observations is 1.09 – 0.26 = 0.83, and 10% of that distance is 0.083. We obtain P₁₀ by adding this value (0.083) to the observation just below L₁₀ (i.e., X₅). The calculation for P₉₀ works the same way.

b. The first, second, and third quartiles correspond to P₂₅, P₅₀, and P₇₅, respectively.

L₂₅ = (50 + 1)(25/100) = 12.75

L₅₀ = (50 + 1)(50/100) = 25.50

L₇₅ = (50 + 1)(75/100) = 38.25

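Interpolating between adjacent observations in the same way as for the 10th and 90th percentiles gives:

P₂₅ = Q₁ ≈ X₁₂ + (L₂₅ – 12)(X₁₃ – X₁₂) = 1.51 + (12.75 – 12)(1.75 – 1.51) = 1.69%
P₅₀ = Q₂ ≈ X₂₅ + (L₅₀ – 25)(X₂₆ – X₂₅) = 2.65 + (25.50 – 25)(2.65 – 2.65) = 2.65%
P₇₅ = Q₃ ≈ X₃₈ + (L₇₅ – 38)(X₃₉ – X₃₈) = 3.88 + (38.25 – 38)(4.06 – 3.88) = 3.93%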

c. The median is the 50th percentile, 2.65%.
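The location-and-interpolation procedure used in answers a to c can be sketched in Python. This is a minimal illustration, assuming the data are already sorted in ascending order; the function name `percentile` is our own, and other conventions (for example, NumPy's default percentile method) can give slightly different values. Python's `statistics.quantiles` with its default "exclusive" method uses the same (n + 1)-based positions, so it serves as a cross-check for the quartiles.

```python
import statistics

# Dividend yields (%) from the table above, in ascending order (n = 50).
YIELDS = [
    0.00, 0.00, 0.00, 0.00, 0.26, 1.09, 1.27, 1.33, 1.36, 1.39,
    1.41, 1.51, 1.75, 1.81, 1.92, 2.01, 2.16, 2.27, 2.27, 2.39,
    2.49, 2.55, 2.60, 2.65, 2.65, 2.65, 2.95, 3.11, 3.31, 3.34,
    3.53, 3.59, 3.66, 3.67, 3.68, 3.78, 3.87, 3.88, 4.06, 4.27,
    4.28, 4.45, 4.68, 5.13, 5.15, 5.66, 6.16, 6.43, 7.68, 8.14,
]


def percentile(sorted_data, y):
    """Estimate the y-th percentile of ascending data.

    Uses the location L_y = (n + 1) * y / 100 and interpolates linearly
    between the observations just below and just above L_y.
    """
    n = len(sorted_data)
    location = (n + 1) * y / 100      # 1-based position L_y
    lower = int(location)             # position of the observation below L_y
    if lower < 1:                     # L_y falls before the first observation
        return sorted_data[0]
    if lower >= n:                    # L_y falls at or after the last observation
        return sorted_data[-1]
    x_low = sorted_data[lower - 1]    # X at position `lower` (0-based index lower - 1)
    x_high = sorted_data[lower]       # X at position `lower + 1`
    return x_low + (location - lower) * (x_high - x_low)


for y in (10, 25, 50, 75, 90):
    print(f"P{y} = {percentile(YIELDS, y):.3f}%")

# Cross-check: the default "exclusive" method of statistics.quantiles also
# places quantiles at (n + 1)-based positions, so these should match P25, P50, P75.
print(statistics.quantiles(YIELDS, n=4))
```

Printing three decimals shows the unrounded estimates (0.343, 1.690, 2.650, 3.925 and 5.609), which round to the two-decimal answers given above.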

2. Range, Mean Absolute Deviation, Variance, Standard Deviation, and Chebyshev's Inequality

The range is the difference between the largest and the smallest values in a data set:

range = max value – min value

The Mean Absolute Deviation (MAD) is the average of the absolute values of the deviations of individual observations from the arithmetic mean: MAD = Σ|Xᵢ – X̄|/n.

Population Variance (σ²) is the average of the squared deviations from the population mean μ: σ² = Σ(Xᵢ – μ)²/N, where N is the number of observations in the population.

Population Standard Deviation (σ) is the square root of the population variance and measures the dispersion of a set of data around its mean: the more spread out the data, the larger the standard deviation.

Example: Find the MAD, population variance, and population standard deviation of the following investment returns: [5%, 15%, 22%, 12%, 7%].

Mean = (5 + 15 + 22 + 12 + 7)/5 = 12.2%
MAD = (|5 – 12.2| + |15 – 12.2| + |22 – 12.2| + |12 – 12.2| + |7 – 12.2|)/5 = 5.04%

This result means that, on average, an individual return deviates by 5.04 percentage points (in either direction) from the mean return of 12.2%.

Variance = σ² = [(5 – 12.2)² + (7 – 12.2)² + (12 – 12.2)² + (15 – 12.2)²+ (22 – 12.2)²]/5 = 36.56 (%²)

Standard Deviation = σ = 6.05%
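The arithmetic of this example can be reproduced with a short Python sketch that treats the five returns as a population (dividing by n). The variable names are illustrative, and the range (22 – 5 = 17 percentage points) is computed as well for completeness.

```python
import math

# Investment returns (%) from the example, treated as a population.
returns = [5, 15, 22, 12, 7]
n = len(returns)

mean = sum(returns) / n
value_range = max(returns) - min(returns)                 # max value - min value
mad = sum(abs(r - mean) for r in returns) / n             # mean absolute deviation
pop_variance = sum((r - mean) ** 2 for r in returns) / n  # sigma^2, divide by n
pop_std = math.sqrt(pop_variance)                         # sigma

print(f"mean     = {mean:.2f}%")
print(f"range    = {value_range}")
print(f"MAD      = {mad:.2f}%")
print(f"variance = {pop_variance:.2f} (%^2)")
print(f"std dev  = {pop_std:.2f}%")
```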

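Sample variance (s²) is the measure of dispersion that applies when we evaluate a sample of n observations from a population. It is computed as s² = Σ(Xᵢ – X̄)²/(n – 1), where X̄ is the sample mean; dividing by n – 1 rather than n makes s² an unbiased estimator of the population variance.

Sample Standard Deviation (s) is the square root of the sample variance.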
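To see the effect of the n – 1 divisor, the sketch below treats the same five returns (5%, 15%, 22%, 12%, 7%) as a sample. It assumes Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) definitions, and compares them with the hand calculation. With n = 5 the sample variance is 182.8/4 = 45.7, noticeably larger than the population variance of 36.56.

```python
import statistics

# The same five returns (%), now treated as a sample from a larger population.
returns = [5, 15, 22, 12, 7]
n = len(returns)

mean = sum(returns) / n
sample_variance = sum((r - mean) ** 2 for r in returns) / (n - 1)  # s^2, divide by n - 1
sample_std = sample_variance ** 0.5                                 # s

print(f"s^2 = {sample_variance:.2f}, s = {sample_std:.2f}")
# The standard library's sample functions use the same n - 1 divisor.
print(f"statistics.variance = {statistics.variance(returns):.2f}, "
      f"statistics.stdev = {statistics.stdev(returns):.2f}")
```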

Chebyshev's Inequality states that for any set of observations, whether sample or population data and regardless of the shape of the distribution, the proportion of observations that lie within k standard deviations of the mean is at least 1 – 1/k² for all k > 1.

According to Chebyshev's Inequality, the following relationships hold for any distribution. At least:

36% of observations lie within ± 1.25 standard deviations of the mean
56% of observations lie within ± 1.50 standard deviations of the mean
75% of observations lie within ± 2 standard deviations of the mean
89% of observations lie within ± 3 standard deviations of the mean
94% of observations lie within ± 4 standard deviations of the mean

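Example: Find the minimum percentage of observations in any distribution that will lie within ± 2.5 standard deviations of the mean.

Solution: 1 – 1/2.5² = 1 – 1/6.25 = 1 – 0.16 = 0.84, so at least 84% of the observations lie within ± 2.5 standard deviations of the mean.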
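The Chebyshev bound is simple enough to tabulate directly. The helper below (a hypothetical name, `chebyshev_min_share`) reproduces the list of bounds above and the ± 2.5 standard deviation case; it rejects k ≤ 1, where the bound 1 – 1/k² gives no useful information.

```python
def chebyshev_min_share(k):
    """Lower bound on the share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is only informative for k > 1")
    return 1 - 1 / k**2


for k in (1.25, 1.5, 2, 2.5, 3, 4):
    print(f"within +/- {k} standard deviations: at least {chebyshev_min_share(k):.0%}")
```

Because the bound holds for every distribution, it is usually loose; for a normal distribution, for example, far more than 75% of observations lie within ± 2 standard deviations.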

3. Coefficient of Variation and Sharpe Ratio

The Coefficient of Variation (CV) is the ratio of the standard deviation of a data set to its mean (CV = s/X̄), so it measures dispersion relative to the mean. In investing, the CV shows how much volatility (risk) you assume per unit of the return you can expect from an investment.

Example: The mean monthly return on T-bills (commonly used as the risk-free rate) is 0.25% with a standard deviation of 0.36%, and the mean monthly return on the S&P 500 is 1.09% with a standard deviation of 7.3%. Calculate and interpret the CVs of these two investments.

CV_T-bills = 0.36/0.25 = 1.44

CV_S&P500 = 7.3/1.09 = 6.70

The results indicate that T-bills carry less dispersion (risk) per unit of mean monthly return than the S&P 500.
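A minimal Python version of the CV comparison, using the monthly figures (in percent) from the example. `coefficient_of_variation` is an illustrative helper name, and it assumes a positive mean, since the CV is hard to interpret when the mean is close to zero or negative.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean: dispersion per unit of mean return."""
    return std_dev / mean


# Monthly figures (%) from the example.
cv_tbills = coefficient_of_variation(0.36, 0.25)
cv_sp500 = coefficient_of_variation(7.3, 1.09)
print(f"CV T-bills = {cv_tbills:.2f}, CV S&P 500 = {cv_sp500:.2f}")
```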

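The Sharpe Ratio (reward-to-variability ratio) measures excess return per unit of risk: S_h = (R̄_p – R̄_F)/s_p, where R̄_p is the mean portfolio return, R̄_F the mean return on a risk-free asset, and s_p the standard deviation of portfolio returns. Portfolios with larger positive Sharpe ratios are preferred to those with smaller ratios.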
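The Sharpe ratio is usually computed as the mean portfolio return minus the mean risk-free return, divided by the standard deviation of portfolio returns. As an illustration only, the sketch below applies that definition to the monthly figures from the CV example, treating the T-bill mean return (0.25%) as the risk-free rate; `sharpe_ratio` is a hypothetical helper name. The result is roughly 0.115 per month for the S&P 500.

```python
def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Excess mean return per unit of total risk (standard deviation)."""
    return (mean_return - risk_free_rate) / std_dev


# Monthly figures (%) from the CV example; T-bills stand in for the risk-free rate.
sp500_sharpe = sharpe_ratio(mean_return=1.09, risk_free_rate=0.25, std_dev=7.3)
print(f"Monthly Sharpe ratio of the S&P 500: {sp500_sharpe:.3f}")
```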

Note: Limitations of the Sharpe Ratio

If two portfolios have negative Sharpe ratios, the one with the higher (less negative) Sharpe ratio does not necessarily have better risk-adjusted performance.
The Sharpe ratio is useful only when standard deviation is an appropriate measure of risk. Investment strategies with option-like characteristics have asymmetric return distributions (e.g., a large probability of small gains and a small probability of large losses). In such cases, standard deviation may underestimate risk and produce misleadingly high Sharpe ratios.

4. Skewness and Kurtosis

A distribution is symmetrical if it is shaped identically on both sides of its mean. For a return distribution, symmetry means that losses and gains of equal size occur with the same frequency.

Skewness refers to the extent to which a distribution is not symmetrical. It depends on the occurrence of outliers in the data set, that is, observations with extraordinarily large values, either positive or negative.

A positively skewed distribution is characterized by many outliers in the upper region (right tail).
A negatively skewed distribution has many outliers in the lower region (left tail).

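Skewness affects the relative location of the mean, median, and mode of a nonsymmetrical, unimodal distribution: in a positively skewed distribution, mode < median < mean, while in a negatively skewed distribution, mean < median < mode.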

Kurtosis is a measure of the degree to which a distribution is more or less "peaked" than a normal distribution.

Leptokurtic - more peaked than a normal distribution
Platykurtic - flatter than a normal distribution
Mesokurtic - same kurtosis as a normal distribution

The kurtosis of a normal distribution is 3. If a distribution has more or less kurtosis than the normal distribution, it is said to exhibit excess kurtosis.

Normal distribution has excess kurtosis = 0
Leptokurtic distribution has excess kurtosis > 0
Platykurtic distribution has excess kurtosis < 0

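To measure the skewness of a sample of n observations, where X̄ is the sample mean and s the sample standard deviation, a commonly used large-sample approximation is:

S_K ≈ (1/n) Σ(Xᵢ – X̄)³ / s³

Note: as a rule of thumb, if |S_K| > 0.5, the distribution has a significant level of skewness.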
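The sketch below computes sample skewness for the five returns used earlier (5%, 15%, 22%, 12%, 7%) and applies the |S_K| > 0.5 rule of thumb. It assumes the common large-sample approximation S_K ≈ (1/n) Σ(Xᵢ – X̄)³ / s³ with s the sample standard deviation; other references use slightly different small-sample adjustments, and with only five observations the result is purely illustrative. Here S_K comes out at about 0.29: positive, but not significant under the rule of thumb.

```python
import math


def sample_skewness(data):
    """Large-sample approximation S_K = (1/n) * sum((x - mean)^3) / s^3,
    where s is the sample standard deviation (n - 1 divisor)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return sum((x - mean) ** 3 for x in data) / n / s**3


returns = [5, 15, 22, 12, 7]  # illustrative returns (%) from the dispersion example
sk = sample_skewness(returns)
verdict = "significant" if abs(sk) > 0.5 else "not significant"
print(f"S_K = {sk:.2f} ({verdict} under the |S_K| > 0.5 rule of thumb)")
```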

Sample Kurtosis is commonly measured with the following large-sample approximation, where s is the sample standard deviation:

K ≈ (1/n) Σ(Xᵢ – X̄)⁴ / s⁴

The sample kurtosis is measured relative to the kurtosis of a normal distribution, which is 3.

Excess Kurtosis = Sample Kurtosis – 3

If excess kurtosis > 0, the distribution is leptokurtic (more peaked, fatter tails).

If excess kurtosis < 0, the distribution is platykurtic (less peaked, thinner tails).

Excess kurtosis greater than 1 in absolute value is considered large.
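A companion sketch for excess kurtosis, again on the five illustrative returns and assuming the large-sample form (1/n) Σ(Xᵢ – X̄)⁴ / s⁴ – 3. The helper names `sample_excess_kurtosis` and `describe_kurtosis` are hypothetical; the classification follows the thresholds above (the sign decides leptokurtic or platykurtic, and |excess| > 1 counts as large). These five returns give an excess kurtosis of about –1.78, i.e., platykurtic and large, though a sample this small says little about the underlying distribution.

```python
import math


def sample_excess_kurtosis(data):
    """Large-sample approximation: (1/n) * sum((x - mean)^4) / s^4 - 3,
    where s is the sample standard deviation (n - 1 divisor)."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    kurtosis = sum((x - mean) ** 4 for x in data) / n / s**4
    return kurtosis - 3


def describe_kurtosis(excess):
    """Label the shape by the sign of excess kurtosis and flag |excess| > 1 as large."""
    if excess > 0:
        shape = "leptokurtic"
    elif excess < 0:
        shape = "platykurtic"
    else:
        shape = "mesokurtic"
    size = "large" if abs(excess) > 1 else "not large"
    return shape, size


returns = [5, 15, 22, 12, 7]  # illustrative returns (%) from the dispersion example
excess = sample_excess_kurtosis(returns)
print(f"excess kurtosis = {excess:.2f}", describe_kurtosis(excess))
```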

Sunday, 9 September 2012
