Sunday, 9 September 2012

CFA Level I: Quantitative Methods - Statistical Concepts and Market Returns (Part 2)


In part 2 of this topic, we are going to cover the following items:
- Calculations with quantiles
- Measures of dispersion (range, mean absolute deviation, variance, standard deviation), Chebyshev's inequality, the coefficient of variation, and the Sharpe ratio
- Skewness and kurtosis of distributions

1. Calculations with quantiles

Quantile is the general term for a value at or below which a stated proportion of the data in a distribution lies.
  • Quartile - the distribution is divided into quarters
  • Quintile - the distribution is divided into fifths
  • Decile - the distribution is divided into tenths
  • Percentile - the distribution is divided into hundredths (percents) 


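For n data points sorted in ascending order, the position of the observation at a given percentile y is:
$$L_y = (n + 1)\frac{y}{100}$$
If $L_y$ is not a whole number, the percentile is estimated by linear interpolation between the observations on either side of that position.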

The following example is taken from the CFA Level I curriculum (2011) as an illustration of the concepts above.

No. | Company                         | Div Yield (%) | No. | Company                         | Div Yield (%)
1   | AstraZeneca                     | 0.00          | 26  | UBS                             | 2.65
2   | BP                              | 0.00          | 27  | Tesco                           | 2.95
3   | Deutsche Telekom                | 0.00          | 28  | Total                           | 3.11
4   | HSBC Holdings                   | 0.00          | 29  | GlaxoSmithKline                 | 3.31
5   | Credit Suisse Group             | 0.26          | 30  | BT Group                        | 3.34
6   | L’Oreal                         | 1.09          | 31  | Unilever                        | 3.53
7   | SwissRe                         | 1.27          | 32  | BASF                            | 3.59
8   | Roche Holding                   | 1.33          | 33  | Santander Central Hispano       | 3.66
9   | Munich Re Group                 | 1.36          | 34  | Banco Bilbao Vizcaya Argentaria | 3.67
10  | General Assicurazioni           | 1.39          | 35  | Diageo                          | 3.68
11  | Vodafone Group                  | 1.41          | 36  | HBOS                            | 3.78
12  | Carrefour                       | 1.51          | 37  | E.ON                            | 3.87
13  | Nokia                           | 1.75          | 38  | Shell Transport and Co.         | 3.88
14  | Novartis                        | 1.81          | 39  | Barclays                        | 4.06
15  | Allianz                         | 1.92          | 40  | Royal Dutch Petroleum Co.       | 4.27
16  | Koninklijke Philips Electronics | 2.01          | 41  | Fortis                          | 4.28
17  | Siemens                         | 2.16          | 42  | Bayer                           | 4.45
18  | Deutsche Bank                   | 2.27          | 43  | DaimlerChrysler                 | 4.68
19  | Telecom Italia                  | 2.27          | 44  | Suez                            | 5.13
20  | AXA                             | 2.39          | 45  | Aviva                           | 5.15
21  | Telefonica                      | 2.49          | 46  | Eni                             | 5.66
22  | Nestle                          | 2.55          | 47  | ING Group                       | 6.16
23  | Royal Bank of Scotland Group    | 2.60          | 48  | Prudential                      | 6.43
24  | ABN-AMRO Holding                | 2.65          | 49  | Lloyds TSB                      | 7.68
25  | BNP Paribas                     | 2.65          | 50  | AEGON                           | 8.14

a. Calculate the 10th and 90th percentiles.
b. Calculate the first, second, and third quartiles.
c. Find the median.

Answers

a. In this example n = 50. We use $L_y = (n + 1)\frac{y}{100}$ for the position of the y-th percentile ($P_y$).

For the 10th percentile: $L_{10} = (50 + 1)\frac{10}{100} = 5.1$
$L_{10}$ lies between the 5th and 6th observations, $X_5 = 0.26$ (Credit Suisse Group) and $X_6 = 1.09$ (L’Oreal). The estimate of the 10th percentile (first decile) of the dividend yield is
$$P_{10} \approx X_5 + (L_{10} - 5)(X_6 - X_5) = 0.26 + (5.1 - 5)(1.09 - 0.26) = 0.34\%$$

For the 90th percentile: $L_{90} = (50 + 1)\frac{90}{100} = 45.9$
$L_{90}$ lies between the 45th and 46th observations, $X_{45} = 5.15$ (Aviva) and $X_{46} = 5.66$ (Eni). The estimate of the 90th percentile is
$$P_{90} \approx X_{45} + (L_{90} - 45)(X_{46} - X_{45}) = 5.15 + (45.9 - 45)(5.66 - 5.15) = 5.61\%$$

Note: In the calculations above, $L_{10} = 5.1$ means that the 10th percentile lies (5.1 − 5) = 0.1, i.e. 10%, of the way from the 5th to the 6th observation. The distance between the 5th and 6th observations is 1.09 − 0.26 = 0.83, and 10% of that distance is 0.083. Adding 0.083 to the observation just below $L_{10}$ ($X_5 = 0.26$) gives $P_{10} \approx 0.343 \approx 0.34\%$. $P_{90}$ is calculated in exactly the same way.
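The interpolation rule above can be sketched in a few lines of Python. This is a minimal illustration, not an official CFA procedure: the names `percentile_position` and `percentile` are hypothetical, and clamping positions that fall before the 1st or after the n-th observation is an assumption the post does not discuss.

```python
import math

# Dividend yields (%) of the 50 companies in the table, sorted ascending.
DIV_YIELDS = [
    0.00, 0.00, 0.00, 0.00, 0.26, 1.09, 1.27, 1.33, 1.36, 1.39,
    1.41, 1.51, 1.75, 1.81, 1.92, 2.01, 2.16, 2.27, 2.27, 2.39,
    2.49, 2.55, 2.60, 2.65, 2.65, 2.65, 2.95, 3.11, 3.31, 3.34,
    3.53, 3.59, 3.66, 3.67, 3.68, 3.78, 3.87, 3.88, 4.06, 4.27,
    4.28, 4.45, 4.68, 5.13, 5.15, 5.66, 6.16, 6.43, 7.68, 8.14,
]


def percentile_position(n: int, y: float) -> float:
    """1-based position L_y = (n + 1) * y / 100 of the y-th percentile."""
    return (n + 1) * y / 100


def percentile(sorted_data: list[float], y: float) -> float:
    """Estimate P_y by linear interpolation between the observations around L_y."""
    n = len(sorted_data)
    pos = percentile_position(n, y)
    # Assumption: clamp to the smallest/largest observation outside positions 1..n.
    if pos <= 1:
        return sorted_data[0]
    if pos >= n:
        return sorted_data[-1]
    lower = math.floor(pos)          # 1-based index of the observation just below L_y
    frac = pos - lower               # share of the gap between X_lower and X_(lower+1)
    x_lo = sorted_data[lower - 1]    # convert 1-based positions to 0-based list indices
    x_hi = sorted_data[lower]
    return x_lo + frac * (x_hi - x_lo)


if __name__ == "__main__":
    for y in (10, 90):
        print(f"L{y} = {percentile_position(len(DIV_YIELDS), y):.2f}, "
              f"P{y} = {percentile(DIV_YIELDS, y):.2f}%")
```

Running it prints `L10 = 5.10, P10 = 0.34%` and `L90 = 45.90, P90 = 5.61%`, matching the hand calculation; calling `percentile(DIV_YIELDS, 25)` and so on gives the quartiles of part b.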

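b. The first, second, and third quartiles correspond to $P_{25}$, $P_{50}$, and $P_{75}$, respectively.
$L_{25} = (50 + 1)\frac{25}{100} = 12.75$
$L_{50} = (50 + 1)\frac{50}{100} = 25.50$
$L_{75} = (50 + 1)\frac{75}{100} = 38.25$
Interpolating in the same way as for the 10th and 90th percentiles, we obtain:
$P_{25} = Q_1 \approx X_{12} + 0.75(X_{13} - X_{12}) = 1.51 + 0.75(1.75 - 1.51) = 1.69\%$
$P_{50} = Q_2 \approx X_{25} + 0.50(X_{26} - X_{25}) = 2.65 + 0.50(2.65 - 2.65) = 2.65\%$
$P_{75} = Q_3 \approx X_{38} + 0.25(X_{39} - X_{38}) = 3.88 + 0.25(4.06 - 3.88) = 3.93\%$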

c. The median is the 50th percentile, 2.65%.
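As a cross-check, Python's standard-library `statistics.quantiles` (Python 3.8+) with `method="exclusive"` (its default) also places the cut points at position (n + 1)p and interpolates linearly, so it should reproduce the quartiles, deciles, and median above. A sketch, reusing the 50 dividend yields from the table:

```python
import statistics

# Dividend yields (%) of the 50 companies in the table.
DIV_YIELDS = [
    0.00, 0.00, 0.00, 0.00, 0.26, 1.09, 1.27, 1.33, 1.36, 1.39,
    1.41, 1.51, 1.75, 1.81, 1.92, 2.01, 2.16, 2.27, 2.27, 2.39,
    2.49, 2.55, 2.60, 2.65, 2.65, 2.65, 2.95, 3.11, 3.31, 3.34,
    3.53, 3.59, 3.66, 3.67, 3.68, 3.78, 3.87, 3.88, 4.06, 4.27,
    4.28, 4.45, 4.68, 5.13, 5.15, 5.66, 6.16, 6.43, 7.68, 8.14,
]

# n=4 returns the three cut points Q1, Q2, Q3; n=10 returns the nine deciles P10..P90.
quartiles = statistics.quantiles(DIV_YIELDS, n=4, method="exclusive")
deciles = statistics.quantiles(DIV_YIELDS, n=10, method="exclusive")
median = statistics.median(DIV_YIELDS)

print("Q1, Q2, Q3:", ", ".join(f"{q:.3f}" for q in quartiles))
print("P10, P90:", f"{deciles[0]:.3f}", f"{deciles[-1]:.3f}")
print("Median:", median)
```

Three decimals are printed on purpose: Q3 is 3.925, which sits exactly on a two-decimal rounding boundary, so floating-point rounding could display it as either 3.92 or 3.93.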

2. Range, Mean Absolute Deviation, Variance, Standard Deviation, and Chebyshev's Inequality

Range is the distance between the largest and the smallest value in a data set
range = max value – min value

The Mean Absolute Deviation (MAD) is the average of the absolute values of the deviations of individual observations from the arithmetic mean: $\text{MAD} = \frac{1}{n}\sum_{i=1}^{n}\lvert X_i - \bar{X}\rvert$
Population variance ($\sigma^2$) is the average of the squared deviations from the population mean $\mu$: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$
Population standard deviation ($\sigma$) measures the dispersion of a set of data around its mean: the more spread out the data, the higher the standard deviation. It is calculated as the square root of the variance.

Example: Find the MAD, variance, and standard deviation of the following investment returns: 5%, 15%, 22%, 12%, 7%.

Mean = (5 + 15 + 22 + 12 + 7)/5 = 12.2%
$\text{MAD} = (\lvert 5 - 12.2\rvert + \lvert 15 - 12.2\rvert + \lvert 22 - 12.2\rvert + \lvert 12 - 12.2\rvert + \lvert 7 - 12.2\rvert)/5 = 25.2/5 = 5.04\%$
This means that, on average, an individual return deviates from the mean return of 12.2% by 5.04 percentage points.
$\sigma^2 = [(5 - 12.2)^2 + (7 - 12.2)^2 + (12 - 12.2)^2 + (15 - 12.2)^2 + (22 - 12.2)^2]/5 = 182.8/5 = 36.56\ (\%^2)$
$\sigma = \sqrt{36.56} \approx 6.05\%$
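The same arithmetic as a short Python sketch (variable names are illustrative). It also computes the range defined earlier, and the variance divides by n because the five returns are treated as the whole population:

```python
import math

returns = [5.0, 15.0, 22.0, 12.0, 7.0]  # investment returns in %

n = len(returns)
data_range = max(returns) - min(returns)                   # max value - min value
mean = sum(returns) / n                                    # arithmetic mean
mad = sum(abs(r - mean) for r in returns) / n              # mean absolute deviation
pop_variance = sum((r - mean) ** 2 for r in returns) / n   # population variance (divide by n)
pop_std = math.sqrt(pop_variance)                          # population standard deviation

print(f"range    = {data_range:.2f}%")
print(f"mean     = {mean:.2f}%")
print(f"MAD      = {mad:.2f}%")
print(f"variance = {pop_variance:.2f} (%^2)")
print(f"std dev  = {pop_std:.2f}%")
```

It prints a range of 17.00%, a mean of 12.20%, a MAD of 5.04%, a variance of 36.56 and a standard deviation of 6.05%.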

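Sample variance ($s^2$) is the measure of dispersion used when we evaluate a sample of n observations drawn from a population. It uses the sample mean $\bar{X}$ and divides by n − 1 rather than n: $s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Sample standard deviation ($s$) is the square root of the sample variance.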
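Python's `statistics` module distinguishes the two cases: `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n − 1. A sketch applying both to the five returns from the previous example, treating them as a sample for the n − 1 versions:

```python
import statistics

returns = [5.0, 15.0, 22.0, 12.0, 7.0]  # returns in %

pop_var = statistics.pvariance(returns)    # population variance, divides by n
sample_var = statistics.variance(returns)  # sample variance, divides by n - 1
sample_std = statistics.stdev(returns)     # square root of the sample variance

# The same sample variance computed directly from the definition.
mean = sum(returns) / len(returns)
manual_sample_var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)

print(f"population variance = {pop_var:.2f}")
print(f"sample variance     = {sample_var:.2f} (manual: {manual_sample_var:.2f})")
print(f"sample std dev      = {sample_std:.2f}%")
```

Dividing the same sum of squared deviations (182.8) by 4 instead of 5 raises the variance from 36.56 to 45.70, and the standard deviation from 6.05% to 6.76%.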


Chebyshev's inequality states that for any set of observations, whether sample or population data and regardless of the shape of the distribution, the proportion of observations lying within k standard deviations of the mean is at least $1 - 1/k^2$ for all k > 1.

According to Chebyshev's Inequality, the following relationships hold for any distribution. At least:
  • 36% of observations lie within ± 1.25 standard deviations of the mean
  • 56% of observations lie within ± 1.50 standard deviations of the mean 
  • 75% of observations lie within ± 2 standard deviations of the mean
  • 89% of observations lie within ± 3 standard deviations of the mean 
  • 94% of observations lie within ± 4 standard deviations of the mean
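Example: Find the minimum percentage of any distribution that will lie within ± 2.5 standard deviations of the mean.
Answer: $1 - 1/2.5^2 = 1 - 1/6.25 = 1 - 0.16 = 0.84$, so at least 84% of the observations lie within ± 2.5 standard deviations of the mean.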
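A small helper reproduces the Chebyshev bounds listed above together with the 2.5 standard deviation example. The function name is illustrative; it rejects k ≤ 1 because the inequality is stated here for k > 1 (at k = 1 the bound is a trivial 0%).

```python
def chebyshev_min_proportion(k: float) -> float:
    """Minimum share of observations within +/- k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is applied here only for k > 1")
    return 1 - 1 / k ** 2


for k in (1.25, 1.5, 2, 2.5, 3, 4):
    print(f"within +/-{k} std devs: at least {chebyshev_min_proportion(k):.0%}")
```

The printed bounds are 36%, 56%, 75%, 84%, 89% and 94%.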


3. Coefficient of Variation and Sharpe Ratio

The coefficient of variation (CV) measures the dispersion of the data points in a data series relative to their mean: it is the ratio of the standard deviation to the mean, $CV = s/\bar{X}$. In investing, the CV shows how much volatility (risk) you are assuming per unit of the return you can expect from an investment.
Example: The mean monthly return on T-bills (often used as a proxy for the risk-free rate) is 0.25%, with a standard deviation of 0.36%. The mean monthly return on the S&P 500 is 1.09%, with a standard deviation of 7.3%. Calculate and interpret the CVs of these two investments.

$CV_{\text{T-bills}} = 0.36/0.25 = 1.44$
$CV_{\text{S\&P 500}} = 7.3/1.09 = 6.70$
The results indicate that T-bills have less dispersion (risk) per unit of mean monthly return than the S&P 500.
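The CV calculation as a Python sketch. The guard against a non-positive mean is a design choice added here, not something the example requires: when the mean is zero, negative, or close to zero, the ratio stops being a meaningful risk-per-unit-of-return measure.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """CV = standard deviation / mean; both inputs in the same units (% per month here)."""
    if mean <= 0:
        # Assumption: treat a non-positive mean as unusable rather than return a misleading ratio.
        raise ValueError("CV is only meaningful for a positive mean")
    return std_dev / mean


cv_tbills = coefficient_of_variation(0.36, 0.25)
cv_sp500 = coefficient_of_variation(7.3, 1.09)
print(f"CV T-bills = {cv_tbills:.2f}")
print(f"CV S&P 500 = {cv_sp500:.2f}")
```

It prints 1.44 and 6.70, so the S&P 500 carries roughly 4.7 times as much standard deviation per unit of mean return as T-bills.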

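The Sharpe ratio (reward-to-variability ratio) measures mean excess return over the risk-free rate per unit of risk, where risk is measured by the standard deviation of returns: $S_h = \frac{\bar{R}_p - \bar{R}_F}{s_p}$, where $\bar{R}_p$ is the mean portfolio return, $\bar{R}_F$ the mean risk-free return, and $s_p$ the standard deviation of portfolio returns. Portfolios with larger positive Sharpe ratios are preferred to portfolios with smaller ratios.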
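A sketch of the Sharpe ratio formula in Python, reusing for illustration the monthly figures from the CV example (S&P 500 mean return 1.09% with standard deviation 7.3%, and the T-bill mean of 0.25% as the risk-free rate). The function name is hypothetical.

```python
def sharpe_ratio(mean_return: float, risk_free: float, std_dev: float) -> float:
    """Mean excess return over the risk-free rate per unit of standard deviation."""
    return (mean_return - risk_free) / std_dev


# Monthly figures in %, taken from the coefficient of variation example.
sp500_sharpe = sharpe_ratio(1.09, 0.25, 7.3)
print(f"Sharpe ratio (S&P 500, monthly) = {sp500_sharpe:.3f}")
```

The result is 0.84/7.3 ≈ 0.115, i.e. about 0.115 percentage points of monthly excess return per percentage point of monthly standard deviation.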

Note: Limitations of the Sharpe ratio
  • If two portfolios have negative Sharpe ratios, a higher (less negative) Sharpe ratio does not necessarily mean better risk-adjusted performance.
  • The Sharpe ratio is useful only when standard deviation is an appropriate measure of risk. Investment strategies with option-like characteristics have asymmetric return distributions (e.g. a high probability of small gains and a small probability of large losses). In such cases, standard deviation may underestimate risk and produce misleadingly high Sharpe ratios.

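The first limitation can be seen with two hypothetical portfolios (all numbers are invented for illustration): both earn a mean return of 1% against a 2% risk-free rate, but portfolio B has twice the standard deviation of portfolio A. B ends up with the higher, less negative Sharpe ratio, even though it delivers the same shortfall while taking more risk.

```python
def sharpe_ratio(mean_return: float, risk_free: float, std_dev: float) -> float:
    """Mean excess return over the risk-free rate per unit of standard deviation."""
    return (mean_return - risk_free) / std_dev


RISK_FREE = 2.0  # hypothetical risk-free rate, %

sharpe_a = sharpe_ratio(1.0, RISK_FREE, 5.0)   # hypothetical portfolio A: lower risk
sharpe_b = sharpe_ratio(1.0, RISK_FREE, 10.0)  # hypothetical portfolio B: same return, double risk

print(f"A: {sharpe_a:.2f}, B: {sharpe_b:.2f}")
```

It prints `A: -0.20, B: -0.10`: ranking by the higher Sharpe ratio would pick B, the riskier portfolio.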
4. Skewness and Kurtosis

A distribution is symmetrical if it is shaped identically on both sides of its mean. For a symmetrical return distribution, intervals of losses and gains occur with the same frequency.
Skewness is the extent to which a distribution is not symmetrical. Nonsymmetry is often caused by outliers, i.e. observations with extraordinarily large values, either positive or negative.
  • A positively skewed distribution is characterized by many outliers in the upper region (right tail).
  • A negatively skewed distribution has many outliers in the lower region (left tail).
Skewness affects the relative locations of the mean, median, and mode of a nonsymmetrical, unimodal distribution: in a positively skewed distribution, typically mode < median < mean; in a negatively skewed distribution, typically mean < median < mode.

Kurtosis is a measure of the degree to which a distribution is more or less "peaked" than a normal distribution.
  • Leptokurtic - more peaked than a normal distribution
  • Platykurtic - flatter than a normal distribution
  • Mesokurtic - same kurtosis as a normal distribution

The kurtosis of a normal distribution is 3. Excess kurtosis is kurtosis minus 3, so a distribution with more or less kurtosis than the normal distribution has non-zero excess kurtosis:
  • Normal distribution has excess kurtosis = 0
  • Leptokurtic distribution has excess kurtosis > 0
  • Platykurtic distribution has excess kurtosis < 0
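To find the skewness of a sample, apply the following formula:
$$S_K = \frac{1}{n}\cdot\frac{\sum_{i=1}^{n}(X_i - \bar{X})^3}{s^3}$$
Note: if $\lvert S_K\rvert > 0.5$, the distribution has a significant level of skewness.
Sample kurtosis is measured using the following formula:
$$K = \frac{1}{n}\cdot\frac{\sum_{i=1}^{n}(X_i - \bar{X})^4}{s^4}$$
The sample kurtosis is measured relative to the kurtosis of a normal distribution, which is 3:
$$\text{Excess kurtosis} = \text{Sample kurtosis} - 3$$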

Excess kurtosis > 0: the distribution is leptokurtic (more peaked, fatter tails).
Excess kurtosis < 0: the distribution is platykurtic (less peaked, thinner tails).
An excess kurtosis greater than 1 in absolute value is considered large.
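A Python sketch of sample skewness and kurtosis, applied to the five returns from the MAD example. It assumes the simple forms S_K = (1/n)Σ(X_i − X̄)³/s³ and K = (1/n)Σ(X_i − X̄)⁴/s⁴ with s the sample standard deviation; other texts use small-sample-adjusted versions that give different values for small n, so with only five observations the numbers are purely illustrative. The function names are hypothetical.

```python
import statistics


def sample_skewness(data: list[float]) -> float:
    """S_K = (1/n) * sum((x - mean)^3) / s^3, with s the sample standard deviation."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # n - 1 divisor
    return sum((x - mean) ** 3 for x in data) / n / s ** 3


def sample_kurtosis(data: list[float]) -> float:
    """K = (1/n) * sum((x - mean)^4) / s^4; a normal distribution has K = 3."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return sum((x - mean) ** 4 for x in data) / n / s ** 4


def kurtosis_label(excess: float) -> str:
    """Classify a distribution by its excess kurtosis (K - 3)."""
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"


returns = [5.0, 15.0, 22.0, 12.0, 7.0]  # returns in %

sk = sample_skewness(returns)
excess = sample_kurtosis(returns) - 3

print(f"skewness = {sk:.3f} (significant if |S_K| > 0.5)")
print(f"excess kurtosis = {excess:.3f} -> {kurtosis_label(excess)}")
print(f"mean = {statistics.fmean(returns)}, median = {statistics.median(returns)}")
```

Here the skewness is about 0.291 (positive but below the 0.5 threshold) and the mean of 12.2% sits above the median of 12.0%, consistent with mild positive skew; the excess kurtosis of about −1.783 classifies this tiny sample as platykurtic.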

1 comment:

  1. The graphs showing different kurtosis really just show different variances. Also, kurtosis measures tails (outliers) only, not "peakedness" or "flatness."
