Book Notes: “Naked Statistics” by Charles Wheelan
Summary
If you think statistics are boring, Naked Statistics: Stripping the Dread from the Data by Charles Wheelan (2013) might disabuse you of that belief. Wheelan takes a potentially dry topic and—through a combination of relevant examples and a breezy but incisive writing style—demonstrates the power, utility, and even the fun behind statistical analysis. Although Naked Statistics is no replacement for a rigorous college textbook, the reader will gain a fundamental understanding of important statistical concepts and techniques—descriptive statistics, sampling, distributions, correlation, probability, the central limit theorem, inference, and regression analysis are all covered.
Wheelan takes great care not to overwhelm the reader with inscrutable mathematical formulae. In most cases, he relegates the nitty-gritty details to chapter appendices (should the reader be so inclined to dive deeper). In the few instances where he does describe a mathematical equation (e.g. the linear regression equation, standard error equation, or the expected value equation), the math is straightforward and the author does an excellent job of explaining the core concepts. This is a highly accessible book and the rare appearance of a mathematical equation should not scare off would-be readers.
There’s no grand hypothesis guiding “Naked Statistics”; this is a straightforward instructional tome whose goal is to educate. In the information age, critical thinking is more important than ever, and so is the ability to reason about and tease apart the statistical information we are fed by friends, the news media, social media, and institutional sources. “Naked Statistics” won’t turn you into a statistics genius, but you will walk away with a good grasp of the basics.
Pros: Well-written and engaging introduction to an important topic. Should be required reading for all college students.
Cons: Some chapters get repetitive. For instance, the normal distribution and the importance of reading standard deviations rear their heads throughout the book in different guises. Tying these interrelated ideas together more explicitly would have been appreciated.
Verdict: 7/10
Highlights
Chapter 1: What’s the Point?

This chapter offers an overview of the contents of the subsequent chapters.

Examples of descriptive statistics:
 The quarterback rating which provides a single number that encapsulates QB performance.
 The Gini index which provides a measure of income distribution in a nation.
 Bowling score which expresses performance in a game.
 Grade point average which aggregates academic performance.

The strength of descriptive statistics: Simplifies complex ideas and provides a means of comparison.
 This simplification is also the weakness of descriptive statistics.
 Example: GPA. It is easy to calculate, easy to understand, and helpful for comparison. GPA cons: does not reflect course difficulty and variance in grading standards.

Statistics are useful because they help us process and understand data.

Data: the raw material of knowledge. They often comprise numerical information collected through observation.

“An overreliance on any descriptive statistic can lead to misleading conclusions, or cause undesirable behavior.”

“Descriptive statistics exist to simplify, which always implies some loss of nuance or detail.”

Sampling: Gathering small sets of data that can be used to draw general conclusions representative of similar or larger entities. For instance: polling can be used to understand general population sentiment by sampling a smaller subset of said population.

Research and polling firm Gallup can generate sound conclusions for the American population based on sample sizes as small as 1000 households.

“Any model to deal with risk must have probability as its foundation.”

“The scientific method dictates that if we are testing a scientific hypothesis, we should conduct a controlled experiment in which the variable of interest (e.g. smoking) is the only thing that differs between the experimental group and the control group. If we observe a marked difference in some outcome between the two groups (e.g. lung cancer), we can safely infer that the variable of interest is what caused that outcome.”

Regression analysis: Statistical tool for isolating the relationship between two variables (e.g. smoking and cancer) while controlling for the effects of other variables (e.g. diet, exercise, weight, age, sex, etc.).

Statistical analysis is imperfect:
 Statistics deal with probabilities and likelihoods.
 “We are usually building a circumstantial case based on imperfect data.”
 The question being answered may not be easy to pin down or may be subjective in nature.
 There are practical and ethical limits on what type of data can be gathered.

The point of learning statistics?
 To summarize large amounts of data.
 To make better decisions.
 To answer important social questions.
 To identify patterns that can offer ideas for improvement.
 To catch cheaters and prosecute criminals.
 To evaluate the effectiveness of policies, drugs, medical treatments, etc.
Chapter 2: Descriptive Statistics

Per capita income (total national income divided by national population):

United States per capita income: $7,787 in 1980; $26,487 in 2010.

Problems with this descriptive statistic:
 Figures are not inflation-adjusted (not comparing apples to apples). Unadjusted figures are also known as nominal figures.
 Inflation-adjusted figure for 1980 = $19,600. Inflation-adjusted figures are also known as real figures.
 Statistic does not reflect the income of the average American. Example: growth among the top 1% of wage earners might raise per capita income while adding nothing to the well-being of the other 99%.


Statistical paradox: More data can create less clarity. To overcome this we must simplify and condense (doing so we lose nuance and may introduce types of bias).

“Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading.”

Mean (aka “average”): The most basic tool for measuring the “middle” of a distribution.
 Con: Prone to distortion by outliers (observations farthest from the center).
 Example: If Bill Gates is among a group of 10 random people, the mean net worth calculation for that group will likely be distorted.

Median: Another statistic for identifying the middle of a distribution. The median is the point that divides a distribution in half—half of the distributions lie above the median and half below.
 Con: Does not weight observations based on distance from the midpoint (only whether they lie above or below it).
 The median and mean will be similar in distributions without significant outliers.
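The mean-versus-median contrast is easy to see in code. A minimal Python sketch (the dollar figures are invented for illustration):

```python
from statistics import mean, median

# Nine people with ordinary net worths, plus one Bill Gates-sized outlier.
net_worths = [40_000, 55_000, 60_000, 75_000, 80_000,
              90_000, 110_000, 130_000, 150_000]
group_with_outlier = net_worths + [100_000_000_000]

print(round(mean(net_worths)))          # 87778: a fair picture of the group
print(round(mean(group_with_outlier)))  # ~10 billion: distorted by one observation
print(median(group_with_outlier))       # 85000.0: barely moved by the outlier
```

The median shifts only slightly when the outlier joins the group, while the mean becomes meaningless as a description of a "typical" member.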

Other ways to segment distributions: Can be used for comparison between groupings.
 Quartiles: The first quartile is the bottom 25% of observations, the second quartile is the next 25% of observations, etc.
 Deciles: The first decile is the bottom 10% of observations, etc.
 Percentiles: The first percentile is the bottom 1% of observations, etc.

Absolute score: A figure or number with intrinsic meaning.
 Example: A golf score is an absolute figure. A temperature reading is an absolute figure.
 Some absolute figures can be interpreted with minimal context and no additional information.

Relative statistic: A figure that has meaning in comparison to something else.
 Example: If you place 9th in a tournament, that is a relative statistic.
 Example: Standardized tests provide absolute scores that need distribution data (relative information) to become meaningful.

Standard deviation: A measure of data dispersion around the mean (i.e. “How spread out are the observations?”).
 For a normal distribution of data, a high proportion of observations lie within one standard deviation of the mean.

Normal distribution: Data that is distributed symmetrically around the mean in a bell curve.
 68.2% of observations lie within 1 standard deviation.
 95.4% of observations lie within 2 standard deviations.
 99.7% of observations lie within 3 standard deviations.
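These three rules of thumb can be confirmed with Python's standard library; `statistics.NormalDist` exposes the cumulative distribution function of a normal curve:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal curve

for k in (1, 2, 3):
    # Probability of an observation landing within k standard deviations of the mean.
    within = z.cdf(k) - z.cdf(-k)
    print(f"within {k} sd: {within:.1%}")
```

The printed values match the 68/95/99.7 figures above (the 1-sd figure is 68.27%, so you will see it rounded as 68.2% or 68.3% depending on the source).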

Percentage change: A measure that provides a sense of scale and is useful for comparison: “A percentage change always gives the value of some figure relative to something else.”

Heads up: Don’t confuse percentage change with a change in percentage points.

Example: A sales tax changes from 4% to 5%.
 This is a 25% increase. A relative change over the past rate.
 This is a 1 percentage point increase. An absolute change.

Both above characterizations are accurate, but depending on your agenda you might want to emphasize one over the other.


“Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components.”

Case: How to assess the economic health of the American middle class?
 Recommendation: Look at changes in the median wage (adjusted for inflation) over several decades.
 Also examine changes at the 25th and 75th percentiles (the upper and lower bounds of the middle class).
 Note that wages are not the same as income. A wage is a fixed hourly rate of payment. Income is the sum of all payments from different sources.
Chapter 3: Deceptive Description

Precision vs. accuracy:
 Precision: The level of exactitude or specificity used to express something. Example: 41.6 miles is more precise than “a few dozen miles.”
 Accuracy: The level of veracity for a thing. Something that is highly accurate is consistent with the truth.
 “If an answer is accurate, then more precision is usually better. But no amount of precision can make up for inaccuracy.”
 Do not confuse precision with accuracy: “precision can mask accuracy by giving us a false sense of certainty...”

Determining the appropriate measure to use in evaluation is critical. Example: To determine the health of the US economy, is it better to use economic output or employment levels, or a combination of both?

Statistical analysis involves a great deal of interpretive wiggle room.

Example: Two politicians make two seemingly contradictory statements that can be simultaneously true:
 Politician A: Our schools are getting worse. 60% of our schools had lower test scores this year than last year.
 Politician B: Our schools are getting better. 80% of our students had higher test scores this year than last year.

The unit of analysis matters and can be used to support divergent conclusions. The unit of analysis is the thing being compared.


Be aware of when a source is using nominal figures (unadjusted) vs. real figures (inflation adjusted).
 Example: Hollywood studios neglect to use inflation-adjusted numbers when ranking all-time highest-grossing films.
 There is an incentive to use nominal figures to make recent films appear more successful than they are.

Be aware that relative and absolute changes can be misleading or exaggerate certain effects.
 Example: A large percentage change on a small absolute risk or value. Reducing the amount of arsenic in drinking water by 22% sounds dramatic, but the absolute amount of arsenic present was measured in mere micrograms.
 Example: A small percentage change on a large absolute sum is still a large number. Increasing military spending by only 4% still adds a large sum if the existing budget is $700 billion.

Be aware of the starting point and ending point for any comparisons. The statistical picture of economic growth varies significantly if you compare 1 year vs. 3 years vs. 10 years vs. 20 years.

“Statistics measure the outcomes that matter; incentives give us a reason to improve those outcomes.”

Example: How can school administrators improve district test scores?
 Option A: Improve the quality of education so that students learn more and test better.
 Option B: Prevent the worst students from taking the test or move them to other districts.

“The easiest way for a doctor to improve his mortality rate is by refusing to operate on the sickest patients.”

Chapter 4: Correlation

Correlation: A measurement of the relationship between two phenomena.
 Positive correlation: When one thing increases, so does the related thing. Example: The correlation between outdoor temperature and ice cream consumption. When the temperature is high, ice cream consumption increases.
 Negative correlation: When one thing increases, the other goes down. Example: The correlation between exercise and weight. When a person exercises more, their weight tends to decrease.

Correlation coefficient: A statistical value that describes the association between two variables.
 The coefficient is a single value between -1 and 1.
 Perfect positive correlation = 1. Every change in one variable is associated with an equivalent change in the other variable in the same direction.
 Perfect negative correlation = -1. Every change in one variable is associated with an equivalent change in the other variable in the opposite direction.
 The closer the correlation is to 1 or -1, the stronger the association.
 A correlation of 0 (or close to it) signifies no meaningful relationship between the variables.

Correlation example: Why are standardized tests so important when GPA is available?

GPA is a useful, but imperfect descriptive statistic. One student might receive good grades via easy coursework. Another student might receive mediocre grades via challenging coursework.

Standardized tests offer a comparative measure across all students applying to college.

The correlation between SAT scores and first-year grades is meaningful:
 The correlation between SAT scores and first-year college GPA is .56.
 The correlation between high school GPA and first-year college GPA is .56.
 The correlation of combined GPA and SAT scores and first-year college GPA is .64.
 In other words, the SAT complements GPA as a reasonable predictor of future academic success.

Chapter 5: Basic Probability

“Probability is the study of events and outcomes involving an element of uncertainty.”
 Some events have known probabilities. Example: The probability of flipping heads on a fair coin is 1/2. The probability of rolling a 1 on a single die is 1/6.
 Some probabilities can be inferred from past data. For instance, the probability of kicking an extra point after a touchdown in American football is .94.

“Probabilities do not tell us what will happen for sure; they tell us what is likely to happen and what is less likely to happen.”

People are poor at comprehending risk and probability. For instance, we fear the improbable and don’t worry enough about more mundane likelihoods.

The probability of two independent events both happening = the product of their respective probabilities.
 Example: The probability of flipping heads on a coin two times in a row is 1/4. This is the product of each independent event: 1/2 × 1/2.
 This formula only applies if both events are independent (i.e. the outcome of one does not affect the outcome of the other).

The probability of one event or another event happening is the sum of the likelihood of their probabilities.

Example: The likelihood of rolling a 1, 2, or 3 on a single die is 1/2. This is the sum of the individual probabilities: 1/6 + 1/6 + 1/6.

If the events are not mutually exclusive, the probability is the sum of the individual probabilities MINUS the probability of both events happening.
 Example: What is the probability of drawing a 5 or a heart from a standard deck of 52 playing cards? The probability of drawing a 5 is 4/52. The probability of drawing a heart is 13/52. One of the hearts is a 5, so we must account for that: 16/52 = 4/52 + 13/52 - 1/52.
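The inclusion-exclusion rule can be checked by brute force, enumerating all 52 cards:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # all 52 (rank, suit) pairs

# Count cards that are a 5 OR a heart (the 5 of hearts counts only once).
hits = [card for card in deck if card[0] == "5" or card[1] == "hearts"]
print(Fraction(len(hits), len(deck)))  # 4/13, i.e. 16/52

# Same answer from the formula: P(5) + P(heart) - P(5 of hearts)
formula = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
print(formula)  # 4/13
```

Direct enumeration and the formula agree because the overlap (the 5 of hearts) is subtracted exactly once.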


Expected value: The sum of all the different outcomes weighted by their probability and payoff. It is a decision-making tool for forecasting the payout from a specific event.
 Example: A game of chance in which the payout is $1 if you roll a 1, $2 if you roll a 2, etc. The expected value for a single die roll is the sum of each possible outcome’s value multiplied by its probability. In this case, the EV is $3.50 = $1(1/6) + $2(1/6) + $3(1/6) + $4(1/6) + $5(1/6) + $6(1/6).
 Expected value tells you if a scenario is worth pursuing or not.
 One counterintuitive conclusion that expected value can support is that it is sometimes not worth screening an entire population for a rare disease. In many cases, cost analyses determine that only high-risk groups should be evaluated rather than the general population.
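The die-roll expected value works out exactly as described; a quick check using exact fractions:

```python
from fractions import Fraction

# Payout equals the face value; each face of a fair die has probability 1/6.
payout_probs = {face: Fraction(1, 6) for face in range(1, 7)}

# EV = sum of (payout × probability) over all outcomes.
expected_value = sum(payout * prob for payout, prob in payout_probs.items())
print(float(expected_value))  # 3.5, i.e. $3.50 per roll
```

So paying anything less than $3.50 to play this game is, on average, a winning proposition.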

The law of large numbers: As a sample size grows, its mean gets closer to the average of the whole population. For expected value, as the number of independent trials increases, the average of the outcomes approaches the expected value.
 It’s the reason why casinos make money: in the long run, casino games favor the house.
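A simple simulation illustrates the law of large numbers: as the number of die rolls grows, the average settles toward the expected value of 3.5. (The seed is arbitrary; any seed shows the same drift.)

```python
import random

random.seed(42)  # arbitrary seed for reproducibility

def average_roll(n_trials):
    """Average of n_trials fair six-sided die rolls."""
    return sum(random.randint(1, 6) for _ in range(n_trials)) / n_trials

# The expected value of a single roll is 3.5; bigger samples land closer to it.
for n in (10, 1_000, 100_000):
    print(n, round(average_roll(n), 3))
```

With 10 rolls the average can easily be 2.8 or 4.1; with 100,000 rolls it is reliably within a few hundredths of 3.5.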

Insurance is a business clearly tied to expected value (in this case, expected loss). Premiums must be priced higher than expected losses (from accidents and claims). So long as the premiums collected, via the law of large numbers, exceed the losses paid out, the insurance firm should make money.
 Buying insurance is a bad bet from a statistical standpoint: you will pay the insurance company more, on average, than you get back.
 “Insurance will not save you money...it will...prevent some unacceptably high loss...”

Decision trees: Can be used to organize complex sets of possible outcomes (some of which are dependent on others). The ends of the tree yield individual outcomes, payouts and probabilities which can be used to compute expected value.
Chapter 5 1/2: The Monty Hall Problem

The Monty Hall Problem: A probability puzzle based on the American TV game show “Let’s Make a Deal.”

The premise: You are given the choice of 3 doors. Behind one door is a car; behind the others, goats.

Contestant selects a door to start the game (but does not open the door).

The host, who knows what’s behind each door, opens one of the two remaining doors. The door the host selects ALWAYS has a goat behind it.

The host then asks the contestant if he would like to switch doors.

Based on the laws of probability, the contestant SHOULD switch doors:
 The initial selection carries a 1/3 winning probability.
 Switching to the other unopened door carries a 2/3 winning probability.
 This outcome is counterintuitive to most, but computer simulations bear out the conclusion. Test it yourself: [https://www.mathwarehouse.com/montyhallsimulationonline/]


Lesson: Your gut intuition regarding probability can steer you astray.
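A short Monte Carlo simulation of the game (door numbering, seed, and trial count are arbitrary choices) reproduces the 1/3 vs. 2/3 split:

```python
import random

def play(switch, rng):
    """One round of Monty Hall; returns True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car,
    # so it ALWAYS hides a goat.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)  # arbitrary seed for reproducibility
trials = 100_000
stay_wins = sum(play(False, rng) for _ in range(trials)) / trials
switch_wins = sum(play(True, rng) for _ in range(trials)) / trials
print(f"stay: {stay_wins:.3f}, switch: {switch_wins:.3f}")  # ≈ 0.333 vs ≈ 0.667
```

The simulation bears out the counterintuitive conclusion: switching wins about twice as often as staying.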
Chapter 6: Problems with Probability

“Statistics cannot be any smarter than the people who use them. And in some cases, they can make smart people do dumb things.”

Example: The Value at Risk (VaR) model used by investment firms in the 2000s (and which contributed to the 2008 financial crisis).

VaR gave investment managers a precise measure of risk as a single dollar figure (a descriptive statistic).

The model’s false precision created an unwarranted sense of confidence and security in the measure.

Two problems with VaR:
 The underlying probabilities were based on past market performance (as we know: the past is not a predictor of the future).
 The model assumed 99% probability that the investments were safe. But the 1%, while unlikely, is disastrous should it happen. People tend to discount tail risk.

“I don’t need to stock up on water because if there is a natural disaster, I’ll just go to the supermarket and buy some.” (of course the supermarkets are either destroyed or overrun by panic buyers at that point).


Nassim Taleb (essayist and risk analyst): “The greatest risks are never the ones you can see and measure, but the ones you can’t see and therefore can never measure. The ones that seem so far outside the boundary of normal probability that you can’t imagine they could happen in your lifetime—even though, of course, they do happen, more often than you care to realize.”

“Unlikely things happen...over a long enough period of time, they are not even that unlikely.”

Common probability errors and challenges:
 Assuming events are independent when they are not.
 Not understanding when events ARE independent.
 Clusters and unlikely random outcomes do happen. “When we see an anomalous event...we assume that something besides randomness must be responsible.”
 The prosecutor’s fallacy. This occurs when the context surrounding statistical evidence is neglected (e.g. the chances of finding a coincidental one-in-a-million match are high if you run a sample through a database with samples from millions of people).
 Reversion to the mean (or regression to the mean). Outlier observations are likely to be followed by outcomes more consistent with longterm averages.
 Statistical discrimination. What should we do ethically, philosophically, and legally when statistics suggest discriminatory policies? Example: Men pay more for auto insurance because they are more likely to get into accidents. Women pay more for annuities because they live longer.
Chapter 7: The Importance of Data

“No amount of fancy analysis can make up for fundamentally flawed data. Hence, the expression ‘garbage in, garbage out.’”

Data is generally asked to do the following:
 Be representative of a larger group or population.
 Provide a source or means of comparison. In many cases, this means creating a treatment group and control group through randomization.
 Be collected for future use (when we can better determine what to do with it).

Longitudinal studies collect subject data over long periods of time (e.g. once every two years).

Cross-sectional studies collect data from a single point in time.

Common examples of “garbage in, garbage out”:
 Selection bias: “If each member of the relevant population does not have an equal chance of ending up in the sample, we are going to have a problem with whatever results emerge from that sample.”
 Publication bias: Positive findings are more likely to be published than negative findings. This can skew results and our understanding of the reported phenomenon.
 Recall bias: Memories are systematically fragile. It is difficult to get test subjects to accurately recount things in the past.
 Survivorship bias: This phenomenon occurs when observations fall out of the sample and change the final composition of the group being observed. [Wald’s bullet holes are a good example of this.]
 Healthy user bias: “People who take vitamins regularly are likely to be healthy—because they are the kind of people who take vitamins regularly.”
Chapter 8: The Central Limit Theorem

“The central limit theorem is the ‘power source’ for many of the statistical activities that involve using a sample to make inferences about a large population.”

“The core principle underlying the central limit theorem is that a large, properly drawn sample will resemble the population from which it is drawn.”

“If a sample usually looks like the population from which it’s drawn, it must also be true that a population will usually look like a sample drawn from that population.”

Per the central limit theorem, sample means (averages) for any population will be distributed roughly as a normal distribution around the population mean.
 This means we can use the same standard deviation probabilities from Chapter 2 when analyzing data and making statistical inferences.

Standard error measures dispersion of sample means. It is the standard deviation of a sample’s distribution (or an estimate of that standard deviation).
 Standard deviation measures dispersion in the underlying population.
 Standard error measures dispersion of sample means.
 “A large standard error means that the sample means are spread out widely around the population mean; a small standard error means that they are clustered relatively tightly.”
 “Sample means will cluster more tightly around the population mean as the size of each sample gets larger.”
 Standard error = the standard deviation of the population from which the sample is drawn, divided by the square root of the sample size (SE = s/√n).
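The standard error formula can be sanity-checked by simulation: draw many samples from a (deliberately non-normal) synthetic population and compare the observed spread of the sample means against s/√n. The population here is invented for illustration:

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed

# A deliberately skewed, non-normal synthetic "population" of incomes.
population = [rng.expovariate(1 / 50_000) for _ in range(100_000)]
sigma = statistics.pstdev(population)

n = 100  # size of each sample
predicted_se = sigma / n ** 0.5  # the formula: sigma / sqrt(n)

# Draw many samples and measure how dispersed the sample means actually are.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2_000)]
observed_se = statistics.stdev(sample_means)

print(round(predicted_se), round(observed_se))  # the two figures should be close
```

Even though the population itself is badly skewed, the sample means cluster around the population mean with a spread close to the formula's prediction, as the central limit theorem promises.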

“The less likely it is that an outcome has been observed by chance, the more confident we can be in surmising that some other factor is in play.”
Chapter 9: Inference

“Statistics cannot prove anything with certainty. Instead, the power of statistical inference derives from observing some pattern or outcome and then using probability to determine the most likely explanation for that outcome.”

Remember: The most likely explanation isn’t always the correct one. Improbable things do happen.

Statistical inference: The use of statistical analysis to answer important questions. For example: Is a new drug effective at treating heart disease? Does increased school spending result in better student outcomes?

Hypothesis testing:
 Statistical inference begins with an implicit or explicit null hypothesis (starting assumption).
 If we reject the null hypothesis, then we must accept an alternative hypothesis consistent with the data observed.
 Example: In a court of law, the null hypothesis is that the defendant is innocent. The prosecution must convince the judge or jury to reject the null hypothesis and accept the alternative hypothesis (that the defendant is guilty).

Significance level: The arbitrary probability threshold at which a null hypothesis can be rejected.
 A common threshold is 5% (typically represented in decimal form as .05).
 P-value is the specific probability. In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test. A smaller p-value means there is stronger evidence favoring the alternative hypothesis.
 “When we can reject a null hypothesis at some reasonable significance level, the results are said to be statistically significant.”
 “Statistical significance says nothing about the size of the association.”

Confidence interval: a range associated with a given probability level. Example: we can say with 95% confidence (95 times out of 100) that the interval between 1202.8 and 1274.8 cubic cm includes the average brain volume for children in the general population without autism.
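A 95% interval like this can be computed from a sample's mean and standard error; the brain-volume figures below are invented stand-ins, not the book's data:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical brain-volume measurements (cubic cm); invented for illustration.
sample = [1190, 1205, 1220, 1235, 1250, 1260, 1270, 1240, 1230, 1255]

n = len(sample)
se = stdev(sample) / n ** 0.5    # standard error of the sample mean
z = NormalDist().inv_cdf(0.975)  # ≈ 1.96 for a two-sided 95% interval

low, high = mean(sample) - z * se, mean(sample) + z * se
print(f"95% CI: ({low:.1f}, {high:.1f}) cubic cm")
```

Note that a rigorous treatment would use a t-distribution for a sample this small; the normal-based z ≈ 1.96 is the simplification the book's level of detail implies.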

Choosing an appropriate significance level involves a tradeoff:

A low burden of proof (e.g. .1 or 10%) means we are more likely to reject the null hypothesis when it is actually true and erroneously accept a false alternative (false positive). This is called a Type I error.
 Example: An innocent person is convicted.

A high burden of proof (e.g. .01 or 1%) means we are more likely to fail to reject the null hypothesis when it is actually false (false negative). This is called a Type II error.
 Example: A guilty person is not convicted.

Consider some real world situations involving the tradeoff:

Spam filters: Null hypothesis is that a message is not spam.
 Type I error means screening out emails that are not spam (false positive).
 Type II error means allowing spam into your inbox (false negative).

Cancer screening: Null hypothesis is that there is no cancer.
 Type I error means identifying cancer in healthy patients (false positive).
 Type II error means failing to identify cancer in sick patients (false negative).

Capturing terrorists: Null hypothesis is that a person is not a terrorist.
 Type I error identifies innocent people as terrorists (false positive).
 Type II error means failing to identify a real terrorist (false negative).


Chapter 10: Polling

“A poll (or survey) is an inference about the opinions of some population that is based on the views expressed by some sample drawn from that population.”

A poll’s margin of error functions like a confidence interval (typically a 95% confidence interval). If the margin of error is ±3 percent, then 95 out of 100 polls conducted with samples drawn from the same population should produce results within ±3 points of the true population sentiment (results falling outside that range remain possible 5% of the time).
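For a poll proportion, the 95% margin of error can be approximated with the standard formula (a sketch; real pollsters also adjust for design effects and weighting):

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return z * sqrt(p * (1 - p) / n)

# A candidate polling at 47% in a survey of 1,000 people:
print(round(100 * margin_of_error(0.47, 1_000), 1))  # ≈ 3.1 percentage points
```

This is why samples of roughly 1,000 keep showing up in national polling: they already bring the margin of error down to about ±3 points.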

In polling the sample statistic is not a mean but a percentage (e.g. 47% of voters).

“The only way to become more certain that your polling results will be consistent with the election outcome without new data is to become more timid in your prediction.”
 Example: You’re pretty sure that Thomas Jefferson was the 3rd or 4th President. To become more confident, you must get less specific by saying, for example, “he was one of the first five presidents.”

A large sample has a smaller standard error and is more accurate.

A small sample has a larger standard error and a larger confidence interval.

Bad polling results from a biased sample, bad questions, or combination of both.
Chapter 11: Regression Analysis

Regression analysis: A tool that quantifies the relationship between variables while controlling for other factors.

Regression analysis finds the “best fit” linear relationship between two variables.

Ordinary least squares (OLS): A methodology that fits the line by minimizing the sum of squared residuals (a residual is the vertical distance between an observation and the regression line).

Linear regression equation: Y = a + bX
 Y is the y-axis variable (dependent variable)
 a is the y-intercept (the value of Y when X = 0)
 X is the x-axis variable (explanatory variable)
 b is the slope of the line

Basic terminology (using an example that considers the relationship between height and weight).

Dependent variable is the variable being explained (it depends on other factors). Example: weight (y-axis).

Explanatory variable or independent variable is the variable that explains our dependent variable. Example: height (x-axis).

Weight = -135 + (4.5) * Height in Inches
 Y = a + bX
 4.5 is the coefficient on height. Every one-unit increase in the independent variable (height) is associated with a 4.5-unit increase in the dependent variable (weight).
 In other words: without any other information, we could use the height of a person from the studied population to generate a good guess about their weight.

Regression coefficient has three interesting characteristics:
 Sign (positive or negative): Shows the direction of the association with the dependent variable. In the above example, the coefficient on height is positive.
 Size: How big is the observed effect between the independent and dependent variable? In the above example, the size is 4.5 pounds per inch of height, which is meaningful.
 Significance: Is the association between variables meaningful? One way to confirm is to determine if other samples express this association.
 A standard error can be calculated for the coefficient. This measures dispersion of the value in repeated analyses on further samples.
 “One rough rule of thumb is that the coefficient is likely to be statistically significant when the coefficient is at least twice the size of the standard error.”

R² (R-squared): The measure of the total amount of variation explained by the regression equation.
 In the example above, R² tells us how much of the variation around the mean is associated with differences in height alone.
 For the weight-height analysis, the R² is .25 (25%), which is to say that 75% of the variance is unexplained by the analysis (perhaps due to factors such as sex, diet, age, genetics, etc.).
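The mechanics above can be condensed into a few lines of code: fit a + bX by ordinary least squares and compute R² from the residuals. The height/weight numbers below are invented, so the fitted coefficients won't match the book's:

```python
from statistics import mean

def ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total variation
    return a, b, 1 - ss_res / ss_tot

heights = [62, 64, 66, 68, 70, 72, 74]         # inches (invented data)
weights = [120, 140, 150, 165, 172, 195, 205]  # pounds (invented data)

a, b, r2 = ols(heights, weights)
print(f"weight = {a:.1f} + {b:.1f} * height, R^2 = {r2:.2f}")
```

One property worth noticing: the fitted line always passes through the point (mean of X, mean of Y), which is how the intercept a is derived from the slope b.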


Regression analysis can include multiple variables. A multivariate regression or multiple regression analysis estimates the linear relationship between each explanatory variable and the dependent variable (while holding the other explanatory variables constant).
Chapter 12: Common Regression Mistakes

“Regression analysis provides precise answers to complicated questions. These answers may or may not be accurate. In the wrong hands, regression analysis will yield results that are misleading or just plain wrong.”

Common errors in regression analysis:

Using regression to analyze a nonlinear relationship.

Correlation does not equal causation. Regression only demonstrates an association between the variables.

Reverse causality. “A statistical association between A and B does not prove that A causes B. In fact, it’s entirely plausible that B is causing A.”
 Causality can work in both directions.

Omitted variable bias. “Regression results will be misleading and inaccurate if the regression equation leaves out an important explanatory variable...”
 Example: Study that finds that golfers are more prone to heart disease and cancer. The median age of golfers skews older. If the study fails to account for age in its analysis, the conclusion will be flawed: golf isn’t killing the participants, old age is.

Highly correlated explanatory variables (multicollinearity). If two or more variables are highly correlated, the study may not discern the true relationship between each variable.
 Example: Study that is looking at SAT scores and drug use. If the drug variables are Heroin and Cocaine and users of each drug tend to use both, we will not gain a strong understanding of each drug independently unless the study is restructured.

Extrapolating beyond the data. The results of analysis are only valid for a population that is similar to the sample studied.

Data mining (too many variables). Extraneous, unjustified variables will muddle the analysis.

Chapter 13: Program Evaluation

“Brilliant researchers are often individuals or teams who find creative ways to do ‘controlled’ experiments.”

Questions such as “How would going to Harvard affect your life?” and “Does putting more police officers on the street deter crime?” are difficult to answer.

Program evaluation: The process by which we seek to measure the causal effect of some intervention (e.g. the impact of a new cancer drug, a job placement program for high school dropouts, etc.).

Treatment: The intervention or factor being evaluated.

Common approaches for isolating treatment effects:
 Randomized, controlled experiments (create a treatment and control group). Clinical trials use this model.
 Natural experiments. Exploit a natural event which mimics the characteristics of a randomized, controlled experiment.
 Non-equivalent control. Sometimes you have no choice but to create non-randomized treatment and control groups. Valid conclusions can still be drawn, but you must manage the inherent bias of the test groups in clever ways.
 Differences in differences. Observe cause and effect sequentially through “before” and “after” data.
 Discontinuity analysis studies similar groups that diverge at some point and experience different outcomes (example: compare one group that barely qualifies for an intervention with another that just missed eligibility and then compare outcomes).

“To understand the true impact of a treatment, we need to know the counterfactual, which is what would have happened in the absence of that treatment or intervention.”