151 terms
Cell Combining in Chi-squared Test
If any expected frequency is < 5, rows or columns with small expected frequencies should be combined before calculating the χ² statistic, so that the chi-squared approximation remains valid.
Chi-Squared Tests
Cohen's d
Cohen's d = (mean₁ - mean₂) / s_pooled, where s_pooled = √[((n₁-1)s₁² + (n₂-1)s₂²) / (n₁+n₂-2)]. Interpretation: |d| ≈ 0.2 is a small effect, 0.5 medium, and 0.8 large.
Effect Size
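The pooled-deviation formula above can be checked numerically. A minimal Python sketch (the two samples are hypothetical, chosen only to illustrate the arithmetic):

    import math

    # Hypothetical samples; any two groups of measurements would do.
    group1 = [5.1, 4.8, 5.6, 5.0, 5.3]
    group2 = [4.2, 4.5, 4.1, 4.7, 4.4]

    def mean(xs):
        return sum(xs) / len(xs)

    def sample_var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    n1, n2 = len(group1), len(group2)
    s_pooled = math.sqrt(((n1 - 1) * sample_var(group1)
                          + (n2 - 1) * sample_var(group2)) / (n1 + n2 - 2))
    d = (mean(group1) - mean(group2)) / s_pooled  # Cohen's d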
Conditional Probability
The probability of event B occurring given that event A has already occurred, calculated as P(B|A) = P(A ∩ B) / P(A), where P(A) > 0.
Probability
Conditional Probability Notation
The conditional probability P(A | B) is read 'the probability of A given B' and equals P(A ∩ B) / P(B), provided P(B) > 0.
Probability
Conditions for Non-parametric Tests
Non-parametric tests require: random sampling, independent observations (for each test type), and that variables are measured on at least an ordinal scale.
Non-Parametric Tests
Contingency Table Independence
A chi-squared test for independence uses χ² = Σ[(O - E)² / E] where O = observed frequency and E = expected frequency under the null hypothesis of independence.
Chi-Squared Tests
Continuity Correction
When approximating a discrete random variable (taking integer values k) with a continuous distribution, the discrete value k is represented by the interval from k - 0.5 to k + 0.5.
Normal Distribution
Continuous Random Variable
A continuous random variable X can assume any real value within a specified interval. Probabilities are described by a probability density function, and P(X = x) = 0 for any single value x.
Random Variables
Cumulative Distribution Function
The cumulative distribution function F(x) = P(X ≤ x) gives the total probability for all values up to and including x. F(x) is non-decreasing, with F(x) → 0 as x → -∞ and F(x) → 1 as x → ∞.
Random Variables
Discrete Random Variable
A discrete random variable X assigns numerical values to outcomes of a random experiment and can only take on specific, separate values, each with an associated probability.
Random Variables
Distribution of Sample Mean
If X ~ N(μ, σ²), then the sample mean X̄ from n independent observations follows X̄ ~ N(μ, σ²/n). The standard error of the mean is σ/√n.
Normal Distribution
Paired t-test
The paired t-test compares the mean difference d̄ to zero using t = d̄ / (s_d/√n), where s_d is the standard deviation of the differences, with n - 1 degrees of freedom.
Paired t-Test
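A short Python sketch of the paired t statistic, assuming hypothetical before/after measurements (the statistics module is standard library):

    import math
    from statistics import mean, stdev

    before = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7]   # hypothetical paired data
    after  = [11.4, 11.9, 12.2, 12.0, 11.5, 12.1]

    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                  # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))    # compare against t with n - 1 df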
Power of Test
The power of a statistical test is P(reject H₀ | H₀ is false) = 1 - β, where β is the probability of Type II error. Power increases with larger sample size, larger effect size, and higher significance level α.
Hypothesis Testing
Probability Density Function
A probability density function f(x) for a continuous random variable X satisfies f(x) ≥ 0 for all x, and ∫f(x)dx = 1 over the whole range of X. Probabilities are areas under the curve: P(a ≤ X ≤ b) = ∫ₐᵇ f(x)dx.
Random Variables
Sampling Bias
Sampling bias occurs when the sampling method inherently favors certain types of units. The sample is not a representative cross-section of the population, so estimates are systematically distorted.
Sampling Methods
SEC: Statistical Enquiry Cycle Initial Planning
Initial planning in the SEC involves: (1) identifying factors related to the problem, (2) defining a research question or hypothesis that can be investigated with data.
Statistical Enquiry Cycle
SEC: Statistical Enquiry Cycle Data Collection
Data collection involves: (1) designing unbiased primary data collection methods, (2) researching secondary data sources.
Statistical Enquiry Cycle
SEC: Statistical Enquiry Cycle Data Processing and Presentation
Data processing and presentation involves: (1) organising and processing data using technology, (2) making inferences about the population from the sample data.
Statistical Enquiry Cycle
SEC: Statistical Enquiry Cycle Interpretation of Results
Interpretation of results involves: (1) analysing diagrams and calculations, (2) drawing conclusions related to the original research question.
Statistical Enquiry Cycle
SEC: Statistical Enquiry Cycle Evaluation and Review
Evaluation and review involves: (1) identifying weaknesses in data collection and display methods, (2) recognising limitations of the conclusions and suggesting improvements.
Statistical Enquiry Cycle
Wilcoxon Rank-sum Test
A non-parametric test for independent samples testing whether two populations have the same distribution, using ranks from the combined ordered samples.
Non-Parametric Tests
Wilcoxon Signed-Rank Procedure
The Wilcoxon signed-rank test: (1) calculates differences within pairs; (2) ignores signs, ranks absolute differences; (3) reattaches signs to the ranks and sums the positive and negative ranks; (4) compares the smaller rank sum to critical values.
Non-Parametric Tests
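In practice the ranking steps are usually delegated to software. A sketch using scipy (assumed available; the paired scores are hypothetical):

    from scipy import stats  # scipy assumed

    before = [85, 70, 40, 65, 80, 75, 55, 20]
    after  = [75, 50, 50, 40, 20, 65, 40, 25]

    # Tests whether the distribution of the differences is symmetric about zero.
    stat, p = stats.wilcoxon(before, after)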
1.1: Mean
The arithmetic average of a set of values, calculated by summing all values and dividing by the number of values. Denoted x̄ for a sample and μ for a population.
Descriptive Statistics
1.1: Median
The middle value of a dataset when arranged in ascending or descending order. For an even number of values, it is the average of the two middle values.
Descriptive Statistics
1.1: Mode
The value that occurs with the highest frequency in a dataset. For grouped data, the modal class is the class with the highest frequency.
Descriptive Statistics
1.1: Range
The simplest measure of spread, calculated as the difference between the maximum and minimum values: Range = Max - Min.
Descriptive Statistics
1.1: Interquartile Range
A measure of spread calculated as IQR = Q₃ - Q₁, where Q₃ is the upper quartile and Q₁ is the lower quartile. It represents the spread of the middle 50% of the data.
Descriptive Statistics
1.1: Standard Deviation
A measure of dispersion representing the average distance of data values from the mean. Calculated as s = √(Σ(x - x̄)²/(n-1)) for a sample.
Descriptive Statistics
1.1: Variance
The average of the squared deviations from the mean, calculated as s² = Σ(x - x̄)²/(n-1) for a sample or σ² = Σ(x - μ)²/N for a population.
Descriptive Statistics
1.2: Histogram
A graphical representation of grouped continuous data using rectangular bars where the area of each bar is proportional to the frequency of the class.
Descriptive Statistics
1.2: Box Plot
A graphical display consisting of a rectangular box showing Q₁, median, and Q₃, with lines (whiskers) extending to the minimum and maximum values (or to the most extreme values that are not outliers).
Descriptive Statistics
1.2: Cumulative Frequency
A running total of frequencies, where the cumulative frequency for a class is the sum of the frequency of that class and all preceding classes.
Descriptive Statistics
1.2: Stem-and-Leaf Diagram
A data display technique where values are split into a stem (all digits except the last) and a leaf (the last digit), arranged in order against a common stem column.
Descriptive Statistics
1.3: Skewness
A measure of the asymmetry of a distribution. Calculated using the formula: skewness = (mean - mode) / standard deviation.
Descriptive Statistics
1.3: Outliers
Values that are significantly different from the rest of the dataset. Formally, a value x is an outlier if x < Q₁ - 1.5×IQR or x > Q₃ + 1.5×IQR.
Descriptive Statistics
10.1: Sign Test
A non-parametric test for paired data using the binomial distribution, testing H₀: median difference = 0 by comparing the number of positive differences to B(n, 0.5).
Non-Parametric Tests
10.1: Wilcoxon Signed-Rank Test
A non-parametric test for paired data that ranks absolute differences and uses signed ranks, testing H₀: the distribution of differences is symmetric about zero.
Non-Parametric Tests
10.1: Mann-Whitney U Test
The Mann-Whitney U test compares two independent samples by ranking all observations together and calculating U = n₁n₂ + n₁(n₁+1)/2 - R₁, where R₁ is the sum of the ranks in sample 1.
Non-Parametric Tests
11.1: Bayes' Theorem
A formula for conditional probability: P(A|B) = [P(B|A) × P(A)] / P(B), allowing calculation of P(A|B) when P(B|A), P(A), and P(B) are known.
Bayes' Theorem
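A numeric sketch of the theorem for a hypothetical screening test, expanding P(B) with the law of total probability (all probabilities are illustrative):

    # Hypothetical screening-test numbers, chosen only to illustrate the formula.
    p_disease = 0.01                  # prior P(A)
    p_pos_given_disease = 0.95        # P(B|A)
    p_pos_given_healthy = 0.05        # P(B|not A)

    # Law of total probability for P(B), then Bayes' theorem for P(A|B).
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos  # ≈ 0.161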
11.1: Posterior Probability
The conditional probability P(A|B) after observing event B, calculated using Bayes' theorem as the updated belief about A in light of the evidence B.
Bayes' Theorem
11.1: Prior Probability
The probability P(A) before observing any evidence related to A, representing initial belief or background information.
Bayes' Theorem
12.1: Geometric Distribution
The geometric distribution models the number of independent Bernoulli trials X required until the first success, where each trial has the same success probability p; P(X = k) = (1-p)^(k-1) × p and E(X) = 1/p.
Geometric Distribution
12.1: Negative Binomial Distribution
The negative binomial distribution describes the number of trials X required to achieve r successes when each trial has the same success probability p; the geometric distribution is the special case r = 1.
Geometric Distribution
12.1: Continuous Uniform Distribution
The continuous uniform distribution on interval [a, b] has constant probability density f(x) = 1/(b-a). It models situations where every value in the interval is equally likely.
Geometric Distribution
12.1: Bernoulli Trial
A Bernoulli trial is a random experiment with two mutually exclusive outcomes: success (with probability p) and failure (with probability 1 - p).
Geometric Distribution
13.1: Randomisation
Randomisation is the random allocation of experimental units to different treatment conditions. It ensures that differences between groups reflect the treatments rather than systematic bias.
Experimental Design
13.1: Replication
Replication means applying each treatment to multiple independent experimental units rather than just once. This allows the variability of the results to be estimated.
Experimental Design
13.1: Control Group
A control group is a set of experimental units that receive no treatment (or a standard/placebo treatment) and serves as a baseline for comparison with the treatment groups.
Experimental Design
13.1: Blind Trial
In a blind (or single-blind) trial, participants do not know which group they're in or what treatment they're receiving.
Experimental Design
13.1: Confounding Variable
A confounding variable is an extraneous variable that is correlated with the treatment variable and affects the response, so its effect cannot be separated from the treatment effect.
Experimental Design
13.1: Replication
Replication means each treatment is applied to multiple experimental units, not just one. Replication provides: (1) an estimate of sampling variability and (2) more precise estimates of treatment effects.
Experimental Design
13.1: Double-Blind Trial
In a double-blind trial, both participants and experimenters are unaware of treatment assignments. This eliminates bias from both participant expectations and experimenter behaviour.
Experimental Design
13.2: Blocking
Blocking divides experimental units into homogeneous groups (blocks) based on some characteristic that is known to affect the response.
Experimental Design
13.2: Blocking
Blocking divides experimental units into homogeneous groups (blocks) before randomization. Treatments are randomly assigned to units within each block.
Experimental Design
13.3: Randomised Block Design
A randomised block design divides units into blocks based on variables expected to affect the response, then randomly assigns treatments within each block.
Experimental Design
13.3: Completely Randomised Design
In a completely randomised design, all units are randomly assigned to treatment groups without blocking. Each unit has an equal chance of receiving any treatment.
Experimental Design
14.1: Parameter
A parameter is a numerical property of an entire population. Parameters are typically unknown and fixed for a given population; examples include μ and σ.
Estimation
14.1: Statistic
A statistic is a numerical property of a sample, calculated from the observed data. Common statistics include x̄ (sample mean), s (sample standard deviation), and p̂ (sample proportion).
Estimation
14.1: Unbiased Estimator
A statistic is an unbiased estimator of a parameter if E(statistic) = parameter. For example, the sample mean x̄ is an unbiased estimator of μ because E(x̄) = μ.
Estimation
14.1: Standard Error
The standard error (SE) is the standard deviation of a statistic's sampling distribution. For the sample mean, SE(x̄) = σ/√n, estimated by s/√n when σ is unknown.
Estimation
14.2: Central Limit Theorem
The Central Limit Theorem states that if X₁, X₂, ..., Xₙ are independent random variables from any distribution with mean μ and finite variance σ², then for large n the sample mean X̄ is approximately N(μ, σ²/n).
Estimation
14.2: Sampling Distribution
The sampling distribution of a statistic is the probability distribution of that statistic calculated from all possible samples of a given size drawn from the population.
Estimation
14.2: Bootstrap
The bootstrap generates confidence intervals and estimates sampling distributions by repeatedly resampling with replacement from the observed data.
Estimation
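A percentile-bootstrap sketch in Python with numpy (assumed available; the data values are hypothetical):

    import numpy as np  # numpy assumed

    rng = np.random.default_rng(0)
    data = np.array([4.3, 5.1, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8])  # hypothetical sample

    # Resample with replacement many times and record the statistic of interest.
    boot_means = [rng.choice(data, size=len(data), replace=True).mean()
                  for _ in range(10000)]

    # Percentile bootstrap 95% confidence interval for the mean.
    lo, hi = np.percentile(boot_means, [2.5, 97.5])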
15.1: Hypothesis Test
A hypothesis test is a formal procedure for evaluating evidence against a null hypothesis H₀ using a test statistic and a decision rule (critical value or p-value).
Hypothesis Testing
15.1: Null Hypothesis
The null hypothesis H₀ is a statement that specifies no change, no effect, or no difference between groups. It represents the default position that the test seeks evidence against.
Hypothesis Testing
15.1: Alternative Hypothesis
The alternative hypothesis H₁ (or Hₐ) is a statement that contradicts the null hypothesis, typically asserting that an effect or difference exists.
Hypothesis Testing
15.1: p-value
The p-value is P(test statistic as extreme as observed | H₀ is true). For a two-sided test with test statistic z, p-value = 2P(Z ≥ |z|).
Hypothesis Testing
15.1: Significance Level
The significance level α is the predetermined probability threshold for making a Type I error (rejecting a true null hypothesis), commonly 0.05 or 0.01.
Hypothesis Testing
15.1: Test Statistic
A test statistic is a function of the sample data that follows a known distribution (such as t, z, or χ²) under the null hypothesis.
Hypothesis Testing
15.1: Critical Value
A critical value is a point on the distribution of the test statistic that separates the rejection region from the non-rejection region, determined by the significance level α.
Hypothesis Testing
15.1: Confidence Interval
A confidence interval for parameter θ is an interval [L, U] calculated from sample data such that P(L < θ < U) = confidence level (e.g. 0.95) under repeated sampling.
Hypothesis Testing
15.1: t-Distribution
The t-distribution is a family of distributions indexed by degrees of freedom (df). For a sample of size n, df = n - 1. It resembles the standard normal but has heavier tails, approaching N(0, 1) as df increases.
Hypothesis Testing
15.1: Degrees of Freedom
Degrees of freedom (df) is the number of independent pieces of information available for estimation or hypothesis testing; for a single sample of size n, df = n - 1.
Hypothesis Testing
15.1: Confidence Interval Interpretation
A 95% confidence interval means that if sampling and interval calculation are repeated many times, about 95% of the calculated intervals will contain the true parameter value.
Hypothesis Testing
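A sketch of a 95% t-interval for a mean in Python, with scipy assumed for the critical value and hypothetical sample data:

    import math
    from statistics import mean, stdev
    from scipy import stats  # scipy assumed

    sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0]  # hypothetical data
    n = len(sample)
    x_bar, s = mean(sample), stdev(sample)

    t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% => 2.5% in each tail
    half_width = t_crit * s / math.sqrt(n)
    ci = (x_bar - half_width, x_bar + half_width)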
15.5: Type I Error
A Type I error occurs when you reject H₀ even though H₀ is true. The probability of making a Type I error is α (the significance level).
Hypothesis Testing
15.5: Type II Error
A Type II error occurs when you fail to reject H₀ even though H₀ is false. The probability of a Type II error is denoted β; power = 1 - β.
Hypothesis Testing
15.7: One-sided Test
In a one-sided (or one-tailed) test, the alternative hypothesis is directional: H₁: θ > θ₀ (right-tailed) or H₁: θ < θ₀ (left-tailed), with the whole critical region in one tail.
Hypothesis Testing
15.7: Two-sided Test
In a two-sided (or two-tailed) test, the alternative hypothesis is non-directional: H₁: θ ≠ θ₀. The critical region is split between the two tails, each containing α/2.
Hypothesis Testing
18.1: Exponential Distribution
The exponential distribution with parameter λ > 0 has probability density function f(x) = λe^(-λx) for x ≥ 0, and cumulative distribution function F(x) = 1 - e^(-λx). The mean is 1/λ; it models waiting times between events in a Poisson process.
Poisson and Exponential Distributions
18.1: Poisson Distribution
A discrete distribution with parameter λ > 0 where P(X = k) = e^(-λ) × λ^k / k! for k = 0, 1, 2, ... The mean is E(X) = λ and the variance is also λ.
Poisson and Exponential Distributions
19.1: Goodness of Fit Test
A goodness of fit test uses χ² = Σ((O-E)²/E) to compare observed frequencies to those expected under a hypothesized distribution, with df = (number of categories) - 1 - (number of estimated parameters).
Goodness of Fit
19.1: Goodness of Fit Test
Goodness of fit tests compare observed frequencies O to expected frequencies E under a hypothesized distribution using χ² = Σ((O-E)²/E); large values indicate a poor fit.
Goodness of Fit
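A goodness-of-fit sketch for a hypothetical fair-die test, with scipy assumed:

    from scipy import stats  # scipy assumed

    # Hypothetical die-roll counts over 60 rolls; fairness gives E = 60/6 = 10 per face.
    observed = [8, 12, 9, 11, 10, 10]
    expected = [10] * 6

    chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)  # df = 6 - 1 = 5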
2.1: Addition Rule of Probability
The rule that P(A ∪ B) = P(A) + P(B) - P(A ∩ B), which gives the probability that at least one of the events A or B occurs.
Probability
2.1: Multiplication Rule of Probability
The rule that P(A ∩ B) = P(A) × P(B|A), giving the probability that both events A and B occur. For independent events, this simplifies to P(A ∩ B) = P(A) × P(B).
Probability
2.1: Set Theory in Probability
Set theory provides the formal language for probability, where the sample space S is represented as a set, events as subsets of S, and 'or' and 'and' as union and intersection.
Probability
2.1: Sample Space
The sample space S (or Ω) is the universal set containing every possible outcome of an experiment. Any event is a subset of the sample space.
Probability
2.2: Independent Events
Events A and B are independent if P(A ∩ B) = P(A) × P(B), or equivalently, if P(B|A) = P(B) and P(A|B) = P(A).
Probability
2.2: Mutually Exclusive Events
Events A and B are mutually exclusive if they cannot occur together, meaning P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).
Probability
2.2: Two-way Table
A two-way table (or contingency table precursor) presents frequencies or probabilities for two categorical variables, with row and column totals used to find joint, marginal, and conditional probabilities.
Probability
2.2: Tree Diagram
A tree diagram represents probability experiments as a series of branches, with each branch labeled with a probability.
Probability
2.4: Venn Diagram
A graphical representation of sample spaces and events using overlapping circles or regions, where the area (or size) of a region corresponds to the probability of the corresponding event.
Probability
2.4: Tree Diagram
A visual representation of sequential events where branches show possible outcomes at each stage, with conditional probabilities on the second and later stages.
Probability
2.4: Addition and Multiplication Laws
The addition law states P(A ∪ B) = P(A) + P(B) - P(A ∩ B); for mutually exclusive events, P(A ∪ B) = P(A) + P(B). The multiplication law states P(A ∩ B) = P(A) × P(B|A).
Probability
21.1: Effect Size
Effect size quantifies the strength of an effect or the magnitude of a difference between groups. Common measures include Cohen's d for differences in means and the correlation coefficient r.
Effect Size
21.1: Effect Size
Effect size quantifies the magnitude of an effect without depending on sample size. Cohen's d = (mean₁ - mean₂) / σ standardises the difference in means by the standard deviation.
Effect Size
3.1: Population
The entire collection of items, people, or observations of interest in a statistical investigation. A population can be finite or infinite.
Sampling Methods
3.1: Census
A complete count of the entire population where data is collected from every individual member. Examples include the national census.
Sampling Methods
3.1: Sample
A subset of the population selected for investigation. The sample is studied to make inferences about the population. A sample should be representative of the population.
Sampling Methods
3.1: Equally Likely Outcomes
When all outcomes in a sample space are equally likely, the probability of each outcome is 1/n, where n is the total number of outcomes.
Sampling Methods
3.2: Random Sampling
A sampling method where every member of the population has an equal probability of selection, and selections are independent of one another.
Sampling Methods
3.2: Systematic Sampling
A sampling method where the population is divided into k equal intervals and every kth member is selected after a random starting point is chosen in the first interval.
Sampling Methods
3.2: Stratified Sampling
A sampling method that divides the population into non-overlapping subgroups (strata) based on shared characteristics, then takes a random sample from each stratum in proportion to its size.
Sampling Methods
3.2: Cluster Sampling
A sampling method that divides the population into naturally occurring clusters, randomly selects some clusters, and includes every member of the selected clusters in the sample.
Sampling Methods
3.2: Quota Sampling
A non-random sampling method where the population is divided into strata and the researcher selects members from each stratum non-randomly until a set quota is filled.
Sampling Methods
3.2: Convenience Sampling
A convenience sampling method where the sample consists of individuals who are readily available and accessible, with no random selection mechanism.
Sampling Methods
3.3: Judgmental Sampling
Judgmental (purposive) sampling relies on the researcher's judgment to select a sample. Items are chosen because they're judged to be typical or informative, so the method is prone to researcher bias.
Sampling Methods
3.3: Strata
In stratified sampling, strata are non-overlapping groups within the population, each defined by a characteristic (age, gender, income, etc.); every member of the population belongs to exactly one stratum.
Sampling Methods
4.1: Random Variable
A function that assigns numerical values to the outcomes of a random experiment. Discrete random variables take specific, separate values; continuous random variables take any value in an interval.
Random Variables
4.1: Probability Distribution
For a discrete random variable, a table or formula showing P(X = x) for each possible value x. For a continuous random variable, the distribution is described by a probability density function f(x).
Random Variables
4.2: Expected Value
The mean or average value of a random variable, calculated as E(X) = ΣxP(X = x) for discrete variables or E(X) = ∫xf(x)dx for continuous variables.
Random Variables
4.2: Variance
Variance measures the spread of a random variable about its mean, calculated as Var(X) = E[(X - μ)²] = E(X²) - [E(X)]², with standard deviation √Var(X).
Random Variables
5.1: Binomial Distribution
A discrete probability distribution for the number of successes X in n independent Bernoulli trials, each with success probability p; P(X = k) = C(n, k) p^k (1-p)^(n-k).
Binomial Distribution
5.1: Binomial Conditions
A binomial distribution B(n, p) requires: (1) a fixed number n of trials, (2) each trial has two mutually exclusive outcomes (success/failure), (3) the trials are independent, and (4) the probability of success p is constant.
Binomial Distribution
5.1: Binomial Parameters
In a binomial distribution X ~ B(n, p), n is the number of independent trials and p is the probability of success in each trial; E(X) = np and Var(X) = np(1-p).
Binomial Distribution
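The pmf, mean, and variance formulas can be verified directly in Python (n and p here are arbitrary illustrative values):

    from math import comb

    n, p = 10, 0.3  # hypothetical parameters
    # P(X = k) = C(n, k) p^k (1 - p)^(n - k)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean_x = n * p              # E(X) = np
    var_x = n * p * (1 - p)     # Var(X) = np(1 - p)
    assert abs(sum(pmf) - 1) < 1e-9  # probabilities sum to 1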
6.1: Normal Distribution
A continuous probability distribution with probability density function determined by mean μ and standard deviation σ, denoted X ~ N(μ, σ²).
Normal Distribution
6.1: Normal Distribution Properties
The normal distribution N(μ, σ²) has: (1) a bell-shaped, symmetric probability density function; (2) mean, median, and mode all equal to μ; (3) approximately 68% of values within 1σ, 95% within 2σ, and 99.7% within 3σ of the mean.
Normal Distribution
6.2: Standardisation
The process of converting a normal random variable X ~ N(μ, σ²) to a standard normal random variable Z ~ N(0, 1) using the transformation Z = (X - μ)/σ.
Normal Distribution
6.2: z-score
A standardized value representing how many standard deviations a value is from the mean. For X ~ N(μ, σ²), the z-score is z = (x - μ)/σ.
Normal Distribution
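A standardisation sketch in Python, with scipy assumed for the normal CDF and hypothetical values of μ, σ, and x:

    from scipy import stats  # scipy assumed

    mu, sigma = 100, 15      # hypothetical N(mu, sigma²) parameters
    x = 130
    z = (x - mu) / sigma                 # standardise: Z ~ N(0, 1)
    p = 1 - stats.norm.cdf(z)            # P(X > 130) = P(Z > 2) ≈ 0.0228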
6.3: Normal Tables
Tables providing cumulative probabilities P(Z ≤ z) for the standard normal distribution, typically showing P(Z ≤ z) for z ≥ 0; symmetry gives P(Z ≤ -z) = 1 - P(Z ≤ z).
Normal Distribution
6.3: Inverse Normal
The inverse normal process: given a probability p, find the z-score such that P(Z ≤ z) = p. This requires reverse lookup in the normal tables or an inverse-normal function.
Normal Distribution
7.1: Scatter Diagram
A graphical display of bivariate data using points (xᵢ, yᵢ) plotted on a coordinate plane, where the pattern of points reveals the direction, form, and strength of any relationship.
Correlation and Regression
7.1: Correlation Coefficient
Pearson's product-moment correlation coefficient r = Σ[(x - x̄)(y - ȳ)] / √[Σ(x - x̄)² × Σ(y - ȳ)²]. Values range from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation).
Correlation and Regression
7.2: Pearson Correlation Coefficient
The product moment correlation coefficient (PMCC) calculated as r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² × Σ(y - ȳ)²], ranging from -1 to +1 and measuring the strength of linear association.
Correlation and Regression
7.2: Spearman's Rank Correlation Coefficient
A rank-based correlation coefficient calculated as rₛ = 1 - (6Σd² / n(n² - 1)), where d is the difference in ranks for each pair; it measures the strength of a monotonic relationship and ranges from -1 to +1.
Correlation and Regression
7.3: Least Squares Regression
A regression method that finds the line y = a + bx minimizing Σ(observed y - predicted y)², where b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - bx̄.
Correlation and Regression
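The slope and intercept formulas translate directly into code. A Python sketch over a hypothetical bivariate dataset:

    xs = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical bivariate data
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]

    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)

    # b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²  and  a = ȳ - b x̄
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar                # fitted line: ŷ = a + b x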
7.3: Interpolation
Using the regression equation to estimate y for an x-value within the range of x-values in the original data.
Correlation and Regression
7.3: Extrapolation
Using the regression equation to estimate y for an x-value outside the range of x-values in the original data. Extrapolation is unreliable because the relationship may not hold beyond the observed data.
Correlation and Regression
7.3: Residual
The vertical distance from each data point to the regression line, calculated as eᵢ = yᵢ - ŷᵢ, where ŷᵢ = a + bxᵢ is the predicted value.
Correlation and Regression
7.4: Scatter Diagram with Regression Line
A scatter diagram plots pairs of observations as points on a graph. The least squares regression line y = a + bx minimizes the sum of squared vertical distances from the points to the line.
Correlation and Regression
7.4: Residuals
For a regression line ŷ = a + bx, the residual for observation (x, y) is e = y - ŷ = y - (a + bx). The sum of the residuals from a least squares fit is zero.
Correlation and Regression
7.4: Regression Diagnostics
Regression diagnostics include: residual plots (checking for patterns, outliers, heteroscedasticity), normal probability plots of residuals, and checks for influential points.
Correlation and Regression
8.1: Expectation Algebra
Mathematical rules for expected values: E(aX+b) = aE(X)+b, E(X+Y) = E(X)+E(Y), and E(XY) = E(X)E(Y) when X and Y are independent.
Expectation Algebra
8.1: Variance Algebra
Mathematical rules for variance: Var(aX+b) = a²Var(X), Var(X+Y) = Var(X)+Var(Y) when independent, and Var(X-Y) = Var(X)+Var(Y) when independent.
Expectation Algebra
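These identities can be checked by simulation. A numpy sketch (assumed available) with arbitrary constants a and b:

    import numpy as np  # numpy assumed

    rng = np.random.default_rng(1)
    X = rng.normal(5, 2, size=1_000_000)   # independent draws
    Y = rng.normal(3, 1, size=1_000_000)

    a, b = 4, 7
    # E(aX + b) = aE(X) + b  and  Var(aX + b) = a²Var(X)
    print((a * X + b).mean(), a * X.mean() + b)
    print((a * X + b).var(), a**2 * X.var())
    # For independent X, Y: Var(X - Y) = Var(X) + Var(Y)
    print((X - Y).var(), X.var() + Y.var())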
8.4, 15.7: Acceptance Region
The acceptance region consists of test statistic values inside the critical value(s). If the test statistic falls in this region, H₀ is not rejected.
Hypothesis Testing
8.4, 15.7: Rejection Region
The rejection region consists of test statistic values beyond the critical value(s). If the test statistic falls in this region, H₀ is rejected at significance level α.
Hypothesis Testing
9.1: Chi-squared Test for Independence
A test statistic χ² = Σ[(observed - expected)² / expected] testing whether two variables in a contingency table are independent, with df = (rows - 1)(columns - 1).
Chi-Squared Tests
9.1: Contingency Table
A table displaying frequencies of observations categorized by two categorical variables, organized in rows and columns with row and column totals.
Chi-Squared Tests
9.1: Expected Frequency
For a contingency table, expected frequency for a cell = (row total × column total) / grand total, calculated under the assumption that the two variables are independent.
Chi-Squared Tests
9.1: Chi-squared Test for Independence
A hypothesis test using the test statistic χ² = Σ((O-E)²/E) where O represents observed frequencies and E represents expected frequencies under the null hypothesis.
Chi-Squared Tests
9.1: Expected Frequency
In a contingency table, the expected frequency for a cell is calculated as E = (row total × column total) / grand total.
Chi-Squared Tests
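In practice, expected frequencies and the χ² statistic are computed together. A sketch with scipy assumed and a hypothetical 2×3 table:

    import numpy as np
    from scipy import stats  # scipy assumed

    # Hypothetical 2×3 contingency table of observed counts.
    observed = np.array([[20, 30, 50],
                         [30, 40, 30]])

    chi2, p, df, expected = stats.chi2_contingency(observed)
    # expected[i, j] == row_total[i] * col_total[j] / grand_total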
9.1: Observed Frequency
In hypothesis testing with categorical data, the observed frequency is the actual number of observations that fall into each category or cell.
Chi-Squared Tests