Friday, 4 September 2015

Statistics Terms

Absolute Value

The absolute value of a number is its distance from zero on the number line. For example, -7 is 7 units away from zero, so its absolute value would be 7. And 7 is also 7 units away from zero, so its absolute value would also be 7.
Thus, the absolute value of a number refers to the magnitude of the number, without regard to its sign. The absolute value of -1 and 1 is 1, the absolute value of -2 and 2 is 2, the absolute value of -3 and 3 is 3, and so on.
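As a quick illustration, Python's built-in abs function returns exactly this magnitude (a minimal sketch; the values are arbitrary):
    # The absolute value strips the sign and keeps the magnitude.
    for x in (-7, 7, -2, 3):
        print(x, abs(x))   # -7 -> 7, 7 -> 7, -2 -> 2, 3 -> 3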

Accuracy

Accuracy refers to how close a sample statistic is to a population parameter. Thus, if you know that a sample mean is 99 and the true population mean is 100, you can make a statement about the sample accuracy. For example, you might say the sample mean is accurate to within 1 unit.

Alpha

With respect to estimation problems, alpha refers to the likelihood that the true population parameter lies outside the confidence interval. Alpha is usually expressed as a proportion. Thus, if the confidence level is 95%, then alpha would equal 1 - 0.95, or 0.05.
With respect to hypothesis tests, alpha refers to the significance level, the probability of making a Type I error.

Estimation

In statistics, estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample. Often, we use sample statistics (e.g., mean, proportion) to estimate population parameters (e.g., mean, proportion).
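As a toy illustration, the sketch below draws a random sample from a synthetic population and uses the sample mean to estimate the population mean (all numbers are made up for the example):
    import random

    random.seed(1)
    population = [random.gauss(100, 15) for _ in range(100000)]  # synthetic population
    sample = random.sample(population, 50)                       # simple random sample

    pop_mean = sum(population) / len(population)   # the parameter
    samp_mean = sum(sample) / len(sample)          # the statistic that estimates it
    print(f"population mean ~ {pop_mean:.2f}, sample estimate ~ {samp_mean:.2f}")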

Alternative Hypothesis

There are two types of statistical hypotheses.
  • Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.
  • Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.
For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis might be that half the flips would result in Heads and half, in Tails. The alternative hypothesis might be that the number of Heads and Tails would be very different. Symbolically, these hypotheses would be expressed as
H0: p = 0.5
Ha: p ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be inclined to reject the null hypothesis. That is, we would conclude that the coin was probably not fair and balanced.
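To make the coin example concrete, the sketch below runs an exact binomial test on the 40-heads-in-50-flips result (this assumes SciPy 1.7 or later, which provides binomtest):
    from scipy.stats import binomtest

    # H0: p = 0.5 (fair coin); Ha: p != 0.5
    result = binomtest(k=40, n=50, p=0.5, alternative="two-sided")
    print(result.pvalue)   # far below 0.05, so we reject H0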

Bias

Bias refers to the tendency of a measurement process to over- or under-estimate the value of a population parameter. In survey sampling, for example, bias would be the tendency of a sample statistic to systematically over- or under-estimate a population parameter.

Biased Estimate

When the mean of the sampling distribution of a statistic is not equal to the corresponding population parameter, that statistic is said to be a biased estimate of the parameter.
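A classic example is the variance formula that divides by n instead of n - 1: on average it comes out too small. The simulation below (a minimal sketch with arbitrary settings) shows the effect:
    import random

    random.seed(42)
    n, trials = 5, 20000          # small samples from a population with variance 1
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(0, 1) for _ in range(n)]
        m = sum(sample) / n
        total += sum((x - m) ** 2 for x in sample) / n   # biased: divides by n
    print(total / trials)   # close to 0.8 = (n - 1)/n, not the true variance 1.0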

Bimodal Distribution

Distributions of data can have few or many peaks. Distributions with one clear peak are called unimodal, and distributions with two clear peaks are called bimodal, as illustrated in the figures below.
[Figures: two histograms over the values 1-7; the unimodal histogram has a single peak, the bimodal histogram has two.]

Confidence Interval

Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.
For example, suppose a statistician conducted a survey and computed an interval estimate based on the survey data. The statistician might use a confidence level to describe the uncertainty associated with the interval estimate, describing it as a "95% confidence interval". This means that if we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter to fall within the interval estimates 95% of the time.
Confidence intervals are preferred to point estimates and to interval estimates, because only confidence intervals indicate (a) the precision of the estimate and (b) the uncertainty of the estimate.
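As an illustration, the sketch below computes a 95% confidence interval for a mean by hand, using SciPy only for the t critical value (the data are hypothetical):
    import math
    from scipy import stats

    sample = [98, 101, 97, 103, 99, 100, 102, 98]   # hypothetical measurements
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample std dev
    t_crit = stats.t.ppf(0.975, df=n - 1)           # two-sided, 95% level
    margin = t_crit * s / math.sqrt(n)
    print(f"95% CI: ({mean - margin:.2f}, {mean + margin:.2f})")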

Degrees of Freedom

The number of degrees of freedom generally refers to the number of independent observations in a sample minus the number of population parameters that must be estimated from sample data.
For example, the exact shape of a t distribution is determined by its degrees of freedom. When the t distribution is used to compute a confidence interval for a mean score, one population parameter (the mean) is estimated from sample data. Therefore, the number of degrees of freedom is equal to the sample size minus one.
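One way to see the role of degrees of freedom: the t critical value used for a confidence interval shrinks toward the normal value of about 1.96 as the degrees of freedom grow (a small sketch, assuming SciPy is available):
    from scipy import stats

    for df in (5, 30, 1000):
        # 97.5th percentile of the t distribution with df degrees of freedom
        print(df, round(stats.t.ppf(0.975, df), 3))   # 2.571, 2.042, 1.962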

Normal Distribution

The normal distribution is a continuous probability distribution for a normal random variable X. It is defined by the following equation:
Normal equation. The probability density Y at a value x is:
Y = { 1 / [ σ * sqrt(2π) ] } * e^( -(x - μ)² / (2σ²) )
where X is a normal random variable, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.
The graph of the normal distribution depends on two factors: the mean and the standard deviation. The mean determines the location of the center of the graph, and the standard deviation determines the height and spread of the graph. When the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow. All normal distributions look like a symmetric, bell-shaped curve, as shown below.
[Figure: two normal curves; the shorter, wider curve is the one with the bigger standard deviation.]
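The equation above translates directly into code; here is a minimal sketch of the density function:
    import math

    def normal_pdf(x, mu=0.0, sigma=1.0):
        # The normal density equation from above.
        coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

    print(normal_pdf(0.0))            # ~0.3989, peak of the standard normal
    print(normal_pdf(115, 100, 15))   # density one standard deviation above the mean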

Range

The range is a simple measure of variation in a set of random variables. It is the difference between the largest and smallest values.
Range = Maximum value - Minimum value
Therefore, the range of the four random variables {3, 5, 5, 7} would be 7 - 3, or 4.
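In code this is a one-liner (a quick sketch using the example values above):
    values = [3, 5, 5, 7]
    print(max(values) - min(values))   # 7 - 3 = 4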

t-Test

A t-test is any hypothesis test in which the test statistic follows Student's t distribution if the null hypothesis is true. Some common t-tests are listed below; a short code sketch of the first follows the list.
  • One-sample t-test. Used to determine whether a hypothesized population mean differs significantly from an observed sample mean. See one-sample t-test example.
  • Two-sample t-test. Used to determine whether the difference between sample means differs significantly from the hypothesized difference between population means. See two-sample t-test example.
  • Matched pairs t-test. Used to test the significance of the difference between paired means. See matched pairs t-test example.
  • Linear regression t-test. Used in simple linear regression to determine whether the slope of the regression line differs significantly from zero. See linear regression t-test example.
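For instance, a one-sample t-test can be run with SciPy (a minimal sketch; the data and hypothesized mean are made up):
    from scipy import stats

    sample = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]   # hypothetical measurements
    # H0: the population mean is 5.5
    t_stat, p_value = stats.ttest_1samp(sample, popmean=5.5)
    print(t_stat, p_value)   # a small p-value would argue against H0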

Variance

The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is big, and vice versa.
It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently. The variance of a population is denoted by σ2; and the variance of a sample, by s2.
The variance of a population is defined by the following formula:
σ² = Σ ( Xi - μ )² / N
where σ² is the population variance, μ is the population mean, Xi is the ith element from the population, and N is the number of elements in the population.
The variance of a sample is defined by a slightly different formula:
s² = Σ ( xi - x̄ )² / ( n - 1 )
where s² is the sample variance, x̄ is the sample mean, xi is the ith element from the sample, and n is the number of elements in the sample. Using this formula, the variance of the sample is an unbiased estimate of the variance of the population.
And finally, the variance is equal to the square of the standard deviation.
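The two formulas differ only in the divisor; the sketch below computes both for a small hypothetical data set and checks them against Python's standard-library statistics module:
    import statistics

    data = [3, 5, 5, 7]
    mean = sum(data) / len(data)
    ss = sum((x - mean) ** 2 for x in data)      # sum of squared deviations

    pop_var = ss / len(data)          # σ²: divide by N
    samp_var = ss / (len(data) - 1)   # s²: divide by n - 1

    print(pop_var, statistics.pvariance(data))   # 2.0 2.0
    print(samp_var, statistics.variance(data))   # 2.666... 2.666...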

Reference

http://stattrek.com/statistics/dictionary.aspx?definition=Degrees_of_freedom
