Data4DecisionMaking: Data Analysis - Descriptive Statistics

Any data analysis draws on two equally important branches of statistics:

Descriptive Statistics is the term given to the analysis of data that helps describe, show or summarize the basic features of data in a meaningful way. Descriptive statistics aim to quantitatively summarize a sample, rather than use the data to learn about the population that the sample of data represents.

Inferential statistics is the analysis of data that involves making predictions or inferences about a population from observations and analyses of a sample.

Using Descriptive Statistics for Data Analysis

Descriptive Statistics form the basis of the initial quantitative description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.

Some measures that are commonly used to describe a data set are:

Measures of Central Tendency

The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency:

Mean - the average of the data set, given by the sum of all measurements divided by the number of observations in the data set.
Median - the middle value that separates the higher half from the lower half of the data set after the data set has been arranged in ascending order.
Mode - the most frequent value in the data set.
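The three estimates above can be sketched with Python's standard `statistics` module. The `scores` list is a made-up sample used only for illustration.

```python
import statistics

# Hypothetical sample, chosen only to illustrate the three estimates.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of values / number of values = 30 / 6
median = statistics.median(scores)  # even count: average of the two middle values (3 and 5)
mode = statistics.mode(scores)      # the value that occurs most often

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Note that the three estimates differ here (5, 4 and 3): the single large value 10 pulls the mean upward, while the median and mode are unaffected by it.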

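Beyond these three, the following measures of central tendency are also used. Most are variants of the mean; a few (midrange, midhinge, trimean and geometric median) are built from extremes, quartiles or medians instead: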

Arithmetic Mean

Geometric mean

Harmonic mean

Weighted mean

Truncated mean

Interquartile mean

Midrange

Midhinge

Trimean

Winsorized mean

Geometric median

Quadratic mean (root mean square)
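As a rough illustration, several of the measures listed above can be computed in a few lines of Python. The `values` and `weights` lists are hypothetical, and the sketch assumes strictly positive data, which the geometric and harmonic means require.

```python
import math
import statistics

# Hypothetical positive sample and weights, for illustration only.
values = [1.0, 2.0, 4.0]
weights = [1.0, 1.0, 2.0]

arithmetic = statistics.fmean(values)           # (1 + 2 + 4) / 3
geometric = statistics.geometric_mean(values)   # cube root of 1 * 2 * 4
harmonic = statistics.harmonic_mean(values)     # 3 / (1/1 + 1/2 + 1/4)
quadratic = math.sqrt(statistics.fmean([v * v for v in values]))  # root mean square

# Weighted mean: each value counts in proportion to its weight.
weighted = sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Midrange: halfway between the smallest and largest value.
midrange = (min(values) + max(values)) / 2

for name, result in [
    ("arithmetic", arithmetic),
    ("geometric", geometric),
    ("harmonic", harmonic),
    ("quadratic", quadratic),
    ("weighted", weighted),
    ("midrange", midrange),
]:
    print(f"{name:>10}: {result:.4f}")
```

For positive data the harmonic, geometric, arithmetic and quadratic means always come out in that increasing order, which is a quick sanity check on any implementation.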

Measures of Spread or Dispersion or Variability

Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other; in other words, they measure how spread out a set of data is.

The more similar the scores are to each other, the lower the measure of dispersion will be.
The less similar the scores are to each other, the higher the measure of dispersion will be.
In general, the more spread out a distribution is, the larger the measure of dispersion will be.

A measure of statistical dispersion is a non-negative real number that is zero if all the data are the same and increases as the data become more diverse.

Measures of dispersion that have dimensions

These measures have the same units as the quantity being measured.

Standard deviation

Range

Interquartile range (IQR)

Semi-interquartile range (SIR)

Interdecile range (IDR)

Mean difference

Median absolute deviation (MAD)

Average absolute deviation (Average deviation)

Distance standard deviation
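A minimal sketch of a few of these measures, using Python's standard `statistics` module. The `data` list is hypothetical, the standard deviation shown is the population form (`pstdev`), and the quartiles use the module's default method; other quartile conventions give slightly different IQR values.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)          # largest minus smallest value
stdev = statistics.pstdev(data)              # population standard deviation

q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                # interquartile range
semi_iqr = iqr / 2                           # semi-interquartile range

# Median absolute deviation: median distance of each value from the median.
center = statistics.median(data)
mad = statistics.median([abs(x - center) for x in data])

# A data set with no variation has zero dispersion.
constant_stdev = statistics.pstdev([3, 3, 3, 3])

print("range:", value_range)
print("standard deviation:", stdev)
print("IQR:", iqr, "semi-IQR:", semi_iqr)
print("MAD:", mad)
print("standard deviation of constant data:", constant_stdev)
```

Every result here is in the same units as `data`, and the constant sample shows the defining property stated earlier: dispersion is zero when all values are identical.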

Measures of dispersion that are dimensionless

These measures have no units even if the variable itself has units.

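Coefficient of variation

Quartile coefficient of dispersion

Relative mean difference

Gini coefficient

Other measures of dispersion

These measures generally carry units of their own rather than being dimensionless; the variance, for example, is expressed in the square of the variable's units.

Variance (the square of the standard deviation)

Variance-to-mean ratio

Allan variance

Hadamard variance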
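To see what "dimensionless" means in practice, the sketch below computes two of the measures named above, the coefficient of variation and the quartile coefficient of dispersion, for the same hypothetical heights expressed in metres and in centimetres. The helper function names are illustrative, not part of any library.

```python
import math
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (assumes a non-zero mean)."""
    return statistics.stdev(data) / statistics.mean(data)

def quartile_coefficient_of_dispersion(data):
    """(Q3 - Q1) / (Q3 + Q1), using the statistics module's default quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / (q3 + q1)

# Hypothetical heights, first in metres, then the same values in centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
heights_cm = [h * 100 for h in heights_m]

print("stdev (m): ", statistics.stdev(heights_m))
print("stdev (cm):", statistics.stdev(heights_cm))
print("CV (m): ", coefficient_of_variation(heights_m))
print("CV (cm):", coefficient_of_variation(heights_cm))
print("QCD (m): ", quartile_coefficient_of_dispersion(heights_m))
print("QCD (cm):", quartile_coefficient_of_dispersion(heights_cm))
```

The standard deviation changes by a factor of 100 with the change of units, while both ratios stay the same up to floating-point rounding, which is what makes them useful for comparing variables measured on different scales.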

The Distribution or Measure of Shape

The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of observations that take each value.

The most common way to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories.

Frequency distributions can be depicted in two ways, as a table or as a graph.
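As a small sketch, a frequency distribution can be built as a table with `collections.Counter`, either value by value or grouped into categories, and then drawn as a crude text graph. The `ages` data, the bin width of 10 and the `bin_label` helper are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical ages, for illustration only.
ages = [23, 25, 25, 31, 34, 34, 34, 41, 47, 52]

# Ungrouped frequency table: every distinct value and how often it occurs.
frequency = Counter(ages)

def bin_label(value, width=10):
    """Return the label of the fixed-width interval that contains value."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

# Grouped frequency table: counts per 10-year interval.
grouped = Counter(bin_label(age) for age in ages)

print("Value | Count")
for value in sorted(frequency):
    print(f"{value:>5} | {frequency[value]}")

print()
print("Interval | Frequency")
for label in sorted(grouped):
    # One asterisk per observation gives a simple text histogram.
    print(f"{label:>8} | {'*' * grouped[label]}")
```

Grouping trades detail for readability: the ungrouped table has seven rows for ten observations, while the grouped one shows the overall shape in four.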

The following statistical graphs can be used to describe a data set:

Bar chart

Box Plot

Control Chart

Histogram

Ogive

Pie Chart

Scatter Plot (Scatter Diagram)

Stem-and-leaf Plot

The following measures of shape can be used to describe a data set.

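Variance (the second central moment, which measures spread and is used to standardize the two shape measures below)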

Kurtosis

Skewness
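Skewness and kurtosis can be sketched from central moments as below. This uses the simple population (moment-based) definitions and reports excess kurtosis (kurtosis minus 3, so a normal distribution scores 0); statistical packages often apply sample-size corrections, so their results may differ slightly. The function names and both samples are illustrative.

```python
import statistics

def central_moment(data, k):
    """Average of (x - mean) ** k over the data (population form)."""
    mu = statistics.fmean(data)
    return statistics.fmean([(x - mu) ** k for x in data])

def skewness(data):
    """Third central moment divided by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment divided by the squared variance, minus 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print("skewness (symmetric):   ", skewness(symmetric))
print("skewness (right-skewed):", skewness(right_skewed))
print("excess kurtosis (symmetric):   ", excess_kurtosis(symmetric))
print("excess kurtosis (right-skewed):", excess_kurtosis(right_skewed))
```

A skewness near zero indicates a roughly symmetric distribution, while a positive value indicates a longer tail on the right. Both measures are divided by a power of the variance, which makes them unit-free.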


Saturday, January 10, 2015
