Saturday, January 10, 2015

Data Analysis - Descriptive Statistics

Statistics in data analysis falls into two equally important branches:

Descriptive statistics is the analysis of data that describes, shows, or summarizes the basic features of the data in a meaningful way. It aims to summarize a sample quantitatively, rather than to use the data to learn about the population that the sample represents.

Inferential statistics is the analysis of data that involves making predictions or inferences about a population from observations and analyses of a sample.

Using Descriptive Statistics for Data Analysis

Descriptive Statistics form the basis of the initial quantitative description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
Some measures that are commonly used to describe a data set are:


Measures of Central Tendency


The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency:
  • Mean - the average of the data set, given by the sum of all measurements divided by the number of observations in the data set.
  • Median - the middle value that separates the higher half from the lower half of the data set once it has been arranged in ascending order. With an even number of observations, the median is the average of the two middle values.
  • Mode - the most frequent value in the data set.
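
As a quick illustration of the three estimates, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for this example and do not come from any real data set.

```python
import statistics

# Hypothetical sample, chosen only to illustrate the three estimates.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of values / number of values
median_value = statistics.median(data)  # average of the two middle values (even count)
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```

Here the three estimates differ (5, 4 and 3), which is typical when the data are not symmetric.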

The following measures of central tendency are variants of the mean or closely related estimates of location:

  1. Arithmetic mean
  2. Geometric mean
  3. Harmonic mean
  4. Weighted mean
  5. Truncated mean
  6. Interquartile mean
  7. Midrange
  8. Midhinge
  9. Trimean
  10. Winsorized mean
  11. Geometric median
  12. Quadratic mean (root mean square)
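
A few of the measures in this list can be sketched in Python as follows. The data, the weights and the helper `truncated_mean` are hypothetical and exist only for this illustration. `statistics.geometric_mean` requires Python 3.8 or later.

```python
import math
import statistics

# Hypothetical positive sample and weights, for illustration only.
data = [1, 2, 4, 8]
weights = [4, 3, 2, 1]

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)  # n-th root of the product
harmonic = statistics.harmonic_mean(data)    # n / sum of reciprocals
quadratic = math.sqrt(sum(x * x for x in data) / len(data))  # root mean square
midrange = (min(data) + max(data)) / 2

# Weighted mean: each value counts in proportion to its weight.
weighted = sum(w * x for w, x in zip(weights, data)) / sum(weights)


def truncated_mean(values, k):
    """Mean after dropping the k smallest and k largest values (illustrative helper)."""
    ordered = sorted(values)
    return statistics.mean(ordered[k:len(ordered) - k])


trimmed = truncated_mean(data, 1)

print(arithmetic, geometric, harmonic, quadratic, midrange, weighted, trimmed)
```

For positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, which in turn is never larger than the quadratic mean.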


Measures of Spread or Dispersion or Variability


Measures of dispersion are descriptive statistics that describe how spread out a set of data is, that is, how similar the scores are to each other.
  • The more similar the scores are to each other, the lower the measure of dispersion will be 
  • The less similar the scores are to each other, the higher the measure of dispersion will be 
  • In general, the more spread out a distribution is, the larger the measure of dispersion will be.
A measure of statistical dispersion is a non-negative real number that is zero if all the data are the same and increases as the data become more diverse.

Measures of dispersion that have dimensions


These measures have the same units as the quantity being measured.

  • Standard deviation
  • Range
  • Interquartile range (IQR)
  • Semi-interquartile range (SIR)
  • Interdecile range (IDR)
  • Mean difference
  • Median absolute deviation (MAD)
  • Average absolute deviation (Average deviation)
  • Distance standard deviation
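
Several of these measures can be computed with the standard library, as in the sketch below. The sample is hypothetical. Quartile conventions differ between tools, so the interquartile range from `statistics.quantiles` (default "exclusive" method) may not match other software exactly.

```python
import statistics

# Hypothetical sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_sd = statistics.pstdev(data)     # population standard deviation
sample_sd = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
value_range = max(data) - min(data)  # largest value minus smallest value

# Quartiles; the result depends on the interpolation method used.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
semi_iqr = iqr / 2

# Median absolute deviation: median distance from the median.
med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])

# Average absolute deviation about the mean.
mean_value = statistics.mean(data)
avg_abs_dev = sum(abs(x - mean_value) for x in data) / len(data)

print(pop_sd, sample_sd, value_range, iqr, semi_iqr, mad, avg_abs_dev)
```

Every result is in the same units as the data, so measuring the data in different units would rescale each of them.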

Measures of dispersion that are dimensionless


These measures have no units even if the variable itself has units.

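  • Coefficient of variation
  • Quartile coefficient of dispersion
  • Relative mean difference
  • Gini coefficient
  • Variance-to-mean ratio (dimensionless only for count data, since counts have no units)

Other measures of dispersion:

  • Variance (the square of the standard deviation, so its units are the square of the variable's units)
  • Allan variance
  • Hadamard variance

Kurtosis and skewness are also dimensionless, but they describe the shape of a distribution rather than its spread.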
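
To see what "no units" means in practice, the sketch below computes the coefficient of variation and the quartile coefficient of dispersion for the same hypothetical heights in centimetres and in metres. The data are invented for this example. The standard deviation changes with the unit, while the two ratios do not.

```python
import math
import statistics

# Hypothetical heights, first in centimetres, then converted to metres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_m = [h / 100 for h in heights_cm]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    return statistics.pstdev(values) / statistics.mean(values)


def quartile_coefficient(values):
    """(Q3 - Q1) / (Q3 + Q1), using the default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / (q3 + q1)


sd_cm = statistics.pstdev(heights_cm)
sd_m = statistics.pstdev(heights_m)
cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
qcd_cm = quartile_coefficient(heights_cm)
qcd_m = quartile_coefficient(heights_m)

print(sd_cm, sd_m)    # differ by a factor of 100
print(cv_cm, cv_m)    # equal up to floating-point rounding
print(qcd_cm, qcd_m)  # equal up to floating-point rounding
```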


The Distribution or Measure of Shape

The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. In other words, it summarizes the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of observations that took each value.


The most common way to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories.

Frequency distributions can be depicted in two ways, as a table or as a graph.
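
As a small sketch of both forms, the code below builds a frequency table with `collections.Counter` and then groups the values into 10-year intervals, printing a crude text bar for each group. The ages and the interval width are assumptions made for this example.

```python
from collections import Counter

# Hypothetical ages, for illustration only.
ages = [23, 25, 25, 31, 34, 34, 34, 42, 47, 58]

# Ungrouped frequency table: every distinct value and its count.
freq = Counter(ages)
for value in sorted(freq):
    print(value, freq[value])

# Grouped frequency table: values binned into 10-year intervals.
grouped = Counter((age // 10) * 10 for age in ages)
for lower in sorted(grouped):
    count = grouped[lower]
    # One '#' per observation gives a rough text histogram.
    print(f"{lower}-{lower + 9}: {count} {'#' * count}")
```

Grouping loses the individual values but makes the overall shape easier to see when there are many distinct values.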

The following Statistical graphs can be used to describe a data set. 

  • Bar chart
  • Box Plot
  • Control Chart
  • Histogram
  • Ogive
  • Pie Chart
  • Scatter Plot (Scatter Diagram)
  • Stem-and-leaf Plot

The following measures of shape can be used to describe a data set.

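  • Variance (strictly a measure of spread; it is the second central moment, from which skewness and kurtosis are standardized)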
  • Kurtosis
  • Skewness
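
Skewness and kurtosis can be computed from central moments, as in the minimal sketch below. It uses the population (biased) definitions, which is an assumption: statistical packages often apply small-sample corrections or report excess kurtosis (kurtosis minus 3) instead. The helper names and sample data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean_value = sum(values) / len(values)
    return sum((x - mean_value) ** k for x in values) / len(values)


def skewness(values):
    """Third central moment divided by the standard deviation cubed."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth central moment divided by the variance squared."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Hypothetical samples, for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_skewed))  # positive: long tail on the right
print(kurtosis(symmetric))
```

A normal distribution has a kurtosis of 3 under this definition, so values below 3 indicate lighter tails and values above 3 indicate heavier tails.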

