Descriptive Statistics is the term given to the analysis of data that helps describe, show or summarize the basic features of data in a meaningful way. Descriptive statistics aim to quantitatively summarize a sample, rather than use the data to learn about the population that the sample of data represents.
Using Descriptive Statistics for Data Analysis
Descriptive Statistics form the basis of the initial quantitative description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
Some measures that are commonly used to describe a data set are:
Measures of Central Tendency
The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency:
- Mean - the average of the data set, given by the sum of all measurements divided by the number of observations in the data set.
- Median - the middle value that separates the higher half from the lower half of the data set after the data set has been arranged in ascending order.
- Mode - the most frequent value in the data set.
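As a minimal sketch of the three estimates above, the following uses Python's standard `statistics` module. The `scores` list is a made-up sample chosen only for illustration.

```python
import statistics

# Hypothetical sample, used only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # middle of the sorted values (average of the two middle ones here)
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)  # prints: 5 4.0 3
```

Because the sample has an even number of observations, the median is the average of the two middle values, 3 and 5.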
Other measures of central tendency, most of them variants of the mean, include:
- Arithmetic Mean
- Geometric mean
- Harmonic mean
- Weighted mean
- Truncated mean
- Interquartile mean
- Midrange
- Midhinge
- Trimean
- Winsorized mean
- Geometric median
- Quadratic mean (root mean square)
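A few of the measures listed above can be sketched in Python as follows. The data values and the helper names `weighted_mean` and `truncated_mean` are assumptions made for this example; `statistics.geometric_mean` and `statistics.harmonic_mean` are standard library functions (Python 3.8 or later).

```python
import math
import statistics

# Hypothetical positive measurements, used only for illustration.
values = [1.0, 2.0, 4.0, 8.0]

arithmetic = statistics.mean(values)
geometric = statistics.geometric_mean(values)   # n-th root of the product
harmonic = statistics.harmonic_mean(values)     # reciprocal of the mean of reciprocals
quadratic = math.sqrt(statistics.mean([v * v for v in values]))  # root mean square
midrange = (min(values) + max(values)) / 2      # midpoint of the extremes


def weighted_mean(data, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(x * w for x, w in zip(data, weights)) / sum(weights)


def truncated_mean(data, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    ordered = sorted(data)
    return statistics.mean(ordered[k:len(ordered) - k])


print(arithmetic, geometric, harmonic, quadratic, midrange)
print(weighted_mean([1, 2, 3], [1, 1, 2]))  # prints: 2.25
print(truncated_mean(values, 1))            # prints: 3.0
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean, which is smaller than the quadratic mean.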
Measures of Spread or Dispersion or Variability
Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other. In other words, they measure how spread out a set of data is.
- The more similar the scores are to each other, the lower the measure of dispersion will be
- The less similar the scores are to each other, the higher the measure of dispersion will be
- In general, the more spread out a distribution is, the larger the measure of dispersion will be.
Measures of dispersion that have dimensions
These measures have the same units as the quantity being measured.
- Standard deviation
- Range
- Interquartile range (IQR)
- Semi-interquartile range (SIR)
- Interdecile range (IDR)
- Mean difference
- Median absolute deviation (MAD)
- Average absolute deviation (Average deviation)
- Distance standard deviation
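Several of these measures can be computed with the standard library, as in the sketch below. The sample is hypothetical, and the quartiles come from `statistics.quantiles` with its default method; other quartile conventions can give slightly different values for the interquartile range.

```python
import statistics

# Hypothetical sample, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation (divides by n, not n - 1).
std_dev = statistics.pstdev(data)

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Median absolute deviation: median distance from the median.
med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])

# Average absolute deviation, taken here around the mean.
avg = statistics.mean(data)
aad = statistics.mean([abs(x - avg) for x in data])

print(std_dev, data_range, iqr, mad, aad)
```

All five results are expressed in the same units as the data.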
Measures of dispersion that are dimensionless
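These measures have no units even if the variable itself has units.
- Coefficient of variation
- Quartile coefficient of dispersion
- Relative mean difference
- Gini coefficient
Some related measures are not dimensionless; their units are derived from those of the variable:
- Variance (the square of the standard deviation, expressed in squared units)
- Variance-to-mean ratio (dimensionless only when the data themselves have no units, as with counts)
- Allan variance
- Hadamard variance
Kurtosis and skewness are also dimensionless, but they describe the shape of a distribution rather than its spread.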
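One way to see what "dimensionless" means in practice is to check that a measure does not change when the units of the data change. The sketch below does this for the coefficient of variation and the quartile coefficient of dispersion. The heights and both function names are assumptions made for this example.

```python
import math
import statistics


def coefficient_of_variation(data):
    """Standard deviation divided by the mean (population form)."""
    return statistics.pstdev(data) / statistics.mean(data)


def quartile_coefficient_of_dispersion(data):
    """(Q3 - Q1) / (Q3 + Q1), using the default quartile method."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / (q3 + q1)


# Hypothetical heights, first in metres and then in centimetres.
heights_m = [1.50, 1.62, 1.70, 1.75, 1.81, 1.93]
heights_cm = [h * 100 for h in heights_m]

# The standard deviation changes with the units...
print(statistics.pstdev(heights_m), statistics.pstdev(heights_cm))

# ...but the dimensionless measures agree up to rounding error.
print(coefficient_of_variation(heights_m), coefficient_of_variation(heights_cm))
print(math.isclose(quartile_coefficient_of_dispersion(heights_m),
                   quartile_coefficient_of_dispersion(heights_cm)))
```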
The Distribution or Measure of Shape
The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur. The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value.
The most common way to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories.
Frequency distributions can be depicted in two ways, as a table or as a graph.
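A frequency table can be sketched with `collections.Counter`, first listing every value and then grouping the values into intervals. The ages and the choice of 10-year intervals are hypothetical.

```python
from collections import Counter

# Hypothetical ages, used only for illustration.
ages = [21, 22, 22, 23, 25, 25, 25, 31, 34, 38]

# Ungrouped: one row per distinct value.
freq = Counter(ages)
for value in sorted(freq):
    print(value, freq[value])

# Grouped: each age is mapped to the lower bound of its 10-year interval.
grouped = Counter((age // 10) * 10 for age in ages)
for lower in sorted(grouped):
    print(f"{lower}-{lower + 9}", grouped[lower])
# prints:
# 20-29 7
# 30-39 3
```

Grouping is usually the better choice when a variable takes many distinct values, since an ungrouped table would have nearly one row per observation.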
The following statistical graphs can be used to describe a data set.
- Bar chart
- Box Plot
- Control Chart
- Histogram
- Ogive
- Pie Chart
- Scatter Plot (Scatter Diagram)
- Stem-and-leaf Plot
The following measures of shape can be used to describe a data set. Variance is often reported alongside them, although strictly speaking it measures spread rather than shape.
- Kurtosis
- Skewness
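Skewness and kurtosis can be sketched as the third and fourth standardized moments. The definitions below are the simple population (moment) forms, which is an assumption of this example; statistical packages often apply sample-size corrections or report excess kurtosis (kurtosis minus 3) instead. The function names and data are hypothetical.

```python
import statistics


def standardized_moment(data, order):
    """Mean of ((x - mean) / population standard deviation) ** order."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum(((x - m) / s) ** order for x in data) / len(data)


def skewness(data):
    """Third standardized moment: 0 for symmetric data, > 0 for a long right tail."""
    return standardized_moment(data, 3)


def kurtosis(data):
    """Fourth standardized moment: larger values indicate heavier tails."""
    return standardized_moment(data, 4)


symmetric = [1, 2, 3, 4, 5]     # hypothetical symmetric sample
right_tail = [1, 1, 1, 2, 10]   # hypothetical sample with one large value

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tail), kurtosis(right_tail))
```

The symmetric sample has a skewness of zero (up to rounding), while the sample with one large value has a positive skewness.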