How Do I Use Mean, Median, Mode, and Standard Deviation?
Introduction to Descriptive Statistics in the Earth Sciences

This module is available for public use, but it is undergoing revision after classroom implementation with the Math You Need project.

An Introduction to Calculating Mean, Median, Mode, and Standard Deviation

Geoscientists collect tremendous amounts of data to describe, measure, and monitor the natural environment. It is often necessary to summarize values rather than list long strings of data. For example, if you are planning a river rafting trip to investigate sandbars, you would look up the typical river flow near the sandbars of interest for a certain week instead of looking at a huge list of the flow levels measured every fifteen minutes all year long. Summary data are also needed to compare datasets to each other; for example, the number of landslides on rainy days versus the number of landslides on dry days. In each case, many individual measurements need to be condensed into a few representative values. These summary values are called descriptive statistics.

What Are Mean, Median, Mode, and Standard Deviation?

There are several commonly used parameters to describe data that are the backbone of descriptive statistics:

  • Mean is the average of the values in a numerical dataset. You may also see it written as `barx`.
  • Median is the midpoint of the values in a numerical dataset. Half of the values in the data set will fall below the median, half will be above it.
  • Mode is the most commonly occurring value in the dataset. If all values are unique, there is no mode. There can be more than one mode in a dataset (a dataset with two modes is called "bimodal"). Mode is the only one of the three that can be used for numerical or categorical data.

There are also several ways to describe the variability of the data or "measure of spread". Here we will primarily work with:

  • Standard deviation (s) describes how much variation there is around the mean (not the median or mode). A low standard deviation means the data are mostly close to the mean, while a high standard deviation means the data are spread out. Note that s is not the same as the full range (minimum to maximum) of the data.

Other descriptive statistics terms you might run into:
Minimum - lowest value in a data set
Maximum - highest value in a data set
Range - difference between the minimum and maximum
Data set size is often referred to as "n" - "We measured the stream width in 15 places, so n=15 for our study."

How Do I Calculate Mean, Median, Mode, and Standard Deviation?

Here are steps to take to calculate the mean, median, mode, and standard deviation. For these calculations, we will use minimum January temperatures in Bozeman, Montana, for the years 2016–2020.

Year Min. Jan. Temp. (°F)
2020 21.6
2019 15.0
2018 18.8
2017 6.5
2016 16.4

Part 1: Calculate mean:

To calculate the mean, add all the data points together and divide the total by the number of data points.
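
As a minimal sketch, the same calculation can be written in Python using the Bozeman temperatures from the table above. The variable names (`temps_f`, `mean_temp`) are chosen here for illustration only.

```python
# Minimum January temperatures (°F) in Bozeman, Montana, 2016-2020,
# copied from the table above.
temps_f = [21.6, 15.0, 18.8, 6.5, 16.4]

total = sum(temps_f)    # add all the data points together
n = len(temps_f)        # number of data points
mean_temp = total / n   # divide the total by the number of data points

print(f"sum = {total:.1f}, n = {n}, mean = {mean_temp:.2f} °F")
# prints: sum = 78.3, n = 5, mean = 15.66 °F
```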

Part 2: Calculate standard deviation

Calculating standard deviation by hand is more involved than calculating the mean, median, or mode. It is often more efficient to calculate this statistic using Excel or Google Sheets, but it is important to understand the math behind it. At its heart, it involves taking the difference between each data value and the mean, squaring each difference, adding the squares together, dividing by one less than the number of data points (n - 1), and taking the square root of the result.

standard deviation `= sqrt(((value1-mean)^2+(value2-mean)^2+(value3-mean)^2+etc)/(n - 1))`
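
The steps in the formula can be sketched in Python with the Bozeman temperatures. This is one way to lay out the arithmetic, not the only one; the variable names are illustrative. It computes the sample standard deviation (dividing by n - 1), which is also what Python's built-in `statistics.stdev` returns.

```python
import math

# Minimum January temperatures (°F) in Bozeman, Montana, 2016-2020.
temps_f = [21.6, 15.0, 18.8, 6.5, 16.4]

n = len(temps_f)
mean_temp = sum(temps_f) / n

# Step 1: difference between each value and the mean, squared.
squared_diffs = [(t - mean_temp) ** 2 for t in temps_f]

# Step 2: add the squared differences and divide by (n - 1).
variance = sum(squared_diffs) / (n - 1)

# Step 3: take the square root.
std_dev = math.sqrt(variance)

print(f"mean = {mean_temp:.2f} °F, s = {std_dev:.2f} °F")
# prints: mean = 15.66 °F, s = 5.70 °F
```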

Part 3: Calculate median:

To calculate the median, put the data values in order from smallest to greatest. If there is an odd number of data points, find the value located in the middle of the data points. If there is an even number of data points, take the mean (average) of the two middle data points.
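
A short Python sketch of these steps follows. The helper name `median` is defined here for illustration; the even-sized case uses only the first four temperatures so that both branches are shown.

```python
def median(values):
    """Return the middle value of the data, following the steps above."""
    ordered = sorted(values)          # smallest to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle value
        return ordered[mid]
    # even count: take the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Minimum January temperatures (°F) in Bozeman, Montana, 2016-2020.
temps_f = [21.6, 15.0, 18.8, 6.5, 16.4]

median_odd = median(temps_f)       # ordered: 6.5, 15.0, 16.4, 18.8, 21.6
median_even = median(temps_f[:4])  # ordered: 6.5, 15.0, 18.8, 21.6

print(f"median of all five values = {median_odd:.1f} °F")
print(f"median of the first four  = {median_even:.1f} °F")
# prints 16.4 °F and 16.9 °F
```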

Part 4: Calculate mode:

To find the mode, put your data points in order from smallest to largest and identify the value that appears most often. This is the mode. It is possible that there is no mode (if all values are unique). There may also be multiple modes if several values are tied for the most repeats, so make sure to look carefully at the ordered data. If your data are not numerical but categorical (e.g., soil type, atmospheric pollutant type, sand dune type), provide a numeric value (e.g., 1–5) for each type, and calculate the mode according to the instructions above.
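
The counting step can be sketched in Python as follows. The helper `modes` and the soil-type list are hypothetical, made up for illustration. When counting in code, category labels can be tallied directly, so the numeric coding described above is optional here.

```python
from collections import Counter

def modes(values):
    """Return a list of the most common value(s); empty if every value is unique."""
    if not values:
        return []
    counts = Counter(values)              # tally how often each value appears
    highest = max(counts.values())
    if highest == 1:                      # all values unique: no mode
        return []
    return [v for v, c in counts.items() if c == highest]

# Minimum January temperatures (°F) in Bozeman, Montana, 2016-2020.
temps_f = [21.6, 15.0, 18.8, 6.5, 16.4]
temp_modes = modes(temps_f)               # every value is unique

# Hypothetical categorical data, invented for this example.
soil_types = ["loam", "clay", "sand", "clay", "loam", "clay"]
soil_modes = modes(soil_types)

print(temp_modes)   # prints: []
print(soil_modes)   # prints: ['clay']
```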

When Do I Use Mean, Median, Mode, and Standard Deviation?

Mean is the most commonly used measure of the center (or average) of a data set. Most people know it as an "average." It is most useful when you do not have large outliers, which are values that are very far from the mean.

Median is often used for datasets that are not normally distributed (for example, datasets that are skewed or have large outliers). The median is not as strongly influenced by particularly large or small values in the dataset. It can be helpful to look at the distribution (shape) of the data to check if it is normally distributed or if there are outliers. To learn about distributions and practice working with histograms, see How Do I Interpret and Create Histograms.
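
To see how an outlier affects the two measures differently, here is a small Python sketch. It uses the Bozeman temperatures from earlier in this module plus one hypothetical extreme value (-40.0 °F) that is invented for illustration and is not part of the real record.

```python
import statistics

# Minimum January temperatures (°F) in Bozeman, Montana, 2016-2020.
temps_f = [21.6, 15.0, 18.8, 6.5, 16.4]

# Same data with one HYPOTHETICAL outlier added (not a real measurement).
temps_with_outlier = temps_f + [-40.0]

mean_before = statistics.mean(temps_f)
median_before = statistics.median(temps_f)
mean_after = statistics.mean(temps_with_outlier)
median_after = statistics.median(temps_with_outlier)

print(f"without outlier: mean = {mean_before:.2f}, median = {median_before:.2f}")
print(f"with outlier:    mean = {mean_after:.2f}, median = {median_after:.2f}")
# The mean drops by more than 9 °F, while the median changes by less than 1 °F.
```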

Mode is used when you need the most common value(s) in the dataset. The mode is particularly appropriate when there are a lot of repeated values in a dataset, such as when there is a binary (yes/no, present/absent, high/low, or 0/1 type of response), or when the data are not numerical but categorical. If the data are categorical, it may help to code the category types with a numerical value (for example, yes = 1 and no = 0).

Standard deviation can be used with the mean of a data set. It is a measure of how large the variability is around the single mean value. In a perfectly normal distribution (a "bell curve"), 68.3% of the values will be within 1 standard deviation on either side of the mean. The standard deviation is not the full range (minimum to maximum) of the data. When a median is reported, a different measure of variability, such as the Interquartile Range, is appropriate.
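
The 68.3% figure can be checked with a short Python sketch. It assumes an idealized standard normal distribution (mean 0, standard deviation 1) and uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later).

```python
from statistics import NormalDist

# An idealized bell curve with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Fraction of values between 1 standard deviation below and above the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"{within_one_sd * 100:.1f}% of values lie within 1 standard deviation")
# prints: 68.3% of values lie within 1 standard deviation
```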

Example problem: Summarizing ocean depth data

Ocean depth is needed to calculate the speed at which a tsunami travels across the ocean. The speed of a tsunami is greater when the water is deeper. In this problem, you will work toward finding the depth value.

Problem: A megathrust M 9.1 earthquake strikes off Kodiak, Alaska, and causes a massive displacement of water, generating a tsunami. To calculate the arrival time of the tsunami in Hilo, Hawaii, which is 4090 km away, one first needs to know the speed at which the tsunami travels across the ocean, and that depends on the mean depth of the ocean.

From Google Earth, you can obtain values of ocean depth between Kodiak and Hilo. Marking 20 evenly spaced values of ocean depth between the two locations, you find the following depths:

-4531m, -4607m, -4927m, -5051m, -5286m, -5276m, -4899m, -5609m, -5475m, -5255m, -5582m, -5564m, -5407m, -5525m, -5444m, -5525m, -5630m, -4662m, -4992m, and -1879m

 

a) Find the mean, standard deviation, median, and mode of the water depth measurements.
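
One way to check a hand calculation for part (a) is with Python's standard `statistics` module, sketched below. The list `depths_m` is copied from the 20 values above; the other variable names are illustrative. `statistics.stdev` divides by n - 1, matching the formula used in this module, and `statistics.multimode` (Python 3.8 and later) returns every value tied for most common.

```python
import statistics

# The 20 ocean depths (m) between Kodiak and Hilo listed above.
depths_m = [-4531, -4607, -4927, -5051, -5286, -5276, -4899, -5609, -5475,
            -5255, -5582, -5564, -5407, -5525, -5444, -5525, -5630, -4662,
            -4992, -1879]

mean_depth = statistics.mean(depths_m)
std_depth = statistics.stdev(depths_m)       # sample standard deviation (n - 1)
median_depth = statistics.median(depths_m)
mode_depths = statistics.multimode(depths_m)

print(f"n = {len(depths_m)}")
print(f"mean = {mean_depth:.1f} m")          # -5056.3 m
print(f"standard deviation = {std_depth:.1f} m")
print(f"median = {median_depth:.1f} m")      # -5281.0 m
print(f"mode(s) = {mode_depths}")            # [-5525]
```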

b) Compare the usefulness of mean, median, and mode in this case:

 

c) How long will it take the tsunami to get to Hilo, Hawaii, from Kodiak, Alaska?

Now that you know the mean ocean depth, you can calculate the speed at which the tsunami is traveling. The velocity of a tsunami is `V= sqrt(9.8 m/s^2 xx abs(D))` where `abs(D)` is the absolute value of the mean ocean depth.
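
The calculation can be sketched in Python as follows. This is a rough estimate under the assumptions stated in the problem: the tsunami travels the full 4090 km at a single speed set by the mean of the 20 listed depths. Variable names are illustrative.

```python
import math

# The 20 ocean depths (m) between Kodiak and Hilo from the problem statement.
depths_m = [-4531, -4607, -4927, -5051, -5286, -5276, -4899, -5609, -5475,
            -5255, -5582, -5564, -5407, -5525, -5444, -5525, -5630, -4662,
            -4992, -1879]

G = 9.8                      # gravitational acceleration, m/s^2
distance_m = 4090 * 1000     # Kodiak to Hilo, 4090 km converted to meters

mean_depth_m = sum(depths_m) / len(depths_m)

# V = sqrt(9.8 m/s^2 x |D|), with D the mean ocean depth.
speed_m_per_s = math.sqrt(G * abs(mean_depth_m))

# time = distance / speed, then convert seconds to hours.
travel_time_h = distance_m / speed_m_per_s / 3600

print(f"mean depth  = {mean_depth_m:.1f} m")
print(f"speed       = {speed_m_per_s:.1f} m/s")
print(f"travel time = {travel_time_h:.1f} hours")
# prints roughly: -5056.3 m, 222.6 m/s, 5.1 hours
```

Swapping the median depth in for the mean is a one-line change, which makes it easy to see how much the choice of summary statistic affects the estimated arrival time.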

Where Do You Calculate Mean, Median, Mode, and Standard Deviation in Earth Science?

These calculations are used in virtually every area of Earth science, including

  • Volcanology
  • Oceanography
  • Meteorology and atmospheric sciences
  • Planetary geology
  • Sedimentology

Next Steps

I am ready to PRACTICE!

If you think you have a handle on the steps above, click on this bar to try practice problems with worked answers.
Or, if you want even more practice, see 'More help' below.

More Help 

Pages written by Sonia Nagorski, University of Alaska Southeast, and Robyn Gotz, Montana State University.

