How can you determine the standard deviation in R?

3 years ago

Olivia Parker

6 minutes

As a language built for statistics, R provides the sd() function to calculate the standard deviation of a set of values.

What does the standard deviation mean?

Standard deviation is a measure of how widely a set of values is dispersed around its mean.

The higher the standard deviation, the wider the spread of values.

The lower the standard deviation, the narrower the spread of values.

Formally, the standard deviation is the square root of the variance.
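To make "wider" and "narrower" concrete, here is a small sketch with two made-up vectors (hypothetical data, chosen only for illustration) that share the same mean but differ in spread. Only base R's mean() and sd() are used.

```r
# Two hypothetical vectors with the same mean (50) but different spread
narrow <- c(48, 49, 50, 51, 52)
wide   <- c(10, 30, 50, 70, 90)

print(c(mean(narrow), mean(wide)))  # both means are 50
print(sd(narrow))                   # small: values hug the mean
print(sd(wide))                     # large: values are far from the mean
```

Because the means are identical, the difference in sd() comes entirely from how far the values sit from that common center.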

The significance of standard deviation.

Why is standard deviation so widely used and significant in statistics? The following factors explain its popularity and importance.

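Squaring each deviation from the mean turns negative deviations into positive ones, so values below and above the mean do not cancel each other out.
Because the deviations are squared, large deviations carry extra weight, which makes unusual values stand out so you can examine them.
It complements measures of central tendency such as the mean by describing how tightly the values cluster around that center.
It plays a major role in finance, business, data analysis, and measurement.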
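The first two points can be checked directly. This sketch uses hypothetical values: it shows that raw deviations from the mean sum to zero (which is why they are squared), and that swapping one value for an outlier inflates the standard deviation sharply.

```r
# Hypothetical example values
values <- c(10, 12, 14, 16, 18)

# Raw deviations from the mean always sum to (numerically) zero,
# so they cannot be averaged directly as a measure of spread
deviations <- values - mean(values)
print(sum(deviations))

# Squaring gives large deviations extra weight: replacing the last
# value with an outlier makes the standard deviation much larger
with_outlier <- c(10, 12, 14, 16, 60)
print(sd(values))
print(sd(with_outlier))
```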

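Before we dive in, keep this definition in mind:

Variance is the average of the squared deviations of the observed values from their mean (the expected value). R's var() function returns the sample variance, which divides the sum of squared deviations by n - 1 rather than n, and sd() is the square root of that value.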
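The definition can be reproduced step by step in base R. This sketch reuses the values from the first example in this tutorial and compares the hand-computed sample variance with var() and its square root with sd(); the population variance (divide by n) is shown only for contrast.

```r
x <- c(34, 56, 87, 65, 34, 56, 89)
n <- length(x)

# Sample variance: squared deviations from the mean, summed, divided by n - 1
squared_dev <- (x - mean(x))^2
sample_var <- sum(squared_dev) / (n - 1)

# Population variance divides by n instead; this is NOT what var() returns
population_var <- sum(squared_dev) / n

print(all.equal(sample_var, var(x)))    # TRUE
print(all.equal(sqrt(sample_var), sd(x)))  # TRUE: sd() is sqrt of var()
print(sqrt(population_var))             # slightly smaller than sd(x)
```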

Calculate the standard deviation of a vector of values in R.

First, we create a vector called ‘x’ containing some values. Then we calculate the standard deviation of those values.

 x <- c(34,56,87,65,34,56,89)    #creates a numeric vector 'x' with some values in it

 sd(x)  #calculates the standard deviation of the values in 'x'

The result is 22.28175.
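Real data often contains missing values. As a sketch (the trailing NA is a hypothetical addition to the vector above), sd() returns NA by default when any value is missing, and its na.rm argument drops missing values before computing.

```r
# Same values as above, plus one hypothetical missing entry
x <- c(34, 56, 87, 65, 34, 56, 89, NA)

with_na <- sd(x)                  # NA: a single missing value propagates
without_na <- sd(x, na.rm = TRUE) # NA removed first, same result as before

print(with_na)
print(without_na)
```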

We can also calculate the standard deviation of a subset of values, extracted by index from the vector ‘y’.

 y <- c(34,65,78,96,56,78,54,57,89)  #creates a vector 'y' with some values
 
data1 <- y[1:5] #extracts the first five values by index

sd(data1) #calculates the standard deviation of the extracted values

The result is 23.28519.
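Positional indexing is not the only way to pick a subset. This sketch contrasts it with logical indexing, which selects values by a condition; the threshold of 60 is an arbitrary choice for illustration.

```r
y <- c(34, 65, 78, 96, 56, 78, 54, 57, 89)

# Positional indexing, as above: the first five values
first_five <- y[1:5]

# Logical indexing: only the values greater than 60 (arbitrary threshold)
above_60 <- y[y > 60]

print(sd(first_five))
print(above_60)
print(sd(above_60))
```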

Calculate the standard deviation of the values contained in a CSV file.

Here we import a CSV file and calculate the standard deviation of the data stored in one of its columns.

readfile <- read.csv('testdata1.csv')  #reading a csv file

data2 <- readfile$Values      #extracts the values in the column named 'Values'

sd(data2)                              #calculates the standard deviation

The result is 17.88624.
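Since testdata1.csv is not included here, the following self-contained sketch writes hypothetical values to a temporary CSV file and reads them back with the same steps. The file contents are made up; only the workflow mirrors the example above. Passing na.rm = TRUE is a precaution, because read.csv turns empty numeric cells into NA.

```r
# Hypothetical data written to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".csv")
write.csv(data.frame(Values = c(12, 15, 20, 22, 31)), path, row.names = FALSE)

readfile <- read.csv(path)   # read the CSV back into a data frame
data2 <- readfile$Values     # extract the 'Values' column

# na.rm = TRUE guards against empty cells, which read.csv reads as NA
result <- sd(data2, na.rm = TRUE)
print(result)

unlink(path)                 # remove the temporary file
```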

Large and small variability

A low standard deviation means that the values lie close to the mean, whereas a high standard deviation means that they are widely scattered around it.

We can demonstrate this using an example.

x <- c(79,82,84,96,98)
mean(x)
--->  87.8
sd(x)
--->  8.613942

To plot these values as a bar graph, we use the ggplot2 package. If it is not installed yet, install it once by running the following in RStudio or any R console:

install.packages("ggplot2")

library(ggplot2)

values <- data.frame(marks=c(79,82,84,96,98), students=c(0,1,2,3,4))
head(values)                  #displays the first rows of the data frame
 marks students
1    79        0
2    82        1
3    84        2
4    96        3
5    98        4
x <- ggplot(values, aes(x=factor(students), y=marks))+geom_bar(stat='identity')  #one bar per student, bar height = marks
x                             #displays the plot

These results show that the marks sit close to their mean of 87.8: no mark is more than 10.2 marks away from it, which indicates a narrow spread, or low standard deviation.
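One informal way to see how tightly the marks cluster is to measure each mark's distance from the mean and compare it with the standard deviation. This is an illustrative check, not a formal test; the "within one SD" flag is simply a convenient yardstick.

```r
x <- c(79, 82, 84, 96, 98)
m <- mean(x)
s <- sd(x)

# Distance of each mark from the mean, and whether it lies within one SD
distance <- abs(x - m)
within_one_sd <- distance <= s

print(data.frame(marks = x, distance = distance, within_one_sd = within_one_sd))
print(max(distance))   # the largest distance from the mean
```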

Example representing a large spread of values.

y <- c(23,27,30,35,55,76,79,82,84,94,96)
mean(y)
---> 61.90909
sd(y)
---> 28.45507

To plot these values as a bar graph with ggplot2, run the code below.

library(ggplot2)

values <- data.frame(marks=c(23,27,30,35,55,76,79,82,84,94,96), students=c(0,1,2,3,4,5,6,7,8,9,10))
head(values)                  #displays the first six rows
  marks students
1    23        0
2    27        1
3    30        2
4    35        3
5    55        4
6    76        5
x <- ggplot(values, aes(x=factor(students), y=marks))+geom_bar(stat='identity')  #one bar per student, bar height = marks
x                             #displays the plot

These results show a wide spread: the lowest mark, 23, lies almost 39 marks below the mean of about 61.9. A spread this wide corresponds to a high standard deviation.
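"Stands significantly apart" can be quantified by expressing each mark as a number of standard deviations from the mean (a standardized score, often called a z-score). This sketch computes it by hand and checks it against base R's scale(), which centers and divides by the standard deviation by default.

```r
y <- c(23, 27, 30, 35, 55, 76, 79, 82, 84, 94, 96)

# Number of standard deviations each mark lies from the mean
z <- (y - mean(y)) / sd(y)
print(round(z, 2))

# scale() does the same centering and scaling; it returns a matrix,
# so as.vector() turns it back into a plain vector for comparison
z_scale <- as.vector(scale(y))
print(all.equal(z, z_scale))   # TRUE
```

The first entry shows that the mark of 23 sits well over one standard deviation below the mean.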

At this point, you should have a good grasp of how to calculate the standard deviation with the sd() function in R. To conclude this tutorial, let’s solve a couple of straightforward problems.

Example #1: Calculating the Standard Deviation of a Sequence of Even Numbers

Calculate the standard deviation of the even numbers between 1 and 20, excluding 20 itself (that is, 2 to 18).

The solution begins by listing those even numbers:

2, 4, 6, 8, 10, 12, 14, 16, 18.

Let’s determine the standard deviation of these values.

x <- c(2,4,6,8,10,12,14,16,18)  #even numbers from 2 to 18

sd(x)                           #calculates the standard deviation of these even numbers

The result is approximately 5.477226.
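Typing the sequence by hand is not necessary: seq() can generate it. This sketch also checks the result another way. The numbers are 2 * (1:9), so their standard deviation is twice sd(1:9), and the sample variance works out to exactly 30.

```r
# seq() builds 2, 4, ..., 18 without typing each value
x <- seq(2, 18, by = 2)
print(x)

s <- sd(x)
print(s)

# x equals 2 * (1:9), so its SD is twice sd(1:9); the variance is 30
print(all.equal(s, 2 * sd(1:9)))
print(all.equal(s, sqrt(30)))
```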


Example #2: Calculating the Standard Deviation of US Population Data

Calculate the standard deviation of the state-wise population of the USA.

To do this in R, import the CSV file and extract the population column. Then calculate the standard deviation of the values and visualize their distribution with a histogram.

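df<-read.csv("population.csv")      #reads the csv file
data<-df$X2018.Population           #extracts the data from the population column
mean(data)                          #calculates the mean

View(df)                            #displays the data (opens a data viewer in interactive sessions)
sd(data)                            #calculates the standard deviation
hist(data)                          #plots a histogram of the population values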

The output shows that the mean is 6432008 and the standard deviation is 7376752.
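The column name X2018.Population comes from read.csv() turning a header such as "2018 Population" into a syntactically valid R name. This sketch assumes such a header and uses a few made-up figures (NOT the real population.csv) to show the renaming and how a blank cell becomes NA, which is why na.rm = TRUE is passed.

```r
# Hypothetical, made-up figures; not the actual population data
csv_text <- "State,2018 Population
A,500000
B,1200000
C,3400000
D,
E,9800000"

df <- read.csv(text = csv_text)
print(names(df))   # "State" "X2018.Population": names are made syntactic

data <- df$X2018.Population
print(mean(data, na.rm = TRUE))   # the blank cell for D was read as NA
print(sd(data, na.rm = TRUE))
```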

In summary,

Calculating the standard deviation of values in R is straightforward with the sd() function. You can either create a vector of values or import a CSV file and compute the standard deviation of one of its columns.

You can also compute the standard deviation of a subset of the data by selecting values with indexing, whether they come from a vector or from a file, as demonstrated above.

Feel free to use the comment box to share any questions about the sd() function in R. Enjoy the learning experience!