# load in the libraries

# for num_rev and test_data
library(adlgraphs)
# this is for some other basic data transformations
library(dplyr)
# for working with labelled data

adlgraphs provides three main functions to reduce the amount of time it takes to perform a few very common data transformations:

num_rev()

num_rev() was designed with forcats::fct_rev() in mind. However, instead of operating on factors, num_rev() operates on numeric vectors. To see how this works in practice, let’s look at the variable top from the data set test_data. Survey respondents saw a positively valenced statement, “I would feel comfortable buying products from Israel”, with values ranging from 1 to 4, where 1 = Strongly agree and 4 = Strongly disagree.

str(test_data$top)
#>  dbl+lbl [1:250] 1, 2, 2, 3, 2, 4, 2, 2, 2, 4, 4, 2, 4, 3, 4, 2, 3, 4, 2, 4...
#>  @ label        : chr "An ideal society requires some groups to be on top and others to be on the bottom"
#>  @ format.spss  : chr "F40.0"
#>  @ display_width: int 5
#>  @ labels       : Named num [1:4] 1 2 3 4
#>   ..- attr(*, "names")= chr [1:4] "Strongly agree" "Somewhat agree" "Somewhat disagree" "Strongly disagree"

In survey research, we often have to reverse a variable. This is primarily done for two reasons. The first is to flip the valence of the question. For example, we might want to create an index by summing this variable with others. If the other statements were negatively valenced, it may be in our best interest to keep the value labels the same but reverse the valence of top so that it can now be interpreted as “I would not feel comfortable buying products from Israel.” Since the value labels don’t change, the people who were previously classified as “Strongly agree” are now going to be classified as “Strongly disagree”, etc.

The second reason one might reverse the values is to change the direction of the scale. For example, as the values in top increase, so does one’s level of disagreement. We might want to reverse the values so that 1 = “Strongly disagree” and 4 = “Strongly agree”, and thus a higher number means more agreement. Because this method does not flip the valence of the question, people who were originally classified as “Strongly agree” are still going to be classified as “Strongly agree”. Put another way, the question isn’t changing and therefore people’s responses aren’t changing. The only thing that changes is the value associated with their response.
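The difference between the two kinds of reversal can be sketched in base R. This is only an illustration and does not use adlgraphs: the vector `responses` and the helper `label_of()` are hypothetical names introduced here, and the value labels are stored as a plain named vector rather than as a labelled column.

```r
# Value labels as a named numeric vector (names are the label text)
labels <- c(
  "Strongly agree" = 1, "Somewhat agree" = 2,
  "Somewhat disagree" = 3, "Strongly disagree" = 4
)

# Three made-up responses on the original scale
responses <- c(1, 2, 4)

# Hypothetical helper: look up the label text for each value
label_of <- function(values, labels) names(labels)[match(values, labels)]

original <- label_of(responses, labels)

# Both approaches reverse the values the same way
reversed_values <- 5 - responses

# 1) Flipping the valence: the labels stay attached to the same values,
#    so each respondent ends up with the opposite label
valence_flipped <- label_of(reversed_values, labels)

# 2) Reversing the scale: the labels are reversed along with the values,
#    so each respondent keeps their original label
reversed_labels <- sort(5 - labels)
scale_reversed <- label_of(reversed_values, reversed_labels)

original         # "Strongly agree" "Somewhat agree" "Strongly disagree"
valence_flipped  # "Strongly disagree" "Somewhat disagree" "Strongly agree"
scale_reversed   # "Strongly agree" "Somewhat agree" "Strongly disagree"
```

In the second case only the numbers change, which is the behaviour the rest of this section is concerned with.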

num_rev() was designed with the second goal in mind. Normally, when we reverse the values by simple subtraction, all underlying metadata and attributes are lost. As a result, we would have to reverse the values, update the value labels, and set the variable label again, which is time-consuming. The purpose of num_rev() is to automate this process of reversing a numeric vector while maintaining the variable and value labels. In addition, this function adds a new attribute called transformation that describes the data transformation used to create this variable.
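For comparison, the manual steps described above might look roughly like the following. This is a sketch under assumptions, not how num_rev() is implemented: it assumes the haven package is available for building the labelled vector, and the vector `x` and its variable label are made up as a stand-in for top.

```r
library(haven)

# A small made-up stand-in for a labelled survey variable
x <- labelled(
  c(1, 2, 2, 3, 4),
  labels = c(
    "Strongly agree" = 1, "Somewhat agree" = 2,
    "Somewhat disagree" = 3, "Strongly disagree" = 4
  ),
  label = "Example statement"
)

old_labels <- attr(x, "labels")
pivot <- min(old_labels) + max(old_labels)  # 5 on a 1 to 4 scale

# Step 1: reverse the underlying values as plain numbers
rev_values <- pivot - as.numeric(unclass(x))

# Step 2: reverse the value labels so each label follows its response
new_labels <- sort(pivot - old_labels)

# Step 3: rebuild the labelled vector and set the variable label again
x_rev <- labelled(rev_values, labels = new_labels, label = attr(x, "label"))

x_rev
```

Each of the three steps has to be repeated for every variable that is reversed, which is the bookkeeping a single helper call can take over.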

I’m now going to show this function in action. We can see that when top = 1, top_rev = 4; when top = 2, top_rev = 3; when top = 3, top_rev = 2; and when top = 4, top_rev = 1.

new_df <- test_data %>% 
  # let's make a new variable with the num_rev function
  mutate(top_rev = num_rev(top)) %>% 
  # keep only these two variables
  select(top_rev, top)

head(new_df)
#> # A tibble: 6 × 2
#>   top_rev               top                  
#>   <dbl+lbl>             <dbl+lbl>            
#> 1 4 [Strongly agree]    1 [Strongly agree]   
#> 2 3 [Somewhat agree]    2 [Somewhat agree]   
#> 3 3 [Somewhat agree]    2 [Somewhat agree]   
#> 4 2 [Somewhat disagree] 3 [Somewhat disagree]
#> 5 3 [Somewhat agree]    2 [Somewhat agree]   
#> 6 1 [Strongly disagree] 4 [Strongly disagree]

This function does a lot more than just 5 - top. Let’s take a look. Using str(), we can see that both variables have “label” and “labels” attributes and that top_rev has a new attribute called “transformation”. The “transformation” attribute is added automatically and describes what sort of data transformation the variable underwent when it was created. This is valuable in case you forget how it was created. In addition, both variables have the same variable label, as seen in the “label” attribute. The key difference between them can be found in the “labels” attribute. In top_rev, the labels are reversed. In the original variable, top, 1 = “Strongly agree”, 2 = “Somewhat agree”, etc. However, in the new variable, 4 = “Strongly agree”, 3 = “Somewhat agree”, etc.


str(new_df)
#> tibble [250 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ top_rev: dbl+lbl [1:250] 4, 3, 3, 2, 3, 1, 3, 3, 3, 1, 1, 3, 1, 2, 1, 3, 2, 1, ...
#>    ..@ labels        : Named num [1:4] 1 2 3 4
#>    .. ..- attr(*, "names")= chr [1:4] "Strongly disagree" "Somewhat disagree" "Somewhat agree" "Strongly agree"
#>    ..@ label         : chr "An ideal society requires some groups to be on top and others to be on the bottom"
#>    ..@ transformation: 'glue' chr "Reversing 'top' while maintaining correct value labels"
#>  $ top    : dbl+lbl [1:250] 1, 2, 2, 3, 2, 4, 2, 2, 2, 4, 4, 2, 4, 3, 4, 2, 3, 4, ...
#>    ..@ label        : chr "An ideal society requires some groups to be on top and others to be on the bottom"
#>    ..@ format.spss  : chr "F40.0"
#>    ..@ display_width: int 5
#>    ..@ labels       : Named num [1:4] 1 2 3 4
#>    .. ..- attr(*, "names")= chr [1:4] "Strongly agree" "Somewhat agree" "Somewhat disagree" "Strongly disagree"

Now, there are two reasons that the different labels are significant. First, it means that when we check the frequencies for both variables we will get the same results. The only difference is the order of the response options.

get_freq_table(new_df, top_rev)
| An ideal society requires some groups to be on top and others to be on the bottom | N | Percent |
|---|---|---|
| Strongly disagree | 65 | 26% |
| Somewhat disagree | 75 | 30% |
| Somewhat agree | 85 | 34% |
| Strongly agree | 25 | 10% |

get_freq_table(new_df, top)
| An ideal society requires some groups to be on top and others to be on the bottom | N | Percent |
|---|---|---|
| Strongly agree | 25 | 10% |
| Somewhat agree | 85 | 34% |
| Somewhat disagree | 75 | 30% |
| Strongly disagree | 65 | 26% |

Second, the means of the two variables will be different. For top, a higher score means more disagreement. However, for top_rev, a higher score means more agreement. We can see the differences below.


get_means(new_df, top_rev)
#> # A tibble: 1 × 5
#>    mean    sd     n conf.low conf.high
#>   <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1  2.28  0.96   250     2.16       2.4

get_means(new_df, top)
#> # A tibble: 1 × 5
#>    mean    sd     n conf.low conf.high
#>   <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1  2.72  0.96   250      2.6      2.84
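The relationship between the two means can be checked by hand from the frequency counts shown earlier. The sketch below uses only base R and assumes the counts are unweighted; the object names are introduced here for illustration.

```r
# Frequency counts for top, in order of its values 1 to 4
counts <- c(
  "Strongly agree" = 25, "Somewhat agree" = 85,
  "Somewhat disagree" = 75, "Strongly disagree" = 65
)

top_values <- 1:4
top_rev_values <- 5 - top_values  # 4, 3, 2, 1

# Means computed directly from the counts
mean_top <- sum(top_values * counts) / sum(counts)
mean_top_rev <- sum(top_rev_values * counts) / sum(counts)

mean_top      # 2.72
mean_top_rev  # 2.28

# Expand the counts into one value per respondent to compare the spread
x <- rep(top_values, times = counts)
all.equal(sd(x), sd(5 - x))  # TRUE
```

On a 1 to 4 scale the two means always sum to 5, and the standard deviation is unaffected because reversing the scale only mirrors the values around its midpoint.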