This page is under development. Stay tuned!
This vignette gives an overview of how the surveygraph package preprocesses data, and of the optional arguments that control how particular values are handled.
We’ll start by loading surveygraph.
library(surveygraph)
Next, we define a small example data frame, df, that we will attempt to supply to surveygraph.
df <- data.frame(
item1 = c(2, -99, 1, 1, 100, 5, 5, 4, 3),
item2 = c(1, 3, 1, 2, 4, 3, 4, 5, 4),
item3 = c(2, 1, 3, -99, 5, 6, 8, 4, 10)
)
df
#> item1 item2 item3
#> 1 2 1 2
#> 2 -99 3 1
#> 3 1 1 3
#> 4 1 2 -99
#> 5 100 4 5
#> 6 5 3 6
#> 7 5 4 8
#> 8 4 5 4
#> 9 3 4 10
Data frame input
The first thing we check is that the input data is a data frame. If it is not, the program halts and outputs an error.
Future versions may attempt to coerce other formats to data frames.
For instance, if we attempt to run the make_projection()
routine on a list, we get the following error.
make_projection(list(c(1, 2, 3)))
#> Error in make_projection(list(c(1, 2, 3))): Input data must be provided as a data frame.
Similarly, an error is output if an empty data frame is provided.
make_projection(data.frame())
#> Error in make_projection(data.frame()): Data frame cannot be empty.
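The two checks above can be sketched in base R. The helper name `check_input()` is hypothetical and is not part of surveygraph; the package's internal validation may be structured differently. Only the error messages are taken from the output shown above.

```r
# Hypothetical helper illustrating the two input checks described above.
# This is a sketch, not surveygraph's actual implementation.
check_input <- function(data) {
  # Reject anything that is not a data frame (lists, matrices, vectors, ...)
  if (!is.data.frame(data)) {
    stop("Input data must be provided as a data frame.")
  }
  # Reject data frames with no rows or no columns
  if (nrow(data) == 0 || ncol(data) == 0) {
    stop("Data frame cannot be empty.")
  }
  invisible(data)
}

# Capture the error messages rather than halting the session
msg_list  <- tryCatch(check_input(list(c(1, 2, 3))),
                      error = function(e) conditionMessage(e))
msg_empty <- tryCatch(check_input(data.frame()),
                      error = function(e) conditionMessage(e))
```

Checking the type before the dimensions means a non-data-frame input always reports the more fundamental problem first.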
Coercion
Our approach is to coerce all data to floating point values; entries that cannot be coerced are set to NA.
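As a rough illustration of this idea, the following base R sketch converts every column to numeric and leaves NA where conversion fails. The helper `coerce_numeric()` and the example data `raw` are hypothetical; surveygraph may perform this step differently.

```r
# Hypothetical helper: coerce every column to double, with NA for
# entries that cannot be interpreted as numbers.
coerce_numeric <- function(data) {
  data[] <- lapply(data, function(col) {
    # Factors store integer codes internally, so convert via character
    if (is.factor(col)) col <- as.character(col)
    # as.numeric() warns when it introduces NAs; silence that here
    suppressWarnings(as.numeric(col))
  })
  data
}

# Example input with a non-numeric string and a factor column
raw <- data.frame(
  item1 = c("1", "2.5", "agree"),
  item2 = factor(c("3", "4", "5")),
  stringsAsFactors = FALSE
)
clean <- coerce_numeric(raw)
```

Here the string "agree" in item1 cannot be read as a number, so it becomes NA, while the factor levels in item2 are recovered as the numbers they represent.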
Dummy coding
If this flag is set to TRUE, every value that falls outside the range specified by the likert argument is dummy coded.
Likert range
The optional likert argument allows us to specify the range of values that we interpret as valid input data. Anything that falls outside this range is set to NA, or is dummy coded. A convenient starting point is the observed minimum and maximum of each item.
l <- data.frame(
minval = apply(df, 2, min, na.rm = TRUE),
maxval = apply(df, 2, max, na.rm = TRUE)
)
This creates the following data frame.
l
#> minval maxval
#> item1 -99 100
#> item2 1 5
#> item3 -99 10
By visually inspecting the limiting values for each item, it is easy to spot which columns contain flag values, such as -99 and 100 in our data. We might therefore adjust the ranges as follows.
# set the minimum value of items one and three to 1
l$minval[1] <- 1
l$minval[3] <- 1
# set the maximum value of item one to 10
l$maxval[1] <- 10
Following these changes, we interpret the Likert ranges to be
l
#> minval maxval
#> item1 1 10
#> item2 1 5
#> item3 1 10
Now we provide the Likert specification l to make_projection() to tell surveygraph how to handle the outliers.
make_projection(df, likert = l, showdata = TRUE)
#> Note: columns 1, 3 contain entries outside the specified range. Setting them to NA.
#> 1: -99, 100
#> 3: -99
#> item1 item2 item3
#> 1 2 1 2
#> 2 NA 3 1
#> 3 1 1 3
#> 4 1 2 NA
#> 5 NA 4 5
#> 6 5 3 6
#> 7 5 4 8
#> 8 4 5 4
#> 9 3 4 10
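To make the range handling concrete, here is a base R sketch that applies per-column limits and sets out-of-range entries to NA. The helper `mask_out_of_range()` is hypothetical and only mimics the behaviour shown in the output above; it is not how surveygraph is necessarily implemented.

```r
# Example data and Likert ranges, as defined earlier in this vignette
df <- data.frame(
  item1 = c(2, -99, 1, 1, 100, 5, 5, 4, 3),
  item2 = c(1, 3, 1, 2, 4, 3, 4, 5, 4),
  item3 = c(2, 1, 3, -99, 5, 6, 8, 4, 10)
)
l <- data.frame(
  minval = c(1, 1, 1),
  maxval = c(10, 5, 10),
  row.names = names(df)
)

# Hypothetical helper: row j of `ranges` gives the valid range of column j
mask_out_of_range <- function(data, ranges) {
  for (j in seq_along(data)) {
    col <- data[[j]]
    # Flag non-missing entries below the minimum or above the maximum
    out <- !is.na(col) & (col < ranges$minval[j] | col > ranges$maxval[j])
    data[[j]][out] <- NA
  }
  data
}

masked <- mask_out_of_range(df, l)
```

With these ranges, the flag values -99 and 100 in item1 and -99 in item3 become NA, and item2 is left unchanged.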