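Returns a random subset of a data frame in which each value of a specified column appears only once. For each repeated value, one row is selected at random; rows whose value occurs only once are always kept.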
Arguments
- df
data frame. Must include a column named by the argument colName.
- colName
character. Name of the column to check for duplicate values.
- seed
integer value. Defaults to NA, which leaves the current random seed unchanged. Setting the seed to a fixed value makes the output repeatable.
Examples
df <- data.frame(Julian = c(1, 2, 2, 3, 4, 4, 4, 6),
                 y = 1:8)
df
#>   Julian y
#> 1      1 1
#> 2      2 2
#> 3      2 3
#> 4      3 4
#> 5      4 5
#> 6      4 6
#> 7      4 7
#> 8      6 8
df_random <- randomSubset(df, "Julian")
df_random
#>   Julian y
#> 1      1 1
#> 3      2 3
#> 4      3 4
#> 5      4 5
#> 8      6 8
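The selection rule shown in the example (one randomly chosen row kept for each distinct value of colName, in original row order) can be sketched in base R. This is a minimal illustration under that assumption, not the package's own implementation; the function name random_subset_sketch is hypothetical. It shuffles the row indices, keeps the first shuffled index for each value, and returns those rows in their original order. Mirroring the documented default, seed = NA leaves the random number generator untouched.

```r
# Hypothetical sketch, not the package's own code: keep one randomly chosen
# row for each distinct value of df[[colName]].
random_subset_sketch <- function(df, colName, seed = NA) {
  stopifnot(is.data.frame(df), colName %in% names(df))
  # Only reseed when a seed is supplied; NA leaves the RNG state alone
  if (!is.na(seed)) set.seed(seed)
  # Random permutation of the row indices
  idx <- sample.int(nrow(df))
  # In shuffled order, the first occurrence of each value is a random pick
  keep <- idx[!duplicated(df[[colName]][idx])]
  # Return the selected rows in their original order (row names preserved)
  df[sort(keep), , drop = FALSE]
}

df <- data.frame(Julian = c(1, 2, 2, 3, 4, 4, 4, 6),
                 y = 1:8)
a <- random_subset_sketch(df, "Julian", seed = 1)
b <- random_subset_sketch(df, "Julian", seed = 1)
identical(a, b)  # same seed, so the same rows are selected
```

Shuffling once and then applying duplicated() avoids looping over groups, and sorting the kept indices reproduces the original row order seen in df_random.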