I have loaded an Excel file that contains duplicate column names, and I would like to add a suffix each time a column name is repeated. For example, I want to go from this:
problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5), B = rep(4, 5), A = rep(5, 5))
to this:
solution_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A_1 = rep(3, 5), B_1 = rep(4, 5), A_2 = rep(5, 5))
Alternatively, the suffixes could start at '_2' (A, A_2, A_3) instead of '_1'.
We can do this with make.unique, which also has a sep argument:
make.unique(c("A", "B", "A", "B", "A"), sep="_")
#[1] "A" "B" "A_1" "B_1" "A_2"
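make.unique always numbers the first repeat as 1. For the '_2', '_3' variant mentioned in the question, a small helper along these lines could work. add_suffix_from is a hypothetical name, not a base R function, and unlike make.unique it does not check whether a generated name (say A_2) already exists as another column.

```r
# Hypothetical helper (not part of base R): suffix repeated names,
# numbering the first repeat as `start` instead of 1.
add_suffix_from <- function(x, start = 2, sep = "_") {
  # Running occurrence count within each name: 1, 2, 3, ...
  n <- ave(seq_along(x), x, FUN = seq_along)
  # Keep first occurrences as they are; number later ones from `start`
  ifelse(n == 1, x, paste0(x, sep, n - 2 + start))
}

add_suffix_from(c("A", "B", "A", "B", "A"))
#[1] "A" "B" "A_2" "B_2" "A_3"
```

With start = 1 it gives the same result as make.unique(x, sep = "_") as long as no generated name collides with an existing one.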
In 'problem_df', data.frame() is called with the default check.names = TRUE. That calls make.names(..., unique = TRUE), which in turn calls make.unique with its default sep = ".", so the repeated names become A.1, B.1 and A.2.
In the source of data.frame, this happens in the block that starts at line 124 (the exact line depends on the R version):
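This chain can be checked directly at the console. The calls below use only base R and reproduce the .1, .2 suffixes that data.frame produces for the example names:

```r
x <- c("A", "B", "A", "B", "A")

# make.names(unique = TRUE) hands the names to make.unique() with sep = "."
make.names(x, unique = TRUE)
#[1] "A" "B" "A.1" "B.1" "A.2"

# data.frame() gives the same result because check.names = TRUE by default
names(data.frame(A = 1, B = 2, A = 3, B = 4, A = 5))
#[1] "A" "B" "A.1" "B.1" "A.2"

# With check.names = FALSE the duplicate names are kept unchanged
names(data.frame(A = 1, B = 2, A = 3, B = 4, A = 5, check.names = FALSE))
#[1] "A" "B" "A" "B" "A"
```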
if (check.names) {
    if (fix.empty.names)
        vnames <- make.names(vnames, unique = TRUE)  # make.unique() with sep = "."
    else {
        nz <- nzchar(vnames)
        vnames[nz] <- make.names(vnames[nz], unique = TRUE)  # same, non-empty names only
    }
}
names(value) <- vnames
One option is to pass check.names = FALSE, so the duplicate names are kept as they are, and then rename the columns with make.unique and sep = "_":
problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
                         B = rep(4, 5), A = rep(5, 5), check.names = FALSE)
names(problem_df) <- make.unique(names(problem_df), sep = "_")
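To confirm that this reproduces the target from the question, the renamed data frame can be compared with solution_df; both are rebuilt here so the snippet runs on its own:

```r
problem_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A = rep(3, 5),
                         B = rep(4, 5), A = rep(5, 5), check.names = FALSE)
names(problem_df) <- make.unique(names(problem_df), sep = "_")

solution_df <- data.frame(A = rep(1, 5), B = rep(2, 5), A_1 = rep(3, 5),
                          B_1 = rep(4, 5), A_2 = rep(5, 5))

# Same columns, same values, same names
identical(problem_df, solution_df)
#[1] TRUE
```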
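Alternatively, if the data frame was created with the default check.names = TRUE, so that the duplicate names already end in .1, .2 and so on, replace that dot with sub, matching only a trailing dot followed by digits:
sub("\\.(\\d+)$", "_\\1", names(problem_df))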
#[1] "A" "B" "A_1" "B_1" "A_2"
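Where the dot sits matters when the original headers already contain dots (make.names also turns spaces and other invalid characters into dots). The comparison below uses a made-up header, Sepal.Length, purely for illustration: the pattern "\\." rewrites the first dot of every name, while "\\.(\\d+)$" only touches the numeric suffix added by make.unique. Neither pattern can tell a generated suffix apart from a header that genuinely ended in something like .1.

```r
# Names as check.names = TRUE would leave them for the (illustrative)
# headers "Sepal.Length", "A", "Sepal.Length", "A"
nms <- make.names(c("Sepal.Length", "A", "Sepal.Length", "A"), unique = TRUE)
nms
#[1] "Sepal.Length" "A" "Sepal.Length.1" "A.1"

# Replaces the first dot in every name, including the one inside the header
sub("\\.", "_", nms)
#[1] "Sepal_Length" "A" "Sepal_Length.1" "A_1"

# Replaces only a trailing ".<digits>" suffix
sub("\\.(\\d+)$", "_\\1", nms)
#[1] "Sepal.Length" "A" "Sepal.Length_1" "A_1"
```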