Inserting a new row into a data frame for each group id

Tags:

r

I've looked extensively on Stack Overflow for a solution, but have yet to find one that works for me. I have a data frame that looks something like this:

id    time    latitude    longitude
A     11:10   381746.0    6008345
A     11:11   381726.2    6008294
B     10:56   381703.0    6008214
B     10:57   381679.7    6008134
C     4:30    381654.4    6008083
C     4:31    381629.2    6008033

I would like to insert a new row at the END of each id group. In this row, 'id' and 'time' should be the same as in the previous observation, and latitude and longitude should be 394681.4 and 6017550 (the end location shared by all ids).

id    time    latitude    longitude
A     11:10   381746.0    6008345
A     11:11   381726.2    6008294
A     11:11   394681.4    6017550
B     10:56   381703.0    6008214
B     10:57   381679.7    6008134
B     10:57   394681.4    6017550
C     4:30    381654.4    6008083
C     4:31    381629.2    6008033
C     4:31    394681.4    6017550

Can anyone think of a solution? dplyr or data.table solutions preferred.


asked Dec 27 '16 19:12

Splash1199

2 Answers

A base R solution using the split-apply-combine concept.

do.call(rbind, lapply(split(df, df$id), 
                      function(x) rbind(x,
                         within(x[nrow(x),], {latitude <- 394681.4; longitude <- 6017550}))))

which returns

     id  time latitude longitude
A.1   A 11:10 381746.0   6008345
A.2   A 11:11 381726.2   6008294
A.21  A 11:11 394681.4   6017550
B.3   B 10:56 381703.0   6008214
B.4   B 10:57 381679.7   6008134
B.41  B 10:57 394681.4   6017550
C.5   C  4:30 381654.4   6008083
C.6   C  4:31 381629.2   6008033
C.61  C  4:31 394681.4   6017550

split breaks the data.frame into a list of data.frames, one per id. lapply rbinds a modified copy of the final row to each of them, and do.call(rbind, ...) stacks the resulting list back into a single data.frame. The final row is selected with x[nrow(x), ] and modified with within, which returns an altered copy of the data.frame it is given. Following @akrun's answer, x[nrow(x), ] could be replaced with tail(x, 1).
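For reference, here is a self-contained sketch of the tail(x, 1) variant mentioned above. The sample data is rebuilt from the question (the column types are an assumption), and add_end_row is a hypothetical helper name, not part of any package. It also resets the row names, which otherwise come out as A.1, A.21 and so on.

```r
# Sample data reconstructed from the question (column types are assumed)
df <- data.frame(
  id        = c("A", "A", "B", "B", "C", "C"),
  time      = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude  = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy the last row of one group and overwrite its coordinates
add_end_row <- function(x, lat = 394681.4, lon = 6017550) {
  last <- tail(x, 1)
  last$latitude  <- lat
  last$longitude <- lon
  rbind(x, last)
}

# Split by id, append the end row to each piece, then stack the pieces again
res <- do.call(rbind, lapply(split(df, df$id), add_end_row))
rownames(res) <- NULL  # replace the "A.1", "A.21" style names with 1..n
res
```

Note that split orders the groups by the sorted values of id, so if the ids are not already sorted in the input, the groups will come back in sorted order rather than in order of appearance.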

answered Oct 19 '22 03:10

lmo

We can do this with data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1) (this modifies df1 by reference), take the last row of each 'id' group with tail, overwrite 'latitude' and 'longitude' in those rows with the new values, rbind the result to the original dataset and order by 'id'.

library(data.table)
rbind(setDT(df1), df1[, tail(.SD, 1) , by = id
        ][, c("latitude", "longitude") := .(394681.4,  6017550)
         ])[order(id)]
#    id  time latitude longitude
#1:  A 11:10 381746.0   6008345
#2:  A 11:11 381726.2   6008294
#3:  A 11:11 394681.4   6017550
#4:  B 10:56 381703.0   6008214
#5:  B 10:57 381679.7   6008134
#6:  B 10:57 394681.4   6017550
#7:  C  4:30 381654.4   6008083
#8:  C  4:31 381629.2   6008033
#9:  C  4:31 394681.4   6017550

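Or using dplyr, with a similar approach: take the last 'time' per 'id', add the new coordinates, bind the rows to the original data and arrange by 'id'.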

library(dplyr)
df1 %>%
   group_by(id) %>%
   summarise(time = last(time)) %>%
   mutate(latitude = 394681.4, longitude = 6017550) %>% 
   bind_rows(df1, .) %>% 
   arrange(id)
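One thing to be aware of: summarise(time = last(time)) keeps only 'id' and 'time', so if the data had further columns they would be NA in the appended rows. A possible variant, sketched here under the assumption that dplyr 1.0.0 or later is available (for slice_tail), copies the whole last row instead. The sample data is rebuilt from the question and end_rows is just an illustrative name.

```r
library(dplyr)

# Sample data reconstructed from the question (column types are assumed)
df1 <- data.frame(
  id        = c("A", "A", "B", "B", "C", "C"),
  time      = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude  = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# Last row of each id with all columns kept, coordinates overwritten
end_rows <- df1 %>%
  group_by(id) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  mutate(latitude = 394681.4, longitude = 6017550)

# Append the new rows, then sort by id so each one follows its group
out <- bind_rows(df1, end_rows) %>%
  arrange(id)
out
```

As in the original pipeline, this relies on arrange(id) leaving rows with the same id in their existing order, so the appended row ends up last within each group.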

answered Oct 19 '22 03:10

akrun
