I've looked extensively on Stack Overflow for a solution but have yet to find one that works for me. I have a data frame that looks something like this:
id time latitude longitude
A 11:10 381746.0 6008345
A 11:11 381726.2 6008294
B 10:56 381703.0 6008214
B 10:57 381679.7 6008134
C 4:30 381654.4 6008083
C 4:31 381629.2 6008033
I would like to insert a new row at the END of each id. In this row, 'id' and 'time' should be the same as in the previous (last) observation of that id, while latitude and longitude should be 394681.4 and 6017550 (the end location shared by all ids). The desired result:
id time latitude longitude
A 11:10 381746.0 6008345
A 11:11 381726.2 6008294
A 11:11 394681.4 6017550
B 10:56 381703.0 6008214
B 10:57 381679.7 6008134
B 10:57 394681.4 6017550
C 4:30 381654.4 6008083
C 4:31 381629.2 6008033
C 4:31 394681.4 6017550
Can anyone think of a solution? dplyr or data.table solutions preferred.
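To make the question reproducible, the sample data above can be built as follows. This is a sketch that assumes 'time' is stored as character strings and the coordinates as plain numeric columns; the data.table and dplyr answers below refer to the same data as df1.

```r
# Sample data from the question; times are kept as character strings
df <- data.frame(
  id        = c("A", "A", "B", "B", "C", "C"),
  time      = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude  = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE  # explicit, so older R versions do not create factors
)
df
```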
A base R solution using the split-apply-combine concept.
do.call(rbind, lapply(split(df, df$id),
                      function(x) rbind(x,
                                        within(x[nrow(x), ], {latitude <- 394681.4; longitude <- 6017550}))))
which returns
id time latitude longitude
A.1 A 11:10 381746.0 6008345
A.2 A 11:11 381726.2 6008294
A.21 A 11:11 394681.4 6017550
B.3 B 10:56 381703.0 6008214
B.4 B 10:57 381679.7 6008134
B.41 B 10:57 394681.4 6017550
C.5 C 4:30 381654.4 6008083
C.6 C 4:31 381629.2 6008033
C.61 C 4:31 394681.4 6017550
split breaks the data.frame into a list of data.frames (one per id), lapply uses rbind to append a final row to each data.frame, and do.call rbinds the resulting list back into a single data.frame. The appended row is built with within, which returns a modified copy of the data.frame it is given; x[nrow(x),] selects the last row. Following @akrun's answer, x[nrow(x),] could be replaced with tail(x, 1).
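A sketch of the same split-apply-combine idea using tail(x, 1) in place of x[nrow(x),], written out step by step. The names end_lat, end_lon, last_row and res are illustrative only, and resetting the row names (which rbind otherwise turns into "A.1", "A.21", ...) is an extra step not in the original answer.

```r
# Sample data from the question
df <- data.frame(
  id        = c("A", "A", "B", "B", "C", "C"),
  time      = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude  = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# End location shared by every id (values taken from the question)
end_lat <- 394681.4
end_lon <- 6017550

# Split by id, append a modified copy of each group's last row, then recombine
res <- do.call(rbind, lapply(split(df, df$id), function(x) {
  last_row <- tail(x, 1)            # equivalent to x[nrow(x), ]
  last_row$latitude  <- end_lat
  last_row$longitude <- end_lon
  rbind(x, last_row)
}))
rownames(res) <- NULL  # drop the "A.1", "A.21", ... row names created by rbind
res
```

Because split orders the groups by id, the result comes out already sorted, so no separate ordering step is needed here.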
We can do this with data.table. Convert the data.frame to a data.table (setDT(df1)), take the last row of each 'id' group with tail, assign the new values to 'latitude' and 'longitude', rbind the result with the original dataset, and order by 'id'.
library(data.table)
rbind(setDT(df1), df1[, tail(.SD, 1) , by = id
][, c("latitude", "longitude") := .(394681.4, 6017550)
])[order(id)]
# id time latitude longitude
#1: A 11:10 381746.0 6008345
#2: A 11:11 381726.2 6008294
#3: A 11:11 394681.4 6017550
#4: B 10:56 381703.0 6008214
#5: B 10:57 381679.7 6008134
#6: B 10:57 394681.4 6017550
#7: C 4:30 381654.4 6008083
#8: C 4:31 381629.2 6008033
#9: C 4:31 394681.4 6017550
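A variant of the data.table approach that locates each group's last row with the .I[.N] idiom instead of tail(.SD, 1). This is a sketch under the assumption that avoiding a per-group .SD subset is worthwhile on large tables; last_idx, last_rows and res are illustrative names, not part of the original answer.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
df1 <- data.table(
  id        = c("A", "A", "B", "B", "C", "C"),
  time      = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude  = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033)
)

# .I holds the row numbers of each group and .N its size,
# so .I[.N] is the row number of the last observation per id
last_idx <- df1[, .I[.N], by = id]$V1

# Subsetting returns a copy; overwrite its coordinates by reference
last_rows <- df1[last_idx]
last_rows[, `:=`(latitude = 394681.4, longitude = 6017550)]

# Stack and sort by id; the ordering is stable, so each new row stays last in its group
res <- rbind(df1, last_rows)[order(id)]
res
```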
Or using dplyr, with a similar methodology:
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(time = last(time)) %>%
mutate(latitude = 394681.4, longitude = 6017550) %>%
bind_rows(df1, .) %>%
arrange(id)
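The summarise() step above keeps only 'id' and 'time', so any other columns would be NA in the appended rows. A sketch that keeps every column by taking the whole last row per group with slice_tail() (assuming dplyr 1.0.0 or later, where slice_tail() is available); end_rows and res are illustrative names.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  id        = c("A", "A", "B", "B", "C", "C"),
  time      = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude  = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# Take the complete last row of every id, then overwrite the coordinates
end_rows <- df1 %>%
  group_by(id) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  mutate(latitude = 394681.4, longitude = 6017550)

# Append the new rows and sort by id; ties keep their existing order,
# so each appended row lands after its group's original rows
res <- bind_rows(df1, end_rows) %>%
  arrange(id)
res
```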