I have a set of data (10 columns, 1000 rows) that is indexed by an ID number that one or more of these rows can share. To give a small example to illustrate my point, consider this table:
ID Name Location
5014 John
5014 Kate California
5014 Jim
5014 Ryan California
5018 Pete
5018 Pat Indiana
5019 Jeff Arizona
5020 Chris Kentucky
5020 Mike
5021 Will Indiana
I need for all entries to have something in the Location field and I'm having a hell of a time trying to do it.
Things to note:
Any ideas for a solution? I'm currently using R with the data.table package, but I'm relatively new to it.
We can convert the 'data.frame' to 'data.table' (setDT(df1)), Grouped by 'ID', get the elements of Location that are not '' (Location[Location!=''][1L]). Suppose, if there are more than one element per group that are not '', the [1L], selects the first non-blank element, and assign (:=) the output to Location
library(data.table)
setDT(df1)[, Location := Location[Location != ''][1L], by = ID][]
# ID Name Location
# 1: 5014 John California
# 2: 5014 Kate California
# 3: 5014 Jim California
# 4: 5014 Ryan California
# 5: 5018 Pete Indiana
# 6: 5018 Pat Indiana
# 7: 5019 Jeff Arizona
# 8: 5020 Chris Kentucky
# 9: 5020 Mike Kentucky
#10: 5021 Will Indiana
Or we can use setdiff as suggested by @Frank
setDT(df1)[, Location:= setdiff(Location,'')[1L], by = ID][]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With