I want to construct two data frames and merge them without using any form of merge(). Instead, I need to use the set operation union() together with match() or the %in% operator. The output must display the contents of d1 and d2 and the result of merging them.
I have figured out how to do this with merge(), but I can't work out how to do it using union() and match() or %in%, or any other way. Also, my output doesn't match what it should be. I'm a beginner, thanks for your help.
d1.Kids <- c("Jack", "Jill", "Jillian", "John", "James")
d1.States <- c("CA", "MA", "DE", "HI", "PA")
d1 <- data.frame(d1.Kids, d1.States, stringsAsFactors = FALSE)
d2.Ages <- c(10, 7, 12, 30)
d2.Kids <- c("Jill", "Jillian", "Jack", "Mary")
d2 <- data.frame(d2.Ages, d2.Kids, stringsAsFactors = FALSE)
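# Merge the two data frames (full outer join on the kid's name).
# The result is stored as "merged" so that it does not shadow the merge() function.
merged <- merge(d1, d2, by.x = "d1.Kids", by.y = "d2.Kids", all = TRUE)
print(merged)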
Output should be:
  kids    ages states 
1 Jack    12   CA
2 Jill    10   MA
3 Jillian 7    DE
4 John    NA   HI
5 James   NA   PA
6 Mary    30   NA
Something like this will do what the question asks for.
It seems long but in fact it's the same set of instructions for each of the dataframes to be merged.
Kids <- union(d1$d1.Kids, d2$d2.Kids)
States <- rep(NA_character_, length(Kids))
Ages <- rep(NA_real_, length(Kids))
States[match(d1$d1.Kids, Kids)] <- as.character(d1$d1.States)
Ages[match(d2$d2.Kids, Kids)] <- d2$d2.Ages
mrg <- data.frame(Kids, States, Ages)
mrg
#     Kids States Ages
#1    Jack     CA   12
#2    Jill     MA   10
#3 Jillian     DE    7
#4    John     HI   NA
#5   James     PA   NA
#6    Mary   <NA>   30
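To see why the indexing above works, it may help to look at what match() and %in% return for the vectors in the question. This is only a small sketch using the question's data. The name pos is made up for illustration.

```r
d1.Kids <- c("Jack", "Jill", "Jillian", "John", "James")
d2.Kids <- c("Jill", "Jillian", "Jack", "Mary")

# union() keeps the first vector's order, then appends the new names
Kids <- union(d1.Kids, d2.Kids)

# match(x, table) returns, for each element of x, its position in table
pos <- match(d2.Kids, Kids)
pos
# [1] 2 3 1 6

# %in% returns TRUE/FALSE: is each element of the left side found on the right?
d2.Kids %in% d1.Kids
# [1]  TRUE  TRUE  TRUE FALSE
```

So Ages[pos] <- d2$d2.Ages writes each age into the row of the kid it belongs to, and rows that are never written to stay NA.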
Using base R:
# Column names follow the question: d1.Kids, d1.States, d2.Kids, d2.Ages
kids <- unique(c(d1$d1.Kids, d2$d2.Kids))
d3 <- data.frame(Kids = kids, ages = NA_real_, states = NA_character_,
                 stringsAsFactors = FALSE)
for (i in seq_along(kids)) {
  # Copy the age if this kid appears in d2
  if (kids[i] %in% d2$d2.Kids) {
    d3$ages[i] <- d2$d2.Ages[d2$d2.Kids == kids[i]]
  }
  # Copy the state if this kid appears in d1
  if (kids[i] %in% d1$d1.Kids) {
    d3$states[i] <- d1$d1.States[d1$d1.Kids == kids[i]]
  }
}
d3
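The loop can probably be avoided by using match() in the other direction: look up each name of the union in the original data frame and index with the result. A sketch, assuming each name occurs at most once per data frame (as in the question). Names that are not found give NA from match(), and indexing with NA gives NA, so the missing values fill themselves in.

```r
d1 <- data.frame(d1.Kids = c("Jack", "Jill", "Jillian", "John", "James"),
                 d1.States = c("CA", "MA", "DE", "HI", "PA"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(d2.Ages = c(10, 7, 12, 30),
                 d2.Kids = c("Jill", "Jillian", "Jack", "Mary"),
                 stringsAsFactors = FALSE)

kids <- union(d1$d1.Kids, d2$d2.Kids)

d3 <- data.frame(
  kids   = kids,
  # position of each kid in d2 (NA if absent), used to pick the age
  ages   = d2$d2.Ages[match(kids, d2$d2.Kids)],
  # position of each kid in d1 (NA if absent), used to pick the state
  states = d1$d1.States[match(kids, d1$d1.Kids)],
  stringsAsFactors = FALSE
)
d3
```

This gives the columns in the order kids, ages, states, as in the expected output of the question.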