
merging rows within dataframe based on row.names

Tags: r, aggregate

Apologies if this question has already been answered, but all the information I have been able to find concerns merging separate data frames or merging in a different way. I'd really appreciate any thoughts.

I have a very large but very simple data frame with approx. 22500 rows and 48 columns. I would like to merge some of the rows within the data frame based on the row names and am wondering if there is any way to do this.

A portion of the data frame looks like this:

                         Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
    Nasvi2EG000001t1         28         43         33         25         64
    Nasvi2EG000002t2          0          3          0          0          4
    Nasvi2EG000002t5          0          0          0          0          0
    Nasvi2EG000002t6          0          0          0          0          0
    Nasvi2EG000004t1          1          0          0          0          0
    Nasvi2EG000009t1          0          4          2          0          4
    Nasvi2EG000013t1         21          8         17         19          7
    Nasvi2EG000014t1          0          3          0          0          4
    Nasvi2EG000014t2          0          4          0          0          3

As you can see, rows 2, 3 and 4 have identical names up to the digit after the "t", and the same goes for rows 8 and 9. I'd like to merge the similarly named rows together.

What I'd like to end up with is this:

                     Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
    Nasvi2EG000001t1         28         43         33         25         64
    Nasvi2EG000002            0          3          0          0          4
    Nasvi2EG000004t1          1          0          0          0          0
    Nasvi2EG000009t1          0          4          2          0          4
    Nasvi2EG000013t1         21          8         17         19          7
    Nasvi2EG000014            0          7          0          0          7

where the values in the rows that have been merged are summed.

Would be very grateful for any thoughts.

Thanks!

Nicki asked Dec 06 '25 19:12


1 Answer

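Assuming your data.frame is called "SODF", build a grouping vector by stripping the trailing "t" plus digits from each row name, then use that vector as the aggregation variable. Note that the group names end up in an "aggvar" column rather than in the row names, and that the suffix is stripped from every row, including rows that were not merged with anything.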

> aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))
> aggregate(. ~ aggvar, SODF, sum)
          aggvar Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
1 Nasvi2EG000001         28         43         33         25         64
2 Nasvi2EG000002          0          3          0          0          4
3 Nasvi2EG000004          1          0          0          0          0
4 Nasvi2EG000009          0          4          2          0          4
5 Nasvi2EG000013         21          8         17         19          7
6 Nasvi2EG000014          0          7          0          0          7
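
If the exact layout from the question is wanted (group names as row names, and the original name kept where nothing was merged), the following is one possible sketch. It rebuilds the sample data from the question so it can run on its own, and it swaps aggregate() for base R's rowsum(), which sums rows by group and puts the group values in the row names. The names "merged", "single" and "idx" are made up for illustration, and the renaming step assumes unmerged rows should keep their original names, as in the desired output in the question.

```r
# Rebuild the sample from the question (values copied from the post).
SODF <- data.frame(
  Treatment1 = c(28, 0, 0, 0, 1, 0, 21, 0, 0),
  Treatment2 = c(43, 3, 0, 0, 0, 4, 8, 3, 4),
  Treatment3 = c(33, 0, 0, 0, 0, 2, 17, 0, 0),
  Treatment4 = c(25, 0, 0, 0, 0, 0, 19, 0, 0),
  Treatment5 = c(64, 4, 0, 0, 0, 4, 7, 4, 3),
  row.names = c("Nasvi2EG000001t1", "Nasvi2EG000002t2", "Nasvi2EG000002t5",
                "Nasvi2EG000002t6", "Nasvi2EG000004t1", "Nasvi2EG000009t1",
                "Nasvi2EG000013t1", "Nasvi2EG000014t1", "Nasvi2EG000014t2")
)

# Strip the trailing "t<digits>" suffix to get the grouping key.
aggvar <- sub("t[0-9]+$", "", rownames(SODF))

# rowsum() sums the rows within each group and uses the group
# values as the row names of the result.
merged <- rowsum(SODF, group = aggvar)

# Optional: where a group contained only one row, restore that
# row's original name (assumption: this matches the desired output).
single <- names(which(table(aggvar) == 1))
idx <- match(single, rownames(merged))
rownames(merged)[idx] <- rownames(SODF)[match(single, aggvar)]

print(merged)
```

On the sample this should leave six rows, with "Nasvi2EG000002" and "Nasvi2EG000014" as the only shortened names. With 22500 rows, rowsum() is also likely to be faster than the formula interface of aggregate(), though that has not been measured here.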
A5C1D2H2I1M1N2O1R2T1 answered Dec 09 '25 13:12


