I have the following data:
Newspaper Month Year Date Topic1 Topic2 Topic3 Topic4 Topic5
1 Scotsman December 2005 December 2005 0.013749700 0.000127470 0.38575261 0.000127470 0.070778523
2 Scotsman December 2005 December 2005 0.000165017 0.000165017 0.05219433 0.004611941 0.000165017
3 Scotsman December 2005 December 2005 0.000356507 0.024344932 0.01135670 0.000356507 0.000356507
4 Scotsman December 2005 December 2005 0.000185186 0.000185186 0.10796924 0.044639345 0.106613401
5 Scotsman December 2005 December 2005 0.065869506 0.009775978 0.09610254 0.017584819 0.000103681
6 Scotsman December 2005 December 2005 0.000153257 0.000153257 0.11443001 0.000153257 0.046316677
I would like to create a separate variable that records which TopicN has the highest percentage for each article.
In the case of the first article (row), it would be 3. Any ideas?
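You could use max.col() on the topic columns. For each row it returns the position of the column holding the maximum, which equals N here because Topic1 to Topic5 appear in order. Note that ties are broken at random by default (ties.method = "random"). If df is the data, try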
max.col(df[grepl("^Topic", names(df))])
# [1] 3 3 2 3 3 3
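Below is a minimal, self-contained sketch that rebuilds only the topic columns of the six example rows (the data frame name `topics` is hypothetical). It passes ties.method = "first" so the result is deterministic if a row ever has tied maxima. It also cross-checks against apply(..., 1, which.max), a swapped-in alternative that likewise returns the first maximum in each row.

```r
# Topic columns of the six example rows from the question (hypothetical name: topics)
topics <- data.frame(
  Topic1 = c(0.013749700, 0.000165017, 0.000356507, 0.000185186, 0.065869506, 0.000153257),
  Topic2 = c(0.000127470, 0.000165017, 0.024344932, 0.000185186, 0.009775978, 0.000153257),
  Topic3 = c(0.38575261, 0.05219433, 0.01135670, 0.10796924, 0.09610254, 0.11443001),
  Topic4 = c(0.000127470, 0.004611941, 0.000356507, 0.044639345, 0.017584819, 0.000153257),
  Topic5 = c(0.070778523, 0.000165017, 0.000356507, 0.106613401, 0.000103681, 0.046316677)
)

# ties.method = "first" picks the leftmost column when a row has tied maxima
by_maxcol <- max.col(topics, ties.method = "first")

# Cross-check: which.max() also returns the index of the first maximum in each row
by_apply <- unname(apply(topics, 1, which.max))

print(by_maxcol)
# [1] 3 3 2 3 3 3
```

None of these rows has a tie at its maximum, so the default random tie-breaking gives the same answer here. The explicit setting only matters for data where two topics share the top value.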
So, to add this as a new column MaxPct, we can do
df$MaxPct <- max.col(df[grepl("^Topic", names(df))])
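If the topic name is more useful than its position, the index can be looked up in the topic column names. This is a sketch on a cut-down copy of df with only the first three rows and a subset of the original columns. The vector topic_names and the column MaxTopic are hypothetical names chosen for illustration. Capturing topic_names before adding any columns keeps the lookup tied to the Topic columns only.

```r
# Cut-down copy of the question's data: first three rows only
df <- data.frame(
  Newspaper = "Scotsman",
  Topic1 = c(0.013749700, 0.000165017, 0.000356507),
  Topic2 = c(0.000127470, 0.000165017, 0.024344932),
  Topic3 = c(0.38575261, 0.05219433, 0.01135670),
  Topic4 = c(0.000127470, 0.004611941, 0.000356507),
  Topic5 = c(0.070778523, 0.000165017, 0.000356507)
)

# Names of the topic columns, captured before any new columns are added
topic_names <- grep("^Topic", names(df), value = TRUE)

# Position of the largest topic share within the Topic columns
df$MaxPct <- max.col(df[topic_names], ties.method = "first")

# Translate that position into the matching column name
df$MaxTopic <- topic_names[df$MaxPct]

print(df[c("Newspaper", "MaxPct", "MaxTopic")])
```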