
How to convert dataframe into usable format for sequence mining in R?

Tags: r, sequence, arules

I'd like to do sequence analysis in R, and I'm trying to convert my data into a usable form for the arulesSequences package.

library(tidyverse)
library(arules)
library(arulesSequences)

df <- data_frame(personID = c(1, 1, 2, 2, 2),
             eventID = c(100, 101, 102, 103, 104),
             site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
             sequence = c(1, 2, 1, 2, 3))
df.trans <- as(df, "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
seq <- cspade(df.trans, parameter = list(support = 0.4), control = list(verbose = TRUE))

If I leave my columns as their original classes, as above, I get an error:

Error in asMethod(object) : 
  column(s) 1, 2, 3, 4 not logical or a factor. Discretize the columns first.

However, if I convert the columns to factors, I get another error:

df <- data_frame(personID = c(1, 1, 2, 2, 2),
             eventID = c(100, 101, 102, 103, 104),
             site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
             sequence = c(1, 2, 1, 2, 3))

df <- as.data.frame(lapply(df, as.factor))
df.trans <- as(df, "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
seq <- cspade(df.trans, parameter = list(support = 0.4), control = list(verbose = TRUE))

Error in asMethod(object) :
In makebin(data, file) : 'eventID' is a factor

Any advice on getting around this, or on sequence mining in R in general, is much appreciated. Thanks!

mowglis_diaper asked Oct 25 '25 06:10


1 Answer

Only the actual items (in your case "site") go into the transactions; the sequence and event IDs are attached afterwards as transaction info. Always inspect your intermediate results to make sure they look right. The type of transactions needed for sequence mining is described in ?cspade.

library("arulesSequences")
df <- data.frame(personID = c(1, 1, 2, 2, 2),
             eventID = c(100, 101, 102, 103, 104),
             site = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
             sequence = c(1, 2, 1, 2, 3),
             stringsAsFactors = TRUE)  # since R 4.0.0 strings are not converted to factors by default

# convert site into itemsets and add sequence and event ids
df.trans <- as(df[,"site", drop = FALSE], "transactions")
transactionInfo(df.trans)$sequenceID <- df$sequence
transactionInfo(df.trans)$eventID <- df$eventID
inspect(df.trans)

# sort by sequenceID
df.trans <- df.trans[order(transactionInfo(df.trans)$sequenceID),]
inspect(df.trans)

# mine sequences
seq <- cspade(df.trans, parameter = list(support = 0.2), 
              control = list(verbose = TRUE))
inspect(seq)
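
The same preparation can also be done on the plain data frame before coercing it. This is only a minimal sketch in base R (no arules functions are called), using the same example data. It makes site a factor, keeps the ID columns as integers (the second error in the question shows that a factor eventID is rejected), and orders the rows by sequence and then by eventID so the events within each sequence are ascending.

```r
# Sketch only: base-R preparation of the example data, no arules calls.
df <- data.frame(
  personID = c(1, 1, 2, 2, 2),
  eventID  = c(100, 101, 102, 103, 104),
  site     = c("google", "facebook", "facebook", "askjeeves", "stackoverflow"),
  sequence = c(1, 2, 1, 2, 3)
)

# The item column must be a factor (or logical) before coercion to transactions.
df$site <- factor(df$site)

# Order rows by sequence ID first, then by event ID within each sequence.
df <- df[order(df$sequence, df$eventID), ]

# Keep the IDs as plain integers, not factors.
df$sequence <- as.integer(df$sequence)
df$eventID  <- as.integer(df$eventID)

# Check the column classes and row order before going further.
str(df)
```

After this, df[, "site", drop = FALSE] can be coerced to transactions and the two ID columns attached with transactionInfo() as in the code above.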

Hope this helps!

Michael Hahsler answered Oct 27 '25 20:10