I'm hoping someone can give me some advice on importing and parsing .eml files in R. I have a folder with around 1000 .eml files whose text includes entries like the one below:
Return-Path: < [email protected]>
What I would like to do is import all of these files into a data.frame or data.table in R and parse the email addresses out into a separate field.
I think I've seen something like this done before with text files using grep.
Any tips would be very much appreciated.
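A minimal base-R sketch of the grep approach follows. It assumes all files sit in one folder and that the address appears in angle brackets on a `Return-Path:` line, as in the example above. It reads each file with `readLines`, keeps the first `Return-Path:` line, and pulls out the text between `<` and `>` with `regmatches`/`regexpr`. The helper names `get_return_path` and `read_eml_folder` and the folder argument `eml_dir` are hypothetical. This is not a full RFC 5322 address parser.

```r
# Hypothetical helper: extract the Return-Path address from one .eml file.
# Assumes a header like "Return-Path: <user@example.com>" (a space after "<"
# is tolerated); returns NA when the header or the brackets are missing.
get_return_path <- function(fname) {
  lines <- readLines(fname, warn = FALSE)
  hdr <- grep("^Return-Path:", lines, value = TRUE, ignore.case = TRUE)[1]
  if (is.na(hdr)) return(NA_character_)
  # Take the first "<...>" group on the line
  m <- regmatches(hdr, regexpr("<[^>]*>", hdr))
  if (length(m) == 0) return(NA_character_)
  # Drop the brackets and any surrounding spaces
  trimws(gsub("[<>]", "", m))
}

# Hypothetical helper: one row per .eml file in the folder
read_eml_folder <- function(eml_dir) {
  files <- list.files(eml_dir, pattern = "\\.eml$", full.names = TRUE,
                      ignore.case = TRUE)
  data.frame(
    file  = basename(files),
    email = vapply(files, get_return_path, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}
```

If a data.table is preferred, `data.table::as.data.table()` can convert the result, assuming the data.table package is installed.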
I started with an mbox file that I downloaded from Gmail and broke it down into individual messages in .eml format. Then, from each file, I pulled out the lines I needed and assembled them into a data frame.
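library(tm.plugin.mail)

# Split the mbox export into one .eml file per message in "emlfile2"
mbf <- "mboxfile"
convert_mbox_eml(mbf, "emlfile2")

maildir <- "emlfile2"
mailfiles <- dir(maildir, full.names = TRUE)

# Pull the subject, date and two of the last three lines out of one message.
# Patterns are anchored with ^ and only the first match is kept, so every
# message yields exactly four values (NA if a header is missing) and the
# rows stay aligned when they are bound together.
readmsg <- function(fname) {
  l <- readLines(fname, warn = FALSE)
  subj <- sub("^Subject: ", "", grep("^Subject: ", l, value = TRUE)[1])
  date <- sub("^Date: ", "", grep("^Date: ", l, value = TRUE)[1])
  text1 <- tail(l, 3)[1]
  text2 <- tail(l, 3)[2]
  c(subject = subj, date = date, text1 = text1, text2 = text2)
}

# One row per message, then turn the character matrix into a data frame
mdf <- do.call(rbind, lapply(mailfiles, readmsg))
mdf <- as.data.frame(mdf, stringsAsFactors = FALSE)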
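The `grep` calls in `readmsg` search the whole file, so a `Subject:` or `Date:` line quoted in the body of a forwarded message can also match. One possible refinement, sketched below, restricts the search to the header block. It assumes headers end at the first empty line (as in the Internet Message Format) and joins folded header lines, which start with a space or tab, onto the previous line. The helpers `header_block`, `unfold_headers`, `header_value` and `readmsg2` are hypothetical names, and encoded words such as `=?UTF-8?...?=` are not decoded.

```r
# Hypothetical helper: keep only the lines before the first empty line
header_block <- function(lines) {
  blank <- which(lines == "")[1]
  if (is.na(blank)) lines else lines[seq_len(blank - 1)]
}

# Hypothetical helper: append continuation lines (leading space/tab)
# to the header line they belong to
unfold_headers <- function(hdr) {
  out <- character(0)
  for (x in hdr) {
    if (grepl("^[ \t]", x) && length(out) > 0) {
      out[length(out)] <- paste(out[length(out)], trimws(x))
    } else {
      out <- c(out, x)
    }
  }
  out
}

# Hypothetical helper: value of the first header called `name`, or NA
header_value <- function(hdr, name) {
  pat <- paste0("^", name, ":[ \t]*")
  hit <- grep(pat, hdr, value = TRUE, ignore.case = TRUE)[1]
  if (is.na(hit)) NA_character_ else sub(pat, "", hit, ignore.case = TRUE)
}

# Variant of readmsg that only looks at the header block
readmsg2 <- function(fname) {
  hdr <- unfold_headers(header_block(readLines(fname, warn = FALSE)))
  c(subject     = header_value(hdr, "Subject"),
    date        = header_value(hdr, "Date"),
    return_path = header_value(hdr, "Return-Path"))
}
```

It can be dropped in where `readmsg` was used, e.g. `as.data.frame(do.call(rbind, lapply(mailfiles, readmsg2)), stringsAsFactors = FALSE)`.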