Import and Parse .eml files

Tags: r, text-parsing

I'm hoping someone can give me some advice on importing and parsing .eml files in R. I have a folder with around 1000 .eml files containing text that includes entries like the one below:

Return-Path: < [email protected]>

What I would like to do is import all of these files into a data.frame or data.table in R and parse the email addresses out into a separate field.
I think I've seen something like this done before with text files using grep.
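The grep approach mentioned above can be sketched in base R roughly as follows. This is a minimal sketch, not a tested recipe: `extract_return_paths` is a hypothetical helper name, and it assumes each message has its address on a line starting with `Return-Path:` and takes the first token on that line that looks like `local@domain`, without validating it.

```r
# Hypothetical helper (not from the original post): one row per .eml file,
# holding the address found on that file's Return-Path header line.
extract_return_paths <- function(dir) {
  files <- list.files(dir, pattern = "\\.eml$", full.names = TRUE)
  emails <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # first line that starts with "Return-Path:" (NA if there is none)
    hdr <- grep("^Return-Path:", lines, value = TRUE, ignore.case = TRUE)[1]
    if (is.na(hdr)) return(NA_character_)
    # first token shaped like local@domain, ignoring <, > and spaces
    m <- regmatches(hdr, regexpr("[^<>[:space:]]+@[^<>[:space:]]+", hdr))
    if (length(m) == 0) NA_character_ else m
  }, character(1), USE.NAMES = FALSE)
  data.frame(file = basename(files), email = emails, stringsAsFactors = FALSE)
}

# Usage with two throwaway files in a temporary folder
tmp <- file.path(tempdir(), "eml_demo")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("Return-Path: <alice@example.com>", "Subject: hi", "", "body"),
           file.path(tmp, "a.eml"))
writeLines(c("Subject: no return path", "", "body"), file.path(tmp, "b.eml"))
addrs <- extract_return_paths(tmp)
print(addrs)
```

Files without a `Return-Path:` line get `NA` rather than being dropped, so the data frame keeps one row per file; `data.table::as.data.table(addrs)` would convert it if a data.table is preferred.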

Any tips would be very much appreciated.

asked Oct 23 '25 at 21:10 by user3476463



1 Answer

I started with an mbox file downloaded from Gmail, broke it into individual messages in .eml format, then pulled the lines I needed out of each file and assembled them into a data frame.

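library(tm.plugin.mail)

# Split the mbox export into one .eml file per message in "emlfile2"
mbf <- "mboxfile"
convert_mbox_eml(mbf, "emlfile2")

maildir <- "emlfile2"
mailfiles <- dir(maildir, full.names = TRUE)

# Extract subject, date and two of the last three lines from one message
readmsg <- function(fname) {
  l <- readLines(fname, warn = FALSE)
  # anchor on line start and keep only the first match (NA if missing),
  # so every message yields exactly one value per field
  subj <- sub("^Subject: ", "", grep("^Subject: ", l, value = TRUE)[1])
  date <- sub("^Date: ", "", grep("^Date: ", l, value = TRUE)[1])
  last3 <- tail(l, 3)
  c(subject = subj, date = date, text1 = last3[1], text2 = last3[2])
}

# rbind gives a character matrix; convert it to a data frame
mdf <- as.data.frame(do.call(rbind, lapply(mailfiles, readmsg)),
                     stringsAsFactors = FALSE)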
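Grepping the whole file for `Subject: ` can also hit quoted replies in the body, and it misses headers folded onto a continuation line. One alternative is to parse only the header block. This is a minimal sketch, assuming the usual RFC 5322 layout (headers, one empty line, body); `read_headers` and `row_for` are hypothetical names, not tm.plugin.mail functions, and unfolding here simply joins continuation lines with a single space.

```r
# Hypothetical helper (not part of tm.plugin.mail): parse the header block
# of one .eml file into a named character vector keyed by lower-case name.
read_headers <- function(fname) {
  l <- readLines(fname, warn = FALSE)
  # headers end at the first empty line; everything after is the body
  blank <- which(l == "")[1]
  hdr <- if (is.na(blank)) l else l[seq_len(blank - 1)]
  # a line starting with space or tab continues the previous header
  # (RFC 5322 folding); rejoin each group with a single space
  grp <- cumsum(!grepl("^[ \t]", hdr))
  unfolded <- vapply(split(hdr, grp),
                     function(x) paste(trimws(x), collapse = " "),
                     character(1))
  # keep only "Name: value" lines, e.g. drop an mbox "From " separator
  unfolded <- unfolded[grepl("^[^:[:space:]]+:", unfolded)]
  name <- tolower(sub(":.*$", "", unfolded))
  value <- trimws(sub("^[^:]*:", "", unfolded))
  setNames(value, name)
}

# Hypothetical row builder: one data.frame row per message
row_for <- function(fname) {
  h <- read_headers(fname)
  data.frame(file = basename(fname),
             subject = unname(h["subject"]),
             date = unname(h["date"]),
             stringsAsFactors = FALSE)
}

# Demo on a throwaway message with a folded Subject and a body line
# that merely looks like a header
f <- tempfile(fileext = ".eml")
writeLines(c("From: a@example.com",
             "Subject: quarterly",
             "  report",
             "Date: Mon, 1 Jan 2024 10:00:00 +0000",
             "",
             "Subject: not a header",
             "body text"), f)
h <- read_headers(f)
mdf <- do.call(rbind, lapply(f, row_for))
print(mdf)
```

Because `row_for` always returns one row with the same columns, missing headers become `NA` instead of shifting columns. It does not decode RFC 2047 encoded-words in subjects, so non-ASCII subjects may still need decoding afterwards.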
answered Oct 25 '25 at 12:10 by J. Win.
