Wednesday, July 16, 2008

How to Parse JAMES / SMTP Email Messages in Ruby

So here is my predicament: I have JAMES (Java Apache Mail Enterprise Server) running on my machine, and so far I am only using it as a relay for sending emails to the outside world (the SMTP settings needed some hostname, right? Usually people put their company's mail server information in there).

Our CEO really wants the "Yahoo and Gmail" ability, since emailing is the #1 activity on the internet. Our SME is groaning that "emails are not necessary". I loved the idea of using emails, but the workload and prioritization set by the SME kept me from working on it, UNTIL he taunted me and said, "Well, you said you were going to work on it, but it's been two weeks and ..." Couple this with the fact that JAMES is really powerful (like procmail) and that I already had plans for it which I wanted to realize. This man proposed the plans and God did NOT dispose of them! (thank God!)

In the previous post I showed how I created email accounts on the fly by telnet-ing to JAMES and creating an account for every user of our website.

Now that they have an account, what next? Now they can receive emails at that account! That means if JAMES is running on our production server with the domain name "www.letter4sure.com", you can configure it (tackled in an upcoming post) to act as a mail server for @letter4sure.com addresses. Since that configuration is already covered on the wiki and in the docs, I will instead show how I retrieved emails from the user's inbox and made them available to the user.

So here is where we stand: whenever a user "test" registers on our website, we create an email account for them. For example, test@letter4sure.com is a legitimate address you may write to.

The first step was to find where JAMES keeps this account's mail. It stores it in the following directory: INBOX_PATH = "C:/James/james-binary-2.3.1/james-2.3.1/apps/james/var/mail/inboxes"

Under the inboxes directory is a folder named after the user id (test), and in that directory (USER_TEST_INBOX_PATH = "C:/James/james-binary-2.3.1/james-2.3.1/apps/james/var/mail/inboxes/test") JAMES stores two files: one object file (the email as a serialized object) and the email as plain text. The following screenshot should depict this visually:

Since we are using Ruby on Rails rather than Java, we will parse the plain text file. (This was done before we implemented the Ruby Java Bridge; with the bridge we could read emails much more easily by deserializing the object.)
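
To make the "find the plain text file" step concrete, here is a minimal sketch of how the message files for one user might be listed. The helper name message_files and its two parameters are made up for this illustration; the only thing taken from the real setup is the *.FileStreamStore file name pattern used by the code later in this post.

```ruby
# Hypothetical helper: lists the plain text message files for one user.
# inbox_root and login are illustrative parameters, not JAMES settings.
def message_files(inbox_root, login)
  user_dir = File.join(inbox_root, login)
  return [] unless File.directory?(user_dir)

  Dir.children(user_dir)
     .select { |name| File.fnmatch("*.FileStreamStore", name) } # plain text copies only
     .map { |name| File.join(user_dir, name) }
     .select { |path| File.file?(path) }
     .sort
end
```

Calling message_files(INBOX_PATH, "test") would return the full paths of the plain text files in that user's folder and skip the serialized object files.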



This is what a typical email message looks like. I have highlighted some headers:

Return-Path:
Delivered-To: test@letter4sure.com
Received: from an-out-0708.google.com ([209.85.132.241])
by letter4sure (JAMES SMTP Server 2.3.1) with SMTP ID 472
for ;
Sat, 3 May 2008 18:25:22 -0400 (EDT)
Received: by an-out-0708.google.com with SMTP id c8so393902ana.14
for ; Sat, 03 May 2008 15:25:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=gamma;
h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition;
bh=Uo8qOpB7rUSrIN27heBRePRNd+qtPvjVt+fKn3+oAMo=;
b=qArxrmx7iSrv4mFJRjZReLZFIS1Mv/vL3+sfzgyug0ZjCmkg3+p7zeLGIKxQy60eM7XmbisqtzGdcQcpQxnOjUvsl3ccrN7UxEDi3F+AyyzgOO4KsfqmLColHJZGoyoGafduElC2RDrIhaIQXa7b2DS1lr4VPTJpzBtB4F9cx+s=
DomainKey-Signature: a=rsa-sha1; c=nofws;
d=gmail.com; s=gamma;
h=message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition;
b=EOM9cxjCGAE1jdcdbgn1HCd3SOZ4s6evJdTEhPoSQZ6vSWtS1S2blXnjWgAGboZhublnmvVE0gI3BopyxOoMkX54rLk0GPmvZDMcRoz2DAjCtp3AGFxQHeKSnQa/P8P7pTGth7AauDhCF9VIu1mYxGDJUPF7PpRrR16ijat8cZc=
Received: by 10.100.121.12 with SMTP id t12mr6265454anc.154.1209853514008;
Sat, 03 May 2008 15:25:14 -0700 (PDT)
Received: by 10.100.250.10 with HTTP; Sat, 3 May 2008 15:25:13 -0700 (PDT)
Message-ID:
Date: Sat, 3 May 2008 18:25:13 -0400
From: "Athar Shiraz Siddiqui"
To: test@letter4sure.com
Subject: Testing your email checking capabilities
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

testing ...

Here the body of the email is in yellow, the header is in blue, and the important key-value pairs in the header are in red.
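
As a rough illustration of that header/body layout, here is a minimal sketch that splits a raw message at the first blank line and collects the headers into a hash. The name split_message is made up for this example, and the sketch assumes that continuation lines of a long header start with a space or a tab, which is how mail headers are normally folded. It is not the code our site uses.

```ruby
# Splits a raw message into [headers_hash, body_string].
# Header names are downcased; only the first occurrence of each is kept.
def split_message(raw)
  text = raw.gsub("\r\n", "\n")
  header_text, body = text.split(/\n\n/, 2)        # first blank line ends the headers
  unfolded = header_text.to_s.gsub(/\n[ \t]+/, " ") # join folded continuation lines

  headers = {}
  unfolded.each_line do |line|
    name, value = line.chomp.split(":", 2)
    next if value.nil?                              # not a "Name: value" line
    headers[name.strip.downcase] ||= value.strip
  end
  [headers, body.to_s]
end

sample = "To: test@letter4sure.com\n" \
         "Received: by 10.100.250.10 with HTTP;\n" \
         "\tSat, 3 May 2008 15:25:13 -0700 (PDT)\n" \
         "Subject: Testing your email checking capabilities\n" \
         "\n" \
         "testing ...\n"
headers, body = split_message(sample)
puts headers["subject"]
puts body
```

Keeping only the first occurrence of a header is a simplification: a real message has several Received lines, and a fuller parser would collect those into a list.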

Following is the Ruby file that takes the message above and extracts the information from it.
You can store the following in a Ruby file such as match.rb:

@content ='Return-Path:
Delivered-To: test@letter4sure.com
Received: from an-out-0708.google.com ([209.85.132.241])
by letter4sure (JAMES SMTP Server 2.3.1) with SMTP ID 472
for ;
Sat, 3 May 2008 18:25:22 -0400 (EDT)
Received: by an-out-0708.google.com with SMTP id c8so393902ana.14
for ; Sat, 03 May 2008 15:25:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=gamma;
h=domainkey-signature:received:received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition;
bh=Uo8qOpB7rUSrIN27heBRePRNd+qtPvjVt+fKn3+oAMo=;
b=qArxrmx7iSrv4mFJRjZReLZFIS1Mv/vL3+sfzgyug0ZjCmkg3+p7zeLGIKxQy60eM7XmbisqtzGdcQcpQxnOjUvsl3ccrN7UxEDi3F+AyyzgOO4KsfqmLColHJZGoyoGafduElC2RDrIhaIQXa7b2DS1lr4VPTJpzBtB4F9cx+s=
DomainKey-Signature: a=rsa-sha1; c=nofws;
d=gmail.com; s=gamma;
h=message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition;
b=EOM9cxjCGAE1jdcdbgn1HCd3SOZ4s6evJdTEhPoSQZ6vSWtS1S2blXnjWgAGboZhublnmvVE0gI3BopyxOoMkX54rLk0GPmvZDMcRoz2DAjCtp3AGFxQHeKSnQa/P8P7pTGth7AauDhCF9VIu1mYxGDJUPF7PpRrR16ijat8cZc=
Received: by 10.100.121.12 with SMTP id t12mr6265454anc.154.1209853514008;
Sat, 03 May 2008 15:25:14 -0700 (PDT)
Received: by 10.100.250.10 with HTTP; Sat, 3 May 2008 15:25:13 -0700 (PDT)
Message-ID:
Date: Sat, 3 May 2008 18:25:13 -0400
From: "Athar Shiraz Siddiqui"
To: test@letter4sure.com
Subject: Testing your email checking capabilities
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

testing ...
'

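# Returns the value of the first line that starts with the given field
# (for example "From:"), or nil if there is no such line.
def extractField field
  # gets the whole line, e.g. /^From:.*$/.match(@content)
  result = /^#{Regexp.escape(field)}.*$/.match(@content)
  return nil if result.nil?
  output = result[0].sub(field, "").strip # remove the key, keep the value
  puts "Field: #{field} and value: #{output}"
  return output
end

# Prints everything from the first blank line to the end of the message.
def extractBody
  puts @content[/^$.*/m]
end

extractField "From:"
extractField "To:"
extractField "Subject:"
date = extractField "Date:"
#puts (" date : #{DateTime.parse(date)}")
extractField "Content-Type:"
extractBody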

Run it and you will see each field and its value printed, followed by the body of the message.

Yes, as you can see, the biggest trouble was getting the darn regular expressions to work. I got some help from RegexBuddy and the rest came from googling.
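
For what it is worth, here is a slightly more defensive take on the same regular expression idea, as a sketch only. The name extract_field and its two-argument form are hypothetical. The sketch escapes the field name, looks only at the header section, ignores case, and returns nil instead of blowing up when the header is missing.

```ruby
# Returns the value of the first header named `field` (without the colon),
# or nil if the message has no such header.
def extract_field(content, field)
  # Only search the part before the first blank line, so a body line
  # that happens to look like a header is never picked up.
  headers = content.split(/\r?\n\r?\n/, 2).first.to_s
  match = headers.match(/^#{Regexp.escape(field)}:[ \t]*(.*?)\r?$/i)
  match && match[1].strip
end

message = "Delivered-To: test@letter4sure.com\n" \
          "To: test@letter4sure.com\n" \
          "Subject: Testing your email checking capabilities\n" \
          "\n" \
          "testing ...\n"
puts extract_field(message, "Subject")
```

Because ^ anchors at the start of a line, asking for "To" does not match the Delivered-To line.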

So once the user clicks on the inbox tab, all I have to do is: load any text files in the inbox -> parse them -> store them in the DB -> move each text file to the archive (to prevent it from being loaded into the DB again) -> let the user see the email from the DB.
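
Before the Rails version, here is a framework-free sketch of that load -> parse -> store -> archive flow, just to show the shape of it. Everything named here (StoredMessage, header_value, parse_message, import_inbox) is hypothetical, and a plain array stands in for the database table. Note one deliberate difference from the code below: this sketch parses a file first and archives it afterwards, so a message that fails to parse stays in the inbox.

```ruby
require "fileutils"
require "time"

# Hypothetical stand-in for a database row.
StoredMessage = Struct.new(:from, :to, :subject, :arrived, :body)

# Returns the value of one header from the header section, or nil.
def header_value(headers, name)
  match = headers.match(/^#{Regexp.escape(name)}:[ \t]*(.*)$/i)
  match && match[1].strip
end

# Turns the raw text of one message into a StoredMessage.
def parse_message(raw)
  headers, body = raw.gsub("\r\n", "\n").split(/\n\n/, 2)
  headers = headers.to_s
  date = header_value(headers, "Date")
  StoredMessage.new(
    header_value(headers, "From"),
    header_value(headers, "To"),
    header_value(headers, "Subject"),
    date && Time.rfc2822(date), # raises ArgumentError on a malformed date
    body.to_s
  )
end

# Imports every *.FileStreamStore file in inbox_dir into `store`
# (anything that responds to <<), then moves the file to archive_dir.
# Returns the number of files imported.
def import_inbox(inbox_dir, archive_dir, store)
  return 0 unless File.directory?(inbox_dir)

  FileUtils.mkdir_p(archive_dir)
  names = Dir.glob("*.FileStreamStore", base: inbox_dir).sort
  names.each do |name|
    path = File.join(inbox_dir, name)
    store << parse_message(File.read(path))
    FileUtils.mv(path, File.join(archive_dir, name)) # archived files are not imported twice
  end
  names.length
end
```

Running import_inbox a second time on the same directories imports nothing, because the files have already been moved out of the inbox.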

require 'find'
require 'fileutils'

# Archives and imports any new messages for the logged-in user.
def loadFilesFromDir
  @user_inbox_path = MailServer::INBOX_PATH + File::SEPARATOR + current_user.login
  @user_archive_path = MailServer::ARCHIVE_PATH + File::SEPARATOR + current_user.login
  # logger.debug ">> path : #{@user_inbox_path}"
  if File.directory?(@user_inbox_path)
    messages = getMessages(@user_inbox_path)
    loadIndividualMessages messages
  else
    flash[:notice] = "Your Inbox doesn't exist. Please contact Tech Support for help."
    # no emails in the inbox or inbox not created
  end
end

# Moves every message file from the inbox to the user's archive folder
# and returns the paths of the archived files.
def getMessages(inbox)
  messages = []
  Find.find(inbox) do |f|
    next unless File.file?(f)
    filename = File.basename(f)
    if File.fnmatch('*.FileStreamStore', filename)
      FileUtils.mkdir_p(@user_archive_path) # create user archive path if it doesn't exist
      movetopath = @user_archive_path + File::SEPARATOR + filename
      FileUtils.mv f, movetopath
      messages << movetopath
    end
  end
  messages
end

# Parses each archived message file and saves it as an IncomingMsg record.
def loadIndividualMessages(messages)
  messages.each do |message|
    logger.debug ">> message before: #{message.inspect}"
    @emailcontent = IO.read(message)
    @msg = IncomingMsg.new
    @msg.user = current_user
    @msg.from = extractField "From:"
    @msg.to = extractField "To:"
    @msg.subject = extractField "Subject:"
    @msg.arrived = DateTime.parse(extractField("Date:"))
    @msg.content_type = extractField "Content-Type:"
    # get everything starting with the blank line: ^$
    # to end of file: .* (multiple characters)
    # multiline: /m
    @msg.body = @emailcontent[/^$.*/m]
    @msg.save!
    logger.debug ">> message after save: #{@msg.inspect}"
  end
end

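# Returns the value of the first line starting with the given field, e.g. "From:".
def extractField field
  line = /^#{Regexp.escape(field)}.*$/.match(@emailcontent) # get entire line
  value = line[0].sub(field, "").strip # remove the key from the line
  return value
end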

The above code takes each .FileStreamStore file from the user's inbox directory and moves it to the user's archive directory.

It also takes the contents of the FileStreamStore file (the email message), including the headers, and stores them in the DB.

That is it. You may have questions about using FileUtils or Ruby's facilities for reading, deleting, and otherwise manipulating files. Feel free to request an elaboration and I will write another post about it.

You can see the user's view of the inbox here (draft):
