ToddAndMargo: grep is a decent start, but it is actually much more
complicated than that.
To get this data into a format you can read or process, you have to
deal with the fact that there is no required order for the headers
in a message, and every client seems to do it differently. On top
of that, some clients might record a "Sent" header instead of a
"Date" header, and then you have to deal with control fields, etc.
Anyway, assuming Thunderbird on Linux, you can get to the MBOX
files from
~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER>
YOUR_PROFILE_ID is whatever you see there. It's a random string,
and if you only have one profile it will end in ".default".
IMAPSERVERNAME_FOLDER is the email account you are looking for;
if you have multiple accounts it might have "-2", "-3", etc.
appended. "imap.gmail.com" is a good example.
REMOTE_FOLDER is the name of the folder you are trying to scrape:
"INBOX", "Sent", "Spam", etc.
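If you are not sure which files are there, something like the
following can list the candidates. This is only a sketch: the
function name list_mboxes is made up for this example, and it
assumes the MBOX files are the ones without a ".msf" extension
(the ".msf" files are Thunderbird's indexes). It may also list a
few non-mail files that happen to live in the same directories.

```shell
#!/bin/bash
# list_mboxes is a hypothetical helper, not part of Thunderbird.
# It prints every regular file found under an ImapMail directory below
# the given root (default: ~/.thunderbird), skipping .msf index files.
list_mboxes() {
    local root="${1:-$HOME/.thunderbird}"
    find "$root" -type f -path '*/ImapMail/*' ! -name '*.msf' | sort
}
```

Call it as "list_mboxes" for the default location, or pass another
root directory as the first argument.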
I wrote this simple combination of grep/awk to convert everything
into a CSV that you can import into whatever you want.
If you save this in thunderbird_to_csv.sh, you can execute it like
so (the first argument is the Thunderbird MBOX file):
$ ./thunderbird_to_csv.sh ~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER>
#!/bin/bash
# Usage: ./thunderbird_to_csv.sh <mbox-file>
grep -E "^((Subject|Date|Sent|From): |From - )" "$1" | awk '
# Quote one CSV field: drop a trailing carriage return, double any quotes.
function csv(s) {
    sub(/\r$/, "", s);
    gsub(/"/, "\"\"", s);
    return "\"" s "\"";
}
# Print the row for the message collected so far, if there is one.
function emit() {
    if (inmsg) print csv(from) "," csv(subject) "," csv(date);
}
BEGIN { print "From,Subject,Date"; }
# "From - " marks the start of a new message: emit the old one and reset.
/^From - / { emit(); inmsg = 1; from = ""; subject = ""; date = ""; next; }
# Only the first occurrence of each header within a message is kept.
from == ""    && /^From: /        { from = substr($0, 7); }
subject == "" && /^Subject: /     { subject = substr($0, 10); }
date == ""    && /^(Date|Sent): / { date = substr($0, 7); }
END { emit(); }'
For those curious what this does, the grep command strips
everything down to lines starting with "Subject: ", "From: ",
"Date: ", "Sent: ", and "From - ".
I can't recall if "From -" is a part of the MBOX format (I don't
remember it being there), but I think it's actually something
Thunderbird threw in there. Glad they did, as it separates each
email pretty nicely.
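Since each email starts with one of those lines, counting them
gives a quick sanity check on how many rows the CSV should have.
A minimal sketch, with count_messages being a name invented for
this example:

```shell
#!/bin/bash
# count_messages is a hypothetical helper for this example.
# It counts the lines starting with "From - ", the separator that
# Thunderbird writes at the start of each message in the file.
count_messages() {
    grep -c '^From - ' "$1"
}
```

If every message has all three headers, the number it prints for
an MBOX file should equal the number of rows in the CSV, not
counting the header row.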
It then loops through every line looking for these headers, and
records each one IF AND ONLY IF THAT HEADER HASN'T BEEN SEEN BEFORE
in the current email. So for instance, if you have an email that
was originally from Bob and was forwarded to you by Alice, and I
kept overwriting the value with every later match, it would say the
email was from Bob and not Alice, even though Alice is who actually
sent that email.
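The "first one wins" rule can be tried in isolation with a toy
example. The helper name first_from and the sample addresses are
invented for illustration; this is a sketch of the idea, not a
piece of the script itself.

```shell
#!/bin/bash
# first_from is a hypothetical helper: it prints the value of the first
# "From: " line on standard input and ignores any later ones.
first_from() {
    awk '/^From: / && !seen { print substr($0, 7); seen = 1 }'
}

# Invented sample: Alice forwards a message that was originally from Bob.
printf 'From: Alice <alice@example.com>\nSubject: Fwd: hello\nFrom: Bob <bob@example.com>\n' | first_from
# prints: Alice <alice@example.com>
```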
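After that, it escapes the double quotes inside each value by
doubling them, the standard for CSV files, and takes off the
trailing character. That character is presumably a carriage return
left over from CRLF line endings, since awk already strips the
newline itself.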
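To see the quoting step on its own, here is a small sketch. The
helper name csv_field is made up, and it assumes the extra trailing
character is a carriage return from CRLF line endings, so it only
removes the last character when it is one.

```shell
#!/bin/bash
# csv_field is a hypothetical helper: it reads one value per line and
# prints it as a quoted CSV field, with embedded double quotes doubled
# and a trailing carriage return (if any) removed.
csv_field() {
    awk '{ sub(/\r$/, ""); gsub(/"/, "\"\""); print "\"" $0 "\"" }'
}

printf '"Bob" <bob@example.com>\r\n' | csv_field
# prints: """Bob"" <bob@example.com>"
```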
-Brad
On 08/24/2015 06:54 PM, ToddAndMargo wrote:
On 08/24/2015 04:29 PM, Yasha Karant wrote:
My query applies specifically to Mozilla Thunderbird current, but
could have a more general solution.
I need to convert to a plain text file listing (that could be
imported into a word processor, LaTeX or a GUI front end thereto,
etc) what appears in the display of Thunderbird as the columns
Subject, From, and Date for an internal activity report that I must
write. These columns appear on the end-user GUI display and allow
one to then read specific messages by "point and click". As I
cannot find a description of the official Thunderbird nomenclature
for the various sections of the GUI display, I am using the above
descriptions.
I could use a screenshot application, select a rectangular region,
save each entity as a PNG image, and then use an OCR application to
yield plain text. I would prefer that the screenshot application
simply recognizes the text *AS* text, allowing me to copy and paste
into a text editor, etc., all running under X Windows. Does anyone
know of an application that does this? A brief perusal on the web
as well as a quick read of the information on the "default"
screenshot applications that come with either MATE or KDE does not
seem to reveal a mechanism for this (but rather the PNG or other
image, non-text, route).
The normal mechanism I use -- highlight (select), pointing device
button (to copy), and then pointing device button (paste) to
capture from say a text HTTP file in a web browser to a word
processor application -- does not seem to work for the above
"column" portion of the Thunderbird display. This normal mechanism
does work if I view source for each message, displaying the SMTP
text source and headers in a box, but is very time consuming as the
information that I need is available in the "columns" of the basic
Thunderbird user interface without having to view the source.
Any assistance is appreciated.
Yasha Karant
Hi Yasha,
Something like this?
grep -i "subject\|from\|date" Inbox
-T