SCIENTIFIC-LINUX-USERS Archives

August 2015

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Brad Cable
Date: Tue, 25 Aug 2015 11:23:31 -0500

The MSF files aren't the ones I was using; the files with no extension
were, as you found out.

Thunderbird can't display information that it does not have locally
unless it polls constantly, though your client might be configured to do
that if it thinks it's connecting to an Exchange server (the IMAP
auto-detection settings might have found that out). Take a look in your
Account Settings under "Synchronization & Storage", and make sure "Keep
messages for this account on this computer" is checked. You can also
look at the Disk space settings on that same tab to see if your client
is deleting the older email.
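
A rough way to check from a shell how far back the local copy of a
folder goes is sketched below. It assumes the folder file is a
Thunderbird MBOX in which every message starts with a "From - " line,
and the function name mbox_span is made up for this example. It prints
the number of messages plus the first and last "Date: " lines in file
order, which is not necessarily chronological order, and "Date: " lines
quoted inside message bodies are picked up too, so treat the result as
approximate.

```shell
#!/bin/bash
# mbox_span is a hypothetical helper name, not a Thunderbird tool.
# It prints the message count and the first and last "Date: " lines
# (in file order) found in one MBOX file.
mbox_span() {
    awk '
        { sub(/\r$/, "") }                # tolerate CRLF line endings
        /^From - / { count++; next }      # Thunderbird message separator
        /^Date: /  { if (first == "") first = $0; last = $0 }
        END {
            print "Messages: " (count + 0)
            print "First " first
            print "Last " last
        }
    ' "$1"
}

# Example call (the path is the one quoted in this thread):
# mbox_span ~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu/INBOX-75
```

If the first date is much later than the oldest message shown in the
Thunderbird window, the older mail is probably not stored locally.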

If it is already checked and everything is stored locally, I'm curious
about the "INBOX-75" name. That suggests that you possibly have
multiple INBOX files. Is that the only one there (ignoring the MSF
files)? What does the directory listing inside
"~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu" look like?

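One way to produce such a listing while leaving out the .msf index
files is sketched here. The function name list_mail_stores is invented
for this example, and it assumes the message stores are regular files
sitting directly in the account directory. It prints the size in bytes
and the name of each one.

```shell
#!/bin/bash
# list_mail_stores is a hypothetical helper, not part of Thunderbird.
# It prints "<size in bytes><TAB><name>" for every regular file in the
# given directory whose name does not end in .msf.
list_mail_stores() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue        # skip subdirectories
        case "$f" in
            *.msf) continue ;;         # skip the index files
        esac
        printf '%s\t%s\n' "$(( $(wc -c < "$f") ))" "${f##*/}"
    done
}

# Example call, using the path from this thread:
# list_mail_stores ~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu
```

Several INBOX-like names with very different sizes would suggest that
the mail is split across more than one local file.
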
-Brad


On 08/25/2015 02:37 AM, Yasha Karant wrote:
> Brad,
>
> Let me be certain that I understand you.
>
> ~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu is my path
> to a set of IMAP email entries, extension msf for most
> cat -v INBOX*.msf | less
> yields nothing like a header
> However, one of the files has a name without extension, INBOX-75
> cat -v INBOX-75 | less
> does seem to contain text that could be parsed and extracted by the
> script you kindly provided. However, this single file (-75) is not
> complete in terms of the historical epoch I need. All of the relevant
> email is kept on a remote proprietary Microsoft cloud email server
> that will respond to standard IMAP requests from standard open systems
> clients, such as Thunderbird -- one is not forced to use a proprietary
> Microsoft product as a client. It appears that the .msf files somehow
> contain IMAP instructions to retrieve earlier epoch email from the
> server. If this is the case, elucidation would be appreciated. What
> I have found is:
>
> From http://file.org/extension/msf
>
> The MSF files that are used by these email programs do not contain the
> actual contents of the email message that was sent or received using
> the email service. These files only contain an index of the messages
> and the message mail headers and summaries. The Earthlink and Mozilla
> email applications both use the .msf file extension for this purpose.
>
> When I do process a msf file with cat -v and hunt for subject via
> less, I find:
>
> (71ED1=3ae99)(2F6CC=Cloud webinar presentation (was: Re: (no subject\)\))
>
> but no actual subject field -- merely a text string that only contains
> the phrase subject as above.
>
> Again, all of this could be avoided were there a screen capture method
> that contained an OCR that would allow me to paste into a word
> processor, LaTeX text, etc.
>
> Yasha
>
> On 08/24/2015 07:08 PM, Brad Cable wrote:
>> ToddAndMargo: grep is a decent start, but it actually is much more
>> complicated than that.
>>
>>
>> To get this data in a format you can read/process, you have to deal
>> with the fact that there is no standard in SMTP for the order of
>> headers, and every client seems to do it differently. On top of
>> which, some clients might record a "Sent" header instead of a "Date"
>> header, and then you have to deal with control fields, etc.
>>
>> Anyway, assuming Thunderbird on Linux, you can get to the MBOX files
>> from
>> ~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER>
>>
>> YOUR_PROFILE_ID would be whatever you see there. It's a random string,
>> and if you only have one profile it will end in ".default".
>>
>> IMAPSERVERNAME_FOLDER would be the email account you are looking
>> for, and if you have multiple it might append a "-2", "-3", etc.
>> "imap.gmail.com" is a good example.
>>
>> REMOTE_FOLDER is the actual name of the folder you are trying
>> to scrape. So "INBOX", "Sent", "Spam", etc.
>>
>> I wrote this simple combination of grep/awk to convert everything
>> into a CSV that you can import into whatever you want.
>>
>> If you save this in thunderbird_to_csv.sh, you can execute it like so
>> (the first argument is the Thunderbird MBOX file):
>>
>> $ ./thunderbird_to_csv.sh \
>>     ~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER>
>>
>>
>>
>> #!/bin/bash
>> # Usage: ./thunderbird_to_csv.sh <Thunderbird MBOX file>
>> # gensub() is a GNU awk extension, so "awk" here must be gawk.
>> grep -E "^((Subject|Date|Sent|From): |From - )" "$1" | awk 'BEGIN {
>>     print "From,Subject,Date";
>> }
>> /^From - /{
>>     # "From - " starts a new message, so reset the three fields.
>>     subject=""; from=""; date="";
>>     # Read ahead until every field is filled or the input runs out.
>>     while((length(from) == 0 || length(date) == 0 || length(subject) == 0) && (getline) > 0){
>>         sub(/\r$/, "");  # drop a trailing carriage return, if present
>>         if(length(from) == 0 && index($0, "From: ") == 1){
>>             from=gensub(/^From: (.*)$/, "\\1", 1);
>>         }
>>         if(length(subject) == 0 && index($0, "Subject: ") == 1){
>>             subject=gensub(/^Subject: (.*)$/, "\\1", 1);
>>         }
>>         if(length(date) == 0 && index($0, "Date: ") == 1){
>>             date=gensub(/^Date: (.*)$/, "\\1", 1);
>>         }
>>         if(length(date) == 0 && index($0, "Sent: ") == 1){
>>             date=gensub(/^Sent: (.*)$/, "\\1", 1);
>>         }
>>     }
>>     # Double every embedded double quote, as CSV expects.
>>     gsub("\"", "\"\"", from);
>>     gsub("\"", "\"\"", subject);
>>     gsub("\"", "\"\"", date);
>>     print "\"" from "\",\"" subject "\",\"" date "\"";
>> }'
>>
>>
>>
>>
>> For those curious what this does, the grep command strips everything
>> down to lines starting with "Subject: ", "From: ", "Date: ", "Sent: ",
>> and "From - ".
>>
>> I can't recall if "From -" is a part of the MBOX format (I don't
>> remember it being there), but I think it's actually something
>> Thunderbird threw in there. Glad they did, as it separates each
>> email pretty nicely.
>>
>> It then loops through the lines looking for these headers, and
>> records each one IF AND ONLY IF THAT HEADER HASN'T BEEN SEEN BEFORE
>> in the current message. So for instance, if you have an email that
>> was originally from Bob and was forwarded to you by Alice, and I kept
>> overwriting as I searched, it would say the email was from Bob and
>> not Alice. That would be wrong, because Alice is who actually sent
>> that email.
>>
>> After that, it escapes the double quotes inside to be two
>> double-quotes, the standard for CSV files, and takes off the extra
>> line-ending character left at the end of each value (presumably a
>> carriage return, since awk has already removed the newline).
>>
>> -Brad
>>
>>
>> On 08/24/2015 06:54 PM, ToddAndMargo wrote:
>>> On 08/24/2015 04:29 PM, Yasha Karant wrote:
>>>> My query applies specifically to Mozilla Thunderbird current, but
>>>> could
>>>> have a more general solution.
>>>>
>>>> I need to convert to a plain text file listing (that could be imported
>>>> into a word processor, LaTeX or a GUI front end thereto, etc) what
>>>> appears in the display of Thunderbird as the columns Subject From and
>>>> Date for an internal activity report that I must write. These columns
>>>> appear on the end-user GUI display and allow one to then read specific
>>>> messages by "point and click". As I cannot find a description of the
>>>> official Thunderbird nomenclature for the various sections of the GUI
>>>> display, I am using the above descriptions.
>>>>
>>>> I could use a screenshot application, select a rectangular region,
>>>> save
>>>> each entity as a PNG image, and then use an OCR application to yield
>>>> plain text. I would prefer that the screenshot application simply
>>>> recognizes the text *AS* text, allowing me to copy and paste into a
>>>> text
>>>> editor, etc., all running under X Windows. Does anyone know of an
>>>> application that does this? A brief perusal on the web as well as a
>>>> quick read of the information on the "default" screenshot applications
>>>> that come with either MATE or KDE does not seem to reveal a mechanism
>>>> for this (but rather the PNG or other image, non-text, route).
>>>>
>>>> The normal mechanism I use -- highlight (select), pointing device
>>>> button
>>>> (to copy), and then pointing device button (paste) to capture from say a
>>>> text HTTP file in a web browser to a word processor application --
>>>> does
>>>> not seem to work for the above "column" portion of the Thunderbird
>>>> display. This normal mechanism does work if I view source for each
>>>> message, displaying the SMTP text source and headers in a box, but is
>>>> very time consuming as the information that I need is available in the
>>>> "columns" of the basic Thunderbird user interface without having to
>>>> view
>>>> the source.
>>>>
>>>> Any assistance is appreciated.
>>>>
>>>> Yasha Karant
>>>
>>> Hi Yasha,
>>>
>>> Something like this?
>>>
>>> grep -i "subject\|from\|date" Inbox
>>>
>>> -T
>>>
>>>
>>>
>>
>>
