SCIENTIFIC-LINUX-USERS Archives


SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV


SCIENTIFIC-LINUX-USERS August 2015

Subject: Re: how to capture text as text
From: Yasha Karant <[log in to unmask]>
Reply-To: Yasha Karant <[log in to unmask]>
Date: Tue, 25 Aug 2015 10:30:33 -0700
Content-Type: multipart/mixed
Parts/Attachments: text/plain (250 lines), ls.out (1 lines)

Unfortunately, I do not have adequate storage capacity on my devices to
store all of my email -- I must leave it on the server.

Thus, as Thunderbird does display the information that I need in its
primary user interface, I was hoping to capture that information from
that interface rather than having to either manually download or peruse
the detailed files. When my email was on a local SMTP server whose
"directories" I could read, this was an easy task using a grep script.
However, we are not allowed direct access to the servers; the only
interfaces allowed are a web browser pointed at an MS Outlook URL,
actual MS Outlook or similar MS applications, or an IMAP-enabled email
client. I use the latter.

I could try to use the MS Outlook URL and capture from there, but others
who try this in Linux Mozilla Firefox or Linux Google Chrome are not
able to highlight and paste from its user interface; something appears
to disable this capability, which does work for most pages displayed in
the browser.

Yasha

On 08/25/2015 09:23 AM, Brad Cable wrote:
> The MSF files aren't the ones I was using; the files without an
> extension were, as you found out.
>
> Thunderbird can't display information that it does not have locally
> unless it polls constantly, though your client might be configured to
> do that if it thinks it's connecting to an Exchange server (the IMAP
> auto-detection settings might have found that out). Take a look in
> your Account Settings under "Synchronization & Storage", and make sure
> "Keep messages for this account on this computer" is checked. You can
> also look at the Disk space settings on that same tab to see whether
> your client is deleting older email.
>
> If it is already checked and everything is stored locally, I'm curious
> about the "INBOX-75" name. It suggests that you might have multiple
> INBOX files; is that the only one there (ignoring the MSF files)?
> What does the directory listing inside
> "~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu" look like?
>
> -Brad
>
>
> On 08/25/2015 02:37 AM, Yasha Karant wrote:
>> Brad,
>>
>> Let me make certain that I understand you.
>>
>> ~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu is my
>> path to a set of IMAP email files, most with the extension .msf.
>> cat -v INBOX*.msf | less
>> yields nothing like a header
>> However, one of the files has a name without extension, INBOX-75
>> cat -v INBOX-75 | less
>> does seem to contain text that could be parsed and extracted by the
>> script you kindly provided. However, this single file (-75) is not
>> complete in terms of the historical epoch I need. All of the
>> relevant email is kept on a remote proprietary Microsoft cloud email
>> server that will respond to standard IMAP requests from standard open
>> systems clients, such as Thunderbird -- one is not forced to use a
>> proprietary Microsoft product as a client. It appears that the .msf
>> files somehow contain IMAP instructions to retrieve earlier epoch
>> email from the server. If this is the case, elucidation would be
>> appreciated. What I have found is:
>>
>> From http://file.org/extension/msf
>>
>> The MSF files that are used by these email programs do not contain
>> the actual contents of the email message that was sent or received
>> using the email service. These files only contain an index of the
>> messages and the message mail headers and summaries. The Earthlink
>> and Mozilla email applications both use the .msf file extension for
>> this purpose.
>>
>> When I process an .msf file with cat -v and search for "subject" in
>> less, I find:
>>
>> (71ED1=3ae99)(2F6CC=Cloud webinar presentation (was: Re: (no
>> subject\)\))
>>
>> but no actual Subject header field, merely a text string that happens
>> to contain the word "subject", as above.
>>
>> Again, all of this could be avoided were there a screen capture
>> method that included OCR, allowing me to paste into a word
>> processor, LaTeX text, etc.
>>
>> Yasha
>>
>> On 08/24/2015 07:08 PM, Brad Cable wrote:
>>> ToddAndMargo: grep is a decent start, but it actually is much more
>>> complicated than that.
>>>
>>>
>>> To get this data in a format you can read/process, you have to deal
>>> with the fact that there is no standard in SMTP for the order of
>>> headers, and every client seems to do it differently. On top of
>>> which, some clients might record a "Sent" header instead of a "Date"
>>> header, and then you have to deal with control fields, etc.
>>>
>>> Anyway, assuming Thunderbird on Linux, you can get to the MBOX files
>>> from
>>> ~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER>
>>>
>>> YOUR_PROFILE_ID is whatever you see there; it's a random string,
>>> and if you only have one profile it will end in ".default".
>>> IMAPSERVERNAME_FOLDER is the email account you are looking for
>>> ("imap.gmail.com" is a good example); if you have multiple, it
>>> might append "-2", "-3", etc.
>>> REMOTE_FOLDER is the actual name of the folder you are trying to
>>> scrape, so "INBOX", "Sent", "Spam", etc.
>>>
>>> I wrote this simple combination of grep/awk to convert everything
>>> into a CSV that you can import into whatever you want.
>>>
>>> If you save this in thunderbird_to_csv.sh, you can execute it like
>>> so (the first argument is the Thunderbird MBOX file):
>>>
>>> $ ./thunderbird_to_csv.sh ~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER>
>>>
>>>
>>>
>>> #!/bin/bash
>>> # Usage: ./thunderbird_to_csv.sh MBOX_FILE
>>> # Prints one "From","Subject","Date" CSV row per message.
>>> grep -E "^((Subject|Date|Sent|From): |From - )" "$1" | awk '
>>> function csv(s) {
>>>     sub(/\r$/, "", s)          # drop the trailing carriage return
>>>     gsub(/"/, "\"\"", s)       # escape every double quote as ""
>>>     return "\"" s "\""
>>> }
>>> function flush() {
>>>     if (inmsg) print csv(from) "," csv(subject) "," csv(date)
>>> }
>>> BEGIN { print "From,Subject,Date" }
>>> /^From - / {                   # Thunderbird message separator
>>>     flush()
>>>     inmsg = 1; from = ""; subject = ""; date = ""
>>>     next
>>> }
>>> # Keep only the first occurrence of each header within a message
>>> from == "" && /^From: /          { from = substr($0, 7) }
>>> subject == "" && /^Subject: /    { subject = substr($0, 10) }
>>> date == "" && /^(Date|Sent): /   { date = substr($0, 7) }
>>> END { flush() }'
>>>
>>>
>>>
>>>
>>> For those curious what this does, the grep command strips everything
>>> down to lines starting with "Subject: ", "From: ", "Date: ",
>>> "Sent: ", and "From - ".
>>>
>>> I can't recall if "From -" is a part of the MBOX format (I don't
>>> remember it being there), but I think it's actually something
>>> Thunderbird threw in there. Glad they did, as it separates each
>>> email pretty nicely.
>>>
>>> It then goes through every line looking for these headers, and
>>> records each one IF AND ONLY IF THAT HEADER HASN'T BEEN SEEN BEFORE
>>> in the current message. So, for instance, if you have an email that
>>> was originally from Bob and was forwarded to you by Alice, and I kept
>>> overwriting the value, it would say the email was from Bob and not
>>> Alice (even though Alice is who actually sent you that email).
>>>
>>> After that, it escapes the double quotes inside as two double quotes,
>>> the standard for CSV files, and takes off the trailing carriage
>>> return left over from the CRLF line endings.
>>>
>>> -Brad
>>>
>>>
>>> On 08/24/2015 06:54 PM, ToddAndMargo wrote:
>>>> On 08/24/2015 04:29 PM, Yasha Karant wrote:
>>>>> My query applies specifically to current Mozilla Thunderbird, but it
>>>>> could have a more general solution.
>>>>>
>>>>> I need to convert what appears in the Thunderbird display as the
>>>>> Subject, From and Date columns into a plain text listing (one that
>>>>> could be imported into a word processor, LaTeX or a GUI front end
>>>>> thereto, etc.) for an internal activity report that I must write.
>>>>> These columns appear on the end-user GUI display and allow one to
>>>>> read specific messages by "point and click". As I cannot find a
>>>>> description of the official Thunderbird nomenclature for the various
>>>>> sections of the GUI display, I am using the above descriptions.
>>>>>
>>>>> I could use a screenshot application, select a rectangular region,
>>>>> save each entity as a PNG image, and then use an OCR application to
>>>>> yield plain text. I would prefer that the screenshot application
>>>>> simply recognize the text *AS* text, allowing me to copy and paste
>>>>> into a text editor, etc., all running under X Windows. Does anyone
>>>>> know of an application that does this? A brief perusal of the web,
>>>>> as well as a quick read of the information on the "default"
>>>>> screenshot applications that come with either MATE or KDE, does not
>>>>> seem to reveal a mechanism for this (only the PNG or other image,
>>>>> non-text, route).
>>>>>
>>>>> The normal mechanism I use to capture from, say, a text HTTP file in
>>>>> a web browser to a word processor application (highlight to select,
>>>>> pointing device button to copy, then pointing device button to
>>>>> paste) does not seem to work for the above "column" portion of the
>>>>> Thunderbird display. This normal mechanism does work if I view the
>>>>> source of each message, which displays the SMTP text source and
>>>>> headers in a box, but that is very time consuming, as the
>>>>> information that I need is available in the "columns" of the basic
>>>>> Thunderbird user interface without having to view the source.
>>>>>
>>>>> Any assistance is appreciated.
>>>>>
>>>>> Yasha Karant
>>>>
>>>> Hi Yasha,
>>>>
>>>> Something like this?
>>>>
>>>> grep -i "subject\|from\|date" Inbox
>>>>
>>>> -T
>>>>
>>>>
>>>>
>>>
>>>
>
>
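A minimal sketch for checking the grep/awk approach discussed in this thread without touching a real mailbox. It assumes, as Brad describes, that Thunderbird begins each stored message with a "From - " line, that header lines may end in CRLF, and that the first From:, Subject: and Date: (or Sent:) line after each separator is the one wanted. The function name `mbox_to_csv` is hypothetical, only POSIX awk features are used, and folded (multi-line) headers are not handled.

```shell
#!/bin/sh
# Hypothetical helper: mbox_to_csv FILE
# Prints a CSV header line, then one "From","Subject","Date" row per message.
# Assumes the Thunderbird "From - " separator line starts each message.
mbox_to_csv() {
    grep -E '^((Subject|Date|Sent|From): |From - )' "$1" | awk '
    function csv(s) {
        sub(/\r$/, "", s)        # strip a trailing CR from CRLF line endings
        gsub(/"/, "\"\"", s)     # CSV escaping: every " becomes ""
        return "\"" s "\""
    }
    function flush() {
        if (inmsg) print csv(from) "," csv(subj) "," csv(date)
    }
    BEGIN { print "From,Subject,Date" }
    # A separator line ends the previous message and starts a new one.
    /^From - / { flush(); inmsg = 1; from = subj = date = ""; next }
    # Only the first occurrence of each header in a message is kept.
    from == "" && /^From: /        { from = substr($0, 7) }
    subj == "" && /^Subject: /     { subj = substr($0, 10) }
    date == "" && /^(Date|Sent): / { date = substr($0, 7) }
    END { flush() }'
}
```

For example, `mbox_to_csv ~/.thunderbird/<YOUR_PROFILE_ID>/ImapMail/<IMAPSERVERNAME_FOLDER>/<REMOTE_FOLDER> > report.csv`. Resetting the fields at every separator, rather than reading ahead with getline, means a message that lacks one of the headers yields an empty CSV field instead of swallowing the next message.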



Attachment: ls.out

advising-1.sbd
advising.msf
advising.sbd
archives1-1.sbd
archives1.msf
Archives-1.msf
archives1.sbd
Archives-1.sbd
Archives-2.sbd
Archives.msf
automatic-reply-1.sbd
automatic-reply.msf
automatic-reply.sbd
bitpipe-1.sbd
bitpipe.msf
bitpipe.sbd
cfengine.msf
chimera-ucsf
chimera-ucsf.msf
Drafts-1
Drafts-1.msf
Drafts.msf
facsen.msf
fac-sen-workstation-survey.msf
filterlog.html
folder1.msf
IEEE.msf
INBOX-10.msf
INBOX-11.msf
INBOX-12.msf
INBOX-13.msf
INBOX-14.msf
INBOX-15.msf
INBOX-16.msf
INBOX-17.msf
INBOX-18.msf
INBOX-19.msf
INBOX-1.msf
INBOX-20.msf
INBOX-21.msf
INBOX-22.msf
INBOX-23.msf
INBOX-24.msf
INBOX-25.msf
INBOX-26.msf
INBOX-27.msf
INBOX-28.msf
INBOX-29.msf
INBOX-2.msf
INBOX-30.msf
INBOX-31.msf
INBOX-32.msf
INBOX-33.msf
INBOX-34.msf
INBOX-35.msf
INBOX-36.msf
INBOX-37.msf
INBOX-38.msf
INBOX-39.msf
INBOX-3.msf
INBOX-40.msf
INBOX-41.msf
INBOX-42.msf
INBOX-43.msf
INBOX-44.msf
INBOX-45.msf
INBOX-46.msf
INBOX-47.msf
INBOX-48.msf
INBOX-49.msf
INBOX-4.msf
INBOX-50.msf
INBOX-51.msf
INBOX-52.msf
INBOX-53.msf
INBOX-54.msf
INBOX-55.msf
INBOX-56.msf
INBOX-57.msf
INBOX-58.msf
INBOX-59.msf
INBOX-5.msf
INBOX-60.msf
INBOX-61.msf
INBOX-62.msf
INBOX-63.msf
INBOX-64.msf
INBOX-65.msf
INBOX-66.msf
INBOX-67.msf
INBOX-68.msf
INBOX-69.msf
INBOX-6.msf
INBOX-70.msf
INBOX-71.msf
INBOX-72.msf
INBOX-73.msf
INBOX-74.msf
INBOX-75
INBOX-75.msf
INBOX-75.sbd
INBOX-76.msf
INBOX-7.msf
INBOX-8.msf
INBOX-9.msf
INBOX.msf
inbox-sun-1.sbd
inbox-sun.msf
inbox-sun.sbd
Junk.msf
ls.out
msgFilterRules.dat
paraview
paraview.msf
paraview.sbd
rtg.msf
sci-linux
sci-linux-1.sbd
sci-linux.msf
sci-linux.sbd
Sent-1
Sent-1.msf
Sent-1.sbd
Sent-2.sbd
Sent.msf
solpubss.msf
SUN_Migrate_Import-1.msf
SUN_Migrate_Import-1.sbd
SUN_Migrate_Import-2.sbd
SUN_Migrate_Import.msf
SUN_Migrate_Import.sbd
tech-target-list.msf
tech-target-lists
tech-target-lists.msf
Templates-2.msf
Templates.msf
Trash-1.sbd
Trash.msf
Trash.sbd
unit-caps-cns.msf
wac
wac.msf
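In the listing above, most entries are .msf index files or .sbd folders; only a few (for example INBOX-75, Sent-1 and sci-linux) have no extension, and per Brad's reply the extension-less files are the ones that hold the messages. A small sketch, assuming each local message store is an mbox file whose first line starts with "From ", that picks out such candidates in one ImapMail account folder. The function name `list_mbox_files` is hypothetical, and it does not descend into .sbd subfolders.

```shell
#!/bin/sh
# Hypothetical helper: list_mbox_files DIR
# Prints regular files in DIR whose first five bytes are "From ",
# i.e. candidate mbox message stores; .msf indexes and .sbd dirs are skipped.
list_mbox_files() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue                # skip .sbd subdirectories
        case "$f" in *.msf) continue ;; esac   # .msf files are only indexes
        # An mbox file starts with a "From " separator line.
        if [ "$(head -c 5 "$f")" = "From " ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_mbox_files ~/.thunderbird/gr6o2z18.default/ImapMail/exchange.csusb.edu` would print the paths worth passing to the CSV script; checking the first bytes, rather than trusting the missing extension alone, also skips files such as filterlog.html or msgFilterRules.dat.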
