LISTSERV - SCIENTIFIC-LINUX-USERS Archives

April 2016

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV


Subject:	firefox misreading charset of groff-produced files
From:	Steve Talbott
Date:	Tue, 26 Apr 2016 16:19:17 -0400
Content-Type:	text/plain

Greetings, all —



I maintain a substantial collection of web files, editing them with vim on
my local system, viewing them in a browser, then uploading the finished
version to the web.  Everything is manual, and the html encoding is simple
and non-interactive.  I use groff to generate my html output, employing a
macro package that I’ve been maintaining for many years.  (I don’t use the
groff “html” device type, but rather -Tutf8.)

The problem: Upon switching just now from SL 6.2 to 7.2 I find firefox
misreading the line,

    <meta http‐equiv="Content‐Type" content="text/html; charset=utf‐8">



and therefore presenting a very messed-up page interpreted as “Western”
rather than “utf-8”.  Every time I reload a page I am working on (which
I do very frequently), I have to reset firefox’s understanding of the
encoding.  When I ask firefox to show the page source as initially
construed, it displays this:

   <meta http[GARBAGE]equiv="Content[GARBAGE]Type" content="text/html;
   charset=utf[GARBAGE]8">

It turns out that groff insists, against every urging, upon outputting
hyphens as Unicode U+2010 hyphens rather than as U+002D hyphen-minuses.
In researching the issue on the web, I found at least one reference to
this as a “bug”.  I do not understand why firefox cannot recognize these
pages as utf-8 encoded and accept the U+2010 hyphen for what it is.  It
would be nice to know, but that is not my main question.
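
To make the difference between the two characters concrete, here is a
minimal Python sketch (the names HYPHEN, HYPHEN_MINUS and utf8_hex are
invented for this illustration).  It only shows the bytes involved.  A
likely explanation, though not one confirmed here, is that the browser
expects the attribute name and the charset label to be spelled with the
plain ASCII hyphen-minus, so a label written with U+2010 is simply not
recognized as "utf-8".

```python
# Compare the two hyphen characters at the byte level.
HYPHEN = "\u2010"        # U+2010 HYPHEN, the character found in the groff output
HYPHEN_MINUS = "\u002d"  # U+002D HYPHEN-MINUS, the ordinary ASCII "-"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


# The charset label as it appears in the generated <meta> line.
label = "utf" + HYPHEN + "8"

print(utf8_hex(HYPHEN_MINUS))  # 2d
print(utf8_hex(HYPHEN))        # e2 80 90
print(label == "utf-8")        # False: the two strings differ
print(label.isascii())         # False: the label contains a non-ASCII character
```

The three bytes e2 80 90 are what turn into garbage when the page is
read as “Western” instead of utf-8.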



I have gotten everything to work by globally changing all hyphens in groff
output to U+002D.  But the more general problem is that the current groff
wants all sorts of special character encodings (for example, for
punctuation, em-dashes, and so on) and does not just pass utf-8-encoded
text for these characters through to its output.  It begins to make a real
mess of one’s source files, and also creates difficulties in text
searches on these files.  It also complicates text-processing scripts.
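
The global hyphen replacement mentioned above could be done with a small
filter along the following lines.  This is only a sketch under
assumptions: the function names and the file names in the usage note are
hypothetical, and the mapping covers only the U+2010 hyphen discussed
here, so other substituted characters would have to be added by hand.

```python
from pathlib import Path

# Characters to rewrite in the generated output.  Only the hyphen is
# listed; extend this mapping if other characters cause trouble.
REPLACEMENTS = {
    "\u2010": "-",  # U+2010 HYPHEN -> U+002D HYPHEN-MINUS
}
TABLE = str.maketrans(REPLACEMENTS)


def normalize(text: str) -> str:
    """Return text with every mapped character replaced."""
    return text.translate(TABLE)


def normalize_file(src_path: str, dst_path: str) -> None:
    """Read a utf-8 file, normalize it, and write the result as utf-8."""
    text = Path(src_path).read_text(encoding="utf-8")
    Path(dst_path).write_text(normalize(text), encoding="utf-8")


# Small demonstration on the problematic <meta> fragment.
sample = "charset=utf\u20108"
print(normalize(sample))  # charset=utf-8
```

With the groff output saved to a file, a call such as
normalize_file("page.raw", "page.html") would produce the cleaned copy.
The replacement is blind: it also changes any U+2010 in the body text,
which matches the "globally changing all hyphens" approach described here.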



My question: is there a way to work more straightforwardly with groff
while staying in the utf-8 world?  Or is groff for some reason just
migrating away from the future?

(As you have probably realized by now, I am not a sysadmin type, and am
rather out of my depth here.  Any advice will be appreciated.)

Steve
------------------------------------------------------------------------
Stephen L. Talbott                               BiologyWorthyofLife.org
Senior Researcher, The Nature Institute:             natureinstitute.org
NetFuture editor:                                          netfuture.org
Mailing address: 20 May Hill Road, Ghent NY 12075      Tel: 518-672-5049
