LISTSERV - SCIENTIFIC-LINUX-USERS Archives

April 2016

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV


Subject:	firefox misreading charset of groff-produced files
From:	Steve Talbott
Date:	Tue, 26 Apr 2016 16:19:17 -0400
Content-Type:	text/plain
Parts/Attachments:	text/plain (52 lines)

Greetings, all —

I maintain a substantial collection of web files, editing them with vim on
my local system, viewing them in a browser, then uploading the finished
version to the web.  Everything is manual, and the html encoding is simple
and non-interactive.  I use groff to generate my html output — employing a
macro package that I’ve been maintaining for many years.  (I don’t use the
groff “html” device type, but rather -Tutf8.)

The problem: Upon switching just now from SL 6.2 to 7.2 I find firefox
misreading the line,

    <meta http‐equiv="Content‐Type" content="text/html; charset=utf‐8">

and therefore presenting a very messed-up page interpreted as “Western”
rather than “utf-8”.  Every time I reload a page I am working on (which
can be very frequently), I have to reset firefox’s understanding of the
encoding.  When I ask firefox to show the page source as initially
construed, it displays this:

   <meta http[GARBAGE]equiv="Content[GARBAGE]Type" content="text/html;
   charset=utf[GARBAGE]8">

It turns out that groff insists, against every urging, upon outputting
hyphens as Unicode U+2010 hyphens rather than as U+002D hyphen-minuses.
In researching the issue on the web, I found at least one reference to
this as a “bug”.  I do not understand why firefox cannot recognize these
pages as utf-8 encoded and accept the U+2010 hyphen for what it is.  It
would be nice to know, but that is not my main question.
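
A possible explanation, offered as a sketch rather than a confirmed
diagnosis: a browser has to find the charset declaration before it knows
the encoding, so it scans the raw bytes for the ASCII spelling of
"http-equiv" and "charset=utf-8".  With U+2010 in place of the hyphen-minus
those byte sequences never occur, and the page falls back to the default
encoding.  The Python below only illustrates the byte-level difference;
the variable names are made up for this example, and cp1252 is assumed to
be what firefox calls "Western".

```python
# The <meta> line as groff emitted it, with U+2010 HYPHEN in place of "-".
HYPHEN = "\u2010"
meta = (
    f'<meta http{HYPHEN}equiv="Content{HYPHEN}Type" '
    f'content="text/html; charset=utf{HYPHEN}8">'
)
raw = meta.encode("utf-8")

# U+2010 takes three bytes in UTF-8; the hyphen-minus (U+002D) takes one.
hyphen_bytes = HYPHEN.encode("utf-8")
print(hyphen_bytes)  # b'\xe2\x80\x90'

# A byte-level search for the ASCII declaration finds nothing to act on.
declared = b"http-equiv" in raw and b"charset=utf-8" in raw
print(declared)  # False

# Decoded as Windows-1252, each hyphen turns into a run of junk characters
# (byte 0x90 has no cp1252 mapping in Python, hence errors="replace").
as_western = raw.decode("cp1252", errors="replace")
print(as_western)
```

The last line printed shows junk where each hyphen should be, which is
consistent with the [GARBAGE] in the page source view quoted above.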

I have gotten everything to work by globally changing all hyphens in groff
output to U+002D.  But the more general problem is that the current groff
wants all sorts of special character encodings (for example, for
punctuation, em-dashes, and so on) and does not just pass utf-8-encoded
text for these characters through to its output.  It begins to make a real
mess of one’s source files — and also creates difficulties in text
searches on these files.  It also complicates text-processing scripts.
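
The global hyphen replacement mentioned above could be done as a small
post-processing step on groff's output.  This is only a sketch of that
workaround, not a fix for groff itself; the function name ascii_hyphens is
invented for the example, and it assumes the output is already decoded as
UTF-8 text.

```python
def ascii_hyphens(text: str) -> str:
    """Replace every U+2010 HYPHEN with the ASCII hyphen-minus (U+002D)."""
    return text.replace("\u2010", "-")


# Example: the <meta> line as it comes out of groff -Tutf8.
groff_line = (
    '<meta http\u2010equiv="Content\u2010Type" '
    'content="text/html; charset=utf\u20108">'
)
fixed = ascii_hyphens(groff_line)
print(fixed)
```

Other characters, such as em-dashes and curly quotes, are left untouched,
so legitimate non-ASCII text in the page survives the filter.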

My question: is there a way to work more straightforwardly with groff
while staying in the utf-8 world?  Or is groff for some reason just
migrating away from the future?

(As you have probably realized by now, I am not a sysadmin type, and am
rather out of my depth here.  Any advice will be appreciated.)

Steve
------------------------------------------------------------------------
Stephen L. Talbott                               BiologyWorthyofLife.org
Senior Researcher, The Nature Institute:             natureinstitute.org
NetFuture editor:                                          netfuture.org
Mailing address: 20 May Hill Road, Ghent NY 12075      Tel: 518-672-5049
