SCIENTIFIC-LINUX-USERS Archives

April 2016

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

From: Steve Talbott <[log in to unmask]>
Date: Tue, 26 Apr 2016 16:19:17 -0400

Greetings, all —

I maintain a substantial collection of web files, editing them with vim on
my local system, viewing them in a browser, then uploading the finished
version to the web. Everything is manual, and the html encoding is simple
and non-interactive. I generate the html output with groff, using a macro
package that I’ve been maintaining for many years. (I don’t use the groff
“html” device type, but rather -Tutf8.)

The problem: Upon switching just now from SL 6.2 to 7.2 I find firefox
misreading the line,

    <meta http‐equiv="Content‐Type" content="text/html; charset=utf‐8">

and therefore presenting a very messed-up page interpreted as “Western”
rather than “utf-8”. Every time I reload a page I am working on (which
can be very frequently), I have to reset firefox’s understanding of the
encoding. When I ask firefox to show the page source as initially
construed, it displays this:

   <meta http[GARBAGE]equiv="Content[GARBAGE]Type" content="text/html;
   charset=utf[GARBAGE]8">
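The “[GARBAGE]” has a mechanical explanation. In UTF-8, U+2010 is stored as the three bytes E2 80 90, whereas the ASCII hyphen-minus is a single byte. If those bytes are read as Windows-1252, which is what Firefox’s “Western” setting refers to, E2 and 80 display as “â€” and 90 becomes an invisible control character. The Python sketch below reproduces this. `as_western` is a hypothetical helper. Python’s cp1252 codec treats byte 0x90 as undefined, whereas browsers map it to a control character, so the sketch substitutes U+FFFD for that byte.

```python
HYPHEN = "\u2010"  # U+2010 HYPHEN, the character groff emits for "-" here


def as_western(text: str) -> str:
    """Encode `text` as UTF-8, then misread the bytes as Windows-1252."""
    # Python's cp1252 codec leaves 0x90 undefined; errors="replace" turns it into U+FFFD.
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    print(HYPHEN.encode("utf-8"))                        # b'\xe2\x80\x90'
    print(ascii(as_western("http" + HYPHEN + "equiv")))  # 'http\xe2\u20ac\ufffdequiv'
```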

It turns out that groff insists, against every urging, upon outputting
hyphens as Unicode U+2010 hyphens rather than as U+002D hyphen-minuses.
In researching the issue on the web, I found at least one reference to
this as a “bug”. I do not understand why firefox cannot recognize these
pages as utf-8 encoded and accept the U+2010 hyphen for what it is. It
would be nice to know, but that is not my main question.
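One likely reason Firefox does not treat the page as utf-8 is that it must settle the encoding before it can decode any text. To do this it scans the raw bytes near the top of the file for a `<meta>` declaration and compares attribute names such as `http-equiv` and labels such as `utf-8` as ASCII strings. When U+2010 replaces the hyphens, those strings never occur, so the declaration is ignored and Firefox falls back to its locale default, here “Western”. The rough Python sketch below illustrates only the byte mismatch. `meta_line` and `looks_like_utf8_declaration` are hypothetical helpers and a deliberate simplification, not the HTML specification’s prescan algorithm.

```python
HYPHEN = "\u2010"  # U+2010 HYPHEN, as emitted by groff -Tutf8 in this setup


def meta_line(hyphen: str) -> str:
    """The meta line from the post, with `hyphen` in place of each hyphen."""
    return (f'<meta http{hyphen}equiv="Content{hyphen}Type" '
            f'content="text/html; charset=utf{hyphen}8">')


def looks_like_utf8_declaration(line: str) -> bool:
    """Crude byte-level check (NOT the full HTML prescan algorithm):
    does the UTF-8-encoded line contain the ASCII attribute name and label?"""
    data = line.encode("utf-8").lower()
    return b"http-equiv" in data and b"charset=utf-8" in data


if __name__ == "__main__":
    print(looks_like_utf8_declaration(meta_line("-")))     # True
    print(looks_like_utf8_declaration(meta_line(HYPHEN)))  # False
```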

I have gotten everything to work by globally changing all hyphens in groff
output to U+002D. But the more general problem is that the current groff
wants all sorts of special-character escapes (for example, for punctuation,
em-dashes, and so on) and does not simply pass utf-8-encoded text for these
characters through to its output. This makes a real mess of one’s source
files, hampers text searches on those files, and complicates text-processing
scripts.
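The global substitution of hyphens can be done as a post-processing step after groff runs. Below is a minimal Python sketch. It assumes the output files are UTF-8 and that every U+2010 should become U+002D. The script name and file names are placeholders.

```python
import sys
from pathlib import Path

HYPHEN = "\u2010"        # U+2010 HYPHEN, as emitted by groff -Tutf8 here
HYPHEN_MINUS = "\u002d"  # U+002D HYPHEN-MINUS, the ASCII "-"


def ascii_hyphens(text: str) -> str:
    """Replace every U+2010 hyphen with an ASCII hyphen-minus.

    Deliberately global: hyphens in body text change too. Other
    characters (em dashes, curly quotes) are left untouched."""
    return text.replace(HYPHEN, HYPHEN_MINUS)


def fix_file(path: Path) -> None:
    """Rewrite one UTF-8 file in place."""
    text = path.read_text(encoding="utf-8")
    path.write_text(ascii_hyphens(text), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical usage: python3 fix_hyphens.py page1.html page2.html
    for arg in sys.argv[1:]:
        fix_file(Path(arg))
```

Touching only U+2010 keeps the rest of the utf-8 text, such as em dashes, intact in the output.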

My question: is there a way to work more straightforwardly with groff
while staying in the utf-8 world? Or is groff for some reason just
migrating away from the future?

(As you have probably realized by now, I am not a sysadmin type, and am
rather out of my depth here. Any advice will be appreciated.)

Steve
------------------------------------------------------------------------
Stephen L. Talbott BiologyWorthyofLife.org
Senior Researcher, The Nature Institute: natureinstitute.org
NetFuture editor: netfuture.org
Mailing address: 20 May Hill Road, Ghent NY 12075 Tel: 518-672-5049
