As the title says, fmtroff is another version of the old
fmt(1) that you'll find in most Unix-like
systems. Apart from improving some features present in other
fmt versions, this version also brings an innovation that
makes it easier (and more reliable) to work with roff files
(to edit my novels I used groff, the GNU
version).
Download (fmtroff.c)
Tested on OpenBSD and Linux. I hope you'll find it
useful.
Changelog
- Sep 1, 2025. I had missed detecting the
preceding space in the third abbreviation-detection case
(is_initial()).
- Aug 20, 2025. I changed the following syntax:
#define PERIOD 0x2e
#define QUESTION 0x3f
[...]
To:
#define PERIOD L'.'
#define QUESTION L'?'
[...]
Ingo Schwarze, a member of the OpenBSD team, pointed out to me that a
constant defined as 0x2e is interpreted as type
int (see C11 6.4.4.1, Integer
constant). I was casting int to
wchar_t, which didn't make sense. The proper
notation to define that data as character constants is L'.'
(see C11 6.4.4.4, Character constant).
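The difference between the two notations can be checked at compile time. This is only a minimal sketch, assuming a C11 compiler; the helper is_sentence_end() is hypothetical and is not taken from fmtroff.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>

#define PERIOD   L'.'
#define QUESTION L'?'

/* C11 6.4.4.1: an unsuffixed hexadecimal constant that fits in int is an int. */
static_assert(_Generic(0x2e, int: 1, default: 0), "0x2e is an int");

/* C11 6.4.4.4: a character constant with the L prefix has type wchar_t. */
static_assert(_Generic(L'.', wchar_t: 1, default: 0), "L'.' is a wchar_t");

/* Hypothetical helper: no cast is needed, both sides are wchar_t. */
bool is_sentence_end(wchar_t c)
{
    return c == PERIOD || c == QUESTION;
}
```

On platforms where wchar_t happens to be a typedef for int, both notations end up with the same type, but the L'.' form states the intent and stays correct elsewhere.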
- Aug 19, 2025. I restored the change of Aug
17. The problem wasn't with the compilers but a
mistake on my part: I forgot to null-terminate the
constant arrays. By definition, an array that doesn't end in
a null character is not a string! Basic concepts
that our human brain overlooks and the compiler doesn't
forgive. Before:
const wchar_t END_OF_SENTENCE[] = { PERIOD, QUESTION, EXCLAM,
ELLIPSIS };
Now:
const wchar_t END_OF_SENTENCE[] = { PERIOD, QUESTION, EXCLAM,
ELLIPSIS, '\0' };
The missing terminator fed garbage to functions such as wcsspn(),
which returned wrong values. Thanks to Otto Moerbeek from the
OpenBSD team for pointing this out to me.
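A small sketch of why the terminator matters: wcsspn() treats its second argument as a wide string and scans it until it finds a null wide character. The definitions of EXCLAM and ELLIPSIS and the helper leading_terminators() are assumptions made for this example, not the ones in fmtroff.c:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define PERIOD   L'.'
#define QUESTION L'?'
#define EXCLAM   L'!'       /* assumed definition */
#define ELLIPSIS L'\u2026'  /* assumed definition: U+2026 */

/* The trailing L'\0' makes the array a wide string. Without it,
   wcsspn() would keep reading past the end of the array. */
static const wchar_t END_OF_SENTENCE[] = {
    PERIOD, QUESTION, EXCLAM, ELLIPSIS, L'\0'
};

/* Hypothetical helper: number of end-of-sentence characters
   at the start of s. */
size_t leading_terminators(const wchar_t *s)
{
    assert(s != NULL);
    return wcsspn(s, END_OF_SENTENCE);
}
```

For instance, leading_terminators(L"?! Really") counts the two leading characters and stops at the space.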
- Aug 18, 2025. Unfortunately, I had to revert
the previous change. The change introduced a bug under
OpenBSD that is not reproducible under Linux. In the entry
from June 5, I mentioned that compiling in Linux resulted in different
behavior for the executable compared to doing so in
OpenBSD. Today, I installed GCC (GNU cc) on OpenBSD and
confirmed that GCC interprets the code differently than Clang
does.
- Aug 17, 2025. I moved all the abbreviation-detection
code to a separate function to tidy up the code a
bit.
- Aug 10, 2025. Reverted change that generated a
memory error under Linux.
- Aug 8, 2025. More improvements in abbreviation
recognition.
- Aug 2, 2025. Recognize abbreviations even if
they are enclosed in quotation marks or parentheses.
- Aug 2, 2025. Minor change in abbreviation
recognition. Not only a capital letter, but a lowercase
letter preceded by a space (or a quote character) and followed by a
period is also likely an abbreviation.
- June 5, 2025. Lately, using Linux, I found two
bugs not reproducible under OpenBSD:
- In the cquote_count() function (formerly
left_quote()), not limiting the countdown loop caused a
segmentation fault when a quotation character followed by spaces was
encountered at the beginning of the line.
- I noticed that on Linux, for the wcsspn() function to
return the expected values, the arrays used as arguments have to be
declared as constants (in this case end_of_sentence[],
oquote[] and cquote[].) To achieve the
same effect on OpenBSD, they have to be declared as static (as they
were.) I guess this is due to differences between how the
compilers gcc and clang interpret the code.
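The first bug above (the unbounded countdown loop) can be illustrated with a backward scan that never reads before the start of the line. This is a hypothetical sketch, not the actual cquote_count(); the function name, the set of quote characters and the exact counting rule are all assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical: count the closing-quote characters found just before
   index pos, skipping spaces. pos must not exceed wcslen(line). */
size_t count_quotes_before(const wchar_t *line, size_t pos)
{
    static const wchar_t quotes[] = L"\"')]";  /* assumed quote set */
    size_t n = 0;

    assert(line != NULL);
    while (pos > 0) {            /* the bound: stop at the start of the line */
        wchar_t c = line[pos - 1];

        if (c == L' ') {
            /* skip spaces */
        } else if (c != L'\0' && wcschr(quotes, c) != NULL) {
            n++;
        } else {
            break;
        }
        pos--;
    }
    return n;
}
```

With a line made of a quote followed by two spaces, the loop reaches index 0 and stops there instead of running off the front of the buffer.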
- June 5, 2025. I renamed the
left_quote() function to something more appropriate:
cquote_count(). I also simplified the names of the
cl_quote[] and op_quote[] arrays, now
cquote[] and oquote[] respectively.
- Jan 28, 2025. Removed two abbreviations I'd
added by mistake.
- Jan 6, 2025. In "troff" mode, in addition to
lines starting with a period or apostrophe, also ignore those starting
with a backslash.
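The rule from this entry could be sketched as a one-line test on the first character of each line. The helper name is hypothetical and the real code in fmtroff.c may be organized differently:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: true if, in troff mode, the line should be
   left untouched instead of being reformatted. */
bool is_roff_control_line(const wchar_t *line)
{
    assert(line != NULL);
    /* '.' and '\'' introduce requests and macros, '\\' an escape. */
    return line[0] == L'.' || line[0] == L'\'' || line[0] == L'\\';
}
```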
- Oct 26, 2024. Recognize backslash
'\' as a begin-of-sentence character. This allows the use of roff
and LaTeX tags at the beginning of a sentence (e.g. \fI or
\emph{).
- Oct 1, 2024. I decided to revert the changes
made on August 10th and 22nd since they complicated the code just to
cover isolated cases.
- Sep 26, 2024. Yesterday I noticed that when
running fmtroff on Chinese text many characters were
deleted, shortening the string. This was happening on Linux;
as I normally use OpenBSD (and I don't usually write in Chinese either
;-)) I hadn't noticed it. After some research, I realized
that the cause was in collapse_whitespace(), where I was
using isblank() instead of
iswblank(). I thought that in this case it was
not necessary to use the wide-character version of the function, but I
was wrong.
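A minimal sketch of the wide-character approach. isblank() is only defined for values representable as unsigned char (or EOF), so wide characters have to go through iswblank(). The function below is hypothetical and is not the collapse_whitespace() from fmtroff.c:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: collapse every run of blanks into a single space,
   in place, and return the new length of the string. */
size_t collapse_blanks(wchar_t *s)
{
    size_t in, out = 0;
    int in_blank = 0;

    assert(s != NULL);
    for (in = 0; s[in] != L'\0'; in++) {
        if (iswblank((wint_t)s[in])) {   /* wide version, not isblank() */
            if (!in_blank)
                s[out++] = L' ';
            in_blank = 1;
        } else {
            s[out++] = s[in];            /* other characters are copied as they are */
            in_blank = 0;
        }
    }
    s[out] = L'\0';
    return out;
}
```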
- Aug 22, 2024. Before saving the variable
explained in the next item, check that the word contains only
letters to make sure that it is a person's name.
- Aug 10, 2024. Save in a variable whether the current
word is capitalized, to recognize whether the next one is an initialism.
- May 9, 2024. I simplified the code a bit.
- May 7, 2024. Added the ‘-m’ option.
When it is used, fmtroff skips mail headers and quoted text (I
added this option just to be consistent with the original use of
fmt(1)).
- May 7, 2024. Changed the option ‘-t’
to ‘-b’ and the option ‘-l’ to ‘-o’
for compatibility with OpenBSD fmt(1).
- May 6, 2024. Now fmtroff skips
nested code (tbl(1), pic(1), eqn(1).)
Thanks to Victor <vico at tuta dot io> for pointing it
out to me.
- Nov 7, 2023 (bug). Fixed an error introduced by
the last changes.
- Oct 8, 2023. I rewrote the code again, this
time using libc and wide-character functions.
- Oct 1, 2023. I replaced VLAs (variable-length arrays)
with pointers in most functions.
- Sep 8, 2023. Except for ‘new line’ and ‘tab’
(\n and \t), fmtroff strips all control characters. Today,
reading an interesting thread on the groff mailing list, I learned that
with roff you can use the ‘leader’ or ‘SOH’ character in tables and
indices. So I modified the code so that, in troff mode, fmtroff
lets that character through.
- Aug 24, 2023 (bug). Today I discovered that
it's necessary to allocate memory before entering the loop that reads
the file (or input string). Without this, calling fmtroff
from a text editor (nvi or vim) on an empty file produces a
segfault.
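The idea can be sketched as follows: if the buffer is allocated before the read loop, an empty input still yields a valid (empty) string rather than a pointer that was never set. The function read_all() and its growth policy are assumptions made for this example, not the code in fmtroff.c:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical reader: returns a heap-allocated wide string with the
   contents of fp, or NULL if memory runs out. The caller frees it. */
wchar_t *read_all(FILE *fp)
{
    size_t cap = 64, len = 0;
    wchar_t *buf, *tmp;
    wint_t c;

    assert(fp != NULL);
    buf = malloc(cap * sizeof *buf);     /* allocated BEFORE the loop */
    if (buf == NULL)
        return NULL;

    while ((c = fgetwc(fp)) != WEOF) {
        if (len + 1 >= cap) {            /* keep room for the terminator */
            tmp = realloc(buf, 2 * cap * sizeof *tmp);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (wchar_t)c;
    }
    buf[len] = L'\0';                    /* valid even if nothing was read */
    return buf;
}
```

With an empty file the loop body never runs, and the function still returns a string whose first element is the terminator.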
- Jun 28, 2023. I slightly improved the "recognize
and skip some initialisms" conditional; it now ignores leading quotes
and brackets and also takes non-ASCII uppercase letters into account.
- Apr 25, 2023 (bug). After years without using
iso-latin, I didn't realize until yesterday, while using OpenBSD
wscons, that iso-latin characters made fmtroff hang. It took
a minimal change in a conditional to fix the bug. This means that now
the utf-8 limitation is only applicable to multibyte characters.
- Apr 6, 2023 (bug). When in troff mode or when
the ‘-n’ option is not used, treat lines starting with
‘'’ (single quotes) in the same way as those starting with a
period, as the former can also be troff macro lines.
- Apr 5, 2023. Don't add extra space after an
ASCII ellipsis when it opens the line.
- Mar 28, 2023. New option ‘-l’ to
allow sentences to begin with a lowercase letter (useful with man pages,
where sentences often begin with a command name).
- Mar 25, 2023 (bug). Ignore a word when its
number of characters is greater than the established column width.