% This program is by Tomas Rokicki.  Various routines were borrowed from
% GFtoPXL by Arthur Samuel.

% Version 0.0 (development) started 20 July 1985 TGR.
% Version 1.0 finished 29 July 1985 TGR.
% Version 1.1 fixed 0 width problem 11 July 1988 TGR.
\def\versiondate{11 July 1988}

% Here is TeX material that gets inserted after \input webmac
\def\hang{\hangindent 3em\noindent\ignorespaces}
\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces}
\font\ninerm=cmr9
\let\mc=\ninerm % medium caps for names like SAIL
\font\tenss=cmss10 % for `The METAFONTbook'
\def\PASCAL{Pascal}
\def\ph{{\mc PASCAL-H}}
\font\logo=logo10 % font used for the METAFONT logo
\def\MF{{\logo META}\-{\logo FONT}}
\def\<#1>{$\langle#1\rangle$}
\def\section{\mathhexbox278}
\let\swap=\leftrightarrow
\def\round{\mathop{\rm round}\nolimits}
\def\(#1){} % this is used to make section names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>

\def\title{GFread}
\def\topofcontents{\null
  \def\titlepage{F} % include headline on the contents page
  \def\rheader{\mainfont\hfil \contentspagenumber}
  \vfill
  \centerline{\titlefont The {\ttitlefont GFread} processor}
  \vskip 15pt
  \centerline{(Version 1.1, \versiondate)}
  \vfill}
\def\botofcontents{\vfill
  \centerline{\hsize 5in\baselineskip9pt
    \vbox{\ninerm\noindent
    The preparation of this report
    was supported in part by the National Science
    Foundation under grants IST-8201926 and MCS-8300984,
    and by the System Development Foundation. `\TeX' is a
    trademark of the American Mathematical Society.}}}
\pageno=\contentspagenumber \advance\pageno by 1

@* Introduction.
This program is intended to be a model of generic-font (``\.{GF}'') file
reading software, especially for drivers that currently read pixel
(``\.{PXL}'') files.  It is intended that the relevant parts of this
program might be copied verbatim into an existing driver such that only
a modicum of glue code need be added to make the driver read \.{GF} files
rather than \.{PXL} files.  Since \.{PXL} files are being replaced by
\.{GF} files, largely because \.{PXL} files lack some relevant information,
a large number of drivers must be updated.
This code should ease the transition.

As this software is intended to be a part of production drivers, every effort
has been made to make this software both general enough for easy incorporation
into existing programs, and efficient enough for common use.  To this end,
the individual characters are never reduced to absolute individual pixels,
but rather strings of pixels are calculated at one time.  Code is included
to skip individual character definitions (for unused characters, for instance)
and the data structures used in this program are hidden by macros, which can
be easily redefined for a particular environment.

For lack of a better task, the program currently produces a pixel file as
its output.  It is relatively simple to modify the program to create the pixel
file entirely in memory, thereby making it appear that a pixel file was loaded
instead of a \.{GF} file.  Care must be taken with this approach, however, to
ensure that the pixel width of each character is taken from the \.{GF} file,
rather than rounded from the \.{PXL} file as is typically done.  The \.{TFM}
widths must still be scaled exactly as they are from the \.{PXL} file.

The |banner| string defined here should be changed whenever \.{GFread}
gets modified.

@d banner=='This is GFread, Version 1.1' {printed when the program starts}

@ Some of the code below is intended to be used only during debugging aspects
of this program, and possibly its integration with various drivers.
Such code will not normally be compiled; it is
delimited by the codewords `$|debug|\ldots|gubed|$', with apologies
to people who wish to preserve the purity of English.   For even more
output, the codewords |eebug| and |gubee| can be used.
@^debugging@>

@d debug==@{
@d gubed==@t@>@}@/
@f debug==begin
@f gubed==end
@d eebug==@{
@d gubee==@t@>@}@/
@f eebug==begin
@f gubee==end

@ This program is written in standard \PASCAL, except where it is
necessary to use extensions; for example, one extension is to use a
default |case| as in \.{TANGLE}, \.{WEAVE}, etc.  All places where
nonstandard constructions are used should be listed in the index under
``system dependencies.''
@!@^system dependencies@>

@d othercases == others: {default for cases not listed explicitly}
@d endcases == @+end {follows the default case in an extended |case| statement}
@f othercases == else
@f endcases == end

@ The binary input comes from |gf_file|, and the output font is written
on |pxl_file|.  All text output is written on \PASCAL's standard |output|
file.  The term |print| is used instead of |write| when this program writes
on |output|, so that all such output could easily be redirected if desired.

@d print(#)==write(#)
@d print_ln(#)==write_ln(#)
@d print_nl==write_ln

@p program GFread(@!gf_file,@!pxl_file,@!output);
label @<Labels in the outer block@>@/
const @<Constants in the outer block@>@/
type @<Types in the outer block@>@/
var @<Globals in the outer block@>@/
procedure initialize; {this procedure gets things started properly}
  var i:integer; {loop index for initializations}
  begin print_ln(banner);@/
  @<Set initial values@>@/
  end;

@ If the program has to stop prematurely, it goes to the
`|final_end|'.

@d final_end=9999 {label for the end of it all}

@<Labels...@>=final_end;

@ The following parameters can be changed at compile time to extend or
reduce \.{GFread}'s capacity.  Actually, as the data structures will be
modified to fit a particular driver, and |line_length| and
|terminal_line_length| are standard \.{WEB} macros, they will probably be
eliminated altogether when integrated with a driver.

@<Constants...@>=
@!line_length=79; {bracketed lines of output will be at most this long}
@!terminal_line_length=150; {maximum number of characters input in a single
  line of input from the terminal}
@!mem_max=4000; {largest index in the main |mem| array}

@ Here are some macros for common programming idioms.

@d incr(#) == #:=#+1 {increase a variable by unity}
@d decr(#) == #:=#-1 {decrease a variable by unity}

@ If the \.{GF} file is badly malformed, the whole process must be aborted;
\.{GFread} will give up, after issuing an error message about the symptoms
that were noticed.

Such errors might be discovered inside of subroutines inside of subroutines,
so a procedure called |jump_out| has been introduced. This procedure, which
simply transfers control to the label |final_end| at the end of the program,
contains the only non-local |goto| statement in \.{GFread}.
@^system dependencies@>

@d abort(#)==begin print(' ',#); jump_out;
    end
@d bad_gf(#)==abort('Bad GF file: ',#,'!')
@.Bad GF file@>

@p procedure jump_out;
begin goto final_end;
end;

@* Generic font file format.
The most important output produced by a typical run of \MF\ is the
``generic font'' (\.{GF}) file that specifies the bit patterns of the
characters that have been drawn. The term {\sl generic\/} indicates that
this file format doesn't match the conventions of any name-brand manufacturer;
but it is easy to convert \.{GF} files to the special format required by
almost all digital phototypesetting equipment. There's a strong analogy
between the \.{DVI} files written by \TeX\ and the \.{GF} files written
by \MF; and, in fact, the file formats have a lot in common.

A \.{GF} file is a stream of 8-bit bytes that may be
regarded as a series of commands in a machine-like language. The first
byte of each command is the operation code, and this code is followed by
zero or more bytes that provide parameters to the command. The parameters
themselves may consist of several consecutive bytes; for example, the
`|boc|' (beginning of character) command has six parameters, each of
which is four bytes long. Parameters are usually regarded as nonnegative
integers; but four-byte-long parameters can be either positive or
negative, hence they range in value from $-2^{31}$ to $2^{31}-1$.
As in \.{TFM} files, numbers that occupy
more than one byte position appear in BigEndian order,
and negative numbers appear in two's complement notation.
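
For example, the four bytes 255, 255, 255, 254 (chosen here purely for
illustration) represent $-2$: the first byte is at least 128, so 256 is
subtracted from it before the bytes are combined in BigEndian order,
$$\bigl(((255-256)\cdot256+255)\cdot256+255\bigr)\cdot256+254=-2.$$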

A \.{GF} file consists of a ``preamble,'' followed by a sequence of one or
more ``characters,'' followed by a ``postamble.'' The preamble is simply a
|pre| command, with its parameters that introduce the file; this must come
first.  Each ``character'' consists of a |boc| command, followed by any
number of other commands that specify ``black'' pixels,
followed by an |eoc| command. The characters appear in the order that \MF\
generated them. If we ignore no-op commands (which are allowed between any
two commands in the file), each |eoc| command is immediately followed by a
|boc| command, or by a |post| command; in the latter case, there are no
more characters in the file, and the remaining bytes form the postamble.
Further details about the postamble will be explained later.

Some parameters in \.{GF} commands are ``pointers.'' These are four-byte
quantities that give the location number of some other byte in the file;
the first file byte is number~0, then comes number~1, and so on.

@ The \.{GF} format is intended to be both compact and easily interpreted
by a machine. Compactness is achieved by making most of the information
relative instead of absolute. When a \.{GF}-reading program reads the
commands for a character, it keeps track of two quantities: (a)~the current
column number,~|m|; and (b)~the current row number,~|n|.  These are 32-bit
signed integers, although most actual font formats produced from \.{GF}
files will need to curtail this vast range because of practical
limitations. (\MF\ output will never allow $\vert m\vert$ or $\vert
n\vert$ to get extremely large, but the \.{GF} format tries to be more
general.)

How do \.{GF}'s row and column numbers correspond to the conventions
of \TeX\ and \MF? Well, the ``reference point'' of a character, in \TeX's
view, is considered to be at the lower left corner of the pixel in row~0
and column~0. This point is the intersection of the baseline with the left
edge of the type; it corresponds to location $(0,0)$ in \MF\ programs.
Thus the pixel in \.{GF} row~0 and column~0 is \MF's unit square, comprising
the region of the plane whose coordinates both lie between 0 and~1. The
pixel in \.{GF} row~|n| and column~|m| consists of the points whose \MF\
coordinates |(x,y)| satisfy |m<=x<=m+1| and |n<=y<=n+1|.  Negative values of
|m| and~|x| correspond to columns of pixels {\sl left\/} of the reference
point; negative values of |n| and~|y| correspond to rows of pixels {\sl
below\/} the baseline.

Besides |m| and |n|, there's also a third aspect of the current
state, namely the @!|paint_switch|, which is always either \\{black} or
\\{white}. Each \\{paint} command advances |m| by a specified amount~|d|,
and blackens the intervening pixels if |paint_switch=black|; then
the |paint_switch| changes to the opposite state. \.{GF}'s commands are
designed so that |m| will never decrease within a row, and |n| will never
increase within a character; hence there is no way to whiten a pixel that
has been blackened.

@ Here is a list of all the commands that may appear in a \.{GF} file. Each
command is specified by its symbolic name (e.g., |boc|), its opcode byte
(e.g., 67), and its parameters (if any). The parameters are followed
by a bracketed number telling how many bytes they occupy; for example,
`|d[2]|' means that parameter |d| is two bytes long.

\yskip\hang|paint_0| 0. This is a \\{paint} command with |d=0|; it does
nothing but change the |paint_switch| from \\{black} to \\{white} or
vice~versa.

\yskip\hang\\{paint\_1} through \\{paint\_63} (opcodes 1 to 63).
These are \\{paint} commands with |d=1| to~63, defined as follows: If
|paint_switch=black|, blacken |d|~pixels of the current row~|n|,
in columns |m| through |m+d-1| inclusive. Then, in any case,
complement the |paint_switch| and advance |m| by~|d|.

\yskip\hang|paint1| 64 |d[1]|. This is a \\{paint} command with a specified
value of~|d|; \MF\ uses it to paint when |64<=d<256|.

\yskip\hang|@!paint2| 65 |d[2]|. Same as |paint1|, but |d|~can be as high
as~65535.

\yskip\hang|@!paint3| 66 |d[3]|. Same as |paint1|, but |d|~can be as high
as $2^{24}-1$. \MF\ never needs this command, and it is hard to imagine
anybody making practical use of it; surely a more compact encoding will be
desirable when characters can be this large. But the command is there,
anyway, just in case.

\yskip\hang|boc| 67 |c[4]| |p[4]| |min_m[4]| |max_m[4]| |min_n[4]|
|max_n[4]|. Beginning of a character:  Here |c| is the character code, and
|p| points to the previous character beginning (if any) for characters having
this code number modulo 256.  (The pointer |p| is |-1| if there was no
prior character with an equivalent code.) The values of registers |m| and |n|
defined by the instructions that follow for this character must
satisfy |min_m<=m<=max_m| and |min_n<=n<=max_n|.  (The values of |max_m| and
|min_n| need not be the tightest bounds possible.)  When a \.{GF}-reading
program sees a |boc|, it can use |min_m|, |max_m|, |min_n|, and |max_n| to
initialize the bounds of an array. Then it sets |m:=min_m|, |n:=max_n|, and
|paint_switch:=white|.

\yskip\hang|boc1| 68 |c[1]| |@!del_m[1]| |max_m[1]| |@!del_n[1]| |max_n[1]|.
Same as |boc|, but |p| is assumed to be~$-1$; also |del_m=max_m-min_m|
and |del_n=max_n-min_n| are given instead of |min_m| and |min_n|.
The one-byte parameters must be between 0 and 255, inclusive.
\ (This abbreviated |boc| saves 19~bytes per character, in common cases.)

\yskip\hang|eoc| 69. End of character: All pixels blackened so far
constitute the pattern for this character. In particular, a completely
blank character might have |eoc| immediately following |boc|.

\yskip\hang|skip0| 70. Decrease |n| by 1 and set |m:=min_m|,
|paint_switch:=white|. \ (This finishes one row and begins another,
ready to whiten the leftmost pixel in the new row.)

\yskip\hang|skip1| 71 |d[1]|. Decrease |n| by |d+1|, set |m:=min_m|, and set
|paint_switch:=white|. This is a way to produce |d| all-white rows.

\yskip\hang|@!skip2| 72 |d[2]|. Same as |skip1|, but |d| can be as large
as 65535.

\yskip\hang|@!skip3| 73 |d[3]|. Same as |skip1|, but |d| can be as large
as $2^{24}-1$. \MF\ obviously never needs this command.

\yskip\hang|new_row_0| 74. Decrease |n| by 1 and set |m:=min_m|,
|paint_switch:=black|. \ (This finishes one row and begins another,
ready to {\sl blacken\/} the leftmost pixel in the new row.)

\yskip\hang|@!new_row_1| through |@!new_row_164| (opcodes 75 to 238). Same as
|new_row_0|, but with |m:=min_m+1| through |min_m+164|, respectively.

\yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in
general; it functions as a $(k+2)$-byte |no_op| unless special \.{GF}-reading
programs are being used. \MF\ generates \\{xxx} commands when encountering
a \&{special} string; this occurs in the \.{GF} file only between
characters, after the preamble, and before the postamble. However,
\\{xxx} commands might appear anywhere in \.{GF} files generated by other
processors. It is recommended that |x| be a string having the form of a
keyword followed by possible parameters relevant to that keyword.

\yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0<=k<65536|.

\yskip\hang|xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0<=k<@t$2^{24}$@>|.
\MF\ uses this when sending a \&{special} string whose length exceeds~255.

\yskip\hang|@!xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be
ridiculously large; |k| mustn't be negative.

\yskip\hang|yyy| 243 |y[4]|. This command is undefined in general;
it functions as a 5-byte |no_op| unless special \.{GF}-reading programs
are being used. \MF\ puts |scaled| numbers into |yyy|'s, as a
result of \&{numspecial} commands; the intent is to provide numeric
parameters to \\{xxx} commands that immediately precede.

\yskip\hang|no_op| 244. No operation, do nothing. Any number of |no_op|'s
may occur between \.{GF} commands, but a |no_op| cannot be inserted between
a command and its parameters or between two parameters.

\yskip\hang|char_loc| 245 |c[1]| |dx[4]| |dy[4]| |w[4]| |p[4]|.
This command will appear only in the postamble, which will be explained
shortly.

\yskip\hang|@!char_loc0| 246 |c[1]| |@!dm[1]| |w[4]| |p[4]|.
Same as |char_loc|, except that |dy| is assumed to be zero, and the value
of~|dx| is taken to be |65536*dm|, where |0<=dm<256|.

\yskip\hang|pre| 247 |i[1]| |k[1]| |x[k]|.
Beginning of the preamble; this must come at the very beginning of the
file. Parameter |i| is an identifying number for \.{GF} format, currently
131. The other information is merely commentary; it is not given
special interpretation like \\{xxx} commands are. (Note that \\{xxx}
commands may immediately follow the preamble, before the first |boc|.)

\yskip\hang|post| 248. Beginning of the postamble, see below.

\yskip\hang|post_post| 249. Ending of the postamble, see below.

\yskip\noindent Commands 250--255 are undefined at the present time.
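
\yskip\noindent As a small worked example (the bytes are invented for
illustration and do not come from any actual font), consider the
eleven-byte sequence \.{68 65 3 3 1 1 0 2 75 2 69}. The |boc1| command
(68) has |c=65|, |del_m=3|, |max_m=3|, |del_n=1|, and |max_n=1|, hence
|min_m=0| and |min_n=0|; the reader starts with |m=0|, |n=1|, and
|paint_switch=white|. Then |paint_0| (0) switches to \\{black} without
moving; \\{paint\_2} (2) blackens columns 0 and~1 of row~1, leaving |m=2|
and the switch \\{white}; |new_row_1| (75) sets |n=0|, |m=1|, and
|paint_switch=black|; \\{paint\_2} (2) blackens columns 1 and~2 of row~0;
and |eoc| (69) ends the character, which has four black pixels.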

@d gf_id_byte=131 {identifies the kind of \.{GF} files described here}

@ Here are the opcodes that \.{GFread} actually refers to.

@d paint_0=0 {beginning of the \\{paint} commands}
@d paint1=64 {move right a given number of columns, then
  black${}\swap{}$white}
@d boc=67 {beginning of a character}
@d boc1=68 {abbreviated |boc|}
@d eoc=69 {end of a character}
@d skip0=70 {skip no blank rows}
@d skip1=71 {skip over blank rows}
@d new_row_0=74 {move down one row and then right}
@d max_new_row=238 {move down one row and then right}
@d no_op=244 {no operation}
@d xxx1=239 {for \&{special} strings}
@d yyy=243 {for \&{numspecial} numbers}
@d nop=244 {no operation}
@d char_loc=245 {character locators in the postamble}
@d char_loc0=246 {character locators in the postamble}
@d pre=247 {preamble}
@d post=248 {postamble beginning}
@d post_post=249 {postamble ending}
@d undefined_commands==250,251,252,253,254,255

@ The last character in a \.{GF} file is followed by `|post|'; this command
introduces the postamble, which summarizes important facts that \MF\ has
accumulated. The postamble has the form
$$\vbox{\halign{\hbox{#\hfil}\cr
  |post| |p[4]| |@!ds[4]| |@!cs[4]| |@!hppp[4]| |@!vppp[4]|
   |@!min_m[4]| |@!max_m[4]| |@!min_n[4]| |@!max_n[4]|\cr
  $\langle\,$character locators$\,\rangle$\cr
  |post_post| |q[4]| |i[1]| 223's$[{\G}4]$\cr}}$$
Here |p| is a pointer to the byte following the final |eoc| in the file
(or to the byte following the preamble, if there are no characters);
it can be used to locate the beginning of \\{xxx} commands
that might have preceded the postamble. The |ds| and |cs| parameters
@^design size@> @^check sum@>
give the design size and check sum, respectively, which are exactly the
values put into the header of any \.{TFM} file that shares information with
this \.{GF} file. Parameters |hppp| and |vppp| are the ratios of
pixels per point, horizontally and vertically, expressed as |scaled| integers
(i.e., multiplied by $2^{16}$); they can be used to correlate the font
with specific device resolutions, magnifications, and ``at sizes.''  Then
come |min_m|, |max_m|, |min_n|, and |max_n|, which bound the values that
registers |m| and~|n| assume in all characters in this \.{GF} file.
(These bounds need not be the best possible; |max_m| and |min_n| may, on the
other hand, be tighter than the similar bounds in |boc| commands. For
example, some character may have |min_n=-100| in its |boc|, but it might
turn out that |n| never gets lower than |-50| in any character; then
|min_n| can have any value |<=-50|. If there are no characters in the file,
it's possible to have |min_m>max_m| and/or |min_n>max_n|.)

@ Character locators are introduced by |char_loc| commands,
which specify a character residue~|c|, character escapements (|dx,dy|),
a character width~|w|, and a pointer~|p|
to the beginning of that character. (If two or more characters have the
same code~|c| modulo 256, only the last will be indicated; the others can be
located by following backpointers. Characters whose codes differ by a
multiple of 256 are assumed to share the same font metric information,
hence the \.{TFM} file contains only residues of character codes modulo~256.
This convention is intended for oriental languages, when there are many
character shapes but few distinct widths.)
@^oriental characters@>@^Chinese characters@>@^Japanese characters@>

The character escapements (|dx,dy|) are the values of \MF's \&{chardx}
and \&{chardy} parameters; they are in units of |scaled| pixels;
i.e., |dx| is in horizontal pixel units times $2^{16}$, and |dy| is in
vertical pixel units times $2^{16}$.  This is the intended amount of
displacement after typesetting the character; for \.{DVI} files, |dy|
should be zero, but other document file formats allow nonzero vertical
escapement.

The character width~|w| duplicates the information in the \.{TFM} file; it
is $2^{24}$ times the ratio of the true width to the font's design size.
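
For instance (with numbers chosen only for illustration), a character whose
true width is exactly half the design size has |w=8388608|, i.e.,
$2^{24}/2=2^{23}$; and a horizontal escapement of five pixels is recorded
as |dx=327680|, i.e., $5\cdot2^{16}$.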

The backpointer |p| points to the character's |boc|, or to the first of
a sequence of consecutive \\{xxx} or |yyy| or |no_op| commands that
immediately precede the |boc|, if such commands exist; such ``special''
commands essentially belong to the characters, while the special commands
after the final character belong to the postamble (i.e., to the font
as a whole). This convention about |p| applies also to the backpointers
in |boc| commands, even though it wasn't explained in the description
of~|boc|. @^backpointers@>

Pointer |p| might be |-1| if the character exists in the \.{TFM} file
but not in the \.{GF} file. This unusual situation can arise in \MF\ output
if the user had |proofing<0| when the character was being shipped out,
but then made |proofing>=0| in order to get a \.{GF} file.

@ The last part of the postamble, following the |post_post| byte that
signifies the end of the character locators, contains |q|, a pointer to the
|post| command that started the postamble.  An identification byte, |i|,
comes next; this currently equals~131, as in the preamble.

The |i| byte is followed by four or more bytes that are all equal to
the decimal number 223 (i.e., @'337 in octal). \MF\ puts out four to seven of
these trailing bytes, until the total length of the file is a multiple of
four bytes, since this works out best on machines that pack four bytes per
word; but any number of 223's is allowed, as long as there are at least four
of them. In effect, 223 is a sort of signature that is added at the very end.
@^Fuchs, David Raymond@>
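
For example (an illustrative count, not taken from a real file), if the
identification byte happens to be the 1001st byte of the file, \MF\ would
append seven 223's, since $1001+7=1008$ is the first multiple of four that
leaves room for at least four of them.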

This curious way to finish off a \.{GF} file makes it feasible for
\.{GF}-reading programs to find the postamble first, on most computers,
even though \MF\ wants to write the postamble last. Most operating
systems permit random access to individual words or bytes of a file, so
the \.{GF} reader can start at the end and skip backwards over the 223's
until finding the identification byte. Then it can back up four bytes, read
|q|, and move to byte |q| of the file. This byte should, of course,
contain the value 248 (|post|); now the postamble can be read, so the
\.{GF} reader can discover all the information needed for individual
characters.

Unfortunately, however, standard \PASCAL\ does not include the ability to
@^system dependencies@>
access a random position in a file, or even to determine the length of a file.
Almost all systems nowadays provide the necessary capabilities, so \.{GF}
format has been designed to work most efficiently with modern operating
systems.  However, \.{GFread} reads the \.{GF} file from front to back, as it
can get the necessary information this way in just one pass, and it simplifies
the code somewhat.

@* Pixel file format.
A \.{PXL} file is an expanded raster description of a single font at a
particular resolution and contains essentially the same information as
that contained in a \.{GF} file.  \.{PXL} files are used by many existing
device-driver programs for dot matrix devices. By convention, \.{PXL} files
are for 200 pixels per inch. \.{GFread} will report the magnification
over the design point size that will occur if the \.{PXL} file is
used on a 200 pixel per inch output device, and include this information in
the name of the pixel file.  For instance, a pixel file at magstep one for
a 300 dot per inch device would be named \.{foo.1800pxl}.
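
The number in this example comes from the arithmetic
$${300\over200}\times1.2\times1000=1800,$$
where 1.2 is the magnification of magstep one, 300/200 compensates for
the resolution of the device, and the factor 1000 matches the units
of the magnification word stored near the end of the \.{PXL} file.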

All words in a \.{PXL} file are in 32-bit format, with the four lower
bits zero on 36-bit machines. The raster information is contained in a
sequence of binary words which record white pixels as zeros and black
pixels as ones.

The first word of the \.{PXL} file and the last word contain the \.{PXL~ID}
which is currently equal to 1001.

This first word is followed by a sequence of raster information words
where each line of pixels in the glyphs is represented by one or more
words of binary information. The number of words used to represent each
row of pixels for any particular glyph is fixed and it is set by the value
of |max_m-min_m+1| for that particular glyph. Each white pixel is represented
by a zero and each black pixel is represented by a one in the corresponding bit
positions (the first 32 only of each word on 36-bit machines).
The unused bit positions
toward the end of each set of words for each row of pixels are filled with
zeros. It should be noted that this representation is more wasteful of
space than it needs to be, but it may possibly simplify the
subsequent use of the information by a device-driver program.

The font directory follows, occupying a fixed position with respect to the
end of the file (in words 517 through 6 from this end), and assigns 4
words for each of the potential 128 different glyphs that could be
contained in this particular font in the order of their ascending ASCII
values (not in the order that the glyphs appear in the raster section,
which may be entirely arbitrary). This means that the first four words are
for the ASCII zero glyph.  All four words reserved for any missing glyphs
are set to zero.
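
The position of the directory follows from simple counting: the 128
directory entries of four words each occupy $128\times4=512$ words, and they
are followed by the five final words described below, so the directory
begins $512+5=517$ words from the end of the file and ends six words
from the end.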

The first word of each glyph's directory information contains the Pixel
Width in the left half-word (the leftmost 16 bits) and the Pixel Height in
the right half-word (the next 16 bits). These dimensions are those of the
smallest bounding-box, measured in pixels, and they have nothing
necessarily to do with the width and height figures that appear in the
\.{TFM} file.  The \.{TFM} width, measured in \.{FIXes}, where 1 \.{FIX}
is $1/(2^{20})$ times the design size, is listed in the fourth word of the
glyph's directory information.
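
For example (with dimensions invented for illustration), a glyph whose
bounding box is 12 pixels wide and 20 pixels high has the first directory
word $12\cdot2^{16}+20=786452$.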

The second word of the glyph's directory information contains the offset
of the glyph's reference point from the upper-left-hand corner of its
bounding box, measured in pixels, with the X-Offset in the left half-word
and the Y-Offset in the right half-word.  These numbers may be negative,
and two's complement representation is used.  Remember that the positive x
direction means `rightward' and positive y is `downward' on the page.

The third word of a glyph's directory information contains the number of the
word in this \.{PXL} file where the Raster Description for this particular
glyph begins, measured from the first word which is numbered zero.

As mentioned earlier, the fourth word of directory information for each
glyph contains the \.{TFM} width.

The final five words in the \.{PXL} file contain information relating to
the entire file.

The first of these five words is a checksum which should
match the checksum contained in the \.{TFM} file that \TeX\ used in
reference to this font, although, if this checksum is zero, no validity
checking will be done.

The second of these five words is an integer that is 1000 times the
magnification factor at which this font was produced.

The third word contains the design size of the font measured in \.{FIXes}
($2^{-20}$ unmagnified points).

The fourth word contains a pointer to the first word of the font directory.

The fifth and last word of the entire file contains a duplicate of the
\.{PXL} ID as contained in the first word of the file.

@d pxl_id=1001 {current version of \.{PXL} format}

@* Input and output for binary files.
We have seen that a \.{GF} file is a sequence of 8-bit bytes. The bytes
appear physically in what is called a `|packed file of 0..255|'
in \PASCAL\ lingo.

Packing is system dependent, and many \PASCAL\ systems fail to implement
such files in a sensible way (at least, from the viewpoint of producing
good production software).  For example, some systems treat all
byte-oriented files as text, looking for end-of-line marks and such
things. Therefore some system-dependent code is often needed to deal with
binary files, even though most of the program in this section of
\.{GFread} is written in standard \PASCAL.
@^system dependencies@>

We shall stick to simple \PASCAL\ in this program, for reasons of clarity,
even if such simplicity is sometimes unrealistic.

@<Types...@>=
@!eight_bits=0..255; {unsigned one-byte quantity}
@!byte_file=packed file of eight_bits; {files that contain binary data}

@ The program deals with two binary file variables: |gf_file| is the
input file that we are translating into \.{PXL} format, to be written
on |pxl_file|.

@<Glob...@>=
@!gf_file:byte_file; {the stuff we are \.{GFread}ing}
@!pxl_file:byte_file; {the stuff we have \.{GFread}ed}

@ To prepare the |gf_file| for input, we |reset| it.

@p procedure open_gf_file; {prepares to read packed bytes in |gf_file|}
begin reset(gf_file);
end;

@ To prepare the |pxl_file| for output, we |rewrite| it.

@p procedure open_pxl_file; {prepares to write packed bytes in |pxl_file|}
begin rewrite(pxl_file);
pxl_loc := 0 ;
end;

@ |pxl_loc| contains the number of the byte about to
be written to the |pxl_file|.

@<Glob...@>=
@!pxl_loc:integer; {where we are about to write, in |pxl_file|}

@ We shall use two simple functions to read the next byte or
bytes from |gf_file|.  We either need to get an individual byte or a
set of four bytes.
@^system dependencies@>

@p function gf_byte:integer; {returns the next byte, unsigned}
var b:eight_bits;
begin if eof(gf_file) then bad_gf('Unexpected end of file')
else  begin read(gf_file,b); gf_byte:=b;
  end;
end;
@#
function gf_signed_quad:integer; {returns the next four bytes, signed}
var a,@!b,@!c,@!d:eight_bits;
begin read(gf_file,a); read(gf_file,b); read(gf_file,c); read(gf_file,d);
if a<128 then gf_signed_quad:=((a*256+b)*256+c)*256+d
else gf_signed_quad:=(((a-256)*256+b)*256+c)*256+d;
end;

@ Most info in the |pxl_file| comes in words, but we have to write it
as halfwords occasionally.  These routines are used to do all
pixel file output, and are hidden as described later in this report, so
they can be eliminated when incorporating this program into a driver.
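
For example (arguments chosen only for illustration), |pxl_word(-2)| adds
$2^{31}$ in two steps to obtain 2147483646, whose leading byte 127 is
increased by 128; the four bytes written are therefore 255, 255, 255,
and~254. Similarly, |pxl_halfword(-3)| replaces $-3$ by $65533$ and writes
the two bytes 255 and~253.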

@d pxl_byte(#)==begin write(pxl_file,#); incr(pxl_loc); end

@p procedure pxl_halfword(@!w:integer);
begin
if w<0 then w:=w+@"10000;
pxl_byte(w div @"100);
pxl_byte(w mod @"100);
end;
@#
procedure pxl_word(@!w:integer);
begin
if w>=0 then pxl_byte(w div @"1000000)
else begin
  w:=w+@"40000000;
  w:=w+@"40000000;
  pxl_byte((w div @"1000000) + 128);
  end;
pxl_byte((w div @"10000) mod @"100);
pxl_byte((w div @"100) mod @"100);
pxl_byte(w mod @"100);
end;

@* Data structures and their camouflage.
As this program was meant to be integrated into existing software with ease,
all major data structures are hidden behind macros.  In addition, only one
major array is used, called (surprisingly) |mem|.  This is an array of 32-bit
integers, but various parts only need be sixteen-bit integers.

First, the |mem| array usage will be described, followed by a description of
how the macros may be used to incorporate this program into drivers.

As the raster information for each character is being gleaned from the \.{GF}
file, there are five pieces of information that need to be saved for the
\.{PXL} directory.  These pieces are the bit-map height and width, the
x- and y-offset, and the raster pointer.  Since pixel files only support 128
characters, we reserve the first $128*5$ or 640 elements of the |mem| array
to hold these values, and declare the macros appropriately.
The variable |gf_ch| holds the current character number we are working with.
It will be defined when we define the |load_gf_file| procedure.
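
For instance, taking |gf_ch=65| purely as an illustration, the five values
for that character are kept at the indices
$$65,\quad 65+128=193,\quad 65+256=321,\quad 65+384=449,\quad 65+512=577$$
of |mem|, and the largest index any character can touch is $127+512=639$.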

@d gf_c_width==mem[gf_ch] {where we store the width of the bit map}
@d gf_c_height==mem[gf_ch+128] {where we store the height of the bit map}
@d gf_x_offset==mem[gf_ch+256] {where we store the x-offset of the bits}
@d gf_y_offset==mem[gf_ch+384] {where we store the y-offset of the bits}
@d gf_raster==mem[gf_ch+512] {where we store the raster pointers}

@ We initialize these all to zero.  Note that here the |gf_ch| character is
not used, so this needs to be rewritten if the above macros are changed.

@<Set init...@>=
for i := 0 to 639 do mem[i] := 0 ;

@ We also need a place to store the bit-map counts of each row in each
character.  This array must have enough storage for
$$1+\sum_{\hbox{rows}} (2*nwb+1)$$
where |nwb| is the number of white-to-black transitions in the row, including
one for the beginning if the left-most pixel is black.  A minimum would be
about 3000 for moderately large characters.

@d row_counts(#)==mem[#+640] {where to store the row counts}
@d end_of_row==65535 {flag to indicate the end of a row}
@d end_of_char==65534 {flag to indicate the end of a character}

@ These row counts contain the number of contiguous black and white pixels.
They always start with a white count.  Each row is terminated by the
|end_of_row| flag, and the character is terminated by an |end_of_char|
flag.  An empty row has no row counts---only an |end_of_row| flag.  No zero
counts are allowed except for the first (white) count in each row.  Once
these counts are gleaned from the \.{GF} file, the actual character is
constructed.
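
As a small illustration (the character is invented for the example),
consider three rows: two white pixels followed by three black ones, then an
empty row, then four black pixels starting at the left edge.  The stored
sequence would be
$$2,\ 3,\ \hbox{|end_of_row|},\ \hbox{|end_of_row|},\ 0,\ 4,\
\hbox{|end_of_row|},\ \hbox{|end_of_char|},$$
that is, $1+(2\cdot1+1)+(2\cdot0+1)+(2\cdot1+1)=8$ entries.  The zero in the
third row is legal because it is the first (white) count of its row.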

Unfortunately, because the |min_m| and |max_m| counts can be arbitrarily
larger than the character proper, we cannot send out any raster information
until we have scanned the entire character, in order to ensure that we have
the minimum bounding box in which the character will fit.

That completes our data structures.  There is only one array.

@<Glob...@>=
@!mem : array [0..mem_max] of integer ; {our working array}

@ It should be quite obvious that the above macros make this program easy
to modify for a particular machine.  The |gf_c_width|, etc. arrays
usually correspond to real arrays in the driver itself, so a simple
macro re-definition should suffice to fill these arrays.  The |row_counts|
array should be a fairly large scratch array.  For those drivers that
actually load the raster information, that array should be fine.

For instance, if there is an array called |char_rasters|, and the next free
location is |next_raster|, you might use the following simple scheme.
First, calculate the maximum number of integers that the character could
possibly take by |(max_m-min_m+1+31) div 32 * (max_n-min_n+1)|.  Then,
|row_counts| might be defined as |char_rasters[next_raster+max_char_size+#]|.
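
To put rough numbers on this (the dimensions are made up for the example),
a character with |max_m-min_m+1=40| columns and |max_n-min_n+1=50| rows
needs at most
$$\lfloor(40+31)/32\rfloor\cdot50=2\cdot50=100$$
integers of raster, so under this scheme the row counts would begin at
|char_rasters[next_raster+100]|.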

@ The output to the pixel file of the raster information is done sixteen bits
at a time.  (Sixteen-bit chunks are used in the bit manipulation portion of
the code to eliminate sign problems.)  A simple macro takes care of this:

@d send_raster_16(#)==pxl_halfword(#)

@* Plan of attack.
Our approach to the problem of turning the \.{GF} description into a raster
description is done in two steps.  The first step sets up the |row_counts|
array as described above.  During this step, we also calculate the
tightest bounds of the array.  The second step takes these row counts and
actually creates a raster description of the character from it, using
simple bit manipulations.  (This is why an area in |char_rasters| might
need to be reserved---we certainly don't want to destroy the |row_counts|
as we are building the character!)  This way, if some strange raster format
were needed for a particular device, the code for the first step,
which interprets the \.{GF} format, would remain essentially the same,
and only the second step would need to be rewritten.

The first step, that of interpreting the \.{GF} file format, is
essentially quite simple.  There are only a limited number of commands
that can occur outside a character definition in the \.{GF} file.
Therefore, we start by defining the procedure which will actually do
an entire \.{GF} font.

@p procedure load_gf_file;
var
   @!gf_ch:integer; {what character are we looking at?}
   @!i, @!j, @!k : integer ; {general purpose indices}
   @!gf_com : integer ; {current gf command}
   @<Locals to |load_gf_file|@>
begin
   open_gf_file ;
   if gf_byte <> pre then bad_gf('First byte is not preamble');
   if gf_byte <> gf_id_byte then
        bad_gf('Identification byte is incorrect');
   i := gf_byte ;
   for j := 1 to i do k := gf_byte ;
   repeat
     gf_com := gf_byte ;
     case gf_com of
        boc, boc1 : @<Interpret character@> ;
        @<Specials and |no_op| cases@> ;
        post : ; {we will actually do the work for this one later}
     othercases bad_gf('Unexpected ',gf_com:1,' command between characters') ;
     endcases ;
   until gf_com = post ;
   @<Interpret postamble@> ;
end ;

@ We need a few easy macros to expand some case statements:

@d four_cases(#)==#,#+1,#+2,#+3
@d sixteen_cases(#)==four_cases(#),four_cases(#+4),four_cases(#+8),
         four_cases(#+12)
@d sixty_four_cases(#)==sixteen_cases(#),sixteen_cases(#+16),
         sixteen_cases(#+32),sixteen_cases(#+48)
@d one_sixty_five_cases(#)==sixty_four_cases(#),sixty_four_cases(#+64),
         sixteen_cases(#+128),sixteen_cases(#+144),four_cases(#+160),#+164

@ That is certainly simple enough!  In this program, all special commands
and the |no_op| are ignored, so we write some code to skip over these:

@<Specials and |no_op| cases@>=
four_cases(xxx1) : begin
   i := 0 ; for j := 0 to gf_com - xxx1 do i := i * 256 + gf_byte ;
   for j := 1 to i do k := gf_byte ; end ;
yyy : k := gf_signed_quad ;
no_op :

@ Now we need the routine that handles the character commands.  Again,
only a subset of the gf commands are permissible inside character
definitions, so we only look for these.  Also, for the pixel files, we
only interpret characters less than 128.  For drivers, this code might
be modified to skip any characters that are not actually used.

@<Interpret character@>=
begin
  if gf_com = boc then begin
    gf_ch := gf_signed_quad ;
    i := gf_signed_quad ; {dispose of back pointer}
    min_m := gf_signed_quad ;
    max_m := gf_signed_quad ;
    min_n := gf_signed_quad ;
    max_n := gf_signed_quad ;
  end else begin
    gf_ch := gf_byte ;
    i := gf_byte ;
    max_m := gf_byte ;
    min_m := max_m - i ;
    i := gf_byte ;
    max_n := gf_byte ;
    min_n := max_n - i ;
  end ;
debug print_ln('Character ',gf_ch:1) ; gubed
  if gf_ch > 127 then {we skip the character}
    repeat
      gf_com := gf_byte ;
      case gf_com of
sixty_four_cases(paint_0), eoc, skip0, one_sixty_five_cases(new_row_0) : ;
@<Specials and |no_op| cases@> ;
paint1, skip1 : i := gf_byte ;
paint1+1, skip1+1 : begin i := gf_byte ; i := gf_byte ; end ;
paint1+2, skip1+2 : begin i := gf_byte ; i := gf_byte ; i := gf_byte ; end ;
othercases bad_gf('Unexpected ',gf_com:1,' while skipping character') ;
      endcases ;
    until gf_com = eoc
  else @<Convert character to raster form@> ;
end

@ We declare a few more locals:

@<Locals to |load_gf_file|@>=
@!min_m,@!max_m : integer ; {the maximum and minimum horizontal counters}
@!min_n,@!max_n : integer ; {the maximum and minimum vertical values}
@!rows : integer ; {the current row counter value}

@ Now we are at the beginning of a character that we need the raster for.
Before we get into the complexities of decoding the |paint|, |skip|, and
|new_row| commands, let's define a macro that will help us fill up the
|row_counts| array.  Note that we check that |rows| never exceeds |max_rows|;
|max_rows| should be set to the highest number the |rows| index can take,
and is dependent on how the program is integrated into a driver.  Instead of
calling |bad_gf| directly, as this macro is repeated eight times, we simply
set the |bad| flag true.

@d put_in_rows(#)==begin if rows > max_rows then bad := true else begin
row_counts(rows):=#; incr(rows); end ; end

@ Now we have the procedure that decodes the various commands and puts counts
into the |row_counts| array.  This would be a trivial procedure, except for
the |paint_0| command.  Because the |paint_0| command exists, it is possible
to have a sequence like |paint| 42, |paint_0|, |paint| 38, |paint_0|,
|paint_0|, |paint_0|, |paint| 33, |skip_0|.  This would be an entirely empty
row, but if we left the zeros in the |row_counts| array, it would be difficult
to recognize the row as empty.

This type of situation probably would never
occur in practice, but it is defined by the \.{GF} format, so we must be able
to handle it.  The extra code is really quite simple, just difficult to
understand; and it does not cut down the speed appreciably.  Our goal is
this: to collapse sequences like |paint| 42, |paint_0|, |paint| 32 to a single
count of 74, and to ensure that the last count of a row is a black count rather
than a white count.  A buffer variable |extra|, and two state flags, |on| and
|state|, enable us to accomplish this.

The |on| variable is essentially the |paint_switch| described in the \.{GF}
description.  If it is true, then we are currently painting black pixels.
The |extra| variable holds a count that is about to be placed into the
|row_counts| array.  We hold it in this variable until we get a |paint| command
of the opposite color that is greater than 0.  If we get a |paint_0| command,
then the |state| flag is toggled; while it is set, the next count we receive
can be added to the |extra| variable as it is the same color.
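
To make the bookkeeping concrete, here is a hand trace of the sequence
|paint| 42, |paint_0|, |paint| 32, |paint| 5, followed by a |new_row|
command with parameter $m$; the sequence is chosen only for illustration.
Each line shows the variables after the command has been processed.
$$\vbox{\halign{#\hfil\quad&\hfil#\quad&#\hfil\quad&#\hfil\quad&#\hfil\cr
command&|extra|&|state|&|on|&stored in |row_counts|\cr
(start)&0&true&false&\cr
|paint| 42&42&false&true&\cr
|paint_0|&42&true&false&\cr
|paint| 32&74&false&true&\cr
|paint| 5&5&false&false&74\cr
|new_row|&$m$&false&true&5, |end_of_row|\cr}}$$
The white count 74 and the black count 5 are the only entries stored for
the row.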

@<Convert character to raster form@>=
begin
  max_rows := mem_max - 640 ;
  bad := false ;
  rows := 0 ;
  on := false ;
  extra := 0 ;
  state := true ;
  repeat
    gf_com := gf_byte ;
    case gf_com of
paint_0 : begin
  state := not state ;
  on := not on ;
end ;
sixty_four_cases(paint_0+1),paint1+1,paint1+2 : begin
  if gf_com < paint1 then i := gf_com - paint_0
  else begin
    i := 0 ; for j := 0 to gf_com - paint1 do i := i * 256 + gf_byte ;
  end ;
  if state then begin
    extra := extra + i ;
    state := false ;
  end else begin
    put_in_rows(extra) ;
    extra := i ;
  end ;
  on := not on ;
end ;
four_cases(skip0) : begin
  i := 0 ; for j := 1 to gf_com - skip0 do i := i * 256 + gf_byte ;
  if not on and ( extra > 0 ) then put_in_rows(extra) ;
  for j := 0 to i do put_in_rows(end_of_row) ;
  on := false ; extra := 0 ; state := true ;
end ;
one_sixty_five_cases(new_row_0) : begin
  if not on and ( extra > 0 ) then put_in_rows(extra) ;
  put_in_rows(end_of_row) ;
  on := true ; extra := gf_com - new_row_0 ; state := false ;
end ;
@<Specials and |no_op| cases@> ;
eoc : begin
  if bad then abort('Ran out of internal memory for row counts!') ;
  if not on and ( extra > 0 ) then put_in_rows(extra) ;
  if ( rows > 0 ) and ( row_counts(rows - 1) <> end_of_row) then
    put_in_rows(end_of_row) ;
  put_in_rows(end_of_char) ;
  @<Scan for bounding box and dump raster@> ;
end ;
othercases bad_gf('Unexpected ',gf_com:1,' character in character definition');
    endcases ;
  until gf_com = eoc ;
end

@ A few more locals used above and below:

@<Locals to |load_gf_file|@>=
@!on : boolean ; {indicates whether we are white or black}
@!state : boolean ; {a state variable---is the next count the same color as
   the one in the |extra| buffer?}
@!extra : integer ; {where we pool our counts}
@!bad : boolean ; {did we run out of space?}
@!max_rows : integer ; {the highest our |rows| counter can go}

@ Now we have the row counts in our |row_counts| array.  First, we determine
the minimum bounding box.  To find the real |max_n|, we look for the first
non-|end_of_row| value in the |row_counts|.  If it is an |end_of_char|,
the entire character is blank.  Otherwise, we first eliminate all of the blank
rows at the end of the character.  Next, for each remaining row, we check the
first white count for a new |min_m|, and the total length of the row
for a new |max_m|.  Note that we give a raster pointer for the character
whether or not it has any raster bits, because there might be a blank character
with a real \.{TFM} width.  The raster pointer might need to be changed
depending on how your driver is set up.

@<Scan for bounding box and dump raster@>=
i := 0 ; decr(rows) ; gf_raster := pxl_loc div 4 ;
while row_counts(i) = end_of_row do incr(i) ;
if row_counts(i) <> end_of_char then begin
  max_n := max_n - i ;
  while row_counts(rows - 2) = end_of_row do begin
    decr(rows) ;  row_counts(rows) := end_of_char ;
  end ;
  min_n := max_n + 1 ;
  extra := max_m - min_m + 1 ;
  max_m := 0 ;
  j := i ;
  while row_counts(j) <> end_of_char do begin
    decr(min_n) ;
    if row_counts(j) <> end_of_row then begin
      k := row_counts(j) ;
      if k < extra then extra := k ;
      incr(j) ;
      while row_counts(j) <> end_of_row do begin
        k := k + row_counts(j) ; incr(j) ;
      end ;
      if max_m < k then max_m := k ;
    end ;
    incr(j) ;
  end ;
  max_m := min_m + max_m - 1 ; {row lengths are measured from the old |min_m|}
  min_m := min_m + extra ; {now trim the common leading white columns}
  gf_c_height := max_n - min_n + 1 ;
  gf_c_width := max_m - min_m + 1 ;
  gf_x_offset := - min_m ;
  gf_y_offset := max_n ;
debug
  print_ln('W ',gf_c_width:1,' H ',gf_c_height:1,' X ',gf_x_offset:1,' Y ',
    gf_y_offset:1);
gubed
  @<Dump raster@> ;
end

@ Now we can actually dump the raster representation of the character.  We
start at |i| in the |row_counts| array, which was left pointing to the first
non-blank row.  Each row has |gf_c_width| pixels.  We do the count to raster
conversion sixteen bits at a time.
This way, there is no messy business with the sign bit.  The conversion of
the counts to pixels is very straightforward, requiring care only when we
go over the bounds of a word.
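
For example (the counts are chosen only for illustration), if a row begins
with a white count of 3 followed by a black count of 5, the white count
leaves |word=0| and |bit=13|; the black count then adds
$$2^{13}-2^{13-5}=8192-256=7936,$$
which in binary is 0001111100000000: three clear bits, five set bits, and
eight positions still to be filled.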

@<Dump raster@>=
word_width := ( max_m - min_m + 1 + 31 ) div 32 * 2 ;
while row_counts(i) <> end_of_char do begin
  j := 0 ;
  word := 0 ;
  bit := 16 ;
  on := false ;
  count := row_counts(i) ;
  if count <> end_of_row then count := count - extra ; {an empty row has no white count to trim}
  while count <> end_of_row do begin
    incr(i) ;
eebug
  if on then for k := 1 to count do print('*')
  else for k := 1 to count do print(' ') ;
gubee
    while count > 0 do begin
      if count >= bit then begin
        if on then word := word + power[bit] - 1 ;
        count := count - bit ;
        send_raster_16(word) ;
        incr(j) ;
        word := 0 ; bit := 16 ;
      end else begin
        if on then word := word + power[bit] - power[bit - count] ;
        bit := bit - count ;
        count := 0 ;
      end ;
    end ;
    on := not on ; count := row_counts(i) ;
  end ;
  while j < word_width do begin
    send_raster_16(word) ;
    word := 0 ;
    incr(j) ;
  end ;
eebug print_nl ; gubee
  incr(i) ;
end

@ I suppose you noticed all of the locals still undefined?

@<Locals to |load_gf_file|@>=
@!word_width : integer ; {the width of the words to send}
@!word : integer ; {the word to send out}
@!count : integer ; {the number of bits to send}
@!bit : integer ; {the bit position to next set}
@!hppp, @!vppp : integer ; {horizontal and vertical pixels per point}
@!dx, @!dy : integer ; {escapements for the character}

@ And we still need the power array.

@<Glob...@>=
@!power : array [0..16] of integer ; {powers of two}

@ @<Set init...@>=
power[0] := 1 ;
for i := 1 to 16 do power[i] := power[i-1] * 2 ;

@ Our last remaining task is to interpret the postamble commands.  The only
things that may appear in the postamble are |post_post|, |char_loc|,
|char_loc0|, and the special commands.  We use the |row_counts| array to
store the \.{TFM} widths of the characters.  We must make sure to clear out
this array before using it.  Ensure that you use the
horizontal displacement from the character locators in your driver, rather
than the rounded |tfm_width|.  (This was not possible with the \.{PXL} files,
which is one of the reasons they are being discontinued.)
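
The magnification computed below is 1000 times the ratio of the font's
resolution to 200 dots per inch.  As a rough check, assume |hppp=272046|,
about what a 300 dots-per-inch font would carry (the figure is an assumption
for the example, not taken from any particular file):
$$\round\left({5\cdot272046\cdot72.27\over65536}\right)
=\round(1499.997\ldots)=1500,$$
that is, 1000 times $300/200$.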

@d tfm_width==row_counts(gf_ch)

@<Interpret postamble@>=
for gf_ch := 0 to 127 do tfm_width := 0 ; {clear out tfm widths}
i := gf_signed_quad ; {skip over junk}
design_size := gf_signed_quad ;
check_sum := gf_signed_quad ;
hppp := gf_signed_quad ;
vppp := gf_signed_quad ;
if hppp <> vppp then print_ln('Odd aspect ratio!') ;
pxl_mag := round ( ( ( 5 * hppp ) * 72.27 ) / 65536.0 ) ;
debug print_ln('PXL mag = ',pxl_mag:1); gubed
i := gf_signed_quad ; i := gf_signed_quad ; {skip over junk}
i := gf_signed_quad ; i := gf_signed_quad ;
repeat
  gf_com := gf_byte ;
  case gf_com of
char_loc, char_loc0 : begin
  gf_ch := gf_byte ;
  if gf_com = char_loc then begin
    dx := gf_signed_quad ;
    dy := gf_signed_quad ;
  end else begin
    dx := gf_byte * 65536 ;
    dy := 0 ;
  end ;
  tfm_width := gf_signed_quad ;
  i := gf_signed_quad ;
end ;
@<Specials and |no_op| cases@> ;
post_post : ;
othercases bad_gf('Unexpected ',gf_com:1,' in postamble') ;
  endcases ;
until gf_com = post_post ;

@ Now we have the main program.  We do much of the pixel file output here
so it can be removed easily.  The |load_gf_file| procedure does all of the
work of the program.

@p begin
  initialize ;
  open_pxl_file ;
  pxl_word(pxl_id) ;
  load_gf_file ;
  dir_ptr := pxl_loc div 4 ;
  for gf_ch := 0 to 127 do begin
    pxl_halfword(gf_c_width) ;
    pxl_halfword(gf_c_height) ;
    pxl_halfword(gf_x_offset) ;
    pxl_halfword(gf_y_offset) ;
    pxl_word(gf_raster) ;
    pxl_word(tfm_width) ;
  end ;
  pxl_word(check_sum) ;
  pxl_word(pxl_mag) ;
  pxl_word(design_size) ;
  pxl_word(dir_ptr) ;
  pxl_word(pxl_id) ;
final_end : end .

@ A few more globals.  (Note that there is both a global |gf_ch| and a
local |gf_ch|.  This is so the same macros can be used for both.)

@<Glob...@>=
@!gf_ch : integer ; {which character are we looking at?}
@!check_sum : integer ; {the checksum of the file}
@!dir_ptr : integer ; {where does the directory information start?}
@!design_size : integer ; {the design size of the font}
@!pxl_mag : integer ; {the pixel magnification, based on 200 dots per inch}

@* System-dependent changes.
This section should be replaced, if necessary, by changes to the program
that are necessary to make \.{GFread} work at a particular installation.
It is usually best to design your change file so that all changes to
previous sections preserve the section numbering; then everybody's version
will be consistent with the printed program. More extensive changes,
which introduce new sections, can be inserted here; then only the index
itself will get a new section number.
@^system dependencies@>

@* Index.
Pointers to error messages appear here together with the section numbers
where each ident\-i\-fier is used.