"Structured Data with Bit Fields"
                     "Mitch Bradley"
                        Abstract




     A structured data item  is  a  data  item  composed  of
several   subfields  of  various  types,  analogous  to  the
`struct' declaration of C or the `RECORD'  types  of  Pascal
and Modula-2.

     Structured data types are trivial to implement in Forth
if each subfield is constrained to occupy an integral number
of machine bytes.  The problem becomes more  interesting  if
subfields  may include groups of bits within a byte or word,
as in C `bit fields' or Modula-2 `BITSET's.  Such bit fields
arise frequently when dealing with hardware registers, which
often contain  groups  of  bits  which  control  independent
hardware functions.

     This paper presents a set  of  words  for  conveniently
defining  and  accessing  structured data, which may contain
bit fields.  Their utility is shown with a real  example  --
manipulation of a disk controller.

INTRODUCTION

     A number of proposals for data structuring  words  have
recently appeared in the literature [WALK][BASI][JOOS].  The
scheme that I present is simpler than  most  of  these,  and
probably  not  as  powerful.  My particular emphasis in this
paper is not on data structures  in  general,  but  on  data
structures with imbedded bit fields.

     Structures  or  records  are  usually  used  to   group
together  related  data  items.  As an example, consider the
following fragment of C  [KERN]  code.   
struct  controller-registers {
       unsigned short   data;
       unsigned short   command;
       unsigned long    dma-base;
       unsigned  short    dma-count;   };   
This structure declares that there is a structure type called 
`controller- registers'.  This structure contains four fields, 
three of them short unsigned integers (2 bytes each), and one 
of the a long unsigned integer (4 bytes)[1].  

     The structure  could  later  be  used  by  declaring  a
pointer to that structure, as in:

        struct controller-registers *ptr;


 ________________________
   [1] Those of you familiar with C will no  doubt  note
that  dma-base should probably be a (char *) instead of
an (unsigned long).  However, it is easier  to  explain
as it stands.


     To access the dma-count register, one could then use

        ptr->dma-count


     From a code generation standpoint, this is not hard  to
implement.   All  the compiler has to do is to remember what
type of object is represented by `dma-count' and its  offset
from the start of the structure.

     When Forth programmers are faced with accessing a  data
structure  like  this  one, the tendency is to use numerical
offsets into the data structure.  The offset calcualation is
usually done explicitly when the object is referenced, using
perhaps `8 +'.  Explicitly calculating offsets  is  markedly
inferior  to  having  it done automatically for you, though.
Explicit offsets buried in code make the code hard to under-
stand and especially hard to modify.

     A structure capability  that  automatically  calculates
offsets  is extremely easy to implement in Forth.  
: field ( offset size --- ofset+size )
  create over , +
  does> ( fd pfa --- fd+offset )
         @ + ; 

: struct 0 ; 

This may  be  used  as  follows:
struct ( controller-registers )
    2 field   data
    2 field   command
    4 field   dma-base
    2 field   dma-count 
constant   controller-registers-size

The word `field' just keeps a running account of the current
offset at compile time, and creates a new  word  which  when
executed  will  add  its  offset to the number on the stack.
The word `struct' simply initializes the current  offset  to
0.   After  the structure definition is finished, it is con-
venient to remember the  total  size  of  the  structure  by
defining  a  constant, using the size that is already on the
stack.

     Later the fields may be used as  follows,  assuming  we
have  a  variable  `ptr'  which  contains  the address of an
instance of this structure.

          ptr @ data @    ( fetches the contents of the data field )

100000.   ptr @ dma-base 2! ( sets the dma-base field to 100000. )


BITFIELDS

     So far this has been pretty trivial, and  hardly  worth
writing  a  paper  about.  So lets make the problem a little
more interesting.  Suppose  that  we  want  to  define  some
fields  which  do  not  occupy  an integral number of bytes.

This situation can arise when there are data items that  are
very short, say 1 bit, and you don't want to waste an entire
byte just to store 1 bit.   The  situation  also  frequently
arises  when  dealing  with  hardware registers, which often
pack several barely-related bits within  a  single  byte  or
word.   Coping with single bits isn't too bad, because it is
easy to operate on the bit with a mask and a boolean  opera-
tor,  setting or clearing the bit as desired.  The real pain
comes when there  is  something  like  a  3-bit  field  that
represents a number, and that field is somewhere in the mid-
dle of a byte.  Then the problem involves a lot  of  masking
and shifting, which is annoying to write and hard to read.

     C copes with this problem  by  providing  a  `bitfield'
definition  capability  within  structures.  Bitfields allow
the programmer to give a symbolic name to a  group  of  bits
within a word or byte, and then to access that group of bits
as though it were a small  integer,  without  having  to  do
explicit  chifting and masking.  I have attempted to provide
this capability within Forth, and  have  succeeded  to  some
extent.   Here's  how  it works, described by example.  This
segment of code comes from a driver for a Xylogics Model 450
disk controller controller board[2].  
_____________________________________
  [2] The funny byte numbering (1 first, then 0)  is  a
result  of  the fact that the controller manual numbers
the bytes in `little endian' order, with the least sig-
nificant  byte of a word at the lowest address, whereas
the code is written for a 68000 machine, which is  `big
endian',  with  the  most significant byte of a word at
the lowest address.  This problem is well-known to any-
body  who  has  ever  tried to write `portable' code to
deal with real I/O devices.  It ranks as a  major  nui-
sance.


struct (  xyiopb  )  ( Byte 1 )
  1 resbits
  1 bits  intrall ( interrupt on all iopbs (450)
  1 bits  intrerr ( interrupt on error )
  1 bits  reserve ( reserve dual port drive (450)
  1 bits  recal   ( recalibrate on seek errors (450)
  1 bits  enabext ( enable extensions (450)
  2 bits  eccmode ( ECC actions ) ( Byte 0 )
  1 bits  autoup  ( auto update of IOPB )
  1 bits  reloc   ( use relocation )
  1 bits  chain   ( command chaining )
  1 bits  ie      ( interrupt enable )
  4 bits  cmd     ( command ) ( Byte 3 )
  1 field errno   ( error number ) ( Byte 2 )
  1 bits  iserr   ( error indicator )
  2 resbits
  3 bits  ctype   ( controller type )
  1 resbits
  1 bits  complete ( completion code valid (450) ( Byte 5 )
  2 bits  drive   ( drive type )
  4 resbits
  2 bits  unit    ( unit number ) ( Byte 4 )
  1 bits  bytebus ( use byte transfers )
  4 bits  intrlv  ( interleave - 1 (450)
  3 bits  throttle ( throttle control ) 
( remaining are  not bit fields )
  1 field sector  ( 7: sector number )
  1 field head    ( 6: head number )
  2 field cylinder ( 9,8: cylinder number )
  0 field nsect   ( b,a: sector count )
  2 field status  ( low byte is status )
  2 field bufoff  ( d,c: buffer offset )
  2 field bufrel  ( f,e: buffer offset )
  1 +  ( 11: reserved byte )
  1 field bhead   ( 10: base head (450)
  2 field nxtoff  ( 13,12: next iopb offset )
  2 field eccpatt ( 15,14: ECC pattern )
  2 field eccaddr ( 17,16: ECC address ) 
constant xyiopbsize

     Take a look at `Byte 1'.  The 8 bits of this byte  con-
trol 6 different functions.  The most-significant-bit is not
used (1 resbits).  The next 5 bits control 5 separate modes,
and  the final 2 bits are encoded to select one of four pos-
sible ecc modes.  Other bytes have different  configurations
of  bits.   The  code  to control this disk controller could
obviously get quite contorted if the programmer  had  to  do
explicit  shifting  and  masking  in order to manipulate all
these little pieces of bytes.

     Here is a section  of  the  code  which  accesses  this
structure.   Note how the use of the bitfield names has made
the function of the code obvious.  

: setiopb ( buf. cnt  sec head cyl cmd -- )
  iopb xyiopbsize    0 fill
  ( ... cmd ) iopb cmd      bit!
  ( ... cyl ) iopb cylinder !
  ( ... head) iopb head     c!
  ( ... sec ) iopb sector   c!
  ( ... cnt ) iopb nsect    !
  ( ... bufhi ) 15 and   12  shift
                iopb bufrel !
  ( ... buflo ) iopb bufoff !

  drivetype  @ iopb drive    bit!
  unitno     @ iopb unit     bit!
  xythrottle @ iopb throttle bit!

  1 iopb autoup  bit!
  1 iopb reloc   bit!  ( specific to the 450 -- not  on  the 440 )
  1 iopb enabext bit!
  basehead @ iopb bhead  c!
  2 iopb eccmode bit!  ;

     The rabid Forth programmer will no doubt complain  that
this  definition  is  too long.  This is a valid point.  The
excuse is that this bit of code was translated almost verba-
tim  from a known-working C program, and I had no motivation
whatsoever to do a stylistic transformation strictly for the
sake of `purity'.  If it isn't broken, don't fix it.


IMPLEMENTATION

     The code to implement bit field defining, fetching, and
storing  follows.   It  is  deceptively  simple, but its use
represents a  major  improvement  in  the  writeability  and
understandability  of  programs  that need to deal with such
structures.  Both high-level  and  68000  assembly  language
versions  are given.  It was actually easier to write `bit!'
in assembly language than in high-level, because  the  stack
manipulations were complicated.

     This package specifically does not allow bit fields  to
cross byte boundaries.  It would not be hard to extend it to
allow bit fields to be anywhere within a word.  This partic-
ular  application  benefited from the byte-only restriction,
because the disk controller hardware had  some  funny  byte-
only  registers.   Undesireable  side  effects resulted from
using a word operation which accessed 2 of them at once.

     I suspect that allowing  bit  fields  to  cross  n-byte
boundaries,  where  n is the number of bytes in the machines
hardware data paths, would make the code a lot more  compli-
cated.


ACKNOWLEDGEMENT

     The syntax  for  defining  fields,  where  the  current
offset  is  kept on the stack at compile time, was suggested
at a FIG meeting by (I believe) Glenn Tenney.

References

[BASI] Basile, James, Implementing Datastructures  in  FORTH
Jour. FORTH Appl. Res. 1, 2, December, 1983.  pp.5-16.

[JOOS] Joosten, Rieks, and  Nieuwenhuysen,  Hans,  Vectoring
Arrays  of  Structures Jour. FORTH Appl. Res. 1, 2, December
1983. pp.35-54

[KERN] Kernighan, B., and Ritchie, D.,  The C Programming Language
 Prentice-Hall, 1978.

[WALK] Walker, Bruce W., PL/1 Data Structures  FORTH Dimensions
  5,  6,  March/April  1984,  pp.8-12.   

( Forth 83 )
( structures with bit fields )
variable bitoffset

: bitalign ( offset size -- offset' size )
( moves to next byte boundary if the bit offset is nonzero )
  bitoffset @ if
    0 bitoffset !   swap 1+ swap
  then
;

: struct 0 bitoffset ! 0 ;
: field ( offset size -- newoffset )
    create bitalign over , +
    does> ( structure-base -- structure-member-address )
          @ +
;
( reserves an n-bits field )
: resbits ( offset #bits -- newoffset )
  bitoffset @ +   8 /mod  ( offset bitoffset_mod_8 bitoffset/8 )
  swap bitoffset !  +
;  
( bits are allocated from left-right, or from msb to lsb. )
( however, they are numbered from lsb=0 to msb=7 )
( left-right generates the correct bit number )
: left-right-bit# ( #bits -- #bits bit# )
  bitoffset @ over +  8 swap -
  dup 0< abort" Bit fields can't cross byte boundaries"
;

( structure describing the layout of information in the parameter field )
( of a word created by "bits" )

struct ( bit-record )
  2 field bit-offset
  2 field bit#
  2 field #bits
constant bit-record-size

( defines an n-bit field )
: bits ( offset #bits --namex newoffset )
  create  over , left-right-bit# , dup ,  ( offset #bits )
  resbits
  does> ( struct_addr pfa -- byte_addr bit# #bits )
     >r
     r@ bit-offset @ +  ( byte_addr )
     r@ bit#       @    ( byte_addr bit# )
     r> #bits      @    ( byte_addr bit# #bits )
;

hex
create masks
  0 c, 1 c, 3 c, 7 c, 0f c, 1f c, 3f c, 7f c, ff c,
decimal

: bitmask ( n -- mask ) masks + c@ ;

: bit@ ( byte_addr bit# #bits -- n )
  rot c@           ( bit# #bits byte )
  rot negate shift ( #bits shifted-byte )
  swap bitmask and
;
: bit! ( n byte_addr bit# #bits -- )
  bitmask         ( n byte_addr bit# mask )
  rot >r          ( n bit# mask )
  rot over and    ( bit# mask masked-n )
  2 pick shift    ( bit# mask shifted-masked-n )
  -rot            ( shifted-masked-n bit# mask )
  swap shift      ( shifted-masked-n shifted-mask )
  not             ( shifted-masked-n shifted-inverted-mask )
  r@              ( shifted-masked-n shifted-inverted-mask byte_addr )
  c@ and          ( shifted-masked-n masked-old-value )
  or              ( new-value )
  r>  c!
;

\ 68000 assembly language version of bit@ and bit!

\ code bit@ ( byte_addr bit# #bits -- n )
\   word
\   sp )+ d0 move  ( d0 is #bits )
\   masks # d7 move   d7 a0 lmove  ( relocated address of mask table )
\   byte
\   0  d0  a0  di)  d0 move        ( mask to d0 )
\   word
\   sp )+ d1 move  ( d1 is bit# )
\   sp )+ d7 move   d7 a0 lmove   ( relocate 16 bit address to 32 bits )
\   long d2 clr
\   byte a0 ) d2 move
\   d1 d2 lsr
\   d0 d2 andw
\   word d2 sp -) move
\ c;

\ code bit! ( n byte_addr bit# #bits -- )
\   word
\   sp )+ d0 move  ( d0 is #bits )
\   masks # d7 move   d7 a0 lmove  ( relocated address of mask table )
\   byte
\   0  d0  a0  di)  d0 move        ( mask to d0 )
\   word
\   sp )+ d1 move  ( d1 is bit# )
\   sp )+ d7 move   d7 a0 lmove   ( relocate 16 bit address to 32 bits )
\   sp )+ d2 move  ( d2 is n )
\   d0 d2 andw
\   d1 d2 lsl ( masked, shifted, n in d2 )
\   d1 d0 lsl ( shifted mask in d0 )
\   d0 notw ( inverted mask in d0 )
\   byte d0 a0 ) andw
\        d2 a0 ) orw
\   word
\ c;