MAIL017
RHODES UNIVERSITY COMPUTING CENTRE
----------------------------------
21 November 1989
----------------
Changes to the Fidonet Gateway
------------------------------
1. Scope
2. Changes
2.1. Points off Settler City
2.2. Conferences on Settler City
2.3. Relocate the PC
2.3.1. Logging by Ops Staff
2.3.2. Operations Schedule
2.3.3. Controlling .BAT File
2.4. String Handling
2.5. Log for Operations
2.6. Responsibilities to Fidonet
2.7. Better Logic
2.8. Possible Bug in Archiving?
2.9. Error Recovery
2.10. Testing Environment
2.11. Current Log File
2.12. Commenting
2.13. Operations Log
2.14. Backup
2.15. Manual for Ops Staff
2.16. Explanation of Disk Files
2.17. Description of Sources
2.18. Parameter for START.BAT
2.19. Recovery from Failure
2.20. Function Key for Forced Dialing
2.21. Mailspec Extensions
2.22. Algorithms
2.23. Repetition
2.24. Modem Parameters
2.25. Break-in
2.26. Changing Directories
2.27. Cleanup
2.28. Regular Backups
1. Scope
--------
The present Fidonet operation has been developed under all kinds
of pressures, with never enough time to stand back and review.
This review must now be done, as beyond doubt, the system is
creaking.
This document sets out to specify the changes that need to be
made to the present operation of the Fidonet Gateway. It is NOT
intended to be a criticism of how the gateway currently operates
- if it is read in such a light, then it will have been
misinterpreted.
2. Changes
----------
2.1. Points off Settler City
----------------------------
Close all Points by 31 December 1989, except for
Pat Terry
the Rhodes PRO office
Dave Wilson testing
Tim Bouwer testing
The affected sysops are to be notified by 30 November 1989 of
this proposed action.
Operation of Points carries some kind of commitment to see
that facilities are available to these Points. These facilities
appear to be quite inconsistent.
There is nothing to stop these sysops from operating their own
Fidonet nodes, or from using Uninet (given the necessary
authorisation by a Uninet participant site). Indeed, the
closing notification is to state these options very clearly.
The reasons for encouraging Points no longer apply. Uninet is
accessible to most research institutions, and Fidonet is now
officially recognised by the SAPT as a common interest group.
So the present Point sysops have at least one viable
alternative.
2.2. Conferences on Settler City
--------------------------------
Close all conferences by 31 December 1989 except for
those required to operate the zonegate
those of direct interest to Pat Terry
Subscribers who will be affected are to be notified by 30
November 1989 of this proposed action.
It is not beyond reasonable expectations that an alternative
mailing gateway will come into existence. This gateway will not
be based on Fidonet. If there are users who are dependent on
the Fidonet system for their conferences, they will exert
pressure to keep Fidonet open. It will be better to stop this
right now, as it appears that these conferences interfere with
the primary operation of the gateway, viz to transfer mail to
and from Uninet.
There is nothing whatsoever preventing anyone who is
inconvenienced by this action from operating their own Fidonet
system to receive these conferences.
2.3. Relocate the PC
--------------------
Install the PC and modem in the Computer Room so that it can be
driven and monitored by the operations staff.
If the operation of the zonegate is not brought up to the point
whereby it is operated in a routine manner, then it will not be
possible for the present sysop to be absent from the Computing
Centre.
In order to bring the system up to a high level of reliability,
the operation should be made manual. What will be needed
includes the following (any omissions are to be added to
this document):-
2.3.1. Logging by Ops Staff
---------------------------
A log to be prepared for the Ops staff. This must include
a record of operator activities
a record of incidents that are inconsistent
with reliable operation of the system
The record of system failures must have space for a report or
comment on what caused the incident, how it was cleared, what
the lost time was, when the incident occurred, who reported the
incident, and who cleared the incident.
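The fields listed above could be held in a small record type.
What follows is a minimal sketch in Python, used here purely for
illustration. The class name, the field names, and the idea of
deriving the lost time as the interval between occurrence and
clearance are all assumptions, not part of the existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    """One entry in the Ops fault log (hypothetical layout)."""
    occurred_at: datetime   # when the incident occurred
    cleared_at: datetime    # when normal operation resumed
    reported_by: str        # who reported the incident
    cleared_by: str         # who cleared the incident
    cause: str              # what caused the incident
    resolution: str         # how it was cleared

    def lost_time(self) -> timedelta:
        # Assumption: lost time runs from occurrence to clearance.
        return self.cleared_at - self.occurred_at
```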
2.3.2. Operations Schedule
--------------------------
A schedule of operation must be drawn up. This must show when
various events should take place. These include, for example:-
time for the sysop
national mail hour
2.3.3. Controlling .BAT File
----------------------------
Set up a function key that will (in this order)
cause the PC to exchange mail with RURES, cause
mail received from RURES to be prepared for
sending by the PC.
Set up a function key that will
cause a single dialing attempt to 1:105/42,
allowing a simple break-in process (eg some sort
of stop-key, or modem powering off) should the
operator decide to abort the dialing attempt
if the connection to 1:105/42 was successful cause
any incoming Uninet mail to be delivered to RURES.
It must be possible to repeat any of these operations ad nauseam
without any adverse effect on the system. A repeat should run
quickly, must not duplicate work already done, and must not cause
information to be lost. If errors occur, it must fail safe.
At times when the PC is not being driven by the operator, it
will be in a state to receive calls from other Fidonet PCs.
The PC will also respond to National Mail Hour, causing dialing
out to other Fidonets in accordance with standards and
guidelines acceptable to the RSA Fidonet organisation.
Critical points in the operation must be identified, so that it
is just about impossible to continue automatically after a crash
or an untoward interruption if corruption or loss of information
is about to occur.
2.4. String Handling
--------------------
Double-check the programs that have been written at Rhodes -
there seems to be a problem with string handling in CONF2NOS.C,
and this might well be more widespread.
2.5. Log for Operations
-----------------------
Design a log for the Ops Staff.
This should follow the lines of that used for the Cybers. It
should record when the gate is in production mode, when it is
down for scheduled maintenance, when it has had a problem and
the method of attending to the problem and the cause of the
problem.
It must be possible to produce daily and weekly reports on the
performance of the facility, its reliability, and the causes and
durations of stoppages. The number of dial-up attempts to the
USA, and their outcomes, should be readily visible.
These reports will be produced manually by the Ops Staff, as is
currently done for the Cybers.
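Although the reports are to be produced by hand, the arithmetic
behind them can be sketched as follows. This is an illustration
only: it assumes the log is a chronological list of (timestamp,
state) entries and that dial-up outcomes are recorded as short
words. The state names, outcome names and function names are
hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_per_state(entries, end):
    """Total time spent in each state.

    entries: chronological list of (timestamp, state) pairs, each
             marking the moment the gateway entered that state.
    end:     end of the reporting period.
    """
    totals = defaultdict(timedelta)
    # Pair each entry with the next; the last state runs until `end`.
    following = entries[1:] + [(end, None)]
    for (start, state), (stop, _) in zip(entries, following):
        totals[state] += stop - start
    return dict(totals)


def dial_summary(outcomes):
    """Count dial-up attempts by outcome."""
    counts = defaultdict(int)
    for outcome in outcomes:
        counts[outcome] += 1
    return dict(counts)
```

Summing the durations per state gives the figures for a daily
report; a weekly report is the same calculation over a longer
period.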
2.6. Responsibilities to Fidonet
--------------------------------
The primary reason why Rhodes uses Fidonet at all is for the
international gateway for Uninet email. The responsibilities to
Fidonet need to be spelled out in a Computing Centre document.
For example, what is supposed to happen about the nodefile, what
files must be transferred from the USA, and what is supposed to
happen to them when they arrive, what times the Zonegate is
supposed to accept incoming calls, etc etc.
It would be sensible to classify ideas in this regard as either
requirements or desirables.
2.7. Better Logic
-----------------
The processing of the messages seems to be illogical. There are
too many copies of files being kept and being re-processed. A
good system diagram is needed to make the flow of information
more visible, but it is clearly quite ridiculous to use the PC
as a filestore when it is a gateway. If backup copies of files
need be kept, which is very likely an absolute necessity, then
they should not be kept on the PC. Also, when a message has
been processed, it should be removed. Here, processing of a
message does not just mean getting it out of the received
archived file and into a packet, nor out of a packet into a
message area. Once all of the work is done on a message, it
must be removed from the system. Reprocessing of messages is to
be avoided.
Similarly, when an archived file has had all of its processing
done, it should be removed, as should a packet file (and any
other file). Too many files are being left on the PC.
Further, it seems that not all messages are being archived.
Those that are relevant to a conference are not being archived.
Everything that passes through the gate should be archived to
tapes on the Cyber.
There seems to be a problem with looking for an RFC 822
subheader within the Fidonet text field. Is this a new
specification that has been introduced, and if so, where is it
documented? What is the purpose of checking for the text
"Return-Path:" in the first 5000 characters of the text field?
There is no guard against pressing a wrong function key. When a
function key is pressed, before proceeding with what might be a
time-consuming operation the system should notify the operator
what is about to be done, and wait for a proceed / abort reply.
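The proceed / abort guard described above might look like the
following sketch. It is written in Python for illustration only;
the function name, the prompt wording and the choice of P and A
as replies are assumptions.

```python
def confirm(action, read=input):
    """Describe the pending action and wait for a proceed/abort reply.

    Returns True only for an explicit 'P'. Any other reply aborts,
    so a stray keypress cannot start a long-running operation.
    """
    reply = read(f"About to: {action}. Proceed (P) or abort (A)? ")
    return reply.strip().upper() == "P"
```

Treating every reply other than P as an abort means that a wrong
key costs the operator a second keypress at most.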
2.8. Possible Bug in Archiving?
-------------------------------
Check for a bug in the process that archives the .MSG files. If
these files do not have a 4- (or more-?) digit number, they do
not get archived.
(Refer to Pat Terry for details - he reported this problem).
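The cause of this behaviour is not documented here. If it turns
out that the archiving step selects files by a fixed-width name
pattern, one possible remedy is to accept any all-digit base
name. The sketch below (Python, for illustration only) assumes
that .MSG files are named by a decimal message number; the
function name is hypothetical.

```python
import re

# Any run of decimal digits followed by .MSG, regardless of case.
MSG_NAME = re.compile(r"\d+\.MSG", re.IGNORECASE)


def msg_files(names):
    """Return the .MSG file names, ordered by message number."""
    matches = [n for n in names if MSG_NAME.fullmatch(n)]
    return sorted(matches, key=lambda n: int(n.split(".")[0]))
```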
2.9. Error Recovery
-------------------
The present method of error recovery leaves a lot to be desired.
It would be far better to implement a method of recovery that is
sensitive to the context in which a failure occurred, so that
recovery can be automated where possible, and prevented (and the
operator advised accordingly) when auto-recovery is not
possible.
The system as it was (17 Nov 89) was capable of starting up
quite merrily, and then failing on transfers to RURES, because
an interlock file had not been reset. Yet, there was no warning
at startup that this interlock was set.
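A startup check for a leftover interlock could be as simple as
the following sketch (Python, for illustration only). The name
and location of the interlock file are not given in this
document, so the path is passed in as a parameter; the function
name and the wording of the warning are assumptions.

```python
import os


def startup_warnings(interlock_path):
    """Return warnings to show the operator before the gateway starts."""
    warnings = []
    if os.path.exists(interlock_path):
        warnings.append(
            f"Interlock file {interlock_path} is still set; "
            "transfers to RURES will fail until it is cleared."
        )
    return warnings
```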
2.10. Testing Environment
-------------------------
It must be appreciated that the Settler City PC is in full-scale
production. It is therefore most unwise to use it as a test-bed
for ideas. Changes should not be made "on the fly" except in
cases of emergency. A test-bed system must be set up on the AT,
so that, for example,
programs can be debugged
new .BAT files can be tested
operations on incoming and outgoing .MSG, .PKT
and .MOx files can be tested
the BINKLEY.EVT file can be edited
The production PC would be used to copy files to a floppy (or a
series of floppies, using the DOS BACKUP command) and then
loaded onto the AT for testing / examination or whatever.
Similarly, new processes can be thoroughly tested on the AT
before being loaded onto the production PC.
Apart from the obvious benefits that will arise from the above,
it will then be possible to strip the production PC to a minimum
as far as files are concerned. Far fewer utilities need be kept
on-line, only those required for emergency use. Many of these
emergency utilities can be kept off-line on floppy disks anyway.
2.11. Current Log File
----------------------
It should be possible to produce a snapshot of the current daily
log information at the press of a function key. This would
typically then be copied to a floppy disk for examination. The
process must not destroy the contents of the daily log.
2.12. Commenting
----------------
The small number of comments in the .BAT, .EVT and .C files is
appalling. It is simply not possible for an intelligent person
to make any progress with these files without a great deal of
further study of the Fidonet system, or without getting hold of
an expert and annoying him considerably.
The lack of a version number in the major files (eg START.BAT,
BINKLEY.EVT) indicates a sloppy approach to computing, and
indicates a total lack of appreciation about running a
production system.
These files should have at least
a version number
a modification record
a method of obtaining ANY of the earlier versions
a ratio of 2:1 for lines of comment to lines of code
a description of how to install, compile or otherwise
put into production any changes that are made (this to
be comments in the file itself)
appropriate words of warning or other cautions that
should be known to anyone who contemplates changing
the file.
2.13. Operations Log
---------------------
A log of the activities that are performed by the operator and
the system maintainer must be kept. This should reflect the
amount of time that the system was
in normal production
stopped due to hardware failure
stopped due to program failure
stopped for hardware maintenance
stopped for program maintenance
It should also record
manually initiated mail transfer attempts
any other untoward incident
The log must show the date/time of these events, the name of the
person who recorded the event, and a brief comment about the
event itself.
This log will have an associated fault reporting system, similar
to that for the equipment in the computer room. When, say, a
program failure occurs, there must be a comprehensive report
provided on the failure, describing what program failed, what
the problem was, how the situation was cleared, what was done to
fix the program.
2.14. Backup
------------
A backup system must be put into place AND TESTED to ensure that
the system can be reloaded. The DOS BACKUP and RESTORE commands
are to be used for this. The testing of the reloading must be
onto a PC or AT that is cleared of all except the DOS bootup
files.
A .BAT controlling file must be provided to cause the backup to
take place. The backing up of files over and beyond those
needed to run the system is to be avoided. The normal backup
process is to back up entire directories, and this should be the
case here. At the same time, the TREE/F (or similar) command
must be used to produce a floppy disk file for documentation.
2.15. Manual for Ops Staff
--------------------------
A write-up for the Ops Staff must be provided. This must
describe at the least
how to carry out normal operations
how to distinguish between normal and failed conditions
how to cause a full international email interchange to
take place
how to record faults
whom to notify in case of failure
2.16. Explanation of Disk Files
--------------------------------
For each file or category of file on the PC there must be a
description containing at least
Pathname Purpose Creating process Deleting process
-------- ------- ---------------- ----------------
Without this it is impossible to determine what files are
necessary and what are not.
There are also some Ramdisks set up. There must be a
description of their use, the size constraints on them, which
programs use them, and any further information of use to an
intelligent programmer who is unfamiliar with the details of how
Fidonet works.
2.17. Description of Sources
----------------------------
On the Fidonet PC, there must be a documentation file as the
first file of the sorted C:\FIDO directory describing at least
where to find the original Fidonet disks
how to get help in times of trouble
where the source of Rhodes University programs
are stored
how to store any changed source programs (NB this
includes .BAT, .EVT files)
how to use the test AT for debugging
any other useful information
2.18. Parameter for START.BAT
------------------------------
Modify START.BAT to take a parameter to allow it to be invoked
to run from a particular point, by default from the beginning.
Ensure that only sensible values can be provided for this
parameter.
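The validation logic can be sketched independently of the .BAT
language. The example below is in Python for illustration only;
the step names are invented, since the actual stages of
START.BAT are not listed in this document.

```python
# Hypothetical restart points, in the order the script runs them.
STEPS = ["FETCH", "PACK", "DIAL", "UNPACK", "DELIVER"]


def steps_from(param=None):
    """Return the steps to run, starting at `param` (default: all)."""
    if param is None or param == "":
        return list(STEPS)
    name = param.upper()
    if name not in STEPS:
        raise ValueError(
            f"Unknown start point {param!r}; "
            f"expected one of {', '.join(STEPS)}"
        )
    return STEPS[STEPS.index(name):]
```

Rejecting an unknown value outright, instead of falling back to
the default, keeps a mistyped parameter from silently starting a
full run.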
2.19. Recovery from Failure
---------------------------
Modify START.BAT so that it will restart automatically from the
most sensible point after a power failure or any other
interruption.
If restart is not possible, then START.BAT must inform the
operator accordingly, and should provide as much useful
information as possible to help with the manual restart.
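One common way to find "the most sensible point" is to record
each completed step in a checkpoint file. The sketch below
(Python, for illustration only) assumes this approach; the step
names, the checkpoint file and the function names are all
hypothetical.

```python
import os

# Hypothetical step names; the real stages are not listed here.
STEPS = ["FETCH", "PACK", "DIAL", "UNPACK", "DELIVER"]


def record_done(checkpoint_path, step):
    """Note that `step` has finished.

    The name is written to a temporary file which then replaces
    the checkpoint, so an interruption during the write is
    unlikely to leave a half-written checkpoint behind.
    """
    tmp = checkpoint_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(step)
    os.replace(tmp, checkpoint_path)


def restart_point(checkpoint_path):
    """Return the step to resume from, or raise with advice."""
    if not os.path.exists(checkpoint_path):
        return STEPS[0]                  # no checkpoint: clean start
    with open(checkpoint_path) as f:
        last = f.read().strip()
    if last not in STEPS:
        raise RuntimeError(
            f"Checkpoint holds {last!r}, which is not a known step; "
            "manual restart required."
        )
    nxt = STEPS.index(last) + 1
    return STEPS[nxt % len(STEPS)]       # after the last step, start over
```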
2.20. Function Key for Forced Dialing
-------------------------------------
A function key should be set up to force the dialing to the USA.
The method of typing Alt-M and responding with 1:105/42 is not
satisfactory, as it is easy to type the wrong characters.
Before dialing, the operator should be prompted to indicate
whether mail from RURES should be collected first. The question
should timeout after 2 minutes, and assume an affirmative reply.
After the telephone connection has been completed, the operator
should be prompted to indicate whether the incoming mail should
be processed. This question should timeout after 2 minutes, and
assume an affirmative response.
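A prompt that times out to an affirmative reply can be sketched
as follows. The example is in Python for illustration only, and
uses a background thread to wait for the reply; the function
name and prompt wording are assumptions. Any reply that does not
begin with N is treated as yes.

```python
import queue
import threading


def ask_yes_no(question, timeout_s=120.0, read=input):
    """Ask a yes/no question; treat silence for `timeout_s` as yes."""
    replies = queue.Queue()

    def reader():
        replies.put(read(question + " (Y/N) "))

    # Daemon thread: an unanswered prompt must not keep the
    # program alive after the main flow has moved on.
    threading.Thread(target=reader, daemon=True).start()
    try:
        reply = replies.get(timeout=timeout_s)
    except queue.Empty:
        return True          # timed out: assume an affirmative reply
    return not reply.strip().upper().startswith("N")
```

The default of 120 seconds matches the 2 minutes specified
above; the same function serves both questions.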
2.21. Mailspec Extensions
-------------------------
What extensions have been added or attempted over and above the
mailspec given in the MAILnnn files?
For example, it seems that a test was put into CONF2NOS to look
in the first 5000 characters for the text "Return-path:". This
is a case-sensitive check, which violates RFC 822 for starters
(field names are not case-sensitive), and totally ignores the
concept of a header followed optionally by a blank line and text.
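For comparison, a check that respects both points might look
like the sketch below. It is written in Python for illustration
only and is not the code in CONF2NOS.C; the function name is an
assumption. It compares field names without regard to case and
stops at the first blank line. Folded continuation lines are
skipped, so a folded value is returned only up to the end of its
first line.

```python
def header_value(text, name):
    """Return the value of header field `name`, or None if absent.

    Field names are compared without regard to case, and the
    search stops at the first blank line, which ends the header.
    """
    wanted = name.lower()
    for line in text.splitlines():
        if line.strip() == "":
            break                    # blank line: header is over
        if line[0] in " \t":
            continue                 # folded continuation line
        field, sep, value = line.partition(":")
        if not sep:
            break                    # not a header line at all
        if field.strip().lower() == wanted:
            return value.strip()
    return None
```

A message whose text merely mentions "Return-Path:" after the
blank line is then not mistaken for one that carries the field
in its header.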
2.22. Algorithms
----------------
Sorely lacking in CONF2NOS.C is any description of the
algorithms used. This is a major oversight, and must be
corrected.
The programs should be written first and foremost for people to
read, and then have the computer code added.
2.23. Repetition
----------------
It seems pretty obvious that when function key F10 is pressed
immediately after the PC has finished processing a previous F10,
a great deal of processing takes place. This is grossly
inefficient - when a file is processed successfully, it should
not be processed again except by deliberate and intentional act.
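One way to meet this requirement is to remove each file as soon
as it has been handled successfully, so that an immediate repeat
finds nothing to do. The sketch below is in Python for
illustration only. The function names are hypothetical, and it
assumes that `handle` has already passed on or archived whatever
must be kept before it returns.

```python
import os


def process_pending(inbox, handle):
    """Run `handle` on each file in `inbox`, removing it once handled.

    A file is removed only after `handle` returns without error.
    If `handle` raises, processing stops and the file stays in
    place for the next run.
    Returns the names of the files that were processed.
    """
    done = []
    for name in sorted(os.listdir(inbox)):
        path = os.path.join(inbox, name)
        if not os.path.isfile(path):
            continue
        handle(path)             # may raise; the file then stays put
        os.remove(path)
        done.append(name)
    return done
```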
2.24. Modem Parameters
----------------------
Provide a .BAT file to set up the modem to its correct settings.
The file should have a comprehensive description of what it is
doing. It must also describe any hardware settings required by
the modem.
Currently, if the modem gets a bad setting, the settings have to
be guessed.
2.25. Break-in
--------------
Describe how to break into a poll attempt, and how to break into
other telephone activities (eg at worst, power off the modem).
2.26. Changing Directories
--------------------------
AVOID LIKE THE PLAGUE any changing of directories from within a
program or .BAT file. It is dangerous, and caused a series of
files to be wiped out when a directory did not exist.
When a program fails, the PC is left in an arbitrary directory,
and this is dangerous.
In cases where it is absolutely unavoidable, then devise a
foolproof check to see that the change took place.
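The safer alternative is to address files by their full path and
to refuse to act when the directory is missing. The sketch below
is in Python for illustration only; the function name is an
assumption. It never changes the working directory.

```python
import os


def remove_files(directory, suffix):
    """Delete files ending in `suffix` from `directory`, by full path.

    The working directory is never changed, and a missing
    directory stops the operation instead of letting it act on
    whatever directory happens to be current.
    """
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        raise FileNotFoundError(
            f"{directory} does not exist; nothing deleted"
        )
    removed = []
    for name in sorted(os.listdir(directory)):
        if name.upper().endswith(suffix.upper()):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed
```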
2.27. Cleanup
-------------
When mail has passed through the PC, it should be cleaned out of
the PC. Any files required for backup should be stored in a
tape archive on the Cyber.
Do not use the PC as a filestore, nor as a backup system. It is
a gateway.
2.28. Regular Backups
---------------------
A regular backup procedure must be instituted. This must use
the DOS BACKUP process to back up the PC files onto floppy
disks. This backup should be used to re-create the Fidonet
system, and must be tested to do this. It should not re-create
any files with messages unless these are essential to run the
system.
MAIL017 Ends