KA9Q on the USA Link
From ZaInternetHistory
The first leased line to the USA, between the computer room at Rhodes University and the home of Randy Bush in Portland, Oregon, was driven by Penril modems at 14.4 kbps and had a 386 PC at each end. The routing was done by Phil Karn's KA9Q program. This was in November 1991 (date to be checked).
Long response times
One of the "features" of the KA9Q routing software was that it appeared reluctant to drop packets when there was congestion on the circuit. Whether or not this was desirable is not discussed here, but when there was congestion, the ping times between the two routers would certainly rise, to as much as 300 seconds. This was far beyond the timeout limits of the TCP/IP protocols, so the hosts connected across the link would themselves time out and retransmit. Since the routers were still doing their best to deliver the original packets, the retransmissions simply made the congestion worse. The end result was that the routers at each end would hang, and the only way out was to phone the USA and ask for a reset. Understandably, people like Joan Bush (Randy's charming wife) did not like being woken at, say, 3 am to be asked to reset the router. Besides, Randy had a number of such routers, and it was all too easy to reset the wrong one.
Dave Wilson designed, and Brian Kemp (of Rhodes's Electronic Services division) built, a circuit that monitored the RS-232 interface between the router and the modem; if the interface stayed idle for too long, the circuit automatically reset the 386, and the router then restarted itself. One of these circuits was installed at each end of the link, and they helped a great deal.
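The feedback loop described above (queueing delay exceeding the hosts' retransmission timeout, so that every timeout adds another copy to a queue that is never pruned) can be reproduced with a toy discrete-time model. This is a minimal sketch under stated assumptions, not a model of KA9Q itself: a single FIFO queue that never drops anything, hosts that resend an unacknowledged packet every `timeout` ticks, and acknowledgements that arrive instantly once a packet is delivered. The function name `simulate` and all parameter values are hypothetical.

```python
from collections import deque


def simulate(ticks, capacity, offered, timeout, backlog=0):
    """Toy model (hypothetical) of a link whose router queues without limit.

    capacity: packets the link can carry per tick
    offered:  fresh packets the hosts generate per tick
    timeout:  ticks a host waits before resending an unacknowledged packet
    backlog:  packets already queued at tick 0 (a burst of congestion)
    Returns (queue length after each tick, number of distinct packets delivered).
    """
    queue = deque()    # packet ids waiting for the link, duplicates included
    pending = {}       # packet id -> tick at which its host last sent it
    delivered = set()  # ids that have reached the far end at least once
    next_id = 0

    def send(t):
        nonlocal next_id
        queue.append(next_id)
        pending[next_id] = t
        next_id += 1

    for _ in range(backlog):
        send(0)

    history = []
    for t in range(ticks):
        # Hosts inject fresh traffic.
        for _ in range(offered):
            send(t)
        # Hosts time out and resend; the router still holds the original copy.
        for pid, last in list(pending.items()):
            if t - last >= timeout:
                queue.append(pid)
                pending[pid] = t
        # The link drains at most `capacity` packets per tick, in FIFO order.
        for _ in range(min(capacity, len(queue))):
            pid = queue.popleft()
            if pid not in delivered:      # a late duplicate is wasted capacity
                delivered.add(pid)
                pending.pop(pid, None)    # assumed: the ack arrives instantly
        history.append(len(queue))
    return history, len(delivered)


healthy, healthy_count = simulate(200, capacity=10, offered=8, timeout=1000, backlog=100)
storm, storm_count = simulate(200, capacity=10, offered=8, timeout=5, backlog=100)
```

With a 100-packet burst and hosts offering 8 packets per tick to a 10-packet-per-tick link, a long timeout lets the backlog drain in about 50 ticks, while a 5-tick timeout turns the same burst into a queue that keeps growing, with more and more of the link's capacity spent on duplicates. The numbers are illustrative only, not measurements from the Rhodes link.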
The long response times caused many problems that would not occur on a network with fast response times, and considerable time was spent trying to isolate them. One problem that was eventually isolated, but could not be fixed permanently, was that domain storms would readily occur. One example involved the University of the Orange Free State, domain uovs.ac.za. Its primary nameserver was within uovs.ac.za, with secondaries at hippo.ru.ac.za, daisy.ee.und.ac.za, bertha.ee.und.ac.za and rain.psg.com. One weekend, the uovs.ac.za computer was unreachable for some reason. Using FTP Software's LanWatch program to monitor the Ethernet traffic, we saw a DNS query arrive from the USA asking to resolve a uovs.ac.za hostname. The congestion on the USA circuit was such that, even had the uovs.ac.za nameserver been able to answer the query, the host that originated the request would already have sent queries to daisy, bertha and hippo. None of these three hosts had information on the uovs.ac.za domain (at that particular time the primary nameserver was unreachable and the DNS caches had timed out), so all three would operate in recursive mode and try to resolve the request, which they could do only by querying each other and rain. Now rain.psg.com was in the USA (in Randy's house), and rain could not resolve the name either, so it would fire off recursive queries to the uovs.ac.za host (which was unreachable) and to daisy, bertha and hippo, which in turn would... and so on. Under those conditions the link rapidly degenerated into a state of carrying no productive traffic whatsoever. Once the situation had been identified, Jacot found a way to kludge the resolution of uovs.ac.za queries on hippo, and things got back to normal.
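The query storm can be illustrated in the same spirit. The account above does not say exactly how each nameserver chose whom to ask, so this sketch assumes the worst case it describes: every secondary that cannot answer recurses to the primary and to every other listed nameserver, and nobody remembers which queries are already in flight. The host name `usa-host` and the function `storm` are hypothetical; the nameserver names are those given above.

```python
# Nameservers for uovs.ac.za as listed above; the primary is unreachable.
PRIMARY = "uovs.ac.za"
SECONDARIES = ["hippo.ru.ac.za", "daisy.ee.und.ac.za",
               "bertha.ee.und.ac.za", "rain.psg.com"]
CLIENT = "usa-host"                      # hypothetical originating host
OVERSEAS = {"rain.psg.com", CLIENT}      # hosts on the far side of the USA link


def storm(first_targets, hops):
    """Return (queries per hop, queries per hop crossing the USA link).

    Assumes no caching, no loop detection and no duplicate suppression.
    """
    in_flight = [(CLIENT, dst) for dst in first_targets]
    totals, crossing = [], []
    for _ in range(hops):
        totals.append(len(in_flight))
        crossing.append(sum(1 for src, dst in in_flight
                            if (src in OVERSEAS) != (dst in OVERSEAS)))
        next_round = []
        for _src, server in in_flight:
            if server == PRIMARY:
                continue  # unreachable: the query is simply lost
            # Cannot answer from data or cache, so recurse to every other server.
            for other in [PRIMARY] + SECONDARIES:
                if other != server:
                    next_round.append((server, other))
        in_flight = next_round
    return totals, crossing


first = [PRIMARY, "daisy.ee.und.ac.za", "bertha.ee.und.ac.za", "hippo.ru.ac.za"]
totals, crossing = storm(first, 4)
```

Under these assumptions the number of queries triples at every hop (4, 12, 36, 108, ...), and many of them involve rain.psg.com and therefore cross the already congested USA link, which is consistent with the link carrying no productive traffic at all.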
However, it took the entire weekend to track this problem down, and we could not find a way to prevent it from happening again at some point in the future.