Controlling the Net: A Case Study

This is a posting in HTML format of a paper originally presented as an invited talk at the Network Analysis User Group meeting in Washington, DC in September 1993. While the specific technology and numbers are dated, the design principles have stood the test of time.

University Networking Services, +1 612 625 8888, 130 Lind Hall, unet@unet.umn.edu

<URL:ftp://mail.unet.umn.edu/unet/wiring/net-design;type=a>
<URL:gopher://mail.unet.umn.edu/0/home/ftp/unet/wiring/net-design>
September 1993

Summary

The University of Minnesota's network management system is a philosophy that pervades all aspects of our data network. This paper will present the design of our data network and show how network management concerns entered into each part of the design.

What We Are Managing

We operate a typical large enterprise network. It extends to five campuses throughout the state plus another dozen smaller sites. Our "flagship" protocol is TCP/IP. We also support DECNET, AppleTalk, and Novell IPX. We could route OSI CLNS if necessary, but at this time only one site within our network has requested it, and that only for a single, specialized connection.

We support Ethernet and LocalTalk hardware interfaces. All protocols are available to those hosts with Ethernet interfaces, but only AppleTalk and TCP/IP are available to LocalTalk devices.

Our network connects over 15,000 computers across 470-odd IP network numbers (subnets). There are 203 AppleTalk zones.

We implement this user access with nearly 60 Cisco routers, over 460 twisted-pair hubs, over 100 Ethernet-to-Ethernet bridges, and over 250 Shiva FastPaths that connect LocalTalk to Ethernet networks.

Design Principles

We follow these overall design principles:

  • a mean time between failure (MTBF) of one year, as seen by a user
  • a mean time to repair (MTTR) of two hours
  • all solutions must scale smoothly from small to large

Bridging or Routing?

The answer to this question was never in doubt: from our experience, we had to fully route all protocols in order to survive at all. This conclusion came from the following points:

  • Large parts of our network used to be bridged together. We had a constant problem with broken devices "taking over" other devices (e.g., a bogus ARP response). Given the confusion of Ethernet addresses, it was fairly difficult to track down such miscreants.
  • Making changes in the network configuration (e.g., AppleTalk network number or zone list) was very difficult, as it was very difficult to locate the owners of all affected devices and arrange a common time to make the change.
  • The change problem was exacerbated by the fact that changes happened much more frequently on the larger network. In addition, what would otherwise be purely internal (to the backbone network) changes were visible to end users, again making the change problem worse.
  • A bridged network can "cover for" many mistakes. For example, if a user uses an invalid IP address, it may well still work. This covering encourages "time bombs," which go off when otherwise innocent changes are made to the network.
  • A bridged network does not (necessarily) limit the protocols that are used. As it turns out, very few protocols can scale to a network of our size. There were a number of instances where users began using an unsupportable protocol (e.g., DEC's LAT) and made a large commitment of resources to it. When they started running into problems, we were not able to help them as they were not using the protocol in accordance with vendor specifications. (No, we don't support any extensions to protocols that the vendor does not also support.) This is a no-win situation for everyone.

So, what does routing gain for us?

  • Ability to divide our network into many small pieces and isolate problems to within one piece.
  • Ability to hide our "backbone" network from user view.

The last point is the real key item.

Without it, we have people "hooking into a network," one which can't be changed or grown without affecting all existing users.

With it, we are in the business of offering packet delivery services, with the interface to those services specified by hardware/software combinations. Now, we can grow and change "our" part of the network without disturbing the existing users.

Overall Topology

We reviewed the various topologies and settled on a compound star for these reasons:

  • After reviewing the failure history of our (and other) network(s), we determined that the software was the most likely thing to fail. The lack of redundancy offered by the compound star puts minimal demands on software, thus increasing the MTBF. For TCP/IP, we use the IGRP routing protocol between Cisco routers and no protocol other than ARP between end hosts and the routers. You can't get much simpler than that.
  • The lack of redundancy also helps quickly pinpoint any failures in the network. This reduces the MTTR.
  • The simple structure of the compound star helps prevent intermittent failures. Avoiding such failures yields orders-of-magnitude improvements in MTBF and MTTR.
  • The simple structure is also economical to implement.

The core of our star is a cluster of six Cisco AGS+ routers, each with 18 Ethernet interfaces and interconnected with an FDDI ring. These systems are located within a few feet of each other at the central telecommunications facility. We consider the FDDI ring to be more of a Cisco "bus extender" than a network with an identity of its own.

We currently have fibre run to about 70% of our buildings. Where it exists, we use it as a fibre Ethernet which runs to a central point within each building. Links shorter than the 1 000 m limit of the fibre Ethernet standard are implemented as repeater links (see the next section). Links longer than this distance terminate in a router (preferred) or bridge at the remote end.

Our campus is large, and not all fibre runs directly back to the telecommunications center. We have established roughly a dozen remote routers (each a Cisco AGS+) which feed connections to their local area, then have one connection back to the main switching hub. These connections are all Ethernet for now, but can be upgraded as traffic warrants. We have set up the star to be as small and "bushy" as possible to minimize the number of hops that packets must make.

Note that this design uses the Cisco routers as network switches. A fully-configured router can switch traffic at rates approaching 1 Gbps. We use networks (i.e., Ethernet) as a point-to-point communications medium. This architecture often goes by the name "inverted backbone."

The only statistics in the core MIB of the SNMP protocol are counts of events crossing an interface. The compound star design matches the physical network to the model used for statistics. This matching makes it easy for us to gather any required statistics.
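
To give a feel for how little is involved in collecting those per-interface counts, here is a minimal Perl sketch that reads the MIB-II octet counters for one router interface. The Net::SNMP module, the router name, and the community string are stand-ins chosen for this example, not a description of our actual tools.

#!/usr/bin/perl
# Minimal sketch: read the MIB-II octet counters for one router interface.
# Hypothetical router name and community; Net::SNMP is used here only for
# illustration.
use strict;
use warnings;
use Net::SNMP;

my $router    = 'router.example.umn.edu';   # invented name
my $community = 'public';                   # example read-only community
my $ifindex   = 2;                          # interface to sample

my $in_oid  = "1.3.6.1.2.1.2.2.1.10.$ifindex";   # ifInOctets
my $out_oid = "1.3.6.1.2.1.2.2.1.16.$ifindex";   # ifOutOctets

my ($session, $error) = Net::SNMP->session(
    -hostname  => $router,
    -community => $community,
    -version   => 'snmpv1',
);
die "SNMP session failed: $error\n" unless defined $session;

my $result = $session->get_request(-varbindlist => [$in_oid, $out_oid]);
die "SNMP get failed: ", $session->error, "\n" unless defined $result;

printf "%s if %d: in %s octets, out %s octets\n",
    $router, $ifindex, $result->{$in_oid}, $result->{$out_oid};

$session->close;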

Building Networks

We have on the order of one hundred building connections, but something like 150 times that many host connections. We have thus spent a great deal of time developing a good growth plan for the local connections. The plan we use is nicknamed the "country road" plan. It is modeled after that of the road system, in which a road starts off as a dirt lane and is monitored and improved as required to handle the traffic.

The first connection in a building is made like this:

fibre           ---------  twisted pair
=============== |  hub  | xxxxxxxxxxxxxxxxxxxxx host
(< 1 000 m)     |       |
                |       |
                |       |
                ---------

fibre           ---------               ---------  twisted pair
=============== | Cisco | ------------- |  hub  | xxxxxxxxxxxxxxxxxxxxx host
(> 1 000 m)     | IGS/L |               |       |
                ---------               |       |
                                        |       |
                                        ---------

The fibre is on a dedicated Ethernet interface in the telecommunications center or remote router. The local router is only required for longer fibre runs. The router can be attached to the hub by fibre, twisted pair, or thinnet.

The second through twelfth connections in each building (we use 12 port hubs) are handled in the obvious way.

Connections 13 through 24 are handled as:

fibre           ---------  twisted pair
=============== | hub 1 | xxxxxxxxxxxxxxxxxxxxx host
(< 1 000 m)     |       | xxxxxxxxxxxxxxxxxxxxx host
                |       |       ...
              +-|       | xxxxxxxxxxxxxxxxxxxxx host
              | ---------
              |
              | ---------
              +-| hub 2 | xxxxxxxxxxxxxxxxxxxxx host
                |       | xxxxxxxxxxxxxxxxxxxxx host
                |       |       ...
                |       | xxxxxxxxxxxxxxxxxxxxx host
                ---------

In other words, the hubs are co-located where possible and connected to each other with thinnet. This is the only allowable use of thinnet in our network.

If, for distance reasons, a hub must be located remotely, we use twisted pair for the connection. If the hub's AUI port is free, we use a twisted pair transceiver off the hub's AUI port for inter-hub connections. This keeps counting as clean as possible (12 ports, 12 devices).

In many cases we are attaching to "private" networks. These can be old thicknet or thinnet networks or new 10BaseT networks for which all of the devices and hubs are owned by the local department. Such networks hang off one of the twisted pair interfaces, and a bridge is always inserted between the private network and the backbone.

In fact, the term "private" mainly refers to who paid for the equipment. Regardless of who owns it, we follow the same host registration (all computers attached to the Ethernet must be registered) procedures and all hubs and other network equipment are monitored. The user is also responsible for ensuring that their equipment is running the correct software releases.

We also keep an eye on the total number of hosts attached to one router interface. Our current guidelines call for a maximum of one hundred hosts per interface, with a typical network having no more than sixty or so. If the network gets larger than this, a second building backbone is established and operated in parallel.

This limit comes not from pure performance issues, but from fault prevention and isolation concerns. We have found that if networks are kept small, they operate reliably. As they get large -- even well within nominal specifications -- they tend to fail. The causes of failure are numerous, ranging from faulty equipment to below-par installations to host software misconfigurations. However, it is far cheaper to simply keep the size small and not worry about these problems than to try to fix them all and still wind up having to split things.

Cabling

We reuse existing pairs when possible. All pairs are tested before being okayed for use. When new wire has to be pulled, we pull level 5 cabling but terminate it according to level 3 techniques. When level 5 termination equipment is available, we will reconsider this decision.

Most of our existing fibre is 50 or 62.5 micron multimode. For typical building distances, the data rate for such fibre tops out at about 500 Mbps. We will switch to using 8 micron single mode as soon as devices (transceivers, modems, etc.) are available: we expect this to happen soon.

Sparing

We have only a few types of supported network equipment, so stocking spares is fairly easy. We always make sure that we have one of everything in our spare stock. For some devices (e.g., FastPaths and hubs), we keep more spares because of the large number of installed units. These devices fail rarely, so our spare stock doubles as a pool of devices (all but one, that is) available for quick installation.

We assume that we have to have spares to cover a complete chassis and all cards. In the case of a core hub Cisco AGS+, this ties up over $70,000 in a fully-configured spare. This sounds like (and is) a lot of money, but if someone should ever drop an A/C line into one of our routers and fry the whole thing, our users will be very happy that they can be up and running now instead of having to wait until the next business day for parts.

UPS

We have all routers and monitoring equipment on UPS power. Buildings with only bridges and hubs do not have UPS. Our reasoning is that if a building loses power, it doesn't buy us much to have the hubs up if the computers are down. However, if a router should go down, that can cause a change in the network routing tables, thus violating the principle that what happens in one building should not be able to affect other buildings.

Most of our power outages are short term (under a second to a few minutes). Thus, we use UPS units with a nominal 20 minute power supply (which is more like an hour in practice). We feel that this is a reasonable compromise between protection and expense.

Vendors

We try to have good working relationships with our networking equipment vendors. While we look for many things in a vendor, I would say that the most important is active participation on their part in the evolution of networking. There is no way that we can specify enough detail in a purchase request to ensure that a router or other device will work. Besides, if we did, no one would bother reading the document. Instead, we need to be sure that as new problems are uncovered and new needs arise, the vendor will be improving and adapting their products to meet those needs.

Future Hardware

We see adding FDDI to the range of interfaces available for our users. We consider it a stopgap, but an important one that will last for a number of years. Our current plans are to add a parallel compound star network of FDDI concentrator hubs. Local connections will be over twisted pair. We have not set an upper limit on the number of hosts on a single ring, but it will be fairly small.

We see our network moving to ATM over the next few years. The ultimate configuration will be something like:

  • Central ATM switch with something like 512 ports at 600 Mbps to 2.4 Gbps per port. All buildings will connect at at least 600 Mbps. (OK, so parking ramps can still get by with DS3 speed connections (:-).)
  • Building-level ATM switch with a range of interfaces. This will probably be a chassis-type unit. Typical interfaces will include fibre and twisted pair at the expected variety of speeds as well as analog video out for supporting our existing cable television system. Most new video will, of course, be digital.
  • Local ATM switching hubs. Probably in configurations something like:
    • ATM out, 12 separate Ethernets in
    • ATM out, 12 twisted pair ATM interfaces in
    • ATM out, maybe a 6/6 split

Our ultimate goal will be to have a dedicated Ethernet available for every existing host and twisted pair ATM (running from 20-50 Mbps) for all new host installations. (At last, fast enough to download printer inits!) Finally, 155 and 600 Mbps fibre connections will be available on demand anywhere. We expect that only about one to ten percent of connections will require this higher level of throughput.

As ATM switching technology evolves, we foresee migrating away from a compound star to a mesh architecture to gain performance and reliability. But this assumes that they get the software working (:-).

The Other Half

So far, I have reviewed our hardware design. But what about the software and data management aspects of network management? The next section will describe the data that we collect. The following sections will cover what we do with that data.

Data Collection

At this time, we collect the following categories of data for each host:

  • IP address
  • DNS name
  • MX information
  • who is responsible for it
  • location (building and room#)
  • host type and operating system (very generic, e.g., Sun Unix)
  • monitoring status

and this information about how it is connected:

  • if 10BaseT, hub name and port number
  • wall jack location (assigned by Telecommunications)
  • "caller id" (assigned by Telecommunications)
  • "line number" (assigned by Telecommunications)

We also collect additional information for a few specific types of hosts, for example, Shiva FastPaths and Novell servers.

While we do not have complete information on all hosts (if only!), we do have enough by and large to manage the network. For almost all hosts, we either have the data or have enough information to (eventually) track down who does. I say eventually, because a University has high turnover among system administrators, who are often graduate or undergraduate students, and it can take perseverance to convince (read: find) someone to take responsibility for a computer.

We use this information for a number of purposes:

  • identify devices to monitor
  • long term capacity planning
  • gross statistics about what hosts are important to support
  • help in identifying the source of problems (if we're having AppleTalk problems and a DEC VAX is present, the problems may be due to someone running Pathworks without contacting us about how to configure it)
  • identify who to contact about planned changes in the building network
  • identify who is affected by network problems

At this time, we use the IP address as the unique identifier in our database. We chose this identifier for the following reasons:

  • it is short and guaranteed unique
  • we assign it (i.e., it is not dynamic)
  • all but a few hundred devices need it anyway, so it was easier to assign IP addresses to computers that don't need them (e.g., Apple LaserWriter printers) than invent a whole new scheme.

So far, it works fine. However, we need to do something else to accommodate dynamic address assignment, a need that is here now and will grow in the future. Right now, if a user has a problem, they can give their IP address. This is sufficient to locate them in our and Telecommunication's databases. We can then start tracking down the problem. But imagine this dialogue:

User: I'm having problems.
Help Person: What's your IP address?
User: I don't know.  It isn't working.
Help Person: What's your Ethernet address?
User: I don't know.  What's an "ethernet address?"
Help Person: Where are you?
User: I don't know.  In some building.
Help Person: AAAARRRGGGHH!!!

Uses of the Data

We use this data for these purposes:

  • DNS: The per-host data is kept as ASCII text in files directly usable by the name server. Only address and MX information is stored as usable records: the rest is in specially-formatted comments. (A minimal parsing sketch follows this list.)
  • We have programs (the bulk of our programs are written in Perl) to look up information on individual hosts. For example:
    % lookup-dns -v norge.unet.umn.edu
    looking in /home/named/umn/unet
    norge                           A       128.101.4.16    ;isup 24 Medium Iecho -
            ;host Sun - Unix
            ;room 130 lindh
    lindh   031     Lind Hall
                                    MX      0 unet.unet.umn.edu.
            ;enet 08:00:20:09:80:cd g
    ----- unit -----
    unet.umn.edu    @ns             Networking Services
    @ns             -;CIS/NS;130 Lind;5 8888;unet@unet.umn.edu
    ----- network -----
    128.101.4.0     @ns             net     lindh   NS internal network
    @ns             -;CIS/NS;130 Lind;5 8888;unet@unet.umn.edu
    
    This program extracts the DNS information and expands the maintainer and location information as it goes. It gives you just about everything that you need to contact the host's maintainer.
  • A program extracts all hosts that are monitored by our network monitoring program and uses this information to construct the monitoring program's configuration file. (More on this below.)
  • A wide range of statistics programs:
    • count hosts of each type
    • count hosts in each department, or on each subnet
    • count open IP subnets
    • count hubs, bridges, routers, etc.
    • for each building, identify all subnets in each building and count the number of hosts on each of those subnets
    • locate and test all networks and network equipment in a building
  • Once a month, we have a program "ping" all possible IP addresses on our networks. The results are cross-checked against registrations. The owners of unregistered computers are contacted...
  • We also have programs to check the syntax and semantics (e.g. cross-references) of the files.
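
To give a feel for how simple the underlying "database" really is, here is a minimal Perl sketch that pulls the per-host records out of a zone file of the form shown above. The layout is simplified from the examples in this paper, and our production programs do considerably more checking.

#!/usr/bin/perl
# Minimal sketch: collect per-host records from a zone file that carries
# extra data in specially-formatted comments (";host", ";room", ...).
# The layout is simplified from the examples in the text.
use strict;
use warnings;

my %hosts;      # keyed by IP address, our unique identifier
my $current;    # IP address of the record being filled in

while (my $line = <>) {
    chomp $line;

    # An address record starts a new host entry.
    if ($line =~ /^(\S+)\s+A\s+(\d+\.\d+\.\d+\.\d+)\s*(?:;(.*))?$/) {
        my ($name, $ip, $flags) = ($1, $2, $3);
        $current = $ip;
        $hosts{$ip} = { name => $name, flags => ($flags // '') };
        next;
    }

    # Indented ";key value" comment lines extend the current entry.
    if (defined $current && $line =~ /^\s+;(\w+)\s+(.*\S)/) {
        $hosts{$current}{$1} = $2;
        next;
    }

    # Anything else (MX lines, blank lines, other comments) is ignored
    # in this sketch.
}

for my $ip (sort keys %hosts) {
    my $h = $hosts{$ip};
    printf "%-15s %-20s %-15s %s\n",
        $ip, $h->{name}, $h->{host} // '-', $h->{room} // '-';
}

Run against the norge entry shown earlier, this would report the IP address, the name, the host type ("Sun - Unix"), and the room.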

Nightly Runs

A key part of our system is a series of programs that are run each night. These programs verify and expand upon the data in our files. Key steps:

We communicate with each Cisco router and fetch copies of:

  • AppleTalk ARP table
  • AppleTalk neighbor table
  • AppleTalk routing table
  • AppleTalk ZIP table
  • DECNET routing table
  • IP ARP table
  • IP routing table
  • Novell IPX neighbor table
  • Novell IPX routing table
  • running configuration
  • saved configuration (these had better be the same!)

We communicate with each Shiva FastPath and fetch copies of:

  • AppleTalk ARP table
  • AppleTalk routing table
  • AppleTalk ZIP table
  • message log

A program scans our file of IP network number assignments. It expands the building abbreviations and checks that the assigned status (in use, open) matches the running IP routing table. It also checks that the default route (the IP address a.b.c.254) is assigned (if in use) or not assigned (if open).
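
A minimal sketch of that cross-check might look like the following. The file formats here are invented for the example (one assignment per line, plus a list of networks extracted from the routing tables); the real files carry more fields.

#!/usr/bin/perl
# Minimal sketch: compare an IP network assignments file against the set of
# networks seen in the running routing tables.  File formats are invented
# for this example:
#   assignments:  <network> <in use|open> <building abbrev>
#   routed:       one <network> per line
use strict;
use warnings;

my ($assign_file, $routed_file) = @ARGV;
die "usage: $0 assignments routed-networks\n" unless defined $routed_file;

# Networks actually present in the routing tables.
my %routed;
open my $r, '<', $routed_file or die "$routed_file: $!\n";
while (<$r>) {
    $routed{$1} = 1 if /^(\d+\.\d+\.\d+\.\d+)/;
}
close $r;

# Check each assignment against reality.
open my $a, '<', $assign_file or die "$assign_file: $!\n";
while (<$a>) {
    next unless /^(\d+\.\d+\.\d+\.\d+)\s+(in use|open)\s+(\S+)/;
    my ($net, $status, $bldg) = ($1, $2, $3);

    if ($status eq 'in use' and not $routed{$net}) {
        print "$net ($bldg): marked in use but not in the routing table\n";
    }
    elsif ($status eq 'open' and $routed{$net}) {
        print "$net ($bldg): marked open but is being routed\n";
    }
}
close $a;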

Unlike IP where we can tightly control the routing tables, AppleTalk and Novell IPX routing is more open. We therefore follow a different strategy. For AppleTalk network numbers and zone names, and IPX network numbers, we have:

  • A program scans our file of assignments and all routers (Cisco and Shiva FastPaths). It constructs a file of all identifiers in use anywhere.
  • A second program scans that file and, for each of the sources (routers, etc.) prints a list of all of the identifiers that it has incorrectly. (Ideally, these lists should be zero length.) By looking for patterns in these lists of differences, we can identify the errant device that is causing the problem.
  • We also have a program that scans all AppleTalk ARP tables on all routers on each physical network. It constructs, for each network, a table of all devices that use AppleTalk on that network. The table lists the devices by Ethernet address and attempts to match each device with our records. This file has proven its utility many times. Sample output (a sketch of the matching step follows it):
    network 50162:
            00:80:19:0C:07:EE       50162.57        unknown device
            00:80:D3:A0:0A:76       50162.246       134.84.238.228 [atalk.out]
            08:00:89:A2:17:81       50162.130       134.84.183.22 [atalk.out]
            08:00:89:A2:20:41       50162.193       134.84.183.1 [atalk.out]
            AA:00:04:00:D7:DB       50162.71        134.84.27.254 [Cisco ip-arp]
    008019  Dayna Communications    "Etherprint" product
    0080D3  Shiva                   Appletalk-Ethernet interface
    080089  Kinetics                AppleTalk-Ethernet interface
    AA0004  DEC                     Local logical address for systems running DECNET
    

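Here is a minimal sketch of the matching step: given the AppleTalk ARP entries for one network and the IP ARP data we already hold, annotate each Ethernet address with a vendor (from the OUI prefix) and with the IP host it appears to belong to. The small OUI table and the input formats are cut down for the example.

#!/usr/bin/perl
# Minimal sketch: match AppleTalk ARP entries against known IP ARP data and
# tag each Ethernet address with its vendor prefix (OUI).  The OUI table and
# the input formats are cut down for this example.
use strict;
use warnings;

# Vendor prefixes of interest (first three octets of the Ethernet address).
my %oui = (
    '008019' => 'Dayna Communications',
    '0080D3' => 'Shiva',
    '080089' => 'Kinetics',
    'AA0004' => 'DEC (DECNET logical address)',
);

# Ethernet address -> IP address, as learned from the routers' IP ARP tables.
# In the real system this comes from the nightly data collection.
my %ip_by_enet = (
    '08:00:89:A2:17:81' => '134.84.183.22',
    '00:80:D3:A0:0A:76' => '134.84.238.228',
);

# AppleTalk ARP entries for one network: "<enet addr> <atalk node>" per line.
while (my $line = <>) {
    next unless $line =~ /^\s*([0-9A-F:]{17})\s+(\S+)/i;
    my ($enet, $node) = (uc $1, $2);

    (my $prefix = $enet) =~ s/://g;
    $prefix = substr $prefix, 0, 6;

    my $vendor = $oui{$prefix}      // 'unknown vendor';
    my $owner  = $ip_by_enet{$enet} // 'unknown device';

    printf "%-17s  %-12s  %-15s  %s\n", $enet, $node, $owner, $vendor;
}
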
We have programs that fetch data from the Shiva FastPaths and Novell servers and cross-check that data against our configuration files. Here is a sample FastPath result:

IPAddress:      128.101.25.80
kbox.geom.umn.edu: host kbox.geom.umn.edu <=> 128.101.25.80
looking in /home/named/umn/geom
kbox                            A       128.101.25.80   ;fast 24 public -
        ;host FastPath - Native
kbox-81                         A       128.101.25.81
kbox-82                         A       128.101.25.82
kbox-83                         A       128.101.25.83
kbox-84                         A       128.101.25.84
kbox-85                         A       128.101.25.85
kbox-86                         A       128.101.25.86
kbox-87                         A       128.101.25.87
kbox-88                         A       128.101.25.88
kbox-89                         A       128.101.25.89
----- address counts -----
Assigned static 0, unassigned static 0, dynamic 9 addresses

atalk-snmp-check for router 128.101.25.80,
is Shiva FastPath4, K-STAR Patch 9.1p2 92/07/14
has been up 1584939 seconds (18.34 days)
fpPromVersion is 510
fpBufferAvail is 224 => FastPath4 with/kit
fpBufferDrops is 40
ifPhysAddress.1 is 08 00 89 a0 26 28
start   status        net config    zone config   zone
0       operational   unconfigured  unconfigured
31947   operational   configured    configured    Geometry Center
52467   operational   garnered      garnered      Geometry Center
0       unconfigured  unconfigured  unconfigured
--------------------------------------------------
router is 128.101.25.254 name is fmc.gw.umn.edu / Ethernet 1
appletalk zone Geometry Center
--------------------------------------------------
IPAddress:      128.101.25.80
Status:         running
What:           FastPath4+
Where:          5th.floor.machine.room x1300 u-office x1300 * - 1300 South Second St (fmc)
Contact:        @levy @levy Stuart Levy;Geometry;? x1300;-;slevy@geom.umn.edu
Schedule:       -
LANmarkPort:    -
IPTalk:         -
LocalTalk:      31947 Geometry Center
EtherTalk:      52467 Geometry Center
NovellTalk:     -
OtherTalk:      -
RemoteTalk:     -
SerialNo:       101962
Failures:       -
Notes:          Old S/N 80086
LastChange:     phil, 13 Apr 93; new default net number

In addition, we run a number of "housekeeping" programs. These programs manage the "old" copies of these input and output files, check syntax, and perform other tasks.

Network Monitoring

Remember this? It is, after all, the first thing that comes to mind when one uses the term "network management." Well, we do it, too. We use a locally-written program that, in essence, reads in a configuration file and sits in a loop, firing off queries to devices and recording the results. The basic queries that we use are:

  • Are you up? The question can be asked in these ways:
    • ICMP Echo
    • SNMP get sysDescr
    • TCP Telnet connection
    • TCP SMTP connection (checks for response)
    • TCP FTP connection (checks for response)
    • TCP Gopher connection (checks for response)
    • TCP NNTP connection (checks for response)
    • more can be added fairly easily
  • Get interface usage statistics. We mainly use SNMP, but a few other methods are available, too.

For each directive, you can specify its checking interval and the actions to take if it is down or changes state. The actions include running an arbitrary Unix command.
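
As an example of how lightweight these probes are, here is a minimal sketch of a TCP "are you up?" check in Perl. It simply opens a connection to the given port and, for services that greet the caller (SMTP, FTP, NNTP), reads the first line of the banner. The hostname and the list of checks are placeholders; in practice they come from the generated configuration file.

#!/usr/bin/perl
# Minimal sketch of a TCP "are you up?" probe: connect to a port and, for
# services that send a greeting (SMTP, FTP, NNTP), read the banner line.
use strict;
use warnings;
use IO::Socket::INET;

# Returns (1, banner) if the service answered, (0, reason) otherwise.
sub tcp_check {
    my ($host, $port, $want_banner) = @_;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or return (0, "connect failed: $!");

    my $banner = '';
    if ($want_banner) {
        $banner = <$sock> // '';
        chomp $banner;
        return (0, 'no greeting') unless length $banner;
    }

    close $sock;
    return (1, $banner);
}

# Placeholder host name; the real check list is built from the host data.
my $host = 'host.example.umn.edu';
for my $check ([telnet => 23, 0], [smtp => 25, 1], [nntp => 119, 1]) {
    my ($name, $port, $banner_wanted) = @$check;
    my ($up, $detail) = tcp_check($host, $port, $banner_wanted);
    printf "%-8s %s%s\n", $name, ($up ? 'up' : 'DOWN'),
        ($detail ne '' ? " ($detail)" : '');
}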

The configuration file is built from the per-host data that we maintain. For example, the entry for a twisted pair hub is:

lindh-hub-1                     A       128.101.215.249 ;hubh 24 SNMP-public !
        ;host HP Hub Native
        ;room S22 lindh
        ; lines 001-012

This expands into the following series of directives:

  • are you up? (ICMP Echo) at 5 minute intervals
  • if up, for each of the 14 interfaces, get the usage at hourly intervals

The program that does the expansion keys off the device type, so we can change the monitoring that we do on all devices in a class in one simple operation.
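
A minimal sketch of that expansion step, using an invented directive syntax: the device type selects a template, and the template is stamped out for the host and each of its interfaces.

#!/usr/bin/perl
# Minimal sketch of the configuration expansion: the device type of each
# registered host selects a monitoring template.  The directive syntax and
# the templates here are invented for the example.
use strict;
use warnings;

# Per-class templates: how often to check, and how many interfaces to poll.
my %template = (
    hub    => { check => 'icmp-echo',     interval => '5m',  if_count => 14, if_interval => '1h' },
    router => { check => 'snmp-sysdescr', interval => '1m',  if_count => 18, if_interval => '15m' },
    host   => { check => 'icmp-echo',     interval => '15m', if_count => 0 },
);

sub expand {
    my ($name, $ip, $class) = @_;
    my $t = $template{$class} or die "no template for class '$class'\n";

    print "check $name $ip $t->{check} every $t->{interval}\n";
    for my $if (1 .. $t->{if_count}) {
        print "  ifstats $name $ip if=$if every $t->{if_interval} depends-on $name\n";
    }
}

# Example record, corresponding to the hub entry shown above.
expand('lindh-hub-1', '128.101.215.249', 'hub');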

Aside from simply being able to handle the number of devices that it does (we currently monitor about 6,800 data points in the network), the program has these features:

  • The response history of each data point is tracked and the intervals are dynamically adjusted. This means that, as the network gets busy, the monitoring program automatically backs off.
  • Each of the directives can depend upon the results of others. For example, all directives in a building depend on the status of the main hub. If that is down, the program does not try to monitor the rest of the building. This means that, if we start having network problems, the network is not flooded with monitoring traffic. In the case of marginal links, this is very important! (A sketch of this logic follows the list.)
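
Here is a minimal sketch of those two behaviours, with invented data structures: stretch the polling interval when recent responses have been poor, and skip any directive whose parent is known to be down.

#!/usr/bin/perl
# Minimal sketch of two scheduler behaviours described above: back off the
# polling interval when recent probes have gone badly, and skip directives
# whose parent (e.g., the building's main hub) is already known to be down.
# Data structures are invented for the example.
use strict;
use warnings;

# Stretch the interval toward a ceiling as the recent failure rate rises.
sub next_interval {
    my ($base, $history) = @_;              # $history: list of 0/1 results
    my $recent = @$history || 1;
    my $fails  = grep { !$_ } @$history;
    my $factor = 1 + 3 * $fails / $recent;  # 1x .. 4x the base interval
    my $max    = 4 * $base;
    my $next   = int($base * $factor);
    return $next > $max ? $max : $next;
}

# Skip a directive if anything it depends on is down.
sub should_poll {
    my ($directive, $status) = @_;          # $status: name -> 'up' or 'down'
    for my $parent (@{ $directive->{depends_on} || [] }) {
        return 0 if ($status->{$parent} || 'down') eq 'down';
    }
    return 1;
}

# Example: a host behind a hub that has stopped answering.
my %status    = ('lindh-hub-1' => 'down');
my $directive = { name => 'norge', depends_on => ['lindh-hub-1'] };

printf "poll %s? %s\n", $directive->{name},
    should_poll($directive, \%status) ? 'yes' : 'no (parent down)';
printf "next interval: %d seconds\n", next_interval(300, [1, 0, 0, 1]);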

Those dependencies are calculated by a program, which takes as input a starting place in the network. The easiest way to debug this program was to print out its idea of the network map. A little (well, a lot) of polishing later, we now have a program that can automatically create and print a map of the network, with all data fetched from the configuration files or the network itself. This has proven very useful.
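
The map printing itself is little more than a walk of that dependency tree. A minimal sketch, with the tree data invented for the example:

#!/usr/bin/perl
# Minimal sketch of printing a network "map" from the computed dependencies:
# a depth-first walk from the starting place, indenting one level per hop.
# The tree here is invented for the example.
use strict;
use warnings;

my %children = (
    'core'          => ['bldg-router-1', 'lindh-hub-1'],
    'bldg-router-1' => ['chem-hub-1', 'chem-hub-2'],
    'lindh-hub-1'   => ['norge'],
);

sub print_map {
    my ($node, $depth) = @_;
    print '  ' x $depth, $node, "\n";
    print_map($_, $depth + 1) for @{ $children{$node} || [] };
}

print_map('core', 0);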

The operations staff needs to know what (if any) devices are down right now. So, we have a program that sorts through all the data collected by the network monitoring program and shows which ones are down at any given time. This program displays simple ASCII text. No fancy graphics, but the only people who see it aren't the sort who are impressed by gloss.

We also have programs to post-process the statistics and produce reports. We collect over 10 MBytes of raw statistics each day, and each daily report is about a megabyte. Of all this, we only actually look at parts that are important on any given day.

Future Software

So far, the people who have looked in detail at our system have all told us that we are way in advance of commercially-available packages. That said, we would like to eventually be able to switch to commercial packages, so that we can put our energy into improving the network in other ways. After all, networking became much better when we could buy Cisco routers instead of maintaining (or writing) gated on a Sun...

In particular, we would like to improve these areas:

  • Move core data collection to a real database, and have a nice front-end for entering and changing host data. The front-end should, of course, do extensive cross-checking...
  • Add a graphical display of network status. But one that we don't have to manage the layout for.
  • And, of course the package must interface with all of our existing programs and data.

Notes

All counts and statistics on our network were produced by our network management software. While accurate at the time produced, these numbers change over time and so all numbers were converted to approximations.

For Further Reading

Finseth, Craig (1992) "Thoughts on Network Management at the University of Minnesota". Published in ConneXions December 1992 and available from mail.unet.umn.edu in ~ftp/unet/wiring/net-management.

Midden, Marshall (1993) "Making a Network Map". Published in ConneXions June 1993.

Author

Craig A. Finseth
Networking Services
Computer and Information Services
University of Minnesota
130 Lind Hall
207 Church St SE
Minneapolis MN 55455-0134

Craig.A.Finseth-1@umn.edu
fin@unet.umn.edu

+1 612 624 3375 desk
+1 612 626 1002 fax
