Switch performance measurements


Software overview

The switch performance measurements are done using programmable NICs based on the Tigon 2 chip. The software consists of 5 main components:

  1. Dedicated firmware running on the NICs
  2. A device driver running on the host PC (currently available for Linux kernels 2.2.x and 2.4.x; a version for Windows NT 5.x is under study). The Linux device driver, tigon, also supports remote debugging via gdb.
  3. tglib: a user-space support library for talking to the driver
  4. tgrctl: a UDP-based server program for remote control of the driver and hence the NIC. It is a daemon process implementing a very simple protocol for reading and writing NIC memory and registers, starting and stopping the firmware, etc. It is implemented on top of tglib.
  5. pptgctrl: a client which can talk to an arbitrary number of servers. Currently there is only one implementation (for Python 2.x or higher), which offers a GUI based on Tkinter.
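
The division of labour between tgrctl and pptgctrl can be illustrated with a minimal sketch of a UDP request/reply exchange in which one client addresses several servers. The wire format of the real tgrctl protocol is not described here, so everything below is an assumption made for illustration: the message layout, the opcodes, the status codes and all function names are hypothetical, and a Python dictionary stands in for NIC memory instead of calls into tglib.

```python
import socket
import struct
import threading

# Hypothetical wire format (NOT the real tgrctl protocol):
#   request: opcode (1 byte), address (4 bytes), value (4 bytes), network byte order
#   reply:   status (1 byte), value (4 bytes)
REQUEST = struct.Struct("!BII")
REPLY = struct.Struct("!BI")
OP_READ, OP_WRITE = 1, 2
STATUS_OK, STATUS_BAD_OP = 0, 1


def handle_request(memory, datagram):
    """Apply one request to a dict standing in for NIC memory; return reply bytes."""
    opcode, address, value = REQUEST.unpack(datagram)
    if opcode == OP_READ:
        return REPLY.pack(STATUS_OK, memory.get(address, 0))
    if opcode == OP_WRITE:
        memory[address] = value
        return REPLY.pack(STATUS_OK, value)
    return REPLY.pack(STATUS_BAD_OP, 0)


def serve(sock, memory, count):
    """Answer `count` datagrams, then return (a stand-in for the daemon loop)."""
    for _ in range(count):
        datagram, peer = sock.recvfrom(REQUEST.size)
        sock.sendto(handle_request(memory, datagram), peer)


def request(server, opcode, address, value=0, timeout=2.0):
    """Send one request to `server` (host, port) and return (status, value)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(REQUEST.pack(opcode, address, value), server)
        reply, _ = sock.recvfrom(REPLY.size)
    return REPLY.unpack(reply)


def demo():
    """Start two loopback servers, write to each, and read the values back."""
    servers, threads = [], []
    for _ in range(2):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        thread = threading.Thread(target=serve, args=(sock, {}, 2), daemon=True)
        thread.start()
        servers.append((sock, sock.getsockname()))
        threads.append(thread)
    results = []
    for index, (_, endpoint) in enumerate(servers):
        request(endpoint, OP_WRITE, 0x1000, 100 + index)
        results.append(request(endpoint, OP_READ, 0x1000))
    for thread in threads:
        thread.join(timeout=5.0)
    for sock, _ in servers:
        sock.close()
    return results
```

Calling `demo()` writes a different value to the same address on each of the two loopback servers and reads it back, which shows that each server keeps its own state while the client simply iterates over a list of endpoints. UDP gives no delivery guarantee, so the sketch relies on a receive timeout in `request`; a real client would presumably also need a retry policy.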
     

To get the system running, the driver has to be installed and tgrctl must be started (which requires root privileges). Note that, because of the IRQ control, the tigon device driver cannot coexist with the standard Ethernet driver (acenic) for the same card. Once the server is running, the GUI client can be started on any machine which has Python installed.

Caveats:

  1. There is no security in accessing the server. In principle this could be improved by careful use of special service accounts (instead of root) and/or by using secure sockets.
  2. The client-server system has been designed for this specific purpose, and it is easy to implement new functionality when the capabilities of the driver or firmware are extended. However, this is not run-time configurable; it requires a change in the server code. A more general, production-ready solution will probably use a dedicated package such as DIM.
  3. Starting the firmware without properly installing the driver, or uninstalling the driver without properly stopping the firmware first, may result in a kernel oops due to spurious uncaught interrupts (creating such a situation requires, however, quite some ingenuity :-).
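
The ordering constraint in the last caveat can be made explicit with a small guard on the controlling side. The following is only a sketch of the idea, not part of the existing software: the class `NicLifecycle`, the exception `LifecycleError` and the method names are hypothetical, and the methods only track state rather than touching a real driver or NIC.

```python
class LifecycleError(RuntimeError):
    """Raised when an operation would violate the safe ordering."""


class NicLifecycle:
    """Tracks driver and firmware state for one NIC (illustrative model only)."""

    def __init__(self):
        self.driver_installed = False
        self.firmware_running = False

    def install_driver(self):
        self.driver_installed = True

    def start_firmware(self):
        # The firmware must not run without the driver handling its interrupts.
        if not self.driver_installed:
            raise LifecycleError("install the driver before starting the firmware")
        self.firmware_running = True

    def stop_firmware(self):
        self.firmware_running = False

    def uninstall_driver(self):
        # Removing the driver under running firmware leaves interrupts uncaught.
        if self.firmware_running:
            raise LifecycleError("stop the firmware before uninstalling the driver")
        self.driver_installed = False
```

With such a guard the safe sequence (install, start, stop, uninstall) passes, while the two problematic orderings named in the caveat raise an error before anything reaches the hardware.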

 

This page last edited by Niko Neufeld on December 02, 2001.