By James Hanback
As you move on in your career, perhaps taking on a server administrator or network administrator role, the simple restart becomes a much less practical means of fixing problems. Unless you have munged a network device’s running configuration but not yet written it to the device, or there’s a memory leak, or there’s some other hardware issue that only a power cycle can solve, rebooting a router or a managed switch is unlikely to be a magic solution to everything. However, it is a great way to temporarily disrupt your company’s business and irritate co-workers, especially if you do it without notifying them. Gone are the days when you can rely on an operating system’s simple, dying demand that you restart, most famously illustrated by Susan Kare in old versions of Mac OS by a friendly little bomb with a lit fuse. Now you must interpret the meanings of error messages and use them as guides in your troubleshooting quests.
Sep 5 09:10:28.087: %LINK-3-UPDOWN: Interface FastEthernet0/0, changed state to up
Sep 5 09:10:29.087: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to up
The above is console output that displays two different interface messages in the standard Berkeley Software Distribution (BSD) Syslog format as it is implemented in Cisco IOS. The first message indicates that the FastEthernet 0/0 interface has transitioned to the up state. The second message indicates that the line protocol for the FastEthernet 0/0 interface has transitioned to the up state. Syslog message format on Cisco devices can be identified by the presence of a percent sign (%) followed by a facility code, a severity code, and a mnemonic code, all separated by dashes. The dash-separated codes are followed by a colon and descriptive text that explains, in somewhat plainer terms, the event that has occurred.
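To make the format concrete, here is a minimal sketch in Python that splits a Cisco-style message into the parts described above. The function name parse_cisco_syslog and the CiscoSyslogMessage type are hypothetical names chosen for illustration, and the pattern assumes that facility and mnemonic codes consist only of uppercase letters, digits, and underscores, which holds for the sample output but might not hold for every message a device can generate.

```python
import re
from typing import NamedTuple, Optional


class CiscoSyslogMessage(NamedTuple):
    """Hypothetical container for the parts of one Cisco-style message."""
    timestamp: Optional[str]  # None when the device does not log timestamps
    facility: str
    severity: int
    mnemonic: str
    description: str


# Optional timestamp, then %FACILITY-SEVERITY-MNEMONIC: description.
# Assumption: facility and mnemonic use only A-Z, 0-9, and underscores.
_PATTERN = re.compile(
    r"^(?:(?P<timestamp>.*?): )?"
    r"%(?P<facility>[A-Z0-9_]+)"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+)"
    r": (?P<description>.*)$"
)


def parse_cisco_syslog(line: str) -> Optional[CiscoSyslogMessage]:
    """Return the parsed message, or None if the line does not fit the pattern."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    return CiscoSyslogMessage(
        timestamp=match.group("timestamp"),
        facility=match.group("facility"),
        severity=int(match.group("severity")),
        mnemonic=match.group("mnemonic"),
        description=match.group("description"),
    )


if __name__ == "__main__":
    sample = ("Sep 5 09:10:28.087: %LINK-3-UPDOWN: "
              "Interface FastEthernet0/0, changed state to up")
    print(parse_cisco_syslog(sample))
```

Because the timestamp group is optional, the same function should also accept lines that begin directly with the percent sign.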
Each of the messages in our example begins with a timestamp that indicates the precise date and time the event occurred, at least according to the device’s system clock (and that’s a whole different story). If you’re not seeing timestamps similar to the ones above on your Cisco device, you might need to issue the service timestamps log datetime msec command in global configuration mode. If your system is not configured to display timestamps with log messages, the lines of output above would each begin with the percent sign. However, in this case, we can see that the physical interface transition occurred on Sept. 5 at 9:10 a.m. and 28 seconds. The line protocol transition occurred exactly one second later.
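If you want to check that kind of arithmetic in a script rather than in your head, a short Python sketch along these lines might do. The timestamps in the sample carry no year, so the year argument below is purely an assumption supplied by the caller, and parse_log_time is a hypothetical helper name. The month abbreviation is parsed according to the current locale, so an English locale is assumed as well.

```python
from datetime import datetime


def parse_log_time(text: str, year: int) -> datetime:
    """Parse a timestamp such as 'Sep 5 09:10:28.087'.

    The log line has no year, so the caller must supply one (an assumption).
    """
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S.%f")


if __name__ == "__main__":
    assumed_year = 2024  # hypothetical; the sample output does not say
    link_up = parse_log_time("Sep 5 09:10:28.087", assumed_year)
    protocol_up = parse_log_time("Sep 5 09:10:29.087", assumed_year)
    print((protocol_up - link_up).total_seconds())
```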
Following the timestamp is the percent sign and the facility code. Facility codes on Cisco devices are typically free-form keywords and are used to identify the service or software that generated the message. The first line of output above has a facility code of LINK, enabling you to surmise that the event occurred on an interface. The second line of output has a facility code of LINEPROTO, enabling you to surmise that the event occurred on the line protocol. However, Cisco’s facility codes deviate from the ones that are defined in Request for Comments (RFC) 5424 and that are found on different Unix and Linux systems.
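Severity codes on Cisco devices, on the other hand, stick to the standards. Severity codes are numeric values in the range from 0 through 7 and have the following meanings:
0 (Emergency): the system is unusable
1 (Alert): action must be taken immediately
2 (Critical): critical conditions
3 (Error): error conditions
4 (Warning): warning conditions
5 (Notice): normal but significant conditions
6 (Informational): informational messages
7 (Debug): debug-level messages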
Severity codes can therefore be used to determine the urgency of a Syslog message—the lower the severity code value, the higher the urgency. On Cisco devices you can filter console Syslog messages by severity level by issuing the logging console severity-level command, which causes only the Syslog messages with the specified severity level and lower numbers to be displayed on the console. In the link and line protocol output example, the link message has a severity code of 3, which is defined as an error. The line protocol message has a severity code of 5, which is defined as a notice. A flapping interface will always have a severity code of 3 and a flapping line protocol will always have a severity code of 5. Issuing the logging console 3 command would enable you to always see an interface flapping between the up and down state but would prevent you from seeing whether the line protocol was flapping. Naturally, the narrower your severity level filter, the fewer console Syslog messages you should see; unless, of course, the device in question exists in a state of constant emergency. You can also configure the severity level of the messages sent to a Syslog server by issuing the logging trap severity-level command in global configuration mode.
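The filtering rule is easy to model. The following Python sketch is only an illustration of the comparison described above, not of how IOS implements it; the names shown_at_level and SAMPLE_MESSAGES are hypothetical.

```python
from typing import List, Tuple

# Standard severity names, keyed by numeric code.
SEVERITY_NAMES = {
    0: "emergency", 1: "alert", 2: "critical", 3: "error",
    4: "warning", 5: "notice", 6: "informational", 7: "debug",
}

# (severity, text) pairs taken from the sample console output.
SAMPLE_MESSAGES: List[Tuple[int, str]] = [
    (3, "%LINK-3-UPDOWN: Interface FastEthernet0/0, changed state to up"),
    (5, "%LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, "
        "changed state to up"),
]


def shown_at_level(severity: int, configured_level: int) -> bool:
    """A message is shown when its severity number is at or below the level."""
    return severity <= configured_level


if __name__ == "__main__":
    level = 3  # models the effect of 'logging console 3'
    for severity, text in SAMPLE_MESSAGES:
        if shown_at_level(severity, level):
            print(f"[{SEVERITY_NAMES[severity]}] {text}")
```

With a level of 3, only the link message passes the filter; raising the level to 5 would let the line protocol message through as well.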
Enough with the numbers. Let’s move on to some more meaningful words. The severity code is followed by the mnemonic code, which is a simple abbreviation describing the event that occurred. In the sample output, the mnemonic code for both the link message and the line protocol message is UPDOWN, which indicates that the interface and line protocol have both transitioned to either the up state or the down state. The mnemonic code informs you about what happened but doesn’t give you the details of the event. That chore is reserved for the description that follows the mnemonic code.
The description is simply that: a phrase or sentence that more completely describes the event that occurred. In the sample output, the link message has the following description:
Interface FastEthernet0/0, changed state to up
Therefore, the description has completed the information that the mnemonic code only hinted at. You now know not only that the interface state changed, but that it transitioned to the up state. You also now know which interface made the transition: the FastEthernet 0/0 interface.
You might be thinking, “That’s great and all, but how does it help me to have to visit a dozen devices and scroll through a bunch of console Syslog messages to find out what’s wrong?” That’s where a Syslog server comes in. Thanks to the Syslog protocol, you do not necessarily need to monitor individual devices and sift through tens of thousands of Syslog messages at each console. A device that generates Syslog messages can be configured to send those messages to a central Syslog server, which is a Syslog message collector that typically listens on User Datagram Protocol (UDP) port 514. Some collectors allow you to configure the port number. On a Cisco device, you can direct Syslog messages to a Syslog server by issuing the logging host ip-address command in global configuration mode, where ip-address is the IP address of the Syslog server. Keep in mind that, if you’ve already issued the logging trap severity-level command, only Syslog messages of the specified and lower-numbered severity levels will be sent to the server.
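For a sense of how little a bare-bones collector needs, here is a rough Python sketch built on the standard library’s socketserver module. It is a toy for experimentation, not a substitute for a real Syslog server. The names format_record, SyslogHandler, and run_collector are hypothetical, and the listening port of 5514 is an assumption chosen because binding to port 514 usually requires elevated privileges.

```python
import socketserver

# Assumption: an unprivileged port for testing; real collectors typically use 514.
LISTEN_ADDRESS = ("0.0.0.0", 5514)


def format_record(source_ip: str, payload: bytes) -> str:
    """Turn one received datagram into a single log line."""
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{source_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    """Handles one UDP datagram per call."""

    def handle(self) -> None:
        # For UDP servers, self.request is a (data, socket) pair.
        payload, _socket = self.request
        print(format_record(self.client_address[0], payload))


def run_collector() -> None:
    """Listen until interrupted, printing every message that arrives."""
    with socketserver.UDPServer(LISTEN_ADDRESS, SyslogHandler) as server:
        server.serve_forever()
```

Calling run_collector() starts the listener. A device pointed at this host would also need to be told to use the nonstandard port.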
Syslog messages that are sent over UDP can be broken into three parts: PRI, HEADER, and MSG. The PRI part is an eight-bit numeric representation of the facility code and the severity code enclosed in angle brackets; its value is the numeric facility multiplied by 8, plus the severity. The facility encoded in the PRI part is one of the standard numeric Syslog facilities, not the free-form keyword, such as LINK, that follows the percent sign. The HEADER part is the timestamp and the name of the host that sent the packet. The MSG part contains the same percent-sign-denoted text as the messages displayed on the console of a Cisco device. You can configure a Syslog server to parse the messages and store them in separate log files (or display them to the server device’s console or send them to other places) depending on source host, facility, severity, or a number of other criteria. Log files can then be viewed at your leisure or in real-time as events are added to them. Such files should be rotated out on a regular basis to ensure that they do not become too large to parse easily. It’s also not wise to allow a Syslog server to accept messages from just any old host out there somewhere, because doing so could easily fill all your storage space. You can configure a Syslog server to accept messages from only specific host addresses or ranges of host addresses.
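Two of the ideas in the preceding paragraph, the PRI value and the list of permitted senders, can be sketched in a few lines of Python. In the standard Syslog encoding, the PRI number is the numeric facility multiplied by 8, plus the severity. The example assumes the sending device uses facility 23 (local7), which is the usual Cisco IOS default but is configurable, so check your own devices. The function names and the addresses, which come from ranges reserved for documentation, are hypothetical.

```python
import ipaddress
import re
from typing import Iterable, Tuple

_PRI = re.compile(r"^<(\d{1,3})>")


def encode_pri(facility: int, severity: int) -> int:
    """Combine a numeric facility and severity into one PRI value."""
    return facility * 8 + severity


def decode_pri(message: str) -> Tuple[int, int]:
    """Return (facility, severity) from a message that starts with <PRI>."""
    match = _PRI.match(message)
    if match is None:
        raise ValueError("message does not begin with a PRI part")
    return divmod(int(match.group(1)), 8)


def is_allowed_source(source_ip: str, allowed: Iterable[str]) -> bool:
    """True when source_ip falls inside any allowed host or network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(entry) for entry in allowed)


if __name__ == "__main__":
    LOCAL7 = 23  # assumption: the sender uses facility local7
    print(encode_pri(LOCAL7, 5))
    print(decode_pri("<189>%LINEPROTO-5-UPDOWN: Line protocol on Interface "
                     "FastEthernet0/0, changed state to up"))
    # Hypothetical allow list: one subnet and one single host.
    allow_list = ["192.0.2.0/24", "198.51.100.7"]
    print(is_allowed_source("192.0.2.10", allow_list))
    print(is_allowed_source("203.0.113.5", allow_list))
```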
In addition to simply collecting the Syslog messages in a central location for review, there are relational database solutions that enable you to filter the messages that are captured by Syslog servers to make them more sortable, reportable, and meaningful. For example, searching a database of Syslog messages might make it faster and easier to find events that were generated by a specific device out of the many that are logging to the same server. If you cleverly take advantage of the Syslog format combined with the benefits of a server and a relational database, you might also be able to implement an automated ticketing system that alerts the appropriate people or initiates an appropriate action depending on the device, event type, or urgency of the event.
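As a small-scale illustration of the database idea, the Python sketch below stores a few events in an in-memory SQLite database and then pulls out only the urgent ones for a single device. The table layout, the function names, and the host names core-rtr-1 and edge-sw-2 are all hypothetical; a production system would more likely use a purpose-built log management product.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with one table of Syslog events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "host TEXT, facility TEXT, severity INTEGER, "
        "mnemonic TEXT, description TEXT)"
    )
    return conn


def store_event(conn: sqlite3.Connection, host: str, facility: str,
                severity: int, mnemonic: str, description: str) -> None:
    """Insert one event, using placeholders rather than string formatting."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (host, facility, severity, mnemonic, description),
    )


def events_for_host(conn: sqlite3.Connection, host: str,
                    max_severity: int) -> List[Tuple[str, int, str, str]]:
    """Return one host's events at or below max_severity, most urgent first."""
    cursor = conn.execute(
        "SELECT facility, severity, mnemonic, description FROM events "
        "WHERE host = ? AND severity <= ? ORDER BY severity",
        (host, max_severity),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = open_store()
    store_event(db, "core-rtr-1", "LINK", 3, "UPDOWN",
                "Interface FastEthernet0/0, changed state to up")
    store_event(db, "core-rtr-1", "LINEPROTO", 5, "UPDOWN",
                "Line protocol on Interface FastEthernet0/0, changed state to up")
    store_event(db, "edge-sw-2", "LINK", 3, "UPDOWN",
                "Interface FastEthernet0/1, changed state to down")
    for row in events_for_host(db, "core-rtr-1", 3):
        print(row)
```

The same query, run on a schedule, could feed the kind of alerting or ticketing workflow described above.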
Of course, all of the above build-out depends on the size and scale of your organization. If you’re a team of one (I know, I know; there’s no “I” in “team”) or you have exactly one device or system that uses Syslog-formatted messages, your best bet might be to simply take a look at the affected device’s console or log files and see what’s scrolling there. It’ll be your phone blowing up with all those cries for help from the darkness anyway.
Or you could light a few candles, say a few incantations, and reboot.
Who knows? It might work.
Photo: Steven Snodgrass