davidrickard.net

Random stuff, randomly updated.

Totally faxed up

Friday, January 29th, 2010

The CallManager system I look after recently developed an odd issue. Faxes had been working quite happily for some time, but suddenly they were failing left, right and centre. Not only that, we had quite a few credit card machines running over VoIP which were failing to take payments. All ran via Cisco ATA 186 analogue gateways.

I ran various tests and found that faxes between ATAs internally were perfectly fine. As soon as they went through the ISDN gateways, they failed miserably. I ran through some of Cisco’s help guides but was drawing complete blanks. I’d run the prserv tool to see what the ATAs were up to whilst making calls, and I’d see ‘resync’ go whizzing by amongst a load of other seemingly random numbers. The word ‘resync’ suggested to me that the ATA was hiccuping on something and adjusting the audio stream as a result.

Analogue modems expect a constant stream of data. It might get fuzzy, or drop out, but it will always come along in a specific order, at a certain time. It’s predictable. Resyncing the stream mid-call doesn’t sit well with a modem. In every case where I saw ‘resync’, the fax ended up corrupted or dropped entirely, depending on when it happened during the call.

I did a little digging on the Cisco TAC case collection, and found what I was looking for. It was something I fiddled with some time ago.

ISDN circuits rely on a clock. Generally speaking, the clock source is the telco end. We have four ISDN links on our gateways – two out to the PSTN, and two QSIG links to the old PBX. We had some odd issues with echo, and one thing we tried was forcing clock sync on the E1 controllers.

E1 and T1 controllers can exhibit something known as ‘slipped seconds’. This is basically where the clocks at both ends drift slightly out of sync with each other. In some instances it can cause echo, so we’d nailed up the QSIG links to use the clock at the legacy PBX end. However, it seems that with the 2-port WICs, this causes BOTH ports to sync to that clock.

Up to this point it hadn’t been the issue. There was the odd dropped fax, but nothing overly bad. A week or so ago (the week I was off, natch), the faxes all pretty much failed simultaneously. Voice calls remained perfectly fine, which made it all the more perplexing. Luckily I found the info I needed in the TAC collection.

The issue shows up as slipped seconds in the controller statistics. On the router, I ran the following:

router#sh controller e1 0/0/0
E1 0/0/0 is up.
  Applique type is Channelized E1 - balanced
  No alarms detected.
  alarm-trigger is not set
  Version info Firmware: 20060707, FPGA: 13, spm_count = 0
  Framing is CRC4, Line Code is HDB3, Clock Source is Line.
  CRC Threshold is 320. Reported from firmware  is 320.
  Data in current interval (618 seconds elapsed):
     0 Line Code Violations, 0 Path Code Violations
     181 Slip Secs, 0 Fr Loss Secs, 0 Line Err Secs, 0 Degraded Mins
     181 Errored Secs, 0 Bursty Err Secs, 0 Severely Err Secs, 0 Unavail Secs
  Data in Interval 1:
     0 Line Code Violations, 0 Path Code Violations
     262 Slip Secs, 0 Fr Loss Secs, 0 Line Err Secs, 0 Degraded Mins
     262 Errored Secs, 0 Bursty Err Secs, 0 Severely Err Secs, 0 Unavail Secs

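There we see some ‘slip secs’: 181 in the current interval (618 seconds in) and 262 in the interval before it. A link with good clocking should show none, or very nearly none.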

The ‘fix’ is quite simple. In global configuration mode, I issued the commands:

network-clock-participate wic 0
network-clock-select 1 E1 0/0/0

The first line tells the router that WIC 0 should take part in its network clocking, and the second selects E1 0/0/0 as the priority 1 clock source. Once I’d done that, the faxes magically all worked again.
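
For anyone retracing this, the whole change could look roughly like the session below. It is a sketch rather than captured output: it assumes the same numbering as above (WIC 0, controller E1 0/0/0), that both commands are entered from global configuration mode, and that your IOS release offers ‘show network-clocks’ for checking which source is in use.

```
router#configure terminal
! Let the WIC in slot 0 take part in the router's network clocking
router(config)#network-clock-participate wic 0
! Use the clock recovered from E1 0/0/0 as the priority 1 source
router(config)#network-clock-select 1 E1 0/0/0
router(config)#end
! Assumption: 'show network-clocks' is available on this IOS release
router#show network-clocks
! Re-check the controller; Slip Secs should stop climbing
router#sh controller e1 0/0/0
```

Counters for earlier intervals keep their old values, so the figure to watch after the change is the one for the current interval.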

http://www.cisco.com/cisco/web/support/index.html – TAC case collection requires a CCO login.