
Watchdog possible problems

Posted: Wed Jan 15, 2003 9:11 am
by luben
Hello,

When talking about the watchdog, we usually believe that it is more than enough to reset the timer on every call of OSSched().

In my experience, in some cases this is far from the truth: the kernel can be running and OSSched() can be called frequently, yet at the same time the tasks can be locked up and not working!

I had a case that is somehow connected with the last problem I found in SALVO, the bank problem (now solved). I was very surprised to see that the system had stopped responding completely, but the kernel was still cycling and the watchdog was ON; because the watchdog is cleared from OSSched(), no reset of the system occurred.
My investigation led me to the source of the problem: the interrupt that is responsible for calling OSTimer() had somehow been disabled.

In short: if you use the time-related features of SALVO (those connected with OSTimer(), such as OS_Delay and timeouts), resetting the watchdog only in the kernel is not enough to guarantee that the system is not locked or hung. To guarantee that the system is really OK we need another, more complicated approach: we should also check whether OSTimer() is being called frequently (in cases where OSTimer() is used).
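One possible way to check both conditions is sketched below, as a minimal illustration only. All names here (`liveness_mark_timer`, `liveness_mark_sched`, `liveness_service_watchdog`, `liveness_clear_count`) are hypothetical and are not part of SALVO; the counter `wdt_clears` merely stands in for the real watchdog-clear instruction of the target. The idea is that the watchdog is cleared only when both the scheduler loop and the interrupt that calls OSTimer() have reported in since the last clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these are SALVO APIs. */

static volatile bool timer_alive;  /* set by the tick interrupt */
static volatile bool sched_alive;  /* set by the scheduler loop */
static uint32_t wdt_clears;        /* stands in for the hardware watchdog clear */

/* Call from the interrupt handler that also calls OSTimer(). */
void liveness_mark_timer(void) { timer_alive = true; }

/* Call once per pass of the loop that calls OSSched(). */
void liveness_mark_sched(void) { sched_alive = true; }

/* Clear the watchdog only if BOTH sources have reported since the
   last clear. Returns true if the watchdog was cleared. */
bool liveness_service_watchdog(void)
{
    if (!(timer_alive && sched_alive)) {
        return false;              /* one source is silent: let the watchdog run */
    }
    timer_alive = false;
    sched_alive = false;
    wdt_clears++;                  /* on the target: clear the watchdog here */
    return true;
}

/* Number of times the (simulated) watchdog has been cleared. */
uint32_t liveness_clear_count(void) { return wdt_clears; }
```

With a scheme like this, the watchdog period has to be longer than the tick period, otherwise the watchdog would trip between two normal ticks. If the timer interrupt is disabled, `timer_alive` is never set again, the watchdog is no longer cleared, and the part resets.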

I found this potential problem just now and have not had time to work out a solution. I always believed that resetting the watchdog in the kernel was enough... wrong. Just try disabling the interrupt that calls OSTimer() and you'll see the system hang. This can be done with one simple instruction like

code:

T0IE = 0; // the interrupt that calls OSTimer() is disabled, and the system hangs...
......
// or:
GIE = 0; // this could happen often - the user often disables and reenables GIE


Look at the problem from a different angle: one wrong instruction, one unintentional clearing of a single bit... and BANG! I mean, this is one of the weak spots of SALVO... be aware!

Well, I know that everybody tries to avoid this, but I have witnessed how one unpredictable change of this bit makes the system hang, even though the kernel is still cycling.

Any suggestions on how this could be avoided are welcome.

Best regards
Luben


Re: Watchdog possible problems

Posted: Wed Jan 15, 2003 11:46 am
by luben
Hello,

I hesitated over whether to move this topic to another forum rather than leave it as a bug report. In fact it's not a clear case of a BUG, but it is a case where you rely on the current SALVO functionality and can get incorrect operation that is not documented at all.

In addition, OSCLRWATCHDOG is incorporated into OSSched()... but it seems that it is not complete; it doesn't cover all possible problems.

Regards
Luben


Re: Watchdog possible problems

Posted: Thu Jan 16, 2003 9:07 am
by aek
Hi Luben.

This is a good illustration of why one can't blindly count on a watchdog timer to solve everything ... :-)

The Salvo libraries that include clearing of watchdogs are provided because they handle the watchdog issue in a reasonable way for many users.

But as you have illustrated, situations can arise where the system "stops running" and yet the watchdog timer does not trip. For advanced users, a source-code build with watchdog timer reset in an alternate location is a better approach.
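As a rough illustration of what an "alternate location" could look like, the host-side simulation below moves the watchdog clear out of the scheduler pass and into a task that only runs when its delay expires. Everything in it is hypothetical (`sim_tick`, `sim_sched_pass`, `WDT_LIMIT`, `TASK_DELAY`) and it does not use any Salvo calls; it only models the behavior. Because the clear now depends on ticks arriving, the simulated watchdog trips once the ticks stop, even though the scheduler keeps cycling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side simulation with hypothetical names; not Salvo code. */

#define WDT_LIMIT  10u  /* scheduler passes before the simulated watchdog trips */
#define TASK_DELAY  2u  /* ticks the watchdog task sleeps between runs */

static uint32_t wdt_count;                    /* simulated watchdog counter */
static uint32_t ticks_remaining = TASK_DELAY; /* delay left for the watchdog task */

/* Stands in for the tick interrupt: counts down the task's delay. */
void sim_tick(void)
{
    if (ticks_remaining > 0u) {
        ticks_remaining--;
    }
}

/* One scheduler pass. The watchdog is cleared only when the delayed
   task becomes ready, so it keeps counting if ticks stop arriving.
   Returns true if the simulated watchdog has tripped. */
bool sim_sched_pass(void)
{
    if (ticks_remaining == 0u) {
        wdt_count = 0u;                /* the task runs and clears the watchdog */
        ticks_remaining = TASK_DELAY;  /* the task delays itself again */
    } else {
        wdt_count++;                   /* no clear on this pass */
    }
    return wdt_count >= WDT_LIMIT;
}
```

The trade-off of this placement is that the watchdog timeout must comfortably exceed the task's delay, and a failure in that one task will also reset the system.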
