Pumpkin, Inc.

Pumpkin User Forums

Bombarding case of OS_WaitXYZ

Postby aek » Tue Dec 03, 2002 1:21 am

Hi Luben.

The bombarding case is no longer possible, because all OS_WaitXyz() ** always ** context-switch to the scheduler before proceeding, even if the event has been signaled.

So, the manual is correct, but it also says that any OS_Xyz service forces a context switch unconditionally. Of course, that same task will continue after the context switch if it's the highest-priority task.

So, I suspect the problem is somewhere else. Maybe a deadlock due to a shared resource?

aek
 
Posts: 1888
Joined: Sat Aug 26, 2000 11:00 pm

Re: Bombarding case of OS_WaitXYZ

Postby luben » Tue Dec 03, 2002 7:22 am

Hello,

So the problem is in the manual. Look, for example, at page 266 for OS_WaitBinSem(). The "Description" clearly says:
... wait the current task on a binary semaphore with timeout. If the semaphore is 0, return to the scheduler and continue waiting. If the semaphore is 1, reset it to 0 and continue ...

Well, English is not my native language, but what I understood is that the service will context-switch only if the semaphore is not signalled.

To be more exact, I opened the manual of the old SALVO 2.0, where the bombarding case is still possible. The description of OS_WaitBinSem is exactly the same.

Look at it logically: if there is some change in the behaviour, there should be a change in the manual. It's like "find the 10 differences between 2 pictures"... I gave up; I can't find any differences in the manual's description of OS_WaitBinSem.

Or maybe I am wrong? If you wish, ask somebody "outside" SALVO how he understands the text of the manual: when it context-switches and when it does not (be sure not to push him towards some opinion). At least all my friends around me read this part of the manual as "sometimes context switch, sometimes not...."

Best regards
Luben

luben
 
Posts: 324
Joined: Sun Nov 19, 2000 12:00 am
Location: Sofia, Bulgaria

Re: Bombarding case of OS_WaitXYZ

Postby luben » Tue Dec 03, 2002 7:27 am

Hello,

This is a job for the SALVO monitor: if it could measure the maximum time each task holds control, I could just browse these values and find which task holds control for too long.

Maybe by the end of this week I'll have a monitor-like prototype ready.

Regards
Luben

Re: Bombarding case of OS_WaitXYZ

Postby aek » Tue Dec 03, 2002 8:27 am

Hi Luben.

Sounds like the manual wasn't changed to accurately reflect the coding changes in v3.x regarding the bombardment case. I'll look into that (i.e. your and your friends' English is entirely correct ... ).

Nonetheless, v3.x OS_Xyz API calls always context-switch unconditionally. Thereafter, the task will either wait (no event available) or continue (event was available). So I suspect the problem in your particular application is somewhere else ... let me know what the monitor shows ...
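The two readings of OS_WaitXyz() discussed in this thread can be compared with a small host-side model. This is only a sketch and not Salvo source code: the enum, the function name and the counting rule are assumptions made for illustration. It counts how many times a task hands control back to the scheduler while it executes a run of consecutive wait calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in Salvo. */
enum wait_policy {
    SWITCH_ONLY_IF_NOT_SIGNALED, /* the reading of the manual text */
    SWITCH_ALWAYS                /* the behaviour described for v3.x */
};

/* Counts how often a task returns to the scheduler while it executes
 * n_waits wait calls in a row. signaled[i] tells whether event i was
 * already signaled when the task reached wait call number i. */
int count_context_switches(enum wait_policy policy,
                           const bool signaled[], int n_waits)
{
    int switches = 0;

    assert(n_waits >= 0);
    for (int i = 0; i < n_waits; i++) {
        if (policy == SWITCH_ALWAYS || !signaled[i]) {
            switches++; /* the task gives up control at this call */
        }
    }
    return switches;
}
```

With ten events that are all already signaled, the first policy gives 0 returns to the scheduler and the second gives 10, which is why the "bombarding case" can only arise under the first one.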

Re: Bombarding case of OS_WaitXYZ

Postby luben » Tue Dec 03, 2002 12:32 pm

Hello,

I think I'm suffering again from a sort of "bombarding case".

In my latest project, a "smart controller for a pad printing machine", I use 7 tasks and 6 events. The processor is a 16F877, with 7.5 Kb of memory used.

The system runs smoothly, except that from time to time it hangs (reset by the WDT). When I disabled the WDT, everything was fine and no visible problems occurred.

I found in the manual that the behaviour of OS_WaitXYZ is described as:
... if the event is already signalled, it returns control to the task.

Imagine that you have 10 OS_WaitXYZ calls one after another for 10 different events, and all the events are already signalled. If I understand SALVO correctly, the task will not context-switch at all. And if these calls are in a loop, a "deadlock" is possible, or at least (as in my case) one task holds control for too long.

That means either:
1. The manual has not been updated with the "bombarding case" info, or
2. The bombarding case still exists when many OS_WaitXYZ calls follow one another. It's an extremely dangerous case, because you cannot spot the problem immediately, only after some time... it could be years. And all you'll be able to say is "noise from the line". In fact this could be compared to the extremely high waves in the ocean, the so-called "solitons": a kind of interference of bad events, where a chain of working OS_WaitXYZ calls at one moment becomes a "deadlock", just as several waves together can make a "soliton".

I know that this behaviour (sometimes context-switch, sometimes not) was chosen to speed up the service. I think it would be OK if the user could choose the behaviour himself: speed or normal OS_WaitXYZ service. I'm not saying that every OS_WaitXYZ should first context-switch and then check events, but sometimes it is simply necessary to do so.

Tomorrow I'll try to locate the problem by simply adding OS_Yield before the calls. I'll keep you posted. Oh, if only there were an option like OSWAITXYZ_BEHAVIOUR SPEED/SWITCH.....

Best regards
Luben

Re: Bombarding case of OS_WaitXYZ

Postby luben » Wed Dec 04, 2002 5:35 am

Hello,

In my case the problem was the sprintf() function with unsigned long variables and some calculations. It seems that it took too long and, when added to OSTimer() (which runs from time to time), caused WDT problems.

Now everything is OK.

Regards
Luben

Re: Bombarding case of OS_WaitXYZ

Postby luben » Wed Dec 04, 2002 8:28 am

Hello,

12-16 ms, the standard WDT period for the PIC (no prescaler). I usually use the watchdog without a prescaler and I have never had problems. I mean, the tasks should return control within this period... or I punish the program.

On the other hand, calculations with unsigned long really take a lot of time, especially when using sprintf(). I tried to make a simple counter of hits up to 99999999. The main problem was that the sprintf("%l8d",(long)hits) sometimes took longer than 12-16 ms. It seems that sprintf() is a really dangerous function. BTW, I suffered a lot when I used an extremely complicated sprintf() call: the result was no error messages at the compile stage, just some overflow messages from the linker. So I had to disable the sprintf() calls one after another to find where the problem was.
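One common way to keep a general-purpose formatter out of a time-critical task is a small fixed-width conversion routine. The sketch below is an assumption about how the 8-digit hit counter could be formatted, not the fix that was actually used; the function name ulong_to_dec is hypothetical. It still divides a long by 10 once per digit, so on a small 8-bit part its execution time should be measured rather than assumed.

```c
#include <assert.h>

/* Writes value as exactly `width` characters, right-aligned and padded
 * with spaces on the left, followed by '\0'. buf must hold width + 1
 * bytes. Digits that do not fit into `width` are dropped from the left. */
void ulong_to_dec(unsigned long value, char *buf, int width)
{
    assert(buf != 0 && width > 0);

    buf[width] = '\0';
    for (int i = width - 1; i >= 0; i--) {
        if (value != 0 || i == width - 1) {
            /* always emit at least one digit, so 0 prints as "0" */
            buf[i] = (char)('0' + (int)(value % 10));
            value /= 10;
        } else {
            buf[i] = ' ';
        }
    }
}
```

For a counter that runs up to 99999999, a 9-byte buffer and a width of 8 are enough. Splitting the conversion over several scheduler passes, a few digits at a time, is another option if even this loop takes too long.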

If the SALVO monitor were working, I could have located the problem within seconds...

I think that even when the system is working, a SALVO monitor could help you find potential, hidden problems. For example, some task may use an amount of time close to the watchdog limit. When this time is added to ISR delays, it could trigger a WDT reset. But this might happen only from time to time, or when the temperature changes and shifts the WDT frequency... not good, huh?

A SALVO monitor could reveal such problems immediately. And if it's offered at a low price (as I told you, the estimated end-user price should not exceed $60-70), nobody will hesitate to monitor the processes and find hidden timing problems.

One of the dangerous sides of using SALVO with the watchdog ON is this: even if the system is working smoothly, it doesn't mean that the WDT could not reset it at some time-critical moment.

BTW, an idea came to my mind: a "software SALVO watchdog". You use one of the timers in the uP, and in OSSched you load it with some value instead of calling clrwdt(). If this timer overflows, that shows the same could happen if the WDT were used. If you connect an LED to some pin to indicate the overflow of the "SALVO software watchdog", you can monitor in real time the appearance of bad events. Even more, you can add a potentiometer and change the period of the software watchdog on the fly, so you can find exactly where the dangerous zone is....
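The "software watchdog" idea can be tried out on a host PC before touching any hardware timer. The following is a minimal model under stated assumptions: the struct and function names are hypothetical, the hardware timer is replaced by a tick function that a periodic interrupt would call, and the LED is represented by a latched flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of a software watchdog; all names are hypothetical. */
struct soft_wdt {
    uint16_t period_ticks; /* ticks allowed between two reloads */
    uint16_t remaining;    /* ticks left before the watchdog expires */
    bool     tripped;      /* latched once expired; would drive the LED */
};

void soft_wdt_init(struct soft_wdt *w, uint16_t period_ticks)
{
    assert(w != 0 && period_ticks > 0);
    w->period_ticks = period_ticks;
    w->remaining = period_ticks;
    w->tripped = false;
}

/* Call once per scheduler pass, where clrwdt() would otherwise go. */
void soft_wdt_reload(struct soft_wdt *w)
{
    w->remaining = w->period_ticks;
}

/* Call from the periodic timer tick. */
void soft_wdt_tick(struct soft_wdt *w)
{
    if (w->remaining > 0) {
        w->remaining--;
    }
    if (w->remaining == 0) {
        w->tripped = true; /* stays set until re-initialised */
    }
}
```

Changing period_ticks at run time, for example from an ADC reading of the potentiometer, corresponds to adjusting the period on the fly. Because the flag is latched, a single late scheduler pass stays visible even if it happens only once in a long test run.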

Regards
Luben

Re: Bombarding case of OS_WaitXYZ

Postby aek » Wed Dec 04, 2002 8:48 am

Hi Luben.

Out of curiosity, what was the WDT period?

Regards,

aek

Re: Bombarding case of OS_WaitXYZ

Postby aek » Thu Dec 05, 2002 8:14 am

Hi Luben.
quote:
When this time is added to ISR delays ...
Just a quick comment -- Salvo's services that you can call from within ISRs (e.g. OSTimer(), OSSignalXyz()) are very short and fast. For example, OSTimer() is under 20 instructions. Most of the real "processing" happens in OSSched(). So it's not so much the "ISR delays" that can affect the situation you describe, it's more the actual processing time to handle internal activity (e.g. enqueueing a task when a large number of tasks are eligible) that takes time.

So, as you correctly describe, if time(OSSched()) + time(taskX()) + time(ISR) > WDT period, you will have a WDT reset. In your case, it sounds like time(taskX()) was very large because of problems in sprintf().
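The inequality above can be written down as a small check for use in a timing spreadsheet or a host-side test. This is a sketch only: the function name and the choice of microseconds as the unit are assumptions, and the three times have to come from measurement or from instruction counts on the real target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the worst-case time between two watchdog clears
 * exceeds the watchdog period. All arguments are in microseconds. */
bool wdt_can_reset(uint32_t sched_us, uint32_t task_us,
                   uint32_t isr_us, uint32_t wdt_period_us)
{
    assert(wdt_period_us > 0);

    /* 64-bit sum so that three large 32-bit inputs cannot wrap around */
    uint64_t worst_case_us = (uint64_t)sched_us + task_us + isr_us;

    return worst_case_us > wdt_period_us;
}
```

With purely illustrative numbers, wdt_can_reset(500, 13000, 200, 12000) is true and wdt_can_reset(500, 3000, 200, 12000) is false. Passing the shortest WDT period the part can exhibit, rather than the nominal one, gives the more conservative answer.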

quote:
One of the dangerous sides of using SALVO with the watchdog ON is this: even if the system is working smoothly, it doesn't mean that the WDT could not reset it at some time-critical moment.
Well, that's what the WDT and testing are for ... Of course the best approach would probably be to make the WDT period less than "normal", and test that way ... then if it passes, restore the WDT period to "normal", and you should be fine. Testing Testing Testing ...

[This message has been edited by aek (edited December 05, 2002).]
