>OSSignalSem: WARNING - event 1 is not a sempahore.

Posted: Wed Jul 02, 2014 12:42 am
by jc_hsfl
Hi,

I'm trying to track down a strange bug that only started appearing after I enabled nested interrupts and started clearing UART overruns (when detected) in my UART tasks. The symptom is that the CPU seems to reset at random times after a few hours of runtime (typically 2-4 hours). Most of the activity is real-time command processing, plus routine housekeeping.

I'm running this on a PIC24F, and after a restart, here's some of the RCON register bits:
EXTR:0 SWR:0 WDT:0 POR:0 BOR:0 IOPUWR:0 TRAPR:0 CM:0 SLEEP:0 IDLE:0

The bits seem to indicate that this isn't a typical CPU reset. It looks more like the program is jumping back to the beginning without triggering any of the reset flags.
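As a minimal sketch of the check described above, the following decodes a saved RCON value and reports whether any reset-cause flag is set. The bit positions are an assumption based on the usual PIC24F RCON layout and should be verified against the datasheet for the specific device; `rcon_has_reset_cause()` and the `RCON_*` macro names are hypothetical, not part of any Microchip header.

```c
#include <stdint.h>

/* Assumed PIC24F RCON bit positions; verify against your device datasheet. */
#define RCON_POR    (1u << 0)   /* power-on reset */
#define RCON_BOR    (1u << 1)   /* brown-out reset */
#define RCON_IDLE   (1u << 2)   /* woke from idle (not a reset cause) */
#define RCON_SLEEP  (1u << 3)   /* woke from sleep (not a reset cause) */
#define RCON_WDTO   (1u << 4)   /* watchdog timer time-out */
#define RCON_SWR    (1u << 6)   /* software RESET instruction */
#define RCON_EXTR   (1u << 7)   /* external (MCLR) reset */
#define RCON_CM     (1u << 9)   /* configuration word mismatch */
#define RCON_IOPUWR (1u << 14)  /* illegal opcode / uninitialized W access */
#define RCON_TRAPR  (1u << 15)  /* trap conflict reset */

#define RCON_RESET_CAUSE_MASK \
    (RCON_POR | RCON_BOR | RCON_WDTO | RCON_SWR | \
     RCON_EXTR | RCON_CM | RCON_IOPUWR | RCON_TRAPR)

/* Hypothetical helper: returns 1 if any reset-cause flag is set in the
 * saved RCON value, 0 if none is set. */
static int rcon_has_reset_cause(uint16_t rcon)
{
    return (rcon & RCON_RESET_CAUSE_MASK) != 0;
}
```

On the target, the value passed in would be a copy of RCON taken early in main(). The flags are sticky, so they are normally cleared in software after being saved. A result of 0 after an apparent restart is consistent with execution reaching the reset vector by a stray jump rather than by a hardware reset.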

More recently, I've observed this message right before the program restarts:
>OSSignalSem: WARNING - event 1 is not a sempahore.

Here's a printout of OSRpt() right before the message (uptime = 13646s):
Code:
Salvo 4.2.2
CtxSws: 7
Errors: 0  Warnings: 0  Timeouts: 255  Ticks: 1366207
EligQ:
DelayQ: t3,t8,t5,t11,t4,t2,t13,t1,t9,t18  Total delay: 183 ticks
task stat prio    addr   t->  e->  d-> delay
  1  wait   2  000058D0    .  e 1  t 9   14
  2  wait   4  00005A8C    .  e 2  t13   10
  3  wait   3  0000177E    .  e 3  t 8   18
  4  wait   3  00001B78    .  e 4  t 2   36
  5  dlyd   5  0000AB60         .  t11   32
  8  dlyd   5  000025E8         .  t 5   29
  9  dlyd   5  00004650         .  t18   25
 11  dlyd   5  000012B2         .  t 4   16
 13  dlyd   4  00002A00         .  t 1    0
 18  dlyd  14  0000513A         .    .    3
 20* elig  14  000085BA       n/a

evnt type t->    value
  1   Sem   .       12
  2   Sem t 2        0
  3   Sem t 3        0
  4   Sem t 4        0
  5  dstr
  6  dstr
  7  dstr
  8  dstr


When things are running normally, here's what the OSRpt() looks like (uptime = 581s):
Code:
Salvo 4.2.2
CtxSws: 43
Errors: 0  Warnings: 0  Timeouts: 255  Ticks: 58273
EligQ:
DelayQ: t11,t9,t1,t13,t2,t5,t18,t3,t4,t8  Total delay: 134 ticks
task stat prio    addr   t->  e->  d-> delay
  1  wait   2  000058D0    .  e 1  t13   79
  2  wait   4  00005A8C    .  e 2  t 5    0
  3  wait   3  0000177E    .  e 3  t 4   11
  4  wait   3  00001B78    .  e 4  t 8    0
  5  dlyd   5  0000AB60         .  t18    5
  8  dlyd   5  000025F4         .    .    0
  9  dlyd   5  000044AA         .  t 1    1
 11  dlyd   5  000012B2         .  t 9    2
 13  dlyd   4  00002AC0         .  t 2    0
 18  dlyd  14  0000513A         .  t 3   36
 20* elig  14  000085BA       n/a

evnt type t->    value
  1   Sem t 1        0
  2   Sem t 2        0
  3   Sem t 3        0
  4   Sem t 4        0
  5  dstr
  6  dstr
  7  dstr
  8  dstr


Questions:
1. Does Salvo jump to the beginning of code without affecting RCON bits if it detects an internal error?

2. What might be causing Salvo to report the warning that event 1 is not a semaphore?

3. Are there any known issues with using nested interrupts with Salvo?

Re: >OSSignalSem: WARNING - event 1 is not a sempahore.

Posted: Wed Jul 02, 2014 2:19 am
by aek
Salvo does not support nested interrupts, in the sense that you could be inside an API function (e.g. OSSignalSem()) that is then preempted by a nested interrupt, which calls OSSignalSem() again. That will definitely corrupt something, most likely the tcbs themselves.

1. Does Salvo jump to the beginning of code without affecting RCON bits if it detects an internal error?

2. What might be causing Salvo to report the warning that event 1 is not a semaphore?

3. Are there any known issues with using nested interrupts with Salvo?

1. Salvo simply returns an error code from the given function when an error is detected. It does not explicitly affect any objects / memory / registers on-chip, except for what may be affected in the hooks (interrupt, WDT and idle).

2. A corrupted tcb, very likely caused by nested calls to API functions.

3. Not supported.

Re: >OSSignalSem: WARNING - event 1 is not a sempahore.

Posted: Wed Jul 02, 2014 2:31 am
by aek
I should be more precise -- nested interrupts are not supported, in the sense that Salvo's API calls are not re-entrant, therefore calling Salvo API services from nested interrupts is bound to cause problems.

If you nested interrupts in a manner where only a single level of those interrupts called a Salvo API service, then it would still be compatible with Salvo, as there would never be an instance of an API call being preempted by another.

The whole point of the Salvo interrupt hook is precisely to avoid / preclude any preemption of Salvo API services. It's a very small-footprint solution for applications that do not have nested interrupts.

Handling nested interrupts would require more complex code and more RAM for reentrant Salvo API services. That goes against Salvo's design fundamentals.

(Advanced subject): Note that there is a concept of "delayed interrupt processing", whereby stuff that is triggered by interrupts is not actually processed from within the (foreground) interrupts, but rather, in background (e.g., mainline) code. This is one way to handle nested interrupts, but it's technically complex. When I write code for e.g. MPLAB-SIM, given its poor simulation of interrupts, I usually just call OSTimer() (only) from within the main() loop ... that, in effect, is a similar way to "pull functions out of interrupts".

Re: >OSSignalSem: WARNING - event 1 is not a sempahore.

Posted: Wed Jul 02, 2014 11:39 am
by jc_hsfl
Ok, that very much confirms my problem, and thanks for providing the expansion on making things compatible.

I would like to figure out a way to code with nested interrupts. I'm thinking about using 'delayed interrupt processing': the ISRs would only manipulate non-Salvo flags/counters, and the main loop would do the actual signaling of semaphores just after OSSched(). That should take care of almost all my ISR OS calls.
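A minimal sketch of that deferred-signaling idea, written so it builds off-target. Everything named here is hypothetical: `signal_rx_sem_stub()` stands in for whatever OS call the main loop would really make (for example signaling a semaphore), and the two interrupt macros are no-op placeholders for the target's real disable/restore mechanism.

```c
#include <stdint.h>

/* Placeholders (assumptions): replace with the target's real
 * interrupt disable/restore mechanism. No-ops here so the sketch
 * compiles on a host. */
#define DISABLE_INTERRUPTS()  ((void)0)
#define RESTORE_INTERRUPTS()  ((void)0)

/* Incremented by one ISR only, drained by the main loop. Use one
 * counter per ISR so nested ISRs never share a counter. */
static volatile uint8_t rx_pending;

/* Demo bookkeeping only: counts calls to the stand-in OS service. */
static unsigned signals_delivered;

/* Stand-in for the real OS call made from background code. */
static void signal_rx_sem_stub(void)
{
    signals_delivered++;
}

/* Body of the ISR: touches no OS data, so it is safe at any
 * interrupt priority. Saturates instead of wrapping. */
void uart_rx_isr_sketch(void)
{
    if (rx_pending < UINT8_MAX) {
        rx_pending++;
    }
}

/* Called from the main loop, e.g. right after the scheduler call.
 * Takes a snapshot of the counter with interrupts masked, then makes
 * the OS calls from background code where nothing can nest. */
void drain_pending_signals(void)
{
    uint8_t n;

    DISABLE_INTERRUPTS();
    n = rx_pending;
    rx_pending = 0;
    RESTORE_INTERRUPTS();

    for (; n > 0; n--) {
        signal_rx_sem_stub();
    }
}
```

The counter (rather than a single flag) keeps events from being lost when several interrupts arrive between two passes of the main loop. The trade-off is added latency: a signal is delivered only when the main loop next reaches the drain call.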

Is there any issue with calling OSTimer from a nested ISR? If I'm able to achieve the above, it would probably be the only OS call from any ISR. And does that also need to be protected against preemption (e.g. bump the priority of the interrupt hook to highest)?

Re: >OSSignalSem: WARNING - event 1 is not a sempahore.

Posted: Wed Jul 02, 2014 12:53 pm
by aek
Your delayed action scheme would work.

As with other Salvo API services, you must ensure that OSTimer() is not called during a critical section (the interrupt hooks handle this), and that it cannot preempt itself (as could be the case with nested interrupts). That's all you have to look out for.
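One way to check the second condition during debugging is a small entry/exit guard around each OS call site, sketched below. This is a diagnostic aid under stated assumptions, not a Salvo feature: `os_guard_enter()`, `os_guard_leave()` and both counters are hypothetical names, and on the target the increment and decrement would themselves need to be atomic (or done with interrupts masked).

```c
#include <stdint.h>

/* Greater than 0 while a guarded OS call is in progress. */
static volatile uint8_t os_call_depth;

/* Number of times a guarded call was entered while another one was
 * still in progress, i.e. detected preemption of an OS call. */
static volatile unsigned reentry_count;

/* Call immediately before an OS service. Returns 1 if no other
 * guarded call is in progress, 0 if this entry is nested. */
static int os_guard_enter(void)
{
    if (os_call_depth++ != 0) {
        reentry_count++;
        return 0;
    }
    return 1;
}

/* Call immediately after the OS service returns. */
static void os_guard_leave(void)
{
    os_call_depth--;
}
```

If `reentry_count` is ever nonzero after a long run, some interrupt level is still calling an OS service while another call is in progress.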

Re: >OSSignalSem: WARNING - event 1 is not a sempahore.

Posted: Thu Jul 03, 2014 11:16 pm
by jc_hsfl
It works! No more random reboots.

Thanks!