Release 257

brett
Providence, United States
Joined: 9 Jan 2010

Has anybody else noticed their Pachube feed frozen?  Mine had.  Investigating, I found an issue in the xaplib2 filter calls, which had been working in releases 253-255.  Weird.  I don't think I changed anything in 256 that would affect it.  Anyway, I've fixed it.

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
Whooah that was fast :-)

Great stuff Brett,

Could you double check that the pacing is working?  I am seeing about 30 messages sent in 20ms according to Viewer.

When I tried various speeds in iServer I settled on 10/sec, i.e. a 100ms gap between messages.  I'm sure we can go a bit faster, and I'll try it out on the various devices here to see at what point it starts to lose efficiency, i.e. where devices can't respond within the gaps.
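To make the 100ms gap concrete, here is a rough Python sketch of pacing a batch of outgoing messages. It is only an illustration and not how iServer or HAH actually does it; `send_paced`, `gap_s`, and the injectable `clock`/`sleep` arguments are names made up for this example.

```python
import time

def send_paced(messages, send, gap_s=0.1, clock=time.monotonic, sleep=time.sleep):
    """Call send(msg) for each message, leaving at least gap_s seconds between sends.

    gap_s=0.1 corresponds to the 10 messages/sec figure mentioned above.
    clock and sleep are injectable so the pacing can be checked without waiting.
    """
    next_slot = clock()              # earliest time the next message may go out
    for msg in messages:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)             # wait out the remainder of the gap
        send(msg)
        next_slot = clock() + gap_s  # the gap is measured from this send
```

With the defaults, 30 queued queries would take roughly 3 seconds to go out rather than 20ms.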

I also have one device that won't synch now, but it is responding with xapbsc.info messages to the queries being sent, so I'm guessing its message format is slightly different, or maybe it's a subnet thing again and HAH can't hear it.  I'll investigate and see if anything is being passed back to xAP Flash...

[UPDATE] Ahh.. Is it possible that HAH doesn't like addresses / sub addresses that have spaces within them ?   It is changing the query target for say

source=A.B.C:Contains space

and sending

target=A.B.C: 

which is invalid, and it is also not passing responses from these devices back to xAPFlash  (spaces are allowable in source addresses).
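As an illustration of the behaviour being asked for, here is a small Python sketch that builds the query target from a source line without cutting the sub address at the first space. How HAH actually parses the line is not known here; `query_target` is a hypothetical helper, and the only point is that everything after the first "=" is kept.

```python
def query_target(source_line):
    """Turn 'source=<address>' into 'target=<address>', keeping embedded spaces."""
    key, sep, value = source_line.partition("=")   # split on the first '=' only
    if not sep or key.strip().lower() != "source":
        raise ValueError("expected a source= line")
    # Only trim surrounding whitespace or line endings; never split on inner spaces.
    return "target=" + value.strip()
```

For the address above this gives `target=A.B.C:Contains space` rather than `target=A.B.C:`.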

  K

PS All @xapautomation.org  email was interrupted and potentially undelivered for most of yesterday (Weds) - so if anyone emailed on that domain please send again.

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
Ahh - there's one more -

Ahh - there's one more - Within the BSC schema (only), iServer lowercases the state= parameter value to 'off', 'on' or '?' before passing messages back to xAPFlash; the HAH version is passing them unchanged.   This applies to all BSC .info, .event and .cmd messages.  Again, this was a downstream optimisation to cater for iServer clients lacking a string lowercase function.     I can of course alter this in xAPFlash, but it's there for those other clients too.
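A rough Python sketch of that normalisation, for illustration only: it assumes the message body has already been split into `key=value` lines, and `normalise_bsc_state` is a made-up name rather than anything in iServer or HAH.

```python
def normalise_bsc_state(xap_class, body_lines):
    """Lowercase the value of state= lines, but only for xAPBSC.* classes."""
    if not xap_class.lower().startswith("xapbsc."):
        return list(body_lines)          # other schemas pass through untouched
    out = []
    for line in body_lines:
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "state":
            line = key + sep + value.lower()   # 'ON' -> 'on'; '?' is unaffected
        out.append(line)
    return out
```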

As a neat feature request, it would be great to see a list of the connected client names (or at least the # clients) in the HAH webpage.  I was going to suggest you could report the # clients in iServer's heartbeat too, but you can't have bodies on heartbeats in xAP v1.2 (OK in xAP v1.3), so it would have to be a separate message.

 

  K

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
All quiet on the Eastern front

This morning HAH has died.  Overnight I'm seeing quite a few xAP messages originating from Joggler clients via HAH that are truncated and flagged as errors in Viewer.  After around 5 hours of running, and immediately after such a truncation, HAH died.  The truncations are in varying places within a message.  All heartbeats have ceased, the web interface is down and a Telnet session only creates a blank screen.    Ping does work.

I don't think there's anything useful I can do other than just reboot it (as I have no command line access), is there?   I don't think it's concatenated messages within the iServer socket, but could it be caused by <xap>...</xap> messages that span a TCP packet boundary?

Actually in examining a few messages from Jogglers relayed by HAH they are mostly missing the end of the message block - the closing } - Viewer doesn't flag these as errors which is surprising.

Also I notice another device taking a long time to synch. I have a feeling this might be a device issue, but I'm just checking there isn't a limit on the number of dot hierarchies in either the main or sub address, or on the lengths of either. This is a long address:

source=Idratek.Cortex.SERVER:World.Gledholt.Downstairs.LivingRoom.Sitting.LightLevel

   K

BodgeIT
London, United Kingdom
Joined: 10 Jun 2010
Not dead

My HAH died overnight on 256.  After upgrading to 257 and waiting for morning, it's not dead but very poorly.  It takes about 5-10 secs to come out of Screensaver and seems very groggy and confused. SSH is still active:

Top shows:

Mem: 13340K used, 540K free, 0K shrd, 0K buff, 1416K cached
CPU:   1% usr   2% sys   0% nice  95% idle   0% io   0% irq   0% softirq
Load average: 0.07 0.12 0.08

Strangely, I'm seeing two iServer entries in the list:

  166     1 root     S     6048  44%   1% /usr/bin/iServer -i br0
  136     1 root     S     1052   8%   0% /usr/bin/xap-livebox -s /dev/ttyS0 -i br0
  134     1 root     S     3532  25%   0% /usr/bin/xap-pachube -i br0
  156     1 root     S     5780  42%   0% /usr/bin/xap-googlecal -i br0
  163     1 root     S     3660  26%   0% /usr/bin/xap-plugboard -i br0
  151     1 root     S     2604  19%   0% /usr/bin/xap-currentcost -s /dev/ttyUSB1 -i br0
  177   166 root     S     6048  44%   0% /usr/bin/iServer -i br0
  143   141 root     S     5084  37%   0% /usr/bin/kloned
  146   141 root     S     5084  37%   0% /usr/bin/kloned
  141     1 root     S     5072  37%   0% /usr/bin/kloned

Hope this provides a clue?

G.

derek
Glasgow, United Kingdom
Joined: 26 Oct 2009
Ill patient

Thanks for the info.

Having a box that is not 'dead' but 'sick' is exactly what is needed to find out the cause of the illness. Not sure why there are two instances. It might be a simple enough change to check for and disallow a second instance.

It seems that this issue, caused by iServer, only manifests itself in environments where there are 'lots' of BSC Endpoints. I'm adding more to my test environment to see if I can replicate the issue that others are seeing.

Derek

brett
Providence, United States
Joined: 9 Jan 2010
It does provide some clues

The memory consumption has gone from 1MB, which is what it should use, to 6MB, so there is a memory leak somewhere, which is why the box dies over time. Thanks, I'll see if I can track it down. I've made a number of changes and pushed 258. There are still things I need to work on, but while I'm doing this I can get some feedback on these changes. Thanks for your patience while I sort these out. If it was easy everybody would be doing it!

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
Joggler waffles...

BSC TextBox endpoints are much larger than others, especially if they contain a lot of text. The endpoint reports the content in both plain text and html. If you have a few, they can report back to back and are likely to become concatenated within the TCP stream, traversing a TCP packet boundary. They are delimited to allow iServer to preserve the message boundaries.    This is where I think the problem may be.

Initially, when the BSC endpoint feature was introduced, periodic reporting within xAPFlash was pretty verbose, which also won't help but should just result in needless traffic.  This was reworked, but that change may not be in the current released beta.  Checks were also put in to stop any xAP message ever exceeding 1500 bytes.  This could happen if, for example, text boxes contained a lot of text.  First the html representation is discarded and replaced with 'Removed', and then the text itself if necessary.
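The trimming order described above could look something like this Python sketch. It is a guess at the shape of the check, not xAPFlash's code: the field names `html` and `text`, the `render` helper, and the assumption that the text is also replaced with 'Removed' are all illustrative.

```python
MAX_XAP_BYTES = 1500   # size ceiling mentioned above

def render(fields):
    """Serialise a dict of body fields as key=value lines (header omitted for brevity)."""
    return "".join(f"{key}={value}\n" for key, value in fields.items())

def fit_message(fields, limit=MAX_XAP_BYTES):
    """Return a copy of fields that renders within limit bytes where possible.

    The html field is sacrificed first, then the plain text, as described above.
    """
    trimmed = dict(fields)
    for key in ("html", "text"):                      # hypothetical field names
        if len(render(trimmed).encode("utf-8")) <= limit:
            break                                     # already small enough
        if key in trimmed:
            trimmed[key] = "Removed"
    return trimmed
```

A message that is already under the limit comes back unchanged, so the check only costs anything for the oversized text boxes.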

Although the above exacerbates the issue, it's not the cause. I'm still seeing the issue in the latest build, and I don't think users typically have a lot of text within text boxes.

xFX Viewer doesn't flag all these errors - you only see them if you inspect a few messages.  These are messages shown as originating from xAPFlash, not iServer, although iServer is actually originating them on the client's behalf as Flash can't send UDP.

xap-header
{
v=13
hop=1
uid=FF.6996:1029
class=xAPBSC.info
source=UKUSA.xAPFlash.CS4:Button.State.Spots
}
input.state
{
state=?
level=0

This shows a truncation in the middle of the level parameter value.  No error was flagged in Viewer, but a truncation within a key name or the header is flagged.
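A truncation like the one above, where the closing } never arrives, can be caught with a simple brace check. The Python sketch below is illustrative only and is not what Viewer or HAH does; `is_complete` is a made-up name, and it only checks that every block that is opened is also closed.

```python
def is_complete(message):
    """Return True if every '{' block in a xAP message is closed by a '}' line."""
    inside = False      # currently between '{' and '}'
    blocks = 0          # number of properly closed blocks seen
    for line in message.splitlines():
        stripped = line.strip()
        if stripped == "{":
            if inside:
                return False        # nested '{' is not valid
            inside = True
        elif stripped == "}":
            if not inside:
                return False        # '}' without a matching '{'
            inside = False
            blocks += 1
    return not inside and blocks >= 1
```

It would not catch a value cut short inside a block that still closes properly, so it is a partial check at best.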

K

brett
Providence, United States
Joined: 9 Jan 2010
iServer memory leak solved

What do you mean they are delimited?  Is there some other token being injected into the stream that I'm not aware of?   I found the memory leak BTW.  It was in the protocol tokenizer, which is why after leaving the system on overnight it would be dead in the morning.   The busier your network, the quicker the iServer memory would leak until the unit locked up.

I've pushed 259 for this issue, which will stabilize it while I figure out why some messages get truncated.  For small messages we should be OK now.

brett
Providence, United States
Joined: 9 Jan 2010
Its just a thread

There appear to be two instances because I start another thread to handle the BSC query when an initial connection is made.   The Linux kernel reports a thread as a separate process, so it looks like two are running.  Rest assured there aren't.

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
By delimited I meant the

By delimited I meant the <xap></xap> element tags.  Within the TCP socket stream, however, this is essentially transparent.

I am thinking about supporting STX/ETX as an alternative, so that it would be more in keeping with intentions within a TCP hub and also avoid any confusion that XML <xap> tags included within a xAP message could create, e.g. in a text or displaytext field.
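Because TCP is a byte stream, a receiver has to buffer until it has seen a complete delimited message, however the data was split into packets. The Python sketch below shows one way to do that. It is not iServer's tokenizer; `XapFramer` and its methods are names invented for the example, and the delimiters are parameters so the same code covers both <xap></xap> tags and STX/ETX.

```python
class XapFramer:
    """Reassemble delimited messages from arbitrarily split stream chunks."""

    def __init__(self, open_tag=b"<xap>", close_tag=b"</xap>"):
        self.open_tag = open_tag
        self.close_tag = close_tag
        self._buf = b""

    def feed(self, chunk):
        """Add received bytes; return the list of complete message bodies so far."""
        self._buf += chunk
        messages = []
        while True:
            start = self._buf.find(self.open_tag)
            if start < 0:
                # No opening delimiter yet: keep only a tail that could be
                # the beginning of one split across two chunks.
                keep = len(self.open_tag) - 1
                self._buf = self._buf[-keep:] if keep else b""
                break
            end = self._buf.find(self.close_tag, start + len(self.open_tag))
            if end < 0:
                self._buf = self._buf[start:]   # incomplete: wait for more data
                break
            messages.append(self._buf[start + len(self.open_tag):end])
            self._buf = self._buf[end + len(self.close_tag):]
        return messages
```

Note that this still has the weakness mentioned above: a literal </xap> inside a text field would end the message early, which is the argument for single-byte STX/ETX delimiters such as `XapFramer(b"\x02", b"\x03")`.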

   K

BodgeIT
London, United Kingdom
Joined: 10 Jun 2010
Japanese...

...sounds like the easy alternative at the moment.  You guys are out there!
