Release 257
Has anybody else noticed their Pachube feed frozen? Mine had. Investigating, I found an issue in the xaplib2 filter calls, which had been working in releases 253-255. Weird. I don't think I changed anything in 256 that would affect it. Anyway, I've fixed it.
Ahh - there's one more. Within the BSC schema (only), iServer lowercases the state= parameter value to 'off', 'on' or '?' before passing messages back to xAPFlash; the HAH version is passing them unchanged. This applies to all BSC .info, .event and .cmd messages. Again, this was a downstream optimisation to cater for iServer clients lacking a string lowercase function. I can of course alter this in xAPFlash, but it's there for those other clients too.
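To make the request concrete, here is a minimal sketch of the normalisation described above. The function name and argument layout are made up for illustration and are not taken from iServer or HAH; the only behaviour assumed is "lowercase the state= value, and only for xAPBSC classes".

```python
def normalise_bsc_state(xap_class: str, key: str, value: str) -> str:
    """Lowercase the value of state= for BSC messages only.

    Hypothetical helper: every other key, and every non-BSC class,
    is passed through unchanged.
    """
    is_bsc = xap_class.lower().startswith("xapbsc.")
    if is_bsc and key.lower() == "state":
        return value.lower()
    return value


# Example: a BSC event carrying state=ON is relayed as state=on,
# while a text= value in the same message keeps its case.
print(normalise_bsc_state("xAPBSC.event", "state", "ON"))
print(normalise_bsc_state("xAPBSC.event", "text", "Hello"))
```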
As a neat feature request, it would be great to see a list of the connected client names (or at least the number of clients) in the HAH webpage. I was going to suggest you could report the number of clients in iServer's heartbeat too, but you can't have bodies on heartbeats in xAP v1.2 (OK in xAP v1.3), so it would have to be a separate message.
K
This morning HAH has died. Overnight I saw quite a few xAP messages originating from Joggler clients via HAH that were truncated and flagged as errors in Viewer. After around 5 hours of running, and immediately after one such truncation, HAH died. The truncations are in varying places within a message. All heartbeats have ceased, the web interface is down and a Telnet session only produces a blank screen. Ping does work.
I don't think there's anything useful I can do other than reboot it, is there (as I have no command line access)? I don't think it's concatenated messages within the iServer socket, but could it be caused by <xap>...<xap> messages that span a TCP packet boundary?
Actually, on examining a few messages from Jogglers relayed by HAH, most of them are missing the end of the message block - the closing }. Viewer doesn't flag these as errors, which is surprising.
Also, I notice another device taking a long time to synch. I have a feeling this might be a device issue, but I'm just checking there isn't a limit on the number of dot hierarchies in either the main or sub address, or on the length of either. This is a long address:
source=Idratek.Cortex.SERVER:World.Gledholt.Downstairs.LivingRoom.Sitting.LightLevel
K
My HAH died overnight on 256. After upgrading to 257 and waiting for morning, it's not dead but very poorly. It takes about 5-10 secs to come out of the screensaver and seems very groggy and confused. SSH is still active:
Top shows:
Mem: 13340K used, 540K free, 0K shrd, 0K buff, 1416K cached
CPU: 1% usr 2% sys 0% nice 95% idle 0% io 0% irq 0% softirq
Load average: 0.07 0.12 0.08
Strangely, I'm seeing two iServer entries in list:
166 1 root S 6048 44% 1% /usr/bin/iServer -i br0
136 1 root S 1052 8% 0% /usr/bin/xap-livebox -s /dev/ttyS0 -i br0
134 1 root S 3532 25% 0% /usr/bin/xap-pachube -i br0
156 1 root S 5780 42% 0% /usr/bin/xap-googlecal -i br0
163 1 root S 3660 26% 0% /usr/bin/xap-plugboard -i br0
151 1 root S 2604 19% 0% /usr/bin/xap-currentcost -s /dev/ttyUSB1 -i br0
177 166 root S 6048 44% 0% /usr/bin/iServer -i br0
143 141 root S 5084 37% 0% /usr/bin/kloned
146 141 root S 5084 37% 0% /usr/bin/kloned
141 1 root S 5072 37% 0% /usr/bin/kloned
Hope this provides a clue?
G.
Thanks for the info.
Having a box that is not 'dead' but 'sick' is exactly what is needed to find out the cause of the illness. Not sure why there are two instances. It might be a simple enough change to check for and disallow a second instance.
It seems that this issue, caused by iServer, only manifests itself in environments where there are 'lots' of BSC endpoints. I'm adding more to my test environment to see if I can replicate the issue that others are seeing.
Derek
BSC TextBox endpoints are much larger than others, especially if they contain a lot of text. The endpoint reports the content in both plain text and HTML. If you have a few, they can report back to back and are likely to become concatenated within the TCP stream, traversing a TCP packet boundary. They are delimited to allow iServer to preserve the message boundaries. This is where I think the problem may be.
Initially, when the BSC endpoint feature was introduced, periodic reporting within xAPFlash was pretty verbose, which also won't help but should just result in needless traffic. This was reworked, but that change may not be in the current released beta. Checks were also put in to stop any xAP message ever exceeding 1500 bytes. This could happen if, for example, text boxes contained a lot of text. First the HTML representation is discarded and replaced with 'Removed', and then the text itself if necessary.
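A rough sketch of that two-stage size check, for anyone wanting to reproduce it on the hub side. The key names `text` and `html`, the `key=value` size estimate and the function names are assumptions for illustration; the real xAPFlash code may measure the whole message including the header.

```python
MAX_XAP_BYTES = 1500  # limit quoted above


def encoded_size(fields: dict) -> int:
    """Approximate size of a body block rendered as key=value lines (UTF-8)."""
    return len("\n".join(f"{k}={v}" for k, v in fields.items()).encode("utf-8"))


def shrink_textbox(fields: dict, limit: int = MAX_XAP_BYTES,
                   html_key: str = "html", text_key: str = "text") -> dict:
    """Drop the HTML first, then the plain text, until the block fits."""
    out = dict(fields)  # never mutate the caller's dict
    if encoded_size(out) > limit and html_key in out:
        out[html_key] = "Removed"
    if encoded_size(out) > limit and text_key in out:
        out[text_key] = "Removed"
    return out


# A short text with a huge HTML rendering keeps the text and loses the HTML.
small = shrink_textbox({"text": "hi", "html": "<b>" + "x" * 2000 + "</b>"})
print(small)
```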
Although the above exacerbates the issue it's not the cause - I'm seeing the issue in the latest build still and I don't think users typically have a lot of text within text boxes.
xFX Viewer doesn't flag all these errors - you only see them if you inspect a few messages. These are messages shown as originating from xAPFlash, not iServer, although iServer is actually originating them on the client's behalf, as Flash can't send UDP.
xap-header
{
v=13
hop=1
uid=FF.6996:1029
class=xAPBSC.info
source=UKUSA.xAPFlash.CS4:Button.State.Spots
}
input.state
{
state=?
level=0
This shows a truncation in the middle of the level parameter value. No error was flagged in Viewer for this one, but a truncation within a key name or the header is flagged.
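Since Viewer lets this case through, a quick structural check may help when trawling logs. This is only a sketch with a made-up function name: it verifies that every block that opens with { on its own line is closed by } on its own line, and does nothing else (no key or header validation).

```python
def blocks_are_closed(message: str) -> bool:
    """Return True if every '{' line in an xAP message has a matching '}' line."""
    inside = False
    seen_block = False
    for line in message.splitlines():
        stripped = line.strip()
        if stripped == "{":
            if inside:          # a block opened inside another block
                return False
            inside = True
            seen_block = True
        elif stripped == "}":
            if not inside:      # a close with nothing open
                return False
            inside = False
    return seen_block and not inside


truncated = "xap-header\n{\nv=13\nhop=1\n}\ninput.state\n{\nstate=?\nlevel=0"
print(blocks_are_closed(truncated))          # the message quoted above
print(blocks_are_closed(truncated + "\n}"))  # same message with the closing brace
```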
K
By delimited I meant the <xap></xap> element tags. Within the TCP socket stream, however, this is essentially transparent.
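For what it's worth, this is the kind of reassembly I have in mind on the receiving side. It is a sketch, not the iServer or HAH code: the class name is invented, and it assumes each message arrives as <xap>...</xap> with no nesting. The point is that a read from the socket may contain half a message, or several, so the bytes have to be buffered until a closing tag turns up.

```python
from typing import List


class XapStreamFramer:
    """Accumulates TCP chunks and yields complete <xap>...</xap> payloads."""

    OPEN = b"<xap>"
    CLOSE = b"</xap>"

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add one chunk from recv(); return every message completed by it."""
        self._buf.extend(chunk)
        frames: List[bytes] = []
        while True:
            start = self._buf.find(self.OPEN)
            if start < 0:
                # No opening tag yet: keep only a tail that could be
                # the first few bytes of a split "<xap>".
                keep = len(self.OPEN) - 1
                del self._buf[:max(0, len(self._buf) - keep)]
                break
            end = self._buf.find(self.CLOSE, start + len(self.OPEN))
            if end < 0:
                # Message is incomplete: wait for the next chunk.
                del self._buf[:start]
                break
            frames.append(bytes(self._buf[start + len(self.OPEN):end]))
            del self._buf[:end + len(self.CLOSE)]
        return frames


framer = XapStreamFramer()
print(framer.feed(b"<xap>one</xap><xap>tw"))  # second message is still partial
print(framer.feed(b"o</xap>"))                # completed by the next packet
```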
I am thinking about supporting STX/ETX as an alternative, so that it would be more in keeping with the intentions for a TCP hub and would also avoid any confusion that XML <xap> tags included within an xAP message could create, e.g. in a text or displaytext field.
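A sketch of what STX/ETX framing could look like, to show why it sidesteps the tag problem. Nothing here is implemented in iServer yet; the function names are hypothetical and the choice of the ASCII control bytes 0x02 and 0x03 is my assumption.

```python
from typing import List, Tuple

STX = b"\x02"  # ASCII start-of-text (assumed frame start)
ETX = b"\x03"  # ASCII end-of-text (assumed frame end)


def wrap(message: bytes) -> bytes:
    """Frame one message; control bytes inside the payload are rejected."""
    if STX in message or ETX in message:
        raise ValueError("payload must not contain STX or ETX")
    return STX + message + ETX


def unwrap_all(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Return (complete messages, unconsumed remainder) for a receive buffer."""
    frames: List[bytes] = []
    pos = 0
    while True:
        start = buffer.find(STX, pos)
        if start < 0:
            return frames, b""
        end = buffer.find(ETX, start + 1)
        if end < 0:
            return frames, buffer[start:]  # partial frame, keep for later
        frames.append(buffer[start + 1:end])
        pos = end + 1


# A payload that itself contains "<xap>" text is no longer ambiguous.
data = wrap(b"text=see <xap> tag") + wrap(b"next")[:3]
print(unwrap_all(data))
```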
K
...sounds like the easy alternative at the moment. You guys are out there!
Great stuff Brett,
Could you double check the pacing is working? I am seeing about 30 messages sent in 20ms according to Viewer.
When I tried various speeds in iServer I settled on 10/sec, i.e. a 100ms gap between messages. I'm sure we can go a bit faster, and I'll try it out on the various devices here to see at what point it starts to lose efficiency, i.e. where devices can't respond within the gaps.
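In case it helps pin down what I mean by pacing, here is a small sketch with a fixed minimum gap between sends. The function and parameter names are invented, and the clock and sleep are passed in only so the behaviour can be checked without waiting; it is not how iServer or HAH actually schedules its queries.

```python
import time
from typing import Callable, Iterable


def send_paced(messages: Iterable[str],
               send: Callable[[str], None],
               gap_s: float = 0.1,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Send each message, leaving at least gap_s seconds between sends."""
    next_at = clock()
    for message in messages:
        delay = next_at - clock()
        if delay > 0:
            sleep(delay)          # wait out the rest of the gap
        send(message)
        next_at = clock() + gap_s  # earliest time for the next send


# 10 messages/sec: three queries go out roughly 100ms apart.
send_paced(["query 1", "query 2", "query 3"], print, gap_s=0.1)
```

Dropping `gap_s` is the single knob to turn when testing how fast the devices can cope.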
I also have one device that won't synch now, but it is responding with xapbsc.info messages to the queries being sent, so I'm guessing its message format is slightly different, or maybe it's a subnet thing again and HAH can't hear it. I'll investigate and see if anything is being passed back to xAPFlash...
[UPDATE] Ahh.. Is it possible that HAH doesn't like addresses / sub addresses that have spaces within them? It is changing the query target for, say,
source=A.B.C:Contains space
and sending
target=A.B.C:
which is invalid. It is also not passing responses from these devices back to xAPFlash (spaces are allowable in source addresses).
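I don't know what the HAH parser does internally, so the following is only a guess at the kind of fix needed: split each line on the first "=" and the address on the first ":", and never tokenise on whitespace. The function names are mine, purely for illustration.

```python
from typing import Tuple


def parse_pair(line: str) -> Tuple[str, str]:
    """Split 'key=value' on the first '=' only; spaces in the value survive."""
    key, _, value = line.rstrip("\r\n").partition("=")
    return key.strip(), value


def split_address(address: str) -> Tuple[str, str]:
    """Split an xAP address into (main address, sub address) at the first ':'."""
    main, _, sub = address.partition(":")
    return main, sub


key, value = parse_pair("source=A.B.C:Contains space")
main, sub = split_address(value)
# The query target is rebuilt from both parts, so the sub address is kept whole.
print(f"target={main}:{sub}")
```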
K
PS All @xapautomation.org email was interrupted and potentially undelivered for most of yesterday (Weds) - so if anyone emailed on that domain please send again.