I/O shows 'could not connect' after 254 update

7 replies
kevin
Huddersfield, United Kingdom
Joined: 17 May 2010

I just updated to 254 and now all my I/O shows 'could not connect'. 

RF, relay, input, temp and LCD (the LCD does display the IP).

I've tried rebooting several times and got it back very briefly once, for a few seconds, but then it went again. Is this just a coincidental hardware failure with the update, or could it be a firmware issue?

  K

derek
Glasgow, United Kingdom
Joined: 26 Oct 2009
Death of a process

254 is good on my HAH. If the LCD is showing the IP address, it's unlikely that it's a hardware failure.

You will have a lot more xAP traffic on your LAN than either Brett or me. I'm thinking that perhaps something on your network is exposing a bug in the new xaplib2 and causing the HAH process that drives the interface to the UI to panic and die.

After you see the '?'s on the UI, can you telnet into the HAH, issue a 'ps' command and post the results here?

Then, we'll be able to narrow this one down.

Derek.

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
Arthur Miller says...

livebox login: root
Password:
# ps
  PID USER       VSZ STAT COMMAND
    1 root      2260 S    init
    2 root         0 SW   [keventd]
    3 root         0 SWN  [ksoftirqd_CPU0]
    4 root         0 SW   [kswapd]
    5 root         0 SW   [bdflush]
    6 root         0 SW   [kupdated]
    7 root         0 SW   [mtdblockd]
    8 root         0 SW   [khubd]
   29 root         0 SWN  [jffs2_gcd_mtd2]
   95 root      2252 S    udhcpc -T 10 -i br0
  110 root      1668 S    dropbear -p 22
  118 root      2244 S    telnetd -p 23
  128 root      1248 S    pure-ftpd (SERVER)
  135 root      2244 S    inetd
  141 root      1020 S    /usr/bin/xap-hub -i br0
  152 root      4956 S    /usr/bin/kloned
  153 root      4968 S    /usr/bin/kloned
  156 root      4992 S    /usr/bin/kloned
  181 root      2272 S    -ash
  182 root      2248 R    ps
#

 

   K

derek
Glasgow, United Kingdom
Joined: 26 Oct 2009
Yup ... it's gone

OK. So, the xap-livebox process is indeed dead.

Next thing is to re-start it, in the foreground, in debug mode.

From a telnet session use ...

/usr/bin/xap-livebox -d 7 -s /dev/ttyS0 -i br0

Hopefully, this will give some information on why the process is panicking.

kevin
Huddersfield, United Kingdom
Joined: 17 May 2010
xAP goes the distance.. and more

It's a long incoming xAP message. The sender keeps each packet within a UDP packet size of 1500 bytes, and if the report is larger it spreads the groups across multiple xAP packets, but I think it's still proving too long for the new library.

[dbg][rx.c:23:readXapData] Rx xAP packet
xap-header
{
v=13
hop=1
uid=FF.6E17:0000
class=lighting.info
source=UKUSA.GHgateway.C-Bus
}
Status.GroupState
{
Group00=Off
Group02=On
Group03=Off
Group04=Err
Group05=Off

<snip>

Group50=Off
Group51=Off
Group52=Off
Group53=Off
Group54=Er
Segmentation fault
#

 

The mi4 xAP News and xAP TV applications bring it down as well.

derek
Glasgow, United Kingdom
Joined: 26 Oct 2009
Killer xAP message on the wire

Yes. The thing to do would be to drop Brett an email with the exact message that is giving the problem.

Then this issue should be reproducible, and a fix could be rolled into xaplib2.

It's good to have somebody with a variety of xAP enabled kit helping out on the testing.

Derek.

brett
Providence, United States
Joined: 9 Jan 2010
Large packets

If you could capture the entire xAP packet and post it, I'll use it as a test case to find out why it's breaking. Failing that, I'll just make up a large packet and try it out.

brett
Providence, United States
Joined: 9 Jan 2010
A couple of bugs

The segmentation fault was due to me having a > instead of >= in my parse code when detecting that I can't store any more key/value pairs. Having said that, I only stored 50, which wouldn't have been enough for your message anyway, so it would have silently overwritten your data, which would have been harder to figure out. So the SEGV was a lucky break in the end.

I've increased the number of key=value pairs to 150. When I run out of storage, a message will be logged, so at least if this does happen you can find out why.

Pushed 255.

This bug only affects those with LARGE xAP messages which is why it went unnoticed during my testing.

Brett
