Pachube Feed Freezing

33 replies [Last post]
Pentala
Offline
Joined: 22 Oct 2011

I'm using the 302 firmware and seem to be having problems with the Pachube uploads freezing. When I checked the device this evening, I found the feed had frozen earlier today.

Looking at /var/log/xap-pachube.log:

 

# cat xap-pachube.log
Pachube Connector for xAP v12
Copyright (C) DBzoo 2009-2010
[err][pachulib.c:137:connect_server] errno 145 (Connection timed out)
[err][pachulib.c:137:connect_server] Err!! connect
[err][pachulib.c:137:connect_server] errno 145 (Connection timed out)
[err][pachulib.c:137:connect_server] Err!! connect
[err][pachulib.c:137:connect_server] errno 145 (Connection timed out)
[err][pachulib.c:137:connect_server] Err!! connect
[err][pachulib.c:185:read_data] errno 131 (Connection reset by peer)
[err][pachulib.c:1#

 

# ps
  PID USER       VSZ STAT COMMAND
    1 root      1796 S    init
    2 root         0 SW   [keventd]
    3 root         0 RWN  [ksoftirqd_CPU0]
    4 root         0 SW   [kswapd]
    5 root         0 SW   [bdflush]
    6 root         0 SW   [kupdated]
    7 root         0 SW   [mtdblockd]
    8 root         0 SW   [khubd]
   34 root         0 SWN  [jffs2_gcd_mtd2]
  111 root      1668 S    dropbear -p 22
  133 root      1028 S    /usr/bin/xap-hub -i br0
  137 root      1068 S    /usr/bin/xap-livebox -s /dev/ttyS0 -i br0
  144 root      1044 S    /usr/bin/xap-serial -i br0
  146 root      4072 S    /usr/bin/kloned
  148 root      2616 S    /usr/bin/xap-currentcost -s /dev/ttyUSB0 -i br0
  152 root      4168 S    lua /etc_ro_fs/plugboard/plugboard.lua
  179 root      1080 S    /usr/bin/xap-pachube -i br0
  187 root      1740 S    dropbear -p 22
  188 root      1812 S    -ash
  193 root      1784 R    ps
#

I have other devices uploading data to Pachube and these continue to upload when Livebox has failed.

 

Has anyone else seen this problem?

Andy.

 

allanayr
Offline
Ayr, United Kingdom
Joined: 25 Sep 2011
Pachube freeze

My Pachube feed (36589) froze this morning at about 11 am. I noticed at about 11:45 and rebooted the box. It's been fine ever since. I checked that xap-pachube was still running and everything seemed OK. So it may just be coincidence. I'm still running 301 at the moment.

 

/var/log/xap-pachube.log is clean by the way.

brett
Offline
Providence, United States
Joined: 9 Jan 2010
Those errors

Those errors are just telling you there was a connectivity problem at some point - this would result in a frozen feed for that period.  However, they are benign and the process will recover automatically without having to be restarted.

If you do experience an issue make sure you check the logs BEFORE you reboot as any error message will be lost on reboot.

Brett

kevin9
Offline
Lincolnshire, United Kingdom
Joined: 24 May 2010
Pachube Server issues

They had some problems about 12 hours ago with their server, and it came back up an hour later. Apparently it was a problem with their hosting provider.

So it would have gone down between 11 and 12 on the 13th of Jan and come back up between 12 and 1-ish.

 

kevint

Pentala
Offline
Joined: 22 Oct 2011
xap-pachube recovery

Brett - the problem seems to be that the process doesn't recover. The xap-pachube log posted above was taken from the box at around 20:00 last night, some 9-10 hours after the apparent Pachube outage. Also, I have other systems uploading data to Pachube (from the same LAN) that continued to log data successfully during the 'outage'.

The only way I could get the process to recover was with a reboot...

Andy

brett
Offline
Providence, United States
Joined: 9 Jan 2010
I experienced a pachube feed freeze too

I've created issue 35 to track the Pachube freezing problem - this happened to me too.

UPDATE: I found the problem: it was a bug in a 3rd party library I was using.  Grrrr

Pachube doesn't go down too often, so we should be OK until I get the next release out.  If it does, you'll need to reboot to fix this, as the issue is a file descriptor leak which will affect every process on the HAH.  Yeah, it's a really ugly bug I inherited in this library; see what happens when you don't write it yourself!
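
For anyone who wants to watch for the leak in the meantime, here is a minimal sketch that counts a process's open file descriptors. It assumes /proc is mounted on the HAH and that the shell is POSIX-like; `count_fds` is a made-up helper name, not something shipped in the firmware.

```shell
# Hypothetical helper: print the number of open file descriptors for PID $1.
# Assumes a Linux-style /proc; prints 0 if the PID does not exist.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}
```

Running `count_fds 179` a few times during an outage (179 being the xap-pachube PID in the ps listing above) should show whether the number keeps climbing, which is what a descriptor leak would look like.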

Brett

Pentala
Offline
Joined: 22 Oct 2011
Thanks

Thanks Brett - please let me know if you need any further diagnostic/debug/config details.

Thanks again for all your and Derek's work on this project.

Andy.

EJ-Ambient
Offline
Ringwood, United Kingdom
Joined: 5 Aug 2011
HAH feed to Pachube frozen

Hi Brett

I'm on 302.8 (- thanks for the ability to post to datastream '0'......)

Last night I had a brief service interruption - unfortunately I didn't check till this morning and found:

1) Non HAH post to Pachube all OK

2) HAH post to Pachube frozen (I have two HAH - both FROZEN)

I've tried a hard re-boot on both HAH - no fix

xFx viewer shows xAP activity with heartbeat on Pachube and CurrentCost

I can see CurrentCost values being posted by both HAH on Joggler-xAP Flash GUI screen

I can't log onto either HAH with WinSCP, and PuTTY takes 10 times longer than normal to respond, but eventually does get to the login screen.

I know I've re-booted but I get the following from CAT

# cat xap-pachube.log
cat: can't open 'xap-pachube.log': No such file or directory

I'm going to take down the whole LAN and bring each element on-line to see if I can sort it...

Any ideas?

EJ

OK - nothing improved until I physically disconnected the ADSL router - powered it up and everything fell back into place.....weird....EJ

brett
Offline
Providence, United States
Joined: 9 Jan 2010
Losing internet

If your internet connection suddenly drops, DNS resolution fails and each lookup now takes a long time to time out, so when you attempt to SSH/SCP into your Livebox various parts of the networking stack stall and the delays become VERY noticeable, as you have experienced.   If you rebooted your HAH and it then appeared to lock up, that would be the NTP daemon trying to reach out to set the time before bringing up the SSH daemon; again this takes about 5-10 min to time out if no internet is found.  So it appears your HAH is dead, but it's not.   You should have noticed the message 'NTP sync' on the LCD.    I added this so people would know why it's freezing at this point.  Perhaps you didn't look ?!

I suspect what might have happened is that your ADSL connection did a DHCP renew and perhaps got a new IP, which screwed everything over, as the connections that were established internally now all get dropped.  And/or your modem just fell over and died and needed resetting (the likely scenario), as a DHCP renew should not have locked up the HAH in the way I mentioned above.

When you reset your modem the HAH would then resolve everything and boot up fine.

Brett

EJ-Ambient
Offline
Ringwood, United Kingdom
Joined: 5 Aug 2011
Recovered

Hi Brett

thanks for that......I didn't know that a message would be shown on the LCD - must read ALL the wiki....anyway as I mentioned I re-booted the ADSL and everything came back up as normal!

 

I've recently downloaded the BETA with the feed to datastream zero element added - which topic should I log into to post updates, etc......as I'm not able to get a feed to datastream zero from a HAHHub which has been instanced....

cold weather is abating - now a balmy 4.3C outside (9:00pm)

regards....EJ

BoxingOrange
Offline
United Kingdom
Joined: 11 Jun 2010
HAH Network Issues

It is VERY important that if you run more than one HAH on a network you make sure they all have different MAC addresses.  Remember the HAH comes with 2 network ports; they default to 00:07:3A:11:22:00 and 00:07:3A:11:22:01, and can be found on the Management tab of the web GUI.  I would suggest you keep incrementing the last part of the MAC for additional HAHs, i.e. 00:07:3A:11:22:02 and 00:07:3A:11:22:03 for a second HAH.

Duplicate MAC addresses will also cause problems for your internet router or any other network device you might have.  You may even find your router losing its internet connection when you have duplicate MAC addresses.
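
The numbering scheme above can be sketched as a small shell helper that adds an offset to the last octet of a MAC address. This is only an illustration: `mac_add` is a hypothetical name, and wrapping at FF is my own assumption for what to do when the last octet runs out.

```shell
# Hypothetical helper: print MAC address $1 with $2 added to its last octet.
# The result wraps around at FF (assumption, not HAH behaviour).
mac_add() {
    prefix=${1%:*}   # everything before the last colon
    last=${1##*:}    # the last octet, in hex
    printf '%s:%02X\n' "$prefix" $(( (0x$last + $2) % 256 ))
}
```

For a second HAH, `mac_add 00:07:3A:11:22:00 2` and `mac_add 00:07:3A:11:22:01 2` give the two addresses suggested above; use an offset of 4 for a third box, and so on.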

 

Karl

hodder_fisher
Offline
United Kingdom
Joined: 1 Apr 2012
Freeze problem

I am also experiencing a similar freeze problem, which I have tracked down to occurring when my router goes down (it froze today at 15:50, and the router uptime aligns give or take a few minutes).

From issue 35 noted in an earlier post by Brett it was a known issue but has now been fixed (I am running version 306),  details from xap-pachube.log are;

 

 

# cat xap-pachube.log
Pachube Connector for xAP v12
Copyright (C) DBzoo 2009-2010
[err][pachulib.c:101:resolve_host] errno 22 (Invalid argument)
[err][pachulib.c:101:resolve_host] Err!! gethostbyname
[err][pachulib.c:101:resolve_host] errno 22 (Invalid argument)

 

 

Resets ok when I reboot the HAH.

 

Any ideas ?

 

Dave

brett
Offline
Providence, United States
Joined: 9 Jan 2010
I have no ideas this time as

I have no ideas this time, as I puzzled over this for a while last time before I spotted a file descriptor leak.  Now it gets even harder...

jetjackson
Offline
Winchester, United Kingdom
Joined: 18 May 2012
:-(

I recently purchased 2 boxes with 306 on them. I've also noticed the Pachube upload freezing.  So much so that I've stopped using them to upload 'important' data for PV generation (and have gone back to the dreaded CCBridge). I still have one running from a clamp and another Envi just to keep pushing data through it, to see if it keeps breaking. When I say freezing, it's still sending data, just the same value over and over again.

Can anyone give me any pointers or leads to follow to try to understand what is going wrong? I had a quick look at the xAP traffic with the free iPad app. I can see values moving around and changing; the problem is it's hard to follow, as you have to catch it when it sticks, which will be either when I'm at work or asleep! Are there any logs / debug stuff on the box to record if it's having a problem sending data to COSM?

g7pkf
Offline
United Kingdom
Joined: 11 Jan 2011
could it be

that your whole house hysteresis is set a bit high? In that case the same value will be uploaded until a significant change (one larger than the setting) is detected.

hodder_fisher
Offline
United Kingdom
Joined: 1 Apr 2012
Temp workaround but not a fix...

I still suffer freezing of the feed, and I can normally associate this with my network connection going down.

The feed can also appear frozen at one value for another reason, so it is worth checking the hysteresis setting to make sure that is not the issue.

I work around the issue by scheduling a reboot of the HAH every 12 hours, which is OK for now (see the PVOutput link below), as all I am using Pachube for is my house consumption. My PV is still collected by a script on my PC that connects via Bluetooth to the inverter every hour through the day.  I would like to switch away from the PC, but results using a CC clamp are less accurate, so I have decided to stick with the current setup for now, although I am looking into using the CC OptiSmart sensor for a friend's PV installation.

 

http://www.pvoutput.org/intraday.jsp?id=6631&sid=5287

brett
Offline
Providence, United States
Joined: 9 Jan 2010
I have heard reports of the

I have heard reports of the feed not recovering after a network outage. Unfortunately it's going to be a couple of months before I get a chance to look at this code and test for this, as I'm 2 weeks from relocating countries. Having said that, with a stable network connection I've never experienced an outage.  Maybe it's time to change providers :)

Perhaps somebody else can take a swing at this issue and see if they can find a bug.

Brett

jetjackson
Offline
Winchester, United Kingdom
Joined: 18 May 2012
A couple of questions:- How

A couple of questions:

- How do I schedule a reboot? 

- do I need to do something to turn on logging? The files in my logs directory are empty?

andy
Offline
United Kingdom
Joined: 17 Mar 2012
Brett will probably answer

Brett will probably answer this before me, but you could either:

call reboot via crontab, or write a script for PBv2 to call reboot, or listen for a BSC message via PBv2 and reboot - lots of possibilities. Another thought - look at the web admin page; perhaps you could even post/get to the reboot button if you want to do it from wget on another system, although I've not checked this and perhaps Brett does a referrer check. I'm not near the HAH source now so can't check myself.

As for logging, I think that unless you look at Brett's new betas you may have to start processes manually to enable enough logging. Guess this is where a USB stick comes in handy...

--andy

hodder_fisher
Offline
United Kingdom
Joined: 1 Apr 2012
Reboot script

Try this link for the reboot script, all the details are on the plugboard (PB) Wiki to explain how to get the Applets working.

https://docs.google.com/leaf?id=0BwzJbOYgkNcVZGU3NDZhZjEtZWI4Mi00MjkxLTh...

Don't worry, it's easy (I managed it !)

 

Dave

andy_godber
Offline
Joined: 13 Sep 2011
Pachube Cosm Feeds still freezing?

Brett - did you manage to get this resolved? I'm on 306 and believe it's still happening.

Every day or so, my feeds stop; I'm not sure it's directly related to router/network issues though, as I can't find an exact correlation with my router going offline. As everyone says though, the feed process doesn't recover when the network is back. A simple click of the restart button on the Pachube page starts it working.

In the meantime, other coders, I wonder whether there is a better way than doing a full reboot? Maybe a check for whether the service has stopped, or just restarting it every 12 hours or so?

Thoughts?

Edit #1

I noticed that, presumably because I'd restarted the Pachube service, there were several instances of it running (as shown by ps) - I would have thought the old ones should have disappeared, so could they be hanging?

I've rebooted and will obtain screenshots as they become relevant.

garrydwilms
Offline
United Kingdom
Joined: 31 Mar 2011
Ok for me

I have a really flaky Internet connection and since Brett's alterations my Pachube recovers fine. In fact I've had no Pachube failures for as long as I can remember. So it certainly isn't a universal issue.

As for rebooting, you could just set up the cron scheduler to issue a reboot command as often as you like.

More intelligently, you could also write a script that monitors Pachube heartbeats and reboots if they stop. If you need help with this I could knock one up for you.

In fact, have you checked that the heartbeats do actually stop when the posting stops? Just wondering, as you say you are seeing multiple instances.

Garry. 

andy_godber
Offline
Joined: 13 Sep 2011
Heartbeats

I'll keep an eye out over the next 24 hours or so - usually stops overnight (UK time), so will report back later.

andy_godber
Offline
Joined: 13 Sep 2011
More info

 

Ok, stopped again this morning. This is the pachube log

# cat xap-pachube.log

Pachube Connector for xAP v12
Copyright (C) DBzoo 2009-2010

[err][pachulib.c:101:resolve_host] errno 22 (Invalid argument)
[err][pachulib.c:101:resolve_host] Err!! gethostbyname
[err][pachulib.c:101:resolve_host] errno 22 (Invalid argument)
[err][pachulib.c:101:resolve_host] Err!! gethostbyname

etc

I can't see that my network has dropped out, but I wonder if the hostname resolution is failing (as above)?
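
To see how often this is happening, a minimal sketch along these lines counts the resolve_host error lines in the log. The helper name `count_resolve_errors` is made up, and it simply assumes the log format shown above.

```shell
# Hypothetical helper: print how many resolve_host error lines log file $1 contains.
count_resolve_errors() {
    grep -c 'resolve_host' "$1"
}
```

Something like `count_resolve_errors /var/log/xap-pachube.log`, run before any restart or reboot (the log is lost on reboot), gives a quick number to compare between freezes.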

The xAP Pachube heartbeat is still running.

Prior to restarting, there is only one copy of

145 root      1080 S    /usr/bin/xap-pachube -i br0

After restarting, there are two:

145 root      1080 S    /usr/bin/xap-pachube -i br0

204 root      1080 S    /usr/bin/xap-pachube -i br0

 

So I'm guessing the first one is hanging, although that's open to challenge because the heartbeat is still running. Also, xFx Viewer now shows both heartbeats.

Can someone try a restart on their (working) version, and see if it creates another instance (as shown in PS) without closing the first?
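
For anyone trying this, a minimal sketch for counting the instances from `ps` output is below. `count_pachube` is a made-up helper name; the `[/]` in the pattern is only there so that the grep process does not match its own command line.

```shell
# Hypothetical helper: print the number of xap-pachube processes
# found in `ps` output read from stdin.
count_pachube() {
    grep -c '[/]usr/bin/xap-pachube'
}
```

Running `ps | count_pachube` before and after pressing restart shows whether the old process has gone away; going by the listings above, a count above 1 suggests a stale copy has been left behind.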

 

 

 

brett
Offline
Providence, United States
Joined: 9 Jan 2010
You said you are using

You said you are using firmware 302 - do you mean 306, as this is the latest?

After release 302 there were some Pachube buffer overrun issues that were resolved; these could cause it to misbehave if you are updating many data feeds.  That is why I ask if you are really on 302.   You want to be on 306.

Brett

andy_godber
Offline
Joined: 13 Sep 2011
yes 306

Yes, I mentioned I was on 306 - the OP was on 302 some months ago.

EJ-Ambient
Offline
Ringwood, United Kingdom
Joined: 5 Aug 2011
More Pachube freezes

Hi Brett

Just to stick my 2 cents worth in - my main home HUB (not instanced) freezes whenever my (Sky) internet hiccups, however the remote HUB (instanced) recovers!!!!

The main HUB has many CC and weatherstation items being posted via the HUB page, plus I'm posting a dozen more to ID's on a couple of Feeds via Plugboard...... the remote HUB only has three items being posted via the HUB Pachube page.

BUT - it's not consistent.... I just had a 20 second loss of service and neither HUB recovered gracefully.....had to re-start Pachube on both (waited 15mins to make sure it wasn't auto recovering)!!!

I don't always get anything in the Pachube log, but occasionally there are err 22 and err 146 in the log.

It's become more noticeable recently as Sky have a very flakey service in my area - which they blame on the 'unseasonable wet weather' !!!!!!

Anyway - over the past 14 days I've had 26 freezes, fixed by 18 reboots and 8 Pachube restarts.  The Pachube heartbeat doesn't always cease so I can't use that as a trigger..... can anyone point me to a Twitter applet?

hope you can get to the bottom of this......cheers for all your good work......EJ

andy_godber
Offline
Joined: 13 Sep 2011
EJ - Slow?

EJ - does your Pachube tab in HAH seem slow when you add or delete entries?

EJ-Ambient
Offline
Ringwood, United Kingdom
Joined: 5 Aug 2011
Slow Pachube Tab

Exceedingly - to the extent that I don't dare touch the tab (add, amend or delete) anymore!  I lost several ID's, Feeds, etc whenever I tried to change anything so I resorted to my Plugboard applets to  post additional ID's to Pachube.... Brett wants/was going to rehash the ini system at one stage, so I will wait for that....EJ

andy_godber
Offline
Joined: 13 Sep 2011
Likewise

I've just gone into the .ini file, and it's a mess - it seems that every time you add/delete/change an entry, it's not very good at tidying up. So I've cleared it all out and am just going to re-enter everything again. I'm not sure it'll cure the random freezing, but it might speed up the tab.

brett
Offline
Providence, United States
Joined: 9 Jan 2010
Sometimes the web interface

Sometimes the web interface makes a bit of a dog's breakfast of the .ini file.   Once I get settled (one more house move to go) I'll be able to spend some time investigating these problems.  The Pachube one is curious.

The only thing that comes to mind is that if you have a really prolonged outage the amount of log that is produced can fill up the /var/log area and break things due to out of "disk".  As this is a ram disk a reboot clears this and then we are good.
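
A quick way to test this theory before rebooting is to look at how full the filesystem holding /var/log is. The sketch below assumes the box's `df` accepts `-P` (POSIX output format), which I have not verified on the HAH firmware; both helper names are made up.

```shell
# Hypothetical helper: print the Capacity figure (without the % sign)
# from `df -P` output read on stdin.
df_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when the filesystem holding path $1
# is at or above $2 percent full.
fs_nearly_full() {
    pct=$(df -P "$1" | df_use_pct)
    [ "$pct" -ge "$2" ]
}
```

For example, `fs_nearly_full /var/log 90 && echo "log area nearly full"` could be run by hand the next time the feed freezes.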

Brett

EJ-Ambient
Offline
Ringwood, United Kingdom
Joined: 5 Aug 2011
Date/time refresh !

So it happened again - while I was watching the xfx viewer and the HubGUI......

1) GoogleCal went dead.... (orange box in the view list)

2) Pachube went dead .... ( ... ditto ... )

3) COSM/Pachube froze on the last transmitted numbers....

4) Nothing in any log file!!!!

So I looked at the HUB - all appeared to be functional.  The PC on the same LAN was able to access the internet so I decided to try a Pachube refresh from the Hub GUI..... the Pachube icon in xFx viewer showed OK and it listed the heartbeat stop and start so I left it for a while.... the heartbeats were not every 60 - they came at about 90, then 75, then 80, then 70...all over the place.... and the COSM site was not updating!!!

This time I rebooted from the GUI..... everything looked OK (all heartbeats OK), but the COSM site was still not updating..... then I glanced at the Hub time stamp.... Jan 01 1970..... the Hub had rebooted but hadn't got a good NTP time sync.... rebooted again... bad time/heartbeat.... and again.... bad time/heartbeat.....left for 30 mins then rebooted..... bad time/heartbeat.... left an hour then rebooted....all OK, time in sync - everything heartbeating OK and the COSM site was being updated...!!!!!

I'm now constantly looking at my COSM Twitter feed to see if the ID has frozen, but as it can take up to 30 mins for COSM to respond to a freeze, it's not exactly rapid response....

Hope this helps....

brett
Offline
Providence, United States
Joined: 9 Jan 2010
That is certainly

That is certainly interesting.

On a reboot what can happen is that if the network is down, or at least DNS resolution can't occur, then the script (/usr/share/udhcpc/default.script) that executes when a DHCP IP request is received may time out too, leaving your system with the wrong date/time.

So it's possible that your router is up and handing out DHCP IP addresses while the WAN (internet) is still down, which causes the ntpclient to hang before timing out.   At this point the system will come up, BUT Pachube will continue to fail as the internet is still down (i.e. gethostbyname will throw the error that you see in the logs) because DNS resolution is still failing.

What it does not explain is why none of this recovers - certainly anything that needs SSL (ie SSH & HTTPS) will not work until the time is synced up, that is xap-googlecal and xap-twitter will both be hosed.

I'm not sure of the behaviour of the ntpclient once it times out. I don't believe it will try again, which would require a further reboot (as you mentioned).
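
A simple way to tell whether the box came up without a time sync is to check that the clock is not still sitting near the 1970 epoch. The sketch below is only an illustration: `clock_looks_synced` is a made-up helper, and the 2012 cut-off is an arbitrary assumption that can be changed via `MIN_YEAR`.

```shell
# Hypothetical helper: succeed when the given year (default: the current
# system year) is not earlier than MIN_YEAR (assumed default: 2012).
clock_looks_synced() {
    year=${1:-$(date +%Y)}
    [ "$year" -ge "${MIN_YEAR:-2012}" ]
}
```

After a reboot, `clock_looks_synced || echo "clock not set, NTP sync probably failed"` would flag the Jan 01 1970 case without having to spot it in the GUI.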

The irregular heartbeats are unusual but I'm not sure what that is indicative of - perhaps some sort of network load?

You can try using a different NTP host for syncing the time if uk.pool.ntp.org is playing up (I'd find that very unusual). Perhaps your service provider has a closer source: ntp.sky.co.uk or ntp1.isp.sky.com - you might want to manually check that those work before changing your box to use them.   They resolve for me but I could not sync to them - but then again I'm not on a Sky network and perhaps they lock them down.

# ntpdate -s -h <NTPSOURCE>

Brett
