Error messages in Pachube log
Hi all
I'm having a bit of a hard time nailing down a Lua/scripting/Pachube error.
I have several scripts running that receive inputs from an EnviR, do some maths on the values, then post those values to Pachube on several different feeds. My problem is that one (or more) of the scripts occasionally causes Pachube to error out, and it's a fatal error: it takes a manual reboot from me to get the Pachube module to restart sending data, even though it still has a heartbeat in xFx viewer!
I've tried going back to basics with just one script running, then adding in another and another until it fails, but it fails randomly.
The xap-pachube error log shows the following, multiple times:
[err][pachulib.c:196:read_data] errno 22 (Invalid argument)
[err][pachulib.c:196:read_data] Err!! desconexion en recv
Is there a way to get the xap-pachube error log to show more detail, such as which script caused the posting error?
Also, as an aside: is there a beep() command in Lua I can use to alert me when an error or a specific event has occurred?
Hope you can help.
EJ
I found some buffer corruption, overflows and an invalid flag in the Pachube library. This is what you get for using somebody else's code. :( Try upgrading to the beta and let me know if it resolves your problems; those errors should disappear now.
I also improved the logging.
# xap-pachube -i br0 -d 6
This should let you see what is going on.
Brett
Those errors are to be expected if there are issues on the server side. It's just the client, xap-pachube, complaining that there are webserver issues when posting and retrieving data. What is happening now, though, is that it's recovering as it should. Let's monitor how this goes over a longer period; if it all looks good, I'll push up a new build with this single fix in it, as everybody will want/need this one.
Brett
I see the problem: it's a buffer overrun issue, as I suspected. However, the code I added in the beta should have prevented it from crashing; I'll find out why it didn't.
The buffer is only 1500 bytes, so this log line has to be the prime suspect for the crash:
[inf][pachulib.c:154:send_data] send 1505 bytes
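As an illustration of the kind of guard that stops a 1505-byte payload from overrunning a 1500-byte buffer, here is a minimal C sketch (C because pachulib.c is C). `SEND_BUF_SIZE` and `append_checked` are hypothetical names, not pachulib.c's actual code; the point is only that a bounds check refuses the write instead of corrupting memory.

```c
#include <string.h>

/* Hypothetical size, matching the 1500-byte buffer mentioned above. */
#define SEND_BUF_SIZE 1500

/* Append a NUL-terminated string to buf, which currently holds *len bytes.
 * Returns 0 on success, or -1 if the text would not fit; in that case the
 * buffer is left unchanged instead of being overrun. */
static int append_checked(char *buf, size_t *len, const char *text)
{
    size_t n = strlen(text);
    /* Reserve one byte for the terminating NUL. */
    if (*len + n + 1 > SEND_BUF_SIZE)
        return -1;
    memcpy(buf + *len, text, n + 1);
    *len += n;
    return 0;
}
```

Refusing the write only avoids the crash; the data still isn't sent, which is why a larger or growable buffer is the real fix.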
The library I'm using can't handle pushing a large number of endpoints up in a single transaction, which is how I'm using it. I'll have to rework some bits so that a dynamic array is created to handle the larger buffer sizes needed when multiple readings are all pushed up together. I could just increase the 1500 bytes to a larger number, but that would only delay the crash to some later point. I'll fix this properly.
Thanks, that was a great help; the extra debug output I added proved its worth.
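A dynamic buffer of the kind described above can be sketched as follows. This is a minimal illustration, assuming the payload is built up as NUL-terminated text; the `dynbuf` type and its functions are hypothetical, not the code that went into the fix.

```c
#include <stdlib.h>
#include <string.h>

/* A growable text buffer: a hypothetical stand-in for the fixed
 * 1500-byte array, sized to whatever the payload needs. */
typedef struct {
    char  *data;
    size_t len;  /* bytes used, excluding the terminating NUL */
    size_t cap;  /* bytes allocated */
} dynbuf;

static void dynbuf_init(dynbuf *b)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Append text, doubling the allocation as needed.
 * Returns 0 on success, -1 if memory could not be allocated
 * (the existing contents stay valid in that case). */
static int dynbuf_append(dynbuf *b, const char *text)
{
    size_t n = strlen(text);
    size_t need = b->len + n + 1;      /* +1 for the NUL terminator */
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 256;
        while (cap < need)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, text, n + 1);
    b->len += n;
    return 0;
}

static void dynbuf_free(dynbuf *b)
{
    free(b->data);
    dynbuf_init(b);
}
```

Doubling the capacity on each growth keeps the average cost of an append constant, and there is no fixed ceiling to hit as more datastreams are added.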
Brett
Update to 305.2. I'm pretty sure I've nailed it this time, given the information from the logs you provided.
Brett
endpoints could have caused this crash?
Once you log more than 1500 bytes' worth of Pachube IDs, it overflows an internal buffer. The more you log, the more likely you are to hit this boundary, and EJ is feeding lots of stuff. Also, don't forget that the labels and unit types take up space in this precious buffer too, until BANG, it overflows.
I used the word "endpoint"; perhaps I should have said "Pachube datastream". The more datastreams you have in a feed, the more likely you are to hit this arbitrary buffer limit.
Each feed is sent up to Pachube as a group of datastreams, and the XML representing their new data values all had to fit inside this tiny buffer. It was a bad design decision by the author of this library, which I've now corrected.
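To see how quickly labels and units eat into 1500 bytes, the C sketch below measures the serialized size of each datastream with `snprintf(NULL, 0, ...)`, which returns the would-be length without writing anything. The XML element layout is an assumption for illustration, not the exact format the library sends, and `datastream_xml_len`/`feed_xml_len` are hypothetical helpers.

```c
#include <stdio.h>

/* One datastream: its ID, label, unit and value all end up in the XML. */
typedef struct {
    int         id;
    const char *label;
    const char *unit;
    double      value;
} datastream;

/* Bytes needed to serialise one datastream (excluding the NUL).
 * The element layout is an illustrative assumption. */
static int datastream_xml_len(const datastream *d)
{
    return snprintf(NULL, 0,
                    "<data id=\"%d\"><tag>%s</tag>"
                    "<value>%.2f</value><unit>%s</unit></data>",
                    d->id, d->label, d->unit, d->value);
}

/* Total bytes for a whole feed, since all of its datastreams
 * are sent together in one request. */
static long feed_xml_len(const datastream *ds, int count)
{
    long total = 0;
    for (int i = 0; i < count; i++)
        total += datastream_xml_len(&ds[i]);
    return total;
}
```

With this assumed layout, a datastream with ID 0, label "Power", unit "W" and value 123.45 takes 71 bytes, so 22 such datastreams (1562 bytes) already exceed 1500 before any surrounding document wrapper is counted; longer labels or units lower that number further.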
Brett
EJ, there was nothing you could have done to fix this bug, bar splitting your datastreams out into individual feeds to work around the issue. It's not something you should have been working around anyway, as it's MY bug. I think you should find it's solid now. We'll leave it running for a few more days.
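For reference, the workaround mentioned above (splitting datastreams across feeds so that each request stays under the limit) amounts to a simple packing step. A minimal greedy sketch, assuming each datastream's serialized size is already known; `split_into_feeds` is hypothetical and not part of xap-pachube.

```c
#include <stddef.h>

/* Walk the per-datastream sizes in order and start a new group (i.e. a
 * separate feed) whenever adding the next one would exceed `limit`.
 * group_of[i] receives the group index of datastream i.
 * Returns the number of groups, or -1 if a single datastream is larger
 * than the limit on its own. */
static int split_into_feeds(const size_t *sizes, int count, size_t limit,
                            int *group_of)
{
    int group = 0;
    size_t used = 0;
    for (int i = 0; i < count; i++) {
        if (sizes[i] > limit)
            return -1;
        if (used + sizes[i] > limit) {
            group++;            /* current feed is full: open a new one */
            used = 0;
        }
        used += sizes[i];
        group_of[i] = group;
    }
    return count > 0 ? group + 1 : 0;
}
```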
Brett
Regarding the accumulating errors when the ISP connection drops: yes, these would build up in the error logs. I realized this before I checked in 306, which downgrades them from errors to warnings so they are no longer logged. This solves the problem, and as the system recovers from them, they aren't really errors.
The logs are not in persistent storage, so if you need to examine what went wrong, do so before a reboot.
I could copy them into /etc on a reboot, or rotate them, but things should not go wrong in a way that means you need them!
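The effect of downgrading a message from error to warning can be sketched with a severity threshold. This is a minimal illustration with hypothetical level names and a hypothetical `log_msg` function (levels numbered like syslog's, lower meaning more severe), not xap-pachube's logging code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical severity levels: lower value = more severe. */
enum level { LVL_ERR = 3, LVL_WARNING = 4, LVL_INFO = 6 };

/* Messages less severe than this threshold are dropped. */
static enum level log_threshold = LVL_ERR;

/* Write a formatted message if it passes the threshold.
 * Returns 1 if it was written, 0 if it was filtered out. */
static int log_msg(FILE *out, enum level lvl, const char *fmt, ...)
{
    if (lvl > log_threshold)
        return 0;
    va_list ap;
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    return 1;
}
```

A recoverable condition such as a dropped connection, logged at `LVL_WARNING`, no longer fills the log while the threshold stays at `LVL_ERR`, yet it reappears as soon as the threshold is raised for debugging.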
The error above means that you sent something to Pachube and the code was then waiting for a response from the Pachube webserver. The response never arrived, so an error was reported, which is reasonable. It should not, however, crash after doing this. Are you sure that at this point your HAH still has internet connectivity to Pachube? You might want to verify that once you get a "lock up", with something as simple as this:
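Reporting an error on a missing response, without hanging or crashing, comes down to bounding the wait. A minimal POSIX sketch, assuming a connected stream socket; `recv_with_timeout` is a hypothetical helper, not pachulib.c's `read_data`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Wait at most `seconds` for a reply on socket fd.
 * Returns bytes received (>0), 0 if the peer closed the connection,
 * or -1 on timeout or error (errno says which). The caller can log the
 * failure and retry later instead of crashing. */
static ssize_t recv_with_timeout(int fd, char *buf, size_t len, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0)
        return -1;
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    return n;
}
```

On timeout `recv` returns -1 with `errno` set to `EAGAIN` or `EWOULDBLOCK`; the caller can log that as a warning and try again on the next posting cycle.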
# wget http://www.pachube.com/
I should translate the error messages from Spanish to English too, I guess ("desconexion en recv" means "disconnection in recv"); I borrowed this library.
If you are still seeing a heartbeat, then the process is still running. If, however, it's no longer sending/receiving data, then that is more likely a connectivity issue to the Pachube webserver. Is your ISP doing something nasty?
Brett