Yun - on long REST API delays and diagnosing related LuCI problems

I have already posted some of this in an old thread here, but @jessemonroy650 quickly educated this newbie :) that it is actually preferred to start a new thread.

Like many of you, I was initially excited when I unpacked that shiny new Arduino Yun and my project worked, but later disappointed with the performance of Yun's REST API interface. The project itself is not really important; it's just the usual IoT: control something (in my case a camera, two servos, motion sensor, and relay) remotely over WiFi through a web page. I press a web page button, or type a URL into browser's address bar, or even use curl on the Yun itsef - it doesn't matter which - and the Yun is dog ssllooooowwwww... Between this trigger and the expected effect (e.g, camera moving, or lights turning on) Yun was taking its sweet time - up to 30 seconds! For all practical purposes, my project was useless…

First, I eliminated all obvious reasons (network? check, router? check, interference? none, etc.) then I embarked on an intense Google search, of course, finding that many had similar problems. Among others, the link mentioned above, which contains lots of good advice. I tried this and that. In short, nothing worked. Frustrating…

Yesterday, I set other projects aside and firmly decide to find what's going on with my Yun, no matter what. Where to start? Well, I SSH into Yun and start monitoring the CPU usage:

top -d1

OK, I see uhttpd, the HTTP daemon, at 0% CPU (good), mjpg_streamer at 0% (good), "top -d1" at 2% (expected), "python -u bridge.py" at 8-10% (a bit high, but perhaps not unusual?), and "/usr/bin/lua /www/cgi-bin/luci" competing with bridge.py for the top spot: once appearing at ~30%, then disappearing, and appearing again, every two seconds. Moreover, its PID is different every time. Hmmm… Is this supposed to be like this? Unfortunately, "I ain't know nuthin'" about web server programming, I'm just a total newbie. :(

Now, from my Internet surfs I already know that LuCI is a web user interface that generates web pages delivered to the user as well as handles REST requests and talks to Arduino's Bridge, written in an exotic programming language called "Lua". Hence, the strange jumping process is definitely spawned by uhttpd and deserves some special attention.

Next, I open my project's web page and press one button (light switch, I think). As expected, the Yun server moves with the "speed of government" ;) but while I wait I closely observe top's output: a second "/usr/bin/lua /www/cgi-bin/luci" comes up at 34%, in addition to the first jumpy one, and stays at the very top for a (long) while, with the same PID. Gotcha! Either this is the culprit, or it waits for something else down the line. So I check this "/www/cgi-bin/luci" thing - it's a script:

#!/usr/bin/lua
require "luci.cacheloader"
require "luci.sgi.cgi"
luci.dispatcher.indexcache = "/tmp/luci-indexcache"
luci.sgi.cgi.run()

It uses "/tmp/luci-indexcache", so I "cat" it - it's a binary file, unfortunately. But while displaying the contents of /tmp directory I notice something else - a sub-directory named "luci-sessions". It contains hundreds of randomly named files; each of them stores a session cookies/tokens/whatever. Hundreds? I think about it for a while. Well, my Yun has been up for a while, after all. So I delete all of them (it's /tmp, right?) and press the project's button once again…

Bingo! The lights turn on almost instantly! Well, there is about 1 second delay, but it's infinitely better than before! Wow. That's it, then. LuCI must be busy scanning all the session files, perhaps searching for the right one? (It would be a poor design, if you ask me.)

My joy is short lived, though. I check luci-sessions again and notice new files being generated fast. What the ? After 5 minutes Yun's server responds noticeably slower, after 30 minutes it's back to its old ways… What is generating these files? Well, it's the jumpy "/usr/bin/lua /www/cgi-bin/luci" process, naturally, but what is spawning it? Well, it's uhttpd, of course, but what causes it? How to diagnose it?

In my Internet searches on LuCI I found this excellent web page with tutorials. The 6th one is about debugging Lua - definitely well worth checking out. Following its instructions, I temporarily modify the "/www/cgi-bin/luci" script to this:

#!/usr/bin/lua
require "nixio"

dbg = io.open("/tmp/luci.req", "w")

for k, v in pairs(nixio.getenv()) do
    dbg:write(string.format("export %s=%q\n", k, v))
end

dbg:write("/www/cgi-bin/luci\n")
dbg:close()

require "luci.cacheloader"
require "luci.sgi.cgi"
require "luci.sauth"
luci.dispatcher.indexcache = "/tmp/luci-indexcache"
require "luci.debug".trap_memtrace("l", "/tmp/trace")   -- (L, not one)
luci.sgi.cgi.run()

and repeat the button press. The above generates two debug files: /tmp/trace with per line code trace (but /usr/lib/lua/luci/ codes turn out to be compiled semi-binaries, so trace is not very useful) and /tmp/luci.req - the latter is a jackpot, since it contains the copy of the last GET request. I immediately recognize it, but how could that be? It comes from my Mac, but I am not doing anything! I close my project's web page, check all other tabs in my Safari browser, and make sure no Terminal windows are open… The luci-sessions are still being generated. >:(

In the end, a very careful analysis of Activity Monitor reveals a zombie Safari process, running without GUI, sending REST requests to my Arduino every 2 seconds. :( And since I've been taking an on-line electronics course lately and did not want to lose continuity, I was keeping my Mac either on or in sleep mode for weeks… Mystery explained.

If you survived this long story, up to here, thank you. If you are a newbie, like me, I hope you at least learned how to diagnose LuCI hiccups and will find source(s) of your Yun problems, whatever they might be.

aromring: I have already posted some of this in an old thread here, but @jessemonroy650 quickly educated this newbie :) that it is actually preferred to start a new thread.

::::SNIP::::

In the end, a very careful analysis of Activity Monitor reveals a zombie Safari process, running without GUI, sending REST requests to my Arduino every 2 seconds. :( And since I've been taking an on-line electronics course lately and did not want to lose continuity, I was keeping my Mac either on or in sleep mode for weeks… Mystery explained.

If you survived this long story, up to here, thank you. If you are a newbie, like me, I hope you at least learned how to diagnose LuCI hiccups and will find source(s) of your Yun problems, whatever they might be.

@aromring, that was excellent detective work. I am very happy to see persons like yourself going to such extremes. FWIW: we know about the Safari browser keeping the sockets open, and doing "shadow reads", we have even see it do "pre-fetches".

The fact that you figured this out on your own amazes me. I'm impressed. :)

Jesse

Great perseverance, and excellent work! Thank you for sharing the results of your efforts, I'm sure it will help someone.

Let me (once again) display my ignorance and ask if this problem is specific to using LUA on the Yun ?

...R

Robin2:
Let me (once again) display my ignorance and ask if this problem is specific to using LUA on the Yun ?

I’d say the problem is specific to LuCI, not Lua. The LuCI code is not very good at cleaning its own messes. Here, look at what I’ve found in the source code of sauth.lua from the 0.12 release: there is a function reap() for removing expired sessions. It calls read(id) for each session, which in turn figures out which session is expired in the following line:

if sess.atime and sess.atime + sessiontime < luci.sys.uptime() then
        kill(id)

In other words, if the starting session time is more than “sessiontime” seconds (I believe) into the past, then delete it. The “sessiontime” is a configurable parameter in /etc/config/luci (if I remember correctly - I am away from my Yun) equal to 3600 seconds, by default. Thus, OpenWRT assumes that a web session to a router running LuCI occurs not much more frequently than once per hour. Pretty reasonable, since you don’t login too often into your router to perform administrative tasks.

Now, this stands at cross purposes with Arduino Yun. Say, you have running JavaScript that constantly polls your Yun through REST API for sensor data every 2 seconds - the number of session files accumulates fast and pretty soon clogs the web traffic to the point where response delay is longer, much longer than 2 seconds… See the picture?

Yes, you could change “sessiontime” down to 1 second, for example, but then you’ll be kicked out frequently when you do log into your Yun’s admin interface.

I am thinking about submitting a bug report to LuCI team, but I am afraid Arduino Yun is just too niche to get developers’ attention…

@aromring, Thanks very much for your detailed explanation.

I will stick with using a Python based server such as Bottle or CherryPy.

...R

@Robin2, What you write is interesting (and I have seen mentions of this before). To be clear, you took your Yun, uninstalled the uHttpd/LuCI combo and replaced it with a Python based server - correct? And does it solve all the performance issues? Is there a published tutorial on how to do such a switch?

Sorry for my ignorance this time, the whole world of web programming is new to me.

In the meantime, I dedicated my lunch break to finding where the REST bottleneck is in LuCI. I located it in the dispatcher.lua code, lines 375-384:

375                      if not sess then
376                         local token = luci.sys.uniqueid(16)
377                         sauth.reap()
378                         sauth.write(sid, {
379                             user=user,
380                             token=token,
381                             secret=luci.sys.uniqueid(16)
382                         })
383                         ctx.urltoken.stok = token
384                     end
385

The likely culprit here is line 377 - the sauth.reap() function located in sauth.lua :

116 --- Remove all expired session data files
117 function reap()
118     if sane() then
119         local id
120         for id in nixio.fs.dir(sessionpath) do
121             if _checkid(id) then
122                 -- reading the session will kill it if it is expired
123                 read(id)
124             end
125         end
126     end
127 end

As I suspected, it loops over all files in /tmp/luci-sessions with every REST call... Not nice.

So far, I've come up with a quick and dirty workaround to likely enhance the performance of YunServer:

1) First, locate dispatcher in /usr/lib/lua/luci and make a backup copy (e.g., dispatcher.bkp) 2) Download the dispatcher.lua source code in its place 3) Comment out line 377 4) Have another, external process (cron?) periodically clean /tmp/luci-sessions

Haven't tried it, yet. I will welcome any and all better ideas.

aromring: To be clear, you took your Yun, uninstalled the uHttpd/LuCI combo and replaced it with a Python based server - correct?

No. Nothing so sophisticated.

I just ignored uhttpd and used the Python server for my app.

I am a bit of a Philistine. I first wrote a web application using Ruby and Rails then gradually migrated to Ruby and Sinatra followed by Python and Bottle. I don't really know how to use uhtttpd or apache properly. I try to find solutions that keep my life simple.

...R

Wow, I was pulling my hair out until I found this thread. Thank you for your detailed write up! It really did help me. I discovered after running top that my luci service was running high, so I did a google search for “Yun luci high cpu” and bingo, found this thread.

I’m going to dig a little deeper and see if I can’t find a log file with IP addresses of sessions. I’d like to know which of my devices may be running phantom sessions.