[Straw] Fixing async IO reliability
- From: Tuukka Hastrup <Tuukka Hastrup iki fi>
- To: straw-list gnome org
- Subject: [Straw] Fixing async IO reliability
- Date: Fri, 24 Nov 2006 23:34:27 +0200 (EET)
Hello all,
I'm a long-time happy user of Straw and started looking at the code
recently in the context of fixing a release-critical bug for Debian:
#397469: straw: Does not work with python-adns installed
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=397469
I generally prefer asynchronous IO over threaded concurrency, but in
Straw it seems to be a source of complexity and unreliability in the
networking, as seen in the code and bug reports. Perhaps the async code
can be fixed for good, perhaps the feeds should be updated in separate
worker threads. I have described a couple of issues below, all of which
would disappear if threads were used instead. Consider this an offer to
work on these issues, whether threading is to be used or not.
1. Loading feeds takes a very long time or times out
This is the problem fixed in the bug report above. Async code should only
sleep in the GTK main loop. Instead, when the ADNS library is installed
for async DNS lookups, Straw constantly sleeps for 0.1 seconds in
URLFetch.py:167:lookup_manager.poll(timeout)
LookupManager.py:164:self.queryengine.run(timeout)
ADNS.py:46:self._s.completed(timeout)
This means asyncore.poll is hardly ever called in the following lines of
URLFetch.py, which causes feed loading to stall. Changing the timeout
parameter given to ADNS to 0 fixes this issue.
The if statement around asyncore.poll and the non-zero timeout disrupt
fair scheduling and don't help much, so they should be removed as well.
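The zero-timeout idea can be illustrated outside Straw. This is only a
minimal sketch, assuming plain sockets and the standard select module
rather than Straw's actual ADNS and asyncore wrappers; poll_ready is a
hypothetical helper name, not something from the Straw code:

```python
import select
import socket


def poll_ready(socks):
    """Return the sockets that are readable right now, without sleeping."""
    # A timeout of 0 makes select() return immediately, so the only place
    # the application ever blocks is the GUI main loop that calls this.
    readable, _, _ = select.select(socks, [], [], 0)
    return readable


# Small demonstration with a connected socket pair.
left, right = socket.socketpair()
nothing_pending = poll_ready([left])   # no data has been sent yet
right.sendall(b"x")
data_pending = poll_ready([left])      # now left has a byte to read
left.close()
right.close()
```

Called from a periodic GTK callback, a poller like this never holds up
the other pollers that run after it.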
2. Limited download speed
NetworkConstants.py currently defines
POLL_INTERVAL = 500
This means we read from the buffer at most 2 times per second, giving a
maximum download speed of 10 kilobytes per second per connection here. Not
only do large feeds load slowly, but the slow transfer also ties up
resources on the remote server.
The problem is alleviated by changing POLL_INTERVAL to something like 10
which gives a maximum download speed of 100 kilobytes per second - still
far from the over 700 kilobytes per second that wget gives here.
A proper fix would mean integrating the GTK, asyncore and ADNS main loops
into one, for example by registering the asyncore and ADNS connections as
GTK watches. I have been using PyGTK and Twisted Python together with good
results, so perhaps an option would be to switch from asyncore to Twisted,
although this wouldn't fix the rest of the issues.
3. Feed URIs with IP addresses don't work, IPv6 and /etc/hosts don't work
ADNS is meant for DNS lookups in server software, not for name resolution
in desktop applications. A user can expect IP addresses, IPv6, /etc/hosts
etc. to work as in every other app, but they can't work unless we use the
system resolver instead of ADNS. As a fix, IP addresses can be
special-cased, ADNS may get IPv6 support, and the rest could be listed as
a known "feature".
On the other hand, using threads we can call getaddrinfo in the Python
libraries, which corresponds to the respective POSIX function and uses the
system resolver.
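As a rough illustration of the threaded lookup, here is a minimal sketch
using only the standard socket and threading modules. The names resolve
and resolve_in_thread are hypothetical helpers, not part of Straw, and a
real GUI would be notified of completion instead of joining the thread:

```python
import socket
import threading


def resolve(host, port, results):
    """Resolve host with the system resolver and store the outcome."""
    try:
        infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the textual address for both IPv4 and IPv6.
        results[host] = [info[4][0] for info in infos]
    except socket.gaierror as exc:
        results[host] = exc


def resolve_in_thread(host, port=80):
    """Run the blocking lookup in a worker thread and return its result."""
    results = {}
    worker = threading.Thread(target=resolve, args=(host, port, results))
    worker.start()
    worker.join()  # for the sketch only; a GUI would not block here
    return results[host]
```

Because getaddrinfo goes through the system resolver, a literal such as
"127.0.0.1" or a name from /etc/hosts needs no special-casing.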
4. Feed parsing hangs everything
This isn't quite an IO problem, but it would get fixed with threads.
Another way would be to patch feedparser.py to use the incremental
interfaces of xml.sax and sgmllib and feed the content in reasonably
sized pieces.
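The incremental approach might look roughly like this for the xml.sax
side. It is only a sketch, not a patch to feedparser.py: TitleCollector
and parse_in_chunks are hypothetical names, and the handler collects
title elements purely as an example:

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


def parse_in_chunks(data, chunk_size=16):
    """Feed data to the parser piece by piece and return the titles."""
    handler = TitleCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for start in range(0, len(data), chunk_size):
        parser.feed(data[start:start + chunk_size])
        # Control could return to the main loop between chunks here.
    parser.close()
    return handler.titles
```

The chunk size is tiny here only to show that elements split across
chunks are still parsed correctly.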
I suppose changing from asyncore and ADNS to threading would require small
changes all over the code. However, the functionality is mainly in
LookupManager.py, URLFetch.py, and SummaryParser.py. I'd hope the
threading model wouldn't get too complex if a worker thread were spawned
for each feed update. The thread could independently perform hostname
lookup, content downloading and feed parsing, after which the main
thread would insert the results into the user interface.
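The one-thread-per-update model could be sketched as follows. This is an
assumption about how it might be structured, not existing Straw code:
fetch and parse are hypothetical stand-ins for the download and parsing
steps, and a queue carries the results back to the main thread:

```python
import queue
import threading


def update_feed(url, fetch, parse, results):
    """Worker: fetch and parse one feed, then hand the outcome over."""
    try:
        results.put((url, parse(fetch(url)), None))
    except Exception as exc:  # report the failure instead of dying silently
        results.put((url, None, exc))


def update_all(urls, fetch, parse):
    """Start one worker per feed and gather the results in this thread."""
    results = queue.Queue()
    workers = [
        threading.Thread(target=update_feed, args=(url, fetch, parse, results))
        for url in urls
    ]
    for worker in workers:
        worker.start()
    # Collect exactly one result per feed. In a GUI this loop would be a
    # periodic callback in the main loop that drains the queue instead.
    collected = {}
    for _ in urls:
        url, items, error = results.get()
        collected[url] = error if error is not None else items
    for worker in workers:
        worker.join()
    return collected
```

Only the main thread touches the collected results, so the user
interface code itself would need no locking.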
Would you think these ideas are of any use in the further development of
Straw?
Regards,
Tuukka Hastrup
--
-- Trying to catch me? Just follow up my Electric Fingerprints
-- To help you: Tuukka Hastrup iki fi
http://www.iki.fi/Tuukka.Hastrup/