>> My typical worker thread completes its task in 50-250 ms.
>
>If the tasks complete in that time, why do you have to fork a thread at all ?
>Perhaps sometimes they take much longer and you can't predict in advance when
>that will happen ?
[me:]
> > If the tasks complete in that time, why do you have to fork a thread at
> > all ? Perhaps sometimes they take much longer and you can't predict in
[quoted text clipped - 6 lines]
> requests are served as fast as possible, and requests are
> asynchronous.
That makes sense -- assuming that the DB can service, say, 10 tasks
simultaneously faster than the same 10 sequentially.
> > Anyway, as others have said, the time and resources consumed per thread
> > startup/death are very system-dependent. Personally, if I wanted such
[quoted text clipped - 7 lines]
> that to garbage collector, which in turn burdens processor), but
> you're mostly using less memory.
Agreed, but I suspect that you are thinking of a thread pool where threads hang
around for long periods if unused. I'd use a timer to retire threads that had
been idle for <some small period of time>. As a first, completely arbitrary,
number to plug into the design, I would probably choose 5 seconds.
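A minimal sketch of such a self-retiring pool, assuming java.util.concurrent
(Java 5 or later; the lambda syntax used here needs Java 8). The class name
IdleTimeoutPool and the helper simulatedQuery are made up for illustration, and
the 100 ms sleep merely stands in for the real database work. Setting the core
size to zero and handing tasks over through a SynchronousQueue means a new
thread is created only when no idle one is waiting, and every thread retires
once it has been idle for the given time-out.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class IdleTimeoutPool {
    // Builds a pool with no permanent threads. A task is handed to an idle
    // thread if one is waiting; otherwise a new thread is started for it.
    // Threads that stay idle for idleTimeout are allowed to die.
    static ThreadPoolExecutor create(long idleTimeout, TimeUnit unit) {
        return new ThreadPoolExecutor(
                0,                  // corePoolSize: keep nothing alive forever
                Integer.MAX_VALUE,  // maximumPoolSize: grow on demand
                idleTimeout, unit,  // how long an idle thread may linger
                new SynchronousQueue<Runnable>());
    }

    // Hypothetical stand-in for a worker task that takes roughly 100 ms.
    static void simulatedQuery() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // 5 seconds is the arbitrary first guess from the discussion.
        ThreadPoolExecutor pool = create(5, TimeUnit.SECONDS);
        for (int i = 0; i < 10; i++) {
            pool.execute(IdleTimeoutPool::simulatedQuery);
        }
        System.out.println("threads after burst: " + pool.getPoolSize());
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

This is essentially what Executors.newCachedThreadPool() does, except that
that factory uses a 60-second time-out rather than 5 seconds.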
> But does it take more (memory and/or processor) to have a pool of 50
> threads, or does it take more (processor) to fork threads and gc them?
How many DB queries are typically in-flight at any one time, and how much does
that figure vary with time ? You don't have to make your pool big enough that
any request can be satisfied from the pool, since there is nothing to stop you
creating a new thread /if/ there isn't one already available. Given the figure
of 50 that you mention, and given the short time taken for each request, it
seems that you anticipate large numbers of simultaneous DB queries, and that
they must be fired off rather often. That being the case, an even shorter
time-out might make the pool tune itself more effectively.
> But I'm just guessing - and I would have already benchmarked that if
> it was simple. :)
Agreed ;-)
What I would do is write a pool implementation (including the timeout), and add
instrumentation to it to monitor how many idle threads existed at any one time.
Then run it under a production load[*] and see directly what the costs were.
([*] Assuming that's possible -- and if you can't at least simulate a rough
equivalent to a production load then you're reduced to guessing anyway...)
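One way the instrumentation could look, as a rough sketch only: it assumes the
pool is a java.util.concurrent.ThreadPoolExecutor, and the class name
IdleThreadMonitor and its methods are invented for this example. A background
sampler periodically computes pool size minus active count and remembers the
largest value seen. Note that getActiveCount() is documented as approximate, so
the figures are indicative rather than exact.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class IdleThreadMonitor {
    private final ThreadPoolExecutor pool;
    private final AtomicInteger maxIdleSeen = new AtomicInteger();
    private final ScheduledExecutorService sampler =
            Executors.newSingleThreadScheduledExecutor();

    IdleThreadMonitor(ThreadPoolExecutor pool) {
        this.pool = pool;
    }

    // Threads that currently exist in the pool but are not running a task.
    int idleNow() {
        return Math.max(0, pool.getPoolSize() - pool.getActiveCount());
    }

    // Samples the idle count every periodMillis and tracks the maximum.
    void start(long periodMillis) {
        sampler.scheduleAtFixedRate(
                () -> maxIdleSeen.accumulateAndGet(idleNow(), Math::max),
                0, periodMillis, TimeUnit.MILLISECONDS);
    }

    int maxIdleSeen() {
        return maxIdleSeen.get();
    }

    void stop() {
        sampler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 5, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        IdleThreadMonitor monitor = new IdleThreadMonitor(pool);
        monitor.start(50);
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> { /* real work would go here */ });
        }
        Thread.sleep(500);
        System.out.println("max idle threads seen: " + monitor.maxIdleSeen());
        monitor.stop();
        pool.shutdown();
    }
}
```

Under a production (or simulated) load, logging maxIdleSeen() alongside the
request rate would show directly how many threads sit around unused for a given
time-out.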
-- chris
Eric Sosman - 20 Apr 2006 15:09 GMT
Chris Uppal wrote on 04/20/06 08:24:
> Agreed, but I suspect that you are thinking of a thread pool where threads hang
> around for long periods if unused. I'd use a timer to retire threads that had
> been idle for <some small period of time>. As a first, completely arbitrary,
> number to plug into the design, I would probably choose 5 seconds.
If a Thread is sitting idle, it doesn't have much of an
effect on the operation of the program. Yes, it'll tie up
some memory -- but reclaiming the memory is only useful if
you have reason to believe some other Thread will need it.
Since the first Thread is sitting idle, chances are that the
overall load is fairly light at the moment and the memory
demand is probably also at a low ebb ...
So I'd vote for letting the idle Thread just sit there:
you needed it once, which suggests you might need it again
when the load ramps back up, so why not have it handy and
ready to roll?
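For comparison, a sketch of the "let them sit" variant, assuming
java.util.concurrent is available. The class name ResidentPool is hypothetical,
and the pool size of 50 is simply the figure mentioned earlier in the thread.
With core size equal to maximum size and core time-out left at its default
(disabled), threads never retire; prestartAllCoreThreads() creates them up
front so that none has to be started when the load arrives.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResidentPool {
    // Builds a fixed-size pool whose threads stay alive while idle.
    // Tasks beyond the thread count wait in an unbounded queue.
    static ThreadPoolExecutor create(int threads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,          // core == max: fixed size
                0L, TimeUnit.MILLISECONDS, // keep-alive unused for core threads
                new LinkedBlockingQueue<Runnable>());
        pool.prestartAllCoreThreads();     // have every thread ready to roll
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(50);
        System.out.println("threads ready: " + pool.getPoolSize());
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The cost is whatever memory 50 idle threads tie up; the benefit is that no
request ever pays for thread creation.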

--
Eric.Sosman@sun.com
Chris Uppal - 21 Apr 2006 10:32 GMT
[me:]
> > Agreed, but I suspect that you are thinking of a thread pool where
> > threads hang around for long periods if unused. I'd use a timer to
[quoted text clipped - 6 lines]
> when the load ramps back up, so why not have it handy and
> ready to roll?
I suspect that we are both approaching this from the same direction: "hack
something simple together and see if it works; only get complicated if the
simple version is inadequate", but that we differ in our ideas of what is
"simple" when applied to a thread pool.
Your position appears to be something like: don't worry about memory (or address
space) consumption unless there's a known problem. Mine is more like, just
make the damn things kill themselves, and you won't even need to /think/ about
whether there's a resource problem.
I'd rather write code than think ;-)
(BTW, on a more practical note, 5 seconds is an enormously long period by
computer standards, so if the thread has been idle that long, I'd consider that
it was /not/ likely to be needed again -- not on a timescale comparable with
the thread-creation overhead anyway.)
-- chris
Domagoj Klepac - 24 Apr 2006 14:41 GMT
>> In my concrete example, a worker task consists of several checks and
>> modifications which involve several database lookups. The database
[quoted text clipped - 5 lines]
>That makes sense -- assuming that the DB can service, say, 10 tasks
>simultaneously faster than the same 10 sequentially.
Yes, but I'm not worried about the DB. There are a number of ways to
solve DB bottlenecks, including clustering, but that doesn't have much
to do with my application.
>> But I'm just guessing - and I would have already benchmarked that if
>> it was simple. :)
[quoted text clipped - 4 lines]
>instrumentation to it to monitor how many idle threads existed at any one time.
>Then run it under a production load[*] and see directly what the costs were.
Yep, it seems that's what I'll have to do.
Domchi