Java Forum / General / November 2006

concurrency, threads and objects

Tom Forsmo - 15 Nov 2006 11:49 GMT
Hi

I have recently done some thread programming in Java; my previous
experience is with POSIX threads in C. There is one thing that puzzles
me about thread programming in Java.

In C there are no function instances (as in objects or similar things),
only function invocations. When programming threads in C there is one
function and several threads, each with its own function invocation.
In Java you can either create one object and have a number of threads
execute its run() method, or you can create one object per thread.

What puzzles me is that in a way both ways seem slightly wrong.

Creating a number of objects with a thread for each is sort of like
creating many separate programs/processes, it seems like waste of
objects to start with. Why create as many objects as you would create
threads?

Creating only one object and creating a number of threads for that
objects run method, also seems wrong, for two reasons. 1) the object
could then not have any state unless it was to be shared and 2) when
reading the name of the thread all threads are named the same.

I understand that it would be normal to create other objects and execute
their methods run(), which would then effectively create a
function/method invocation and it all aligns well with my previous
perception. But the startup part of it seems a bit strange to me.

Would anyone care to help me push my perception back into alignment?

tom
Chris Uppal - 15 Nov 2006 13:09 GMT
> Creating only one object and creating a number of threads for that
> objects run method, also seems wrong, for two reasons. 1) the object
> could then not have any state unless it was to be shared and [...]

That is correct, but it may be the semantics that you want -- if several
threads have to share data for instance.  (It's not the typical case, though.)

> [...] 2) when
> reading the name of the thread all threads are named the same.

The name comes from the instance of Thread, and can be set independently of the
Runnable object that each thread executes.
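A minimal sketch of that point, assuming nothing beyond the standard Thread API. The class name SharedRunnableNames and the "worker-N" names are made up for illustration. One Runnable is shared by all threads, each Thread gets its own name through the constructor, and run() asks the executing thread for its name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; none of these names come from the thread.
class SharedRunnableNames {

    // Starts `count` threads on ONE shared Runnable and returns the name
    // each thread reported for itself, sorted so the result is stable.
    static List<String> runWithNames(int count) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());

        // A single Runnable object. It has no name of its own.
        Runnable shared = () -> seen.add(Thread.currentThread().getName());

        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            // The name belongs to the Thread object, not to the Runnable.
            threads[i] = new Thread(shared, "worker-" + i);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        List<String> sorted = new ArrayList<>(seen);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(runWithNames(3)); // [worker-0, worker-1, worker-2]
    }
}
```

The three threads share one Runnable yet report three different names, because the name is state of each Thread object.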

> Creating a number of objects with a thread for each is sort of like
> creating many separate programs/processes, it seems like waste of
> objects to start with. Why create as many objects as you would create
> threads?

I don't know Posix threads, but I assume there is some way that you can ask the
system about a thread once created (is it still running, what groups does it
belong to, and so on).  So there is a /something/ there that you can talk
about.  In Java a "something" is always represented by an object, so in Java
there is an object (instance of Thread) which stands for each thread.

As an example, a thread couldn't have a name unless there was an object to hold
the name.

   -- chris
Tom Forsmo - 15 Nov 2006 15:05 GMT
>> Creating only one object and creating a number of threads for that
>> objects run method, also seems wrong, for two reasons. 1) the object
>> could then not have any state unless it was to be shared and [...]
>
> That is correct, but it may be the semantics that you want -- if several
> threads have to share data for instance.  (It's not the typical case, though.)

I didn't quite think of it like that...

>> [...] 2) when
>> reading the name of the thread all threads are named the same.
>
> The name comes from the instance of Thread, and can be set independently of the
> Runnable object that each thread executes.

I tried this but it did not work properly. What I did was this:

            thr = new Thread[opt.getThreads()];

            for(int i=0; i<opt.getThreads(); i++) {
                thr[i] = new Thread(this);
                thr[i].setName("thread num: " + i);
                thr[i].start();
            }

But I found out just now that if I use setName() inside run(), it
works OK. Why is that? The object is created outside run(), so changing its
state there should be ok, unless the object is reinitialised when run()
starts to execute.

>> Creating a number of objects with a thread for each is sort of like
>> creating many separate programs/processes, it seems like waste of
[quoted text clipped - 9 lines]
> As an example, a thread couldn't have a name unless there was an object to hold
> the name.

Sorry, what I meant was my defined objects, not the Thread objects. E.g.
I can define a class ClassA which implements Runnable. If I use 100
threads, I would create 100 ClassA objects, which means I would have 100
ClassA objects and 100 Thread objects.

            thr = new Thread[opt.getThreads()];

            for(int i=0; i<opt.getThreads(); i++) {
                Runnable r = new ClassA();
                thr[i] = new Thread(r);
                thr[i].start();
            }

Of course the Thread object is here composed of the ClassA object, so it
would be the same as if you extended a Thread class, but that is my
point. It seems like a waste of resources to have those 100 ClassA
objects lying around for no reason. A thread safe program requires
re-entrancy, which it in this example solves, not by having a re-entrant
object, but rather by just creating completely new objects avoiding the
entire issue... Sort of like starting 100 separate programs/processes.
This is where my perception clashes with java threads.

tom
Thomas Fritsch - 15 Nov 2006 15:48 GMT
[...]
> I can define a class ClassA which implements Runnable. If I use 100
> threads, I would create 100 ClassA objects, which means I would have 100
[quoted text clipped - 12 lines]
> point. It seems like a waste of resources to have those 100 ClassA
> objects lying around for no reason.
So, why not use the same ClassA object for all your 100 threads?
    Runnable r = new ClassA();
    for(int i=0; i<opt.getThreads(); i++) {
        thr[i] = new Thread(r);
        thr[i].start();
    }
But of course this approach depends on how carefully your ClassA object
handles concurrent calls to its run() method.
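As a rough sketch of what "careful" can mean here: the class below is a hypothetical stand-in for ClassA (its name and its counter field are assumptions, not code from the thread). Working data stays in local variables, which every thread has its own copy of, and the one field that is really shared is an AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ClassA; one instance is run by many threads.
class SharedCounterTask implements Runnable {
    // Shared by every thread that runs this object, so it must be thread-safe.
    private final AtomicInteger processed = new AtomicInteger();

    public void run() {
        // Local variables live on the calling thread's stack and are never shared.
        int localCount = 0;
        for (int i = 0; i < 1000; i++) {
            localCount++;
        }
        // Publish this thread's result with a single atomic update.
        processed.addAndGet(localCount);
    }

    // Runs one shared task object on `threadCount` threads and returns the total.
    static int runShared(int threadCount) {
        SharedCounterTask task = new SharedCounterTask();
        Thread[] thr = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            thr[i] = new Thread(task);
            thr[i].start();
        }
        for (Thread t : thr) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return task.processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runShared(100)); // 100000
    }
}
```

With a plain int field updated by `processed += localCount`, the same program could lose updates, which is the kind of problem a shared Runnable has to guard against.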
> A thread safe program requires
> re-entrancy, which it in this example solves, not by having a re-entrant
> object, but rather by just creating completely new objects avoiding the
> entire issue... Sort of like starting 100 separate programs/processes.
Not quite: a thread is actually much cheaper than a process.
> This is where my perception clashes with java threads.
Well, having 100 objects may not be such a big thing as you suspect. A
Java Thread object is actually nothing more than a native OS thread and
a few more bytes (for its member variables).

Thomas

Robert Klemme - 15 Nov 2006 16:04 GMT
> Sorry, what I meant was my defined objects, not the Thread objects. E.g.
> I can define a class ClassA which implements Runnable. If I use 100
[quoted text clipped - 13 lines]
> point. It seems like a waste of resources to have those 100 ClassA
> objects lying around for no reason.

As said earlier, the overhead of a single object is not much.  And the
advantage of using a class that implements Runnable is that you get more
flexibility.  You can, for example, push those instances into a queue
and have a fixed number of threads processing them one by one.  (I think
Doug calls it "lightweight processing framework".)
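A hand-rolled sketch of the queue idea, to show the mechanics. It is not taken from Doug Lea's book; the class name QueueWorkersSketch and the STOP marker are assumptions made for illustration. Many small Runnable objects go into a BlockingQueue and a fixed number of threads drain it:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: many task objects, few threads.
class QueueWorkersSketch {
    // Sentinel telling a worker to stop; compared by identity only.
    private static final Runnable STOP = () -> { };

    // Queues `taskCount` Runnables, processes them on `workerCount` threads,
    // and returns how many tasks actually ran.
    static int process(int taskCount, int workerCount) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger completed = new AtomicInteger();

        // One small object per unit of work.
        for (int i = 0; i < taskCount; i++) {
            queue.add(completed::incrementAndGet);
        }
        // One stop marker per worker, queued behind all real tasks.
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);
        }

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    // Take tasks one by one until this worker sees a stop marker.
                    for (Runnable task = queue.take(); task != STOP; task = queue.take()) {
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(process(100, 4)); // 100
    }
}
```

Here 100 Runnable objects are served by 4 threads, so the number of task objects and the number of threads are chosen independently.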

> A thread safe program requires
> re-entrancy, which it in this example solves, not by having a re-entrant
> object, but rather by just creating completely new objects avoiding the
> entire issue... Sort of like starting 100 separate programs/processes.
> This is where my perception clashes with java threads.

The concept is called "thread confinement" (-> Doug Lea's book).
Basically this means that you avoid congestion by having separate sets
of data which in turn removes the necessity of synchronization.

Whether you apply that or not depends on the circumstances.  Also of
course this can be mixed with other approaches so you get very precise
control over which data is shared and which not.

Regards

    robert
Tom Forsmo - 15 Nov 2006 18:01 GMT
> The concept is called "thread confinement" (-> Doug Lea's book).

Where in Doug's book is that described? I could not find it.
In any case the more general computer science term is re-entrant, as
part of the subject of thread safe code, see

http://en.wikipedia.org/wiki/Thread-safe

tom
Robert Klemme - 15 Nov 2006 21:39 GMT
>> The concept is called "thread confinement" (-> Doug Lea's book).
>
> Where in Doug's book is that described? I could not find it.

?

http://www.awprofessional.com/bookstore/product.asp?isbn=0201310090&rl=1#info2

click on Table of Contents, Chapter 2

> In any case the more general computer science term is re-entrant, as
> part of the subject of thread safe code, see

Reentrancy (?) is something completely different from thread
confinement.  You can use the latter to achieve the former: reentrancy
is a property of a piece of code, while thread confinement is a technique
that can be used to achieve it, among other ends.

Regards

    robert
Tom Forsmo - 16 Nov 2006 00:21 GMT
>>> The concept is called "thread confinement" (-> Doug Lea's book).
>>
[quoted text clipped - 5 lines]
>
> click on Table of Contents, Chapter 2

That explains it, I have edition 1 of the book, you are referring to
edition 2. A question then, do you know if the 2nd Ed is much different
than the 1st Ed?

I see there is a difference in the TOC, but to me it seems like a
reorganisation of the book with possibly better titles or something. The
book came out 2 years after the first, so it seemed to me that there
really could not be much difference. Basically since the theory and
practice of concurrent programming is quite old, so much would probably
not have changed in 2 years.

>> In any case the more general computer science term is re-entrant, as
>> part of the subject of thread safe code, see
>
> Reentrancy (?) is something completely different from thread
> confinement.

Fair enough, but I was talking about re-entrancy. I am not saying thread
confinement could not be used, though.

Btw, could you give a more detailed description of thread
confinement? I could not find anything when googling, only references to
Doug's book.

tom
Robert Klemme - 16 Nov 2006 09:55 GMT
> That explains it, I have edition 1 of the book, you are referring to
> edition 2. A question then, do you know if the 2nd Ed is much different
> than the 1st Ed?

I can't seem to find my book (time to search colleagues' desks) and also
I do not know the 1st edition. :-)

> I see there is a difference in the TOC, but to me it seems like a
> reorganisation of the book with possibly better titles or something.

I believe the foreword mentioned something like reorganization as one of
the major differences.

> The
> book came out 2 years after the first, so it seemed to me that there
> really could not be much difference. Basically since the theory and
> practice of concurrent programming is quite old, so much would probably
> not have changed in 2 years.

Still a book can be improved. :-)

>>> In any case the more general computer science term is re-entrant, as
>>> part of the subject of thread safe code, see
[quoted text clipped - 7 lines]
> confinement? I could not find anything when googling, only references to
> Doug's book.

I believe I stated that earlier: basically you restrict access to data
to a single thread and thus avoid synchronization issues.  This can be
achieved in different ways (local variables etc.).
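A small sketch of confinement through local variables, under the assumption that each thread only needs private scratch data. The class name ConfinementSketch and its method are hypothetical. The ArrayList below is not synchronized, yet it is safe because only the thread that created it can reach it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of thread confinement via local variables.
class ConfinementSketch {

    // Each thread fills its own private list and reports the size in its own slot.
    static int[] run(int threadCount, int itemsPerThread) {
        int[] results = new int[threadCount]; // thread i writes only results[i]
        Thread[] threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                // Confined state: this list is reachable only from the stack
                // of the thread executing this call, so no locking is needed.
                List<Integer> scratch = new ArrayList<>();
                for (int k = 0; k < itemsPerThread; k++) {
                    scratch.add(k);
                }
                results[slot] = scratch.size();
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() makes each thread's write visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (int size : run(4, 1000)) {
            System.out.println(size); // prints 1000 four times
        }
    }
}
```

The only data crossing thread boundaries is the results array, and each element has exactly one writer, so the shared part stays small and easy to reason about.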

Kind regards

    robert
Chris Uppal - 15 Nov 2006 16:28 GMT
> > > [...] 2) when
> > > reading the name of the thread all threads are named the same.
[quoted text clipped - 16 lines]
> state there should be ok, unless the object is reinitialised when run()
> starts to execute.

I can't think of any reason why setName() wouldn't work.  I admit I haven't
tested it, but I can't see anything odd in the source.  I suspect that it's an
artefact of whatever you are using to "see" the Thread's names.

But why use setName() at all ?  It seems easier just to pass the correct names
to the Threads' constructors in the first place.

> If I use 100
> threads, I would create 100 ClassA objects, which means I would have 100
> ClassA objects and 100 Thread objects.

So what ?

;-)

Think of it like this.  If those 100 objects which implement Runnable are
genuinely unnecessary, then they must be effectively stateless in that none of
the processing in any thread depends on the state of its Runnable object -- in
which case there is no reason not to use the same Runnable for every Thread.
But, going further, if they are /actually/ stateless, or nearly so, then they
are so cheap that they cost much less (in space and time) than the Thread
itself, and will almost certainly cost less even than the thread's /name/, so
there is no reason to go to the (cognitive) effort of reusing the same object.
Just create 100 of 'em -- you can afford it.

OTOH, if the Runnables' states /do/ affect the subsequent execution, then you
obviously can't get away without having separate objects...

BTW, the most common case is that each thread /is/ parameterised in some way
(which Socket to read from, which array to process, which Snoggle to
delaminate(), ....) and the Runnable objects are the natural place to put that
information.

   -- chris
Tom Forsmo - 15 Nov 2006 18:36 GMT
> I can't think of any reason why setName() wouldn't work.  I admit I haven't
> tested it, but I can't see anything odd in the source.  I suspect that it's an
> artefact of whatever you are using to "see" the Thread's names.

I use println() just before start(), and then another println() inside
run() to print the progress of each thread. When I print before start()
it prints out the name I set, but when I print inside run() the name is
reset to the original name. It's almost as if there is an object
reinitialisation when run starts.

> But why use setName() at all ?  It seems easier just to pass the correct names
> to the Threads' constructors in the first place.

I haven't tried that, it might work.

>> If I use 100
>> threads, I would create 100 ClassA objects, which means I would have 100
>> ClassA objects and 100 Thread objects.
>
> So what ?

I don't believe in code bloat and I see it as unnecessary runtime
resource consumption. I don't subscribe to the idea that you should not
worry about resources (cpu, memory etc.), because they are so cheap. The
reason is simple, bloated code runs slower and is more difficult to
maintain. Think of a program that takes up 300 MB of memory and compare
it to a program that only requires say, 150MB. The smaller program
requires less bus bandwidth between the cpu, memory and disk and less
processing cycles (barring algorithm efficiency).

I do see, though, that there are solutions where having one object
per thread is beneficial. But it sort of leaves a bad taste in my mouth...

> Think of it like this.  If those 100 objects which implement Runnable are
> genuinely unnecessary, then they must be effectively stateless in that none of
[quoted text clipped - 13 lines]
> delaminate(), ....) and the Runnable objects are the natural place to put that
> information.

I have come to the same conclusions as well. But as I said I think it
leaves a bad taste... But then again, I might just be a bit picky.

thanks for all your feedback.

tom
A. Bolmarcich - 15 Nov 2006 22:03 GMT
>> I can't think of any reason why setName() wouldn't work.  I admit I haven't
>> tested it, but I can't see anything odd in the source.  I suspect that it's an
[quoted text clipped - 5 lines]
> reset to the original name. It's almost as if there is an object
> reinitialisation when run starts.

Be careful to invoke getName() on the same object on which setName()
was invoked.  Within run() you likely want to use an expression like

 Thread.currentThread().getName()
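The sketch below shows one plausible way to end up with the "wrong" name. It assumes that the class containing run() itself extends Thread and is passed to new Thread(...) as a Runnable; the posted fragments do not confirm this, and the class name NameMixup is made up. Inside run(), a bare getName() means this.getName(), which asks the object that was never started rather than the thread that is running:

```java
// Hypothetical reconstruction; the thread never shows the class that holds run().
class NameMixup extends Thread {
    volatile String viaThis;
    volatile String viaCurrentThread;

    NameMixup() {
        super("never-started"); // name of THIS Thread object, which is never started
    }

    @Override
    public void run() {
        // getName() is this.getName(): it asks the NameMixup object,
        // not the thread that is actually executing run().
        viaThis = getName();
        viaCurrentThread = Thread.currentThread().getName();
    }

    static String[] demo() {
        NameMixup target = new NameMixup();
        // Thread implements Runnable, so one Thread can be handed to another.
        Thread runner = new Thread(target);
        runner.setName("thread num: 0");
        runner.start();
        try {
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new String[] { target.viaThis, target.viaCurrentThread };
    }

    public static void main(String[] args) {
        String[] names = demo();
        System.out.println("getName():               " + names[0]); // never-started
        System.out.println("currentThread().getName: " + names[1]); // thread num: 0
    }
}
```

setName() did work on the runner thread; the surprise comes only from reading the name back from a different Thread object.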

Instead of posting partial code, please post a small complete program that
demonstrates the problem that others can compile and run.

>> But why use setName() at all ?  It seems easier just to pass the correct names
>> to the Threads' constructors in the first place.
[quoted text clipped - 4 lines]
>>> threads, I would create 100 ClassA objects, which means I would have 100
>>> ClassA objects and 100 Thread objects.
[snip]

Not in the example code that you posted.  It had a loop whose body contained

 thr[i] = new Thread(this);

That would create 100 Thread objects.  The statement

 thr[i].start();

later in the loop would invoke the run() method of the (single) object
referred to by the keyword "this" 100 times.
Tom Forsmo - 16 Nov 2006 00:44 GMT
> Be careful to invoke getName() on the same object on which setName()
> was invoked.  Within run() you likely want to use an expression like
>
>   Thread.currentThread().getName()

that seemed to work, but I don't understand quite why.

> Instead of posting partial code, please post a small complete program that
> demonstrates the problem that others can compile and run.

(Please don't post comments in a thread where they don't belong, it
makes it difficult to understand which message and which part of it you are
commenting on.)

I was posting only the relevant parts of the code, all the other code
has nothing to do with the problem.

> Not in the example code that you posted.  It had a loop whose body contained
>
[quoted text clipped - 6 lines]
> later in the loop would invoke the run() method of the (single) object
> referred to by the keyword "this" 100 times.

Yes, but it starts the 100 threads (i.e. the thread objects), which all
execute within the same object, namely this.

In any case, it was the wrong code, it was supposed to be

            thr = new Thread[opt.getThreads()];

            for(int i=0; i<opt.getThreads(); i++) {
                System.out.println("Starting Thread " + i);
                thr[i] = new Worker();
                thr[i].start();
            }

This creates 100 Worker objects and 100 thread objects.

tom
Chris Smith - 16 Nov 2006 02:59 GMT
> >> If I use 100
> >> threads, I would create 100 ClassA objects, which means I would have 100
[quoted text clipped - 10 lines]
> requires less bus bandwidth between the cpu, memory and disk and less
> processing cycles (barring algorithm efficiency).

You seem to see things in black and white.  The world doesn't work that
way.  Practically everything is an object in Java.  Objects are cheap.  
The entire runtime system, memory management, etc. is designed that way,
and people have put lots of effort into making it so.  Anything else you
do that tries to minimize creating objects is likely to not be a
noticeable improvement, and often hurts the performance of your code.

On the other hand, creating 100 threads is certainly not cheap, and
almost certainly harmful if you care about performance in this
application... unless it will be running on some kind of supercomputer
that has at least 50 processors or so.  Sometimes creating 100 threads
can make your development life easier by helping you separate various
tasks in your application design; but if that cost is okay with you, you
are certainly misplacing your priorities when you worry about creating
that extra 100 objects.  This isn't about whether you should be happy
with a sub-optimal program.  It's about whether you should worry about
polishing the deck when the Titanic is sinking.

Chris Smith

Tom Forsmo - 16 Nov 2006 10:14 GMT
>> I don't believe in code bloat and I see it as unnecessary runtime
>> resource consumption. I don't subscribe to the idea that you should not
[quoted text clipped - 7 lines]
> You seem to see things in black and white.  The world doesn't work that
> way.  

:)

> Practically everything is an object in Java.  Objects are cheap.  
> The entire runtime system, memory management, etc. is designed that way,
> and people have put lots of effort into making it so.  Anything else you
> do that tries to minimize creating objects is likely to not be a
> noticeable improvement, and often hurts the performance of your code.

That only applies if you don't have experience in thinking about
avoiding code bloat and its problems. I have numerous times created
applications that require a fraction of memory or cpu power compared to
someone's idea that you should not worry about it. In some cases I
have also created working and stable solutions when others have not
managed to get one off the ground, because of code bloat.

> Sometimes creating 100 threads
> can make your development life easier by helping you separate various
> tasks in your application design; but if that cost is okay with you,

Yes, for example in high performance server design, where the server
should be able to handle between a thousand and ten thousand transactions
per second.

> you
> are certainly misplacing your priorities when you worry about creating
> that extra 100 objects.  This isn't about whether you should be happy
> with a sub-optimal program.  It's about whether you should worry about
> polishing the deck when the Titanic is sinking.

It's funny how many people hide behind that statement, it clearly shows
they really do not know what they are talking about in that respect. I
have experience in thinking about the problem, so I don't use much
"cognitive effort" to avoid it. A person not used to thinking about it
would spend much time worrying about it, which incidentally seems to be
the majority of Java developers I have talked to. The mutual feeling
among them seems to be that the JVM will take care of it all for you, so
don't worry your pretty little head about it...  I am not saying there
is nothing to what you are saying, of course there is, but it's not as
black and white as you are saying it is.

tom
Chris Smith - 16 Nov 2006 15:11 GMT
> That only applies if you don't have experience in thinking about
> avoiding code bloat and its problems. I have numerous times created
> applications that require a fraction of memory or cpu power compared to
> someone's idea that you should not worry about it. In some cases I
> have also created working and stable solutions when others have not
> managed to get one off the ground, because of code bloat.

I don't believe you.  That is, I believe you've created better software
than other people; I don't believe that an improvement of that scale
came from decisions along the lines of trying to avoid creating one
object per thread.

> > Sometimes creating 100 threads
> > can make your development life easier by helping you separate various
[quoted text clipped - 3 lines]
> should be able to handle between a thousand and ten thousand transactions
> per second.

Is there anything close to 50 CPUs in the box?  If not, then 100
threads is still very likely to be killing your performance to the point
that it's way past time to worry about 100 objects.  Half my point is
that you are overestimating the performance impact of creating
objects... but the other half is that you are underestimating the
performance impact of creating threads.  If you need that kind of
performance, you should be doing thread pooling and asynchronous I/O in
concert with state machines to reduce the number of threads to less than
about twice the number of CPUs.
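A minimal sketch of the thread-pooling part of that advice only (asynchronous I/O and state machines are left out). It assumes the standard java.util.concurrent executors; the class name CpuSizedPool and the trivial "request" that doubles its id are hypothetical placeholders for real work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: many requests, a pool of about 2 x CPUs threads.
class CpuSizedPool {

    // Pool size derived from the machine instead of from the request count.
    static int poolSize() {
        return Math.max(1, 2 * Runtime.getRuntime().availableProcessors());
    }

    // Submits `requestCount` small tasks and returns the sum of their replies.
    static long handleRequests(int requestCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize());
        try {
            List<Future<Integer>> replies = new ArrayList<>();
            for (int i = 0; i < requestCount; i++) {
                final int id = i;
                // Placeholder for real request handling: reply with id * 2.
                Callable<Integer> request = () -> id * 2;
                replies.add(pool.submit(request));
            }
            long sum = 0;
            for (Future<Integer> reply : replies) {
                sum += reply.get(); // blocks until that request has been handled
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool size: " + poolSize());
        System.out.println(handleRequests(1000)); // 999000
    }
}
```

A thousand request objects are created here, but the number of threads stays tied to the CPU count, which is the trade-off being argued for.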

> It's funny how many people hide behind that statement, it clearly shows
> they really do not know what they are talking about in that respect.

"That" statement?  You mean the one that says that threads are likely
killing your performance, so stop worrying about 1K of memory
allocations?  Most people probably don't hear that a lot.  If you're
hearing that a lot from other developers, then perhaps it's time to
think about whether threads are killing your performance.

Yes, I realize you'd like to come away from this conversation feeling
superior because you can avoid "code bloat" while all us lowly Java
programmers can't.  Feel free to do so, if that's what your ego needs.  
Otherwise, you might want to fix your thread problem.

Chris Smith

Tom Forsmo - 16 Nov 2006 17:44 GMT
> I don't believe you.  That is, I believe you've created better software
> than other people; I don't believe that an improvement of that scale
> came from decisions along the lines of trying to avoid creating one
> object per thread.

No, not at that small level, but applying the same principle to all code
and especially to data structures that hold big amounts of data.

>>> Sometimes creating 100 threads
>>> can make your development life easier by helping you separate various
[quoted text clipped - 6 lines]
> threads is still very likely to be killing your performance to the point
> that it's way past time to worry about 100 objects.  

Are you pulling my leg? What system are you running on?

The code I did which prompted me to ask the original question, ran a
thousand server threads and five hundred client threads where the client
issued ten thousand requests per thread. In total five million requests,
which finished in around 45 minutes. This is on an Intel Core Duo
processor running Linux with 1.5 GB of RAM:

Linux duplo 2.6.17.8tf2 #10 SMP PREEMPT Wed Aug 30 22:35:48 CEST 2006
i686 Genuine Intel(R) CPU           T2300  @ 1.66GHz unknown GNU/Linux

Threads are very cheap in Linux 2.6. When they changed the kernel thread
model, they did a test where they created one hundred thousand threads.
With the old model that took about 15 minutes; with the new model it took
2 seconds (ref: http://kerneltrap.org/node/422).
As far as I understand it, on Windows processes are expensive while
threads are cheap. On linux processes are cheap and threads are
extremely cheap.

Back to the business at hand. The server and client communicate
over UDP (so that's a bit cheaper), and it's a simple request/reply
operation which talks to a DB (an Oracle cluster, so the DB does not cause
any problems). In addition the code is completely self-made, no app
servers or anything like that, which of course would eat up a lot of the
cpu power and memory.

> "That" statement?  You mean the one that says that threads are likely
> killing your performance, so stop worrying about 1K of memory
> allocations?  

No, I mean the statement: "stop worrying about memory and processing
power, we can just buy some more...".

> Most people probably don't hear that a lot.  If you're
> hearing that a lot from other developers, then perhaps it's time to
> think about whether threads are killing your performance.

It's almost exclusively coming from Java developers, but also from
developers of other languages, although not as much. I think it's lazy
programming. I don't mean to be rude and condescending towards java or
java developers, I like java as well. I just think there are some ideas
that the programming and java community should open their eyes to. I
have been working in a C project the last couple of years and that's
where I learned to appreciate that sentiment.

tom
Chris Smith - 17 Nov 2006 03:14 GMT
> The code I did which prompted me to ask the original question, ran a
> thousand server threads and five hundred client threads where the client
> issued ten thousand requests per thread. In total five million requests,
> which finished in around 45 minutes.

Good.  Then you have no reason to care about the memory required for one
small object per thread.

Chris Smith

Tom Forsmo - 17 Nov 2006 09:40 GMT
>> The code I did which prompted me to ask the original question, ran a
>> thousand server threads and five hundred client threads where the client
[quoted text clipped - 3 lines]
> Good.  Then you have no reason to care about the memory required for one
> small object per thread.

That's exactly the general sentiment I am opposing in my argument...

In any case, I asked in the previous post what system you were running
on, since you say that running 100 threads would kill the performance.
Do you mind telling me? If there is that much difference in performance
between systems, it's good to be aware of that.

tom
Tom Forsmo - 17 Nov 2006 09:56 GMT
>> The code I did which prompted me to ask the original question, ran a
>> thousand server threads and five hundred client threads where the client
[quoted text clipped - 3 lines]
> Good.  Then you have no reason to care about the memory required for one
> small object per thread.

Not the point. This discussion is not about performance as a consequence
of memory consumption, those are two separate issues in this thread.

In any case, I asked in the previous post what system you were running
on, since you say that running 100 threads would kill the performance,
do you mind telling us?

If there is that much difference in performance between systems I think
that's a valuable discussion to have in this group.

tom
Chris Smith - 17 Nov 2006 17:12 GMT
> Not the point. This discussion is not about performance as a consequence
> of memory consumption, those are two separate issues in this thread.

Of course it's the point.  The point was that you were concerned about
the memory overhead of creating one unnecessary object per thread.  I
was pointing out that it's not a sensible thing to be concerned about.

> In any case, I asked in the previous post what system you were running
> on, since you say that running 100 threads would kill the performance,
> do you mind telling us?

The machine I was speaking of was the hypothetical system in which
creating 100 objects has a discernable performance impact.  I am not
aware of any such machine in common use; but apparently you are
convinced that you are using one.  My Commodore 64 certainly qualifies;
but I haven't yet figured out how to get it to do multithreading.

Robert mentioned a start time of 5 ms per thread.  You responded that
your Linux server creates a thread in 0.02 ms.  By contrast, an object
allocation for Integer on my system (including some amortized time for
garbage collection, though probably too little since the object graph in
test code is inevitably simpler than in production code) takes about
0.000015 ms.  That's not really accounting for the real performance
impact of the threads, though.  Unless you have a
hideously bad architecture, you won't spend a lot of time creating
threads.  Since you've now decided on 100000 threads instead of 100, the
real cost for these threads will be paid:

1. During scheduling.
2. In cache misses and TLB flushes due to context switching

Memory-wise, 100000 unnecessary objects require about 1MB of memory;
not trivial in absolute terms, certainly.  But each thread will require
at a minimum one machine page (typically 4K) of stack space, plus extra
data structures in the kernel for tracking.  That's about half a
gigabyte of memory, and that's extremely conservative.
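The arithmetic behind that comparison can be checked with a few lines. The 4K page and the 100000 count come from the paragraph above; the 16 bytes per small object is an assumption made here (the real figure depends on the JVM), and the class name ThreadMemoryEstimate is hypothetical:

```java
// Hypothetical back-of-the-envelope check of the numbers quoted above.
class ThreadMemoryEstimate {

    static long totalBytes(long count, long bytesEach) {
        return count * bytesEach;
    }

    public static void main(String[] args) {
        long threads = 100_000;
        long pageBytes = 4096;  // one 4K page of stack per thread, the stated minimum
        long objectBytes = 16;  // ASSUMPTION: size of one small object on this JVM

        long stacks = totalBytes(threads, pageBytes);    // 409,600,000 bytes
        long objects = totalBytes(threads, objectBytes); //   1,600,000 bytes

        System.out.println("stack pages: " + stacks / (1024 * 1024) + " MiB"); // 390 MiB
        System.out.println("objects:     " + objects / 1024 + " KiB");         // 1562 KiB
        System.out.println("ratio:       " + stacks / objects + "x");          // 256x
    }
}
```

The stack pages alone come to roughly 0.4 GB before any kernel bookkeeping is counted, against one or two megabytes for the objects, so the ratio is in the hundreds whatever exact object size is assumed.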

Result: it makes no sense to worry about one unnecessary object per
thread.

Chris Smith

Robert Klemme - 17 Nov 2006 10:16 GMT
> As far as I understand it, on Windows processes are expensive while
> threads are cheap. On linux processes are cheap and threads are
> extremely cheap.

Yep, threads on modern systems are very cheap.  I once cooked up a small
program (attached) to collect thread stats.  On my 3GHz P4D with Win XP
Pro x64 it yields

max t11 - start time in thread: 140
avg t11 - start time in thread: 4.14126
max t2  - creation time       : 78
avg t2  - creation time       : 0.03084
max t3  - start time in main  : 78
avg t3  - start time in main  : 0.1558

max t11 - start time in thread: 204
avg t11 - start time in thread: 5.35656
max t2  - creation time       : 16
avg t2  - creation time       : 0.0282
max t3  - start time in main  : 16
avg t3  - start time in main  : 0.15657

5ms as average starting time for a thread isn't really much.
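The attached program is not part of this archive. The sketch below is a guess at how such figures could be collected, not a reconstruction of it: the class name ThreadStartStats and the mapping onto three measurements (constructing the Thread, the start() call as seen by the caller, and the delay until run() begins) are assumptions:

```java
// Hypothetical measurement sketch; NOT the program attached to the original post.
class ThreadStartStats {

    // Returns averages in milliseconds:
    // { creation, start() call in the caller, delay until run() begins }.
    static double[] measure(int threadCount) {
        long[] runDelayNanos = new long[threadCount];
        Thread[] threads = new Thread[threadCount];
        long createNanos = 0;
        long startCallNanos = 0;

        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            final long[] startRequested = new long[1];

            long t0 = System.nanoTime();
            threads[i] = new Thread(() -> {
                // First statement of run(): how long after the start request?
                runDelayNanos[slot] = System.nanoTime() - startRequested[0];
            });
            long t1 = System.nanoTime();

            startRequested[0] = t1; // written before start(), so the new thread sees it
            threads[i].start();
            long t2 = System.nanoTime();

            createNanos += t1 - t0;
            startCallNanos += t2 - t1;
        }

        long delayNanos = 0;
        for (int i = 0; i < threadCount; i++) {
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            delayNanos += runDelayNanos[i];
        }

        double nanosPerMs = 1_000_000.0;
        return new double[] {
            createNanos / nanosPerMs / threadCount,
            startCallNanos / nanosPerMs / threadCount,
            delayNanos / nanosPerMs / threadCount
        };
    }

    public static void main(String[] args) {
        double[] avg = measure(1000);
        System.out.println("avg creation time (ms)     : " + avg[0]);
        System.out.println("avg start() call time (ms) : " + avg[1]);
        System.out.println("avg delay until run() (ms) : " + avg[2]);
    }
}
```

Separating the three numbers matters when reading results like the ones above: the delay until run() begins is a latency that overlaps between threads, so it does not by itself say how many threads can be started per second.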

> No, I mean the statement: "stop worrying about memory and processing
> power, we can just buy some more...".
[quoted text clipped - 6 lines]
> developers of other languages, although not as much. I think its lazy
> programming.

I do not think so - rather it is consciously trying to find a good OO
design.  OOA/D/P are quite different from procedural.  While I do agree
that thought has to be given to issues of memory consumption and CPU
usage during design of performance critical applications, overdoing it
is certainly doing more harm than good.  Considering the overhead of one
object created per thread to be too much will definitely harm the
design of the application.  And this is even more so true in Java where
the overhead of object creation on modern VMs is negligible.
Performance might be one goal but there are tons of other goals.  If you
have the ultra performant application that nobody can maintain then
you're getting nowhere.

> I don't mean to be rude and condescending towards java or
> java developers, I like java as well. I just think there are some ideas
> that the programming and java community should open their eyes to. I
> have been working in a C project the last couple of years and that's
> where I learned to appreciate that sentiment.

I would be very careful carrying over knowledge from a C environment
to a Java or other OO environment.  While there are similarities and
general principles, one must be aware of the platform and adjust to its
specifics.

Regards

    robert
Tom Forsmo - 17 Nov 2006 11:37 GMT
> 5ms as average starting time for a thread isn't really much.

That means windows can only create 400 threads in 2 seconds, compared to
linux 2.6 which creates 100,000 threads in 2 seconds. That's a big
difference. That makes me understand why people in this thread talk
about the performance hit of having a large number of threads.

We are, though, comparing C thread calls to java thread calls, even
though java threads are native threads on both windows and linux in java 5.0.
Additionally, these numbers say nothing about the execution efficiency of
threads in windows compared to linux.

I will have a look at your program and run it on my computer, in both
windows and linux, since there would be no hardware difference. I
never thought there might be that big a difference between linux and
windows; actually I am not sure this difference can be correct. I know
Ingo Molnar of the linux kernel team is really good when it comes to this
stuff, but microsoft can not be doing that bad here. We will see.

To test execution efficiency I will create a small test app which I will
run on both systems as well, just to get that angle. I will post my results.

> While I do agree
> that thought has to be given to issues of memory consumption and CPU
[quoted text clipped - 3 lines]
> design of the application.  And this is even more so true in Java where
> the overhead of object creation on modern VMs is negligible.

I agree, it was an instinctive reaction that prompted me to start this
thread and I decided I wanted to know the answer. I like to know the
cost and consequence of doing things one way compared to another, for
future reference.

> I would be very careful carrying over knowledge from a C environment
> to a Java or other OO environment.  While there are similarities and
> general principles, one must be aware of the platform and adjust to its
> specifics.

I agree, but that does not preclude the chance that there might be
something that can be learned from other platforms.

tom
Tom Forsmo - 27 Nov 2006 01:27 GMT
>> As far as I understand it. On Windows processes are expensive while
>> threads are cheap. On linux processes are cheap and threads are
>> extremely cheap.
>
> Yep, threads on modern systems are very cheap.  I once cooked up a small
> program (attached) to collect thread stats.  

I ran your program on my machine in both windows and linux and discovered
some interesting results:

The machine is a dual boot Thinkpad T60 with intel dual core. No special
system/kernel optimisations have been performed on either system.

linux: vanilla linux 2.6.17.8 kernel release running on Mandriva 2006
windows: factory installed windows xp with SP 2 (version 2002)

tf - linux:

max t11 - start time in thread: 55
avg t11 - start time in thread: 0.16632
max t2  - creation time       : 42
avg t2  - creation time       : 0.02114
max t3  - start time in main  : 42
avg t3  - start time in main  : 0.09306

max t11 - start time in thread: 65
avg t11 - start time in thread: 0.15874
max t2  - creation time       : 15
avg t2  - creation time       : 0.01887
max t3  - start time in main  : 15
avg t3  - start time in main  : 0.09395

tf - windows:

max t11 - start time in thread: 78
avg t11 - start time in thread: 0.66997
max t2  - creation time       : 78
avg t2  - creation time       : 0.14944
max t3  - start time in main  : 63
avg t3  - start time in main  : 0.27753

max t11 - start time in thread: 47
avg t11 - start time in thread: 0.73756
max t2  - creation time       : 47
avg t2  - creation time       : 0.14407
max t3  - start time in main  : 47
avg t3  - start time in main  : 0.29903

Conclusion: linux is faster.

I also tested a thread efficiency program I made; it's a UDP server and
client.

server: -t 1000          (number of threads: 1000)
client: -t 500 -r 10000  (number of threads 500,
                          number of requests per thread: 10000)

tf - linux

Average creation time for client object: 0.00462ms
Time executing threads: 183114ms (183.114s)
Average creation time for client object: 0.00426ms
Time executing threads: 182486ms (182.486s)

tf - windows

Average creation time for client object: 0.00359ms
Time executing threads: 535891ms (535.891s)
Average creation time for client object: 0.00296ms
Time executing threads: 536219ms (536.219s)

conclusion: windows is faster at creating client objects by a little
bit, but linux is 3 times faster at executing the actual operations.

I did another test with this code also:

In the server there is a sleep() call to simulate db access. I
experimented a bit with what values it could hold and how it would
affect the total performance. I found out that the performance
increase is proportional to the decrease in sleep time, and that
all values down to 1ms (the lowest value for the call I
made) affected performance. But for windows the story was completely
different; why that is I don't know. In windows any value below
100-110ms was rounded up to approx 100ms, so I could not get any
performance increase with values below 100ms. Also there was a strange
spike at the 1ms and 2ms tests (it might have something to do with
kernel context switching thresholds).
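
One way to check whether the rounding comes from the sleep call itself, rather than from the server as a whole, is to time the call in isolation. The following is a minimal sketch, assuming the sleep() in question is Thread.sleep(long); the class and method names are made up for illustration and this is not the attached code.

```java
class SleepGranularity {

    /** Average observed duration in ms of Thread.sleep(requestedMs) over the given rounds. */
    static double averageSleepMillis(long requestedMs, int rounds) {
        long totalNanos = 0;
        for (int i = 0; i < rounds; i++) {
            long t0 = System.nanoTime();
            try {
                Thread.sleep(requestedMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while sleeping", e);
            }
            totalNanos += System.nanoTime() - t0;   // what the sleep really cost
        }
        return totalNanos / 1e6 / rounds;
    }

    public static void main(String[] args) {
        // Requested values taken from the lower end of the test series.
        for (long ms : new long[] {1, 2, 4, 30, 100}) {
            System.out.printf("requested %3d ms, observed %.2f ms%n",
                    ms, averageSleepMillis(ms, 20));
        }
    }
}
```

If the observed values track the requested ones closely in a single thread, the flattening is more likely caused by scheduling under load than by the timer.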

Here are the measurements:

tf - windows:

1ms:
E:\threads_perf>java -cp . tf.StatelessUdpClient -t 500 -r 10000
Time executing threads: 453875ms (453.875s)
Time executing threads: 483859ms (483.859s)

2ms:
Time executing threads: 656609ms (656.609s)
Time executing threads: 684734ms (684.734s)

4ms:
Time executing threads: 572547ms (572.547s)
Time executing threads: 604500ms (587.5s)

30ms:
Time executing threads: 578796ms (578.796s)
Time executing threads: 555860ms (555.86s)

100ms:
Time executing threads: 571079ms (571.079s)
Time executing threads: 593531ms (593.531s)

120ms:
Time executing threads: 632657ms (632.657s)
Time executing threads: 639125ms (639.125s)

150ms:
Time executing threads: 773750ms (773.75s)
Time executing threads: 771406ms (771.406s)

200ms:
Time executing threads: 1019234ms (1019.234s)
Time executing threads: 1021328ms (1021.328s)

500ms:
Time executing threads: 2543078ms (2543.078s)
Time executing threads: 2544656ms (2544.656s)

The code is attached.

tom
Robert Klemme - 27 Nov 2006 09:31 GMT
>>> As far as I understand it. On Windows processes are expensive while
>>> threads are cheap. On linux processes are cheap and threads are
[quoted text clipped - 8 lines]
> The machine is a dual boot Thinkpad T60 with intel dual core. No special
> system/kernel optimisations have been performed on either system.

Does it also have a dual display and a dual keyboard?  :-) SCNR

> linux: vanilla linux 2.6.17.8 kernel release running on Mandriva 2006
> windows: factory installed windows xp with SP 2 (version 2002)

> Conclusion: linux is faster.
>
[quoted text clipped - 21 lines]
> conclusion: windows is faster at creating client objects by a little
> bit, but linux is 3 times faster at executing the actual operations.

Interesting findings!  Thanks for sharing these!

Kind regards

    robert
bugbear - 15 Nov 2006 13:14 GMT
> Hi
>
[quoted text clipped - 9 lines]
>
> What puzzles me is that in a way both ways seems slightly wrong.

Objects to the rescue!

Data that is per thread should be in the object associated
with the thread.

Shared data should be another object(s), held in a
field of the per-thread objects.
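
A minimal sketch of that layout, with hypothetical class names: each thread gets its own CountingWorker holding a private count, and all workers hold a reference to the same shared total. An AtomicInteger is just one possible choice for the shared object.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadAndShared {

    /** Starts threadCount workers and returns the shared total once all have finished. */
    static int runWorkers(int threadCount) {
        AtomicInteger sharedTotal = new AtomicInteger();    // one object, seen by all workers
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(new CountingWorker(sharedTotal)); // one worker per thread
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return sharedTotal.get();
    }

    public static void main(String[] args) {
        System.out.println("shared total: " + runWorkers(4));
    }
}

class CountingWorker implements Runnable {
    static final int ITEMS = 1000;

    private final AtomicInteger sharedTotal;  // shared data, held in a field
    private int handled;                      // per-thread data, one copy per worker

    CountingWorker(AtomicInteger sharedTotal) {
        this.sharedTotal = sharedTotal;
    }

    @Override
    public void run() {
        for (int i = 0; i < ITEMS; i++) {
            handled++;                        // only this thread touches it, no locking needed
            sharedTotal.incrementAndGet();    // touched by every thread, so it must be thread-safe
        }
    }
}
```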

  BugBear
Robert Klemme - 15 Nov 2006 13:18 GMT
> Hi
>
[quoted text clipped - 9 lines]
>
> What puzzles me is that in a way both ways seems slightly wrong.

Both are valid approaches although the one instance per thread seems
more common.

> Creating a number of objects with a thread for each is sort of like
> creating many separate programs/processes, it seems like waste of
> objects to start with. Why create as many objects as you would create
> threads?

In order to not let the threads interfere with each other.  If you work
with Runnable then this is the typical setup, i.e. you create one
Runnable instance per thread.  That's not really a waste of objects
since it's just this one instance.  An object is nothing "heavy", at
least not by default.  Creating an instance without any state is very
cheap on modern JVMs.  You can easily create tons of objects per second.

> Creating only one object and creating a number of threads for that
> objects run method, also seems wrong, for two reasons. 1) the object
> could then not have any state unless it was to be shared

Exactly.  And you must synchronize access to that state.  But if the
Runnable just implements some kind of function (i.e. has no state of its
own) it is perfectly ok to execute it from multiple threads.

> and 2) when
> reading the name of the thread all threads are named the same.

This is wrong.  The name is read from the Thread instance not from the
Runnable the thread executes (unless of course you make the name you are
referring to a member of that Runnable).
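
A small sketch of both points, with made-up names: a single Runnable is handed to several threads, and each thread still reports its own name because the name belongs to the Thread instance. The Runnable keeps no state of its own; the thread-safe list exists only to collect the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SharedRunnableNames {

    /** Runs one shared Runnable on threadCount named threads and returns the names seen, sorted. */
    static List<String> namesSeen(int threadCount) {
        List<String> seen = new CopyOnWriteArrayList<>();   // thread-safe result collector
        // The one and only Runnable instance, executed by every thread.
        Runnable task = () -> seen.add(Thread.currentThread().getName());

        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(task, "worker-" + i);   // the name is set on the Thread
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        List<String> sorted = new ArrayList<>(seen);
        Collections.sort(sorted);                           // arrival order is not deterministic
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(namesSeen(3));
    }
}
```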

> I understand that it would be normal to create other objects and execute
> their methods run(), which would then effectively create a
> function/method invocation and it all aligns well with my previous
> perception. But the startup part of it seems a bit strange to me.

You do not actually create a function but an object.  That object can
have methods (and typically has).  You can invoke methods on an object
from multiple threads - in some cases it works, in others it does not.
That completely depends on the class implementation: if there is no
state or if access to state is properly synchronized then it is likely
to work from multiple threads - if there is state and accesses are not
properly synchronized all bets are off.
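
As an illustration of the "properly synchronized" case, here is a sketch with hypothetical class names in which several threads call methods on one object. Both methods of SafeCounter are synchronized, so no increment is lost; if the keyword is dropped, the final count may come out lower.

```java
class SynchronizedCounterDemo {

    /** Has threadCount threads increment one shared counter and returns the final value. */
    static int countWith(int threadCount, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();            // one object, many callers
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("final count: " + countWith(4, 10_000));
    }
}

class SafeCounter {
    private int value;                        // the state that needs protecting

    synchronized void increment() {
        value++;                              // read-modify-write, done under the object's lock
    }

    synchronized int get() {
        return value;
    }
}
```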

> Anyone care to help me push my perception into alignment again.

It seems to me that your problem might lie more in the area of object
oriented thinking.  This can be difficult when coming from a procedural
background.  The same happened to me when I embraced OO.  It can take
some time to get used to it.  However, there are plenty of resources out
there that introduce OO.

For reading up on Java threads I can very much recommend Doug Lea's book
and website:

http://www.awprofessional.com/bookstore/product.asp?isbn=0201310090&rl=1
http://g.oswego.edu/dl/

Kind regards

    robert

