Java Forum / General / September 2006

Millions of Threads ?

frankgerlach@gmail.com - 26 Aug 2006 00:38 GMT
Hello folks,
I am thinking about a telecom application, which would potentially
handle millions of mobile
phones (J2ME) as clients. Of course, I need a server (J2SE), too.
The "easy" implementation uses TCP connections for the client/server
communication. Problem is that there are only 65000 sockets per IP
address of the server. I think I could solve that by configuring
multiple IP addresses per network card.
Still, two problems remain: Memory used by each TCP connection and by
the enormous number of threads (each client would have a server thread
for the  "easy" implementation)
Because of all those issues I am considering the use of datagram
sockets and state machines  (one per client)  instead of one thread per
client. On the other hand, what is the difference between a state
machine called "Thread" and a "hand-crafted" state machine ? Both
consume memory, and maybe I could configure the JVM to allocate very
little memory per Thread.....
What do you think ? What is typically used in large telecoms
applications ?
Arne Vajhøj - 26 Aug 2006 00:44 GMT
> I am thinking about a telecom application, which would potentially
> handle millions of mobile
[quoted text clipped - 14 lines]
> What do you think ? What is typically used in large telecoms
> applications ?

I would say something like:

<1000 clients : one thread per client
1000-10000 clients : nio
>10000 clients : udp

Usually threads are mapped to OS threads and have significant
overhead.

With a million clients I think you should also worry about
horizontal scalability !!

Arne
frankgerlach@gmail.com - 26 Aug 2006 00:56 GMT
> With a million clients I think you should also worry about
> horizontal scalability !!
Well, a quick calculation suggests that a single server (4 CPUs or so)
could handle a million clients: Say each client sends 10 messages of
1000 bytes per day (the application in question is not very
data-intensive). That's 1000000*10*1000 = 10 Gbyte per day. An internet
connection with 10 Mbyte/sec (100 Mbit Ethernet) could easily handle
this: 10000000*86400 = 864 Gbyte per day (assuming there are no significant
traffic spikes). If I used Gbit Ethernet I could scale even higher; and
then I could drop in 4 to 10 Gbit Ethernet cards...
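
The arithmetic above can be checked with a short sketch. The class and constant names are illustrative only (they are not from the thread), and the inputs are simply the figures quoted in the post, so this says nothing about peak load:

```java
// Back-of-envelope check of the traffic estimate, using the figures quoted in the post.
final class TrafficEstimate {
    static final long CLIENTS = 1_000_000L;
    static final long MESSAGES_PER_CLIENT_PER_DAY = 10L;
    static final long BYTES_PER_MESSAGE = 1_000L;
    static final long LINK_BYTES_PER_SECOND = 10_000_000L; // "10 Mbyte/sec"
    static final long SECONDS_PER_DAY = 86_400L;

    /** Total bytes sent by all clients in one day. */
    static long dailyLoadBytes() {
        return CLIENTS * MESSAGES_PER_CLIENT_PER_DAY * BYTES_PER_MESSAGE;
    }

    /** Bytes the link could carry in one day if it were saturated. */
    static long dailyLinkCapacityBytes() {
        return LINK_BYTES_PER_SECOND * SECONDS_PER_DAY;
    }

    /** Fraction of the link used on average (ignores traffic spikes). */
    static double averageUtilization() {
        return (double) dailyLoadBytes() / dailyLinkCapacityBytes();
    }

    public static void main(String[] args) {
        System.out.println("Daily load (bytes): " + dailyLoadBytes());
        System.out.println("Daily link capacity (bytes): " + dailyLinkCapacityBytes());
        System.out.printf("Average utilization: %.2f%%%n", 100 * averageUtilization());
    }
}
```

The average utilization comes out at roughly one percent, which is why the estimate only holds if the traffic is not concentrated in short bursts.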

> Arne
frankgerlach@gmail.com - 26 Aug 2006 01:00 GMT
What about "green" threads ? Could they scale to millions per machine
(with less than 10 CPU cores and less than 32 GByte main memory)?
Arne Vajhøj - 26 Aug 2006 01:28 GMT
> Whatabout "green" threads ? Could they scale to millions per machine
> (with less than 10 CPU cores and less than 32 GByte main memory)?

Green threads can only use one core and will likely be
even slower.

Arne
Arne Vajhøj - 26 Aug 2006 01:32 GMT
> Well, a quick calculation suggests that a single server (4CPU or so)
> could handle a million clients: Say each client sends 10 messages of
[quoted text clipped - 4 lines]
> traffic spikes). If I used Gbit Ethernet I could scale even higher; and
> then I drop in 4 to 10 Gbit Ethernet cards...

If we assume that you spend 20 milliseconds processing
each request, then in the worst case, where all one million clients
need service at once, one pass over all connections
on 4 CPUs will take about 83 minutes.
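
As a rough check of that figure, here is a minimal sketch. The method name and parameters are hypothetical, and it assumes the work divides evenly across the CPUs with no other overhead:

```java
// Worst case: every client needs one request processed, split evenly over the CPUs.
final class WorstCaseSweep {
    /** Minutes needed to process one request per client. */
    static double sweepMinutes(long clients, double secondsPerRequest, int cpus) {
        double totalSeconds = clients * secondsPerRequest / cpus;
        return totalSeconds / 60.0;
    }

    public static void main(String[] args) {
        // One million clients, 20 ms per request, 4 CPUs.
        System.out.printf("%.1f minutes%n", sweepMinutes(1_000_000L, 0.020, 4));
    }
}
```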

I hope it is not an interactive application.

:-)

Arne
Patricia Shanahan - 26 Aug 2006 02:51 GMT
>> Well, a quick calculation suggests that a single server (4CPU or so)
>> could handle a million clients: Say each client sends 10 messages of
[quoted text clipped - 14 lines]
>
> Arne

If each client sends 10 messages a day, there is a relatively low
probability that they all need service simultaneously. On the other
hand, it would be equally unreasonable to assume the traffic is
uniformly distributed.

I would model message arrivals with a Poisson distribution, but using
the rate associated with the busiest time of day, not the 24 hour
average. The problem is going to be working out where all the
bottlenecks are, and estimating the queue delays.

A thread has significant memory overhead, and appears on operating
system queues. How much do you really need to remember about each client
between messages? If it is small, you are better off keeping a client
state object for each client. When a message comes in, allocate a thread
and select the appropriate client state object.
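
A minimal sketch of that idea, under some assumptions: clients are identified by a string ID, the per-client state is just a message counter, and the pool size of 4 is arbitrary. All names here are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ClientStateServer {
    /** Small per-client record kept between messages (hypothetical field). */
    static final class ClientState {
        final AtomicInteger messagesSeen = new AtomicInteger();
    }

    private final ConcurrentHashMap<String, ClientState> states = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    /** Called when a message arrives: a pooled thread selects the client's state and updates it. */
    void onMessage(String clientId, byte[] payload) {
        workers.execute(() -> {
            ClientState state = states.computeIfAbsent(clientId, id -> new ClientState());
            state.messagesSeen.incrementAndGet();
            // Real message handling would go here.
        });
    }

    /** Number of messages processed so far for this client (0 if unknown). */
    int messagesSeen(String clientId) {
        ClientState state = states.get(clientId);
        return state == null ? 0 : state.messagesSeen.get();
    }

    /** Stops accepting work and waits for queued messages to finish. */
    boolean shutdownAndWait() {
        workers.shutdown();
        try {
            return workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ClientStateServer server = new ClientStateServer();
        for (int i = 0; i < 1000; i++) {
            server.onMessage("client-" + (i % 10), new byte[0]);
        }
        server.shutdownAndWait();
        System.out.println("client-3 messages: " + server.messagesSeen("client-3"));
    }
}
```

The number of threads is then fixed by the pool size, while the number of clients only affects the size of the map.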

Patricia
Brandon McCombs - 27 Aug 2006 01:01 GMT
>> With a million clients I think you should also worry about
>> horizontal scalability !!
[quoted text clipped - 7 lines]
> then I drop in 4 to 10 Gbit Ethernet cards...
>> Arne

I hope he has the money for a 10Mbyte/sec UPLINK. Sure, it seems small,
and it is for an intranet link, but it is an expensive link when you
have to put it on the Internet *and* it has to be your uplink speed, not
your downlink.
joseph.workman@gmail.com - 26 Aug 2006 01:44 GMT
What type of telecom application are you going to be writing ?
From what it sounds like, a messaging program of some kind.

Jj
David N. Welton - 26 Aug 2006 08:09 GMT
> What do you think ? What is typically used in large telecoms
> applications ?

You might have a look at Erlang (erlang.org), which was designed to deal
with exactly that sort of application.

David N. Welton
- http://www.dedasys.com/davidw/

Linux, Open Source Consulting
- http://www.dedasys.com/

Gordon Beaton - 26 Aug 2006 08:49 GMT
> What is typically used in large telecoms applications ?

Among other things, this:
http://www.erlang.org/

In particular, see section 1 of the FAQ:
http://www.erlang.org/faq/t1.html

/gordon

[ don't email me support questions or followups ]
g o r d o n  +  n e w s  @  b a l d e r 1 3 . s e

Chris Uppal - 26 Aug 2006 10:40 GMT
> I am thinking about a telecom application, which would potentially
> handle millions of mobile
[quoted text clipped - 3 lines]
> address of the server. I think I could solve that by configuring
> multiple IP addresses per network card.

Assuming < 10e6 customers, and average of 10 messages per day, and a target
turn-around time of 0.5 seconds, that suggests that you would have an average
of around 600 interactions "in-flight" at any one instant.  Obviously you would
have to model/estimate how much the peak load would exceed that.  That suggests
that running out of port numbers is not going to be one of your problems.
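
The estimate above is an application of Little's law (average number in flight = arrival rate x time in system). A minimal sketch, reading "10e6" literally as 10 million customers, which is what reproduces the figure of around 600; the names are hypothetical:

```java
// Little's law: average in-flight requests = arrival rate * turn-around time.
final class InFlightEstimate {
    static final double SECONDS_PER_DAY = 86_400.0;

    /** Average arrival rate, assuming messages are spread evenly over the day. */
    static double arrivalsPerSecond(double customers, double messagesPerCustomerPerDay) {
        return customers * messagesPerCustomerPerDay / SECONDS_PER_DAY;
    }

    /** Average number of requests being served at any instant. */
    static double averageInFlight(double arrivalsPerSecond, double turnaroundSeconds) {
        return arrivalsPerSecond * turnaroundSeconds;
    }

    public static void main(String[] args) {
        double rate = arrivalsPerSecond(10e6, 10);
        System.out.printf("Arrivals per second: %.0f%n", rate);
        System.out.printf("Average in flight:   %.0f%n", averageInFlight(rate, 0.5));
        System.out.printf("Mean interval (ms):  %.2f%n", 1000.0 / rate);
    }
}
```

These are averages over 24 hours; the peak-hour rate would have to be substituted to size the system.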

> Still, two problems remain: Memory used by each TCP connection and by
> the enormous number of threads (each client would have a server thread
> for the  "easy" implementation)

On most OSs each thread consumes a lot of resource -- the stack space for
instance.  It also puts load on the scheduler.  Don't use that many threads
unless either your JVM is specifically designed to handle that many threads as
lightweight entities (Sun's are not), or your OS is specifically designed to
provide ultra light-weight threads (I believe that there is a Linux threads
implementation which claims to have this property -- I have no experience of it
at all myself).

Otherwise, that many concurrent requests puts you squarely into NIO territory.
Then scale up your CPUs and/or load-balanced server boxes to make the target
turn-around time achievable.  Same goes for the database (if any).
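
To illustrate what "NIO territory" means, here is a minimal sketch of a single-threaded selector loop that echoes back whatever each client sends. It is not a production design: the class name, the 1024-byte buffer and the loopback binding are assumptions made for the example, and error handling is reduced to stopping the loop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class NioEchoSketch implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NioEchoSketch() {
        try {
            selector = Selector.open();
            server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0)); // any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return server.socket().getLocalPort();
    }

    /** One thread multiplexes every connection through the selector. */
    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (selector.isOpen()) {
                selector.select();
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) { client.close(); continue; }
                        buffer.flip();
                        while (buffer.hasRemaining()) client.write(buffer); // acceptable only for tiny demo messages
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector closed during shutdown, or an I/O failure: stop the loop.
        }
    }

    @Override
    public void close() {
        try {
            selector.close();
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Blocking client used only to exercise the server. */
    static String echoOnce(int port, String message) {
        byte[] out = message.getBytes(StandardCharsets.UTF_8);
        try (SocketChannel ch = SocketChannel.open(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port))) {
            ch.write(ByteBuffer.wrap(out));
            ByteBuffer in = ByteBuffer.allocate(out.length);
            while (in.hasRemaining() && ch.read(in) >= 0) { /* keep reading until the echo is complete */ }
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (NioEchoSketch sketch = new NioEchoSketch()) {
            Thread loop = new Thread(sketch, "selector-loop");
            loop.setDaemon(true);
            loop.start();
            System.out.println(echoOnce(sketch.port(), "hello"));
        }
    }
}
```

The point of the pattern is that idle connections cost a registered channel and a selection key rather than a thread each.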

Notice that, on the above assumptions, you will be serving a request at
intervals of (on average) about 1 millisecond.  That is substantially less than
the typical disk-seek time.  If each request causes one disk read, and if there
is not enough temporal correlation between requests (which there might be),
then each request will also cause one physical disk seek -- so you need
sufficient spindles to allow that many seeks to be issued without blocking each
other.

   -- chris
Thomas Hawtin - 26 Aug 2006 14:17 GMT
> I am thinking about a telecom application, which would potentially
> handle millions of mobile
[quoted text clipped - 3 lines]
> address of the server. I think I could solve that by configuring
> multiple IP addresses per network card.

That may be an OS problem, but it shouldn't be a problem for TCP or UDP.
Each connection is identified by four numbers: client IP address, client
port, server IP address and server port. Even with the last two
constant, you should get at least 32768 connections per client. Should be
enough.

Having said that, if all the connections go through the same opaque
proxy (or you try to do a 1:1 mapping to a back-end app server or database),
then you could cluster onto a single client IP address. I really don't
know how mobile gateways operate.

> Still, two problems remain: Memory used by each TCP connection and by
> the enormous number of threads (each client would have a server thread
> for the  "easy" implementation)

If you want a million simultaneous connections, then ouch.

It's an area that has moved on a great deal over the last few years. So
a lot of what you read will be out of date.

The killer for threads is the amount of virtual address space used.
Therefore stick to 64-bit operating systems (shouldn't be a problem
these days). To handle large numbers of threads, operating systems have
moved to scalable algorithms. On Linux use a 2.6 rather than 2.4 kernel. I would
check what Solaris 10 x64 can do, but my Ultra 20 is on the blink.

I guess large numbers of TCP connections will consume a lot of memory.
Perhaps tuning the window size will help (a large window helps
throughput with long latencies).

> Because of all those issues I am considering the use of datagram
> sockets and state machines  (one per client)  instead of one thread per
> client. On the other hand, what is the difference between a state
> machine called "Thread" and a "hand-crafted" state machine ? Both
> consume memory, and maybe I could configure the JVM to allocate very
> little memory per Thread.....

UDP does give you advantages in terms of not having to hold connections
and ability to work around TCP latency issues. Using UDP you will have
to reinvent a lot of TCP. Perhaps TCP and NIO would be better.
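
For comparison, the "hand-crafted" per-client state machine from the quoted question might look roughly like the sketch below. It assumes each datagram carries a client ID and a sequence number, which is an invented message format, and it handles only duplicate and reordered packets. Retransmission, acknowledgements and timeouts, which TCP would otherwise provide, are left out.

```java
import java.util.HashMap;
import java.util.Map;

final class ClientSessions {
    enum State { NEW, ACTIVE, CLOSED }

    /** A few fields of state per client instead of a thread and its stack. */
    static final class Session {
        State state = State.NEW;
        long lastSeq = -1;
    }

    private final Map<String, Session> sessions = new HashMap<>();

    /**
     * Handles one datagram. Returns true if it was accepted, false if it was
     * a duplicate, arrived out of order, or the session is already closed.
     */
    boolean onDatagram(String clientId, long seq, boolean isClose) {
        Session s = sessions.computeIfAbsent(clientId, id -> new Session());
        if (s.state == State.CLOSED || seq <= s.lastSeq) {
            return false; // UDP may duplicate or reorder packets
        }
        s.lastSeq = seq;
        s.state = isClose ? State.CLOSED : State.ACTIVE;
        return true;
    }

    /** Current state for a client, or null if no datagram has been seen. */
    State stateOf(String clientId) {
        Session s = sessions.get(clientId);
        return s == null ? null : s.state;
    }

    public static void main(String[] args) {
        ClientSessions sessions = new ClientSessions();
        System.out.println(sessions.onDatagram("phone-1", 0, false));
        System.out.println(sessions.onDatagram("phone-1", 0, false)); // duplicate
        System.out.println(sessions.stateOf("phone-1"));
    }
}
```

This version is not thread-safe; it assumes a single thread reads the datagram socket and calls onDatagram.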

Probably you are best off using someone else's infrastructure. Here's
where I don't know much about what is available. IIRC, Grizzly and Ember
are two abstractions of NIO. Sun's Project Darkstar is a server that
handles problems such as fail-over. If you've got millions of customers,
you are probably going to want to make it more reliable than can
reasonably be achieved with one machine.

Tom Hawtin

Unemployed English Java programmer
http://jroller.com/page/tackline/

Robert Klemme - 28 Aug 2006 14:38 GMT
> Hello folks,
> I am thinking about a telecom application, which would potentially
[quoted text clipped - 15 lines]
> What do you think ? What is typically used in large telecoms
> applications ?

Another option to deal with this is LWP (light weight processing
framework).  You can find more info on this in Doug Lea's book [1].  The
basic pattern is that you have a thread pool and send requests down a
queue, from which the first free thread picks a task, processes it, and
either sends the result directly or passes it through another queue, where
it is read by one of another set of threads that transfers data back.  You
save the overhead of thread creation and deletion at the cost of a slightly
more complex architecture.
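
A minimal sketch of that pattern using java.util.concurrent (the standard-library package that grew out of Doug Lea's concurrency work). The class name, the pool size of 4 and the "reply:" prefix are assumptions for illustration; the fixed pool's internal work queue plays the role of the request queue, and a second queue carries the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class WorkerPoolSketch {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();

    /** Queues a request; the first free pool thread processes it and queues the reply. */
    void submit(String request) {
        workers.execute(() -> responses.add("reply:" + request));
    }

    /** Takes up to `expected` replies off the response queue, as a sender thread would. */
    List<String> collect(int expected) {
        List<String> out = new ArrayList<>();
        try {
            for (int i = 0; i < expected; i++) {
                String reply = responses.poll(5, TimeUnit.SECONDS);
                if (reply == null) break; // timed out waiting for a reply
                out.add(reply);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        WorkerPoolSketch pool = new WorkerPoolSketch();
        for (int i = 0; i < 100; i++) {
            pool.submit("req-" + i);
        }
        System.out.println("Replies collected: " + pool.collect(100).size());
        pool.shutdown();
    }
}
```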

Of course the basic precondition is that the system at hand has enough
resources (main mem, CPU cycles, IO bandwidth) to actually deal with the
load.

HTH

Kind regards

    robert

[1] "Concurrent Programming in Java"
http://www.awprofessional.com/bookstore/product.asp?isbn=0201310090&rl=1
[2] Doug Lea's home page http://g.oswego.edu/
Dale King - 01 Sep 2006 04:23 GMT
> Hello folks,
> I am thinking about a telecom application, which would potentially
[quoted text clipped - 4 lines]
> address of the server. I think I could solve that by configuring
> multiple IP addresses per network card.

I think others have addressed the thread issue, I'll tackle your
misconception about IP. You need to understand the difference between
"sockets" which are a software concept and "ports" which are part of the
TCP and UDP protocols that run on top of IP. There is a limit of 65,536
"ports" per IP address. There is really no limit other than memory, etc.
on the number of "sockets". You do not use a port per connected user. You
have a socket connection with each user, but they would all probably be
communicating to a single port.

It is no different than a web server. There can be millions of users of
the web server and they are all connected to port 80 on the server. The
server only has to use one port.

You may have gotten confused by Sun itself! For many years, the Java
tutorial contained a totally incorrect and misleading explanation of how
TCP sockets work. It talked about the server switching from the well
known port (e.g. 80 for a web server) to another socket bound to a
different port once the connection was accepted. This is false. Once the
connection is made you move from the server socket that is listening to
new connections to another socket dedicated to that user, but it is
still bound to the same IP port on the server.
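
This can be observed directly with a small sketch on the loopback interface. The class and method names are hypothetical; the port is chosen by the OS (port 0 requests any free port) and the backlog of 50 is an arbitrary value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class SharedPortDemo {
    /**
     * Connects two clients to one listening socket and returns
     * {listening port, accepted socket A local port, accepted socket B local port,
     *  client A local port, client B local port}.
     */
    static int[] ports() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket clientA = new Socket(loopback, server.getLocalPort());
             Socket clientB = new Socket(loopback, server.getLocalPort());
             Socket acceptedA = server.accept();
             Socket acceptedB = server.accept()) {
            return new int[] {
                server.getLocalPort(),
                acceptedA.getLocalPort(),
                acceptedB.getLocalPort(),
                clientA.getLocalPort(),
                clientB.getLocalPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] p = ports();
        System.out.println("Listening port:          " + p[0]);
        System.out.println("Accepted sockets' ports: " + p[1] + ", " + p[2]);
        System.out.println("Client-side ports:       " + p[3] + ", " + p[4]);
    }
}
```

Both accepted sockets report the listening port as their local port; the connections are told apart by the client side of the address pair.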

Here is a link to their old bogus tutorial page:

<http://web.archive.org/web/20000301031305/http://java.sun.com/docs/books/tutorial/networking/sockets/definition.html>

You can contrast this with the corrected version (which appears to have
only been corrected this year despite years of protests):

<http://java.sun.com/docs/books/tutorial/networking/sockets/definition.html>

 Dale King


