Hi,
I was wondering about the safest way to use the writeUTF/readUTF
methods of DataOutputStream/DataInputStream.
If I communicate via sockets and the server sends a message in the form of
a string, how can I make sure that when I read the message,
I read the whole message and not only a fragment of it?
So if the message is "Santa Claus has a present for you.", how can I
make sure that as a client I read the whole message and not only
"Santa Claus has a present"?
The server writes the message like:
DataOutputStream os;
String message = "Santa Claus has a present for you";
os.writeUTF(message);
os.flush();
The client reads the message like:
DataInputStream is;
String message = is.readUTF();
Is this always correct, or should I use some other method to make 100% sure
that I have read the whole message?
(One method could be:
Server side: first send the byte length of the message as a "short",
then the message itself;
Client side: first read the length with "readShort", then
read exactly as many bytes as the length says.)
Does somebody know the official solution for the problem?
Best, korcs
Gordon Beaton - 27 Nov 2007 16:31 GMT
> Is this always correct, or should I use some other method to make 100% sure
> that I have read the whole message?
[quoted text clipped - 8 lines]
>
> Does somebody know the official solution for the problem?
There are two common solutions, and some variations on those themes.
You already described one, the other is to delimit each message with a
special character (or sequence) that cannot occur within the message
unless escaped. For text, a newline might be a suitable candidate.
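A minimal sketch of the delimiter approach, assuming newline-terminated UTF-8 text and messages that never contain a line break themselves. The class and method names (LineFraming, send, receive) are made up for illustration, and in-memory streams stand in for the socket streams:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; only the java.io calls are real API.
class LineFraming {
    // Sender side: terminate every message with '\n'.
    // Assumption: a message containing a line break is rejected rather than escaped.
    static void send(Writer out, String message) throws IOException {
        if (message.indexOf('\n') >= 0 || message.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("message must not contain a line break");
        }
        out.write(message);
        out.write('\n');
        out.flush();
    }

    // Receiver side: readLine() blocks until a complete line is available
    // and returns null if the stream ends before any more data arrives.
    static String receive(BufferedReader in) throws IOException {
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for socket.getOutputStream() / getInputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(wire, StandardCharsets.UTF_8);
        send(out, "Santa Claus has a present for you.");
        send(out, "Second message");

        BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(wire.toByteArray()), StandardCharsets.UTF_8));
        System.out.println(receive(in));
        System.out.println(receive(in));
    }
}
```

Both ends have to agree on the charset; relying on the platform default on either side is a common source of trouble.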
/gordon
--
Matt Humphrey - 27 Nov 2007 19:55 GMT
>> Is this always correct, or should I use some other method to make 100% sure
>> that I have read the whole message?
[quoted text clipped - 13 lines]
> special character (or sequence) that cannot occur within the message
> unless escaped. For text, a newline might be a suitable candidate.
I'm curious because the OP is using writeUTF / readUTF, which I have not
used. The Javadocs say that the encoding includes a 2-byte length field, and
readUTF says that it will read that many bytes or throw EOFException if it
encounters the end of the stream first. This suggests that it will block until
it can read fully and that it won't read additional bytes. I would think that
readUTF / writeUTF would properly delimit and reassemble bytes into the
original string without needing an extra length field, markers or so forth.
Is that so?
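One way to check the 2-byte length field is to capture what writeUTF emits and look at the first two bytes. This is only a sketch; the class name WriteUtfLayout and the helper encode are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class; only the java.io calls are real API.
class WriteUtfLayout {
    // Returns the exact bytes that writeUTF puts on the stream for the given string.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(s);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("Santa Claus has a present for you.");
        // The first two bytes form a big-endian unsigned count of the bytes that follow.
        int length = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        System.out.println("length prefix = " + length);
        System.out.println("total bytes   = " + bytes.length);
    }
}
```

For a pure-ASCII string the prefix equals the character count and the total is two bytes more.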
Matt Humphrey http://www.iviz.com/
Gordon Beaton - 27 Nov 2007 21:01 GMT
> I'm curious because the OP is using writeUTF / readUTF which I have not
> used.
Me neither...
> The Javadocs say that the encoding includes a 2-byte length field
> and readUTF says that it will read that many bytes or throw EOF
[quoted text clipped - 3 lines]
> reassemble bytes into the original string without needing an extra
> length field, markers or so forth. Is that so?
Hmm, could be.
/gordon
--
Andreas Leitgeb - 27 Nov 2007 17:04 GMT
> I was wondering about the safest way to use the writeUTF/readUTF
> methods of DataOutputStream/DataInputStream.
>
> If I communicate via sockets and the server sends a message in the form of
> a string, how can I make sure that when I read the message,
> I read the whole message and not only a fragment of it?
One way to make it dance:
send a bytecount in advance
When the reader will measure
the number of bytes for pleasure
it shall never fall short,
unless on some network-abort.
Damn, I should have read the whole
posting, since obviously you knew that all...
> Server side: send first the byte length of the message as a "short"
> then the message itself;
I wouldn't send it as a "short",
an "int" might prevent inadvertent abort,
if the message was long and the short wrapped around,
you might cause surprise on reader's ground.
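A rough sketch of the int-prefixed variant, assuming the payload is UTF-8 and that both sides agree on an upper bound for one message. IntLengthFraming, send, receive and MAX_LENGTH are made-up names, and the 1 MiB cap is an arbitrary choice for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; only the java.io calls are real API.
class IntLengthFraming {
    // Assumed sanity cap so a corrupt length cannot trigger a huge allocation.
    static final int MAX_LENGTH = 1 << 20;

    static void send(DataOutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        if (payload.length > MAX_LENGTH) {
            throw new IOException("message too long: " + payload.length);
        }
        out.writeInt(payload.length); // 4-byte prefix: counts bytes, not chars
        out.write(payload);
        out.flush();
    }

    static String receive(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_LENGTH) {
            throw new IOException("bad length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until every byte is read, else EOFException
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the socket streams.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        send(new DataOutputStream(wire), "Santa Claus has a present for you.");
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(receive(in));
    }
}
```

Unlike a short, the int prefix leaves room for messages well beyond 64 KB, which is why the cap is worth having.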
> Does somebody know the official solution for the problem?
you can just as well append a \n(ewline),
and have the reader read up to it, fine!
Joshua Cranmer - 27 Nov 2007 22:09 GMT
> If I communicate via sockets and the server sends a message in the form of
> a string, how can I make sure that when I read the message,
> I read the whole message and not only a fragment of it?
Well, network failures aside, a socket hands you the bytes in the order they
were sent, and Java pieces them back together for you before readUTF returns.
> Is this always correct, or should I use some other method to make 100% sure
> that I have read the whole message?
[quoted text clipped - 8 lines]
>
> Does somebody know the official solution for the problem?
From the Javadocs for DataOutputStream's writeUTF:
First, two bytes are written to the output stream as if by the
writeShort method giving the number of bytes to follow. This value is
the number of bytes actually written out, not the length of the string.
[ ... ]
Java already does the message length processing for you (see the
corresponding documentation in DataInput's readUTF if you don't believe me).
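For anyone who would rather try it than read the docs, here is a small sketch that reads two writeUTF messages back through a stream that deliberately returns only one byte per bulk read, roughly simulating data trickling in from the network. ReadUtfRoundTrip, TrickleInputStream and roundTrip are made-up names for the demo:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the java.io calls are real API.
class ReadUtfRoundTrip {
    // Hands over at most one byte per bulk read, like a very slow connection.
    static class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            return super.read(b, off, 1);
        }
    }

    // Writes each message with writeUTF, then reads them back with readUTF
    // through the trickling stream.
    static List<String> roundTrip(String... messages) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        for (String m : messages) {
            out.writeUTF(m);
        }
        out.flush();

        DataInputStream in = new DataInputStream(
                new TrickleInputStream(new ByteArrayInputStream(wire.toByteArray())));
        List<String> received = new ArrayList<>();
        for (int i = 0; i < messages.length; i++) {
            received.add(in.readUTF()); // one call per message, no manual length handling
        }
        return received;
    }

    public static void main(String[] args) throws IOException {
        for (String s : roundTrip("Santa Claus has a present for you.", "Ho ho ho")) {
            System.out.println(s);
        }
    }
}
```

Each readUTF call returns exactly one complete message, even though the underlying stream never delivers more than a byte at a time.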

Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
Roedy Green - 29 Nov 2007 01:46 GMT
On Tue, 27 Nov 2007 08:09:42 -0800 (PST), korcs
<konrad.lindner@gmx.net> wrote, quoted or indirectly quoted someone
who said :
>String message = is.readUTF();
I suggest you look at the source code for readUTF in src.zip. I would
be very surprised if it did not block until it had all the characters
promised in the leading 2-byte count field.
Note that this creates a rather severe 10,922 limit on the length of the
field.
See http://65.110.21.43/jgloss/utf.html#WRITEUTF
for details.

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Esmond Pitt - 29 Nov 2007 05:58 GMT
> Note that creates a rather severe 10,922 limit on the length of the
> field.
> See http://65.110.21.43/jgloss/utf.html#WRITEUTF
... which is not correct. The length word isn't specified as 'signed' in
the Javadoc; you seem to have just made that up. It is unsigned. It is
read with readUnsignedShort() in DataInputStream, and the Javadoc
clearly specifies a maximum length of 65,535 bytes.
Taking the 3-byte encoding into account, that makes 65535 / 3 = 21845
characters. But 3-byte encoding only applies to characters from 0x0800
upward; characters from 0x0080 to 0x07FF are encoded as 2 bytes, as are
nulls (0x0000), and the rest, from 0x0001 to 0x007F, as 1 byte.
So a string of 64k-1 (65,535) characters composed from
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" encodes
to 65537 bytes including the length word.
So the maximum is 65535 characters. Depending on the actual characters being
encoded it may be less, but it never drops below 65535 / 3 = 21845.
And yes it blocks until it has read everything it is looking for or
encountered an exception, including EOFException. It does this with
DataInputStream.readFully(), as you would expect.
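The length arithmetic is easy to check with a few lines. This is only a sketch; WriteUtfLimits and its helpers (encodedLength, fits, repeat) are made-up names:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class; only the java.io calls are real API.
class WriteUtfLimits {
    // Number of bytes writeUTF emits after the 2-byte length word.
    static int encodedLength(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        new DataOutputStream(buffer).writeUTF(s);
        return buffer.size() - 2;
    }

    // True if writeUTF accepts the string, false if it is too long to encode.
    static boolean fits(String s) throws IOException {
        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        }
    }

    // Builds a string consisting of count copies of c.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodedLength("A"));      // 1 byte  (U+0001 to U+007F)
        System.out.println(encodedLength("\u0000")); // 2 bytes (null character)
        System.out.println(encodedLength("\u00e9")); // 2 bytes (U+0080 to U+07FF)
        System.out.println(encodedLength("\u20ac")); // 3 bytes (U+0800 and above)

        System.out.println(fits(repeat('A', 65535)));      // true: exactly 65535 bytes
        System.out.println(fits(repeat('A', 65536)));      // false: one byte too many
        System.out.println(fits(repeat('\u20ac', 21845))); // true: 21845 * 3 = 65535
        System.out.println(fits(repeat('\u20ac', 21846))); // false
    }
}
```

If messages can exceed these limits, writeUTF is the wrong tool and an explicit length prefix or a delimiter is needed instead.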