Java Forum / General / September 2005
Java sockets and readLine
kahiga - 17 Sep 2005 00:13 GMT
I have a conceptual Java question: how is Java able to return from the blocking readLine() after reading a line ending in \n (*nix), \r\n (Win) or \r (Mac), specifically for an input stream coming from the network (a socket)? When you create a socket connection to another computer running UNIX, Mac or Windows, how does Java know what the line separator character is?
The basic analysis I can think of is that once you call readLine(), which is a blocking I/O call, the JVM simply keeps reading incoming characters from the network until it detects a \r\n (for Windows) and then returns the string. Now if the computer is a Mac, which uses \r for EOL, how does Java know to stop waiting for the \n and just return the string, or am I misunderstanding the process?
Any ideas are welcome.
Mike Schilling - 17 Sep 2005 01:10 GMT
> I have a concept java question on how java is able to return from the
> blocking readLine() after reading in a line ending in \n (*nix) or \r\n
[quoted text clipped - 12 lines]
>
> Any Ideas are welcomed.
Any socket-to-socket communications protocol has to be defined at the byte level. HTTP header lines, for instance, are defined always to end with CRLF regardless of which platform is being used. You're right that doing writeLine()s on a Mac expecting to be able to read the result with readLine()s on Windows won't work. Don't do that.
kahiga - 17 Sep 2005 04:32 GMT
I see your point about using a predefined communication protocol, but I was thinking at a simpler level, e.g. a simple Java server with a telnet client, where the server's only purpose is to echo any line it receives from the client. Obviously this is possible and it shouldn't matter what platform the client is running on.
> You're right that doing writeLine()s on a Mac expecting to be able to read
> the result with readLine()s on Windows won't work.
Why? Seems like it should; otherwise wouldn't it break the WORA principle if the code has to be platform specific?
I think I figured out what Java is doing with readLine().
Case 1 - Line ends with a \n only. Keep reading characters from the network and when you see a \n, return the buffered characters as a string.
Case 2 - Line ends with a \r only. Keep reading characters from the network and when you see a \r, return the buffered characters as a string.
Case 3 - Line ends with a \r\n. Keep reading characters from the network and when you see a \r, return the buffered characters as a string. If the next character read from the network is \n, discard it.
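The three cases above can be sketched as a small state machine. This is only an illustration: SimpleLineReader and readAll are hypothetical names, not part of the JDK, and the sketch reads one character at a time rather than buffering as BufferedReader does.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line reader illustrating the three cases; not the JDK class. */
class SimpleLineReader {
    private final Reader in;
    // Set when the previous line ended with '\r'; a directly following '\n' is dropped.
    private boolean skipLF = false;

    SimpleLineReader(Reader in) {
        this.in = in;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (skipLF) {
                    skipLF = false;
                    if (c == '\n') {
                        continue; // Case 3: second half of "\r\n", discard it
                    }
                }
                if (c == '\n') {
                    return sb.toString(); // Case 1
                }
                if (c == '\r') {
                    skipLF = true; // Cases 2 and 3: return now, decide about '\n' later
                    return sb.toString();
                }
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // End of stream: return a trailing unterminated line if there is one.
        return sb.length() > 0 ? sb.toString() : null;
    }

    /** Convenience helper for experiments: splits a string into lines. */
    static List<String> readAll(String text) {
        SimpleLineReader r = new SimpleLineReader(new StringReader(text));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = r.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}
```

The key design point is that the reader never waits for the character after a \r; it only remembers that a \n arriving first on the next call should be ignored.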
Thomas Hawtin - 17 Sep 2005 08:47 GMT
> I see you're point about using a predefined protocol communication but
> I was thinking at a simpler level e.g. a simple java server with a
> telnet client whose only purpose (the server) is to echo any line it
> receives from the client. Obviously this is possible and it shouldn't
> matter what platform the client is running on.
Telnet is a predefined protocol. It defines a Network Virtual Terminal (NVT). Unless the BINARY option is negotiated, the default end of line is CR LF. Other protocols use a similar convention.
http://www.ietf.org/rfc/rfc0854.txt
TCP itself really does just give you octet streams.
Tom Hawtin
 Signature Unemployed English Java programmer http://jroller.com/page/tackline/
kahiga - 17 Sep 2005 10:55 GMT
> Telnet is a predefined protocol. It defines a Network Virtual Terminal
> (NVT). Unless the BINARY option is negotiated, the default end of line
> is CR LF. Other protocols use a similar convention.
>
> http://www.ietf.org/rfc/rfc0854.txt
True indeed (I wasn't aware of the RFC). I also found this article, which gives a more user-friendly description of the "Telnet EOL convention": http://www.freesoft.org/CIE/RFC/1123/31.htm
I guess my example of a client was flawed, but what I was trying to specify was a client using a non-predefined protocol. Maybe a better example would be a custom Java client that only sends lines of text to the server, where the server locally echoes each line of text while using readLine() to read the text from the client.
I created a sample Java client and sent 3 lines ending in different EOLs:
"Hello world\r"
"Hello world\n"
"Hello world\r\n"
The server was able to read all of these lines correctly using readLine().
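The experiment above can be reproduced without a network connection. The sketch below is an assumption-laden stand-in: a StringReader plays the role of the socket's input stream, and MixedEolDemo and readLines are made-up names for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class MixedEolDemo {
    /** Reads every line from the given text using BufferedReader.readLine(). */
    static List<String> readLines(String wireText) {
        List<String> lines = new ArrayList<>();
        // StringReader stands in for new InputStreamReader(socket.getInputStream()).
        try (BufferedReader r = new BufferedReader(new StringReader(wireText))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // One line per terminator style: bare CR, bare LF, CR LF.
        String wire = "Hello world\r" + "Hello world\n" + "Hello world\r\n";
        System.out.println(readLines(wire)); // prints [Hello world, Hello world, Hello world]
    }
}
```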
Steve Horsley - 17 Sep 2005 13:08 GMT
> I see you're point about using a predefined protocol communication but
> I was thinking at a simpler level e.g. a simple java server with a
> telnet client whose only purpose (the server) is to echo any line it
> receives from the client. Obviously this is possible and it shouldn't
> matter what platform the client is running on.
The telnet RFC specifically says that Carriage Return '\r' MUST be followed by either NewLine '\n' or Null 0x00, depending on whether a line feed action is required in addition to the carriage return. A CR-NULL implies that the current line will be overwritten by the following line (or overtyped if printing).
In addition, telnet can carry escape sequences that do things like turn echo on/off and query the terminal type. So your echo server will probably work in the sense that people would see what they typed, but would not be a "proper" telnet implementation.
>> You're right that doing writeLine()s on a Mac expecting to be able to read
>> the result with readLine()s on Windows won't work.
[quoted text clipped - 15 lines]
> buffered characters as a string. If the next character read from the
> network in \n, discard it.
I think you are right - this describes readLine(). Case 3 is really case 2 in disguise. All you need is a rule that says to drop a '\n' if it immediately follows a '\r'.
Steve
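As a rough illustration of the Telnet rule mentioned in this post (CR followed by LF means a new line, CR followed by NUL means a bare carriage return), here is a minimal sketch. NvtNewlines and decode are hypothetical names, and the sketch deliberately ignores Telnet option negotiation and IAC escape sequences, so it is not a Telnet implementation.

```java
class NvtNewlines {
    /**
     * Converts NVT-style line endings in already-decoded text:
     * CR LF becomes '\n', CR NUL becomes a bare '\r'.
     * Hypothetical helper; assumes no Telnet command bytes are present.
     */
    static String decode(String nvt) {
        StringBuilder out = new StringBuilder(nvt.length());
        for (int i = 0; i < nvt.length(); i++) {
            char c = nvt.charAt(i);
            if (c == '\r' && i + 1 < nvt.length()) {
                char next = nvt.charAt(i + 1);
                if (next == '\n') {
                    out.append('\n'); // CR LF: one logical newline
                    i++;
                    continue;
                }
                if (next == '\0') {
                    out.append('\r'); // CR NUL: carriage return only
                    i++;
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Note that plain readLine() applied to CR NUL data would treat the CR as a line end and leave the NUL as the first character of the next line, which is one reason a raw echo server is not a proper Telnet peer.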
Roedy Green - 17 Sep 2005 22:12 GMT
> The telnet RFC specifically says that Carriage Return '\r' MUST
> be followed by either NewLine '\n' or Null 0x00, depending on
> whether a line feed action is required in addition to the
> carriage return. A CR-NULL implies that the current line will be
> overwritten by the following line (or overtyped if printing).
That shows you how old the protocol must be. The null gives additional time for the mechanical tty head to return to the left hand side of the page.
 Signature Canadian Mind Products, Roedy Green. http://mindprod.com Again taking new Java programming contracts.
Mike Schilling - 19 Sep 2005 21:04 GMT
> I see you're point about using a predefined protocol communication but
> I was thinking at a simpler level e.g. a simple java server with a
[quoted text clipped - 6 lines]
> Why? Seems like it should, otherwise wouldn't it break the WORA
> principle if the code has to be platform specific?
I think it's been explained why it won't. The WORA principle is an ideal, not an absolute. Java creates an abstraction layer, and so long as you can stay within that layer, WORA works reasonably well. Reading bytes from a socket lives outside that layer, just as reading raw bytes from a disk would.
Roedy Green - 17 Sep 2005 02:35 GMT
> When you create a socket connection to another
> Computer running UNIX, MAC or Windows how does java know what the line
> separator character is.
readLine seems to be smart and works no matter what the line separator is. You can experiment with a file containing different line separators and it reads them all fine.
Pete Barrett - 17 Sep 2005 08:44 GMT
> I have a concept java question on how java is able to return from the
> blocking readLine() after reading in a line ending in \n (*nix) or \r\n
[quoted text clipped - 12 lines]
>
> Any Ideas are welcomed.
The documentation for BufferedReader says:
"Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed."
That seems fairly clear. Since the input is buffered, it can afford to look ahead to the next character if it reads a carriage return.
Pete Barrett
kahiga - 17 Sep 2005 11:04 GMT
> The documentation for BufferedReader says:
>
[quoted text clipped - 4 lines]
> That seems fairly clear. Since the input is buffered, it can afford to
> look ahead to the next character if it reads a carriage return.
This might be fine for files, and for the local cases it may also use the info from the <line.separator> to determine the EOL format. However, in the case of the network, the server cannot "look ahead" to see the next character. It has to wait for the client to send it. My original case was this: if the client has so far sent, for example, "Hello world\r" and the server is blocking in the readLine() method, how does it know to return the current string and not keep waiting to receive the next "\n"?
Chris Uppal - 17 Sep 2005 12:53 GMT
> if the client so far has sent, for example,
> "Hello world\r" and the server is blocking on the readLine() method.
> How does it know to return the current string and not keep waiting to
> receive the next "\n".
It doesn't. That's why protocols (including ones you create yourself) should specify exactly what gets written on the wire, and why platform-specific shortcuts like println() should not be used in their implementation.
-- chris
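To make the point about println() concrete: println() appends the platform's line.separator, so the bytes on the wire vary by platform, whereas writing the terminator explicitly pins them down. The sketch below assumes a made-up protocol that uses CR LF and ASCII text; WireFormat and encodeLine are hypothetical names, and a ByteArrayOutputStream stands in for the socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WireFormat {
    /** Line terminator fixed by the (hypothetical) protocol, independent of platform. */
    static final String CRLF = "\r\n";

    /** Encodes one protocol line exactly as it would be written to the socket. */
    static byte[] encodeLine(String text) {
        // ByteArrayOutputStream stands in for socket.getOutputStream().
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Explicit charset as well: the default charset is also platform dependent.
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.US_ASCII)) {
            w.write(text);
            w.write(CRLF); // never println(): that would use line.separator
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For example, encodeLine("HI") yields the four bytes 72, 73, 13, 10 on every platform.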
Steve Horsley - 17 Sep 2005 13:09 GMT
> This might be fine for files and for the local cases it may also use
> the info from the <line.separator> to determine the EOL format.
[quoted text clipped - 4 lines]
> How does it know to return the current string and not keep waiting to
> receive the next "\n".
It can afford to return as soon as it sees the '\r'. It just has to make a note that next time it is called, if the first character out is a '\n' then this should be dropped.
Steve
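This behaviour can be checked with a stream that, like a socket, has more data still to come. The sketch below uses a PipedWriter and PipedReader as a stand-in for the client and server ends of a connection; BareCrDemo and run are hypothetical names. Both ends run on one thread here, which only works because each write happens before the matching read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

class BareCrDemo {
    /** Returns the two lines read by the "server" side of the pipe. */
    static String[] run() {
        try (PipedWriter client = new PipedWriter();
             BufferedReader server = new BufferedReader(new PipedReader(client))) {
            // The client has sent only a bare CR so far; the stream is still open.
            client.write("Hello world\r");
            client.flush();
            // readLine() returns here without waiting to see whether '\n' follows.
            String first = server.readLine();

            // The '\n' arrives later, together with the next line.
            client.write("\nsecond\n");
            client.flush();
            // The leftover '\n' is skipped rather than reported as an empty line.
            String second = server.readLine();

            return new String[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] lines = run();
        System.out.println(lines[0]); // prints Hello world
        System.out.println(lines[1]); // prints second
    }
}
```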
Raymond DeCampo - 17 Sep 2005 23:23 GMT
>> This might be fine for files and for the local cases it may also use
>> the info from the <line.separator> to determine the EOL format.
[quoted text clipped - 8 lines]
> a note that next time it is called, if the first character out is a '\n'
> then this should be dropped.
Exactly. There is not actually a need to "look ahead" at all.
Ray
 Signature XML is the programmer's duct tape.
Scott Ellsworth - 19 Sep 2005 20:54 GMT
> > This might be fine for files and for the local cases it may also use
> > the info from the <line.separator> to determine the EOL format.
[quoted text clipped - 8 lines]
> to make a note that next time it is called, if the first
> character out is a '\n' then this should be dropped.
This is certainly one way it _could_ be implemented, but earlier versions of Java were not implemented this way. It was quite common to see server software written that did a readLine() on a socket that failed when run on a Mac, but that worked great on Windows.
Scott
 Signature Scott Ellsworth scott@alodar.nospam.com Java and database consulting for the life sciences
Mike Schilling - 22 Sep 2005 19:32 GMT
>> > This might be fine for files and for the local cases it may also use
>> > the info from the <line.separator> to determine the EOL format.
[quoted text clipped - 13 lines]
> see server software written that did a readLine() on a socket that
> failed when run on a Mac, but that worked great on Windows.
I'm sorry to hear it was common to see server software written that did a readLine().
One of the drawbacks of Java is that it provides a surface simplicity that can disguise complex issues. It can fool people into thinking that building a multi-threaded server is as simple as scattering some 'synchronized's around, or that persistence can be addressed merely by declaring that some classes implement Serializable.
Pete Barrett - 18 Sep 2005 14:02 GMT
> This might be fine for files and for the local cases it may also use
> the info from the <line.separator> to determine the EOL format.
[quoted text clipped - 4 lines]
> How does it know to return the current string and not keep waiting to
> receive the next "\n".
I don't think there's anything in the documentation to say that readLine MUST return as soon as the \r character is received. It *could* wait until it can be sure, either because the next character has actually been received or because the socket has closed, whether there's a \n to follow the \r. But that would be an implementation detail, and others have suggested a better way of dealing with it.
As far as I can see, a worse problem arises in BufferedReaders if the buffer is full and doesn't contain either \r or \n - what on earth does readLine do then? I don't see anything in the documentation to define what it does. If it doesn't expand the buffer, it can only return the contents of the buffer as a String, which would hardly be right.
Pete Barrett
Raymond DeCampo - 18 Sep 2005 14:17 GMT
>> This might be fine for files and for the local cases it may also use
>> the info from the <line.separator> to determine the EOL format.
[quoted text clipped - 18 lines]
> return the contents of the buffer as a String, which would hardly be
> right.
There's no need to expand the buffer, as in the buffer holding characters yet to be read. BufferedReader can simply treat itself as a client; once the readLine() method reads a character from the buffer that space is available to receive characters from the underlying stream. This means that there is a second buffer, in the form of a StringBuffer or StringBuilder, which is local to readLine() and is creating the String to be returned.
Note: This is all speculation, I haven't looked at the implementation of readLine().
Ray
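The question of a line longer than the internal buffer is easy to test. The sketch below constructs a BufferedReader with a deliberately tiny buffer (8 characters, an arbitrary choice) and reads a 100-character line; SmallBufferDemo and its methods are hypothetical names used only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SmallBufferDemo {
    /** Reads the first line of text through a BufferedReader with the given buffer size. */
    static String firstLine(String text, int bufferSize) {
        try (BufferedReader r = new BufferedReader(new StringReader(text), bufferSize)) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a string of n copies of 'x'. */
    static String xs(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String longLine = xs(100);
        // The line is far longer than the 8-character internal buffer.
        String got = firstLine(longLine + "\r\n" + "tail", 8);
        System.out.println(got.length());         // prints 100
        System.out.println(got.equals(longLine)); // prints true
    }
}
```

The whole line comes back intact, which is consistent with the idea of a separate accumulator alongside the fixed-size character buffer.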
Roedy Green - 18 Sep 2005 18:26 GMT
> Note: This is all speculation, I haven't looked at the implementation of
> readLine().
Here is the main method in BufferedReader.readLine. It does not return until it has hit EOL.
    /**
     * Read a line of text. A line is considered to be terminated by any one
     * of a line feed ('\n'), a carriage return ('\r'), or a carriage return
     * followed immediately by a linefeed.
     *
     * @param ignoreLF If true, the next '\n' will be skipped
     *
     * @return A String containing the contents of the line, not including
     *         any line-termination characters, or null if the end of the
     *         stream has been reached
     *
     * @see java.io.LineNumberReader#readLine()
     *
     * @exception IOException If an I/O error occurs
     */
    String readLine(boolean ignoreLF) throws IOException {
        StringBuffer s = null;
        int startChar;
        boolean omitLF = ignoreLF || skipLF;

        synchronized (lock) {
            ensureOpen();

        bufferLoop:
            for (;;) {

                if (nextChar >= nChars)
                    fill();
                if (nextChar >= nChars) { /* EOF */
                    if (s != null && s.length() > 0)
                        return s.toString();
                    else
                        return null;
                }
                boolean eol = false;
                char c = 0;
                int i;

                /* Skip a leftover '\n', if necessary */
                if (omitLF && (cb[nextChar] == '\n'))
                    nextChar++;
                skipLF = false;
                omitLF = false;

            charLoop:
                for (i = nextChar; i < nChars; i++) {
                    c = cb[i];
                    if ((c == '\n') || (c == '\r')) {
                        eol = true;
                        break charLoop;
                    }
                }

                startChar = nextChar;
                nextChar = i;

                if (eol) {
                    String str;
                    if (s == null) {
                        str = new String(cb, startChar, i - startChar);
                    } else {
                        s.append(cb, startChar, i - startChar);
                        str = s.toString();
                    }
                    nextChar++;
                    if (c == '\r') {
                        skipLF = true;
                    }
                    return str;
                }

                if (s == null)
                    s = new StringBuffer(defaultExpectedLineLength);
                s.append(cb, startChar, i - startChar);
            }
        }
    }

    /**
     * Fill the input buffer, taking the mark into account if it is valid.
     */
    private void fill() throws IOException {
        int dst;
        if (markedChar <= UNMARKED) {
            /* No mark */
            dst = 0;
        } else {
            /* Marked */
            int delta = nextChar - markedChar;
            if (delta >= readAheadLimit) {
                /* Gone past read-ahead limit: Invalidate mark */
                markedChar = INVALIDATED;
                readAheadLimit = 0;
                dst = 0;
            } else {
                if (readAheadLimit <= cb.length) {
                    /* Shuffle in the current buffer */
                    System.arraycopy(cb, markedChar, cb, 0, delta);
                    markedChar = 0;
                    dst = delta;
                } else {
                    /* Reallocate buffer to accommodate read-ahead limit */
                    char ncb[] = new char[readAheadLimit];
                    System.arraycopy(cb, markedChar, ncb, 0, delta);
                    cb = ncb;
                    markedChar = 0;
                    dst = delta;
                }
                nextChar = nChars = delta;
            }
        }

        int n;
        do {
            n = in.read(cb, dst, cb.length - dst);
        } while (n == 0);
        if (n > 0) {
            nChars = dst + n;
            nextChar = dst;
        }
    }