I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
Java (4th edition):
"Write a program that reads a Java source-code file (you provide the
file name on the command line) and displays all the comments."
This is at the end of a section about regular expressions. We have
just learnt how to use appendReplacement().
I am having great difficulty dealing with comment-markers (// or /*)
inside string literals, comments that contain quotation marks, and
backslashes before quotation marks. For example:
System.out.println("// This is not a comment.");
System.out.println("Nor is any \" // of this.");
// All of this line is a comment, including "this".
I found a proposed answer at http://greggordon.org/java/tij4/solutions.htm.
But it is completely useless. It does not even cope with comments
using /* and */ that go over two lines.
Does anyone have a solution to this exercise, or any hints?
(I would also like solutions to the following two exercises: write a
program that reads a Java source-code file and displays all the string
literals; and write a program that examines Java source code and
produces all the class names used in a particular program.)
Joshua Cranmer - 23 Feb 2008 17:57 GMT
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
> Java (4th edition):
> [...]
>
> Does anyone have a solution to this exercise? or any hints?
The way I would do it would be to create a Reader on the file, read each
character and perform simple lexical analysis there, like so:
boolean inEOLComment = false, inCComment = false, inString = false, inChar = false;
for each character in stream:
    if inEOLComment:
        print character
        if character is newline, inEOLComment = false
    else if inCComment:
        if character is * and next is /, consume the /, inCComment = false
        else print character
    else if inString:
        if character is \, skip next character
        else if character is ", inString = false
    else if inChar:
        if character is \, skip next character
        else if character is ', inChar = false
    else if character is /:
        if next character is /, consume it, inEOLComment = true
        else if next character is *, consume it, inCComment = true
        else leave the next character unread
    else if character is ", inString = true
    else if character is ', inChar = true
    else, do nothing
(writing the actual Java code is left as an exercise to the reader)
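For anyone who wants to check their answer, here is one possible Java rendering of that state machine. It is only a sketch: the names CommentExtractor and extractComments are my own, it assumes the file is syntactically valid Java, and it does not translate Unicode escapes. Besides the states above it tracks character literals, so that a literal such as '"' does not open a string. A PushbackReader supplies the single character of look-ahead that the "next is /" tests need.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class CommentExtractor {
    // Returns the text of every comment (without the comment markers),
    // each comment followed by a newline.
    static String extractComments(Reader source) throws IOException {
        PushbackReader in = new PushbackReader(source, 1); // one char of look-ahead
        StringBuilder out = new StringBuilder();
        boolean inEOLComment = false, inCComment = false;
        boolean inString = false, inChar = false;
        int c;
        while ((c = in.read()) != -1) {
            if (inEOLComment) {
                out.append((char) c);
                if (c == '\n') inEOLComment = false;
            } else if (inCComment) {
                if (c == '*' && peek(in) == '/') {
                    in.read();                 // consume the closing '/'
                    inCComment = false;
                    out.append('\n');
                } else {
                    out.append((char) c);
                }
            } else if (inString || inChar) {
                char close = inString ? '"' : '\'';
                if (c == '\\') {
                    in.read();                 // skip the escaped character
                } else if (c == close) {
                    inString = false;
                    inChar = false;
                }
            } else if (c == '/') {
                int next = peek(in);
                if (next == '/') { in.read(); inEOLComment = true; }
                else if (next == '*') { in.read(); inCComment = true; }
            } else if (c == '"') {
                inString = true;
            } else if (c == '\'') {
                inChar = true;
            }
        }
        return out.toString();
    }

    // Looks at the next character without consuming it (-1 at end of input).
    private static int peek(PushbackReader in) throws IOException {
        int c = in.read();
        if (c != -1) in.unread(c);
        return c;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.print(extractComments(r));
        }
    }
}
```

Run it as "java CommentExtractor SomeFile.java". On the three example lines from the original post it prints only the text of the third line's comment, quotation marks included.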
> (I would also like solutions to the following two exercises: write a
> program that reads a Java source-code file and displays all the string
> literals; and write a program that examines Java source code and
> produces all the class names used in a particular program.)
The first should be a trivial modification of the previous code; the
second requires a more complex modification. Look up lexical analysis
and parsing for more details.
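As a very rough stand-in for the class-name exercise (this is not real parsing), one could blank out comments and literals with a regular expression and then collect capitalised identifiers, relying on the Java naming convention. ClassNameFinder and classNames are hypothetical names. The heuristic is only as good as the convention: all-caps names are skipped on the assumption that they are constants, and any capitalised identifier that is not a class, or any class that is not capitalised, will be misreported.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassNameFinder {
    // Comments, string literals and char literals: removed before searching.
    private static final Pattern NOISE = Pattern.compile(
            "//[^\\r\\n]*|/\\*(?s:.*?)\\*/"       // comments
            + "|\"(?:\\\\.|[^\"\\\\\\r\\n])*\""   // string literals
            + "|'(?:\\\\.|[^'\\\\\\r\\n])*'");    // char literals

    // Heuristic: by convention, class names are capitalised identifiers.
    private static final Pattern CAPITALISED = Pattern.compile("\\b[A-Z][A-Za-z0-9_]*\\b");

    static Set<String> classNames(String source) {
        String code = NOISE.matcher(source).replaceAll(" ");
        Set<String> names = new TreeSet<>();
        Matcher m = CAPITALISED.matcher(code);
        while (m.find()) {
            String name = m.group();
            // Skip ALL_CAPS names, assuming they are constants.
            if (!name.equals(name.toUpperCase(Locale.ROOT))) names.add(name);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String src = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String name : classNames(src)) System.out.println(name);
    }
}
```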
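Note: The code I provided does not handle the preprocessing of Unicode
escapes (sequences such as \u0041, which the compiler translates before
lexing). If you need to handle that, the easiest way would be to wrap the
input stream in a filter that translates them first.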
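A minimal sketch of that preprocessing step, assuming the whole file fits in memory: rather than a true streaming filter, wrap reads the input, translates the escapes in one pass and returns a StringReader that a lexer can consume. It follows the JLS rule that a backslash can start an escape only when it is preceded by an even number of contiguous backslashes, and it allows more than one u. UnicodeEscapes, translate and wrap are illustrative names; malformed escapes (a compile error in real Java) are simply left untouched.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class UnicodeEscapes {
    // Translates Java Unicode escapes (a backslash, one or more 'u's, then four
    // hex digits) into the characters they stand for. Malformed escapes are
    // left untouched.
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashRun = 0; // raw backslashes immediately before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // Only a backslash preceded by an even number of raw backslashes
            // is eligible to start an escape.
            if (c == '\\' && backslashRun % 2 == 0) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') j++;
                if (j > i + 1 && j + 4 <= src.length()
                        && src.substring(j, j + 4).matches("[0-9a-fA-F]{4}")) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    backslashRun = 0; // a translated character is never a raw backslash
                    i = j + 4;
                    continue;
                }
            }
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Reads the whole input, translates it, and returns a Reader over the result,
    // so it can sit in front of a character-by-character comment extractor.
    static Reader wrap(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        for (int n; (n = in.read(buf)) != -1; ) sb.append(buf, 0, n);
        return new StringReader(translate(sb.toString()));
    }
}
```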

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
Lew - 23 Feb 2008 18:08 GMT
anon36@yahoo.com wrote:
>> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
>> Java (4th edition):
>> [...]
>>
>> Does anyone have a solution to this exercise? or any hints?
> The way I would do it would be to create a Reader on the file, read each
> character and perform simple lexical analysis there, like so:
> [...]
>
> (writing the actual Java code is left as an exercise to the reader)
Another approach is to borrow from the LEX / YACC approach, and have a "lexer"
extract tokens from the input, along with a token-type enum identifying it as
"String literal", "identifier/keyword", "punctuation", etc. The output of the
lexer becomes the input to the parser, which examines each token and its
type, and operates a state machine with, say, states NOT_IN_COMMENT,
IN_SINGLE_LINE_COMMENT and IN_MULTI_LINE_COMMENT.
You run the parser in a loop, with the interpretation of each token depending
on the current state. So, for example, if you enter the IN_SINGLE_LINE_COMMENT
state, you treat each token as comment text (for this exercise, you print it)
up until the LINE_END token. If you enter the IN_MULTI_LINE_COMMENT state, you
do the same until you reach the END_COMMENT token ("*/"). Strings inside the
comment do not trigger a false END_COMMENT because you've already lexed such
strings into tokens. The parser will not see a string-embedded "*/"; it'll
only see a STRING_LITERAL token.
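A minimal sketch of the lexer half of this idea, using java.util.regex since that is what the book's chapter covers. One deliberate difference: in Java a quote inside a comment does not start a string (think of "// don't"), so here the lexer recognises a whole comment as a single COMMENT token instead of leaving that to the parser. The enum constants, class and method names are illustrative, the "parser" is reduced to filtering COMMENT tokens, and well-formed source without Java 15 text blocks is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaCommentLexer {
    enum TokenType { COMMENT, STRING_LITERAL, CHAR_LITERAL, OTHER }

    static final class Token {
        final TokenType type;
        final String text;
        Token(TokenType type, String text) { this.type = type; this.text = text; }
    }

    // One named group per token type. OTHER takes runs of ordinary text, or a
    // single slash or quote that did not start a comment or literal.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<comment>//[^\\r\\n]*|/\\*(?s:.*?)\\*/)"
            + "|(?<string>\"(?:\\\\.|[^\"\\\\\\r\\n])*\")"
            + "|(?<chr>'(?:\\\\.|[^'\\\\\\r\\n])*')"
            + "|(?<other>[^/\"']+|[/\"'])");

    // The lexer: splits the source into consecutive typed tokens.
    static List<Token> lex(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            TokenType type = m.group("comment") != null ? TokenType.COMMENT
                    : m.group("string") != null ? TokenType.STRING_LITERAL
                    : m.group("chr") != null ? TokenType.CHAR_LITERAL
                    : TokenType.OTHER;
            tokens.add(new Token(type, m.group()));
        }
        return tokens;
    }

    // The "parser", reduced to what this exercise needs: keep COMMENT tokens.
    static List<String> comments(String source) {
        List<String> result = new ArrayList<>();
        for (Token t : lex(source)) {
            if (t.type == TokenType.COMMENT) result.add(t.text);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String src = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String c : comments(src)) System.out.println(c);
    }
}
```

Because Matcher.find() reports the leftmost match, a "//" inside a string is consumed by the STRING_LITERAL alternative that starts at the opening quote, and since OTHER can always match one character, every character of the file lands in exactly one token.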

--
Lew
Patrick May - 23 Feb 2008 18:00 GMT
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking
> In Java (4th edition):
> [...]
> inside string literals, comments that contain quotation marks, and
> backslashes before quotation marks.
One option is to use a state machine that knows when the current
file position is within a comment or string and when the next
character has been escaped with a backslash.
There are many web pages and Usenet posts on this technique.
Regards,
Patrick
------------------------------------------------------------------------
S P Engineering, Inc. | Large scale, mission-critical, distributed OO
| systems design and implementation.
pjm@spe.com | (C++, Java, Common Lisp, Jini, middleware, SOA)
Stanimir Stamenkov - 23 Feb 2008 22:23 GMT
Sat, 23 Feb 2008 08:50:14 -0800 (PST), /anon36@yahoo.com/:
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
> Java (4th edition):
> [...]
> just learnt how to use appendReplacement().
> [...]
You may take a look at the java.io.StreamTokenizer [1] class which
will break the source text into tokens (comments, string literals,
etc.) properly. Then you could build your parsing logic on top of it.
[1] http://java.sun.com/j2se/1.5.0/docs/api/java/io/StreamTokenizer.html

--
Stanimir
Stanimir Stamenkov - 23 Feb 2008 23:12 GMT
Sun, 24 Feb 2008 00:23:25 +0200, /Stanimir Stamenkov/:
> Sat, 23 Feb 2008 08:50:14 -0800 (PST), /anon36@yahoo.com/:
>
> [...]
>
> [1] http://java.sun.com/j2se/1.5.0/docs/api/java/io/StreamTokenizer.html
I just tried it: StreamTokenizer won't help you with the comments, as
it simply discards them when it recognizes them.
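StreamTokenizer is still handy for the companion exercise of listing string literals, precisely because it skips comments. A minimal sketch (StringLiteralLister is a made-up name): the default tokenizer treats a lone / as a comment character, so ordinaryChar('/') is called before enabling the // and /* */ handling; otherwise a division operator would swallow the rest of its line. Note that StreamTokenizer decodes escapes, so sval holds each literal's value rather than its exact source text.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class StringLiteralLister {
    // Returns the value of every "..." literal; comments are skipped by the tokenizer.
    static List<String> stringLiterals(Reader source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(source);
        st.ordinaryChar('/');        // by default '/' starts a line comment
        st.slashSlashComments(true); // recognise and skip // comments
        st.slashStarComments(true);  // recognise and skip /* */ comments
        List<String> literals = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            // Both quote kinds are quote characters by default; keep only "..." tokens.
            if (st.ttype == '"') literals.add(st.sval);
        }
        return literals;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            for (String s : stringLiterals(r)) System.out.println(s);
        }
    }
}
```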

--
Stanimir
Roedy Green - 24 Feb 2008 00:22 GMT
>"Write a program that reads a Java source-code file (you provide the
>file name on the command line) and displays all the comments."
>This is at the end of a section about regular expressions. We have
>just learnt how to use appendReplacement().
see http://mindprod.com/jgloss/finitestate.html
http://mindprod.com/jgloss/parser.html
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
In the Middle of the Pack - 27 Feb 2008 01:22 GMT
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
> Java (4th edition):
> [...]
> literals; and write a program that examines Java source code and
> produces all the class names used in a particular program.)
We can't give you an outright solution, since this is likely to be a homework
problem. But we can give hints.
You might have to assume the Java in the source file is syntactically
correct. If you don't get a clean compile, you could conceivably have a
source code file in which you (a human) can't tell where a string literal
ends and a comment begins. But, I digress ...
It might help to read the file character by character. You could have
some support methods, such as "readString", that do nothing but read
characters until they reach the end of a string. You could have one or
more other methods that do nothing but read and print characters until the
end of a comment is reached; maybe "readCommentA" and "readCommentB", to
account for the different types of comments.
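One way those helper methods might fit together, as a sketch only: readString, readCommentA and readCommentB are the names suggested above, while the class name HelperMethodCommentReader and the main loop around them are my own. A PushbackReader provides one character of look-ahead, and the same readString serves both string and char literals, since only the closing quote differs. Valid Java source is assumed.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class HelperMethodCommentReader {
    private final PushbackReader in;
    private final StringBuilder out = new StringBuilder();

    HelperMethodCommentReader(Reader source) { in = new PushbackReader(source, 1); }

    // Main loop: hand off to a helper whenever a literal or comment begins.
    String run() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"' || c == '\'') {
                readString((char) c);
            } else if (c == '/') {
                int next = in.read();
                if (next == '/') readCommentA();
                else if (next == '*') readCommentB();
                else if (next != -1) in.unread(next); // not a comment: put it back
            }
        }
        return out.toString();
    }

    // Skips to the closing quote, stepping over backslash escapes.
    private void readString(char quote) throws IOException {
        int c;
        while ((c = in.read()) != -1 && c != quote) {
            if (c == '\\') in.read();
        }
    }

    // Collects the text of a // comment, up to the end of the line.
    private void readCommentA() throws IOException {
        int c;
        while ((c = in.read()) != -1 && c != '\n') out.append((char) c);
        out.append('\n');
    }

    // Collects the text of a block comment, up to the closing star-slash.
    private void readCommentB() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '*') {
                int next = in.read();
                if (next == '/') break;
                if (next != -1) in.unread(next); // a lone '*' is part of the text
            }
            out.append((char) c);
        }
        out.append('\n');
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.print(new HelperMethodCommentReader(r).run());
        }
    }
}
```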
anon36@yahoo.com - 01 Mar 2008 17:15 GMT
Thank you all. So it seems that the simplest way is just to parse it
one character at a time. I did it like Joshua suggested, and it works
fine.
For what it is worth, I have a feeling that Eckel didn't intend this.
For one thing, the preceding text in the book is about replace
operations and appendReplacement(StringBuffer sbuf, String
replacement). I am fairly sure the exercises are intended to use
appendReplacement.
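If the intent really was appendReplacement(), a single regular expression can do the job, provided it matches string and char literals as well as comments: Matcher.find() reports the leftmost match, so a "//" inside a literal is consumed by the literal that starts at the opening quote. The sketch below (CommentStripper and stripComments are invented names, and well-formed source is assumed) collects each comment and uses appendReplacement()/appendTail() with a StringBuffer to rebuild the source without them. Each comment is replaced by a space, because the compiler treats a comment as whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentStripper {
    // Group 1 is a comment; the other two alternatives are string and char literals,
    // matched only so that comment markers inside them are not mistaken for comments.
    private static final Pattern P = Pattern.compile(
            "(//[^\\r\\n]*|/\\*(?s:.*?)\\*/)"
            + "|\"(?:\\\\.|[^\"\\\\\\r\\n])*\""
            + "|'(?:\\\\.|[^'\\\\\\r\\n])*'");

    // Adds each comment to 'comments' and returns the source with comments removed.
    static String stripComments(String source, List<String> comments) {
        Matcher m = P.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) {
                comments.add(m.group(1));
                m.appendReplacement(sb, " "); // a comment counts as whitespace
            } else {
                // Put the literal back unchanged; quoteReplacement protects backslashes and dollars.
                m.appendReplacement(sb, Matcher.quoteReplacement(m.group()));
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String src = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> comments = new ArrayList<>();
        stripComments(src, comments);
        for (String c : comments) System.out.println(c);
    }
}
```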
What's more, as the last person said, I think you have to assume the
Java source file is syntactically correct; otherwise it might actually
be impossible to do. I think the author probably intended
readers to assume that there are no //s inside quotation marks, no
quotation marks inside comments, etc., which would make it all a
lot easier. But he didn't say that.
Thanks anyway!