


Write a program that reads a Java source-code file and displays all the comments.

anon36@yahoo.com - 23 Feb 2008 16:50 GMT
I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
Java (4th edition):
"Write a program that reads a Java source-code file (you provide the
file name on the command line) and displays all the comments."
This is at the end of a section about regular expressions. We have
just learnt how to use appendReplacement().

I am having great difficulty dealing with comment-markers (// or /*)
inside string literals, comments that contain quotation marks, and
backslashes before quotation marks. For example:
System.out.println("// This is not a comment.");
System.out.println("Nor is any \" // of this.");
// All of this line is a comment, including "this".

I found a proposed answer at http://greggordon.org/java/tij4/solutions.htm.
But it is completely useless. It does not even cope with comments
using /* and */ that go over two lines.

Does anyone have a solution to this exercise? or any hints?

(I would also like solutions to the following two exercises: write a
program that reads a Java source-code file and displays all the string
literals; and write a program that examines Java source code and
produces all the class names used in a particular program.)
Joshua Cranmer - 23 Feb 2008 17:57 GMT
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
> Java (4th edition):
[quoted text clipped - 4 lines]
>
> Does anyone have a solution to this exercise? or any hints?

The way I would do it would be to create a Reader on the file, read each
character and perform simple lexical analysis there, like so:

boolean inEOLComment = false, inCComment = false, inString = false;
for each character in stream:
  if inEOLComment:
     print character
     if character is newline, inEOLComment = false
  else if inCComment:
     if character is * and next is /, skip next character, inCComment = false
     else print character
  else if inString:
     if character is \, skip next character
     else if character is ", inString = false
  else if character is /:
     if next character is /, skip it, inEOLComment = true
     else if next character is *, skip it, inCComment = true
  else if character is ", inString = true
  else, do nothing

(writing the actual Java code is left as an exercise to the reader)
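
For anyone who wants to compare notes, here is a minimal sketch of how the state machine above might look in Java. The class name CommentPrinter and the method extractComments are made up for illustration. It also adds a CHAR_LITERAL state, which the pseudocode does not have, so that a quote character written as a char literal does not open a string. Unicode escapes are not handled.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name; a sketch of the state machine, not a full Java lexer.
class CommentPrinter {
    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR_LITERAL }

    /** Returns the text of every comment in the input, without the delimiters. */
    static String extractComments(Reader in) throws IOException {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int c = in.read();
        while (c != -1) {
            int next = in.read();            // one character of lookahead
            boolean consumedNext = false;    // set when 'next' belongs to 'c'
            switch (state) {
                case LINE_COMMENT:
                    out.append((char) c);
                    if (c == '\n') state = State.CODE;
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        out.append('\n');    // separate comments in the output
                        state = State.CODE;
                        consumedNext = true;
                    } else {
                        out.append((char) c);
                    }
                    break;
                case STRING:
                case CHAR_LITERAL:
                    if (c == '\\') {
                        consumedNext = true; // skip the escaped character
                    } else if (c == (state == State.STRING ? '"' : '\'')) {
                        state = State.CODE;
                    }
                    break;
                default: // CODE
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        consumedNext = true;
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        consumedNext = true;
                    } else if (c == '"') {
                        state = State.STRING;
                    } else if (c == '\'') {
                        state = State.CHAR_LITERAL;
                    }
            }
            c = consumedNext ? in.read() : next;
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // The file name comes from the command line, as the exercise asks.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.print(extractComments(in));
        }
    }
}
```

Run it as "java CommentPrinter SomeFile.java". Consuming the second character of each delimiter matters: without it, an input such as /*/ would be taken for a complete comment.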

> (I would also like solutions to the following two exercises: write a
> program that reads a Java source-code file and displays all the string
> literals; and write a program that examines Java source code and
> produces all the class names used in a particular program.)

The first should be a trivial modification of the previous code, and the
latter requires some more complex modification. Look up lexical analysis
and parsing for more details.

Note: The code I provided does not handle Unicode escapes (\uXXXX), which
Java translates before any other lexical processing. If you need to handle
that, the easiest way would be to wrap the input Reader in one that decodes
the escapes.
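
A rough sketch of such a wrapper, assuming a Reader rather than a raw input stream. UnicodeEscapeReader is a made-up name, not a JDK class. It decodes an escape only when the backslash is not itself escaped by a preceding backslash, and it accepts repeated u characters, as the language allows. It does not override skip() or mark(), so treat it as a starting point only.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical wrapper (not a JDK class): turns backslash-u escapes into characters.
class UnicodeEscapeReader extends FilterReader {
    private final PushbackReader source;
    // True right after a plain backslash, so a second backslash cannot start an escape.
    private boolean afterBackslash = false;

    UnicodeEscapeReader(Reader reader) {
        super(new PushbackReader(reader, 1));
        source = (PushbackReader) in;
    }

    @Override
    public int read() throws IOException {
        int c = source.read();
        if (c != '\\' || afterBackslash) {
            afterBackslash = false;
            return c;
        }
        int next = source.read();
        if (next != 'u') {
            if (next != -1) source.unread(next);   // not an escape, put it back
            afterBackslash = true;
            return c;
        }
        while (next == 'u') next = source.read();  // any number of u's is allowed
        int value = 0;
        for (int i = 0; i < 4; i++) {              // exactly four hex digits
            int digit = next < 0 ? -1 : Character.digit((char) next, 16);
            if (digit < 0) throw new IOException("Malformed Unicode escape");
            value = value * 16 + digit;
            if (i < 3) next = source.read();
        }
        return value;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        // Route bulk reads through read() so they are decoded as well.
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) break;
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }
}
```

Wrapping is then a one-liner, for example new UnicodeEscapeReader(new FileReader(fileName)), and the character loop reads from the wrapper instead of the file.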

Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Lew - 23 Feb 2008 18:08 GMT
anon36@yahoo.com wrote:
>> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
>> Java (4th edition):
[quoted text clipped - 4 lines]
>>
>> Does anyone have a solution to this exercise? or any hints?

> The way I would do it would be to create a Reader on the file, read each
> character and perform simple lexical analysis there, like so:
[quoted text clipped - 17 lines]
>
> (writing the actual Java code is left as an exercise to the reader)

Another approach is to borrow from the LEX / YACC approach, and have a "lexer"
extract tokens from the input, each tagged with a token-type enum identifying it
as "String literal", "identifier/keyword", "punctuation", etc.  The output of the
lexer becomes the input to the parser, which examines each token and its type,
and operates a state machine with, say, states of IN_SINGLE_LINE_COMMENT,
IN_MULTI_LINE_COMMENT and NOT_IN_COMMENT.

You run the parser in a loop, with the interpretation of each token depending
on the current state.  So, for example, if you hit IN_SINGLE_LINE_COMMENT
state, you ignore each token up until the LINE_END token.  If you reach
IN_MULTI_LINE_COMMENT state, you ignore each token until you reach the
END_COMMENT token ("*/").  Strings inside the comment do not trigger a false
END_COMMENT because you've already lexed such strings into tokens.  The parser
will not see a string-embedded "*/"; it'll only see a STRING_LITERAL token.
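
A simplified sketch of the tokenizing idea using java.util.regex, which is also the chapter the exercise comes from. It is not the two-stage design described above: the lexer here recognizes whole comments as tokens itself, so no separate parser state is needed. The names CommentLexer, TokenType and Token are invented for the example. It assumes syntactically correct source without Unicode escapes or text blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; a sketch, not a complete Java lexer.
class CommentLexer {
    enum TokenType { STRING_LITERAL, CHAR_LITERAL, LINE_COMMENT, BLOCK_COMMENT }

    static final class Token {
        final TokenType type;
        final String text;
        Token(TokenType type, String text) { this.type = type; this.text = text; }
    }

    // One named group per token type. Literals are matched as whole tokens,
    // so comment markers inside them are never examined on their own.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<str>\"(?:\\\\.|[^\"\\\\\\n])*\")"
        + "|(?<chr>'(?:\\\\.|[^'\\\\\\n])*')"
        + "|(?<line>//[^\\n]*)"
        + "|(?<block>/\\*.*?\\*/)",
        Pattern.DOTALL);

    /** Returns the literals and comments found in the source, in order. */
    static List<Token> lex(CharSequence source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            TokenType type = m.group("str") != null ? TokenType.STRING_LITERAL
                           : m.group("chr") != null ? TokenType.CHAR_LITERAL
                           : m.group("line") != null ? TokenType.LINE_COMMENT
                           : TokenType.BLOCK_COMMENT;
            tokens.add(new Token(type, m.group()));
        }
        return tokens;
    }

    /** Keeps only the comment tokens, delimiters included. */
    static List<String> comments(CharSequence source) {
        List<String> result = new ArrayList<>();
        for (Token t : lex(source)) {
            if (t.type == TokenType.LINE_COMMENT || t.type == TokenType.BLOCK_COMMENT) {
                result.add(t.text);
            }
        }
        return result;
    }
}
```

Filtering on STRING_LITERAL instead of the two comment types gives a starting point for the string-literal exercise.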

Lew

Patrick May - 23 Feb 2008 18:00 GMT
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking
> In Java (4th edition):
[quoted text clipped - 6 lines]
> inside string literals, comments that contain quotation marks, and
> backslashes before quotation marks.

    One option is to use a state machine that knows when the current
file position is within a comment or string and when the next
character has been escaped with a backslash.

    There are many web pages and Usenet posts on this technique.

Regards,

Patrick

------------------------------------------------------------------------
S P Engineering, Inc.  | Large scale, mission-critical, distributed OO
                      | systems design and implementation.
         pjm@spe.com  | (C++, Java, Common Lisp, Jini, middleware, SOA)
Stanimir Stamenkov - 23 Feb 2008 22:23 GMT
Sat, 23 Feb 2008 08:50:14 -0800 (PST), /anon36@yahoo.com/:

> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
> Java (4th edition):
[quoted text clipped - 3 lines]
> just learnt how to use appendReplacement().
> [...]

You may take a look at the java.io.StreamTokenizer [1] class which
will break the source text into tokens (comments, string literals,
etc.) properly.  Then you could build your parsing logic on top of it.

[1] http://java.sun.com/j2se/1.5.0/docs/api/java/io/StreamTokenizer.html

Stanimir

Stanimir Stamenkov - 23 Feb 2008 23:12 GMT
Sun, 24 Feb 2008 00:23:25 +0200, /Stanimir Stamenkov/:
> Sat, 23 Feb 2008 08:50:14 -0800 (PST), /anon36@yahoo.com/:
>
[quoted text clipped - 11 lines]
>
> [1] http://java.sun.com/j2se/1.5.0/docs/api/java/io/StreamTokenizer.html

I just tried it: StreamTokenizer won't help you with the comments, as
it simply discards them when it recognizes them.
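
A small sketch that shows the behaviour, in case anyone wants to reproduce it. TokenizerDemo and its tokens method are made-up names; the StreamTokenizer calls are the standard java.io ones.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists what StreamTokenizer reports for some source text.
class TokenizerDemo {
    static List<String> tokens(String source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');          // by default a lone '/' starts a comment
        st.slashSlashComments(true);   // recognize // comments
        st.slashStarComments(true);    // recognize /* */ comments
        List<String> result = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    result.add("WORD " + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    result.add("NUMBER " + st.nval);
                    break;
                case '"':
                    result.add("STRING " + st.sval);  // body of a quoted string
                    break;
                default:
                    result.add("CHAR " + (char) st.ttype);
            }
        }
        return result;
    }
}
```

The comment text never shows up in the returned list: the tokenizer skips it internally and there is no token type for comments. String literals do come through, as a token whose ttype is the quote character.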

Stanimir

Roedy Green - 24 Feb 2008 00:22 GMT
>"Write a program that reads a Java source-code file (you provide the
>file name on the command line) and displays all the comments."
>This is at the end of a section about regular expressions. We have
>just learnt how to use appendReplacement().

see http://mindprod.com/jgloss/finitestate.html
http://mindprod.com/jgloss/parser.html
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
In the Middle of the Pack - 27 Feb 2008 01:22 GMT
> I am trying to do exercise 17 on page 546 of Bruce Eckel's Thinking In
> Java (4th edition):
[quoted text clipped - 21 lines]
> literals; and write a program that examines Java source code and
> produces all the class names used in a particular program.)

We can't give you an outright solution since it is likely to be a homework
problem.  But, we can give hints.

You might have to assume the Java in the source file is syntactically
correct. If you don't get a clean compile, you could conceivably have a
source code file in which you (a human) can't tell where a string literal
ends and a comment begins. But, I digress ...

It might help to read the file character by character.  You could have
some support methods, such as "readString", which does nothing but read
characters until it reaches the end of a string.  You could have one or
more other methods that do nothing but read and print characters until the
end of a comment is reached.  Maybe "readCommentA" and "readCommentB", to
account for the different types of comments.
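
To make the hint concrete without giving the whole game away, here is one possible shape for those helper methods. The names are placeholders: skipLiteral plays the role of "readString", and readLineComment and readBlockComment stand in for "readCommentA" and "readCommentB". It assumes syntactically correct source and ignores Unicode escapes.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

// Hypothetical class and method names; one helper per construct.
class CommentScanner {
    private final PushbackReader in;
    private final StringBuilder out = new StringBuilder();

    CommentScanner(Reader reader) {
        this.in = new PushbackReader(reader, 1);
    }

    /** Scans the whole input and returns the comment text that was found. */
    String scan() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c == '"' || c == '\'') {
                skipLiteral(c);
            } else if (c == '/') {
                int next = in.read();
                if (next == '/') readLineComment();
                else if (next == '*') readBlockComment();
                else if (next != -1) in.unread(next);  // plain division slash
            }
        }
        return out.toString();
    }

    /** Reads up to the closing quote, stepping over backslash escapes. */
    private void skipLiteral(int quote) throws IOException {
        int c;
        while ((c = in.read()) != -1 && c != quote) {
            if (c == '\\') in.read();
        }
    }

    /** Copies characters up to and including the end of the line. */
    private void readLineComment() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            out.append((char) c);
            if (c == '\n') return;
        }
    }

    /** Copies characters until the closing star-slash, which is dropped. */
    private void readBlockComment() throws IOException {
        int prev = -1;
        int c;
        while ((c = in.read()) != -1) {
            if (prev == '*' && c == '/') {
                out.setLength(out.length() - 1);  // remove the '*' already copied
                out.append('\n');
                return;
            }
            out.append((char) c);
            prev = c;
        }
    }
}
```

Typical use would be System.out.print(new CommentScanner(new FileReader(args[0])).scan()). Compared with a single loop full of flags, each helper only has to know how its own construct ends.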
anon36@yahoo.com - 01 Mar 2008 17:15 GMT
Thank you all. So it seems that the simplest way is just to parse it
one character at a time. I did it like Joshua suggested, and it works
fine.

For what it is worth, I have a feeling that Eckel didn't intend this.

For one thing, the preceding text in the book is about replace
operations and appendReplacement(StringBuffer sbuf, String
replacement). I am fairly sure the exercises are intended to use
appendReplacement.

What's more, like the last person said, I think you have to assume the
Java source file is syntactically correct, otherwise it might
actually be impossible to do. I think the author probably intended
readers to assume that there are no //s inside quotation marks, no
quotation marks inside comments, etc. ... which would make it all a
lot easier. But he didn't say that.

Thanks anyway!

