Read in & count characters from a text file

Jay Cee - 04 Aug 2007 22:09 GMT
Hi All,
Relatively new to java (ex VB) and could do with some help.
I need to read a text file character by character (can do),
and count each character as it appears, i.e.
"A small sample text file" would have 1-A, 2-s, 2-m, etc., and output
the results.

I have a few issues which I cannot seem to solve easily,
1/
I thought it would be a good idea to save the characters in a hashmap in
name-value pairs as they are read , map.put(tempStr,"1" )
I found I had to convert the character to a string before it would save to
the map. Ideally I would like to save as a character.

2/
Before adding each character to the map
   check first if it already exists
   and if found increment the value portion of the name value pair
   else
   if not found insert into map with value of 1.

My problem seems to be I cannot "check the map" if the character exists and
if it does exist how do I get at the value to increment it.

Here is what I have so far,

import java.io.*;
import java.util.*;

class TextTest
{
   public static Map map = new HashMap();
   private static TreeMap treeMap;

   public static void main(String[] args) throws IOException
   {
      FileInputStream in = new FileInputStream("textfile.txt");
      int ch;
      int total = 0;
      int count = 1;

      while ((ch = in.read()) != -1)
      {
         total++;
         // Only way to save the "char" in the map was to convert it to a string.
         String tempStr = (Integer.toString(ch));
         System.out.print((char)ch);

         if (map.containsKey(tempStr))
         {
            // How can I extract the value, increment it and save back to the map?
            map.put(tempStr, "value");
         }
         else
         {
            // I need to save the integer 1 here in the value part of the map.
            map.put(tempStr, "value");
         }
      }
      treeMap = new TreeMap(map);   // sort the map
      System.out.println("Total =" + total);
      System.out.print(treeMap);
   }
}
Stefan Ram - 04 Aug 2007 22:22 GMT
Supersedes: <autovivificate-20070804231921@ram.dialup.fu-berlin.de>

>My problem seems to be I cannot "check the map" if the
>character exists and if it does exist how do I get at the value
>to increment it.

 You might use something like

class NumericMapUtils
{ public static <D> void addTo /* autovivificate the value to 0 */
 ( final java.util.Map<D,java.lang.Integer> map, final D d, final int i )
 { map.put( d, i +( map.containsKey( d )? map.get( d ): 0 )); }}

 and a sorted map

java.util.TreeMap<java.lang.Character,java.lang.Integer> map;

 then add each text like

NumericMapUtils.<java.lang.Character>addTo( map, 'a', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'c', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'n', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'x', 1 );

 (I have not tested this. Possibly, the "<java.lang.Character>"
 type argument can be omitted.)

 Then iterate: »for( final java.lang.Character key: map.keySet() )«

 For files of arbitrary size, use java.math.BigInteger instead
 of java.lang.Integer.
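
 A compilable sketch of this suggestion, assuming Java 5 or later. The
 names AddToDemo and countChars are made up for illustration, and the
 type argument is left to inference rather than written out.

```java
import java.util.Map;
import java.util.TreeMap;

class NumericMapUtils {
    /** Adds i to the count stored under d, treating a missing key as 0. */
    static <D> void addTo(final Map<D, Integer> map, final D d, final int i) {
        map.put(d, i + (map.containsKey(d) ? map.get(d) : 0));
    }
}

class AddToDemo {
    /** Counts each char of text in a sorted map (hypothetical helper). */
    static TreeMap<Character, Integer> countChars(final String text) {
        final TreeMap<Character, Integer> map = new TreeMap<Character, Integer>();
        for (int k = 0; k < text.length(); k++) {
            // D is inferred as Character; the char argument is boxed.
            NumericMapUtils.addTo(map, text.charAt(k), 1);
        }
        return map;
    }

    public static void main(String[] args) {
        final Map<Character, Integer> map = countChars("A small sample text file");
        for (final Character key : map.keySet()) {
            System.out.println("'" + key + "' = " + map.get(key));
        }
    }
}
```

 Because the map is a TreeMap, the keys come out in char order, so the
 space and the upper-case 'A' are listed before the lower-case letters.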

Eric Sosman - 04 Aug 2007 22:38 GMT
> Hi All,
> Relatively new to java (ex VB) and could do with some help.
[quoted text clipped - 9 lines]
> I found I had to convert the character to a string before it would save to
> the map. Ideally I would like to save as a character.

    Maps (all Collections, in fact) deal only with objects,
so you cannot store primitive values like char in them.  But
you can use a Character object, which expresses your intent
more directly than a String does.

    Similarly, the mapped values must also be objects.  I
think an Integer would be a better choice than a String; if
you expect counts greater than two billion use a Long.

> 2/
> Before adding each character to the map
[quoted text clipped - 5 lines]
> My problem seems to be I cannot "check the map" if the character exists and
> if it does exist how do I get at the value to increment it.

    The map has a containsKey() method that tells you whether
there is or isn't an entry for a key you're interested in.

    If you're using an Integer (or Long) as the counter, you
can't just increment it: like String, an Integer cannot be
changed once it's created.  Instead, you need to retrieve the
existing Integer from the map and replace it with a larger one.

    ... and since you need to retrieve the Integer anyhow, the
containsKey() method doesn't seem worth while: Just ask the map
for the Integer corresponding to such-and-such a Character.  If
there is one, replace it.  If there's not, you'll get a null
back from the map and this can be your signal to start a new
counter at unity:

    Character key = Character.valueOf( (char)ch );
    Integer val = (Integer)map.get(key);
    if (val == null)
       val = Integer.valueOf(1);
    else
       val = Integer.valueOf(val.intValue() + 1);
    map.put(key, val);

    Another approach would be to invent your own Counter class
that looks a lot like an Integer but is mutable: it has methods
like set() or increment() that change its value.  Then the code
might look like

    Character key = Character.valueOf( (char)ch );
    Counter cnt = (Counter)map.get(key);
    if (cnt == null) {
       cnt = new Counter();  // initial value zero
       map.put(key, cnt);
    }
    cnt.increment();
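
    One possible shape for such a Counter, as a minimal sketch. Counter
is not a JDK class; its method names, and the CounterDemo class around
it, are assumptions made for illustration. With a generic map the cast
on get() is no longer needed.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical mutable counter; not part of the JDK. */
class Counter {
    private int value; // starts at zero

    void increment() { value++; }

    int get() { return value; }

    @Override
    public String toString() { return Integer.toString(value); }
}

class CounterDemo {
    /** Counts each char of text, mutating one Counter per distinct char. */
    static Map<Character, Counter> count(String text) {
        Map<Character, Counter> map = new TreeMap<Character, Counter>();
        for (int i = 0; i < text.length(); i++) {
            Character key = Character.valueOf(text.charAt(i));
            Counter cnt = map.get(key);
            if (cnt == null) {
                cnt = new Counter();   // first sighting: start at zero
                map.put(key, cnt);
            }
            cnt.increment();           // the map entry itself is never replaced
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(count("banana")); // prints {a=3, b=1, n=2}
    }
}
```

    The trade-off against the Integer version is one put() per distinct
character instead of one per character read.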

> Here is what I have so far,
>
[quoted text clipped - 8 lines]
>
>        FileInputStream in = new FileInputStream("textfile.txt");

    A word of warning: This is legal, but may not be what you
intend.  InputStreams are for files made of bytes; Readers are
for files made of characters.  If an InputStream encounters a
character that has been encoded in several bytes, it will deliver
those bytes to you individually.  If a Reader encounters such a
thing, it will decode the multi-byte sequence and deliver you
the single corresponding character.
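
    To see the difference, here is a small sketch that reads the same
in-memory bytes both ways. The class and method names are invented for
the example, and UTF-8 is an assumed encoding; a real file may use a
different one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class StreamVsReader {
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Counts the values a raw InputStream delivers: one per byte. */
    static int countBytes(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Counts the values a Reader delivers after decoding data as UTF-8. */
    static int countChars(byte[] data) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(data), UTF8);
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "caf\u00e9".getBytes(UTF8); // e-acute takes two bytes in UTF-8
        System.out.println("bytes: " + countBytes(data)); // prints bytes: 5
        System.out.println("chars: " + countChars(data)); // prints chars: 4
    }
}
```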

    By the way, this sort of code is fine if your objective is
to learn about Maps and the like.  But if your goal is really
to count char values (or byte values), an array of 65536 (or
256) ints or longs will be easier:

    counts[ch]++;

Eric Sosman
esosman@ieee-dot-org.invalid

Patricia Shanahan - 04 Aug 2007 22:41 GMT
> Hi All,
> Relatively new to java (ex VB) and could do with some help.
[quoted text clipped - 9 lines]
> I found I had to convert the character to a string before it would save to
> the map. Ideally I would like to save as a character.
...

Although it can certainly be done with a map, I might not use one for
this. There are only 65,536 possible values for a Java char, so why not
an array?

char[] counts = new char[Character.MAX_VALUE+1];
...
counts[ch]++;
...

Patricia
Jay Cee - 04 Aug 2007 23:13 GMT
Hi Patricia,
Yours seems the simplest way to go forward with this, but do I have to
iterate through the array each time I read in a character? This will
probably be OK for this instance, but if I wanted to do a character count on
a large document (a book?) surely this would be slower than a hashmap. Is
there an array that can hold [char,integer]? I will have to do some
more research.

Eric, thank you for the explanation and the "A word of warning". I have been
getting this issue of more than 1 char read in and I was wondering why. I
wonder no longer :-)

Stefan, thank you for the swift reply. I will have to do some reading on the
NumericMapUtils and autovivificate, which is not a word I have come across in
my life until today!!

Jay

>> Hi All,
>> Relatively new to java (ex VB) and could do with some help.
[quoted text clipped - 21 lines]
>
> Patricia
Patricia Shanahan - 04 Aug 2007 23:22 GMT
> Hi Patricia
> Yours seems the simplest way to go forward with this but do I have to
[quoted text clipped - 3 lines]
> there an array that can hold     [char,integer] , I will have to do some
> more research.

Sorry, I made a mistake making it a char[], which confuses matters.

You want an array type that is big enough for each element to hold the
maximum number of instances of any one character you expect to see in
the input. Since you are using an int for the total, int must be good
enough:

int[] counts = new int[Character.MAX_VALUE+1];

Each character has its very own entry. For example, decimal 65
corresponds to 'A', so if you see an 'A' in the input, counts[65] would
increment by one. Use element 65 for 'A' regardless of what has happened
before.

The only time you need to iterate through the array is at the end, to
report the non-zero counts.

for(int i = 0; i<counts.length; i++){
  if(counts[i] > 0){
    char ch = (char)i;
    System.out.println("character "+ch+" count "+counts[i]);
  }
}
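
Put together, the array approach might look like the sketch below. The
class name ArrayCharCount and the helper count(Reader) are made up for
the example, and it reads from a StringReader so that it runs without a
file. For a real file, one would wrap a FileInputStream in an
InputStreamReader with the file's encoding.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ArrayCharCount {
    /** Reads in to the end and returns one count per possible char value. */
    static int[] count(Reader in) throws IOException {
        int[] counts = new int[Character.MAX_VALUE + 1];
        int ch;
        while ((ch = in.read()) != -1) {
            counts[ch]++; // the char value itself is the index, no searching
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("A small sample text file");
        try {
            int[] counts = count(in);
            // The only loop over the whole array happens once, at the end.
            for (int i = 0; i < counts.length; i++) {
                if (counts[i] > 0) {
                    System.out.println("character " + (char) i + " count " + counts[i]);
                }
            }
        } finally {
            in.close();
        }
    }
}
```

Each character read costs a single array access, so the work per
character does not grow with the size of the document.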

Patricia
Roedy Green - 05 Aug 2007 04:51 GMT
>Yours seems the simplest way to go forward with this but do I have to
>iterate through the array each time I read in a character ?
You index.  Most people don't know you can index by chars,
e.g.  int x =  count[ 'A' ]; is legit Java.  The char gets promoted to
the corresponding Unicode int.

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

cyprian - 22 Aug 2007 11:45 GMT
On Aug 4, 11:51 pm, Roedy Green <see_webs...@mindprod.com.invalid>
wrote:
> >Yours seems the simplest way to go forward with this but do I have to
> >iterate through the array each time I read in a character ?
[quoted text clipped - 5 lines]
> Roedy Green Canadian Mind Products
> The Java Glossary http://mindprod.com

To do a character count on a text file, try reading it in through a
stream: buffer it (with a BufferedReader, say) and call read() on the
buffered stream. The no-argument read() returns the next character, or
-1 at end of input; the read(char[]) form returns the number of
characters read. Either way the unit is a UTF-16 char, not a Unicode
code point. Then try doing your map thing on it. I was counting some
words myself recently: http://genericjava.blogspot.com/2007/08/can-i-count-ways-let-me.htm
On the other hand, you could call readLine() on the buffered stream,
append the result to a string buffer and play with the string buffer
directly. Try a regexp construct if possible. Use the string
buffer as the source for your map, counting char by char and making
the count the value for each character key.
Roedy Green - 23 Aug 2007 01:25 GMT
>to do a character count on a text file,

And if all you want is a count of chars in the file, use File.length().
It will give you the byte count without reading even a single byte,
which is the same as the char count for files in a single-byte encoding
such as ASCII or Latin-1, and quite a reasonable measure of "bigness".
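
A small sketch of that byte count versus char count point. The class
FileLengthDemo and its helper are invented for the example; it writes a
temporary file as UTF-8 (an assumed encoding) and asks for its length.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

class FileLengthDemo {
    /** Writes text as UTF-8 to a temporary file and returns its length in bytes. */
    static long byteLength(String text) throws IOException {
        File file = File.createTempFile("length-demo", ".txt");
        try {
            OutputStream out = new FileOutputStream(file);
            try {
                out.write(text.getBytes(Charset.forName("UTF-8")));
            } finally {
                out.close();
            }
            return file.length(); // size in bytes; the content is not read
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(byteLength("cafe"));      // prints 4: four chars, four bytes
        System.out.println(byteLength("caf\u00e9")); // prints 5: four chars, five bytes
    }
}
```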

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 05 Aug 2007 04:48 GMT
On Sat, 4 Aug 2007 22:09:10 +0100, "Jay Cee"
<itsjayceecee@hotmail.com> wrote, quoted or indirectly quoted someone
who said :

>I thought it would be a good idea to save the characters in a hashmap in
>name-value pairs as they are read , map.put(tempStr,"1" )

You would use a HashMap<String,Integer>.  You have to keep creating new
Integer objects, each one bigger than the last.  It is rather clumsy
and slow, though probably quite adequate to the task.

Chances are your file contains some limited set of chars, likely only
chars 0..255.  So instead you could use an int[256] to store the
counts.  You index by character.  You simply use the ++ operator.  It
is quite a bit simpler. In the worst case you need an array of 65536
elements (indexes 0..65535) if you have no control over the chars.

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com


