Java Forum / General / April 2007

Improving performance of code

ruds - 07 Apr 2007 04:46 GMT
Hi,
I'm reading a file and doing some operations on it. It is a huge file,
several GB in size.
The code works correctly but is very slow. How do I optimise
it?
My code snippet is:
class Risk
{
  public void compare(String infile) throws IOException
    {
        cnt=0;
        for(i=0;i<qid.size();i++)
        {
            no=0;
            fr=new FileReader(infile);
            br=new BufferedReader(fr);
            while((str=br.readLine())!=null)
            {
                no++;
                if((str.startsWith("$"))||(str.startsWith("-CONT-")))
                    continue;
                else
                {
                    s2=str.substring(0,10);
                    if(s2.equals(qid.elementAt(i)))
                    {
                        cnt++;
                        start=no;
                        end=no+29;
                        quadarray(infile,start,end);
                    }

                    if((cnt==sc) && (i<qid.size()))
                    {
                        System.out.println("qid="+qid.elementAt(i));
                        cnt=0;
                        writesubcase1();
                    }
                }
            }
            fr.close();

        }

        for(i=0;i<tid.size();i++)
        {
            no=0;
            fr=new FileReader(infile);
            br=new BufferedReader(fr);
            while((str=br.readLine())!=null)
            {
                no++;
                if((str.startsWith("$"))||(str.startsWith("-CONT-")))
                    continue;
                else
                {
                    s2=str.substring(0,10);
                    if(s2.equals(tid.elementAt(i)))
                    {
                        cnt++;
                        start=no;
                        end=no+29;
                        triaarray(infile,start,end);
                    }
                    if((cnt==sc) && (i<tid.size()))
                    {
                        System.out.println("tid="+tid.elementAt(i));
                        cnt=0;
                        writesubcase2();
                    }
                }
            }
            fr.close();
        }
    }

    public void quadarray(String ifile,int start,int end) throws IOException
    {
        try
        {
            fr1=new FileReader(ifile);
            br1=new BufferedReader(fr1);
            line=0;
            k=0;
            x=0;
            while((str1=br1.readLine())!=null)
            {
                line++;
                if((line>=start) && (line<end))
                {
                    if(j==0)
                        quad[j][k]=str1;
                    if((k==3)    ||(k==17)||(k==20))
                    {
                        val1=Double.parseDouble(str1.substring(18,36));
                        if(val1>qmax[x])
                        {
                            qmax[x]=val1;
                            x++;
                        }
                    }
                    if((k==5)    ||(k==8)||(k==22)||(k==25))
                    {
                        val2=Double.parseDouble(str1.substring(54,72));
                        if(val2>qmax[x])
                        {
                            qmax[x]=val2;
                            x++;
                        }
                    }
                    if((k==11)||(k==14)||(k==28))
                    {
                        val3=Double.parseDouble(str1.substring(36,54));
                        if(val3>qmax[x])
                        {
                            qmax[x]=val3;
                            x++;
                        }
                    }
                    k++;
                }
            }
        }
        catch (Exception e)
        {        }
    }

    public void writesubcase1() throws IOException
    {
        x=0;
        try
        {
            fw=new FileWriter("Result.txt",true);
            for(y=0;y<30;y++)
            {
                if((y==0)||(y==1)||(y==2)||(y==4)||(y==6)||(y==7)||(y==9)||
(y==10)||(y==12)||(y==13)||(y==15) || (y==16)||(y==18)||(y==19)||
                    (y==21)||(y==23)||(y==24) || (y==26)||(y==27))
                {
                    fw.write(quad[0][y]+"\n");
                    continue;
                }
                else
                {
                    if((y==3)||(y==17)||(y==20))
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,28)+qmax[x]+s.substring(37)+"\n");
                        x++;
                        continue;
                    }
                    if((y==5)||(y==8)||(y==22)||(y==25))
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,64)+qmax[x]+"\n");
                        x++;
                        continue;
                    }
                    if((y==11)||(y==14))
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,46)+qmax[x]+s.substring(55)+"\n");
                        x++;
                        continue;
                    }
                    if(y==28)
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,46)+qmax[x]+"\n");
                        x++;
                        break;
                    }
                }
            }
            fw.close();
        }
        catch(Exception e)
        {}
    }

    public void triaarray(String ifile,int start,int end) throws IOException
    {
        try
        {
            fr1=new FileReader(ifile);
            br1=new BufferedReader(fr1);
            line=0;
            while((str1=br1.readLine())!=null)
            {
                line++;
                if((line>=start) && (line<end))
                {
                    if(j==0)
                        tria[j][k]=str1;
                    if(k==2)
                    {
                        val1=Double.parseDouble(str1.substring(37,54));
                        if(val1>tmax[0])
                            tmax[0]=val1;
                    }
                    if(k==5)
                    {
                        val2=Double.parseDouble(str1.substring(19,36));
                        if(val2>tmax[1])
                            tmax[1]=val2;
                    }
                    k++;
                }
            }
        }
        catch(Exception e)
        {}
    }

    public void writesubcase2()
    {
        try
        {
            fw=new FileWriter("Result.txt",true);
            for(y=0;y<7;y++)
            {
                if((y==0)||(y==1)||(y==3)||(y==4))
                {
                    fw.write(tria[0][y]+"\n");
                    continue;
                }
                if(y==2)
                {
                    s=tria[0][y];
                    fw.write(s.substring(0,47)+tmax[0]+s.substring(55)+"\n");
                    continue;
                }
                if(y==5)
                {
                    s=tria[0][y];
                    fw.write(s.substring(0,29)+tmax[1]+"\n");
                    break;
                }
            }
            fw.close();
        }
        catch(Exception e)
        {}
    }

    public static void main(String args[])
    {
     Risk r=new Risk();
     ipfile=args[0];

        try
        {
           r.compare(ipfile);
        }
        catch (Exception e)
        {        }
    }
}

The code spends a lot of time in the quadarray() and triaarray() methods.
As you can see, the code in these methods is very simple, but it still
takes a lot of time.

How do i improve it??
Esmond Pitt - 07 Apr 2007 06:03 GMT
> How do i improve it??

1. I don't see any need to read the files twice. Read them once each,
and look for both subcases on each line. This will double your speed. If
the output comes out in the wrong order, sort it later. BTW you should
be closing 'br' not 'fr' in this loop.

2. The loops on 'y' in the writesubcaseN() and xxxarray() methods seem
pretty pointless, as you do different things depending on the value of
'y'. Unroll these loops. You could use a lookup table to give you the
various offsets you need, and just loop over the lookup table. Or else
use a switch statement instead of all the tests on 'y'.

3. The triaarray() and quadarray() methods probably spend most of their
time catching up to where you already are in the file. Do you really
need to do this?
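
A minimal sketch of points 1 and 3 combined, not the poster's actual fix. It assumes the 29-line blocks never overlap and that each id is exactly the first 10 characters of its line; both are read off the posted code rather than confirmed. The names SinglePassScan and collectBlocks are hypothetical. The input is read once, each line's prefix is looked up in a set that can hold both the qid and tid values, and the rest of the block is taken from the reader that is already open instead of reopening the file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SinglePassScan {
    // The posted code processes lines start..start+28, i.e. 29 lines per block.
    static final int BLOCK_LINES = 29;

    /** Reads the input once and returns one block of lines per matching id line. */
    static List<List<String>> collectBlocks(Reader in, Set<String> ids) throws IOException {
        List<List<String>> blocks = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Same filter as compare(); short lines are skipped (assumption).
                if (line.startsWith("$") || line.startsWith("-CONT-") || line.length() < 10) {
                    continue;
                }
                if (!ids.contains(line.substring(0, 10))) {
                    continue;
                }
                // The matching line is line 0 of its block; pull the rest
                // from the same reader, so nothing is read twice.
                List<String> block = new ArrayList<>();
                block.add(line);
                while (block.size() < BLOCK_LINES && (line = br.readLine()) != null) {
                    block.add(line);
                }
                blocks.add(block);
            }
        }
        return blocks;
    }
}
```

The caller would then run the quad or tria maximum search over each returned block in memory. A block cut short by the end of the file is returned with fewer than 29 lines.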
ruds - 07 Apr 2007 06:36 GMT
On Apr 7, 10:03 am, Esmond Pitt <esmond.p...@nospam.bigpond.com>
wrote:
> > How do i improve it??
>
[quoted text clipped - 12 lines]
> time catching up to where you already are in the file. Do you really
> need to do this?

I understand suggestions 1 and 2, but for point 3 I don't see any
other way out, at least from my point of view.
If you can suggest something better than this, you're welcome to.
I'm a newbie at handling files.
Thanks a lot.
Mike Schilling - 07 Apr 2007 08:29 GMT
> How do i improve it??

Indent it and comment it, for a start.  In its current state, it's
unreadable.
Chris Uppal - 07 Apr 2007 13:49 GMT
> > How do i improve it??
>
> Indent it and comment it, for a start.  In its current state, it's
> unreadable.

The apparent lack of indentation is a bug in the newsreader you (and I) are
using, not a deficiency in the posted source.

   -- chris
Lew - 07 Apr 2007 15:45 GMT
Mike Schilling wrote:
>> Indent it and comment it, for a start.  In its current state, it's
>> unreadable.

> The apparent lack of indentation is a bug in the newsreader you (and I) are
> using, not a deficiency in the posted source.

I'm using Thunderbird.  I see the original post's indentation, and that it was
done with the TAB character.

No doubt the space character would not have caused such difficulties.  Even
though I can see the indentation, the TAB character makes it so wide as to
damage readability.

So either way, OP, using TABs to indent Usenet posts is a Bad Thing.

Lew

Patricia Shanahan - 07 Apr 2007 16:08 GMT
> Mike Schilling wrote:
>>> Indent it and comment it, for a start.  In its current state, it's
[quoted text clipped - 12 lines]
>
> So either way, OP, using TABs to indent Usenet posts is a Bad Thing.

I am not that worried about the indentation, because if I get serious
about looking at a posted program I copy it into Eclipse and click
Source-Format.

I do think the first step in a performance campaign should be making
sure the code is properly commented, as well as having meaningful
identifiers, no arbitrary, unexplained constants etc. The big
improvements usually depend on understanding the code, so that data
structures and algorithms can be changed.

Patricia
Lars Enderin - 07 Apr 2007 17:45 GMT
Chris Uppal skrev:

>>> How do i improve it??
>> Indent it and comment it, for a start.  In its current state, it's
>> unreadable.
>
> The apparent lack of indentation is a bug in the newsreader you (and I) are
> using, not a deficiency in the posted source.

Thunderbird shows all of the tabs, which should have been replaced by
two or maybe three spaces each. It certainly was indented.
Mike  Schilling - 07 Apr 2007 18:49 GMT
>> > How do i improve it??
>>
[quoted text clipped - 4 lines]
> are
> using, not a deficiency in the posted source.

So it is.  (Though, oddly, Right-Click->Properties->Details->Message Source
displays it correctly, and that can be cut-and-pasted into an editor..)  So
as far as that goes, <Emily Litella>Never mind.</Emily Litella>

Still, even indented, it would take considerable brainpower to determine
what the code is trying to do, let alone how to make it do the same thing
faster.  Comments would make it far more likely that I'd make the effort.
Chris Uppal - 07 Apr 2007 13:50 GMT
> I'm reading a file and doing some operations on it. It is a huge file,
> several GB in size.
> The code works correctly but is very slow. How do I optimise
> it?

I found your code difficult to follow. You could improve it by using case
statements instead of lots of if-s, by returning from functions as soon as you
know there is nothing else to do (rather than having the "real" code buried
inside several nested if-s), and above all (as Mike has already mentioned) by
commenting it properly.
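
As a rough illustration of replacing the chains of tests on 'k', here is a sketch using the lookup table that Esmond suggested rather than a switch. The column ranges are copied from the posted quadarray(); the class name QuadFields and the method valueAt are hypothetical, and returning NaN for lines that carry no value is an assumption made for the example:

```java
final class QuadFields {
    // FIELD[k] holds {from, to} for block line k, or null if that line has no value.
    private static final int[][] FIELD = new int[29][];

    static {
        for (int k : new int[] {3, 17, 20}) FIELD[k] = new int[] {18, 36};
        for (int k : new int[] {5, 8, 22, 25}) FIELD[k] = new int[] {54, 72};
        for (int k : new int[] {11, 14, 28}) FIELD[k] = new int[] {36, 54};
    }

    /** Returns the numeric field of block line k, or NaN if line k carries no value. */
    static double valueAt(int k, String line) {
        if (k < 0 || k >= FIELD.length || FIELD[k] == null) {
            return Double.NaN; // nothing to do for this line, so return early
        }
        int[] range = FIELD[k];
        return Double.parseDouble(line.substring(range[0], range[1]).trim());
    }
}
```

With the offsets in one table, a change to the file layout touches a single place, and the early return keeps the real work out of nested if-s.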

So, it's quite possible that I've misread or misunderstood what the code is
doing, but if I /haven't/ got it wrong, then I'm puzzled by what quadarray() is
doing (and the other similar methods).  It /looks/ as if it loops over the
entire (huge) input file, keeping count of which line it's looking at (in
variable 'k' -- /not/ a good name, unless there's something special in the
domain which makes 'k' self-explanatory), and only doing anything with certain
numbered lines, 20, 14, 28, and so on.  But if that's true, then it doesn't do
anything at all with lines > 28, so there is no point in looping over the
remaining lines in the input file.
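
If that reading is right, the smallest possible change is to stop reading once the last wanted line has been seen. A sketch, with RangeReader and readRange as hypothetical names, keeping the posted convention that lines are counted from 1 and 'end' is exclusive:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class RangeReader {
    /** Returns lines numbered start (inclusive) to end (exclusive), counting from 1. */
    static List<String> readRange(Reader in, int start, int end) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            // Stop as soon as line end-1 has been read; the tail is never touched.
            while (lineNo < end - 1 && (line = br.readLine()) != null) {
                lineNo++;
                if (lineNo >= start) {
                    out.add(line);
                }
            }
        }
        return out;
    }
}
```

This only removes the scan over the tail of the file. It still re-reads everything before 'start' on every call, which is the catching-up cost Esmond described.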

If I'm wrong about that (i.e. if you do have to read data from every, or nearly
every, line of the big files), and if Daniel's suggestion about reducing the
number of passes isn't suitable, then I don't think there's very much you can
do to speed it up.  If I /had/ to maximise the speed of something like this,
then I'd first try to work out what was the fastest I could possibly scan data
from the files, by writing a small test program which read in all the data as
/binary/ (so there are no conversion costs), and which didn't do anything with
the data.  That would give me a baseline so I could tell whether there was any
reasonable speedup available even in theory (there might not be).  If that did
turn out to be significantly, /and usefully/, faster than my current code, then
I'd consider (i.e do a few experiments with), doing most of the processing as
binary.  It seems to me that you don't use most of the data on most lines, so
if you can scan the data as binary, and only incur the expense of converting
the data you actually need into text, then you might be able to save some time.
But there again, it might make almost no difference.  Only measurement will
tell you (or an analytic, numeric understanding of the performance could
tell you too, but that would require data that I don't have here, and I suspect
you don't have either).
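
The baseline test described above might look something like this. It is only a sketch: the class name RawReadBaseline, the 64 KB buffer size, and taking the file path from args[0] are all assumptions made for the example. It reads every byte without decoding any of it and reports the elapsed time:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class RawReadBaseline {
    /** Reads every byte of the file without decoding it; returns the byte count. */
    static long countBytes(Path file) throws IOException {
        byte[] buf = new byte[1 << 16]; // 64 KB, an arbitrary choice
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        long t0 = System.nanoTime();
        long bytes = countBytes(file);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(bytes + " bytes in " + ms + " ms");
    }
}
```

Comparing this figure with the time for one BufferedReader.readLine() pass over the same file gives a rough idea of how much the text conversion costs. The operating system's file cache can make a second run much faster than the first, so the two measurements need to be taken under the same conditions.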

BTW, this sounds like one of the examples where profiling is unlikely to be
very helpful (like many examples of using profiling, in my experience).
Profiling is an excellent tool if you have an unexpected hot-spot in your code
which you don't realise is there -- it will point out your error with
devastating clarity.  But that situation's not too likely to happen to
competent programmers[*].  The other case where profiling is useful is where
you have a reasonable idea of how long things /should/ take, and you can use
profiling to attach actual numbers to your mental model of the performance.

Oh, another thing that's often worth a try (if you are on Windows or some other
OS which supports transparent compression in the filesystem), is to tell the OS
to compress the data.  If your program is primarily IO bound, rather than CPU
bound (which sounds likely in your case -- and it's easy for you to check),
then compressing the data will reduce the amount of data which has to be read
off-disk, albeit at the expense of more processing, which can sometimes be a
useful saving.

   -- chris

[*] but it never hurts to check, even so -- if you have time...
Greg R. Broderick - 07 Apr 2007 17:24 GMT
"ruds" <rudranee@gmail.com> wrote in news:1175917594.394914.13880
@d57g2000hsg.googlegroups.com:

> How do i improve it??

1.  USE MEANINGFUL VARIABLE NAMES (i.e. more than just a single letter)!

2.  Pay attention to horizontal white space -- makes code a LOT easier to
read if there are spaces.  Use:

if ((str.startsWith("$")) || (str.startsWith("-CONT-")))

or

if ((str.startsWith("$")) ||
   (str.startsWith("-CONT-")))

instead of

if((str.startsWith("$"))||(str.startsWith("-CONT-")))

3.  Declare ALL of your variables before you use them.  In quadarray() it
appears to me that the variables "j", "quad", "str1", "val1", "qmax", "val2"
are used without having been previously declared.

Just a few suggestions that will prevent your name being cursed by those who
come after you and maintain your code.

Cheers!

---------------------------------------------------------------------
Greg R. Broderick            gregb+usenet200612@blackholio.dyndns.org

A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------

squirrel - 07 Apr 2007 19:19 GMT
> Hi,
> I'm reading a file and doing some operations on it. It is a huge file,
[quoted text clipped - 263 lines]
>
> How do i improve it??

I have an idea that might improve it.
Using NIO, the large file can be split into several parts, each mapped as a
MappedByteBuffer instance, with multiple threads then parsing
each MappedByteBuffer instance. There will be N+1 threads working on
parsing the file. The performance should be higher.
ruds - 08 Apr 2007 06:12 GMT
> > How do i improve it??
>
[quoted text clipped - 3 lines]
> each MappedByteBuffer instance. There will be N+1 threads working on
> parsing the file. The performance should be higher.

Do I have to create new threads for this, or will they be created by the
JVM?
Sorry if it is a stupid question, but I haven't done multithreading yet.
squirrel - 08 Apr 2007 19:05 GMT
> > > How do i improve it??
>
[quoted text clipped - 7 lines]
> JVM?
> Sorry if it is a stupid question, but I haven't done multithreading yet.

My idea is the following:
1. One thread, the main thread, creates a FileChannel and splits
the file's content into n MappedByteBuffer instances.
2. The main thread starts n threads, each responsible for
parsing one MappedByteBuffer. The main thread can then wait
for the result of each thread and collect them. The Observer pattern
may be the best choice for this case.
BTW, we should consider the case where one line is split across
two MappedByteBuffers.

This is only an idea; I have not implemented it. I hope it
is feasible.
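
A rough sketch of how this could be wired up, under several assumptions. It uses an ExecutorService with Future results instead of hand-started threads and an Observer, it counts newline bytes as a stand-in for the real parsing, and the names MappedLineCounter, nextLineStart and countNewlines are hypothetical. Each chunk boundary is moved forward to the start of the next line, so no line is split across two buffers. The threads have to be created by the program (here through the pool); the JVM does not do it on its own:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MappedLineCounter {

    /** Returns the offset just past the first '\n' at or after pos, or size if none. */
    static long nextLineStart(FileChannel ch, long pos, long size) throws IOException {
        ByteBuffer probe = ByteBuffer.allocate(4096);
        while (pos < size) {
            probe.clear();
            int n = ch.read(probe, pos); // positional read, channel position unchanged
            if (n <= 0) break;
            for (int i = 0; i < n; i++) {
                if (probe.get(i) == '\n') return pos + i + 1;
            }
            pos += n;
        }
        return size;
    }

    /** Splits the file into up to 'parts' line-aligned chunks and counts '\n' bytes. */
    static long countNewlines(Path file, int parts) throws Exception {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            List<Long> bounds = new ArrayList<>();
            bounds.add(0L);
            for (int i = 1; i < parts; i++) {
                long cut = nextLineStart(ch, size * i / parts, size);
                if (cut > bounds.get(bounds.size() - 1)) bounds.add(cut);
            }
            if (bounds.get(bounds.size() - 1) < size) bounds.add(size);

            ExecutorService pool = Executors.newFixedThreadPool(parts);
            try {
                List<Future<Long>> results = new ArrayList<>();
                for (int i = 0; i + 1 < bounds.size(); i++) {
                    long from = bounds.get(i);
                    long len = bounds.get(i + 1) - from; // a single mapping is limited to 2 GB
                    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, from, len);
                    Callable<Long> task = () -> {
                        long lines = 0;
                        while (buf.hasRemaining()) {
                            if (buf.get() == '\n') lines++;
                        }
                        return lines;
                    };
                    results.add(pool.submit(task));
                }
                long total = 0;
                for (Future<Long> f : results) total += f.get(); // main thread waits here
                return total;
            } finally {
                pool.shutdown();
            }
        }
    }
}
```

For a multi-gigabyte file, 'parts' has to be large enough that every chunk stays under the 2 GB limit of a single mapping. Whether this beats a single sequential pass depends on whether the job is limited by the disk or by the CPU, which only measurement will show.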

