I have a crawler program: it fetches HTML pages from the Internet, and a
parser parses each page.
Because the network is slow, I use many robots to crawl the pages and
one parser.
The program is multi-threaded.
A pagestore object serves as the bridge between the robots and the
parser. It has a List structure,
and the parser removes each item once it has parsed it.
But as the program runs, its memory keeps on increasing, and I don't know
why. Task Manager (taskmgr) shows that most of the memory is
virtual memory, and the actual memory occupation is not much.
So what could the reasons be?
Joe
Sabine Dinis Blochberger - 13 May 2008 09:38 GMT
> I have a crawler program, it fetches html on the internet and a
> parser will parse the page.
>
> Joe
Sounds to me like you keep writing to the buffer without a limit on how
many entries it can contain, and the parser is not keeping up.
That said, unless you run out of memory or it gets really slow, it's
not really an issue.
You can set a limit on your buffer and have the crawlers wait for space
when it's full. I suppose your parser waits for entries when it's empty.
Also, to make sure you actually remove an entry, you need to give up all
references to it, so the garbage collector can reclaim it.
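A minimal sketch of the bounded buffer suggested above, assuming pages are passed around as Strings of HTML. BoundedPageStore and CrawlerDemo are hypothetical names, not the OP's actual pagestore. Instead of a plain List, it uses java.util.concurrent.ArrayBlockingQueue, whose put() waits while the queue is full and whose take() waits while it is empty (and removes the entry, so the store no longer references the page).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded page store: robots block when it is full,
// the parser blocks when it is empty.
class BoundedPageStore {
    private final BlockingQueue<String> pages;

    BoundedPageStore(int capacity) {
        pages = new ArrayBlockingQueue<>(capacity);
    }

    // Called by a robot thread; waits while the store is full.
    void put(String html) throws InterruptedException {
        pages.put(html);
    }

    // Called by the parser; waits while the store is empty. take() also
    // removes the page, so the store drops its reference to it.
    String take() throws InterruptedException {
        return pages.take();
    }

    int size() {
        return pages.size();
    }
}

class CrawlerDemo {
    // Starts several robot threads feeding one parser (the calling thread)
    // and returns the number of pages parsed.
    static int run(int robots, int pagesPerRobot, int capacity) throws InterruptedException {
        BoundedPageStore store = new BoundedPageStore(capacity);
        int total = robots * pagesPerRobot;
        Thread[] robotThreads = new Thread[robots];
        for (int r = 0; r < robots; r++) {
            final int id = r;
            robotThreads[r] = new Thread(() -> {
                try {
                    for (int i = 0; i < pagesPerRobot; i++) {
                        // Stand-in for a fetched page.
                        store.put("<html>robot " + id + " page " + i + "</html>");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            robotThreads[r].start();
        }
        int parsed = 0;
        while (parsed < total) {
            String html = store.take();
            // ... parse html here, then let it go out of scope ...
            parsed++;
        }
        for (Thread t : robotThreads) {
            t.join();
        }
        return parsed;
    }
}
```

With a capacity of, say, 10, the store never holds more than ten pages no matter how far ahead the robots get; fast robots simply block in put() until the parser catches up.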

Sabine Dinis Blochberger
Op3racional
www.op3racional.eu
Lew - 13 May 2008 14:15 GMT
joehust@gmail.com wrote:
>> I have a crawler program, it fetches html on the internet and a
>> parser will parse the page.
>>
>> So what could possibly be the reasons?
First, regarding that "its memory keep[s] on increasing", how are you
measuring this?
It is normal for Java programs' memory usage to increase up to a point. Up to
what point does the memory increase? Does the program ever throw an
OutOfMemoryError?
How much memory are you allowing the program to take? It is normal for a Java
program to appear to the operating system to hold its entire permitted
allocation. It is also normal for a Java program's heap to sit near the -Xmx
value at times.
It is a common and recommended Java idiom to create gobs of very short-lived
objects.
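One way to answer the "how are you measuring this?" question from inside the program is to ask the JVM for its own heap figures rather than reading the process size from an OS tool. This is a sketch using java.lang.Runtime; HeapReport is a hypothetical helper name.

```java
// Hypothetical helper: heap figures as the JVM sees them, as opposed to
// the process size that an OS tool such as Task Manager reports.
class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Bytes currently occupied by objects (live or not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory(): heap currently committed by the JVM.
        // maxMemory(): the most the JVM will try to use (normally governed by -Xmx).
        return String.format("used=%dMB committed=%dMB max=%dMB",
                usedBytes() / MB, rt.totalMemory() / MB, rt.maxMemory() / MB);
    }
}
```

Logging HeapReport.describe() every minute or so is a rough check: a "used" figure that saws up and down is normal, while a floor that keeps rising toward "max" suggests the program is holding on to objects.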

Lew
joehust@gmail.com - 16 May 2008 18:44 GMT
> joeh...@gmail.com wrote:
> >> I have a crawler program, it fetches html on the internet and a
> --
> Lew
The program does throw an OutOfMemoryError unless I give it more
memory, and even then it always eats it all up after some time. My
program doesn't keep many things in memory, so it seems to me that it
should not need much memory.
Thank you all for your suggestions. I am busy working on some
other problems, and I will try your methods when I get some time.
Andrea Francia - 13 May 2008 13:44 GMT
> But as the program runs, its memory keeps on increasing. I don't know
> why.
A memory profiler could help you.
If you use NetBeans, there is one integrated into the IDE.
Do you .close() all connections?
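A sketch of what closing everything can look like when fetching a page, assuming the robots use java.net.HttpURLConnection and that pages are UTF-8 (a real crawler should honor the charset the server declares). PageFetcher is a hypothetical name. try-with-resources closes the reader, and with it the underlying stream, even when reading throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical fetch helper that never leaves a stream open.
class PageFetcher {
    // Reads a whole stream into a String; closing the reader also closes 'in'.
    static String readAll(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    // The stream is closed by readAll; disconnect() then releases the
    // connection (it may close the socket instead of keeping it alive).
    static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            return readAll(conn.getInputStream());
        } finally {
            conn.disconnect();
        }
    }
}
```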

Andrea Francia
http://www.andreafrancia.it/
Roedy Green - 13 May 2008 20:40 GMT
On Mon, 12 May 2008 22:16:33 -0700 (PDT), "joehust@gmail.com"
<joehust@gmail.com> wrote, quoted or indirectly quoted someone who
said :
> But as the program runs, its memory keeps on increasing. I don't know
> why
see http://mindprod.com/jgloss/packratting.html

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
krumzv@googlemail.com - 15 May 2008 08:13 GMT
Hi,
> But as the program runs, its memory keeps on increasing. I don't know
> why.
The Task Manager isn't a good tool for looking at the Java heap. It gives you
only some information about the memory reserved by the Java process. Try
looking at the GC output to see what happens with the Java heap. The
easiest option is just
-verbose:gc
Add it to your start parameters and look at the console (or wherever
it is redirected).
If you need to analyze a memory leak, make a heap dump and analyze it.
Look at http://www.eclipse.org/mat/ for a good open source tool.
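A sketch of taking a heap dump from inside the program, assuming a HotSpot-based JVM (Oracle JDK or OpenJDK) on Java 7 or later, since it relies on the com.sun.management.HotSpotDiagnosticMXBean extension. HeapDumper is a hypothetical helper. The JDK's jmap tool can also write a dump from outside the process, and HotSpot's -XX:+HeapDumpOnOutOfMemoryError flag writes one automatically when the OutOfMemoryError occurs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; HotSpot/OpenJDK-specific.
class HeapDumper {
    // Writes a heap dump to 'path', which should end in ".hprof" and must
    // not already exist (dumpHeap refuses to overwrite). With liveOnly=true,
    // only reachable objects are written, which is what matters for a leak.
    static void dump(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(path, liveOnly);
    }
}
```

Calling, for example, HeapDumper.dump("crawler.hprof", true) after the crawler has run for a while and opening the file in MAT should show, in its dominator tree, which objects are retaining most of the heap.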
greets,
krum
Lew - 15 May 2008 13:07 GMT
> the task manager isn't good to look at the java heap. It gives you
We do not know if the OP is using the task manager. They have not answered
the question:
>> First, regarding that "its memory keep[s] on increasing", how are you measuring this?
> only some info about the memory reserved by the java process. Try
> looking at the GC output to see what happends with the java heap. The
> If you need to analyze a memory leak, make a heap dump and analyze it.
> Look at http://www.eclipse.org/mat/ for an good open source tool.
We do not know whether the OP has done any of this. We doubt it.

Lew
joehust@gmail.com - 16 May 2008 18:35 GMT
On May 15, 3:13 pm, kru...@googlemail.com wrote:
> Hi,
>
> greets,
> krum
Thanks for your suggestions