Java Forum / General / February 2007
Which scope for variables being used in a loop?
bugnthecode - 03 Feb 2007 02:42 GMT Hi everyone,
I've been working on an application for work and I'm using a library provided by someone else. I'm now running into a problem where I'm running out of memory under certain circumstances and have begun tracing the problem. Anyway, this made me think about the way I've coded some of my for loops. Take for instance the following snippet:
public void printCustomers(List<Customer> customers) { String customerName; String customerPhone; String customerLocation; for(Customer customer : customers) { customerName = customer.getName(); customerPhone = customer.getPhone(); customerLocation = customer.getLocation(); System.out.println(customerName+customerPhone+customerLocation); } }
Now my original thinking was that I would declare the strings outside of the loop so that I'm not re-creating a reference each time through the loop; I would just keep re-assigning it. Then I started to think that maybe this wasn't doing as I expected, and the realized that those 3 strings stay in scope until the end of the method meaning the locations would be unavailable to the gc until the method was finished processing. In the above snippet that really doesn't matter, but what if I had much more code below the loop.
So my question is which way would be better on memory or performance. Which is better coding style if performance and memory usage are negligible either way?
Thanks in advance for your help. Will
Manish Pandit - 03 Feb 2007 03:00 GMT > Hi everyone, > [quoted text clipped - 32 lines] > Thanks in advance for your help. > Will In this case, the variables are allocated on the stack, as they are "local". I do not think this could lead to out of memory (unless the collection is gigantic). Personally I never like the idea of declaring variables within a loop, and have not seen a lot of instances where it is done.
-cheers, Manish
Karl Uppiano - 03 Feb 2007 04:37 GMT [snip]
> In this case, the variables are allocated on the stack, as they are > "local". I do not think this could lead to out of memory (unless the > collection is gigantic). Personally I never like the idea of declaring > variables within a loop, and have not seen a lot of instances where it > is done. I think it happens a lot, for example if someone calls another method from within the loop.
> -cheers, > Manish Lew - 04 Feb 2007 00:43 GMT "Manish Pandit" wrote ...
>> In this case, the variables are allocated on the stack, as they are >> "local". I do not think this could lead to out of memory (unless the >> collection is gigantic). Personally I never like the idea of declaring >> variables within a loop, and have not seen a lot of instances where it >> is done. There are plenty of instances where variables are declared inside a loop, and there are good, solid engineering reasons to do so.
Declaring the variable inside a loop, or any other block, limits its scope to that block. If it is not needed outside the block, then its scope matches its use.
The variable remains in the JVM until the end of the stack frame, even after it goes out of scope; if it isn't nulled then it won't be gced until the method ends. It is still inaccessible to code outside its block.
Limiting variable scope is a good principle of defensive programming. If a variable doesn't linger after its use, nor is declared until needed, it has less chance to make mischief. (Joshua Bloch touches on this in /Effective Java/.)
Use of the for ( T thing : things ) idiom is an example of scope limitation. The variable "thing" is only in scope for the loop.
- Lew
Esmond Pitt - 04 Feb 2007 23:08 GMT > In this case, the variables are allocated on the stack, as they are > "local". In this case the *references* are allocated on the stack. The objects to which they refer are allocated on the heap as always.
> I do not think this could lead to out of memory (unless the > collection is gigantic). The effect as written is that the *last* 3 objects allocated can't be GC'd until the scope they are declared in exist. All the objects previously allocated inside the loop can be GC'd as soon as the next set of assignments takes place. So unless one or more of the 3 customer objects is gigantic this is probably not a memory hog.
> Personally I never like the idea of declaring > variables within a loop, and have not seen a lot of instances where it > is done. I'm the opposite: I favour declaring all variables in the innermost scope possible, and I haven't seen a lot of instances where it *isn't* done.
> -cheers, > Manish Stefan Ram - 04 Feb 2007 23:41 GMT >>In this case, the variables are allocated on the stack, as >>they are "local". >In this case the *references* are allocated on the stack. References are values. »Reference« is just short for »reference value« of which the JLS says »(...) reference values (...) are pointers« (JLS3, 4.3.1). They can not be allocated, because they are not storage locations.
Variables are storage locations (JLS3, 4.12), so they might be allocated.
Reference variables are those entities, which then contain reference values.
NT - 08 Feb 2007 10:12 GMT Escape Analysis in JSE6 might rule it out, i.e. allocate them on stack, for small objects.
> In this case the *references* are allocated on the stack. The objects to > which they refer are allocated on the heap as always. -Nirav Thaker http://niravthaker.blogspot.com
Chris Uppal - 08 Feb 2007 13:37 GMT > Escape Analysis in JSE6 might rule it out, i.e. allocate them on > stack, for small objects. Do you know if that feature is actually part of JDK 1.6.0 ? I remember hearing that it was mooted for inclusion in this cycle, but haven't seen anything to say whether it actually made it in to the final cut. (It's something I've been looking forward to playing with)
-- chris
Esmond Pitt - 09 Feb 2007 00:07 GMT > Do you know if that feature is actually part of JDK 1.6.0 ? Apparently so. See 'Escape Analysis in Mustang' on http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html
Chris Uppal - 09 Feb 2007 17:25 GMT [me:]
> > Do you know if that feature is actually part of JDK 1.6.0 ? > > Apparently so. See 'Escape Analysis in Mustang' on > http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html Thanks for the link (not a bad article, if somewhat overstated IMO), but unfortunately it's dated 2005-09-27, so it can only be talking about what was then /hoped/ would make 1.6 final. I'm looking for some sort of corroboration that it /did/ make 1.6 final (or, alternatively, that it didn't).
-- chris
NT - 12 Feb 2007 19:07 GMT On Feb 9, 10:25 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE- THIS.org> wrote:
> [me:] > [quoted text clipped - 4 lines] > > -- chris You may want to try these switches with mustang XX: +PrintEscapeAnalysis, -XX:+DoEscapeAnalysis, -XX:+PrintOptoAssembly.
Nirav Thaker http://niravthaker.blogspot.com
Chris Uppal - 13 Feb 2007 16:23 GMT [me:]
> > Thanks for the link (not a bad article, if somewhat overstated IMO), but > > unfortunately it's dated 2005-09-27, so it can only be talking about [quoted text clipped - 6 lines] > You may want to try these switches with mustang XX: > +PrintEscapeAnalysis, -XX:+DoEscapeAnalysis, -XX:+PrintOptoAssembly. Great! Thank you very much.
(BTW, only the -XX:+DoEscapeAnalysis is enabled in production builds -- at least the server JVM rejects the other options when I try them. Not a big deal...)
The first benchmark I was hoping it would have some significant impact on is from John Harrop's comparison of various languages at: http://www.ffconsultancy.com/free/ray_tracer/languages.html in which I believe Java suffers because the Java expression of the ray-tracing algorithms creates huge numbers of transient vector objects. Sadly, it seems that -XX:+DoEscapeAnalysis and -XX:-DoEscapeAnalysis have no measurable effect on the execution speed of that code. Oh well...
Now I shall have to try to invent some benchmark where it /does/ make a difference...
-- chris
robert maas, see http://tinyurl.com/uh3t - 09 Feb 2007 20:14 GMT > From: Esmond Pitt <esmond.p...@nospam.bigpond.com> > The effect as written is that the *last* 3 objects allocated > can't be GC'd until the scope they are declared in exist. Yes, I agree. In a cases where one of those last 3 objects might be really big, assuming the programmer can anticipate that as a likelihood, it would seem prudent to explictly null out the relevant local pointers just before reaching the bottom of the inside of the loop. That wouldn't cause any practical difference for any but the last 3, because each group of 3 pointers (except the last group) immediately gets overwritten by the next group of 3 when the loop repeats, but it'd protect the last three from being held too long. I'm assuming what somebody else said, that in fact the stack frame is created upon entry to the method, to hold the longest chain of local variables in the whole block, so even though the variables declared in the inner loop-block go out of scope they are still unavailable to be garbage collected until the whole method returns.
And I agree that the variables should be declared within the inner loop-block, rather than out in the main body of the method, for several reasons (defense against accidently re-using a value later in the same method, and allowing local allocations in one loop to overlay local allocations in another loop. Hence you can't null-out the three pointers just once after the loop exits, because the variables are out of scope already, still held by the GC yet untouchable by the programmer! (Hey, maybe loops should allow a "finally" clause which performs actions after the very last time through the loop but before local variables go out of scope?? Nah, too ugly. How about a "finally" or "volatile" clause on the local declaration itself, such that the pointer-value of the variable is forcibly nulled-out at the moment just before the variable goes out of scope the last time through the loop? Nah, impractical. How about a "volatile" declaration for the entire *loop* (not the block inside the loop), such that all stack values allocated by any block within that loop are nulled out upon exit from the loop? I think that would be logically consistent, clean, and doable.)
> I favour declaring all variables in the innermost scope possible, I agree completely.
But note such a programming style precludes this loop-search pattern: for (int ix=0; ix<=max && obj.elementAt(ix)!=target; ix++) {} if (ix>max) then return(<didn'tfindvalue>); else return(<foundvalue>); because the post-loop test can't work because ix is already out of scope.
But I always preferred this style anyway: for (int ix=0; ; ix++) { if (ix>max) return(<didn'tfindvalue>); if (obj.elementAt(ix)==target) return(<foundvalue(ix)>); }
And who would really prefer this half-of-each pattern? for (int ix=0; ix<=max; ix++) { if (obj.elementAt(ix)==target) return(<foundvalue(ix)>); } return(<didn'tfindvalue>); What I don't like about that is that one return is after the loop, whereby maintainers might not notice the *other* return from inside the loop.
And I never liked this pattern either: bool success=false; int savedix; for (int ix=0; ix<=max; ix++) { if (obj.elementAt(ix)==target) { success=true; savedix=ix; break; } } if success then return(<foundvalue(savedix)>) else return(didn'tfindvalue>);
Although I did find myself forced to invent that last horrible pattern very long ago in another language that is overly (IMO) restrictive about where you can do what. No, it wasn't pascal.
zhengxianfu@gmail.com - 03 Feb 2007 13:48 GMT >I'm now running into a problem where I'm running out of memory under certain circumstances and have >begun tracing the problem. Did you know under what circumstances ,are u sure the List-typed customer will big enough to run out of your memory ,if so how about the place which create this customer list.Maybe somewhere else.
By the way ,if the you can add/override toString() method to class Customer ,that can avoid re-creation and assign value to local string object.
Chris Uppal - 04 Feb 2007 15:12 GMT > Now my original thinking was that I would declare the strings outside > of the loop so that I'm not re-creating a reference each time through [quoted text clipped - 8 lines] > Which is better coding style if performance and memory usage are > negligible either way? I don't think that there's any doubt that the best coding style is to declare variables in the most restricted scope possible. Not only does it make the code more readable, but it reduces the chances of errors by mistakenly re-using a value which had been set in an earlier passage of code where it was used for something else. A counter-point to that view is that if your methods are so long that it makes any difference where you declare variables, then your methods are too long anyway. There's some truth in that, but it is difficult to write sensible Java code where methods /are/ properly short (say 2 or 3 lines, not counting the brackets).
From an efficiency POV, it makes no difference at all. None. Zero. Java bytecode has no concept of a local variable (or stack slot, actually) at a narrower scope than a method. So the bytecode generated for:
public void printCustomers(List<Customer> customers) { String customerName; String customerPhone; String customerLocation; for(Customer customer : customers) { customerName = customer.getName(); customerPhone = customer.getPhone(); customerLocation = customer.getLocation(); System.out.println(customerName+customerPhone+customerLocation); }
and:
public void printCustomers2(List<Customer> customers) { for(Customer customer : customers) { String customerName = customer.getName(); String customerPhone = customer.getPhone(); String customerLocation = customer.getLocation(); System.out.println(customerName+customerPhone+customerLocation); }
are essentially identical (as you can verify with javap, if you feel so inclined). I tried that with the javac from jdk 1.6.0, and the only difference (for some reason) was that it assigned the variables to stack slots in a different order.
Note that javac and/or the JIT is at liberty to generate code to null-out stack slots once they are no longer live, but in fact (as far as I can tell) the current implementations do not do so (there's to be some #ifdef-ed-out code in the 1.6 JVM source, which /seems/ as if it might be an experimental implementation of that idea, but...). However, since there's no difference in the bytecode, the JIT will, or won't, do that no matter which way you phrase your code.
-- chris
Patricia Shanahan - 04 Feb 2007 18:23 GMT >> Now my original thinking was that I would declare the strings outside >> of the loop so that I'm not re-creating a reference each time through [quoted text clipped - 18 lines] > to write sensible Java code where methods /are/ properly short (say 2 or 3 > lines, not counting the brackets). I find the most restricted scope rule helpful for keeping down method length.
It tends to lead to largely self-contained blocks, with all working variables that belong to the block declared in it, and no unnecessary sharing of variables between blocks. Those are the easiest blocks to convert to separate methods during a refactoring pass.
Patricia
firstsql@ix.netcom.com - 06 Feb 2007 06:50 GMT On Feb 4, 7:12 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE- THIS.org> wrote:
> I don't think that there's any doubt that the best coding style is to declare variables in the most restricted scope possible. Not only does it make the > code more readable, but it reduces the chances of errors by mistakenly re-using [quoted text clipped - 8 lines] > bytecode has no concept of a local variable (or stack slot, actually) at a > narrower scope than a method. So the bytecode generated for: Actually, there is some difference. Sun's javac will reuse slots between parallel blocks within a method. So, there can be a coupla efficiency advantages (albeit minor) to most restricted scope:
1) The stack frame for the method will be smaller, perhaps important for recursive uses, and
2) Reference slots for earlier blocks may be overwritten, allowing early garbage collection of the referenced objects.
-- Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com) ============================================================== * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
Chris Uppal - 06 Feb 2007 16:51 GMT [me:]
> > From an efficiency POV, it makes no difference at all. None. Zero. > > Java [quoted text clipped - 4 lines] > Actually, there is some difference. Sun's javac will reuse slots > between parallel blocks within a method. So [...] That's true. I should have mentioned it myself. Thanks.
-- chris
dagarwal82@gmail.com - 07 Feb 2007 06:27 GMT How about declaring "String customerName; String customerPhone; String customerLocation;" outside the loop with String Buffer. I mean :-
StringBuffer customerName StringBuffer customerPhone StringBuffer customerLocation
now even if you have 1tera billion records in the list , there will be a single instance of these variables.
Lew - 07 Feb 2007 13:59 GMT > How about declaring "String customerName; String customerPhone; > String customerLocation;" outside the loop with String Buffer. I [quoted text clipped - 6 lines] > now even if you have 1tera billion records in the list , there will be > a single instance of these variables. Actually, there won't. Variables don't have instances.
There is an engineering principle related to re-use of instances. Compare
public Foo do( Foo foo ) { Foo fooToo = foo; fooToo.setProperty( getAValue() ); return fooToo; }
to
public Foo do( Foo foo ) { Foo fooToo = foo.clone(); fooToo.setProperty( getAValue() ); return fooToo; }
The latter produces two instances of Foo, the former uses only the one. The number of variables is the same.
There are reasons to declare variables outside a loop, either because you need a wider scope or syntactic reasons.
parallel loop control variables for ( int i=0, Iterator iter = coll.iterator(); iter.hasNext(); ++i ) not allowed - declare one before the loop.
- Lew
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|