Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / September 2007

Tip: Looking for answers? Try searching our database.

Sandboxing a computation?

Thread view: 
Russell Wallace - 16 Sep 2007 18:01 GMT
Is it possible to reliably sandbox a computation in Java?

To clarify, I'm not talking about the "in a web browser" meaning of
"sandbox"; I'm looking at doing a variant of evolutionary computation,
where subprograms are generated at random and tested against a fitness
function (the version I have in mind generates programs somewhat
nonrandomly, but it makes no difference in this context); the
subprograms in question would be in a scripting language run on an
interpreter written in Java, and don't have the ability to perform
I/O, execute arbitrary Java byte code or call arbitrary library
classes, so I'm only concerned about resource exhaustion. There are
three specific issues I'm concerned about:

1. A subprogram could go into an infinite loop or otherwise spend too
much CPU time.

Solution: use a separate thread, have a monitor thread kill the
subprogram after an X second timeout.

2. Infinite or excessively deep recursion could result in stack overflow.

I think that throws a reliably catchable exception, right?

3. Out of memory.

This is the one I'm worried about: it throws an exception, but suppose
there's other stuff going on in the main program, say in a user
interface thread, that will also need memory. Couldn't there be a
condition where something else fails in the time between exhausting
memory and catch/cleanup? In other words, you can't reliably deal with
an out of memory exception in the same process, right?

If so, is there a reliable way to kick off a second JVM instance to
run a sandboxed subprogram? ("Reliable" includes portable, e.g. can't
use native fork() because it has to run on Windows as well as Unix.)
Or is there another way to run a subprogram in a separate memory pool?

(It would be nice if there was also a way to know how much memory a
subprogram had used even in the event of successful completion, or
better yet, monitor how much it was using at any given time. I'm
guessing this is impossible because other things might also use
memory, and there's no way of knowing when garbage collection has
occurred, is this correct?)

Thanks,

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Joshua Cranmer - 16 Sep 2007 18:35 GMT
> 2. Infinite or excessively deep recursion could result in stack overflow.
>
> I think that throws a reliably catchable exception, right?

StackOverflowError.

> 3. Out of memory.
>
[quoted text clipped - 4 lines]
> memory and catch/cleanup? In other words, you can't reliably deal with
> an out of memory exception in the same process, right?

An OutOfMemoryError is guaranteed to be thrown only after the GC has
collected as much as it possibly can. The problem is that it gives
little information as to what caused it.

It might be possible -- I have not tested this, so I can't be sure -- to
set the default exception handler on your threads to kick out as much
stuff as possible and then trying to restore the position. Trying to
restore sanity would be difficult because an OutOfMemoryError handler
shouldn't allocate resources to try and save the position.

> If so, is there a reliable way to kick off a second JVM instance to
> run a sandboxed subprogram? ("Reliable" includes portable, e.g. can't
> use native fork() because it has to run on Windows as well as Unix.)
> Or is there another way to run a subprogram in a separate memory pool?

Runtime.exec("java <className>") is Java's version of fork.

> (It would be nice if there was also a way to know how much memory a
> subprogram had used even in the event of successful completion, or
> better yet, monitor how much it was using at any given time. I'm
> guessing this is impossible because other things might also use
> memory, and there's no way of knowing when garbage collection has
> occurred, is this correct?)

I can't say, really. Maybe javax.management and java.lang.management.

P.S. Actually, take a hard look at the Management extension. It might be
able to handle running out of memory situations without needing to spawn
subprocesses.

Signature

Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

Stefan Ram - 16 Sep 2007 18:48 GMT
>Runtime.exec("java <className>") is Java's version of fork.

 See also:

http://download.java.net/jdk7/docs/api/java/lang/ProcessBuilder.html
Zig - 16 Sep 2007 20:18 GMT
> Is it possible to reliably sandbox a computation in Java?
>
[quoted text clipped - 14 lines]
> Solution: use a separate thread, have a monitor thread kill the
> subprogram after an X second timeout.

Take a look at java.lang.management.ThreadMXBean. This should at least  
give you an idea if the thread has really been that busy, or if system was  
busy with other things. Though it won't necessarily tell if you if the  
victim thread did something like call Thread.join on a child thread...

> 2. Infinite or excessively deep recursion could result in stack overflow.
>
> I think that throws a reliably catchable exception, right?

If you use a seperate thread in the same process, it should throw a  
StackOverflowError

> 3. Out of memory.
>
[quoted text clipped - 4 lines]
> memory and catch/cleanup? In other words, you can't reliably deal with
> an out of memory exception in the same process, right?

Quite right. Generally, if your sandbox thread attempts to allocate  
something big, it will usually end up throwing the error. OTOH, if the  
sandbox thread allocates lots of small objects, the odds of another thread  
running into the problem are much higher. In Java 1.4, any thread could  
potentially throw the OutOfMemoryError. Unfortunately, Java 1.5-1.6 seem  
to have more JNI methods that won't throw errors, but instead fail the  
entire VM if the OutOfMemory occurs while they are being invoked.

OutOfMemory is limitedly recoverable. Chances are if the error was thrown  
in some data model that you haven't explicitly tested for the error, then  
the data model is in a inconsistant state and shouldn't be reused.

> If so, is there a reliable way to kick off a second JVM instance to
> run a sandboxed subprogram? ("Reliable" includes portable, e.g. can't
> use native fork() because it has to run on Windows as well as Unix.)
> Or is there another way to run a subprogram in a separate memory pool?

As another poster mentioned, ProcessBuilder will give you the ability to  
fire off another process. System.getProperty("java.home") will tell you  
where the JRE is installed, so you can get to the java.exe launcher.

IIRC, it is also significantly easier to start a new java process with a  
sandbox security model, than it is to setup security contexts on specific  
threads inside the same VM. But, sharing information between processes is  
more complicated than sharing objects between threads, though RMI can help  
with that.

As an aside, it would not be uncommon to have a light-weight GUI process,  
which can then kick off a "kernel" process to try spawning threads (thus  
not forcing a new process to get spawned every time you want to try a new  
approach). And if the kernel crashes, you can always have the GUI kick off  
a new kernel where the last one left off.

> (It would be nice if there was also a way to know how much memory a
> subprogram had used even in the event of successful completion, or
> better yet, monitor how much it was using at any given time. I'm
> guessing this is impossible because other things might also use
> memory, and there's no way of knowing when garbage collection has
> occurred, is this correct?)

the java.lang.management classes should expose methods to get at all the  
information you like, though a number of the methods are not guaranteed to  
be implemented on all platforms. IIRC, they can be used to get information  
inside the active VM, or a spawned VM.

HTH,

-Zig
Russell Wallace - 17 Sep 2007 20:40 GMT
> As an aside, it would not be uncommon to have a light-weight GUI
> process,  which can then kick off a "kernel" process to try spawning
> threads (thus  not forcing a new process to get spawned every time you
> want to try a new  approach). And if the kernel crashes, you can always
> have the GUI kick off  a new kernel where the last one left off.

Yeah, looking at the various replies so far, that seems to be the way to
go. One kernel process, taking requests from the GUI process and giving
back the result, whenever it hits out of memory it says hasta la vista
and the GUI process kicks off another kernel. Maybe one kernel per CPU
core, so the other cores keep running uninterrupted while one's being
"rebooted"?

Starting a new Java process shouldn't require any disk access, right?
(Provided the code is still in cache, of course.)

Will these extra processes show up as extra icons on the Windows task
bar? If the answer is yes by default, is there a way to suppress that?

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Zig - 18 Sep 2007 20:52 GMT
> Yeah, looking at the various replies so far, that seems to be the way to  
> go. One kernel process, taking requests from the GUI process and giving  
> back the result, whenever it hits out of memory it says hasta la vista  
> and the GUI process kicks off another kernel. Maybe one kernel per CPU  
> core, so the other cores keep running uninterrupted while one's being  
> "rebooted"?

I would assume that failing a VM process should be considered a non-normal  
circumstance, so I would suspect the performance penalty of restarting a  
process would be acceptable.

As for multiple processes, that might work well if your calculations are  
mostly CPU bound. If you are expecting memory bound computations though,  
it would probably be better to stick to one large process with a big heap.

> Starting a new Java process shouldn't require any disk access, right?  
> (Provided the code is still in cache, of course.)

IIRC, there is no code sharing between processes. Each process needs to  
load up new copies of the java.lang / java.util classes, etc. Hotspot VMs  
these days are pretty smart, but I think starting a new process is going  
to go back to the exe / dll / rt.jar files. Though, the OS will probably  
have those files cached, so the disk access time should be minimal.

> Will these extra processes show up as extra icons on the Windows task  
> bar? If the answer is yes by default, is there a way to suppress that?

IIRC, using javaw.exe should accomplish that. Though, some non-Sun VM  
vendors used to give the no-console launcher a different name, so you it  
wouldn't hurt to be prepared to just use java.exe in the event javaw.exe  
doesn't exist.

HTH,

-Zig
Ben Phillips - 16 Sep 2007 23:40 GMT
> Is it possible to reliably sandbox a computation in Java?

And now, for a completely different suggestion.

Don't have the interpreted stuff more or less directly use the Java
stack and heap.

Instead your interpreter can explicitly manage a stack (using java.util
classes and iteration) and monitor heap allocation caused by the
interpreted code, and halt the interpreted code and throw a suitable
exception if a limit is hit in either case. It can also monitor the
number of instruction cycles of interpreted code that have elapsed and
enforce a limit there, too.

No mucking about with StackOverflowError, OutOfMemoryError, or Java
threads and timers this way.
Daniel Pitts - 17 Sep 2007 05:19 GMT
> > Is it possible to reliably sandbox a computation in Java?
>
[quoted text clipped - 12 lines]
> No mucking about with StackOverflowError, OutOfMemoryError, or Java
> threads and timers this way.

I agree, if you have access to the interpreter, adding built-in
support for limiting resources seems to be a better approach.  You
might even want to include these metrics in your fitness function.
Eg., if it has more recursion and more memory, but the output is the
same, then it is less fit.

As for the infinite loop, you might be able to detect that in another
manor. If you can take a state snapshot at certain intervals, you can
check each state against all previous state snapshots.  If the state
is completely unchanged, then it is in an infinite loop.   This may
not be feasible depending on the size of your state, but you might be
able to get away with a maximum cycle size.

Good luck,
Daniel.
Russell Wallace - 17 Sep 2007 20:42 GMT
> I agree, if you have access to the interpreter, adding built-in
> support for limiting resources seems to be a better approach.  You
> might even want to include these metrics in your fitness function.
> Eg., if it has more recursion and more memory, but the output is the
> same, then it is less fit.

*nods* I've considered that. The problem is going that route, I'd have
to reimplement everything from hash tables to garbage collection from
the ground up; no particular element of that is terribly difficult,
true, but this whole thing is a one-man research project, so if I can
cut down on the workload by reusing stuff Java provides, I'd like to do
that.

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Daniel Dyer - 17 Sep 2007 21:54 GMT
>> I agree, if you have access to the interpreter, adding built-in
>> support for limiting resources seems to be a better approach.  You
[quoted text clipped - 8 lines]
> cut down on the workload by reusing stuff Java provides, I'd like to do  
> that.

The book I have on Genetic Programming (Genetic Programming - An  
Introduction, by Banzhaf et al.) advocates pretty much the same approach  
suggested by Daniel and others in this thread, i.e. keep track of  
resources/iterations in the interpreter and terminate the evolved program  
if it exceeds some pre-defined limit.  The fitness function can be  
constructed to favour more efficient solutions as well (this is referred  
to as "parsimony pressure" in the literature).

Are you able to give us any details on the kind of programs that you are  
attempting to evolve?  There are a lot of variations of the basic Genetic  
Programming idea.  It may be possible to frame the problem in such a way  
as to avoid some of the trickier issues.  Perhaps the best place to ask  
for advice on these kinds of things is the group comp.ai.genetic, where  
people knowledgeable in this area tend to lurk.

Dan.

Signature

Daniel Dyer
http//www.uncommons.org

Russell Wallace - 17 Sep 2007 23:38 GMT
> The book I have on Genetic Programming (Genetic Programming - An  
> Introduction, by Banzhaf et al.) advocates pretty much the same
[quoted text clipped - 3 lines]
> be  constructed to favour more efficient solutions as well (this is
> referred  to as "parsimony pressure" in the literature).

Yes, that's certainly a good approach if you can do it; the question is
how to do it when complex data structures are involved. Iterations is
fine, the problem is how to keep track of memory used. Consider adding
an element to a hash table, for example - just how do you tell how much
memory that used? Unless I'm misunderstanding something, just comparing
free memory before and after won't give the right answer because other
threads could also be allocating memory, and garbage collection might or
might not have occurred in the meantime?

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Patricia Shanahan - 18 Sep 2007 05:07 GMT
>> The book I have on Genetic Programming (Genetic Programming - An  
>> Introduction, by Banzhaf et al.) advocates pretty much the same
[quoted text clipped - 12 lines]
> threads could also be allocating memory, and garbage collection might or
> might not have occurred in the meantime?

Do you really need to control exact memory usage? The cost of a
java.util map is roughly linear in the number of entries, so ration the
number of entries any simulated program can have at any one time.
Similar rules apply to other structures.

Patricia
Russell Wallace - 18 Sep 2007 05:31 GMT
> Do you really need to control exact memory usage? The cost of a
> java.util map is roughly linear in the number of entries, so ration the
> number of entries any simulated program can have at any one time.
> Similar rules apply to other structures.

Yeah, maybe that's a good approach to take, would bound memory usage to
within a small constant factor - assuming the set of objects the
subprogram is responsible for can be calculated. Maybe I could do a
GC-style sweep from its stack (which is an explicit object, not the JVM
stack) and then charge the subprogram for anything it refers to,
directly or indirectly.

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Ben Phillips - 21 Sep 2007 12:37 GMT
> As for the infinite loop, you might be able to detect that in another
> manor. If you can take a state snapshot at certain intervals, you can
> check each state against all previous state snapshots.  If the state
> is completely unchanged, then it is in an infinite loop.   This may
> not be feasible depending on the size of your state, but you might be
> able to get away with a maximum cycle size.

You can detect cycles of any size without a max-cycle-size-proportional
increase in memory usage or CPU usage. Run two copies side by side, one
faster than the other, and compare their states periodically. If they
have identical states there's a loop. E.g. do two instructions for one,
one for the other, compare. Two for the one, one for the other, compare.
Etc. A cycle of length n will be detected n instructions after it's
entered, so short cycles will be detected very quickly, but long ones
will still be detected eventually. Cost is to double the memory usage
and a bit more than double the CPU usage, but it gets you arbitrarily
large cycles detected.
bugbear - 17 Sep 2007 09:43 GMT
> Is it possible to reliably sandbox a computation in Java?

> 3. Out of memory.
>
[quoted text clipped - 4 lines]
> memory and catch/cleanup? In other words, you can't reliably deal with
> an out of memory exception in the same process, right?

Since the code is "yours", you could do ALL your memory allocation
via a Class or instance that counts, and set your own limit on memory use
(a lower limit than the physical limit), then throw your own exception.

  BugBear
Roedy Green - 17 Sep 2007 19:36 GMT
On Mon, 17 Sep 2007 09:43:11 +0100, bugbear
<bugbear@trim_papermule.co.uk_trim> wrote, quoted or indirectly quoted
someone who said :

>Since the code is "yours", you could do ALL your memory allocation
>via a Class or instance that counts, and set your own limit on memory use
>(a lower limit than the physical limit), then throw your own exception.

If you have recursive code, add a depth counter and abort if it goes
too deep.  

If you don't use Applets, you can have two JVMS running. The second
simply probes the first periodically with "are you alive and well"
requests on a TCP/IP port. If the main JVM is too insane to respond,
the second shuts it down.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 18 Sep 2007 03:12 GMT
On Sun, 16 Sep 2007 18:01:30 +0100, Russell Wallace
<russell.no.spam@gmail.com> wrote, quoted or indirectly quoted someone
who said :

>Is it possible to reliably sandbox a computation in Java?

You can do manual allocation.  Basically it creates a pool of empty
objects and handles them out when you call create.  The client must
explicitly free the objects.  They are put back in the pool and
cleared.  If you run out you create more.  If you have too many
empties, you free them to GC.  This would work if you had only a few
types of object to allocate.  You then can blow up if this pool gets
too large.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Russell Wallace - 18 Sep 2007 05:51 GMT
> You can do manual allocation.  Basically it creates a pool of empty
> objects and handles them out when you call create.  The client must
> explicitly free the objects.

The downside is, that way I lose GC which was one of the big reasons for
using Java in the first place.

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Daniel Pitts - 18 Sep 2007 05:54 GMT
> > You can do manual allocation.  Basically it creates a pool of empty
> > objects and handles them out when you call create.  The client must
[quoted text clipped - 6 lines]
> "Always look on the bright side of life."
> To reply by email, replace no.spam with my last name.

You can use a WeakHashMap. The JVM still handles GC, and you get to
keep track of object counts.
Russell Wallace - 18 Sep 2007 05:57 GMT
> You can use a WeakHashMap. The JVM still handles GC, and you get to
> keep track of object counts.

Good point, maybe that's the way to go. Let's see. For each subprogram,
a weak set of what it's allocated so far, plus a count of (estimated)
total words of memory. Each operation that allocates memory, adds to
both. When the count hits the allowed max, iterate through the set and
tot up how much is actually still live. Weak set only lets go of things
when GC runs, right? So before the iteration, would have to explicitly
call GC. Is that guaranteed to work? Or does weak set only undertake to
let go "sometime", rather than "no later than next GC call"?

Signature

"Always look on the bright side of life."
To reply by email, replace no.spam with my last name.

Zig - 18 Sep 2007 21:09 GMT
>> You can use a WeakHashMap. The JVM still handles GC, and you get to
>> keep track of object counts.
[quoted text clipped - 7 lines]
> call GC. Is that guaranteed to work? Or does weak set only undertake to  
> let go "sometime", rather than "no later than next GC call"?

I think the layman's version of weak is: definately cleared before the  
system get's to a point where it must throw an OutOfMemoryError, and tends  
to get cleared at intermediate points. So doing an explicit GC raises the  
probability of a object being dropped, but I don't believe guarantees it.

FYI, using java.lang.ref.ReferenceQueue is probably a little more suited  
to this kind of strategy than a WeakHashMap ... just create a reference to  
the object when you alloc it, and poll the reference queue to see what's  
no longer strongly reachable.

Of course, if you know *all* the allocation points in your code, you could  
just add a guard before each alloc:

private static long MIN_RAM_OVERHEAD=2*1024*1024;
public static void ramGuard() {
    Runtime rt=Runtime.getRuntime();
    if (rt.freeMemory() > MIN_RAM_OVERHEAD)
        return;
    rt.gc();
    if (rt.freeMemory() < MIN_RAM_OVERHEAD)
        throw new OutOfMemoryError();
}

Just beware of lurking allocs, such as

String longstring1, longstring2;
String longer=longstring1+longstring2;

or method calls that may or may not make allocations internally (ala  
ArrayList.add)

HTH,

-Zig


Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.