Java Forum / General / April 2007
Problem applying generics to my code. Is there a better solution?
Lucas - 05 Apr 2007 00:09 GMT All,
I am working on a statistical inference library and have run into a limiting case of using generics for type safety. I hope someone is able to suggest a more elegant solution to my problem that I currently have.
I have a small class of static methods for computing closed-form posterior distributions. The skeleton of the classes look like this
// This class defines a distribution over a type T. Usually Integer or Double. public interface Distribution<T> { ... }
// Here are some probability distributions public class NormalDistribution implements Distribution<Double> { ... } public class GammaDistribution implements Distribution<Double> { ... } public class PoissonDistribution implements Distribution<Integer> { ... }
// and here are a bunch of methods to compute closed form solutions to the posteriors public class Posteriors { public static NormalDistribution normalNormalPosterior( List<Double> data, NormalDistribution likelihood, NormalDistribution prior ) { ... } public static GammaDistribution poissonGammaPosterior( List<Integer> data, PoissonDistribution likelihood, GammaDistribution prior ) { ... } }
Now, I would like to create a generic dispatch method that I can pass any data, likelihood and prior into and it will dispatch to the appropriate method if it exists. The prototype looks like this
public static <T, P> Distribution<P> posterior( List<T> data, Distribution<T> likelihood, Distribution<P> prior)
So, for type safety, I want to require that the data is compatible with the likelihood distribution and that the posterior distribution has the same domain as the prior -- the definition of a conjugate prior. The problem comes in the implementation. Currently, my method looks like this
public static <T, P> Distribution<P> posterior( List<T> data, Distribution<T> likelihood, Distribution<P> prior) { if ( likelihood instanceof NormalDistribution && prior instanceof NormalDistribution ) return (Distribution<P>) normalNormalPosterior((List<Double>) data, (NormalDistribution) likelihood, (NormalDistribution) prior );
if ( likelihood instanceof PoissonDistribution && prior instanceof GammaDistribution ) return (Distribution<P>) normalNormalPosterior((List<Integer>) data, (PoissonDistribution) likelihood, (GammaDistribution) prior );
throw new UnsupportedConjugatePairException(); }
This works and, logically, I don't think I can run into any strange run-time exceptions since, for example
1. "likelihood instanceof NormalDistribution" implies that List<T> must be a List<Double> 2. "prior instanceof NormalDistribution" ensures that the posterior will match the prior
The real question: Is there a way to write my posterior() method without all the ugly casting and compiler warnings?
Daniel Pitts - 05 Apr 2007 01:25 GMT > All, > [quoted text clipped - 69 lines] > The real question: Is there a way to write my posterior() method > without all the ugly casting and compiler warnings? How about this:
public static <T, P> Distribution<P> posterior(List<Double> data, NormalDistribution<T> likelihood, NormalDistribution<P> prior) { return normalNormalPosterior(data, likelihood, prior );
} public static <T, P> Distribution<P> posterior(List<Integer> data, PoissonDistribution<T> likelihood, GammaDistribution<P> prior) { return normalNormalPosterior(data, likelihood, prior); }
Why have one method that does two things? You might even want to make PoissonDistribution implement a method posterior which takes a List<Integer> data, GammaDistribution<P> prior), and likewise on the NormalDistribution.
Lucas - 05 Apr 2007 01:56 GMT > How about this: > [quoted text clipped - 13 lines] > > Why have one method that does two things? Oops. That was a typo on my part. That code path should be
public static <T, P> Distribution<P> posterior(List<Integer> data, PoissonDistribution<T> likelihood, GammaDistribution<P> prior) { return poissonGammaPosterior(data, likelihood, prior); }
> You might even want to make PoissonDistribution implement a method > posterior which takes a List<Integer> data, GammaDistribution<P> > prior), and likewise on the NormalDistribution. My motivation for the original choice was that, as a user of the package, I want to manipulate generic Distribution<T> objects and let the library decide if there is an efficient way of giving a posterior. There are always numerical methods to fall back on. Also, there can be multiple conjugate priors for the same likelihood.
Here's an example of how I want to interface to these methods
import static my.package.Conjugate.*;
public static void foo() { Distribution<Double> distn1 = new NormalDistribution( mu, sigma ); Distribution<Double> distn2 = new NormalDistribution( 0, tau );
Distribution<Double> post = posterior( distn1, distn2 ); }
So, the underlying problem I'm trying to solve is to dispatch based on the most specific type of the object. Per your suggestion, I could get around this by adding a posterior() method that would be implemented along the lines of
public class NormalDistribution implements Distribution<Double> { public Distribution<Double> posterior( List<Double> data, Distribution<Double> prior ) { if ( prior instanceof NormalDistibution ) return normalNormalPosterior( data, this, prior ); if ( prior instanceof GammaDistribution ) return normalGammaPosterior( data, this, prior ); } }
or should I just create separate methods for each type of supported prior distribution and force the caller to call the posterior() method with an concrete type and not the generic Distribution<> type?
I'm going to ramble on a bit here and point out that the library have a ConditionalDistribution class which would probably be the most appropriate vehicle for your suggestion since a NormalDistribution class itself does not have any unbound parameters. To replicate the last example
public class NormalDistributionWithUnknownMean extends NormalDistribution implements ConditionalDistribution<Double, Double> { public NormalDistribution conjugate( List<Double> data, NormalDistribution prior ) { return normalNormalPosterior( data, this, prior ); } }
public class NormalDistributionWithUnknownVariance extends NormalDistribution implements ConditionalDistribution<Double, Double> { public NormalDistribution conjugate( List<Double> data, GammaDistribution prior ) { return normalGammaPosterior( data, this, prior ); } }
Thank you for the discussion.
Daniel Pitts - 05 Apr 2007 06:15 GMT > > How about this: > > [quoted text clipped - 60 lines] > } > } Anytime that you find yourself using a chain of "instanceof" checks on classes which you control, its time to look into using polymorphism instead... You're particular problem may be best solved through a Visitor pattern. Here's an example, I'm removing the generics, as they're not necessary for my point.
abstract class Distribution { Distribution posterior(List list, Distribution prior) { prior.visit(this); } protected Distribution posteriorList list, Normal prior) { throw new IllegalArgumentException("Not valid"); } protected Distribution posterior(List list, Normal prior) { throw new IllegalArgumentException("Not valid"); } public abstract Distribution visit(List list, Distribution d); } class Normal { protected Distribution posterior(List list, Normal prior) { return normalNormaPosterior(list, this, prior); } protected Distribution posterior(List list, Gamma prior) { return normalGammaPosterior(list, this, prior); } public Distribution visit(List list, Distribution d) { return d.posterior(list, this); } }
class Gamma { protected Distribution posterior(List list, Normal prior) { return gammaNormaPosterior(list, this, prior); } protected Distribution posterior(List list, Gamma prior) { return gammaGammaPosterior(list, this, prior); } public Distribution visit(List list, Distribution d) { return d.posterior(list, this); } }
> or should I just create separate methods for each type of supported > prior distribution and force the caller to call the posterior() method > with an concrete type and not the generic Distribution<> type? That might not be a bad idea, but a "visitor pattern" might be more appropriate... Alternatively, you might look for algorithms that are distribution type agnostic. I would probably go for the visitor approach, google it :-)
> I'm going to ramble on a bit here and point out that the library have > a ConditionalDistribution class which would probably be the most [quoted text clipped - 23 lines] > > Thank you for the discussion. I actually don't know much about the problem domain your solving. I'm guessing statistics? (The one math I haven't studied much) :-)
Anyway, I hope I've been able to help.
Lucas - 05 Apr 2007 08:04 GMT > Anytime that you find yourself using a chain of "instanceof" checks on > classes which you control, its time to look into using polymorphism > instead... You're particular problem may be best solved through a > Visitor pattern. Yet another reason for me to get my own copy of the GoF.
> Here's an example, I'm removing the generics, as they're not necessary > for my point. I'll comment on your suggestion below, but I wanted to re-emphasize the point that I really would like to use generics for the their type- safety in this case. Recall the function prototype I mentioned in the OP
public static <T, P> Distribution<P> conjugate( List<T> data, Distribution<T> likelihood, Distribution<P> prior )
Fro, the perspective of enforcing mathematical correctness, it is important to constrain the return type to be the same as the prior parameters and force the data to be compatible with the likelihood. Now, on to your (slightly edited) suggestion....
> abstract class Distribution { > Distribution posterior(List list, Distribution prior) { [quoted text clipped - 7 lines] > } > public abstract Distribution visit(List list, Distribution d);} Ok, so we set up a base class (abstract in this example) where we consider every possible distribution as a potential prior. It will be up to the implementing classes to override the conjugate cases supported by their particular form.
> class Normal { > protected Distribution posterior(List list, Normal prior) { [quoted text clipped - 7 lines] > } > } Now, when we invoke .posterior(data, prior) for a distribution x, it will visit the prior, passing itself along as a parameters. Since we invoke .visit() on the prior object, it will call the correct method, i.e. normal or gamma or whatever. Now, the prior calls back to the likelihood object, passing itself. This will give the type specificity to actually call one of the polymorphic posterior() methods. Very nice!! This is a double-dispatch pattern, correct?
> I actually don't know much about the problem domain your solving. I'm > guessing statistics? (The one math I haven't studied much) :-) Correct. The particular bit of statistics I'm implementing here is described at
http://en.wikipedia.org/wiki/Conjugate_prior
> Anyway, I hope I've been able to help. Very much! I'll give your suggestion a whirl and see how it plays out.
Many thanks.
Chris Uppal - 05 Apr 2007 09:50 GMT > Anytime that you find yourself using a chain of "instanceof" checks on > classes which you control, its time to look into using polymorphism > instead... You're particular problem may be best solved through a > Visitor pattern. Small point, but "Visitor pattern" is an application of the broadly useful technique of double-dispatch (which can be called a pattern too, if you like, though it isn't often seen that way). In this case, I think, your suggestion would be better seen as a direct application of double-dispatch, rather than an application of Visitor -- and a better name could be found for the visit() method (exposing what it /does/ rather than what feature of "Visitor" it reflects).
A suggestion for the OP. Design and implement this stuff without using generics at all, then go back and put in the necessary generic declarations. Double dispatch -- or whatever -- is part of the /semantics/ of your system, whereas generics are just a way of telling the typechecker what rules correct applications of those semantics will follow. So unless generics make the design/code easier to articulate in the first place (which I doubt in this particular case), trying to keep the typechecker happy at the same time as trying to think through the actual object interactions will be counter-productive.
-- chris
Daniel Pitts - 05 Apr 2007 22:55 GMT On Apr 5, 1:50 am, "Chris Uppal" <chris.up...@metagnostic.REMOVE- THIS.org> wrote:
> > Anytime that you find yourself using a chain of "instanceof" checks on > > classes which you control, its time to look into using polymorphism [quoted text clipped - 20 lines] > > -- chris Thanks Chris, I hadn't heard of the double dispatch pattern before, and it is what I've demonstrated.
On Apr 5, 12:04 am, "Lucas" <lscha...@gmail.com> wrote:
> public static <T, P> Distribution<P> conjugate( List<T> data, > Distribution<T> likelihood, Distribution<P> prior ) > > Fro, the perspective of enforcing mathematical correctness, it is > important to constrain the return type to be the same as the prior > parameters and force the data to be compatible with the likelihood. If I'm understanding you correctly
public static <T, P extends Distribution<?>> P conjugate(List<T> data, Distribution<T> likelihood, P prior);
Now, conjugate sounds like a verb. Are you conjugating a likelihood or a prior? Does it make sense to make the conjugate method into the likelihood or prior object instead?
Which one of these three makes more sense? "Likelihood, conjugate using data and prior": likelihood.conjugate(data, prior) or "Prior, conjugate using data and likelihood": prior.conjugate(data, likelihood) or (where Data isn't just a List<T>) "Data, conjugate using likelihood and prior): data.conjugate(likelihood, prior)?
If you find that the last one makes the most sense, you might decide to "wrap" a List<T> into a Data<T> object which implements the method conjugate. You might want to wrap the List<T> into a Data class anyway, depending on other uses.
Hope this helps even more :-) Daniel
Lucas - 06 Apr 2007 01:47 GMT > If I'm understanding you correctly > [quoted text clipped - 14 lines] > "Data, conjugate using likelihood and prior): > data.conjugate(likelihood, prior)? In this context "conjugate" is a property of the likelihood and the prior distribution. It doesn't make a lot of sense to treat it as a verb.
"A prior is CONJUGATE TO a likelihood iff the posterior distribution has the same form as the prior"
A more natural purpose of a "conjugate" method would be a conjugateTo() method to test for the property, e.g.
if ( prior.conjugateTo( likelihood )) // Use fast, close-form solution return prior.posterior( likelihood, data ) else // Use slow, numerical simulation return new Posterior( likelihood, data, prior );
Lucas - 06 Apr 2007 02:49 GMT Daniel,
Thank to you suggestion, I have a functional prototype that I've included in this post. One question: In order for each class to dispatch properly, *every* class has to implement the visit() method. If there are subclasses that inherit from a parent class, the generic posterior() class in the abstract Distribution class is invoked which leads to an infinite loop.
It's just syntactic sugar, but is there a way to get around this problem? The only solution I could come up with is that the abstract class would have to implement a posterior() method for every class that extends Distribution, which is not feasible since new Distribution sub-classes will be created.
Anyway, thanks for the insight. Couldn't have gotten to this point without eveyone's help.
P.S> The semantics of the Distribution.posterior() method is that "this" is the prior distribution and we are attempting to combine it with a likelihood. This has a nice side effect that every posterior() method in a class returns a Distribution<T> object rather than the other way around of treating "this" as the likelihood and have to take and return a Distribution<?> object.
/** * Experiment with a conjugate class hierarchy */
import java.util.*;
public class Conjugacy { static abstract class Distribution<T> { public <S> Distribution<T> posterior( List<S> data, Distribution<S> likelihood ) { return likelihood.dispatch( data, this ); }
public Distribution<T> posterior( List<Double> data, NormalUnknownMean likelihood ) { System.out.println( "FAIL" ); return null; }
public Distribution<T> posterior( List<Double> data, NormalUnknownVariance likelihood ) { System.out.println( "FAIL" ); return null; }
public Distribution<T> posterior( List<Integer> data, PoissonUnknownRate likelihood ) { System.out.println( "FAIL" ); return null; }
public abstract <S> Distribution<S> dispatch( List<T> data, Distribution<S> prior ); }
/** * Create a normal, gamma and poisson distributions */ static class Normal extends Distribution<Double> { public Distribution<Double> posterior( List<Double> data, NormalUnknownMean likelihood ) { System.out.println( "mu ~ Normal ~ Normal-Normal" ); return new Normal(); }
public <S> Distribution<S> dispatch( List<Double> data, Distribution<S> prior ) { System.out.println( "FAIL" ); return null; } }
static class NormalUnknownMean extends Normal { public <S> Distribution<S> dispatch( List<Double> data, Distribution<S> prior ) { return prior.posterior( data, this ); } }
static class NormalUnknownVariance extends Normal { public <S> Distribution<S> dispatch( List<Double> data, Distribution<S> prior ) { return prior.posterior( data, this ); } }
// Gamma distribution static class Gamma extends Distribution<Double> { public Distribution<Double> posterior( List<Double> data, NormalUnknownVariance likelihood ) { System.out.println( "sigma ~ Gamma ~ Normal-Gamma" ); return new Gamma(); }
public Distribution<Double> posterior( List<Integer> data, PoissonUnknownRate likelihood ) { System.out.println( "lambda ~ Gamma ~ Poisson-Gamma" ); return new Gamma(); }
public <S> Distribution<S> dispatch( List<Double> data, Distribution<S> prior ) { System.out.println( "FAIL" ); return null; } }
// Poisson distribution static class Poisson extends Distribution<Integer> { public <S> Distribution<S> dispatch( List<Integer> data, Distribution<S> prior ) { System.out.println( "FAIL" ); return null; } }
static class PoissonUnknownRate extends Poisson { public <S> Distribution<S> dispatch( List<Integer> data, Distribution<S> prior ) { return prior.posterior( data, this ); } }
public static void main( String[] args ) { Distribution<Double> normal = new Normal(); Distribution<Double> gamma = new Gamma(); Distribution<Integer> poisson = new Poisson(); Distribution<Double> normalMu = new NormalUnknownMean(); Distribution<Double> normalSig = new NormalUnknownVariance(); Distribution<Integer> poissonRate = new PoissonUnknownRate();
List<Double> data1 = new ArrayList<Double>(); List<Integer> data2 = new ArrayList<Integer>();
System.out.println( "Using a Normal as a prior distribution" ); normal.posterior( data1, normal ); // Fail normal.posterior( data1, gamma ); // Fail normal.posterior( data2, poisson ); // Fail normal.posterior( data1, normalMu ); // Pass normal.posterior( data1, normalSig ); // Fail normal.posterior( data2, poissonRate ); // Fail
System.out.println(); System.out.println( "Using a Gamma as a prior distribution" ); gamma.posterior( data1, normal ); // Fail gamma.posterior( data1, gamma ); // Fail gamma.posterior( data2, poisson ); // Fail gamma.posterior( data1, normalMu ); // Fail gamma.posterior( data1, normalSig ); // Pass gamma.posterior( data2, poissonRate ); // Pass } }
Daniel Pitts - 06 Apr 2007 16:50 GMT > Daniel, > [quoted text clipped - 13 lines] > Anyway, thanks for the insight. Couldn't have gotten to this point > without eveyone's help. I think the solution to this, or at least a way to avoid bugs, is to call the generic method by a different name altogether, rather than use the same name. That way, if someone doesn't implement something correctly, they'll get a compiler error instead of a run-time bug.
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|