Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethinking solutions #3

Open
matthieu-vergne opened this issue Jul 20, 2020 · 16 comments
Open

Rethinking solutions #3

matthieu-vergne opened this issue Jul 20, 2020 · 16 comments

Comments

@matthieu-vergne
Copy link
Contributor

matthieu-vergne commented Jul 20, 2020

Since we are on the "rethinking everything" side, may I suggest to push for a proper generic and simple solution interface?

This is what I started to do in my generalization branch for jMetal 5:
https://github.com/jMetal/jMetal/pull/406/commits

Observations

Here are the facts:

  1. Variables are already a List, so we don't need to add get and set methods to interact with each variable, since the List already provides them.
  2. Objectives are stuck with arrays, which are poorly featured (especially in terms of behavior customization), so we should make them proper List too (which I did here, in a way which does not break the legacy code).
  3. Any additional value, like bounds and other things, are required only by some solutions, so they should not be part of the interface to not bother those which don't need it (and don't take more space they need). I tend to think that constraints are the same: some have constraints, others not, so don't impose constraints on all of them.
  4. The attributes map is the best way to make dynamic extensions of the solution: just design a "property" or "metadata" object which represents an additional property of the solution (constraint, bound, etc.). If this property is specific to the solution, it can use the attributes map to store and retrieve it. If it is a constant, it just returns it (if all your solutions have the same constraints, don't make thousands of them store the same value). If an algorithm requires specific properties, it just ask for them (Function<S, T>). If it needs to write to it, it can require it too (BiConsumer<S, T>). I made something a bit more integrated here.

Suggestion

So basically, without loosing any functionality, the solution interface can be reduced to that:

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  List<Double> objectives() ;
  Map<Object, Object> attributes();
}

And here are the removed methods and their new forms:

Legacy New
solution.getVariable(index) solution.variables().get(index)
solution.setVariable(index, value) solution.variables().set(index, value)
solution.getNumberOfVariables() solution.variables().size()
solution.getObjective(index) solution.objectives().get(index)
solution.setObjective(index, value) solution.objectives().set(index, value)
solution.getNumberOfObjectives() solution.objectives().size()
solution.getAttribute(id) solution.attributes().get(id)
solution.setAttribute(id, value) solution.attributes().put(id, value)
solution.hasAttribute(id) solution.attributes().containsKey(id)
solution.getConstraints() constraintsProperty.read(solution)
solution.getConstraint(index) constraintsProperty.read(solution).get(index)
solution.setConstraint(index, value) constraintsProperty.read(solution).set(index, value)
solution.getNumberOfConstraints() constraintsProperty.read(solution).size()

And like constraints, if you need to add boundaries, just create the dedicated property rather than an interface extension which requires its own implementations:

Legacy New
solution.getLowerBound(index) lowerBounds.read(solution).get(index)
solution.setLowerBound(index, value) lowerBounds.read(solution).set(index, value)
solution.getUpperBound(index) upperBounds.read(solution).get(index)
solution.setUpperBound(index, value) upperBounds.read(solution).set(index, value)

In the case of constraints and boundaries, they all apply to specific variables, so you have to deal with a list of values the same size as the variables list. Which means that for all these properties, you need a single class:

public class VariablesSizedProperty<T> {
  void write(Solution<?> s, List<T> values) {
    // do some size checks here, as much as variables
    s.attributes().put(this, values);
  }
  List<T> read(Solution<?> s) {
    return (List<T>) s.attributes().get(this);
  }
}

Actually, since we only use read tog et the list and then read or write it, we can actually replace the write by an automatic generation. Or even combine the two strategies to provide more flexibility.

What about copy?

If you paid attention, you may have seen that I also removed the copy method from the interface.
So aren't we loosing something here?
Actually, it becomes useless, and here is why.

We have no more extensions of the interface, since all extensions are dealt with properties.
Which means that all the algorithms consume and produce Solution instances.
Not even a S extends Solution, but directly Solution itself.
No need to support extensions at all.

Since all implementations are fine, then the algorithm can just pick the implementation it needs.
Just optimize the choice based on the algorithm itself.
All you need then is for this implementation to have a factory method to copy:

static <T> Solution<T> createCopy(Solution<T> original) {
  // Create the instance based on the original
  // Whatever the original class was
}

If you prefer to keep the possibility to customize the copying process, then rather than having the algorithm decide about it, make it an argument of the algorithm:

UnaryOperator<Solution<T>> copier = // copier given in argument
// ...
Solution<T> copy = copier.apply(solution);

If the user wants to use a specific factory method, he just have to give it as a lambda:

UnaryOperator<Solution<T>> copier = AwesomeSolution::createCopy;
algorithm = new MyAlgorithm(..., copier, ...);
@matthieu-vergne
Copy link
Contributor Author

As a side note, all of that is what I planned to do iteratively, with successive deprecations. It would have been long, but progressive. So we can actually do it that way in jMetal 6 but more quickly, removing directly what we don't need anymore instead of deprecating.

@ajnebro
Copy link
Contributor

ajnebro commented Jul 21, 2020

I agree that the current Solution interface has become a bit cumbersome and it contains many redundant methods. However, I was thinking something like this:

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  double[] objectives() ;
  double[] constraints() ;
  Map<Object, Object> attributes();
} 

Why use arrays?. I commented you that some components in jMetal can be generalized in the sense of not having to depend on jMetal classes. For example, a method that is intensively used in many algorithms is the dominanceTest(), which now is:

  private int dominanceTest(S solution1, S solution2) {
    int bestIsOne = 0;
    int bestIsTwo = 0;
    int result;
    for (int i = 0; i < solution1.getNumberOfObjectives(); i++) {
      double value1 = solution1.getObjective(i);
      double value2 = solution2.getObjective(i);
      if (value1 != value2) {
        if (value1 < value2) {
          bestIsOne = 1;
        }
        if (value2 < value1) {
          bestIsTwo = 1;
        }
      }
    }
    if (bestIsOne > bestIsTwo) {
      result = -1;
    } else if (bestIsTwo > bestIsOne) {
      result = 1;
    } else {
      result = 0;
    }
    return result;
  }
} 

However, it can be implemented in this way:

  public static int dominanceTest(double[] vector1, double[] vector2) {
    int bestIsOne = 0;
    int bestIsTwo = 0;
    int result;
    for (int i = 0; i < vector1.length; i++) {
      double value1 = vector1[i];
      double value2 = vector2[i];
      if (value1 != value2) {
        if (value1 < value2) {
          bestIsOne = 1;
        }
        if (value2 < value1) {
          bestIsTwo = 1;
        }
      }
    }
    result = Integer.compare(bestIsTwo, bestIsOne);
    return result;
  }

Another example are the quality indicators. Instead of having an evaluate() method like this:

  @Override public Double evaluate(List<S> solutionList) {...}

it can be replaced by:

  @Override public double compute(double[][] front) {...}

@ajnebro
Copy link
Contributor

ajnebro commented Jul 21, 2020

I also want to get rid off the jmetal.util.front package, as a front is basically a double[][] matrix. So, a method like this:

  @Test
  public void shouldExecuteReturnTheCorrectValueCaseA() {
    int numberOfPoints = 3 ;
    int numberOfDimensions = 2 ;
    Front frontApproximation = new ArrayFront(numberOfPoints, numberOfDimensions);
    Front referenceFront = new ArrayFront(numberOfPoints, numberOfDimensions);

    Point point1 = new ArrayPoint(numberOfDimensions) ;
    point1.setValue(0, 1.5);
    point1.setValue(1, 4.0);
    Point point2 = new ArrayPoint(numberOfDimensions) ;
    point2.setValue(0, 2.0);
    point2.setValue(1, 3.0);
    Point point3 = new ArrayPoint(numberOfDimensions) ;
    point3.setValue(0, 3.0);
    point3.setValue(1, 2.0);

    frontApproximation.setPoint(0, point1);
    frontApproximation.setPoint(1, point2);
    frontApproximation.setPoint(2, point3);

    Point point4 = new ArrayPoint(numberOfDimensions) ;
    point4.setValue(0, 1.0);
    point4.setValue(1, 3.0);
    Point point5 = new ArrayPoint(numberOfDimensions) ;
    point5.setValue(0, 1.5);
    point5.setValue(1, 2.0);
    Point point6 = new ArrayPoint(numberOfDimensions) ;
    point6.setValue(0, 2.0);
    point6.setValue(1, 1.5);

    referenceFront.setPoint(0, point4);
    referenceFront.setPoint(1, point5);
    referenceFront.setPoint(2, point6);

    QualityIndicator<List<PointSolution>, Double> epsilon =
        new Epsilon<PointSolution>(referenceFront) ;

    List<PointSolution> front = FrontUtils.convertFrontToSolutionList(frontApproximation) ;
    assertEquals(1.0, epsilon.evaluate(front), EPSILON);
  }

becomes:

  @Test
  public void shouldComputeReturnTheCorrectValueCaseA() {
    double[][] front = {{1.5, 4.0}, {2.0, 3.0}, {3.0, 2.0}};
    double[][] referenceFront = {{1.0, 3.0}, {1.5, 2.0}, {2.0, 1.5}};

    Assertions.assertEquals(1.0, new Epsilon(referenceFront).compute(front), EPSILON);
  }

@ajnebro
Copy link
Contributor

ajnebro commented Jul 21, 2020

And like constraints, if you need to add boundaries, just create the dedicated property rather than an interface extension which requires its own implementations:

If you paid attention, you may have seen that I also removed the copy method from the interface.

I have to think about these ideas.

@ajnebro
Copy link
Contributor

ajnebro commented Jul 21, 2020

Thinking in the Solution interface, I agree that

public interface Solution<T> extends Serializable {
  List<T> variables() ;
  List<Double> objectives() ;
  List<Double> constraints() ;
  Map<Object, Object> attributes();
} 

is a more consistent definition than using arrays, but I'm concerned about making continuous conversions from lists to arrays. In a run of a metaheuristic, that conversion may need to be applied millions of times.

@matthieu-vergne
Copy link
Contributor Author

Generalizing operators to not depend on solutions is a good thing. But that is only about consuming the data directly instead of requesting them from the solution. It does not imply anything on the type of data itself.

For instance, passing from dominanceTest(solution1, solution2) to dominanceTest(vector1, vector2) appears to me to be a nice move. But these vectors can still be lists. There is only 4 lines which change:

public static int dominanceTest(List<Double> vector1, List<Double> vector2) { // here
  int bestIsOne = 0;
  int bestIsTwo = 0;
  int result;
  for (int i = 0; i < vector1.size(); i++) { // here
    double value1 = vector1.get(i); // here
    double value2 = vector2.get(i); // here
    if (value1 != value2) {
      if (value1 < value2) {
        bestIsOne = 1;
      }
      if (value2 < value1) {
        bestIsTwo = 1;
      }
    }
  }
  result = Integer.compare(bestIsTwo, bestIsOne);
  return result;
}

What matters here is that lists are abstractions of arrays. By using arrays, anyone who want to use something else (e.g. LinkedList) must do the conversion. If they want to use arrays, they can just use the ArrayList implementation. If you use arrays, you give no choice at all, either you have arrays everywhere, or there is some conversions going ahead.

Lists do not forbid you to use arrays, it only means that you should use ArrayList if you want to use arrays. Not the primitive arrays.

Now, if you are really concerned about arrays vs. lists, there is a good intermediary, which are iterators. Just use iterators internaly ang offer two methods which can support both types:

public static int dominanceTest(List<Double> vector1, List<Double> vector2) {
  Iterator<Double> iterator1 = vector1.iterator();
  Iterator<Double> iterator2 = vector2.iterator();
  return dominanceTest(iterator1, iterator2);
}

public static int dominanceTest(double[] vector1, double[] vector2) {
  Iterator<Double> iterator1 = Arrays.stream(vector1).iterator();
  Iterator<Double> iterator2 = Arrays.stream(vector2).iterator();
  return dominanceTest(iterator1, iterator2);
}

private static int dominanceTest(Iterator<Double> iterator1, Iterator<Double> iterator2) {
  int bestIsOne = 0;
  int bestIsTwo = 0;
  int result;
  for (int i = 0; iterator1.hasNext(); i++) {
    double value1 = iterator1.next();
    double value2 = iterator2.next();
    if (value1 != value2) {
      if (value1 < value2) {
        bestIsOne = 1;
      }
      if (value2 < value1) {
        bestIsTwo = 1;
      }
    }
  }
  result = Integer.compare(bestIsTwo, bestIsOne);
  return result;
}

Et voilà !
No particular conversion, since the iterator of the list just browse the list iteratively, and the stream over the array does the same.
Then you can further optimize your algorithm to stop when you reach the end or both variables are at one.
Meanwhile, having a int at 0 or 1 means having a boolean.

So, as a general policy, I would recommend to push for abstractions like List instead of using primitives like arrays to increase flexibility.
But surely any implementation for one can be replaced by the other with this kind of trick.
Here an iterator is OK because we just do one pass.
In other cases it may need a different trick.
But what you can do with one, you can do with the other.

@ajnebro
Copy link
Contributor

ajnebro commented Jul 22, 2020

After thinking about having this jMetal6 project, the situation is that now there are three jMetal projects:

The fact is many users still use jMetal 4, and I know that people finding jMetal for the first time frequently get that version. Having three different jMetal projects doesn't seem a good idea.

So, my idea is to reorganize the contents of the jMetal 6 repository to have all the contents of the 5.10 release, and move all the stuff related to the jMetal 6 in a sub-project called jmetal-experimental, which now contains the former jmetal-auto and the classes I have developed around a component-based template for evolutionary algorithms. This way, I avoid having jmetal5version packages in the other sub-projects. When I finish this task, I plan to remove this project.

There is the matter of improving the project. I like the idea of rethinking everything and studying the adoption of alternatives as the Solution interface proposed in this issue. However, adopting such an interface would require refactoring the whole project, which is something I would like to avoid. Instead, I would prefer to start a new project from scratch, with a name different to "jMetalsomething", where we could try any ideas without any constraint; in fact, I created a repository two years ago with that idea in mind: https://github.com/jMetal/moore.

@matthieu-vergne
Copy link
Contributor Author

matthieu-vergne commented Jul 22, 2020

We can do it iteratively, through incremental deprecations. This is what I started in my generalization branch. This is the best way I know for keeping everything and going iteratively without breaking anything, but it requires some discipline. Basically, you deprecate everything you plan to remove in the next major version, and you actually remove them only when you change the version. For each of your release, you anyway have a tag, so if someone needs a fix related to a specific version, just create a branch on that tag add a commit to fix it. This way, you don't impact other stuff.

Going with release branches is the way to go if you want to go iteratively. If you use maven sub-modules, you clearly cut the two, and with modularization you must pay attention not to have the same packages, otherwise you are going through trouble.

@matthieu-vergne
Copy link
Contributor Author

matthieu-vergne commented Jul 22, 2020

You can use the submodule to try stuff, and then integrate just some pieces in the main submodule once you like it, refactoring just what you need to integrate it.

@ajnebro
Copy link
Contributor

ajnebro commented Jul 22, 2020

I agree, but introducing changes such as your proposed Solution interface is too disruptive (it affects problems, operators, algorithms) and we should have to refactor everything. I have had to cope with this kind of global refactoring many times, and I'm not willing to do it again.

From my point of view, I would prefer to start another project from scratch, where we could apply creative ideas, taking advantage of the full features of Java 11, without affecting the current project. Of course, we can dispose of all the stuff we already have in jMetal, so we could reuse everything we need.

@matthieu-vergne
Copy link
Contributor Author

matthieu-vergne commented Jul 22, 2020

Check out the interface I use in the generalization branch. Nothing disruptive here:
jMetal/jMetal@1a81cb0

All I do is to add the methods to use abstractions, which build on the methods which don't. This way, nothing is broken, and we can progressively start to use them. With that, I add a generic implementation:
jMetal/jMetal@deca799

It overrides these methods to focus first on objects rather than arrays. But both are still here, which is what makes it easy to iteratively transform the rest.

Similarly, I didn't erased the DoubleSolution interface, which provides the double boundaries. But I created specific
objects which reuse them if we do use a DoubleSolution:
jMetal/jMetal@ed79bba
jMetal/jMetal@1215981

This allows to start using these abstractions to not depend on the specificities of the DoubleSolution interface, while all the other algorithms remain how they are. This allows to progressively refactor. And once they are not used anymore, just get rid of them.

Now, feel free to create a submodule to start from scratch. But keep in mind that I can help in integrating stuff in the main submodule, thus keeping the new submodule an experimental scope. Because if you don't want to take care of the refactoring tasks, I am willing to. Since it is the kind of task I am used to and take some pride in doing. I do a lot of refactoring in my personal projects and in my job too. So just ask for it when you would like something integrated, and I can come up with suggestions. And if you don't like it, just continue with your experimental submodule.

@matthieu-vergne
Copy link
Contributor Author

matthieu-vergne commented Jul 22, 2020

If I had to do it iteratively, here is the plan I would follow:

Generalyze from the many XxxSolution extensions to the unique Solution<Xxx>

The first step is to simplify the types hierarchy by going back to the generic interface Solution.Thus we need to We can rely on its attribute to store additional data:

  • Create an interface like this one (although I would probably rename it Property or alike) to read/write additional values from/in solutions : jMetal/jMetal@ed79bba
  • Provide an implementation for each additional property brought by extensions of Solution, like I did here for DoubleSolution: jMetal/jMetal@1215981
  • Progressively refactor all the components to use these objects instead of the specific methods of the interfaces, using these implementations by default. As discussed earlier, I will favor factory methods over constructors.
  • After having no specific dependency anymore, replace all the XxxSolution by the more generic Solution<Xxx>

Allow the use of custom solution generators

At this point, we don't need the extensions anymore, but we are still generating solutions from the solutions themselves (copy) and from problems (createSolution), so they extensions are still in use. We must first allow their replacement:

  • Generalize the no-arg instantiators (random, problem) by Supplier objects, with the current components (random, problem) used as default. Again, I will favor factory methods over constructors.
  • Generalize the copies, which take another solution in argument, by UnaryOperator<Solution<T>>, maybe making our on interface for readability, like SolutionCopier<T>.copy(solution). We can use Solution.copy() as default again.

Use of a generic Solution implementation

Now that all components can be extensively customized on their solution instanciations, let's exploit it:

  • Create a proper generic implementation of Solution, like here but still with the primitive arrays, abstraction is for later: jMetal/jMetal@deca799
  • For each component having a factory method using legacy instantiators as defaults (problem/copy), deprecate the method and add one with this implementation as default instead (replace problem/copy with relevant solution factory methods).
  • Replace all the calls to the deprecated method by the new one.
  • Deprecate all the Solution extensions and their implementations. Since they are only used in deprecated factory methods & constructors, no additional migration work should be necessary here.
  • Deprecate Solution.copy(), since they are replaced by solution factory methods.

Replace primitive arrays by abstractions

At this point, we primarily rely on the generic Solution implementation but it still uses arrays, so let's change that:

  • Extends the Solution interface to provide abstractions, like here: jMetal/jMetal@1a81cb0
  • Create a new generic implementation of Solution optimized for abstractions, like here: jMetal/jMetal@deca799
  • For each component using the "primitive" version as default, refactor the component to use the new methods instead and replace the default by the "abstracted" version. At this point, those using the deprecated constructors which still rely on the old Solution extensions will have an additional conversion, although it should remain small if we smartly use streams/iterators where relevant.
  • Deprecate the "primitive" implementation once no component uses it anymore.
  • Deprecate the "primitive" methods of Solution, since they have been replaced by the abstracted ones.

Simplify Solution uses

Since get/set specific variables can be done through the already available list, no need for the additional getter and setter. Same thing for the others. So:

  • Refactor the variables
    • Add Solution.variables() to return the list of variables
    • Replace all the calls to Solution.getVariable() by variables().get()
    • Replace all the calls to Solution.setVariable() by variables().set()
    • Replace all the calls to Solution.getNumberOfVariables() by variables().size()
    • Deprecate all the legacy variable methods
  • Refactor the objectives
    • Add Solution.objectives() to return the list of objectives
    • Replace all the calls to Solution.getObjective() by objectives().get()
    • Replace all the calls to Solution.setObjective() by objectives().set()
    • Replace all the calls to Solution.getNumberOfObjectives() by objectives().size()
    • Deprecate all the legacy objective methods
  • Refactor the constraints
    • Add Solution.constraints() to return the list of constraints
    • Replace all the calls to Solution.getConstraint() by constraints().get()
    • Replace all the calls to Solution.setConstraint() by constraints().set()
    • Replace all the calls to Solution.getNumberOfConstraints() by constraints().size()
    • Deprecate all the legacy constraints methods
  • Refactor the attributes
    • Add Solution.attributes() to return the map of attributes
    • Replace all the calls to Solution.getAttribute() by attributes().get()
    • Replace all the calls to Solution.setAttribute() by attributes().set()
    • Replace all the calls to Solution.hasAttribute() by attributes().containsKey()
    • Deprecate all the legacy attributes methods

At this point, those using the deprecated constructors which still rely on the old Solution extensions will be enforced to use the abstraction, so having a systematic additional conversion for objectives and constraints. However, this is the last run before the new major version, so it is really time to migrate now from this deprecated stuff.

New major version

If not done yet, it would be wise to plan a new major version here. By releasing the current state as the last minor version for the current major, people would just have to update to it to get all the deprecation warnings and go through them for proper migration (each deprecated element is documented with the @deprecated javadoc tag to tell how to migrate).

By removing all the deprecated stuff and releasing as the next major version, those having migrated can just replace by the new major version with no problem.

@matthieu-vergne
Copy link
Contributor Author

matthieu-vergne commented Jul 22, 2020

Steps like creating a class are atomic, so they are done in one shot. Steps like replacing methods just go progressively, searching for current uses with Eclipse and replacing them one after another. They can be done all at once in a single commit, or just changing few of them at a time. This is what allows a progressive refactoring. And since we deprecate without removing anything, we never break existing code. Only the major version breaks, but only what has been already deprecated and provided in a previous version for users to migrate safely.

@matthieu-vergne
Copy link
Contributor Author

matthieu-vergne commented Jul 22, 2020

Worth noting, after the major version is released, all the deprecated stuff is gone. Especially the annoying sets of constructors plugged to Solution extensions. So it opens the possibility to generalize all the relevant components to deal with more types (for now, we have only replaced extensions by their generic counter part, but never generalized further).

@ajnebro
Copy link
Contributor

ajnebro commented Jul 23, 2020

I have just updated https://github.com/jMetal/jMetal with the contents of the now defunct jMetal 6 project.

As I suppose that we cannot copy or move this issue to the other project, we could open a new one there with the contents of this post: #3 (comment).

Then, I will delete this project.

@matthieu-vergne
Copy link
Contributor Author

jMetal/jMetal#411

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants