Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct Projection onto Unit Simplex #27

Closed
aleixalcacer opened this issue Jun 7, 2024 · 5 comments
Closed

Direct Projection onto Unit Simplex #27

aleixalcacer opened this issue Jun 7, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@aleixalcacer
Copy link
Owner

Implement a direct projection onto unit simplex insted of an alternating projection.

See https://math.stackexchange.com/questions/2005154/how-to-project-onto-the-unit-simplex-as-intersection-of-two-sets-optimizing-a-c

@wangzcl
Copy link
Collaborator

wangzcl commented Jul 22, 2024

I have finished projection (both direct one and the L1-Normalization in Morup and Hansen) and PGDs for NumPy in wangzcl/archetypes@7b3d07d. I will explain my design a little bit later.

@wangzcl
Copy link
Collaborator

wangzcl commented Jul 23, 2024

_projection.py:
I wrote an abstract base class Projector and two derived subclasses, UnitSimplexProjector (for the naive algorithm 1 mentioned in Condat, 2015) and L1NormalizeProjector (first non-negative, then normalize, mentioned in Morup and Hansen). The Projector.project(x) method projects (or maps) each row of NumPy matrix x onto the unit simplex. To initialize the projector, shape of the array to project has to be provided. Call the class method Projector.from_array(x) to init the instance.

The reason for doing so is, projection requires additional memories for all kinds of intermediate arrays. If memories allocated every time we do the projection, the performance would be intolerable. Projector allocates memories for all (or most of) temporal memories used in the projection in the first place when an object is initialized. This takes advantage of the fact that in the optimization loops of one AA instance, we are dealing with arrays of fixed shapes.

The current unit simplex projection algorithm may not be the fastest on, but it is easy to implement and vectorize. However, there is still some unavoidable memory allocation in UnitSimplexProjector, which is unavoidable unless 1. applying a Python loop on ndarray; 2. rewrite in Cython or 3. switch to more complicated methods.

@wangzcl
Copy link
Collaborator

wangzcl commented Jul 23, 2024

_optimize.py

  • PGD
    • PGD_AA
      • PGD_Optimize_A
      • PGD_Optimize_B

The same memory allocation issue may also be in PGD processes. To improve the performance and reduce repetition in the code, I created an abstract base class PGD that defines generic behavior of PGD optimizers. To define a PGD optimizer for a specific problem, one just need to inherit PGD, defining the objective function and gradient manually (and perhaps allocate memory for intermediate arrays). I use an abstract PGD_AA class to describe some common features used in optimizing A and B in archetypal analysis, and two subclasses of this class, PGD_Optimize_A and PGD_Optimize_B to perform the PGD steps. I also defined two class decorators, @line_search to add line search in the PGD steps, and @l1_norm_proj to modify the form of gradient to fit the L1NormalizeProjector. This architecture takes advantage of Python type system's features, is easy to extend, avoids most of the repetition and runtime memory allocation.

@wangzcl
Copy link
Collaborator

wangzcl commented Jul 24, 2024

The new projection and PGD has not finished yet. I haven't done any tests for the new utilities, but if you think this possible improvement works for you, I can go on finishing, testing and benchmarking all those.

@wangzcl
Copy link
Collaborator

wangzcl commented Jan 6, 2025

closed with #43 ~ #48

@wangzcl wangzcl closed this as completed Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants