Syllabus for APPM 5630 Advanced Convex Optimization

Spring 2023, Instructor: Stephen Becker

Since there is no established optimization curriculum at CU Boulder, we will attempt to cover a lot of material in just one semester. We can divide topics into four broad categories:

  • Analysis: Basic theory (convex analysis, optimality, duality)
  • Methods: Standard methods, both classical and modern, and standard classifications. The idea is to give students a toolkit of standard approaches to solve actual problems.
  • Convergence: Convergence theory (ellipsoid, Nesterov, etc.). This involves some very beautiful mathematics (and is also useful!)
  • Applications: Often in signal processing and machine learning

The course investigates landmark convex optimization algorithms and their complexity results. We aim to cover key theory, such as polynomial-time algorithms for linear programming and the optimality of Nesterov's method, while also surveying the current practical state of the art.

Official course description

Investigates landmark convex optimization algorithms and their complexity results. Studies key theory while also surveying current practical state-of-the-art methods. Topics may include Fenchel-Rockafellar duality, KKT conditions, proximal methods, and Nesterov acceleration.

Related courses at CU

This course is similar to the CS department's CSCI 5254 but goes a bit further.

Here's a list of optimization classes taught at CU

Principal Topics

convex sets, convex analysis, Fenchel-Legendre conjugates, Fenchel-Rockafellar and Lagrangian duality, KKT and Slater’s conditions, conic optimization, (stochastic) gradient descent, calculating gradients, proximal methods, constrained optimization

Learning Goals/Outcomes

Students will learn the mathematics behind convex analysis and why convexity is useful in optimization problems. Students will understand the setup and optimality of optimization problems, and how math can be used to provide explicit certificates of sub-optimality as well as implicit certificates via worst-case convergence rates. Students will learn how to program core methods, applied to a few examples, and how to construct programs in a modular, debuggable fashion. Students will synthesize their knowledge by producing a final project demonstrating the application of optimization to real-world problems.

Detailed list/schedule of Lectures/Topics

| Week | Mon | Wed | Fri |
| --- | --- | --- | --- |
| Week 1: Jan 11-15 | No class | No class | 1 |
| Week 2: Jan 18-22 | No class | 2 | 3 |
| Week 3: Jan 25-29 | 4* | 5 | 6 (HW 1-2 due) |
| Week 4: Feb 1-5 | 7** | 8 | 9 |
| Week 5: Feb 8-12 | 10 | 11 | 12 (HW 3-4 due) |
| Week 6: Feb 15-19 | 13 | No class | 14 |
| Week 7: Feb 22-26 | 15 | 16 | 17 (HW 5-6 due) |
| Week 8: Mar 1-5 | 18 | 19 | 20 |
| Week 9: Mar 8-12 | 21 | 22 | 23 |
| Week 10: Mar 15-19 | 24 | 25 | 26 (HW 7-8 due) |
| Week 11: Mar 22-26 ("spring pause") | 27 | 28*** | 29 |
| Week 12: Mar 29-Apr 2 | 30 | 31 | 32 (HW 9-10 due) |
| Week 13: Apr 5-9 | 33 | 34 | 35 |
| Week 14: Apr 12-16 | 36 | 37 | 38 (no HW, work on projects) |
| Week 15: Apr 19-23 | 39 | 40 | 41 |
| Week 16: Apr 26-30 | 42 | 43 | No class |

Legend: * means last day to add a class (without instructor approval), ** is last day to drop a class (without a W), *** is last day to withdraw from a class (student receives a W)

HW 1-2 is due early, but I don't want to push the due date any later because I want students to get feedback on a homework before they have to decide about dropping the class (Feb 1, lecture #7).

Spring 2021

Here's what we did in Spring 2021.

  1. Fri, Jan 15

    • Watch "Introduction to Optimization Problems" (36 min); notes.
    • No class on Mon, Jan 18 (MLK Jr day)
  2. Wed, Jan 20

  3. Fri, Jan 22

  4. Mon, Jan 25

  5. Wed, Jan 27

  6. Fri, Jan 29

  7. Mon, Feb 1

  8. Wed, Feb 3

  9. Fri, Feb 5

  10. Mon, Feb 8

  11. Wed, Feb 10

  12. Fri, Feb 12

  13. Mon, Feb 15

    • We covered ch 4.1 and part of 4.2 in Boyd and Vandenberghe
    • I mentioned "variational inequalities" ("VI") when we talked about Euler's inequalities
    • Wed, Feb 17: No class! Enjoy the day off
  14. Fri, Feb 19

  15. Mon, Feb 22

    • More on SDPs, LMIs
  16. Wed, Feb 24

    • Section 5.1, Lagrange dual function (and some of 5.2, Lagrange dual problem)
  17. Fri, Feb 26

  18. Mon, March 1

  19. Wed, March 3

  20. Fri, March 5

  21. Mon, March 8

    • Finish up KKT equations
      • in the same notes, discuss when you can and cannot drop constraints that were not active
    • Example of using KKT equations to solve a problem in "closed form" (e.g., reduce it to solving a linear system)
    • Activity on using KKT equations (and quiz on Lagrangian and KKT equations)
  22. Wed, March 10 (in person/hybrid)

    • Sensitivity/perturbation analysis, ch 5.6 in Boyd and Vandenberghe
    • 5 min summary of ch 5.9 "generalized inequalities"
  23. Fri, March 12 (in person/hybrid)

    • We're mostly done with Boyd and Vandenberghe
    • Starting algorithms, beginning with proximal gradient descent
  24. Mon, March 15

    • no school (snow day)
  25. Wed, March 17

    • Convergence analysis of gradient descent (non-strongly convex case)
    • Intro to worst-case first-order method result, and Nesterov's accelerated gradient descent (and proximal version, aka FISTA)
  26. Fri, March 19

    • Convergence analysis of gradient descent (strongly convex case, using the PL inequality)
    • Discussion of error metrics and termination criteria
    • Discussion of convergence rates (sub-linear, linear, quadratic)
  27. Mon, March 22

  28. Wed, March 24

  29. Fri, March 26

  30. Mon, Mar 29

  31. Wed, Mar 31

    • trust-regions
    • non-linear least-squares (Gauss-Newton, Levenberg-Marquardt)
  32. Fri, Apr 2

    • constrained optimization overview
    • penalty methods (quadratic and exact)
  33. Mon, Apr 5

    • Augmented Lagrangian; Sequential Quadratic Programming
  34. Wed, Apr 7

    • Interlude: finding gradients, DFO and analytic methods
  35. Fri, Apr 9

  36. Mon, Apr 12

  37. Wed, Apr 14

    • Finish adjoint state method
    • Start analyzing Newton's method via self-concordance, cf. the Boyd & Vandenberghe textbook
  38. Fri, Apr 16

  39. Mon, Apr 19

    • Alternating Minimization and Coordinate Descent
    • ADMM
  40. Wed, Apr 21

  41. Fri, Apr 23

    • Linear programs: simplex method (Nocedal and Wright), and complexity (Nemirovski)
    • didn't get to Integer Linear Programs (R. Vanderbei), so will try to post notes on that
  42. Mon, Apr 26 (projects due, project presentations)

  43. Wed, Apr 28 (project presentations, last day of class)

  44. Fri, Apr 30 (extra presentations; attend 2 of 3 lectures this week)

Fall 2018

Here's what we did in Fall 2018; later semesters are somewhat similar.

  1. Intro
  2. Intro
  3. Convex Sets [BV04 ch 2]
  4. cvx, cvxpy tutorial
  5. Convex sets, PSD cone, separating hyperplanes [BV04 ch2]
  6. Convex functions [BV04 ch 3]
  7. Convex functions, subgradients, commuting addition and subgradients [BC11]
  8. subgradients, more on convex fcn not from [BV04], strong/strict convexity; see handout on Lipschitz bounds
  9. Convex functions, sections 3.1--3.3 in [BV04] (Fenchel-Legendre conjugate functions)
  10. 3.3 in [BV04], convex relaxation f^** [BC11], Fenchel-Young
  11. convex relaxation f^** [BC11], [Luc10]; existence/uniqueness of minimizers [BC11 ch 11]; proximity operator [BC11, CP11]
  12. prox, Moreau envelope [CW05]
  13. prox, Moreau envelope [CW05]; Convex Optimization [BV04 ch 4]; gradient descent
  14. convergence of gradient descent [Van16]; fire drill!
  15. worst-case counter-example for 1st order methods [Nes04]
  16. Convex Optimization [BV04 ch 4.2]
  17. Convex Optimization, LP/QP/SOCP/SDP [BV04 ch 4.2-3, 4.6]
  18. SDP/LMI [BV04 ch 4.6], Duality and Lagrangians [BV04 ch 5.1]
  19. LP duality [Van08], sensible rules for remembering duals (SOB) [Ben95], Slater
  20. Slater and strong duality [BV04 ch5.2], geometric interpretation [BV04 ch5.3]
  21. Game theory [BV04 ch 5.2.5; 5.4.3], saddle point interpretation [BV04 ch 5.4]; some Fenchel-Rockafellar duality
  22. Fenchel-Rockafellar duality [BC11, chs15,18-19]
  23. Optimality and KKT conditions [BV04 ch5.5], perturbation and sensitivity [BV04 ch5.6]
  24. perturbation and sensitivity [BV04 ch5.6], generalized inequalities (eg. SDP) [BV04 ch5.9]
  25. Interlude on proximal [CW05][Van16] and accelerated (Nesterov) gradient methods [Nes04][Van16][BT10]; convergence rates
  26. Fast/fancy algorithms for unconstrained optimization [NW05]: Nesterov, Newton, linear/non-linear conjugate gradient
  27. Fast/fancy algorithms for unconstrained optimization [NW05]: quasi-Newton and L-BFGS
  28. Fast/fancy algorithms for unconstrained optimization [NW05]: inexact/matrix-free Newton (Newton-CG), non-linear least-squares and Gauss-Newton, Levenberg-Marquardt, active-set methods
  29. Constrained optimization [NW05]: penalty methods, augmented Lagrangian
  30. derivatives; some thms from [BC11], [Ber99]
  31. derivatives, automatic differentiation [NW05]
  32. adjoint-state method
  33. Newton's method, self-concordant [BV04 ch 9]
  34. Newton's method, self-concordant [BV04 ch 9]
  35. Newton for equality constraints [BV04 ch 10], interior-point methods [BV04 ch 11]
  36. interior-point methods [BV04 ch 11]
  37. First order methods: proximal pt, subgradient descent, conditional gradient aka Frank-Wolfe
  38. First order methods: ADMM [Boyd10], Douglas-Rachford [BC17], [CP11]
  39. Primal-dual method [Con13][CBS14][CCPV14][ChPo11]
  40. Simplex method [NW05]
  41. Complexity of LP and Simplex [Nem16], smoothed analysis, integer linear programming [Van08]; LP in fixed dimension [DMW97]
  42. STUDENT PRESENTATIONS
  43. STUDENT PRESENTATIONS
  44. STUDENT PRESENTATIONS

References (books)

  • [BV04] S. Boyd and L. Vandenberghe, “Convex Optimization” (Cambridge U. Press, 2004). Free electronic version
  • [NW05] J. Nocedal and S. Wright, “Numerical Optimization” (Springer, 2005). We have free electronic access at SpringerLink
  • [Nem16] A. Nemirovski, Introduction to Linear Optimization, lecture notes
  • [Van08] R. Vanderbei, “Linear Programming.” (2008) Free download at Vanderbei's website and also via SpringerLink.
  • [BC11] H. Bauschke and P. Combettes, “Convex Analysis and Monotone Operator Theory in Hilbert Spaces”, 1st ed, Springer 2011, available electronically via SpringerLink
  • [BC17] H. Bauschke and P. Combettes, “Convex Analysis and Monotone Operator Theory in Hilbert Spaces”, 2nd ed, Springer 2017, available electronically via SpringerLink
  • [Ber99] D. Bertsekas, "Nonlinear Programming", 2nd ed, Athena Scientific, 1999.
  • [Ber16] D. Bertsekas, "Nonlinear Programming", 3rd ed, Athena Scientific, 2016.
  • [Nes04] Y. Nesterov, "Introductory Lectures on Convex Optimization" (2004); Springer
    • A newer edition, "Lectures on Convex Optimization" (Nesterov, 2018), is out, but it rearranges the chapters, so section numbering differs noticeably from the 2004 edition

References (papers)

Resources

We will not follow a single textbook. The following resources (mostly available online) will be the primary references:

  • [BV2004] S. Boyd and L. Vandenberghe, “Convex Optimization” (Cambridge U. Press, 2004). Free electronic version. This is a standard text that gives some convex analysis background and focuses on interior-point methods and engineering applications. The book is very useful, and I recommend you buy a copy. We will follow the book closely at the beginning of the semester. Appendices A and C are also quite useful if you do not have an applied math background.
  • [V_course] L. Vandenberghe’s course notes for ee236b follow the book, while his course notes for ee236c contain more details on proofs and advanced topics.
  • [NW05] J. Nocedal and S. Wright, “Numerical Optimization” (Springer, 2005). We have free electronic access at CU via SpringerLink. This is the go-to reference for implementing a standard method.
  • [Beck17] Amir Beck, “First-Order Methods in Optimization” (SIAM, 2017). A more advanced text than Beck’s 2014 text; free on campus at SIAM eBooks
  • [BC17] H. Bauschke and P. Combettes, “Convex Analysis and Monotone Operator Theory in Hilbert Spaces”, 2nd ed, Springer 2017, available electronically via SpringerLink. A nice collection of results, best as a reference and not as a step-by-step textbook. Sometimes we refer to the 2011 1st edition.

Some supplemental resources:

And a fun miscellaneous reference:

Programming

Homeworks will involve both mathematical analysis and programming.

Students are expected to already know how to program. We encourage using Python and Matlab; Julia is another good choice, though we will not be using it explicitly. For homework assignments, the deliverable is usually the output of some code, so students may choose any reasonable programming language. However, we will be doing demonstrations in Python and/or Matlab (and the instructor is best at debugging Python and Matlab). Most of our demonstrations will use GitHub in conjunction with Python via Colab.

For Matlab, we'll use CVX (though yalmip is also good); for Python, we'll use cvxpy; for Julia, try Convex.jl. These are all good prototyping packages.
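
To give a flavor of the prototyping workflow, here is a minimal cvxpy sketch; the specific problem and data below are made up for illustration (not taken from any assignment). It minimizes a least-squares objective subject to a 1-norm constraint.

```python
# Minimal cvxpy sketch (illustrative only): constrained least squares.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x = cp.Variable(5)                                  # decision variable
objective = cp.Minimize(cp.sum_squares(A @ x - b))  # convex objective
constraints = [cp.norm1(x) <= 1]                    # convex constraint
prob = cp.Problem(objective, constraints)
prob.solve()                                        # lets cvxpy pick an installed solver

print("status:", prob.status)
print("optimal value:", prob.value)
print("x:", x.value)
```

CVX (Matlab) and Convex.jl (Julia) follow essentially the same disciplined-convex-programming modeling style.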

For optimization, Matlab and Python mainly rely on small third-party packages, whereas Julia has more "official"-like packages; most of them use the JuMP framework, including big packages like JuliaSmoothOptimizers / SolverTools.jl and JuliaNLSolvers / Optim.jl. One of the nice things about Julia is its very good automatic differentiation support (see a discussion of Julia AD packages). Other nice things about Julia: a very active and friendly community that is eager to help if you post in the forums; code that executes very rapidly (like C and Fortran); high-level programming like Python and Matlab; and lots of packages, increasing quickly (well behind Python overall, but catching up in terms of specialized numerics packages). It is also easy to call Python libraries from within Julia.

We recommend use of an IDE or something like JupyterLab.

We do not recommend using C, C++, or Fortran for programming in this class: these languages take much longer to prototype in, and the performance is fast only if you do it right (which usually requires calling the right kind of libraries, like NAG or GLPK, or using suites like PETSc, Trilinos, Eigen, etc.).

For automatic differentiation in Matlab, try ADiMat; in Python, try JAX.
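
As a quick, hedged illustration (the objective below is a made-up ridge-regularized least-squares function, not a course problem), `jax.grad` turns a plain function into its gradient, which you can sanity-check against the analytic formula:

```python
# Minimal JAX automatic-differentiation sketch.
import jax
import jax.numpy as jnp

A = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = jnp.array([1.0, 0.0, -1.0])
lam = 0.1

def f(x):
    # 0.5*||Ax - b||^2 + 0.5*lam*||x||^2
    r = A @ x - b
    return 0.5 * jnp.dot(r, r) + 0.5 * lam * jnp.dot(x, x)

grad_f = jax.grad(f)                     # autodiff gradient of f
x0 = jnp.array([0.5, -0.5])
print(grad_f(x0))                        # autodiff result
print(A.T @ (A @ x0 - b) + lam * x0)     # analytic gradient, for comparison
```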

More details on topics

We don't present the lectures in the following order (lectures intermingle topics more, so that students can get started on interesting homeworks right away), but thematically, the following is what we roughly cover:

Section 1: Background, duality, motivation

  1. Intro
  2. Convex Sets (ch 2 in [BV2004])
  3. Convex Functions (ch 3 in [BV2004])
  4. Convex Analysis, subdifferentials ([BauschkeCombettes])
  5. Convex Optim. Problems (ch 4 in [BV2004])
  6. Duality, optimality (ch 5 in [BV2004])
  7. Some applications, ERM and machine learning, sparse recovery, image denoising, low-rank matrix recovery
  8. CVX, CVXPY, convex.jl tutorials

Section 2: Conic programs & relaxations

  1. Linear programming
    • standard LP theory, duality ([WrightIPM])
    • simplex method, complexity, smoothed analysis ([Vanderbei_LP], [Nemirovski_LP])
    • IPM ([WrightIPM], [BV2004])
    • ellipsoid method (TBD, [Nemirovski_LP])
  2. Integer Linear programming ([Vanderbei_LP])
  3. SDP ([BV2004], Pataki?)
  4. QP, convex and non-convex (S-procedure, relaxations) ([Nemirovski_LP],[BN_Modern], [BV2004])
  5. MAXCUT relaxation ([Nemirovski_LP],[BN_Modern])
  6. Polynomial optimization, SOS ([Nemirovski_LP],[BN_Modern])

Section 3: Algorithms

  1. Gradient Descent ([V_course])
  2. Accelerated and projected/proximal gradient descent, optimality ([V_course]); see the proximal-gradient sketch after this list
  3. Classical methods ([NW05], Bertsekas, [Nesterov2004])
    • Subgradient descent
    • Newton's method, self-concordance
    • Non-linear conjugate gradient
    • quasi-Newton
    • Levenberg-Marquardt, Gauss-Newton
    • Augmented Lagrangian
    • Sequential Quadratic Programming (SQP)
    • Step-size (Barzilai-Borwein), line search
  4. Recent methods
    • Douglas-Rachford and ADMM
    • Primal-dual methods (e.g., Condat)
    • Conditional Gradient (aka Frank-Wolfe)
    • (Randomized) Coordinate Descent
    • Stochastic gradient descent
    • State-of-the-art variants (SDCA, SAGA, etc.)
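
To make the proximal-gradient entries above concrete, here is a minimal, hedged Python sketch of ISTA (unaccelerated proximal gradient descent) for the LASSO problem min_x 0.5*||Ax - b||^2 + lam*||x||_1. The data, lam, and iteration count are arbitrary choices for illustration; the step size 1/L uses L = ||A||_2^2, the Lipschitz constant of the smooth part's gradient.

```python
# Sketch of proximal gradient descent (ISTA) for LASSO (illustrative only).
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, num_iters=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad of 0.5*||Ax-b||^2
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                          # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)   # prox step for lam*||.||_1
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))
    x_true = np.zeros(100); x_true[:5] = 1.0              # sparse ground truth
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    x_hat = ista(A, b, lam=0.1)
    print("nonzeros recovered:", np.count_nonzero(np.abs(x_hat) > 1e-3))
```

Adding Nesterov momentum to the same iteration gives FISTA, the accelerated variant mentioned in the lecture schedule.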

Possible additional topics

  1. Benders decomposition
  2. Geometric Programming, linear-fractional
  3. Online Convex Optimization
  4. Mirror Descent
  5. Bregman distances
  6. Robust optimization
  7. Randomized Kaczmarz method; Algebraic Reconstruction Technique (ART) for tomography
  8. POCS (projection onto convex sets), best approximation, Dykstra’s algorithm