Syllabus for APPM 5630 Advanced Convex Optimization

Spring 2023, Instructor: Stephen Becker

Since there is no established optimization curriculum at CU Boulder, we will attempt to cover a lot of material in just one semester. We can divide topics into four broad categories:

  • Analysis: Basic theory (convex analysis, optimality, duality)
  • Methods: Standard methods, both classical and modern, and standard classifications. The idea is to give students a toolkit of standard approaches to solve actual problems.
  • Convergence: Convergence theory (ellipsoid, Nesterov, etc.). This involves some very beautiful mathematics (and is also useful!)
  • Applications: Often in signal processing and machine learning

The course investigates landmark convex optimization algorithms and their complexity results. We aim to cover key theory, such as polynomial-time algorithms for linear programming and the optimality of Nesterov's method, while also surveying the current practical state of the art.

Official course description

Investigates landmark convex optimization algorithms and their complexity results. Studies key theory while also surveying current practical state-of-the-art methods. Topics may include Fenchel-Rockafellar duality, KKT conditions, proximal methods, and Nesterov acceleration.

Related courses at CU

This course is similar to the CS department's CSCI 5254 but goes a bit further.

Here's a list of optimization classes taught at CU

Principal Topics

convex sets, convex analysis, Fenchel-Legendre conjugates, Fenchel-Rockafellar and Lagrangian duality, KKT and Slater’s conditions, conic optimization, (stochastic) gradient descent, calculating gradients, proximal methods, constrained optimization

Learning Goals/Outcomes

Students will learn the mathematics behind convex analysis and why convexity is useful in optimization problems. Students will understand the setup and optimality of optimization problems, and how math can be used to provide explicit certificates of sub-optimality as well as implicit certificates via worst-case convergence rates. Students will learn how to program core methods, applied to a few examples, and how to construct programs in a modular, debuggable fashion. Students will synthesize their knowledge by producing a final project demonstrating the application of optimization to real-world problems.

Detailed list/schedule of Lectures/Topics

| Week | Mon | Wed | Fri |
| --- | --- | --- | --- |
| Week 1: Jan 11-15 | No class | No class | 1 |
| Week 2: Jan 18-22 | No class | 2 | 3 |
| Week 3: Jan 25-29 | 4* | 5 | 6 (HW 1-2 due) |
| Week 4: Feb 1-5 | 7** | 8 | 9 |
| Week 5: Feb 8-12 | 10 | 11 | 12 (HW 3-4 due) |
| Week 6: Feb 15-19 | 13 | No class | 14 |
| Week 7: Feb 22-26 | 15 | 16 | 17 (HW 5-6 due) |
| Week 8: Mar 1-5 | 18 | 19 | 20 |
| Week 9: Mar 8-12 | 21 | 22 | 23 |
| Week 10: Mar 15-19 | 24 | 25 | 26 (HW 7-8 due) |
| Week 11: Mar 22-26 ("spring pause") | 27 | 28*** | 29 |
| Week 12: Mar 29-Apr 2 | 30 | 31 | 32 (HW 9-10 due) |
| Week 13: Apr 5-9 | 33 | 34 | 35 |
| Week 14: Apr 12-16 | 36 | 37 | 38 (no HW, work on projects) |
| Week 15: Apr 19-23 | 39 | 40 | 41 |
| Week 16: Apr 26-30 | 42 | 43 | No class |

Legend: * means last day to add a class (without instructor approval), ** is last day to drop a class (without a W), *** is last day to withdraw from a class (student receives a W)

HW 1-2 is due early, but I don't want to push the due date any later because I want students to get feedback on a homework before they have to decide about dropping the class (Feb 1, lecture #7).

Spring 2021

Here's what we did in Spring 2021.

  1. Fri, Jan 15

    • Watch "Introduction to Optimization Problems" (36 min); notes.
    • No class on Mon, Jan 18 (MLK Jr day)
  2. Wed, Jan 20

  3. Fri, Jan 22

  4. Mon, Jan 25

  5. Wed, Jan 27

  6. Fri, Jan 29

  7. Mon, Feb 1

  8. Wed, Feb 3

  9. Fri, Feb 5

  10. Mon, Feb 8

  11. Wed, Feb 10

  12. Fri, Feb 12

  13. Mon, Feb 15

    • We covered ch 4.1 and part of 4.2 in Boyd and Vandenberghe
    • I mentioned "variational inequalities" ("VI") when we talked about Euler's inequalities
    • Wed, Feb 17: No class! Enjoy the day off
  14. Fri, Feb 19

  15. Mon, Feb 22

    • More on SDPs, LMIs
  16. Wed, Feb 24

    • Section 5.1, Lagrange dual function (and some of 5.2, Lagrange dual problem)
  17. Fri, Feb 26

  18. Mon, March 1

  19. Wed, March 3

  20. Fri, March 5

  21. Mon, March 8

    • Finish up KKT equations
      • in the same notes, discuss when you can and cannot drop constraints that were not active
    • Example of using KKT equations to solve a problem in "closed form" (e.g., reduce it to solving a linear system)
    • Activity on using KKT equations (and quiz on Lagrangian and KKT equations)
  22. Wed, March 10 (in person/hybrid)

    • Sensitivity/perturbation analysis, ch 5.6 in Boyd and Vandenberghe
    • 5 min summary of ch 5.9 "generalized inequalities"
  23. Fri, March 12 (in person/hybrid)

    • We're mostly done with Boyd and Vandenberghe
    • Starting algorithms, beginning with proximal gradient descent
  24. Mon, March 15

    • no school (snow day)
  25. Wed, March 17

    • Convergence analysis of gradient descent (non-strongly convex case)
    • Intro to worst-case first-order method result, and Nesterov's accelerated gradient descent (and proximal version, aka FISTA)
  26. Fri, March 19

    • Convergence analysis of gradient descent (strongly convex case, using the PL inequality)
    • Discussion of error metrics and termination criteria
    • Discussion of convergence rates (sub-linear, linear, quadratic)
  27. Mon, March 22

  28. Wed, March 24

  29. Fri, March 26

  30. Mon, Mar 29

  31. Wed, Mar 31

    • trust-regions
    • non-linear least-squares (Gauss-Newton, Levenberg-Marquardt)
  32. Fri, Apr 2

    • constrained optimization overview
    • penalty methods (quadratic and exact)
  33. Mon, Apr 5

    • Augmented Lagrangian; Sequential Quadratic Programming
  34. Wed, Apr 7

    • Interlude: finding gradients, DFO and analytic methods
  35. Fri, Apr 9

  36. Mon, Apr 12

  37. Wed, Apr 14

    • Finish adjoint state method
    • Start analyzing Newton's method via self-concordance, cf. the Boyd & Vandenberghe textbook
  38. Fri, Apr 16

  39. Mon, Apr 19

    • Alternating Minimization and Coordinate Descent
    • ADMM
  40. Wed, Apr 21

  41. Fri, Apr 23

    • Linear programs: simplex method (Nocedal and Wright), and complexity (Nemirovski)
    • didn't get to Integer Linear Programs (R. Vanderbei), so will try to post notes on that
  42. Mon, Apr 26 (projects due, project presentations)

  43. Wed, Apr 28 (project presentations, last day of class)

  44. Fri, Apr 30 (extra presentations; attend 2 of 3 lectures this week)

Fall 2018

Here's what we did in Fall 2018; later semesters are somewhat similar.

  1. Intro
  2. Intro
  3. Convex Sets [BV04 ch 2]
  4. cvx, cvxpy tutorial
  5. Convex sets, PSD cone, separating hyperplanes [BV04 ch2]
  6. Convex functions [BV04 ch 3]
  7. Convex functions, subgradients, commuting addition and subgradients [BC11]
  8. subgradients, more on convex fcn not from [BV04], strong/strict convexity; see handout on Lipschitz bounds
  9. Convex functions, sections 3.1--3.3 in [BV04] (Fenchel-Legendre conjugate functions)
  10. 3.3 in [BV04], convex relaxation f^** [BC11], Fenchel-Young
  11. convex relaxation f^** [BC11], [Luc10]; existence/uniqueness of minimizers [BC11 ch 11]; proximity operator [BC11, CP11]
  12. prox, Moreau envelope [CW05]
  13. prox, Moreau envelope [CW05]; Convex Optimization [BV04 ch 4]; gradient descent
  14. convergence of gradient descent [Van16]; fire drill!
  15. worst-case counter-example for 1st order methods [Nes04]
  16. Convex Optimization [BV04 ch 4.2]
  17. Convex Optimization, LP/QP/SOCP/SDP [BV04 ch 4.2-3, 4.6]
  18. SDP/LMI [BV04 ch 4.6], Duality and Lagrangians [BV04 ch 5.1]
  19. LP duality [Van08], sensible rules for remembering duals (SOB) [Ben95], Slater
  20. Slater and strong duality [BV04 ch5.2], geometric interpretation [BV04 ch5.3]
  21. Game theory [BV04 ch 5.2.5; 5.4.3], saddle point interpretation [BV04 ch 5.4]; some Fenchel-Rockafellar duality
  22. Fenchel-Rockafellar duality [BC11, chs15,18-19]
  23. Optimality and KKT conditions [BV04 ch5.5], perturbation and sensitivity [BV04 ch5.6]
  24. perturbation and sensitivity [BV04 ch5.6], generalized inequalities (eg. SDP) [BV04 ch5.9]
  25. Interlude on proximal [CW05][Van16] and accelerated (Nesterov) gradient methods [Nes04][Van16][BT10]; convergence rates
  26. Fast/fancy algorithms for unconstrained optimization [NW05]: Nesterov, Newton, linear/non-linear conjugate gradient
  27. Fast/fancy algorithms for unconstrained optimization [NW05]: quasi-Newton and L-BFGS
  28. Fast/fancy algorithms for unconstrained optimization [NW05]: inexact/matrix-free Newton (Newton-CG), non-linear least-squares and Gauss-Newton, Levenberg-Marquardt, active-set methods
  29. Constrained optimization [NW05]: penalty methods, augmented Lagrangian
  30. derivatives; some thms from [BC11], [Ber99]
  31. derivatives, automatic differentiation [NW05]
  32. adjoint-state method
  33. Newton's method, self-concordant [BV04 ch 9]
  34. Newton's method, self-concordant [BV04 ch 9]
  35. Newton for equality constraints [BV04 ch 10], interior-point methods [BV04 ch 11]
  36. interior-point methods [BV04 ch 11]
  37. First order methods: proximal pt, subgradient descent, conditional gradient aka Frank-Wolfe
  38. First order methods: ADMM [Boyd10], Douglas-Rachford [BC17], [CP11]
  39. Primal-dual method [Con13][CBS14][CCPV14][ChPo11]
  40. Simplex method [NW05]
  41. Complexity of LP and Simplex [Nem16], smoothed analysis, integer linear programming [Van08]; LP in fixed dimension [DMW97]
  42. STUDENT PRESENTATIONS
  43. STUDENT PRESENTATIONS
  44. STUDENT PRESENTATIONS

References (books)

  • [BV04] S. Boyd and L. Vandenberghe, “Convex Optimization” (Cambridge U. Press, 2004). Free electronic version
  • [NW05] J. Nocedal and S. Wright, “Numerical Optimization” (Springer, 2005). We have free electronic access at SpringerLink
  • [Nem16] A. Nemirovski, Introduction to Linear Optimization, lecture notes
  • [Van08] R. Vanderbei, “Linear Programming.” (2008) Free download at Vanderbei's website and also via SpringerLink.
  • [BC11] H. Bauschke and P. Combettes, “Convex Analysis and Monotone Operator Theory in Hilbert Spaces”, 1st ed, Springer 2011, available electronically via SpringerLink
  • [BC17] H. Bauschke and P. Combettes, “Convex Analysis and Monotone Operator Theory in Hilbert Spaces”, 2nd ed, Springer 2017, available electronically via SpringerLink
  • [Ber99] D. Bertsekas, "Nonlinear Programming", 2nd ed, Athena Scientific, 1999.
  • [Ber16] D. Bertsekas, "Nonlinear Programming", 3rd ed, Athena Scientific, 2016.
  • [Nes04] Y. Nesterov, "Introductory Lectures on Convex Optimization" (2004); Springer
    • A newer edition, "Lectures on Convex Optimization" (Nesterov, 2018), is out, but it rearranges the chapters, so section numbering differs noticeably from the 2004 edition

References (papers)

Resources

We will not follow a single textbook. The following resources (mostly available online) will be the primary references:

  • [BV2004] S. Boyd and L. Vandenberghe, “Convex Optimization” (Cambridge U. Press, 2004). Free electronic version. This is a standard text that gives some convex analysis background and focuses on interior-point methods and engineering applications. The book is very useful, and I recommend you buy a copy. We will follow the book closely at the beginning of the semester. Appendices A and C are also quite useful if you do not have an applied math background.
  • [V_course] L. Vandenberghe’s course notes for ee236b follow the book, while his course notes for ee236c contain more details on proofs and advanced topics.
  • [NW05] J. Nocedal and S. Wright, “Numerical Optimization” (Springer, 2005). We have free electronic access at CU via SpringerLink. This is the go-to reference for implementing a standard method.
  • [Beck17] Amir Beck, “First-Order Methods in Optimization” (SIAM, 2017). A more advanced text than Beck’s 2014 text; free on campus at SIAM eBooks
  • [BC17] H. Bauschke and P. Combettes, “Convex Analysis and Monotone Operator Theory in Hilbert Spaces”, 2nd ed, Springer 2017, available electronically via SpringerLink. A nice collection of results, best as a reference and not as a step-by-step textbook. Sometimes we refer to the 2011 1st edition.

Some supplemental resources:

And a fun miscellaneous reference:

Programming

Homeworks will involve both mathematical analysis and programming.

Students are expected to already know how to program. We encourage using Python and Matlab; Julia is another good choice, though we will not be using it explicitly. For homework assignments, the deliverable is usually the output of some code, so students may choose any reasonable programming language. However, we will be doing demonstrations in Python and/or Matlab (and the instructor is best at debugging Python and Matlab). Most of our demonstrations will use GitHub in conjunction with Python via Colab.

For Matlab, we'll use CVX (though yalmip is also good); for Python, we'll use cvxpy; for Julia, try Convex.jl. These are all good prototyping packages.
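
To give a flavor of the prototyping workflow, here is a minimal cvxpy sketch; the specific problem and data below are made up for illustration (not taken from any assignment). It minimizes a least-squares objective subject to a 1-norm constraint.

```python
# Minimal cvxpy sketch (illustrative only): constrained least squares.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x = cp.Variable(5)                                  # decision variable
objective = cp.Minimize(cp.sum_squares(A @ x - b))  # convex objective
constraints = [cp.norm1(x) <= 1]                    # convex constraint
prob = cp.Problem(objective, constraints)
prob.solve()                                        # lets cvxpy pick an installed solver

print("status:", prob.status)
print("optimal value:", prob.value)
print("x:", x.value)
```

CVX (Matlab) and Convex.jl (Julia) follow essentially the same disciplined-convex-programming modeling style.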

For optimization, Matlab and Python mainly rely on small third-party packages, whereas Julia has more "official"-like packages; most of them use the JuMP framework, including big packages like JuliaSmoothOptimizers / SolverTools.jl and JuliaNLSolvers / Optim.jl. One of the nice things about Julia is its very good automatic differentiation support (see a discussion of Julia AD packages). Other nice things about Julia: a very active and friendly community that is eager to help if you post in the forums; code that executes very rapidly (like C and Fortran); high-level programming like Python and Matlab; and lots of packages, increasing quickly (well behind Python overall, but catching up in terms of specialized numerics packages). It is also easy to call Python libraries from within Julia.

We recommend use of an IDE or something like JupyterLab.

We do not recommend using C, C++, or Fortran for programming in this class: these languages take much longer to prototype in, and the performance is fast only if you do it right (which usually requires calling the right kind of libraries, like NAG or GLPK, or using suites like PETSc, Trilinos, Eigen, etc.).

For automatic differentiation in Matlab, try ADiMat; in Python, try JAX.
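
As a quick, hedged illustration (the objective below is a made-up ridge-regularized least-squares function, not a course problem), `jax.grad` turns a plain function into its gradient, which you can sanity-check against the analytic formula:

```python
# Minimal JAX automatic-differentiation sketch.
import jax
import jax.numpy as jnp

A = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = jnp.array([1.0, 0.0, -1.0])
lam = 0.1

def f(x):
    # 0.5*||Ax - b||^2 + 0.5*lam*||x||^2
    r = A @ x - b
    return 0.5 * jnp.dot(r, r) + 0.5 * lam * jnp.dot(x, x)

grad_f = jax.grad(f)                     # autodiff gradient of f
x0 = jnp.array([0.5, -0.5])
print(grad_f(x0))                        # autodiff result
print(A.T @ (A @ x0 - b) + lam * x0)     # analytic gradient, for comparison
```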

More details on topics

We don't present the lectures in the following order (lectures intermingle topics more, so that students can get started on interesting homeworks right away), but thematically, the following is what we roughly cover:

Section 1: Background, duality, motivation

  1. Intro
  2. Convex Sets (ch 2 in [BV2004])
  3. Convex Functions (ch 3 in [BV2004])
  4. Convex Analysis, subdifferentials ([BauschkeCombettes])
  5. Convex Optim. Problems (ch 4 in [BV2004])
  6. Duality, optimality (ch 5 in [BV2004])
  7. Some applications, ERM and machine learning, sparse recovery, image denoising, low-rank matrix recovery
  8. CVX, CVXPY, convex.jl tutorials

Section 2: Conic programs & relaxations

  1. Linear programming
    • standard LP theory, duality ([WrightIPM])
    • simplex method, complexity, smoothed analysis ([Vanderbei_LP], [Nemirovski_LP])
    • IPM ([WrightIPM], [BV2004])
    • ellipsoid method (TBD, [Nemirovski_LP])
  2. Integer Linear programming ([Vanderbei_LP])
  3. SDP ([BV2004], Pataki?)
  4. QP, convex and non-convex (S-procedure, relaxations) ([Nemirovski_LP],[BN_Modern], [BV2004])
  5. MAXCUT relaxation ([Nemirovski_LP],[BN_Modern])
  6. Polynomial optimization, SOS ([Nemirovski_LP],[BN_Modern])

Section 3: Algorithms

  1. Gradient Descent ([V_course])
  2. Accelerated and projected/proximal gradient descent, optimality ([V_course]); see the proximal-gradient sketch after this list
  3. Classical methods ([NW05], Bertsekas, [Nesterov2004])
    • Subgradient descent
    • Newton's method, self-concordance
    • Non-linear conjugate gradient
    • quasi-Newton
    • Levenberg-Marquardt, Gauss-Newton
    • Augmented Lagrangian
    • Sequential Quadratic Programming (SQP)
    • Step-size (Barzilai-Borwein), line search
  4. Recent methods
    • Douglas-Rachford and ADMM
    • Primal-dual methods (e.g., Condat)
    • Conditional Gradient (aka Frank-Wolfe)
    • (Randomized) Coordinate Descent
    • Stochastic gradient descent
    • State-of-the-art variants (SDCA, SAGA, etc.)
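
To make the proximal-gradient entries above concrete, here is a minimal, hedged Python sketch of ISTA (unaccelerated proximal gradient descent) for the LASSO problem min_x 0.5*||Ax - b||^2 + lam*||x||_1. The data, lam, and iteration count are arbitrary choices for illustration; the step size 1/L uses L = ||A||_2^2, the Lipschitz constant of the smooth part's gradient.

```python
# Sketch of proximal gradient descent (ISTA) for LASSO (illustrative only).
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, num_iters=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad of 0.5*||Ax-b||^2
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                          # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)   # prox step for lam*||.||_1
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))
    x_true = np.zeros(100); x_true[:5] = 1.0              # sparse ground truth
    b = A @ x_true + 0.01 * rng.standard_normal(50)
    x_hat = ista(A, b, lam=0.1)
    print("nonzeros recovered:", np.count_nonzero(np.abs(x_hat) > 1e-3))
```

Adding Nesterov momentum to the same iteration gives FISTA, the accelerated variant mentioned in the lecture schedule.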

Possible additional topics

  1. Benders decomposition
  2. Geometric Programming, linear-fractional
  3. Online Convex Optimization
  4. Mirror Descent
  5. Bregman distances
  6. Robust optimization
  7. Randomized Kaczmarz method; Algebraic Reconstruction Technique (ART) for tomography
  8. POCS (projection onto convex sets), best approximation, Dykstra’s algorithm