[docs] Highlight update! API more to attract DL researchers #2104

Open
MilesCranmer opened this issue Nov 9, 2022 · 13 comments · May be fixed by #2108

Comments

@MilesCranmer

MilesCranmer commented Nov 9, 2022

I think the update! API should be presented up-front, in addition to, or instead of, the Flux.train! API. This will help significantly with attracting deep learning researchers, who I see as the bridge to wider adoption.


Motivation. I first encountered Flux.jl maybe ~1.5 years ago. At the time, I skimmed the docs, saw the Flux.train! API on the README.md quickstart, and wrote off the entire package as another one of those super high-level deep learning libraries: easy to write things in the high-level API, but nearly impossible to tweak the internals. (Many others out there might make the same quick first-impressions evaluation, even though a package maintainer's dream is that every user reads all the docs.)

Today, I decided to take another look through the docs in more detail: I wanted to find something equivalent to what the PyTorch and JAX deep learning frameworks offer, namely the ability to work directly with gradient updates and parameters. (This is important for many areas of deep learning research, as I am sure you know!)

I found the update! API (and withgradient) only after a lot of digging through the docs. I am really happy with this API, as it gives me the low-level control over my deep learning models that I need for my research! So now I am actually planning to use Flux for research.

Conclusion. It took me two passes at the docs, the second one very deep, before I actually found this API. Even after I found it, I only found the API reference for update!, rather than an easy-to-find example I could copy and start working with. This user experience is something that might lose potential users.

Proposal. Therefore, I propose that the update! API be demonstrated in the quick start example: both on the README, and up front in the documentation. I think this is really key to attract deep learning researchers as users, as the most popular deep learning packages by default expose this slightly lower-level API. It needs to be extremely obvious that one can do a similar thing with Flux.jl!
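
For contrast, the high-level path being discussed here boils down to a single train! call. A minimal sketch of the implicit-params style (not code from this issue; the loss definition and data shapes are chosen just for illustration):

using Flux

model = Chain(Dense(5 => 32, relu), Dense(32 => 1))
loss(x, y) = Flux.Losses.mse(model(x), y)               # closes over `model`
data = [(rand(Float32, 5, 16), rand(Float32, 1, 16))]   # a single (x, y) batch
opt = Flux.Adam(1e-3)

# Forward pass, gradient, and parameter update all happen inside train!:
Flux.train!(loss, Flux.params(model), data, opt)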

Here's an example I propose, which is similar to the style of PyTorch training loops (and so is a great way to convert some PyTorch users!):

using Flux
import Flux: withgradient, update!

# Chain of dense layers with relu activations:
mlp = Chain(
    Dense(5 => 128, relu),
    Dense(128 => 128, relu),
    Dense(128 => 128, relu),
    Dense(128 => 128, relu),
    Dense(128 => 1),
)

# Set up the optimizer:
p = params(mlp)
opt = Adam(1e-3)
n_steps = 10_000

for i in 1:n_steps
    # Batch of example data:
    X = rand(5, 100) .* 10 .- 5
    y = cos.(X[[3], :] * 1.5) .- 0.2

    # Compute gradient of the following code
    # with respect to parameters:
    loss, grad = withgradient(p) do
        # Forward pass:
        y_pred = mlp(X)

        # Square error loss
        sum((y_pred .- y) .^ 2)
    end

    # Step:
    update!(opt, p, grad)

    # Logging:
    println(loss)
end
@MilesCranmer
Author

My current take on the README example is: "Here is a bunch of complex things we can do with very little code," but it is:

  1. Intimidating, as it uses the entire range of Julia syntax tricks (even as a Julia developer, it is hard to parse everything going on!).
  2. Hard to follow in terms of overall logical flow.
  3. Hard to adapt to any project I work on, since it uses the high-level API.

I think the quickstart example should be:

  1. Very straightforward in terms of API and syntax. It should be helpful to new users, not a code-golf submission.
  2. A template the user can copy and start modifying for their problem.

I think, for these reasons, it would be really nice if the example were simple and used the update! syntax. With an example like that, it is much easier to modify it for a wide range of problems.

@ToucheSir
Member

The example was added in #2067. I'm personally in favour of removing train! from the docs wherever possible, but since this was added so recently I think a bit more discussion is required.

@MilesCranmer
Author

MilesCranmer commented Nov 9, 2022

I see, thanks! I would also change this example, https://fluxml.ai/Flux.jl/stable/models/quickstart/, to include update!, and perhaps also to avoid the DataLoader. Passing a dedicated DataLoader to a dedicated train! function that does some internal stuff makes me think it's a rigid package. It would be nice if the example demonstrated seamless integration with regular Julia code, to show that no, Flux is actually super flexible in how you train the model and pass data. (A DataLoader is something I would look up once I need it, but for the quickstart, I think it might give the wrong impression.)


Edit: actually, maybe it's okay to use a DataLoader in the quickstart example, so long as the looping is explicit.
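
To make "explicit looping" concrete, here is a minimal sketch of iterating a DataLoader by hand (illustrative shapes only, not code from this thread):

using Flux

X = rand(Float32, 2, 1_000)   # features × samples
Y = rand(Float32, 1, 1_000)
loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true)

for (x, y) in loader
    # Each x is 2×64 and each y is 1×64 (the last batch may be smaller);
    # gradients and update! go here, with no call to train! needed.
end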

@MilesCranmer
Author

(I think I confused the quickstart and readme pages from when I first checked out this package… I do remember seeing a train! example and getting scared off though)

@darsnack
Member

darsnack commented Nov 9, 2022

FWIW, most of the current maintainers do not like train! or the current difficulty of finding information in the docs. We discussed both topics in the most recent ML call, and we drafted #2105 as a template for overhauling the docs. I think the changes here should get reflected in that template. (I am too busy this week to polish up the template, but I will try this weekend.)

@ToucheSir
Member

ToucheSir commented Nov 9, 2022

Worth adding here that we have this ML call every other week and it's open to anyone, so if you're interested in talking about docs work or anything else feel free to drop in :)

@MilesCranmer
Author

Awesome, thanks for sharing this update! I think that is a great initiative, and it would be well-appreciated by the community!

@mcabbott
Member

mcabbott commented Nov 10, 2022

Welcome, and glad you persisted!

I made these examples recently. The goals I suppose were:

  • Have something you can copy & run off the readme; previously there was nothing. It must solve some vaguely neural-net problem. It must be short, just a taste, and not try to grow into yet another competing intro path. (I guess a few non-Flux lines are a bit golfed, but they do produce straightforward output when pasted into the REPL, I think?)
  • Have a one-page quickstart in the docs which shows you the major features, aimed at people who have seen some of this elsewhere. The main docs start very slowly, introducing concepts one by one, which requires a lot of reading just to learn what the different things are called. And the model zoo tends to have a lot of auxiliary stuff, reading args & loading data & so on, outside of core Flux.

Both can surely be better. Want to have a go tweaking the quickstart example to avoid train!?

I would vote to keep it with implicit params etc for now. Partly so that the to-be-written "how to upgrade from implicit to explicit" guide can clearly point to the before & after versions.
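
For readers new to that distinction, the two styles differ roughly like this for a single gradient step (a hedged sketch, not code from the thread; the explicit version goes through Optimisers.jl, and the exact migration path is what the planned guide would cover):

using Flux, Optimisers

model = Dense(2 => 1)
x, y = rand(Float32, 2, 8), rand(Float32, 1, 8)

# Implicit style: gradients are keyed by a Params collection.
ps = Flux.params(model)
opt = Flux.Adam(1e-3)
gs = Flux.gradient(() -> Flux.Losses.mse(model(x), y), ps)
Flux.update!(opt, ps, gs)

# Explicit style: gradients are taken with respect to the model itself.
state = Optimisers.setup(Optimisers.Adam(1f-3), model)
gs = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)
state, model = Optimisers.update!(state, model, gs[1])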

I would also vote for it to generate data outside the loop, as this is a bit more realistic. Demonstrating that DataLoader is something which takes & gives matrices also seemed like a good idea.

I think it's important that it not just push random numbers through, but solve some problem, however simple. (When I run the loop above, the loss doesn't decline, and there's nothing I can plot afterwards.)

@MilesCranmer
Author

MilesCranmer commented Nov 10, 2022

Here's a tweaked README example. As a longtime PyTorch and JAX user, the following syntax feels very intuitive to me; I feel like I could understand it even while being new to Julia. It's not intimidating, and it would make it easy for me to start tweaking the various steps and adapting it to my own use case:

using Flux

# We wish to learn this function:
f(x) = cos(x[1] * 5) - 0.2 * x[2]

# Generate dataset:
n = 10000
X = rand(2, n)  # In Julia, the batch axis is last!
Y = [f(X[:, i]) for i=1:n]
Y = reshape(Y, 1, n)

# Move to GPU
X = gpu(X)
Y = gpu(Y)

# Create dataloader
loader = Flux.DataLoader((X, Y), batchsize=64, shuffle=true)

# Create a simple fully-connected network (multi-layer perceptron):
n_in = 2
n_out = 1
model = Chain(
    Dense(n_in => 32, relu),
    Dense(32 => 32, relu),
    Dense(32 => 32, relu),
    Dense(32 => n_out),
)
model = gpu(model)

# Create our optimizer:
optim = Adam(1e-3)
p = Flux.params(model)

# Let's train for 10 epochs:
for i in 1:10
    losses = []
    for (x, y) in loader
    
        # Compute gradient of the following code
        # with respect to parameters:
        loss, grad = Flux.withgradient(p) do
            # Forward pass:
            y_pred = model(x)
    
            # Square error loss
            sum((y_pred .- y) .^ 2)
        end
    
        # Step with this gradient:
        Flux.update!(optim, p, grad)

        # Logging:
        push!(losses, loss)
    end
    println(sum(losses)/length(losses))
end

And we can visualize our predictions below:

using Plots

# Generate test dataset:
Xtest = rand(2, 100)
Ytest = mapslices(f, Xtest; dims=1)  # Alternative syntax to apply the function `f`

# View the predictions:
Ypredicted = cpu(model(gpu(Xtest)))  # move the test batch to the device, bring predictions back
scatter(Ytest[1, :], Ypredicted[1, :], xlabel="true", ylabel="predicted")

MilesCranmer linked a pull request Nov 10, 2022 that will close this issue
@MilesCranmer
Author

PR in #2108

@mcabbott
Member

mcabbott commented Nov 10, 2022

I think this loop is exactly what we want in the quickstart:

for i in 1:10
    losses = []
    for (x, y) in loader

But I do not think the readme example should be as long as the quickstart one. We already have a problem with there being too many entry points, and I would like anyone reading 30 lines of code to already be on a page of the docs (not the website tutorials, and not the readme).

More later.

@MilesCranmer
Author

I think it is generally good to keep the quickstart like a mini-tutorial, while still being general enough that users can think about how to modify it for their use-cases. So, in retrospect, I have changed my mind and now agree with you that the DataLoader is good to include!

I think many ML practitioners have very short attention spans: people will literally copy the quickstart example, try to hack it for their use-case using only trial and error, never once read the docs, and quit if they can't figure it out. But once you "hook" them and they get something working for their use-case, they will be much more likely to search the docs pages to do something specific.

@MilesCranmer
Author

But I do not think the readme example should be as long as the quickstart one.

I think the first code example a user sees is the one they will assume to be the quickstart.

So, if the goal is to move them to the docs pages quickly, perhaps I would just remove the code example from the README altogether. (When I was trying Flux.jl yesterday, the README example acted as my quickstart tutorial; I didn't even look at the quickstart page at first.)
