
Migration from Pylons to Flask

Cody Boyko edited this page Nov 30, 2018 · 62 revisions

Note: this page is a work in progress and will be updated as the migration progresses.

Please provide any feedback on this issue: https://github.com/ckan/ckan/issues/3119


Changelog

  • March 2016
    • First version, Rationale, FAQ
  • June 2016
    • Updated the whole document
    • Added Implementation notes
    • Added controllers and extensions lists
  • September 2016
    • Updated implementation sections
    • Added PR overview
    • Updated extensions list
    • Added link to Roadmap page

Rationale

CKAN development started around seven years ago. At that time Pylons was one of the most advanced web frameworks out there so it was chosen as the basis for CKAN and its ecosystem of extensions. As time has gone by though, Pylons has lost user base. In fact Pylons has been deprecated in favour of a new framework called Pyramid. For a number of historical reasons, the version of Pylons that CKAN uses (0.9.7) is not the last supported one (1.0).

All this means that CKAN is stuck with a framework that does not have traction among the Python community and with old versions of libraries which may contain bugs or vulnerabilities. We are now at a point where this is actually preventing CKAN evolution and adds a significant amount of technical debt.

The alternative chosen by the CKAN tech team is Flask, a modern, lightweight web framework with a wide user base and a rich ecosystem of plugins. Migrating to Flask will not only bring the CKAN code base more up to speed with modern Python libraries but also make contributions from external developers much more likely.

FAQ

Why Flask?

As big as CKAN's code base is, the actual features needed from an external web framework are not many. A significant amount of code is custom and specific to CKAN needs. This fits well with Flask's lightweight microframework approach: we can use only the low-level features we need for handling web requests, like middleware, routing, etc. It also makes it more likely that the migration can be done in smaller steps without breaking existing functionality.

The main reason though is that Flask is widely known and used across the Python community, has a low barrier to entry and has a rich ecosystem of extensions, which will hopefully encourage more contributions.

Why not Pyramid?

Even though migrating to Pyramid from Pylons seems like a natural move, as mentioned before we are actually not using a great deal of Pylons features and we felt that Pyramid didn't have the same level of usage and contributions.

Why not Django?

Django is massively popular but the task of migrating the existing code base and extension to Django concepts and tools would have been a much larger undertaking.

What does this mean for my CKAN instance and/or extension?

It is early days, but we aim to keep as much backwards compatibility as possible, certainly in terms of the API, both externally and internally. In terms of extensions, there's likely to be some work involved in upgrading, especially if you are registering custom routes and controllers (see Extensions and Plugin interfaces). We will provide comprehensive guides for extension developers to help them upgrade or support different CKAN versions. Definitely don't hold back from writing an extension right now.

Approach and roadmap

Update: September 2016 Status and Roadmap

Pylons is too embedded in the CKAN source code to attempt a complete rewrite in one go. Maintaining compatibility with existing sites and extensions also needs to be taken into account. Instead, the suggested approach is a step-by-step transition to a Flask-only application, using both libraries side by side and focusing on the more important or manageable tasks first. Crucially, a first stage of auditing all uses of Pylons in the CKAN code base would be performed to identify potential issues and critical points.

The process that we will follow is roughly the following one:

  1. [DONE] Audit of Pylons use in the CKAN code base, including (See https://github.com/ckan/ckan/wiki/Pylons-imports):
    • code occurrences
    • potential replacements or refactorings needed
    • plugins toolkit objects that need to be migrated
    • plugin interfaces that may be affected by the change
  2. [DONE] Proofs of concept for low level "foundation" work:
    • 2a: Create Flask middleware with Flask-specific logic that can run side by side with the Pylons one (See #2845 and https://github.com/ckan/ckan/tree/2845-wsgi-dispatcher)
    • 2b: Make sure that both apps can access common shared objects and functions like config, c, etc., for instance by providing shims or wrappers (See eg #3163)
    • 2c: Add deprecation messages that warn against the direct import of Pylons objects from extensions. Remove direct imports from Pylons in the CKAN code
  3. Move selected controllers to use the Flask stack
    • 3a: [DONE] Start with the API controller
    • 3b: Reference frontend controller that deals specifically with template rendering and helpers.
    • 3c: Start moving other controllers
  4. ...
  5. (Future) Remove Pylons requirement entirely, drop support for Pylons based controllers, etc

Implementation / Things to consider

Overview

A global overview of the branches / pull requests with the actual implementations can be found here: https://github.com/ckan/ckan/issues/3196

| Feature | Status |
|---|---|
| Redirects | Merged #3194 |
| Debug and testing settings for Flask | Merged #3206 |
| Common request object | Merged #3197 |
| Common global object (g/c) | Merged #3203 |
| Blueprints registration | Merged #3207 |
| Common session object | Merged #3208 |
| Common logic before and after requests | Merged #3212 |
| I18n | Merged #3213 |
| url_for | Merged #3228 |
| API Blueprint | Merged #3239 |

App Dispatcher Middleware

Although a complete rewrite would be cleaner, we decided to go with a more flexible approach, where both the Pylons app and the Flask app live side by side, each with its own middleware stack, sharing as much common middleware as possible, and each with its own controllers/views, which handle the incoming requests. The idea is that any other lower level code, like templates, models, etc., is shared between both as much as possible.

To allow this, there is a new top level middleware named AskAppDispatcherMiddleware, implemented using WSGIParty. This wraps the two CKAN apps, the Flask one and the Pylons one. On each request it asks both applications if they can handle it, and the apps answer yes or no and whether core or an extension can handle it. The request is then forwarded to the relevant app following this order of precedence:

Flask Extension > Pylons Extension > Flask Core > Pylons Core
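
The precedence rule can be illustrated with a small, self-contained sketch. It does not use WSGIParty or the real AskAppDispatcherMiddleware: StubApp, SketchDispatcher, can_handle() and the example routes are hypothetical names invented here, and the fallback to Pylons when no app claims the request is an assumption. Only the ordering of the answers reflects the rule above.

```python
# Sketch only: plain WSGI callables stand in for the Flask and Pylons apps.
# All class and method names here are hypothetical.

# (app name, origin) pairs, highest priority first
PRECEDENCE = [
    ('flask_app', 'extension'),
    ('pylons_app', 'extension'),
    ('flask_app', 'core'),
    ('pylons_app', 'core'),
]


class StubApp(object):
    """Minimal stand-in for one of the two CKAN apps."""

    def __init__(self, name, routes):
        self.name = name
        # routes maps a path to its origin: 'core' or 'extension'
        self.routes = routes

    def can_handle(self, environ):
        """Answer (handles, origin) for the requested path."""
        origin = self.routes.get(environ.get('PATH_INFO', ''))
        return (origin is not None, origin)

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [self.name.encode('utf-8')]


class SketchDispatcher(object):
    """Ask every app whether it can handle the request, then pick one."""

    def __init__(self, apps):
        self.apps = apps  # dict: app name -> app

    def choose(self, environ):
        answers = set()
        for name, app in self.apps.items():
            handles, origin = app.can_handle(environ)
            if handles:
                answers.add((name, origin))
        for key in PRECEDENCE:
            if key in answers:
                return key[0]
        # Assumption: if nobody claims the request, let Pylons deal with it
        return 'pylons_app'

    def __call__(self, environ, start_response):
        return self.apps[self.choose(environ)](environ, start_response)
```

With these stubs, a path registered by a Flask extension and by Pylons core is served by Flask, while a path registered by Flask core and by a Pylons extension is served by Pylons.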

See the How it Works section on #2905 for more details on how it actually works.

Implementation

See #2905

  • Write top level middleware to dispatch requests to the relevant app (Master)
  • Identify the Flask routes registered from extensions as such (see Routing)

App Factories

The main entry point for the CKAN app is ckan.config.middleware.make_app. This calls the app factories for Flask and Pylons (make_flask_stack and make_pylons_stack respectively) and passes their outputs to a new instance of AskAppDispatcherMiddleware.

These app factory functions are where we apply all relevant configuration and middleware to each app.

On the main POC branch we have refactored the middleware.py file and split it into separate modules, as it was becoming too big:

ckan/ckan/config/middleware
├── __init__.py             # Main entry point (make_app) and AskAppDispatcherMiddleware
├── common_middleware.py    # Middleware classes used on both stacks (eg I18nMiddleware)
├── flask_app.py            # Flask app factory and Flask specific stuff
└── pylons_app.py           # Pylons app factory and Pylons specific stuff

Implementation

See #3116

  • Refactor middleware.py into separate modules (POC branch)
  • Check that the repoze.who middleware is working fine (setting REMOTE_USER on the environ)
  • Check what middleware needs to be applied to the Flask stack or be deprecated: TrackingMiddleware, Fanstatic, Beaker's CacheMiddleware, PageCacheMiddleware
  • Check ErrorHandler and StatusCodeRedirect (these seem Pylons specific so we probably don't need them)
  • Check IMiddleware. Does it work as is on Flask or do we need to change the interface?
  • See if the ckan.use_pylons_response_cleanup_middleware logic is relevant to Flask

Flask Views, Blueprints and Routing

On Pylons, incoming requests are routed to a specific action of a controller class. These classes live in ckan/controllers, and there is a centralized registry of routes in ckan/config/routing.py. The traditional Flask approach is to have individual view functions with the routing defined as a decorator of these functions or via app.add_url_rule(). There are alternatives that structure views as classes, such as Flask's own pluggable views or third-party extensions like Flask-Classy. We decided not to use these options at this first stage, instead focussing on the simplest approach and refactoring later if necessary. The new views (blueprints, see below) will live at ckan/views.

The general approach for migrating controllers will be that each controller will be a Flask Blueprint. It makes sense as controllers have their own endpoint (/dataset, /user, etc) and are quite big in themselves (essentially mini-apps).

Right now the API controller has been partially migrated to a blueprint, and it follows this basic structure:

# Imports and config vars
...

# Blueprint definition

api = Blueprint('api', __name__, url_prefix='/api')

# Private methods

def _finish():
    ...

# View functions

def action(logic_function, ver=API_DEFAULT_VERSION):
    ...

def get_api(ver=1):
    ...

# Routing

api.add_url_rule('/', view_func=get_api, strict_slashes=False)
api.add_url_rule('/action/<logic_function>', methods=['GET', 'POST'],
                 view_func=action)
api.add_url_rule('/<int(min=3, max={0}):ver>/action/<logic_function>'.format(
                 API_MAX_VERSION),
                 methods=['GET', 'POST'],
                 view_func=action)

Current Pylons controllers share common logic that is executed before and after each request. This is done on the __before__, __call__ and __after__ methods of the ckan.lib.base.BaseController class, from which all controllers are extended. This includes identifying the user, handling i18n, setting CORS headers, etc. All this logic will be moved to ckan/views/__init__.py and reused by the Pylons BaseController and Flask's before_request and after_request handlers.

Blueprints are currently registered manually in make_flask_stack; this will be automated later on. Extensions can register their own blueprints by implementing a new IBlueprint interface. This essentially replaces the most common case of plugins implementing IRoutes and having a custom controller extending toolkit.BaseController. In a nutshell, this:

import ckan.plugins as p


class MyPlugin(p.SingletonPlugin):

    p.implements(p.IRoutes, inherit=True)

    def before_map(self, m):
        # New route to custom action
        m.connect(
            '/foo',
            controller='ckanext.my_ext.plugin:MyController',
            action='custom_action')

        # Overriding a core route
        m.connect(
            '/group',
            controller='ckanext.my_ext.plugin:MyController',
            action='custom_group_index')
        return m


class MyController(p.toolkit.BaseController):

    def custom_action(self):
        # ...
        pass

    def custom_group_index(self):
        # ...
        pass

will become this:

from flask import Blueprint
import ckan.plugins as p


def custom_action():
    # ...
    pass


def custom_group_index():
    # ...
    pass


class MyPlugin(p.SingletonPlugin):

    p.implements(p.IBlueprint)

    def get_blueprint(self):
        blueprint = Blueprint('foo', self.__module__)
        # Each rule is (URL rule, endpoint name, view function)
        rules = [
            ('/foo', 'custom_action', custom_action),
            ('/group', 'group_index', custom_group_index),
        ]
        for rule in rules:
            blueprint.add_url_rule(*rule)

        return blueprint

or this (whatever way the extension developer prefers, as long as get_blueprint returns a Blueprint object):

from flask import Blueprint
import ckan.plugins as p

foo = Blueprint('foo', __name__)


@foo.route('/foo', endpoint='custom_action')
def custom_action():
    # ...
    pass


@foo.route('/group', endpoint='group_index')
def custom_group_index():
    # ...
    pass


class MyPlugin(p.SingletonPlugin):

    p.implements(p.IBlueprint)

    def get_blueprint(self):
        return foo

Note that extensions no longer need to inherit from BaseController as the core before_request and after_request handlers work on all requests regardless of where they were registered. Extensions can also add their own request handlers on top of that.

The order of precedence enforced for routes defined in extensions is the following:

Flask Extension > Pylons Extension > Flask Core > Pylons Core

Of course extensions that want to support both approaches (ie CKAN versions pre and post Flask) can check the CKAN version and use one interface or the other as needed.
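
As a rough illustration of that version check, the sketch below decides which routing interface an extension would implement from a CKAN version string. It uses plain string parsing so that it runs on its own; version_tuple(), routing_interface() and FIRST_FLASK_VERSION are hypothetical names, and the threshold value is an assumption that should be replaced with the first release that actually ships IBlueprint.

```python
import re

# Assumption: placeholder for the first CKAN release that provides IBlueprint
FIRST_FLASK_VERSION = (2, 7)


def version_tuple(version):
    """Turn a version string such as '2.7.0a' into a tuple like (2, 7, 0)."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def routing_interface(ckan_version):
    """Name of the interface a dual-support extension would implement."""
    if version_tuple(ckan_version) >= FIRST_FLASK_VERSION:
        return 'IBlueprint'
    return 'IRoutes'
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating '2.10' as older than '2.7'.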

Implementation

See #3239 and #3207

  • General structure for Flask views and example API one (not fully migrated)
  • Settle on an approach for defining routes on core (.add_url_rule() vs decorator)
  • IBlueprint interface to allow registration from extensions
  • Identify the Flask routes registered from extensions as such
  • Abstract logic in BaseController and use it on the before_request and after_request handlers (POC branch)
  • Automatic registration of blueprints in ckan/views
  • Set strict_slashes=False globally (Flask defaults to redirect /foo to /foo/ and the current CKAN behaviour is the opposite)

URL Generation

While the transition to Flask is ongoing we will have routes and controllers which are served by Pylons and routes and views which are served by Flask. We need to ensure that URLs can be generated in the same way regardless of the context of the request. Although similar, the Pylons (or Routes) url_for and the Flask url_for don't accept the same parameters. There are different syntaxes supported on Pylons but basically it works like this:

    # Pylons
    url = url_for(controller='api', action='action', ver=3, qualified=True)

    # Flask
    url = url_for('api.action', ver=3, _external=True)

To complicate things a bit more, Flask insists on being under an Application Context when generating URLs. The approach we've followed in our POC branches is the following:

  1. All imports for url_for point to our wrapper on ckan/lib/helpers.py.

  2. This url_for in helpers has been modified to support both Flask and Pylons parameters. Regardless of what parameters are passed, first we try the Flask url generation, and if it doesn't work we fall back to the Pylons one.

  3. In order for this to work in the context of both applications, some changes are made at the top of the middleware stack just before dispatching the request to the relevant app. Essentially, if Flask is serving the request we create a routes_request_config object and attach the Routes mapper to it, and if it is a Pylons request, we wrap it with flask_app.test_request_context(environ_overrides=environ). This allows both routers to work even in the context of a request being served by the other app.
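
The "try Flask first, then fall back to Pylons" flow from step 2 can be sketched as follows. This is not the code in ckan/lib/helpers.py: the two generator functions, the BuildError class and the route tables are hypothetical stand-ins for the Flask and Routes machinery, and the example URLs are made up. Only the order of the two attempts mirrors the description above.

```python
# Sketch only: hypothetical stand-ins for the Flask and Routes URL generators.

FLASK_ROUTES = {'api.action': '/api/{ver}/action'}
PYLONS_ROUTES = {('feed', 'group'): '/feeds/group/{id}.atom'}


class BuildError(Exception):
    """Raised by the stand-in Flask generator when no route matches."""


def _flask_url_for(endpoint, **values):
    try:
        return FLASK_ROUTES[endpoint].format(**values)
    except KeyError:
        raise BuildError(endpoint)


def _pylons_url_for(controller, action, **values):
    return PYLONS_ROUTES[(controller, action)].format(**values)


def url_for(*args, **kwargs):
    """Accept Flask-style or Pylons-style parameters and try Flask first."""
    controller = kwargs.pop('controller', None)
    action = kwargs.pop('action', None)
    if args:
        # Flask style: url_for('api.action', ver=3)
        endpoint = args[0]
    else:
        # Pylons style: url_for(controller='api', action='action', ver=3)
        endpoint = '{0}.{1}'.format(controller, action)
    try:
        return _flask_url_for(endpoint, **kwargs)
    except BuildError:
        if controller is None or action is None:
            raise
        # No Flask route matched, so fall back to the Pylons generator
        return _pylons_url_for(controller, action, **kwargs)
```

In this sketch a Pylons-style call for a route that has already moved to Flask still resolves, because the controller and action are combined into a Flask-style endpoint name before the first attempt.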

These changes allow most of the existing code in core and extensions to work unchanged, except for the tests.

Tests are going to need updating in order to use the combined url_for function. Essentially, as we are outside the context of a web request, whenever we call url_for directly in the tests (or a function that triggers url_for later on) Flask will raise an exception:

RuntimeError: Attempted to generate a URL without the application context being pushed. This has to be executed when application context is available.

These calls need to be wrapped in the test_request_context context manager:

    def test_atom_feed_page_negative_gives_error(self):
        group = factories.Group()

        app = self._get_test_app()
        with app.flask_app.test_request_context():
            offset = url_for(controller='feed', action='group',
                             id=group['name']) + '?page=-2'
        res = app.get(offset, status=400)
        assert '"page" parameter must be a positive integer' in res, res

The example above is quite straightforward, and it is not difficult to see that the code wrapped in the context manager might need a request. In some cases though, it's a bit more obscure:

    def test_create_datastore_only_view(self):
        # ...
        # datastore_create will call url_for internally (or trigger
        # something that calls it) so we need a Flask test context
        with self.app.flask_app.test_request_context():
            result = helpers.call_action('datastore_create', **data)

    def test_as_dict(self):
        # Internally as_dict calls url_for so we need a test context
        app = helpers._get_test_app()
        with app.flask_app.test_request_context():

            pkg = model.Package.by_name(self.name)
            out = pkg.as_dict()

Both these cases have nothing to do with URL generation or even functional tests, but internally they call url_for at some point. This might be confusing at first, but given that the Flask exception is quite recognisable, and assuming we document how to get around it, I think it's a fair compromise.

We decided not to write a clever wrapper / hack that allowed us to call url_for directly in the tests, first of all because it wasn't easy to cover all cases in our tests (you need to have a reference to the app being used in the test and you don't want to mix request contexts), but also because we think it's worth moving the tests closer to the Flask conventions.

Implementation

See #3238

  • Support for Flask routes and for both syntaxes on our own url_for (POC branch)
  • Changes in middleware to allow the app not being used to generate URLs (POC branch)
  • Update existing core tests to use the Flask test request context when necessary (POC branch)
  • Check Pylons named routes. Can we support them in the Flask router? Should we?
  • Deprecation message if you are calling a url_for for a Flask route using Pylons style params
  • Provide guidance on how to update existing tests in extensions

Common objects and functions

Although requests will be routed to either a Flask view or a Pylons controller there are still many other parts of the code that will be executed (actions, auth, helpers, etc). These currently use Pylons specific objects like config, c, etc. but they need to work also on Flask, ideally without having to rewrite too much code.

Each of these objects or functions has its own peculiarities (see sections below), but in general the pattern followed has been:

  • Centralize all core imports for the object or function to ckan/common.py, and for extensions to the plugins toolkit (which points to ckan.common)
  • Provide a shim that forwards to the relevant Pylons or Flask object depending on which one is handling the request. To implement these we will use Werkzeug's Context Locals, which provide thread-local safe implementations.

c object

The Pylons c (or tmpl_context) object was originally meant to pass variables to templates, although it is used all over the place in CKAN, for instance to store the current logged in user (c.user for user name and c.userobj for the model object).

The c object should not be used at all in new code, unless it is needed to support backwards compatibility. If variables need to be passed to templates, that should be done explicitly via extra vars (see Template rendering), and if a variable needs to be made accessible globally, the flask.g object should be used.

The c shim provided in ckan.common forwards to Flask's g or Pylons c depending on the handling app.

Implementation

See #3203

  • Provide a shim for c in ckan.common
  • Make sure that c is still available (as an alias of g) to templates rendered in Flask

g object

Pylons has another global object, g (or app_globals), which is conceptually the closest to Flask's g (as the name suggests :) ). In CKAN, it is used essentially to make configuration options available to the templates, and in some other places. The latter need to be replaced with accesses to the config.

Currently the g object is set up on ckan.lib.app_globals and linked to Pylons in ckan.config.environment. We won't be removing or refactoring this logic, but in order to support current usages of g in templates in Flask we use a custom app_ctx_globals_class class in Flask that falls back to CKAN's app_globals if a property was not found on flask.g.
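
The fallback behaviour can be sketched with a class whose __getattr__ only runs when normal attribute lookup fails. AppGlobals, FallbackGlobals and the example values are hypothetical; in the real code a class along these lines would be assigned as the Flask app's app_ctx_globals_class.

```python
class AppGlobals(object):
    """Stand-in for CKAN's app_globals, with made-up values."""
    site_title = 'CKAN'
    main_css = '/base/css/main.css'


app_globals = AppGlobals()


class FallbackGlobals(object):
    """Globals object that falls back to app_globals for unknown names."""

    def __getattr__(self, name):
        # Only called when the attribute is not set on this object,
        # so values stored on g itself always take precedence
        return getattr(app_globals, name)
```

A template reading g.main_css would then get the value from app_globals unless something had explicitly set main_css on g during the request.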

Implementation

See #3203

  • Custom app context global class in Flask that falls back to app_globals
  • Replace g usages outside templates with config

Request object

The request object is very often used in views and controllers. In most cases it is only used in the view or controller itself, meaning that we can safely use it in the context of the relevant app, but in some cases the request object is called from common code, like the template render function. The Flask and Pylons request objects share some very similar properties (eg request.environ, which is a dict in both cases), but in other cases the names change (eg params vs args). For some of these it is worth adding a special case property to the shim object in ckan.common.
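
One way such a special case property could look is sketched below. The request classes are hypothetical stand-ins that only carry the attributes discussed here, and RequestShim is an invented name. The sketch simply exposes the Flask-style args under the Pylons name params and forwards everything else unchanged.

```python
class FlaskStyleRequest(object):
    """Hypothetical stand-in: query parameters live in .args."""

    def __init__(self, args, environ):
        self.args = args
        self.environ = environ


class PylonsStyleRequest(object):
    """Hypothetical stand-in: query parameters live in .params."""

    def __init__(self, params, environ):
        self.params = params
        self.environ = environ


class RequestShim(object):
    """Expose a common interface on top of either request object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    @property
    def params(self):
        # Special case: use the Pylons name if present, else the Flask one
        if hasattr(self._wrapped, 'params'):
            return self._wrapped.params
        return self._wrapped.args

    def __getattr__(self, name):
        # Everything else (eg environ) is forwarded as is
        return getattr(self._wrapped, name)
```

Common code can then read request.params without checking which app is serving the request.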

Implementation

See #3197

  • Provide a shim for request in ckan.common
  • Provide support for request.params in Flask

Response object

Contrary to Pylons, Flask does not have a global response object. If a view needs to modify the default response, eg by adding headers, we need to create a new instance of a response object. If pylons.response is being used in other parts of the code where it doesn't make sense to return a response code (eg in ckan.lib.base.render) we will need to refactor it to have a different behaviour depending on whether Flask or Pylons is serving the request.

Sessions

Sessions for the Pylons and Flask apps are both managed by Beaker. We have a proxy session object in common.py to use either pylons.session or flask.session, depending on the context of the request. Either way, the session cookie is the same and the same session is retrieved by both apps. Currently, the session is only used in CKAN core for flash messaging support. The POC branch provides some tests to show that messages can be added in a Pylons action and retrieved by a Flask view (and vice versa).

Implementation

See #3208

  • Proxy object for session in ckan.common

Config

Right now the config object is imported directly from Pylons almost everywhere, both in CKAN core and extensions (especially since for some mysterious reason config is not part of the toolkit). The first step should then be to provide a wrapper on ckan.common for it. Both Pylons and Flask configuration objects are dict-like objects with a few custom methods, and they both add internal keys to the configuration object, eg things like pylons.config['pylons.app_globals'] in Pylons or app.config['PROPAGATE_EXCEPTIONS'] in Flask.

A simple shim like the ones mentioned before should work well, but we need to consider how to initialize it on the Flask side. Right now for Pylons this is done on environment.py based on the values from the ini file parsed by Paster. We need to add those to the flask.config object somehow as well. We can reuse the values parsed by Paster on a first stage but moving forward it might be worth having a Flask-only way to initialize the configuration from the ini file (perhaps using this or our own parser).

Another thing to take into account is that config keys are upper case in Flask. It doesn't seem worth changing the current convention we use for CKAN config keys, but some keys like DEBUG should be linked to the key currently used in our ini files (debug).
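
A minimal sketch of that linking step, using plain dicts in place of flask.config and the parsed ini file. populate_flask_config(), _as_bool() and FLASK_KEY_ALIASES are hypothetical names, and DEBUG is the only alias shown because it is the only one named above.

```python
# Flask key -> key used in the CKAN ini file
FLASK_KEY_ALIASES = {'DEBUG': 'debug'}


def _as_bool(value):
    """Interpret the string values found in ini files as booleans."""
    return str(value).strip().lower() in ('true', 'yes', 'on', '1')


def populate_flask_config(flask_config, ini_values):
    """Copy the ini values as they are, then set the Flask aliases."""
    flask_config.update(ini_values)
    for flask_key, ckan_key in FLASK_KEY_ALIASES.items():
        if ckan_key in ini_values:
            flask_config[flask_key] = _as_bool(ini_values[ckan_key])
    return flask_config
```

The existing lower case keys are left untouched, so code reading config['debug'] or config['ckan.site_url'] keeps working while Flask sees the upper case key it expects.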

Implementation

See #3163

  • Populate flask.config with the values on the ini file
  • Provide a shim for config in ckan.common
  • Update imports to point to ckan.common.config
  • Add config to the plugins toolkit, and encourage its use
  • Check that IConfigurer and IConfigurable work as expected

Internationalization (i18n)

Translations in Flask are handled by Flask-Babel. The initial evaluation found some potential complications in its usage but luckily for us @TkTech has recently become one of its maintainers and has dealt with most of the issues. The only remaining one that prevents us from using the official release has to do with supporting custom text domains (python-babel/flask-babel#91).

On the CKAN side, there are a couple of things to check but no major blockers expected, as the basic functionality (ie returning translated strings) has been proved to work fine.

Implementation

See #3213

  • Initial integration of Flask-Babel on our Flask stack
  • Provide a shim for _, ngettext, etc in ckan.common
  • Support for the different domains (ie ckan) in Flask-Babel
  • Support for ugettext and ungettext (not present in Flask-Babel)
  • Check logic in ckan.lib.i18n executed on each request (i18n.handle_request(request, c)) to see if it is necessary in Flask
  • Check that ITranslation works as expected

Template rendering

Controllers

| Name | Status | Comments |
|---|---|---|
| admin | ✅ Merged (#3775) | |
| api | ✅ Merged (#3229) | |
| error | ✅ Merged (#4257) | Check if this is needed in Flask. If it is it will require significant refactoring as it's very Pylons specific |
| feed | ✅ Merged (#3870) | |
| group | ✅ Merged (#4143) | |
| home | ✅ Merged (#3891) | |
| organization | ✅ Merged (#4143) | |
| package | ✅ Merged (#4062) | |
| revision | | |
| storage | ✅ Merged (#4282) | Controller for pre CKAN 2.2 FileStore, should be easy to migrate but we might want to consider deprecating it. Removed. |
| tag | ✅ Merged (#4362) | Removed |
| template | | Catch-all controller that renders any template in the templates folder matching the URL. Probably deprecate or limit. |
| user | ✅ Merged (#3927) | |
| util | | |

Extensions and Plugin interfaces

This table lists the current CKAN plugin interfaces and how they might be affected by the migration to Flask. Right now it is a preliminary audit based on the work we've been doing so far, especially on #2971. This table will be updated as things become clearer.

Also note that while a particular interface might not be directly affected by the move to Flask it might still be indirectly related, for instance if the hook methods interact with common objects like c, render templates, etc.

| Interface | Backwards compatible? | Changes in core needed? | Changes in extensions needed? | Comments |
|---|---|---|---|---|
| IRoutes | No | Yes | Yes | The current before_map and after_map methods get a Routes mapper and extensions are expected to interact directly with it (eg with map.connect('...')). It'd be very difficult to support both this and whatever way we settle on for defining Flask routing. The new way of defining custom routes is via IBlueprint. |
| IMapper | Yes | No | No | Low level DB stuff |
| ISession | Yes | No | No | Low level DB stuff |
| IMiddleware | Yes | Yes | Yes | |
| IAuthFunctions | Yes | No | No | Logic layer |
| IDomainObjectModification | Yes | No | No | Low level DB stuff |
| IGroupController | Yes | No | No | |
| IOrganizationController | Yes | No | No | |
| IPackageController | Yes | No | No | Logic layer |
| IPluginObserver | Yes | No | No | |
| IConfigurable | Yes | Yes | No | The extension point works as always, but extensions should always use toolkit.config |
| IConfigurer | Yes | Maybe | No | The extension point works as always, but extensions should always use toolkit.config |
| IActions | Yes | No | No | Logic layer |
| IResourceUrlChange | Yes | No | No | Low level dictization + DB stuff |
| IDatasetForm | Probably yes | Indirectly | Maybe | Depends on core changes to support template rendering. Also setup_template_variables docs encourage use of c |
| IValidators | Yes | No | No | Logic layer |
| IResourceView / IResourcePreview | Probably yes | Indirectly | No | Depends on core changes to support template rendering |
| IResourceController | Yes | No | No | Logic layer |
| IGroupForm | Probably yes | Indirectly | Maybe | Depends on core changes to support template rendering. Also setup_template_variables docs encourage use of c |
| ITagController | Yes | No | No | |
| ITemplateHelpers | Yes | Yes | No | We might need to change the way helpers are registered on Flask, but the interface should be the same |
| IFacets | Yes | No | No | |
| IAuthenticator | Hopefully yes | ? | ? | Really dependent on c (it sets c.user and c.userobj). It also depends on the repoze.who auth working fine in the Flask context |
| ITranslation | Probably yes | Yes | No (?) | Need to adapt the extension hooks to Flask-Babel |
| IUploader | Probably no | Yes | Probably yes | We will probably need to abstract lib/uploader.py to make it not rely on how Pylons handles files on the web requests |

Pain points

(This was moved here for reference from a page started by @rossjones on September 2015)

This page is currently just a list of pain points on trying to run flask side-by-side with Pylons.

Flask 'g'

Cannot override the flask g with the pylons g (which is app_global_context)

So when trying to load /ckan-admin/ we find that the base.html extends the CKAN page.html, where it uses g.main_css for loading resources. Flask doesn't seem to let you override the g that is available to templates:

    from ckan.lib.app_globals import app_globals
    app.jinja_env.globals.update(g=app_globals)

The end result of this may be either:

1. Managing two copies of page.html (euw)
2. Duplicating app_globals to add them to the Flask g.

Routes

When trying to use h.url, or h.url_for we look up the route in flask using flask's routing system, but it may fail if the route is defined in Pylons. If you then try and make the call to the Pylons routing, it will complain about Pylons not being setup - this is because we're in Flask :(

 File "/vagrant/ckan/ckan/lib/helpers.py", line 139, in url
    my_url = _pylons_default_url(*args, **kw)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/registry.py", line 155, in __call__
    return self._current_obj()(*args, **kw)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/registry.py", line 197, in _current_obj
    'thread' % self.____name__)
TypeError: No object (name: url) has been registered for this thread

Helpers

The template helper functions use a lot of pylons and webhelpers functions, which means realistically to keep things running side-by-side there is a need to maintain a flask_helpers module which is a function-signature-level duplicate of helpers.

The alternative is to fork the page.html for flask/pylons so that it will only call specific URLs - this won't help with looking up Pylons URLs.
