feat: Support for SQLite backend #130
base: main
Conversation
…ob; multiple tests
This is looking solid! Thanks for the contribution.
I've commented directly on code lines where I have questions and change requests.
Here are a few more thoughts and questions that don't directly relate to code.
- Can you help me understand how mutual exclusion of jobs is achieved? For Postgres it is achieved with `FOR UPDATE SKIP LOCKED` at the query level.
- Is mutual exclusion affected by `-DSQLITE_THREADSAFE=0`?
- Is mutual exclusion affected by WAL mode? https://www.sqlite.org/wal.html
- Would it make sense to get test coverage over sqlite in these alternative modes? Or is that being too paranoid?
Tests and lints
There are some tests (data races) and lints failing. Feel free to annotate some of the lints to be ignored if you don't think they make sense or seem too onerous.
You can lint and test locally:

```shell
make lint
go test backends/sqlite/sqlite_backend_test.go -tags testing -race
```
```sql
CREATE TABLE neoq_dead_jobs (
    id integer primary key not null,
```
Looks like your editor/autoformatter adds an extraneous indent at the start of these `CREATE TABLE` statements (same for the `neoq_jobs` table).
```go
handlers          map[string]handler.Handler // a map of queue names to queue handlers
queueListenerChan map[string]chan string     // each queue has a listener channel to process enqueued jobs
logger            logging.Logger             // backend-wide logger
dbMutex           *sync.RWMutex              // protects concurrent access to sqlite db on SqliteBackend
```
I think this may be how we're achieving job mutual exclusion. Though I do wonder if we should consider using SQLite's internal mutual exclusion instead and enforce `THREADSAFE=1`... is that possible?
I was not aware of this mode. I can try experimenting with this and see if it works well.
Hi @acaloiaro, I tried a couple of things and here's my understanding:

- We might not need custom job mutual exclusion (with `sync.RWMutex`). Apparently, SQLite by default works in `serialized` mode, aka `-DSQLITE_THREADSAFE=1`, meaning it handles it internally. It could handle around 3k concurrent writes (`neoq.Enqueue`) before getting a `database is locked` error. I think this is why I went ahead with the custom handling earlier.
  But alternatively (and perhaps a better solution), the go-sqlite3 driver FAQ recommends using `?cache=shared` mode with `db.SetMaxOpenConns(1)` (a related thread). It seems to handle as many as 10k concurrent writes, and probably more.
  I couldn't observe any negative implications from this as long as the transactions are very small, which brings me to the question: do we need to wrap the entire `handleJob` code in a transaction (like how it's done in postgres_backend.go), or do one small transaction before `handler.Exec` (for the deadline check) and one after (marking the job as processed)? Is there a reason why the former is done? Otherwise, the latter makes more sense to me, as it both follows the general rule of thumb and does not slow down writes. The enqueues otherwise get stuck beyond a point, as one of the `handleJob` calls will obtain exclusive access to the db. Let me know your thoughts.
- Mutual exclusion would most likely be affected by `-DSQLITE_THREADSAFE=0`. I didn't try this; it requires manually building the SQLite source code and linking it to the go-sqlite3 driver. Is this something necessary to consider?
- WAL mode works well too, without the need for custom handling. According to the documentation, it helps in achieving higher concurrency. But it introduces additional files, `-shm` and `-wal`, and the latter came to be around 3 times the size of the db.

Note: `QueueListenerChanBufferSize` and `handler.Concurrency` need to be correctly configured by the user, based on their specific needs, for things to work well.
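To make the driver-level alternative concrete, here is a minimal sketch of the shared-cache DSN form discussed above. The file name and the `buildDSN` helper are illustrative assumptions, and the `sql.Open`/`SetMaxOpenConns` calls are shown only in a comment because they require the cgo-based go-sqlite3 driver:

```go
package main

import "fmt"

// buildDSN is a hypothetical helper sketching the shared-cache connection
// string. In the real backend the path would come from the config, and the
// DSN would then be used roughly as:
//
//	db, err := sql.Open("sqlite3", dsn)
//	db.SetMaxOpenConns(1) // a single writer avoids "database is locked"
func buildDSN(path string) string {
	return "file:" + path + "?cache=shared"
}

func main() {
	fmt.Println(buildDSN("neoq.db")) // file:neoq.db?cache=shared
}
```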
Hi @acaloiaro, let me know what you think and which way to go. I'll make the changes accordingly and wrap this up.
> do we need to wrap entire handleJob code in a transaction (like how it's done in postgres_backend.go)
My recollection of why `handleJob` is wrapped in a transaction is that I was considering exposing neoq's transaction to the user, so users always have a transaction available within their jobs. Doing so would ensure that any database modifications within jobs that use the exposed transaction can be rolled back if the job fails. Ultimately, I haven't done any API work to expose the transaction, but I'd like to retain that as an option. So for that reason, I'd like sqlite to be able to do the same, if the time comes. I believe there are performance implications to this decision that I haven't fully thought out. If there's a good reason to push back on this decision, I think we should entertain any objections.
- Let's not do that.
- Ok. Maybe just think through whether we need any settings that are specific to WAL mode. If not, feel free to ignore this.
Maybe having a mutex isn't a bad idea. We certainly don't want people to have to have special builds of sqlite to have a mutual exclusion guarantee, and I doubt this mutex you've added will dominate performance behavior. I just wanted to make sure it was thoroughly thought through. If having a mutex here achieves the best result in terms of reliability, then we should keep it.
Have a look through any remaining, unresolved conversations on the PR. When everything is resolved, I'll give this another review.
Also have a look at the top level of this thread. Lints and tests with `-race` are failing, so you'll want to run those locally and get them resolved.
backends/sqlite/sqlite_backend.go
```go
dbURI := strings.Split(s.config.ConnectionString, "/")
dbPath := strings.Join(dbURI[1:], "/")
```
Let's make sure `len(dbURI) > 1`, and throw an error that the connection string is malformed, before indexing into its slice.
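A minimal sketch of that guard, assuming a hypothetical `parseDBPath` helper (the PR code does this inline):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseDBPath is a hypothetical helper illustrating the suggested guard:
// reject a malformed connection string before indexing into the split result.
func parseDBPath(connectionString string) (string, error) {
	dbURI := strings.Split(connectionString, "/")
	if len(dbURI) <= 1 {
		return "", errors.New("malformed connection string: " + connectionString)
	}
	return strings.Join(dbURI[1:], "/"), nil
}

func main() {
	if _, err := parseDBPath("not-a-uri"); err != nil {
		fmt.Println(err) // malformed input is rejected instead of panicking
	}
}
```

Note that on `"sqlite3://foo/bar.db"` this split-and-join still yields `"/foo/bar.db"` with a leading slash, which is the wrinkle discussed just below.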
Alternatively, I thought to have:

```go
dbPath := strings.TrimPrefix(s.config.ConnectionString, "sqlite3://")
```

When testing on Windows, I found that the connection string must be of the form `sqlite3://file:///<db_path>` for it to open the db successfully. If we do a split on "/", we get an extra "/" attached at the start of the resulting db path. What do you think?
Sounds better than splitting.
```go
// Rollback is safe to call even if the tx is already closed, so if
// the tx commits successfully, this is a no-op
```
I remember making this comment about Postgres transactions. Is it true of SQLite too?
The `pgx` lib clearly documents that `tx.Rollback` is safe to call multiple times, even after commit. But the same is not documented for the `sql` lib. Theoretically, rollback shouldn't have any effect after commit, right? But if it is a concern, then maybe we can do something like:

```go
defer func() {
	if err != nil {
		_ = tx.Rollback()
		s.logger.Error("Transaction rolled back due to error", slog.Any("error", err))
	}
}()
```
This confirms that it's a no-op for the `sql.DB` interface more generally.
Is that from the fact that in their example code they do a `defer tx.Rollback()`, like we are doing currently?
I was referring to the following from the example:

> Defer the transaction's rollback. If the transaction succeeds, it will be committed before the function exits, making the deferred rollback call a no-op. If the transaction fails it won't be committed, meaning that the rollback will be called as the function exits.
```go
func (s *SqliteBackend) updateJobToInProgress(ctx context.Context, h handler.Handler, job *jobs.Job) (err error) {
```
I think we should consider having this function receive a `*sql.Tx` rather than create one itself. That way, job handling can use a single transaction for all db operations.
In the PG backend, we do this by adding a `tx` to the `ctx`, e.g. from `handleJob`:

```go
ctx = context.WithValue(ctx, txCtxVarKey, tx)
```
In any case, we'll need at least 2 transactions, right? One for `in progress`/`failed` and one for `processed`/`failed`? Since `in progress` needs to be committed.
How important is `in progress` to you? Is there a specific use case you have in mind?
I intentionally avoided it for postgres because it's an additional database round trip, and the best use case I had for it was to be able to show in-progress jobs in a UI (which doesn't exist). As you can see, leaving in progress out also reduced complexity, because the "state machine" for determining its correct value also needs to be considered. I'd be inclined to remove the `in progress` requirement unless it's serving a specific goal of yours.
I had the very same use case for being able to show in-progress jobs in a UI. But maybe you're right, it adds more complexity. We can perhaps remove the requirement for now as it would also be consistent with other backends.
```go
// Make sure that each neoq worker only works on one thing at a time.
h.Concurrency = 1
```
This possibly places the test under unrealistic conditions. I would expect real world users to use default concurrency. Will this test work with concurrency > 1?
These are test cases borrowed from the postgres backend. I hadn't put much thought into this specific case. I can try your suggestion.
```go
const WaitForJobTime = 1100 * time.Millisecond

// allow time for listener to start and for at least one job to process
time.Sleep(WaitForJobTime)
```
Can we add a `done` channel here instead? `1.1s` is a pretty long fixed wait time. Some of my early tests did this, but I've tried to get away from any fixed wait times in favor of a `done` channel and a timeout instead.
Any chance of this getting merged? It would be very useful for me :)
@timaa2k If you or @pranavmodx want to get the pull request over the finish line, for sure.
Fixes #129

- ✅ Feature parity with existing backends
- ✅ Roughly the same amount of test coverage as existing backends

All of the major features are working with this new SQLite backend. Almost all of the tests in the other backends have been extended for this one. More can be added as required to capture missed or specific cases. Currently, the tests create test.db within `backends/sqlite`. Not sure if that is ideal; I am open to hearing your thoughts.

For listening to jobs, Go channels are used, as we don't have something similar to `pg_notify` for SQLite.

I added a new `in progress` job status, which was useful for our use case, but it's up to you to decide whether to keep it.