Eating elephant, first step (events, db schema) #2006
Conversation
"WHERE username=%s " | ||
"RETURNING username, username_lower" | ||
, (suggested, lowercased, self.username) | ||
) |
Did you intentionally remove the `back_as=tuple` argument here? Why isn't it needed any more?
Not only is it not needed, it is not allowed. The way it works is that `back_as` is a property of the cursor. Since I now create the cursor explicitly, I need to supply it as a param in the `get_cursor` call.
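A quick sketch of the distinction being described, assuming postgres.py's API as discussed here (illustrative only, not the PR's actual diff; `db`, `suggested`, and `lowercased` are stand-ins):

```python
# `db` is a postgres.py Postgres instance (assumed available).

# Implicit cursor: back_as rides along on the query call.
row = db.one("SELECT username, username_lower FROM participants "
             "WHERE username=%s", ('alice',), back_as=tuple)

# Explicit cursor: back_as configures the cursor itself, up front.
with db.get_cursor(back_as=tuple) as cursor:
    row = cursor.one( "UPDATE participants "
                      "SET username=%s, username_lower=%s "
                      "WHERE username=%s "
                      "RETURNING username, username_lower"
                    , (suggested, lowercased, 'alice')
                     )
```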
Ah, now I see that. Makes sense.
@zwn is #1939 (comment) related to this PR?

Yes, it is. Actually that write-up is pretty good. Thinking about showing the history to the users forces us to be honest and think everything through.

@whit537 I must say I am pretty surprised (not in a bad way) to see you arguing this side. You might wonder why I am, but have you recently looked at the gittip schema? 😉 You have used perhaps all the standard and nonstandard Postgres features there are. OK, you are making sense. I'll see what I can do about it.
CREATE TABLE events
( id serial PRIMARY KEY
, ctime timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
, pid1 bigint NOT NULL
Is there a reason not to reference the participants table here?
Yes, because it is the other way around: participants should reference events (if anything). Participants should be recreatable from events from scratch.
- Shortening "Branding Guidelines" to "Brand Guidelines" for punchiness
- Broadening "Data Model" to "System Architecture." More parallel to "Information Architecture," and informs things like gratipay/gratipay.com#2006 and gratipay/gratipay.com#1939.
:-) Postgres is a pretty powerful system.

Some inspiration from what GitHub stores and publishes: http://developer.github.com/v3/activity/events/
If anyone knows how to reasonably (= not by string-manipulation functions) build complex (= two levels deep) JSON objects in SQL, let me know. So far I have found only https://github.com/pgexperts/json_build, but that does not work on Heroku. I need it to fill the `payload` column.
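For what it's worth, this became easy in later Postgres releases: 9.4 added `json_build_object`, which nests to arbitrary depth without string manipulation (it just wasn't available on Heroku's Postgres at the time). A sketch with plain psycopg2 (DSN hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=gittip")  # hypothetical DSN
cur = conn.cursor()
# json_build_object (Postgres 9.4+) nests, so a two-level payload is one call.
cur.execute("""
    SELECT json_build_object(
        'id', %s
      , 'action', 'set'
      , 'values', json_build_object('username', %s)
    )
""", (1234, 'joe'))
print(cur.fetchone()[0])  # psycopg2 decodes json, so this is a nested dict
```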
I am trying to put the timestamp inside the JSON. `CREATE INDEX events_ts ON events (cast(payload->>'ts' as timestamp) ASC)` is still not immutable (why? why?). Another bump is that such an expression must be used in each query that wants to use the index. So, we can index the timestamp as text. Or we can keep the `ts` column.
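The "why", as far as I can tell: the text-to-timestamp cast consults session settings (`DateStyle`, `TimeZone`), so Postgres can only mark it STABLE, and index expressions must be IMMUTABLE. One workaround sketch, assuming the `ts` strings are uniform ISO-8601 in a single timezone (which sort chronologically as plain text):

```python
import psycopg2

conn = psycopg2.connect("dbname=gittip")  # hypothetical DSN
cur = conn.cursor()
# The ->> extraction is immutable, so indexing the raw text is allowed;
# ISO-8601 strings in one timezone sort chronologically as text.
cur.execute("CREATE INDEX events_ts ON events ((payload->>'ts'))")
conn.commit()
```

The same caveat applies, though: queries only hit the index if they order or filter on the identical expression, `payload->>'ts'`.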
Oh well, the time of the last commit is off, probably because of my use of stgit. Now I have an example of an event where a participant is changing her username:

id: 9384
ts: 2014-02-15T....
type: participant
payload:
    id: 1234
    action: set
    values: { username: joe }

The data conversion from the current tables to the `events` table is done by branch.py.
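To show how such events could be read back out, a sketch (not code from this PR; assumes a json `payload` column shaped like the example above):

```python
import psycopg2

conn = psycopg2.connect("dbname=gittip")  # hypothetical DSN
cur = conn.cursor()
# One participant's history, oldest first; psycopg2 decodes json to dicts.
cur.execute("""
    SELECT ts, payload
      FROM events
     WHERE type = 'participant'
       AND (payload->>'id')::bigint = %s
     ORDER BY ts
""", (1234,))
for ts, payload in cur.fetchall():
    print(ts, payload.get('action'), payload.get('values'))
```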
This reminds me a lot of MongoDB, which I used a fair bit at my last job. Do you have any experience with Mongo? It gives you JSON "documents" (rows) in "collections" (tables) in "databases" on "servers." Do you think we should explore MongoDB as an option?
Interesting. I've never done anything with MongoDB. As you describe it, it is definitely worth checking out.
I was afraid you would be like, "Chad! Enough with the rabbit holes!" But since you weren't ... ;-)

The problem is that I have a bad impression of MongoDB from an engineering perspective. It's like the difference between MySQL and PostgreSQL. After watching engineers at my last company deal with data loss due to MongoDB design decisions, I don't feel fully comfortable trusting my data to it. Basically, this:
So if we're going to evaluate alternate databases (is that what we're going to do?), I think we should also look at other options besides MongoDB. We should probably start a new ticket, as well. :-)
Oh, I am not ready to evaluate alternate databases. I wanted to explore MongoDB for interesting ideas on how to work with data, what kinds of APIs work for what, etc. We don't have the bandwidth to switch db technology. But that is a tangent anyway.

Are we ready to merge this PR? What else do we need here? My goal is to have an events table and gradually start adding event types as we work through them on building.gittip.com (IA). So far I see a clear path forward for the participants table. I'd like to do weekend-sized PRs and ideally have them merged during the week, so I'm ready for another weekend's work.
ORDER BY ts
""")

for event in heapq.merge(claimed, usernames, api_keys, goals):
`heapq.merge`

Nice. I've never used that before. :-)
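For reference, a tiny standalone demo of `heapq.merge`: it lazily merges iterables that are each already sorted, which is why the queries above all end in `ORDER BY ts`.

```python
import heapq

a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
# Each input must be pre-sorted; the result is a lazy iterator.
print(list(heapq.merge(a, b, c)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```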
CREATE TABLE events
( id serial PRIMARY KEY
, ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
, type text NOT NULL
Would it be better to use `kind` instead of `type`, since (as we've mentioned somewhere) `type` is a builtin function in Python?
for event in heapq.merge(claimed, usernames, api_keys, goals):
    payload = dict(action=event.action, id=event.id)
    if event.action == 'set':
        payload['values'] = { event.__dict__.keys()[-1]: event[-1] }
I'm getting a Traceback here:
[gittip] $ honcho run -e defaults.env,local.env ./env/bin/python branch.py
Traceback (most recent call last):
File "branch.py", line 47, in <module>
main()
File "branch.py", line 39, in main
payload['values'] = { event.__dict__.keys()[-1]: event[-1] }
AttributeError: 'Record' object has no attribute '__dict__'
[gittip] $
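For what it's worth, psycopg2's `NamedTupleCursor` rows are namedtuples (the generated class is literally named `Record`), and namedtuples expose their column names via `_fields`, not `__dict__`. A possible fix sketch:

```python
# Possible fix, assuming `event` is a NamedTupleCursor row (a namedtuple):
payload['values'] = {event._fields[-1]: event[-1]}
```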
Conflicts:
	gittip/models/__init__.py
	gittip/models/participant.py
Not sure why this API changed. Something we did on master in the meantime?
The need for this import entered with #1369.
/* run branch.py before this
DROP RULE log_api_key_changes ON participants;
DROP RULE log_goal_changes ON participants;
We have four rules on the participants table right now:

1. `bitcoin_addresses` logs into `bitcoin_addresses`
2. `log_api_key_changes` logs into `api_keys`
3. `log_goal_changes` logs into `goals`
4. `log_participant_number` logs into `log_participant_number`
This PR replaces (2) and (3) with calls to the new `add_event` function, and it also uses `add_event` to start logging changes to `username` and `claimed_time`, which were not previously logged. (1) and (4) are unaffected by this PR.
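A sketch of the shape such an `add_event` helper might take (the actual signature in this PR may differ; shown with a plain psycopg2-style cursor):

```python
import json

def add_event(cursor, type, payload):
    """Append one row to the events log, inside the caller's transaction."""
    cursor.execute( "INSERT INTO events (type, payload) VALUES (%s, %s)"
                  , (type, json.dumps(payload))
                   )
```

Because it runs on the caller's cursor, the event lands in the same transaction as the change it records, which is the guarantee the logging rules gave us before.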
That is correct. Now that we are somewhat unclogged, I'll look into adding those over the weekend.
Cool. Be sure to give me bite-sized PRs, eh? :-)
I'm going to merge this PR, because I want to land #1976, and account disconnects are something we should really be logging. The logging system in this PR is not perfect, and I'm having tons of ideas about what sort of system would be better. But that's the point of this PR: not to be the final word by any means, but to move us forward incrementally.
Note for the future: it would've been good to optimize the branch.py script, or at least log its progress. I'm sitting here waiting for it to finish, and I have no idea how long that's going to take. The app is in maintenance mode in the meantime. :-/

Yeah, that sucked. We were down for, like, an hour. :-(
This time we also had a branch.py. I've noted where that was run.
Sorry. 😟 Locally it was so fast I never felt the need to add any progress tracking or further optimizations. Were you running it locally or on Heroku? I was imagining it running like …
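For future migrations, a sketch of the kind of cheap progress logging that would have helped (names taken from branch.py above; the interval is arbitrary):

```python
# Python 2, matching branch.py: print a heartbeat every 1000 events.
for i, event in enumerate(heapq.merge(claimed, usernames, api_keys, goals)):
    if i % 1000 == 0:
        print "processed %d events" % i
    process(event)  # hypothetical stand-in for the conversion/insert above
```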
It broke in #2006; not sure how we missed it. Turns out we did have a test for the logging rule after all. :-)
Locally.
Good idea.
Ringing in super late, after seeing the brief MongoDB back-and-forth above -- I think your instinct that a MongoDB-like model is helpful is right on. If it's not Mongo itself, maybe RethinkDB? I haven't used them, but they have some pretty compelling ideas.
As discussed at #1549, we need to pick a place where the elephant is softest and start there. Tracking `claimed_time` and `username` appears to be it.

The `events` table is our log. The ultimate goal is that the state of the db at any point in the past should be reconstructable. After several tries at this, it seems prudent to keep two participant ids in this table. The first one is to be used as "the participant" (tipper, claimer, etc.). The second one is to be used as "the other" (tippee, team, etc.). This setup allows somewhat sane filtering. Using ids instead of usernames makes the past immutable, which is what we need for any log.

The next steps are to fold goal, number, statement, bitcoin, and other attributes into this architecture. The more chewy parts of the elephant include user-creation semantics, tipping, and absorbing (take_over).

The ultimate test will be reconstructing the current state of the db just from the `events` table.

Please review & merge.
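As a sketch of that ultimate test, reconstruction could look something like this (pure illustration; the event shape follows the `{id, action, values}` payload example earlier in the thread):

```python
def replay(events):
    """Rebuild participant state by applying events oldest-first (sketch)."""
    participants = {}
    for event in events:  # assumed already ordered by ts
        payload = event['payload']
        state = participants.setdefault(payload['id'], {})
        if payload['action'] == 'set':
            state.update(payload['values'])
    return participants

# replay([{'payload': {'id': 1234, 'action': 'set',
#                      'values': {'username': 'joe'}}}])
# -> {1234: {'username': 'joe'}}
```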