Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cursor values get emptied when passed to Counts.publish() #58

Open
Diginized opened this issue Aug 9, 2015 · 23 comments
Open

Cursor values get emptied when passed to Counts.publish() #58

Diginized opened this issue Aug 9, 2015 · 23 comments

Comments

@Diginized
Copy link

I'm not sure why, but when I save my search cursor to a variable (so that I can use it twice) and pass it into Counts.publish() and then use it again to return the values to the client, the cursor sent to the client has _id fields only. As you can see in the code below, the search specifies which fields are to be returned to the client, and this works just fine if the Cursor.publish() line is commented out. But with that line active the fields{} parameter seems to be overridden.

I did find a workaround (below in case it benefits others) but the workaround is a little less elegant and I wanted to ask if I was doing something wrong.

Is this the best way to do it?

    Meteor.publish("search", function (searchObj, limit, skip) {

        //build a search object that looks like {lastName: "Smith", firstName: "Jim"} (but with regex to fix case sensitivity)
        searchParameters = Meteor.call("buildSearchParameters", searchObj);

        if(!limit){var limit = 30} //in case the client code forgot to set one, or to prevent malicious overload
        if(!skip){var skip = 0}

        cursor = Contacts.find(
            searchParameters   //object built by buildSearchParameters method
            , 
            {
                limit: limit,
                skip: skip,
                fields: 
                {
                    lastName: 1, 
                    firstName: 1, 
                    middleName: 1,
                    streetNumber: 1,
                    streetName: 1,
                    city: 1,
                    zipCode: 1 
                }
            }
        ); 

        //use tmeasday:publish-counts package to publish the total count from this cursor
        //{noReady: true} allows the cursor to determine the limit rather than this counter (see docs)
        Counts.publish(this, 'totalSearchResultsCounter', cursor);
        return cursor;

Workaround: Instead of using cursor in the Counts.publish() function, I just do a new Contacts.find and bring down the same searchParameters variable that I was already doing above. This method requires that I save all of my search parameters into a separate object to avoid duplication (DRY).

                ...
        Counts.publish(this, 'totalSearchResultsCounter', Contacts.find(searchParameters);
        return cursor;

Thanks in advance, and thanks for this handy package !

@boxofrox
Copy link
Contributor

publish-counts optimizes the field specifier in a cursor to only pull those fields it needs. With a basic count, as used in your example, the field specifier is replaced by _id.

At the time this seemed like the proper course of action with the documentation using separate cursors in the examples for publish-counts and Meteor.publish, which you found works.

Will update the package to keep existing field specifiers on basic counts. @tmeasday originally asked for this behavior when I rewrote the optimizer until I talked him out of it.

@Diginized
Copy link
Author

helping to think this through and make sure I'm not requesting a change with bad side effects...
I can definitely see how passing only the _ids should make more sense. When I do a query with a limit and then use publish-counts to find the "of x total" then I definitely only want the contents sent to the browser for that limited amount to be sent. But if you're only propagating the { "_id": "totalSearchResultsCounter", "count": 2038 } object that I see you've sent to the clients counts db, then I don't see it being a problem.

For the moment my workaround does at least do the job, even if a touch more slowly, but if you do make this change I'll be interested to try it out and let you know how it's going. When only doing the count I'm not sure if there's any performance hit to having all of those field selectors in there, but if you chose to making it as an option then you'd be able to pick which you wanted to use... though, I guess that option is already built in because user could just provide an empty field selectors parameter or just {_id": 1}

@boxofrox
Copy link
Contributor

@Diginized, if you're adding a limit or skip option to a single cursor used with both Counts.publish and Meteor.publish, then the limit will apply equally to both cases.

Without a separate cursor for Counts.publish, dropping the limit option will still affect the original cursor object returned to Meteor.publish. Keep in mind that when publish-counts optimizes the fields option of the cursor, it's digging into the internal Meteor implementation of Cursor to edit the fields option. I recommend using separate cursors. One without limit for Counts.publish, other with limit for Meteor.publish. It's less DRY, but it is necessary.

The fix for this issue will behave as follows:

  • If user does not assign a field specifier to the query/cursor, automatically assign {_id: true}.
  • If use does assign a field specifier, let it be. Meteor will ensure that '{_id: true}' is set if omitted. If user specifies {_id: false}, then Meteor will report a fatal error by design.

@Diginized
Copy link
Author

OK, I'm cool with doing the two cursor thing and it seems to make sense, but I don't think the behavior I'm getting matches the reason you're citing. Note the "30" displayed in the two screenshots below (code following screenshots)

What's interesting, and surprised me initially, is that when I use the same cursor for both, the method that results in the blank records coming back, it still returns exactly 30 blanks... so publish-counts dropped my fields: and my limit:, but my return cursor still honored the limit. This is what happens using that code from my very first post above. Unless I misunderstand, this seems contrary to your second paragraph in the latest post.

Just so that I'm not confusing anything, here are some screenshots (with code included) to go with what I'm saying.

  • The code in the very first post above yields 30 blank records (_id only) and a correct "total records" count: http://cl.ly/image/101J370O0s3C
  • Modify that code (the workaround described in that same post) to use a second Contacts.find() cursor and you get 30 complete records plus the correct counts (but much slower page load): http://cl.ly/image/0c383t2r3H12

What's weird to me is that this is how the client is coming up with that "30" count:

Template:

<h3>Search Results</h3>
    <div>(Showing <span class="badge">{{pageCount}}</span> of <span class="badge">{{getPublishedCount 'totalSearchResultsCounter'}}</span>   results)</div>

Helper:

  'pageCount': function(){
    return Contacts.find({}).count()  //note, nothing here that knows about the limit:30 on the server
  },

Is this the expected? Am I making sense?

@boxofrox
Copy link
Contributor

Yea, I'm not writing very cogent comments. publish-counts does not drop limit, but if it did, it would affect the cursor shared with Meteor.publish. Sorry about that.

According to #21, Meteor is a bit inconsistent on how cursor.count works with limit and skip options. I'm not familiar with all the details, but I think that issue may explain why your initial count matches in both tests, however, I expect future reactive updates to the count will be affected by limit and skip as documents are added/removed from the collection, because publish-counts relies on cursor.observe for reactive updates instead of cursor.count. In a nutshell, cursor.count ignores limit and skip, while cursor.observe does not.

If you only need the initial count without subsequent reactive updates, then your use of one cursor appears to be sufficient. I just need to leave your field specifier alone. You should disable reactive counts in this case.

Yes, this apparently is expected, and you're making better sense than I have been. The screenshots help very much. Thank you for including those. I also didn't realize that your earlier comments mentioned one cursor is loading faster than two.

Out of curiosity, does using the noReady option with two cursors affect the load time?

@tmeasday
Copy link
Member

@boxofrox I'm not completely following but don't you think the answer here is that we should just clone the cursor before we start messing with it? I've done something like this before, the code looks something like:

    var selector = cursor.matcher ? cursor.matcher._selector : cursor._cursorDescription.selector;
    var newQuery = _.extend({}, selector, query);

    var oldOptions = cursor._cursorDescription ? cursor._cursorDescription.options : cursor;
    var newOptions = _.extend(_.pick(oldOptions, "skip", "limit", "fields"), options);
    if (oldOptions._transform) {
      newOptions.transform = oldOptions._transform;
    }

    if (cursor.collection) {
      return cursor.collection.find(newQuery, newOptions);
    }
    return cursor._mongo.find(cursor._cursorDescription.collectionName, newQuery, newOptions);

(The code above works both client and server, and respects transform, neither of which we need)

@boxofrox
Copy link
Contributor

@tmeasday Perhaps. I'm not entirely sold on cloning.

  1. It still creates a second cursor/query, yes? @Diginized's preliminary tests indicate one cursor is faster than two over the same collection (not independently verified). If he's aware of the trade-offs, he can still use a single cursor to get his perf boost. He may lose that choice with cloning.
  2. If a user only publishes a count without returning the cursor (example code below), we're creating a new cursor, but what happens to the old cursor? Does it just drop out of scope and garbage collect without any work (e.g. touching the database, or some other perf adverse action)?
Meteor.publish('posts-visits-count', function() {
  Counts.publish(this, 'posts-visits', Posts.find(), { countFromField: 'visits' });
});
  1. Do we risk more github issues from users confused by what publish-counts does behind its API? They give us a cursor with certain settings and we clone, then tear the clone down to only what we need, and the results aren't what they expect. I've tried to offset possible confusion with warnings in the field optimizer code. On the other hand, this issue (Cursor values get emptied when passed to Counts.publish() #58) is an example of adverse side-effects spilling into the user's code, which cloning will certainly alleviate.
  2. Your cloning example still picks limit and skip. Currently, our use of cursor.count ignores these two options by default, but I believe the cursor.observe we use for reactive updates will be constrained by these two options, yes? I can see no reason to keep them, but my experience is very limited.

My cloning ability in javascript is weak. I'm not certain of the problems/pitfalls I will run into attempting it. Your example is a good point to start from.

My opposition to cloning is I don't understand it enough to address these questions. I'll give it a shot and see what I find.

Thanks for jumping in here. :)

@tmeasday
Copy link
Member

  1. True if we don't need to change the cursor. Perhaps sharing the cursor is an optimization we can special case? (maybe we call it with a warning in devel mode if we can't? idk)
  2. I'm pretty confident this will be OK but I guess we should test it to be sure..
  3. I think the altering of the cursor is an optimization that shouldn't affect users. I'd say it's a bug if not, or are there use cases that I've missed here?
  4. Sorry I just hastily copied that from somewhere else, we should also drop those I think.

@boxofrox
Copy link
Contributor

With respect to (3), yes it's a bug. My intended fix is to modify the field optimizer to leave any field specifier from the user alone, otherwise set an undefined field specifier to {_id: true}.

This was the behavior you asked for when I rewrote the optimizer, but I argued there was no reason to keep the user's field specifier for basic counts. I thought standard operating procedure for publish-counts was Counts.publish got its very own cursor, then @Diginized came along and showed me otherwise.

@Diginized
Copy link
Author

@boxofrox, to be honest, I originally assumed the same as you about giving Counts.publish its own cursor rather than a shared one--because after all, it's an integer we're looking for. I was surprised to find that the shared on was performing faster.

Based on my 3 min little test right now, @tmeasday's suggestion in #59 for wrapping it in Meteor.defer() is working out pretty well regardless of whether I'm using a shared or unique cursor. The first page of results are now appearing before the total count, which from a UX perspective is much closer to that "snappy" goal I'm trying to achieve.

        var sub = this;
        Meteor.defer(function(){
            counts = Counts.publish(sub, 'totalSearchResultsCounter', cursor);
        });

I will try messing around with some of the other suggestions listed above and report back if I learn anything potentially useful, but it will probably be a couple of days before I can get to it.

@boxofrox
Copy link
Contributor

The PR for #57 is up, so I will be looking into this next.

After some deliberation, if we move forward with cloning cursors, I'm of the opinion it should be an opt-in feature. This can be provided via a Counts.sanitizeCursor() function that clones a cursor with sane defaults (e.g. stripping off limit and skip).

  • For any users that are OCD about DRY, this will allow them to define a cursor once.
  • If sanitizeCursor()'s sane defaults are not germane to a user, they can write their own version.
  • Users may use at their discretion a single cursor for both Counts.publish and Meteor.publish.

Optimizing the field specifier will still be a separate problem. As #53 points out, Collections middleware may modify the field specifier in addition to the user's fields. At this point publish-counts is unable to distinguish which fields are unecessary (user) or necessary (middleware). Thus, cloning won't be an opportunity to modify the field specifier.

On a side note, the need for optmizing assumes that reducing the quantity of fields returned from a query improves performance. I have not read about or tested this. IMO, if a cursor is shared between Counts and Meteor, it seems like more work to strip away fields for Counts that mongo will otherwise provide to Meteor.

@tmeasday
Copy link
Member

@boxofrox - Thanks for all your diligence here.

Let's discuss this question a little more perhaps. I think there's realistically two main ways to use Counts.publish:

  1. Using a cursor that is unique to publish counts. Here typically a user wouldn't specify a fields option at all (unless they were trying to optimize themself).
  2. Sharing a cursor for both a total count and a standard publication of a limited set that's currently visible. Here a sort and limit is likely and a fields specifier is also very possible. This is the example quoted above.

Now, on the question of performance, I've not tested it closely, but it seems clear that in network bandwidth alone it's going to be better to not fetch the full document but just the _id, if we are simply counting documents.

There'd be an argument if it was possible we could share the observer with the second use of the cursor, but this is certainly not going to be the case. If we were publishing the full document set, there'd be no point in publishing the count. So it's fair to assume that the user is not publishing the entire cursor.

@boxofrox
Copy link
Contributor

@tmeasday I thought about this over the week, but didn't think up any counter examples, so I concur with your assessment. Though I'm still a bit hazy on the implications. Here's where I'm at.

  1. Users may use at their discretion a single cursor for both Counts.publish and Meteor.publish. This sentiment is out. There is no reason to share cursors, period. At least one of the properties limit, skip, and/or fields shall be stripped, otherwise the user doesn't need publish-counts.
  2. My intended fix--of disabling field specifier optimizations when the user specifies fields on a basic count--should be deep-sixed. Cloning the cursor allows us to keep the aggressive optimization while leaving the original cursor intact. Networks will thank us for it.

I think the rest of my preceding comment still stands. If we automatically clone the cursor in Counts.publish, we are unable to optimize the field specifier because we won't know which fields are necessary for Collection middleware.

With the sanitizeCursor(), we can offer users an options.fields parameter to specify fields necessary for middleware, which leaves the function free to aggressively optimize the field specifier at all times. On second thought, we could add options.fields to Counts.publish... but we deny the user the ability to roll their own cloning function if our "sane" defaults don't work for them.

With cloning, I think optimization should always occur, because we cannot guarantee that a cursor will have defined limit and skip properties. If the cursor is missing those, then without modifications to the field specifier, there'd be no point in publishing the count, as you said.

On a side note. Another issue I have with cloning cursors is that cursors are inconsistent between local collections new Mongo.Collection(null) and "regular" collections, as we've seen in #35.

@tmeasday
Copy link
Member

Glad we are getting closer to the bottom of this @boxofrox.

I'm still a little confused about the "collection middleware" (I'll admit I'm not totally up to speed with what that is). Can we nail down a simple example of what we are talking about here? (I'm trying to follow the comments above but not quite seeing it).

On the final point, I think a simple cursor-utils package that offers a properly isomorphic (ie works on all three types) API to clone, restrict, loosen, etc cursors would be super useful! I don't think it needs to be in core, but it's something I've wanted for other purposes many times (and implemented bits of here and there). Does anyone know of such a package?

@boxofrox
Copy link
Contributor

I'm still a little confused about the "collection middleware"... Can we nail down a simple example of what we are talking about here?

Certainly. In #53, kristijanhusak used a package called meteor-partitioner. It wraps around a Mongo.Collection and adds a field specifier, _groupId, to Collection queries in order to filter/partition result sets. When Counts.publish optimizes the field specifier, it clears _groupId and defeats Partitioner's mechanism.

// quick hacky example.  likely broken.  based on examples on paritioner's github page.
var ChatMessages = new Mongo.Collection('messages');
Partitioner.partitionCollection(ChatMessages); // Adds hooks to a particular collection
                                               // so that it supports partition operations.

if (Meteor.isServer) {
  Meteor.publish('messages', function () {
    var cursor = ChatMessages.find();
    // partitioner added a _groupId field specifier to this cursor.

    Counts.publish(this, 'message-count', cursor);
    // Counts.publish just optimized the field specifier list to { _id: true }.
    // No more partitioning.

    return ChatMessages.find();
  });
}

I started referring to packages like Partitioner as Collection middleware, since it technically adds a layer between a Mongo.Collection and publish-counts.

@boxofrox
Copy link
Contributor

Btw, @tmeasday. You mention an isomorphic API for three types. Which three types are you referring to? limit, 'skip', and 'fields'?

@tmeasday
Copy link
Member

Oh, I was referring to server collections, client (i.e. server backed) collections, and local collections (i.e. null collections). Not sure what the correct terminology for the three is.

@tmeasday
Copy link
Member

Thanks @boxofrox. My reading of https://github.com/mizzao/meteor-partitioner/blob/master/grouping.coffee#L139 is that it's adding {_groupId: 0} (i.e. hiding the _groupld from us).

Obviously we don't care if the_groupId comes down the wire or not (well, we don't want it to, really). We just want to make sure we respect whatever selector they pass into the query (so if they pass in {_groupId: 7} we only count docs in group 7). I can't see why we wouldn't be doing that.

@boxofrox
Copy link
Contributor

Ok, I completely misunderstood how Partitioner was working. I'm surprised any of my solutions to kristijanhusak worked in this case, because I was trying to let him add _groupId: 1 to the field specifier.

Well, this is good news. Since, Partitioner is completely transparent, I'm okay moving forward under the assumption that similar libraries should also be transparent.

@KristerV
Copy link
Contributor

man I just spent a whole day trying to figure out why the hell my publish wasn't working properly... and it was because of this issue. how annoying.

KristerV added a commit to KristerV/publish-counts that referenced this issue Oct 19, 2015
@tmeasday
Copy link
Member

I thought this problem was resolved though? @KristerV I'm pretty sure you don't need to do what you did in that documentation example. Sharing the cursor is a thing that we do all the time.

@KristerV
Copy link
Contributor

really? uhhh.... still had this issue like yesterday. i'll try again tomorrow

@boxofrox
Copy link
Contributor

I haven't had much time to get the cursor cloning working and tested to my satisfaction. This is still a pending issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants