Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to omit HTML from collated JSON #48

Open
lolmaus opened this issue Sep 25, 2020 · 10 comments
Open

Option to omit HTML from collated JSON #48

lolmaus opened this issue Sep 25, 2020 · 10 comments

Comments

@lolmaus
Copy link

lolmaus commented Sep 25, 2020

Hi! Using broccoli-static-site-json with Prember.

I would like to show a list of blog posts on the blog index page. But loading the content of all the blog posts seems excessive. I would like to load only the metadata (frontmatter).

It would be nice to have an option to enable this.

@lolmaus
Copy link
Author

lolmaus commented Sep 25, 2020

I can try implementing that if broccoli-static-site-json maintainers agree with the general idea and provide basic implementaion clues.

CC @simonihmig.

@mansona
Copy link
Member

mansona commented Sep 25, 2020

@lolmaus did you try setting the content-types configuration? 🤔 https://github.com/empress/broccoli-static-site-json#content-types

@lolmaus
Copy link
Author

lolmaus commented Sep 25, 2020

@mansona I did not. I assume removing html from contentTypes will prevent me from rendering individial posts at the posts.post route.

I only want to remove the HTML from the collated response for the posts.index route, but keep it in individual responses.

I believe this should be a rather common requirement.

@mansona
Copy link
Member

mansona commented Sep 25, 2020

@lolmaus I don't think that this would be a common requirement 😉

what you could do is that you could have 2 instances of broccoli-static-site-json (one with contentTypes and one without) and merge them with broccoli-merge-trees with override: true setup.

@lolmaus
Copy link
Author

lolmaus commented Sep 25, 2020

I don't think that this would be a common requirement 😉

Loading the content of all posts on the posts index page makes no sense to me because I'm not showing full content on the index. With a large library of blog posts, it will also affect performance on entry-level mobile phones, smartwatches, etc.

Why is it common to choose wasting user's time, traffic, CPU and memory? Well, there is one legit case: to speed up page transitions for users with powerful devices and fast internet. But I doubt that it is entirely uncommon to care for other users.

Note that there are cases when you want to load all posts' metadata on individual post pages. For example, you want a table of contents, or prev/next links labeled as ← <post title> and <post title> →, or you want to show some stats, etc. Without my suggestion, this would require loading all posts for every page. If I wanted to do that, i'd simpy use broccoli-static-site-json in a treeForApp rather than treeForPublic, adding all posts as JS modules to the main bundle and saving one network request on initial loading and one network request for each transition. 😬

The whole point of loading posts over network is to load only the currently required one for each page, rather than all of them.


what you could do is that you could have 2 instances of broccoli-static-site-json (one with contentTypes and one without) and merge them with broccoli-merge-trees with override: true setup.

That works, but it is not convenient. It will require messing with adapters to abstract away the fact that different names are used for different types of requests of the same resource. Erm, that's not a legit argument because you need to modify the adapters anyway.

But note that there is no option to opt out of generating individual files, resulting in two sets of them generated!

It will also double the amount of broccoli-static-site-json configuration code.

Why are you opposed to adding an option like collatedContentTypes? When this option is omitted, it would default to the value of contentTypes.

What is the downside of having this option in broccoli-static-site-json?

@mansona
Copy link
Member

mansona commented Sep 25, 2020

excuse me @lolmaus but I don't think your negative response here is warranted. The scenario that I was saying was not a common requirement is having a difference between an aggregated view of the data and the individual view of the data.

I will do my best to answer your query as it is written, but I will ask that you follow the Ember Community Guidelines when continuing this conversation, especially the section about Be empathetic and respectful.

Firstly I have a question for you: why are you continually asking about "posts" when you're discussing this? Are you using this library directly or are you actually using empress-blog (which is one of the main users of this library). The reason why I'm asking this is because broccoli-static-site-json is a very low-level broccoli library that is concerned only with converting folders of Markdown files into JSON:API compliant json files, and a lot of your concerns are valid application-level concerns that have already been solved (to a degree) in empress-blog and some specific empress-blog templates.

The whole point of loading posts over network is to load only the currently required one for each page, rather than all of them.

There is a rudimentary pagination implementation that might interest you: https://github.com/empress/broccoli-static-site-json#paginate

what you could do is that you could have 2 instances of broccoli-static-site-json (one with contentTypes and one without) and merge them with broccoli-merge-trees with override: true setup.

That works, but it is not convenient. It will require messing with adapters to abstract away the fact that different names are used for different types of requests of the same resource. Erm, that's not a legit argument because you need to modify the adapters anyway.

This is some of the main reasons why I think what you're trying to do is not a common case. There is no way to solve this in a generic way that will solve everyone's use case if there is every any differences between the aggregated view and the individual view. Again I will re-iterate that this is a low level library for file conversions, this should be thought of as building a static API that is valid JSON:API rather than "integrating specifically with ember" (even though I give a helpful example of how an Ember dev might use this in the markdown)

But note that there is no option to opt out of generating individual files, resulting in two sets of them generated!

Well this is why I suggested using MergeTrees with override:true so you would have the individual files with all the data and the collated data without the data

It will also double the amount of broccoli-static-site-json configuration code.

how much broccoli-static-site-json configuration code do you have? In most cases (unless you're doing something relatively complex like empress-blog does) you should have pretty much a single line for each tree of files no? (maybe split over multiple lines for attributes etc.) but logically speaking you should have a single statement for your broccoli-static-site-json tree no?

Why are you opposed to adding an option like collatedContentTypes? When this option is omitted, it would default to the value of contentTypes.
What is the downside of having this option in broccoli-static-site-json?

I never said I was opposed to anything 😄 I was merely trying to challenge an assumption that you had in your assessment that something would be a common requirement. Although it's worth noting that no option to any library is ever free, if we add this we need to test it, maintain it, upgrade it if we restructure this library (again), make sure the documentation is correct, make sure the documentation is updated if there are any reasonable changes to the implementation, and so on. And this is not even counting the situation where it would add implicit complexity to a library that is already arguably overly complex for the simple feature-set that it is trying to achieve.

Not only that, we have seen that the library can be effective in very large implementations without hitting any of the major issues that you are suggesting are inherent as a result of the architecture of the lib 🤔

@lolmaus
Copy link
Author

lolmaus commented Sep 26, 2020

Hey @mansona, thank you for a detailed response!

your negative response
you follow the Ember Community Guidelines when continuing this conversation, especially the section about Be empathetic and respectful.

Please accept my apology; I assure you that offence was not my intention. 😔 It is my fault that my excitement and passion translates to arrogance in writing.


Firstly I have a question for you: why are you continually asking about "posts" when you're discussing this? Are you using this library directly or are you actually using empress-blog

I use it directly directly to add a blog section to a Prember-driven app, in a very similar way to empress-blog.


a lot of your concerns are valid application-level concerns that have already been solved (to a degree) in empress-blog

Well, we've discussed it many times over already: Empress is way too biased. It's not like any other Ember addon, it's more like a prebuilt app which happens to be bundled as an addon.

I need flexibility and broccoli-static-site-json fits really well to my needs.


There is a rudimentary pagination implementation that might interest you

It will only make things worse:

  • Without pagination, I receive all posts' frontmatter together with all its content. My goal is to get rid of the content.
  • With pagination, I also receive all the unwanted content, but I now need multiple network requests to render a ToC.

There is no way to solve this in a generic way that will solve everyone's use case if there is every any differences between the aggregated view and the individual view.
should be thought of as building a static API that is valid JSON:API rather than "integrating specifically with ember"

Every API is designed for a specific purpose.

Some APIs are designed to be used publicly, but I doubt that a static blog is one of them. And even it were, I still don't see a problem: I'm not asking to deprive all broccoli-static-site-json users from content in their collated responses. I'm only asking to allow opting into removing content from collated responses for those who need it for their specific websites.

But note that most APIs are designed for a specific frontend, in this case a blog website. Almost all blog websites are consumed with a browser, not an API client (well, except for RSS, but it's generated separately). And if my particular website does not have a case for loading full content in all.json, why should I force my users to do so every time?

As for "rather than integrating specifically with ember", I don't believe Broccoli has any mention-worthy usage outside the Ember community. It has only 88 dependants on npm, with ember-cli being literally the only one regularly updated. For comparison, webpack has over 20K dependants.


Well this is why I suggested using MergeTrees with override:true so you would have the individual files with all the data and the collated data without the data

OK, that works. I assumed it would not work because file names would be different and will not overwrite each other.

But I now see that I can use identical names and only change the collationFileName option.


It will also double the amount of broccoli-static-site-json configuration code.

how much broccoli-static-site-json configuration code do you have? In most cases (unless you're doing something relatively complex like empress-blog does) you should have pretty much a single line for each tree of files no? (maybe split over multiple lines for attributes etc.) but logically speaking you should have a single statement for your broccoli-static-site-json tree no?

Well, yes, it's a single statement, but:

  • Yes, it's multiline.
  • It is one statement for each content type.
  • It's not only about doubling the amount of configuration code, but also about keeping each pair in sync. It's not that bad of course, but I just don't see why broccoli-static-site-json should require that over simply adding a single configuration option.

Also, using MergeTrees is a lot of trouble:

  • You need to be aware broccoli-merge-trees exists.
  • You need to learn how to use it. Broccoli is quite peculiar. For example, when I tried to figure out why broccoli-static-site-json requires a resource name to be specified twice, I've looked into tests for answers. I've discovered that while the readme suggests using a string as a first argument, tests use an object as the first argument. WTF? Broccoli.
  • Debugging Broccoli, in case you messed something up, is a nightmare.
  • You need to install an extra dependency and import it.

I was merely trying to challenge

And I ended up violating community guidelines. ☹ I'm sorry for that, of course I know you're acting in good faith and never did/said anything that would allow my arrogance. I just hope you see that I am acting in good faith too. I would like to contribute to a wonderful tool that you've built, fix a pain point I'm having (and I still believe it should be a common practice for static blogs, and many broccoli-static-site-json users would use it if they were aware of the difference it made), and do so without starting a fork.


no option to any library is ever free

If your argumentation is taken with full seriousness, it would seem that no library is worth writing.

I believe it should be a matter of comparing the maintainers' costs to users' costs. Most of what you mentioned is a one-time thing: after the feature is implemented, all the documentation updated and all the issues resolved, there is little to no effort required to maintain it.

But if it's not implemented, users will have to be paing the cost indefinitely: every time someone cares about the payload size, they need to go learning what broccoli trees are and how they work. Provided they come up with this idea in the first place.


OK, I'm not wasting any more of your time.

@lolmaus
Copy link
Author

lolmaus commented Sep 28, 2020

@mansona Thank you for your broccoli-merge-trees suggestion. Here's the setup I ended up with:

const StaticSiteJson = require('broccoli-static-site-json');
const broccoliMergeTrees = require('broccoli-merge-trees');

const blogPostsInputFolder = 'content/posts'; // input folder

const blogPostsConfig = {
  attributes: ['title', 'snippet', 'description', 'date', 'author', 'layout', 'twitter'],
  contentFolder: 'api/posts', // output folder
  type: 'blog-post', // JSONAPI resource name
};

const blogPostsJsonFrontmatterOnly = new StaticSiteJson(blogPostsInputFolder, {
  ...blogPostsConfig,
  collate: true,
  contentTypes: [],
});

const blogPostsJsonFull = new StaticSiteJson(blogPostsInputFolder, {
  ...blogPostsConfig,
  contentTypes: ['html'],
});

module.exports = {
  name: 'content',

  treeForPublic() {
    return broccoliMergeTrees([blogPostsJsonFrontmatterOnly, blogPostsJsonFull], {
      overwrite: true,
    });
  },
};

With my suggestion implemented, it would look like this:

const StaticSiteJson = require('broccoli-static-site-json');

const blogPostsJson = new StaticSiteJson(
  'content/posts', // input folder
  {
    attributes: ['title', 'snippet', 'description', 'date', 'author', 'layout', 'twitter'],
    collate: true,
    contentFolder: 'api/posts', // output folder
    contentTypes: ['html'],
    collatedContentTypes: [],
    type: 'blog-post', // JSONAPI resource name
  }
);

module.exports = {
  name: 'content',

  treeForPublic() {
    return blogPostsJson;
  },
};

@mansona
Copy link
Member

mansona commented Sep 28, 2020

Please accept my apology; I assure you that offence was not my intention. 😔 It is my fault that my excitement and passion translates to arrogance in writing.

Thanks for saying that, it makes a big difference. I appreciate that emotions can run high, especially when you are trying to achieve something and a library isn't providing what you need it to do.

Now let's get onto the rest of your questions.

Well, we've discussed it many times over already: Empress is way too biased. It's not like any other Ember addon, it's more like a prebuilt app which happens to be bundled as an addon.

Yes you're right here, empress-blog is intended to be biased and brought it up as a reference rather than a suggestion for you to use it. Sorry If I have forgotten that we have discussed empree-blog already 🙈 I am starting to get to the point where I am forgetting who I've discussed it with, maybe this is some sort of measure of success? 😂

Without pagination, I receive all posts' frontmatter together with all its content. My goal is to get rid of the content.
With pagination, I also receive all the unwanted content, but I now need multiple network requests to render a ToC.

So we have actually solved this ToC problem for Guidemaker and Field Guide in different ways, but both of them essentially use a custom broccoli-tree to build a separate file that can be requested at run time. I'm working on a more generic solution for all Empress projects but for now ToC building is quite a hard problem unfortunately 😫

Some APIs are designed to be used publicly, but I doubt that a static blog is one of them. And even it were, I still don't see a problem: I'm not asking to deprive all broccoli-static-site-json users from content in their collated responses. I'm only asking to allow opting into removing content from collated responses for those who need it for their specific websites.

again with the "static blog" as an example 😂 while empress-blog is the most "popular" project to use this library I doubt that the aggregate of page views on all empress-blogs would match its usage on the Ember Website. The easiest example of this being used right now is the fact that the team page is build using this sort of "static API" so you can build a quick ember app that directly requests https://emberjs.com/data/team-members/all.json or https://emberjs.com/data/team-members/chris-manson.json as necessary 😉 this may seem like a toy example but there is a plan to integrate this into a new quick-start tutorial at some point in the future.

The reason I bring up these examples is that I really want to hammer home the fact that you can't think of this library is "able to build something for your frontend" but instead it should be thought of as "building a valid static API"

As for "rather than integrating specifically with ember", I don't believe Broccoli has any mention-worthy usage outside the Ember community. It has only 88 dependants on npm, with ember-cli being literally the only one regularly updated. For comparison, webpack has over 20K dependants.

While I would be very happy if someone did use broccoli-static-site-json outside of an Ember context, my first emotion would be an extreme surprise 😂 by suggesting that you think of this as a separate thing to "something that integrates with Ember" is not saying that it should be flexible enough to work outside Ember, but instead I'm trying to re-enforce the point that when working with broccoli-static-site-json you need to have a different mental context (as I have said before). Think of it as building an API, not getting data from Markdown into an Ember app 👍

It's not only about doubling the amount of configuration code, but also about keeping each pair in sync. It's not that bad of course, but I just don't see why broccoli-static-site-json should require that over simply adding a single configuration option.

If keeping each pair in sync is an issue I recommend wrapping this functionality in a little function that returns the merged trees.

Also, using MergeTrees is a lot of trouble:

  • You need to be aware broccoli-merge-trees exists.
  • You need to learn how to use it. Broccoli is quite peculiar.

I'm sympathetic about this issue because I have believed for a long time that the broccoli docs could vastly be improved. I know broccoli dev is currently an expert-level task in the Ember community but that doesn't really need to be the case. I think that with a little hand-holding and a few tutorials this issue could be solved without too much effort. I also loved pretty much everything that @oligriffiths was doing in this area a little while ago 👍

For example, when I tried to figure out why broccoli-static-site-json requires a resource name to be specified twice, I've looked into tests for answers. I've discovered that while the readme suggests using a string as a first argument, tests use an object as the first argument. WTF? Broccoli.

So you're hitting against 2 things here. Yes, the whole "string defaults to a folder tree" thing is a bit strange with broccoli, but the API that you're talking about is a bad decision on my part when I first created broccoli-static-site-json 😞 it is unnecessarily confusing and has caused quite a few issues over the years. I guess I'm only human 🤷 🙃 😂

Debugging Broccoli, in case you messed something up, is a nightmare.

I agree, at least until I discovered https://github.com/stefanpenner/broccoli-stew and the log tree/function. This doesn't solve the problem because:

  • back to the education issue, how do you know about this unless you know about this? 🙈
  • the docs on broccoli-stew aren't great 🙃

You need to install an extra dependency and import it.

while this is technically true, merge-trees is used so much in everything that uses broccoli that it's practically fundamental and adding that dependency will likely not actually add a new entry to your package-lock. It's actually so fundamental that we could probably propose adding it to the broccoli lib by default 😂

no option to any library is ever free

If your argumentation is taken with full seriousness, it would seem that no library is worth writing.

that's a bit of a leap I think 😂 saying something is never free doesn't mean that it automatically costs Infinity and therefore would not be worth writing

I believe it should be a matter of comparing the maintainers' costs to users' costs. Most of what you mentioned is a one-time thing: after the feature is implemented, all the documentation updated and all the issues resolved, there is little to no effort required to maintain it.

Well, this is already not true. The "little mistake" that I described earlier in this comment already cost me an @billybonks quite a lot of time in his effort to modernise this repo by properly using broccoli trees in #32

It's quite a well-documented fact that the majority of the cost of software is in the maintenance, I did a quick search there and found plenty of sources (from the last 20 years) estimating that it's somewhere between 60% and 75%. That doesn't mean that we're going to spend hours working on this one feature if we add it, it's my opinion that adding complexity like this to an already complex library will likely slow us down in the future because we need to keep the implementation working while we refactor or change the library to be more efficient.

And that's not even taking into consideration the fact that if we add this config option I believe that it would allow (or actually encourage) people to create invalid JSON:API static endpoints.

But if it's not implemented, users will have to be paying the cost indefinitely: every time someone cares about the payload size, they need to go learning what broccoli trees are and how they work. Provided they come up with this idea in the first place.

I mentioned in my previous post that we have an alternative idea on how to improve the payload size of the collated document, but I'm very convinced that this is an application concern and not something that the low-level library should be doing.

OK, I'm not wasting any more of your time.

I don't see any of this as a waste of time 🎉 one thing that I know I'm lacking is actually writing down the architectural goals on libraries like this one, being able to respond to questions like this really helps to implicitly document the motivation behind this stuff so I appreciate you sticking with it and

@mansona
Copy link
Member

mansona commented Sep 28, 2020

And as for your code example, I would probably have done something closer to:

const StaticSiteJson = require('broccoli-static-site-json');
const broccoliMergeTrees = require('broccoli-merge-trees');

function buildJsonTree(inputFolder, config) {
  let contentFolder = `api/${inputFolder}`
  const blogPostsJsonFrontmatterOnly = new StaticSiteJson(`content/${inputFolder}`, {
    ...blogPostsConfig,
    contentFolder,
    collate: true,
    contentTypes: [],
  });

  const blogPostsJsonFull = new StaticSiteJson(`content/${inputFolder}`, {
    ...blogPostsConfig,
    contentFolder,
    contentTypes: ['html'],
  });

  return broccoliMergeTrees([blogPostsJsonFrontmatterOnly, blogPostsJsonFull], {
    overwrite: true,
  });
}

module.exports = {
  name: 'content',

  treeForPublic() {
    return buildJsonTree('posts', {
      attributes: ['title', 'snippet', 'description', 'date', 'author', 'layout', 'twitter'],
      type: 'blog-post', // JSONAPI resource name
    })
  },
};

that way the function essentially encapsulates the thing that you're trying to achieve here logically 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants