Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lit]: Add build time mangler for lit-html templates #27703

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

fallaciousreasoning
Copy link
Contributor

@fallaciousreasoning fallaciousreasoning commented Feb 18, 2025

Resolves brave/brave-browser#44051

Note: Needs Privacy/Security review
https://github.com/brave/reviews/issues/1872

Background

In Polymer, the HTML is compiled to a HTMLTemplate element which we're free to modify to our hearts desire. Lit makes use of lit-html to generate HTML from tagged template literals, which works super well if you don't need to modify someone else's template.

In brave-core we have a large number of existing HTML overrides for Polymer elements (especially in chrome://settings). Upstream is gradually migrating pages from Polymer to Lit, so we need a strategy to override these pages.

Unfortunately it is non-trivial to override the tagged template literals. They exist roughly in the following form:

interface TemplateResult {
    strings: string[] // array of the literal strings in the template
    values: unknown[] // array of the values to interpolate between the strings. There will be one less of these
}

and modifying these will be both non-trivial and extremely prone to breakage

Rough Approach

In this PR, I've introduced a lit-mangler for mangling the templates at build time. It works roughly like this:

  1. Parse the Typescript syntax tree for upstream's .html.ts file
  2. Extract all the html tagged template literals into a tree (as literal can exist inside literals for ternaries/maps/children ect).
  3. Create a mangler script for the upstream template
  4. Select one of these literals (i.e. by contains id="my-id")
  5. Load the HTML of the template into a DocumentFragment
  6. Let the mangler script modify the fragment using DOM methods (this should make migrating our Polymer templates straightforward).
  7. Reserialize the DocumentFragments to the original script and write them to the preprocess directory

Alternatives Considered

See #27661 for an alternative runtime approach. I think the solution in this PR is much more robust and should also be more performant, as it won't thrash Lit's template cache every render.

Demonstration

As part of this PR, I've rewritten an existing override which broke when upstream migrated their element from Polymer to Lit. I can remove that from this PR, if it makes things easier.

Rough Example:

// in brave/chromium_src/chrome/browser/resources/extensions/detail_view.mangle.html.ts
import { mangle } from 'lit_mangler'

// Note: This will error if we can't find the template to mangle
mangle((element) => {
    const section = element.querySelector('.section-content')
    if (!section) {
        throw new Error('[Brave Extensions Overrides] Could not find .section-content. Has the ID changed?')
    }
    section.textContent = '$i18n{privateInfoWarning}'
    section.append('<span ?hidden=${!this.data.incognitoAccess.isActive}> $i18n{spanningInfoWarning}</span>')
    section.append('<span> $i18n{privateAndTorInfoWarning}</span>')
}, t => t.text.includes('id="allow-incognito"'))

which replaces the old, broken override brave/chromium_src/chrome/browser/resources/extensions/detail_view.ts

import {RegisterPolymerTemplateModifications} from 'chrome://resources/brave/polymer_overriding.js'
import {loadTimeData} from 'chrome://resources/js/load_time_data.js';

RegisterPolymerTemplateModifications({
  'extensions-detail-view': (templateContent: DocumentFragment) => {
    let optionsTemplate: HTMLTemplateElement | null =
      templateContent.querySelector<HTMLTemplateElement>('template[is="dom-if"][if*="shouldShowOptionsSection_"]')
    if (!optionsTemplate) {
      console.error('[Brave Extensions Overrides] Could not find optionsTemplate')
      return
    }
    let incognitoTemplate =
      optionsTemplate.content.querySelector<HTMLTemplateElement>('template[is="dom-if"][if*="shouldShowIncognitoOption_"]')
    if (!incognitoTemplate) {
      console.error('[Brave Extensions Overrides] Could not find incognitoTemplate')
      return
    }
    let incognitoWarningDiv = incognitoTemplate.content.querySelector<HTMLDivElement>('.section-content')
    if (!incognitoWarningDiv) {
      console.error('[Brave Extensions Overrides] Could not find incognitoWarningDiv')
      return
    }
    incognitoWarningDiv.innerText = loadTimeData.getString('privateInfoWarning')
    const spanningWarningSpan = document.createElement('span')
    spanningWarningSpan.setAttribute('class', 'section-content')
    spanningWarningSpan.setAttribute('hidden', '[[data.isSplitMode]]')
    spanningWarningSpan.innerText =
        ' ' + loadTimeData.getString('spanningInfoWarning')
    const privateAndTorWarningSpan = document.createElement('span')
    privateAndTorWarningSpan.setAttribute('class', 'section-content')
    privateAndTorWarningSpan.innerText =
        ' ' + loadTimeData.getString('privateAndTorInfoWarning')
    incognitoWarningDiv.appendChild(spanningWarningSpan)
    incognitoWarningDiv.appendChild(privateAndTorWarningSpan)
  }
})

export * from './detail_view-chromium.js'

Submitter Checklist:

  • I confirm that no security/privacy review is needed and no other type of reviews are needed, or that I have requested them
  • There is a ticket for my issue
  • Used Github auto-closing keywords in the PR description above
  • Wrote a good PR/commit description
  • Squashed any review feedback or "fixup" commits before merge, so that history is a record of what happened in the repo, not your PR
  • Added appropriate labels (QA/Yes or QA/No; release-notes/include or release-notes/exclude; OS/...) to the associated issue
  • Checked the PR locally:
    • npm run test -- brave_browser_tests, npm run test -- brave_unit_tests wiki
    • npm run presubmit wiki, npm run gn_check, npm run tslint
  • Ran git rebase master (if needed)

Reviewer Checklist:

  • A security review is not needed, or a link to one is included in the PR description
  • New files have MPL-2.0 license header
  • Adequate test coverage exists to prevent regressions
  • Major classes, functions and non-trivial code blocks are well-commented
  • Changes in component dependencies are properly reflected in gn
  • Code follows the style guide
  • Test plan is specified in PR before merging

After-merge Checklist:

Test Plan:

@github-actions github-actions bot added the CI/storybook-url Deploy storybook and provide a unique URL for each build label Feb 18, 2025

# Note: We read from and write to the preprocess file - this way any
# preprocessing that upstream does will be mangled.
subprocess.run(['npx', 'tsx', '--tsconfig', '../../brave/tsconfig.json', '../../brave/build/commands/scripts/lit_mangler.ts', 'mangle', '-m', mangler_file, '-i', preprocess_file, '-o', preprocess_file])
Copy link
Contributor Author

@fallaciousreasoning fallaciousreasoning Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @thypon this seems like it'd be a good candidate for a semgrep rule - you shouldn't be able to execute npx without the --no flag to prevent installing the dependency

@brave-builds
Copy link
Collaborator

A Storybook has been deployed to preview UI for the latest push

@github-actions github-actions bot added the CI/run-audit-deps Check for known npm/cargo vulnerabilities (audit_deps) label Feb 18, 2025
@fallaciousreasoning fallaciousreasoning marked this pull request as ready for review February 18, 2025 02:18
@fallaciousreasoning fallaciousreasoning requested review from a team as code owners February 18, 2025 02:18
const world = new JSDOM()
const document = world.window.document
const element = document.createElement('template')
element.innerHTML = template.text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thypon can you review this please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is server side, and build time, so looks fine.
It is also not "executed" as a site, since in JSDOM.

Are we sure we want to use JSDOM here and not a chromium-headless to interpret the DOM?
These two DOMs are not equivalent.

@diracdeltas diracdeltas requested a review from thypon February 18, 2025 04:13
@goodov
Copy link
Member

goodov commented Feb 18, 2025

Okay, the approach is interesting, but we can improve things here:

  1. The placement of the script is a bit questionable. This is not a npm run ... command we usually put into build/commands. In my opinion the mangler should go into //brave/tools/chromium_src/lit_mangler.

  2. Consider splitting the mangler into a library and cli files to remove the awkward process.argv check.

  3. (most important) You wired things to run this TS tool without any type checking during all (including Release) browser builds. This is not cool, we need to have type checking in some form here. I see few possible options:

    • Figure a way to run tsc --noEmit on the tool sources on CI.
    • Figure a way to compile .ts into .js during the build and call .js version directly. It may be possible to have this as a GN build dependency to make sure .js is generated before any preprocess_if_expr.py tries to use it.
  4. npx should be avoided during the build as it's a pretty slow thing. Pls use something like brave_node.py to execute the required .js file directly.

Copy link
Contributor

Chromium major version is behind target branch (133.0.6943.98 vs 134.0.6998.15). Please rebase.

@github-actions github-actions bot added the chromium-version-mismatch The Chromium version on the PR branch does not match the version on the target branch label Feb 18, 2025
Copy link
Member

@petemill petemill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's exciting to think about doing this at compile time instead of run time. For modifying the lit component itself, we'd still use a combination of src_overrides and lit_overriding.ts - though perhaps the styles could be injected using this technique instead?

Using tsx seems ok because we don't force a bundled node version, so it just uses local node script same as our brave_node.py or npm run ... does.

Comment on lines 47 to 63
def run_mangler(mangler_file, preprocess_file):
"""Runs the mangler on the given file"""
import subprocess

ts_config = os.path.join(get_src_dir(), 'brave', 'tsconfig.json')
lit_mangler = os.path.join(get_src_dir(), 'brave', 'build', 'commands',
'scripts', 'lit_mangler.ts')

# Note: We read from and write to the preprocess file - this way any
# preprocessing that upstream does will be mangled.
subprocess.run([
'npx', '--no', '--', 'tsx', '--tsconfig', ts_config, lit_mangler,
'mangle', '-m', mangler_file, '-i', preprocess_file, '-o',
preprocess_file
],
check=True)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like injecting the mangle script run after preprocess is the simplest and most universal place. And if we want to type check we could run tsc here. Even though it's going to run separately for each file, it should be fast?
But it seems like we could do this at a higher level as part of the build_webui GN template after the preprocess stage, since all lit .html.ts files will run through that target. Not sure if doing it in GN will help with any issues that come up in review.
Also, if we're having any performance issues perhaps we could run against all the files in the directory or whatever list preprocess_if_expr has. But you'd need to turn the module into a class or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure what the best solution here is - at least with this, it will only be rerun when the target the manglers exist in change, whereas if this was part of build_webui it would have to run whenever any of the manglers changed, which I'm not sure would be better.

Additionally, it would make ordering quite difficult as this needs to run after the Chromium files are preprocessed (so it can run over the preprocessed files and doesn't get overwritten) but before the Typescript build happens.


mangler(element.content)

template.text = element.innerHTML
Copy link
Member

@thypon thypon Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not look right. Why do we need to de-escape escaped entities?
Shouldn't be possible to already get the unescaped elements anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is because there's an encode which happens when parsing to HTML and we want things to roundtrip nicely:

// the string
"<div>${"<foo&"}</div>"

// is parsed to
<div>${&quot;&lt;foo&amp;&quot}</div>

// which should roundtrip back to the string
"<div>${"<foo&"}</div>"

This is because its valid to have html which is really just a string inside the interpolated content (in the real world it gets replaced with the result of parsing to JS). However, we want the raw string from inside the interpolation so we can edit it.

@github-actions github-actions bot removed the chromium-version-mismatch The Chromium version on the PR branch does not match the version on the target branch label Feb 20, 2025
Copy link
Contributor

[puLL-Merge] - brave/brave-core@27703

Description

This PR introduces a new system for modifying Chromium's web components using "lit-html" templates. It replaces the old Polymer-based overrides with a build-time mangling approach that modifies the templates during compilation. The main motivation appears to be modernizing the override system to work with lit-html templates instead of Polymer.

Changes

Changes

  1. .eslintrc.js:
  • Disables the no-template-curly-in-string rule
  1. PRESUBMIT.py:
  • Adds TypeScript build files compilation check
  • Updates ESLint configuration to skip lit mangler files
  1. build/commands/lib/util.js:
  • Adds support for .mangle.html.ts files that modify upstream .html.ts files
  1. New files:
  • lit_mangler.ts: Core mangling logic to modify lit-html templates
  • lit_mangler_cli.ts: CLI interface for the mangler
  • lit_mangler.test.ts: Test suite for the mangler
  • tsconfig-mangle.json: TypeScript configuration for mangling
  • detail_view.mangle.html.ts: Example of using the new mangling system
  1. Removed files:
  • detail_view.ts: Old Polymer-based override implementation
sequenceDiagram
    participant Build as Build Process
    participant Mangler as Lit Mangler
    participant DOM as JSDOM
    participant TS as TypeScript Compiler
    
    Build->>Mangler: Load template file
    Mangler->>Mangler: Parse lit-html templates
    Mangler->>DOM: Create virtual DOM
    Mangler->>DOM: Apply template modifications
    DOM->>Mangler: Return modified HTML
    Mangler->>TS: Type check modifications
    TS->>Mangler: Validation result
    Mangler->>Build: Write modified template
Loading

Security Hotspots

  1. Template Injection: The mangler directly manipulates HTML templates and performs string replacements. Special attention should be paid to ensure proper escaping of user input and template literals.

  2. Build Script Execution: Running TypeScript files during build with tsx could potentially execute malicious code if third-party files are compromised.

Possible Issues

  1. Performance Impact: The mangling process adds additional build steps and requires parsing/modifying templates at build time.

  2. Maintenance: The system relies on parsing specific HTML structures - if Chromium changes its template formats, the mangler may need updates.

  3. Debugging Complexity: Build-time modifications may make it harder to debug template issues since the source code differs from what's actually running.

.replaceAll('"', '&quot;')
.replaceAll('<', '&lt;')
.replaceAll('>', '&gt;')
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] User controlled data in methods like innerHTML, outerHTML or document.write is an anti-pattern that can lead to XSS vulnerabilities

Source: https://semgrep.dev/r/javascript.browser.security.insecure-document-method.insecure-document-method


Cc @thypon @kdenhartog

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is at build time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might still introduce issues, since we might demangle a safety mangled element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be safe because & is the first thing we encode so all HTML encoded entities should be converted and &amp; is the last thing we decode.

&lt;script&gt;alert('hahaha')&lt;script&gt; ==> &amp;lt;script&amp;gt;alert('haha')&amp;lt;script&amp;gt; ==> &lt;script&gt;alert('hahaha')&lt;script&gt;

// in the upstream file:
html`<div>Hi from Jay &amp; Andrea</div>`

// mangled
<div>Hi from Jay &amp;amp; Andrea</div>`
// we replaced all `&`s with &amp;

// output
html`<div>Hi from Jay &amp; Andrea</div>`
// we replaced all `&amp;`s with `&`

// `foo="${bar ? &quot;one&quot; : &quot;two&quot;}"`
.replaceAll(/\$\{.*?\}/gi, (...args) => {
return args[0]
.replaceAll('&', '&amp;') // Note: &amp; needs to be first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected a call to replaceAll() in an attempt to HTML escape the string args[0]. Manually sanitizing input through a manually built list can be circumvented in many situations, and it's better to use a well known sanitization library such as sanitize-html or DOMPurify.

Source: https://semgrep.dev/r/javascript.audit.detect-replaceall-sanitization.detect-replaceall-sanitization


Cc @thypon @kdenhartog

.replaceAll(/\$\{.*?\}/gi, (...args) => {
return args[0]
.replaceAll('&', '&amp;') // Note: &amp; needs to be first
.replaceAll('"', '&quot;')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected a call to replaceAll() in an attempt to HTML escape the string args[0]<br/> .replaceAll('&', '&amp;'). Manually sanitizing input through a manually built list can be circumvented in many situations, and it's better to use a well known sanitization library such as sanitize-html or DOMPurify.

Source: https://semgrep.dev/r/javascript.audit.detect-replaceall-sanitization.detect-replaceall-sanitization


Cc @thypon @kdenhartog

return args[0]
.replaceAll('&', '&amp;') // Note: &amp; needs to be first
.replaceAll('"', '&quot;')
.replaceAll('<', '&lt;')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected a call to replaceAll() in an attempt to HTML escape the string args[0]<br/> .replaceAll('&', '&amp;') // Note: &amp; needs to be first<br/> .replaceAll('"', '&quot;'). Manually sanitizing input through a manually built list can be circumvented in many situations, and it's better to use a well known sanitization library such as sanitize-html or DOMPurify.

Source: https://semgrep.dev/r/javascript.audit.detect-replaceall-sanitization.detect-replaceall-sanitization


Cc @thypon @kdenhartog

.replaceAll('&', '&amp;') // Note: &amp; needs to be first
.replaceAll('"', '&quot;')
.replaceAll('<', '&lt;')
.replaceAll('>', '&gt;')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected a call to replaceAll() in an attempt to HTML escape the string args[0]<br/> .replaceAll('&', '&amp;') // Note: &amp; needs to be first<br/> .replaceAll('"', '&quot;')<br/> .replaceAll('<', '&lt;'). Manually sanitizing input through a manually built list can be circumvented in many situations, and it's better to use a well known sanitization library such as sanitize-html or DOMPurify.

Source: https://semgrep.dev/r/javascript.audit.detect-replaceall-sanitization.detect-replaceall-sanitization


Cc @thypon @kdenhartog

@@ -44,17 +44,42 @@ def maybe_keep_upstream_version(override_in_folder, out_folder, override_file):
return None


def run_mangler(mangler_file, preprocess_file):
"""Runs the mangler on the given file"""
import subprocess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Consider possible security implications associated with subprocess module.


Source: https://semgrep.dev/r/gitlab.bandit.B404


Cc @thypon @kdenhartog


// Write the tsconfig to a temporary file.
const tempDir = os.tmpdir()
const tempTsConfigPath = path.join(tempDir, `${path.basename(file)}-tsconfig.json`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported by reviewdog 🐶
[semgrep] Detected possible user input going into a path.join or path.resolve function. This could possibly lead to a path traversal vulnerability, where the attacker can access arbitrary files stored in the file system. Instead, be sure to sanitize or validate user input first.

Source: https://semgrep.dev/r/javascript.lang.security.audit.path-traversal.path-join-resolve-traversal.path-join-resolve-traversal


Cc @thypon @kdenhartog

@fallaciousreasoning
Copy link
Contributor Author

  1. The placement of the script is a bit questionable. This is not a npm run ... command we usually put into build/commands. In my opinion the mangler should go into //brave/tools/chromium_src/lit_mangler.

That's a much better location. Moved it!

  1. Consider splitting the mangler into a library and cli files to remove the awkward process.argv check.

I actually did this originally but moved them into one 😆 Split them again.

  1. (most important) You wired things to run this TS tool without any type checking during all (including Release) browser builds. This is not cool, we need to have type checking in some form here. I see few possible options

I've gone with a two pronged approach:

  1. Run tsc --noEmit on each mangler when we run it. This unfortunately involves creating a temporary tsconfig file because tsc doesn't support path maps as command line args.
  2. Run tsc -p tsconfig-mangle.json as part of presubmit (which ensures lit_mangler_cli is checked).
  1. npx should be avoided during the build as it's a pretty slow thing. Pls use something like brave_node.py to execute the required .js file directly.

Calling via brave_node.py now 😄

@thypon
Copy link
Member

thypon commented Feb 20, 2025

@fallaciousreasoning the demangling looks still somewhat problematic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/run-audit-deps Check for known npm/cargo vulnerabilities (audit_deps) CI/storybook-url Deploy storybook and provide a unique URL for each build needs-security-review puLL-Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Lit]: It should be possible to override/modify upstreams lit-html templates
7 participants