SemHub

About

We were not satisfied with the default search experience for GitHub issues and wanted to see whether semantic search powered by embeddings would perform better. We are open-sourcing this project to share our attempt with the community.

This is an experimental project by Coder.com.

Development

To develop using this repo, make sure you have installed the following:

Bun
SST

Monorepo

core/

This is for any shared code.

workers/

This is for your Cloudflare Workers. It uses the core package as a local dependency.

scripts/

This is for any scripts that you can run on your SST app using the sst shell CLI.

wrangler/

This is for Cloudflare resources that are deployed via wrangler. We use this for Cloudflare resources that cannot be deployed via Pulumi/SST. wrangler also provides more configurability.
- We use Cloudflare Workflows to orchestrate the sync process. See the README for more details.

The infra/ directory allows you to logically split the infrastructure of your app into separate files. This can be helpful as your app grows.

Environment variables

You need the following environment variables (see .env.example) and secrets (see .secrets.example):

CLOUDFLARE_ACCOUNT_ID: Your Cloudflare account ID. (may not be 100% necessary)
CLOUDFLARE_API_TOKEN: Cloudflare API token to deploy Cloudflare workers and manage DNS.

We currently also use AWS to deploy the frontend, but this is temporary and will be replaced by Cloudflare in the future.

Secrets

Copy .secrets.example to .secrets and .env.example to .env, then fill in the values above. To load the secrets into SST, run bun secret:load.

Mobile

To test on mobile, use Ngrok to create a tunnel to your local frontend:

ngrok http 3001

Auth and cookies on local development

For auth to work in local development, there is a bit of rigmarole because the frontend runs locally while the API server is on a .semhub.dev domain. To be able to set cookies, you need to:

Edit your /etc/hosts file to add a new entry for local.semhub.dev that points to 127.0.0.1
Install and set up mkcert:
```
brew install mkcert
mkcert -install
```
Generate the local certificates:
```
mkcert local.semhub.dev
```
This will create two files: local.semhub.dev-key.pem and local.semhub.dev.pem

If you look at vite.config.ts, you will see that we reference these certificates to provide HTTPS for local development.
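
As a rough illustration of how the certificates could be wired in, here is a minimal sketch. It is not the repo's actual vite.config.ts: the helper name buildDevServerOptions and the injected readFile parameter are hypothetical, the certificate paths assume the files sit next to the config, and port 3001 is taken from the ngrok command above.

```typescript
// Hypothetical helper (not from the repo): builds the `server` section
// of a Vite config. `readFile` is injected so the sketch has no
// file-system dependency of its own.
type ReadFile = (path: string) => string;

interface DevServerOptions {
  host: string;
  port: number;
  https: { key: string; cert: string };
}

function buildDevServerOptions(readFile: ReadFile): DevServerOptions {
  return {
    // Must match the /etc/hosts entry so cookies for .semhub.dev apply.
    host: "local.semhub.dev",
    port: 3001,
    https: {
      // Assumed locations of the two files produced by mkcert.
      key: readFile("./local.semhub.dev-key.pem"),
      cert: readFile("./local.semhub.dev.pem"),
    },
  };
}
```

In a real config, the returned object would be passed as the server field of Vite's defineConfig, with readFile backed by something like fs.readFileSync(path, "utf8").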

OAuth

We chose to use a GitHub App instead of an OAuth App for several reasons (more granular control, scaling with the number of users, etc.). We use separate GitHub Apps for dev and prod (the production one lives in the coder organization).

To set up a GitHub App:

Register a GitHub App (the dev one can live in your personal account; the prod one is in the coder organization)
- Permissions:
  - Select the following read-only Repository permissions: Metadata (mandatory), Discussions, Issues, Pull Requests, Contents. (These should be tracked in code via github-app.ts.)
  - Select the following read-only User permissions: Emails (strictly speaking, we would already have the user's email from the login process)
  - Select the following read-only Organization permissions: Members (so that SemHub works for users in the same organization after an admin has installed it)
- Leave the box "Request user authorization (OAuth) during installation" unchecked. Our app handles user login and creation.
- Select "Redirect on update" and use the frontend /repos page as the Setup URL
  - Local dev: https://local.semhub.dev:3001/repos
  - Prod: https://semhub.dev/repos
- Callback URL is: https://auth.[stage].stg.semhub.dev/github-login/callback (see packages/workers/src/auth/auth.constant.ts)
- Webhook URL is: https://api.[stage].stg.semhub.dev/api/webhook/github. The webhook secret is automatically generated by SST and can be revealed by modifying the outputs in infra/Secret.ts. Installation events are sent to this webhook automatically, so there is no need to subscribe manually. Unlike the callback URL, there can only be one webhook URL per app.
Generate and save the private key. Note that the default format downloaded from GitHub is PKCS#1, but Octokit uses PKCS#8. You can convert the key using OpenSSL: openssl pkcs8 -topk8 -inform PEM -in private-key.pem -outform PEM -out private-key-pkcs8.pem -nocrypt.
Create a GitHub Client ID and Secret and load them into the .secrets.dev file
Go to Optional features and uncheck "User-to-server token expiration"
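
As an alternative to the OpenSSL command in the private key step, the same PKCS#1 to PKCS#8 conversion can be sketched with Node's built-in crypto module. This is an illustration only, assuming a Node-compatible runtime; the function names toPkcs8Pem and demoConversion are hypothetical and not part of the repo.

```typescript
import { createPrivateKey, generateKeyPairSync } from "node:crypto";

// Hypothetical helper: re-encodes a PEM private key (for example PKCS#1,
// "BEGIN RSA PRIVATE KEY") as unencrypted PKCS#8 PEM ("BEGIN PRIVATE KEY").
function toPkcs8Pem(pem: string): string {
  const key = createPrivateKey({ key: pem, format: "pem" });
  return key.export({ type: "pkcs8", format: "pem" }).toString();
}

// Usage sketch: generate a throwaway PKCS#1 key (standing in for the one
// downloaded from GitHub) and convert it.
function demoConversion(): string {
  const { privateKey } = generateKeyPairSync("rsa", {
    modulusLength: 2048,
    publicKeyEncoding: { type: "spki", format: "pem" },
    privateKeyEncoding: { type: "pkcs1", format: "pem" },
  });
  return toPkcs8Pem(privateKey);
}
```

Converting once and storing the PKCS#8 key in the secrets file, as the step above describes, avoids doing this at runtime.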

Note that when you use a GitHub App on a personal account, the warning message on the authorization page is misleading. See this thread.
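
On the webhook side, GitHub signs each delivery with the webhook secret and sends the result in the X-Hub-Signature-256 header as "sha256=" followed by the hex HMAC-SHA256 of the raw request body. The sketch below shows one way a handler could check that header. It is not the repo's implementation: the function names are hypothetical, and it assumes a runtime that provides node:crypto and Buffer.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the value expected in the X-Hub-Signature-256 header.
function signPayload(secret: string, body: string): string {
  return "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
}

// Returns true when the header matches the HMAC of the raw request body.
function verifySignature(secret: string, body: string, header: string): boolean {
  const expected = Buffer.from(signPayload(secret, body));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The check must run against the raw body as received, before any JSON parsing, since re-serializing the payload can change the bytes.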

Deployment

Right now, deployment is manual. Eventually, we will set up GitHub Actions to automate this.

Deploying to new environment

To deploy a given change to a new environment:

Load secrets. From the root folder, run bun secret:load:<env>.
Run sst deploy --stage <env> first to create state in SST. This first run is expected to fail.
Deploy Cloudflare resources. From /packages/wrangler, run bun run deploy:all:<env>.
Run database migrations for the target environment. From the core folder, run: bun db:migrate:<env>.
Deploy SST resources again by re-running sst deploy --stage <env>. This time it should succeed.

Should probably set up a script to do this automatically as part of CI/CD.
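
A possible starting point for such a script is sketched below. It only encodes the five steps above as data plus a small runner. The names DeployStep, deploySteps and runSteps are hypothetical, the packages/core path for the core folder is an assumption, and the actual command execution is left to an injected run function (for example, a wrapper around a child process).

```typescript
// One step of the manual deployment sequence.
interface DeployStep {
  description: string;
  cwd: string; // directory to run from, relative to the repo root
  command: string;
  mayFail: boolean; // true only for the first `sst deploy`, which is expected to fail
}

function deploySteps(env: string): DeployStep[] {
  const sstDeploy = `sst deploy --stage ${env}`;
  return [
    { description: "Load secrets", cwd: ".", command: `bun secret:load:${env}`, mayFail: false },
    { description: "Create SST state", cwd: ".", command: sstDeploy, mayFail: true },
    { description: "Deploy Cloudflare resources", cwd: "packages/wrangler", command: `bun run deploy:all:${env}`, mayFail: false },
    // Assumption: the "core folder" is packages/core.
    { description: "Run database migrations", cwd: "packages/core", command: `bun db:migrate:${env}`, mayFail: false },
    { description: "Deploy SST resources again", cwd: ".", command: sstDeploy, mayFail: false },
  ];
}

// Runs the steps in order and stops at the first unexpected failure.
// `run` returns true when the step's command succeeded.
function runSteps(steps: DeployStep[], run: (step: DeployStep) => boolean): boolean {
  for (const step of steps) {
    const ok = run(step);
    if (!ok && !step.mayFail) return false;
  }
  return true;
}
```

Keeping the step list separate from the runner makes it easy to print a dry run before executing anything against a real environment.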

Todos

Deal with users who install our GitHub App without creating an account first.
The current codebase assumes that a repo's private/public status and a user's org membership are static. We need to account for changes. (Currently, we query membership when a subscription is made. We should either receive a webhook or re-query regularly so that users who have left an org lose access to its private repos.)
Need to account for whether a no_issues repo ever gets issues, e.g. by running a daily cron to check.

Known issues

When bulk inserting using Drizzle, make sure that the array passed to values() is not empty. This is why there are various checks that either return early if the array is empty or make such insertions conditional. If we accidentally pass an empty array, an error is thrown, disrupting the control flow. TODO: enforce this with ESLint?
We need some way to deal with error logging. Logging for SST-deployed workers is off by default (it can be turned on via the console, but it will be overridden at the next update). At scale, we will need to set something up so that we are informed of unknown errors.
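
For the empty-array issue with values() above, one way to centralize the check is a small guard like the sketch below. It is a hypothetical helper, not code from the repo: the names NonEmptyArray, isNonEmpty and insertIfNotEmpty are made up for illustration, and the actual Drizzle call is passed in as a callback so the sketch stays independent of any schema.

```typescript
// A tuple type that guarantees at least one element.
type NonEmptyArray<T> = [T, ...T[]];

// Type guard: narrows a plain array to a non-empty one.
function isNonEmpty<T>(rows: T[]): rows is NonEmptyArray<T> {
  return rows.length > 0;
}

// Calls `insert` only when there is at least one row; otherwise returns
// undefined without touching the database.
function insertIfNotEmpty<T, R>(
  rows: T[],
  insert: (rows: NonEmptyArray<T>) => R,
): R | undefined {
  return isNonEmpty(rows) ? insert(rows) : undefined;
}
```

A call site might then look like insertIfNotEmpty(newIssues, (rows) => db.insert(issues).values(rows)), where db, issues and newIssues are placeholder names. Because the callback only accepts a NonEmptyArray, the compiler helps keep unguarded arrays out of the insert.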
