Commit a970098

feat: open source SemHub
0 parents  commit a970098

358 files changed: +135424 -0 lines changed

.env.example

+2
@@ -0,0 +1,2 @@
CLOUDFLARE_API_TOKEN=
CLOUDFLARE_ACCOUNT_ID=

.github/workflows/ci.yml

+25
@@ -0,0 +1,25 @@
name: Run checks

on:
  pull_request:
    branches: ["*"]
  push:
    branches: ["main"]

jobs:
  checks:
    runs-on: ubuntu-latest
    env:
      NODE_OPTIONS: "--max-old-space-size=8192"
    steps:
      - uses: actions/checkout@v4

      - uses: oven-sh/setup-bun@v2
        with:
          bun-version: latest

      - name: Install dependencies
        run: bun install

      - name: Run all checks
        run: bun check:all

.gitignore

+33
@@ -0,0 +1,33 @@
# dependencies
node_modules

# sst
.sst

# secrets for loading onto sst
.secrets.*
!.secrets.example

# tmp files
.#*

# env
.env*.local
.env

# opennext
.open-next

# misc
.DS_Store

# tsbuildinfo
*.tsbuildinfo

# built JS files
**/dist/
**/build/

# wrangler
.wrangler
.dev.vars

.prettierignore

+6
@@ -0,0 +1,6 @@
dist
build
node_modules
routeTree.gen.ts
sst-env.d.ts
schema.docs.graphql.d.ts

.secrets.example

+6
@@ -0,0 +1,6 @@
DATABASE_URL=
GITHUB_PERSONAL_ACCESS_TOKEN=
OPENAI_API_KEY=
RESEND_API_KEY=
SEMHUB_GITHUB_APP_CLIENT_ID=
SEMHUB_GITHUB_APP_CLIENT_SECRET=

.vscode/settings.json

+5
@@ -0,0 +1,5 @@
{
  "typescript.tsdk": "node_modules/typescript/lib",
  "typescript.enablePromptUseWorkspaceTsdk": true,
  "typescript.preferences.autoImportFileExcludePatterns": ["lucide-react"]
}

README.md

+135
@@ -0,0 +1,135 @@
# SemHub

## About

We were not satisfied with the default search experience for GitHub issues and wanted to see if a semantic search powered by embeddings would perform better. We are open-sourcing this project to share our attempt with the community. For further details:

- [essay](https://tzx.notion.site/What-I-Learned-Building-a-Free-Semantic-Search-Tool-for-GitHub-and-Why-I-Failed-1a09b742c7918033b318f3a5d7dc9751)
- [HN discussion](https://news.ycombinator.com/item?id=43299659)
- [X thread](https://x.com/zxt_tzx/status/1896731663801131180)

This is an experimental project by [Coder.com](https://coder.com).

## Development

To develop using this repo, make sure you have installed the following:

- [Bun](https://bun.sh/docs/installation)
- [SST](https://sst.dev)

### Monorepo

1. `core/`

   This is for any shared code.

2. `workers/`

   This is for your Cloudflare Workers and it uses the `core` package as a local
   dependency.

3. `scripts/`

   This is for any scripts that you can run on your SST app using the
   `sst shell` CLI.

4. `wrangler/`

   This is for Cloudflare resources that are deployed via `wrangler`. We use this for Cloudflare resources that cannot be deployed via Pulumi/SST. `wrangler` also provides more configurability.

   - We use Cloudflare Workflows to orchestrate the sync process. See [the README](./packages/wrangler/README.md) for more details.

The `infra/` directory allows you to logically split the infrastructure of your app into separate files. This can be helpful as your app grows.

### Environment variables

You need the following environment variables (see `.env.example`) and secrets (see `.secrets.example`):

- `CLOUDFLARE_ACCOUNT_ID`: Your Cloudflare [account ID](https://developers.cloudflare.com/fundamentals/setup/find-account-and-zone-ids/). (This may not be strictly necessary.)
- `CLOUDFLARE_API_TOKEN`: Cloudflare API token used to deploy Cloudflare Workers and manage DNS.

We currently also use AWS to deploy the frontend, but this is temporary and will be replaced by Cloudflare in the future.

### Secrets

Make a copy of `.secrets.example` named `.secrets` and a copy of `.env.example` named `.env`, then fill in the values above. To load the secrets into SST, run `bun secret:load`.

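A small pre-flight check can catch missing values before you run `bun secret:load`. The sketch below is not part of the repo: it assumes plain `KEY=value` files, and the helper names `parseDotenv` and `missingKeys` are hypothetical. The key names are taken from `.env.example` and `.secrets.example`.

```typescript
// Hypothetical pre-flight helper; not part of the SemHub codebase.
// Assumes plain KEY=value lines; "#" comments and blank lines are ignored.

const ENV_KEYS = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"];
const SECRET_KEYS = [
  "DATABASE_URL",
  "GITHUB_PERSONAL_ACCESS_TOKEN",
  "OPENAI_API_KEY",
  "RESEND_API_KEY",
  "SEMHUB_GITHUB_APP_CLIENT_ID",
  "SEMHUB_GITHUB_APP_CLIENT_SECRET",
];

// Parse the text of a dotenv-style file into a key/value map.
function parseDotenv(text: string): Map<string, string> {
  const values = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines that have no key before "="
    values.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return values;
}

// Return the required keys that are absent or left empty.
function missingKeys(text: string, required: string[]): string[] {
  const values = parseDotenv(text);
  return required.filter((key) => (values.get(key) ?? "") === "");
}
```

For instance, reading `.secrets` as text and calling `missingKeys(text, SECRET_KEYS)` would list any secrets that still need a value.
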
### Mobile

To test on mobile, use ngrok to create a tunnel to your local frontend:

```zsh
ngrok http 3001
```

### Auth and cookies on local development

Auth in local development takes some extra setup because the frontend runs locally while the API server is on a `.semhub.dev` domain. To be able to set cookies, you need to:

1. Edit your `/etc/hosts` file to add a new entry for `local.semhub.dev` that points to `127.0.0.1`
2. Install and set up mkcert:

   ```bash
   brew install mkcert
   mkcert -install
   ```

3. Generate the local certificates:

   ```bash
   mkcert local.semhub.dev
   ```

This will create two files: `local.semhub.dev-key.pem` and `local.semhub.dev.pem`

If you look at `vite.config.ts`, you will see that we reference these certificates to provide HTTPS for local development.

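To make the certificate wiring concrete, here is a minimal sketch of what the `server` section of a Vite config could look like. It is not copied from the repo's `vite.config.ts`: the `localDevServer` helper, the `certDir` parameter, and the choice to set `host` explicitly are assumptions. The certificate file names and port 3001 come from this README.

```typescript
import { readFileSync } from "node:fs";

// File reader; injectable so the sketch can be exercised without real certificates.
type ReadFile = (path: string) => string | Buffer;

// Build the `server` section of a Vite config for HTTPS on local.semhub.dev.
// `certDir` (where the mkcert output lives) is an assumption.
function localDevServer(certDir: string, read: ReadFile = (p) => readFileSync(p)) {
  return {
    host: "local.semhub.dev", // must resolve to 127.0.0.1 via /etc/hosts
    port: 3001,
    https: {
      key: read(`${certDir}/local.semhub.dev-key.pem`),
      cert: read(`${certDir}/local.semhub.dev.pem`),
    },
  };
}

// In a real vite.config.ts the result would be handed to Vite, e.g.
//   export default defineConfig({ server: localDevServer(".") });
```

With this in place the frontend would be served at `https://local.semhub.dev:3001`, which matches the local Setup URL used for the GitHub App.
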
### OAuth

We use a GitHub App (instead of an OAuth App) for [these reasons](https://docs.github.com/en/apps/oauth-apps/building-oauth-apps/differences-between-github-apps-and-oauth-apps) (more granular control, scale with number of users, etc.). For dev vs prod, we use separate GitHub Apps (the production one lives within the `coder` organization).

To set up a GitHub App:

- [Register a GitHub App](https://docs.github.com/en/apps/creating-github-apps/registering-a-github-app/registering-a-github-app) (the dev one can be within your personal account; the [prod one](https://github.com/organizations/coder/settings/apps/coder-semhub) is within the `coder` organization)
- In terms of permissions:
  - Select the following read-only Repository permissions: Metadata (mandatory), Discussions, Issues, Pull Requests, Contents. (These should be tracked in code via `github-app.ts`.)
  - Select the following read-only User permissions: Emails (though we would already have the user's email from the login process)
  - Select the following read-only Organization permissions: Members (to enable SemHub to work for users in the same organization after it has been installed by an admin)
- Leave unchecked the box that says "Request user authorization (OAuth) during installation". Our app handles user login + creation.
- Select redirect on update and use the frontend `/repos` page as the Setup URL
  - Local dev: `https://local.semhub.dev:3001/repos`
  - Prod: `https://semhub.dev/repos`
- Callback URL is: `https://auth.[stage].stg.semhub.dev/github-login/callback` (see `packages/workers/src/auth/auth.constant.ts`)
- Webhook URL is: `https://api.[stage].stg.semhub.dev/api/webhook/github`. The webhook secret is automatically generated by SST and can be revealed by modifying `outputs` in `infra/Secret.ts`. Installation events are automatically sent to this webhook, so there is no need to subscribe manually. See [the installation event docs](https://docs.github.com/en/webhooks/webhook-events-and-payloads#installation). Unlike the callback URL, there can only be one webhook URL per app.
- Generate and save the private key. NB: the default format downloaded from GitHub is PKCS#1, but Octokit uses PKCS#8. You can convert the key using OpenSSL: `openssl pkcs8 -topk8 -inform PEM -in private-key.pem -outform PEM -out private-key-pkcs8.pem -nocrypt`.
- Create a GitHub Client ID and Secret and load them into the `.secrets.dev` file
- Go to Optional features and uncheck "User-to-server token expiration"

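If OpenSSL is not at hand, the PKCS#1 to PKCS#8 conversion from the list above can also be done with Node's built-in `crypto` module. This is a sketch of an alternative, not how the repo does it; the `toPkcs8Pem` helper is hypothetical, and the demo uses a throwaway RSA key rather than a real GitHub App key.

```typescript
import { createPrivateKey, generateKeyPairSync } from "node:crypto";

// Convert a PEM private key (e.g. the PKCS#1 file downloaded from GitHub)
// into an unencrypted PKCS#8 PEM string.
function toPkcs8Pem(pem: string): string {
  const key = createPrivateKey({ key: pem, format: "pem" });
  return key.export({ type: "pkcs8", format: "pem" }).toString();
}

// Demo with a throwaway RSA key so the sketch runs without a real GitHub key.
const { privateKey: pkcs1Pem } = generateKeyPairSync("rsa", {
  modulusLength: 2048,
  publicKeyEncoding: { type: "spki", format: "pem" },
  privateKeyEncoding: { type: "pkcs1", format: "pem" },
});
const pkcs8Pem = toPkcs8Pem(pkcs1Pem);

console.log(pkcs1Pem.split("\n")[0]); // -----BEGIN RSA PRIVATE KEY-----
console.log(pkcs8Pem.split("\n")[0]); // -----BEGIN PRIVATE KEY-----
```

The PEM header is a quick way to tell the two formats apart: PKCS#1 keys start with `BEGIN RSA PRIVATE KEY`, PKCS#8 keys with `BEGIN PRIVATE KEY`.
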
Note that when you use a GitHub App on a personal account, the warning message on the authorization page is misleading. See [this thread](https://github.com/orgs/community/discussions/37117).

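The webhook secret mentioned in the setup list is what GitHub uses to sign each delivery: the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw request body. The sketch below shows one way to check that header. It is an illustration only; the `isValidGithubSignature` helper is hypothetical, and it uses Node's `crypto` for brevity, whereas the actual worker may verify signatures differently.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Check GitHub's X-Hub-Signature-256 header against the raw request body.
function isValidGithubSignature(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
): boolean {
  if (!signatureHeader) return false;
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const received = Buffer.from(signatureHeader);
  const wanted = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === wanted.length && timingSafeEqual(received, wanted);
}
```

The check has to run on the raw body, before any JSON parsing, because re-serializing the payload can change the bytes that were signed.
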
## Deployment

Right now, deployment is manual. Eventually, we will set up GitHub Actions to automate this.

### Deploying to new environment

To deploy a given change to a new environment:

1. Load secrets. From the root folder, run `bun secret:load:<env>`.
1. Run `sst deploy --stage <env>` first to create state in SST. This first run is expected to fail.
1. Deploy Cloudflare resources. From `/packages/wrangler`, run `bun run deploy:all:<env>`.
1. Run database migrations for the target environment. From the `core` folder, run: `bun db:migrate:<env>`.
1. Deploy SST resources again. This time it should succeed.

We should probably set up a script to do this automatically as part of CI/CD.

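As a starting point for such a script, the five steps can be written down as data. This is only a sketch: the `DeployStep` shape and `deploySteps` function are hypothetical, `packages/core` is an assumed path (the README only says "core folder"), and running `sst deploy` from the repo root is also an assumption.

```typescript
// One step of the manual deployment sequence described in this README.
interface DeployStep {
  cwd: string; // directory to run from, relative to the repo root
  command: string;
  mayFail: boolean; // true only for the first `sst deploy`, which is expected to fail
}

function deploySteps(env: string): DeployStep[] {
  return [
    { cwd: ".", command: `bun secret:load:${env}`, mayFail: false },
    { cwd: ".", command: `sst deploy --stage ${env}`, mayFail: true },
    { cwd: "packages/wrangler", command: `bun run deploy:all:${env}`, mayFail: false },
    // Assumed path: the README only says "core folder".
    { cwd: "packages/core", command: `bun db:migrate:${env}`, mayFail: false },
    { cwd: ".", command: `sst deploy --stage ${env}`, mayFail: false },
  ];
}

// Print the plan. A real script would spawn each command in order and stop
// at the first failure where mayFail is false.
for (const step of deploySteps("dev")) {
  const note = step.mayFail ? " (failure expected)" : "";
  console.log(`[${step.cwd}] ${step.command}${note}`);
}
```

Keeping the expected failure explicit (`mayFail`) matters here, because a naive "stop on first error" runner would abort at step two and never reach the Cloudflare deploy.
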
## Todos

1. Deal with users who install our GitHub App without creating an account first.
1. The current codebase assumes that a repo's private/public status and a user's org membership are static. We need to account for changes. (Currently, we query membership when a subscription is made. We should either receive a webhook or query regularly so that users who have left an org lose access to its private repos.)
1. Need to account for whether a `no_issues` repo ever gets issues. E.g. just run a daily cron to check?

## Known issues

1. When bulk inserting using Drizzle, make sure that the array in `values()` is not empty. Hence the various checks that either return early if the array is empty or make such insertions conditional. If we accidentally pass an empty array, an error is thrown, disrupting the control flow. TODO: enforce this using ESLint?
1. We need some way to deal with error logging. Logging for SST-deployed workers is off by default (it can be turned on via the console, but it will be overridden at the next update). At scale, we will need to set something up so that we are informed of unknown errors.
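
One way to make the empty-array rule for bulk inserts harder to forget is to push it into the type system. The sketch below is not from the codebase: `NonEmptyArray`, `isNonEmpty`, and `insertIfAny` are hypothetical names, and the `insert` callback stands in for a Drizzle call such as `(rows) => db.insert(table).values(rows)`.

```typescript
// A tuple type that guarantees at least one element.
type NonEmptyArray<T> = [T, ...T[]];

// Type guard: narrows T[] to NonEmptyArray<T> when it has elements.
function isNonEmpty<T>(rows: T[]): rows is NonEmptyArray<T> {
  return rows.length > 0;
}

// Run `insert` only when there is something to insert.
// `insert` is a stand-in for a Drizzle call such as
// `(rows) => db.insert(table).values(rows)`.
async function insertIfAny<T, R>(
  rows: T[],
  insert: (rows: NonEmptyArray<T>) => Promise<R>,
): Promise<R | undefined> {
  if (!isNonEmpty(rows)) return undefined; // early return instead of throwing
  return insert(rows);
}
```

Because `insert` only accepts a `NonEmptyArray`, a call site that skips the guard fails to type-check, which covers part of what the ESLint idea is after.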
