
[feature] implement a github proxy to avoid rate limits #905

Closed
louis030195 opened this issue Dec 10, 2024 · 7 comments · Fixed by #1071

@louis030195
Collaborator

when downloading stuff from screenpipe's store i regularly get rate limited

we could store a recent version of the repo in an api and expose it

louis030195 added the enhancement label on Dec 10, 2024
@louis030195
Collaborator Author

Let me help design a solution to avoid GitHub API rate limits when downloading pipes. Here's a proposed approach using Supabase storage and a periodic sync mechanism:

  1. First, let's modify the download function to try the proxy first, then fall back to direct GitHub:
// Helper that tries the proxy first. The stored zip is public, so a plain
// fetch is enough; no supabase-js client is needed on this side.
async function downloadFromProxy(url: string): Promise<Response | null> {
  try {
    // Extract repo info from GitHub URL
    const match = url.match(/github\.com\/([^\/]+)\/([^\/]+)/);
    if (!match) return null;
    
    const [_, owner, repo] = match;
    
    // Try to fetch from our proxy first. Supabase serves public objects
    // under /storage/v1/object/public/<bucket>/<path>
    const proxyUrl = `https://screenpipe-proxy.supabase.co/storage/v1/object/public/pipes/${owner}/${repo}/latest.zip`;
    
    const response = await fetch(proxyUrl);
    if (!response.ok) return null;
    
    return response;
  } catch (error) {
    console.error("proxy fetch failed:", error);
    return null;
  }
}

// Update the handleDownloadPipe function
const handleDownloadPipe = async (url: string) => {
  try {
    posthog.capture("download_pipe", { pipe_id: url });

    // Show progress toast
    const t = toast({
      title: "downloading pipe",
      description: (
        <div className="space-y-2">
          <Progress value={0} className="h-1" />
          <p className="text-xs">starting download...</p>
        </div>
      ),
      duration: 100000,
    });

    // Try the proxy first
    const proxyResponse = await downloadFromProxy(url);

    // Send the proxy payload (if any) to the local server, which falls back
    // to the GitHub API when proxyData is null. A raw ArrayBuffer doesn't
    // survive JSON.stringify, so base64-encode it first.
    let proxyData: string | null = null;
    if (proxyResponse) {
      const bytes = new Uint8Array(await proxyResponse.arrayBuffer());
      let binary = "";
      for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
      proxyData = btoa(binary);
    }

    const response = await fetch("http://localhost:3030/pipes/download", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url, proxyData }),
    });

    // Rest of the existing code...
  } catch (error) {
    console.error("Failed to download pipe:", error);
    toast({
      title: "error downloading pipe",
      description: "please try again or check the logs for more information.",
      variant: "destructive",
    });
  }
};
  2. Then modify the Rust code to handle the proxy data:
// Update the download_github_folder function. Note it is a plain fn returning
// a boxed future (an `async fn` here would wrap the boxed future in a second
// future); this is the usual pattern for recursive async functions. The HTTP
// layer is assumed to have base64-decoded proxyData into bytes already.
fn download_github_folder(
    url: &Url,
    dest_dir: &Path,
    proxy_data: Option<Vec<u8>>,
) -> Pin<Box<dyn Future<Output = anyhow::Result<()>> + Send>> {
    let url = url.clone();
    let dest_dir = dest_dir.to_path_buf();

    Box::pin(async move {
        // If we have proxy data, extract it
        if let Some(zip_data) = proxy_data {
            return extract_zip_data(zip_data, &dest_dir).await;
        }

        // Existing GitHub API logic as fallback
        let client = Client::new();
        // ... rest of the existing code ...
    })
}

// Add this helper function
async fn extract_zip_data(data: Vec<u8>, dest_dir: &Path) -> anyhow::Result<()> {
    use std::io::{Cursor, Read};
    use zip::ZipArchive;

    let reader = Cursor::new(data);
    let mut archive = ZipArchive::new(reader)?;

    for i in 0..archive.len() {
        let mut file = archive.by_index(i)?;
        // enclosed_name() rejects entries whose paths escape dest_dir (zip-slip)
        let relative = match file.enclosed_name() {
            Some(p) => p.to_path_buf(),
            None => continue,
        };
        let outpath = dest_dir.join(relative);

        if file.name().ends_with('/') {
            tokio::fs::create_dir_all(&outpath).await?;
        } else {
            if let Some(p) = outpath.parent() {
                tokio::fs::create_dir_all(p).await?;
            }
            // ZipFile implements std::io::Read, not tokio's AsyncRead, so read
            // the entry into memory and write it out asynchronously
            let mut buf = Vec::with_capacity(file.size() as usize);
            file.read_to_end(&mut buf)?;
            tokio::fs::write(&outpath, buf).await?;
        }
    }

    Ok(())
}
  3. Create a GitHub Action to sync repositories to Supabase storage:
# .github/workflows/sync-pipes.yml
name: Sync Pipes to Storage

on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours
  workflow_dispatch:  # Manual trigger

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm install @supabase/supabase-js

      - name: Sync pipes
        run: node scripts/sync-pipes.js
        env:
          SUPABASE_URL: ${{ secrets.SUPABASE_URL }}
          SUPABASE_KEY: ${{ secrets.SUPABASE_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  4. Create the sync script:
// scripts/sync-pipes.js
const { createClient } = require('@supabase/supabase-js');
const { execSync } = require('child_process');
const fs = require('fs');

const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_KEY
);

// List of pipes to sync
const PIPES = [
  { owner: 'mediar-ai', repo: 'screenpipe', path: 'pipes/data-table' },
  // Add other pipes here
];

async function syncPipe(pipe) {
  const { owner, repo, path: pipePath } = pipe;
  
  // Create temp dir
  const tempDir = `/tmp/${owner}-${repo}`;
  execSync(`rm -rf ${tempDir}`);
  execSync(`mkdir -p ${tempDir}`);

  // Clone repo
  execSync(`git clone https://x-access-token:${process.env.GITHUB_TOKEN}@github.com/${owner}/${repo}.git ${tempDir}`);

  // Create zip
  const zipPath = `${tempDir}/pipe.zip`;
  execSync(`cd ${tempDir}/${pipePath} && zip -r ${zipPath} .`);

  // Upload to Supabase; upsert so repeated syncs overwrite latest.zip
  const { error } = await supabase.storage
    .from('pipes')
    .upload(`${owner}/${repo}/latest.zip`, fs.readFileSync(zipPath), {
      upsert: true,
      contentType: 'application/zip',
    });

  if (error) throw error;
}

async function main() {
  for (const pipe of PIPES) {
    try {
      await syncPipe(pipe);
      console.log(`Synced ${pipe.owner}/${pipe.repo}`);
    } catch (error) {
      console.error(`Failed to sync ${pipe.owner}/${pipe.repo}:`, error);
    }
  }
}

main();

This solution:

  1. Creates a proxy layer using Supabase Storage to store pre-downloaded pipe files
  2. Updates every 6 hours via GitHub Actions
  3. Tries the proxy first, falls back to direct GitHub API if needed
  4. Uses zip files for efficient transfer
  5. Maintains the same security model (only approved pipes are synced)

Benefits:

  • Avoids GitHub API rate limits
  • Faster downloads (single zip vs multiple API calls)
  • Lower bandwidth costs
  • Better reliability

Let me know if you'd like me to explain any part in more detail or make adjustments to the implementation!

@louis030195
Copy link
Collaborator Author

don't like this solution

we should probably just be able to tell pipes.rs to use another URL as the download api - like an env var

simple

one file change + just copy-pasting the code into object storage
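
a minimal sketch of what that one-file change could look like - the env var name and the fallback default are assumptions, not the shipped implementation:

// hypothetical: read an override for the pipes download base URL from the env
fn pipes_download_base() -> String {
    std::env::var("SCREENPIPE_PIPES_DOWNLOAD_URL")
        .unwrap_or_else(|_| "https://api.github.com".to_string())
}

// usage inside pipes.rs when building the request:
// let url = format!("{}/repos/{}/{}/contents/{}", pipes_download_base(), owner, repo, path);

pointing the variable at a bucket that mirrors the same path layout would skip GitHub entirely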

@louis030195
Collaborator Author

/bounty 120

suggest idea first - i can give you access to supabase, cloudflare, gcp, aws, azure whatever


algora-pbc bot commented Dec 18, 2024

💎 $120 bounty • Screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #905 with your implementation plan
  2. Submit work: Create a pull request including /claim #905 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


Attempt: 🟢 @neo773 · Solution: #1071

@b4s36t4
Contributor

b4s36t4 commented Dec 19, 2024

Hi, @louis030195. Here's my solution to the problem.

  • Use libgit2, which doesn't require git to be installed on the system. It works with a local git repo as well (fetch, etc.).
  • Fetch the repo with git fetch and compare the current (main) branch commit with the origin main branch commit; if they differ, merge (or pull).

This way we don't have to put much on an api or worry about rate limits (as long as the repo is public and GitHub doesn't add restrictions).

Please let me know if this solution works, or if I can help in any way. A rough sketch of the check is below.
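
For reference, a minimal sketch of that check using the git2 crate (Rust bindings to libgit2); the "origin"/"main" names and the local clone path are assumptions:

use git2::Repository;

// returns true when origin/main has moved past the local main branch
fn needs_update(repo_path: &str) -> Result<bool, git2::Error> {
    let repo = Repository::open(repo_path)?;
    // fetching updates refs/remotes/origin/main without touching the working tree
    let mut remote = repo.find_remote("origin")?;
    remote.fetch(&["main"], None, None)?;
    let local = repo.refname_to_id("refs/heads/main")?;
    let upstream = repo.refname_to_id("refs/remotes/origin/main")?;
    Ok(local != upstream)
}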


algora-pbc bot commented Dec 30, 2024

💡 @neo773 submitted a pull request that claims the bounty. You can visit your bounty board to reward it.


algora-pbc bot commented Dec 31, 2024

🎉🎈 @neo773 has been awarded $120! 🎈🎊
