Add Leetcode Problems to Dataset #5

Open
Eli4479 opened this issue Jan 31, 2025 · 0 comments


Eli4479 commented Jan 31, 2025

Description

Currently, the project supports problems from sources such as vjudge and AtCoder, but not from LeetCode. Adding LeetCode problems to the dataset would expand the available problem set and broaden what the semantic search can return.

Proposed Changes:

  • Acquire LeetCode problem data: either by scraping the website or by using unofficial APIs to collect problem details.
  • Format problems as JSON: each LeetCode problem should follow the same structure as the existing problems (e.g., problem ID, title, statement, tags).
  • Add problems to the problems/ folder: use a consistent naming convention (e.g., problems/leetcode_100.json).
  • Update the system to process LeetCode problems: run the existing processing pipeline (build_summary, build_embedding, build_locale) on them.
  • Include LeetCode as a supported data source: modify the codebase to recognize LeetCode as a valid problem source so that users can search for LeetCode problems.
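To make the formatting, naming, and source-registration steps concrete, here is a minimal Python sketch. It is not the project's code: the output schema (`id`, `source`, `title`, `url`, `statement`, `tags`), the `SUPPORTED_SOURCES` set, and the helper names are hypothetical and should be aligned with the existing problem JSON files. It also assumes the raw input is shaped like the question objects returned by LeetCode's unofficial GraphQL endpoint (`questionFrontendId`, `title`, `titleSlug`, `content` as HTML, `topicTags`); that shape is an assumption, not a guaranteed API.

```python
import html
import json
import re
from pathlib import Path

# Hypothetical registry of recognized sources; the real codebase may store this differently.
SUPPORTED_SOURCES = {"vjudge", "atcoder", "leetcode"}

TAG_RE = re.compile(r"<[^>]+>")


def html_to_text(fragment: str) -> str:
    """Strip HTML tags and unescape entities (LeetCode statements are served as HTML)."""
    return html.unescape(TAG_RE.sub("", fragment)).strip()


def to_problem_record(raw: dict) -> dict:
    """Map a raw LeetCode question (assumed field names) onto the assumed project schema."""
    return {
        "id": f"leetcode_{raw['questionFrontendId']}",
        "source": "leetcode",
        "title": raw["title"],
        "url": f"https://leetcode.com/problems/{raw['titleSlug']}/",
        "statement": html_to_text(raw["content"]),
        "tags": [tag["name"] for tag in raw.get("topicTags", [])],
    }


def save_problem(record: dict, out_dir: Path = Path("problems")) -> Path:
    """Write the record to <out_dir>/<id>.json, e.g. problems/leetcode_100.json."""
    if record["source"] not in SUPPORTED_SOURCES:
        raise ValueError(f"unsupported source: {record['source']}")
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{record['id']}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Illustrative input only; the statement text is a placeholder, not LeetCode's wording.
    sample = {
        "questionFrontendId": "100",
        "title": "Same Tree",
        "titleSlug": "same-tree",
        "content": "<p>Given <code>p</code> &amp; <code>q</code>.</p>",
        "topicTags": [{"name": "Tree"}],
    }
    print(save_problem(to_problem_record(sample)))
```

Keeping the `leetcode_` prefix in the record ID (not just the file name) lets downstream stages tell sources apart without extra metadata, though the existing files may use a different convention.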

Benefits:

  • Expands the problem dataset.
  • Provides users with more problems to search through, improving the utility of the semantic search engine.

Steps to Reproduce

  • Scrape LeetCode problem data.
  • Format the problems into the existing JSON structure.
  • Run the processing pipeline to generate summaries and embeddings and to detect languages.
  • Perform a search and verify that LeetCode problems appear in the results.
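The first and last checks above could be scripted roughly as follows. This is a hedged sketch: the project's `build_summary`, `build_embedding`, and `build_locale` stages are not called because their signatures are not given here, and `REQUIRED_FIELDS`, the `leetcode_*.json` glob, and the assumption that search hits are dicts with an `id` key are all hypothetical.

```python
import json
from pathlib import Path

# Assumed required fields; align these with the existing problem JSON schema.
REQUIRED_FIELDS = ("id", "title", "statement", "tags")


def load_leetcode_problems(problems_dir: Path = Path("problems")) -> list[dict]:
    """Load problems/leetcode_*.json and fail fast if any file lacks a required field."""
    problems = []
    for path in sorted(problems_dir.glob("leetcode_*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"{path.name} is missing fields: {missing}")
        problems.append(record)
    return problems


def leetcode_results(results: list[dict]) -> list[dict]:
    """Keep only search hits whose id carries the (assumed) leetcode_ prefix."""
    return [hit for hit in results if str(hit.get("id", "")).startswith("leetcode_")]
```

Validating the files before the pipeline runs surfaces formatting mistakes early, rather than as confusing failures inside the summary or embedding stages.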

Additional Information

  • LeetCode problems can be browsed at https://leetcode.com.
  • The project may need adjustments to accommodate LeetCode as a new data source.

Related Issues

  • Link to any related issues or discussions, if available.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant