MSTSearch is a search aggregation platform that crawls multiple search engines, cleans and ranks the combined results using BM25, TF-IDF, and Word2Vec scoring, and uses AI models to summarize the results and answer user questions about them. It is built with a Python (Flask) backend and a Vue.js frontend.
- Features
- Tech Stack
- Architecture
- Installation
- Usage
- Future Plans
- Project Structure
- Contributing
- License
- Contact
- Multi-Engine Crawling: Scrapes search results from Baidu, Sohu, and other search engines.
- Result Processing: Cleans and normalizes search results for consistency.
- Scoring Mechanisms: Utilizes BM25, TF-IDF, and Word2Vec for ranking search results.
- AI-Powered Summarization: Uses AI models to summarize and answer user questions based on search data.
- Responsive Frontend: Built with Vue.js, offering a user-friendly interface for searching and viewing results.
- Settings Management: Allows users to add or remove search engines dynamically.
- Caching & Rate Limiting: Caches API responses and limits request rates to keep responses fast and protect the API from abuse.
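To show what the BM25 score in the ranking step measures, here is a small, dependency-free sketch. The project lists the Rank BM25 package for this, so the function below is not the code in sort.py; the names bm25_scores and tokenize, the whitespace tokenizer, and the parameters k1 = 1.5 and b = 0.75 are assumptions chosen for illustration. It uses the common non-negative IDF variant, so its scores will not match Rank BM25's exactly.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase + whitespace split. Chinese result text
    # (e.g. from Baidu) has no spaces between words and would need a real segmenter.
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by Lucene).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    results = ["python web crawler tutorial", "vue frontend guide", "python crawler with selenium"]
    scores = bm25_scores("python crawler", results)
    for score, text in sorted(zip(scores, results), reverse=True):
        print(f"{score:.3f}  {text}")
```

Because BM25 rewards rare query terms and penalizes long documents, it complements TF-IDF cosine similarity and Word2Vec similarity, which capture overall term weighting and semantic closeness respectively.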
Backend:
- Python 3.8+
- Flask: Web framework for the API endpoints.
- Selenium: Automates browser interactions for crawling.
- BeautifulSoup: Parses HTML content.
- Gensim: Implements Word2Vec for semantic analysis.
- Scikit-learn: Provides the TF-IDF vectorizer and cosine similarity.
- Rank BM25: Implements the BM25 ranking algorithm.
- concurrent.futures: Handles parallel processing (Python standard library).
- Flask-Limiter: Implements rate limiting.
- Flask-Caching: Caches responses for improved performance.

Frontend:
- Vue.js 3: JavaScript framework for building user interfaces.
- Vuex: State management pattern and library for Vue.js.
- Tailwind CSS: Utility-first CSS framework for styling.
- Axios: HTTP client for API requests.

Crawling tools:
- ChromeDriver: Automates the Chrome browser for scraping.
- Webdriver Manager: Manages browser driver binaries.
MSTSearch follows a client-server architecture where the frontend communicates with the backend via RESTful APIs. The backend handles search crawling, result processing, scoring, and AI-driven summarization. The frontend provides an intuitive interface for users to perform searches, view results, and interact with AI summaries.
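The crawl-then-process flow on the backend can be sketched as follows, assuming each engine crawler is a function that takes a query and returns result records. Result, aggregate, and normalize_url are hypothetical names, not the ones used in crawler.py or process_result.py. Threads fit here because crawling is I/O-bound, and a failing engine is skipped rather than failing the whole search.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlsplit, urlunsplit


@dataclass
class Result:
    title: str
    url: str
    snippet: str
    engine: str


# Hypothetical crawler signature: query in, list of results out.
Crawler = Callable[[str], List[Result]]


def normalize_url(url: str) -> str:
    # Lowercase the host, drop the fragment and trailing slash so the same page
    # returned by two engines collapses to one key.
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def aggregate(query: str, crawlers: Dict[str, Crawler], max_workers: int = 4) -> List[Result]:
    """Run all crawlers in parallel, then merge and de-duplicate their results."""
    merged: Dict[str, Result] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl, query) for crawl in crawlers.values()]
        # Collect in registration order so duplicates resolve deterministically
        # (the first engine's copy of a page wins).
        for future in futures:
            try:
                results = future.result()
            except Exception:
                # A blocked or timed-out engine should not sink the whole search.
                continue
            for r in results:
                merged.setdefault(normalize_url(r.url), r)
    return list(merged.values())
```

The merged list would then go to the scoring step and, from there, to the REST endpoint the frontend calls.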
- Clone the Repository
  git clone https://github.com/yourusername/MSTSearch.git
  cd MSTSearch/backend
- Create a Virtual Environment
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies
  pip install -r requirements.txt
- Set Up ChromeDriver
  The backend uses Selenium for crawling, which requires ChromeDriver.
  - Automatic Installation: Ensure webdriver_manager is included in requirements.txt. The BaiduCrawler.py and SohuCrawler.py scripts handle driver installation automatically.
  - Manual Installation: Download the ChromeDriver version that matches your installed Chrome from the official ChromeDriver download page and place it in the ./driver directory.
- Configuration
  - Environment Variables: Create a .env file in the backend directory and add the necessary environment variables, such as API keys:
    AI_API_KEY=your_api_key_here
- Run the Backend Server
  python app.py
  The backend server will start on http://127.0.0.1:5000.
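The two ChromeDriver setup routes can be combined in one helper, sketched here assuming Selenium 4 and the webdriver-manager package. The function names resolve_chromedriver and make_driver and the headless option are illustrative, not taken from BaiduCrawler.py or SohuCrawler.py. A driver placed in ./driver is preferred; otherwise webdriver_manager downloads a matching one.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_chromedriver(driver_dir: str = "./driver") -> Optional[str]:
    """Return the path of a manually installed ChromeDriver, or None if absent."""
    name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    candidate = Path(driver_dir) / name
    return str(candidate) if candidate.is_file() else None


def make_driver(driver_dir: str = "./driver", headless: bool = True):
    """Create a Chrome WebDriver, preferring ./driver over an automatic download."""
    # Deferred imports: resolve_chromedriver() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = resolve_chromedriver(driver_dir)
    if path is None:
        # Automatic route: webdriver_manager downloads a matching driver and returns its path.
        from webdriver_manager.chrome import ChromeDriverManager
        path = ChromeDriverManager().install()

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")  # flag form assumes Chrome 109 or newer
    return webdriver.Chrome(service=Service(path), options=options)
```

Headless mode is a common choice for servers without a display; set headless=False to watch the crawler while debugging.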
- Navigate to the Frontend Directory
  cd ../frontend
- Install Dependencies
  npm install
- Run the Frontend Server
  npm run serve
  The frontend application will start on http://localhost:8080.
- Access the Application
  Open your browser and navigate to http://localhost:8080.
- Perform a Search
  - Enter your search query in the search bar.
  - Click the Search button.
  - View the aggregated, ranked results from multiple search engines.
- View Summary
  - After performing a search, enter a question related to the search results.
  - The AI provides a summarized answer based on the top-ranked results.
- Manage Search Engines
  - Navigate to the Settings page.
  - Add or remove search engines by providing their URLs.
  - Subsequent searches include the updated set of search engines.
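How the Settings page's engine list might be represented on the backend is sketched below. The README does not say how engines are stored, so the EngineRegistry class and the idea of a URL template with a {query} placeholder are assumptions for illustration; the Baidu template uses Baidu's wd search parameter.

```python
from typing import Dict
from urllib.parse import quote_plus, urlsplit


class EngineRegistry:
    """Hypothetical in-memory store for the engines managed on the Settings page."""

    def __init__(self) -> None:
        self._engines: Dict[str, str] = {}

    def add(self, name: str, url_template: str) -> None:
        # Assumption: each engine is described by a URL template with a {query} slot.
        if "{query}" not in url_template:
            raise ValueError("template must contain {query}")
        if urlsplit(url_template).scheme not in ("http", "https"):
            raise ValueError("template must be an http(s) URL")
        self._engines[name] = url_template

    def remove(self, name: str) -> bool:
        # Returns False when the engine was not registered.
        return self._engines.pop(name, None) is not None

    def search_urls(self, query: str) -> Dict[str, str]:
        # One search URL per registered engine, with the query URL-encoded.
        encoded = quote_plus(query)
        return {name: t.replace("{query}", encoded) for name, t in self._engines.items()}


if __name__ == "__main__":
    registry = EngineRegistry()
    registry.add("baidu", "https://www.baidu.com/s?wd={query}")
    print(registry.search_urls("vue 3 tutorial"))
```

Using str.replace instead of str.format keeps templates that contain other literal braces from raising errors.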
MSTSearch is continually evolving to enhance user experience and functionality. Upcoming features include:
- Cookie Management:
  - Purpose: To maintain session states and handle authentication where necessary.
  - Benefits: Improved crawling efficiency, reduced likelihood of being blocked by search engines, and enhanced ability to access personalized or restricted content.
  - Implementation: Integrate cookie handling mechanisms within the crawlers to store and reuse cookies during crawling sessions.
- History-Based Sorting:
  - Purpose: To personalize search result rankings based on user interaction history.
  - Benefits: Provides users with more relevant and tailored search results, enhancing the overall search experience.
  - Implementation:
    - Data Collection: Track and store user interactions, such as clicked links and time spent on result pages.
    - Algorithm Development: Develop algorithms that analyze historical data to influence the ranking of current search results.
    - Integration: Modify the ranking system to incorporate history-based metrics alongside existing scoring mechanisms like BM25 and TF-IDF.
- Enhanced AI Summarization:
  - Purpose: To provide more accurate and context-aware summaries and answers.
  - Benefits: Offers users clearer and more concise information derived from aggregated search results.
  - Implementation: Explore and integrate advanced AI models and fine-tune existing models for better performance.
- User Authentication and Profiles:
  - Purpose: To allow users to create accounts and manage their preferences.
  - Benefits: Enables personalized experiences, such as saving search history and customizing settings.
  - Implementation: Implement authentication systems and profile management features in both backend and frontend.
- Mobile Optimization:
  - Purpose: To ensure seamless access and usability on mobile devices.
  - Benefits: Expands accessibility and provides users with flexibility to use MSTSearch on the go.
  - Implementation: Optimize the frontend design for responsive layouts and improve performance on mobile platforms.
- API Enhancements:
  - Purpose: To provide more robust and flexible API endpoints for integration with other services.
  - Benefits: Facilitates broader usage scenarios and allows third-party integrations.
  - Implementation: Develop additional API endpoints and comprehensive documentation for developers.
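The planned cookie reuse could start as simple JSON persistence. This sketch assumes cookies in the dictionary format Selenium's driver.get_cookies() returns (with an optional "expiry" Unix timestamp); save_cookies and load_cookies are hypothetical helper names, since the feature is not implemented yet.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List, Optional

Cookie = Dict[str, Any]


def save_cookies(cookies: List[Cookie], path: str) -> None:
    """Persist cookies, e.g. the list returned by Selenium's driver.get_cookies()."""
    Path(path).write_text(json.dumps(cookies), encoding="utf-8")


def load_cookies(path: str, now: Optional[float] = None) -> List[Cookie]:
    """Load saved cookies, dropping any whose 'expiry' timestamp has passed."""
    p = Path(path)
    if not p.is_file():
        return []
    now = time.time() if now is None else now
    cookies = json.loads(p.read_text(encoding="utf-8"))
    # Session cookies carry no 'expiry' key and are kept.
    return [c for c in cookies if "expiry" not in c or c["expiry"] > now]
```

To reuse them, a crawler would first open the engine's domain with driver.get(...) and then call driver.add_cookie(c) for each loaded cookie, because Selenium only accepts cookies for the domain currently loaded.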
MSTSearch/
├── backend/
│   ├── app.py
│   ├── BaiduCrawler.py
│   ├── SohuCrawler.py
│   ├── crawler.py
│   ├── sort.py
│   ├── summarize.py
│   ├── process_result.py
│   ├── requirements.txt
│   └── driver/
├── frontend/
│   ├── src/
│   │   ├── views/
│   │   │   ├── SearchPage.vue
│   │   │   └── ResultPage.vue
│   │   ├── store/
│   │   │   └── index.ts
│   │   └── components/
│   ├── public/
│   ├── package.json
│   └── tailwind.config.js
├── README.md
└── LICENSE
- backend/: Contains all backend-related code, including crawlers, sorting mechanisms, and AI summarization.
- frontend/: Contains the Vue.js frontend application.
- backend/driver/: Stores browser driver binaries such as ChromeDriver.
- requirements.txt: Lists Python dependencies.
- package.json: Lists frontend dependencies.
We are committed to continuously enhancing MSTSearch to provide a more personalized and efficient search experience. Our upcoming features include:
- Advanced Session Handling: Implement cookie management to maintain user sessions across different browsing activities.
- Personalized Search Results: Utilize stored cookies to tailor search results based on user preferences and past interactions.
- Enhanced Privacy Controls: Allow users to manage cookie settings, ensuring their privacy is respected while still offering personalized experiences.
- Search History Integration: Incorporate user search history to prioritize and rank search results that align with previously expressed interests.
- Dynamic Ranking Algorithms: Develop algorithms that adapt the ranking of search results based on the evolution of user behavior over time.
- User Feedback Loops: Enable users to provide feedback on search results, allowing the system to learn and improve its sorting mechanisms continuously.
- Scalability Enhancements: Optimize the backend to handle larger volumes of search queries and results more efficiently.
- UI/UX Refinements: Continuously improve the frontend interface based on user feedback to ensure an intuitive and seamless experience.
- Integration of Additional AI Models: Expand the range of AI models supported for more diverse and accurate summarizations and answers.
These features aim to make MSTSearch not only a powerful search aggregation tool but also a personalized assistant that evolves with your needs.
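One simple way the planned search-history integration could blend into ranking is a weighted mix of the existing relevance score and a per-domain click signal. Everything here is an assumption for illustration: the function name, the click log as a list of URLs, the log-scaled click counts, and the weight alpha = 0.8, which would need tuning on real data. Relevance scores are assumed non-negative, as BM25 scores are.

```python
import math
from collections import Counter
from typing import List, Tuple
from urllib.parse import urlsplit


def domain(url: str) -> str:
    return urlsplit(url).netloc.lower()


def rerank_with_history(
    scored: List[Tuple[str, float]],
    clicked_urls: List[str],
    alpha: float = 0.8,
) -> List[Tuple[str, float]]:
    """Blend relevance scores with a per-domain click-history signal.

    scored: (url, relevance) pairs, e.g. BM25 scores (assumed non-negative).
    clicked_urls: URLs the user clicked in past sessions (hypothetical log).
    alpha: weight of relevance versus history (assumed value).
    """
    if not scored:
        return []
    clicks = Counter(domain(u) for u in clicked_urls)
    max_rel = max(s for _, s in scored) or 1.0
    # log1p dampens heavy click counts; dividing by the maximum maps history to [0, 1].
    max_hist = math.log1p(max(clicks.values())) if clicks else 1.0
    blended = []
    for url, rel in scored:
        hist = math.log1p(clicks[domain(url)]) / max_hist
        blended.append((url, alpha * rel / max_rel + (1 - alpha) * hist))
    return sorted(blended, key=lambda pair: pair[1], reverse=True)
```

Keeping alpha well above 0.5 lets history break near-ties between results without overriding clearly more relevant pages, which also limits how far one user's habits can narrow their results.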
Contributions are welcome! Please follow these steps:
- Fork the Repository
- Create a Feature Branch
  git checkout -b feature/YourFeature
- Commit Your Changes
  git commit -m "Add some feature"
- Push to the Branch
  git push origin feature/YourFeature
- Open a Pull Request
This project is licensed under the MIT License. You are free to use, modify, and distribute this software as per the license terms.
For any inquiries or feedback, please contact [email protected]/cn.
Made with ❤️ by Your Name