TaxSmart Receipt Analyzer is a Node.js/Express.js application that simplifies expense management for small business owners and self-employed individuals. It uses OCR and AI language models to categorize business expenses according to IRS Schedule C categories.
Our application combines Optical Character Recognition (OCR) with large language models (LLMs) to automate receipt analysis and categorization, and it is built to work with configurable AI services.
- Automates the tedious process of expense categorization
- Utilizes advanced OCR for accurate receipt text extraction
- Leverages configurable Language Models for intelligent expense analysis
- Generates context-aware subject lines for each expense
- Adaptable to different AI services, with default setup for high-speed inference
- Secure data storage using MongoDB
TaxSmart Receipt Analyzer uses the following AI components:
- Intelligent OCR: Uses Tesseract.js, a JavaScript port of the Tesseract OCR engine, to extract text from receipt images.
- AI-Driven Categorization: Employs a configurable large language model (LLM), set up by default for high-speed inference through Groq, to categorize expenses according to IRS Schedule C categories.
- Smart Subject Line Generation: Uses natural language processing capabilities of the configured LLM to automatically create concise, context-aware subject lines for each expense.
- Adaptable AI Integration: The project is designed to work with various LLMs, with default configuration for Groq's high-performance inference.
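As a rough illustration of the Groq integration, the sketch below builds a request for Groq's OpenAI-compatible chat completions endpoint and sends it with the global `fetch` (Node.js 18+). The function names, prompt wording, and JSON reply format are assumptions for illustration, not TaxSmart's actual code.

```javascript
// Hypothetical sketch: ask a Groq-hosted LLM to categorize receipt text.
// Uses Groq's OpenAI-compatible chat completions endpoint and Node 18+ fetch.
const GROQ_URL = 'https://api.groq.com/openai/v1/chat/completions';

// Builds the request body; the prompt and JSON reply shape are assumptions.
function buildCategorizationRequest(receiptText, model) {
  return {
    model,
    temperature: 0, // favor repeatable answers for categorization
    messages: [
      {
        role: 'system',
        content:
          'You categorize business receipts using IRS Schedule C expense ' +
          'categories. Reply with JSON only: {"category": string, "subject": string}.',
      },
      { role: 'user', content: receiptText },
    ],
  };
}

// Sends the request and returns the model's raw reply text.
async function categorizeWithGroq(receiptText, { apiKey, model }) {
  const response = await fetch(GROQ_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildCategorizationRequest(receiptText, model)),
  });
  if (!response.ok) {
    throw new Error(`Groq request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  // OpenAI-compatible responses put the reply text in choices[0].message.content.
  return data.choices[0].message.content;
}
```

A call such as `categorizeWithGroq(text, { apiKey: process.env.GROQ_API_KEY, model: process.env.GROQ_MODEL })` would return the reply string, which still needs to be parsed and checked before it is trusted.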
- Image Upload: User uploads a receipt image through the web interface.
- OCR Processing: Tesseract.js extracts text from the image using advanced optical character recognition.
- Text Preparation: The extracted text is processed and formatted for optimal AI analysis.
- AI Analysis: The prepared text is sent to the configured Language Model (default setup uses Groq for high-speed inference) for analysis.
- Smart Categorization: Based on the AI analysis, the expense is categorized according to IRS Schedule C guidelines.
- Subject Line Generation: The AI generates a concise, relevant subject line for the expense.
- User Verification: Results, including the category and subject line, are presented for user review and adjustment if necessary.
- Data Storage: Confirmed data is securely stored in MongoDB for future reference and reporting.
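The workflow above can be sketched as a single async function. This is a hypothetical outline, not TaxSmart's implementation: `ocr` and `askLlm` are injected placeholders (for example, wrappers around Tesseract.js and the Groq API), the category list is a simplified reading of the Schedule C Part II expense lines, and the 4,000-character text cap and 80-character subject limit are arbitrary choices.

```javascript
// Hypothetical pipeline: OCR -> text preparation -> LLM analysis -> validation.
// Simplified Schedule C expense categories (sub-lines such as mortgage vs.
// other interest are merged here for brevity).
const SCHEDULE_C_CATEGORIES = [
  'Advertising', 'Car and truck expenses', 'Commissions and fees',
  'Contract labor', 'Depletion', 'Depreciation and section 179 expense deduction',
  'Employee benefit programs', 'Insurance (other than health)', 'Interest',
  'Legal and professional services', 'Office expense',
  'Pension and profit-sharing plans', 'Rent or lease', 'Repairs and maintenance',
  'Supplies', 'Taxes and licenses', 'Travel', 'Deductible meals', 'Utilities',
  'Wages', 'Other expenses',
];

// Text preparation: collapse whitespace, drop blank lines, cap the length.
function prepareText(rawText, maxChars = 4000) {
  return rawText
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, ' ').trim())
    .filter((line) => line.length > 0)
    .join('\n')
    .slice(0, maxChars);
}

// Extracts the JSON object from the reply (models sometimes wrap it in prose
// or code fences) and maps unknown categories to "Other expenses".
function parseAnalysis(reply) {
  const match = reply.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON object found in LLM reply');
  const parsed = JSON.parse(match[0]);
  const wanted = String(parsed.category || '').trim().toLowerCase();
  const category = SCHEDULE_C_CATEGORIES.find((c) => c.toLowerCase() === wanted);
  return {
    category: category || 'Other expenses',
    subject: String(parsed.subject || '').trim().slice(0, 80),
    needsReview: !category, // flag for the user verification step
  };
}

async function analyzeReceipt(image, { ocr, askLlm }) {
  const rawText = await ocr(image);        // OCR processing
  const text = prepareText(rawText);       // text preparation
  const reply = await askLlm(text);        // AI analysis
  return { text, ...parseAnalysis(reply) }; // category + subject line
}
```

Injecting the OCR and LLM calls keeps the categorization logic testable with stubs, and returning a `needsReview` flag instead of silently accepting an unknown category fits the user verification step.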
- Clone the repository:
git clone https://github.com/bstephens2002/TaxSmart-Receipt-Analyzer.git
- Navigate to the project directory:
cd TaxSmart-Receipt-Analyzer
- Install dependencies:
npm install
- Set up Tesseract.js (see Tesseract Setup section below)
- Set up MongoDB (see MongoDB Setup section below)
- Configure environment variables (see Configuration section below)
- Start the server:
npm start
- Open your browser and navigate to http://localhost:3000
- Upload a receipt image and review the suggested category and subject line.
To set up Tesseract.js for OCR functionality:
- Download the English trained data file:
  - Go to https://github.com/tesseract-ocr/tessdata
  - Download the eng.traineddata file
- Place the downloaded file:
  - Create a tessdata directory in your project root if it doesn't exist
  - Move eng.traineddata into the tessdata directory
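A small startup check can catch a missing language file before the first OCR request. The helper below is hypothetical (not part of TaxSmart); it only assumes the tessdata/eng.traineddata layout described in the setup steps.

```javascript
// Hypothetical startup check: confirm tessdata/<lang>.traineddata exists.
const fs = require('node:fs');
const path = require('node:path');

function findTrainedData(projectRoot, lang = 'eng') {
  const file = path.join(projectRoot, 'tessdata', `${lang}.traineddata`);
  if (!fs.existsSync(file)) {
    throw new Error(
      `Missing ${file}. Download ${lang}.traineddata from ` +
        'https://github.com/tesseract-ocr/tessdata and place it in tessdata/.'
    );
  }
  return file; // absolute or relative, matching projectRoot
}
```

Calling `findTrainedData(process.cwd())` when the server starts turns a missing language file into a clear error message.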
- Install MongoDB on your system if you haven't already. Follow the official MongoDB installation guide for your operating system.
- Start the MongoDB service on your machine.
Note: The application will automatically create the necessary database and collections when it first runs, so you don't need to manually create them.
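Because MongoDB creates a database and collection on the first insert, storing a confirmed expense needs no setup script. A minimal sketch using the official `mongodb` Node.js driver follows; the `expenses` collection name and the document fields are hypothetical, not TaxSmart's actual schema.

```javascript
// Hypothetical document shape for a confirmed expense.
function buildExpenseDocument({ category, subject, text }, now = new Date()) {
  return { category, subject, ocrText: text, createdAt: now };
}

// Inserts one expense using the official driver (npm package "mongodb").
async function saveExpense(databaseUrl, expense) {
  // Required lazily so buildExpenseDocument works without the driver installed.
  const { MongoClient } = require('mongodb');
  const client = new MongoClient(databaseUrl);
  try {
    await client.connect();
    // db() with no name uses the database from the URL (e.g. "taxsmart");
    // MongoDB creates it and the collection on the first insert.
    const result = await client.db().collection('expenses').insertOne(expense);
    return result.insertedId;
  } finally {
    await client.close();
  }
}
```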
Create a .env file in the project root and add the following variables:
PORT=3000
DATABASE_URL=mongodb://localhost/taxsmart
SESSION_SECRET=your_generated_secret_here
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=mixtral-8x7b-32768
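Replace DATABASE_URL with your actual MongoDB connection string, SESSION_SECRET with a generated random secret, and GROQ_API_KEY with your Groq API key. GROQ_MODEL selects the Groq model used for analysis; mixtral-8x7b-32768 is an example value, so check Groq's model list for model IDs that are currently available.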
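To fail fast on a misconfigured .env, the server could validate these variables at startup. This is a hypothetical sketch: it reads `process.env` only and assumes the .env file is loaded elsewhere, for example by the dotenv package or Node's `--env-file` flag (Node.js 20.6+).

```javascript
// Hypothetical config loader for the variables listed above.
const REQUIRED_VARS = ['DATABASE_URL', 'SESSION_SECRET', 'GROQ_API_KEY', 'GROQ_MODEL'];

function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // PORT is optional and defaults to 3000, matching the example above.
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    databaseUrl: env.DATABASE_URL,
    sessionSecret: env.SESSION_SECRET,
    groqApiKey: env.GROQ_API_KEY,
    groqModel: env.GROQ_MODEL,
  };
}
```

Calling `loadConfig()` once at startup surfaces a missing GROQ_API_KEY immediately instead of on the first receipt upload.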
- Anomaly detection to flag unusual expenses
- Predictive analytics for expense forecasting
- Multi-language receipt support
- Integration with popular accounting software
- Node.js
- Express.js
- MongoDB
- Tesseract.js for OCR
- Configurable Language Model (default setup for Groq)
- React.js for the frontend (if applicable)
We welcome contributions to TaxSmart Receipt Analyzer! If you have suggestions for improvements or encounter any issues, please feel free to open an issue or submit a pull request.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Brad Stephens
Project Link: https://github.com/bstephens2002/TaxSmart-Receipt-Analyzer
TaxSmart Receipt Analyzer: Empowering your business with AI-driven expense management.