A FastAPI server that compares the semantic similarity between two strings using a sentence-transformers model.
- Semantic similarity comparison between text strings
- Uses sentence-transformers model (all-MiniLM-L6-v2 by default)
- Automatically utilizes GPU if available
- Simple REST API with JSON request/response
- Model caching to avoid repeated downloads on server restart
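Under the hood, the comparison boils down to a cosine similarity between sentence embeddings. Here is a minimal sketch of that idea using the sentence-transformers API (illustrative only, not the server's exact code):

```python
# Minimal sketch of the underlying computation; the server's code differs.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # this README's default model

def similarity(sentence1: str, sentence2: str) -> float:
    # Encode both sentences into dense vectors, then compare with cosine similarity.
    embeddings = model.encode([sentence1, sentence2], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

print(similarity("How do I bake a cake?", "What is the process for making a cake?"))
```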
- Clone this repository:
git clone https://github.com/yourusername/similarity-scorer.git
cd similarity-scorer
- Install the required packages:
pip install -r requirements.txt
- Run the development server:
uvicorn main:app --reload
- Or start the server with the startup script:
./start.sh
You can customize the server by setting environment variables:
MODEL_NAME=all-mpnet-base-v2 WORKERS=4 ./start.sh
The server will run at: http://127.0.0.1:16000
Interactive API documentation is available at: http://127.0.0.1:16000/docs
Build the Docker image:
docker build -t similarity-scorer .
Run the container:
docker run -p 16000:16000 similarity-scorer
Alternatively, start the service with Docker Compose:
docker-compose up
To run in detached mode:
docker-compose up -d
To stop the service:
docker-compose down
To start the server in production mode:
./start.sh
For memory-constrained environments:
./start_optimized.sh
To stop any running server instances:
./terminate.sh
If regular termination doesn't work, use the force termination script (may require sudo):
sudo ./force_terminate.sh
These scripts will:
- Identify all running processes related to the similarity scorer
- Attempt to terminate them gracefully (regular script) or forcefully (force script)
- Report the status after the termination attempt (illustrated below)
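As a rough illustration of that graceful-then-forceful pattern, here is a hypothetical Python sketch using psutil; the actual shell scripts may work quite differently:

```python
# Hypothetical sketch of the terminate/force-terminate logic; the real
# scripts are shell scripts and may match processes differently.
import psutil

def terminate_scorer(force: bool = False, timeout: float = 5.0) -> None:
    # Find processes whose command line mentions the similarity scorer.
    targets = [
        p for p in psutil.process_iter(["pid", "cmdline"])
        if any("similarity" in part for part in (p.info["cmdline"] or []))
    ]
    for proc in targets:
        proc.terminate()  # SIGTERM: ask for a graceful shutdown
    gone, alive = psutil.wait_procs(targets, timeout=timeout)
    if force:
        for proc in alive:
            proc.kill()  # SIGKILL: forceful termination (may require root)
    print(f"Terminated {len(gone)} process(es); {len(alive)} still running.")
```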
The comparison endpoint takes two sentences and returns their semantic similarity.
Request Body:
{
  "sentence1": "How do I bake a cake?",
  "sentence2": "What is the process for making a cake?"
}
Response:
{
  "sentence1": "How do I bake a cake?",
  "sentence2": "What is the process for making a cake?",
  "semantic_similarity": 0.87
}
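For example, you can call the endpoint from Python with requests. The /compare path below is an assumption for illustration; check the interactive docs at /docs for the actual route:

```python
# Example client call. The "/compare" path is an assumption -- consult
# http://127.0.0.1:16000/docs for the route the server actually exposes.
import requests

resp = requests.post(
    "http://127.0.0.1:16000/compare",
    json={
        "sentence1": "How do I bake a cake?",
        "sentence2": "What is the process for making a cake?",
    },
)
resp.raise_for_status()
print(resp.json()["semantic_similarity"])  # e.g. 0.87
```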
You can change the model in two ways:
- By setting the MODEL_NAME environment variable:
# When running with Python
MODEL_NAME=all-mpnet-base-v2 uvicorn main:app --reload
# When running with Docker
docker run -p 16000:16000 -e MODEL_NAME=all-mpnet-base-v2 similarity-scorer
# Or update the environment variable in docker-compose.yml
# and then run docker-compose up
- By directly editing the default in main.py:
model_name = os.environ.get("MODEL_NAME", "all-mpnet-base-v2")
Note that larger models such as "all-mpnet-base-v2" require more computational resources but produce more accurate similarity scores.
The application is configured to cache downloaded models in the models/ directory. This means:
- The model will only be downloaded once, even if you restart the server multiple times
- Subsequent server startups will be much faster
- When using Docker, the model cache is stored in a named volume for persistence
This is especially important for larger models like "all-mpnet-base-v2" which can be several hundred MB in size.
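One plausible way to wire up this caching is the cache_folder argument of SentenceTransformer; the actual main.py may differ in detail:

```python
# Sketch: load the model with a local cache directory so restarts reuse
# already-downloaded weights. main.py may structure this differently.
import os
from sentence_transformers import SentenceTransformer

model_name = os.environ.get("MODEL_NAME", "all-MiniLM-L6-v2")
model = SentenceTransformer(model_name, cache_folder="models/")
```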
To optimize memory usage, the application uses a dedicated model service that runs in a separate process:
- Only one copy of the model is loaded in memory regardless of how many Gunicorn workers are running
- All worker processes communicate with the model service via IPC (inter-process communication)
- This significantly reduces memory usage when running with multiple workers
This architecture is particularly beneficial for large models that would otherwise consume several GB of RAM if loaded separately in each worker process.
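A bare-bones sketch of this pattern, using Python's multiprocessing connections for the IPC, might look like the following (illustrative only; the real service and its protocol are more involved):

```python
# Illustrative single-process model service that workers query over a
# multiprocessing connection; the actual implementation may differ.
from multiprocessing.connection import Listener
from sentence_transformers import SentenceTransformer, util

def serve(address=("127.0.0.1", 6000), authkey=b"scorer"):
    model = SentenceTransformer("all-MiniLM-L6-v2")  # loaded exactly once
    with Listener(address, authkey=authkey) as listener:
        while True:
            with listener.accept() as conn:
                sentence1, sentence2 = conn.recv()
                emb = model.encode([sentence1, sentence2], convert_to_tensor=True)
                conn.send(util.cos_sim(emb[0], emb[1]).item())

# A worker would connect instead of loading the model itself:
#   from multiprocessing.connection import Client
#   with Client(("127.0.0.1", 6000), authkey=b"scorer") as conn:
#       conn.send(("sentence one", "sentence two"))
#       score = conn.recv()
```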
This application is designed for memory efficiency, especially in environments with limited resources:
- Single Model Instance: The model is loaded only once in a dedicated process and shared across all workers
- Conservative Worker Count: The Gunicorn configuration uses fewer workers than typical to reduce memory usage
- Memory Monitoring: The application logs memory usage at various points to track resource utilization
- Configurable Worker Count: You can set the WORKERS environment variable to further limit workers
- Worker Lifecycle Management: Workers are restarted after handling a set number of requests to prevent memory leaks (see the configuration sketch below)
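These settings map naturally onto a Gunicorn configuration file. A sketch of what such a gunicorn.conf.py might contain (values are illustrative, not necessarily the repository's):

```python
# Sketch of a gunicorn.conf.py expressing the memory-oriented settings
# described above; the repository's actual values may differ.
import os

bind = "127.0.0.1:16000"
workers = int(os.environ.get("WORKERS", 2))      # conservative worker count
worker_class = "uvicorn.workers.UvicornWorker"   # async workers for FastAPI
max_requests = 1000                              # recycle workers to curb leaks
max_requests_jitter = 50                         # stagger worker restarts
```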
For environments with very limited memory, use the optimized startup script:
./start_optimized.sh
This script sets environment variables to optimize memory usage and provides memory usage tracking.
The application automatically detects and uses available GPU resources:
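Device selection along these lines is usually just a few lines of PyTorch; a sketch, not necessarily the app's exact logic:

```python
# Sketch of automatic device selection; the application's actual logic
# may differ in detail.
import torch

if torch.cuda.is_available():
    device = "cuda"  # NVIDIA GPUs via CUDA
elif torch.backends.mps.is_available():
    device = "mps"   # Apple Metal Performance Shaders
else:
    device = "cpu"
```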
On macOS, the application uses:
- Apple's Metal Performance Shaders (MPS) backend on Apple Silicon (M1/M2/M3) Macs
- AMD GPUs on Intel Macs with compatible graphics cards
To check if your Mac is using GPU acceleration:
- Start the server
- Visit http://127.0.0.1:16000/system-info to see device information
- Look for "mps_available": true and "current_device": "mps" in the response (or query it programmatically, as shown below)
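The same check can be done programmatically (assuming the field names shown above):

```python
# Query the system-info endpoint and report the device in use.
import requests

info = requests.get("http://127.0.0.1:16000/system-info").json()
print(info.get("mps_available"), info.get("current_device"))
```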
On systems with NVIDIA GPUs, CUDA will be used automatically.
If running in Docker, make sure to include the GPU runtime configuration as specified in the docker-compose.yml file.