QA: Jan 0.5.7 Release Sign-off #3818

Closed
imtuyethan opened this issue Oct 16, 2024 · 30 comments
Labels: type: chore (Maintenance, operational)

imtuyethan commented Oct 16, 2024

Regression Test Checklist

Get original QA checklist here: https://hackmd.io/@janhq/SJO9kXpJ1x
Release Version: v0.5.7


A. Installation, Update, and Uninstallation

1. Users Install App (New User Flow)

  • Installation package passes all security checks.
  • App launches successfully after installation.

2. Users Update App (Existing User Flow)

  • Validate that the update does not corrupt user data or settings.
  • App restarts or prompts the user to restart after an update.
  • Ensure the /models directory contains the JSON/YML files that change with the update.
  • Updating the app also updates extensions' versions correctly.
  • The bottom bar displays the correct app status when updating.

3. Users Uninstall / Close App

  • After closing the app, confirm that all models are unloaded.
  • Ensure the uninstallation process removes the app from the system completely.
  • Verify that the app recreates all necessary folders (models, extensions) after reopening post-uninstallation.

Testing Script:

1. Users Install App (New User Flow)

a. Installation Package Security Check
	•	Test: Download the installation package from https://github.com/janhq/jan/releases
	•	Before running, check if the package is flagged by any system security features (e.g., macOS Gatekeeper, Windows SmartScreen, antivirus software).
	•	Expected Behavior: The system should not display any warnings, and the installation package should pass all security checks without errors.

b. App Launch
	•	Test: Run the installation process and attempt to launch the app after the installation is complete.
	•	Steps:
	1.	Double-click the installation package and follow the on-screen instructions to install the app.
	2.	After the installation completes, locate the app in the Applications folder (macOS) or Start Menu (Windows).
	3.	Launch the app.
	•	Expected Behavior: The app should launch successfully and display the home screen without any crashes or errors.
	•	Verify: Ensure that all necessary folders (e.g., /models, /extensions) are created during the first launch.

2. Users Update App (Existing User Flow)
a. Validate Data Integrity
	•	Test: Update the app without corrupting user data or settings.
	•	Steps:
	1.	Open the current version of the app and verify that all models and settings are intact.
	2.	Download the latest update and run the update process.
	3.	After the update, reopen the app and check if all models, settings, and extensions are preserved.
	•	Expected Behavior: The app should update without corrupting any user data or settings, and previously saved settings should be intact.
	•	Test: Ensure the bottom status bar displays the correct update progress/status during the update process.
	•	Steps:
	1.	While running the update, observe the bottom status bar for any messages related to the progress of the update.
	2.	Check for clear communication (e.g., “Update in progress”, “Restart required”) during the update.
	•	Expected Behavior: The bottom bar should display the correct status of the update, and progress should be clearly communicated.

b. App Restart After Update
	•	Test: Confirm that the app restarts or prompts the user to restart after an update.
	•	Steps:
	1.	Once the update is completed, check if the app automatically restarts or prompts the user to manually restart.
	2.	Reopen the app after restarting to ensure the update was applied successfully.
	•	Expected Behavior: The app should either restart on its own or prompt the user to restart, and the update should be applied.

c. Models Directory Update
	•	Test: Ensure that the /models directory has the correct files (e.g., JSON/YML) according to the update.
	•	Steps:
	1.	Navigate to the /models directory in the file system.
	2.	Check if any new files (e.g., model.json, .yml) were added or modified during the update.
	•	Expected Behavior: The directory should reflect any necessary changes, and new models should have updated metadata files.

d. Extension Version Update
	•	Test: Confirm that updating the app also updates the extensions to the correct versions.
	•	Steps:
	1.	Open the Extensions section of the app after the update.
	2.	Check the version numbers of each extension and compare them with the release notes to ensure they are up to date.
	•	Expected Behavior: All core and model provider extensions should be updated to the correct versions.

3. Users Uninstall / Close App

a. Unload Models
	•	Test: Confirm that all models are unloaded when the app is closed.
	•	Steps:
	1.	Launch a model in the app and then close the application.
	2.	Reopen the app and check the model status to ensure that it was unloaded correctly.
	•	Expected Behavior: No models should remain active after the app is closed.

b. Complete Uninstallation
	•	Test: Verify that the uninstallation process removes the app completely from the system.
	•	Steps:
	1.	Use the Uninstall feature in the app or manually uninstall it from the system (e.g., move the app to the trash on macOS or use the Control Panel on Windows).
	2.	After uninstallation, check that the app folder and its related files (e.g., models, logs) have been deleted from the system.
	•	Expected Behavior: The app and all associated files should be removed from the system.

c. App Recreates Necessary Folders
	•	Test: After uninstallation and reinstallation, verify that the app recreates all necessary folders (models, extensions).
	•	Steps:
	1.	Reinstall the app after uninstalling it.
	2.	Check that the necessary directories (e.g., /models, /extensions) are automatically recreated upon relaunch.
	•	Expected Behavior: The app should recreate all required folders during the reinstallation and first launch process.

B. Chat and Thread Functionality

1. Users Can Chat with Jan (Default Assistant)

  • Sending a message allows users to receive responses from the model.
  • Conversation thread is maintained without any loss of data upon sending multiple messages.
  • Users should be able to edit the message, and the assistant will re-generate the answer based on the edited version of the message.
  • Test for the ability to send different types of messages (e.g., text, emojis, code blocks).
  • Check the output format of the AI (code blocks, JSON, markdown, etc.).
  • Validate the scroll functionality in the chat window for lengthy conversations.
  • Users can copy or delete the response.
  • Users can import documents (RAG), and the system should process queries about the uploaded file, providing accurate and appropriate responses in the conversation thread.
  • Users can import an image, and a model with vision (e.g., LLaVA) can generate responses. #294
  • Check the clear message / delete entire chat button functionality.
  • Test the assistant's ability to maintain context over multiple exchanges within one thread.
  • Check the create new chat button and confirm that a new conversation will have an automatically generated thread title based on the user's message.
  • Ensure the app can handle changing models mid-thread without losing the conversation's context.
  • Check the regenerate button to ensure it renews the response (single/multiple times).
  • Check that Instructions are updated correctly after the user updates them midway through the conversation (mid-thread).
  • Users can switch between models if multiple are available.
  • Validate appropriate error handling and messaging if the assistant fails to respond.
    • Simulate conditions where a model might fail to load.
    • Ensure logs and instructions guide users through resolving the issue.
    • Verify the app captures hardware and log info for troubleshooting.

Testing Script:

1. Users Can Chat with Jan (Default Assistant)

a. Sending Messages

	•	Test: Send a basic message like:
Prompt: “What is the capital of France?”
	•	Expected Response: “The capital of France is Paris.”
	•	Verify: Ensure the message is sent, and a response is received from the model without any errors.

b. Conversation Thread

	•	Test: Send multiple messages consecutively:
Prompt 1: “Tell me a joke.”
Prompt 2: “What is the largest planet in the solar system?”
Expected Response: The thread should maintain all previous conversation data without losing any context between messages.
	•	Verify: Ensure there is no loss of data or context as multiple messages are sent.

c. Editing Messages

	•	Test: Send a message and then edit it:
Prompt: “What is 2 + 2?”
	•	Edit: Change the question to “What is 5 + 3?”
	•	Expected Response: The assistant should re-generate the response based on the edited version, answering “5 + 3 is 8.”
	•	Verify: Ensure that the response updates correctly after editing the original prompt.

d. Sending Different Types of Messages

	•	Test: Send various types of messages, including text, emojis, and code blocks:
Prompt:

Here is some Python code:
```python
def hello():
    return "Hello, World!"
```

	•	Expected Response: The assistant should correctly format and display code blocks.
	•	Verify: Ensure that emojis (e.g., 😊) and code blocks are handled appropriately.



e. Output Format

	•	Test: Ensure the assistant generates responses in different formats:
Prompt: “Generate a JSON object with name, age, and location.”
	•	Expected Response:

```json
{
  "name": "Alice",
  "age": 30,
  "location": "New York"
}
```


	•	Verify: Ensure the response is structured correctly according to the requested format.
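To make the JSON check reproducible, the assistant's reply can be pasted into a quick validation snippet (a minimal sketch; the expected keys follow the prompt above):

```python
import json

reply = '''
{
  "name": "Alice",
  "age": 30,
  "location": "New York"
}
'''

data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
assert set(data) == {"name", "age", "location"}, "unexpected keys"
print("JSON output is well-formed:", data)
```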

f. Scroll Functionality

	•	Test: Engage in a long conversation or ask for a lengthy response:
Prompt: “Explain in detail how photosynthesis works.”
	•	Expected Response: The answer should be long enough to require scrolling.
	•	Verify: Ensure that scrolling works correctly and all content is visible without truncation.

g. Copy/Delete Response

	•	Test: Copy and delete specific responses in the chat.
	•	Verify: Ensure that responses can be copied and deleted correctly without affecting other messages.

h. Importing Documents (RAG)

	•	Test: Import a document and ask a query related to the content of the document.
Prompt: “Please answer based on this document.”
	•	Expected Response: The system should process the query and provide accurate responses based on the uploaded file’s content.

i. Importing an Image (Vision Model)

	•	Test: Import an image and ask the model (e.g., LLaVA) to generate a response:
Prompt: “Describe what is in this image.”
	•	Expected Response: The model should generate a description of the image.
	•	Verify: Ensure the model with vision capabilities responds accurately based on the uploaded image.

j. Clear/Delete Chat

	•	Test: Use the clear message / delete entire chat options.
	•	Verify: Ensure that these actions are successful and do not affect the model instructions or settings.

k. Context Over Multiple Exchanges

	•	Test: Engage in multiple exchanges within the same thread:
Prompt 1: “Who was Albert Einstein?”
Prompt 2: “What is the theory of relativity?”
	•	Expected Response: The assistant should maintain context between exchanges and refer to previous responses when relevant.

l. Create New Chat

	•	Test: Use the create new chat button and start a fresh conversation.
	•	Prompt: “What is the weather like today?”
	•	Expected Response: The conversation should start as a new thread, and the title should automatically generate based on the user’s message.
	•	Verify: Ensure that the previous conversation context is not carried over into the new chat.

m. Model Switching Mid-Thread

	•	Test: Switch models mid-thread and ask for another response.
Prompt: “Can you write a haiku?”
	•	Expected Response: The app should handle switching models and generate a response from the newly selected model without losing conversation context.

n. Regenerate Response

	•	Test: Use the regenerate button to regenerate the assistant’s response.
Prompt: “Tell me something about space exploration.”
	•	Expected Response: A new response should be generated upon clicking the regenerate button.
	•	Verify: Ensure that the regenerate button works multiple times if needed.

o. Instructions Update

	•	Test: Update the instructions for the assistant midway through a conversation.
Prompt: Update the instruction to “Give concise answers” and ask:
Prompt: “What is quantum computing?”
	•	Expected Response: The response should be concise, reflecting the updated instructions.
	•	Verify: Ensure that the instruction update is applied immediately.

p. Error Handling

	•	Test: Simulate conditions where a model might fail to load (e.g., disconnect network temporarily).
	•	Expected Behavior: Logs and instructions should guide users through the issue.
	•	Verify: Ensure the app captures hardware information and logs for troubleshooting.

2. Model Display & Handling

  • Models in Model Selection should highlight recommended models based on user RAM (this is likely based on a static formula).
  • Models in Model Selection should be grouped correctly.
  • The bottom bar displays the correct status when downloading models.
  • Threads display the correct status for models
    - Start models
    - Model started successfully
    - Generate response
    - Stopping models
    - Models fail to start

Testing Script:

1. Model Selection Highlight for Recommended Models Based on User RAM
	•	Test: Verify that models are highlighted as “Recommended” based on the user’s available RAM.
	•	Steps:
	1.	Open the Model Selection dropdown.
	2.	Look for any models labeled with a warning about insufficient RAM or marked as “Recommended” based on your device’s memory.
	3.	Confirm if the recommendation matches your system specifications.
	•	Expected Behavior: Models that are suitable for your system RAM should be highlighted, and those exceeding the system’s capacity should show warnings (e.g., “Slow on your device”).
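The checklist notes that the recommendation is likely a static formula. For sanity-checking the labels, a plausible heuristic might look like the sketch below; the overhead factor, headroom ratio, and label strings are illustrative assumptions, not Jan's actual logic:

```python
def recommendation_label(model_size_gb: float, total_ram_gb: float,
                         overhead: float = 1.25) -> str:
    """Illustrative heuristic only: assume a model needs its file size
    plus ~25% overhead in RAM to run comfortably."""
    required = model_size_gb * overhead
    if required <= total_ram_gb * 0.7:   # leave headroom for the OS/app
        return "Recommended"
    elif required <= total_ram_gb:
        return "Slow on your device"
    return "Not enough RAM"

# Example: a 4.1 GB Q4 model on a 16 GB machine
print(recommendation_label(4.1, 16))  # -> "Recommended"
```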

2. Thread Status for Models
	•	Test: Ensure that threads display the correct model status, including starting, stopping, or failing to start a model.
	•	Steps:
	1.	Start a conversation with a model in a new thread.
	2.	Change models mid-conversation by selecting a different model from the Model Selection dropdown.
	3.	Test starting, stopping, and switching models to check the status messages.
	•	Expected Behavior: The thread should show the following statuses correctly:
	•	“Starting Model…”
	•	“Model Started Successfully”
	•	“Stopping Model…”
	•	Any error messages for models failing to start or run.

3. Users Can Customize Thread Settings

  • Instructions set by the user are followed by the assistant in subsequent conversations. Changes to instructions are updated in real time and do not require a restart of the application or session.
  • Ability to reset instructions to default or clear them completely.
  • Adjust model parameters (e.g., Temperature, Top K, Top P) from the GUI and verify they are reflected in the chat behavior.
  • Check the maximum and minimum limits of adjustable parameters and how they affect the assistant's responses.
  • Changes can be saved and persisted between sessions.
  • Users can access and modify the model.yml file.
  • Changes made in model.yml are correctly applied to the chat session upon reload or restart (see the sketch below).
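Since the checklist asks testers to edit model.yml by hand, a scripted round-trip can confirm a change is written and read back correctly. This is a minimal sketch assuming PyYAML is installed and the file contains a top-level temperature field; the path and field name are placeholders, not a documented schema:

```python
import yaml  # pip install pyyaml

path = "models/my-model/model.yml"  # placeholder path

with open(path) as f:
    config = yaml.safe_load(f)

print("before:", config.get("temperature"))
config["temperature"] = 0.2  # assumed field name, for illustration

with open(path, "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Re-read to confirm the change persisted; restart the app afterwards
with open(path) as f:
    assert yaml.safe_load(f)["temperature"] == 0.2
```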

Testing Script:

a. Temperature

	•	Purpose: Controls the randomness of the model’s responses.
	•	Test:
	•	Set temperature to 0.2 and ask a factual question:
Prompt: “What is the capital of France?”
Expected: “The capital of France is Paris.”
	•	Set temperature to 0.9 and ask the same question:
Prompt: “What is the capital of France?”
Expected: A more varied, creatively phrased response, e.g., "The capital of France is actually Paris, not Rome. Rome is the capital of Italy."



b. Top P (Nucleus Sampling)

	•	Purpose: Controls the probability mass of tokens considered for responses.
	•	Test:
	•	Set Top P to 0.95 and ask:
Prompt: “What is artificial intelligence?”
Expected: A detailed and varied response.
	•	Set Top P to 0.5 and ask the same question.
Expected: A more concise, deterministic answer.

c. Stream

	•	Purpose: Enables streaming of tokens as they are generated.
	•	Test:
	•	Turn Stream ON and ask a long-form question:
Prompt: “Explain how solar panels convert sunlight into electricity.”
Expected: The response is streamed token by token.
	•	Turn Stream OFF and ask the same question.
Expected: The full response appears after it’s fully generated.

d. Max Tokens

	•	Purpose: Limits the maximum number of tokens the model can generate.
	•	Test:
	•	Set Max Tokens to 100 and ask a detailed question:
Prompt: “Can you explain the process of photosynthesis?”
Expected: A shorter, more concise response.
	•	Increase Max Tokens to 400 and ask the same question.
Expected: A more detailed, longer explanation.

e. Frequency Penalty

	•	Purpose: Discourages word repetition by applying penalties.
	•	Test:
	•	Set Frequency Penalty to 0 and ask:
Prompt: “Explain why repetition is important in learning.”
Expected: Some repetition may occur.
	•	Set Frequency Penalty to 1 and ask the same question.
Expected: Minimal or no repetition.

f. Presence Penalty

	•	Purpose: Encourages introducing new ideas into the response.
	•	Test:
	•	Set Presence Penalty to 0 and ask:
Prompt: “Describe a typical day in your life.”
Expected: Similar topics may repeat.
	•	Set Presence Penalty to 1 and ask the same question.
Expected: More varied content with fewer repeated ideas.

g. Model Settings (Prompt Template)

	•	Purpose: Customize the structure of the prompt given to the model.
	•	Test:
	•	Modify the Prompt Template with a system message:
Template:
 You are a friendly assistant. 
{prompt}
Prompt: “What is quantum computing?”
Expected: A friendly, concise answer following the system message.

h. Context Length

	•	Purpose: Controls how much conversation context the model retains.
	•	Test:
	•	Set Context Length to 128 and engage in a multi-turn conversation.
Expected: The model loses track of earlier parts of the conversation.
	•	Set Context Length to 4096 and repeat the conversation.
Expected: The model retains and references earlier parts more effectively.

i. GPU Layers

	•	Purpose: Controls how many layers of the model run on the GPU for faster processing.
	•	Test:
	•	Set GPU Layers to 1 and ask a complex question.
Expected: Slower response due to minimal GPU acceleration.
	•	Set GPU Layers to 29 and ask the same question.
Expected: Faster response with higher GPU utilization.

4. Users Can Click on a History Thread

  • Chat window displays the entire conversation from the selected history thread without any missing messages.
  • Historical threads reflect the exact state of the chat at that time, including settings.
  • Ability to delete or clean old threads.
  • Changing the title of the thread updates correctly.

Testing Script

1. Historical Thread Reflects Correct State
	•	Test: Confirm that historical threads reflect the exact state of the chat, including settings such as model choice, parameters, or system settings, as they were during the conversation.
	•	Steps:
	1.	Start a conversation using a specific model (e.g., Mistral 7B).
	2.	Adjust settings like temperature or tokens and save the conversation.
	3.	Open the saved conversation from history and verify if the settings match what was used at the time.
	•	Expected Behavior: The model, parameters, and settings should be exactly as they were when the conversation occurred.

2. Delete or Clean Threads

	•	Test: Use the option to delete or clean old threads from the history list.
	•	Steps:
	1.	Select an old thread from the sidebar.
	2.	Use the delete option (e.g., trash icon) to remove the thread.
	3.	Verify that the thread is removed from both the sidebar and the conversation history.
	•	Expected Behavior: The thread should be permanently deleted, and no residual data should be available after deletion.

3. Change Title of a Thread

	•	Test: Change the title of a conversation thread and ensure the title updates correctly.
	•	Steps:
	1.	Select a thread from the sidebar.
	2.	Use the rename option (if available) to modify the title of the thread.
	3.	Confirm that the new title is updated both in the sidebar and within the chat window.
	•	Expected Behavior: The new title should be reflected correctly, and the change should be applied without affecting the content of the thread.

C. Hub

1. Users Can Discover Recommended Models

  • Each model's recommendations are consistent with the user’s activity and preferences.
  • Search models and verify results/actions on the results.

2. Users Can Download Models Suitable for Their Devices

  • Model list should be in order: Local models > Remote models
  • Ensure that models' RAM-requirement tags are displayed correctly based on the user's hardware.
  • Ensure each model has full information.
  • Check the download model functionality and validate if the cancel download feature works correctly.
  • Click the Use button to use the model. Expect to jump to the thread and see the model in the dropdown model selector.

3. Users Can Import Models via a HuggingFace URL

  • Import via Hugging Face ID/full Hugging Face URL, and check that the progress bar reflects the download process.
  • Test deep-link import. #2876
  • Users can use/remove the imported model.

Testing Script

	•	Test: Import a model using a Hugging Face URL.
	•	Example URL 1:

https://huggingface.co/hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF


	•	Example URL 2:

https://huggingface.co/city96/FLUX.1-schnell-gguf


	•	Expected Behavior: The model should begin downloading, and the progress bar should reflect the download process.
	•	Verify: Ensure the model appears in the list with the correct size and is marked as Inactive after the download completes.
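To double-check the "correct size" expectation out of band, the repository's file sizes can be read from the Hugging Face API before importing (a sketch assuming the huggingface_hub package is installed):

```python
from huggingface_hub import HfApi  # pip install huggingface_hub

api = HfApi()
info = api.model_info("hugging-quants/Llama-3.2-3B-Instruct-Q8_0-GGUF",
                      files_metadata=True)

# List each GGUF file with its size so it can be compared with Jan's model list
for f in info.siblings:
    if f.rfilename.endswith(".gguf"):
        print(f.rfilename, f"{f.size / 1e9:.2f} GB")
```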

4. Users Can Import New Models to the Hub

  • Ensure import works successfully via drag/drop or GGUF file upload.
  • Verify that the Move model binary file / Keep Original Files & Symlink options work correctly.
  • Users can add more info to the imported model (e.g., edit name, add information).
  • Ensure the new model updates after restarting the app.

5. Users can Integrate With a Remote Server

  • Users can click Use on a remote model that they have already set up in the Hub.

D. System Monitor

1. Users Can See Disk and RAM Utilization

  • Verify that the RAM and VRAM utilization graphs are accurately reported in real time.
  • Validate that the utilization percentages reflect the actual usage compared to the system's total available resources.
  • Ensure that the system monitors update dynamically as the models run and stop.

2. Users Can Start and Stop Models Based on System Health

  • Verify the Start/Stop action for a model, ensuring the system resource usage reflects this change.
  • Confirm that any changes in model status (start/stop) are logged or reported to the user for transparency.
  • Check the functionality of App Log to ensure it opens the correct folder in the system file explorer.

Testing Script:

1. Users Can See Disk and RAM Utilization

a. Verify Real-Time RAM and VRAM Utilization

   •	Test: Run a model and observe the system monitor to ensure that RAM and VRAM utilization are accurately reported in real time.
   •	Steps:
   1.	Start a model from the My Models list.
   2.	Check the System Monitor section at the bottom of the app.
   3.	Observe the CPU and Memory usage graphs.
   •	Expected Behavior: The RAM and VRAM utilization should be updated in real-time, reflecting the active model’s resource usage.

b. Validate Utilization Percentages

   •	Test: Ensure the percentages displayed for RAM and VRAM usage reflect the actual usage compared to the system’s total available resources.
   •	Steps:
   1.	Open your system’s resource monitor (e.g., Activity Monitor on macOS or Task Manager on Windows).
   2.	Compare the values displayed in the app’s system monitor with the system’s resource monitor.
   •	Expected Behavior: The percentage of RAM and VRAM used in the app should closely match the usage reported by the system’s built-in resource monitor.

c. Monitor Updates Dynamically

   •	Test: Start and stop a model and verify that the system monitor updates dynamically as the model runs and stops.
   •	Steps:
   1.	Start a model from the My Models list and observe the system monitor.
   2.	Stop the model and check if the resource usage returns to the idle state.
   •	Expected Behavior: The resource graphs (CPU, Memory) should dynamically update when models are started or stopped, accurately reflecting the change in system load.

2. Users Can Start and Stop Models Based on System Health

a. Verify Start/Stop Action for Models

   •	Test: Start and stop a model, ensuring the system monitor reflects this action accurately.
   •	Steps:
   1.	Click Start to activate a model and observe the system monitor for any changes in CPU and Memory usage.
   2.	Click Stop to deactivate the model and check if resource usage drops.
   •	Expected Behavior: Starting the model should increase resource usage, while stopping the model should return the system to idle or lower resource consumption.

b. Log Model Status Changes

   •	Test: Confirm that any changes in the model’s status (start/stop) are logged or reported to the user for transparency.
   •	Steps:
   1.	Start and stop a model while observing the App Log.
   2.	Check if there are entries for when the model starts or stops.
   •	Expected Behavior: The app should log model status changes in the App Log, clearly indicating when a model is started or stopped.

c. App Log Functionality

   •	Test: Use the App Log button to open the log folder and ensure the app logs are accessible.
   •	Steps:
   1.	Click on the App Log button in the System Monitor section.
   2.	Verify that it opens the correct folder in the system file explorer.
   3.	Check the logs for any relevant information on model performance or resource usage.
   •	Expected Behavior: The log folder should open correctly, and the logs should contain entries related to model performance and system health.

E. Local API Server

  • Verify usage of the API Reference (Swagger) for sending/receiving requests.
    • Use default server option
    • Configure and use custom server options
  • Ensure logs are correctly captured.
  • Test the server with different configurations (e.g., model settings).
  • Verify that Open Logs & Clear Logs work normally.
  • Ensure that model-related functions are disabled when the local server is running.

Testing Script

1. Verify Usage of the API Reference (Swagger) for Sending/Receiving Requests

a. Use Default Server Option

	•	Test: Start the local server using the default configuration.
	•	Steps:
	1.	Open the app and navigate to the Local API Server section.
	2.	Click Start Server using the default options (e.g., 127.0.0.1:1337).
	3.	Once the server is running, open the API reference URL in a browser:
URL: http://localhost:1337/static/index.html
	4.	Use the API Reference to send a sample request (e.g., list available models).
	•	Expected Behavior: The server should start successfully, and the API Reference (Swagger) should be accessible in the browser. Requests should be processed, and valid responses should be returned.
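Besides clicking through Swagger, the same check can be scripted. The sketch below lists models via the default endpoint using only the standard library; it assumes the server exposes an OpenAI-compatible /v1/models route, as the Swagger page suggests:

```python
import json
import urllib.request

# Default server address from the steps above
resp = urllib.request.urlopen("http://127.0.0.1:1337/v1/models", timeout=10)
models = json.loads(resp.read())

# Expect a JSON payload listing the available models
print(json.dumps(models, indent=2))
```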

b. Configure and Use Custom Server Options

	•	Test: Change the server options and verify the functionality.
	•	Steps:
	1.	Modify the server configuration (e.g., change the port to 1338 or adjust the API prefix).
	2.	Click Start Server and verify the server starts with the new configuration.
	3.	Open the updated API reference in a browser:
URL: http://localhost:1338/static/index.html
	4.	Send a request (e.g., retrieve assistant details) and check if the request is processed correctly.
	•	Expected Behavior: The server should start with the new configuration, and the API reference should still be accessible with the modified URL.

2. Test the Server with Different Configurations

a. Model Settings
	•	Test: Run the server with different model settings to ensure proper log capture.
	•	Steps:
	1.	Start the local server.
	2.	Open the Model section and adjust the inference parameters (e.g., change the temperature or token limit).
	3.	Send a request via the API (e.g., generate a response using the adjusted model).
	•	Expected Behavior: The server should process the request with the updated model parameters, and logs should reflect the model’s behavior with the new settings.
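For the parameter check, a scripted chat request makes it easy to confirm the server honors per-request settings. A sketch, assuming an OpenAI-compatible /v1/chat/completions route; the model id is a placeholder:

```python
import json
import urllib.request

payload = {
    "model": "llama3.2-1b-instruct",  # placeholder id; use a model you have
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "temperature": 0.2,   # the adjusted parameter under test
    "max_tokens": 100,
}

req = urllib.request.Request(
    "http://127.0.0.1:1337/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    reply = json.loads(resp.read())

print(reply["choices"][0]["message"]["content"])
```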

3. Ensure Model-Related Functions Are Disabled When the Local Server Is Running

a. Model Functionality
	•	Test: Check that model-related functions (e.g., starting/stopping models from the UI) are disabled while the local server is running.
	•	Steps:
	1.	Start the local server and navigate to the My Models section.
	2.	Try to start or stop a model.
	•	Expected Behavior: Model-related actions (start/stop) should be disabled, and the user should see a notification or indicator that models cannot be managed while the server is running.

F. Settings

1. My Models

  • Check that downloaded models are shown in My Models page & assigned to provider groups correctly.
  • Each model’s name, version, and size should be displayed accurately.
  • Check the start / stop / delete buttons to confirm they perform the expected actions.
  • Check if starting another model stops the previously running model entirely.
  • Check that when deleting a model, it will delete all the model's files from the user's computer.

Testing Script

1. Test Start/Stop/Delete Buttons

	•	Test: Verify that the Start, Stop, and Delete buttons function as expected for each model.
	•	Steps:
	1.	Click the three-dot menu for any model in the list.
	2.	Use the Start button to load the model and confirm it successfully starts.
	3.	After starting the model, click the Stop button to unload it and ensure the process completes.
	4.	Test the Delete button and confirm that the model is deleted from both the list and your system storage.
	•	Expected Behavior: The model should start, stop, and delete as expected without any errors.

2. Verify Starting Another Model Stops the Previous Model
	•	Test: Ensure that when you start a different model, the previously running model is stopped entirely.
	•	Steps:
	1.	Start a model from the My Models list.
	2.	Once the model is running, select a different model and start it.
	3.	Observe if the first model stops running before the second model starts.
	•	Expected Behavior: Starting a new model should automatically stop the previously running model without errors.

3. Check Deletion of Model Files
	•	Test: Verify that deleting a model removes all associated files from the user’s computer.
	•	Steps:
	1.	Start by deleting a model from the My Models list using the Delete button.
	2.	After deletion, check the local storage directory (e.g., the model folder location) to ensure that all model files have been removed.
	•	Expected Behavior: The model and all related files should be completely removed from the user’s local storage after deletion.

2. Appearance

  • Test the Light, Dark, and System theme settings to ensure they are functioning as expected.
  • Confirm that the application saves the theme preference and persists it across sessions.
  • Validate that all elements of the UI are compatible with the theme changes and maintain legibility and contrast.
  • The desktop app window should have a shadow.
  • Ensure Spell Check works.

Testing Script

1. Color Theme Selection

	•	Test: Select different color themes from the dropdown (e.g., Joi Light, Dark, etc.).
	•	Expected Behavior: The interface should immediately update to reflect the selected color theme.
	•	Verify: Ensure the theme selection persists across app restarts.

2. Interface Theme (Solid vs Translucent)

	•	Test: Toggle between Solid and Translucent options under the Interface Theme.
	•	Expected Behavior: The background should change accordingly. Solid should display a standard opaque background, while Translucent should make the background semi-transparent.
	•	Verify: Ensure the change is applied across all sections of the interface (e.g., sidebar, chat window).

3. Spell Check

	•	Test: Toggle the Spell Check option ON/OFF.
	•	Expected Behavior: When enabled, spell check should highlight misspelled words in the chat or input fields.
	•	Verify: Misspelled words should appear underlined or highlighted when Spell Check is ON. No highlighting should occur when it’s OFF.

3. Keyboard Shortcuts

  • Test all shortcut keys (Settings > Keyboard Shortcuts) to confirm they function correctly.
  • Ensure that all information about each shortcut is displayed correctly.

4. Advanced Settings

Experimental Mode

  • Test the Experimental Mode toggle to ensure it works as expected.
  • When enabled, experimental features should be activated and available across the application:
    • Tools (Retrieval,...)
    • Quick Ask
    • Vulkan (on Windows & Linux)

Data Folder Relocation

  • Ensure the Jan Data Folder option opens the correct folder in the system.
  • Ensure relocating the Jan Data Folder works.
  • Ensure the app functions correctly after the data folder has been relocated.

Proxy and SSL Certificate

  • Test downloading a model using HTTP Proxy and confirm it works according to guidelines.
  • When Ignore SSL Certificates is enabled, the app should allow self-signed or unverified certificates, which may be required for certain proxies.

Logs and Data Management

  • Confirm that logs older than 7 days or exceeding 1MB are cleared upon starting the app.
  • Test that Clear Logs works.
  • Validate the Reset to Factory Settings option works correctly:
    • Retain the current app data location when resetting.
    • Reset the current app data location entirely.

Testing Script

1. Experimental Mode

	•	Test: Toggle the Experimental Mode setting ON/OFF.
	•	Expected Behavior: When enabled, experimental features should be activated and available across the application.
	•	Verify: Ensure that changes are applied immediately without requiring a restart.

2. Jan Data Folder

	•	Test: Click the Jan Data Folder path and attempt to change the folder location for storing messages and user data.
	•	Expected Behavior: A dialog box should open, allowing the user to select a new directory. Once changed, the data should be stored in the newly selected folder.
	•	Verify: Ensure that the new folder location is reflected in the application and persists across restarts.

3. HTTPS Proxy

	•	Test: Enter valid proxy credentials in the HTTPS Proxy field, then leave it blank to test both cases.
	•	Expected Behavior: When valid credentials are entered, the app should route network traffic through the specified proxy.
	•	Verify: Ensure that when left blank, the app reverts to the default network configuration.
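To confirm the proxy itself is reachable before blaming the app, a tester can route a request through it manually (a sketch; the proxy address and credentials are placeholders for whatever was entered in Jan):

```python
import urllib.request

# Placeholder proxy address; substitute the credentials entered in Jan
proxy = urllib.request.ProxyHandler({
    "http": "http://user:pass@127.0.0.1:8080",
    "https": "http://user:pass@127.0.0.1:8080",
})
opener = urllib.request.build_opener(proxy)

# If this succeeds, the proxy accepts traffic and model downloads should too
with opener.open("https://huggingface.co", timeout=15) as resp:
    print("proxy OK, status:", resp.status)
```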

4. Ignore SSL Certificates

	•	Test: Toggle the Ignore SSL Certificates option ON/OFF.
	•	Expected Behavior: When enabled, the app should allow self-signed or unverified certificates, which may be required for certain proxies.
	•	Verify: Ensure proper handling of certificates based on the setting.

5. Clear Logs

	•	Test: Click the Clear button to delete all logs from the application.
	•	Expected Behavior: All log files should be removed, and the log directory should be empty.
	•	Verify: Reopen the application and confirm that no old logs are present.

6. Reset to Factory Settings

	•	Test: Click the Reset button to restore the app to its initial state.
	•	Expected Behavior: All models, chat history, and custom settings should be erased, and the app should return to its default state.
	•	Verify: Ensure the reset is irreversible and no previous data remains after the reset is complete.

This script covers the main functionalities and settings on the Advanced Settings page, ensuring that features like proxy, data folder relocation, and factory reset work as expected.

5. Extensions

1. Install and Enable Extensions

  • Validate the Install Extensions process by selecting and installing a plugin file.
  • Enable/disable extensions and confirm that the UI reflects these changes.

2. Extension Group Management

1. Model Providers

Test each model provider extension to ensure it updates properly and maintains functionality:

  • Anthropic (v1.0.2):

    • Confirm that Anthropic API key works.
  • Cohere (v1.0.0):

    • Confirm that Cohere API key works.
  • Groq (v1.0.1):

    • Confirm that Groq API key works.
  • Martian (v1.0.1):

    • Confirm that Martian API key works.
  • Mistral (v1.0.1):

    • Confirm that Mistral API key works.
  • TensorRT-LLM (v0.0.3):

    • Ensure GPU acceleration is functional and outputs expected performance.
  • NVIDIA NIM (v1.0.1):

    • Confirm that NVIDIA API key works.
  • OpenAI (v1.0.2):

    • Confirm that OpenAI API key works.
  • OpenRouter (v1.0.0):

    • Confirm that OpenRouter API key works.
  • Triton-TRT-LLM (v1.0.0):

    • Confirm that Triton-TRT-LLM API key works.

2. Core Extensions

  • Model Management (v1.0.33):

    • Ensure that Hugging Face Access Token works.
  • System Monitoring (v1.0.10):

    • Ensure that Enable App Logs works.
    • Ensure that Log Cleaning Interval works.
@imtuyethan imtuyethan added the type: bug Something isn't working label Oct 16, 2024
@imtuyethan imtuyethan self-assigned this Oct 16, 2024
@imtuyethan imtuyethan added type: chore Maintenance, operational and removed type: bug Something isn't working labels Oct 16, 2024
@imtuyethan imtuyethan changed the title chore: Jan 0.5.7 Release Sign-off QA: Jan 0.5.7 Release Sign-off Oct 16, 2024
@imtuyethan imtuyethan added this to the v0.5.7 milestone Oct 16, 2024
@imtuyethan

It takes super long to download the file, is it expected?

Screenshot 2024-10-16 at 6 26 46 PM

@imtuyethan

Minor UI issue:

Screenshot 2024-10-16 at 7 04 49 PM

0xSage commented Oct 16, 2024

It takes super long to download the file, is it expected?

Screenshot 2024-10-16 at 6 26 46 PM

This is a GitHub thing @imtuyethan, not related to the app.

@imtuyethan

I have encountered this issue multiple times so far:

  • I tried to run some old/legacy models from previous releases.
  • They usually don't work because of outdated default settings (wrong prompt template...etc.)

How could we handle this better, considering there are users like me who rarely download new models (unless a new release has insanely high-quality performance) and keep using what they already have?

Screen.Recording.2024-10-16.at.9.32.11.PM.mov


imtuyethan commented Oct 16, 2024

Tested on:
[x] Mac:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Model: Tinyllama Chat 1.1B Q4

Seems like the wrong prompt template?

Screenshot 2024-10-16 at 9 43 43 PM Screenshot 2024-10-16 at 9 43 53 PM

With the same prompt, Llama 3.2 1B Instruct Q8 gave me a correct/thorough answer.


imtuyethan commented Oct 16, 2024

Tested on:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Suddenly there's a weird line:

Screenshot 2024-10-16 at 10 00 03 PM

While normally it looks like this:

Screenshot 2024-10-16 at 10 00 38 PM

I think I broke the app.


imtuyethan commented Oct 16, 2024

Tested on:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Wrong UI of RAG feature (Not urgent, we can fix this later when we improve this feature!):

Screenshot 2024-10-16 at 10 03 46 PM

Correct UI:
https://www.figma.com/design/ytn1nRZ17FUmJHTlhmZB9f/Jan-App-(1st-version)?node-id=783-43738&t=qhUfC7x5HPV38sei-4

(The design link is also super outdated - it is our old UI)

Screenshot 2024-10-16 at 10 07 10 PM

RAG just doesn't work

Screenshot 2024-10-16 at 10 10 09 PM Screenshot 2024-10-16 at 10 10 34 PM

@imtuyethan

Tested on:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Model: LLaVA 7B

Seems like the wrong prompt template? Also, the model didn't generate a thread title:

Screenshot 2024-10-16 at 10 25 24 PM Screenshot 2024-10-16 at 10 25 06 PM


imtuyethan commented Oct 16, 2024

Tested on:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Search in Hub doesn't work properly:

Screenshot 2024-10-16 at 11 54 41 PM


imtuyethan commented Oct 16, 2024

Tested on:
[x] Mac:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


Grammar issue (for all self-imported models by users):

Screenshot 2024-10-16 at 11 58 36 PM
  • Please change to "Self-imported model by user"
  • The way we define tags is weird.

Cloud model descriptions could be better

These descriptions are not helpful:

Screenshot 2024-10-17 at 12 04 57 AM Screenshot 2024-10-17 at 12 04 32 AM

@imtuyethan

This UI needs to be improved:

Screenshot 2024-10-17 at 12 10 20 AM


imtuyethan commented Oct 16, 2024

@imtuyethan UI improvement:

  • Remove Running Models on top left
  • Change Model to Running Models
Screenshot 2024-10-17 at 12 13 04 AM


imtuyethan commented Oct 16, 2024

Tested on:
[x] Mac:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


1. Start server should show the correct status

When I click Start Server, it takes a while to start, but the button doesn't change, which makes it feel like it's not working.
Expected flow: Click Start Server > button says Starting... in a disabled state > button changes to Stop Server once the server starts successfully.

2. Server logs should automatically scroll to the latest line

When I perform any new action, I need to scroll down manually to see it recorded in the server logs:

Screen.Recording.2024-10-17.at.12.24.58.AM.mov

@imtuyethan

This one should be left aligned:

Screenshot 2024-10-17 at 1 09 15 AM


imtuyethan commented Oct 16, 2024

When I click Quit, the app closes completely. Is this expected? The latest stable release doesn't behave like this.
cc @louis-jan

Screen.Recording.2024-10-17.at.1.14.15.AM.mov


imtuyethan commented Oct 16, 2024

Tested on:

Operating System: MacOS Sonoma 14.2
Processor: Apple M2
RAM: 16GB


I tried to move the Jan Data Folder to the desktop, but it took around 5 minutes and then showed an error; the relocation failed. When I checked my desktop, folders like settings, themes, and models had already been moved there.

Screenshot 2024-10-17 at 1 24 36 AM


imtuyethan commented Oct 16, 2024

When I tried to clear logs, Cortex logs remained. Should they be deleted as well?

Edited: These Cortex logs remained because of some old testing I did with the Jan x Cortex implementation; the issue was solved once I did a factory reset.

@imtuyethan imtuyethan assigned imtuyethan and unassigned imtuyethan Oct 17, 2024
@imtuyethan

Step to reproduce:

Screen_Recording_2024-10-17_at_2.20.58_PM.mov

@imtuyethan

Nightly is currently broken


@louis-jan

cc @hiento09 ^

@louis-jan

@imtuyethan @hiento09 I can update or download without any issues. Might your computer be blocking certain requests?


imtuyethan commented Oct 17, 2024

119 (test-windows-app-2)

  • CPU: Intel Xeon Silver 4214R @ 2.40GHz
  • RAM: 96GB
  • OS: 64-bit operating system, x64-based processor
  • GPU: AMD Radeon RX 7600 (not supported by Jan)

Gemma 7B Q4 from the Hub gave a weird response

Seems like a prompt template issue?

Screenshot 2024-10-17 at 8 23 11 PM
Screenshot 2024-10-17 at 8 23 21 PM


imtuyethan commented Oct 17, 2024

119 (test-windows-app-2)

  • CPU: Intel Xeon Silver 4214R @ 2.40GHz
  • RAM: 96GB
  • OS: 64-bit operating system, x64-based processor
  • GPU: AMD Radeon RX 7600 (not supported by Jan)

This should be aligned properly
cc @urmauur

Screenshot 2024-10-17 at 8 32 45 PM

Screenshot 2024-10-21 at 12 05 32 PM

@imtuyethan

119 (test-windows-app-2)

  • CPU: Intel Xeon Silver 4214R @ 2.40GHz
  • RAM: 96GB
  • OS: 64-bit operating system, x64-based processor
  • GPU: AMD Radeon RX 7600 (not supported by Jan)

  1. I imported a GGUF model https://huggingface.co/erfanzar/LinguaMatic-Coder-INST-1B-GGUF directly from the Hub.
  2. The response was weird. I took a look at the prompt template; it is the same as in the model.json file, but it differs from the prompt template stated in the HF model card: https://huggingface.co/erfanzar/LinguaMatic-Coder-INST-1B-GGUF

Is it a bug? @louis-jan

Screenshot 2024-10-17 at 8 41 03 PM

Screenshot 2024-10-17 at 8 40 24 PM

Screenshot 2024-10-17 at 8 42 17 PM

@imtuyethan

UI issue:
cc @urmauur
Screenshot 2024-10-17 at 8 56 16 PM


imtuyethan commented Oct 17, 2024

119 (test-windows-app-2)

  • CPU: Intel Xeon Silver 4214R @ 2.40GHz
  • RAM: 96GB
  • OS: 64-bit operating system, x64-based processor
  • GPU: AMD Radeon RX 7600 (not supported by Jan)

It should not show the drop-down if there's no GPU detected (meaning the field should be disabled):
cc @urmauur

Screenshot 2024-10-17 at 9 52 09 PM


urmauur commented Oct 18, 2024

@imtuyethan @louis-jan All UI feedback based on Ashley’s comments with the rocket emoji has been fixed in this PR #3833

@imtuyethan

Closing this. Some issues are fixed in the new build; I have moved the remaining ones to separate tickets for the next release.

@dan-homebrew

Thank you and great job @imtuyethan!
