diff --git a/llm-inference-on-edge.md b/llm-inference-on-edge.md
index cb2392b283..22cc3f0915 100644
--- a/llm-inference-on-edge.md
+++ b/llm-inference-on-edge.md
@@ -264,7 +264,7 @@ Let's run the app on our emulator/simulator as we showed before so we can start
 
 ### State Management
 
-We will start by deleting everyting from the `App.tsx` file, and creating an empty code structure like the following :
+We will start by deleting everything from the `App.tsx` file, and creating an empty code structure like the following :
 
 App.tsx
@@ -995,7 +995,7 @@ The conversation page should now look like this:
 
 Congratulations you've just built the core functionality of your first AI chatbot, the code is available [here](https://github.com/MekkCyber/EdgeLLM/blob/main/EdgeLLMBasic/App.tsx) ! You can now start adding more features to the app to make it more user friendly and efficient.
 
-### The other Functionnalities
+### The other Functionalities
 
 The app is now fully functional, you can download a model, select a GGUF file, and chat with the model, but the user experience is not the best. In the [`EdgeLLMPlus`](https://github.com/MekkCyber/EdgeLLM/tree/main/EdgeLLMPlus) repo, I've added some other features, like on the fly generation, automatic scrolling, the inference speed tracking, the thought process of the model like deepseek-qwen-1.5B,... we will not go into details here as it will make the blog too long, we will go through some of the ideas and how to implement them but the whole code is available in the repo
 
@@ -1003,7 +1003,7 @@ The app is now fully functional, you can download a model, select a GGUF file, a
 The app generates responses incrementally, producing one token at a time rather than delivering the entire response in a single batch. This approach enhances the user experience, allowing users to begin reading the response as it is being formed. We achieve this by utilizing a `callback` function within `context.completion`, which is triggered after each token is generated, enabling us to update the `conversation` state accordingly.
 
 #### Auto Scrolling
-Auto Scrolling ensures that the latest messages or tokens are always visible to the user by automatically scrolling the chat view to the bottom as new content is added. To implement that we need we use a reference to the `ScrollView` to allow programatic control over the scroll position, and we use the `scrollToEnd` method to scroll to the bottom of the `ScrollView` when a new message is added to the `conversation` state. We also define a `autoScrollEnabled` state variable that will be set to false when the user scrolls up more than 100px from the bottom of the `ScrollView`.
+Auto Scrolling ensures that the latest messages or tokens are always visible to the user by automatically scrolling the chat view to the bottom as new content is added. To implement this, we use a reference to the `ScrollView` to allow programmatic control over the scroll position, and we use the `scrollToEnd` method to scroll to the bottom of the `ScrollView` when a new message is added to the `conversation` state. We also define an `autoScrollEnabled` state variable that will be set to false when the user scrolls up more than 100px from the bottom of the `ScrollView`.
 
 #### Inference Speed Tracking
 Inference Speed Tracking is a feature that tracks the time taken to generate each token and displays under each message generated by the model. This feature is easy to implement because the `CompletionResult` object returned by the `context.completion` function contains a `timings` property which is a dictionary containing many metrics about the inference process. We can use the `predicted_per_second` metric to track the speed of the model.
@@ -1107,4 +1107,4 @@ You now have a working React Native app that can:
 
 This implementation serves as a foundation for building more sophisticated AI-powered mobile applications. Remember to consider device capabilities when selecting models and tuning parameters.
 
-Happy coding! 🚀
\ No newline at end of file
+Happy coding! 🚀
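
For readers who want a concrete starting point, here is a minimal sketch of the on-the-fly generation described in the patch above. It assumes the `llama.rn` API used throughout the post; the helper name, the `Message` type, and the state-update logic are illustrative rather than the exact code from the `EdgeLLMPlus` repo:

```typescript
import type { Dispatch, SetStateAction } from 'react';
import type { LlamaContext } from 'llama.rn';

type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Stream a response token by token: each callback invocation appends the new
// token to an empty assistant message, so the UI re-renders while generating.
const sendMessage = async (
  context: LlamaContext,
  conversation: Message[],
  setConversation: Dispatch<SetStateAction<Message[]>>,
  userInput: string,
) => {
  const withUserMessage: Message[] = [
    ...conversation,
    { role: 'user', content: userInput },
    { role: 'assistant', content: '' }, // placeholder we will stream into
  ];
  setConversation(withUserMessage);

  await context.completion(
    {
      messages: withUserMessage.slice(0, -1), // don't send the empty placeholder
      n_predict: 400,
    },
    (data) => {
      // Called after every generated token.
      setConversation((prev) => {
        const updated = [...prev];
        const last = updated[updated.length - 1];
        updated[updated.length - 1] = { ...last, content: last.content + data.token };
        return updated;
      });
    },
  );
};
```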
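
The auto-scrolling behaviour only needs standard React Native APIs. Below is one possible way to wire it up as a small hook; the hook name and the way the 100px threshold is handled are just one way to express what the patch describes, not necessarily how the repo does it:

```typescript
import { useEffect, useRef, useState } from 'react';
import type { NativeScrollEvent, NativeSyntheticEvent, ScrollView } from 'react-native';

// Keeps the chat pinned to the bottom while new tokens arrive, unless the user
// has scrolled more than ~100px away from the bottom.
export function useAutoScroll(conversationLength: number) {
  const scrollViewRef = useRef<ScrollView>(null);
  const [autoScrollEnabled, setAutoScrollEnabled] = useState(true);

  const handleScroll = (event: NativeSyntheticEvent<NativeScrollEvent>) => {
    const { contentOffset, contentSize, layoutMeasurement } = event.nativeEvent;
    const distanceFromBottom =
      contentSize.height - layoutMeasurement.height - contentOffset.y;
    setAutoScrollEnabled(distanceFromBottom < 100);
  };

  // Scroll to the end whenever the conversation grows and auto-scroll is still enabled.
  useEffect(() => {
    if (autoScrollEnabled) {
      scrollViewRef.current?.scrollToEnd({ animated: true });
    }
  }, [conversationLength, autoScrollEnabled]);

  return { scrollViewRef, handleScroll };
}
```

The returned `scrollViewRef` and `handleScroll` are then attached to the chat's `ScrollView` (with `scrollEventThrottle={16}` so `onScroll` fires often enough to track the position).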
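
Finally, a sketch of the inference speed tracking. It relies only on the `timings.predicted_per_second` field mentioned above; the wrapper function and the rounding are illustrative:

```typescript
import type { LlamaContext } from 'llama.rn';

// Run a completion and return the generation speed in tokens per second,
// read from the `timings` object of the completion result.
async function completionSpeed(
  context: LlamaContext,
  messages: { role: string; content: string }[],
  onToken: (data: { token: string }) => void,
): Promise<number> {
  const result = await context.completion({ messages, n_predict: 400 }, onToken);
  return Math.round(result.timings.predicted_per_second * 100) / 100;
}
```

The returned value can then be stored per assistant message (for example in a `tokensPerSecond` array in state) and rendered under the corresponding message bubble.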