From 445d23f81eb1e872764e2dc3b1cb9aa56ae55ce9 Mon Sep 17 00:00:00 2001
From: Chitoku YATO <cyato@nvidia.com>
Date: Wed, 29 Nov 2023 08:59:26 -0800
Subject: [PATCH] Have a proper tutorial introduction

---
 docs/tutorial-intro.md | 63 +++++++++++++++++++++++++++++++++++++++++-
 mkdocs.yml             |  4 +--
 2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/docs/tutorial-intro.md b/docs/tutorial-intro.md
index 36931a20..8e176b18 100644
--- a/docs/tutorial-intro.md
+++ b/docs/tutorial-intro.md
@@ -1,4 +1,65 @@
-# Tutorial - Intro
+# Tutorial - Introduction
+
+## Overview
+
+Our tutorials are divided into categories roughly based on model modality, the type of data to be processed or generated.
+
+
+### Text (LLM)
+
+| | |
+| :---------- | :----------------------------------- |
+| **[text-generation-webui](./tutorial_text-generation.md)** | Interact with a local AI assistant by running an LLM with oobabooga's text-generation-webui |
+| **[llamaspeak](./tutorial_llamaspeak.md)** | Talk live with Llama using Riva ASR/TTS, and chat about images with Llava! |
+
+### Text + Vision (VLM)
+
+Give your locally running LLM access to vision!
+
+| | |
+| :---------- | :----------------------------------- |
+| **[Mini-GPT4](./tutorial_minigpt4.md)** | [Mini-GPT4](https://minigpt-4.github.io/), an open-source model that demonstrates vision-language capabilities.|
+| **[LLaVA](./tutorial_llava.md)** | [Large Language and Vision Assistant](https://llava-vl.github.io/), multimodal model that combines a vision encoder and Vicuna LLM for general-purpose visual and language understanding. |
+
+### Image Generation
+
+| | |
+| :---------- | :----------------------------------- |
+| **[Stable Diffusion](./tutorial_stable-diffusion.md)** | Run AUTOMATIC1111's [`stable-diffusion-webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to generate images from prompts |
+| **[Stable Diffusion XL](./tutorial_stable-diffusion-xl.md)** | A newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities.|
+
+### Vision Transformers (ViT)
+
+| | |
+| :---------- | :----------------------------------- |
+| **[EfficientViT](./tutorial_efficientvit.md)** | MIT Han Lab's [EfficientViT](https://github.com/mit-han-lab/efficientvit), Multi-Scale Linear Attention for High-Resolution Dense Prediction |
+| **[NanoSAM](./tutorial_nanosam.md)** | [NanoSAM](https://github.com/NVIDIA-AI-IOT/nanosam), SAM model variant capable of running in real-time on Jetson |
+| **[NanoOWL](./tutorial_nanoowl.md)** | [OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit) optimized to run real-time on Jetson with NVIDIA TensorRT |
+| **[SAM](./tutorial_sam.md)** | Meta's [SAM](https://github.com/facebookresearch/segment-anything), Segment Anything model |
+| **[TAM](./tutorial_tam.md)** | [TAM](https://github.com/gaomingqi/Track-Anything), Track-Anything model, is an interactive tool for video object tracking and segmentation |
+
+### Vector Database
+
+| | |
+| :---------- | :----------------------------------- |
+| **[NanoDB](./tutorial_nanodb.md)** | Interactive demo to witness the impact of Vector Database that handles multimodal data |
+
+
+### Audio
+
+| | |
+| :---------- | :----------------------------------- |
+| **[AudioCraft](./tutorial_audiocraft.md)** | Meta's [AudioCraft](https://github.com/facebookresearch/audiocraft), to produce high-quality audio and music |
+| **[Whisper](./tutorial_whisper.md)** | OpenAI's [Whisper](https://github.com/openai/whisper), pre-trained model for automatic speech recognition (ASR) |
+
+## Tips
+
+| | |
+| :---------- | :----------------------------------- |
+| Knowledge Distillation | |
+| SSD + Docker | |
+| Memory optimization | |
+
 
 ## About NVIDIA Jetson
 
diff --git a/mkdocs.yml b/mkdocs.yml
index 33f7b48c..fb529a87 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -66,7 +66,7 @@ extra_css:
 nav:
   - Home: index.md
   - Tutorials:
-    - About NVIDIA Jetson: tutorial-intro.md
+    - Introduction: tutorial-intro.md
     - Text (LLM):
       - text-generation-webui: tutorial_text-generation.md
       - llamaspeak 🆕: tutorial_llamaspeak.md
@@ -87,7 +87,7 @@ nav:
     - Vector Database:
       - NanoDB: tutorial_nanodb.md
     - Audio:
-      - Audiocraft 🆕: tutorial_audiocraft.md
+      - AudioCraft 🆕: tutorial_audiocraft.md
       - Whisper 🆕: tutorial_whisper.md
   # - Tools:
   #   - LangChain: tutorial_distillation.md