<p align="center">
<img src="https://user-images.githubusercontent.com/7662492/229289746-89c5a4c7-afa6-4d46-a0e6-63dfdeb98285.jpg" style="width: 100%;"/>
</p>

<div align="center">
<a href="https://pypi.org/project/tafrigh" target="_blank"><img src="https://img.shields.io/pypi/v/tafrigh?label=PyPI%20Version&color=limegreen" /></a>
<a href="https://pypi.org/project/tafrigh" target="_blank"><img src="https://img.shields.io/pypi/pyversions/tafrigh?color=limegreen" /></a>
<a href="https://github.com/ieasybooks/tafrigh/blob/main/LICENSE" target="_blank"><img src="https://img.shields.io/pypi/l/tafrigh?color=limegreen" /></a>
<a href="https://pepy.tech/project/tafrigh" target="_blank"><img src="https://static.pepy.tech/badge/tafrigh" /></a>

<a href="https://github.com/ieasybooks/tafrigh/actions/workflows/formatter.yml" target="_blank"><img src="https://github.com/ieasybooks/tafrigh/actions/workflows/formatter.yml/badge.svg" /></a>
<a href="https://sonarcloud.io/summary/new_code?id=ieasybooks_tafrigh" target="_blank"><img src="https://sonarcloud.io/api/project_badges/measure?project=ieasybooks_tafrigh&metric=code_smells" /></a>
<a href="https://tafrigh.ieasybooks.com" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
</div>

<div align="center">

[![ar](https://img.shields.io/badge/lang-ar-brightgreen.svg)](README.md)
[![en](https://img.shields.io/badge/lang-en-red.svg)](README.en.md)

</div>

<h1>Tafrigh</h1>

<p>Transcribe visual and audio materials into text.</p>

<p>You can view examples transcribed using Tafrigh <a href="https://drive.google.com/drive/folders/1mwdJ9t4tiu8jFGosvNsq8SL54HoQMB8G?usp=sharing">here</a>.</p>

<h2>Features of Tafrigh</h2>

<ul>
  <li>Transcribe visual and audio materials into text using the Whisper models provided by OpenAI</li>
  <li>Transcribe materials using the wit.ai service provided by Facebook</li>
  <li>Download visual content directly from YouTube, whether a single video or a complete playlist</li>
  <li>Produce various output formats like <code>txt</code>, <code>srt</code>, <code>vtt</code>, <code>csv</code>, <code>tsv</code>, and <code>json</code></li>
</ul>

<h2>Requirements</h2>

<ul>
  <li>A powerful GPU is recommended if you plan to use Whisper models</li>
<li>Python version 3.11 or higher installed on your computer</li>
<li><a href="https://ffmpeg.org">FFmpeg</a> installed on your computer</li>
<li><a href="https://github.com/yt-dlp/yt-dlp">yt-dlp</a> installed on your computer</li>
</ul>
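
<p>You can quickly verify these prerequisites from your terminal (a minimal check; the exact output varies by platform):</p>

```bash
python3 --version  # should report Python 3.11 or higher
ffmpeg -version    # confirms FFmpeg is installed and on your PATH
yt-dlp --version   # confirms yt-dlp is installed and on your PATH
```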

<h2>Installing Tafrigh</h2>

<h3>Using <code>pip</code></h3>

<p>You can install Tafrigh using <code>pip</code> with the command: <code>pip install tafrigh[wit,whisper]</code></p>

<p>You can specify the dependencies you want to install based on the technology you want to use by writing <code>wit</code> or <code>whisper</code> in square brackets as shown in the previous command.</p>
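
<p>For example, the following variants are possible (the quotes help in shells like zsh that treat square brackets specially):</p>

```bash
pip install "tafrigh[wit]"          # wit.ai support only
pip install "tafrigh[whisper]"      # Whisper support only
pip install "tafrigh[wit,whisper]"  # both sets of dependencies
```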

<h3>From the Source Code</h3>

<ul>
  <li>Download this repository by clicking on Code then Download ZIP, or by executing the following command: <code>git clone git@github.com:ieasybooks/tafrigh.git</code></li>
<li>Extract the file if downloaded as ZIP and navigate to the project folder</li>
<li>Execute the following command to install Tafrigh: <code>poetry install</code></li>
</ul>

<p>Add <code>-E wit</code> or <code>-E whisper</code> to specify the dependencies to install.</p>
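
<p>Putting it all together, a typical from-source setup looks like this:</p>

```bash
git clone git@github.com:ieasybooks/tafrigh.git
cd tafrigh
poetry install -E wit -E whisper
```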

<h2>Using Tafrigh</h2>

<h3>Available Options</h3>

<ul>
<li>
Inputs
<ul>
      <li>Links or file paths: Pass the links or file paths of the materials to be transcribed directly after the Tafrigh tool name. For example: <code>tafrigh "https://yout..." "https://yout..." "C:\Users\ieasybooks\lecture.wav"</code></li>
<li>Skip transcription if output exists: Use the <code>--skip_if_output_exist</code> option to skip transcription if the required outputs already exist in the specified output folder</li>
      <li>Specify items to transcribe from a playlist: You can specify a range of items to be transcribed from a playlist using the <code>--playlist_items</code> option by passing a value in the format <code>"[START]:[STOP][:STEP]"</code>. For example, passing <code>2:5</code> will download items <code>2</code> through <code>5</code> from the playlist. This option affects all playlists passed as inputs to Tafrigh (see the combined example after this list)</li>
<li>Number of download retries: If downloading a full playlist using the <code>yt-dlp</code> library, some items may fail to download. The <code>--download_retries</code> option can be used to specify the number of retry attempts if a download fails. The default value is <code>3</code></li>
</ul>
</li>

<li>
Whisper Options
<ul>
<li>
Model: You can specify the model using the <code>--model_name_or_path</code> option. Available models:
<ul>
<li><code>tiny.en</code> (English only)</li>
<li><code>tiny</code> (least accurate)</li>
<li><code>base.en</code> (English only)</li>
<li><code>base</code></li>
<li><code>small.en</code> (English only)</li>
<li><code>small</code> <strong>(default)</strong></li>
<li><code>medium.en</code> (English only)</li>
<li><code>medium</code></li>
<li><code>large-v1</code></li>
<li><code>large-v2</code></li>
<li><code>large-v3</code></li>
<li><code>large</code> (most accurate)</li>
<li>Whisper model name on HuggingFace Hub</li>
<li>Path to a pre-downloaded Whisper model</li>
        <li>Path to a Whisper model converted using the <a href="https://opennmt.net/CTranslate2/guides/transformers.html"><code>ct2-transformers-converter</code></a> tool for use with the <a href="https://github.com/guillaumekln/faster-whisper"><code>faster-whisper</code></a> library</li>
</ul>
</li>
<li>
Task: You can specify the task using the <code>--task</code> option. Available tasks:
<ul>
<li><code>transcribe</code>: Convert speech to text <strong>(default)</strong></li>
        <li><code>translate</code>: Translate speech into English text</li>
</ul>
</li>
<li>Language: You can specify the audio language using the <code>--language</code> option. For example, to specify Arabic, pass <code>ar</code>. If not specified, the language will be detected automatically</li>
<li>Use faster version of Whisper models: By passing the <code>--use_faster_whisper</code> option, the faster version of Whisper models will be used</li>
<li>Beam size: You can improve results using the <code>--beam_size</code> option, which allows the model to search a wider range of words during text generation. The default value is <code>5</code></li>
<li>
      Model compression type: You can specify the quantization type that was applied when the model was converted using the <code>ct2-transformers-converter</code> tool by passing the <code>--ct2_compute_type</code> option. Available types:
<ul>
<li><code>default</code> <strong>(default)</strong></li>
<li><code>int8</code></li>
<li><code>int8_float16</code></li>
<li><code>int16</code></li>
<li><code>float16</code></li>
</ul>
</li>
</ul>
</li>

<li>
Wit Options
<ul>
      <li>Wit.ai keys: You can use <a href="https://wit.ai">wit.ai</a> to transcribe materials into text by passing your wit.ai client access tokens to the <code>--wit_client_access_tokens</code> option. If this option is passed, wit.ai will be used for transcription; otherwise, Whisper models will be used</li>
<li>Maximum cutting duration: You can specify the maximum cutting duration, which will affect the length of sentences in SRT and VTT files, by passing the <code>--max_cutting_duration</code> option. The default value is <code>15</code></li>
</ul>
</li>

<li>
Outputs
<ul>
      <li>Merge segments: You can use the <code>--min_words_per_segment</code> option to control the minimum number of words in a single transcription segment; any segment with fewer words will be merged with the next one. The default value is <code>1</code>. Pass <code>0</code> to disable this feature</li>
<li>Save original files before merging: Use the <code>--save_files_before_compact</code> option to save the original files before merging segments based on the <code>--min_words_per_segment</code> option</li>
<li>Save yt-dlp library responses: You can save the yt-dlp library responses in JSON format by passing the <code>--save_yt_dlp_responses</code> option</li>
<li>Output sample segments: You can pass a value to the <code>--output_sample</code> option to get a random sample of all transcribed segments from each material after merging based on the <code>--min_words_per_segment</code> option. The default value is <code>0</code>, meaning no samples will be output</li>
<li>
Output formats: You can specify the output formats using the <code>--output_formats</code> option. Available formats:
<ul>
<li><code>txt</code></li>
<li><code>srt</code></li>
<li><code>vtt</code></li>
<li><code>csv</code></li>
<li><code>tsv</code></li>
<li><code>json</code></li>
<li><code>all</code> <strong>(default)</strong></li>
<li><code>none</code> (No file will be created if this format is passed)</li>
</ul>
</li>
      <li>Output folder: You can specify the output folder using the <code>--output_dir</code> option. If not specified, the current folder will be used</li>
</ul>
</li>
</ul>
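
<p>For illustration, the following invocation combines several of the input and output options described above (the playlist URL is the same one used in the examples below; adjust the values to your needs):</p>

```bash
# Transcribe items 2 through 5 of a playlist, skipping any material
# whose outputs already exist, and retrying failed downloads up to 5 times
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
  --model_name_or_path small \
  --playlist_items "2:5" \
  --skip_if_output_exist \
  --download_retries 5 \
  --output_dir . \
  --output_formats txt srt
```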

```
➜ tafrigh --help
usage: tafrigh [-h] [--version] [--skip_if_output_exist | --no-skip_if_output_exist] [--playlist_items PLAYLIST_ITEMS]
               [--download_retries DOWNLOAD_RETRIES] [--verbose | --no-verbose] [-m MODEL_NAME_OR_PATH] [-t {transcribe,translate}]
               [-l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}]
               [--use_faster_whisper | --no-use_faster_whisper] [--beam_size BEAM_SIZE]
               [--ct2_compute_type {default,int8,int8_float16,int16,float16}]
               [-w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]] [--max_cutting_duration [1-17]]
               [--min_words_per_segment MIN_WORDS_PER_SEGMENT] [--save_files_before_compact | --no-save_files_before_compact]
               [--save_yt_dlp_responses | --no-save_yt_dlp_responses] [--output_sample OUTPUT_SAMPLE]
               [-f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]] [-o OUTPUT_DIR]
               urls_or_paths [urls_or_paths ...]

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

Input:
  urls_or_paths         Video/Playlist URLs or local folder/file(s) to transcribe.
  --skip_if_output_exist, --no-skip_if_output_exist
                        Whether to skip generating the output if the output file already exists.
  --playlist_items PLAYLIST_ITEMS
                        Comma separated playlist_index of the items to download. You can specify a range using "[START]:[STOP][:STEP]".
  --download_retries DOWNLOAD_RETRIES
                        Number of retries for yt-dlp downloads that fail.
  --verbose, --no-verbose
                        Whether to print out the progress and debug messages.

Whisper:
  -m MODEL_NAME_OR_PATH, --model_name_or_path MODEL_NAME_OR_PATH
                        Name or path of the Whisper model to use.
  -t {transcribe,translate}, --task {transcribe,translate}
                        Whether to perform X->X speech recognition ('transcribe') or X->English translation ('translate').
  -l {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}, --language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,zh}
                        Language spoken in the audio, skip to perform language detection.
  --use_faster_whisper, --no-use_faster_whisper
                        Whether to use Faster Whisper implementation.
  --beam_size BEAM_SIZE
                        Number of beams in beam search, only applicable when temperature is zero.
  --ct2_compute_type {default,int8,int8_float16,int16,float16}
                        Quantization type applied while converting the model to CTranslate2 format.

Wit:
  -w WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...], --wit_client_access_tokens WIT_CLIENT_ACCESS_TOKENS [WIT_CLIENT_ACCESS_TOKENS ...]
                        List of wit.ai client access tokens. If provided, wit.ai APIs will be used to do the transcription, otherwise
                        whisper will be used.
  --max_cutting_duration [1-17]
                        The maximum allowed cutting duration. It should be between 1 and 17.

Output:
  --min_words_per_segment MIN_WORDS_PER_SEGMENT
                        The minimum number of words should appear in each transcript segment. Any segment have words count less than
                        this threshold will be merged with the next one. Pass 0 to disable this behavior.
  --save_files_before_compact, --no-save_files_before_compact
                        Saves the output files before applying the compact logic that is based on --min_words_per_segment.
  --save_yt_dlp_responses, --no-save_yt_dlp_responses
                        Whether to save the yt-dlp library JSON responses or not.
  --output_sample OUTPUT_SAMPLE
                        Samples random compacted segments from the output and generates a CSV file contains the sampled data. Pass 0 to
                        disable this behavior.
  -f {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...], --output_formats {all,txt,srt,vtt,csv,tsv,json,none} [{all,txt,srt,vtt,csv,tsv,json,none} ...]
                        Format of the output file; if not specified, all available formats will be produced.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Directory to save the outputs.
```

<h3>Transcription from the command line</h3>

<h4>Transcribing using Whisper models</h4>

<h5>Transcribing a single material</h5>

```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```

<h5>Transcribing a full playlist</h5>

```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```

<h5>Transcribing multiple materials</h5>

```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--model_name_or_path small \
--task transcribe \
--language ar \
--output_dir . \
--output_formats txt srt
```

<h5>Speeding up the transcription process</h5>

<p>You can use the <a href="https://github.com/guillaumekln/faster-whisper"><code>faster-whisper</code></a> library, which provides faster transcription, by passing the <code>--use_faster_whisper</code> option as follows:</p>

```bash
tafrigh "https://youtu.be/3K5Jh_-UYeA" \
--model_name_or_path large \
--task transcribe \
--language ar \
--use_faster_whisper \
--output_dir . \
--output_formats txt srt
```

<h4>Transcribing using wit.ai technology</h4>

<h5>Transcribing a single material</h5>

```bash
tafrigh "https://youtu.be/dDzxYcEJbgo" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```

<h5>Transcribing a full playlist</h5>

```bash
tafrigh "https://youtube.com/playlist?list=PLyS-PHSxRDxsLnVsPrIwnsHMO5KgLz7T5" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```

<h5>Transcribing multiple materials</h5>

```bash
tafrigh "https://youtu.be/4h5P7jXvW98" "https://youtu.be/jpfndVSROpw" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
--output_dir . \
--output_formats txt srt \
--min_words_per_segment 10 \
--max_cutting_duration 10
```

<h3>Transcribing using code</h3>

<p>You can use Tafrigh through code as follows:</p>

```python
from tafrigh import farrigh, Config

if __name__ == '__main__':
    config = Config(
        input=Config.Input(
            urls_or_paths=['https://youtu.be/qFsUwp5iomU'],
            skip_if_output_exist=False,
            playlist_items='',
            download_retries=3,
            verbose=False,
        ),
        whisper=Config.Whisper(
            model_name_or_path='tiny',
            task='transcribe',
            language='ar',
            use_faster_whisper=True,
            beam_size=5,
            ct2_compute_type='default',
        ),
        wit=Config.Wit(
            # Leave the tokens list empty to use Whisper; provide tokens to use wit.ai instead
            wit_client_access_tokens=[],
            max_cutting_duration=10,
        ),
        output=Config.Output(
            min_words_per_segment=10,
            save_files_before_compact=False,
            save_yt_dlp_responses=False,
            output_sample=0,
            output_formats=['txt', 'srt'],
            output_dir='.',
        ),
    )

    # farrigh is a generator that yields progress updates as transcription advances
    for progress in farrigh(config):
        print(progress)
```

<p>The <code>farrigh</code> function is a generator that yields the current transcription state and the progress of the process. If you do not need to track this, you can drain the generator without an explicit loop by using <code>deque</code> as follows:</p>

```python
from collections import deque

from tafrigh import farrigh, Config

if __name__ == '__main__':
    config = Config(...)

    deque(farrigh(config), maxlen=0)
```

<h3>Transcribing using Docker</h3>

<p>If you have Docker on your computer, the easiest way to use Tafrigh is through Docker. The following command pulls the Tafrigh Docker image and transcribes material from YouTube using wit.ai, writing the results to the current folder:</p>

```bash
docker run -it --rm -v "$PWD:/tafrigh" ghcr.io/ieasybooks/tafrigh \
"https://www.youtube.com/watch?v=qFsUwp5iomU" \
--wit_client_access_tokens XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX \
-f txt srt
```

<p>You can pass any option from the Tafrigh library options mentioned above.</p>

<p>There are multiple Docker images you can use for Tafrigh based on the dependencies you want to use:</p>
<ul>
<li><code>ghcr.io/ieasybooks/tafrigh</code>: Contains dependencies for both wit.ai technologies and Whisper models</li>
<li><code>ghcr.io/ieasybooks/tafrigh-whisper</code>: Contains dependencies for Whisper models only</li>
<li><code>ghcr.io/ieasybooks/tafrigh-wit</code>: Contains dependencies for wit.ai technologies only</li>
</ul>
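
<p>For example, to transcribe using Whisper models only, you can use the <code>tafrigh-whisper</code> image in the same way (a sketch based on the command above, with Whisper options in place of wit.ai tokens):</p>

```bash
docker run -it --rm -v "$PWD:/tafrigh" ghcr.io/ieasybooks/tafrigh-whisper \
  "https://www.youtube.com/watch?v=qFsUwp5iomU" \
  --model_name_or_path small \
  --language ar \
  -f txt srt
```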

<p>One drawback is that Whisper models cannot use your computer's GPU when run through Docker, which is something we are working to resolve.</p>

<hr>

<p>A significant part of this project is based on the <a href="https://github.com/m1guelpf/yt-whisper">yt-whisper</a> repository, which helped bring Tafrigh to life faster.</p>