<script type="text/javascript">
RED.nodes.registerType('tesseract',{
category: 'analysis',
color: '#e6e0f8',
defaults: {
name: {value:""},
language: {value:"eng"}
},
inputs:1,
outputs:1,
icon: "light.png",
label: function() {
return this.name||"tesseract";
}
});
</script>
<script type="text/x-red" data-template-name="tesseract">
<div class="form-row">
<label for="node-input-name"><i class="icon-tag"></i> Name</label>
<input type="text" id="node-input-name" placeholder="Name">
</div>
<div class="form-row">
<label for="node-input-language"><i class="icon-language"></i> Language</label>
<input type="text" id="node-input-language" placeholder="Language (defaults to eng)">
</div>
</script>
<script type="text/x-red" data-help-name="tesseract">
<p>Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. It performs all OCR tasks locally, without requiring a connection to any external service.</p>
<p>Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley, Colorado, between 1985 and 1994. It was ported to Windows in 1996 and partly converted to C++ in 1998. HP open-sourced Tesseract in 2005, and Google took over its development in 2006.</p>
<p>This <a href="https://github.com/sjoerdvanderhoorn/node-red-contrib-tesseract">Node-RED implementation of Tesseract.js</a> has been provided by Sjoerd van der Hoorn.</p>
<h3>Settings</h3>
<ul>
<li>Language - Tesseract language code to use for recognition; defaults to <code>eng</code> (<a href="https://github.com/naptha/tesseract.js/blob/master/docs/tesseract_lang_list.md" target="_blank">list of available language codes</a>).</li>
</ul>
<h3>Input</h3>
<ul>
<li><code>msg.payload</code> - Local filename, URL, or image buffer.</li>
</ul>
<h3>Output</h3>
<ul>
<li><code>msg.payload</code> - String with recognized text.</li>
<li><code>msg.tesseract</code> - Object with recognized text split out per line and word, plus confidence information.</li>
</ul>
<pre><code class="language-js">{
    text: <span class="hljs-string">"Text from image\nSecond line"</span>,
    confidence: <span class="hljs-number">87</span>,
    lines:
    [
        {
            text: <span class="hljs-string">"Text from image"</span>,
            confidence: <span class="hljs-number">93</span>,
            words:
            [
                {
                    text: <span class="hljs-string">"Text"</span>,
                    confidence: <span class="hljs-number">97</span>
                },
                {
                    ...
                }
            ]
        },
        {
            ...
        }
    ]
}
</code></pre>
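<p>The nested <code>msg.tesseract</code> structure above can be walked to flag words that were recognized with low confidence. The following is a minimal sketch, not part of this node: the helper name <code>lowConfidenceWords</code>, the threshold of 80, and the sample values for "from" and "image" are assumptions made up for illustration.</p>

```javascript
// Hypothetical helper (not provided by this node): collect every word whose
// confidence is below the given threshold, assuming the lines/words shape
// documented above.
function lowConfidenceWords(tesseract, threshold) {
    const flagged = [];
    for (const line of tesseract.lines || []) {
        for (const word of line.words || []) {
            if (threshold > word.confidence) {
                flagged.push({ text: word.text, confidence: word.confidence });
            }
        }
    }
    return flagged;
}

// Sample object shaped like msg.tesseract; the confidence values for
// "from" and "image" are invented for this example.
const sample = {
    text: "Text from image",
    confidence: 87,
    lines: [
        {
            text: "Text from image",
            confidence: 93,
            words: [
                { text: "Text", confidence: 97 },
                { text: "from", confidence: 62 },
                { text: "image", confidence: 91 }
            ]
        }
    ]
};

console.log(lowConfidenceWords(sample, 80));
```

<p>In a Function node placed after this node, the same helper could be applied as <code>msg.payload = lowConfidenceWords(msg.tesseract, 80); return msg;</code>, so that downstream nodes receive only the words that may need manual review.</p>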
<h3>Additional information</h3>
<ul>
<li><a href="https://github.com/sjoerdvanderhoorn/node-red-contrib-tesseract" target="_blank">node-red-contrib-tesseract GitHub</a></li>
<li><a href="https://github.com/naptha/tesseract.js" target="_blank">Tesseract.js GitHub</a></li>
<li><a href="http://tesseract.projectnaptha.com/" target="_blank">Tesseract demo</a></li>
<li><a href="https://github.com/tesseract-ocr/tesseract" target="_blank">Original Tesseract OCR engine</a></li>
<li><a href="https://github.com/naptha/tessdata/tree/gh-pages/3.02" target="_blank">Language files</a></li>
</ul>
</script>