<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta charset="utf-8">
<title>TextRank - PHP.Science</title>
<meta name="description" content="TextRank (automatic text summarization) for PHP8">
<meta name="keywords" content="php,science,textrank,search,algorithm,summarization">
<meta name="robots" content="INDEX,FOLLOW">
<meta name="author" content="David Belicza, PHP.SCIENCE, https://php.science">
<meta name="viewport" content="width=device-width, user-scalable=0, initial-scale=1.0">
<meta name="theme-color" content="#ffffff">
<link rel="stylesheet" href="/style.css" type="text/css">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="canonical" href="https://php.science/textrank/">
</head>
<body>
<div class="header">
<div class="header-inner">
<span class="logo">
<a href="/">
PHP.Science
</a>
</span>
<ul>
<li class="home">
<span>
<a href="/">
Home
</a>
</span>
</li>
<li class=" ">
<span>
<a href="/pagerank">
PageRank
</a>
</span>
</li>
<li class=" current ">
<span>
<a href="/textrank">
TextRank
</a>
</span>
</li>
</ul>
</div>
</div>
<div class="documentation">
<h1 align="center">
TextRank
</h1>
<p align="center">
<a href="https://github.com/PHP-Science/TextRank/actions">
<img src="https://github.com/php-science/textrank/workflows/tests/badge.svg"/>
</a>
<a href="https://packagist.org/packages/php-science/textrank">
<img src="https://poser.pugx.org/php-science/textrank/v/stable.svg" />
</a>
<a href="https://packagist.org/packages/php-science/textrank">
<img src="https://poser.pugx.org/php-science/textrank/downloads"/>
</a>
<a href="https://github.com/PHP-Science/TextRank/blob/master/LICENSE">
<img src="https://img.shields.io/badge/license-MIT-FFF300.svg"/>
</a>
</p>
<p align="center">
This source code is an implementation of the TextRank algorithm (automatic summarization) in PHP7 strict mode. It can summarize a text, for example an article, into a short paragraph. Before summarizing, it removes the junk words (stopwords) defined in the StopWords namespace. It can be extended with other languages.
<br />
<br />
</p>
<h2>TextRank or Automatic summarization</h2>
<blockquote>
<p>Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a representative subset of the data, which contains the information of the entire set. Summarization technologies are used in a large number of sectors in industry today. - Wikipedia</p>
</blockquote>
<p>The algorithm of this implementation is:</p>
<ul>
<li>Find sentences,</li>
<li>Remove stopwords,</li>
<li>Create integer values by finding and counting the matching words,</li>
<li>Adjust each integer value using the integer values of its related words,</li>
<li>Normalize the values to create scores,</li>
<li>Order by scores.</li>
</ul>
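<p>The steps above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the function name <code>sketchWordScores</code>, the punctuation-based sentence splitting, and the choice of "related words" as directly adjacent words are all assumptions made for illustration only.</p>
<pre><code class="language-php">
// Hypothetical helper, NOT part of php-science/textrank: a simplified
// illustration of the listed steps that scores words instead of sentences.
function sketchWordScores(string $text, array $stopWords): array
{
    // 1. Find sentences (assumption: they end with ".", "!" or "?").
    $sentences = preg_split('/[.!?]+\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $counts = [];     // word => number of occurrences
    $neighbours = []; // word => set of adjacent ("related") words

    foreach ($sentences as $sentence) {
        // 2. Split into words (lowercasing ASCII letters only) and remove stopwords.
        $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $words = array_values(array_diff($words, $stopWords));

        foreach ($words as $i => $word) {
            // 3. Create integer values by counting the matching words.
            $counts[$word] = ($counts[$word] ?? 0) + 1;

            // Assumption: the next word in the sentence is a related word.
            if (isset($words[$i + 1])) {
                $neighbours[$word][$words[$i + 1]] = true;
                $neighbours[$words[$i + 1]][$word] = true;
            }
        }
    }

    if ($counts === []) {
        return [];
    }

    // 4. Adjust each value by adding the values of its related words.
    $scores = [];
    foreach ($counts as $word => $count) {
        $scores[$word] = $count;
        foreach (array_keys($neighbours[$word] ?? []) as $related) {
            $scores[$word] += $counts[$related];
        }
    }

    // 5. Normalize so that the highest score becomes 1.
    $max = max($scores);
    foreach ($scores as $word => $score) {
        $scores[$word] = $score / $max;
    }

    // 6. Order by scores, highest first, keeping the words as keys.
    arsort($scores);

    return $scores;
}

// Usage: "cat" occurs most often and has two related words, so it ranks first.
$scores = sketchWordScores('The cat and the dog. The cat sleeps.', ['the', 'and']);
$topWord = array_key_first($scores);
</code></pre>
<p>Dividing by the maximum is only one possible normalization; it was chosen here because it keeps every score between 0 and 1 without changing the order.</p>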
<h2>Install</h2>
<pre><code>composer require php-science/textrank
</code></pre>
<h2>Test</h2>
<pre><code>cd project-folder
composer test
</code></pre>
<p>or</p>
<pre><code>cd project-folder
phpunit --colors='always' $(pwd)/tests
</code></pre>
<h2>Examples</h2>
<pre><code class="language-php">
use PhpScience\TextRank\TextRankFacade;
use PhpScience\TextRank\Tool\StopWords\English;

// String containing a long text; see the /res/sample1.txt file.
$text = "Lorem ipsum...";
$api = new TextRankFacade();

// English implementation of the stopwords/junk words:
$stopWords = new English();
$api->setStopWords($stopWords);

// Array of the most important keywords:
$keywords = $api->getOnlyKeyWords($text);

// Array of the sentences from the most important part of the text:
$highlights = $api->getHighlights($text);

// Array of the most important sentences from the text:
$summary = $api->summarizeTextBasic($text);
</code></pre>
<p>More examples:</p>
<ul>
<li>
<a href="https://github.com/DoveID/PHP-Science-TextRank/blob/master/tests/TextRankFacadeTest.php">tests/TextRankFacadeTest.php</a>
</li>
<li>
<a href="https://php.science">https://php.science</a>
</li>
</ul>
<h2>Authors, Contributors</h2>
<table>
<thead>
<tr>
<th>Name</th>
<th>GitHub user</th>
</tr>
</thead>
<tbody>
<tr>
<td>David Belicza</td>
<td>@DavidBelicza</td>
</tr>
<tr>
<td>Riccardo Marton</td>
<td>@riccardomarton</td>
</tr>
<tr>
<td>Syndesi</td>
<td>@Syndesi</td>
</tr>
<tr>
<td>vincentsch</td>
<td>@vincentsch</td>
</tr>
<tr>
<td>Andrew Welch</td>
<td>@khalwat</td>
</tr>
<tr>
<td>Andrey Astashov</td>
<td>@mvcaaa</td>
</tr>
<tr>
<td>Leo Toneff</td>
<td>@bragle</td>
</tr>
<tr>
<td>Willy Arisky</td>
<td>@willyarisky</td>
</tr>
</tbody>
</table>
</div>
<div class="footer">
<p>PHP.Science</p>
</div>
</body>
</html>