-
Notifications
You must be signed in to change notification settings - Fork 12
/
Copy pathREADME
189 lines (127 loc) · 6.39 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
SYNOPSIS
use WWW::Mechanize::Cached;
my $cacher = WWW::Mechanize::Cached->new;
$cacher->get( $url );
# or, with your own Cache object
use CHI;
use WWW::Mechanize::Cached;
my $cache = CHI->new(
driver => 'File',
root_dir => '/tmp/mech-example'
);
my $mech = WWW::Mechanize::Cached->new( cache => $cache );
$mech->get("http://www.wikipedia.org");
DESCRIPTION
Uses the Cache::Cache hierarchy by default to implement a caching Mech.
This lets one perform repeated requests without hammering a server
impolitely. Please note that Cache::Cache has been superceded by CHI,
but the default has not been changed here for reasons of backwards
compatibility. For this reason, you are encouraged to provide your own
CHI caching object to override the default.
CONSTRUCTOR
new
Behaves like, and calls, WWW::Mechanize's new method. Any params, other
than those explicitly listed here are passed directly to
WWW::Mechanize's constructor.
You may pass in a cache => $cache_object if you wish. The $cache_object
must have get() and set() methods like the Cache::Cache family.
The default Cache object is set up with the following params:
my $cache_params = {
default_expires_in => "1d", namespace => 'www-mechanize-cached',
};
$cache = Cache::FileCache->new( $cache_params );
This should be fine if you only want to use a disk-based cache, you
only want to cache results for 1 day and you're not in a shared hosting
environment. If any of this presents a problem for you, you should pass
in your own Cache object. These defaults will remain unchanged in order
to maintain backwards compatibility.
For example, you may want to try something like this:
use WWW::Mechanize::Cached;
use CHI;
my $cache = CHI->new(
driver => 'File',
root_dir => '/tmp/mech-example'
);
my $mech = WWW::Mechanize::Cached->new( cache => $cache );
$mech->get("http://www.wikipedia.org");
METHODS
Most methods are provided by WWW::Mechanize. See that module's
documentation for details.
cache( $cache_object )
Requires an caching object which has a get() and a set() method. Using
the CHI module to create your cache is the recommended way. See new()
for examples.
is_cached()
Returns true if the current page is from the cache, or false if not. If
it returns undef, then you don't have any current request.
positive_cache( 0|1 )
As of v1.36 positive caching is enabled by default. Up to this point,
this module had employed a negative cache, which means it cached 404
responses, temporary redirects etc. In most cases, this is not what you
want, so the default behaviour now better reflects this. You can revert
to the negative cache quite easily:
# cache everything (404s, all 300s etc)
$mech->positive_cache( 0 );
ref_in_cache_key( 0|1 )
Allow the referring URL to be used when creating the cache key. This is
off by default. In almost all cases, you will not want to enable this,
but it is available to you for reasons of backwards compatibility and
giving you enough rope to hang yourself.
Previous to v1.36 the following was in the "BUGS AND LIMITATIONS"
section:
It may sometimes seem as if it's not caching something. And this may well
be true. It uses the HTTP request, in string form, as the key to the cache
entries, so any minor changes will result in a different key. This is most
noticable when following links as L<WWW::Mechanize> adds a C<Referer>
header.
See RT #56757 for a detailed example of the bugs this functionality can
trigger.
cache_undef_content_length( 0 | 'warn' | 1 )
This is configuration option which adjusts how caching behaviour
performs when the Content-Length header is not specified by the server.
Default behaviour is 0, which is not to cache.
Setting this value to 1, will cache pages even if the Content-Length
header is missing, which was the default behaviour prior to the
addition of this feature.
And thirdly, you can set the value to the string 'warn', to warn if
this scenario occurs, and then not cache it.
cache_zero_content_length( 0 | 'warn' | 1 )
This is configuration option which adjusts how caching behaviour
performs when the Content-Length header is equal to 0.
Default behaviour is 0, which is not to cache.
Setting this value to 1, will cache pages even if the Content-Length
header is 0, which was the default behaviour prior to the addition of
this feature.
And thirdly, you can set the value to the string 'warn', to warn if
this scenario occurs, and then not cache it.
cache_mismatch_content_length( 0 | 'warn' | 1 )
This is configuration option which adjusts how caching behaviour
performs when the Content-Length header differs from the length of the
content itself. ( Which usually indicates a transmission error )
Setting this value to 0, will silenly not cache pages with a
Content-Length mismatch.
Setting this value to 1, will cache pages even if the Content-Length
header conflicts with the content length, which was the default
behaviour prior to the addition of this feature.
And thirdly, you can set the value to the string 'warn', to warn if
this scenario occurs, and then not cache it. ( This is the default
behaviour )
UPGRADING FROM 1.40 OR EARLIER
Caching behaviour has changed since 1.40, and this may result in pages
that were previously cached start failing to cache, and in some cases,
emit warnings.
To return to the 1.40 behaviour:
$mech->cache_undef_content_length(1); # Default is 0
$mech->cache_zero_content_length(1); # Default is 0
$mech->cache_mismatch_content_length(1); # Default is 'warn'
THANKS
Iain Truskett for writing this in the first place. Andy Lester for
graciously handing over maintainership. Kent Fredric for adding content
length handling.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc WWW::Mechanize::Cached
* Search CPAN
https://metacpan.org/module/WWW::Mechanize::Cached
SEE ALSO
WWW::Mechanize, WWW::Mechanize::Cached::GZip.