I further reorganised the FAQ to be more compendious, and added a brief 'Getting started' and 'Further info' section
okybaca committed Apr 7, 2023
1 parent addb531 commit 2a82b7b
Showing 1 changed file with 77 additions and 37 deletions.
114 changes: 77 additions & 37 deletions docs/faq.md
@@ -29,6 +29,7 @@ RWI is an acronym for Reverse Word Index. This is generated by the indexer from
### Isn't P2P illegal?
No. P2P (peer to peer) only describes the technology by which computers exchange data amongst themselves. Past legal disputes concerned the types of data exchanged over such networks, namely copyrighted material. The P2P file-sharing technique itself is legal, even though it has been used to facilitate the transfer of copyrighted data. The only files shared amongst YaCy peers are indexes of the publicly accessible internet.


## YaCy in general

### What are global and local indexes?
@@ -57,6 +58,40 @@ In general YaCy's architecture does not do peer-hopping, it also doesn't have a
### Do I need to set up and run a separate database?
No. YaCy contains a built-in database engine (Solr), which does not need any extra set-up or configuration. If you wish, you can use a standalone external Solr instance instead.

### Will running YaCy jeopardize my privacy?
YaCy respects user privacy. All password- or cookie-protected pages are excluded from indexing. Additionally, pages loaded using GET or POST parameters are not indexed by default. Thus, only publicly accessible, non-password-protected pages will be indexed.

For a detailed explanation of the technique, see: How YaCy protects your privacy with regard to personalized pages.

### Can other people find out about my browsing log/history?
There's no way to browse the pages that are stored on a peer. Pages can only be found by searching for a precise word. The words themselves are distributed across the peers by means of the distributed hash table (DHT), and the hash tables of the peers are then mixed, which makes it impossible to reconstruct the browsing history of any particular peer.
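
As a rough illustration of how a hash can spread words across peers, here is a minimal sketch. It is not YaCy's actual hashing scheme: the helper name `peer_for_word`, the use of `cksum` as the hash and the fixed peer count are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative only: map each word to one of N peers via a checksum.
# YaCy's real DHT uses its own hashing; cksum merely stands in for it here.
peer_for_word() {
    word=$1    # the indexed word
    peers=$2   # number of peers in this toy network
    # cksum prints "<crc> <size>"; keep only the CRC value.
    crc=$(printf '%s' "$word" | cksum | cut -d ' ' -f 1)
    echo $((crc % peers))
}

# Words from the same page usually land on different peers.
for w in search engine privacy; do
    echo "$w -> peer $(peer_for_word "$w" 8)"
done
```

Because each word is placed independently of the page it came from, no single peer holds the complete word list of a page in this toy model.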


## Getting started

### How do I install YaCy?

See the [download and installation](download_installation.md) guide to install YaCy on Linux, Windows, macOS or various Unixes. The [Readme](https://github.com/yacy/yacy_search_server/blob/master/README.md) contains more information.

### How do I access my peer?

After YaCy has started successfully, it should be running at localhost, port 8090, so you can access it by entering `http://localhost:8090` in your browser.

### How do I search?

Just enter your query into the search field. Your instance will ask the other peers for results and collect them on the search result page. This may take some time. By default, the results are transferred to your peer as "RWI" and stored locally, so the next search will find them more quickly.

You can also use modifiers to refine your search. For example, adding `/date` to a query sorts the results by date (of indexing). The `inurl:` modifier filters the results by URL, so `inurl:nytimes.com` shows only results from the New York Times.
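
The modifiers above can be combined into a single query string. The following is a minimal sketch: the helper name `build_query` and the order in which the modifiers are appended are assumptions for illustration, not something YaCy prescribes.

```shell
#!/bin/sh
# Sketch: compose a query from a search term plus the optional
# inurl: and /date modifiers. build_query is a hypothetical helper.
build_query() {
    term=$1      # free-text search term
    site=$2      # optional value for the inurl: modifier
    by_date=$3   # "yes" to append the /date modifier
    query=$term
    if [ -n "$site" ]; then
        query="$query inurl:$site"
    fi
    if [ "$by_date" = "yes" ]; then
        query="$query /date"
    fi
    printf '%s\n' "$query"
}

# Prints: election inurl:nytimes.com /date
build_query "election" "nytimes.com" "yes"
```

The resulting string is what you would type into the search field.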


## Usage

### My YaCy search page doesn't show!
The default address for the YaCy search and administration page is http://localhost:8090. If you are using Internet Explorer, make sure to add http:// before localhost:8090. If you have changed the default port of YaCy from 8090 to another one, you will have to open the new port in your firewall or router (and perhaps close port 8090 if you no longer use it). Another cause could be a misconfigured proxy, in which case you need to deactivate the proxy for local pages.
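
To narrow down which of these causes applies, you can test from the command line whether the peer answers at all. This is a minimal sketch that assumes `curl` is installed; the helper names `peer_url` and `peer_reachable` are made up for the example.

```shell
#!/bin/sh
# Sketch: check whether a local YaCy peer answers over HTTP.
# peer_url and peer_reachable are hypothetical helper names.

# Build the address of the local peer; the port defaults to 8090.
peer_url() {
    printf 'http://localhost:%s\n' "${1:-8090}"
}

# Succeed if the peer sends any HTTP response within 5 seconds.
# --noproxy makes curl bypass a configured proxy for localhost.
peer_reachable() {
    curl --silent --output /dev/null --max-time 5 \
         --noproxy localhost "$(peer_url "$1")"
}

if peer_reachable 8090; then
    echo "YaCy answers at $(peer_url 8090)"
else
    echo "No answer at $(peer_url 8090); check port, firewall and proxy settings"
fi
```

If the command line gets an answer but the browser does not, the browser's proxy setting is the more likely culprit.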

### Why does YaCy show different results from Google?
We expect YaCy to show different results than Google, for several reasons. As long as YaCy has only a few peers working, it cannot compete with Google. Hence the importance of having a great number of YaCy peers working. But even then YaCy will provide different and better results than Google, since it can be adapted to the user's own preferences and is not influenced by commercial aspects.

### What do the Virgin, Junior, Senior and Principal statuses mean?

#### virgin
@@ -87,51 +122,34 @@
```
ssh -f -R remotehost.org:8090:localhost:8090 [email protected] -N
```
and create a tunnel to remotehost.org, port 8090.

### How can I change the Connection Timeout value?
This can be done on the configuration page "Admin Console" -> "Advanced behavior" at http://localhost:8090/ConfigProperties_p.html. Search for the line client-timeout and change the value there. The timeout is in milliseconds.

Do not forget to restart YaCy after the change.

Alternatively, you can change the value in the configuration file httpProxy.conf in DATA/SETTINGS. In that case, YaCy must be stopped before you edit the file.
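
The file-based route could look roughly like the sketch below. It assumes the settings file stores one `key=value` pair per line, which is not confirmed here; `set_client_timeout` is a hypothetical helper, and YaCy must be stopped before it runs.

```shell
#!/bin/sh
# Sketch: set client-timeout in a settings file while YaCy is stopped.
# Assumes "key=value" lines; set_client_timeout is a hypothetical helper.
set_client_timeout() {
    conf=$1   # path to the settings file
    ms=$2     # new timeout in milliseconds
    tmp="$conf.tmp"
    # Rewrite only the client-timeout line, then replace the file.
    sed "s/^client-timeout=.*/client-timeout=$ms/" "$conf" > "$tmp" &&
        mv "$tmp" "$conf"
}

# Example: 30 seconds expressed in milliseconds.
if [ -f DATA/SETTINGS/httpProxy.conf ]; then
    set_client_timeout DATA/SETTINGS/httpProxy.conf "$((30 * 1000))"
fi
```

Writing to a temporary file first avoids relying on `sed -i`, whose syntax differs between GNU and BSD systems.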

### Something seems not to be working properly; what should I do?

YaCy is still under development, so you should prefer a stable version. The latest stable version can be downloaded from the YaCy homepage https://yacy.net. If YaCy behaves strangely, search the forum https://community.searchlab.eu/ for known issues. If the issue is unknown, you can ask for help on the forum; provide the YaCy version, details on when the issue occurs and, if possible, an excerpt from the log file to help fix the bug.

The first thing to check when you experience errors is the log located at `DATA/LOG/yacy00.log`. You can monitor it live using the `tail` command. Because the log rotates when it reaches a certain size, it is better to use the -F option:
```
tail -F DATA/LOG/yacy00.log
```
You can also filter lines using the `grep` command (e.g. `tail -F DATA/LOG/yacy00.log | grep DHT` to show only DHT lines) or use grep's -v parameter to ignore some lines (e.g. `tail -F DATA/LOG/yacy00.log | grep -v DHT` to ignore DHT lines).

### YaCy is running terribly slow; what should I do?
As an indexing and search host, YaCy is quite resource hungry. It's written in Java. Fast disks (SSD or RAID) and plenty of RAM will help.

It occupies only the amount of RAM specified in “Maximum Used Memory”, so if you have more physical RAM, increasing this value should help.

Sometimes ‘Database Optimisation’ also helps, but it takes some time to run.

### I cannot uninstall because YaCy is still running
First check whether YaCy still runs. If it doesn't run, it may not have been shut down properly. Start YaCy again, then uninstall. Alternatively delete the yacy.running file in the yacy/DATA/ directory, then uninstall.
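
The alternative of deleting the file by hand could be sketched as follows. The helper name `clear_stale_lock` is made up for the example, and the script does not check whether YaCy is still running; you need to confirm that yourself first.

```shell
#!/bin/sh
# Sketch: remove a leftover yacy.running file from a DATA directory.
# clear_stale_lock is a hypothetical helper; only use it when you
# have confirmed that YaCy is no longer running.
clear_stale_lock() {
    data_dir=$1
    lock="$data_dir/yacy.running"
    if [ -e "$lock" ]; then
        rm -- "$lock" && echo "removed $lock"
    else
        echo "no lock file at $lock"
    fi
}

# Run from the directory that contains the yacy folder.
if [ -d yacy/DATA ]; then
    clear_stale_lock yacy/DATA
fi
```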


## DHT - Distributed Hash Table

### How do I give the index of one peer to another?
@@ -145,6 +163,7 @@ That results in larger RWIs on the target peers but on a smaller number of RWIs
### Are DHT entries unique in a search network or can URLs also appear twice or three times?
URLs are analyzed more than once, so that a delayed peer does not lose its part in the search index. The indexes themselves are stored redundantly on multiple peers.


## Crawling / indexing

### How do I avoid indexing of some files?
@@ -175,12 +194,6 @@ This cannot be undone.

The string that you enter here is a [Java Pattern](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html).


## Passwords

@@ -206,6 +219,7 @@ The next time you start YaCy the account/password combination will be read, encr

Then you will be able to log into YaCy again with the account/password you entered in the yacy.conf file, or set another password if you didn't set a combination.


## Disk space

### How can I limit the size of single files to be downloaded?
@@ -220,3 +234,29 @@ For the moment not directly. Automatically limiting that size would mean having
You can set two minimums of free disk space at /Performance_p.html: one for crawls and one for DHT-in. The number for crawls apparently has to be equal to or greater than the number for DHT-in. Note that, with DHT-in disabled, global searching using the peer's UI is disabled. Proxy/crawling privacy might also suffer.
You can also just disable “Index Receive” at /ConfigNetwork_p.html, so that your index is only augmented through crawling (over which you have some control).
For a very indirect additional limit, you can change the Index Reference Size at /IndexControlRWIs_p.html.


## Further info

### Where do I find more documentation?

You can consult the [legacy wiki](https://wiki.yacy.net/index.php/En:Start). Not all of its information has been transferred to this FAQ yet.

You can search the [community forum](https://community.searchlab.eu/) or ask questions there.

For the more theoretical concepts behind YaCy, see the [slides for talks by Michael Christen](https://yacy.net/material/), the main developer, the [slides for the lecture Information Retrieval](https://yacy.net/material/SIM-IR-SS15-MichaelChristen-Introduction_Information_Retrieval-20150512.pdf) (partly in German) and a [scientific paper about YaCy](https://yacy.net/material/Description_of_the_YaCy_Distributed_Web_Search_Engine_Herrmann_Ning_Diaz_Preneel_ESAT_KULeuven_COSIC_article-2459.pdf).

### How can I help?
First of all: run YaCy in senior mode. This helps to enrich the global index and to make YaCy more attractive.

If you want to add your own code, you are welcome; but please contact the author first and discuss your idea to see how it may fit into the overall architecture.

You can help a lot by simply giving us feedback or telling us about new ideas.

You can also help by telling other people about this software.

And if you find a bug or see an uncovered use case, we welcome your [bug report](https://github.com/yacy/yacy_search_server/issues).

Any feedback is welcome.

You can [contribute](contribute.md) your code on GitHub, both to [YaCy](https://github.com/yacy/yacy_search_server) and its [documentation](https://github.com/yacy/yacy_net_homepage).
