
      Edit the file config.pl to set the parameters. Most of them are self-documented and do not require explanation.

  1.  $base_dir = ".";  - path to the directory where your HTML files are located. If index.pl is located in the same directory, leave this variable as is. Note that in all cases you should use either a relative path or an absolute path starting from the file system root (not from the web server root directory).

  2.  $base_url = "http://www.server.com/";  - URL of your site.

  3.  $site_size = 2;  - this variable controls database size and searching speed.

  4.  $file_ext = 'html txt htm shtml php';  - list of file extensions to be indexed.

  5.  $non_parse_ext = 'txt';  - list of extensions for which the script should not remove HTML tags.

  6.  $no_index_dir = 'img image temp tmp cgi-bin';  - directories that should not be indexed.

  7.  $zone[1] = 'dir1';  - description of site zones. Any number of zones may be used, and every zone has a unique number. The search form should send the script an additional parameter "z" with a value equal to the number of the chosen zone. You may use checkboxes, radio buttons or menus. There is an example in the file "template.htm". When using checkboxes or menus with the attribute "multiple", you may choose several zones simultaneously. In that case the search is performed in all chosen zones. To search the whole site, "z" should be equal to zero or not sent to the script at all.

    If one zone is located in several directories, separate them with a vertical bar without spaces ( $zone[1] = 'dir2|dir3'; ).

  8.  $numbers = '0-9';  - during indexing the script removes all non-alphabetic characters from the page and indexes what is left. The script treats Latin characters and the characters of the regional alphabet (discussed later) as alphabetic. Here you may add other characters that should be indexed (such as digits, the underscore sign and so on).

  9.  $use_selective_indexing = "NO";  - this option is useful for big sites with complex navigation, news postings and other elements that appear on every page and probably should not be indexed. It lets you tell the script which parts of a page should be cut before indexing. Turn this option on ("YES") and uncomment the following lines in the file "config.pl".

     %no_index_strings = (
      q[<!-- No index start 1 -->] => q[<!-- No index end 1 -->],
      q[<!-- No index start 2 -->] => q[<!-- No index end 2 -->],
     );

    Inside the square brackets you write two strings: the first marks the start of a block and the second marks its end. Everything placed between them will be cut (if these strings occur several times in a file, every occurrence is processed). For this purpose you may use special marks that separate different design elements.

  10.  $cut_default_filenames = 'YES';  - this variable lets the script cut default filenames (such as index.html) from the URL in search results.

  11.  $INDEXING_SCHEME = 2;  - word indexing scheme. If the indexing scheme is "1", the index is built on whole words. This is the fastest method, but the script will find only words equal to the keyword.

    When the indexing scheme is "2", the index is based on the beginning of each word. The script will find all words that begin with the given keyword. For example, for the query "port" the words "portrait" and "portion" will also be found.

  12.  $use_stop_words = "YES";  - enables the list of common words (stop words) that should not be indexed.

  13.  $descr_size = 256;  - length of the file description (the description is taken from the first lines of the file or from the content of the "META description" tag).

  14.  $CAP_LETTERS = '\xC0-\xDF';  - put here the list of capital letters of your language (those that differ from Latin). Do the same for small letters.

  15. Many other parameters are self-documented in the config.pl file.
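
      The cutting described for $use_selective_indexing (item 9) can be pictured with a short Perl sketch. This is not the script's actual code: the helper name cut_no_index is hypothetical, and the sketch assumes that the markers are matched literally and that each start mark is paired with the nearest following end mark.

```perl
use strict;
use warnings;

# Marker pairs written in the same form as in config.pl.
my %no_index_strings = (
    q[<!-- No index start 1 -->] => q[<!-- No index end 1 -->],
    q[<!-- No index start 2 -->] => q[<!-- No index end 2 -->],
);

# Hypothetical helper: remove every start..end block from a page.
sub cut_no_index {
    my ($html, %pairs) = @_;
    for my $start (sort keys %pairs) {
        my $end = $pairs{$start};
        # \Q..\E quotes the markers so they match literally;
        # .*? is non-greedy, so each occurrence is cut separately.
        $html =~ s/\Q$start\E.*?\Q$end\E//gs;
    }
    return $html;
}

my $page = 'Intro <!-- No index start 1 -->menu<!-- No index end 1 --> body '
         . '<!-- No index start 1 -->news<!-- No index end 1 --> end';
print cut_no_index($page, %no_index_strings), "\n";
```

      Both marked blocks are removed from the sample page, while the text outside the marks is kept for indexing.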

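      The difference between the two values of $INDEXING_SCHEME (item 11) can be illustrated in Perl as follows. This is only a sketch over a toy word list, not the way the index is really stored; the helper name find_words and the sample words are assumptions made for the example.

```perl
use strict;
use warnings;

# Toy word list standing in for the index.
my @indexed_words = qw(port portion portrait sport airport);

# Hypothetical helper: return the indexed words matched by a keyword.
sub find_words {
    my ($keyword, $scheme, @words) = @_;
    if ($scheme == 1) {
        # Scheme 1: only words equal to the keyword.
        return grep { $_ eq $keyword } @words;
    }
    # Scheme 2: all words that begin with the keyword.
    return grep { index($_, $keyword) == 0 } @words;
}

print "Scheme 1: @{[ find_words('port', 1, @indexed_words) ]}\n";
print "Scheme 2: @{[ find_words('port', 2, @indexed_words) ]}\n";
```

      With scheme 2 the words "portion" and "portrait" are matched as well, while "sport" and "airport" are not, because they only contain the keyword and do not begin with it.
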
Memory allocation

      During indexing the script builds a simple database listing each word on your site and the pages where the word was found. This database requires a lot of memory and can cause an "Out of memory!" error. To reduce memory requirements, RiFlex keeps part of this database in memory during indexing and writes the other part to disk. The more memory is available to the script, the bigger the sites you can index. You can control how much memory the script uses for temporary database storage with several variables in config.pl.

  1.  $sitewords_block = 5000000;  - size of the block in which all words are stored. For most sites 3-5 Mb should be enough, but you may increase this value if you plan to index a very large text collection.

  2.  $hashwords_block = 3000000;  - in most cases this value should be about two times smaller than the previous one.

  3.  $wind_temp_block = 300000;  - in most cases this value should be 10-20 times smaller than $sitewords_block.

  4.  $windblock = 64;  - should be a multiple of four. A bigger value requires more memory for indexing and increases searching speed. Optimal values are 64-128.

  5.  $max_file_size = 5000000;  - processing large files requires a lot of resources and can also cause an "Out of memory!" error. The script will index only the first N bytes of a large file, where N is this value.

      The total amount of memory allocated by the script can be estimated with this formula:

$sitewords_block + $hashwords_block*4 + $wind_temp_block*$windblock

and for the default settings it gives:

5000000 + 3000000*4 + 300000*64 ~ 36 Mb plus additional space for file processing.
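
      The same estimate can be computed for other settings with a few lines of Perl. This is a minimal sketch that only evaluates the formula above; the helper name estimate_memory is hypothetical, and the result does not include the additional space needed for file processing.

```perl
use strict;
use warnings;

# Default values from config.pl, as listed above.
my $sitewords_block = 5_000_000;
my $hashwords_block = 3_000_000;
my $wind_temp_block = 300_000;
my $windblock       = 64;

# Hypothetical helper implementing the estimate formula (result in bytes).
sub estimate_memory {
    my ($sitewords, $hashwords, $wind_temp, $wblock) = @_;
    return $sitewords + $hashwords * 4 + $wind_temp * $wblock;
}

my $bytes = estimate_memory($sitewords_block, $hashwords_block,
                            $wind_temp_block, $windblock);
printf "Estimated memory: %d bytes (about %.1f million bytes)\n",
       $bytes, $bytes / 1_000_000;
```

      For the defaults the three terms are 5000000, 12000000 and 19200000 bytes, so the product $wind_temp_block*$windblock is the largest contribution, and raising $windblock from 64 to 128 alone would add another 19200000 bytes.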

      During indexing the script reports memory usage in the following form:

Mem: HW-0.731199; SW-0.676259; WI-0.842597

If one of these values approaches 1, you have to increase $hashwords_block, $sitewords_block or $wind_temp_block respectively.
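
      If you save the indexing output, a report line can be checked automatically. The Perl sketch below is not part of the script: the helper name blocks_to_increase and the threshold of 0.8 are assumptions chosen for illustration, and the mapping of HW, SW and WI to variables follows the order given above.

```perl
use strict;
use warnings;

# Which config.pl variable corresponds to each field of the report.
my %variable_for = (
    HW => '$hashwords_block',
    SW => '$sitewords_block',
    WI => '$wind_temp_block',
);

# Hypothetical helper: list the variables whose usage reaches a threshold.
sub blocks_to_increase {
    my ($line, $threshold) = @_;
    my @result;
    while ($line =~ /\b(HW|SW|WI)-([0-9.]+)/g) {
        push @result, $variable_for{$1} if $2 >= $threshold;
    }
    return @result;
}

my $report   = 'Mem: HW-0.731199; SW-0.676259; WI-0.842597';
my @too_full = blocks_to_increase($report, 0.8);   # 0.8 is an arbitrary threshold
print "Consider increasing: @too_full\n" if @too_full;
```

      For the sample line above only the WI value is at or above 0.8, so the sketch suggests increasing $wind_temp_block.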

Template usage

      The script uses a template to control the design of its output. The template is stored in the file "template.htm". It is a standard HTML file that can be opened in any browser, so you can see how your page will be displayed and edit it.

      The template consists of seven sections: "header" and "footer" are displayed in every case; "results_header", "results" and "results_footer" are displayed after a successful search; "no_results" is used if no results are found; "empty_query" is displayed if no query is supplied.

      Sections are delimited by marks like this:

 <!-- RiSearch::header::start --> 
You may edit everything between the two dividers.
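
      How a section could be read out of the template is sketched below in Perl. Only the start divider is shown in this document, so the form of the closing divider (the same mark with "end" instead of "start") is an assumption, and the helper name get_section is hypothetical; check "template.htm" for the real marks.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of one named template section.
# ASSUMPTION: the closing divider mirrors the opening one, with "end"
# in place of "start"; only the start form appears in this document.
sub get_section {
    my ($template, $name) = @_;
    my $start = "<!-- RiSearch::${name}::start -->";
    my $end   = "<!-- RiSearch::${name}::end -->";
    return $template =~ /\Q$start\E(.*?)\Q$end\E/s ? $1 : undef;
}

my $template = join "\n",
    '<!-- RiSearch::header::start -->',
    '<html><body>',
    '<!-- RiSearch::header::end -->';
print get_section($template, 'header');
```

      The helper returns undef when a section is missing, which makes a damaged template easy to detect after editing.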

      The template uses several predefined parameters, which are replaced by the script's output. Here is the full list of parameters:

  1.  %query%  - query.

  2.  %search_time%  - time used by script to perform search.

  3.  %query_statistics%  - found words statistics (string like - "word1-n1 word2-n2").

  4.  %stpos%  - the starting number for results on this page.

  5.  %url%, %title%, %size%, %description%  - URL of found file, title, size and description.

  6.  %rescount%  - total number of found files.

  7.  %next_results%  - links to next pages with results.

  8.  %rand_number%  - random number in range from 0 to 255, which may be used in code for banner exchange systems (the number is fixed in one section, but new number is generated for each section).
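
      The replacement of parameters can be pictured with the following Perl sketch. It is not the script's own code: the helper name fill_template and the sample values are hypothetical, and leaving unknown parameters unchanged is a design choice of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: replace %name% parameters with values.
# Parameters without a value are left unchanged.
sub fill_template {
    my ($text, %values) = @_;
    $text =~ s/%(\w+)%/exists $values{$1} ? $values{$1} : "%$1%"/ge;
    return $text;
}

my $section = '<a href="%url%">%title%</a> (%size%)';
print fill_template($section,
    url   => 'http://www.server.com/page.html',
    title => 'Example page',
    size  => 1024,
), "\n";
```

      Each %name% in the section text is looked up by name, so the parameters can appear in any order and any number of times within a section.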

http://risearch.org S.Tarasov, © 2000-2005