The XML parser included with Swish-e uses James Clark's Expat library.
OVERVIEW
This document describes how to download, build, and install Swish-e. It also describes how to build Swish-e with optional, yet recommended, libraries that extend and enhance Swish-e, and explains how to get help installing and using Swish-e (including the information you should provide when asking for help). Below is also a basic overview of using Swish-e to index documents, with pointers to more advanced examples. For those in a hurry, see Quick Start for the Impatient. Please also read the Swish-e FAQ (SWISH-FAQ), as it answers many frequently asked questions.
SYSTEM REQUIREMENTS
Swish-e is written in C and has been tested on a number of platforms. Any current Linux distribution should have no problem building Swish-e. Swish-e has also been used on these platforms:
Unless you are using the Win32 binary distribution, a C compiler is needed. Pretty much any standard compiler should do, although you will probably have best luck with a current version of gcc. If you are using something else (such as HP-UX or AIX) you may see more warnings during the build process. Any problems should be sent to the Swish-e discussion list after searching the list archives.
HTML and XML files. Instructions for installing and enabling the library are described below.
library (included with the Swish-e distribution), but Swish-e's old html.c but offers a much better HTML parser than Swish-e's html.c parser. Use Currently, setting a content type (IndexContents or DefaultContents) of ``HTML'' uses Swish-e's html.c parser, where a setting of ``HTML2'' this may change in future releases.
zlib compression
Swish-e can make use of zlib to compress document properties. This is recommended if you are using StoreDescription. A Swish-e program built with zlib can read an index created by a version of Swish-e that was not built with zlib. However, to search an index that was compressed with zlib you need a version of Swish-e built with zlib. Therefore, it's recommended to always include zlib support.
Memory
Swish needs quite a bit of memory while indexing. How much depends on what
you are indexing. The index is portable between platforms (that use the
same basic data type sizes), so you can index on a machine that has lots of
memory available and move the index files to another machine for searching.
Use the
Perl modules
Swish-e uses a Perl script for spidering web sites. The script requires the LWP bundle of modules (see http://search.cpan.org/search?dist=libwww-perl ). (Note: depending on your Perl installation, you might need to install additional modules required by LWP; for requirements and downloads check http://www.cpan.org or http://search.cpan.org). The Perl helper script was tested with perl 5.005, 5.6.0, and 5.6.1, although it should probably work with any version 5 release. Note that the LWP, HTTP, and HTML modules are updated often for bug fixes and such, so check for upgrades, and don't expect that your system admin has been keeping up with bug fixes.
Platform Specific Information
A
Specific information for various platforms can be found in subdirectories
of the src directory. For example, the Win32 files can be found in
The Windows binary is distributed as a separate package from the source distribution. See http://Swish-e.org for download information. Swish-e indexes are not portable between 32 and 64 bit platforms, but should be portable between machines with different ``endian'' types.
INSTALLATION
Instructions below are for installing Swish-e from source. Installing from source is recommended, but you should also check the Swish-e web site for binary distributions for your platform. Windows binary distributions are available from the Swish-e site.
Brief Instructions
Swish uses a configure script to generate a Makefile for your platform. The configure script should detect and use optional libraries if found on your system.
and XML documents. recommended, especially if you are parsing HTML. As mentioned above, the and works well. The HTML parser in Swish-e has been in use for years, but more features (and more features for parsing XML), and is more accurate. If you are running Linux it may already be installed (look for
In this case, try specifying where zlib can be found. For example, if libz was located in /usr/local/lib you would use this when building
This will install the headers and library files in $HOME/local/include and $HOME/local/lib. You will need to inform the Swish-e build process of this non-standard directory location (explained below).
locations.
where that library is at run time. There seem to be a number of ways to do
this. First, you can set the environment variable For example, under Bourne type shells:
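As a sketch for Bourne-type shells: this assumes the variable in question is LD_LIBRARY_PATH (the usual run-time library search path on Linux and many other Unix systems; some platforms use a different name) and that the library was installed under $HOME/local/lib as in the example above.

```shell
# Assumption: LD_LIBRARY_PATH is the run-time library search path on this
# platform, and the library lives in $HOME/local/lib.
# Prepend the directory, keeping any value that is already set.
LD_LIBRARY_PATH="$HOME/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
```

The setting only lasts for the current shell session; adding the two lines to a shell startup file would make it persistent.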
Other shells (like csh and tcsh) may require:
Another option is to use the be deleted or moved.
Building Swish-e with zlib
Building with zlib is similar to the instructions for building Swish-e with found link Swish-e with the zlib library. zlib is common on many systems, but may be out of date, and versions prior to 1.1.4 have a known security issue. You should run at least version 1.1.4. To link with zlib in a non-standard location use, for example:
or
Downloading and unpacking and building Swish-e
If you are reading this INSTALL document, then you probably already have downloaded and unpacked the distribution. But just in case... Make sure you are using the current release from http://Swish-e.org. If you have any questions about which version to use, please ask on the Swish-e discussion list. How you download Swish-e is up to you: lynx, lwp-download, and wget are all common methods.
Join the Swish-e discussion list
The Swish-e discussion list is the place to ask questions about installing and using Swish-e, to see or post bug fixes and security announcements, and to offer help to others. The list is typically very low traffic, so it won't overload your inbox. Please take time to subscribe. See http://Swish-e.org. If you are using Swish-e on a public site, please let the list know so it can be added to the list of sites that use Swish-e! Please review QUESTIONS AND TROUBLESHOOTING before posting a question to the Swish-e list.
Installing the Swish-e C Library (optional)
Swish 2.2 creates the C library libswish-e.a during the build. Install this library if you wish to embed Swish-e into another application. For example, the library should be installed before using the high level Perl SWISH modules located on CPAN: http://search.cpan.org/search?mode=module&query=SWISH
This is an *optional* step. Most users will not need to install the library. To install the library issue the following commands (again, you may need to su root):
By default this will install the library in /usr/local/lib, but this directory can be set when running ./configure with the --libdir option. For example:
So
Note: You may wish to run
Creating PDF and Postscript documentation (optional)
The Swish-e documentation in HTML format was created with Pod::HtmlPsPdf, a package of Perl modules written and/or modified by Stas Bekman to automate the conversion of documents in pod format (see perldoc perlpod) to HTML, Postscript, and PDF. A slightly modified version of this package is included with the Swish-e distribution and used for building the HTML. If your system has the necessary tools to build Postscript and the converter ps2pdf installed, you may be able to build the Postscript and PDF versions of the documentation. After you have run ./configure, type from the top-level directory of the distribution:
And with any luck you will end up with these two files in the top-level directory:
Most people find reading the documentation in HTML most convenient.
Installing the Swish-e documentation as man(1) pages (optional)
Part of the included Swish-e documentation can be installed as system
To build the man pages and install them into your system, type from the top-level directory (after running ./configure):
You will need to The man pages are installed in the system man directory. This directory is determined by running ./configure and can be set by passing the directory when running ./configure. For example,
Information on running ./configure can be found by typing:
The pod source files used to create the man files were written under perl 5.6.1. Older versions of Perl may complain slightly about the formatting of the pod files. This shouldn't be a problem, but please let the Swish-e list know if otherwise. Then upgrade your version of perl. ;)
QUESTIONS AND TROUBLESHOOTING
Please search the Swish-e list archive before posting a question, and check the SWISH-FAQ to see whether your question has already been answered. Support for installation, configuration and usage is available via the Swish-e discussion list. Visit http://swish-e.org for information. Do not contact developers directly for help -- always post your question to the list. Before posting, use the tools available to narrow down the problem.
Swish-e has the -T, -v, and -k switches that may help resolve issues. If
possible find a single document that shows the problem, then index with -T
INDEXED_WORDS and watch the exact words that are indexed. Use -H 9 when
searching and look at
You can also use programs like
When posting please provide the following information:
BASIC CONFIGURATION AND USAGE
This section should give you a basic overview of indexing and searching with Swish-e. Other examples can be found in the conf directory, which will step you through a number of different configurations. Also, please review the SWISH-FAQ. Swish-e reads a configuration file (see SWISH-CONFIG) for directives that control what and how Swish-e indexes files. Running Swish-e is then controlled by command line arguments (see SWISH-RUN). Swish-e does not require a configuration file, but most people need to change the default behavior by placing settings in a configuration file. To try the examples below, change to the tests subdirectory of the distribution. The tests will use the *.html files in this directory when creating the test index. You may wish to review these *.html files to get an idea of the various native file formats that Swish-e supports.
Step 1: Create a Configuration File
The configuration file controls what and how Swish-e indexes. The configuration file consists of directives, comments, and blank lines. The configuration file can have any name you like.
This example will work with the documents in the tests directory. You may wish to review the tests/test.config configuration file used for the
For example, a simple configuration file (swish-e.conf):
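A minimal sketch of such a file, written from the shell with a here-document. The directive names IndexDir, IndexOnly, and IndexReport are taken from SWISH-CONFIG as best recalled; the exact contents of the original example are not shown here, so treat the values as assumptions.

```shell
# Write a small configuration file into the current directory.
# Values are illustrative assumptions, not the distribution's own example.
cat > swish-e.conf <<'EOF'
# Example Swish-e configuration file

# Index the current directory...
IndexDir .
# ...but only files whose names end in .html
IndexOnly .html
# Print basic progress information while indexing
IndexReport 1
EOF
```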
And that's a simple configuration file. It says to index all the .html files in the current directory, and provide some basic output while indexing. The complete list of configuration file directives is described in SWISH-CONFIG.
Step 2: Index your Files
Now, make sure you are in the tests directory and save the above example configuration file as swish-e.conf. Then run Swish-e using the
This created the index file index.swish-e. This is the default index file name unless the IndexFile directive is specified in the configuration file:
Step 3: Search
You specify your search terms with the
This example assumes that you are in the tests directory, and the Swish-e binary is in the ../src directory. Swish-e returns in response to that command the following:
So the word sample was found in two documents. The first number shown is the relevance or rank of the search term, followed by the file containing the search term, the title of the document, and finally the length of the document. The period (``.'') alone at the end marks the end of results.
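The result layout described above (rank, file, quoted title, length, then a lone period) is easy to post-process. The following is a rough sketch only: the function name parse_results is hypothetical, the sample lines fed to it are made up for illustration, and it assumes file paths contain no spaces and that header lines start with "#".

```shell
# Hypothetical helper: turn Swish-e style result lines into tab-separated
# fields (rank, path, title, size). Reads result text on standard input.
parse_results() {
    awk '
        /^#/       { next }    # skip header/comment lines
        $0 == "."  { exit }    # a lone period marks the end of results
        {
            rank = $1; path = $2; size = $NF
            title = ""
            first = index($0, "\"")
            if (first > 0) {
                rest = substr($0, first + 1)
                sub(/"[^"]*$/, "", rest)   # drop the closing quote and what follows
                title = rest
            }
            printf "%s\t%s\t%s\t%s\n", rank, path, title, size
        }
    '
}

# Made-up sample input, shaped like the description above.
printf '%s\n' '# hypothetical header line' \
    '1000 ./a.html "A sample title" 437' \
    '.' | parse_results
```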
Much more information may be retrieved while searching by using the
Phrase Searching
To search for a phrase in a document use double-quotes to delimit your search terms. (The phrase delimiter is set in src/swish.h.) You must protect the quotes from the shell. For example, under Unix:
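One way to protect the quotes under a Bourne-type shell is to wrap the whole query in single quotes. This sketch assumes the -w switch for search words (see SWISH-RUN), a binary at ../src/swish-e as in the tests setup, and a made-up phrase.

```shell
# Assumption: the binary is at ../src/swish-e; set SWISH to override.
SWISH=${SWISH:-../src/swish-e}

# Single quotes stop the shell from stripping the double quotes, so
# Swish-e receives them and treats the two words as one phrase.
# The phrase itself is a hypothetical example.
QUERY='"sample document"'

if [ -x "$SWISH" ]; then
    "$SWISH" -w "$QUERY"
else
    printf 'would run: %s -w %s\n' "$SWISH" "$QUERY"
fi
```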
Or under the Windows command.com shell:
The phrase delimiter can be set with the
Boolean Searching
You can use the Boolean operators and, or, or not in searching. Without these Boolean operators, Swish-e will assume you're anding the words together. Here are some examples:
retrieves first the files that contain both the words ``apples'' and ``oranges''; then among those the ones that do not contain the word ``juice''.
A few others to ponder:
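As a sketch of such queries: the first one below matches the description above, and the others are further illustrations. The exact wording of the original commands is not shown here, so these query strings, the run_query helper, and the ../src/swish-e path are all assumptions.

```shell
# Assumption: the binary is at ../src/swish-e; set SWISH to override.
SWISH=${SWISH:-../src/swish-e}

# Hypothetical helper: run one query, or print it if the binary is missing.
run_query() {
    if [ -x "$SWISH" ]; then
        "$SWISH" -w "$1"
    else
        printf 'would run: swish-e -w %s\n' "$1"
    fi
}

# Both words required, then exclude documents containing "juice".
run_query 'apples and oranges not juice'
# Parentheses group the alternatives.
run_query 'apples and (oranges or pears)'
# "not" applied to a whole group.
run_query 'juice and not (apples or oranges)'
```

Single quotes keep the shell from interpreting the parentheses.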
See SWISH-SEARCH for more information.
Context Searching
The
For example:
META Tags
For the last example we will instruct Swish-e to use META tags to define fields in your documents.
META names are a way to define ``fields'' in your documents. You can use
the META names in your queries to limit the search to just the words
contained in that META name of your document. For example, you might have a
META tagged field in your documents called
Document Properties are somewhat related to meta tags: Properties allow the contents of a META tag in a source document to be stored within the index, and that text to be returned along with search results. META tags can have two formats in your documents.
And in XML format:
This, of course, is invalid HTML. To continue with our sample swish-e.conf file, add the following lines:
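A sketch of that addition, appending to the configuration file from the shell. MetaNames is the directive documented in SWISH-CONFIG for declaring META names; listing only meta1 is an assumption based on the search example in this section.

```shell
# Assumption: the configuration file is swish-e.conf in the current directory.
CONF=${CONF:-swish-e.conf}

# Append a MetaNames directive so words inside the meta1 field can be
# searched by name.
printf '%s\n' \
    '# Declare meta1 as a searchable META name' \
    'MetaNames meta1' >> "$CONF"
```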
Reindex to include the changes:
Now search, but this time limit your search to META tag ``meta1'':
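A sketch of such a search, using the metaname=word query form described in SWISH-SEARCH. The search word metatest1 and the ../src/swish-e path are assumptions for illustration.

```shell
# Assumption: the binary is at ../src/swish-e; set SWISH to override.
SWISH=${SWISH:-../src/swish-e}

# name=word limits the match to text indexed under that META name.
# "metatest1" is a hypothetical search word.
QUERY='meta1=metatest1'

if [ -x "$SWISH" ]; then
    "$SWISH" -w "$QUERY"
else
    printf 'would run: %s -w %s\n' "$SWISH" "$QUERY"
fi
```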
Again, please see SWISH-RUN and SWISH-CONFIG for complete documentation of the various indexing and searching options.
Additional Examples
The above example indexes local files using the file system access method
Swish-e can also use filters to convert documents as they are processed by Swish-e. For example, MS-Word or PDF documents can be converted and indexed by Swish-e by using filters. See the section on filters in SWISH-CONFIG, and the examples shown in the filters and filter-bin directories.
QUICK START FOR THE IMPATIENT
Installation
Here are the steps required on most platforms for downloading and installing swish-e.
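The build part of those steps can be sketched as a small shell function. The name build_swish is hypothetical, and the sketch assumes the distribution has already been downloaded and unpacked and that the current directory is its top level.

```shell
# Hypothetical helper: configure and compile from the top-level directory
# of the unpacked distribution.
build_swish() {
    if [ ! -x ./configure ]; then
        echo "no configure script here; cd into the unpacked distribution first" >&2
        return 1
    fi
    # configure generates the Makefile; make then builds src/swish-e
    ./configure && make
}

# Usage, from the top-level directory of the distribution:
#   build_swish
```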
Not required, but once built you can install the src/swish-e program.
This installs to /usr/local/bin by default.
Indexing Examples
Here are three examples of using Swish-e. The first example shows how to use a very simple configuration file to index the Swish-e documentation. The second example shows spidering using the built-in HTTP method. The third example uses an external program for indexing and shows use of a CGI script for searching.
Example 1 - simple file system indexing
First, create a configuration file in your home directory:
Now create the index:
And search:
Example 2 - spidering
This example uses Swish-e's ``HTTP'' method of spidering. This method is deprecated due to its lack of features. Spidering uses a Perl helper script. You must have the Perl package LWP (libwww-perl) installed on your system.
Index:
And search:
Example 3 - using an external program
This is a more advanced example that spiders a web site using the included prog-bin/spider.pl program and uses the included example/swish.cgi script for searching the index via a web interface.
and zlib installed in the system, you have a current version of Perl and current versions of LWP, HTML::*, and HTTP::* modules installed, and Apache is installed and operating. If you have any trouble with these instructions please read the detailed installation instructions above, and see the documentation included with the swish.cgi script and the spider.pl program. Please don't ask for help without reading the ``real'' documentation first.
Now you are ready to search.
Document Info
$Id: INSTALL.pod,v 1.22 2002/09/11 00:54:08 whmoseley Exp $