Installation Instructions for the Linguist's Search Engine

These instructions have been tested on Linux. The recommended procedure
for other systems is to follow the scripts but run the commands by hand,
adapting as necessary if things fail.

These instructions assume your shell is bash and that you have some
familiarity with postgresql, perl, and UNIX in general.

STEP 1: INSTALL PREREQUISITES

The LSE is a large and complex system and requires a number of external
programs and libraries.

Summary:
    perl 5.8.6 or later
    postgresql 8.0.3 or later
    GD 2
    GNU wget (needed only during installation)
    java JDK 1.4.2 or later
    expat
    puf
    openssh
    BerkeleyDB 4.2.52

-- perl 5.8.6 - get from http://www.cpan.org or install using your OS's
   package management system (rpm, apt, portage, etc)

You can use the default perl from your system, but make sure it's 5.8.6.
Perl 5.8.5 or an earlier 5.8 release may work, but everything is tested
with 5.8.6. Make sure that perl 5.8.6 is the default perl (check with
perl -v and adjust your path accordingly if not).

-- postgresql 8.0.3 - get from http://www.postgresql.org or install
   using your OS's package management system

INSTALLATION

Make sure psql for postgresql 8.0.3 is in your path. Everything assumes
that postgresql is running on localhost on port 5432, the default. If
not, you'll need to adjust the configuration files and installation
scripts accordingly: any time psql is used you'll need to give the port
and hostname, and you'll need to change the connection parameters in
general.ini. Read the psql and DBD::Pg man pages for more information.

Also, be sure to adjust pg_hba.conf so that any other machines running
services or annotators (see the Administration Guide) will be able to
connect to the database.

TUNING

You'll probably want to tune the postgresql configuration parameters in
postgresql.conf.
Here are the settings we use in postgresql.conf (use at your own risk):

    # tune memory
    shared_buffers = 32768
    work_mem = 65536
    maintenance_work_mem = 65536

    # tune FSM
    max_fsm_pages = 400000
    max_fsm_relations = 20000

    bgwriter_delay = 1000
    fsync = false
    wal_buffers = 64
    checkpoint_segments = 30
    checkpoint_timeout = 600
    effective_cache_size = 16384
    random_page_cost = 3
    stats_command_string = true

These settings should allocate more memory to postgresql and improve
performance, especially for inserts. See the postgresql documentation
for more information.

You'll need to increase the SHMMAX parameter for your kernel if you use
the above postgresql.conf settings. On Linux you can use the "sysctl"
command if you have it, or echo a new setting to
/proc/sys/kernel/shmmax. If you are not using Linux, you may need to
change this setting in a different way. Google for SHMMAX and your
operating system for more info.

On Linux (as root):

    sysctl -w kernel.shmmax=536870912

or

    echo 536870912 > /proc/sys/kernel/shmmax

INITIAL DATABASE SETUP

After you've tuned postgresql.conf, do an initdb with the directory you
want to store the database in and start postgresql with pg_ctl. You'll
also need to add a user called "lse". The commands to run are:

    initdb $PGDATA
    pg_ctl start -D $PGDATA
    psql -d template1 -c "CREATE USER lse WITH CREATEDB CREATEUSER"

where $PGDATA is the directory where you want to keep the LSE database.

If you install postgresql in an unusual place then the DBD::Pg perl
module may need extra help installing.

-- GD 2 - get from http://www.boutell.com or install using your OS's
   package management system

Make sure to install GD with FreeType support (read the installation
instructions!). GD is required for both graph-annot.pl and
graph-annot-utf8.pl, but the FreeType support is only needed for
graph-annot-utf8.pl.
You'll need to find a font to use - I use simsun.ttc from Windows XP
since it has Chinese support. You should change the path to the font
you want to use in the [graph_annot] section in general.ini.

For example, to set the script to use /usr/local/fonts/simsun.ttc,
change line 23 of graph-annot-utf8.pl to read

    my $font = "/usr/local/fonts/simsun.ttc";

In the future this will be set in a configuration file.

If GD is installed in an odd location then the GD perl module may need
extra help to compile.

-- GNU wget - probably installed on your system already; if not,
   install using your package management system or get from any GNU
   archive (only needed for installation). Make sure it's in your path.

-- JDK 1.4.2 or later from java.sun.com - java doesn't need to be in
   your path, but you'll need to set JAVA_HOME to the root of the JDK
   (e.g. /usr/local/jdk1.4.2_03)

in bash:

    export JAVA_HOME=/usr/local/jdk1.4.2_03

-- expat - probably installed on your system already; if not, get from
   expat.sourceforge.net or install using your package management
   system. If installed in a weird place, various XML-related perl
   modules (especially XML::Parser) may need help building.

-- puf - get from http://puf.sourceforge.net/

-- sshd - almost certainly already installed on your system, but you'll
   need to tweak it: set "PermitUserEnvironment yes" in
   /etc/ssh/sshd_config (or maybe it's called /etc/sshd_config on your
   system) and restart sshd. Also make sure that you have RSA
   authentication enabled and that you either have all the hosts you
   need in the known_hosts file or have strict host key checking
   disabled.

-- BerkeleyDB version 4.2 - get from http://www.sleepycat.com or install
   using your package management system.

!!! IMPORTANT - The gotcha with this is that the BerkeleyDB perl module,
the DB_File perl module and the Apache web server need to be linked
against the same version of BerkeleyDB.
This is fine with the default install, but if you install in a weird
place you'll need to build both Apache and the BerkeleyDB perl module
pointed at the right installation. Read the installation information
for those packages for more information.

You can test that these modules are installed properly by running the
following command:

    perl -MBerkeleyDB -MDB_File -e 'print "Checking BerkeleyDB..\n"'

If you get an error about the modules being compiled and linked against
different versions of BerkeleyDB, you need to manually set the right
include and library paths and filenames when building those modules.
Download the most recent versions of the BerkeleyDB and DB_File
distributions directly from CPAN (http://www.cpan.org - just search for
BerkeleyDB and DB_File) and read the included instructions.

USER SETUP

You should create a user specifically for the LSE. Make sure that the
correct versions of perl, psql and puf are in that user's path. You
should also create a ~/.ssh/environment file with the PATH setting,
since remote commands run via SSH don't load the profile by default
(this is why we set PermitUserEnvironment earlier).

The following might work for you (as root):

    adduser lse; su - lse
    mkdir ~/.ssh
    echo "PATH=/usr/local/bin:/usr/bin:/bin" > ~/.ssh/environment

You can check which version of things gets run by doing

    ssh localhost 'echo $PATH'

(single quotes are important, otherwise the local shell expands $PATH
before sending it through SSH) or

    ssh localhost 'which perl'

Now, create a directory where you want to install LSE. Set the LSE_HOME
environment variable to this directory.
For example, you could just install it in the lse user's home
directory:

    export LSE_HOME=/home/lse

Download the LSE packages into $LSE_HOME with wget:

    cd $LSE_HOME
    wget http://lse.umiacs.umd.edu/dist/lse-release.tgz
    wget http://lse.umiacs.umd.edu/dist/lsewww.tgz

And extract using GNU tar:

    tar zxvf lse-release.tgz
    tar zxvf lsewww.tgz

Make sure that LSE_HOME is set to the directory where you want to
install LSE and that JAVA_HOME is set to the base directory for JDK
1.4.2 or later. The lse packages must be extracted in LSE_HOME.

LSE MAIN INSTALLATION

Now run the installer:

    cd lse/dist
    sh install-lse.sh 2>&1 | tee install-lse.log

(NOTICE FOR ALPHA RELEASE: Instead of running the installer directly,
you should probably open up install-lse.sh in your favorite editor and
copy in commands one at a time - that way if something fails you can
fix the problem then instead of having to examine the log output for
errors. To be on the safe side you might even want to run the
individual cpan commands in cpan-mods.sh by hand.)

Various bits will ask you questions, but the defaults are always okay.
The only time you should have to make any choices is in selecting a
CPAN mirror.

This will take a while as it downloads and compiles components. You
should check the output for errors when it is done and install any
failed perl modules by hand. Perl modules that might have problems are
usually those with external dependencies: BerkeleyDB, GD, XML::Parser.

LSE WEB INSTALLATION

Next run the web installer (again, you may want to run the individual
commands by hand):

    sh install-lseweb.sh 2>&1 | tee install-lseweb.log

This will build and install Apache and mod_perl. You'll need to answer
some questions for Apache about what email address to use for the
server administrator, the local host name, the local domain, and the
port you want to run on. The defaults should be okay unless they're
blank, in which case you need to provide something or Apache may not
start.
If everything went well, Apache should be started and you should have a
new installation of the Linguist's Search Engine.

CRONTAB

You may need to tweak the crontab in a couple of cases - if you aren't
running Vixie cron (e.g. Dillon's cron doesn't allow environment
variables in the crontab, so you'd need to put them on the command line
-- man cron should tell you what yours does or does not allow) or if
the default path can't find the right perl or psql. Usually you can
edit the crontab with crontab -e.

WHERE TO GO FROM HERE

You should be able to start building and annotating collections using
the Web interface or with load-xml.pl. You can use
lse/diploid/rist/buildindex.sh to create "fast" indices for use with
large collections - see the administrator's guide for details.

In the beta release, administrative tools for managing annotation
types, services, annotators and indices will be provided.

If you get errors, the first thing to check should be the BerkeleyDB
stuff mentioned above. If you get an error you can't figure out, feel
free to send email to me at lse@umiacs.umd.edu.
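
Before writing in, it may help to re-check the prerequisites from STEP 1
in one pass. The bash script below is only a sketch and is not part of
the LSE distribution: the function names version_ge and report are made
up for illustration, and it assumes that "psql --version" prints the
version number as the third word of its first line. It only reports
what it finds and never changes anything.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check for the LSE install (not shipped with LSE).
# version_ge and report are illustrative names, not LSE tools.

# version_ge A B: succeed if dotted numeric version A is >= version B.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)   # split "5.8.6" into (5 8 6)
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        # 10# forces base 10 so a field like "08" is not read as octal
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# report NAME FOUND_VERSION MINIMUM: print one status line.
report() {
    if [ -z "$2" ]; then
        echo "MISSING: $1 (need $3 or later)"
    elif version_ge "$2" "$3"; then
        echo "OK: $1 $2"
    else
        echo "TOO OLD: $1 $2 (need $3 or later)"
    fi
}

# perl prints its own version as e.g. 5.8.6
perl_v=$(perl -e 'printf "%vd", $^V' 2>/dev/null || true)
# Assumption: first line looks like "psql (PostgreSQL) 8.0.3"
psql_v=$(psql --version 2>/dev/null | head -n 1 | awk '{print $3}' \
         | sed 's/[^0-9.].*//' || true)

report perl "$perl_v" 5.8.6
report psql "$psql_v" 8.0.3

# Tools that only need to be on the path
for tool in wget puf ssh; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "OK: $tool"
    else
        echo "MISSING: $tool"
    fi
done

# Environment variables used by the installers
if [ -x "${JAVA_HOME:-}/bin/java" ]; then
    echo "OK: JAVA_HOME"
else
    echo "CHECK: JAVA_HOME does not point at a JDK"
fi
if [ -d "${LSE_HOME:-}" ]; then
    echo "OK: LSE_HOME"
else
    echo "CHECK: LSE_HOME is not set to a directory"
fi
```

Run it as the lse user, both locally and through ssh localhost, since
the PATH seen over SSH comes from ~/.ssh/environment and may differ.
The BerkeleyDB linkage is not covered here; use the perl -MBerkeleyDB
-MDB_File check from STEP 1 for that.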