\documentclass[10pt]{article}
\usepackage{times} \usepackage{url} \usepackage{here}
\usepackage{graphicx}
%\documentstyle[times,art10,twocolumn,latex8]{article}
\oddsidemargin=0.10in %left margin is 1 inch on right-hand pages
\evensidemargin=0.10in %same for left-hand pages in twosided document
\textwidth=6.3in
\topmargin=-0.8in
\textheight=9.82in
%---------------------------------------------------------
\pagestyle{empty}
%---------------------------------------------------------
\begin{document}
\vspace*{2in}
\begin{center}
\begin{figure}[htb]
\begin{center}
\includegraphics[width=8.0cm]{gblogo_600}
\label{cover_logo}
\end{center}
\end{figure}
\section*{GridBlocks DISK version 1.0 - Admin Guide}
\end{center}
\clearpage
This administration guide describes how to set up and maintain GB-DISK.
It is intended for system administrators and assumes that the reader is
already familiar with the GB-DISK concept. A basic introduction to DISK
can be found in the separate User's Guide.
%---------------------------------------------------------------
\section{Installation}
%---------------------------------------------------------------
This section guides you through the installation of GB-DISK.
There are several ways to install the servers. The easiest is
to use the standalone server releases,
which contain everything needed to run the storage service. The only
prerequisite is the Java Runtime Environment, version 5 or later. The standalone
releases are bundled with the Jetty 6.0 application server and are preconfigured
to run without any modifications.
Many other application servers are supported as well, but they may require
manual changes to some configuration files before the service works properly.
In principle, the WAR release should be compatible with any Servlet container, but
it has only been tested under Apache Tomcat and JBoss.
\subsection{Starting Standalone DISK Service}
The standalone package of the front-end is started with a simple
command:
\begin{verbatim}
cd temp1
java -jar gb-disk-fe.jar
\end{verbatim}
Storage elements are started with a similar command. {\it Note} that you cannot bind
multiple services to the same port number on one host. The port can be changed at
start-up with the additional parameter {\tt -Djetty.port=8080} or in the configuration
file.
\begin{verbatim}
cd temp2
java -Djetty.port=8081 -jar gb-disk-se.jar
\end{verbatim}
The directory in which the server is started is used as the working directory for
all temporary and permanent files. Do NOT run
multiple servers from the same directory, since their files will
conflict.
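As a rough sketch of running one front-end and two storage elements on a single host,
each server can be given its own working directory and port. The directory names, the
port numbers and the location of the JAR files in the parent directory are assumptions
made for this example only.
\begin{verbatim}
# Hypothetical layout: both JAR files sit in the current directory.
mkdir -p fe se1 se2
(cd fe  && java -Djetty.port=8080 -jar ../gb-disk-fe.jar) &
(cd se1 && java -Djetty.port=8081 -jar ../gb-disk-se.jar) &
(cd se2 && java -Djetty.port=8082 -jar ../gb-disk-se.jar) &
\end{verbatim}
Each server should then keep its temporary and permanent files in its own directory,
separate from the others.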
The {\it webapp} folder is extracted from the JAR archive when the service is started.
The extraction is done only if the files are not yet present or the JAR file contains a newer
version of them. Your modifications to configuration files under the
{\it webapp} directory are not
overwritten unless you update DISK to a newer version.
Note that it can take up to one minute until all dynamic security settings are
applied to the WebDAV server and it becomes accessible. See Section \ref{policies} for details.
The binary build is shipped with a multicast-based information system. You can
start front-ends (FEs) and storage elements (SEs) in the same multicast domain, and the services find each
other automatically. However, the default settings in the SVN project use the
\url{http://opengrid.cern.ch:8888/gb-disk} testbed, which is protected
with certificates. You can request a copy of the certificate
by mail from \url{support-gridblocks@cern.ch}, or create your own testbed.
\subsubsection{Alternative Way}
Alternatively, the release package can be extracted manually and started with
a direct Java command:
\begin{verbatim}
jar -xf gb-disk-fe.jar
java -cp .:lib/jetty-6.0.1.jar:\
lib/servlet-api-2.5-6.0.1.jar:\
lib/jetty-util-6.0.1.jar:\
lib/jetty-plus-6.0.1.jar:\
ext/gb-disk.jar:\
ext/commons-logging.jar:\
ext/log4j-1.2.9.jar:\
ext/geronimo-jta.jar fi.hip.gb.disk.BootStrap
\end{verbatim}
\subsection{Installing from Sources}
The source distribution has to be extracted and compiled first. The build is run by
executing the following command in the source folder:
\begin{verbatim}
sh build.sh release
\end{verbatim}
or on Windows:
\begin{verbatim}
Build.bat release
\end{verbatim}
Release files are generated into the {\bf output} directory. More
complete build instructions can be found in Section
\ref{devguide}.
\subsection{Tomcat Application Server}
The WAR release should be copied into the \$TOMCAT\_HOME/webapps directory.
Depending on Tomcat's configuration,
the server might have to be restarted before the service is
deployed.
JAAS support is required for user accounts to work. Instructions for
Tomcat can be found at
\url{http://wiki.apache.org/jakarta-slide/TomcatSetup}. In short,
three steps are required. First, add the following configuration to
the Host section of \$TOMCAT\_HOME/conf/server.xml:
\begin{verbatim}
<Realm className="org.apache.catalina.realm.JAASRealm"
appName="slide_login"
userClassNames="org.apache.slide.jaas.spi.SlidePrincipal"
roleClassNames="org.apache.slide.jaas.spi.SlideRole"
name="GB-DISK WebApp"
useContextClassLoader="false" />
\end{verbatim}
Then create the file \$CATALINA\_HOME/conf/login.config with the following content:
\begin{verbatim}
slide_login {
org.apache.slide.jaas.spi.SlideLoginModule required
namespace=slide;
};
\end{verbatim}
The server needs to be started with an additional environment
variable pointing to this file:
\begin{verbatim}
export CATALINA_OPTS="-Djava.security.auth.login.config=$CATALINA_HOME/conf/login.config"
\end{verbatim}
After these operations Slide's JAAS login module is used
for authentication. Users and passwords are set through GB-DISK as
described in Section \ref{users}.
\subsection{JBoss Application Server}
In JBoss 4.x, copy the SAR release file into the
\$JBOSS\_HOME/server/default/deploy directory. To enable JAAS
security, follow the instructions at
\url{http://wiki.apache.org/jakarta-slide/JbossSetup}. In most cases it
is enough to add the following lines to
\$JBOSS\_HOME/server/default/conf/login-config.xml:
\begin{verbatim}
<application-policy name = "slide_login">
<authentication>
<login-module
code="org.apache.slide.jaas.spi.SlideLoginModule"
flag = "required">
<module-option name="namespace">slide</module-option>
</login-module>
</authentication>
</application-policy>
\end{verbatim}
\section{Configuration}\label{config}
When the GB-DISK server is started, the configuration file is extracted to {\bf
webapps/gb-disk/WEB-INF/classes/gb-disk.conf}. It can be modified to
change coding parameters, protocol configurations etc. Modifications
do not take effect until the server is restarted. The following generic
settings are used by both front-ends and storage elements:
\vspace*{1.0cm}
\begin{tabular}{p{4.0cm} p{8.0cm}}
\hline
{\bf storage.silo} & The directory for WebDAV files, relative to
the working directory or an absolute path.\\
\hline
{\bf storage.encode} & Encode file names stored on the local hard drive. Must
be TRUE on some Linux environments when file names contain non-ASCII characters.
Windows servers do not require encoding in most cases.\\
\hline
{\bf auth.root.cred} & The root password for the WebDAV server. If not set,
a random password is generated when the server is started.\\
\hline
{\bf auth.guest.read} & Set value to TRUE if the root folder of the
server should be readable for unauthenticated users. If set to FALSE,
only registered users are able to read files.\\
\hline
{\bf auth.guest.write} & Set value to TRUE if the root folder of the
server should be writeable for unauthenticated users. If set to
FALSE, only registered users are able to modify files.\\
\hline
{\bf network.transport} & The transport protocol (http/jgroups/gridftp)
used by the server. Used for the dynamic server discovery.\\
\hline
{\bf network.port} & The port number on which the HTTP server listens, used for
publishing the contact URL of the DISK service. On a standalone
server, this value sets Jetty's port number and can be overridden with the {\it jetty.port} system
property. On other application servers, this number must match the incoming HTTP port.\\
\hline
{\bf network.context} & The context path of the GB-DISK service, used
for publishing the contact URL of the DISK service. If omitted, the value is
derived automatically from the name of the WAR file. On a standalone service,
this setting defines the context of the DISK service.\\
\hline
{\bf infosys.groupinfo.class} & The class name of the info system implementation
used for group communication and discovery. If omitted,
no group communication is used (personal release).\\
\hline
{\bf infosys.groupinfo.props} & The location of the group info
configuration file.\\
\hline
{\bf infosys.securityinfo.class} & A semicolon-separated list of class
names of security subsystem implementations. The first item is
the primary security info system class and the rest are added
as listeners for changes.\\
\end{tabular}
\vspace*{0.5cm}
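As an illustration, a fragment of {\bf gb-disk.conf} with some of the generic settings
might look as follows. The key=value syntax, the comment style and all values shown are
assumptions made for this sketch; check the extracted configuration file for the exact
format used by your release.
\begin{verbatim}
# Illustrative values only; syntax assumed to be key=value.
storage.silo=silo
storage.encode=TRUE
network.transport=http
network.port=8080
\end{verbatim}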
\newpage
Settings used by front-ends only:
\vspace*{1.0cm}
\begin{tabular}{p{4.0cm} p{8.0cm}}
\hline
{\bf storage.temp} & The directory for temporary files, relative to the
working directory or an absolute path.\\
\hline
{\bf coding.k} & The coding parameter indicating how many stripes are needed for
decoding.\\
\hline
{\bf coding.n} & The coding parameter indicating how many stripes are created in
total.\\
\hline
{\bf coding.packet.size} & The packet size used for encoding files.\\
\hline
{\bf scheduler.class} & The class name of the scheduler implementation.\\
\hline
{\bf scheduler.ping} & Whether to use ping to find the fastest storage elements.\\
\hline
{\bf scheduler.filter} & Whether to restrict the selection to storage elements in the same country.
Works properly ONLY if hostnames contain the correct country code.\\
\hline
{\bf network.put.operation} & Changes the behaviour of WebDAV uploads.
0=normal, 1=foreground, 2=foreground and no caching.\\
\hline
{\bf infosys.directory} & The persistent directory used to store information system files,
relative to the working directory or an absolute path.\\
\hline
{\bf infosys.fileinfo.class} & The class name of the file indexing info system
implementation. If omitted, no indexing info system is used and the server acts as a storage element.\\
\hline
{\bf infosys.fileinfo.props} & The location of the file indexing configuration file.\\
\end{tabular}
\vspace*{0.5cm}
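To illustrate the two coding parameters, write $k$ for {\bf coding.k} and $n$ for
{\bf coding.n}. Assuming that any $k$ of the $n$ stripes are sufficient for decoding and
that each stripe is roughly $1/k$ of the original file size (typical for erasure codes,
but not confirmed here for the GB-DISK implementation), the following relations hold:
\[
  \mbox{tolerated stripe losses} = n - k, \qquad
  \mbox{storage overhead} \approx \frac{n}{k}.
\]
For example, $k=4$ and $n=6$ would tolerate the loss of two stripes at a storage cost of
about $1.5$ times the file size.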
Settings for storage elements only:
\vspace*{1.0cm}
\begin{tabular}{p{4.0cm} p{8.0cm}}
\hline
{\bf storage.quota} & The amount of storage space contributed, in megabytes.
Can be zero if no files are accepted.\\
\hline
{\bf auth.user} & Username and password of the {\it disk} user, used for
transferring files to the WebDAV server, separated by a colon.\\
\end{tabular}
\vspace*{0.5cm}
\section{Network Protocols}
\subsection{JGroups}
\label{jgroups}
Front-ends are capable of finding available storage elements automatically
by using a JGroups group communication channel. GB-DISK is preconfigured to work with two different
discovery methods: multicasting in LAN environments or
gossiping over the Internet. Any other communication method can
be used by changing the JGroups configuration file set in the {\bf infosys.groupinfo.props} property.
Select tcp-channel.xml for TCP-based communication or mc-channel.xml for multicasting
inside LAN environments.
The JBossCache information system between front-ends also uses JGroups
for communication and discovery. However, its channel is different
from the one used for group communication, so a separate configuration
needs to be modified for the information system. The {\bf infosys.fileinfo.props} property sets the configuration file used by
the cache. The file {\bf gbinfo-service.xml} can be replaced with any of the
alternative files in the same directory. Use the TCP configuration for Internet use
and the UDP configuration for LAN use.
Anonymous writes are disabled by default in the binary
distribution. If you want to write to your own GB-DISK installation,
you also need to download gb-disk-src-1.0.0.jar. Extract the
sources and build them by running {\tt sh build.sh compile-aopc}. After that you
can add users and passwords by running {\tt sh disk.sh adduser}.
When the TCP protocol is used for communication, a list of known services
has to be set in the JGroups configuration (TCPPING field). These known
hosts are used as access points to the group membership. The ports used by
default are 7800 for group communication and 7900 for the information
system. No firewall may exist between any two nodes. However, JGroups
Router services could be used to bypass firewalls.
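As an illustration of the TCPPING setting, a fragment of a JGroups 2.x protocol stack
might look as follows. The host names are placeholders, and attribute names can differ
between JGroups versions, so compare this sketch against the tcp-channel.xml file shipped
with your release before using it.
\begin{verbatim}
<!-- Hypothetical hosts; 7800 is the default group communication port. -->
<TCPPING initial_hosts="fe1.example.org[7800],se1.example.org[7800]"
         port_range="1" timeout="3000" num_initial_members="2"/>
\end{verbatim}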
Multicasting is easier to configure, since nodes are able to find each other by
sending multicast messages to the network and no firewalls block the traffic inside a LAN.
For more information, refer to the JGroups documentation at
\url{http://www.jgroups.org}.
\subsection{HTTP}
For file transfers over the HTTP protocol, firewalls must allow traffic
to the port used by the HTTP server. The default port is 8080.
\section{Making DISK Secure}
\subsection{Administrative User Accounts}
There are two built-in users: {\it root} and {\it disk}. The {\it root} user account
is used by GB-DISK to manage the WebDAV service. Slide's WebDAV extensions
support managing user accounts and changing folder privileges.
The {\it root} password can also be used to access
all files on the server when performing administrative tasks. In most cases the root
password should not be used directly. The {\it disk} user is used
to transfer files between servers. Storage elements distribute their
passwords through the group communication channel so that group members
can upload and delete files.
If the passwords are not defined in the DISK configuration file
(properties {\bf auth.root.cred} and {\bf auth.user}), they are replaced with
random integers when the server is started.
\subsection{Personal User Accounts}
\label{users}
Authentication for front-ends is done with personal user accounts.
Currently user management is done with the {\bf TestDemo} command-line
utility, operated through the {\bf disk.sh} script. Make sure that your
configuration files (under the {\it src/etc} directory) have settings
matching your front-ends. The Ant target {\it fe-release} has to be run
first so that all AOP classes are compiled.
Usage of the script:
\begin{verbatim}
sh disk.sh adduser username password [group1 group2...]
sh disk.sh rmuser username
sh disk.sh passwd username newpassword
sh disk.sh addgroup username newgroup1 [newgroup2...]
\end{verbatim}
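For example, the following hypothetical session creates a user in two groups, adds the
user to a third group, changes the password and finally removes the account. The user
name, passwords and group names are placeholders.
\begin{verbatim}
sh disk.sh adduser alice s3cret physics staff
sh disk.sh addgroup alice admins
sh disk.sh passwd alice n3w-s3cret
sh disk.sh rmuser alice
\end{verbatim}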
Since user accounts are stored in the distributed information system,
modifications can be made from any computer that has access to the information
system services (correct configuration files and certificates, if required).
There is also no need to update anything on the servers when the user database
is modified. Each front-end
detects modifications of the user accounts and applies the changes to the
local WebDAV server using Slide's WebDAV extension. It
might take up to 30 seconds before all changes are replicated
among the front-end nodes and the user accounts are valid everywhere.
\subsection{Policies and Access Rights}
\label{policies}
All access to the WebDAV server is denied by default in the Domain.xml file. When
the server starts, these restrictions are relaxed dynamically
to match the server policy configuration. This process might take
up to 30 seconds, and the service is unavailable until the process
is completed.
By default, unauthenticated users are able to read files in the root directory
but cannot make any modifications.
When dealing with confidential files, this behavior should be changed by setting {\bf
auth.guest.read} to FALSE in the GB-DISK configuration file. The storage can be made
writeable for everyone by changing the {\bf auth.guest.write} property. These
settings do not affect the {\bf private} folder, which is always
protected with passwords.
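A restrictive policy for confidential data could be sketched as the following fragment of
the GB-DISK configuration file. The key=value syntax is an assumption; only the property
names and the TRUE/FALSE values come from this guide.
\begin{verbatim}
# Assumed syntax: deny both reading and writing for unauthenticated users.
auth.guest.read=FALSE
auth.guest.write=FALSE
\end{verbatim}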
\subsection{Storage Element Security}\label{sesec}
Only two services are exposed to the Internet: the WebDAV server used for file transfer
and JGroups for group communication.
The WebDAV service is secured with basic HTTP authentication. SSL communication is preferred
in order to keep the passwords secure.
JGroups' AUTH protocol is used to protect and control access to the group.
Only nodes possessing the valid private X509 key are allowed to become members.
To simplify GB-DISK installation, JGroups' X509Token implementation is modified
to fall back to plain-text authentication if no certificate is found. Make sure to disable
this functionality in real production systems by modifying the AUTH configuration parameter
{\bf cert\_require}.
Certificates can be generated with Java's keytool. Use the same password that you
have set in the JGroups configuration file.
The \$GB-HOME/.keytool file is included in the release files by the Ant build scripts.
Alternatively, you can copy the certificate file into the server's working directory.
\begin{verbatim}
keytool -genkey -keystore .keystore -keyalg rsa -alias disk
\end{verbatim}
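The command above prompts for passwords and certificate details. A non-interactive
variant, useful for scripted test setups, could look like the sketch below. The
passwords, the distinguished name and the validity period are placeholders; the store
password has to match the one set in the JGroups configuration file.
\begin{verbatim}
# Placeholder passwords and name; replace before real use.
keytool -genkey -keystore .keystore -keyalg RSA -alias disk \
        -storepass changeit -keypass changeit \
        -dname "CN=disk.example.org" -validity 365
# Verify that the key entry was created.
keytool -list -keystore .keystore -storepass changeit -alias disk
\end{verbatim}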
\subsection{Front-End Security}\label{fesec}
Front-ends are storage elements with an additional information system service.
JBossCache runs on top of JGroups, so in practice there is a separate JGroups channel to
protect. This is done in the same way as for storage elements, as described in Section \ref{sesec}.
The security of the whole GB-DISK system depends on the information
system. If an intruder gains access to the meta-data, all user
passwords are also compromised. The meta-data system contains all information
required to locate and decode files. This also means that removing file
entries from the meta-data system makes it impossible to decode
stripes back to the original files. Individual stripes cannot be identified
without their meta-data. It is therefore very important to secure the GB-DISK information system.
\subsection{Q\&A}
{\bf Q: How to deal with multiple network interfaces?}
{\bf A:} You may need to add a route for the desired interface
so that multicast messages get through:
\begin{verbatim}
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
\end{verbatim}
If JGroups binds to the wrong interface or address, start the software with the
following command-line option:
\begin{verbatim}
-Dbind.address=192.168.1.2
\end{verbatim}
{\bf Q: How to enable debugging of the Jetty server?}
{\bf A:} Start the software with the following options:
\begin{verbatim}
-Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n
\end{verbatim}
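Put together, and assuming the standalone front-end JAR from the installation section,
the server could be started in debug mode and a debugger attached from the same host as
sketched below. The {\tt jdb} tool is part of the JDK rather than the JRE; any other
JDWP-capable debugger can be pointed at port 8000 instead.
\begin{verbatim}
# Start the front-end with the debug agent listening on port 8000.
java -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n \
     -jar gb-disk-fe.jar
# In another terminal, attach the command-line debugger.
jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=8000
\end{verbatim}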
%---------------------------------------------------------------
\section{Developer Guide} \label{devguide}
%---------------------------------------------------------------
This section contains technical information about GB-DISK for
developers.
The {\bf gb-disk-src.jar} file contains all files needed to build GB-DISK from
sources. The latest code is always available from the SourceForge SVN repository
under the {\it trunk/gb-disk} folder. The {\it trunk/builder} folder is also required (it is included in
the source release).
An Ant script is used for building and releasing the sources.
The complete list of build targets can be seen with the {\tt ant -projecthelp} command.
The most common build operations are:
\begin{verbatim}
all Clean, compile and release everything
compile Compile all class files
docs Build Java documentation
release Release all modules
-> se-release Build storage element distribution.
-> fe-release Build front-end distribution including metadata service.
-> client-release Release the client
-> personal-release Release the personal file sharing version.
-> planetlab-release Release the wide area client for PlanetLab tests
-> src-release Release the source
test Unit testing
clean Clean up everything
\end{verbatim}
The best place to follow the GB-DISK development status is our JIRA server running
at \url{http://opengrid.cern.ch:8080/jira}. For implementation-specific details,
consult the Javadocs and the source code.
\end{document}