Learn more  » Push, build, and install  RubyGems npm packages Python packages Maven artifacts PHP packages Go Modules Bower components Debian packages RPM packages NuGet packages

aaronreidsmith / scipy   python

Repository URL to install this package:

Version: 1.3.3 

/ linalg / src / id_dist / doc / doc.tex

\documentclass[letterpaper,12pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{verbatim}
\usepackage{amsmath}
\usepackage{supertabular}
\usepackage{array}

\def\T{{\hbox{\scriptsize{\rm T}}}}
\def\epsilon{\varepsilon}
\def\bigoh{\mathcal{O}}
\def\phi{\varphi}
\def\st{{\hbox{\scriptsize{\rm st}}}}
\def\th{{\hbox{\scriptsize{\rm th}}}}
\def\x{\mathbf{x}}


\title{ID: A software package for low-rank approximation
       of matrices via interpolative decompositions, Version 0.4}
\author{Per-Gunnar Martinsson, Vladimir Rokhlin,\\
        Yoel Shkolnisky, and Mark Tygert}


\begin{document}

\maketitle

\newpage

{\parindent=0pt

The present document and all of the software
in the accompanying distribution (which is contained in the directory
{\tt id\_dist} and its subdirectories, or in the file
{\tt id\_dist.tar.gz})\, is

\bigskip

Copyright \copyright\ 2014 by P.-G. Martinsson, V. Rokhlin,
Y. Shkolnisky, and M. Tygert.

\bigskip

All rights reserved.

\bigskip

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

\begin{enumerate}
\item Redistributions of source code must retain the above copyright
notice, this list of conditions, and the following disclaimer.
\item Redistributions in binary form must reproduce the above copyright
notice, this list of conditions, and the following disclaimer in the
documentation and/or other materials provided with the distribution.
\item None of the names of the copyright holders may be used to endorse
or promote products derived from this software without specific prior
written permission.
\end{enumerate}

\bigskip

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNERS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

}

\newpage

\tableofcontents

\newpage



\hrule

\medskip

\centerline{\Large \bf IMPORTANT}

\medskip

\hrule

\medskip

\noindent At the minimum, please read Subsection~\ref{warning}
and Section~\ref{naming} below, and beware that the {\it N.B.}'s
in the source code comments highlight key information about the routines;
{\it N.B.} stands for {\it nota bene} (Latin for ``note well'').

\medskip

\hrule

\bigskip



\section{Introduction}

This software distribution provides Fortran routines
for computing low-rank approximations to matrices,
in the forms of interpolative decompositions (IDs)
and singular value decompositions (SVDs).
The routines use algorithms based on the ID.
The ID is also commonly known as
the approximation obtained via skeletonization,
the approximation obtained via subsampling,
and the approximation obtained via subset selection.
The ID provides many advantages in many applications,
and we suspect that it will become increasingly popular
once tools for its computation become more widely available.
This software distribution includes some such tools,
as well as tools for computing low-rank approximations
in the form of SVDs.
Section~\ref{defs} below defines IDs and SVDs,
and provides references to detailed discussions of the algorithms
used in this software package.

Please beware that normalized power iterations are better suited than
the software in this distribution
for computing principal component analyses
in the typical case when the square of the signal-to-noise ratio
is not orders of magnitude greater than both dimensions
of the data matrix; see~\cite{halko-martinsson-tropp}.

The algorithms used in this distribution have been optimized
for accuracy, efficiency, and reliability;
as a somewhat counterintuitive consequence, many must be randomized.
All randomized codes in this software package succeed
with overwhelmingly high probability (see, for example,
\cite{halko-martinsson-tropp}).
The truly paranoid are welcome to use the routines {\tt idd\_diffsnorm}
and {\tt idz\_diffsnorm} to evaluate rapidly the quality
of the approximations produced by the randomized algorithms
(as done, for example, in the files
{\tt idd\_a\_test.f}, {\tt idd\_r\_test.f}, {\tt idz\_a\_test.f},
and {\tt idz\_r\_test.f} in the {\tt test} subdirectory
of the main directory {\tt id\_dist}).
In most circumstances, evaluating the quality of an approximation
via routines {\tt idd\_diffsnorm} or {\tt idz\_diffsnorm} is much faster
than forming the approximation to be evaluated. Still, we are unaware
of any instance in which a properly-compiled routine failed to produce
an accurate approximation.
To facilitate successful compilation, we encourage the user
to read the instructions in the next section,
and to read Section~\ref{naming}, too.



\section{Compilation instructions}


Followed in numerical order, the subsections of this section
provide step-by-step instructions for compiling the software
under a Unix-compatible operating system.


\subsection{Beware that default command-line flags may not be
            sufficient for compiling the source codes!}
\label{warning}

The Fortran source codes in this distribution pass {\tt real*8}
variables as integer variables, integers as {\tt real*8}'s,
{\tt real*8}'s as {\tt complex*16}'s, and so on.
This is common practice in numerical codes, and is not an error;
be sure to provide the relevant command-line flags to the compiler
(for example, run {\tt fort77} and {\tt f2c} with the flag {\tt -!P}).
When following the compilation instructions
in Subsection~\ref{makefile_edit} below,
be sure to set {\tt FFLAGS} appropriately.


\subsection{Install LAPACK}

The SVD routines in this distribution depend on LAPACK.
Before compiling the present distribution,
create the LAPACK and BLAS archive (library) {\tt .a} files;
information about installing LAPACK is available
at {\tt http://www.netlib.org/lapack/} (and several other web sites).


\subsection{Decompress and untar the file {\tt id\_dist.tar.gz}}

At the command line, decompress and untar the file
{\tt id\_dist.tar.gz} by issuing a command such as
{\tt tar -xvvzf id\_dist.tar.gz}.
This will create a directory named {\tt id\_dist}.


\subsection{Edit the Makefile}
\label{makefile_edit}

The directory {\tt id\_dist} contains a file named {\tt Makefile}.
In {\tt Makefile}, set the following:
%
\begin{itemize}
\item {\tt FC} is the Fortran compiler.
\item {\tt FFLAGS} is the set of command-line flags
      (specifying optimization settings, for example)
      for the Fortran compiler specified by {\tt FC};
      please heed the warning in Subsection~\ref{warning} above!
\item {\tt BLAS\_LIB} is the file-system path to the BLAS archive
      (library) {\tt .a} file.
\item {\tt LAPACK\_LIB} is the file-system path to the LAPACK archive
      (library) {\tt .a} file.
\item {\tt ARCH} is the archiver utility (usually {\tt ar}).
\item {\tt ARCHFLAGS} is the set of command-line flags
      for the archiver specified by {\tt ARCH} needed
      to create an archive (usually {\tt cr}).
\item {\tt RANLIB} is to be set to {\tt ranlib}
      when {\tt ranlib} is available, and is to be set to {\tt echo}
      when {\tt ranlib} is not available.
\end{itemize}


\subsection{Make and test the libraries}

At the command line in a shell that adheres
to the Bourne shell conventions for redirection, issue the command
``{\tt make clean; make}'' to both create the archive (library)
{\tt id\_lib.a} and test it.
(In most modern Unix distributions, {\tt sh} is the Bourne shell,
or else is fully compatible with the Bourne shell;
the Korn shell {\tt ksh} and the Bourne-again shell {\tt bash}
also use the Bourne shell conventions for redirection.)
{\tt make} places the file {\tt id\_lib.a}
in the directory {\tt id\_dist}; the archive (library) file
{\tt id\_lib.a} contains machine code for all user-callable routines
in this distribution.



\section{Naming conventions}
\label{naming}

The names of routines and files in this distribution
start with prefixes, followed by an underscore (``\_'').
The prefixes are two to four characters in length,
and have the following meanings:
%
\begin{itemize}
\item The first two letters are always ``{\tt id}'',
      the name of this distribution.
\item The third letter (when present) is either ``{\tt d}''
      or ``{\tt z}'';
      ``{\tt d}'' stands for double precision ({\tt real*8}),
      and ``{\tt z}'' stands for double complex ({\tt complex*16}).
\item The fourth letter (when present) is either ``{\tt r}''
      or ``{\tt p}'';
      ``{\tt r}'' stands for specified rank,
      and ``{\tt p}'' stands for specified precision.
      The specified rank routines require the user to provide
      the rank of the approximation to be constructed,
      while the specified precision routines adjust the rank adaptively
      to attain the desired precision.
\end{itemize}

For example, {\tt iddr\_aid} is a {\tt real*8} routine which computes
an approximation of specified rank.
{\tt idz\_snorm} is a {\tt complex*16} routine.
{\tt id\_randperm} is yet another routine in this distribution.



\section{Example programs}

For examples of how to use the user-callable routines
in this distribution, see the source codes in subdirectory {\tt test}
of the main directory {\tt id\_dist}.



\section{Directory structure}

The main {\tt id\_dist} directory contains a Makefile,
the auxiliary text files {\tt README.txt} and {\tt size.txt},
and the following subdirectories, described in the subsections below:
%
\begin{enumerate}
\item {\tt bin}
\item {\tt development}
\item {\tt doc}
\item {\tt src}
\item {\tt test}
\item {\tt tmp}
\end{enumerate}
%
If a ``{\tt make all}'' command has completed successfully,
then the main {\tt id\_dist} directory will also contain
an archive (library) file {\tt id\_lib.a} containing machine code
for all of the user-callable routines.


\subsection{Subdirectory {\tt bin}}

Once all of the libraries have been made via the Makefile
in the main {\tt id\_dist} directory,
the subdirectory {\tt bin} will contain object files (machine code),
each compiled from the corresponding file of source code
in the subdirectory {\tt src} of {\tt id\_dist}.


\subsection{Subdirectory {\tt development}}

Each Fortran file in the subdirectory {\tt development}
(except for {\tt dfft.f} and {\tt prini.f})
specifies its dependencies at the top, then provides a main program
for testing and debugging, and finally provides source code
for a library of user-callable subroutines.
The Fortran file {\tt dfft.f} is a copy of P. N. Swarztrauber's FFTPACK library
for computing fast Fourier transforms.
The Fortran file {\tt prini.f} is a copy of V. Rokhlin's library
of formatted printing routines.
Both {\tt dfft.f} (version 4) and {\tt prini.f} are in the public domain.
The shell script {\tt RUNME.sh} runs shell scripts {\tt make\_src.sh}
and {\tt make\_test.sh}, which fill the subdirectories {\tt src}
and {\tt test} of the main directory {\tt id\_dist}
with source codes for user-callable routines
and with the main program testing codes.


\subsection{Subdirectory {\tt doc}}

Subdirectory {\tt doc} contains this documentation,
supplementing comments in the source codes.


\subsection{Subdirectory {\tt src}}

The files in the subdirectory {\tt src} provide source code
for software libraries. Each file in the subdirectory {\tt src}
(except for {\tt dfft.f} and {\tt prini.f}) is
Loading ...