There are thousands of stand-alone Metamorph programs in the field
today, and over time we have received many requests from
application developers who would like to embed our searching
technology inside their own applications. It has taken us a long
time to work out a simple, clean way to meet that need. We have
tried to make it as easy as possible while providing the maximum
power and flexibility.
All of the code that comprises Metamorph is written in
ANSI-compliant C. The source code to the API (only) is provided to
the programmer for reference and modification. Metamorph has so
far been compiled and tested on 22 different UNIX platforms,
MS-DOS, and IBM MVS. The API can be ported by Thunderstone to
almost any machine/OS that has an ANSI-compliant C compiler.
The calls in the API are structured in a fashion similar to the
standard library functions fopen(), fclose(), ftell(), and gets().
Just as you can have multiple files open at the same time, you can
open as many simultaneous Metamorph queries as needed. (One reason
you might do this is to have a different search in effect for two
different fields of the same record.)
The API itself allows the software engineer to conduct a Metamorph
search through any buffer or file that might contain text. There
are two data structures that are directly involved with the API:
APICP /* this structure contains all the control parameters */
MMAPI /* this structure is passed around to the API calls */
The APICP structure contains all the default parameters required
by the API. It is separate from the MMAPI structure so that its
contents can be easily manipulated by the developer. An APICP
contains the following information:
- A flag telling Metamorph to do suffix processing
- A flag telling it to do prefix processing
- A flag that says whether or not to perform word derivations
- The minimum size a word may be processed down to
- The list of suffixes to use in suffix processing
- The list of prefixes to use in prefix processing
- A start delimiter expression
- An end delimiter expression
- A flag indicating to include the starting delimiter in the hit
- A flag indicating to include the ending delimiter in the hit
- A list of high frequency words/phrases to ignore
- The default names of the Thesaurus files
- Two optional, user-written, Thesaurus list editing functions
- The list of suffixes to use in equivs lookup
- A flag indicating to look for the within operator (w/)
- A flag indicating to look up see references
- A flag indicating to keep equivalences
- A flag indicating to keep noise words
- A user data pointer
Usually the developer will need to modify the contents of this
structure only once to tailor it to their application, but in some
applications it will be very desirable to modify its contents
dynamically. Two calls are provided that handle the manipulation
of this structure:
APICP * openapicp(void) /* returns an APICP pointer */
APICP * closeapicp(APICP *cp) /* always returns a NULL pointer */
The openapicp() function creates a structure that contains a set
of default parameters and then returns a pointer to it. The
closeapicp() function cleans up and releases the memory allocated
by the openapicp() function. Between these two calls the
application developer may modify any of the contents of the APICP
structure.
There are five function calls that are associated with the actual
API retrieval function; they are as follows:
MMAPI *openmmapi(char *query,APICP *cp)
int setmmapi(MMAPI *mm,char *query)
char *getmmapi(MMAPI *mm, char *buf, char *endofbuf, int operation)
int infommapi(MMAPI *mm, int index, char **what, char **where,
int *size)
MMAPI *closemmapi(MMAPI *mm)
The openmmapi() function takes the set of default parameters from
the APICP structure and builds an MMAPI structure that is ready to
be manipulated by the other four functions. It returns a pointer
to this structure.
The setmmapi() function is passed a standard Metamorph query (see
examples) and does all the processing required to get the API
ready to perform a search that will match the query. If the
application program wishes to, it can define a function that will
be called by the setmmapi() function to perform editing of the
word lists and query items before the initialization is completed
(this is not required).
The getmmapi() function performs the actual search of the data.
All that is required is to pass the getmmapi() function the
beginning and ending locations of the data to be searched. There
are two operations that may be performed with the getmmapi() call:
SEARCHNEWBUF and CONTINUESEARCH. Because there may be multiple
hits within a single buffer, SEARCHNEWBUF tells the API to locate
the first hit, and successive calls with CONTINUESEARCH locate all
the remaining hits in the buffer.
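The two-operation calling pattern can be sketched in C as follows.
This is only an illustration: toy_get(), TOYSEARCH,
TOY_SEARCHNEWBUF, and TOY_CONTINUESEARCH are hypothetical stand-ins
that match a single literal word, so that the example compiles
without the Metamorph library. The sketch also assumes that a NULL
return means there are no more hits in the buffer, which the text
above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; this is NOT Metamorph. */
enum { TOY_SEARCHNEWBUF, TOY_CONTINUESEARCH };

typedef struct {
    const char *word;   /* literal to find (a real query is far richer) */
    char *resume;       /* where the next TOY_CONTINUESEARCH call starts */
} TOYSEARCH;

/*
 * Same argument shape as getmmapi(): search [buf, endofbuf) and return a
 * pointer to the hit, or NULL when there is none (assumed convention).
 */
char *toy_get(TOYSEARCH *ts, char *buf, char *endofbuf, int operation)
{
    size_t wlen = strlen(ts->word);
    char *p = (operation == TOY_SEARCHNEWBUF) ? buf : ts->resume;

    assert(wlen > 0 && buf <= endofbuf);
    for (; p != NULL && (size_t)(endofbuf - p) >= wlen; p++) {
        if (memcmp(p, ts->word, wlen) == 0) {
            ts->resume = p + wlen;   /* continue just past this hit */
            return p;
        }
    }
    ts->resume = NULL;               /* buffer exhausted */
    return NULL;
}

/* One "new buffer" call, then "continue" calls until no hit is left. */
int count_hits(TOYSEARCH *ts, char *buf, char *endofbuf)
{
    int hits = 0;
    char *hit;

    for (hit = toy_get(ts, buf, endofbuf, TOY_SEARCHNEWBUF);
         hit != NULL;
         hit = toy_get(ts, buf, endofbuf, TOY_CONTINUESEARCH))
        hits++;
    return hits;
}
```

With the real API, the loop in count_hits() would presumably take
the same shape, calling getmmapi() with SEARCHNEWBUF first and with
CONTINUESEARCH on every later pass.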
The infommapi() function returns information about a hit to the
caller; it will give the following information:
- Where the hit is located within the buffer.
- The overall length of the hit.
- For each set in the search that was matched:
  - The query set searched for and located.
  - The location of the set item.
  - The length of the set item.
- The location and length of the start and end delimiters.
The closemmapi() function cleans up and releases the memory
allocated by the openmmapi() call.
The last of the important calls in the API is the function that
reads data in from files. Your application may not need this
function, but if files are being read in as text streams its use
is required.
int rdmmapi(char *buf,int n,FILE *fh,MMAPI *mm)
This function works very much like fread() with one important
exception: it guarantees that a hit will not be broken across a
buffer boundary. The way it works is as follows:
- A normal fread() for the number of requested bytes is performed.
- rdmmapi() searches backwards from the end of the buffer for an
  occurrence of the ending delimiter regular-expression.
- The data that is beyond the last occurrence of an ending
  delimiter is pushed back into the input stream. (The method
  that is used depends on whether an fseek() can be performed or
  not.)
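The three steps above can be sketched with standard C library
calls only. This is not the rdmmapi() source: read_whole_units() is
a hypothetical name, the end delimiter is assumed to be a fixed
string rather than a regular expression, and only the fseek()
push-back method is shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the read strategy described for rdmmapi().
 * Assumptions: `delim` is a fixed, non-empty string (the real API uses a
 * regular expression) and `fh` is seekable.
 * Returns the number of bytes kept in buf. When a delimiter is found, buf
 * ends just after its last occurrence; otherwise all bytes read are kept.
 */
size_t read_whole_units(char *buf, size_t n, FILE *fh, const char *delim)
{
    size_t dlen = strlen(delim);
    size_t got, i;

    assert(buf != NULL && fh != NULL && dlen > 0);

    /* Step 1: a normal fread() for the requested number of bytes. */
    got = fread(buf, 1, n, fh);
    if (got < dlen)
        return got;

    /* Step 2: scan backwards for the last occurrence of the delimiter. */
    for (i = got - dlen + 1; i-- > 0; ) {
        if (memcmp(buf + i, delim, dlen) == 0) {
            size_t keep = i + dlen;

            /* Step 3: push the bytes after the delimiter back by seeking. */
            if (keep < got && fseek(fh, -(long)(got - keep), SEEK_CUR) != 0)
                return got;   /* cannot seek: hand back everything read */
            return keep;
        }
    }
    return got;               /* no delimiter in this buffer */
}
```

Calling read_whole_units() repeatedly until it returns 0 walks the
file in chunks that each end on a delimiter, except for a final
chunk that has no delimiter at all. A buffer smaller than the
longest delimited unit would still split that unit in this sketch.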
Copyright © Thunderstone Software Last updated: Sun Mar 17 21:14:49 EDT 2013