|
| 10.3.2 Delimiting an Entire Region |
|
We want to do a text search against our table for a user's query,
but limit the results to, say, a 100x100 mile box centered on a
specific location, say the center of the user's city. Thus we
can find results in adjacent cities, states or ZIP codes that aren't
given in our query but are nonetheless physically nearby.
We modify our table schema and add the following columns:
- Lat
The latitude of the attraction, as an integer, in DDDMMSS
format. Thus, 40 degrees 30 minutes 45 seconds north latitude
is 403045.
- Lon
The longitude, in the same format.
- GeoCode
A long
field which we'll use in a moment.
We also need a table, city
, which lists cities and
their latitude/longitudes, so we can look them up. (Thunderstone
has such data already available, if you contact us.)
The slow/impossible way
At first, this would seem to be the best way to limit our
search to a boxed region:
<SQL MAX=1 "select lat, lon from city
where Place\State like $city
order by Pop desc">
<SQL MAX=10 "select Name from tourist
where Desc likep $query
and Lat between $lat - 10000 and $lat + 10000
and Lon between $lon - 10000 and $lon + 10000">
$Name <BR>
</SQL>
</SQL>
|
First, we look up the user's given $city
in our city
table. This gives us $lat
and $lon
for the center of town. (Note the order-by: we want larger cities
first, eg. Cleveland, OH is probably preferred over Cleveland, TN
if the state wasn't mentioned.)
2 degrees is roughly 120 miles, so we search within +/- 1 degree
of $lat
/$lon
in our inner <SQL>
. Our
latitude/longitude data is in DDDMMSS format, as integers, so 10000
is 1 degree 0 minutes 0 seconds. Now we have a boxed regional
search, centered on a specific location.
The problem
The problem is the post-processing required here. Any of the
three clauses in the inner query, the $query
clause, the
latitude search, and the longitude search, could easily return a
huge number of rows by themselves, no matter which we index
and we put first. Even though the overall result may be a small
number of rows, a lot of index rows would have to be ANDed to
produce the results - slow.
The better way
Our actual geographic region is very small. If we could somehow
tie the latitude and longitude together into one value, a search
against it would return much fewer rows from such a (combined) index.
But that requires a two-dimensional search against a one-dimensional
index; how do we do that?
The <geo2code>
function helps us out here. Available
in version 2.1.904800000 (Sep. 2 1998) and later, it takes a
latitude/longitude pair - in DDDMMSS format - and returns a single
integer that we can store in our GeoCode
column for each row.
We then create a normal index on this column (and drop our lat/long
indexes, by the way; we won't need them). The between
operator has special knowledge of how to search such a column, when
given a pair of encoded integers.
First, we need to encode all our locations in the table. It's
faster to do this by copying to another table than updating in
place. We run this code once to create our new tourist
table:
<SQL ROW "select * from oldtourist">
<geo2code $Lat $Lon>
<SQL MAX=1 NOVARS "insert into tourist
values($Name, $Desc, $Lat, $Lon, $ret)">
</SQL>
</SQL>
|
For each row, <geo2code>
returns a long containing
both latitude and longitude, encoded in a special way. Note the ROW
flag to avoid wasting memory. Then we drop oldtourist
and make our indexes on our new tourist
table: Metamorph
compound inverted on Desc,GeoCode
:
create metamorph inverted index xdesc on tourist(Desc,GeoCode);
Now we're ready to modify our search:
<SQL MAX=1 "select lat, lon from city
where Place\State like $city
order by Pop desc">
<geo2code $lat $lon 10000>
<SQL "select Name
from tourist
where Desc like $query and GeoCode between " $ret>
$Name
</SQL>
</SQL>
|
Here we added a third parameter to <geo2code>
, a
"radius". Given a radius (in DDDMMSS format), <geo2code>
returns a string containing 2 numbers, instead of one. These
two numbers represent the boxed region centered on $lat
/
$lon
with sides of length 20000 (ie. a "radius" of
10000).
We pass this encoding to our SQL statement (but as part of the
statement, not a parameter, because Texis must accept the
double-value that way).
Advantages
Now we've got the ability to set an exact box over our desired
region, regardless of state boundaries. And it's far faster than
our top example, because lat/long are combined into one value.
Also, because of the compound index, we can resolve both the
like
and the latitude/longitude part of the query with
one index.
Disadvantages
The enclosing region is a box, whereas we might want an exact
circle for radius. It's possible to add the math to check the
results and make the square region into a circle, if that's of
concern. The main point to understand here is that with <geo2code>
, we've reduced the enormous cost of regional
searching combined with text searches to a manageable speed.
The hard part is done; post-processing a small result set into
a circle is much faster than trying to process the entire data set.
|