Welcome to dbFreaks.com!
FAQFAQ      ProfileProfile    Private MessagesPrivate Messages   Log inLog in

Dynamic CONTAINSTABLE query issue

 
   Database Help (Home) -> Full Text RSS
Next:  Variable defaults to zero in script task (SSIS)  
Author Message
Nightcrawler

External


Since: Feb 07, 2008
Posts: 12



(Msg. 1) Posted: Thu Oct 02, 2008 3:50 pm
Post subject: Dynamic CONTAINSTABLE query issue
Archived from groups: microsoft>public>sqlserver>fulltext (more info?)

I am adding a search functionality to my website using a full-text
index but I am running into a problem.

Currently, if a user is searching for "love and happiness", I split
the keywords down using whitespace and add NEAR in between them so it
would look like "love NEAR and NEAR happiness". As you can see the
query is dynamically built using code.

The full query would look something like:

SELECT Description
FROM Catalog INNER JOIN
CONTAINSTABLE(dbo.Catalog, Description, ("love NEAR and NEAR
happiness") AS KEY_TBL ON Catalog.CatalogId = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC

The problem is that this will not return anything because of the "and"
keyword. I assume this is because the word "and" is a reserved keyword
and it will mess the query up. Can anyone confirm?

How should I solve this issue? One way to go about it would be to take
all reserved keywords and put them into a table then check if the
search keywords contain any reserved keywords. If so, remove them from
the string as it is built. Before I go ahead with this I want to
ensure that this indeed is a good solution to the problem.

Any input would be greatfully appreciated.

Thanks

 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Nightcrawler

External


Since: Feb 07, 2008
Posts: 12



(Msg. 2) Posted: Fri Oct 03, 2008 7:52 am
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Well, what I am saying is that search criteria is user based which
means I have no control over what the user types in to the serach box.
So when I tried to search for "love and happiness" my query did not
return anything. However if I removed "and" it did. I was looking to
see:

1. Why this is happening? Is this because i use reserved keywords? If
os, how do you get around it.
2. Is there something fundementaloly wrong with my query?

Thaks fo your help.

 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Russell Fields

External


Since: Jan 10, 2008
Posts: 518



(Msg. 3) Posted: Fri Oct 03, 2008 2:33 pm
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Nightcrawler,

If you look at the definition of CONTAINSTABLE in the Books Online:
http://msdn.microsoft.com/en-us/library/ms189760.aspx

It lists as operators the following string:
{ { AND | & } | { AND NOT | &! } | { OR | | } }
And NEAR is a proximity term.

So, yes, your method is causing the full text engine to look for something
difficult to find. For some people, when the user types AND he is trying to
type LOVE & HAPPINESS, not the three word phrase. LOVE NEAR AND NEAR
HAPPINESS with the proximities around the operator changes what is being
asked.

Simple tests to try to figure out what is going on, assuming that you have
data to support this query:

CONTAINSTABLE (table, column, '"red and not"')
CONTAINSTABLE (table, column, 'red and not')
CONTAINSTABLE (table, column, 'red and "not"')
CONTAINSTABLE (table, column, 'red and not blue')
CONTAINSTABLE (table, column, 'red and "not" blue')
CONTAINSTABLE (table, column, 'red "not"')
CONTAINSTABLE (table, column, 'red and not "not"')
CONTAINSTABLE (table, column, 'red and not not')
et cetera

Now add NEAR into the mix.

So, you nead to do more than just replace ' ' with ' AND ', you will need to
take into account the operator and proximity words as well. And maybe train
your users on what they should mean when they type in a query string.

RLF


"Nightcrawler" <thomas.zaleski.RemoveThis@gmail.com> wrote in message
news:0d8c8d8e-4e46-4d29-b2e9-672d724e8c28@k37g2000hsf.googlegroups.com...
> Well, what I am saying is that search criteria is user based which
> means I have no control over what the user types in to the serach box.
> So when I tried to search for "love and happiness" my query did not
> return anything. However if I removed "and" it did. I was looking to
> see:
>
> 1. Why this is happening? Is this because i use reserved keywords? If
> os, how do you get around it.
> 2. Is there something fundementaloly wrong with my query?
>
> Thaks fo your help.
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Nightcrawler

External


Since: Feb 07, 2008
Posts: 12



(Msg. 4) Posted: Mon Oct 06, 2008 7:34 am
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Russell,

Thanks for you response. I guess what I am really looking to
understand is how search engines using full text indexing solve the
problem of users inputting words as "AND" into the textbox.

It seems to me (if I am wrong please point this out as I am trying to
learn) that wether I use CONTAINSTABLE or FREETEXTTABLE there is
always the risk of the end user inputting and seraching using reserved
keywords as "AND" which can cause the query to return nothing (as my
expample shows above). I also don't belive that educating the end user
is the solution. I don't see Google educating their users as to how
NOT to search. If I go to google and type in "love and happiness" I
get many results. I also do understand that Google probably does not
use SQL Server Full text indexing so the comparison might be unfair.

This leads me to belive two things:

1. I am not building my query correctly. Instead of breaking the word
down into pieces and sticking "NEAR" in between them I should be doing
this another way. "Love and Happiness" is a song title and so far
using NEAR has given me the most accurate results for search for song
titles. Maybe there is another way, if so please let me know.

2. Search engines are stripping out reserved keywords before they
build the query. Again, this is a wild assumtion I have and I might be
totally wrong.

Again, any input is highly appreciated.

Thanks
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Russell Fields

External


Since: Jan 10, 2008
Posts: 518



(Msg. 5) Posted: Mon Oct 06, 2008 11:36 am
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Nightcrawler,

I am sure that I don't understand everything about widely used search
engines. Also, by 'training' your users, it does not necessarily mean
giving them a seminar in Boolean logic, but may mean a user interface that
prompts them to success.

Of course, SQL Server Full-Text indexing is not used by Google (as you
observed) so the details are different. I am sure that they have spent a
whole lot of time tweaking their algorithms to answer the general questions
most effectively. If you play with Google a bit you will see what I mean:

In the general query, you get pretty much the same results (except ordering)
with the following queries
Frank Sinatra AND
AND Frank Sinatra
Frank AND Sinatra

It seems that Google does not pay a lot of attention to the AND as a command
in that case. (Maybe something like Frank AND AND AND Sinatra? I don't
know.) For Google, NEAR is definitely just another word to search for.

However, on Google, if you press the Advanced Search link, you will see
choices like this:
all these words:
this exact wording or phrase:
one or more of these words: OR OR

So you see that they are 'training' any users who want to be more specific,
by giving them choices of how to enter their question. ...all these words:
line is the standard Google query, etc.

For me, I guess that inserting NEAR automatically is making an assumption
about the desired results that may not always match the person's needs.
Providing a slightly richer interface will let people be more specific when
they want to be. (It could include a check box for when you want to add
NEAR into the mix, but in the 'love and happiness' example, the search would
be best fitted by '"love and happiness"'.)

FWIW,
RLF






"Nightcrawler" <thomas.zaleski.TakeThisOut@gmail.com> wrote in message
news:fa025aee-9fd1-4647-a467-1ac8a5cb6d49@v28g2000hsv.googlegroups.com...
> Russell,
>
> Thanks for you response. I guess what I am really looking to
> understand is how search engines using full text indexing solve the
> problem of users inputting words as "AND" into the textbox.
>
> It seems to me (if I am wrong please point this out as I am trying to
> learn) that wether I use CONTAINSTABLE or FREETEXTTABLE there is
> always the risk of the end user inputting and seraching using reserved
> keywords as "AND" which can cause the query to return nothing (as my
> expample shows above). I also don't belive that educating the end user
> is the solution. I don't see Google educating their users as to how
> NOT to search. If I go to google and type in "love and happiness" I
> get many results. I also do understand that Google probably does not
> use SQL Server Full text indexing so the comparison might be unfair.
>
> This leads me to belive two things:
>
> 1. I am not building my query correctly. Instead of breaking the word
> down into pieces and sticking "NEAR" in between them I should be doing
> this another way. "Love and Happiness" is a song title and so far
> using NEAR has given me the most accurate results for search for song
> titles. Maybe there is another way, if so please let me know.
>
> 2. Search engines are stripping out reserved keywords before they
> build the query. Again, this is a wild assumtion I have and I might be
> totally wrong.
>
> Again, any input is highly appreciated.
>
> Thanks
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Russell Fields

External


Since: Jan 10, 2008
Posts: 518



(Msg. 6) Posted: Mon Oct 06, 2008 12:05 pm
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Oh, and about AND. If you don't want your users to type a general string
and have AND, OR, NOT, NEAR recognized as directives, your best bets might
be a) to remove those words from the search criterian or b) insert AND
between every word.

a) love and happiness = love happiness
b) love and happiness = love and and and happiness

Neither of those will favor the phrase "love and happiness" of course, but
b) might be close enough in some cases.

FWIW,
RLF

"Nightcrawler" <thomas.zaleski.TakeThisOut@gmail.com> wrote in message
news:fa025aee-9fd1-4647-a467-1ac8a5cb6d49@v28g2000hsv.googlegroups.com...
> Russell,
>
> Thanks for you response. I guess what I am really looking to
> understand is how search engines using full text indexing solve the
> problem of users inputting words as "AND" into the textbox.
>
> It seems to me (if I am wrong please point this out as I am trying to
> learn) that wether I use CONTAINSTABLE or FREETEXTTABLE there is
> always the risk of the end user inputting and seraching using reserved
> keywords as "AND" which can cause the query to return nothing (as my
> expample shows above). I also don't belive that educating the end user
> is the solution. I don't see Google educating their users as to how
> NOT to search. If I go to google and type in "love and happiness" I
> get many results. I also do understand that Google probably does not
> use SQL Server Full text indexing so the comparison might be unfair.
>
> This leads me to belive two things:
>
> 1. I am not building my query correctly. Instead of breaking the word
> down into pieces and sticking "NEAR" in between them I should be doing
> this another way. "Love and Happiness" is a song title and so far
> using NEAR has given me the most accurate results for search for song
> titles. Maybe there is another way, if so please let me know.
>
> 2. Search engines are stripping out reserved keywords before they
> build the query. Again, this is a wild assumtion I have and I might be
> totally wrong.
>
> Again, any input is highly appreciated.
>
> Thanks
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Daniel Crichton

External


Since: Feb 04, 2008
Posts: 59



(Msg. 7) Posted: Tue Oct 07, 2008 6:25 am
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Nightcrawler wrote on Mon, 6 Oct 2008 07:34:59 -0700 (PDT):

> Russell,

> Thanks for you response. I guess what I am really looking to understand
> is how search engines using full text indexing solve the problem of
> users inputting words as "AND" into the textbox.

> It seems to me (if I am wrong please point this out as I am trying to
> learn) that wether I use CONTAINSTABLE or FREETEXTTABLE there is always
> the risk of the end user inputting and seraching using reserved
> keywords as "AND" which can cause the query to return nothing (as my
> expample shows above). I also don't belive that educating the end user
> is the solution. I don't see Google educating their users as to how
> NOT to search. If I go to google and type in "love and happiness" I get
> many results. I also do understand that Google probably does not use
> SQL Server Full text indexing so the comparison might be unfair.

> This leads me to belive two things:

> 1. I am not building my query correctly. Instead of breaking the word
> down into pieces and sticking "NEAR" in between them I should be doing
> this another way. "Love and Happiness" is a song title and so far using
> NEAR has given me the most accurate results for search for song titles.
> Maybe there is another way, if so please let me know.

> 2. Search engines are stripping out reserved keywords before they build
> the query. Again, this is a wild assumtion I have and I might be
> totally wrong.

Personally, for my own sites, I use #2. Every search string entered is
stripped, parsed, and rebuilt. Words like AND, OR, NOT are used as boolean
operators unless they appear within double quoted phrase sections. For
instance, if you typed in

windows and xp

on one of my sites it would get passed to FTS as

windows and xp

but if you entered

"windows and xp"

it would be passed through to FTS as

"windows and xp"

Over the years I've added a lot of complex query handling (AND,OR,NOT,
parentheses, phrase quoting, + and - to denote words that must or must not
be included, and various other tweaks) but on the whole most queries boil
down to just putting AND between every word. I've not used NEAR yet, it's
something I've considered a few times but haven't gotten around to testing
for suitability with what is being entered by users on my sites.

--
Dan
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Nightcrawler

External


Since: Feb 07, 2008
Posts: 12



(Msg. 8) Posted: Tue Oct 07, 2008 9:50 am
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Dan,

Thanks for you reply. Looks like I might not be that crazy after all,
phew...

On another note, when you parse and rebuild the search criteria do you
strip out all SQL server reserved words? The reason I ask is because
there are quite a few.

Thanks
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Daniel Crichton

External


Since: Feb 04, 2008
Posts: 59



(Msg. 9) Posted: Wed Oct 08, 2008 5:25 am
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

Nightcrawler wrote on Tue, 7 Oct 2008 09:50:41 -0700 (PDT):

> Dan,

> Thanks for you reply. Looks like I might not be that crazy after all,
> phew...

> On another note, when you parse and rebuild the search criteria do you
> strip out all SQL server reserved words? The reason I ask is because
> there are quite a few.


If you mean noise word file, my noise file is empty. This way I don't have
to worry about stripping out noise words from the queries.

--
Dan
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Michael Coles

External


Since: Oct 15, 2008
Posts: 3



(Msg. 10) Posted: Wed Oct 15, 2008 2:41 pm
Post subject: Re: Dynamic CONTAINSTABLE query issue [Login to view extended thread Info.]
Archived from groups: per prev. post (more info?)

There's an sp_configure setting (I believe it's "transform noise words")
that causes your noise words like "and" to be transformed into wildcards
("*"). This setting should allow you to query:

love NEAR and NEAR happiness

Since the "and" will be converted to "*". You might want to check this out
as well:
http://www.sqlservercentral.com/articles/Full-Text+Search+(2008)/64248/


"Nightcrawler" <thomas.zaleski.DeleteThis@gmail.com> wrote in message
news:bec8e775-b674-40da-aa9d-3f39f6ca04e6@r66g2000hsg.googlegroups.com...
>I am adding a search functionality to my website using a full-text
> index but I am running into a problem.
>
> Currently, if a user is searching for "love and happiness", I split
> the keywords down using whitespace and add NEAR in between them so it
> would look like "love NEAR and NEAR happiness". As you can see the
> query is dynamically built using code.
>
> The full query would look something like:
>
> SELECT Description
> FROM Catalog INNER JOIN
> CONTAINSTABLE(dbo.Catalog, Description, ("love NEAR and NEAR
> happiness") AS KEY_TBL ON Catalog.CatalogId = KEY_TBL.[KEY]
> ORDER BY KEY_TBL.RANK DESC
>
> The problem is that this will not return anything because of the "and"
> keyword. I assume this is because the word "and" is a reserved keyword
> and it will mess the query up. Can anyone confirm?
>
> How should I solve this issue? One way to go about it would be to take
> all reserved keywords and put them into a table then check if the
> search keywords contain any reserved keywords. If so, remove them from
> the string as it is built. Before I go ahead with this I want to
> ensure that this indeed is a good solution to the problem.
>
> Any input would be greatfully appreciated.
>
> Thanks
 >> Stay informed about: Dynamic CONTAINSTABLE query issue 
Back to top
Login to vote
Display posts from previous:   
   Database Help (Home) -> Full Text All times are: Pacific Time (US & Canada) (change)
Page 1 of 1

 
You can post new topics in this forum
You can reply to topics in this forum
You can edit your posts in this forum
You can delete your posts in this forum
You can vote in polls in this forum



[ Contact us | Terms of Service/Privacy Policy ]