nexedi / MariaDB
Commit 239f0714, authored Apr 19, 2002 by serg@serg.mysql.com
boolean fulltext search weighting scheme changed
parent 5f2d79c5
Showing 4 changed files with 49 additions and 3 deletions
.bzrignore +1 -0
Docs/internals.texi +44 -1
Docs/manual.texi +2 -0
myisam/ft_boolean_search.c +2 -2
.bzrignore
@@ -463,3 +463,4 @@ mysql-test/r/rpl000001.eval
 Docs/safe-mysql.xml
 mysys/test_vsnprintf
 Docs/manual.de.log
+Docs/internals.info
Docs/internals.texi
@@ -57,6 +57,7 @@ This is a manual about @strong{MySQL} internals.
 * mysys functions::  Functions In The @code{mysys} Library
 * DBUG::  DBUG Tags To Use
 * protocol::  MySQL Client/Server Protocol
+* Fulltext Search::  Fulltext Search in MySQL
 @end menu
@@ -535,7 +536,7 @@ Print query.
 @end table
-@node protocol,  , DBUG, Top
+@node protocol, Fulltext Search, DBUG, Top
 @chapter MySQL Client/Server Protocol
 @menu
@@ -785,6 +786,48 @@ Date 03 0A 00 00 |01 0A |03 00 00 00
 @c @printindex fn
+@node Fulltext Search, , protocol, Top
+@chapter Fulltext Search in MySQL
+
+Hopefully, at some point there will be a complete description of
+the fulltext search algorithms.
+For now, these are just unsorted notes.
+
+@menu
+* Weighting in boolean mode::
+@end menu
+
+@node Weighting in boolean mode, , , Fulltext Search
+@section Weighting in boolean mode
+
+The basic idea is as follows: in the expression
+@code{A or B or (C and D and E)}, either @code{A} or @code{B} alone
+is enough to match the whole expression, while @code{C},
+@code{D}, and @code{E} must @strong{all} match. So it is
+reasonable to assign weight 1 to @code{A}, @code{B}, and
+@code{(C and D and E)}, and @code{C}, @code{D}, and @code{E}
+should each get a weight of 1/3.
+
+Things become more complicated when considering the boolean
+operators used in MySQL FTB. Obviously, @code{+A +B}
+should be treated as @code{A and B}, and @code{A B}
+as @code{A or B}. The problem is that @code{+A B} can @strong{not}
+be rewritten in and/or terms (that is the reason why this extended
+set of operators was chosen). Still, approximations can be used.
+@code{+A B C} can be approximated as @code{A or (A and (B or C))}
+or as @code{A or (A and B) or (A and C) or (A and B and C)}.
+
+Applying the above logic (and omitting mathematical
+transformations and normalization), one gets that for
+@code{+A_1 +A_2 ... +A_N B_1 B_2 ... B_M} the weights
+should be: @code{A_i = 1/N}, @code{B_j = 1} if @code{N==0}, and,
+otherwise, in the first rewriting approach @code{B_j = 1/3},
+and in the second one
+@code{B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1))}.
+
+The second expression gives a somewhat steeper increase in total
+weight as the number of matched B's increases, because it assigns
+higher weights to the individual B's. Also, the first expression is
+much simpler, so it is the first one that is implemented in MySQL.
 @summarycontents
 @contents
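The weights described in the new "Weighting in boolean mode" section can be checked numerically. The following is a minimal sketch in C, not code from this commit: the function names `required_term_weight`, `optional_weight_first`, and `optional_weight_second` are hypothetical and only transcribe the formulas from the note, with `n_required` standing for N and `m_optional` for M.

```c
#include <assert.h>

/* Weight of each required term A_i in "+A_1 ... +A_N B_1 ... B_M": 1/N.
   Hypothetical helper; only meaningful when there is at least one '+' term. */
double required_term_weight(unsigned n_required)
{
  assert(n_required > 0);
  return 1.0 / n_required;
}

/* Weight of each optional term B_j under the first rewriting approach:
   1 when there are no required terms (N == 0), otherwise 1/3. */
double optional_weight_first(unsigned n_required)
{
  return n_required == 0 ? 1.0 : 1.0 / 3.0;
}

/* Weight of each optional term B_j under the second rewriting approach:
   (1 + (M-1)*2^M) / (M*(2^(M+1) - 1)), or 1 when N == 0. */
double optional_weight_second(unsigned n_required, unsigned m_optional)
{
  double two_m;

  /* The shift below limits M; the bound is an assumption of this sketch. */
  assert(m_optional >= 1 && m_optional < 62);
  if (n_required == 0)
    return 1.0;
  two_m = (double)(1ULL << m_optional);       /* 2^M */
  return (1.0 + (m_optional - 1) * two_m) /
         (m_optional * (2.0 * two_m - 1.0));  /* 2^(M+1) - 1 == 2*2^M - 1 */
}
```

For M = 1 the two approaches agree at 1/3. For M = 2 the second approach gives 5/14 and for M = 3 it gives 17/45, both above 1/3, which matches the note's remark that the second expression assigns higher weights to individual B's.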
Docs/manual.texi
@@ -48933,6 +48933,8 @@ Our TODO section contains what we plan to have in 4.0. @xref{TODO MySQL 4.0}.
 @itemize @bullet
 @item
+Boolean fulltext search weighting scheme changed to something more reasonable.
+@item
 Fixed bug in boolean fulltext search, that caused MySQL to ignore queries of
 @code{ft_min_word_len} characters.
 @item
myisam/ft_boolean_search.c
@@ -322,7 +322,7 @@ void _ftb_climb_the_tree(FTB *ftb, FTB_WORD *ftbw, FT_SEG_ITERATOR *ftsi_orig)
       break;
     if (yn & FTB_FLAG_YES)
     {
-      ftbe->cur_weight += weight;
+      ftbe->cur_weight += weight / ftbe->ythresh;
       if (++ftbe->yesses == ythresh)
       {
         yn = ftbe->flags;
@@ -360,7 +360,7 @@ void _ftb_climb_the_tree(FTB *ftb, FTB_WORD *ftbw, FT_SEG_ITERATOR *ftsi_orig)
     }
     else
     {
-      ftbe->cur_weight += weight;
+      ftbe->cur_weight += ftbe->ythresh ? weight / 3 : weight;
       if (ftbe->yesses < ythresh)
         break;
       yn = (ftbe->yesses++ == ythresh) ? ftbe->flags : 0;
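The two changed lines in `_ftb_climb_the_tree` can be read in isolation as follows. This is a simplified sketch rather than the MyISAM code: `expr_node`, `add_required_match`, and `add_optional_match` are hypothetical names, and it assumes that `ythresh` counts the required (`+`) children of an expression, which is how the `A_i = 1/N` rule in `Docs/internals.texi` suggests it is used. The flag handling and tree climbing of the real function are left out.

```c
#include <assert.h>

/* Hypothetical, stripped-down stand-in for an FTB expression node. */
typedef struct {
  float cur_weight;   /* weight accumulated from matched children */
  unsigned ythresh;   /* assumed: number of required ('+') children, N */
  unsigned yesses;    /* required children matched so far */
} expr_node;

/* A required child matched: it contributes weight/N, mirroring
   "cur_weight += weight / ftbe->ythresh" in the first hunk. */
void add_required_match(expr_node *e, float weight)
{
  assert(e->ythresh > 0);   /* a required child implies N >= 1 */
  e->cur_weight += weight / e->ythresh;
  e->yesses++;
}

/* An optional child matched: it contributes weight/3 when the expression
   also has required children, and the full weight otherwise, mirroring
   "cur_weight += ftbe->ythresh ? weight/3 : weight" in the second hunk. */
void add_optional_match(expr_node *e, float weight)
{
  e->cur_weight += e->ythresh ? weight / 3 : weight;
}
```

Under these assumptions, an expression such as `+A +B C` whose terms all carry weight 1 accumulates 1/2 + 1/2 + 1/3, while a purely optional expression such as `A B` still adds the full weight of each matched term.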