Commits · 5da9eb6bc9468485bab0e15cba9eb7d244ff5b8a · Kirill Smelkov / Zope

19 May, 2002 4 commits
- Gave it a "-c NNN" context argument (how many leading lines of result · 5da9eb6b
  Tim Peters authored May 19, 2002
```
msgs to display).  Changed the module docstring to separate the index-
generation args from the query args.
```
- Oops! Call the right routine (typo in code just checked in). · a0360090
  Tim Peters authored May 19, 2002
  
- Beef up the reindexing tests to check that they actually fail before the · 94b452e8
  Tim Peters authored May 19, 2002
```
original doc text gets restored.
```
- QueryParser refactoring step 1: add the lexicon to the constructor args. · bd532bbe
  Guido van Rossum authored May 19, 2002
  
18 May, 2002 5 commits
- Rearrange the Okapi reindexing tests to make it easier to figure out what · 97fbb9c9
  Tim Peters authored May 18, 2002
```
went wrong if they fail.
```
- Restore CONTEXT to its original value. · 466d0130
  Tim Peters authored May 18, 2002
  
- Revert braindead change to final pack (it was my change, so it's OK for · f835a0c2
  Tim Peters authored May 18, 2002
```
me to call it braindead <wink>).
```
- Pack at the end even if the # of msgs isn't an exact multiple of · 1e8f93fb
  Tim Peters authored May 18, 2002
```
PACK_INTERVAL.
```
- Display total pack time at the end. · eb8de680
  Tim Peters authored May 18, 2002
  
17 May, 2002 31 commits
- Special-case None search() results in AND, AND NOT, and OR contexts, and · dfbfbe55
  Tim Peters authored May 17, 2002
```
uncomment the test cases that were failing in these contexts.

Read it and weep <wink>:  In an AND context, None is treated like the
universal set, which jibes with the convenient fiction that stop words
appear in every doc.  However, in AND NOT and OR contexts, None is
treated like the empty set, which doesn't jibe with anything except that
we want

    real_word AND NOT stop_word

and

    real_word OR stop_word

to act like

    real_word

If we treated None as if it were the universal set, these results would
be (respectively) the empty set and the universal set instead.

At a higher level, we *are* consistent with the notion that a query with
a stop word acts the same as if the clause with the stop word weren't
present.  That's what really drives this schizophrenic (context-dependent)
treatment of None.
```
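The context-dependent treatment of `None` described above can be sketched with plain Python sets. This is only an illustration: the function names `and_`, `and_not`, and `or_` are hypothetical, the real index works on weighted result sets rather than bare sets, and the handling of two `None` operands is an assumption the commit message does not spell out.

```python
def and_(left, right):
    """Intersect two results; None acts like the universal set."""
    if left is None:
        return right
    if right is None:
        return left
    return left & right


def and_not(left, right):
    """Subtract right from left; a None right operand acts like the empty set.

    Assumes left is a real result set (not None).
    """
    if right is None:
        return left
    return left - right


def or_(left, right):
    """Union two results; None acts like the empty set."""
    if left is None:
        return right
    if right is None:
        return left
    return left | right


real_word = {1, 2, 3}   # docids matching an ordinary word
stop_word = None        # a stop-word search yields None

# All three queries behave as if the stop-word clause were absent.
print(and_(real_word, stop_word))
print(and_not(real_word, stop_word))
print(or_(real_word, stop_word))
```

In each case the result is just the docids for `real_word`, which matches the higher-level rule that a clause containing only a stop word is ignored.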
- Use the same stop list for both indexes. · f968ebb5
  Jeremy Hylton authored May 17, 2002
  
- testDocUpdate(): assert that the common and unique wordsets aren't · 138b3120
  Tim Peters authored May 17, 2002
```
empty.
```
- Added more little OOV query tests. · 4fe5e70c
  Tim Peters authored May 17, 2002
  
- Added a number of tests to trigger search-can-return-None bugs. The three · f4e63c3e
  Tim Peters authored May 17, 2002
```
tests that currently fail are commented out.

Key question:  If someone does a search on a stopword, and nothing else is
in the query, what do we want to do?  Return all docs in a random order?
Return no docs?  Raise an exception?

Second question:  What if someone does a query on

    rare_word AND NOT stop_word

?
```
- If -T is passed (query with old TextIndex), try as best as possible to · 86e12d94
  Jeremy Hylton authored May 17, 2002
```
do the same query and work as ZCTextIndex would do.

Produce a result set, pump it into NBest, and extract the 10 best.
```
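The "extract the 10 best" step can be approximated as follows. This sketch does not use the NBest class itself; it substitutes the standard library's `heapq.nlargest`, and it assumes the result set is an iterable of `(docid, score)` pairs. The name `n_best` is hypothetical.

```python
import heapq


def n_best(results, n=10):
    """Return the n highest-scoring (docid, score) pairs, best first."""
    return heapq.nlargest(n, results, key=lambda pair: pair[1])


# 25 made-up documents with distinct integer scores.
results = [(docid, (docid * 7) % 25) for docid in range(25)]
for docid, score in n_best(results):
    print(docid, score)
```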
- Reindex docs touching as few docid->w(docid, w) maps as possible. · 86fc53ee
  Tim Peters authored May 17, 2002
  
- Add a little splitter that behaves pretty much like HTMLWordSplitter, · bad257b8
  Jeremy Hylton authored May 17, 2002
```
but works with a TextIndex Lexicon.
```
- _del_wordinfo(): Simplify. It's the caller's responsibility to ensure that · 81682acc
  Tim Peters authored May 17, 2002
```
the index knows about the doc and the wid.

_del_wordinfo and _add_wordinfo:  s/map/doc2score/g.  map is a builtin
function, and it's needlessly confusing to name a vrbl that too.
```
- Improve OOV explanation, based on Guido's feedback. · 92c26bc8
  Tim Peters authored May 17, 2002
  
- Implement unique using an IITreeSet as suggested by Tim. · 9b736188
  Jeremy Hylton authored May 17, 2002
  
- Make sure stop words are used with old TextIndex. · 0d93f320
  Jeremy Hylton authored May 17, 2002
```
I think that the default Lexicon for TextIndex does not use a stop
word list.  For the comparison with ZCTextIndex, explicitly pass the
default stop word dict from TextIndex to the lexicon.
```
- Shorten comment so it fits on line. · 8915733b
  Jeremy Hylton authored May 17, 2002
  
- Two changes and a question posing as a comment. · 504af04c
  Jeremy Hylton authored May 17, 2002
```
In unindex_doc() call _del_wordinfo() for each unique wid in the doc,
not for each wid.  Before we had WidCode and phrase searching,
_docwords stored a list of the unique wids.  The unindex code wasn't
updated when _docwords started storing all the wids, even duplicates.

Replace the try/except around __getitem__ in _add_wordinfo() with a
.get() call.

Add XXX comment about the purpose of the try/except(s) in
_del_wordinfo().  I suspect they only existed because _del_wordinfo()
was called repeatedly when a wid existed more than once.
```
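A rough sketch of the two changes, using plain dicts in place of the index's persistent structures. The standalone functions and the `wordinfo` and `docwords` arguments are assumptions made for illustration; only the names `_add_wordinfo`, `_del_wordinfo`, `unindex_doc`, and `_docwords` come from the commit message.

```python
def add_wordinfo(wordinfo, wid, score, docid):
    """Record score for (wid, docid), using .get() instead of try/except."""
    doc2score = wordinfo.get(wid)
    if doc2score is None:
        doc2score = wordinfo[wid] = {}
    doc2score[docid] = score


def del_wordinfo(wordinfo, wid, docid):
    """Forget (wid, docid); the caller guarantees both are known."""
    doc2score = wordinfo[wid]
    del doc2score[docid]
    if not doc2score:
        del wordinfo[wid]


def unindex_doc(wordinfo, docwords, docid):
    """Remove a document, visiting each *unique* wid exactly once."""
    # docwords[docid] may list a wid several times (needed for phrase
    # search), so deduplicate before deleting to avoid a double delete.
    for wid in set(docwords[docid]):
        del_wordinfo(wordinfo, wid, docid)
    del docwords[docid]


wordinfo, docwords = {}, {1: [10, 11, 10]}   # wid 10 occurs twice in doc 1
for wid in set(docwords[1]):
    add_wordinfo(wordinfo, wid, 1.0, 1)
unindex_doc(wordinfo, docwords, 1)
print(wordinfo, docwords)
```

Without the `set()` call, the second visit to wid 10 would raise `KeyError`, which is the kind of failure the old try/except blocks would have been hiding.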
- Remove redundant imports of ZODB. · cd596b3f
  Guido van Rossum authored May 17, 2002
```
A ZODB import is only redundant if it is not used and does not
precede an import from Persistence.
```
- search_glob(): nuke the OOV wids (if any) before calling _search_wids. · ac419c5b
  Tim Peters authored May 17, 2002
```
It's possible to get OOV wids here due to words the lexicon knows
about that the index has no current instances of.
```
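The filtering step might look roughly like this. The helper name `drop_oov_wids` and the dict-like `wordinfo` argument are hypothetical; the sketch only assumes that the index can report which wids it currently holds.

```python
def drop_oov_wids(wids, wordinfo):
    """Keep only the wids that the index currently has entries for."""
    return [wid for wid in wids if wid in wordinfo]


# The lexicon knows wids 1, 2 and 3, but the index has no docs for wid 2.
wordinfo = {1: {100: 0.5}, 3: {101: 0.25}}
print(drop_oov_wids([1, 2, 3], wordinfo))
```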
- Implement correct (albeit inefficient) reindexing, and stop cheating · 9032867d
  Tim Peters authored May 17, 2002
```
in the reindexing test.
```
- Remove more needless imports. · eebb1a61
  Tim Peters authored May 17, 2002
  
- Put an XXX on the important line. · 3e375ea8
  Tim Peters authored May 17, 2002
  
- testDocUpdate(): Thanks to stop-word removal, there weren't actually · f2a03547
  Tim Peters authored May 17, 2002
```
*any* words in common across the versions.  Helped Will along by adding
a pragmatic comment to his "knocking indeed" rant.  Reworked to use
the inscrutable magic of dict.setdefault.
```
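For readers unfamiliar with `dict.setdefault`, here is one way it can be used to find words common to all versions of a document and words unique to a single version. The test's actual code is not shown in this log, so the function names, the whitespace tokenization, and the sample strings are all assumptions.

```python
def words_by_version(versions):
    """Map each word to the list of version numbers that contain it."""
    word2versions = {}
    for i, text in enumerate(versions):
        for word in set(text.lower().split()):
            # setdefault returns the existing list, or installs a new one.
            word2versions.setdefault(word, []).append(i)
    return word2versions


def common_and_unique(versions):
    """Return (words in every version, words in exactly one version)."""
    word2versions = words_by_version(versions)
    common = {w for w, vs in word2versions.items() if len(vs) == len(versions)}
    unique = {w for w, vs in word2versions.items() if len(vs) == 1}
    return common, unique


common, unique = common_and_unique(["knocking at the door", "knocking indeed"])
print(sorted(common), sorted(unique))
```

If stop-word removal strips every shared word, `common` comes back empty, which is the situation the commit message says the original test had fallen into.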
- Moved a comment that got disconnected from its class. · 35879b41
  Tim Peters authored May 17, 2002
  
- Factor out most of the code for indexing a doc. The cosine index may · 460bcba1
  Tim Peters authored May 17, 2002
```
take longer to construct now; both indexers' _get_frequencies routines
were fiddled to return the same kind of stuff again, and I had
previously fiddled the cosine indexer's _get_frequencies to do something
weirder but (probably) faster than this.
```
- Got rid of _add_undoinfo; it's clearer to do the one-liner inline. · 09afa685
  Tim Peters authored May 17, 2002
  
- Remove unused imports. · 7634c526
  Tim Peters authored May 17, 2002
  
- Factor out the code for unindexing a doc. OkapiIndex has very little · 4b9d6844
  Tim Peters authored May 17, 2002
```
code left in it!
```
- Added get_words() to the interface. · d398b7ba
  Tim Peters authored May 17, 2002
  
- Changed something -- it can't be very important, because I already forgot. · 50f60da8
  Tim Peters authored May 17, 2002
  
- More docstring corrections. · bed1daed
  Tim Peters authored May 17, 2002
  
- Documented that .search() has changed. · 7d79b5a5
  Tim Peters authored May 17, 2002
  
- Combine _{add,del}wordinfo; make length() part of IIndex. · 2af0a364
  Tim Peters authored May 17, 2002
  
- Compute scaled_int the same way everywhere. · 6cb8ab91
  Tim Peters authored May 17, 2002
  