- 17 May, 2002 36 commits
-
Tim Peters authored
tests that currently fail are currently commented out. Key question: If someone does a search on a stopword, and nothing else is in the query, what do we want to do? Return all docs in a random order? Return no docs? Raise an exception? Second question: What if someone does a query on rare_word AND NOT stop_word?
-
Jeremy Hylton authored
do the same query and work as ZCTextIndex would do. Produce a result set, pump it into NBest, and extract the 10 best.
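The "produce a result set, pump it into NBest, and extract the 10 best" step might look roughly like the sketch below. This is only an illustration: it uses the standard library's `heapq.nlargest` as a stand-in rather than the actual NBest class, and the result-set shape (a mapping of docid to score) and the name `best_n` are assumptions.

```python
import heapq


def best_n(results, n=10):
    """Return the n highest-scoring (docid, score) pairs, best first.

    `results` is assumed to be a mapping of docid -> score.
    """
    return heapq.nlargest(n, results.items(), key=lambda item: item[1])


# Made-up scores for 29 documents, purely for demonstration.
results = {docid: (docid * 7) % 13 for docid in range(1, 30)}
top = best_n(results, 10)
```

Selecting with a bounded heap avoids sorting the whole result set when only a handful of hits are wanted.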
-
Tim Peters authored
-
Jeremy Hylton authored
but works with a TextIndex Lexicon.
-
Tim Peters authored
the index knows about the doc and the wid. _del_wordinfo and _add_wordinfo: s/map/doc2score/g. map is a builtin function, and it's needlessly confusing to name a vrbl that too.
-
Tim Peters authored
-
Jeremy Hylton authored
-
Jeremy Hylton authored
I think that the default Lexicon for TextIndex does not use a stop word list. For the comparison with ZCTextIndex, explicitly pass the default stop word dict from TextIndex to the lexicon.
-
Jeremy Hylton authored
-
Jeremy Hylton authored
In unindex_doc() call _del_wordinfo() for each unique wid in the doc, not for each wid. Before we had WidCode and phrase searching, _docwords stored a list of the unique wids. The unindex code wasn't updated when _docwords started storing all the wids, even duplicates. Replace the try/except around __getitem__ in _add_wordinfo() with a .get() call. Add XXX comment about the purpose of the try/except(s) in _del_wordinfo(). I suspect they only existed because _del_wordinfo() was called repeatedly when a wid existed more than once.
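A minimal sketch of the two changes described above, assuming a toy index where `_wordinfo` maps wid to a `{docid: score}` dict and `_docwords` maps docid to the full list of wids (duplicates included). The class `TinyIndex`, its scoring, and the plain-dict storage are hypothetical simplifications, not the real ZCTextIndex code.

```python
class TinyIndex:
    """Hypothetical toy index, for illustration only."""

    def __init__(self):
        self._wordinfo = {}   # wid -> {docid: score}
        self._docwords = {}   # docid -> [wid, ...], duplicates kept

    def index_doc(self, docid, wids):
        self._docwords[docid] = list(wids)
        for wid in set(wids):
            # Toy score: the raw count of the wid in the document.
            self._add_wordinfo(wid, float(wids.count(wid)), docid)

    def _add_wordinfo(self, wid, score, docid):
        # .get() instead of try/except around __getitem__.
        doc2score = self._wordinfo.get(wid)
        if doc2score is None:
            doc2score = self._wordinfo[wid] = {}
        doc2score[docid] = score

    def _del_wordinfo(self, wid, docid):
        doc2score = self._wordinfo[wid]
        del doc2score[docid]
        if not doc2score:
            del self._wordinfo[wid]

    def unindex_doc(self, docid):
        # Iterate over the unique wids: calling _del_wordinfo() twice
        # for the same (wid, docid) pair would raise KeyError.
        for wid in set(self._docwords[docid]):
            self._del_wordinfo(wid, docid)
        del self._docwords[docid]


index = TinyIndex()
index.index_doc(1, [5, 5, 7])   # wid 5 occurs twice in doc 1
index.index_doc(2, [5])
index.unindex_doc(1)
```

With the loop running over `set(...)`, no try/except is needed in `_del_wordinfo()` to absorb repeated deletions of the same wid.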
-
Guido van Rossum authored
A ZODB import is only redundant if it is not used and does not precede an import from Persistence.
-
Tim Peters authored
It's possible to get OOV wids here due to words the lexicon knows about that the index has no current instances of.
-
Tim Peters authored
in the reindexing text.
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
*any* words in common across the versions. Helped Will along by adding a pragmatic comment to his "knocking indeed" rant. Reworked to use the inscrutable magic of dict.setdefault.
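For readers who find `dict.setdefault` inscrutable, here is a generic illustration of the idiom. The code that was actually reworked is not shown in this log, so the function `invert` and its data are made up.

```python
def invert(docwords):
    """Build wid -> [docid, ...] from docid -> [wid, ...]."""
    wid2docs = {}
    for docid, wids in docwords.items():
        for wid in wids:
            # setdefault returns the existing list for wid, or stores
            # and returns the fresh [] if wid has not been seen yet.
            wid2docs.setdefault(wid, []).append(docid)
    return wid2docs


wid2docs = invert({1: [5, 7], 2: [5]})
# wid2docs == {5: [1, 2], 7: [1]}
```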
-
Tim Peters authored
-
Tim Peters authored
take longer to construct now; both indexers' _get_frequencies routines were fiddled to return the same kind of stuff again, and I had previously fiddled the cosine indexer's _get_frequencies to do something weirder but (probably) faster than this.
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
code left in it!
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
get_words, and in calling contexts nothing but a list of wids could possibly make sense.
-
Tim Peters authored
I need a break.
-
Tim Peters authored
indexers. CAUTION: I'm sure I don't understand how persistency needs to be spelled. Is it enough to say just that the base class derives from Persistent, or does that need to be duplicated (or done instead exclusively) in the derived classes? Is there a point to keeping "import ZODB" in the derived-class files? Is there a point to keeping it anywhere <wink>?
-
Tim Peters authored
logic to deal with all cases. All the tests pass again.
-
Tim Peters authored
globToWordIds(): This was building a list of words and then throwing it away without referencing it. Deleted the code.
-
- 16 May, 2002 4 commits
-
Jeremy Hylton authored
If we update a document and reindex it, ZCTextIndex is currently broken. The test passes by virtue of calling unindex_object() after each update, then calling index_object() again. We need to fix our code, and then remove the calls to unindex_object() from the test. XXX This code causes OkapiIndex to fail because it doesn't expect to have no wordinfo for a wid. I tried to fix this in CosineIndex, but I want Tim to think more about it and try to fix OkapiIndex.
-
Jeremy Hylton authored
-
Jeremy Hylton authored
-
Jeremy Hylton authored
This case can arise when the last occurrence of a word is removed, or when a lexicon is shared across multiple indexes. XXX Not sure this code is correct, but it might be and the tests pass. If it's wrong, we need more tests.