- 18 May, 2002 1 commit
Tim Peters authored
-
- 17 May, 2002 39 commits
Tim Peters authored
uncomment the test cases that were failing in these contexts. Read it and weep <wink>: In an AND context, None is treated like the universal set, which jibes with the convenient fiction that stop words appear in every doc. However, in AND NOT and OR contexts, None is treated like the empty set, which doesn't jibe with anything except that we want "real_word AND NOT stop_word" and "real_word OR stop_word" to act like "real_word". If we treated None as if it were the universal set, these results would be (respectively) the empty set and the universal set instead. At a higher level, we *are* consistent with the notion that a query with a stop word acts the same as if the clause with the stop word weren't present. That's what really drives this schizophrenic (context-dependent) treatment of None.
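A minimal sketch of the context-dependent treatment of None described above, assuming results are plain Python sets of docids and that None stands for a clause made up only of stop words. The function names query_and, query_and_not and query_or are hypothetical and are not ZCTextIndex's actual query evaluator. The commit message doesn't cover "stop_word AND NOT real_word", so returning None in that case is an assumption.

```python
from typing import Optional, Set

# Hypothetical representation: a set of docids, or None for a stop-word-only clause.
Result = Optional[Set[int]]


def query_and(left: Result, right: Result) -> Result:
    # AND: None acts like the universal set, since X AND U == X.
    if left is None:
        return right
    if right is None:
        return left
    return left & right


def query_and_not(left: Result, right: Result) -> Result:
    # AND NOT: None on the right acts like the empty set, since X - {} == X.
    if right is None:
        return left
    if left is None:
        # Assumption: nothing positive to subtract from, so drop the clause.
        return None
    return left - right


def query_or(left: Result, right: Result) -> Result:
    # OR: None acts like the empty set, since X | {} == X.
    if left is None:
        return right
    if right is None:
        return left
    return left | right


real_word = {1, 2, 3}
print(query_and(real_word, None))      # same docids as real_word
print(query_and_not(real_word, None))  # same docids as real_word
print(query_or(real_word, None))       # same docids as real_word
```

Treating None as the universal set in query_and_not and query_or would instead give the empty set and the universal set, which is exactly the outcome the commit is avoiding.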
-
Jeremy Hylton authored
-
Tim Peters authored
empty.
-
Tim Peters authored
-
Tim Peters authored
tests that currently fail are commented out. Key question: If someone does a search on a stopword, and nothing else is in the query, what do we want to do? Return all docs in a random order? Return no docs? Raise an exception? Second question: What if someone does a query on rare_word AND NOT stop_word?
-
Jeremy Hylton authored
do the same query and work as ZCTextIndex would do. Produce a result set, pump it into NBest, and extract the 10 best.
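A rough stand-in for the "pump it into NBest, and extract the 10 best" step, assuming the result set is a dict mapping docid to score. TopN is a hypothetical class built on the standard heapq module; it is not the ZCTextIndex NBest class and makes no claim about that class's methods.

```python
import heapq
from typing import Iterable, List, Tuple


class TopN:
    """Keep the n highest-scoring (docid, score) pairs seen so far."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._heap: List[Tuple[float, int]] = []  # min-heap of (score, docid)

    def addmany(self, pairs: Iterable[Tuple[int, float]]) -> None:
        for docid, score in pairs:
            if len(self._heap) < self.n:
                heapq.heappush(self._heap, (score, docid))
            elif score > self._heap[0][0]:
                # Evict the smallest score currently kept.
                heapq.heapreplace(self._heap, (score, docid))

    def getbest(self) -> List[Tuple[int, float]]:
        # Highest score first.
        return [(docid, score) for score, docid in sorted(self._heap, reverse=True)]


result = {docid: docid * 1.5 for docid in range(20)}  # made-up scores
best = TopN(10)
best.addmany(result.items())
print(best.getbest())
```

For a one-shot selection, heapq.nlargest(10, result.items(), key=lambda kv: kv[1]) picks the same ten docids when the scores are distinct; a bounded heap like TopN is more useful when pairs arrive incrementally.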
-
Tim Peters authored
-
Jeremy Hylton authored
but works with a TextIndex Lexicon.
-
Tim Peters authored
the index knows about the doc and the wid. _del_wordinfo and _add_wordinfo: s/map/doc2score/g. map is a builtin function, and it's needlessly confusing to name a variable that too.
-
Tim Peters authored
-
Jeremy Hylton authored
-
Jeremy Hylton authored
I think that the default Lexicon for TextIndex does not use a stop word list. For the comparison with ZCTextIndex, explicitly pass the default stop word dict from TextIndex to the lexicon.
-
Jeremy Hylton authored
-
Jeremy Hylton authored
In unindex_doc() call _del_wordinfo() for each unique wid in the doc, not for each wid. Before we had WidCode and phrase searching, _docwords stored a list of the unique wids. The unindex code wasn't updated when _docwords started storing all the wids, even duplicates. Replace the try/except around __getitem__ in _add_wordinfo() with a .get() call. Add XXX comment about the purpose of the try/except(s) in _del_wordinfo(). I suspect they only existed because _del_wordinfo() was called repeatedly when a wid existed more than once.
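A toy sketch of this unindexing fix, assuming _docwords maps a docid to its full wid list (duplicates included) and _wordinfo maps a wid to a doc2score dict. ToyIndex and its scoring (a raw term count) are hypothetical; only the method names echo the commit message.

```python
from collections import Counter
from typing import Dict, List


class ToyIndex:
    """Hypothetical inverted index used only to illustrate the fix."""

    def __init__(self) -> None:
        self._wordinfo: Dict[int, Dict[int, float]] = {}  # wid -> {docid: score}
        self._docwords: Dict[int, List[int]] = {}         # docid -> all wids, duplicates kept

    def _add_wordinfo(self, wid: int, score: float, docid: int) -> None:
        # A .get() call instead of try/except KeyError around __getitem__.
        doc2score = self._wordinfo.get(wid)
        if doc2score is None:
            doc2score = self._wordinfo[wid] = {}
        doc2score[docid] = score

    def _del_wordinfo(self, wid: int, docid: int) -> None:
        doc2score = self._wordinfo[wid]
        del doc2score[docid]
        if not doc2score:
            del self._wordinfo[wid]

    def index_doc(self, docid: int, wids: List[int]) -> None:
        # Keep every wid (as phrase searching would need), duplicates included.
        self._docwords[docid] = list(wids)
        for wid, count in Counter(wids).items():
            self._add_wordinfo(wid, float(count), docid)  # placeholder score

    def unindex_doc(self, docid: int) -> None:
        wids = self._docwords.pop(docid)
        # One _del_wordinfo call per *unique* wid: a second call for the same
        # wid would raise KeyError, since the entry is already gone.
        for wid in set(wids):
            self._del_wordinfo(wid, docid)
```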
-
Guido van Rossum authored
A ZODB import is only redundant if it is not used and does not precede an import from Persistence.
-
Tim Peters authored
It's possible to get OOV wids here due to words the lexicon knows about that the index has no current instances of.
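A small sketch of tolerating such wids, assuming the index keeps a mapping from wid to a doc2score dict. postings_for is a hypothetical helper, not a ZCTextIndex function.

```python
from typing import Dict, Iterable, List


def postings_for(wordinfo: Dict[int, Dict[int, float]],
                 wids: Iterable[int]) -> List[Dict[int, float]]:
    """Return one {docid: score} mapping per wid, empty for wids the index lacks."""
    # A wid can be known to the lexicon while the index currently holds no
    # documents containing it, so fall back to an empty mapping.
    return [wordinfo.get(wid, {}) for wid in wids]
```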
-
Tim Peters authored
in the reindexing text.
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
*any* words in common across the versions. Helped Will along by adding a pragmatic comment to his "knocking indeed" rant. Reworked to use the inscrutable magic of dict.setdefault.
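The commit doesn't show the reworked code, so as a generic illustration of the dict.setdefault idiom only (not the actual change), here is a hypothetical helper that groups word positions by word.

```python
from typing import Dict, List


def positions_by_word(words: List[str]) -> Dict[str, List[int]]:
    """Group the positions at which each word occurs."""
    result: Dict[str, List[int]] = {}
    for pos, word in enumerate(words):
        # setdefault returns the existing list, or inserts [] and returns that.
        result.setdefault(word, []).append(pos)
    return result


print(positions_by_word(["knock", "knock", "who"]))  # {'knock': [0, 1], 'who': [2]}
```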
-
Tim Peters authored
-
Tim Peters authored
take longer to construct now; both indexers' _get_frequencies routines were fiddled to return the same kind of stuff again, and I had previously fiddled the cosine indexer's _get_frequencies to do something weirder but (probably) faster than this.
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
code left in it!
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
-
Tim Peters authored
get_words, and in calling contexts nothing but a list of wids could possibly make sense.
-
Tim Peters authored
I need a break.
-
Tim Peters authored
indexers. CAUTION: I'm sure I don't understand how persistency needs to be spelled. Is it enough to say just that the base class derives from Persistent, or does that need to be duplicated (or done instead exclusively) in the derived classes? Is there a point to keeping "import ZODB" in the derived-class files? Is there a point to keeping it anywhere <wink>?
-
Tim Peters authored
logic to deal with all cases. All the tests pass again.
-