- 22 May, 2002 5 commits
-
-
Guido van Rossum authored
-
Fred Drake authored
instead of an extension type, and let StopWordRemover be a Python class that uses the helper if available.
-
Andreas Jung authored
-
Shane Hathaway authored
-
Tim Peters authored
-
- 21 May, 2002 12 commits
-
-
Jeremy Hylton authored
already ditched Python 1.5.2. The version of tempfile is many revisions behind the one in the Python std library.
-
Guido van Rossum authored
Remove redundant import. Ensure that ZCTextIndex implements the PluggableIndexInterface by adding an unimplemented uniqueValues() method.
-
Andreas Jung authored
(similar to getPhysicalPath())
-
Guido van Rossum authored
Verify that ZCTextIndex implements the PluggableIndexInterface.
-
Guido van Rossum authored
-
Guido van Rossum authored
neither 'pass' (v 1.2) nor 'break' (v 1.3) but 'continue'. Whitespace normalization.
-
Tim Peters authored
loop-invariant, save a little time by multiplying idf by 1024. outside the loop.
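A minimal sketch of the kind of hoisting this message describes, assuming a hypothetical scoring loop in which each per-document weight is multiplied by idf and by a fixed scale factor of 1024. The function and variable names are illustrative and are not taken from the ZCTextIndex source.

```python
def scaled_scores(doc_weights, idf):
    """Return integer scores for each document id.

    Hypothetical helper: doc_weights maps docid -> float weight.
    """
    # idf * 1024. does not depend on the document, so compute it once
    # outside the loop instead of on every iteration.
    scaled_idf = idf * 1024.
    scores = {}
    for docid, weight in doc_weights.items():
        scores[docid] = int(weight * scaled_idf)
    return scores
```

The saving per iteration is one multiplication, which only matters because the loop runs once per matching document.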
-
Tim Peters authored
-
Guido van Rossum authored
the number of words in the index (at least to return a number comparable to the number displayed under "# objects" by TextIndex).
-
Guido van Rossum authored
Index management screen. Ditto for clear(). So group them together and adjust the comment. (So is manage_main, but since it's a DTML method, it can stay in its separate UI group.)
-
Andreas Jung authored
-
Guido van Rossum authored
still only supports a trailing *, so the pipeline should honor that; added a comment to the Splitter class referring to globToWordIds().
-
- 20 May, 2002 15 commits
-
-
Tim Peters authored
well check it in. This yields an overall 133% speedup on a "hot" search for 'python' in my python-dev archive (a word that appears in all but 2 documents). For those who read the email, turned out it was a significant speedup to iterate over an IIBTree's items rather than to materialize the items into an explicit list first. This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e., no arithmetic at all), so there's no significant possibility remaining for speeding the inner score loop.
-
Guido van Rossum authored
creating it anonymously and then pulling it out of the zc_index object.
-
Guido van Rossum authored
once we have more than one on the menu.)
-
Guido van Rossum authored
in percentages; strip the percent sign to avoid a traceback calling int() when these variables are used.
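A minimal sketch of the fix described here, assuming the variables arrive as strings such as "75%" and are later passed to int(). The helper name is hypothetical.

```python
def percent_to_int(value):
    """Convert a string such as '75%' or '75' to the integer 75."""
    # Assumption: the value is a string with at most a trailing percent
    # sign; int('75%') would raise ValueError, so strip the sign first.
    return int(value.strip().rstrip('%'))
```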
-
Guido van Rossum authored
I'm unclear whether this is really the right thing, but at least this prevents crashes when nothing is entered in the search box.
-
Guido van Rossum authored
_fieldname; simply return 0 in this case.
-
Guido van Rossum authored
is *disabled*.
-
Guido van Rossum authored
-
Guido van Rossum authored
-
Guido van Rossum authored
Fix typo in docstring.
-
Guido van Rossum authored
  - Rephrased the description of the grammar, pointing out that the lexicon decides on globbing syntax.
  - Refactored term and atom parsing (moving atom parsing into a separate method). The previously checked-in version accidentally accepted some invalid forms like ``foo AND -bar''; this is fixed.
tests/testQueryParser.py:
  - Each test is now in a separate method; this produces more output (alas) but makes pinpointing the errors much simpler.
  - Added some tests catching ``foo AND -bar'' and similar.
  - Added an explicit test class for the handling of stopwords. The "and/" test no longer has to check self.__class__.
  - Some refactoring of the TestQueryParser class; the utility methods are now in a base class TestQueryParserBase, in a different order; compareParseTrees() now shows the parse tree it got when raising an exception. The parser is now self.parser instead of self.p (see below).
tests/testZCTextIndex.py:
  - setUp() no longer needs to assign to self.p; the parser is consistently called self.parser now.
-
Guido van Rossum authored
:-)
-
Guido van Rossum authored
-
Guido van Rossum authored
ILexicon.py:
  - Added parseTerms() and isGlob().
  - Added get_word(), get_wid() (get_word() is old; get_wid() for symmetry).
  - Reflowed some text.
IQueryParser.py:
  - Expanded docs for parseQuery().
  - Added getIgnored() and parseQueryEx().
IPipelineElement.py:
  - Added processGlob().
Lexicon.py:
  - Added parseTerms() and isGlob().
  - Added get_wid().
  - Some pipeline elements now support processGlob().
ParseTree.py:
  - Clarified the error message for calling executeQuery() on a NotNode.
QueryParser.py (lots of changes):
  - Change private names __tokens etc. into protected _tokens etc.
  - Add getIgnored() and parseQueryEx() methods.
  - The atom parser now uses the lexicon's parseTerms() and isGlob() methods.
  - Query parts that consist only of stopwords (as determined by the lexicon), or of stopwords and negated terms, yield None instead of a parse tree node; the ignored term is added to self._ignored. None is ignored when combining terms for AND/OR/NOT operators, and when an operator has no non-None operands, the operator itself returns None. When this None percolates all the way to the top, the parser raises a ParseError exception.
tests/testQueryParser.py:
  - Changed test expressions of the form "a AND b AND c" to "aa AND bb AND cc" so that the terms won't be considered stopwords.
  - The test for "and/" can only work for the base class.
tests/testZCTextIndex.py:
  - Added copyright notice.
  - Refactor testStopWords() to have two helpers, one for success, one for failures.
  - Change testStopWords() to require parser failure for those queries that have only stopwords or stopwords plus negated terms.
  - Improve compareSet() to sort the sets of keys, and use a more direct way of extracting the keys. This wasn't strictly needed (nothing fails without this), but the old approach of copying the keys into a dict in a loop depends on the dict hashing to always return keys in the same order.
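The None-propagation rule described for QueryParser.py can be sketched as follows. This is an illustration under stated assumptions, not the actual parser code: the tuple node representation, the combine() and finish() helpers, and this ParseError class are hypothetical stand-ins.

```python
class ParseError(Exception):
    """Hypothetical stand-in for the parser's ParseError."""


def combine(operator, operands):
    """Combine operands for AND/OR/NOT, ignoring None operands.

    None marks a query part that consisted only of ignored terms.
    Returns None when no real operand is left.
    """
    kept = [node for node in operands if node is not None]
    if not kept:
        # An operator with no non-None operands itself yields None.
        return None
    return (operator, kept)


def finish(tree):
    """Raise ParseError if None percolated all the way to the top."""
    if tree is None:
        raise ParseError("query contains only ignored terms")
    return tree
```

For example, combine("AND", [None, "aa"]) keeps only "aa", while combine("OR", [None, None]) returns None, which finish() then turns into a ParseError.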
-
Matt Behrens authored
guido@. when/if merge day comes for the installer this will make for less confusion :-)
-
- 19 May, 2002 6 commits
-
-
Tim Peters authored
display the search time in milliseconds too.
-
Tim Peters authored
for start and end of run. Show elapsed wall-clock time in minutes.
-
Tim Peters authored
msgs to display). Changed the module docstring to separate the index-generation args from the query args.
-
Tim Peters authored
-
Tim Peters authored
original doc text gets restored.
-
Guido van Rossum authored
-
- 18 May, 2002 2 commits
-
-
Tim Peters authored
went wrong if they fail.
-
Tim Peters authored
-