- 23 May, 2002 1 commit
-
-
Guido van Rossum authored
valid value is input, or the empty string, and interpret the empty string as the default. Indicate the default in the prompt.
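The prompting behaviour described in this message can be sketched roughly as follows. This is only an illustration, not the committed code: the function name `ask`, its parameters, and the injectable `read` callable are all assumptions made for the example.

```python
def ask(question, valid, default, read=input):
    """Prompt until the reply is a valid value or the empty string.

    An empty reply is interpreted as `default`, and the default is
    shown in the prompt. All names here are hypothetical.
    """
    prompt = "%s [%s] " % (question, default)
    while True:
        reply = read(prompt).strip()
        if reply == "":
            return default      # empty string means "use the default"
        if reply in valid:
            return reply
        # anything else: ask again
```

Passing `read` explicitly keeps the loop testable without a terminal; in interactive use it falls back to the built-in `input`.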
-
- 22 May, 2002 12 commits
-
-
Guido van Rossum authored
Add glob support to the HTMLWordSplitter class.
-
Casey Duncan authored
selected in a mutually exclusive manner (such as splitters). Existing pipeline elements have been grouped appropriately. Added a stop word remover that does not remove single char words. Modified ZMI lexicon add form to use pipeline element groups to render form. Groups with multiple elements are rendered as selects, singletons are rendered as checkboxes.
-
Guido van Rossum authored
-
Guido van Rossum authored
but the pattern may not begin with a glob character (else someone specifying "*" as the pattern can tie up the CPU for a long time).
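A minimal sketch of the restriction described here, assuming `*` and `?` are the glob characters (the lexicon decides the real syntax). The names `GLOB_CHARS` and `check_glob` are hypothetical and do not come from the ZCTextIndex source.

```python
GLOB_CHARS = ("*", "?")  # assumed glob characters, for illustration only


def check_glob(pattern):
    """Return the pattern unchanged, or raise if it begins with a glob character.

    A leading glob character would force a scan of the whole lexicon,
    which is the CPU cost the commit message warns about.
    """
    if pattern.startswith(GLOB_CHARS):
        raise ValueError(
            "pattern may not begin with a glob character: %r" % pattern)
    return pattern
```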
-
Andreas Jung authored
class
-
Andreas Jung authored
and recognizes the header attribute
-
Casey Duncan authored
* A pipeline factory registry now allows registration of possible pipeline elements for use by Zope lexicons.
* ZMI constructor form for lexicon uses pipeline registry to generate form fields
* ZMI constructor form for ZCTextindex allows you to choose between Okapi and Cosine relevance algorithms
-
Guido van Rossum authored
-
Fred Drake authored
instead of an extension type, and let StopWordRemover be a Python class that uses the helper if available.
-
Andreas Jung authored
-
Shane Hathaway authored
-
Tim Peters authored
-
- 21 May, 2002 12 commits
-
-
Jeremy Hylton authored
already ditched Python 1.5.2. The version of tempfile is many revisions behind the one in the Python std library.
-
Guido van Rossum authored
Remove redundant import. Ensure that ZCTextIndex implements the PluggableIndexInterface by adding an unimplemented uniqueValues() method.
-
Andreas Jung authored
(similar to getPhysicalPath())
-
Guido van Rossum authored
Verify that ZCTextIndex implements the PluggableIndexInterface.
-
Guido van Rossum authored
-
Guido van Rossum authored
neither 'pass' (v 1.2) nor 'break' (v 1.3) but 'continue'. Whitespace normalization.
-
Tim Peters authored
loop-invariant, save a little time by multiplying idf by 1024. outside the loop.
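The optimization mentioned here is ordinary loop-invariant hoisting. The sketch below only illustrates the idea; the function name `scaled_scores`, the dict-based input, and the integer truncation are assumptions, not the actual scoring code.

```python
def scaled_scores(weights, idf):
    """Scale per-document weights by idf * 1024.

    The product idf * 1024. does not change inside the loop, so it is
    computed once up front instead of once per document.
    """
    scale = idf * 1024.  # loop-invariant, hoisted out of the loop
    result = {}
    for docid, weight in weights.items():
        result[docid] = int(weight * scale)
    return result
```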
-
Tim Peters authored
-
Guido van Rossum authored
the number of words in the index (at least to return a number comparable to the number displayed under "# objects" by TextIndex).
-
Guido van Rossum authored
Index management screen. Ditto for clear(). So group them together and adjust the comment. (So is manage_main, but since it's a DTML method, it can stay in its separate UI group.)
-
Andreas Jung authored
-
Guido van Rossum authored
still only supports a trailing *, so the pipeline should honor that; added a comment to the Splitter class referring to globToWordIds().
-
- 20 May, 2002 15 commits
-
-
Tim Peters authored
well check it in. This yields an overall 133% speedup on a "hot" search for 'python' in my python-dev archive (a word that appears in all but 2 documents). For those who read the email, turned out it was a significant speedup to iterate over an IIBTree's items rather than to materialize the items into an explicit list first. This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e., no arithmetic at all), so there's no significant possibility remaining for speeding the inner score loop.
-
Guido van Rossum authored
creating it anonymously and then pulling it out of the zc_index object.
-
Guido van Rossum authored
once we have more than one on the menu.)
-
Guido van Rossum authored
in percentages; strip the percent sign to avoid a traceback calling int() when these variables are used.
-
Guido van Rossum authored
I'm unclear whether this is really the right thing, but at least this prevents crashes when nothing is entered in the search box.
-
Guido van Rossum authored
_fieldname; simply return 0 in this case.
-
Guido van Rossum authored
is *disabled*.
-
Guido van Rossum authored
-
Guido van Rossum authored
-
Guido van Rossum authored
Fix typo in docstring.
-
Guido van Rossum authored
- Rephrased the description of the grammar, pointing out that the lexicon decides on globbing syntax.
- Refactored term and atom parsing (moving atom parsing into a separate method). The previously checked-in version accidentally accepted some invalid forms like ``foo AND -bar''; this is fixed.
tests/testQueryParser.py:
- Each test is now in a separate method; this produces more output (alas) but makes pinpointing the errors much simpler.
- Added some tests catching ``foo AND -bar'' and similar.
- Added an explicit test class for the handling of stopwords. The "and/" test no longer has to check self.__class__.
- Some refactoring of the TestQueryParser class; the utility methods are now in a base class TestQueryParserBase, in a different order; compareParseTrees() now shows the parse tree it got when raising an exception. The parser is now self.parser instead of self.p (see below).
tests/testZCTextIndex.py:
- setUp() no longer needs to assign to self.p; the parser is consistently called self.parser now.
-
Guido van Rossum authored
:-)
-
Guido van Rossum authored
-
Guido van Rossum authored
ILexicon.py:
- Added parseTerms() and isGlob().
- Added get_word(), get_wid() (get_word() is old; get_wid() for symmetry).
- Reflowed some text.
IQueryParser.py:
- Expanded docs for parseQuery().
- Added getIgnored() and parseQueryEx().
IPipelineElement.py:
- Added processGlob().
Lexicon.py:
- Added parseTerms() and isGlob().
- Added get_wid().
- Some pipeline elements now support processGlob().
ParseTree.py:
- Clarified the error message for calling executeQuery() on a NotNode.
QueryParser.py (lots of changes):
- Change private names __tokens etc. into protected _tokens etc.
- Add getIgnored() and parseQueryEx() methods.
- The atom parser now uses the lexicon's parseTerms() and isGlob() methods.
- Query parts that consist only of stopwords (as determined by the lexicon), or of stopwords and negated terms, yield None instead of a parse tree node; the ignored term is added to self._ignored. None is ignored when combining terms for AND/OR/NOT operators, and when an operator has no non-None operands, the operator itself returns None. When this None percolates all the way to the top, the parser raises a ParseError exception.
tests/testQueryParser.py:
- Changed test expressions of the form "a AND b AND c" to "aa AND bb AND cc" so that the terms won't be considered stopwords.
- The test for "and/" can only work for the base class.
tests/testZCTextIndex.py:
- Added copyright notice.
- Refactor testStopWords() to have two helpers, one for success, one for failures.
- Change testStopWords() to require parser failure for those queries that have only stopwords or stopwords plus negated terms.
- Improve compareSet() to sort the sets of keys, and use a more direct way of extracting the keys. This wasn't strictly needed (nothing fails without this), but the old approach of copying the keys into a dict in a loop depends on the dict hashing to always return keys in the same order.
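The way None percolates through the AND/OR operators, as described for QueryParser.py in this message, might look roughly like the sketch below. It is not the real parser: `combine`, `finish`, the tuple-based tree, and the collapsing of a single surviving operand are assumptions for illustration, and this `ParseError` is a local stand-in for the real exception class.

```python
class ParseError(Exception):
    """Stand-in for the parser's exception class."""


def combine(op, operands):
    """Combine operands for AND/OR, ignoring None.

    None stands for a query part that consisted only of stopwords.
    If no operand survives, the operator itself yields None.
    """
    kept = [node for node in operands if node is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # assumption: a lone operand needs no operator node
    return (op, kept)


def finish(tree):
    """Top level: a None that percolated all the way up is an error."""
    if tree is None:
        raise ParseError("query contains only stopwords")
    return tree
```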
-
Matt Behrens authored
guido@. when/if merge day comes for the installer this will make for less confusion :-)
-