Commits · 90b46b5959ff0d0d137e57e3a878587fad7a3b82 · Kirill Smelkov / Zope

23 May, 2002 1 commit

Improve the interactive loop to get the encoding; stop only when a · 90b46b59

Guido van Rossum authored May 23, 2002

valid value is input, or the empty string, and interpret the empty
string as the default.  Indicate the default in the prompt.

90b46b59
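
A minimal sketch of the prompt loop this message describes. The function name `ask_encoding`, the injectable `read` parameter, and the use of `codecs.lookup()` as the validity check are assumptions for illustration; the commit does not say how the value is validated.

```python
import codecs


def ask_encoding(default="utf-8", read=input):
    """Prompt until the reply names a known codec or is empty.

    An empty reply selects `default`. `read` is injectable (an
    assumption of this sketch) so the loop can run without a terminal.
    """
    prompt = "Encoding [%s]: " % default  # show the default in the prompt
    while True:
        reply = read(prompt).strip()
        if not reply:
            return default  # empty string means "use the default"
        try:
            codecs.lookup(reply)  # raises LookupError for unknown names
        except LookupError:
            print("Unknown encoding: %r" % reply)
            continue  # ask again
        return reply
```

Called as `ask_encoding()`, it keeps asking until the user enters a codec name Python recognizes or just presses Enter.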

22 May, 2002 12 commits
- Get rid of the unused HTMLSplitter class (it's too simple). · a8226c4e
  Guido van Rossum authored May 22, 2002
```
Add glob support to the HTMLWordSplitter class.
```
  a8226c4e
- Enhanced pipeline element factory so that you can group elements that must be · fbd41e2f
  Casey Duncan authored May 22, 2002
```
selected in a mutually exclusive manner (such as splitters).

Existing pipeline elements have been grouped appropriately.

Added a stop word remover that does not remove single char words.

Modified ZMI lexicon add form to use pipeline element groups to render form.
Groups with multiple elements are rendered as selects, singletons are rendered
as checkboxes.
```
  fbd41e2f
- Follow-up changes for complete globbing. · 5402d08f
  Guido van Rossum authored May 22, 2002
  
  5402d08f
- Add full globbing. This implements * and ? like in the shell, · d04e5363
  Guido van Rossum authored May 22, 2002
```
but the pattern may not begin with a glob character (else
someone specifying "*" as the pattern can tie up the CPU for
a long time).
```
  d04e5363
- removed document() and xref() since they are identical with the base · fcb27991
  Andreas Jung authored May 22, 2002
```
class
```
  fcb27991
- document() function of HTMLWithImages now behaves like in HTMLClass · 6b3c9211
  Andreas Jung authored May 22, 2002
```
and recognizes the header attribute
```
  6b3c9211
- Improved Zope integration · f6a8b104
  Casey Duncan authored May 22, 2002
```
  * A pipeline factory registry now allows registration of possible
    pipeline elements for use by Zope lexicons.

  * ZMI constructor form for lexicon uses pipeline registry to generate form
    fields

  * ZMI constructor form for ZCTextindex allows you to choose between
    Okapi and Cosine relevance algorithms
```
  f6a8b104
- Oops! Somehow the hidden input field didn't work. Reverting the last checkin. · 4b2ced78
  Guido van Rossum authored May 22, 2002
  
  4b2ced78
- Simplify the "stopper" helper module -- just define a simple function · 8e4b9efe
  Fred Drake authored May 22, 2002
```
instead of an extension type, and let StopWordRemover be a Python class
that uses the helper if available.
```
  8e4b9efe
- *** empty log message *** · 5387492a
  Andreas Jung authored May 22, 2002
  
  5387492a
- Don't fail if no URL is available. · 4758f493
  Shane Hathaway authored May 22, 2002
  
  4758f493
- Just improved the comment at the top. · 5438d262
  Tim Peters authored May 22, 2002
  
  5438d262
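
The globbing rule described in d04e5363 above (`*` and `?` as in the shell, but no leading glob character) might be sketched as follows. This is not the lexicon's actual code: `glob_to_regex` and `glob_words` are hypothetical names, and translating the pattern to a regular expression is one possible approach.

```python
import re


def glob_to_regex(pattern):
    """Translate a word glob into a compiled regex.

    `*` matches any run of characters and `?` exactly one. A pattern
    that starts with a glob character is rejected, since it would
    force a scan of every word.
    """
    if pattern[:1] in ("*", "?"):
        raise ValueError("pattern may not begin with a glob character")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts) + r"\Z")


def glob_words(pattern, words):
    """Return the words (a stand-in for lexicon contents) that match."""
    regex = glob_to_regex(pattern)
    return [w for w in words if regex.match(w)]
```

For example, `glob_words("py*", ["python", "pypy", "zope"])` keeps the two words starting with `py`, while `glob_to_regex("*")` raises `ValueError`.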
21 May, 2002 12 commits
- As per the previous checkin message, I'm ditching this module since we · e5151c87
  Jeremy Hylton authored May 21, 2002
```
already ditched Python 1.5.2.  The version of tempfile is many
revisions behind the one in the Python std library.
```
  e5151c87
- Normalize import statement formatting. · 1de61b50
  Guido van Rossum authored May 21, 2002
```
Remove redundant import.
Ensure that ZCTextIndex implements the PluggableIndexInterface by
adding an unimplemented uniqueValues() method.
```
  1de61b50
- the object hook/attribute can now return/be a tuple · c9d911de
  Andreas Jung authored May 21, 2002
```
(similar to getPhysicalPath())
```
  c9d911de
- Normalize import statement formatting. · a85837ed
  Guido van Rossum authored May 21, 2002
```
Verify that ZCTextIndex implements the PluggableIndexInterface.
```
  a85837ed
- Normalize import statement formatting. · fedeec20
  Guido van Rossum authored May 21, 2002
  
  fedeec20
- Three's a charm: the right way to skip the rest of a loop body is · df474242
  Guido van Rossum authored May 21, 2002
```
neither 'pass' (v 1.2) nor 'break' (v 1.3) but 'continue'.

Whitespace normalization.
```
  df474242
- Since every score is of the form (tf * idf * 1024. + .5), and idf is · aafd0e49
  Tim Peters authored May 21, 2002
```
loop-invariant, save a little time by multiplying idf by 1024. outside
the loop.
```
  aafd0e49
- PyInt_FromLong() can fail, so check the return for NULL. · e44e9e9d
  Tim Peters authored May 21, 2002
  
  e44e9e9d
- length() is used by ZCTextIndex.numWords() -- it is supposed to return · a9357e8e
  Guido van Rossum authored May 21, 2002
```
the number of words in the index (at least to return a number
comparable to the number displayed under "# objects" by TextIndex).
```
  a9357e8e
- I figured out what numObjects() is for -- it is used by ZCatalog's · 8b4268a8
  Guido van Rossum authored May 21, 2002
```
Index management screen.  Ditto for clear().  So group them together
and adjust the comment.  (So is manage_main, but since it's a DTML
method, it can stay in its separate UI group.)
```
  8b4268a8
- Collector 396/397: applied patches for better XHTML compatibility · 864dbb9f
  Andreas Jung authored May 21, 2002
  
  864dbb9f
- globToWordIds() shouldn't make assumptions about the pipeline. It · 0a0f97a7
  Guido van Rossum authored May 21, 2002
```
still only supports a trailing *, so the pipeline should honor that;
added a comment to the Splitter class referring to globToWordIds().
```
  0a0f97a7
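
The loop-invariant hoisting mentioned in aafd0e49 above can be illustrated roughly as below. The function name, the dict-based input, and the `int()` truncation are assumptions of this sketch; the commit only states the form `(tf * idf * 1024. + .5)`.

```python
def score_documents(tf_by_doc, idf):
    """Return {docid: score} for one query term.

    Each score has the form tf * idf * 1024. + .5, truncated to an
    int here (the truncation is an assumption). idf does not change
    inside the loop, so idf * 1024. is computed once up front.
    """
    scaled_idf = idf * 1024.0  # hoisted out of the loop
    scores = {}
    for docid, tf in tf_by_doc.items():
        scores[docid] = int(tf * scaled_idf + 0.5)
    return scores
```

Because 1024 is a power of two, scaling `idf` first should not change the rounding of the product for values in the ordinary floating-point range.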
20 May, 2002 15 commits

Since I did the work to write the inner Okapi scoring loop in C, may as · 315bcde9

Tim Peters authored May 20, 2002

well check it in. This yields an overall 133% speedup on a "hot" search
for 'python' in my python-dev archive (a word that appears in all but
2 documents). For those who read the email, turned out it was a
significant speedup to iterate over an IIBTree's items rather than to
materialize the items into an explicit list first.

This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e.,
no arithmetic at all), so there's no significant possibility remaining
for speeding the inner score loop.

315bcde9

setUp(): assign the lexicon to self.lexicon directly rather than · 53b46dc9
Guido van Rossum authored May 20, 2002
```
creating it anonymously and then pulling it out of the zc_index
object.
```
53b46dc9
Always have a splitter. (We'll change this to a choice of splitters · 0ff6d33b
Guido van Rossum authored May 20, 2002
```
once we have more than one on the menu.)
```
0ff6d33b
pt_changePrefs(): the dtprefs_cols/rows arguments could be expressed · d53e1580
Guido van Rossum authored May 20, 2002
```
in percentages; strip the percent sign to avoid a traceback calling
int() when these variables are used.
```
d53e1580
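
A small sketch of the fix described above, assuming the preference arrives as a string such as `"100%"` or `"20"`. The helper name `to_int_pref` is hypothetical and is not taken from `pt_changePrefs()` itself.

```python
def to_int_pref(value):
    """Convert a rows/cols preference such as '20' or '100%' to an int."""
    text = str(value).strip()
    if text.endswith("%"):
        text = text[:-1]  # drop the percent sign so int() does not fail
    return int(text)
```

Without the stripping step, `int("100%")` raises `ValueError`, which is the traceback the commit message refers to.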

_apply_index(): return None when the query string is empty. · 130af9ce

Guido van Rossum authored May 20, 2002

I'm unclear whether this is really the right thing, but at least this
prevents crashes when nothing is entered in the search box.

130af9ce

index_object(): don't die if obj doesn't have an attribute named · 68957496
Guido van Rossum authored May 20, 2002
```
_fieldname; simply return 0 in this case.
```
68957496
Fix a typo. Since the latest change, this always reported "Globbing · 0a97b655
Guido van Rossum authored May 20, 2002
```
is *disabled*".
```
0a97b655
Remove Michel's personal homepage from the link to the ZopeBook. · 3daabd82
Guido van Rossum authored May 20, 2002

3daabd82
Add Zope Copyright notice. · 90bae6a7
Guido van Rossum authored May 20, 2002

90bae6a7
Add Zope Copyright notice. · 53c5d967
Guido van Rossum authored May 20, 2002
```
Fix typo in docstring.
```
53c5d967

QueryParser.py: · 47bb995d

Guido van Rossum authored May 20, 2002

- Rephrased the description of the grammar, pointing out that the
  lexicon decides on globbing syntax.

- Refactored term and atom parsing (moving atom parsing into a
  separate method).  The previously checked-in version accidentally
  accepted some invalid forms like ``foo AND -bar''; this is fixed.

tests/testQueryParser.py:

- Each test is now in a separate method; this produces more output
  (alas) but makes pinpointing the errors much simpler.

- Added some tests catching ``foo AND -bar'' and similar.

- Added an explicit test class for the handling of stopwords.  The
  "and/" test no longer has to check self.__class__.

- Some refactoring of the TestQueryParser class; the utility methods
  are now in a base class TestQueryParserBase, in a different order;
  compareParseTrees() now shows the parse tree it got when raising an
  exception.  The parser is now self.parser instead of self.p (see
  below).

tests/testZCTextIndex.py:

- setUp() no longer needs to assign to self.p; the parser is
  consistently called self.parser now.

47bb995d

Fix unintended recursion in parseQueryEx(). (Unittests are coming up! · 98607a5c
Guido van Rossum authored May 20, 2002
```
:-)
```
98607a5c
Limit copyright to 2002; none of this code existed last year. · 9491bc84
Guido van Rossum authored May 20, 2002

9491bc84

Refactor the query parser to rely on the lexicon for parsing terms. · b82b2746

Guido van Rossum authored May 20, 2002

ILexicon.py:

  - Added parseTerms() and isGlob().

  - Added get_word(), get_wid() (get_word() is old; get_wid() for symmetry).

  - Reflowed some text.

IQueryParser.py:

  - Expanded docs for parseQuery().

  - Added getIgnored() and parseQueryEx().

IPipelineElement.py:

  - Added processGlob().

Lexicon.py:

  - Added parseTerms() and isGlob().

  - Added get_wid().

  - Some pipeline elements now support processGlob().

ParseTree.py:

  - Clarified the error message for calling executeQuery() on a
    NotNode.

QueryParser.py (lots of changes):

  - Change private names __tokens etc. into protected _tokens etc.

  - Add getIgnored() and parseQueryEx() methods.

  - The atom parser now uses the lexicon's parseTerms() and isGlob()
    methods.

  - Query parts that consist only of stopwords (as determined by the
    lexicon), or of stopwords and negated terms, yield None instead of
    a parse tree node; the ignored term is added to self._ignored.
    None is ignored when combining terms for AND/OR/NOT operators, and
    when an operator has no non-None operands, the operator itself
    returns None.  When this None percolates all the way to the top,
    the parser raises a ParseError exception.

tests/testQueryParser.py:

  - Changed test expressions of the form "a AND b AND c" to "aa AND bb
    AND cc" so that the terms won't be considered stopwords.

  - The test for "and/" can only work for the base class.

tests/testZCTextIndex.py:

  - Added copyright notice.

  - Refactor testStopWords() to have two helpers, one for success, one
    for failures.

  - Change testStopWords() to require parser failure for those queries
    that have only stopwords or stopwords plus negated terms.

  - Improve compareSet() to sort the sets of keys, and use a more
    direct way of extracting the keys.  This wasn't strictly needed
    (nothing fails without this), but the old approach of copying the
    keys into a dict in a loop depends on the dict hashing to always
    return keys in the same order.

b82b2746
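
The stopword handling described for QueryParser.py above (None operands are skipped, an operator with no remaining operands yields None, and a None at the top becomes a ParseError) could look roughly like this. The tuple-based tree and the helpers `combine` and `finish` are hypothetical stand-ins, not the real ParseTree classes.

```python
class ParseError(Exception):
    """Raised when a query reduces to nothing searchable."""


def combine(op, operands):
    """Build an (op, operands) node, ignoring None operands.

    Returns None when every operand was None, so the "nothing left"
    result percolates up to the caller.
    """
    kept = [node for node in operands if node is not None]
    if not kept:
        return None
    return (op, kept)


def finish(tree):
    """Top-level check: a None tree means the query had only stopwords."""
    if tree is None:
        raise ParseError("query contains only stopwords")
    return tree
```

With this shape, `combine("AND", [None, "aa"])` still produces a node, while `finish(combine("OR", [None, None]))` raises `ParseError`.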

revert stopper setup.py-age; stopper is not in the Zope module. ok · 5f66a3ce
Matt Behrens authored May 20, 2002
```
guido@.

when/if merge day comes for the installer this will make for less
confusion :-)
```
5f66a3ce