Commits · e5151c873faa0a93673823f6c1e090948a02488f · Kirill Smelkov / Zope

21 May, 2002 12 commits
- As per the previous checkin message, I'm ditching this module since we · e5151c87
  Jeremy Hylton authored May 21, 2002
```
already ditched Python 1.5.2.  The version of tempfile is many
revisions behind the one in the Python std library.
```
  e5151c87
- Normalize import statement formatting. · 1de61b50
  Guido van Rossum authored May 21, 2002
```
Remove redundant import.
Ensure that ZCTextIndex implements the PluggableIndexInterface by
adding an unimplemented uniqueValues() method.
```
  1de61b50
- the object hook/attribute can now return/be a tuple · c9d911de
  Andreas Jung authored May 21, 2002
```
(similar to getPhysicalPath())
```
  c9d911de
- Normalize import statement formatting. · a85837ed
  Guido van Rossum authored May 21, 2002
```
Verify that ZCTextIndex implements the PluggableIndexInterface.
```
  a85837ed
- Normalize import statement formatting. · fedeec20
  Guido van Rossum authored May 21, 2002
  
  fedeec20
- Three's a charm: the right way to skip the rest of a loop body is · df474242
  Guido van Rossum authored May 21, 2002
```
neither 'pass' (v 1.2) nor 'break' (v 1.3) but 'continue'.

Whitespace normalization.
```
  df474242
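
The distinction this commit message draws can be illustrated with a small, self-contained sketch. The function `collect_positive` and its `skip_with` switch are hypothetical names invented for this illustration; they are not taken from the Zope code.

```python
def collect_positive(values, skip_with):
    """Collect positive values; non-positive ones should be skipped.

    skip_with selects which statement is used for the skip, so the three
    behaviours can be compared side by side (illustrative only).
    """
    kept = []
    for v in values:
        if v <= 0:
            if skip_with == "pass":
                pass        # does nothing: control falls through to append()
            elif skip_with == "break":
                break       # abandons the whole loop, not just this item
            else:
                continue    # skips only the rest of this iteration
        kept.append(v)
    return kept


values = [1, -2, 3]
print(collect_positive(values, "pass"))      # [1, -2, 3]
print(collect_positive(values, "break"))     # [1]
print(collect_positive(values, "continue"))  # [1, 3]
```

Only `continue` drops the unwanted item while still processing the items that follow it.
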
- Since every score is of the form (tf * idf * 1024. + .5), and idf is · aafd0e49
  Tim Peters authored May 21, 2002
```
loop-invariant, save a little time by multiplying idf by 1024. outside
the loop.
```
  aafd0e49
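
A minimal Python sketch of the loop-invariant hoisting described in this commit message. The expression `tf * idf * 1024. + .5` comes from the message; the function names and the list-based interface are assumptions made for illustration, and the actual change may live in different code.

```python
def scores_naive(tfs, idf):
    # Multiplies by 1024. once per document inside the loop.
    return [tf * idf * 1024. + .5 for tf in tfs]


def scores_hoisted(tfs, idf):
    # idf does not change inside the loop, so scale it once up front.
    scaled_idf = idf * 1024.
    return [tf * scaled_idf + .5 for tf in tfs]


tfs = [0.5, 1.25, 2.0]
print(scores_naive(tfs, 1.75) == scores_hoisted(tfs, 1.75))  # True
```

The hoisted version saves one multiplication per iteration. Because 1024 is a power of two, scaling by it is exact in binary floating point (barring overflow or underflow), so both versions produce the same values.
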
- PyInt_FromLong() can fail, so check the return for NULL. · e44e9e9d
  Tim Peters authored May 21, 2002
  
  e44e9e9d
- length() is used by ZCTextIndex.numWords() -- it is supposed to return · a9357e8e
  Guido van Rossum authored May 21, 2002
```
the number of words in the index (at least to return a number
comparable to the number displayed under "# objects" by TextIndex).
```
  a9357e8e
- I figured out what numObjects() is for -- it is used by ZCatalog's · 8b4268a8
  Guido van Rossum authored May 21, 2002
```
Index management screen.  Ditto for clear().  So group them together
and adjust the comment.  (So is manage_main, but since it's a DTML
method, it can stay in its separate UI group.)
```
  8b4268a8
- Collector 396/397: applied patches for better XHTML compatibility · 864dbb9f
  Andreas Jung authored May 21, 2002
  
  864dbb9f
- globToWordIds() shouldn't make assumptions about the pipeline. It · 0a0f97a7
  Guido van Rossum authored May 21, 2002
```
still only supports a trailing *, so the pipeline should honor that;
added a comment to the Splitter class referring to globToWordIds().
```
  0a0f97a7
20 May, 2002 15 commits

Since I did the work to write the inner Okapi scoring loop in C, may as · 315bcde9

Tim Peters authored May 20, 2002

well check it in. This yields an overall 133% speedup on a "hot" search
for 'python' in my python-dev archive (a word that appears in all but
2 documents). For those who read the email, turned out it was a
significant speedup to iterate over an IIBTree's items rather than to
materialize the items into an explicit list first.

This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e.,
no arithmetic at all), so there's no significant possibility remaining
for speeding the inner score loop.

315bcde9

setUp(): assign the lexicon to self.lexicon directly rather than · 53b46dc9
Guido van Rossum authored May 20, 2002
```
creating it anonymously and then pulling it out of the zc_index
object.
```
53b46dc9
Always have a splitter. (We'll change this to a choice of splitters · 0ff6d33b
Guido van Rossum authored May 20, 2002
```
once we have more than one on the menu.)
```
0ff6d33b
pt_changePrefs(): the dtprefs_cols/rows arguments could be expressed · d53e1580
Guido van Rossum authored May 20, 2002
```
in percentages; strip the percent sign to avoid a traceback calling
int() when these variables are used.
```
d53e1580
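
A hedged sketch of the kind of fix this message describes: strip a trailing percent sign before calling `int()`. The helper name `to_int_size` is hypothetical and is not the name used in `pt_changePrefs()`.

```python
def to_int_size(value):
    """Convert a cols/rows preference such as '100%' or '20' to an int.

    Hypothetical helper: a trailing percent sign is dropped so that
    int() does not raise ValueError on values like '100%'.
    """
    text = str(value).strip()
    if text.endswith("%"):
        text = text[:-1]
    return int(text)


print(to_int_size("100%"))  # 100
print(to_int_size("20"))    # 20
```
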

_apply_index(): return None when the query string is empty. · 130af9ce

Guido van Rossum authored May 20, 2002

I'm unclear whether this is really the right thing, but at least this
prevents crashes when nothing is entered in the search box.

130af9ce

index_object(): don't die if obj doesn't have an attribute named · 68957496
Guido van Rossum authored May 20, 2002
```
_fieldname; simply return 0 in this case.
```
68957496
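
One way to get the behaviour described here is a `getattr()` call with a default. The class below is a toy stand-in written for this illustration, not the real ZCTextIndex; returning 1 on success is an assumption.

```python
from types import SimpleNamespace


class TinyIndex:
    """Toy index (illustrative only) keyed on a configurable attribute name."""

    def __init__(self, fieldname):
        self._fieldname = fieldname
        self._docs = {}

    def index_object(self, docid, obj):
        # Look up the attribute named by self._fieldname; a missing
        # attribute yields None instead of raising AttributeError.
        text = getattr(obj, self._fieldname, None)
        if text is None:
            return 0  # nothing to index
        self._docs[docid] = text
        return 1  # assumed success value


index = TinyIndex("body")
print(index.index_object(1, SimpleNamespace(body="hello world")))  # 1
print(index.index_object(2, SimpleNamespace(title="no body")))     # 0
```
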
Fix a typo. Since the latest change, this always reported "Globbing · 0a97b655
Guido van Rossum authored May 20, 2002
```
is *disabled*".
```
0a97b655
Remove Michel's personal homepage from the link to the ZopeBook. · 3daabd82
Guido van Rossum authored May 20, 2002

3daabd82
Add Zope Copyright notice. · 90bae6a7
Guido van Rossum authored May 20, 2002

90bae6a7
Add Zope Copyright notice. · 53c5d967
Guido van Rossum authored May 20, 2002
```
Fix typo in docstring.
```
53c5d967

QueryParser.py: · 47bb995d

Guido van Rossum authored May 20, 2002

- Rephrased the description of the grammar, pointing out that the
  lexicon decides on globbing syntax.

- Refactored term and atom parsing (moving atom parsing into a
  separate method).  The previously checked-in version accidentally
  accepted some invalid forms like ``foo AND -bar''; this is fixed.

tests/testQueryParser.py:

- Each test is now in a separate method; this produces more output
  (alas) but makes pinpointing the errors much simpler.

- Added some tests catching ``foo AND -bar'' and similar.

- Added an explicit test class for the handling of stopwords.  The
  "and/" test no longer has to check self.__class__.

- Some refactoring of the TestQueryParser class; the utility methods
  are now in a base class TestQueryParserBase, in a different order;
  compareParseTrees() now shows the parse tree it got when raising an
  exception.  The parser is now self.parser instead of self.p (see
  below).

tests/testZCTextIndex.py:

- setUp() no longer needs to assign to self.p; the parser is
  consistently called self.parser now.

47bb995d

Fix unintended recursion in parseQueryEx(). (Unittests are coming up! · 98607a5c
Guido van Rossum authored May 20, 2002
```
:-)
```
98607a5c
Limit copyright to 2002; none of this code existed last year. · 9491bc84
Guido van Rossum authored May 20, 2002

9491bc84

Refactor the query parser to rely on the lexicon for parsing terms. · b82b2746

Guido van Rossum authored May 20, 2002

ILexicon.py:

  - Added parseTerms() and isGlob().

  - Added get_word(), get_wid() (get_word() is old; get_wid() for symmetry).

  - Reflowed some text.

IQueryParser.py:

  - Expanded docs for parseQuery().

  - Added getIgnored() and parseQueryEx().

IPipelineElement.py:

  - Added processGlob().

Lexicon.py:

  - Added parseTerms() and isGlob().

  - Added get_wid().

  - Some pipeline elements now support processGlob().

ParseTree.py:

  - Clarified the error message for calling executeQuery() on a
    NotNode.

QueryParser.py (lots of changes):

  - Change private names __tokens etc. into protected _tokens etc.

  - Add getIgnored() and parseQueryEx() methods.

  - The atom parser now uses the lexicon's parseTerms() and isGlob()
    methods.

  - Query parts that consist only of stopwords (as determined by the
    lexicon), or of stopwords and negated terms, yield None instead of
    a parse tree node; the ignored term is added to self._ignored.
    None is ignored when combining terms for AND/OR/NOT operators, and
    when an operator has no non-None operands, the operator itself
    returns None.  When this None percolates all the way to the top,
    the parser raises a ParseError exception.

tests/testQueryParser.py:

  - Changed test expressions of the form "a AND b AND c" to "aa AND bb
    AND cc" so that the terms won't be considered stopwords.

  - The test for "and/" can only work for the base class.

tests/testZCTextIndex.py:

  - Added copyright notice.

  - Refactor testStopWords() to have two helpers, one for success, one
    for failures.

  - Change testStopWords() to require parser failure for those queries
    that have only stopwords or stopwords plus negated terms.

  - Improve compareSet() to sort the sets of keys, and use a more
    direct way of extracting the keys.  This wasn't strictly needed
    (nothing fails without this), but the old approach of copying the
    keys into a dict in a loop depends on the dict hashing to always
    return keys in the same order.

b82b2746
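
The way None percolates through the parser, as described in the QueryParser.py notes above, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the stopword set, the tuple-based tree, and the names `parse_atom`, `combine`, and `parse_and` are all hypothetical, and negated terms are not modelled.

```python
class ParseError(Exception):
    pass


STOPWORDS = {"a", "and", "the"}  # assumed stand-in for the lexicon's stop list


def parse_atom(word, ignored):
    # A stopword yields None and is recorded as ignored.
    if word in STOPWORDS:
        ignored.append(word)
        return None
    return ("ATOM", word)


def combine(op, operands):
    # None operands are dropped; an operator with no operands left is None.
    real = [node for node in operands if node is not None]
    if not real:
        return None
    if len(real) == 1:
        return real[0]  # assumption: a single surviving operand stands alone
    return (op, real)


def parse_and(words):
    ignored = []
    tree = combine("AND", [parse_atom(w, ignored) for w in words])
    if tree is None:
        # None reached the top level: the query had nothing searchable.
        raise ParseError("query contains only stopwords: %r" % ignored)
    return tree, ignored


print(parse_and(["the", "python"]))  # (('ATOM', 'python'), ['the'])
```

Calling `parse_and(["the"])` in this sketch raises `ParseError`, mirroring the behaviour the tests in this commit are described as requiring.
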

revert stopper setup.py-age; stopper is not in the Zope module. ok · 5f66a3ce
Matt Behrens authored May 20, 2002
```
guido@.

when/if merge day comes for the installer this will make for less
confusion :-)
```
5f66a3ce

19 May, 2002 6 commits
- For queries, show the total number of results as well as the nbest number; · 7b3de8db
  Tim Peters authored May 19, 2002
```
display the search time in milliseconds too.
```
  7b3de8db
- Show index and pack times in minutes instead of seconds. Show timestamps · f357f8a6
  Tim Peters authored May 19, 2002
```
for start and end of run.  Show elapsed wall-clock time in minutes.
```
  f357f8a6
- Gave it a "-c NNN" context argument (how many leading lines of result · 5da9eb6b
  Tim Peters authored May 19, 2002
```
msgs to display).  Changed the module docstring to separate the index-
generation args from the query args.
```
  5da9eb6b
- Oops! Call the right routine (typo in code just checked in). · a0360090
  Tim Peters authored May 19, 2002
  
  a0360090
- Beef up the reindexing tests to check that they actually fail before the · 94b452e8
  Tim Peters authored May 19, 2002
```
original doc text gets restored.
```
  94b452e8
- QueryParser refactoring step 1: add the lexicon to the constructor args. · bd532bbe
  Guido van Rossum authored May 19, 2002
  
  bd532bbe
18 May, 2002 5 commits
- Rearrange the Okapi reindexing tests to make it easier to figure out what · 97fbb9c9
  Tim Peters authored May 18, 2002
```
went wrong if they fail.
```
  97fbb9c9
- Restore CONTEXT to its original value. · 466d0130
  Tim Peters authored May 18, 2002
  
  466d0130
- Revert braindead change to final pack (it was my change, so it's OK for · f835a0c2
  Tim Peters authored May 18, 2002
```
me to call it braindead <wink>).
```
  f835a0c2
- Pack at the end even if the # of msgs isn't an exact multiple of · 1e8f93fb
  Tim Peters authored May 18, 2002
```
PACK_INTERVAL.
```
  1e8f93fb
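
A rough sketch of the idea in this message: pack periodically, and once more at the end if the message count is not an exact multiple of `PACK_INTERVAL`. Only the constant name comes from the commit message; the function, its callback argument, and the interval value are assumptions for illustration.

```python
PACK_INTERVAL = 3  # illustrative value, not the one used by the real script


def index_messages(msgs, pack):
    """Index msgs, calling pack() every PACK_INTERVAL messages and at the end."""
    for count, msg in enumerate(msgs, 1):
        # ... index msg here ...
        if count % PACK_INTERVAL == 0:
            pack()
    # Final pack for the leftover messages, if any.
    if len(msgs) % PACK_INTERVAL != 0:
        pack()


calls = []
index_messages(list(range(7)), lambda: calls.append("pack"))
print(len(calls))  # 3: after message 3, after message 6, and at the end
```
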
- Display total pack time at the end. · eb8de680
  Tim Peters authored May 18, 2002
  
  eb8de680
17 May, 2002 2 commits

Special-case None search() results in AND, AND NOT, and OR contexts, and · dfbfbe55

Tim Peters authored May 17, 2002

uncomment the test cases that were failing in these contexts.

Read it and weep <wink>:  In an AND context, None is treated like the
universal set, which jibes with the convenient fiction that stop words
appear in every doc.  However, in AND NOT and OR contexts, None is
treated like the empty set, which doesn't jibe with anything except that
we want

    real_word AND NOT stop_word

and

    real_word OR stop_word

to act like

    real_word

If we treated None as if it were the universal set, these results would
be (respectively) the empty set and the universal set instead.

At a higher level, we *are* consistent with the notion that a query with
a stop word acts the same as if the clause with the stop word weren't
present.  That's what really drives this schizophrenic (context-dependent)
treatment of None.

dfbfbe55
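
The context-dependent treatment of None described above can be sketched with plain Python sets. The helper names are hypothetical, results are modelled as sets of document ids rather than the index's real result type, and the left operand of `and_not` is assumed to be a real result.

```python
def and_(left, right):
    # In an AND context None behaves like the universal set:
    # intersecting with it leaves the other operand unchanged.
    if left is None:
        return right
    if right is None:
        return left
    return left & right


def or_(left, right):
    # In an OR context None behaves like the empty set.
    if left is None:
        return right
    if right is None:
        return left
    return left | right


def and_not(left, right):
    # In an AND NOT context a None right operand behaves like the
    # empty set, so nothing is subtracted.
    if right is None:
        return left
    return left - right


real_word = {1, 2}
stop_word = None
print(and_(real_word, stop_word) == real_word)     # True
print(or_(real_word, stop_word) == real_word)      # True
print(and_not(real_word, stop_word) == real_word)  # True
```

In all three contexts the query behaves as if the stop-word clause were absent, which is the higher-level consistency the message points out.
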

Use the same stop list for both indexes. · f968ebb5
Jeremy Hylton authored May 17, 2002

f968ebb5