1. 22 May, 2002 1 commit
  2. 21 May, 2002 12 commits
  3. 20 May, 2002 15 commits
    • Tim Peters authored
      Since I did the work to write the inner Okapi scoring loop in C, may as
      well check it in.  This yields an overall 133% speedup on a "hot" search
      for 'python' in my python-dev archive (a word that appears in all but
      2 documents).  For those who read the email, turned out it was a
      significant speedup to iterate over an IIBTree's items rather than to
      materialize the items into an explicit list first.
      
      This is now within 20% of simply doing "IIBucket(the_IIBTree)" (i.e.,
      no arithmetic at all), so there's no significant possibility remaining
      for speeding the inner score loop.
      315bcde9
    • Guido van Rossum authored
      setUp(): assign the lexicon to self.lexicon directly rather than
      creating it anonymously and then pulling it out of the zc_index
      object.
      53b46dc9
    • Guido van Rossum authored
      Always have a splitter. (We'll change this to a choice of splitters
      once we have more than one on the menu.)
      0ff6d33b
    • Guido van Rossum authored
      pt_changePrefs(): the dtprefs_cols/rows arguments could be expressed
      in percentages; strip the percent sign to avoid a traceback calling
      int() when these variables are used.
      d53e1580
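A minimal sketch of the percent-sign fix described in the pt_changePrefs() commit above. The helper name `parse_pref_size` is hypothetical and is not taken from the Zope source; it only illustrates why stripping a trailing `%` keeps `int()` from raising.

```python
def parse_pref_size(value):
    """Convert a rows/cols preference such as "80%" or "40" to an int.

    Hypothetical helper: int("80%") raises ValueError, so a trailing
    percent sign is stripped before the conversion.
    """
    text = str(value).strip()
    return int(text.rstrip("%"))


print(parse_pref_size("80%"))
print(parse_pref_size("40"))
```

How the resulting number is interpreted (percentage versus absolute size) is left out here, since the commit message does not say.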
    • Guido van Rossum authored
      _apply_index(): return None when the query string is empty.
      I'm unclear whether this is really the right thing, but at least this
      prevents crashes when nothing is entered in the search box.
      130af9ce
    • Guido van Rossum authored
      index_object(): don't die if obj doesn't have an attribute named
      _fieldname; simply return 0 in this case.
      68957496
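The index_object() change above might look roughly like the following. This is a sketch under assumptions: the `TinyIndex` class, its `docs` dict, and the return value 1 on success are all hypothetical, and the attribute name is assumed to be stored in `self._fieldname`. Only the "return 0 when the attribute is missing" behaviour comes from the commit message.

```python
class TinyIndex:
    """Hypothetical stand-in for a text index keyed by document id."""

    def __init__(self, fieldname):
        self._fieldname = fieldname  # name of the attribute to index
        self.docs = {}

    def index_object(self, docid, obj):
        try:
            value = getattr(obj, self._fieldname)
        except AttributeError:
            return 0  # nothing to index; do not raise
        if callable(value):
            value = value()  # assumption: the field may be a method
        self.docs[docid] = value
        return 1  # assumption: 1 signals that something was indexed


class Doc:
    text = "hello world"


index = TinyIndex("text")
print(index.index_object(1, Doc()))     # object with the field
print(index.index_object(2, object()))  # object without it
```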
    • Guido van Rossum authored
      0a97b655
    • Guido van Rossum authored
      Add Zope Copyright notice.
      90bae6a7
    • Guido van Rossum authored
      Add Zope Copyright notice.
      Fix typo in docstring.
      53c5d967
    • Guido van Rossum authored
      QueryParser.py:
      - Rephrased the description of the grammar, pointing out that the
        lexicon decides on globbing syntax.
      
      - Refactored term and atom parsing (moving atom parsing into a
        separate method).  The previously checked-in version accidentally
        accepted some invalid forms like ``foo AND -bar''; this is fixed.
      
      tests/testQueryParser.py:
      
      - Each test is now in a separate method; this produces more output
        (alas) but makes pinpointing the errors much simpler.
      
      - Added some tests catching ``foo AND -bar'' and similar.
      
      - Added an explicit test class for the handling of stopwords.  The
        "and/" test no longer has to check self.__class__.
      
      - Some refactoring of the TestQueryParser class; the utility methods
        are now in a base class TestQueryParserBase, in a different order;
        compareParseTrees() now shows the parse tree it got when raising an
        exception.  The parser is now self.parser instead of self.p (see
        below).
      
      tests/testZCTextIndex.py:
      
      - setUp() no longer needs to assign to self.p; the parser is
        consistently called self.parser now.
      47bb995d
    • Guido van Rossum authored
      Refactor the query parser to rely on the lexicon for parsing terms.
      ILexicon.py:
      
        - Added parseTerms() and isGlob().
      
        - Added get_word(), get_wid() (get_word() is old; get_wid() for symmetry).
      
        - Reflowed some text.
      
      IQueryParser.py:
      
        - Expanded docs for parseQuery().
      
        - Added getIgnored() and parseQueryEx().
      
      IPipelineElement.py:
      
        - Added processGlob().
      
      Lexicon.py:
      
        - Added parseTerms() and isGlob().
      
        - Added get_wid().
      
        - Some pipeline elements now support processGlob().
      
      ParseTree.py:
      
        - Clarified the error message for calling executeQuery() on a
          NotNode.
      
      QueryParser.py (lots of changes):
      
        - Change private names __tokens etc. into protected _tokens etc.
      
        - Add getIgnored() and parseQueryEx() methods.
      
        - The atom parser now uses the lexicon's parseTerms() and isGlob()
          methods.
      
        - Query parts that consist only of stopwords (as determined by the
          lexicon), or of stopwords and negated terms, yield None instead of
          a parse tree node; the ignored term is added to self._ignored.
          None is ignored when combining terms for AND/OR/NOT operators, and
          when an operator has no non-None operands, the operator itself
          returns None.  When this None percolates all the way to the top,
          the parser raises a ParseError exception.
      
      tests/testQueryParser.py:
      
        - Changed test expressions of the form "a AND b AND c" to "aa AND bb
          AND cc" so that the terms won't be considered stopwords.
      
        - The test for "and/" can only work for the base class.
      
      tests/testZCTextIndex.py:
      
        - Added copyright notice.
      
        - Refactor testStopWords() to have two helpers, one for success, one
          for failures.
      
        - Change testStopWords() to require parser failure for those queries
          that have only stopwords or stopwords plus negated terms.
      
        - Improve compareSet() to sort the sets of keys, and use a more
          direct way of extracting the keys.  This wasn't strictly needed
          (nothing fails without this), but the old approach of copying the
          keys into a dict in a loop depends on the dict hashing to always
          return keys in the same order.
      b82b2746
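The None-percolation rule from the QueryParser.py notes in the commit above can be sketched as follows. Everything here is illustrative: `combine`, `finish`, the tuple-based tree, and the error text are hypothetical names and shapes, not the real QueryParser API. The sketch covers only AND/OR; collapsing a single surviving operand is an extra assumption.

```python
class ParseError(Exception):
    """Raised when nothing searchable is left in the query."""


def combine(operator, operands):
    """Combine AND/OR operands, skipping any that parsed to None.

    None stands for a query part made up only of stopwords.
    """
    kept = [node for node in operands if node is not None]
    if not kept:
        return None          # the operator itself yields None
    if len(kept) == 1:
        return kept[0]       # assumption: no node needed for one operand
    return (operator, kept)


def finish(tree):
    """Top level: a None that percolated all the way up is an error."""
    if tree is None:
        raise ParseError("query contains only stopwords")
    return tree


aa, bb = ("ATOM", "aa"), ("ATOM", "bb")
print(finish(combine("AND", [aa, None, bb])))
print(finish(combine("OR", [None, aa])))
```

Calling `finish(combine("AND", [None, None]))` raises `ParseError` in this sketch, mirroring the behaviour the commit describes for stopword-only queries.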
    • Matt Behrens authored
      revert stopper setup.py-age; stopper is not in the Zope module. ok
      guido@.
      
      when/if merge day comes for the installer this will make for less
      confusion :-)
      5f66a3ce
  4. 19 May, 2002 6 commits
  5. 18 May, 2002 5 commits
  6. 17 May, 2002 1 commit
    • Tim Peters authored
      Special-case None search() results in AND, AND NOT, and OR contexts, and
      uncomment the test cases that were failing in these contexts.
      
      Read it and weep <wink>:  In an AND context, None is treated like the
      universal set, which jibes with the convenient fiction that stop words
      appear in every doc.  However, in AND NOT and OR contexts, None is
      treated like the empty set, which doesn't jibe with anything except that
      we want
      
          real_word AND NOT stop_word
      
      and
      
          real_word OR stop_word
      
      to act like
      
          real_word
      
      If we treated None as if it were the universal set, these results would
      be (respectively) the empty set and the universal set instead.
      
      At a higher level, we *are* consistent with the notion that a query with
      a stop word acts the same as if the clause with the stop word weren't
      present.  That's what really drives this schizophrenic (context-dependent)
      treatment of None.
      dfbfbe55
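The context-dependent treatment of None in the commit above can be illustrated with plain Python sets of document ids. This is only a sketch: the function names are hypothetical, the real code works on index result objects rather than `set`, and treating a None left operand of AND NOT as empty is a literal reading of the rule, not something the message spells out.

```python
def as_empty(result):
    """In AND NOT and OR contexts, None (a stop word) acts like the empty set."""
    return set() if result is None else result


def and_(left, right):
    """In an AND context, None acts like the universal set, so it drops out."""
    if left is None:
        return right
    if right is None:
        return left
    return left & right


def and_not(left, right):
    return as_empty(left) - as_empty(right)


def or_(left, right):
    return as_empty(left) | as_empty(right)


real_word = {1, 2, 3}   # docs containing the real word
stop_word = None        # search() result for a stop word
universe = {1, 2, 3, 4}

# All three act like plain `real_word`:
print(and_(real_word, stop_word))
print(and_not(real_word, stop_word))
print(or_(real_word, stop_word))

# Treating None as the universal set everywhere would give instead:
print(real_word - universe)   # empty set for AND NOT
print(real_word | universe)   # universal set for OR
```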