Logic.core package

Subpackages

Logic.core.search module

class SearchEngine

Bases: object

aggregate_scores(weights, scores, final_scores)

Aggregates the scores of the fields.

Parameters:
  • weights (dict) – The weights of the fields.

  • scores (dict) – The scores of the fields.

  • final_scores (dict) – The final scores of the documents.
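As a rough illustration of what a weighted aggregation of this kind could look like, the sketch below adds each field's weighted score into a per-document total. The nested layout of `scores` (field, then document ID), the in-place update of `final_scores`, and the name `aggregate_scores_sketch` are assumptions made for the example; the source does not specify them.

```python
def aggregate_scores_sketch(weights, scores, final_scores):
    """Add the weighted per-field scores of each document into final_scores.

    Assumed layout (not confirmed by the source):
      weights:      {field: weight}
      scores:       {field: {doc_id: score}}
      final_scores: {doc_id: score}, updated in place
    """
    for field, field_scores in scores.items():
        weight = weights.get(field, 0)
        if weight == 0:
            # Fields with zero weight contribute nothing, so skip them.
            continue
        for doc_id, score in field_scores.items():
            final_scores[doc_id] = final_scores.get(doc_id, 0.0) + weight * score
    return final_scores


weights = {"stars": 1.0, "genres": 2.0, "summaries": 0}
scores = {
    "stars": {"d1": 0.5},
    "genres": {"d1": 0.25, "d2": 1.0},
    "summaries": {"d3": 9.0},
}
final_scores = aggregate_scores_sketch(weights, scores, {})
```

In this sketch a document that appears only in a zero-weight field never enters `final_scores`.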

find_scores_with_safe_ranking(query, method, weights, scores)

Finds the scores of the documents using the safe ranking method.

Parameters:
  • query (List[str]) – The query to be scored.

  • method (str ((n|l)(n|t)(n|c).(n|l)(n|t)(n|c)) | OkapiBM25) – The method to use for searching.

  • weights (dict) – The weights of the fields.

  • scores (dict) – The scores of the documents.

find_scores_with_unigram_model(query, smoothing_method, weights, scores, alpha=0.5, lamda=0.5)

Calculates the scores for each document based on the unigram model.

Parameters:
  • query (str) – The query to search for.

  • smoothing_method (str (bayes | naive | mixture)) – The method used for smoothing the probabilities in the unigram model.

  • weights (dict) – A dictionary mapping each field (e.g., ‘stars’, ‘genres’, ‘summaries’) to its weight in the final score. Fields with a weight of 0 are ignored.

  • scores (dict) – The scores of the documents.

  • alpha (float, optional) – The parameter used in the Bayesian smoothing method. Defaults to 0.5.

  • lamda (float, optional) – The parameter used in some smoothing methods to balance between the document probability and the collection probability. Defaults to 0.5.
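To make the roles of `alpha` and `lamda` concrete, here is a minimal sketch of two common smoothing formulas for a unigram language model: Bayesian (Dirichlet-style) smoothing and linear interpolation with the collection model. It assumes these are what `bayes` and `mixture` refer to, which the source does not confirm. The function name and its count arguments are hypothetical, and `naive` is left out because the source does not define it.

```python
def smoothed_term_probability(tf, doc_len, cf, collection_len,
                              smoothing_method, alpha=0.5, lamda=0.5):
    """Estimate P(term | document) under an assumed smoothing formula.

    tf / doc_len            : term count and length of the document (or field)
    cf / collection_len     : term count and length of the whole collection
    """
    p_collection = cf / collection_len
    if smoothing_method == "bayes":
        # Bayesian smoothing: the collection model acts as a prior of strength alpha.
        return (tf + alpha * p_collection) / (doc_len + alpha)
    if smoothing_method == "mixture":
        # Linear interpolation between document and collection probabilities.
        p_document = tf / doc_len if doc_len else 0.0
        return lamda * p_document + (1 - lamda) * p_collection
    raise ValueError(f"smoothing method not covered by this sketch: {smoothing_method!r}")


p_bayes = smoothed_term_probability(2, 10, 5, 100, "bayes")
p_mixture = smoothed_term_probability(2, 10, 5, 100, "mixture")
p_unseen = smoothed_term_probability(0, 10, 5, 100, "mixture")
```

Both formulas give a term that is absent from the document a non-zero probability, so a single missing query term does not drive the whole query likelihood to zero.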

find_scores_with_unsafe_ranking(query, method, weights, max_results, scores)

Finds the scores of the documents with the unsafe ranking method, which uses the tiered index.

Parameters:
  • query (List[str]) – The query to be scored.

  • method (str ((n|l)(n|t)(n|c).(n|l)(n|t)(n|c)) | OkapiBM25) – The method to use for searching.

  • weights (dict) – The weights of the fields.

  • max_results (int) – The maximum number of results to return.

  • scores (dict) – The scores of the documents.
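One common way to use a tiered index is to walk the tiers from best to worst and stop as soon as enough candidate documents have been collected. The sketch below illustrates that idea only; the tier layout (a list of term-to-document-ID mappings), the stopping rule, and the function name `tiered_candidates` are assumptions, not the behaviour documented for this class.

```python
def tiered_candidates(tiers, query_terms, max_results):
    """Collect candidate document IDs tier by tier (hypothetical helper).

    tiers: list of {term: set_of_doc_ids}, ordered from highest to lowest tier.
    """
    candidates = []
    seen = set()
    for tier in tiers:
        for term in query_terms:
            # sorted() keeps the output deterministic for this example.
            for doc_id in sorted(tier.get(term, ())):
                if doc_id not in seen:
                    seen.add(doc_id)
                    candidates.append(doc_id)
        if len(candidates) >= max_results:
            # Enough candidates found; lower tiers are never visited.
            break
    return candidates


tiers = [
    {"hero": {"d1"}, "war": {"d2"}},
    {"hero": {"d3", "d4"}},
    {"war": {"d5"}},
]
found = tiered_candidates(tiers, ["hero", "war"], max_results=3)
```

Skipping the lower tiers is what makes such a ranking "unsafe": a document stored only in an unvisited tier cannot appear in the results, even if a full scan would have ranked it highly.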

merge_scores(scores1, scores2)

Merges two dictionaries of scores.

Parameters:
  • scores1 (dict) – The first dictionary of scores.

  • scores2 (dict) – The second dictionary of scores.

Returns:

The merged dictionary of scores.

Return type:

dict
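The source does not say how a document that appears in both dictionaries is handled. The sketch below assumes the two scores are added and that neither input is modified; the name `merge_scores_sketch` is used only for this illustration.

```python
def merge_scores_sketch(scores1, scores2):
    """Return a new dict with the scores of both inputs, summing shared keys."""
    merged = dict(scores1)  # copy so the caller's dictionary is left untouched
    for doc_id, score in scores2.items():
        merged[doc_id] = merged.get(doc_id, 0.0) + score
    return merged


first = {"a": 1.0, "b": 2.0}
second = {"b": 0.5, "c": 3.0}
merged = merge_scores_sketch(first, second)
```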

search(query, method, weights, safe_ranking=True, max_results=10, smoothing_method=None, alpha=0.5, lamda=0.5)

Searches for the query in the indexes.

Parameters:
  • query (str) – The query to search for.

  • method (str ((n|l)(n|t)(n|c).(n|l)(n|t)(n|c)) | OkapiBM25 | Unigram) – The method to use for searching.

  • weights (dict) – The weights of the fields.

  • safe_ranking (bool) – If True, the search engine searches the whole index and then ranks the results. If False, it searches the tiered index.

  • max_results (int) – The maximum number of results to return. If None, all results are returned.

  • smoothing_method (str (bayes | naive | mixture)) – The method used for smoothing the probabilities in the unigram model.

  • alpha (float, optional) – The parameter used in the Bayesian smoothing method. Defaults to 0.5.

  • lamda (float, optional) – The parameter used in some smoothing methods to balance between the document probability and the collection probability. Defaults to 0.5.

Returns:

A list of tuples containing the document IDs and their scores sorted by their scores.

Return type:

list
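The return format described above can be produced from a dictionary of final scores roughly as follows. This is a sketch: it assumes the list is sorted from highest to lowest score, and the helper name `top_results` is hypothetical.

```python
def top_results(final_scores, max_results=10):
    """Turn {doc_id: score} into a list of (doc_id, score) tuples, best first."""
    ranked = sorted(final_scores.items(), key=lambda item: item[1], reverse=True)
    # max_results=None means "return everything".
    return ranked if max_results is None else ranked[:max_results]


final_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
top_two = top_results(final_scores, max_results=2)
everything = top_results(final_scores, max_results=None)
```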