Logic.core package
Subpackages
- Logic.core.classification package
- Logic.core.clustering package
- Logic.core.clustering.clustering_metrics module
- Logic.core.clustering.clustering_utils module
  - ClusteringUtils
    - ClusteringUtils.cluster_hierarchical_average()
    - ClusteringUtils.cluster_hierarchical_complete()
    - ClusteringUtils.cluster_hierarchical_single()
    - ClusteringUtils.cluster_hierarchical_ward()
    - ClusteringUtils.cluster_kmeans()
    - ClusteringUtils.cluster_kmeans_WCSS()
    - ClusteringUtils.get_most_frequent_words()
    - ClusteringUtils.plot_kmeans_cluster_scores()
    - ClusteringUtils.visualize_elbow_method_wcss()
    - ClusteringUtils.visualize_kmeans_clustering_wandb()
    - ClusteringUtils.wandb_plot_hierarchical_clustering_dendrogram()
- Logic.core.clustering.dimension_reduction module
- Logic.core.clustering.main module
- Logic.core.indexer package
- Logic.core.indexer.LSH module
- Logic.core.indexer.document_lengths_index module
- Logic.core.indexer.index module
  - Index
    - Index.add_document_to_index()
    - Index.check_add_remove_is_correct()
    - Index.check_if_index_loaded_correctly()
    - Index.check_if_indexing_is_good()
    - Index.check_if_key_exists()
    - Index.delete_dummy_keys()
    - Index.get_posting_list()
    - Index.index_documents()
    - Index.index_genres()
    - Index.index_stars()
    - Index.index_summaries()
    - Index.load_index()
    - Index.remove_document_from_index()
    - Index.store_index()
- Logic.core.indexer.index_reader module
- Logic.core.indexer.indexes_enum module
- Logic.core.indexer.metadata_index module
- Logic.core.indexer.tiered_index module
- Logic.core.link_analysis package
- Logic.core.utility package
- Logic.core.utility.crawler module
  - IMDbCrawler
    - IMDbCrawler.crawl()
    - IMDbCrawler.crawl_page_info()
    - IMDbCrawler.extract_movie_info()
    - IMDbCrawler.extract_top_250()
    - IMDbCrawler.get_budget()
    - IMDbCrawler.get_countries_of_origin()
    - IMDbCrawler.get_director()
    - IMDbCrawler.get_first_page_summary()
    - IMDbCrawler.get_genres()
    - IMDbCrawler.get_gross_worldwide()
    - IMDbCrawler.get_id_from_URL()
    - IMDbCrawler.get_imdb_instance()
    - IMDbCrawler.get_languages()
    - IMDbCrawler.get_mpaa()
    - IMDbCrawler.get_rating()
    - IMDbCrawler.get_related_links()
    - IMDbCrawler.get_release_year()
    - IMDbCrawler.get_review_link()
    - IMDbCrawler.get_reviews_with_scores()
    - IMDbCrawler.get_stars()
    - IMDbCrawler.get_summary()
    - IMDbCrawler.get_summary_link()
    - IMDbCrawler.get_synopsis()
    - IMDbCrawler.get_title()
    - IMDbCrawler.get_writers()
    - IMDbCrawler.headers
    - IMDbCrawler.read_from_file_as_json()
    - IMDbCrawler.start_crawling()
    - IMDbCrawler.top_250_URL
    - IMDbCrawler.write_to_file_as_json()
  - main()
- Logic.core.utility.evaluation module
  - Evaluation
    - Evaluation.cacluate_DCG()
    - Evaluation.cacluate_MRR()
    - Evaluation.cacluate_NDCG()
    - Evaluation.cacluate_RR()
    - Evaluation.calculate_AP()
    - Evaluation.calculate_F1()
    - Evaluation.calculate_MAP()
    - Evaluation.calculate_evaluation()
    - Evaluation.calculate_precision()
    - Evaluation.calculate_recall()
    - Evaluation.log_evaluation()
    - Evaluation.print_evaluation()
- Logic.core.utility.preprocess module
- Logic.core.utility.scorer module
  - Scorer
    - Scorer.compute_score_with_unigram_model()
    - Scorer.compute_scores_with_unigram_model()
    - Scorer.compute_scores_with_vector_space_model()
    - Scorer.compute_socres_with_okapi_bm25()
    - Scorer.get_idf()
    - Scorer.get_list_of_documents()
    - Scorer.get_okapi_bm25_score()
    - Scorer.get_query_tfs()
    - Scorer.get_vector_space_model_score()
- Logic.core.utility.snippet module
- Logic.core.utility.spell_correction module
- Logic.core.word_embedding package
Logic.core.search module
- class SearchEngine
Bases: object
- aggregate_scores(weights, scores, final_scores)
Aggregates the scores of the fields.
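A minimal sketch of what this aggregation might look like, assuming `scores` maps each field name to a `{doc_id: score}` dictionary and that `final_scores` is filled in place with a weighted sum over fields. Both the data layout and the weighted-sum rule are assumptions for illustration, not taken from the project's code.

```python
from typing import Dict


def aggregate_scores(
    weights: Dict[str, float],
    scores: Dict[str, Dict[str, float]],
    final_scores: Dict[str, float],
) -> None:
    """Hypothetical sketch: fold per-field document scores into final_scores in place."""
    for field, field_scores in scores.items():
        weight = weights.get(field, 0.0)
        if weight == 0:
            continue  # a field with weight 0 contributes nothing
        for doc_id, score in field_scores.items():
            final_scores[doc_id] = final_scores.get(doc_id, 0.0) + weight * score


weights = {"stars": 1.0, "genres": 0.5, "summaries": 2.0}
scores = {
    "stars": {"tt1": 0.4, "tt2": 0.1},
    "genres": {"tt1": 0.2},
    "summaries": {"tt2": 0.3},
}
final_scores: Dict[str, float] = {}
aggregate_scores(weights, scores, final_scores)
print(final_scores)
```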
- find_scores_with_safe_ranking(query, method, weights, scores)
Finds the scores of the documents using the safe ranking method.
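As an illustration of safe ranking, the sketch below scores every document that contains a query term against the full index, using SMART-style `method` strings of the kind accepted by `search` (for example `lnc.ltc`: the first triple weights documents, the second weights the query). The `term -> {doc_id: tf}` index layout, the `safe_rank` helper, and the whitespace tokenizer are hypothetical; only the `n`/`l` tf, `n`/`t` idf, and `n`/`c` normalization letters are handled.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical index layout: term -> {doc_id: term frequency}
InvertedIndex = Dict[str, Dict[str, int]]


def _weight(tf: int, df: int, n_docs: int, scheme: str) -> float:
    """Apply the tf letter (n|l) and the df letter (n|t) of a SMART triple."""
    if tf == 0:
        return 0.0
    w = float(tf) if scheme[0] == "n" else 1.0 + math.log10(tf)  # natural or log tf
    if scheme[1] == "t":
        w *= math.log10(n_docs / df)  # idf; 'n' leaves the weight unchanged
    return w


def _normalize(vec: Dict[str, float], scheme: str) -> Dict[str, float]:
    """Apply the normalization letter (n|c) of a SMART triple."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if scheme[2] != "c" or norm == 0:
        return vec
    return {t: w / norm for t, w in vec.items()}


def safe_rank(
    query: str, method: str, index: InvertedIndex, max_results: Optional[int]
) -> List[Tuple[str, float]]:
    doc_scheme, query_scheme = method.split(".")
    df = {term: len(postings) for term, postings in index.items()}
    n_docs = len({d for postings in index.values() for d in postings})

    # Build complete document vectors from the whole index so cosine norms are exact.
    doc_vecs: Dict[str, Dict[str, float]] = {}
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            doc_vecs.setdefault(doc_id, {})[term] = _weight(tf, df[term], n_docs, doc_scheme)
    doc_vecs = {d: _normalize(v, doc_scheme) for d, v in doc_vecs.items()}

    # Query vector built from whitespace tokens that occur in the index.
    q_tf = Counter(t for t in query.lower().split() if t in index)
    q_vec = _normalize(
        {t: _weight(tf, df[t], n_docs, query_scheme) for t, tf in q_tf.items()}, query_scheme
    )

    # Safe ranking: every document containing at least one query term is scored.
    scores: Dict[str, float] = {}
    for term, q_w in q_vec.items():
        for doc_id in index[term]:
            scores[doc_id] = scores.get(doc_id, 0.0) + q_w * doc_vecs[doc_id][term]
    # Slicing with None keeps every result, mirroring max_results=None in search().
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:max_results]


index = {
    "space": {"tt1": 3, "tt2": 1},
    "war": {"tt1": 1, "tt3": 2},
    "love": {"tt2": 2, "tt3": 1},
}
print(safe_rank("space war", "lnc.ltc", index, max_results=2))
```

Because the cosine norms come from each document's complete vector, the ranking is exact; the price is touching every relevant posting list in the full index.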
- find_scores_with_unigram_model(query, smoothing_method, weights, scores, alpha=0.5, lamda=0.5)
Calculates the scores for each document based on the unigram model.
- Parameters:
query (str) – The query to search for.
smoothing_method (str (bayes | naive | mixture)) – The method used for smoothing the probabilities in the unigram model.
weights (dict) – A dictionary mapping each field (e.g., ‘stars’, ‘genres’, ‘summaries’) to its weight in the final score. Fields with a weight of 0 are ignored.
scores (dict) – The scores of the documents.
alpha (float, optional) – The parameter used in the Bayesian smoothing method. Defaults to 0.5.
lamda (float, optional) – The parameter used in some smoothing methods to balance between the document probability and the collection probability. Defaults to 0.5.
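The three smoothing options can be sketched as below, assuming the common textbook forms: `naive` as the unsmoothed maximum-likelihood estimate, `bayes` as Dirichlet-prior smoothing with `alpha` as the prior mass, and `mixture` as linear (Jelinek-Mercer) interpolation weighted by `lamda`. The `unigram_score` helper and its count-dictionary inputs are hypothetical, and the project's exact formulas may differ.

```python
import math
from typing import Dict, List


def unigram_score(
    query_terms: List[str],
    doc_tf: Dict[str, int],
    collection_tf: Dict[str, int],
    smoothing_method: str,
    alpha: float = 0.5,
    lamda: float = 0.5,
) -> float:
    """Hypothetical sketch: log-likelihood of the query under one document's unigram model."""
    doc_len = sum(doc_tf.values())
    collection_len = sum(collection_tf.values())
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        p_collection = collection_tf.get(term, 0) / collection_len
        if smoothing_method == "naive":
            p = tf / doc_len  # maximum likelihood, no smoothing
        elif smoothing_method == "bayes":
            p = (tf + alpha * p_collection) / (doc_len + alpha)  # Dirichlet-prior form
        elif smoothing_method == "mixture":
            p = lamda * tf / doc_len + (1 - lamda) * p_collection  # Jelinek-Mercer form
        else:
            raise ValueError(f"unknown smoothing method: {smoothing_method}")
        if p == 0:
            return float("-inf")  # one zero-probability term zeroes the whole likelihood
        score += math.log(p)
    return score


collection_tf = {"space": 4, "war": 3, "love": 3}
doc_tf = {"space": 3, "war": 1}
for method in ("naive", "bayes", "mixture"):
    print(method, unigram_score(["space", "love"], doc_tf, collection_tf, method))
```

With `naive`, a single query term missing from the document drives the score to negative infinity; both smoothed variants borrow probability mass from the collection to avoid this.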
- find_scores_with_unsafe_ranking(query, method, weights, max_results, scores)
Finds the scores of the documents with the unsafe ranking method, which searches the tiered index instead of the whole index.
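A hedged sketch of unsafe ranking over a tiered index: tiers are read from highest to lowest quality, and reading stops as soon as at least `max_results` candidate documents have been collected. The list-of-tiers layout, the `unsafe_rank` helper, and the pluggable `score_fn` are assumptions for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical tier layout: term -> {doc_id: term frequency}
Tier = Dict[str, Dict[str, int]]


def unsafe_rank(
    query_terms: List[str],
    tiers: List[Tier],
    max_results: int,
    score_fn: Callable[[List[str], str], float],
) -> List[Tuple[str, float]]:
    """Collect candidates tier by tier and stop once enough documents are found."""
    candidates: Set[str] = set()
    for tier in tiers:  # ordered from highest to lowest quality
        for term in query_terms:
            candidates.update(tier.get(term, {}))  # adds the doc_id keys
        if len(candidates) >= max_results:
            break  # lower tiers are never read: faster, but the top-k may be inexact
    scored = [(doc_id, score_fn(query_terms, doc_id)) for doc_id in candidates]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:max_results]


tiers = [
    {"space": {"tt1": 12}, "war": {"tt1": 9, "tt4": 8}},  # tier 1: high term frequencies
    {"space": {"tt2": 2}, "war": {"tt3": 1}},  # tier 2: low term frequencies
]
total_tf = {"tt1": 21, "tt2": 2, "tt3": 1, "tt4": 8}
print(unsafe_rank(["space", "war"], tiers, max_results=2, score_fn=lambda q, d: total_tf[d]))
```

The early `break` is what makes the ranking unsafe: a document that appears only in a lower tier is never scored, even if it would have outranked a tier-1 candidate.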
- merge_scores(scores1, scores2)
Merges two dictionaries of scores.
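One plausible reading of this merge, sketched under the assumption that documents present in both dictionaries have their scores added. The helper below is hypothetical, and the project may combine scores differently (for example by taking the maximum).

```python
from typing import Dict


def merge_scores(scores1: Dict[str, float], scores2: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical sketch: union of documents, adding scores of documents found in both."""
    merged = dict(scores1)  # copy so the inputs are left untouched
    for doc_id, score in scores2.items():
        merged[doc_id] = merged.get(doc_id, 0.0) + score
    return merged


s1 = {"tt1": 1.5, "tt2": -0.5}
s2 = {"tt2": 2.0, "tt3": 0.25}
print(merge_scores(s1, s2))
```

`collections.Counter` addition is avoided on purpose: it keeps only positive totals, so documents with negative scores (such as log-likelihoods from the unigram model) would be silently dropped.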
- search(query, method, weights, safe_ranking=True, max_results=10, smoothing_method=None, alpha=0.5, lamda=0.5)
Searches for the query in the indexes.
- Parameters:
query (str) – The query to search for.
method (str ((n|l)(n|t)(n|c).(n|l)(n|t)(n|c)) | OkapiBM25 | Unigram) – The method to use for searching.
weights (dict) – The weights of the fields.
safe_ranking (bool) – If True, the search engine searches the whole index and then ranks the results. If False, it searches the tiered index.
max_results (int) – The maximum number of results to return. If None, all results are returned.
smoothing_method (str (bayes | naive | mixture)) – The method used for smoothing the probabilities in the unigram model.
alpha (float, optional) – The parameter used in the Bayesian smoothing method. Defaults to 0.5.
lamda (float, optional) – The parameter used in some smoothing methods to balance between the document probability and the collection probability. Defaults to 0.5.
- Returns:
A list of tuples containing the document IDs and their scores sorted by their scores.
- Return type: