Logic.core package#
Subpackages#
- Logic.core.classification package
- Logic.core.clustering package
- Logic.core.clustering.clustering_metrics module
- Logic.core.clustering.clustering_utils module
ClusteringUtils
ClusteringUtils.cluster_hierarchical_average()
ClusteringUtils.cluster_hierarchical_complete()
ClusteringUtils.cluster_hierarchical_single()
ClusteringUtils.cluster_hierarchical_ward()
ClusteringUtils.cluster_kmeans()
ClusteringUtils.cluster_kmeans_WCSS()
ClusteringUtils.get_most_frequent_words()
ClusteringUtils.plot_kmeans_cluster_scores()
ClusteringUtils.visualize_elbow_method_wcss()
ClusteringUtils.visualize_kmeans_clustering_wandb()
ClusteringUtils.wandb_plot_hierarchical_clustering_dendrogram()
- Logic.core.clustering.dimension_reduction module
- Logic.core.clustering.main module
- Logic.core.indexer package
- Logic.core.indexer.LSH module
- Logic.core.indexer.document_lengths_index module
- Logic.core.indexer.index module
Index
Index.add_document_to_index()
Index.check_add_remove_is_correct()
Index.check_if_index_loaded_correctly()
Index.check_if_indexing_is_good()
Index.check_if_key_exists()
Index.delete_dummy_keys()
Index.get_posting_list()
Index.index_documents()
Index.index_genres()
Index.index_stars()
Index.index_summaries()
Index.load_index()
Index.remove_document_from_index()
Index.store_index()
- Logic.core.indexer.index_reader module
- Logic.core.indexer.indexes_enum module
- Logic.core.indexer.metadata_index module
- Logic.core.indexer.tiered_index module
- Logic.core.link_analysis package
- Logic.core.utility package
- Logic.core.utility.crawler module
IMDbCrawler
IMDbCrawler.crawl()
IMDbCrawler.crawl_page_info()
IMDbCrawler.extract_movie_info()
IMDbCrawler.extract_top_250()
IMDbCrawler.get_budget()
IMDbCrawler.get_countries_of_origin()
IMDbCrawler.get_director()
IMDbCrawler.get_first_page_summary()
IMDbCrawler.get_genres()
IMDbCrawler.get_gross_worldwide()
IMDbCrawler.get_id_from_URL()
IMDbCrawler.get_imdb_instance()
IMDbCrawler.get_languages()
IMDbCrawler.get_mpaa()
IMDbCrawler.get_rating()
IMDbCrawler.get_related_links()
IMDbCrawler.get_release_year()
IMDbCrawler.get_review_link()
IMDbCrawler.get_reviews_with_scores()
IMDbCrawler.get_stars()
IMDbCrawler.get_summary()
IMDbCrawler.get_summary_link()
IMDbCrawler.get_synopsis()
IMDbCrawler.get_title()
IMDbCrawler.get_writers()
IMDbCrawler.headers
IMDbCrawler.read_from_file_as_json()
IMDbCrawler.start_crawling()
IMDbCrawler.top_250_URL
IMDbCrawler.write_to_file_as_json()
main()
- Logic.core.utility.evaluation module
Evaluation
Evaluation.cacluate_DCG()
Evaluation.cacluate_MRR()
Evaluation.cacluate_NDCG()
Evaluation.cacluate_RR()
Evaluation.calculate_AP()
Evaluation.calculate_F1()
Evaluation.calculate_MAP()
Evaluation.calculate_evaluation()
Evaluation.calculate_precision()
Evaluation.calculate_recall()
Evaluation.log_evaluation()
Evaluation.print_evaluation()
- Logic.core.utility.preprocess module
- Logic.core.utility.scorer module
Scorer
Scorer.compute_score_with_unigram_model()
Scorer.compute_scores_with_unigram_model()
Scorer.compute_scores_with_vector_space_model()
Scorer.compute_socres_with_okapi_bm25()
Scorer.get_idf()
Scorer.get_list_of_documents()
Scorer.get_okapi_bm25_score()
Scorer.get_query_tfs()
Scorer.get_vector_space_model_score()
- Logic.core.utility.snippet module
- Logic.core.utility.spell_correction module
- Logic.core.word_embedding package
Logic.core.search module#
- class SearchEngine#
Bases: object
- aggregate_scores(weights, scores, final_scores)#
Aggregates the scores of the fields.
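The aggregation step could look roughly like the sketch below. The source does not confirm the data shapes, so the following are assumptions: `weights` maps field names to weights, `scores` maps field names to per-document score dictionaries, and `final_scores` is updated in place. The function name `aggregate_scores_sketch` is hypothetical and is not the package's implementation.

```python
def aggregate_scores_sketch(weights, scores, final_scores):
    """Accumulate weighted per-field scores into final_scores (in place).

    Assumed shapes (not confirmed by the source):
      weights:      {field: weight}
      scores:       {field: {doc_id: score}}
      final_scores: {doc_id: aggregated score}
    """
    for field, field_scores in scores.items():
        weight = weights.get(field, 0)
        if weight == 0:
            continue  # fields with a weight of 0 are ignored
        for doc_id, score in field_scores.items():
            final_scores[doc_id] = final_scores.get(doc_id, 0.0) + weight * score
    return final_scores


# Hypothetical example data.
weights = {"stars": 1.0, "genres": 0.0, "summaries": 2.0}
scores = {
    "stars": {"tt001": 0.5, "tt002": 0.25},
    "genres": {"tt001": 9.0},
    "summaries": {"tt002": 1.0},
}
final_scores = aggregate_scores_sketch(weights, scores, {})
```

In this example the `genres` scores do not contribute, because that field's weight is 0.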
- find_scores_with_safe_ranking(query, method, weights, scores)#
Finds the scores of the documents using the safe ranking method.
- find_scores_with_unigram_model(query, smoothing_method, weights, scores, alpha=0.5, lamda=0.5)#
Calculates the scores for each document based on the unigram model.
- Parameters:
query (str) – The query to search for.
smoothing_method (str (bayes | naive | mixture)) – The method used for smoothing the probabilities in the unigram model.
weights (dict) – A dictionary mapping each field (e.g., ‘stars’, ‘genres’, ‘summaries’) to its weight in the final score. Fields with a weight of 0 are ignored.
scores (dict) – The scores of the documents.
alpha (float, optional) – The parameter used in the Bayesian smoothing method. Defaults to 0.5.
lamda (float, optional) – The parameter used in some smoothing methods to balance between the document probability and the collection probability. Defaults to 0.5.
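To illustrate how `alpha` and `lamda` might enter the computation, here is a minimal sketch of two textbook smoothing formulas: Bayesian smoothing with a Dirichlet prior, and a linear mixture of document and collection probabilities. Whether the package uses exactly these formulas is an assumption. In particular, weighting the document probability by `lamda` (rather than the collection probability) is a guess. The `naive` method is left out because the source does not define it. All function and argument names below are hypothetical.

```python
import math


def smoothed_term_probability(tf, doc_len, cf, collection_len,
                              smoothing_method, alpha=0.5, lamda=0.5):
    """Return a smoothed P(term | document) under the textbook formulas.

    tf: term frequency in the document, doc_len: document length,
    cf: term frequency in the whole collection, collection_len: total tokens.
    """
    p_collection = cf / collection_len
    if smoothing_method == "bayes":
        # Dirichlet-prior form: the collection model acts as a prior of mass alpha.
        return (tf + alpha * p_collection) / (doc_len + alpha)
    if smoothing_method == "mixture":
        # Linear interpolation; which side lamda weights is an assumption.
        p_doc = tf / doc_len if doc_len else 0.0
        return lamda * p_doc + (1 - lamda) * p_collection
    raise ValueError(f"smoothing method not covered by this sketch: {smoothing_method}")


def query_log_likelihood(query_terms, doc_tf, doc_len, coll_tf, coll_len,
                         smoothing_method, alpha=0.5, lamda=0.5):
    """Sum log-probabilities of the query terms under the smoothed document model."""
    total = 0.0
    for term in query_terms:
        p = smoothed_term_probability(doc_tf.get(term, 0), doc_len,
                                      coll_tf.get(term, 0), coll_len,
                                      smoothing_method, alpha, lamda)
        if p == 0.0:
            return float("-inf")  # term unseen in both document and collection
        total += math.log(p)
    return total


p_mixture = smoothed_term_probability(2, 10, 5, 100, "mixture", lamda=0.5)
p_bayes = smoothed_term_probability(2, 10, 5, 100, "bayes", alpha=0.5)
```

With a term that occurs twice in a 10-token document and 5 times in a 100-token collection, the mixture estimate interpolates between 0.2 and 0.05.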
- find_scores_with_unsafe_ranking(query, method, weights, max_results, scores)#
Finds the scores of the documents using the unsafe ranking method, which searches the tiered index.
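One common way to search a tiered index is to score documents tier by tier, starting with the most important tier, and to stop as soon as enough candidates have been collected. The sketch below illustrates that general idea only. The tier layout, the `score_tier` callback, and the stopping rule are all assumptions and are not taken from the package.

```python
def tiered_search_sketch(tiers, score_tier, max_results):
    """Collect scores tier by tier until at least max_results documents are found.

    tiers:      list of tier objects, most important tier first (assumed layout)
    score_tier: callable mapping a tier to {doc_id: score} (hypothetical helper)
    """
    collected = {}
    for tier in tiers:
        for doc_id, score in score_tier(tier).items():
            # Keep the score from the first (highest) tier in which a document appears.
            collected.setdefault(doc_id, score)
        if len(collected) >= max_results:
            break  # lower tiers are never visited, hence "unsafe"
    return collected


# Hypothetical tiers that already hold per-document scores.
tiers = [{"d1": 3.0}, {"d2": 1.0, "d3": 0.5}, {"d4": 0.1}]
found = tiered_search_sketch(tiers, lambda tier: tier, max_results=2)
```

Here the third tier is skipped because the first two already yield enough documents, so a document that appears only in a lower tier can be missed.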
- merge_scores(scores1, scores2)#
Merges two dictionaries of scores.
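The source does not say how scores for a document present in both dictionaries are combined. A minimal sketch, assuming they are summed and that neither input is modified (the name `merge_scores_sketch` is hypothetical):

```python
def merge_scores_sketch(scores1, scores2):
    """Return a new dict with scores summed for document IDs present in both inputs."""
    merged = dict(scores1)  # copy so the inputs stay untouched
    for doc_id, score in scores2.items():
        merged[doc_id] = merged.get(doc_id, 0.0) + score
    return merged


first = {"a": 1.0, "b": 2.0}
second = {"b": 0.5, "c": 3.0}
merged = merge_scores_sketch(first, second)
```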
- search(query, method, weights, safe_ranking=True, max_results=10, smoothing_method=None, alpha=0.5, lamda=0.5)#
Searches for the query in the indexes.
- Parameters:
query (str) – The query to search for.
method (str ((n|l)(n|t)(n|c).(n|l)(n|t)(n|c)) | OkapiBM25 | Unigram) – The method to use for searching.
weights (dict) – The weights of the fields.
safe_ranking (bool) – If True, the search engine searches the whole index and then ranks the results. If False, it searches the tiered index.
max_results (int) – The maximum number of results to return. If None, all results are returned.
smoothing_method (str (bayes | naive | mixture)) – The method used for smoothing the probabilities in the unigram model.
alpha (float, optional) – The parameter used in the Bayesian smoothing method. Defaults to 0.5.
lamda (float, optional) – The parameter used in some smoothing methods to balance between the document probability and the collection probability. Defaults to 0.5.
- Returns:
A list of (document ID, score) tuples, sorted by score.
- Return type: