Semantic Search Using Embeddings

Python, Django, PyTorch, Sentence Transformers, NLTK, LemmInflect, CSS, HTML, JavaScript, Beautiful Soup

This is a web app I built with Django that searches Wikipedia pages for Academy Award-winning films using both keyword and semantic search. The semantic search component generates sentence embeddings with an LLM and compares them with the query embedding using cosine similarity. I used the Sentence Transformers library with the multi-qa-mpnet-base-dot-v1 model from SBERT. The model generates embeddings for ~1300 articles in ~6 minutes. Each article is split into 512-word chunks with a 50-word overlap to preserve context, and mean pooling is applied to the embeddings using PyTorch. Semantic search often returns more relevant results than keyword search, but it can struggle with uncommon words or short queries. Hybrid search, which combines keyword and semantic scores, could improve performance and may be implemented in a future version.
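
To make the chunking, pooling, and similarity steps concrete, here is a minimal sketch in plain Python. It is not the app's actual code: the function names (chunk_words, mean_pool, cosine_similarity) are hypothetical, and it assumes that mean pooling averages the per-chunk embeddings into one vector per article, which the description above does not spell out. In the app, the vectors would come from the Sentence Transformers model and the arithmetic would run in PyTorch; plain lists are used here so the example has no dependencies.

```python
import math


def chunk_words(text, size=512, overlap=50):
    """Split text into chunks of at most `size` words.

    Each chunk after the first repeats the last `overlap` words of the
    previous chunk (assumed defaults: 512 and 50, as in the description).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def mean_pool(vectors):
    """Average equal-length vectors element-wise into a single vector."""
    if not vectors:
        raise ValueError("need at least one vector")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With these helpers, an article would be chunked, each chunk embedded by the model, the chunk vectors averaged with mean_pool, and the result compared to the query embedding with cosine_similarity. Averaging keeps one vector per article, which makes ranking simple, at the cost of blurring passages that differ a lot from the rest of the article.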

GitHub Repository