Retrievers
A retriever is an interface that returns documents given an unstructured query.
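As a minimal sketch of that contract, assuming only the standard library: the `Document` dataclass, the `Retriever` protocol, and the `KeywordRetriever` class below are hypothetical illustrations, not LangChain's own classes. The `invoke(query)` method name follows the convention LangChain retrievers use.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Hypothetical minimal document: text plus free-form metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """Anything that maps an unstructured query to a list of documents."""

    def invoke(self, query: str) -> list[Document]: ...


class KeywordRetriever:
    """Toy retriever: returns documents sharing at least one word with the query."""

    def __init__(self, docs: list[Document], k: int = 4) -> None:
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        terms = set(query.lower().split())
        # Count overlapping words per document; remember the original position for ties.
        scored = [
            (len(terms & set(d.page_content.lower().split())), i, d)
            for i, d in enumerate(self.docs)
        ]
        scored = [s for s in scored if s[0] > 0]
        # Most overlap first, then original order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [d for _, _, d in scored[: self.k]]
```

Any object with a matching `invoke` method satisfies the protocol, so a vector-store lookup or a web-search call could be swapped in without changing the calling code.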
Activeloop Deep Memory
Activeloop Deep Memory is a suite of tools that enables you to optimize your vector store for your use case and achieve higher accuracy in your LLM apps.
Amazon Kendra
Amazon Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.
Arcee
Arcee helps with the development of SLMs (small, specialized, secure, and scalable language models).
Arxiv
arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.
AskNews
AskNews infuses any LLM with the latest global news (or historical news), using a single natural language query. Specifically, AskNews enriches over 300k articles per day by translating, summarizing, extracting entities, and indexing them into hot and cold vector databases. AskNews puts these vector databases on a low-latency endpoint for you. When you query AskNews, you get back a prompt-optimized string that contains all the most pertinent enrichments (e.g. entities, classifications, translation, summarization). This means that you do not need to manage your own news RAG, and you do not need to worry about how to properly convey news information in a condensed way to your LLM.
Azure AI Search
Azure AI Search (formerly known as Azure Cognitive Search) is a Microsoft cloud search service that gives developers infrastructure, APIs, and tools for information retrieval of vector, keyword, and hybrid queries at scale.
Bedrock (Knowledge Bases)
Knowledge Bases for Amazon Bedrock is an Amazon Web Services (AWS) offering that lets you quickly build RAG applications by using your private data to customize foundation model (FM) responses.
BM25
BM25 (Wikipedia), also known as Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.
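The Okapi BM25 score sums, over the query terms, an inverse-document-frequency weight times a term frequency that saturates (controlled by k1) and is normalized by document length (controlled by b). The pure-Python sketch below assumes whitespace tokenization, the typical values k1 = 1.5 and b = 0.75, and the non-negative IDF variant; `bm25_scores` is a hypothetical helper, not a LangChain API.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF variant with +1 inside the log, which keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Because of the saturation term, a document that repeats a query word twice scores higher than one that mentions it once, but not twice as high.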
BREEBS (Open Knowledge)
BREEBS is an open collaborative knowledge platform.
Chaindesk
The Chaindesk platform brings data from anywhere (Datasources: Text, PDF, Word, PowerPoint, Excel, Notion, Airtable, Google Sheets, etc.) into Datastores (containers of multiple Datasources).
ChatGPT plugin
OpenAI plugins connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.
Cohere reranker
Cohere is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.
Cohere RAG
Cohere is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.
DocArray
DocArray is a versatile, open-source tool for managing your multi-modal data. It lets you shape your data however you want and offers the flexibility to store and search it using various document index backends. Plus, you can use your DocArray document index to create a DocArrayRetriever and build awesome LangChain apps!
Dria
Dria is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the Dria API for data retrieval tasks.
ElasticSearch BM25
Elasticsearch is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It supports keyword search, vector search, hybrid search and complex filtering.
Embedchain
Embedchain is a RAG framework to create data pipelines. It loads, indexes, retrieves and syncs all the data.
FlashRank reranker
FlashRank is an ultra-lite, super-fast Python library for adding re-ranking to your existing search and retrieval pipelines. It is based on state-of-the-art (SoTA) cross-encoders, with gratitude to all the model owners.
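The re-ranking step itself can be sketched independently of any library: a first-stage retriever returns candidates, and a second model scores each (query, passage) pair jointly before the list is re-sorted. In the sketch below, `overlap_score` is a hypothetical stand-in for a cross-encoder, and neither function is FlashRank's API.

```python
from typing import Callable


def rerank(
    query: str,
    passages: list[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Re-order first-stage results by a (query, passage) relevance score."""
    scored = [(p, score(query, p)) for p in passages]
    # Python's sort is stable even with reverse=True, so ties keep first-stage order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def overlap_score(query: str, passage: str) -> float:
    """Hypothetical cross-encoder stand-in: fraction of query words present in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / len(q) if q else 0.0
```

Keeping the scorer as a parameter means a real cross-encoder could be plugged in without touching the ranking logic.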
Fleet AI Context
Fleet AI Context is a dataset of high-quality embeddings of the 1,200 most popular and permissively licensed Python libraries and their documentation.
Google Drive
This notebook covers how to retrieve documents from Google Drive.
Google Vertex AI Search
Google Vertex AI Search (formerly known as Enterprise Search on Generative AI App Builder) is a part of the Vertex AI machine learning platform offered by Google Cloud.
JaguarDB Vector Database
JaguarDB Vector Database: http://www.jaguardb.com/windex.html
Kay.ai
Kay Data API, built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.
Kinetica Vectorstore based Retriever
Kinetica is a database with integrated support for vector similarity search.
kNN
In statistics, the k-nearest neighbours algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.
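Used as a retriever, k-NN simply returns the k stored embeddings closest to the query embedding. The brute-force sketch below assumes cosine similarity as the closeness measure and uses tiny hand-written vectors in place of real embeddings; `knn` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k document vectors most similar to the query (exact search)."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

Exact search compares the query with every stored vector, which is fine for small collections; large ones usually switch to approximate indexes.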
LLMLingua Document Compressor
LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
LOTR (Merger Retriever)
Lord of the Retrievers (LOTR), also known as MergerRetriever, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged result is a list of documents that are relevant to the query and that have been ranked by the different retrievers.
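One way to merge ranked lists while respecting each retriever's ranking is round-robin interleaving: the first hit of every retriever, then every second hit, and so on. The sketch below assumes that strategy and models retrievers as plain callables; `merge_results` is hypothetical, so check MergerRetriever's source for its exact ordering.

```python
from typing import Callable

# Hypothetical retriever shape: query in, ranked list of document texts out.
RetrieverFn = Callable[[str], list[str]]


def merge_results(query: str, retrievers: list[RetrieverFn]) -> list[str]:
    """Interleave results round-robin by rank across all retrievers."""
    results = [retrieve(query) for retrieve in retrievers]
    merged: list[str] = []
    for i in range(max((len(r) for r in results), default=0)):
        for r in results:
            # Shorter result lists simply run out early.
            if i < len(r):
                merged.append(r[i])
    return merged
```

Duplicates are kept as-is, so pairing the merge with a de-duplication or redundancy filter may be useful when retrievers overlap.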
Metal
Metal is a managed service for ML Embeddings.
Milvus Hybrid Search
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.
NanoPQ (Product Quantization)
Product Quantization (PQ) is a quantization algorithm that compresses database vectors, which helps with approximate k-NN semantic search when large datasets are involved. In a nutshell, each embedding is split into M subspaces, and the sub-vectors in each subspace are clustered separately. After clustering, each sub-vector is represented by the centroid of the cluster it belongs to, so a full vector can be stored as M centroid IDs.
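The compression described above can be sketched in pure Python: learn one k-means codebook per subspace, store each vector as M centroid indices, and reconstruct an approximation by concatenating the chosen centroids. This is a toy illustration, not NanoPQ's API; `train_pq`, `encode`, `decode`, and their parameters are hypothetical, and the vector dimension is assumed to be divisible by M.

```python
import random


def _sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans(points: list[list[float]], k: int, iters: int = 20, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: list[list[list[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_pq(vectors: list[list[float]], m: int, k: int) -> list[list[list[float]]]:
    """Split each vector into m sub-vectors and learn k centroids per subspace."""
    sub = len(vectors[0]) // m
    return [_kmeans([v[j * sub:(j + 1) * sub] for v in vectors], k) for j in range(m)]


def encode(vec: list[float], codebooks: list[list[list[float]]]) -> list[int]:
    """Replace each sub-vector by the index of its nearest centroid."""
    sub = len(vec) // len(codebooks)
    return [
        min(range(len(cb)), key=lambda c: _sq_dist(vec[j * sub:(j + 1) * sub], cb[c]))
        for j, cb in enumerate(codebooks)
    ]


def decode(codes: list[int], codebooks: list[list[list[float]]]) -> list[float]:
    """Approximate reconstruction: concatenate the selected centroids."""
    return [x for j, c in enumerate(codes) for x in codebooks[j][c]]
```

With M subspaces and k centroids each, a vector shrinks to M small integers, and distances can be approximated from the centroids instead of the original floats.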
Outline
Outline is an open-source collaborative knowledge base platform designed for team information sharing.
Pinecone Hybrid Search
Pinecone is a vector database with broad functionality.
PubMed
PubMed® by the National Center for Biotechnology Information, National Library of Medicine, comprises more than 35 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher websites.
Qdrant Sparse Vector
Qdrant is an open-source, high-performance vector search engine/database.
RAGatouille
RAGatouille makes it as simple as can be to use ColBERT!
RePhraseQuery
RePhraseQuery is a simple retriever that applies an LLM between the user input and the query passed by the retriever.
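The idea reduces to two calls: ask an LLM to rewrite the raw user input as a clean search query, then hand that query to an ordinary retriever. In the sketch below, the prompt wording and the `llm` and `retrieve` callables are hypothetical placeholders, not RePhraseQuery's built-in prompt or API.

```python
from typing import Callable


def rephrase_then_retrieve(
    user_input: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Turn free-form user input into a search query with an LLM, then retrieve with it."""
    # Hypothetical prompt; a real setup would tune this wording.
    prompt = (
        "Rewrite the following user input as a concise search query, "
        "dropping anything irrelevant to the search:\n" + user_input
    )
    # Strip whitespace the model may add around its answer.
    query = llm(prompt).strip()
    return retrieve(query)
```

Passing the LLM and retriever in as callables keeps the sketch testable with fakes, as greetings and filler in the user input never reach the retriever.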
Rememberizer
Rememberizer is a knowledge enhancement service for AI applications created by SkyDeck AI Inc.
SEC filing
An SEC filing is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular SEC filings. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.
Self-querying retrievers
This category contains 21 items.
SingleStoreDB
SingleStoreDB is a high-performance distributed SQL database that supports deployment both in the cloud and on-premises. It provides vector storage and vector functions, including dot_product and euclidean_distance, thereby supporting AI applications that require text similarity matching.
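The two vector functions named above compute standard similarity measures. The Python rendering below is only an illustration of the math, not SingleStoreDB code (in SQL they are invoked as DOT_PRODUCT and EUCLIDEAN_DISTANCE); the helper names are chosen here for readability.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; larger means more similar for normalized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For unit-length embeddings the two rank results identically, because the squared distance equals 2 minus twice the dot product.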
SVM
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.
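One way an SVM can act as a retriever, assumed here, is to train a linear classifier with the query embedding as the only positive example and every document embedding as a negative, then rank documents by the classifier's decision value. The stdlib-only sketch below trains with plain subgradient descent; `svm_rank` and its hyperparameters are hypothetical, and a real implementation would more likely use a library solver such as scikit-learn's `LinearSVC`.

```python
def svm_rank(
    query_vec: list[float],
    doc_vecs: list[list[float]],
    epochs: int = 200,
    lr: float = 0.1,
    lam: float = 0.01,
) -> list[int]:
    """Rank documents with a linear SVM trained on {query: +1, every document: -1}.

    Uses constant-step subgradient descent on the L2-regularized,
    class-weighted hinge loss, so the fitted weights are only approximate.
    """
    xs = [query_vec] + doc_vecs
    ys = [1.0] + [-1.0] * len(doc_vecs)
    # Up-weight the single positive so it balances the many negatives.
    cs = [float(len(doc_vecs))] + [1.0] * len(doc_vecs)
    w = [0.0] * len(query_vec)
    b = 0.0
    for _ in range(epochs):
        for x, y, c in zip(xs, ys, cs):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Hinge term is active: step along its subgradient plus the L2 term.
                w = [wj - lr * (lam * wj - c * y * xj) for wj, xj in zip(w, x)]
                b += lr * c * y
            else:
                # Only the L2 regularizer contributes: shrink the weights.
                w = [wj * (1 - lr * lam) for wj in w]
    # Higher decision value means more "query-like", i.e. more relevant.
    scores = [sum(wj * xj for wj, xj in zip(w, d)) + b for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

Unlike plain k-NN, the learned weights emphasize the dimensions that best distinguish the query from the rest of the collection.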
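TavilySearchAPI
Tavily's Search API is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.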
TF-IDF
TF-IDF means term-frequency times inverse document-frequency.
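Concretely, each term in a document is weighted by how often it appears there times how rare it is across the corpus, and documents are ranked by cosine similarity to the query's weighted vector. The sketch below assumes whitespace tokenization and the smoothed IDF form that scikit-learn's `TfidfVectorizer` uses by default; `tfidf_search` is a hypothetical helper.

```python
import math
from collections import Counter


def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tfidf_search(query: str, corpus: list[str], k: int = 2) -> list[int]:
    """Return indices of the k documents whose TF-IDF vectors best match the query."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    # Document frequency of each term across the corpus.
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    # Term frequency times IDF; query terms unseen in the corpus are dropped.
    doc_vecs = [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs]
    q_vec = {t: tf * idf[t] for t, tf in Counter(query.lower().split()).items() if t in idf}
    ranked = sorted(range(n), key=lambda i: _cosine(q_vec, doc_vecs[i]), reverse=True)
    return ranked[:k]
```

Common words such as "the" get a low IDF, so they contribute little to the match even when they appear often.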
NeuralDB
NeuralDB is a CPU-friendly and fine-tunable retrieval engine developed by ThirdAI.
Vespa
Vespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query.
Weaviate Hybrid Search
Weaviate is an open-source vector database.
WikipediaRetriever
WikipediaRetriever retrieves pages from wikipedia.org and returns them as documents.
You.com
The You.com API is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, and most relevant information, which may not have been included in their training datasets.
Zep Cloud
Retriever Example for Zep Cloud
Zep Open Source
Retriever Example for Zep
Zilliz Cloud Pipeline
Zilliz Cloud Pipelines transform your unstructured data to a searchable vector collection, chaining up the embedding, ingestion, search, and deletion of your data.