WikipediaRetriever
Overview
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.
This notebook shows how to retrieve wiki pages from wikipedia.org into the Document format that is used downstream.
Integration details
| Retriever | Namespace | Native async | Local |
|---|---|---|---|
| WikipediaRetriever | langchain_community.retrievers | ❌ | ❌ |
Setup
If you want automated tracing of individual queries, you can also set your LangSmith API key by uncommenting the lines below:
# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
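If you would rather not edit commented-out lines each time, a small helper can prompt for a variable only when it is not already set. This is a minimal sketch; the helper name set_env_if_missing is hypothetical and not part of LangChain or LangSmith.

```python
import getpass
import os
from typing import Optional


def set_env_if_missing(name: str, prompt: Optional[str] = None) -> str:
    """Prompt for an environment variable only when it is not already set.

    Hypothetical notebook helper; it is not part of LangChain or LangSmith.
    """
    if not os.environ.get(name):
        # getpass does not echo the typed value, which suits API keys
        os.environ[name] = getpass.getpass(prompt or f"Enter {name}: ")
    return os.environ[name]


# Example: only prompts if LANGSMITH_API_KEY is not already in the environment
# set_env_if_missing("LANGSMITH_API_KEY", "Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
```

Because the key is read from the environment first, rerunning the cell does not prompt again.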
Installation
The integration lives in the langchain-community package. We also need to install the wikipedia Python package itself.
%pip install -qU langchain_community wikipedia
Instantiation
Now we can instantiate our retriever:
WikipediaRetriever has these arguments, all optional:
- lang: default="en". Use it to search a specific language edition of Wikipedia.
- top_k_results: default=3. Use it to limit the number of documents returned per query. Downloading pages takes time, so keep this number small for experiments.
- doc_content_chars_max: default=4000. Each document's page_content is truncated to this many characters.
- load_all_available_meta: default=False. By default only the most important metadata fields are returned: title, summary and source (the page URL). If True, additional fields are downloaded as well.

invoke() (which replaces the deprecated get_relevant_documents()) takes one argument, query: free text used to find documents in Wikipedia. Queries longer than 300 characters are truncated.
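In the underlying WikipediaAPIWrapper, the number of hits is governed by top_k_results (default 3), the body length by doc_content_chars_max (default 4000), and the extra metadata by load_all_available_meta. The offline sketch below imitates that shaping step so the effect of each setting is easy to see. It is a simplification, not the library's implementation: FakePage and to_records are hypothetical stand-ins, and "categories" is used as one example of an extra field.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakePage:
    """Hypothetical stand-in for a fetched Wikipedia page (no network needed)."""

    title: str
    summary: str
    url: str
    content: str
    categories: List[str] = field(default_factory=list)


def to_records(
    pages: List[FakePage],
    top_k_results: int = 3,
    doc_content_chars_max: int = 4000,
    load_all_available_meta: bool = False,
) -> List[Dict[str, Any]]:
    """Simplified imitation of how search hits become Document-like records."""
    records = []
    # Only the first top_k_results hits are kept
    for page in pages[:top_k_results]:
        # Default metadata: the same three keys seen in the retriever output
        metadata: Dict[str, Any] = {
            "title": page.title,
            "summary": page.summary,
            "source": page.url,
        }
        if load_all_available_meta:
            # Extra fields are only added on request (categories as one example)
            metadata["categories"] = page.categories
        records.append(
            {
                # The body is cut to at most doc_content_chars_max characters
                "page_content": page.content[:doc_content_chars_max],
                "metadata": metadata,
            }
        )
    return records


pages = [
    FakePage(f"Page {i}", "A short summary.", f"https://en.wikipedia.org/wiki/Page_{i}", "x" * 5000)
    for i in range(5)
]
records = to_records(pages)
print(len(records), len(records[0]["page_content"]), sorted(records[0]["metadata"]))
```

With the defaults, five hits shrink to three records, each body is capped at 4000 characters, and only title, summary and source appear in the metadata.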
Usage
from langchain_community.retrievers import WikipediaRetriever
retriever = WikipediaRetriever()
retriever.invoke("TOKYO GHOUL")
[Document(metadata={'title': 'Tokyo Ghoul', 'summary': "Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese dark fantasy manga series written and illustrated by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump from September 2011 to September 2014, with its chapters collected in 14 tankōbon volumes. The story is set in an alternate version of Tokyo where humans coexist with ghouls, beings who look like humans but can only survive by eating human flesh. Ken Kaneki is a college student who is transformed into a half-ghoul after an encounter with one of them. He must navigate the complex social and political dynamics between humans and ghouls while struggling to maintain his humanity.\nA prequel, titled Tokyo Ghoul [Jack], ran online on Jump Live in 2013, with its chapters collected in a single tankōbon volume. A sequel, titled Tokyo Ghoul:re, was serialized in Weekly Young Jump from October 2014 to July 2018, its chapters were collected in 16 tankōbon volumes.\nA 12-episode anime television series adaptation produced by Pierrot, aired on Tokyo MX from July to September 2014. A 12-episode second season, titled Tokyo Ghoul √A (pronounced Tokyo Ghoul Root A), which follows an original story, aired from January to March 2015. A live-action film based on the manga was released in Japan in July 2017, with a sequel being released in July 2019. An anime adaptation based on the sequel manga, Tokyo Ghoul:re, aired for two seasons; the first from April to June 2018, and the second from October to December 2018. 
In North America, Viz Media licensed the manga for an English release, while Funimation licensed the anime series for streaming and home video distribution.\nBy January 2021, Tokyo Ghoul had over 47 million copies in circulation worldwide, making it one of the best-selling manga series of all time.", 'source': 'https://en.wikipedia.org/wiki/Tokyo_Ghoul'}, page_content='Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese dark fantasy manga series written and illustrated by Sui Ishida. It was serialized in Shueisha\'s seinen manga magazine Weekly Young Jump from September 2011 to September 2014, with its chapters collected in 14 tankōbon volumes. The story is set in an alternate version of Tokyo where humans coexist with ghouls, beings who look like humans but can only survive by eating human flesh. Ken Kaneki is a college student who is transformed into a half-ghoul after an encounter with one of them. He must navigate the complex social and political dynamics between humans and ghouls while struggling to maintain his humanity.\nA prequel, titled Tokyo Ghoul [Jack], ran online on Jump Live in 2013, with its chapters collected in a single tankōbon volume. A sequel, titled Tokyo Ghoul:re, was serialized in Weekly Young Jump from October 2014 to July 2018, its chapters were collected in 16 tankōbon volumes.\nA 12-episode anime television series adaptation produced by Pierrot, aired on Tokyo MX from July to September 2014. A 12-episode second season, titled Tokyo Ghoul √A (pronounced Tokyo Ghoul Root A), which follows an original story, aired from January to March 2015. A live-action film based on the manga was released in Japan in July 2017, with a sequel being released in July 2019. An anime adaptation based on the sequel manga, Tokyo Ghoul:re, aired for two seasons; the first from April to June 2018, and the second from October to December 2018. 
In North America, Viz Media licensed the manga for an English release, while Funimation licensed the anime series for streaming and home video distribution.\nBy January 2021, Tokyo Ghoul had over 47 million copies in circulation worldwide, making it one of the best-selling manga series of all time.\n\n\n== Synopsis ==\n\n\n=== Setting ===\nTokyo Ghoul is set in an alternate reality where ghouls, creatures that look like normal people but can only survive by eating human flesh, live among the human population in secrecy, hiding their true nature in order to evade pursuit from the authorities. Ghouls have powers including enhanced strength, speed, endurance and regenerative abilities—a regular ghoul produces 4–7 times more kinetic energy in their muscles than a normal human; they also have several times the RC cells, a cell that flows like blood and can become solid instantly. A ghoul\'s skin is resistant to ordinary piercing weapons, and it has at least one special predatory organ called a Kagune (赫子), which it can manifest and use as a weapon during combat. Another distinctive trait of ghouls is that when they are excited or hungry, the color of their sclera in both eyes turns black and their irises red. This mutation is known as kakugan (赫眼, "red eye").\nA half-ghoul can either be born naturally as a ghoul and a human\'s offspring, or artificially created by transplanting some ghoul organs into a human. In both cases, a half-ghoul is usually much stronger than a pure-blood ghoul. In the case of a half-ghoul, only one of the eyes undergoes the "red eye" transformation. Natural born half-ghouls are very rare, and creating half-ghouls artificially initially has a low success rate. There is also the case of half-humans, hybrids of ghouls and humans that can feed like normal humans and lack a Kagune while possessing enhanced abilities, like increased reaction speeds, but shortened lifespans. 
Naturally born half-ghouls can also eat like normal humans or full ghouls.\n\n\n=== Plot ===\n\nThe story follows 17-year-old Ken Kaneki, a student who barely survives a deadly encounter with Rize Kamishiro (his date who reveals herself as a ghoul and tries to eat him) when she gets hit by falling construction girders. He is taken to the hospital in critical condition. After recovering, Kaneki discovers that he underwent a surgery that transformed him into a half-ghoul. This was accomplished because some of Rize\'s organs were transferred into his body, and now, like'),
Document(metadata={'title': 'List of Tokyo Ghoul characters', 'summary': 'The following article is a list of characters from the manga series Tokyo Ghoul.', 'source': 'https://en.wikipedia.org/wiki/List_of_Tokyo_Ghoul_characters'}, page_content="The following article is a list of characters from the manga series Tokyo Ghoul.\n\n\n== Main characters ==\nKen Kaneki (金木 研, Kaneki Ken)\nVoiced by: Natsuki Hanae (Japanese); Austin Tindle (English)\nPlayed by: Masataka Kubota\nThe main protagonist of the story, Ken Kaneki (金木 研, Kaneki Ken) is a seventeen-year-old black haired university freshman that receives an organ transplant from Rize, who was trying to kill him before she was struck by a fallen I-beam and seemingly killed. After the operation Kaneki develops ghoul-like tendencies and characteristics, and his rationality begins to wane. As one that now doesn't belong to humans or ghouls he struggles to keep his ghoul identity secret, always fighting against his ghoul side while trying to continue to live like a normal human. He later works as a waiter for Anteiku under Yoshimura's guidance. After his fight with a CCG investigator named Amon he gains the name Eye Patch (眼帯) because of his mask's design and becomes somewhat famous after a ghoul saw him defeating the investigator. He loves to read and is normally quiet and reserved but can also be calculating when fighting. He has a bad trait of easily trusting strangers which sometimes puts him in life-threatening situations. After being kidnapped by the ghoul run organization known as Aogiri Tree, he is mercilessly tortured by a sadistic ghoul named Yakumo Ōmori (Yamori), and later develops similar traits to his torturer. While being tortured he has hallucinations of Rize in which she mocks him about his mother, leading to him finally embracing his inner ghoul. In the manga at chapter 61, Kaneki's hair slowly changes color from black to white. 
In the anime, Tokyo ghoul, his hair changes instantly after eating Rize and accepting his ghoul side. He then goes on and eats his torturer, Yakumo Ōmori. His view on strength changes and he goes on a power hungry path by cannibalizing other ghouls in order to get stronger. The continued cannibalization led him to become a half kakuja, where he develops a centipede shaped kagune and gains the alias Centipede (百足). After the incident at Kanō's Lab, Kaneki begins to regret this path and begins reflecting on his actions and motives.\nAfter being defeated and captured by Arima, Ken loses his memories and is given the new identity of Haise Sasaki (佐々木 琲世, Sasaki Haise), member and ace of the Mado Squad, and mentor of the Quinx Squad, a special unit composed of artificial ghouls. Despite having a new identity, Haise still retains some traits from his former self, like the love for reading and the determination to protect his companions with his life. When pressured to the limit, Haise has glimpses of his former self and unlocks his powers as a ghoul, forcing the CCG to strike him with RC suppressors to calm him down. Haise usually has an internal conflict with his past self, fearing that one day he would lose his current life with all his new friends and personality, which intensifies as he obtains more information about his former life until the raid at the Tsukiyama Headquarters, when he finally regains his memories and accepts his identity as Kaneki, not only showing a ruthless demeanor when facing his enemies, but a cold attitude toward his current and former allies as well. 
Haise's achievements allow him to quickly climb among the CCG's ranks, being promoted to First Class after the raid at the human auction and receiving a special promotion to Associate Special Class after single-handedly driving away the One-Eyed Owl during the raid at the Tsukiyama Headquarters, becoming known as the Black Reaper (黒の死神, Kuro no Shinigami), until he betrays the CCG to help Tōka rescue Hinami from the Cochlea, releasing the other ghouls imprisoned there in the occasion. After Kaneki defeats Arima, he claims for himself the title of One-Eyed King (隻眼の王, Sekigan no Ō), who according to Eto, has the potential to break the current status quo between ghouls and humans. For that purpose, he assembles the survi"),
Document(metadata={'title': 'Tokyo Ghoul √A', 'summary': 'The second season of the Tokyo Ghoul anime television series, titled Tokyo Ghoul √A, is produced by Pierrot, and directed by Shuhei Morita. The season aired from January to March 2015 on Tokyo MX, TVO, TVA, TVQ, MRO, BS Dlife and AT-X. \nThe season roughly adapts the second half of the Tokyo Ghoul manga, although, √A does not directly adapt everything from the manga. Rather, it mixes in the manga\'s content with an anime original story composition credited towards the author Sui Ishida. The season follows Ken Kaneki after he joins Aogiri Tree, as the group begins their battle against the CCG, who are trying to exterminate the ghoul organization.\nThe music is composed by Yutaka Yamada, who also produced the score for the first season. The opening theme for the season is "Munou" (無能, Munō, transl. "Incompetence") by österreich, and the ending theme is "Kisetsu wa Tsugitsugi Shinde Iku" (季節は次々死んでいく, transl. "The Seasons Die Out, One After Another") by amazarashi.\nTC Entertainment released the series in Japan onto six volumes from March 27 to August 28, 2015. A complete set containing all twelve episodes was later released on September 30, 2016.\nThe series is licensed by Crunchyroll, which produced an English dub as it aired, and released the series on home video on May 24, 2016. Madman Entertainment licensed the series in Australia and New Zealand, simulcasted the series on AnimeLab, and released the series on home video on July 6, 2016. Anime Limited licensed the series in the United Kingdom and Ireland, who simulcasted the series on Wakamin, and released the series on home video on June 13, 2016. 
The season ran on Adult Swim\'s Toonami programming block in the United States from July to October 2017.', 'source': 'https://en.wikipedia.org/wiki/Tokyo_Ghoul_%E2%88%9AA'}, page_content='The second season of the Tokyo Ghoul anime television series, titled Tokyo Ghoul √A, is produced by Pierrot, and directed by Shuhei Morita. The season aired from January to March 2015 on Tokyo MX, TVO, TVA, TVQ, MRO, BS Dlife and AT-X. \nThe season roughly adapts the second half of the Tokyo Ghoul manga, although, √A does not directly adapt everything from the manga. Rather, it mixes in the manga\'s content with an anime original story composition credited towards the author Sui Ishida. The season follows Ken Kaneki after he joins Aogiri Tree, as the group begins their battle against the CCG, who are trying to exterminate the ghoul organization.\nThe music is composed by Yutaka Yamada, who also produced the score for the first season. The opening theme for the season is "Munou" (無能, Munō, transl. "Incompetence") by österreich, and the ending theme is "Kisetsu wa Tsugitsugi Shinde Iku" (季節は次々死んでいく, transl. "The Seasons Die Out, One After Another") by amazarashi.\nTC Entertainment released the series in Japan onto six volumes from March 27 to August 28, 2015. A complete set containing all twelve episodes was later released on September 30, 2016.\nThe series is licensed by Crunchyroll, which produced an English dub as it aired, and released the series on home video on May 24, 2016. Madman Entertainment licensed the series in Australia and New Zealand, simulcasted the series on AnimeLab, and released the series on home video on July 6, 2016. Anime Limited licensed the series in the United Kingdom and Ireland, who simulcasted the series on Wakamin, and released the series on home video on June 13, 2016. 
The season ran on Adult Swim\'s Toonami programming block in the United States from July to October 2017.\n\n\n== Episodes ==\n\n\n== Home media release ==\n\n\n=== Japanese ===\n\n\n=== English ===\n\n\n== Notes ==\n\n\n== References ==\n\n\n== External links ==\nTokyo Ghoul official anime website (in Japanese)\nTokyo Ghoul √A (anime) at Anime News Network\'s encyclopedia')]
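The call returns a list of Document objects, each carrying page_content and a metadata dict with title, summary and source keys, as in the output above. A common next step is to list the hits by title and URL. The sketch below uses a hypothetical stand-in class, SimpleDoc, with the same two attributes so it runs offline; with the real retriever you would pass the list returned by retriever.invoke(...) to list_sources (also hypothetical) instead.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SimpleDoc:
    """Hypothetical stand-in exposing the same attributes as a LangChain Document."""

    page_content: str
    metadata: Dict[str, Any]


def list_sources(docs: List[Any]) -> List[str]:
    """Return one 'title <source>' line per retrieved document."""
    return [f"{d.metadata['title']} <{d.metadata['source']}>" for d in docs]


docs = [
    SimpleDoc("...", {"title": "Tokyo Ghoul", "source": "https://en.wikipedia.org/wiki/Tokyo_Ghoul"}),
    SimpleDoc("...", {"title": "Tokyo Ghoul √A", "source": "https://en.wikipedia.org/wiki/Tokyo_Ghoul_%E2%88%9AA"}),
]
for line in list_sources(docs):
    print(line)
```

Since list_sources only reads page_content-style attributes and metadata keys, it works unchanged on real Document objects.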
Use within a chain
We can easily combine this retriever into a chain.
# Load OPENAI_API_KEY (used by ChatOpenAI) from a local .env file; requires python-dotenv
from dotenv import load_dotenv
load_dotenv()
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_template(
"""
Answer the question based only on the context provided.
Context: {context}
Question: {question}
"""
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
def format_docs(docs):
    # Concatenate the retrieved page contents into one context string
    return "\n\n".join(doc.page_content for doc in docs)
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
chain.invoke("Who is the main character in `Tokyo Ghoul` and how he transforms into a ghoul?")
'The main character in Tokyo Ghoul is Ken Kaneki. He transforms into a half-ghoul after undergoing surgery that involved receiving some ghoul organs from Rize, a ghoul who was trying to kill him.'
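format_docs joins page contents with blank lines, so the model sees no boundary labels between articles. A possible variant, shown here as a hypothetical alternative rather than part of the original recipe, prefixes each chunk with a numbered title. The stand-in documents built with SimpleNamespace only mimic Document's page_content and metadata attributes.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def format_docs(docs: Iterable[Any]) -> str:
    """Join page contents with blank lines, as in the chain above."""
    return "\n\n".join(doc.page_content for doc in docs)


def format_docs_with_titles(docs: Iterable[Any]) -> str:
    """Hypothetical variant: prefix each chunk with a numbered title."""
    return "\n\n".join(
        f"[{i}] {doc.metadata.get('title', 'untitled')}\n{doc.page_content}"
        for i, doc in enumerate(docs, start=1)
    )


docs = [
    SimpleNamespace(page_content="Tokyo Ghoul is a manga series.", metadata={"title": "Tokyo Ghoul"}),
    SimpleNamespace(page_content="The second season of the anime.", metadata={"title": "Tokyo Ghoul √A"}),
]
print(format_docs_with_titles(docs))
```

Because it is a plain function with the same input, it can take the place of format_docs in the chain, e.g. retriever | format_docs_with_titles.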
Question Answering on facts
# get a token: https://platform.openai.com/account/api-keys
# from getpass import getpass
# OPENAI_API_KEY = getpass()
# import os
# os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-3.5-turbo")  # or another chat model, e.g. "gpt-4"
retriever = WikipediaRetriever()
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
questions = [
"What is Apify?",
"What is Uncertainty principle?",
"What is the Abhayagiri Vihāra?",
# "How big is Wikipédia en français?",
]
chat_history = []
for question in questions:
result = qa.invoke({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(f"-> **Question**: {question} \n")
print(f"**Answer**: {result['answer']} \n")
-> **Question**: What is Apify?
**Answer**: Apify is a web scraping and automation platform that provides tools for extracting data from websites, automating workflows, and creating APIs for web data access. It allows users to easily create web scraping tasks, schedule them, and manage the extracted data. Apify also offers a marketplace where users can find pre-built web scraping actors for various websites and use them to extract data without needing to write custom code.
-> **Question**: What is Uncertainty principle?
**Answer**: The uncertainty principle, also known as Heisenberg's indeterminacy principle, is a fundamental concept in quantum mechanics. It states that there is a limit to the precision with which certain pairs of physical properties, such as position and momentum, can be simultaneously known. In other words, the more accurately one property is measured, the less accurately the other property can be known. This principle was introduced in 1927 by German physicist Werner Heisenberg and is mathematically expressed in terms of the standard deviations of position and momentum.
-> **Question**: What is the Abhayagiri Vihāra?
**Answer**: Abhayagiri Vihāra was a major monastery site in Anuradhapura, Sri Lanka, that was a significant religious institution for Theravada, Mahayana, and Vajrayana Buddhism. It was one of the most sacred Buddhist pilgrimage cities in the nation and was a large monastic center with magnificent structures and religious units. Abhayagiri Vihāra included the Abhayagiri Dagaba, which was an ancient stupa and a focal point of the complex. Additionally, it was a seat of the Northern Monastery and held the original custodian of the Tooth relic in Sri Lanka. The term "Abhayagiri Vihāra" refers not only to the complex of monastic buildings but also to a fraternity of Buddhist monks who maintained their own historical records and traditions. Founded in the 2nd century BC, it grew into an international institution by the 1st century AD and attracted scholars from distant locations.
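Each turn appends a (question, answer) tuple to chat_history, so the history sent with every call grows without bound. One possible mitigation, shown as a hypothetical helper rather than a feature of ConversationalRetrievalChain, is to keep only the most recent turns in a bounded collections.deque and pass a list copy to the chain.

```python
from collections import deque
from typing import Deque, List, Tuple


def make_history(max_turns: int = 3) -> Deque[Tuple[str, str]]:
    """Return a history buffer that silently drops the oldest turns."""
    return deque(maxlen=max_turns)


history = make_history(max_turns=2)
for q, a in [("Q1", "A1"), ("Q2", "A2"), ("Q3", "A3")]:
    history.append((q, a))

# The chain is given a plain list of (question, answer) tuples
chat_history: List[Tuple[str, str]] = list(history)
print(chat_history)  # [('Q2', 'A2'), ('Q3', 'A3')]
```

In the loop above this would mean appending to history and calling qa.invoke({"question": question, "chat_history": list(history)}); the trade-off is that the model forgets turns older than max_turns.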
API reference
For detailed documentation of all WikipediaRetriever features and configurations, head to the API reference.
Related
- Retriever conceptual guide
- Retriever how-to guides