john-adeojo / custom_websearch_agent

Custom Websearch Agent Built with Local Models, vLLM, and OpenAI
License: MIT License
Maybe using this repo could help make the prompt more precise: Click here
It would be awesome to see whether the llama 8b model can get results similar to the paid models.
Could you also integrate the Groq API?
A request example could be:
import json
import requests

GROQ_API_KEY = "..."  # your Groq API key
API_URL = "https://api.groq.com/openai/v1/chat/completions"

def interact_with_groq_api(message_content):
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {GROQ_API_KEY}'
    }
    data = {
        'messages': [
            {
                'role': 'user',
                'content': message_content
            }
        ],
        'model': 'llama3-8b-8192',
        'temperature': 1,
        'max_tokens': 1024,
        'top_p': 1,
        'stream': False,
        'stop': None
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(data))
    if response.status_code == 200:
        return response.json()
    else:
        return {
            'error': f"Request failed with status code {response.status_code}",
            'details': response.text
        }
and a response looks like this:
Enter your message: test
{
    "id": "chatcmpl-3d79b5fe-9728-4c80-a26a-b6a0b539cce4",
    "object": "chat.completion",
    "created": 1716974410,
    "model": "llama3-8b-8192",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "It looks like you're trying to test something! Is there something specific I can help you with or should I just say \"Hello!\""
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "prompt_time": 0.003151141,
        "completion_tokens": 27,
        "completion_time": 0.021647777,
        "total_tokens": 38,
        "total_time": 0.024798918
    },
    "system_fingerprint": "fp_af05557ca2",
    "x_groq": {
        "id": "req_01hz1tcp5kf3fr5zpprzyza69c"
    }
}
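To get the assistant's text out of a response shaped like the one above, index into `choices`. A minimal sketch, using a trimmed copy of the sample payload (the helper name `extract_reply` is hypothetical, not part of the repo):

```python
# The sample response above, abbreviated to the fields we need
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "It looks like you're trying to test something!"
            },
            "finish_reason": "stop"
        }
    ]
}

def extract_reply(response):
    """Pull the assistant's text out of a Groq/OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]

print(extract_reply(sample_response))
# -> It looks like you're trying to test something!
```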
Current code:
# Requires: import string, requests, chardet; from bs4 import BeautifulSoup
def scrape_website_content(self, website_url, failed_sites=None):
    # Avoid a mutable default argument, which would persist across calls
    if failed_sites is None:
        failed_sites = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com/',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Accept-Encoding': 'gzip, deflate, br'
    }

    def is_garbled(text):
        # Count characters outside Python's printable ASCII set
        non_ascii_chars = sum(1 for char in text if char not in string.printable)
        try:
            # Flag the text as garbled if more than 20% of it is non-printable
            return non_ascii_chars / len(text) > 0.2
        except ZeroDivisionError:
            # Empty text cannot be garbled
            return False

    try:
        # Make a GET request to the website
        response = requests.get(website_url, headers=headers, timeout=15)
        response.raise_for_status()  # Raises an exception for HTTP errors

        # Detect encoding using chardet
        detected_encoding = chardet.detect(response.content)
        response.encoding = detected_encoding['encoding'] if detected_encoding['confidence'] > 0.5 else 'utf-8'

        # Handle possible issues with encoding detection
        try:
            content = response.text
        except UnicodeDecodeError:
            content = response.content.decode('utf-8', errors='replace')

        # Parse the page content using BeautifulSoup
        soup = BeautifulSoup(content, 'html.parser')
        text = soup.get_text(separator='\n')

        # Clean up the text: remove excess whitespace, keep the first 5000 words
        clean_text = '\n'.join([line.strip() for line in text.splitlines() if line.strip()])
        split_text = clean_text.split()
        first_5k_words = split_text[:5000]
        clean_text_5k = ' '.join(first_5k_words)

        if is_garbled(clean_text):
            print(f"Failed to retrieve content from {website_url} due to garbled text.")
            failed = {"source": website_url, "content": "Failed to retrieve content due to garbled text"}
            failed_sites.append(website_url)
            return failed, failed_sites, False
        return {"source": website_url, "content": clean_text_5k}, "N/A", True
    except requests.exceptions.RequestException as e:
        # The original snippet had a try with no except, which is a syntax error;
        # mirror the error handling used in scrape_site_jina below
        print(f"Failed to retrieve content from {website_url} due to an error: {e}")
        failed = {"source": website_url, "content": f"Failed to retrieve content due to an error: {e}"}
        failed_sites.append(website_url)
        return failed, failed_sites, False
def scrape_site_jina(self, website_url, failed_sites=None):
    # Avoid a mutable default argument, which would persist across calls
    if failed_sites is None:
        failed_sites = []
    prefix_url = "https://r.jina.ai/"
    response = requests.get(prefix_url + website_url)
    if response.status_code == 200:
        print(response.text)
        # Keep only the first 20,000 characters of the extracted text
        return {"source": website_url, "content": response.text[0:20 * 1000]}, "N/A", True
    else:
        print('Failed to retrieve the webpage. Status code:', response.status_code)
        failed = {"source": website_url, "content": "Failed to retrieve content due to an error: "}
        failed_sites.append(website_url)
        return failed, failed_sites, False
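The is_garbled heuristic inside scrape_website_content is pure and can be exercised on its own. A standalone copy to show how the 20% threshold behaves (the test strings are illustrative, not from the repo):

```python
import string

def is_garbled(text):
    # Same heuristic as in scrape_website_content: flag text where more
    # than 20% of the characters fall outside Python's printable ASCII set
    non_ascii_chars = sum(1 for char in text if char not in string.printable)
    try:
        return non_ascii_chars / len(text) > 0.2
    except ZeroDivisionError:
        # Empty text cannot be garbled
        return False

print(is_garbled("plain ASCII text"))         # False: everything printable
print(is_garbled("\ufffd\ufffd\ufffd junk"))  # True: 3 of 8 chars (37.5%) non-printable
print(is_garbled(""))                         # False: empty text
```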
Additional link for the Jina AI Reader API:
https://jina.ai/reader/#demo
How do I use a locally hosted model (with Ollama) with this?
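One option: Ollama serves an OpenAI-compatible chat-completions endpoint, so a function shaped like interact_with_groq_api above can be pointed at it with few changes. A minimal sketch, assuming Ollama is running locally on its default port and a model named "llama3" has already been pulled (both are assumptions about your setup, not part of this repo):

```python
import json
import urllib.request

# Assumed local Ollama endpoint (default port 11434, OpenAI-compatible API)
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(message_content, model="llama3"):
    """Build a chat-completions payload in the same shape as the Groq example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": message_content}],
        "temperature": 1,
        "stream": False,
    }

def interact_with_ollama(message_content):
    data = json.dumps(build_payload(message_content)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires a running Ollama server; the response shape matches the
    # Groq example, so choices[0]["message"]["content"] holds the reply
    print(interact_with_ollama("test"))
```

No API key is needed for a local server; swapping the model string is enough to try other pulled models.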