john-adeojo / custom_websearch_agent

Custom Websearch Agent Built with Local Models, vLLM, and OpenAI
License: MIT License
Maybe using this repo could help make the prompt more precise: Click here
It would be awesome to see whether the llama 8b model can get results similar to the paid models.
Could you also integrate the Groq API?
A request example could be:
import json
import requests

GROQ_API_KEY = "..."  # your Groq API key
API_URL = "https://api.groq.com/openai/v1/chat/completions"

def interact_with_groq_api(message_content):
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {GROQ_API_KEY}'
    }
    data = {
        'messages': [
            {
                'role': 'user',
                'content': message_content
            }
        ],
        'model': 'llama3-8b-8192',
        'temperature': 1,
        'max_tokens': 1024,
        'top_p': 1,
        'stream': False,
        'stop': None
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(data))
    if response.status_code == 200:
        return response.json()
    else:
        return {
            'error': f"Request failed with status code {response.status_code}",
            'details': response.text
        }
and a response looks like this:
Enter your message: test
{
    "id": "chatcmpl-3d79b5fe-9728-4c80-a26a-b6a0b539cce4",
    "object": "chat.completion",
    "created": 1716974410,
    "model": "llama3-8b-8192",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "It looks like you're trying to test something! Is there something specific I can help you with or should I just say \"Hello!\""
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "prompt_time": 0.003151141,
        "completion_tokens": 27,
        "completion_time": 0.021647777,
        "total_tokens": 38,
        "total_time": 0.024798918
    },
    "system_fingerprint": "fp_af05557ca2",
    "x_groq": {
        "id": "req_01hz1tcp5kf3fr5zpprzyza69c"
    }
}
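To get the assistant's text out of a response shaped like the one above, index into `choices`. A minimal sketch, using a trimmed copy of the sample payload (the helper name `extract_reply` is hypothetical, not part of the repo):

```python
# The sample response above, abbreviated to the fields we need
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "It looks like you're trying to test something!"
            },
            "finish_reason": "stop"
        }
    ]
}

def extract_reply(response):
    """Pull the assistant's text out of a Groq/OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]

print(extract_reply(sample_response))
# -> It looks like you're trying to test something!
```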
Current code:
# Requires: import string, requests, chardet; from bs4 import BeautifulSoup
def scrape_website_content(self, website_url, failed_sites=None):
    # Avoid a mutable default argument, which would persist across calls
    if failed_sites is None:
        failed_sites = []
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.google.com/',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
        'Accept-Encoding': 'gzip, deflate, br'
    }

    def is_garbled(text):
        # Count characters outside Python's printable ASCII set
        non_ascii_chars = sum(1 for char in text if char not in string.printable)
        try:
            # Flag the text as garbled if more than 20% of it is non-printable
            return non_ascii_chars / len(text) > 0.2
        except ZeroDivisionError:
            # Empty text cannot be garbled
            return False

    try:
        # Make a GET request to the website
        response = requests.get(website_url, headers=headers, timeout=15)
        response.raise_for_status()  # Raises an exception for HTTP errors

        # Detect encoding using chardet
        detected_encoding = chardet.detect(response.content)
        response.encoding = detected_encoding['encoding'] if detected_encoding['confidence'] > 0.5 else 'utf-8'

        # Handle possible issues with encoding detection
        try:
            content = response.text
        except UnicodeDecodeError:
            content = response.content.decode('utf-8', errors='replace')

        # Parse the page content using BeautifulSoup
        soup = BeautifulSoup(content, 'html.parser')
        text = soup.get_text(separator='\n')

        # Clean up the text: remove excess whitespace, keep the first 5000 words
        clean_text = '\n'.join([line.strip() for line in text.splitlines() if line.strip()])
        split_text = clean_text.split()
        first_5k_words = split_text[:5000]
        clean_text_5k = ' '.join(first_5k_words)

        if is_garbled(clean_text):
            print(f"Failed to retrieve content from {website_url} due to garbled text.")
            failed = {"source": website_url, "content": "Failed to retrieve content due to garbled text"}
            failed_sites.append(website_url)
            return failed, failed_sites, False
        return {"source": website_url, "content": clean_text_5k}, "N/A", True
    except requests.exceptions.RequestException as e:
        # The original snippet had a try with no except, which is a syntax error;
        # mirror the error handling used in scrape_site_jina below
        print(f"Failed to retrieve content from {website_url} due to an error: {e}")
        failed = {"source": website_url, "content": f"Failed to retrieve content due to an error: {e}"}
        failed_sites.append(website_url)
        return failed, failed_sites, False
def scrape_site_jina(self, website_url, failed_sites=None):
    # Avoid a mutable default argument, which would persist across calls
    if failed_sites is None:
        failed_sites = []
    prefix_url = "https://r.jina.ai/"
    response = requests.get(prefix_url + website_url)
    if response.status_code == 200:
        print(response.text)
        # Keep only the first 20,000 characters of the extracted text
        return {"source": website_url, "content": response.text[0:20 * 1000]}, "N/A", True
    else:
        print('Failed to retrieve the webpage. Status code:', response.status_code)
        failed = {"source": website_url, "content": "Failed to retrieve content due to an error: "}
        failed_sites.append(website_url)
        return failed, failed_sites, False
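The is_garbled heuristic inside scrape_website_content is pure and can be exercised on its own. A standalone copy to show how the 20% threshold behaves (the test strings are illustrative, not from the repo):

```python
import string

def is_garbled(text):
    # Same heuristic as in scrape_website_content: flag text where more
    # than 20% of the characters fall outside Python's printable ASCII set
    non_ascii_chars = sum(1 for char in text if char not in string.printable)
    try:
        return non_ascii_chars / len(text) > 0.2
    except ZeroDivisionError:
        # Empty text cannot be garbled
        return False

print(is_garbled("plain ASCII text"))         # False: everything printable
print(is_garbled("\ufffd\ufffd\ufffd junk"))  # True: 3 of 8 chars (37.5%) non-printable
print(is_garbled(""))                         # False: empty text
```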
Additional link for the Jina AI Reader API:
https://jina.ai/reader/#demo
How do I use a locally hosted model (with Ollama) with this?
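One option: Ollama serves an OpenAI-compatible chat-completions endpoint, so a function shaped like interact_with_groq_api above can be pointed at it with few changes. A minimal sketch, assuming Ollama is running locally on its default port and a model named "llama3" has already been pulled (both are assumptions about your setup, not part of this repo):

```python
import json
import urllib.request

# Assumed local Ollama endpoint (default port 11434, OpenAI-compatible API)
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(message_content, model="llama3"):
    """Build a chat-completions payload in the same shape as the Groq example."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": message_content}],
        "temperature": 1,
        "stream": False,
    }

def interact_with_ollama(message_content):
    data = json.dumps(build_payload(message_content)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires a running Ollama server; the response shape matches the
    # Groq example, so choices[0]["message"]["content"] holds the reply
    print(interact_with_ollama("test"))
```

No API key is needed for a local server; swapping the model string is enough to try other pulled models.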