The AI Data Preparer Tool is a robust solution designed to assist industries in need of an efficient web crawler that gathers categorized data from the internet using specified keywords. This SaaS web data integration (WDI) platform is capable of transforming unstructured web data into a structured format. It achieves this by extracting, preparing, and integrating web data into Azure Machine Learning models.
The tool provides a visual environment for automating the extraction and transformation of web data. Once the target website URL is specified, the web data extraction module lets users design automated workflows to harvest data. This goes beyond traditional HTML/XML parsing of static content: by automating user interactions, it can reach data that might not be immediately visible. After extraction, the software offers data preparation capabilities for harmonizing and cleansing the web data.
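For contrast, the static HTML parsing that the paragraph above treats as the baseline can be sketched with Python's standard library. This is only an illustration, not the tool's extraction module: the `LinkCollector` class, the `extract_links` helper, and the sample markup are hypothetical. Content that appears only after user interaction would not be found this way and would need browser automation instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the given HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


sample = '<a href="/news">News</a><a href="https://example.org/x">X</a>'
print(extract_links(sample, "https://example.com/"))
# ['https://example.com/news', 'https://example.org/x']
```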
For consuming the results, the AI Data Preparer Tool provides several options. It has its own visualization and dashboarding module to assist criminal investigators in gaining the insights they need. Additionally, it provides APIs that offer full access to all functionalities available on the platform, allowing for direct integration of web data.
The AI Data Preparer Tool can crawl ten million links and scrape one million links per month using its workers. It may exceed these figures when run on a standard cloud platform. This makes it a powerful tool for industries that rely heavily on web data for their operations.
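To put the monthly figures in perspective, the average rate they imply can be worked out as below. This is a back-of-the-envelope sketch that assumes a 30-day month and continuous operation; the `required_rate` helper is hypothetical and not part of the tool.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month, running nonstop


def required_rate(links_per_month, workers=1):
    """Average links per second that each worker must sustain."""
    return links_per_month / SECONDS_PER_MONTH / workers


crawl_rate = required_rate(10_000_000)
scrape_rate = required_rate(1_000_000)
print(f"crawl: {crawl_rate:.2f} links/s, scrape: {scrape_rate:.2f} links/s")
# crawl: 3.86 links/s, scrape: 0.39 links/s
```

Passing `workers=10` splits the crawl load evenly, giving roughly 0.39 links per second per worker.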
- Structured data
- News
- Email PDF report
- Graphical Data
- Scraped data in text, image, PDF, DOCX, and video formats
- Trends and Analytics
- Recent Tweets (Social Media Crawling/Scraping)
- Multiple keywords search
- Multiple filters option
- Schedule the scraping time
- User registration and advanced admin panel
- AI-assisted smart search
- Search with rotating proxies to avoid bans
- Falcon's own custom REST API
- Multilingual support
- Media data AI Processing
- Scalable to crawl 10 million links and scrape 1 million links
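The rotating-proxy item in the list above can be illustrated with a simple round-robin scheme. This is a minimal sketch under assumptions, not Falcon's actual implementation: the `ProxyRotator` class, the `proxies_for_request` helper, and the proxy addresses are all hypothetical. The returned dictionary follows the `{"http": ..., "https": ...}` shape that the `requests` library accepts for its `proxies` argument.

```python
import itertools


class ProxyRotator:
    """Hands out proxies in round-robin order."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        """Return the next proxy, wrapping around at the end of the list."""
        return next(self._cycle)


def proxies_for_request(rotator):
    """Build a proxies mapping that routes both schemes through one proxy."""
    proxy = rotator.next_proxy()
    return {"http": proxy, "https": proxy}


# Hypothetical proxy addresses, for illustration only.
rotator = ProxyRotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
for _ in range(3):
    print(proxies_for_request(rotator)["http"])
```

Each outgoing request would then use a different proxy from the previous one, so no single address carries all the traffic.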
AI DATA PREPARER uses workers to take multiple tasks from the user and process them in a queue. Up to 10 workers can run at a time. This allows the tool to crawl around ten million links and scrape around one million links per month.
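The queue-and-workers idea can be sketched with Python's standard library. Falcon itself runs Celery workers backed by RabbitMQ, so the code below is a stand-in for the concept rather than the project's code; `run_queue`, `MAX_WORKERS`, and `fake_scrape` are hypothetical names.

```python
import queue
import threading

MAX_WORKERS = 10  # the worker cap described above


def run_queue(tasks, handler, num_workers=MAX_WORKERS):
    """Process queued tasks with at most MAX_WORKERS threads.

    Returns a dict that maps each task to the handler's result.
    """
    num_workers = min(num_workers, MAX_WORKERS)
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, so this worker exits
            outcome = handler(task)
            with lock:
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fake_scrape(url):
    """Placeholder for real scraping work: returns the URL length."""
    return len(url)


print(run_queue(["https://example.com/a", "https://example.com/bb"], fake_scrape))
```

Workers pull tasks from the shared queue in the order they were added, so an idle worker always picks up the oldest waiting task.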
username - sih2020sk216 password - Sih#2020
- Create a virtual environment, then activate it.
- Install the requirements:
pip install -r requirements.txt
- Set up a RabbitMQ server as the broker service:
docker run -it --rm --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management
- Start your RabbitMQ broker service.
- In the Falcon settings, change
CELERY_BROKER_URL = 'your_rabbitmq_address'
if you are not using the default port for RabbitMQ.
- Run the Celery worker:
celery -A falcon worker -l info
- For first-time usage, run the migrations:
python manage.py migrate
and create an admin user:
python manage.py createsuperuser
- Run FALCON:
python manage.py runserver
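For the `CELERY_BROKER_URL` step above, one possible way to fill in the value is to assemble an AMQP URL from environment variables. This is a sketch under assumptions: the `RABBITMQ_*` variable names and the `build_broker_url` helper are hypothetical, and the defaults (`guest`/`guest` on `localhost:5672`, virtual host `/`) match a stock RabbitMQ container such as the one started by the `docker run` command.

```python
import os
from urllib.parse import quote


def build_broker_url(env=None):
    """Assemble an AMQP broker URL, falling back to RabbitMQ defaults."""
    env = os.environ if env is None else env
    # Credentials are percent-encoded so special characters stay valid in a URL.
    user = quote(env.get("RABBITMQ_USER", "guest"), safe="")
    password = quote(env.get("RABBITMQ_PASSWORD", "guest"), safe="")
    host = env.get("RABBITMQ_HOST", "localhost")
    port = env.get("RABBITMQ_PORT", "5672")
    vhost = env.get("RABBITMQ_VHOST", "/")
    return f"amqp://{user}:{password}@{host}:{port}/{vhost}"


# In the Falcon settings module this value would replace the placeholder.
CELERY_BROKER_URL = build_broker_url()
print(CELERY_BROKER_URL)
```

With no variables set, this produces `amqp://guest:guest@localhost:5672//`, so only deployments that use a different host, port, or account need to override anything.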