Comments (2)
hey @dldx @jarmitage
We forked the Falcon tool a while back and added an import of the existing history and bookmarks.
The import goes through the chrome.history and chrome.bookmarks APIs.
You can check it out here: https://github.com/WorldBrain/Research-Engine
We are more than happy to collaborate on this in the future!
Best,
Oliver
from falcon.
I've come up with a hackish, slightly technical way of doing this. Chrome/Opera only stores the past 3 months' worth of history, which is annoying, but that is what we have to work with. For me, that's still a helluva lot of URLs, so I had to come up with various ways of filtering it down to something more manageable. I don't really want to load every random website I visited in any case. So here's what I did. These instructions are for Linux, but I'm sure they would be similar on Mac too:
- Change Chrome's settings to not load any images to save bandwidth and memory. Also close/save any tabs you care about because we're going to load a lot of new tabs at once and you won't be able to rescue old ones.
- Close all windows of Chrome/Opera - you can't open the history file if you don't.
- Install Sqliteman or a similar SQLite database viewer, plus sqlite3-pcre (a regex plugin for SQLite).
- Open the History database which is located at ~/.config/google-chrome/Default/History (or something similar if you have several profiles) or ~/.config/opera/History
- Load the regex plugin into Sqliteman with SELECT load_extension('/usr/lib/sqlite3/pcre.so');
- Run the following code to create a list of websites you want.
select urls.url
from urls inner join visits on urls.id = visits.url
where urls.url not like '%google.%'
  and urls.url not like '%facebook.com%'
  and urls.url not like '%youtube.com%'
  and urls.url not like '%localhost%'
  and urls.url not like '%127.0%'
  and urls.url not like '%192.168%'
  and urls.url not like '%zero%'
  and urls.url not like '%out.reddit.com%'
  and urls.url not regexp '^https?:\/\/[\w\.]+[a-z\/]?$'
  and (urls.title like '%income%' or urls.title like '%climate%')
group by urls.url
order by sum(visits.visit_duration) desc;
This is just an example, so change it to suit your needs. I filtered out Facebook, YouTube, localhost, etc. because they wouldn't be interesting. Then I filtered out all URLs that point to the homepage of a site, and finally I searched for the words "income" or "climate" in the page titles because I'm interested in basic income and climate change. Without those final filters I would get thousands of URLs, but with them I only get about 200.
- Play with the filters a bit in Sqliteman to get a list of URLs you want to archive, but make sure it isn't too long.
- Save the SQL code you used, including the load_extension line, to a file called interesting_sites.sql. Then close Sqliteman.
- Open a terminal and run something like this:
cat interesting_sites.sql | sqlite3 ~/.config/opera-developer/History | while read -r line; do opera-developer --new-page "$line" & done
Replace opera-developer with google-chrome, etc. as needed.
This command gets the list of URLs from sqlite, then loads each URL in Chrome/Opera, and hopefully Falcon will automatically index every site. It worked pretty well for me and only took a few seconds to load about 150 sites.
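If opening everything at once is too heavy for your machine, a gentler variant could open the URLs a few at a time. This is only a rough sketch, not something I have run against a full history: the function name, batch size and pause are made up for illustration, and it assumes Python's standard webbrowser module, which opens pages in the system default browser (so Falcon would need to be installed there).

```python
import time
import webbrowser
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def open_in_batches(urls, size=10, pause=5.0,
                    opener=webbrowser.open_new_tab, sleep=time.sleep):
    """Open each URL in a new tab, `size` at a time, pausing between batches.

    `opener` and `sleep` are parameters only so they can be swapped out
    (for example in a dry run). Returns the number of URLs opened.
    """
    # Drop blank lines, e.g. the empty row printed by the load_extension SELECT.
    cleaned = [u.strip() for u in urls if u.strip()]
    opened = 0
    for i, chunk in enumerate(batches(cleaned, size)):
        if i:  # no need to wait before the first batch
            sleep(pause)
        for url in chunk:
            opener(url)
            opened += 1
    return opened
```

To use it, the sqlite3 output could be saved to a text file and passed in as lines, for example open_in_batches(open("urls.txt")), where urls.txt is a hypothetical file name.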
Hope that helps. I'll try to find a way to do better filtering of history but this is what I have so far!
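One possible direction for the filtering is to run the same kind of query from Python, which avoids the pcre plugin by registering a regexp function directly. Treat this as a sketch under assumptions: the function names and the skip/keyword lists are illustrative, the path is the Chrome one from above, and it relies on the urls and visits tables having the columns used in the query. Working on a copy of the History file may sidestep the lock, but I have not checked that on every browser, so closing the browser first is still the safe route.

```python
import re
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Assumed location; adjust for your browser and profile.
HISTORY = Path.home() / ".config/google-chrome/Default/History"

SKIP = ("google.", "facebook.com", "youtube.com", "localhost",
        "127.0", "192.168", "zero", "out.reddit.com")
KEYWORDS = ("income", "climate")
HOMEPAGE = r"^https?://[\w.]+[a-z/]?$"  # URLs that are just a site's front page


def _regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, text second.
    return re.search(pattern, value or "") is not None


def open_history_copy(path=HISTORY):
    """Copy the History file to a temp directory and open the copy."""
    tmp = Path(tempfile.mkdtemp()) / "History"
    shutil.copy2(path, tmp)
    return sqlite3.connect(tmp)


def interesting_urls(conn, skip=SKIP, keywords=KEYWORDS):
    """Return matching URLs, longest total visit time first."""
    conn.create_function("regexp", 2, _regexp)
    not_like = " AND ".join("urls.url NOT LIKE ?" for _ in skip) or "1"
    # The title tests are OR-ed inside one parenthesised group so that
    # every exclusion still applies to every keyword.
    title = " OR ".join("urls.title LIKE ?" for _ in keywords) or "1"
    sql = f"""
        SELECT urls.url
        FROM urls INNER JOIN visits ON urls.id = visits.url
        WHERE {not_like}
          AND urls.url NOT REGEXP ?
          AND ({title})
        GROUP BY urls.url
        ORDER BY SUM(visits.visit_duration) DESC
    """
    params = ([f"%{s}%" for s in skip] + [HOMEPAGE]
              + [f"%{k}%" for k in keywords])
    return [row[0] for row in conn.execute(sql, params)]
```

Something like interesting_urls(open_history_copy()) would then give the list of URLs to open, and the skip and keyword tuples can be swapped for your own.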
Cheers,
Durand
from falcon.