front-end's Introduction

Mwmbl - No ads, no tracking, no cruft, no profit

Mwmbl is a non-profit, ad-free, free-libre and free-lunch search engine with a focus on usability and speed. At the moment it is little more than an idea together with a proof-of-concept implementation of the web front-end and search technology on a small index.

Our vision is a community working to provide top quality search particularly for hackers, funded purely by donations.

Crawling

Update 2022-02-05: We now have a distributed crawler that runs on our volunteers' machines! If you have Firefox you can help out by installing our extension. This will crawl the web in the background, retrieving one page a second. It does not use or access any of your personal data. Instead it crawls the web at random, using the top scoring sites on Hacker News as seed pages. After extracting a summary of each page, it batches these up and sends the data to a central server to be stored and indexed.

Why a non-profit search engine?

The motives of ad-funded search engines are at odds with providing an optimal user experience. These sites are optimised for ad revenue, with user experience taking second place. This means that pages are loaded with ads which are often not clearly distinguished from search results. Also, eitland on Hacker News comments:

Thinking about it it seems logical that for a search engine that practically speaking has monopoly both on users and as mattgb points out - [to some] degree also on indexing - serving the correct answer first is just dumb: if they can keep me going between their search results and tech blogs with their ads embedded one, two or five times extra that means one, two or five times more ad impressions.

But what about...?

The space of alternative search engines has expanded rapidly in recent years. Here's a very incomplete list of some that have interested me:

  • YaCy - an open source distributed search engine
  • search.marginalia.nu - a search engine favouring text-heavy websites
  • Gigablast - a privacy-focused search engine whose owner makes money by selling the technology to third parties
  • Brave
  • DuckDuckGo

Of these, YaCy is the closest in spirit to the idea of a non-profit search engine. The index is distributed across a peer-to-peer network. Unfortunately this design decision makes search very slow.

Marginalia Search is fantastic, but it is more of a personal project than an open source community.

All other search engines that I've come across are for-profit. Please let me know if I've missed one!

Designing for non-profit

To be a good search engine, we need to store many items, but the cost of running the engine is at least proportional to the number of items stored. Our main consideration is thus to reduce the cost per item stored.

The design is founded on the observation that most items rank for a small set of terms. In the extreme version of this, where each item ranks for a single term, the usual inverted index design is grossly inefficient, since we have to store each term at least twice: once in the index and once in the item data itself.

Our design is a giant hash map. We have a single store consisting of a fixed number N of pages. Each page is of a fixed size (currently 4096 bytes to match a page of memory), and consists of a compressed list of items. Given a term for which we want an item to rank, we compute a hash of the term, a value between 0 and N - 1. The item is then stored in the corresponding page.

To retrieve pages, we simply compute the hash of the terms in the user query and load the corresponding pages, filter the items to those containing the term and rank the items. Since each page is small, this can be done very quickly.

Because we compress the list of items, we can rank for more than a single term and maintain an index smaller than the inverted index design. Well, that's the theory. This idea has yet to be tested out on a large scale.
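The design above can be illustrated with a minimal sketch. This is not the actual Mwmbl implementation: the class name `TinyIndex`, the item fields (`url`, `title`, `extract`), the use of MD5 for hashing, zlib with JSON for compression, and the term-count ranking are all assumptions chosen to keep the example short. Only the fixed number of pages, the 4096-byte page size, and the hash-then-filter lookup come from the description.

```python
import hashlib
import json
import zlib

NUM_PAGES = 1024   # N: hypothetical value, a real index would be far larger
PAGE_SIZE = 4096   # bytes per page, as stated in the text


def page_index(term, num_pages=NUM_PAGES):
    """Hash a term to a page number between 0 and num_pages - 1."""
    digest = hashlib.md5(term.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_pages


def item_text(item):
    """Lower-cased text of an item, used for filtering and ranking."""
    return (item["title"] + " " + item["extract"]).lower()


class TinyIndex:
    def __init__(self, num_pages=NUM_PAGES, page_size=PAGE_SIZE):
        self.num_pages = num_pages
        self.page_size = page_size
        # Each page holds a zlib-compressed JSON list of items (assumed format).
        self.pages = [b""] * num_pages

    def _load(self, i):
        raw = self.pages[i]
        return json.loads(zlib.decompress(raw)) if raw else []

    def add(self, term, item):
        """Store item in the page for term. Returns False if the page is full."""
        i = page_index(term, self.num_pages)
        items = self._load(i) + [item]
        packed = zlib.compress(json.dumps(items).encode("utf-8"))
        if len(packed) > self.page_size:
            return False  # a real index would have to evict or re-rank here
        self.pages[i] = packed
        return True

    def search(self, query):
        """Load one page per query term, filter by term, then rank."""
        terms = query.lower().split()
        found = {}
        for term in terms:
            for item in self._load(page_index(term, self.num_pages)):
                # Other terms may hash to the same page, so filter them out.
                if term in item_text(item):
                    found[item["url"]] = item
        # Crude ranking (an assumption): more matching query terms rank higher.
        return sorted(
            found.values(),
            key=lambda item: sum(t in item_text(item) for t in terms),
            reverse=True,
        )
```

Because unrelated terms can share a page, the filter step in `search` is what keeps collisions from leaking into the results; the page size cap is what bounds the cost per lookup.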

How to contribute

There are lots of ways to help:

If you would like to help in any of these or other ways, thank you! Please join our Matrix chat server or email the main author (email address is in the git commit history).

Development

Local Testing

For trying out the service locally see the section in the Mwmbl book.

Using Dokku

Note: this method is not recommended as it is more involved, and your index will not have any data in it unless you set up a crawler to crawl to your server. You will need to set up your own Backblaze or S3 equivalent storage, or have access to the production keys, which we probably won't give you.

Follow the deployment instructions

Frequently Asked Questions

How do you pronounce "mwmbl"?

Like "mumble". I live in Mumbles, which is spelt "Mwmbwls" in Welsh. But the intended meaning is "to mumble", as in "don't search, just mwmbl!"

front-end's People

Contributors

adjagu, colinespinas, daoudclarke, echedellelr, groundcat, sarayourfriend


front-end's Issues

Remove google dependencies

This is mostly about removing Google Fonts as it seems that it does not respect GDPR standards for privacy.

Here is a Reddit post linking to an article (in German) about Google exploiting the IP addresses of their CDN's users.

This is a good opportunity to stop using Google dependencies altogether, as our primary concern is privacy.

Intermediary queries in browser history

While you are typing a search query, the URL updates with each character, and every update is committed to the browser history.
I suggest updating the URL only when the search form loses focus or when the page is scrolled.

This is what the browser history looks like:

(Screenshot: Browser History Flood)

Add links to the social pages and GitHub repository

Having links to the different social pages of the project could be a huge improvement for the project's visibility.

It would also give incentive to contribute to the project.

The first links that could be added are the wiki and the GitHub repository.

Respect reduced motion settings

While the minimalist homepage is quite pleasant, the movement on it can be jarring for some folks. For these cases there is the prefers-reduced-motion media query that can be used to develop alternative (usually much more minimal to no) animations.

Originally I had overcomplicated the potential solution (you can see it in the revisions of this issue). There is a simple fix though: wait until the user submits the query, just like other forms. At this point, move the search bar. With reduced motion enabled, pop the search bar up to the top without a transition. This will solve the movement and will also remove the possibility that the movement happens while you are typing, as in the video below.

mwmbl-motion-mid-typing.mp4

Anyway, love the project! I will be following it closely. If you ever want to pop over to Openverse we're also doing a non-profit search engine thing, though admittedly with a specific niche and goal in mind.

Also, if you're happy to accept a PR and have any suggestions for which solution you'd like to see implemented, I'd be glad to have a go at it 🙂

[Feature request] Add support to UXP-based web browsers

Given that this project is still in its early stages, it would be ideal to take alternative web browsers into account from the start, such as the UXP-based Pale Moon or Basilisk, when adopting newer web features.

At present there is no support at all: the website doesn't even load, and the only thing reported in the console is this:

SyntaxError: expected expression, got keyword 'import'[Learn More]  
index.166e7ea2.js:1:1328

Load Results In Limit & A Button To Show The Next Results!

Hello.

The Mwmbl search engine loads all results at once, and there is no button or arrow to load the next ones. Won't this affect the performance of the website? It is very important to implement this feature, along with an arrow to go back to the previous results. There is no need to create a new HTML page for the next results; just modify the existing HTML elements.

Thanks.

Race condition when typing in search bar

  • Type something, e.g. "Hacker news"
  • E.g. two requests are made, "Hacker new", then "Hacker news"
  • The second one returns first and updates the results
  • Then the first one returns and updates it to the results for "Hacker new"

I think we should cancel existing requests before we fire a new one
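The cancel-before-fire idea can be sketched as follows. The front-end itself is JavaScript, where the analogous mechanism would likely be an `AbortController` passed to `fetch`; the Python `asyncio` version below only illustrates the pattern. The names `LatestOnlySearch` and `fake_fetch`, and the delays, are hypothetical.

```python
import asyncio


class LatestOnlySearch:
    """Keep at most one search in flight; a new query cancels the old one."""

    def __init__(self, fetch):
        self._fetch = fetch    # async callable: query -> results
        self._task = None
        self.results = None    # what the UI would currently display

    async def _run(self, query):
        # If this task is cancelled while waiting, results is never assigned.
        self.results = await self._fetch(query)

    def submit(self, query):
        # Cancel the previous request before firing a new one.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run(query))
        return self._task


async def fake_fetch(query):
    # Hypothetical backend: the earlier query is slower, reproducing the race.
    await asyncio.sleep(0.05 if query == "Hacker new" else 0.01)
    return f"results for {query!r}"


async def demo():
    search = LatestOnlySearch(fake_fetch)
    first = search.submit("Hacker new")
    second = search.submit("Hacker news")
    # Wait for both; the cancelled task yields a CancelledError, not a result.
    await asyncio.gather(first, second, return_exceptions=True)
    return search.results
```

Running `asyncio.run(demo())` leaves the results for the latest query in place, because the stale request is cancelled before it can overwrite them.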

Disable source maps in production build

The build produced by the deployment workflow generates source maps that are not needed in production.

We need to pass the --no-source-maps flag in the build step to disable this behavior.

Use atomic design to organize components

Currently there are not a lot of components, but as the project grows we'll need to tidy things a bit more for maintainability.

Organizing our components using Atomic Design could be a great first step in that direction.

My proposal is to create 3 directories in the current components directory, one for each component type (atoms, molecules, organisms).

We don't have pages currently, as we rely on a main index.html to display our components. The pages directory should only be added if and when we implement routing and multiple pages.

Optimize CSS resources

Currently, this project has a poor performance rating on https://pagespeed.web.dev/.

This is largely because we load our icons from a remote source, which leads to a lot of unused CSS (we currently load the entirety of Phosphor Icons). By making this file local, we can reduce it at build time.

The second factor would be font loading through Google Fonts (though we may not want to change that as of now).
