Search engine ideas

Oliver Leaver-Smith <oliver@leaversmith.com> on 2022-11-07 08:44:37

There is yet more big-web resistance brewing, so I think it's time I dusted off an old project: a curated search engine that anyone can run their own instance of (not federated).

I had relative personal success with a project called veri, which met my immediate goals. However, I now want to improve it to make it a viable alternative for people. It will still be called veri, which is both the Latin word for truth, reality, or fact and the Turkish word for data; all of these definitions seem pretty apt.

I have previously stated that the goals of veri are as follows:

All of these points still stand.

The workflow of veri will be as follows:

  1. An instance operator maintains a list of "tier 1" URLs which are trusted sources of good information. A good example would be the sitemap of a useful site.
  2. veri-crawl periodically spiders this list of URLs to generate a list of "tier 2" URLs.
  3. veri-scrape periodically scrapes the full list of URLs (tier 1 and tier 2) for their text content and metadata, which are stored in a database.
  4. veri-search is an API that handles queries, ad hoc reindexing, and deletion.
  5. veri-www is a web front end for interacting with the search API.
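
To make the workflow above a bit more concrete, here is a minimal sketch of steps 1 to 4 in Python. It is not veri's actual code: the function names (crawl, scrape, search, delete), the pages table, the choice of SQLite, and the in-memory PAGES dictionary standing in for real HTTP fetches are all assumptions made for illustration. Matching is a naive substring search rather than a real index, and the web front end is left out.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "https://example.org/": '<title>Home</title><a href="/a">A</a> <a href="https://example.org/b">B</a>',
    "https://example.org/a": "<title>Page A</title><p>Goroutines are lightweight threads.</p>",
    "https://example.org/b": "<title>Page B</title><p>Channels connect goroutines.</p>",
}


class PageParser(HTMLParser):
    """Collects links, the title, and the visible text of one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links, self.text, self.title = [], [], ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text.append(data.strip())


def parse(url):
    parser = PageParser(url)
    parser.feed(PAGES[url])
    parser.close()
    return parser


def crawl(tier1):
    """Step 2: spider the tier 1 URLs to produce the list of tier 2 URLs."""
    tier2 = []
    for url in tier1:
        for link in parse(url).links:
            if link in PAGES and link not in tier1 and link not in tier2:
                tier2.append(link)
    return tier2


def scrape(urls, db):
    """Step 3: store the title and text content of every URL."""
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)")
    for url in urls:
        page = parse(url)
        # INSERT OR REPLACE doubles as an ad hoc reindex of a single page.
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
                   (url, page.title, " ".join(page.text)))
    db.commit()


def search(db, query):
    """Step 4: naive substring query (SQLite LIKE ignores ASCII case by default)."""
    rows = db.execute("SELECT url FROM pages WHERE body LIKE ? ORDER BY url",
                      (f"%{query}%",))
    return [row[0] for row in rows]


def delete(db, url):
    """Step 4: remove a page from the index."""
    db.execute("DELETE FROM pages WHERE url = ?", (url,))
    db.commit()


tier1 = ["https://example.org/"]      # step 1: operator-maintained list
tier2 = crawl(tier1)
db = sqlite3.connect(":memory:")
scrape(tier1 + tier2, db)
results = search(db, "goroutines")
```

In a real instance each function would presumably be its own periodically run component, with the database as the only thing they share; that separation is what would let an operator re-scrape or delete a single URL without re-crawling everything.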

Following this pattern, an example search engine might be one created by the Go community, which uses the trending Stack Overflow pages, Go documentation, popular Go bloggers and blog aggregators, etc. in order to provide a trusted and curated way to search for information on a particular subject.

Some additional goals that are not set in stone are:

As and when I have time to work on this, I will update this blog with relevant changes. And for those who know me: no, this won't use YAML for storage. I actually want people to use this particular project!