
2025-12-16

Internet Search: Yesterday, Today & Tomorrow

In the beginning (yesterday), the Internet was an academic network. Then the Free-nets were created (including the National Capital FreeNet, of which I was an early member and information provider), providing a place for community organizations and bringing the Internet to the people.

In these early days, before the World Wide Web, the Internet was primarily text-based, and people found documents stored online with tools such as Archie (which indexed FTP archives), Gopher, and the Gopher search tools Veronica and Jughead. People also used Usenet for the equivalent of today’s web forums and IRC (Internet Relay Chat) for group and private real-time messaging. Most importantly, we all had email, which IMHO is still the most important thing the Internet gives us as individuals.

Then came the World Wide Web and HTML, and everything changed. The Internet was still non-corporate, used primarily by educational institutions, non-profit organizations and individuals, but that soon changed, many say for the worse, when corporations were allowed onto the network. Still, I would certainly miss online banking and shopping, and streaming services have given us access to non-North American “television” we would not otherwise have had.

The WWW gave individuals an opportunity to have their own place on the Internet through personal websites (called Home Pages back then). Internet Service Providers gave customers web storage they could use to create their own pages in HTML, and sites like GeoCities made it even easier. Then came Myspace, a sort of Facebook lite. Other sites served the same users who wanted their own place on the Internet, and they all co-existed peacefully. And then came Facebook, and everything changed for the worse. Most people criticize Facebook for its tracking of users and monetization of their personal information and that of their “friends”, but to me the most evil thing about it is its business model of trying to keep users away from the open Internet and dependent on its proprietary site.

At one time, long before Facebook, there was even a printed Internet Yellow Pages that listed all the significant websites on the World Wide Web, but it quickly became necessary to have an online tool that let people find what they were interested in without depending on prior knowledge, friends or just luck.

When we started using the Internet for research or to find information, we were not looking for specific answers to specific questions but for resources where we could find those answers.

And perhaps the best tool for that was the original Yahoo Directory, a hierarchical listing of web resources by subject, curated by librarians to ensure the legitimacy of the sources. Other directories also existed, particularly subject-specific ones. As the Internet grew exponentially, keeping a directory complete became an impossible task, or at least one that could not compete economically with the search engines that also existed at that time.

In the beginning we used search engines the same way as the Yahoo Directory: to find resources where we could find the information we were seeking. Perhaps the best of the early search engines, and my personal preference, was Digital Equipment Corporation's AltaVista, which allowed users to do Boolean searches using the AND, OR and NOT operators. Soon people started using search engines to find specific answers to specific questions.
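For readers who never used AltaVista-style operators, here is a toy sketch of how a Boolean query can be evaluated with plain set operations over an inverted index (a map from each word to the documents containing it). It is a hypothetical illustration, not how AltaVista or any other engine actually worked; the function names, the sample documents, the uppercase-only operators and the precedence rules (NOT before AND before OR, with adjacent words treated as AND) are all assumptions.

```python
import re

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

def search(query, docs):
    """Evaluate a Boolean query such as "(mustang OR silverado) AND NOT review".

    Assumed rules: operators are uppercase; NOT binds tightest, then AND, then OR;
    two words with no operator between them are treated as AND.
    """
    index = build_index(docs)
    all_ids = set(docs)
    tokens = re.findall(r"\(|\)|[^\s()]+", query)  # parentheses are separate tokens
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            advance()
            result = result | parse_and()  # OR -> set union
        return result

    def parse_and():
        result = parse_not()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                advance()  # explicit AND; otherwise AND is implied
            result = result & parse_not()  # AND -> set intersection
        return result

    def parse_not():
        if peek() == "NOT":
            advance()
            return all_ids - parse_not()  # NOT -> complement within the collection
        return parse_atom()

    def parse_atom():
        token = advance()
        if token == "(":
            result = parse_or()
            if advance() != ")":
                raise ValueError("expected ')'")
            return result
        if token in ("AND", "OR", ")"):
            raise ValueError(f"unexpected operator {token!r}")
        # Look the word up; only plain letters/digits can match the index.
        return set(index.get(token.lower(), set()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

# Hypothetical sample collection for trying queries.
SAMPLE_DOCS = {
    1: "Ford F-150 pickup truck review",
    2: "Chevy Silverado pickup truck review",
    3: "Ford Mustang sports car",
}
```

With the sample documents, `search("truck NOT ford", SAMPLE_DOCS)` keeps only the Silverado page, and `search("(mustang OR silverado) AND NOT review", SAMPLE_DOCS)` keeps only the Mustang page.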

AltaVista and almost all other search engines were surpassed by the original Google search engine, whose PageRank algorithm impressed everyone so much that Google became the dominant search engine. Its advanced search mode also allowed Boolean searches. It became my (and most people’s) search engine of choice for a long time.

Then came the enshittification of both search and the Internet as a whole.

The enshittification of search happened as Google gained an effective monopoly on Internet search, so much so that searching the Internet became “googling”, since nearly all searches were done with Google. Then we saw the gradual degradation of Google as it monetized its search engine. Paid, promoted links appeared at the top of search results: a search for, as an example, a Ford F-150 would list a Chevy Silverado first because General Motors paid for that. And then we started getting results in the form of answers to questions rather than links, and people saying “Google said/told me” rather than citing the sources Google found.

Somewhere along the line the advanced Boolean search capability disappeared from Google, and then it became contaminated by LLM chatbots spouting spurious answers and information. It may be possible, with enough effort, to turn off the AI slop in Google, but personally I would not trust that it really is off. Google’s once-famed reliability is now in the dumpster. And of course Google has become infamous for tracking its users.

People have slowly started to move away from Google to privacy-focused search engines like DuckDuckGo, although it has been criticized for its optional AI features (which are a lot easier to disable in settings than Google’s). I personally use the non-AI version of DuckDuckGo (https://noai.duckduckgo.com/), which has the AI features disabled. I only wish it had obvious Boolean search capabilities, although there are apparently ways to do Boolean searches and use other advanced search techniques on DDG (which I did not know about until I researched this post).

But the enshittification goes beyond Google search and has infected the whole Internet and World Wide Web. Over the last 20 years or so we have seen a proliferation of fake news and disinformation sites, and social media has increased the amount of misinformation and disinformation online by orders of magnitude.

But users are also to blame. The reason for Facebook’s success is that consumers today put convenience above all else, and when you add the super-convenient magic answer machine of LLM-based AI chatbots, which base their answers on whatever is repeated most, the result is inevitably garbage (the GIGO principle: garbage in, garbage out).

Tomorrow’s search needs a better approach for those of us who are more interested in accuracy than convenience. Let me suggest a new model that puts a Boolean search engine on top of a directory of trusted sites and builds from there.

We start with an original Yahoo-style directory curated by librarians and subject specialists. The directory is hierarchical, going from broader subjects to narrower ones. Users can browse or search the directory to find the field of knowledge they are interested in and select relevant websites from there.
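A hierarchical directory like this is essentially a tree of subject categories, each holding its curated sites. The sketch below is a hypothetical Python data structure (the `Category` class, its methods and the example.org URL are invented for illustration) showing the two ways into such a directory described here: browsing down a path of subjects, or searching category names.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    """A node in a hypothetical subject hierarchy, from broad to narrow."""
    name: str
    sites: list = field(default_factory=list)     # curated site URLs in this category
    children: dict = field(default_factory=dict)  # subcategory name -> Category

    def add_path(self, path):
        """Create (or reuse) the subcategories along path; return the deepest one."""
        node = self
        for part in path:
            node = node.children.setdefault(part, Category(part))
        return node

    def browse(self, path):
        """Walk down the hierarchy, e.g. ["Science", "Astronomy"]; KeyError if absent."""
        node = self
        for part in path:
            node = node.children[part]
        return node

    def find(self, keyword, prefix=()):
        """Search category names (case-insensitive); return the path of every match."""
        matches = []
        for name, child in self.children.items():
            path = prefix + (name,)
            if keyword.lower() in name.lower():
                matches.append(path)
            matches.extend(child.find(keyword, path))
        return matches
```

For instance, after `root = Category("Top")` and `root.add_path(["Science", "Astronomy"])`, searching with `root.find("astro")` returns `[("Science", "Astronomy")]`, and `root.browse(["Science", "Astronomy"]).sites` lists the sites the curators chose for that subject.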

The curators will not attempt the impossible task of vetting all the content on each website or resource; instead, sites will be selected according to the trustworthiness of those responsible for them. Different categories of resource will be vetted differently according to their nature.

Information resources on science, the humanities and the social sciences will be judged according to the reliability of the content as ascertained by the trustworthiness of those responsible for them.

There will be a general information category for encyclopedias and similar broad works.

Journalistic sources will be judged according to the journalistic principles of the organizations behind them: ethics, fact-checking, distinguishing opinion from news content, etc. Sites that solely express opinions will be identified as such and, where possible, labeled according to bias (right leaning, left leaning, etc.). Satire sites will be identified as such for those who cannot figure that out.

Political sites will not be vetted for accuracy but for whether they actually are who they say they are, rather than attempts to spoof or misrepresent the opinions of politicians or political organizations. The same will apply to corporate and banking sites, as a protection against fraud.

Social media sites will be included in the listings for those that seek them out but will not be included automatically in searches.
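The vetting rules above amount to attaching a category and a few labels to each listing, then deciding what is admitted and what is searched by default. Here is a minimal sketch of that data model, assuming hypothetical names (`Kind`, `Listing`, `admissible`, `default_pool`) and a simplified reading of the rules: political, corporate and banking sites are admitted only once their identity is verified, and social media stays listed but out of default searches.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Kind(Enum):
    """Hypothetical listing categories, one per vetting rule in the proposal."""
    REFERENCE = "reference"        # science, humanities, social sciences
    GENERAL = "general"            # encyclopedias and similar broad works
    JOURNALISM = "journalism"      # judged on ethics, fact-checking, news vs. opinion
    OPINION = "opinion"            # opinion-only sites, labeled with bias where possible
    SATIRE = "satire"
    POLITICAL = "political"        # vetted for identity, not accuracy
    CORPORATE = "corporate"        # corporate and banking sites, vetted against fraud
    SOCIAL_MEDIA = "social media"  # listed, but not searched by default

@dataclass
class Listing:
    url: str
    kind: Kind
    bias: Optional[str] = None       # e.g. "left leaning" or "right leaning"
    identity_verified: bool = False  # only consulted for POLITICAL and CORPORATE

    def label(self) -> str:
        """Text a reader would see next to the result, e.g. "opinion, right leaning"."""
        return f"{self.kind.value}, {self.bias}" if self.bias else self.kind.value

def admissible(listing: Listing) -> bool:
    """Political, corporate and banking sites must be who they claim to be."""
    if listing.kind in (Kind.POLITICAL, Kind.CORPORATE):
        return listing.identity_verified
    return True

def default_pool(listings):
    """Admitted listings that are searched unless the user asks otherwise."""
    return [item for item in listings
            if admissible(item) and item.kind is not Kind.SOCIAL_MEDIA]
```

Keeping the labels as data rather than baking them into rankings means a search result can simply display them, so readers see "satire" or "opinion, left leaning" next to the link.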

The next level of search will be the ability to search not just for information resources/websites but also within them, like a normal web search but restricted to sites in the directory: the directory as a whole, a specific subject area, or a specific website.

And finally, a full Internet search will be available where that is desired, with the option to exclude social media sites (and perhaps certain other categories). All searches will have full Boolean search capability, along with resources explaining how to understand and use it.
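The search levels described here (the whole directory, one subject branch or one site, and the full Internet with social media excluded by default) can be thought of as filters applied before matching. This is a minimal sketch under stated assumptions: the `Page` record and `scoped_search` function are hypothetical, pages outside the directory are marked with an empty subject path, and matching is simplified to requiring every term (AND only) rather than the full Boolean evaluation the proposal calls for.

```python
from dataclasses import dataclass

@dataclass
class Page:
    """A hypothetical crawled page, tagged with where it sits in the directory."""
    url: str
    site: str       # e.g. "a.example"
    subject: tuple  # directory path such as ("Science", "Astronomy"); () if not listed
    social: bool    # True for social media sites
    text: str

def scoped_search(pages, terms, scope="directory", subject=None, site=None,
                  exclude_social=True):
    """Return URLs of pages that contain every term, after applying the scope filters.

    scope="directory" searches only listed sites; scope="web" searches everything.
    subject restricts results to one branch of the directory; site to one website.
    """
    wanted = [t.lower() for t in terms]
    results = []
    for page in pages:
        if scope == "directory" and not page.subject:
            continue  # not a curated site
        if subject is not None and page.subject[:len(subject)] != tuple(subject):
            continue  # outside the requested subject branch
        if site is not None and page.site != site:
            continue
        if exclude_social and page.social:
            continue  # social media is opt-in
        words = set(page.text.lower().split())
        if all(term in words for term in wanted):  # AND-only matching, for brevity
            results.append(page.url)
    return results
```

Passing `scope="web"` widens the search to pages outside the directory, while `exclude_social=False` brings social media back in for those who want it.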

A final capability, which I am on the fence about including, is natural-language question search, with an algorithm that translates the question into Boolean search terms.
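To show what such a translation might look like, and why one might hesitate, here is a deliberately naive sketch. The `to_boolean` function and its stop-word list are hypothetical: it drops common question words, turns "or" into OR, makes "not", "without" or "excluding" negate the next keyword, and joins everything else with AND.

```python
import re

# A tiny, hypothetical stop-word list; a real system would need far more.
STOP_WORDS = {"what", "which", "who", "how", "is", "are", "the", "a", "an", "of",
              "for", "to", "in", "on", "about", "do", "does", "i", "find", "me",
              "show", "and"}

def to_boolean(question):
    """Naively translate a natural-language question into a Boolean query string."""
    words = re.findall(r"[a-z0-9\-]+", question.lower())
    parts = []
    negate = False
    for word in words:
        if word in ("not", "without", "excluding"):
            negate = True  # applies to the next keyword
        elif word == "or":
            if parts and parts[-1] != "OR":
                parts.append("OR")
        elif word in STOP_WORDS:
            continue  # drop filler words
        else:
            if parts and parts[-1] != "OR":
                parts.append("AND")  # default: every keyword is required
            parts.append(f"NOT {word}" if negate else word)
            negate = False
    if parts and parts[-1] == "OR":
        parts.pop()  # a trailing OR has nothing to join
    return " ".join(parts)
```

`to_boolean("Which comets are visible without a telescope?")` gives `comets AND visible AND NOT telescope`. But `to_boolean("ford or chevy pickups")` gives `ford OR chevy AND pickups`, which under the usual rule that AND binds tighter than OR means "Fords, or Chevy pickups" rather than what the user probably meant, which is one reason to be on the fence about this feature.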

The big question then becomes how this can be funded. Ideally, enough users would be willing to pay for accurate search to make it work, but let’s not delude ourselves about the majority of Internet users. So it would probably require some major donors willing to fund it because it is good for society, with, hopefully, broadly distributed small individual donations making up at least a significant portion of the funding.