Showing posts with label Google. Show all posts

2025-12-16

Internet Search: Yesterday, Today & Tomorrow

In the beginning (yesterday) the Internet was an academic network and then the Free-nets (including the National Capital FreeNet of which I was an early member and information provider) were created providing a place for community organizations and bringing the Internet to the people.

In these early days, before the World Wide Web, the Internet was primarily text based and used search tools known as Archie, Gopher, Veronica and Jughead to search for documents stored online. People also used services such as Usenet to access the equivalent of today’s web forums and IRC (Internet Relay Chat) as a group and private real time messaging service. Most importantly we all had Email, which IMHO is still the most important thing the Internet gives us as individuals.

Then came the World Wide Web and HTML and everything changed. The Internet was still non-corporate, being primarily educational institutions, non-profit organizations and individuals, but that soon changed, many say for the worse, when corporations were allowed onto the network. That said, I would certainly miss online banking and shopping, and streaming services have given us access to non-North American “television” we would not have had otherwise.

The WWW gave individuals an opportunity to have their own place on the Internet through personal websites (also called Home Pages back then). Internet Service Providers would provide customers with web storage they could use to create their own web pages using HTML and sites like GeoCities made it even easier. Then came Myspace, a sort of Facebook lite. There were other sites serving the same user base that wanted their own place on the Internet and they all co-existed peacefully. And then came Facebook and everything changed for the worse. Most people criticize Facebook for its tracking of users and monetization of their own and their “friends’” personal information, but to me the most evil thing about it is its business model of trying to keep users away from the open Internet and dependent on its proprietary site.

At one time, long before Facebook, there was even a print Internet Yellow Pages that listed all the significant websites on the World Wide Web but it quickly became necessary to have some online tool for people to find what they were interested in without depending on prior knowledge, friends or just luck.

When we started using the Internet for research or to find information we were not looking for specific answers to specific questions but for resources where we could find those answers.

And perhaps the best tool for that was the original Yahoo Directory, which was a hierarchical listing by subject of web resources curated by librarians to ensure the legitimacy of the sources. Other directories also existed, particularly subject specific ones. As the Internet grew exponentially, keeping up a complete directory became an impossible task, or at least one that could not compete economically with the search engines that also existed at that time.

In the beginning we used search engines the same way as the Yahoo Directory, to find resources where we could find the information we were seeking. Perhaps the best of the early search engines, and my personal preference, was Digital Equipment Corporation's AltaVista search engine, which allowed users to do a Boolean search using AND, OR & NOT operators. Soon people started using search engines to find specific answers to specific questions.

AltaVista and almost all other search engines were surpassed by the original Google search engine, whose algorithm impressed everyone so much that it became the dominant search engine. Its advanced search mode also allowed Boolean searches. It became my (and most people’s) search engine of choice for a long time.

Then came the enshittification of both search and the Internet as a whole.

The enshittification of search happened as Google gained an effective monopoly on Internet search, so much so that to search the Internet became “to google” as nearly all searches were done using Google. And then we saw the gradual degrading of Google as it monetized its search engine. We would see promoted links at the top of search results that were paid for. Searches for, as an example, Ford F-150 would have Chevy Silverado as the first listed result because General Motors paid for that. And then we started getting results in the form of answers to questions rather than as links, and people referring to “Google said/told me” rather than referring to the sources Google found.

Somewhere along the line the advanced Boolean search capability disappeared from Google and then it became contaminated by LLM chatbots spouting spurious answers and information. It may be possible with enough effort to turn off the AI slop in Google but personally I would not trust that that is so. Google’s once famed reliability is now in the dumpster. And of course Google has become infamous for tracking its users.

People have started to slowly move away from Google to privacy supporting search engines like DuckDuckGo. It has been criticized for its optional AI features, although it is a lot easier to disable them in settings than with Google. I personally use the non-AI version of DuckDuckGo (https://noai.duckduckgo.com/) which has the AI features disabled. I only wish it had obvious Boolean search capabilities, although there are apparently ways to do Boolean searches and other advanced search techniques for DDG (that I did not know about until I researched this post).

But the enshittification goes beyond Google search and has infected the whole of the Internet/World Wide Web. Over the last 20 years or so we have seen a proliferation of fake news and disinformation sites, and social media has increased the amount of misinformation and disinformation online by orders of magnitude.

But the user is also to blame. The reason for Facebook’s success is the fact that consumers today put convenience above all else and when you add the super convenient magic answer machine LLM based AI chatbots that base their answers on whatever is repeated most (the GIGO principle) the result is inevitably garbage.

Tomorrow’s search function requires a better way for those of us more interested in accuracy than convenience. Let us suggest a new model that puts a boolean search engine on top of a directory of trusted sites and builds from there.

We start with an original Yahoo type directory curated by librarians and subject specialists. The directory is hierarchical, starting with broader subjects and going down to narrower ones. You can browse or search the directory to find the field of knowledge you are interested in and select relevant websites from there.

The curators will not attempt the impossible task of vetting all content on the websites/resources; instead, resources will be selected according to the trustworthiness of those responsible for them. Different categories of resource will be vetted differently according to their nature.
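As a rough illustration only, the hierarchical directory could be sketched in Python along these lines. The class names, the `browse` and `find` helpers and the example site are all assumptions made up for this sketch, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    # A curated resource; both fields are hypothetical examples.
    name: str
    url: str


@dataclass
class Category:
    name: str
    sites: list[Site] = field(default_factory=list)
    children: dict[str, "Category"] = field(default_factory=dict)

    def add_child(self, name: str) -> "Category":
        # Reuse an existing subcategory or create it on first use.
        return self.children.setdefault(name, Category(name))

    def browse(self, path: list[str]) -> "Category":
        # Walk from broader to narrower subjects, one level per path element.
        node = self
        for part in path:
            node = node.children[part]
        return node

    def find(self, term: str, trail: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
        # Return the full path of every category whose name contains the term.
        here = trail + (self.name,)
        hits = [here] if term.lower() in self.name.lower() else []
        for child in self.children.values():
            hits.extend(child.find(term, here))
        return hits


# Build a tiny example tree: Directory > Science > Astronomy.
root = Category("Directory")
science = root.add_child("Science")
astronomy = science.add_child("Astronomy")
astronomy.sites.append(Site("Example Observatory", "https://observatory.example/"))
```

Browsing with `root.browse(["Science", "Astronomy"])` walks down the tree, while `root.find("astro")` searches category names at every level. A real directory would of course need the curators' vetting notes attached to each entry.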

Information resources on science, the humanities and the social sciences will be judged according to the reliability of the content as ascertained by the trustworthiness of those responsible for them.

There will be a general information category for encyclopedias and similar broad works.

Journalistic sources will be judged according to the journalistic principles of the organizations: ethics, fact checking, distinguishing opinion from news content, etc. Sites that are solely expressing opinions will be identified as such and, where possible, identified according to bias (right leaning, left leaning, etc.). Satire sites will be identified as such for those that cannot figure that out.

Political sites will not be vetted according to accuracy but according to whether they are actually who they say they are and not attempts to spoof or misrepresent the opinions of politicians or political organizations. Similarly for corporate and banking sites as a protection against fraud.

Social media sites will be included in the listings for those that seek them out but will not be included automatically in searches.

The next level of search will be the ability to search not just for information resources/websites but also within them, like a normal web search but restricted to sites within the directory, whether as a whole, by specific subject matter, or by specific website.

And finally a full internet search will be available where that is desired. The ability to exclude social media sites (and perhaps certain other categories) will be included. All searches will have full Boolean search capability and resources on how to understand and use the Boolean search capability will be provided.
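To make the idea a little more concrete, here is a minimal Python sketch of a Boolean search that can be limited to directory sites and can exclude categories such as social media. Everything in it is an assumption for illustration: the `Page` fields, the category labels, the example URLs, and the simplification that a term matches if it appears anywhere in the page text as a substring. Operators must be written in capitals and malformed queries are not handled.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    category: str  # hypothetical labels such as "reviews" or "social"
    curated: bool  # True if the site is listed in the curated directory
    text: str


# Quoted phrases, parentheses, or bare words/operators.
TOKEN = re.compile(r'"[^"]*"|\(|\)|[^\s()]+')


def parse(query):
    """Turn a Boolean query into a function that tests lower-cased text."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():  # expr := term (OR term)*
        left = term()
        while peek() == "OR":
            take()
            right = term()
            left = (lambda a, b: lambda t: a(t) or b(t))(left, right)
        return left

    def term():  # term := factor (AND factor)*
        left = factor()
        while peek() == "AND":
            take()
            right = factor()
            left = (lambda a, b: lambda t: a(t) and b(t))(left, right)
        return left

    def factor():  # factor := NOT factor | ( expr ) | word | "phrase"
        tok = take()
        if tok == "NOT":
            inner = factor()
            return lambda t: not inner(t)
        if tok == "(":
            inner = expr()
            take()  # consume the closing parenthesis
            return inner
        needle = tok.strip('"').lower()
        return lambda t: needle in t  # simplification: substring match

    return expr()


def search(pages, query, scope="directory", exclude=("social",)):
    # scope="directory" keeps only curated sites; any other value searches all.
    matches = parse(query)
    results = []
    for page in pages:
        if scope == "directory" and not page.curated:
            continue
        if page.category in exclude:
            continue
        if matches(page.text.lower()):
            results.append(page.url)
    return results


PAGES = [
    Page("https://trucks.example/f150", "reviews", True, "Ford F-150 pickup review"),
    Page("https://trucks.example/silverado", "reviews", True, "Chevy Silverado pickup review"),
    Page("https://social.example/post/1", "social", False, "my ford pickup is great"),
    Page("https://blog.example/ford", "opinion", False, "Why I sold my Ford pickup"),
]
```

With this sample data, `search(PAGES, "pickup AND NOT chevy")` looks only at directory sites, while `search(PAGES, '"ford pickup" OR silverado', scope="web")` searches everything except the excluded social media category.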

A final capability, which I am on the fence about including, is a natural language question search capability with an algorithm to translate the question into Boolean search terms.

The big question here becomes how this can be funded. Ideally enough users would be willing to pay for accurate search to make it work, but let’s not delude ourselves about the majority of Internet users. So it would probably require some major donors willing to fund it because it is good for society, with the funding hopefully broadly distributed and small individual donations making up at least a significant portion of it.

2025-09-01

How to Build an Intelligent Online Answer Machine

Ever since Facebook and Amazon people have become lazier, or perhaps more accurately addicted to convenience over all else, including ethics or accuracy.

When it comes to information we used to search out reliable sources and read information in detail to find answers to our questions. Now people just ask so-called “chatbots” the question and accept whatever they are given, based on so-called artificial intelligence (AI), which has nothing to do with intelligence or even accurate knowledge. It is based on Large Language Models (LLMs) which probe the depths of the Internet to try to guess at what type of answer a real person would give based on all the garbage ever posted on the Internet, ignoring the Garbage In Garbage Out (GIGO) principle.

If people insist on not doing their own research there must be a better way for an “Online Answer Machine” to do it for them.

First you need a decent search engine that can handle the Boolean operators AND, OR and NOT, as well as “quotation marks for exact phrase searches”. Google Advanced Search in its prime, before enshittification, would be ideal.

You also need a sophisticated algorithm (some people might call this AI) that can translate natural language questions into Boolean search terms and identify the subject of the question.

The next part is the key to the whole process. You need a human curated database of accurate, reliable and authoritative information sources (web sites or other online sources) indexed by subject matter.

When a question is asked the algorithm would translate it into search terms, determine the subject and search the appropriate sources for that subject to extract an answer for the user, along with citations and links to the sources the answer was taken from.
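A very rough Python sketch of that flow might look like the following. The stopword list, the subject keywords, the two sample sources and their URLs are all invented for illustration, and the "algorithm" here is just keyword filtering, far simpler than what a real system would need.

```python
import re

# Hypothetical list of words to drop when turning a question into search terms.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "what", "which", "who", "how",
             "why", "when", "where", "of", "in", "on", "to", "do", "does", "for"}

# Hypothetical curated database: subject -> list of (source name, URL, text).
SOURCES = {
    "astronomy": [
        ("Example Observatory", "https://observatory.example/moon",
         "The Moon orbits the Earth about once every 27.3 days."),
    ],
    "geography": [
        ("Example Atlas", "https://atlas.example/atlantic",
         "The Atlantic Ocean is the second largest ocean on Earth."),
    ],
}

# Hypothetical keyword sets used to guess the subject of a question.
SUBJECT_KEYWORDS = {
    "astronomy": {"moon", "planet", "star", "orbits"},
    "geography": {"ocean", "river", "mountain", "atlantic"},
}


def to_terms(question):
    # Lower-case, drop stopwords, and remove duplicates while keeping order.
    words = re.findall(r"[a-z0-9]+", question.lower())
    return list(dict.fromkeys(w for w in words if w not in STOPWORDS))


def to_boolean(terms):
    # The simplest possible translation: require every term.
    return " AND ".join(terms)


def guess_subject(terms):
    # Pick the subject sharing the most keywords with the question, if any.
    scores = {s: len(kw & set(terms)) for s, kw in SUBJECT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def answer(question):
    terms = to_terms(question)
    subject = guess_subject(terms)
    if subject is None:
        return None  # no curated subject matches, so give no answer
    for name, url, text in SOURCES[subject]:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if all(t in words for t in terms):
            # Return the passage together with its citation and link.
            return {"query": to_boolean(terms), "subject": subject,
                    "answer": text, "source": name, "url": url}
    return None
```

Calling `answer("Is the Atlantic the second largest ocean?")` returns the matching passage together with its source and link, while a question outside the curated subjects returns `None` rather than a guess, which is the point of the design.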

This certainly will not be as good as doing your own research choosing your own sources but this would not be built for people who want to, or know how to, do their own research.

2024-01-03

AI Has Nothing To Do With Intelligence

AI has nothing to do with intelligence but people believe the marketing hype, mostly because we have a distorted idea of what intelligence is, largely due to the media.

Take the quiz show “Are You Smarter Than a Fifth Grader” that says in its name that it’s about whether contestants are as intelligent as a fifth grade student. What the show actually tests is who is more familiar with the grade five curriculum, grade five students or people who have not been in school for twenty years or more. I know who I am betting on.

And take the famously super intelligent Jeopardy champions. Maybe some of these people are highly intelligent but that is not why they are Jeopardy champions because Jeopardy is not about intelligence. It is about knowing stuff, particularly the type of stuff Jeopardy asks questions about. At best it is about knowledge, not intelligence.

The Cambridge Dictionary defines intelligence as: “the ability to learn, understand, and make judgments or have opinions that are based on reason”. (Source)

I would refine that to: “the ability to understand and analyze information in order to make rational decisions based on that information”.

Intelligence is not about information; it is about reasoning.

I remember what some might call the first forerunner to Alexa and other chatbots. It was called Eliza.

ELIZA's creator, Weizenbaum, intended the program as a method to explore communication between humans and machines. He was surprised and shocked that individuals, including Weizenbaum's secretary, attributed human-like feelings to the computer program.[3] Many academics believed that the program would be able to positively influence the lives of many people, particularly those with psychological issues, and that it could aid doctors working on such patients' treatment.[3][13] While ELIZA was capable of engaging in discourse, it could not converse with true understanding.[14] However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary.[6] (Source)

This was not artificial intelligence and neither are the latest claimants, the large language models (LLMs).

A large language model (LLM) is a language model notable for its ability to achieve general-purpose language understanding and generation. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive self-supervised and semi-supervised training process.[1] LLMs are artificial neural networks following a transformer architecture.[2]

As autoregressive language models, they work by taking an input text and repeatedly predicting the next token or word.[3] Up to 2020, fine tuning was the only way a model could be adapted to be able to accomplish specific tasks. Larger sized models, such as GPT-3, however, can be prompt-engineered to achieve similar results.[4] They are thought to acquire knowledge about syntax, semantics and "ontology" inherent in human language corpora, but also inaccuracies and biases present in the corpora.[5]

Notable examples include OpenAI's GPT models (e.g., GPT-3.5 and GPT-4, used in ChatGPT), Google's PaLM (used in Bard), and Meta's LLaMA, as well as BLOOM, Ernie 3.0 Titan, and Anthropic's Claude 2. (Source)

Using statistics to mimic what a human might say or write is not reasoning and it is certainly not intelligence.
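To give a feel for what "predicting the next word from statistics" means, here is a toy Python sketch. It is emphatically not how a real LLM is built (the description quoted above says those are transformer neural networks trained on enormous text collections); it only counts which word follows which in a made-up sentence and always picks the most frequent follower.

```python
from collections import Counter, defaultdict


def train(text):
    # Count how often each word is followed by each other word.
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, length):
    # Repeatedly append the most frequent follower of the last word.
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break  # the last word was never followed by anything
        out.append(options.most_common(1)[0][0])
    return " ".join(out)


# A made-up training text, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish"
model = train(corpus)
```

Here `generate(model, "the", 1)` continues "the" with "cat" simply because that pair occurs most often in the training text. Nothing in the process checks whether the output is true.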

It might not be so bad if these systems did not claim to be intelligent but only claimed to be able to retrieve accurate information, and did that well, but they are designed to NOT do that.

I remember the early Internet and search engines with advanced Boolean search capability like AltaVista and the early versions of Google before they sold their top search results to the highest bidder.

Then the Internet was mainly academic institutions and community based organizations. The information on the Internet was relatively reliable most of the time. That information is still there if you pay attention to the actual source.

LLMs could use an information base based on actual reliable sources like Encyclopedia Britannica or Wikipedia, or the collections of actual scientific journals or other respected sources.

But instead they have adopted the bigger/more is better approach, feeding as much of the Internet as possible into their models, often without permission of the sources/creators. This leads to an information base dominated by misinformation and disinformation, leading to results like “there is no water in the Atlantic Ocean”. But the danger here is not the obvious errors; it is the amplification of misinformation and disinformation in the political sphere.

But it is worse. These disinformation models are proving to be even more wasteful of energy and harmful to the planet than the cryptocurrency scam, and their believers/followers are just as faithful and misguided. And for what? Obviously they hope to make a shitload of money from this scam.

AI is clearly not intelligent, just dangerous.

2012-02-29

Facebook is NOT The Internet - The Internet IS The (Social) Network

In the beginning there were BBSs (Bulletin Board Systems). In a foreshadowing of things to come, almost immediately following the invention of the Personal Computer (PC) they became communications devices as BBS systems were set up for hobbyists to use to share information and home-written programs. At this time PC users were primarily computer hobbyists and the BBSs were mainly confined to dealing with techie things, although in another foreshadowing you could soon download Sunshine Girl like pin-up photos.

As personal computers became more prevalent and the Internet was established in academia, more broadly based online service providers such as CompuServe, Prodigy and America Online (AOL) were established to allow people to access and share information on various interests and hobbies. These services, while proprietary and limited to their own online resources, also provided an interface to Internet email so people could communicate between service providers using email.

The first access the public had to the Internet was via Freenets, such as the Cleveland Freenet and National Capital Freenet (Ottawa). These used a text interface to allow people to access documents stored online, which were mainly of serious academic interest at that time. These documents were accessible via something called Gopher using search engines with names like Archie and Veronica. This was before the invention of Hyper Text Markup Language (HTML) and the World Wide Web (WWW). The Freenets also provided members with access to the Internet email network.

The Freenets allowed community organizations to communicate with members and the public by becoming Information Providers. Freenet Information Providers included hobbyists in many different fields as well as community activists. This quickly became a way for the Internet to become a community organizing tool and extended its usefulness beyond academia to the general public.

You could also connect into other Freenets from your local Freenet.

With the creation of the World Wide Web the Freenets established interfaces to access the content on the web as well as allowing information providers to provide information in HTML format.

All of these early online information providers were accessed via dial-up telephone at slow modem speeds but were soon to be followed by full fledged Internet Service Providers (ISPs) that provided the public with full access to the Internet and the emerging World Wide Web.

Although today most users access the Internet via the web, discussion forums, known as Usenet newsgroups, can still be accessed via dedicated software, messaging and live chat can be accessed via Internet Relay Chat software, and many people still use dedicated email software. So the Internet is not just the World Wide Web.

But things were changing: high speed Internet via Digital Subscriber Lines (DSL) and cable was becoming available and the controversial idea of allowing commercial and business use of the net was being proposed, again foreshadowing the current controversy over net neutrality and what is becoming commercial dominance of the Internet. While we cannot go back, and I would not want to give up access to Internet commerce and banking and the ability to research products online, we must maintain and protect the most important role of the Internet as a public utility and public information and communications network.

Which brings us to the seemingly most popular Internet phenomenon, Facebook. It seems that for many people the Internet, and they themselves, could not exist without this commercial proprietary site that makes millions by leveraging not only people's personal and private information but that of their friends, in what can best be described as a social marketing business plan.

Perhaps I have no right to criticize Facebook as I do not use it. But I do not use it because of what I have learned about it and my intuitive sense, as an early personal computer and Internet user, that Facebook is evil. While I may also have some concerns about the empire Google is building, and avoid Google Plus because of that, my intuition is that Google is still managing to remain true to its "don't be evil" principles.

What surprises and concerns me most about Facebook is that it has been able to extend that same sense of necessity, that "we have to be on Facebook to reach the public", to progressive community organizations, that I believe should know better. Everyone that is on Facebook, the so-called social network, is on the Internet. The Internet is The Network and there are many organizing tools on the network for progressive organizations to use.

So what tools do progressive community organizations have available on the Internet?

The main tool for providing an online presence has always been a website. Although it does not have the sexy new cachet of a blog or Twitter, or even Facebook, a website provides the basis for connecting all of an organization's online tools. That is why the web was designed the way it was, why HTML was written the way it was, and why Uniform Resource Locators (URLs) allow all online tools to connect to each other.

A website allows an organization to provide basic and comprehensive information to its members and the public as well as links to documents stored online using resources such as Google Docs. Organization websites can also link to other resources such as blogs or Twitter accounts. The first website I was responsible for is now archived here.

Web forums connected to websites, which have replaced Usenet newsgroups, provide an excellent means for organizations to communicate with and hold discussions amongst their members and the general public. Forums can be organized by subjects with separate threads for each discussion and can be open to the public or private, in terms of ability to read them or post to them. They can allow interested persons to choose what to read and respond to and avoid receiving massive amounts of email, which can then be restricted to more important, urgent messages. An example of an effective web forum can be seen here.

Blogs are also very useful for organizations and their members to provide information and express opinions and can be linked from the organization's website, allowing individuals to use whichever blogging platform they choose. Two of the most popular platforms are Blogger and WordPress. This blog is written on Blogger and an example of a WordPress blog is here.

Blogging aggregators, such as Progressive Bloggers, are great resources too. They allow you to reach like-minded people with your blogs as well as read blogs of interest. Aggregators are available according to political philosophy, region and subject interest.

Another very interesting and little known, little used, Internet resource is Internet Relay Chat (IRC), which provides for real time group discussions, as well as one-on-one chats and document transfers. It can be used to hold online meetings. All you have to do is log onto an IRC server using appropriate software and create a room, which can be public or invite only.

Twitter is one Internet resource in particular that I want to talk about. Twitter is the newest Internet tool and one of the most interesting - sort of like a mass e-mailer with a character limit, but not exactly. And of course like most Internet tools Twitter can be abused.

Twitter can be used to tell everyone you know what you had for breakfast or what you're wearing to the prom, but, please don't. I find one of its best uses is by journalists to tweet out breaking news before they have written their complete stories and to live tweet public events, sort of a current affairs play-by-play service. It can also be used effectively by organizations to send out news or event information to their followers.

I follow a few key guidelines in using Twitter. I only try to send out a few tweets a day, either links to my latest blog posts or blog or news entries I think are important and sometimes insightful or witty thoughts. My Twitter feed can be found here.

I limit myself to following people that post interesting and useful information and limit their amount of posting; I do not have all day to read tweets. I recently added, and then quickly deleted, WikiLeaks from the accounts I follow due to their over-tweeting. Tweeting a countdown from 10 to 1 in separate tweets before tweeting an announcement is not clever. It is just annoying. But not quite as annoying as random messages inviting people to porn sites.

I also do not understand people who collect followers by following random people hoping they will follow them. Do people who follow thousands of people actually read their tweets? If they have that little of a real life they are probably not worth following.

As the Internet evolves there will, of course, be various other new online resources organizations can use, all of which can be connected together via the main website.

It is very important that we, the public, do not let the telecommunications industry, or other commercial or proprietary interests take control of the Internet and progressive community organizations should avoid being co-opted by such attempts. The Internet IS The Network.