AI Terms You Need To Know

There is no doubt that Artificial Intelligence (AI) is the hottest topic in IT and business at the moment. As it continues to evolve, so does the terminology we use to describe it. All these words can be overwhelming, especially for those of us who do not have a data science background. As a solution architect, I have to be able to apply these technologies in a business context without having to understand the mathematics behind them. In other words, I have to be able to use the Handy Andy without understanding the chemistry in the product. If you are in the same boat as me, here is an explanation of a few commonly used terms in AI that will help you navigate the avalanche of documentation.

Cognitive tools:
So why do we talk about cognitive tools within artificial intelligence products? The answer lies in the lofty objective to create intelligent machines that mimic human abilities, such as learning, reasoning, problem-solving, perception, natural language understanding, and decision-making. That means that AI systems can see, hear, think and respond to their senses – simulating human visual, auditory and reasoning abilities. AI systems typically learn from large amounts of data from which they identify patterns that allow them to improve their performance over time. For that reason, we refer to them as cognitive tools. Many already match or exceed human ability in a variety of domains, with the potential to improve our lives in countless ways.

AGI or Artificial General Intelligence:
Artificial General Intelligence (AGI) refers to the development of an AI system that possesses the full range of human cognitive abilities. Narrow AI, on the other hand, is skilled in only a single task or domain. The challenge is to develop an AGI capability that will one day have the capacity to learn and reason just like a human. There are a number of narrow AI systems that can function in a specific domain such as text, image or other pattern recognition; however, the development of AGI is still in its infancy, and there is much debate about whether it is even possible to achieve.

Generative Pre-trained Transformer (GPT):
A Generative Pretrained Transformer, or GPT for short, is a type of deep learning neural network that uses a transformer architecture and is pretrained on a large corpus of text data. The goal of the model is to generate high-quality text that is similar in style and structure to the text we use in daily communication.
The GPT model was introduced by OpenAI in 2018 and has since been used in a wide range of natural language processing (NLP) applications such as text completion, question answering, and language translation. The model is called a “generative” model because it can generate new text, rather than just recognising or classifying text that has already been seen. The model first ingested a large amount of text from a variety of sources, such as Wikipedia, books and web crawls, before it was fine-tuned for downstream tasks.
Overall, the GPT model is one of the most powerful and flexible language models and has been used to achieve superb results on a wide range of language understanding tasks.

Foundation models:
A foundation model generally refers to a large pre-trained neural network that can be used as a starting point for various other machine learning tasks. Foundation models are typically trained on an enormous amount of data and are designed to learn general patterns and representations that can be applied to a wide range of tasks.
Large language models (LLMs) such as OpenAI’s GPT-3 and Google’s LaMDA are the best-known examples of foundation models because of their ability to generate realistic natural language text and engage in sustained coherent dialogues.
One of the main advantages of using a foundation model is that it can dramatically reduce the cost and computational resources needed to train a brand-new model for a specific purpose.

Domain-specific Foundation models:
Domain-specific foundation models are created by fine-tuning a general foundation model on a complementary dataset, rapidly producing industry-specific or corporate models.
For example, a domain-specific foundation model for image classification may be pre-trained on a large dataset of images, such as ImageNet, to recognize general patterns and features within different types of images. This pre-trained model can then be adapted for a specific task, such as classifying different types of flowers, by fine-tuning it on a smaller dataset of flower images.
In other words, by using domain-specific foundation models, developers can build on top of the pre-existing knowledge and learning of the foundation model to perform specific tasks within a domain.

Generative AI:
Generative models refer to AI models that can create new content, such as language or images, after being trained on large amounts of data, often referred to as ‘web-scale’ or ‘internet-scale’. The output generation is usually activated through natural language interfaces or ‘prompts’. Examples of these are ChatGPT for text generation and Midjourney or Stable Diffusion for image generation.

Large Language Models (LLMs):
Large Language Models are a type of foundation model built specifically for language. Because of their scale, LLMs can develop an understanding of the context of language, with impressive versatility and the ability to generate new language content. OpenAI’s GPT-3 and Google’s LaMDA are the best-known; however, numerous laboratories around the world are working on similar models.
Businesses can use these models to radically improve their communications, which are largely based on text creation. They underpin many language-based AI applications such as chatbots and translators.

Multimodal Models:
A Multimodal Model can be seen as the next step toward AGI, because it attempts to understand multiple modalities of data, such as images or text, and identify the relationships between them.
Gato, a deep neural network developed by researchers at the London-based AI company DeepMind, is an artificial intelligence system that exhibits multimodality. It has the ability to perform a wide variety of tasks, including playing video games, controlling a robotic arm to stack blocks, engaging in dialogue, composing poetry and more. The MIT Technology Review recently reported that Gato can learn multiple tasks simultaneously, allowing it to switch between tasks without forgetting previously acquired skills, unlike “narrow” AI systems that are limited to specific tasks such as generating text or images. Gato is a transformer model similar to GPT-3.

Training and Learning:
The first step in building a foundation model is pre-training, which consists of ingesting a large amount of unlabelled data to produce tokens and parameters – the mathematical and statistical structures, built on weightings and probabilities, that encode what the model has learned. The pre-trained data itself is not stored – only the outcome of the training.
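To make the idea of “tokens” concrete, here is a minimal sketch. Real models use subword tokenisers such as byte-pair encoding, so whitespace splitting is a deliberate simplification for illustration only:

```python
# Minimal illustration of tokenisation: turning raw text into tokens,
# and building a vocabulary that maps each token to an integer ID.
# Real foundation models use subword tokenisers (e.g. byte-pair encoding);
# whitespace splitting is a simplification to show the basic idea.
def tokenize(text):
    return text.lower().split()

def build_vocabulary(corpus):
    # Assign each unique token an integer ID, as a model's vocabulary does.
    vocab = {}
    for sentence in corpus:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

corpus = ["AI systems learn from data", "Data becomes tokens"]
vocab = build_vocabulary(corpus)
print(vocab)  # each unique token mapped to an integer ID
```

During pre-training, it is statistics over sequences of these token IDs – not the raw text – that end up encoded in the model’s parameters.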
The next step is Fine-Tuning on a smaller amount of labelled data. Fine-tuning can improve the model’s accuracy in solving specific tasks.
Few-Shot Learning is used to train the model with examples. It is an important technique used to quickly develop new AI models to solve specific problems. For example, a business could use few-shot learning to train a chatbot to respond to specific customer inquiries or to develop a recommendation system for an e-commerce platform.
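In practice, few-shot learning with an LLM often means placing a handful of worked examples directly inside the prompt. The sketch below builds such a prompt for the customer-inquiry scenario above; the labels and the idea of sending the result to a model are illustrative assumptions, not a specific vendor’s API:

```python
# A sketch of few-shot learning via prompting: the model is shown a handful
# of worked examples inside the prompt itself, then asked to complete a new
# case in the same pattern. Sending the prompt to an actual LLM is left out.
examples = [
    ("I never received my order.", "complaint"),
    ("Thank you, the delivery was fast!", "praise"),
    ("How do I reset my password?", "question"),
]

def build_few_shot_prompt(examples, new_inquiry):
    lines = ["Classify each customer inquiry:"]
    for text, label in examples:
        lines.append(f'Inquiry: "{text}" -> {label}')
    # The unfinished final line invites the model to supply the label.
    lines.append(f'Inquiry: "{new_inquiry}" ->')
    return "\n".join(lines)

prompt = build_few_shot_prompt(examples, "My invoice has the wrong amount.")
print(prompt)
```

A capable LLM given this prompt will typically continue the pattern and answer with a label such as “complaint”, without any retraining of the model itself.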
Finally, Reinforcement Learning is used to reward an AI model for taking preferred actions to reach a goal. For businesses, reinforcement learning has many potential applications, including optimising process efficiency to reduce costs and ultimately increase profits. I personally use reinforcement learning to teach Midjourney how to create new repeatable, recognisable synthetic avatars with varying facial expressions, thereby controlling the randomisation of images by the AI’s imagination.

Natural Language Processing (NLP):
This is a branch of AI that deals with the interaction between computers and human languages. NLP allows machines to understand, interpret, and generate human language, enabling applications such as chatbots, voice assistants, and language translation. Businesses can benefit from NLP by using it to analyse customer feedback or social media posts to gain insights into customer sentiment and improve customer experience. NLP can also be used to automate customer support, to reduce response times and increase customer satisfaction.
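To show the kind of signal a sentiment analysis extracts from customer feedback, here is a toy keyword-count scorer. Production NLP systems use trained models rather than hand-picked word lists, so treat this purely as an illustration of the concept:

```python
# A toy sentiment scorer for customer feedback. Real NLP systems use trained
# models; this keyword-count sketch only illustrates the kind of signal
# extracted from text.
POSITIVE = {"great", "excellent", "fast", "helpful", "love"}
NEGATIVE = {"slow", "broken", "terrible", "refund", "disappointed"}

def sentiment(text):
    # Normalise: lowercase and strip simple punctuation before splitting.
    cleaned = text.lower().replace(".", "").replace("!", "").replace(",", "")
    words = set(cleaned.split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The support team was helpful and fast!"))  # positive
print(sentiment("Terrible service, I want a refund."))      # negative
```

Running a scorer like this (or, in practice, a trained model) over thousands of reviews or social media posts is what turns raw text into the customer-sentiment insights described above.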

Computer Vision:
This is a branch of AI that enables machines to interpret and understand visual data from the world around them. Computer vision is used in applications such as Hyperscience’s and Laiye’s intelligent document digitisation – for example, reading an invoice, accurately recognising each field and preparing the data for ingestion into an ERP system. In other applications, computer vision is used for self-driving cars, facial recognition, and image classification. For example, a business can use vision-based systems to automate processes by digitising all documents that are usually captured by hand, or automate an onboarding process for new employees, contractors or students by digitising the content of all required supporting documents. This radically reduces time and resources while improving efficiency.

Prompt:
Prompts are used to generate a response from a generative AI model, in a manner vaguely similar to the way a SQL query extracts data from a database. A prompt can be a few words or a complete sentence, and the model will generate a response based on the information it contains. For example, “create a picture of a cat wearing pyjamas in a dancing hall.” Prompts can be used with a large variety of models to generate text, speech, images and even music.
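The SQL analogy can be made concrete with a short sketch. The `generate()` function here is a stand-in for whichever generative-model API you would actually call, so it is an assumption for illustration, not a real library:

```python
# The analogy from the text: a SQL query retrieves data that already exists,
# while a prompt asks a generative model to synthesise new content.
# generate() is a placeholder for a real model call (text, image or music).
sql_query = "SELECT name FROM cats WHERE outfit = 'pyjamas';"  # extracts stored rows
prompt = "Create a picture of a cat wearing pyjamas in a dancing hall."  # requests new content

def generate(prompt):
    # Placeholder: a real implementation would send the prompt to a
    # generative model and return its output.
    return f"[model output for: {prompt}]"

print(generate(prompt))
```

The key difference is that the query can only return what is already in the database, while the prompt can produce content that has never existed before.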

Find us at www.mavco.co.za or reach out to us at muller@mavco.co.za. We are on a mission to bring affordable AI to businesses and knowledge workers.
