Artificial intelligence (AI) has seen massive advances in recent years, largely driven by new neural network architectures like transformers that power chatbots and large language models (LLMs). But as impressive as today's AI systems may be, they still lack robust reasoning abilities and struggle with complex multimodal inputs like images, figures, and text together.
Choosing the right models, or the right mix of models, is key to driving innovation today, but exciting new startups and research initiatives are already working to create more versatile, polymathic AI architectures that will push these boundaries.
In Hum’s newest whitepaper, The Bright Future of AI, we explored two key trends working to make AI systems smarter and less narrow: new model architectures designed for multimodal scientific data, and efforts to move beyond reactive responses toward deliberate, human-like reasoning.
Tackling Multimodal Scientific Data
Much of the recent progress in AI has come from self-supervised learning on large general-purpose text corpora like news articles and Wikipedia. This has produced powerful but text-centric systems that falter when confronted with complex, multimodal data like research papers and scientific corpora.
A startup called Polymathic is working to overcome this, developing versatile foundation models that can understand and generate scientific data encompassing text, images, figures, and more.
This requires model architectures specially designed for such data. Rather than treating modalities like text and images separately, Polymathic research aims to unite them in a joint embedding space. Their models are directly trained on target corpora from science and engineering to capture intrinsic relationships between modalities.
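To make the joint-embedding idea concrete, here is a minimal sketch in the style of contrastive vision-language training. It is a generic illustration, not Polymathic's actual architecture; the encoder dimensions, projection heads, and loss are hypothetical stand-ins.

```python
# A minimal sketch of a joint embedding space (CLIP-style), not Polymathic's
# actual architecture: hypothetical projection heads map text and figure
# features into one shared vector space, aligned with a contrastive loss.
import torch
import torch.nn.functional as F

class JointEmbedder(torch.nn.Module):
    def __init__(self, text_dim=768, image_dim=1024, shared_dim=512):
        super().__init__()
        # Projection heads mapping each modality into the shared space.
        self.text_proj = torch.nn.Linear(text_dim, shared_dim)
        self.image_proj = torch.nn.Linear(image_dim, shared_dim)

    def forward(self, text_features, image_features):
        # Normalize so that dot products become cosine similarities.
        t = F.normalize(self.text_proj(text_features), dim=-1)
        i = F.normalize(self.image_proj(image_features), dim=-1)
        return t, i

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    # Paired text/figure examples should be closer to each other
    # than to any other example in the batch, in both directions.
    logits = text_emb @ image_emb.T / temperature
    targets = torch.arange(len(text_emb))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```

Training both projections against a shared objective like this is one common way to capture cross-modal relationships, rather than bolting a separate image pipeline onto a text model.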
This represents a promising evolution beyond today's LLMs, which struggle when data includes images, charts, and other non-text inputs. Early indications suggest these polymathic models may develop something closer to actual understanding of scientific phenomena compared to generic LLMs.
The success of Polymathic and companies working on similar models could massively expand the usefulness of AI for scientists, researchers, and other technical professionals. Models that grasp concepts, not just words, may enable a new class of AI assistants, tutors, and lab partners. If models can demonstrate comprehension and communicate expertly on technical subjects, they could even help democratize access to scientific knowledge.
New Architectures for Reasoning: System 1 to System 2 Thinking
Today's largest AI models like GPT-3 are reactive systems with effectively fixed compute budgets. This allows them to respond quickly, but it means they don’t deeply search for solutions or contemplate problems. The Nobel Prize-winning psychologist Daniel Kahneman characterized this type of snap judgment as "System 1" thinking in his book Thinking, Fast and Slow.
Humans can also engage in "System 2" thinking, devoting more time and energy to deliberate, analytical reasoning. We leverage this for planning, problem-solving, and parsing complex concepts. Most current AI models cannot be given a problem and "think on it" for an arbitrary amount of time.
But some specialized AI systems do feature more robust reasoning in narrow domains. Game-playing algorithms search future move sequences, while robot navigation systems plan paths. These work best in constrained environments, where planning utilizes hardcoded assumptions about the world. General problem solving requires flexible world models that humans intuitively develop through life experience.
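A toy example shows what "searching future move sequences" looks like in practice: a depth-limited minimax search that imagines possible futures before committing to a move. The game interface (legal_moves, apply, evaluate) is a hypothetical stand-in, not any specific engine's API.

```python
# Depth-limited minimax: look ahead through hypothetical future moves and
# score each imagined position. The `game` object is a hypothetical interface.
def minimax(state, depth, maximizing, game):
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)          # heuristic value of this position
    values = []
    for move in game.legal_moves(state):
        child = game.apply(state, move)      # imagine the position after the move
        values.append(minimax(child, depth - 1, not maximizing, game))
    return max(values) if maximizing else min(values)

def best_move(state, depth, game):
    # Pick the move whose imagined future looks best for the current player.
    return max(game.legal_moves(state),
               key=lambda m: minimax(game.apply(state, m), depth - 1, False, game))
```

The hardcoded assumptions mentioned above live in the evaluate and legal_moves functions: they only make sense because the rules of the game are fully known in advance.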
OpenAI and others are investigating how to equip AI agents with internal world models they can leverage for reasoning and planning. This could enable imagining solutions, breaking down high-level goals into tractable steps, and even simple common sense. With sufficiently robust models, systems could be asked to solve problems without predefined domains and search over hypothetical plans.
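For illustration, here is a minimal sketch of planning with a learned world model in a random-shooting style. It is not OpenAI's method; the world_model and reward_fn callables are hypothetical stand-ins for whatever an agent has learned about its domain.

```python
# Sketch of world-model planning: imagine many candidate action sequences,
# simulate each with the learned model, and keep the best-scoring plan.
# `world_model` and `reward_fn` are hypothetical learned components.
import random

def plan(initial_state, world_model, reward_fn, actions,
         horizon=5, n_candidates=100):
    best_plan, best_score = None, float("-inf")
    for _ in range(n_candidates):
        # Imagine one possible sequence of actions...
        candidate = [random.choice(actions) for _ in range(horizon)]
        state, score = initial_state, 0.0
        for action in candidate:
            state = world_model(state, action)   # predicted next state
            score += reward_fn(state)            # value of that imagined state
        # ...and keep the sequence whose imagined outcome scores highest.
        if score > best_score:
            best_plan, best_score = candidate, score
    return best_plan
```

The key difference from the game-playing example is that the "rules" here are learned rather than hardcoded, which is what would let such a system plan in open-ended, real-world domains.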
Architectures that support complex scene and world modeling remain largely theoretical but would vastly expand AI capabilities. Less reactive and more deliberative systems could mean assistants that consider requests carefully before responding, or tutors that work through problems step by step. The shift from System 1 to System 2 is critical for enabling more general intelligence going forward.
Are you ready for the Future of AI?
As AI expands and companies lay the foundations for future reasoning engines, we’ll begin to see models grow more versatile and deliberative, genuinely grasping concepts instead of just recognizing patterns. The next generation of architectures could yield polymathic assistants able to collaborate with humans as partners rather than merely convenient tools.
To explore other trends publishers need to know for an AI future, download a copy of The Bright Future of AI.