- Choosing the right generative AI model from OpenAI, Google, or Anthropic can be tricky amid a growing alphabet soup of names and versions. But experts say that testing different tools, understanding the complexity of your tasks, and knowing both how quickly you need results and how often you’ll need to make changes or improvements are key to using AI effectively and to avoiding being left behind.
If you’re finding it tricky to navigate the ever-changing generative AI landscape, which shifts weekly as vendors compete to top leaderboards, you’re not alone.
An AI training gap has emerged across the business world. According to one recent survey, two-thirds of leaders expect employees to have AI skills, but only a third of companies have a clear policy on what technology to use and how to use it.
While utilizing generative AI chatbots has a relatively low barrier to entry, figuring out the right model remains a challenge for people who have not studied their nuances. Here’s what you need to know about picking the right AI tools, such as ChatGPT, Claude, Gemini, Perplexity, and Copilot—and the underlying models that power them.
Speed date with AI
Today’s generative AI is still relatively new, and for this reason, there is an abundance of models on the market; no one company or model dominates. As a result, experimenting with a variety of tools is beneficial.
“Think of it like test driving a car,” Jules White, a computer-science professor at Vanderbilt University, told Fortune. “What is it like to drive on the highway, to park, how does the stereo work, how cushy is the seat, etc. I suggest doing the equivalent for models.”
For Maggie Vo, head of user education at Anthropic, that test drive should include consideration of three factors: task complexity, time sensitivity, and the need to refine your work.
Something like writing a strategic plan might call for using the most capable model whereas a faster model might be most useful for reformatting data or quick summaries. If you’re planning to iterate multiple times, then use a combination: “Start with a smarter model to refine your approach, then use a smaller model to execute,” she said.
“The real skill is developing ‘Platform Awareness’—understanding not just different models but different AI systems entirely,” Vo added. “What works in Claude might need adjustment in other systems. Experiment across platforms to build intuition about their unique strengths.”
In practice, this can be as simple as typing the same prompt into different AI chatbots, and into the different models each one offers (the option to change models may be at the top of the screen or close to the search bar), and then comparing which tool gives the most helpful response.

As Vo implied, some tasks are more easily completed with certain AI models than with others. The faster, typically free models are usually best for simple chats, whereas the more advanced models may cost a fee and take more than a few moments to process but will typically yield better results.
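For readers who like to see a rule of thumb written down, the trade-offs above can be sketched as a small decision function. This is only an illustration: the function name, the yes/no inputs, and the tier labels are hypothetical, and the tie-break for a task that is both complex and urgent is our assumption, not something Vo specified.

```python
def suggest_model_tier(complex_task: bool, time_sensitive: bool, will_iterate: bool) -> str:
    """Return a rough model-tier suggestion from three yes/no questions.

    The labels ("most capable", "faster", "combination") are illustrative,
    not product names.
    """
    if will_iterate:
        # Vo's advice: refine the approach with a smarter model,
        # then execute with a smaller one.
        return "combination"
    if complex_task and not time_sensitive:
        # Example from the article: writing a strategic plan.
        return "most capable"
    # Examples from the article: reformatting data, quick summaries.
    # Assumption: an urgent task also falls back to the faster tier.
    return "faster"


print(suggest_model_tier(complex_task=True, time_sensitive=False, will_iterate=False))
print(suggest_model_tier(complex_task=False, time_sensitive=True, will_iterate=False))
print(suggest_model_tier(complex_task=True, time_sensitive=False, will_iterate=True))
```

Treat the output as a starting point for your own test drive rather than a verdict; the point of experimenting is to find out where a rule like this breaks down for your work.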
The best models, according to UPenn’s Ethan Mollick
Ethan Mollick, a professor at the University of Pennsylvania’s Wharton School, is one of the most widely read experts on how to use AI models to help with business tasks. He is known for his prolific research and analysis on LinkedIn. According to a recent Substack post, his everyday AI use is focused on the products of OpenAI, Anthropic, and Google.
“With all (three) of the options, you get access to both advanced and fast models, a voice mode, the ability to see images and documents, the ability to execute code, good mobile apps, the ability to create images and video (Claude lacks here, however), and the ability to do Deep Research,” he wrote.
The ultimate challenge comes with deciding which model is best to use, he added. But like White, he connects it to selecting a vehicle: “Think of it like choosing between a sports car and a pickup truck; both are vehicles, but you’d use them for very different tasks.”
Mollick summarized the decision-making into three categories: good for chats, good for work, and good for hard problems.
Good for chats
Claude 4 Sonnet, ChatGPT-4o, and Gemini 2.5 Flash are fast but not as smart, according to Mollick. As generally free AI products, however, they are the most widely accessible and can still make a difference in workplace efficiency.
Here are some examples of their capabilities:
- Market and competitive analysis: “Compare OpenAI, Anthropic, and Google Gemini’s positioning in the enterprise AI space.”
- Financial modeling and forecasting: “Create a three-year revenue forecast for a SaaS startup with 20% YoY growth and 75% gross margin.”
- Content creation and editing: “Draft a LinkedIn post announcing a $10M Series A funding round that sounds confident but not boastful.”
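Because even fast models can get arithmetic wrong, it helps to know what a correct answer to the forecasting prompt looks like. The sketch below works through the numbers by hand. The prompt gives no starting revenue, so the $1 million base-year figure is purely an assumption for illustration, as is the function name.

```python
def forecast(base_revenue: float, growth: float, gross_margin: float, years: int = 3):
    """Compound revenue by `growth` each year and apply a flat gross margin.

    Returns a list of (year, revenue, gross_profit) rounded to whole dollars.
    """
    rows = []
    revenue = base_revenue
    for year in range(1, years + 1):
        revenue *= 1 + growth  # 20% YoY growth means multiplying by 1.2
        rows.append((year, round(revenue), round(revenue * gross_margin)))
    return rows


# Assumed base-year revenue of $1,000,000 (not part of the original prompt).
rows = forecast(1_000_000, growth=0.20, gross_margin=0.75)
for year, revenue, gross_profit in rows:
    print(f"Year {year}: revenue ${revenue:,}, gross profit ${gross_profit:,}")
# Year 1: revenue $1,200,000, gross profit $900,000
# Year 2: revenue $1,440,000, gross profit $1,080,000
# Year 3: revenue $1,728,000, gross profit $1,296,000
```

If a chatbot’s forecast does not compound this way, that is a sign to check its work before the numbers go into a slide deck.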
Good for work
Claude 4 Opus, ChatGPT o3, and Gemini 2.5 Pro are smarter and have lower error rates but are slower, according to Mollick. They may still hallucinate, but they tend to be more successful at coding, logic, or solving tougher problems that require multiple steps. Here’s an example of their capability:
- Market and competitive intelligence dashboards:

Who better to summarize AI’s priorities than AI itself? Here’s what the models had to say:
Prompt: “In one sentence, please describe your relative strengths. I am writing this in the context of other generative tools, including [Competitor] and [Competitor].”
OpenAI’s GPT-4o: “ChatGPT’s relative strength lies in its ability to generate highly polished, context-aware, and conversationally natural responses—particularly strong in editing, writing, and task-oriented problem-solving—while integrating tools like code execution and image analysis more seamlessly than Claude or Gemini.”
Anthropic’s Claude Sonnet 4: “I excel at nuanced reasoning, ethical decision-making, and producing well-structured, thoughtful responses while maintaining strong safety guardrails, though I’m generally more cautious and less willing to engage with borderline requests compared to some other AI assistants.”
Google’s Gemini 2.5 Flash: “Gemini excels in multimodal understanding and generation, seamlessly integrating text, code, images, and other data formats while leveraging real-time information from Google’s vast ecosystem to provide highly accurate and contextually rich responses.”
Other examples include customer-service automation and quality control, as well as financial-scenario modeling and risk analysis. Mollick admitted he primarily uses the “Good for work” models, though he also has favorites for specific tasks outside of this list, like using GPT-4.5 when writing.
Good for hard problems
Claude 4 Opus extended thinking, ChatGPT-o3-pro, and Gemini 2.5 Pro are the most advanced AI models on the market, and for that reason, they remain slow and have limited use, Mollick added in his Substack post.
Here’s an example of a prompt that would push the limits of a standard ChatGPT model but that ChatGPT-o3-pro might be best equipped to tackle:
“Below is the full text of a 180-page Phase III oncology trial dossier (including statistical appendices and 40 pages of adverse-event tables).
- Extract every efficacy endpoint, its p-value, confidence interval, and sample size.
- Re-compute the primary endpoint’s hazard ratio from the raw survival-curve data provided in Appendix C, flagging any inconsistencies with the sponsor’s reported figure.
- Summarize all Grade 3–4 adverse events, grouped by organ system, and calculate their absolute-risk increase versus control.
- Draft a 500-word briefing for the FDA that:
  - Identifies any statistical or methodological red flags.
  - Assesses whether the benefit–risk profile justifies accelerated approval.
  - Proposes two post-marketing study designs to validate long-term safety.”
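One piece of that prompt is simple enough to spot-check yourself: absolute-risk increase is the adverse-event rate in the treatment arm minus the rate in the control arm. Here is a minimal sketch of that calculation; the patient counts are made up for illustration, and the function name is hypothetical.

```python
def absolute_risk_increase(events_treatment: int, n_treatment: int,
                           events_control: int, n_control: int) -> float:
    """Adverse-event rate in the treatment arm minus the rate in the control arm."""
    return events_treatment / n_treatment - events_control / n_control


# Hypothetical counts: 30 of 200 treated patients vs. 18 of 200 controls
# experienced a Grade 3-4 adverse event.
ari = absolute_risk_increase(30, 200, 18, 200)
print(f"Absolute-risk increase: {ari:.1%}")  # Absolute-risk increase: 6.0%
```

Recomputing even one or two figures this way gives a quick read on whether a model’s longer analysis deserves trust.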
Other models
No matter which model or AI company you use, it’s always important to double-check the AI’s output for errors or hallucinations. Accuracy is the priority of Perplexity, according to Jesse Dwyer, head of communications at the AI company.
“Perplexity’s sole focus is accurate, trustworthy AI—we use all top models and post-train them for accuracy,” Dwyer said. “Models that have been trained to experience some hallucination are helpful if you want videos of high-diving cats, but they can be dangerous when you are making business or financial decisions.”
Copilot is also a widely used AI chatbot, thanks to its integration with Microsoft products, but it can be difficult to switch between models. DeepSeek R1 and Grok, from Elon Musk’s xAI, are also options on the market, but, according to Mollick, each has missing features.
Mollick did not respond to Fortune’s request for comment.
The takeaway: Practice makes perfect
While there remains no perfect method for using AI in the workplace, the most effective way to stay on top of your game is exploration and education, and that starts with simply using the tools.
“The difference between casual users and power users isn’t prompting skill (that comes with experience); it’s knowing these features exist and using them on real work,” Mollick wrote.
And because many business leaders, including CEOs, have already begun using AI, Dwyer suggested trying to emulate how they are using it in their work.
“AI is one of the first business tools adopted first by managers instead of frontline workers,” he told Fortune. “It makes sense that leaders with experience getting the best work from their teams and software tools will be naturally prepared to work with AI.”

