Some Thoughts and Experience with Using LLMs in the Research Arena

Author

Scott Anthony Robson PhD

Published

June 10, 2025

Modified

June 10, 2025

Inspired by a discussion on my lab’s Slack, I began jotting down some thoughts on large language models (LLMs). What started as a few informal reflections soon evolved into a longer-form writing exercise. The result is this essay:

1 Reflections on AI, LLMs, and GPTs

Summary: GPTs function as predictive systems that apply strikingly human-like attention to linguistic context, but attention is not equivalent to experience, nor does it confer true understanding. They are well-suited for automating routine tasks but do not foster mastery. We should strive for mastery ourselves.

1.1 What is ChatGPT?

Understanding how systems like ChatGPT operate is critical. These models are trained to predict the next word or punctuation mark in a sequence, using vast amounts of human-authored text. They do this iteratively: each new token prediction is informed by the preceding context within a fixed-length window. Crucially, these models do not “read” or “understand” in the human sense—they process input statistically. For example, given the phrase “the cat is digesting…,” the model is most likely to suggest “a mouse” or “cat food,” based on patterns found in human-written text, and less likely to predict “shoes” or “E. coli,” regardless of factual correctness.
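To make the mechanics concrete, here is a deliberately toy sketch in Python. The probability table is invented to mirror the “digesting” example above; a real model computes this distribution with a neural network over a vocabulary of tens of thousands of tokens, but the loop structure, predict one token, append it, repeat, is the same.

```python
import random

# Toy illustration of autoregressive generation; the probabilities are
# invented for the "digesting" example, not taken from any real model.
NEXT_TOKEN_PROBS = {
    ("cat", "is", "digesting"): {
        "a mouse": 0.55, "cat food": 0.35, "shoes": 0.07, "E. coli": 0.03,
    },
}

def generate(tokens, steps=1):
    tokens = list(tokens)
    for _ in range(steps):
        window = tuple(tokens[-3:])      # fixed-length context window
        dist = NEXT_TOKEN_PROBS.get(window)
        if dist is None:
            break
        # Sample the next token in proportion to its learned probability.
        tokens.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return tokens

print(generate(["the", "cat", "is", "digesting"]))
# Usually ends in 'a mouse' or 'cat food'; only rarely 'shoes'.
```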

The “T” in GPT stands for “Transformer,” referring to the model’s capacity to apply attention mechanisms—essentially, statistical prioritization of context. These mechanisms allow GPTs to track and reproduce linguistic patterns with a surprising level of coherence. However, while they appear to grasp context, they lack semantic comprehension. A model can be trained on Romeo and Juliet and produce output about tragic love, but this is due to pattern replication, not emotional understanding.
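For readers who want the mechanism rather than the metaphor, the core of attention is only a few lines of linear algebra. The sketch below implements scaled dot-product attention in NumPy; it omits everything else a real Transformer layer has (masking, multiple heads, learned projections), and the input matrix is just random stand-ins for token embeddings.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each output row is a weighted average
    of the value vectors V, weighted by how strongly the corresponding
    query matches each key. A minimal sketch; no masking, heads, or
    learned projections."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: prioritize context
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))      # three "tokens", four-dimensional embeddings
print(attention(X, X, X))        # self-attention: every token attends to all
```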

Recent models, particularly those marketed as reasoning systems, employ self-refinement techniques: they generate initial outputs, critique or re-query themselves, and revise responses for coherence. Some include real-time web search capabilities. I personally use OpenAI’s paid offerings for tasks like programming in Python or C/C++. These iterative tools represent an advancement, but they still fall short of capturing the experiential world in which humans reason.
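The refinement loop itself is easy to sketch. In the Python below, llm is a hypothetical placeholder for whatever chat-completion call you prefer; no particular vendor’s API is implied, and the stub only echoes its prompt. The point is the structure: draft, critique, revise.

```python
def llm(prompt: str) -> str:
    # Hypothetical stand-in for a chat-completion API call.
    # Replace with your model of choice; this stub only echoes.
    return f"[model response to: {prompt[:40]}...]"

def refine(question: str, rounds: int = 2) -> str:
    """Generate an answer, then repeatedly critique and revise it."""
    draft = llm(question)
    for _ in range(rounds):
        critique = llm(f"Critique this answer for errors and gaps:\n{draft}")
        draft = llm(
            f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
            "Rewrite the draft, fixing the issues raised."
        )
    return draft

print(refine("Why does anisotropic tumbling matter for NMR relaxation?"))
```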

1.2 Where LLMs Are Useful

Programming and data processing are ideal applications. LLMs excel at generating boilerplate code, reformatting data and plots, and automating repetitive scripting tasks. For example, I’ve successfully used them to align data streams, create customized visualizations, and reformat large codebases. When the task is well-bounded and verifiable, LLMs significantly increase productivity.
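As a flavor of what “well-bounded and verifiable” means, here is the kind of alignment task I would happily delegate: merging two irregularly sampled data streams on their nearest timestamps with pandas. The column names and values are invented for illustration; the result is small enough to check by eye, which is exactly what makes it a good LLM task.

```python
import pandas as pd

# Two invented, irregularly sampled data streams.
temps = pd.DataFrame({
    "t": pd.to_datetime(["2025-06-10 00:00", "2025-06-10 00:05"]),
    "temp_C": [21.3, 21.9],
})
pressures = pd.DataFrame({
    "t": pd.to_datetime(["2025-06-10 00:01", "2025-06-10 00:06"]),
    "pressure_kPa": [101.2, 101.1],
})

# Align each pressure reading with the nearest temperature reading.
aligned = pd.merge_asof(pressures.sort_values("t"), temps.sort_values("t"),
                        on="t", direction="nearest")
print(aligned)
```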

Creative ideation is another area of value. The model can serve as a sounding board—generating short stories, brainstorming concepts, or offering varied perspectives. While this is no substitute for a thoughtful conversation, it can be a helpful starting point. For factual queries—“What’s the capital of Uzbekistan?”—LLMs are on par with search engines. But the model’s ability to generate plausible-sounding explanations doesn’t mean those explanations are valid.

1.3 Where LLMs Fall Short

Advanced coding tasks reveal limitations. While LLMs handle syntax and familiar logic structures well, they lack domain-specific reasoning. I once asked ChatGPT to write code that calculates the angle between a molecular bond vector and a principal axis of rotation. It generated technically plausible code, but I couldn’t verify its correctness without digging into the literature. Ultimately, I had to refine the request iteratively based on my own research. LLMs can assist with implementation, but they don’t replace the intellectual work required for real understanding.
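For what it’s worth, the purely geometric core of that request is standard and easy to verify: the angle between two vectors follows from their normalized dot product. The hard part, which the model could not supply, is knowing which axis is the physically meaningful one. The vectors below are illustrative values only.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two 3-vectors via the normalized dot product.
    np.clip guards against round-off pushing the cosine past +/-1."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Illustrative values only: a bond vector and a principal axis (in practice
# the axis would come from diagonalizing, e.g., the rotational diffusion
# tensor, which is where the domain knowledge lives).
bond = np.array([1.0, 0.2, -0.3])
axis = np.array([0.0, 0.0, 1.0])
print(f"{angle_deg(bond, axis):.1f} degrees")
```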

1.4 Where LLMs Are Often Unhelpful

In scientific inquiry, LLMs frequently generate misleading or incorrect information. They may cite plausible but non-existent references, make unfounded assertions, or miss nuances critical to scientific interpretation. This is especially problematic in research environments, where questions often involve ambiguous or novel concepts that fall outside the model’s training distribution.

For instance, a colleague and I recently tried to track down the source of an equation related to anisotropic molecular tumbling. ChatGPT confidently presented an answer, but it was incorrect. Yet in attempting to verify it, we encountered a wide range of literature that broadened our understanding. While that serendipitous outcome had value, it underscores the need for skepticism.

1.5 Final Thoughts

I remember when people feared that Google would erode our capacity for deep, lasting learning and understanding. In many ways, that concern proved valid: rapid access to superficial answers encourages intellectual shortcuts. Now, tools like ChatGPT amplify this trend. They make it even easier to appear informed without actually being so.

Mastery, however, still demands time and effort. That hasn’t changed. If anything, it’s more important now than ever. While these tools are powerful assistants, they do not confer knowledge or wisdom. Don’t trade curiosity for convenience. Embrace the work of understanding—it remains the surest path to insight and satisfaction.