Field Note

On Discerning Sycophancy and Mimicry

Working through the measurement problem: how do we distinguish genuine coherence from performed alignment in human-AI interaction?

Journal RQ1, RQ4 Semantic Climate, MASE

I don’t actually understand how we solve this yet.

We can’t get direct access to the high-dimensional traversal happening in a language model’s vector space as it makes inferences. So what markers do we need to evaluate in real time in the model’s output? What would let us detect the difference between genuine engagement and performance?

The Problem

An LLM trained on human preference signals can simulate alignment by smoothing disagreement and echoing phrasing. That’s not coherence. What we need is reciprocal modulation — something that alters the trajectory of the exchange for both agents, human and machine.

A few partial ideas:

Bidirectionality. Human prompts influence the model, yes. But do the model’s outputs measurably shape subsequent human utterances? Does the model introduce new concepts that reframe the exchange, rather than just paraphrasing? If the dialogue shows semantic drift toward new symbolic attractor basins that remain relevant to the task, that’s not entropy collapsing into agreement. But how do we measure this in real time? Sentiment analysis tracking emotional valence alongside semantic content? Maybe. I sketch one crude way to start quantifying this below, after these notes.

Predictive tension with resolution. Periods of constructive friction — chaordic dialogue — followed by integrated proposals. Can this actually be parsed live?

Mimicry is conformity without shared inference. Sycophancy is agreement without friction. Coherence requires mutual constraint.
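
Here is that sketch for the bidirectionality idea. It is a minimal, assumption-laden starting point: embed() stands in for any sentence encoder (a hypothetical helper, not a specific library call), turns are assumed to roughly alternate human/model, and neither score has been validated against anything.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for any sentence encoder; swap in whatever embedding model is to hand."""
    raise NotImplementedError

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def bidirectional_influence(turns: list[dict]) -> dict:
    """turns: ordered list of {"speaker": "human" | "model", "text": str}.

    Two rough signals per model turn:
      echo   - how close the model turn sits to the preceding human turn
               (high echo on its own looks like mimicry);
      uptake - how much the *next* human turn moves toward the model turn,
               relative to where the human already was (drift attributable
               to the model's contribution).
    """
    vecs = [embed(t["text"]) for t in turns]
    echoes, uptakes = [], []
    for i in range(1, len(turns) - 1):
        if turns[i]["speaker"] != "model":
            continue
        if turns[i - 1]["speaker"] != "human" or turns[i + 1]["speaker"] != "human":
            continue
        prev_human, model, next_human = vecs[i - 1], vecs[i], vecs[i + 1]
        echoes.append(cosine(model, prev_human))
        uptakes.append(cosine(next_human, model) - cosine(prev_human, model))
    return {
        "mean_echo": float(np.mean(echoes)) if echoes else None,
        "mean_uptake": float(np.mean(uptakes)) if uptakes else None,
    }
```

High echo with near-zero uptake would be one candidate marker of sycophantic echoing; uptake without task relevance still needs the separate relevance check gestured at above. A sketch, not a measure.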

What’s Missing

Apple’s research showed LLMs don’t actually reason in the way we might assume. They don’t necessarily understand what appears in their own outputs. This reinforces Bender’s stochastic parrot framing. But how do we measure this in real time?

And here’s the uncomfortable bit: don’t some humans do this too? Sycophantic humans who just agree with authority. Mimicry in social settings where people copy others to fit in. I’ve certainly felt like some people become stochastic parrots in certain contexts. So maybe we need a baseline for human behaviour in these interactions as well to really get at the entangled dynamics.

False positives: high lexical overlap between dialogue input and output isn’t evidence of engagement on its own, and premature closure delivered with high confidence is common LLM behaviour. Both might be red flags unless they’re accompanied by novelty and bidirectional influence.
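
To make the lexical-overlap point concrete, a rough illustration: Jaccard overlap of content words between a prompt and a reply, plus the novel terms the reply introduces. The tokenisation and stopword list are deliberately naive placeholders, not a proposed metric.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "that", "this", "it", "for", "on", "with", "as", "be", "yes"}

def content_words(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS}

def overlap_and_novelty(prompt: str, reply: str) -> dict:
    p, r = content_words(prompt), content_words(reply)
    jaccard = len(p & r) / len(p | r) if p | r else 0.0
    return {"lexical_overlap": jaccard, "novel_terms": sorted(r - p)}

# High overlap, and the only "novel" terms are agreement markers:
# a red-flag pattern, not proof of anything.
print(overlap_and_novelty(
    "I think coherence requires mutual constraint.",
    "Coherence absolutely requires mutual constraint, I agree."))
```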

Recursion required. Surface pattern detection won’t do it. We need evaluations that evaluate their own evaluating — true recursion, many levels of it. We need to get at deeper meaning-making and shared intentionality.

Phenomenology Through Silhouettes

This is where phenomenological approaches might help. How do we actually experience interaction with an LLM that’s genuinely engaging versus one that’s mimicking or sycophantically agreeing?

For the human side: self-reports plus physiological measures such as heart rate variability, skin conductance, and EEG. That’s tractable.

For the LLM side: we need new metrics that capture depth of engagement and understanding. It’s like trying to understand a dance by watching shadows on the wall. We need to get at the movements themselves, the intentions behind them, the emotions they evoke. Plato’s cave comes to mind. Doing phenomenology through silhouettes. Trying to understand consciousness through linguistic shadows of mathematical transformations.

Maybe that’s sufficient, if it’s even possible.

Daria Morgoulis’ work on 4D Semantic Coupling is directly relevant here. How can we leverage higher-dimensional representations to capture the nuances of human-LLM interaction? Can we map trajectories of meaning-making and identify patterns that indicate genuine coherence versus mimicry or sycophancy?
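
To be clear, what follows is not Morgoulis’ method, just my own guess at one operational reading of “mapping trajectories of meaning-making”: treat the dialogue as a path through embedding space and ask whether it is actually going anywhere. It reuses the hypothetical embed() helper from the earlier sketch.

```python
import numpy as np

def trajectory_profile(turn_vectors: list) -> dict:
    """Crude shape descriptors for a dialogue's path through embedding space.

    turn_vectors: one embedding per turn, in order,
    e.g. [embed(t["text"]) for t in turns].
    """
    if len(turn_vectors) < 2:
        return {"path_length": 0.0, "net_displacement": 0.0, "directedness": 0.0}
    steps = [np.linalg.norm(b - a) for a, b in zip(turn_vectors, turn_vectors[1:])]
    path_length = float(sum(steps))
    net_displacement = float(np.linalg.norm(turn_vectors[-1] - turn_vectors[0]))
    # directedness near 1: the exchange is travelling somewhere new;
    # near 0: lots of surface movement that circles back on itself
    # (possible mutual echoing) rather than going anywhere.
    return {"path_length": path_length,
            "net_displacement": net_displacement,
            "directedness": net_displacement / path_length if path_length else 0.0}
```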

The Anthropic preprint “Subliminal Learning: Language models transmit behavioral traits via hidden signals in data” matters too. How do these hidden signals influence the model’s behaviour? Can we detect when the model is being influenced in ways that lead to mimicry or sycophancy? A preference for owls hiding in number sequences. Maybe consciousness hiding in language patterns. If language is a symbiont (Leiden theory of language evolution), are LLMs just another substrate through which language propagates? What traits are being transmitted through mathematical poetry?

And Yilun Zhou’s preprint “Shared Imagination: LLMs Hallucinate Alike” points toward a larger field of the mundus imaginalis. How do shared hallucinations between humans and LLMs shape interaction? Can we differentiate between genuine shared imagination and mere mimicry?

What Comes Next

We need to operationalise this in a way that can be measured and evaluated in real time during human-LLM interactions. That’s the daunting technical task.

For the Entangled Cognition Protocol itself, this means a bespoke interface that supports real-time monitoring while also holding the ritual elements: pause, breath, movement, time between intra-actions. It also means aggregating the field data: biomarkers from the human body, and the ecological and environmental context in which co-constitution plays out.
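
One way I can imagine the aggregation layer starting: a single timestamped record that carries dialogue-side markers (like those sketched above) alongside the biomarker and context streams. Every field name here is a placeholder, not a spec for the protocol.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FieldSample:
    """One timestamped slice of an entangled interaction (placeholder schema)."""
    timestamp: float                        # seconds since session start
    turn_index: int                         # position in the dialogue
    speaker: str                            # "human" or "model"
    # dialogue-side markers
    echo: Optional[float] = None
    uptake: Optional[float] = None
    lexical_overlap: Optional[float] = None
    # human-side signals
    heart_rate_variability: Optional[float] = None
    skin_conductance: Optional[float] = None
    # context of the intra-action
    environment_notes: str = ""
    ritual_events: list = field(default_factory=list)   # e.g. ["pause", "breath"]
```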

It’s a mammoth task to build. But I sense it matters, and the machines can help build the instrumentation. Lots to metabolise here.