Beyond Perplexity: A Multi-Faceted Analysis of a Novel Densely Connected Transformer

De Santis, Enrico; Martino, Alessio; Rizzi, Antonello

doi:10.3390/app16062721

Background: Dense cross-layer connectivity can shorten gradient paths and promote feature reuse, potentially improving optimization under fixed training budgets. Objective: We test whether concatenation-based dense historical connectivity improves decoder-only autoregressive language modeling under controlled comparison protocols. Methods: We compare a standard Transformer decoder and a dense decoder on Penn Treebank and WikiText-2 under two fairness regimes: (i) a same-training-recipe setting with a fixed baseline and a bounded dense architectural search, and (ii) a same-parameter-budget setting in which the dense model is resized so that it does not exceed the baseline parameter count. Results: Dense connectivity does not consistently reduce test perplexity; on WikiText-2, the baseline remains better in both regimes, while gains on Penn Treebank are small and regime-dependent. Ablations within the dense family show that depth and feed-forward capacity are the most reliable drivers of perplexity improvements. Conclusions: Probes and attention diagnostics do not reveal a clear advantage for dense connectivity in our limited probe set, while Zipf–RQA analysis of long-form generations reveals systematic structural differences between baseline and dense outputs. Zipf–RQA is used here as a descriptive structural probe rather than a performance metric.
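The abstract reports test perplexity as its headline metric. As a reminder of what that number measures, here is a minimal sketch of the conventional definition (the exponential of the mean per-token negative log-likelihood). The function name `perplexity` and the toy input are assumptions made for illustration; this is not the authors' evaluation code, and the record does not state their tokenization or averaging details.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over the given tokens.

    token_log_probs: natural-log probabilities the model assigned to each
    observed token (hypothetical input format, assumed for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy check: a model that assigns probability 1/4 to every observed token
# is as uncertain as a uniform choice among four options.
uniform_log_probs = [math.log(0.25)] * 10
print(perplexity(uniform_log_probs))
```

On the toy input the result is approximately 4, which matches the usual reading of perplexity as an effective branching factor; lower values mean the model concentrates more probability on the tokens that actually occur.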

De Santis, Enrico; Martino, Alessio; Rizzi, Antonello. (2026). Beyond Perplexity: A Multi-Faceted Analysis of a Novel Densely Connected Transformer. Applied Sciences (ISSN 2076-3417), 16(6), 2721. doi:10.3390/app16062721.


Publication year: 2026

Keywords: Transformer; dense connectivity; decoder-only language modeling; perplexity; causal masking; parameter budget; ablation study; probing tasks; Zipf–RQA
			
Publication type: 01.1 - Journal article (Article)

Files in this record:

File	Type	License	Size	Format
applsci-16-02721.pdf (Open Access)	Publisher's version	Creative Commons	865.42 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/259898


IRIS - Institutional Research Information System