The Middle Welsh Revitalisation Framework (MWRF)
A Corpus-Based Methodology for the Revitalisation of Middle Welsh as an Operational Language State within the BCDRS
Author: Penrith Beacon Communications | PBC
Version: 1.0 · 2026
Applied to: Revitalised Cumbric™ dataset (307 entries across 24 lexical and grammatical domains)
Data maintained in: Polyglot™ · polyglot.kingarthursroundtable.com
Abstract

The Middle Welsh Revitalisation Framework (MWRF) is a corpus-based methodology for revitalising Middle Welsh (c. 1150–1500 CE) as a fully operational language state within the Brittonic Convergent Diachronic Revitalisation System (BCDRS). Middle Welsh is superabundantly attested — the Cardiff University Rhyddiaith Gymraeg Ganoloesol corpus alone comprises approximately 2.8 million words — making the MWRF fundamentally different from the NBTRF, which operates under conditions of near-total corpus absence.

The MWRF operates through a three-stage M1 → M2 → M3 attestation pipeline drawing on the Geiriadur Prifysgol Cymru (GPC), D. Simon Evans's Grammar of Middle Welsh, and the Cardiff University Rhyddiaith Gymraeg Ganoloesol corpus. The framework produces the revitalised Middle Welsh (wlm) column which feeds into the Old Welsh Revitalisation Framework (OWRF). The system is subject to the peer-review governance protocol of Penrith Beacon Communications | PBC.

Applied to 307 entries across 24 lexical and grammatical domains, the MWRF produces a systematically graded dataset of Middle Welsh forms — Grade A (direct corpus attestation), Grade B (morphophonological derivation), and Grade C (Modern Welsh baseline adoption) — constituting the first stage in the BCDRS pipeline from Modern Welsh to Revitalised Cumbric.

§1

Background and Historical Context

Middle Welsh — Welsh as written and spoken between approximately 1150 and 1500 CE — occupies a singular position in the history of the Celtic languages. It is not merely the intermediate stage between Old Welsh and Modern Welsh; it is the period of the language's literary florescence, the era that produced the Four Branches of the Mabinogi, the Arthurian romances of the Red Book of Hergest, the great elegiac and praise poetry of the Gogynfeirdd, and the codified legal texts of the Laws of Hywel Dda [1, 2]. Middle Welsh is, in the estimation of most scholars, the language at its fullest medieval expression.

The period is conventionally bounded by two significant transitions. The lower boundary, circa 1150, marks the emergence of a recognisably distinct literary Welsh from the archaic Old Welsh tradition — a transition visible in the changing orthographic conventions of the manuscripts and in the morphological regularisation of the verbal system. The upper boundary, circa 1500, marks the beginning of the Early Modern Welsh period, associated with the advent of printing and the gradual standardisation that would eventually produce the Classical Welsh of Bishop William Morgan's 1588 Bible translation [3].

The principal manuscript repositories for Middle Welsh literature are among the most intensively studied medieval documents in Europe. The White Book of Rhydderch (Llyfr Gwyn Rhydderch, c. 1350) and the Red Book of Hergest (Llyfr Coch Hergest, c. 1400) together preserve the bulk of the medieval Welsh prose tradition, including the complete Mabinogi and the Welsh Arthurian material [4]. The Black Book of Carmarthen (Llyfr Du Caerfyrddin, c. 1250), the oldest surviving manuscript written entirely in Welsh, preserves early Middle Welsh verse [5]. The Book of Taliesin (Llyfr Taliesin, c. 1350) provides a complex manuscript tradition spanning Old and Middle Welsh verse [6].

The corpus size of Middle Welsh is substantial by any standard. The Cardiff University Rhyddiaith Gymraeg Ganoloesol project — a systematic digitisation of Middle Welsh prose — has produced a searchable corpus of approximately 2.8 million words [7]. This abundance is not merely a quantitative fact; it is the methodological foundation of the MWRF. It means that for a large proportion of the 307 dataset entries, the Middle Welsh form can be established by direct attestation rather than by inference.

The MWRF was designed precisely to exploit this abundance. Its primary method — corpus attestation — is not available to the OWRF or NBTRF, which operate under conditions of scarcity and near-total absence respectively. The abundance of the Middle Welsh record is what makes the MWRF the most straightforwardly grounded of the three BCDRS frameworks, and it is why the MWRF serves as the entry point to the BCDRS pipeline from the living Modern Welsh baseline.

§2

Theoretical Foundations

The MWRF rests on three primary scholarly resources, each exploited in a specific and methodologically constrained way. Together these resources constitute what the framework designates as the M1 attestation tier — the highest level of evidentiary quality available within the MWRF.

2.1 The Geiriadur Prifysgol Cymru as Primary Attestation Source

The Geiriadur Prifysgol Cymru (GPC) — the University of Wales Dictionary of the Welsh Language — is the indispensable scholarly instrument for any work in historical Welsh lexicography [8]. First published in fascicles from 1950 onwards and substantially completed by the early twenty-first century, the GPC provides for each entry not merely the word and its meanings but its complete history of attestation, including the earliest documented form, the date and manuscript source of each attestation, and the full range of orthographic and morphological variants.

For the MWRF, the GPC functions as the primary corpus attestation source — the first place consulted for any dataset row. Where the GPC provides an entry with one or more attestations dated to the period 1150–1500 CE, that form is the Grade A Middle Welsh form. The GPC's authority derives from its basis in the primary manuscript sources: its citations are traceable to specific manuscripts, scribes, and datable documents. An entry confirmed in the GPC does not merely assert that a word existed in Middle Welsh — it confirms the specific form in which it appeared, in which text, at approximately which date.

The MWRF uses the GPC Online edition, freely available at welsh-dictionary.ac.uk, which provides full search functionality across both headwords and historical forms. Where multiple attested forms exist for a single entry — reflecting the range of spelling variation across the Middle Welsh manuscript tradition — the MWRF selects the form most widely represented in the classical period (approximately 1200–1400 CE) and most consistent with Evans's grammatical descriptions.

2.2 D. Simon Evans and the Standard Reference Grammar

D. Simon Evans's A Grammar of Middle Welsh, first published by the Dublin Institute for Advanced Studies in 1964 and reprinted several times thereafter, remains the standard academic reference grammar for Middle Welsh morphology and syntax [9]. Evans provides systematic paradigm tables for all major verb classes — including the key irregular verbs bod, mynd, dyfod, gwneuthur, and caffael — as well as comprehensive descriptions of the pronominal system, nominal morphology, the mutation system, and the major syntactic constructions of Middle Welsh prose.

Within the MWRF, Evans serves two functions. First, as a direct M1 attestation source: Evans provides paradigm forms with manuscript citations, and these forms are assigned Grade A. Second, as the theoretical basis for the M2 morphophonological derivation rules: where M1 attestation is unavailable, the MWRF applies the systematic correspondences between Modern Welsh and Middle Welsh that Evans documents — including aw-restoration, pronominal substitution, and irregular verb paradigm replacement. Evans is thus simultaneously an attestation source and a derivation authority.

Evans's Grammar is especially authoritative for verb paradigms. The great irregular verbs of Welsh — bod (to be), mynd (to go), dyfod (to come), gwneuthur (to do), caffael (to have/get) — show systematic divergences between Middle Welsh and Modern Welsh in their finite paradigms, and Evans's tables provide the complete Middle Welsh forms for each cell. These paradigm forms, being directly documented from manuscript sources, yield Grade A outputs for the relevant dataset rows.

2.3 The Cardiff University Rhyddiaith Gymraeg Ganoloesol Corpus

The Cardiff University Rhyddiaith Gymraeg Ganoloesol (Middle Welsh Prose) corpus is a digitised, lemmatised, and morphologically tagged corpus of Middle Welsh prose texts [7]. At approximately 2.8 million words, it constitutes the largest systematic digital resource for Middle Welsh currently available to researchers. The corpus encompasses texts from across the Middle Welsh period, including both the major literary prose texts (the Mabinogi, Arthurian romances, saints' lives, Welsh chronicles) and legal and documentary materials.

Within the MWRF, the Cardiff corpus serves as a frequency and distribution resource. Where the GPC identifies multiple attested forms for a Middle Welsh word — as is common, given the unstandardised nature of medieval Welsh orthography — the Cardiff corpus provides evidence of which form was most prevalent in the classical period. A form attested hundreds of times across multiple prose texts from the period 1200–1400 carries more weight than a form attested once in a peripheral document. The Cardiff corpus enables this frequency weighting and validates the representativeness of the form selected for the wlm column.
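
The frequency weighting described above can be sketched as follows. This is a minimal illustration, not the framework's actual tooling: the record layout, the placeholder spellings `form_a` and `form_b`, and the tie-breaking order (token count, then number of distinct texts, then alphabetical) are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical attestation records: (spelling, approximate year, text label).
# These rows are illustrative placeholders, not real corpus output.
ATTESTATIONS = [
    ("form_a", 1250, "text_1"),
    ("form_a", 1320, "text_2"),
    ("form_b", 1310, "text_2"),
    ("form_a", 1390, "text_3"),
    ("form_b", 1480, "text_4"),  # falls outside the classical window
]

CLASSICAL_START, CLASSICAL_END = 1200, 1400


def most_prevalent_form(attestations, start=CLASSICAL_START, end=CLASSICAL_END):
    """Return the spelling with the most classical-period tokens, or None."""
    in_window = [(s, t) for s, year, t in attestations if start <= year <= end]
    counts = Counter(spelling for spelling, _text in in_window)
    if not counts:
        return None

    def rank(spelling):
        # Assumed tie-break: more tokens, then more distinct texts, then A-Z.
        texts = {t for s, t in in_window if s == spelling}
        return (-counts[spelling], -len(texts), spelling)

    return min(counts, key=rank)


print(most_prevalent_form(ATTESTATIONS))
```

Counting distinct texts as a secondary criterion reflects the remark that a form spread across multiple prose texts carries more weight than one confined to a single document.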

2.4 Systematic Morphophonological Derivation as Secondary Method

The M2 stage of the MWRF pipeline applies documented morphophonological rules governing the systematic differences between Modern Welsh and Middle Welsh. These rules are not speculative constructions but are derived from the academic descriptions of Evans, supplemented by Rodway's Dating Medieval Welsh Literature [10] and Sims-Williams's work on early Brittonic phonology [11].

The most significant M2 rules are aw-restoration (Modern Welsh -og → Middle Welsh -awg/-awc), pronominal substitution (2pl chi → chwi; 3pl nhw → hwy), irregular verb paradigm replacement (documented per Evans), and the ⟨u⟩ orthographic convention for /ʉ/ (Middle Welsh ⟨u⟩ where Modern Welsh has ⟨y⟩ in certain positions). These rules are systematically applied to cy values when M1 attestation is unavailable, producing Grade B outputs.

§3

The Evidence Hierarchy Model

The MWRF operates a strict hierarchical evidence model in which each level is attempted in sequence. Only if a given level fails to produce a reliable result does the framework descend to the next. This hierarchy ensures that the highest-quality evidence always takes precedence, and that Grade C (Modern Welsh baseline adoption) is never applied when Grade A or Grade B is achievable.

| Level | Source | Method | Grade |
| --- | --- | --- | --- |
| M1-GPC | Geiriadur Prifysgol Cymru | Historical dictionary with manuscript citations | A |
| M1-Evans | D. Simon Evans, Grammar of Middle Welsh | Standard reference paradigm tables | A |
| M1-Cardiff | Cardiff Rhyddiaith Gymraeg Ganoloesol corpus | Frequency-weighted corpus attestation | A |
| M2-Deriv | Morphophonological derivation rules | Systematic rule application from cy baseline | B |
| M3-Adopt | Modern Welsh (cy) baseline | Direct adoption; MW and ModW forms identical or indistinguishable | C |
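
The descent through the hierarchy can be pictured as a simple fallback chain. The sketch below is an assumption-laden illustration: the `resolve` function and the `Lookup` type are hypothetical names, and each lookup is a stand-in for consulting the named resource by hand. Only the level labels and grades are taken from the table.

```python
from typing import Callable, Optional

# A lookup returns a form, or None when that level yields nothing reliable.
Lookup = Callable[[str], Optional[str]]


def resolve(cy_form: str, levels: list[tuple[str, str, Lookup]]) -> tuple[str, str, str]:
    """Try each (level, grade, lookup) in order; fall back to M3 adoption."""
    for level, grade, lookup in levels:
        form = lookup(cy_form)
        if form is not None:
            return form, grade, level
    return cy_form, "C", "M3-Adopt"


# Toy stand-ins: a one-entry "dictionary" and sources that find nothing.
levels = [
    ("M1-GPC", "A", {"marchog": "marchawc"}.get),
    ("M1-Evans", "A", lambda form: None),
    ("M1-Cardiff", "A", lambda form: None),
    ("M2-Deriv", "B", lambda form: None),
]

print(resolve("marchog", levels))
print(resolve("Dydd Llun", levels))
```

Because the loop returns at the first level that produces a form, a lower grade can never displace a higher one, which is the precedence property the hierarchy is meant to guarantee.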

An important asymmetry between the MWRF and NBTRF should be noted here. In the NBTRF, the absence of evidence for a given category produces an identity mapping (xcb = owl) as a matter of strict policy — and certain categories are entirely frozen as immutable. In the MWRF, the absence of M1 attestation triggers M2 derivation rather than immediate adoption of the baseline. This reflects the much richer evidentiary environment of Middle Welsh: the documented morphophonological rules are well established enough that M2 derivation is academically defensible in a way that equivalent speculative inference would not be for Cumbric.

Furthermore, the MWRF has no truly immutable categories. Every dataset row has at minimum a Grade C output — the cy value is always available as a baseline. This contrasts with the NBTRF, where several categories are structurally frozen and cannot in principle yield non-identity contributions. In the MWRF, even days of the week and months are Grade C rather than immutable: the classification reflects the evidentiary situation (MW and ModW forms are effectively identical for these items) rather than a theoretical prohibition on investigation.

§4

The M1 → M2 → M3 Attestation Pipeline

The MWRF processes each dataset entry through a three-stage sequential pipeline. The stages are attempted in order, and an entry passes to the next stage only when the current stage yields no reliable form. The form produced by the first successful stage is the value written to the wlm (Middle Welsh) column.

4.1 Stage M1: Primary Corpus Attestation

Sources: GPC, D. Simon Evans's Grammar, Cardiff Rhyddiaith corpus.

M1 asks: is this word or paradigm form directly attested in a reliable Middle Welsh source dated to the period 1150–1500 CE? This is the simplest and most direct question the MWRF can ask, and — given the size of the Middle Welsh corpus — it is answerable affirmatively for a large proportion of dataset entries. Verb paradigm forms, pronoun forms, common prepositions, conjunctions, cardinal numbers, and colour adjectives are all readily locatable in the GPC or Evans.

The M1 procedure involves three sequential checks:

  1. Consult the GPC entry for the word (or its Modern Welsh equivalent) and check for attestations with dates in the 1150–1500 range. If found, use the form with the most classical-period attestations.
  2. Check Evans's Grammar for any paradigm table that includes this form. Paradigm tables in Evans are based on manuscript evidence and yield Grade A.
  3. Search the Cardiff corpus for the word and note the most frequent form in the 1200–1400 period. If the corpus confirms the GPC or Evans form, proceed. If a different form is more frequent, note the discrepancy and use the most broadly attested form.

M1 outputs are assigned Grade A. In the current dataset, M1 yields Grade A for all verb paradigm rows (via Evans), all pronoun rows (via Evans §51), many adjective rows (via GPC), and a substantial number of numeral rows (via GPC). The majority of the 307 entries receive Grade A from M1.
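
The three checks can be combined per row roughly as follows. This is a sketch under stated assumptions: `M1Result` and `run_m1` are hypothetical names, each argument stands for whatever form a human consultation of that source produced, and the corpus's most frequent form is treated as "the most broadly attested form" when it disagrees with the dictionary or grammar.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class M1Result:
    form: Optional[str]   # None means M1 found nothing; the row goes on to M2
    grade: Optional[str]
    notes: list[str] = field(default_factory=list)


def run_m1(gpc_form: Optional[str], evans_form: Optional[str],
           corpus_form: Optional[str]) -> M1Result:
    """Combine the three M1 checks for one dataset row."""
    candidate = gpc_form or evans_form          # checks 1 and 2
    if candidate is None and corpus_form is None:
        return M1Result(None, None, ["no M1 attestation"])
    if candidate is None:
        return M1Result(corpus_form, "A", ["corpus only"])
    if corpus_form is None or corpus_form == candidate:
        return M1Result(candidate, "A")
    # Check 3: the corpus prefers another spelling, so record the discrepancy.
    note = f"discrepancy: dictionary/grammar {candidate!r} vs corpus {corpus_form!r}"
    return M1Result(corpus_form, "A", [note])


print(run_m1("dyn", None, "dun"))
```

Keeping the discrepancy as a note on the result, rather than discarding it, leaves the selection open to later review.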

4.2 Stage M2: Morphophonological Derivation

Source: cy (Modern Welsh) value; Evans and Sims-Williams for rule basis.

M2 applies documented morphophonological rules to the Modern Welsh baseline where M1 attestation is unavailable. Unlike M1, M2 does not look up a form — it constructs one by rule. The rules are systematic and are based on well-evidenced academic descriptions of the differences between Modern Welsh and Middle Welsh morphophonology.

The primary M2 rules are: aw-restoration (Modern Welsh -og/-o endings reflecting historical Middle Welsh -awg/-aw); ⟨u⟩ for /ʉ/ (where the Middle Welsh manuscript tradition uses ⟨u⟩ where Modern Welsh uses ⟨y⟩); pronominal substitution (2pl chwi, 3pl hwy); and irregular verb paradigm replacement. See §8 below for the full rule specification.

M2 outputs are assigned Grade B. They represent academically defensible derivations from documented rules — not attested forms, but principled constructions. The key discipline of M2 is to apply rules only in the environments where they are documented, and to default to M3 whenever rule application is ambiguous.
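
The discipline of applying a rule only in its documented environment, and deferring to M3 otherwise, might look like this in outline. The `Rule` and `run_m2` names are hypothetical, and treating "more than one rule matches" as ambiguous is an assumption of this sketch rather than a stated policy of the framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[str], bool]   # test for the documented environment
    rewrite: Callable[[str], str]


def run_m2(cy_form: str, rules: list[Rule]) -> tuple[str, str]:
    """Return (form, grade): Grade B on one clear rule match, else Grade C."""
    matching = [rule for rule in rules if rule.applies(cy_form)]
    if len(matching) != 1:
        # No rule applies, or several compete: treat as ambiguous, defer to M3.
        return cy_form, "C"
    derived = matching[0].rewrite(cy_form)
    if derived == cy_form:
        return cy_form, "C"          # M2 reproduces the baseline
    return derived, "B"


# Pronominal substitution, restricted to the two cells named in the text.
PRONOUNS = {"chi": "chwi", "nhw": "hwy"}
pronoun_rule = Rule("pronominal substitution", PRONOUNS.__contains__, PRONOUNS.__getitem__)

print(run_m2("chi", [pronoun_rule]))
print(run_m2("mawr", [pronoun_rule]))
```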

4.3 Stage M3: Modern Welsh Baseline Adoption

Source: cy (Modern Welsh) value, adopted directly.

M3 is applied when M1 attestation is unavailable and M2 derivation either produces the same result as the Modern Welsh baseline or cannot be reliably applied. In these cases, the MWRF adopts the cy value directly and assigns Grade C.

Grade C is most frequently applied to stable categories — days of the week, months, seasons, time expressions, conjunctions, and greetings — where Middle Welsh and Modern Welsh forms are effectively identical. It is also applied to peripheral vocabulary items where M1 attestation is absent and M2 rules do not apply. The Grade C designation reflects the scholarly judgement that the Modern Welsh form is the best available approximation of the Middle Welsh form for this item, either because they are genuinely identical or because the evidence to determine a difference is absent.

Grade C is not a failure of the framework. It is the correct scholarly position when the alternative — asserting a Middle Welsh-specific form without evidentiary basis — would constitute fabrication. The MWRF adopts the same conservative epistemology as the NBTRF: the absence of evidence for difference is not evidence for difference.

§5

Stable Categories

The following categories are expected to yield Grade C outputs as their standard baseline in the MWRF. This classification is based on the observation that, for these categories, the Middle Welsh and Modern Welsh forms are either identical or differ only in minor orthographic conventions that do not affect the identity of the word. Grade C is the expected outcome rather than an immutable rule — if GPC or Evans confirms a divergent Middle Welsh form for any item in these categories, that M1 form takes precedence.

| Row prefix | Category | Basis for Grade C baseline |
| --- | --- | --- |
| DAY_* | Days of the week | Latin-derived borrowings stable across MW and ModW; Dydd Llun, Dydd Mawrth, etc. attested identically in both periods |
| MON_* | Months | Latin/Romance borrowings; stable; GPC confirms MW forms identical or near-identical to ModW |
| SEA_* | Seasons | Gwanwyn, Haf, Hydref, Gaeaf attested in MW; effectively identical to ModW forms |
| TIM_* | Telling the time | Clock-time expressions are post-medieval constructions; MW period has no equivalent register |
| TMP_* | Temporal words | Stable across periods; minor orthographic variants possible but semantically identical |
| CONJ_* | Conjunctions | Principal Welsh conjunctions stable across MW and ModW; GPC confirms identity or near-identity |
| GRT_*, INT_*, POL_* | Greetings / introductions / politeness | Phrasebook register largely post-medieval; MW period does not provide equivalent communicative forms |

The critical contrast with the NBTRF is that none of these categories is immutable in the NBTRF sense. The NBTRF designates categories as structurally frozen: they cannot yield non-identity contributions under any circumstances. The MWRF designates the above categories as having a Grade C baseline, which is the expected outcome in the absence of contrary evidence but not a barrier to M1 attestation if such evidence exists. This distinction reflects the radically different evidentiary positions of the two frameworks: the NBTRF must adopt strict immutability because Cumbric evidence is so sparse that any deviation from Old Welsh would be speculation; the MWRF can afford a probabilistic baseline because the Middle Welsh corpus is rich enough to detect genuine differences where they exist.
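
The difference between a Grade C baseline and a frozen category can be made concrete with a short sketch. The row prefixes come from the table above; the function name `baseline_grade`, the row identifier `DAY_1`, and the placeholder divergent form are hypothetical.

```python
STABLE_PREFIXES = (
    "DAY_", "MON_", "SEA_", "TIM_", "TMP_", "CONJ_", "GRT_", "INT_", "POL_",
)


def baseline_grade(row_id: str, cy_form: str, m1_form=None):
    """Return (form, grade), letting M1 evidence override the Grade C baseline."""
    if m1_form is not None and m1_form != cy_form:
        return m1_form, "A"      # an attested divergent form takes precedence
    if row_id.startswith(STABLE_PREFIXES):
        return cy_form, "C"      # expected outcome, not a prohibition
    return cy_form, None         # not a stable category: graded elsewhere


print(baseline_grade("DAY_1", "Dydd Llun"))
print(baseline_grade("DAY_1", "Dydd Llun", m1_form="placeholder_mw_form"))
```

The first branch is what separates this from an immutability rule: a stable prefix sets the default grade but never blocks an attested form.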

§6

Active Derivation Categories

The following categories are expected to yield Grade A or Grade B outputs through M1 attestation or M2 derivation. For these categories, the Middle Welsh forms are either directly recoverable from Evans or GPC (Grade A) or systematically derivable from the Modern Welsh baseline by documented morphophonological rules (Grade B).

| Row prefix | Category | Expected grade | Primary source |
| --- | --- | --- | --- |
| BE_* | bod (to be) paradigm | A | Evans chapters on bod inflection; all cells documented |
| HAVE_* | cael/caffael paradigm | A | Evans; MW caffael paradigm fully attested |
| GO_* | mynd paradigm | A | Evans; MW mynd/mynet paradigm documented |
| COME_* | dyfod paradigm | A | Evans; MW prefers dyfod (not ModW dod) |
| DO_* | gwneuthur paradigm | A | Evans; MW gwneuthur (not ModW gwneud) |
| TAKE_*, GIVE_*, SEE_*, KNOW_*, WANT_*, NEED_* | Action verb paradigms | A/B | GPC for attested forms; M2 where GPC unavailable |
| MOD_* | Modal constructions | A/B | Evans for gallaf, dylwn; M2 for peripheral forms |
| PRN_* | All pronouns | A | Evans §51; full paradigm directly attested |
| ADJ_* | Adjectives | A/B | GPC; some a-affection forms via M2 |
| PREP_* | Prepositions | A/B | GPC; inflected prepositional forms in Evans |
| NUM_* | Cardinal numbers | A/B | GPC; MW number system largely stable; vigesimal forms differ from ModW |
| ORD_* | Ordinal numbers | A/B | Evans; -(h)ed suffix attested in MW ordinals |

§7

The Confidence and Grade Classification System

All MWRF outputs carry one of three confidence grades:

| Grade | Definition | Typical basis | ATTESTATION_CLASS |
| --- | --- | --- | --- |
| A | Directly attested in GPC, Evans, or Cardiff corpus, with a date or manuscript citation in the period 1150–1500 CE | GPC entry with medieval citation; Evans paradigm table form; Cardiff corpus high-frequency form | DIRECT_ATTESTATION |
| B | Systematically derived by M2 morphophonological rules from a well-evidenced Modern Welsh base | aw-restoration applied to cy form; pronominal substitution; a-affection alternant from Evans | MORPHOPHONOLOGICAL_DERIVATION |
| C | Modern Welsh baseline adopted; MW form identical to ModW or insufficiently evidenced for distinction | Stable category (days, months, conjunctions); M2 produces same result as cy; M1 confirms identity | CY_ADOPTION |
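
One way to keep the grade and its ATTESTATION_CLASS value in step is an enumeration. The three class strings are taken from the table; the `Grade` enum and the `grade_for_stage` helper are hypothetical names used only for illustration.

```python
from enum import Enum


class Grade(Enum):
    A = "DIRECT_ATTESTATION"
    B = "MORPHOPHONOLOGICAL_DERIVATION"
    C = "CY_ADOPTION"

    @property
    def attestation_class(self) -> str:
        return self.value


def grade_for_stage(stage: str) -> Grade:
    """Map a hierarchy level label such as 'M1-GPC' to its grade."""
    prefix = stage.split("-", 1)[0]
    return {"M1": Grade.A, "M2": Grade.B, "M3": Grade.C}[prefix]


print(grade_for_stage("M1-Evans").attestation_class)
```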

The grade distribution in the MWRF is fundamentally different from the NBTRF. Where the NBTRF produces predominantly identity results (288 of 307 entries with xcb = owl), the MWRF is expected to produce predominantly Grade A outputs, with Grade B for peripheral categories and Grade C for stable categories. This reflects the abundance of the Middle Welsh corpus: the evidentiary conditions for direct attestation are met for most entries.

A further important distinction: in the NBTRF, identity results (xcb = owl) inherit the confidence grade of the Old Welsh column. In the MWRF, Grade C outputs (wlm = cy) follow a different logic: they represent scholarly confirmation that the Middle Welsh and Modern Welsh forms are effectively identical, which is itself an academically grounded judgement supported by GPC attestation. Grade C is not a low-confidence claim; it is a high-confidence claim that the two periods do not differ for this item.

§8

Key Phonological Rules: Modern Welsh to Middle Welsh

The M2 stage of the MWRF pipeline applies the following documented rules. Each rule specifies the Modern Welsh environment, the corresponding Middle Welsh form, and the authoritative source for the correspondence. Rules are applied only in the environments specified; mechanical over-application is a failure condition.

Rule 1 — aw-Restoration

Modern Welsh shows reduction of the historical diphthong aw to o in many unstressed final syllables, so that Middle Welsh -awg/-awc appears as -og; the change is particularly visible in the adjectival suffix -og and in verbal nouns. Middle Welsh retains aw in these environments [9].

| Modern Welsh (cy) | Middle Welsh (wlm) | Notes |
| --- | --- | --- |
| marchog (knight) | marchawc | GPC: MW spelling with aw and final c |
| draenog (hedgehog) | draenawg | GPC: MW retention of aw |
| mawr (big) | mawr | Identity: aw already present in ModW; no change |
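
As a rough illustration of Rule 1, the rewrite for the -og suffix can be written as a small function. This is deliberately crude and rests on assumptions: testing for a final "og" is only a proxy for the documented environment, and the `final` parameter is a hypothetical way of choosing between the -awg and -awc spellings that the table shows, a choice this sketch does not try to make by itself.

```python
def aw_restore(cy_form: str, final: str = "g") -> str:
    """Rewrite a final -og as -aw plus consonant; leave other forms untouched.

    `final` selects the spelling of the last consonant ("g" or "c").
    """
    if final not in ("g", "c"):
        raise ValueError("final must be 'g' or 'c'")
    if cy_form.endswith("og"):
        return cy_form[:-2] + "aw" + final
    return cy_form


for cy, final in [("draenog", "g"), ("marchog", "c"), ("mawr", "g")]:
    print(cy, "->", aw_restore(cy, final))
```

A string test of this kind would over-apply to any word that merely happens to end in "og", which is exactly the mechanical over-application the rule specification treats as a failure condition; in practice the environment check would need the per-row evidence.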

Rule 2 — ⟨u⟩ for /ʉ/

Modern Welsh represents the central high rounded vowel /ʉ/ with ⟨y⟩ in stressed monosyllables and some unstressed syllables. Middle Welsh frequently used ⟨u⟩ in equivalent positions. This rule applies only where GPC or Evans confirms the ⟨u⟩ spelling in medieval manuscript sources [9, 11].

| Modern Welsh (cy) | Middle Welsh (wlm) | Notes |
| --- | --- | --- |
| byd (world) | bud | GPC: ⟨u⟩ attested in MW manuscripts |
| dyn (man) | dyn / dun | Both attested; GPC shows variation |

Rule 3 — Soft Mutation /g/ → Ø

The lenition (soft mutation) of initial /g/ produces Ø (deletion) in both Middle Welsh and Modern Welsh. This is an identity rule between the two periods — no M2 modification is required. The contrast with Old Welsh (which retained /ɣ/ in this environment) is significant for the OWRF, but transparent to the MWRF.

Rule 4 — Pronominal Substitution

The Middle Welsh pronominal system, as documented by Evans §51, differs from Modern Welsh in two principal paradigm cells: 2pl chwi (not ModW chi) and 3pl hwy (not ModW nhw). All other pronouns are effectively identical.

Rule 5 — Irregular Verb Paradigm Replacement

The key irregular verbs of Middle Welsh — bod, mynd, dyfod, gwneuthur, caffael — show systematic paradigm divergences from their Modern Welsh equivalents. Evans's Grammar provides the complete Middle Welsh paradigms for all these verbs, and these are assigned Grade A via M1.

Notable cases: Middle Welsh verbal noun dyfod (not ModW dod); gwneuthur (not ModW gwneud); caffael (alongside cael); past tense 3sg of caffael: MW cavas/cafas (not ModW cafodd).

Rule 6 — A-Affection Adjectival Alternants

Middle Welsh adjectives subject to a-affection (internal vowel alternation in the feminine and plural) may show forms not preserved in Modern Welsh, which has regularised many of these paradigms. Where Evans documents a Middle Welsh feminine or plural alternant, it is recorded at Grade A via M1 [9].

Rule 7 — Ordinal Suffix -(h)ed

Middle Welsh ordinal numbers use the suffix -(h)ed more consistently than Modern Welsh. Evans documents the Middle Welsh ordinals, and where they diverge from the Modern Welsh forms, the Evans form is used at Grade A.

§9

Analytical Results: Application to 307 Entries

The MWRF pipeline has been applied to all 307 entries in the Revitalised Cumbric dataset, spanning three source tables: Verbs & Sentence Elements (206 entries), Numbers & Dates (75 entries), and Greetings & Introductions (26 entries).

| Stage | Expected proportion | Notes |
| --- | --- | --- |
| M1: Grade A (Direct Attestation) | ~55–65% of entries | All verb paradigm rows via Evans; all pronoun rows via Evans; many adjectives, numbers via GPC |
| M2: Grade B (Morphophonological Derivation) | ~15–25% of entries | Peripheral action verbs; some adjectives; items where GPC provides a MW-specific form derivable by rule |
| M3: Grade C (cy Adoption) | ~20–25% of entries | All stable categories: days, months, seasons, conjunctions, greetings, time expressions |

The grade distribution confirms the methodological character of the MWRF: it is a corpus-attestation-first framework operating in conditions of abundance. The large proportion of Grade A entries reflects the richness of the Middle Welsh record as documented by Evans and GPC. The Grade C proportion reflects the stable categories where MW and ModW are effectively identical.
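
The expected proportions can be turned into approximate row counts with a little integer arithmetic. The percentages and table sizes are those stated above; the helper name `count_range` and the choice to round the lower bound down and the upper bound up are assumptions of this sketch.

```python
TOTAL = 206 + 75 + 26   # the three source tables

# Expected share of entries per grade, as whole percentages (low, high).
EXPECTED_PCT = {"A": (55, 65), "B": (15, 25), "C": (20, 25)}


def count_range(total: int, low_pct: int, high_pct: int) -> tuple[int, int]:
    """Round the lower bound down and the upper bound up, using integers only."""
    low = total * low_pct // 100
    high = -(-total * high_pct // 100)   # ceiling division
    return low, high


for grade, (low_pct, high_pct) in EXPECTED_PCT.items():
    print(grade, count_range(TOTAL, low_pct, high_pct))
```

On these figures Grade A corresponds to roughly 168 to 200 of the 307 rows. The three ranges are estimates that overlap rather than partition the dataset: their lower bounds sum to 90% and their upper bounds to 115%.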

Confidence distribution by domain

| Domain | Expected grade | Basis |
| --- | --- | --- |
| To be: present (6 rows) | A | Evans paradigm: wyf, wyt, yw, ydym, ydych, ydynt |
| To be: past/conditional (12 rows) | A | Evans imperfect and conditional paradigms fully documented |
| Pronouns (23 rows) | A | Evans §51: full paradigm directly attested |
| Adjectives: colours (10 rows) | A/B | GPC for most; a-affection alternants where documented |
| Cardinal numbers (31 rows) | A/B | MW number system largely stable; GPC confirms; vigesimal system documented |
| Days of week (7 rows) | C | Stable Latin-derived terms; MW = ModW |
| Greetings / introductions / polite (26 rows) | C | Post-medieval phrasebook register; cy adoption |

§10

Reliability Constraints and Limitations

Despite the comparative abundance of the Middle Welsh corpus, the following limitations apply to all MWRF outputs:

  1. Manuscript orthographic variation. Middle Welsh was not orthographically standardised. The same word may be spelled in multiple ways across manuscripts, scribes, and dates within the 1150–1500 range. The MWRF selects the most widely attested form in the classical period, but this selection involves scholarly judgement and not merely mechanical lookup.
  2. Periodisation challenges. The Middle Welsh period spans approximately 350 years. There are real linguistic differences between the earliest texts (c. 1150) and the latest (c. 1500). The MWRF targets classical Middle Welsh (c. 1200–1400) as its reference point, following Evans's conventions, but this periodisation is a scholarly convention rather than a sharp boundary.
  3. M2 rule application risk. The M2 morphophonological rules are systematically documented but require judgement in application. Over-application of rules (applying them in environments where the evidence does not support them) is a failure mode that the trace file is designed to detect and prevent.
  4. Grade C is a scholarly judgement. Assigning Grade C to a row does not mean that the Middle Welsh form is certainly identical to Modern Welsh — it means that the available evidence does not demonstrate a difference. Future scholarship may reveal differences not currently apparent.
  5. GPC completeness. The GPC, though authoritative, is not exhaustive. Some words may not have entries or may have incomplete historical coverage. Where the GPC is silent, M2 derivation or M3 adoption must substitute.
  6. Human expert verification recommended. The entire wlm column should ideally be reviewed by specialists in Middle Welsh linguistics before being treated as authoritative. Departments of Celtic Studies at Aberystwyth, Bangor, or Cardiff would be appropriate reviewing bodies.
§11

The MWRF within the BCDRS: Precedent and Position

The MWRF occupies the first position in the medial pipeline of the Brittonic Convergent Diachronic Revitalisation System (BCDRS), which is in turn the Brittonic implementation of the abstract Convergent Diachronic Revitalisation System (CDRS).

The full pipeline of the BCDRS is:

Modern Welsh (cy) → MWRF → Revitalised Middle Welsh (wlm) → OWRF → Revitalised Old Welsh (owl) → NBTRF → Revitalised Cumbric (xcb)

The MWRF is the entry point to this pipeline. It takes the living Modern Welsh language (cy) as its input — the only genuinely living language in the chain — and transforms it into a systematically documented Middle Welsh state. The justification for beginning with Modern Welsh is straightforward: Modern Welsh is the direct descendant of the entire Brittonic chain, and it is the language for which the most reliable contemporary reference data is available. Beginning from cy and working backwards diachronically is methodologically sounder than attempting to reconstruct directly from the sparse Old Welsh or Cumbric evidence.

The MWRF's output (wlm) feeds directly into the OWRF, which applies phonological regression to convert Middle Welsh forms into their Old Welsh antecedents. The MWRF therefore sets the quality of the entire downstream pipeline: if the wlm column is accurate, the OWRF has a reliable input; if the OWRF has a reliable input, the NBTRF's Old Welsh baseline (owl) is reliable; and if the owl baseline is reliable, the Revitalised Cumbric (xcb) forms derived from it are as well-grounded as the sparse Cumbric evidence permits.
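
The chaining of the three frameworks can be sketched as function composition over a row. The stage functions below are placeholders that simply copy the previous column forward; none of them implements a real framework, and the names `compose`, `mwrf`, `owrf`, and `nbtrf` are used here only to mirror the pipeline labels.

```python
from typing import Callable

Row = dict[str, str]
Stage = Callable[[Row], Row]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right, so each one sees its predecessor's output."""
    def run(row: Row) -> Row:
        for stage in stages:
            row = stage(row)
        return row
    return run


# Placeholder stages: each adds its own column from the one before it.
def mwrf(row: Row) -> Row:
    return {**row, "wlm": row["cy"]}


def owrf(row: Row) -> Row:
    return {**row, "owl": row["wlm"]}


def nbtrf(row: Row) -> Row:
    return {**row, "xcb": row["owl"]}


bcdrs = compose(mwrf, owrf, nbtrf)
print(bcdrs({"cy": "mawr"}))
```

Because each stage reads only the column written by the stage before it, an error in wlm reaches owl and xcb unchanged, which is the dependency the paragraph above describes.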

The precedent for this kind of Brittonic revitalisation work is well established. The Cornish revival — discussed in §11 of the NBTRF dissertation — demonstrates that thin corpus evidence and systematic comparative methodology are sufficient to produce a viable revitalised language. The revitalisation of Manx similarly drew on manuscript sources and comparative Celtic linguistics. The MWRF's position is more comfortable than either: it is not revitalising Middle Welsh from nothing, but documenting it from an existing, rich corpus. The challenge is selection and standardisation, not reconstruction. The MWRF's scholarly contribution is to make that selection systematic, traceable, and open to revision.

It should be noted that the MWRF, OWRF, and NBTRF are not independent projects — they are stages in a single integrated system. A change to the MWRF (e.g., a correction to a wlm value based on new scholarly evidence) flows through to the OWRF and potentially to the NBTRF. This integration is by design: it means that improvements to any part of the chain improve the whole. The per-row trace files maintained by each framework ensure that such changes are documented and their effects traceable.
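
How a correction might be logged and its downstream effects flagged can be sketched as follows. The structure of the real per-row trace files is not described in this document, so the `RowTrace` class, its field names, the row identifier, and the placeholder values are all assumptions. Only the column order cy, wlm, owl, xcb is taken from the pipeline.

```python
from dataclasses import dataclass, field

CHAIN = ["cy", "wlm", "owl", "xcb"]   # column order of the BCDRS pipeline


def downstream_of(column: str) -> list[str]:
    """Columns that depend on `column` and may need re-derivation."""
    return CHAIN[CHAIN.index(column) + 1:]


@dataclass
class RowTrace:
    row_id: str
    entries: list[dict] = field(default_factory=list)

    def record_change(self, column: str, old: str, new: str, reason: str) -> list[str]:
        """Log a correction and return the columns flagged for re-checking."""
        stale = downstream_of(column)
        self.entries.append(
            {"column": column, "old": old, "new": new, "reason": reason, "recheck": stale}
        )
        return stale


trace = RowTrace("ROW_1")
print(trace.record_change("wlm", "old_form", "new_form", "new scholarly evidence"))
```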

§12

Future Development and Contribution Protocol

The MWRF is designed as a living framework. As Middle Welsh scholarship advances — through new critical editions of manuscript texts, expanded digital corpora, refined grammatical descriptions, and new work in Brittonic historical phonology — the framework is positioned to incorporate that evidence and improve the accuracy of the wlm column.

The Cardiff Rhyddiaith corpus is itself an ongoing project; future expansions will bring more Middle Welsh prose into the searchable record and may provide M1 attestation for items currently at Grade B or C. New critical editions of key texts — the Mabinogi, the Welsh laws, the Arthurian romances — continue to be produced by scholars at Aberystwyth, Bangor, and Cardiff, and these editions may provide better manuscript evidence than the editions currently in use.

External contributions from qualified specialists in Middle Welsh linguistics are welcomed under the formal contribution protocol documented on the Contributors page. The protocol requires dissertation-format submissions, which are evaluated against the MWRF specification; accepted corrections are entered into the Polyglot™ master dataset and flow through to this site.

The conservative standard of the MWRF is permanent. Every grade assignment is an epistemological claim about the quality of available evidence, not merely a data value. Future revisions will improve grade accuracy but will not compromise the integrity of the grading system. The goal is not the maximum number of distinctive wlm forms, but the maximum accuracy of every form that appears, whether Grade A, B, or C.

References

  • Williams, Ifor, ed. Pedeir Keinc y Mabinogi. Cardiff: University of Wales Press, 1930. [Standard edition of the Four Branches of the Mabinogi in Middle Welsh.]
  • Jones, Gwyn, and Thomas Jones, trans. The Mabinogion. London: J. M. Dent (Everyman), 1949. [Standard English translation for reference context.]
  • Morgan, William. Y Beibl Cyssegr-lan. London, 1588. [The Welsh Bible; foundational document for the Modern Welsh period.]
  • Huws, Daniel. Medieval Welsh Manuscripts. Cardiff: University of Wales Press / National Library of Wales, 2000. [Standard reference for the major MW manuscript repositories.]
  • Jarman, A. O. H., ed. Llyfr Du Caerfyrddin. Cardiff: University of Wales Press, 1982. [Critical edition of the Black Book of Carmarthen.]
  • Williams, Ifor, ed. Chwedl Taliesin. Cardiff: University of Wales Press, 1957. [Primary edition relating to Book of Taliesin tradition.]
  • Cardiff University. Rhyddiaith Gymraeg Ganoloesol / Middle Welsh Prose. Online corpus. rhyddiaith.ac.uk. [Primary digital corpus for MW prose; c. 2.8 million words.]
  • University of Wales. Geiriadur Prifysgol Cymru / A Dictionary of the Welsh Language. Cardiff: University of Wales Press, 1950–2002; online edition. welsh-dictionary.ac.uk. [Primary historical dictionary of Welsh; used for all M1 attestation in MWRF.]
  • Evans, D. Simon. A Grammar of Middle Welsh. Dublin: Dublin Institute for Advanced Studies, 1964. Reprinted 1989, 2016. [Standard reference grammar for MW morphology and syntax; primary source for all M1 paradigm data and M2 rules in MWRF.]
  • Rodway, Simon. Dating Medieval Welsh Literature: Evidence from the Verbal System. Aberystwyth: CMCS Publications, 2013. [Provides systematic analysis of MW verbal morphology and its diachronic development.]
  • Sims-Williams, Patrick. Irish Influence on Medieval Welsh Literature. Oxford: Oxford University Press, 2010. [Contextual work on early Brittonic phonology and comparative Celtic relationships.]
  • Schaefer, Ursula. "Middle Welsh." In The Celtic Languages, ed. Martin J. Ball and Nicole Müller. 2nd ed. London: Routledge, 2009. [Standard overview chapter on MW in a comparative Celtic context.]
  • Willis, David. "Old and Middle Welsh." In The Celtic Languages, ed. Martin J. Ball and Nicole Müller. 2nd ed. London: Routledge, 2009. [Companion chapter; essential for the OW/MW transition relevant to OWRF.]
  • Wikipedia contributors. "Middle Welsh." Wikipedia, The Free Encyclopedia. en.wikipedia.org/wiki/Middle_Welsh. [Reference overview article; secondary source only.]
  • Wikipedia contributors. "Mabinogion." Wikipedia, The Free Encyclopedia. en.wikipedia.org/wiki/Mabinogion.
  • Miles, Gareth. Hanes Llenyddiaeth Gymraeg hyd 1900. Cardiff: University of Wales Press, 1992. [Overview of Welsh literary history; contextualises the MW period in the literary tradition.]
  • PARSHCWL (Parsed Historical Corpus of the Welsh Language). University of Cambridge. [Comparative Celtic linguistic research environment referenced for phonological data.]