The Middle Welsh Revitalisation Framework (MWRF) is a corpus-based methodology for revitalising Middle Welsh (c. 1150–1500 CE) as a fully operational language state within the Brittonic Convergent Diachronic Revitalisation System (BCDRS). Middle Welsh is superabundantly attested — the Cardiff University Rhyddiaith Gymraeg Ganoloesol corpus alone comprises approximately 2.8 million words — making the MWRF fundamentally different from the NBTRF, which operates under conditions of near-total corpus absence.
The MWRF operates through a three-stage M1 → M2 → M3 attestation pipeline drawing on the Geiriadur Prifysgol Cymru (GPC), D. Simon Evans's Grammar of Middle Welsh, and the Cardiff University Rhyddiaith Gymraeg Ganoloesol corpus. The framework produces the revitalised Middle Welsh (wlm) column, which feeds into the Old Welsh Revitalisation Framework (OWRF). The system is subject to the peer-review governance protocol of Penrith Beacon Communications (PBC).
Applied to 307 entries across 24 lexical and grammatical domains, the MWRF produces a systematically graded dataset of Middle Welsh forms — Grade A (direct corpus attestation), Grade B (morphophonological derivation), and Grade C (Modern Welsh baseline adoption) — constituting the first stage in the BCDRS pipeline from Modern Welsh to Revitalised Cumbric.
Middle Welsh — Welsh as written and spoken between approximately 1150 and 1500 CE — occupies a singular position in the history of the Celtic languages. It is not merely the intermediate stage between Old Welsh and Modern Welsh; it is the period of the language's literary florescence, the era that produced the Four Branches of the Mabinogi, the Arthurian romances of the Red Book of Hergest, the great elegiac and praise poetry of the Gogynfeirdd, and the codified legal texts of the Laws of Hywel Dda [1, 2]. Middle Welsh is, in the estimation of most scholars, the language at its fullest medieval expression.
The period is conventionally bounded by two significant transitions. The lower boundary, circa 1150, marks the emergence of a recognisably distinct literary Welsh from the archaic Old Welsh tradition — a transition visible in the changing orthographic conventions of the manuscripts and in the morphological regularisation of the verbal system. The upper boundary, circa 1500, marks the beginning of the Early Modern Welsh period, associated with the advent of printing and the gradual standardisation that would eventually produce the Classical Welsh of Bishop William Morgan's 1588 Bible translation [3].
The principal manuscript repositories for Middle Welsh literature are among the most intensively studied medieval documents in Europe. The White Book of Rhydderch (Llyfr Gwyn Rhydderch, c. 1350) and the Red Book of Hergest (Llyfr Coch Hergest, c. 1400) together preserve the bulk of the medieval Welsh prose tradition, including the complete Mabinogi and the Welsh Arthurian material [4]. The Black Book of Carmarthen (Llyfr Du Caerfyrddin, c. 1250), the oldest surviving manuscript written entirely in Welsh, preserves early Middle Welsh verse [5]. The Book of Taliesin (Llyfr Taliesin, c. 1350) provides a complex manuscript tradition spanning Old and Middle Welsh verse [6].
The corpus size of Middle Welsh is substantial by any standard. The Cardiff University Rhyddiaith Gymraeg Ganoloesol project — a systematic digitisation of Middle Welsh prose — has produced a searchable corpus of approximately 2.8 million words [7]. This abundance is not merely a quantitative fact; it is the methodological foundation of the MWRF. It means that for a large proportion of the 307 dataset entries, the Middle Welsh form can be established by direct attestation rather than by inference.
The MWRF was designed precisely to exploit this abundance. Its primary method — corpus attestation — is not available to the OWRF or NBTRF, which operate under conditions of scarcity and near-total absence respectively. The abundance of the Middle Welsh record is what makes the MWRF the most straightforwardly grounded of the three BCDRS frameworks, and it is why the MWRF serves as the entry point to the BCDRS pipeline from the living Modern Welsh baseline.
The MWRF rests on three primary scholarly resources, each exploited in a specific and methodologically constrained way. Together these resources constitute what the framework designates as the M1 attestation tier — the highest level of evidentiary quality available within the MWRF.
The Geiriadur Prifysgol Cymru (GPC) — the University of Wales Dictionary of the Welsh Language — is the indispensable scholarly instrument for any work in historical Welsh lexicography [8]. First published in fascicles from 1950 onwards and substantially completed by the early twenty-first century, the GPC provides for each entry not merely the word and its meanings but its complete history of attestation, including the earliest documented form, the date and manuscript source of each attestation, and the full range of orthographic and morphological variants.
For the MWRF, the GPC functions as the primary corpus attestation source — the first place consulted for any dataset row. Where the GPC provides an entry with one or more attestations dated to the period 1150–1500 CE, that form is the Grade A Middle Welsh form. The GPC's authority derives from its basis in the primary manuscript sources: its citations are traceable to specific manuscripts, scribes, and datable documents. An entry confirmed in the GPC does not merely assert that a word existed in Middle Welsh — it confirms the specific form in which it appeared, in which text, at approximately which date.
The MWRF uses the GPC Online edition, freely available at welsh-dictionary.ac.uk, which provides full search functionality across both headwords and historical forms. Where multiple attested forms exist for a single entry — reflecting the range of spelling variation across the Middle Welsh manuscript tradition — the MWRF selects the form most widely represented in the classical period (approximately 1200–1400 CE) and most consistent with Evans's grammatical descriptions.
D. Simon Evans's A Grammar of Middle Welsh, first published by the Dublin Institute for Advanced Studies in 1964 and reprinted several times thereafter, remains the standard academic reference grammar for Middle Welsh morphology and syntax [9]. Evans provides systematic paradigm tables for all major verb classes — including the key irregular verbs bod, mynd, dyfod, gwneuthur, and caffael — as well as comprehensive descriptions of the pronominal system, nominal morphology, the mutation system, and the major syntactic constructions of Middle Welsh prose.
Within the MWRF, Evans serves two functions. First, as a direct M1 attestation source: Evans provides paradigm forms with manuscript citations, and these forms are assigned Grade A. Second, as the theoretical basis for the M2 morphophonological derivation rules: where M1 attestation is unavailable, the MWRF applies the systematic correspondences between Modern Welsh and Middle Welsh that Evans documents — including aw-restoration, pronominal substitution, and irregular verb paradigm replacement. Evans is thus simultaneously an attestation source and a derivation authority.
Evans's Grammar is especially authoritative for verb paradigms. The great irregular verbs of Welsh — bod (to be), mynd (to go), dyfod (to come), gwneuthur (to do), caffael (to have/get) — show systematic divergences between Middle Welsh and Modern Welsh in their finite paradigms, and Evans's tables provide the complete Middle Welsh forms for each cell. These paradigm forms, being directly documented from manuscript sources, yield Grade A outputs for the relevant dataset rows.
The Cardiff University Rhyddiaith Gymraeg Ganoloesol (Middle Welsh Prose) corpus is a digitised, lemmatised, and morphologically tagged corpus of Middle Welsh prose texts [7]. At approximately 2.8 million words, it constitutes the largest systematic digital resource for Middle Welsh currently available to researchers. The corpus encompasses texts from across the Middle Welsh period, including both the major literary prose texts (the Mabinogi, Arthurian romances, saints' lives, Welsh chronicles) and legal and documentary materials.
Within the MWRF, the Cardiff corpus serves as a frequency and distribution resource. Where the GPC identifies multiple attested forms for a Middle Welsh word, as is common given the unstandardised nature of medieval Welsh orthography, the Cardiff corpus provides evidence of which form was most prevalent in the classical period. A form attested hundreds of times across multiple prose texts from the period 1200–1400 carries more weight than a form attested once in a peripheral document. The Cardiff corpus enables this frequency weighting and validates the representativeness of the form selected for the wlm column.
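The frequency weighting described above can be sketched as follows. This is a minimal illustration, not the framework's own tooling: the `Attestation` record, the `select_form` function, and the sample records are all hypothetical, and the sketch counts raw occurrences only, whereas the text also weighs distribution across multiple texts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    form: str  # spelling as it appears in the source
    year: int  # approximate date of the manuscript
    text: str  # label of the text in which the form occurs


# Classical period as stated in the text: 1200-1400 CE inclusive.
CLASSICAL_PERIOD = range(1200, 1401)


def select_form(attestations):
    """Return the most frequent classical-period form, or None if there is none."""
    counts = Counter(a.form for a in attestations if a.year in CLASSICAL_PERIOD)
    if not counts:
        return None
    best = max(counts.values())
    # Break ties alphabetically so the result is deterministic.
    return min(form for form, n in counts.items() if n == best)


# Invented records for illustration only; these are not real corpus counts.
sample = [
    Attestation("marchawc", 1350, "text A"),
    Attestation("marchawc", 1380, "text B"),
    Attestation("marchawg", 1390, "text B"),
    Attestation("marchog", 1550, "text C"),  # outside the period, ignored
]
print(select_form(sample))
```

A fuller version would also check the selected form against Evans's grammatical descriptions, which this sketch does not attempt.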
The M2 stage of the MWRF pipeline applies documented morphophonological rules governing the systematic differences between Modern Welsh and Middle Welsh. These rules are not speculative constructions but are derived from the academic descriptions of Evans, supplemented by Rodway's Dating Medieval Welsh Literature [10] and Sims-Williams's work on early Brittonic phonology [11].
The most significant M2 rules are aw-restoration (Modern Welsh -og → Middle Welsh -awg/-awc), pronominal substitution (2pl chi → chwi; 3pl nhw → hwy), irregular verb paradigm replacement (documented per Evans), and the ⟨u⟩ orthographic convention for /ʉ/ (Middle Welsh ⟨u⟩ where Modern Welsh has ⟨y⟩ in certain positions). These rules are systematically applied to cy values when M1 attestation is unavailable, producing Grade B outputs.
The MWRF operates a strict hierarchical evidence model in which each level is attempted in sequence. Only if a given level fails to produce a reliable result does the framework descend to the next. This hierarchy ensures that the highest-quality evidence always takes precedence, and that Grade C (Modern Welsh baseline adoption) is never applied when Grade A or Grade B is achievable.
| Level | Source | Method | Grade |
|---|---|---|---|
| M1-GPC | Geiriadur Prifysgol Cymru | Historical dictionary with manuscript citations | A |
| M1-Evans | D. Simon Evans, Grammar of Middle Welsh | Standard reference paradigm tables | A |
| M1-Cardiff | Cardiff Rhyddiaith Gymraeg Ganoloesol corpus | Frequency-weighted corpus attestation | A |
| M2-Deriv | Morphophonological derivation rules | Systematic rule application from cy baseline | B |
| M3-Adopt | Modern Welsh (cy) baseline | Direct adoption; MW and ModW forms identical or indistinguishable | C |
An important asymmetry between the MWRF and NBTRF should be noted here. In the NBTRF, the absence of evidence for a given category produces an identity mapping (xcb = owl) as a matter of strict policy — and certain categories are entirely frozen as immutable. In the MWRF, the absence of M1 attestation triggers M2 derivation rather than immediate adoption of the baseline. This reflects the much richer evidentiary environment of Middle Welsh: the documented morphophonological rules are well established enough that M2 derivation is academically defensible in a way that equivalent speculative inference would not be for Cumbric.
Furthermore, the MWRF has no truly immutable categories. Every dataset row has at minimum a Grade C output — the cy value is always available as a baseline. This contrasts with the NBTRF, where several categories are structurally frozen and cannot in principle yield non-identity contributions. In the MWRF, even days of the week and months are Grade C rather than immutable: the classification reflects the evidentiary situation (MW and ModW forms are effectively identical for these items) rather than a theoretical prohibition on investigation.
The MWRF processes each dataset entry through a three-stage sequential pipeline. Each stage is attempted only when the preceding stage fails to yield a reliable form, and the first stage to succeed supplies the value written to the wlm (Middle Welsh) column.
M1 sources: GPC, D. Simon Evans's Grammar, Cardiff Rhyddiaith corpus.
M1 asks: is this word or paradigm form directly attested in a reliable Middle Welsh source dated to the period 1150–1500 CE? This is the simplest and most direct question the MWRF can ask, and — given the size of the Middle Welsh corpus — it is answerable affirmatively for a large proportion of dataset entries. Verb paradigm forms, pronoun forms, common prepositions, conjunctions, cardinal numbers, and colour adjectives are all readily locatable in the GPC or Evans.
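The M1 procedure involves three sequential checks, corresponding to the three M1 levels of the evidence hierarchy: the GPC is consulted first, then Evans's Grammar, and then the Cardiff corpus.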
M1 outputs are assigned Grade A. In the current dataset, M1 yields Grade A for all verb paradigm rows (via Evans), all pronoun rows (via Evans §51), many adjective rows (via GPC), and a substantial number of numeral rows (via GPC). The majority of the 307 entries receive Grade A from M1.
M2 source: cy (Modern Welsh) value; Evans and Sims-Williams for rule basis.
M2 applies documented morphophonological rules to the Modern Welsh baseline where M1 attestation is unavailable. Unlike M1, M2 does not look up a form — it constructs one by rule. The rules are systematic and are based on well-evidenced academic descriptions of the differences between Modern Welsh and Middle Welsh morphophonology.
The primary M2 rules are: aw-restoration (Modern Welsh -og/-o endings reflecting historical Middle Welsh -awg/-aw); ⟨u⟩ for /ʉ/ (the Middle Welsh manuscript tradition uses ⟨u⟩ in positions where Modern Welsh uses ⟨y⟩); pronominal substitution (2pl chwi, 3pl hwy); and irregular verb paradigm replacement. See §8 below for the full rule specification.
M2 outputs are assigned Grade B. They represent academically defensible derivations from documented rules — not attested forms, but principled constructions. The key discipline of M2 is to apply rules only in the environments where they are documented, and to default to M3 whenever rule application is ambiguous.
M3 source: cy (Modern Welsh) value, adopted directly.
M3 is applied when M1 attestation is unavailable and M2 derivation either produces the same result as the Modern Welsh baseline or cannot be reliably applied. In these cases, the MWRF adopts the cy value directly and assigns Grade C.
Grade C is most frequently applied to stable categories — days of the week, months, seasons, time expressions, conjunctions, and greetings — where Middle Welsh and Modern Welsh forms are effectively identical. It is also applied to peripheral vocabulary items where M1 attestation is absent and M2 rules do not apply. The Grade C designation reflects the scholarly judgement that the Modern Welsh form is the best available approximation of the Middle Welsh form for this item, either because they are genuinely identical or because the evidence to determine a difference is absent.
Grade C is not a failure of the framework. It is the correct scholarly position when the alternative, asserting a Middle Welsh-specific form without evidentiary basis, would constitute fabrication. The MWRF adopts the same conservative epistemology as the NBTRF: where there is no evidence of a difference, none is asserted.
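The three stages can be read as a fallback chain, sketched below under stated assumptions. The function name `resolve_wlm` and the two lookup tables are hypothetical; real M1 and M2 steps would consult GPC, Evans, and the Cardiff corpus rather than small dictionaries. The sketch also treats any M1 hit as Grade A and does not model the case where M1 merely confirms identity with the cy form.

```python
from typing import Callable, Optional, Tuple

# A lookup takes a Modern Welsh (cy) value and returns a form or None.
Lookup = Callable[[str], Optional[str]]


def resolve_wlm(cy: str, m1_lookup: Lookup, m2_derive: Lookup) -> Tuple[str, str]:
    """Return (wlm_form, grade) for one Modern Welsh (cy) value."""
    # M1: direct attestation takes precedence over everything else.
    attested = m1_lookup(cy)
    if attested is not None:
        return attested, "A"
    # M2: rule-based derivation, used only when it yields a distinct form.
    derived = m2_derive(cy)
    if derived is not None and derived != cy:
        return derived, "B"
    # M3: adopt the Modern Welsh baseline unchanged.
    return cy, "C"


# Toy tables built from forms mentioned in the text (assumption: the grade
# each form would really receive is decided by the framework, not by this sketch).
M1_TABLE = {"dod": "dyfod", "gwneud": "gwneuthur"}
M2_TABLE = {"draenog": "draenawg"}

for word in ("dod", "draenog", "haf"):
    print(word, resolve_wlm(word, M1_TABLE.get, M2_TABLE.get))
```

Passing the lookups in as callables keeps the ordering logic separate from the evidence sources, so the same function works whether the sources are dictionaries, database queries, or manual look-ups recorded by an editor.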
The following categories are expected to yield Grade C outputs as their standard baseline in the MWRF. This classification is based on the observation that, for these categories, the Middle Welsh and Modern Welsh forms are either identical or differ only in minor orthographic conventions that do not affect the identity of the word. Grade C is the expected outcome rather than an immutable rule — if GPC or Evans confirms a divergent Middle Welsh form for any item in these categories, that M1 form takes precedence.
| Row prefix | Category | Basis for Grade C baseline |
|---|---|---|
| DAY_* | Days of the week | Latin-derived borrowings stable across MW and ModW; Dydd Llun, Dydd Mawrth, etc. attested identically in both periods |
| MON_* | Months | Latin/Romance borrowings; stable; GPC confirms MW forms identical or near-identical to ModW |
| SEA_* | Seasons | Gwanwyn, Haf, Hydref, Gaeaf attested in MW; effectively identical to ModW forms |
| TIM_* | Telling the time | Clock-time expressions are post-medieval constructions; MW period has no equivalent register |
| TMP_* | Temporal words | Stable across periods; minor orthographic variants possible but semantically identical |
| CONJ_* | Conjunctions | Principal Welsh conjunctions stable across MW and ModW; GPC confirms identity or near-identity |
| GRT_*, INT_*, POL_* | Greetings / introductions / politeness | Phrasebook register largely post-medieval; MW period does not provide equivalent communicative forms |
The critical contrast with the NBTRF is that none of these categories is immutable in the MWRF sense. The NBTRF designates categories as structurally frozen — they cannot yield non-identity contributions under any circumstances. The MWRF designates the above categories as having a Grade C baseline — the expected outcome in the absence of contrary evidence, but not a barrier to M1 attestation if such evidence exists. This distinction reflects the radically different evidentiary positions of the two frameworks: the NBTRF must adopt strict immutability because Cumbric evidence is so sparse that any deviation from Old Welsh would be speculation; the MWRF can afford a probabilistic baseline because the Middle Welsh corpus is rich enough to detect genuine differences where they exist.
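The difference between a Grade C baseline and an immutable category can be made concrete with a short sketch. The row prefixes come from the table above; the function names and the example row identifiers are hypothetical, and the sketch assumes row identifiers begin with their category prefix.

```python
# Prefixes of the categories with a Grade C baseline, as listed in the table.
GRADE_C_PREFIXES = (
    "DAY_", "MON_", "SEA_", "TIM_", "TMP_", "CONJ_", "GRT_", "INT_", "POL_",
)


def has_grade_c_baseline(row_id: str) -> bool:
    """True if the row belongs to a category whose expected grade is C."""
    return row_id.startswith(GRADE_C_PREFIXES)


def baseline_with_override(row_id: str, cy: str, m1_form=None):
    """Apply the Grade C baseline unless M1 supplies a divergent form.

    Returns (form, grade), or None when the row is outside the baseline
    categories and should go through the full pipeline instead.
    """
    # A divergent M1 attestation always takes precedence over the baseline.
    if m1_form is not None and m1_form != cy:
        return m1_form, "A"
    if has_grade_c_baseline(row_id):
        return cy, "C"
    return None


# Hypothetical row identifiers, for illustration only.
print(baseline_with_override("DAY_1", "Dydd Llun"))
print(baseline_with_override("BE_PRES_1SG", "dw i"))
```

The override branch comes first on purpose: it is what makes the baseline an expectation rather than a prohibition.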
The following categories are expected to yield Grade A or Grade B outputs through M1 attestation or M2 derivation. For these categories, the Middle Welsh forms are either directly recoverable from Evans or GPC (Grade A) or systematically derivable from the Modern Welsh baseline by documented morphophonological rules (Grade B).
| Row prefix | Category | Expected grade | Primary source |
|---|---|---|---|
| BE_* | bod (to be) paradigm | A | Evans chapters on bod inflection; all cells documented |
| HAVE_* | cael/caffael paradigm | A | Evans; MW caffael paradigm fully attested |
| GO_* | mynd paradigm | A | Evans; MW mynd/mynet paradigm documented |
| COME_* | dyfod paradigm | A | Evans; MW prefers dyfod (not ModW dod) |
| DO_* | gwneuthur paradigm | A | Evans; MW gwneuthur (not ModW gwneud) |
| TAKE_*, GIVE_*, SEE_*, KNOW_*, WANT_*, NEED_* | Action verb paradigms | A/B | GPC for attested forms; M2 where GPC unavailable |
| MOD_* | Modal constructions | A/B | Evans for gallaf, dylwn; M2 for peripheral forms |
| PRN_* | All pronouns | A | Evans §51; full paradigm directly attested |
| ADJ_* | Adjectives | A/B | GPC; some a-affection forms via M2 |
| PREP_* | Prepositions | A/B | GPC; inflected prepositional forms in Evans |
| NUM_* | Cardinal numbers | A/B | GPC; MW number system largely stable; vigesimal forms differ from ModW |
| ORD_* | Ordinal numbers | A/B | Evans; -(h)ed suffix attested in MW ordinals |
All MWRF outputs carry one of three confidence grades:
| Grade | Definition | Typical basis | ATTESTATION_CLASS |
|---|---|---|---|
| A | Directly attested in GPC, Evans, or Cardiff corpus, with a date or manuscript citation in the period 1150–1500 CE | GPC entry with medieval citation; Evans paradigm table form; Cardiff corpus high-frequency form | DIRECT_ATTESTATION |
| B | Systematically derived by M2 morphophonological rules from a well-evidenced Modern Welsh base | aw-restoration applied to cy form; pronominal substitution; a-affection alternant from Evans | MORPHOPHONOLOGICAL_DERIVATION |
| C | Modern Welsh baseline adopted; MW form identical to ModW or insufficiently evidenced for distinction | Stable category (days, months, conjunctions); M2 produces same result as cy; M1 confirms identity | CY_ADOPTION |
The grade distribution in the MWRF is fundamentally different from the NBTRF. Where the NBTRF produces predominantly identity results (288 of 307 entries with xcb = owl), the MWRF is expected to produce predominantly Grade A outputs, with Grade B for peripheral categories and Grade C for stable categories. This reflects the abundance of the Middle Welsh corpus: the evidentiary conditions for direct attestation are met for most entries.
A further important distinction: in the NBTRF, identity results (xcb = owl) inherit the confidence grade of the Old Welsh column. In the MWRF, Grade C outputs (wlm = cy) follow a different logic: they represent scholarly confirmation that the Middle Welsh and Modern Welsh forms are effectively identical, which is itself an academically grounded judgement supported by GPC attestation. Grade C is not a low-confidence claim; it is a high-confidence claim that the two periods do not differ for this item.
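The pairing of grades with ATTESTATION_CLASS values lends itself to a simple consistency check, sketched here. The `Grade` enum and `check_row` function are hypothetical names; the three class labels and the wlm = cy property of Grade C rows are taken from the text.

```python
from enum import Enum


class Grade(Enum):
    """Confidence grades mapped to their ATTESTATION_CLASS labels."""
    A = "DIRECT_ATTESTATION"
    B = "MORPHOPHONOLOGICAL_DERIVATION"
    C = "CY_ADOPTION"


def check_row(grade_letter: str, attestation_class: str, cy: str, wlm: str):
    """Return a list of consistency problems for one dataset row."""
    grade = Grade[grade_letter]  # raises KeyError for an unknown grade
    problems = []
    if grade.value != attestation_class:
        problems.append(
            f"grade {grade.name} expects {grade.value}, got {attestation_class}"
        )
    # Grade C means the Modern Welsh value was adopted, so wlm must equal cy.
    if grade is Grade.C and wlm != cy:
        problems.append("Grade C row has wlm different from cy")
    return problems


print(check_row("C", "CY_ADOPTION", "haf", "haf"))
print(check_row("B", "DIRECT_ATTESTATION", "draenog", "draenawg"))
```

An empty list means the row is internally consistent; the check says nothing about whether the grade itself is justified by the evidence.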
The M2 stage of the MWRF pipeline applies the following documented rules. Each rule specifies the Modern Welsh environment, the corresponding Middle Welsh form, and the authoritative source for the correspondence. Rules are applied only in the environments specified; mechanical over-application is a failure condition.
Modern Welsh shows reduction of the historical diphthong *aw* to *o* in many unstressed final syllables, particularly in the adjectival suffix -og (Middle Welsh -awg/-awc) and in verbal nouns. Middle Welsh retains *aw* in these environments [9].
| Modern Welsh (cy) | Middle Welsh (wlm) | Notes |
|---|---|---|
| marchog (knight) | marchawc | GPC: MW spelling with aw and final c |
| draenog (hedgehog) | draenawg | GPC: MW retention of aw |
| mawr (big) | mawr | Identity — aw already present in ModW; no change |
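The table rows can be reproduced with a small string rule, shown here as a sketch only. The function `restore_aw` and its `final` parameter are hypothetical; the choice between the -awg and -awc spellings is left to the caller because the table shows both. The function rewrites any word ending in -og, so it is safe only when the caller has already confirmed that the word falls in a documented environment.

```python
# Spellings of the restored suffix shown in the table above.
AW_SUFFIX_SPELLINGS = ("awg", "awc")


def restore_aw(cy: str, final: str = "awg") -> str:
    """Replace a final -og with the Middle Welsh -awg or -awc spelling.

    Words that do not end in -og are returned unchanged, which covers
    identity cases where aw is already present in Modern Welsh.
    """
    if final not in AW_SUFFIX_SPELLINGS:
        raise ValueError(f"unsupported suffix spelling: {final}")
    # Require at least one character before the suffix.
    if cy.endswith("og") and len(cy) > 2:
        return cy[:-2] + final
    return cy


print(restore_aw("marchog", final="awc"))
print(restore_aw("draenog"))
print(restore_aw("mawr"))
```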
Modern Welsh represents the central high rounded vowel /ʉ/ with ⟨y⟩ in stressed monosyllables and some unstressed syllables. Middle Welsh frequently used ⟨u⟩ in equivalent positions. This rule applies only where GPC or Evans confirms the ⟨u⟩ spelling in medieval manuscript sources [9, 11].
| Modern Welsh (cy) | Middle Welsh (wlm) | Notes |
|---|---|---|
| byd (world) | bud | GPC: ⟨u⟩ attested in MW manuscripts |
| dyn (man) | dyn / dun | Both attested; GPC shows variation |
The lenition (soft mutation) of initial /g/ produces Ø (deletion) in both Middle Welsh and Modern Welsh. This is an identity rule between the two periods — no M2 modification is required. The contrast with Old Welsh (which retained /ɣ/ in this environment) is significant for the OWRF, but transparent to the MWRF.
The Middle Welsh pronominal system, as documented by Evans §51, differs from Modern Welsh in two principal paradigm cells: 2pl chwi (not ModW chi) and 3pl hwy (not ModW nhw). All other pronouns are effectively identical.
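Because only two paradigm cells differ, pronominal substitution reduces to a two-entry mapping. The sketch below uses hypothetical names (`PRONOUN_SUBSTITUTIONS`, `substitute_pronoun`) and handles bare pronoun values only, not pronouns embedded in longer phrases.

```python
# The two cells in which Middle Welsh differs from Modern Welsh, per the text.
PRONOUN_SUBSTITUTIONS = {
    "chi": "chwi",  # 2pl
    "nhw": "hwy",   # 3pl
}


def substitute_pronoun(cy: str) -> str:
    """Return the Middle Welsh pronoun, or the cy value if no rule applies."""
    return PRONOUN_SUBSTITUTIONS.get(cy, cy)


for pronoun in ("chi", "nhw", "ni"):
    print(pronoun, "->", substitute_pronoun(pronoun))
```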
The key irregular verbs of Middle Welsh — bod, mynd, dyfod, gwneuthur, caffael — show systematic paradigm divergences from their Modern Welsh equivalents. Evans's Grammar provides the complete Middle Welsh paradigms for all these verbs, and these are assigned Grade A via M1.
Notable cases: Middle Welsh verbal noun dyfod (not ModW dod); gwneuthur (not ModW gwneud); caffael (alongside cael); past tense 3sg of caffael: MW cavas/cafas (not ModW cafodd).
Middle Welsh adjectives subject to a-affection (internal vowel alternation in the feminine and plural) may show forms not preserved in Modern Welsh, which has regularised many of these paradigms. Where Evans documents a Middle Welsh feminine or plural alternant, it is recorded at Grade A via M1 [9].
Middle Welsh ordinal numbers use the suffix -(h)ed more consistently than Modern Welsh. Evans documents the Middle Welsh ordinals, and where they diverge from the Modern Welsh forms, the Evans form is used at Grade A.
The MWRF pipeline has been applied to all 307 entries in the Revitalised Cumbric dataset, spanning three source tables: Verbs & Sentence Elements (206 entries), Numbers & Dates (75 entries), and Greetings & Introductions (26 entries).
| Stage | Expected proportion | Notes |
|---|---|---|
| M1 — Grade A (Direct Attestation) | ~55–65% of entries | All verb paradigm rows via Evans; all pronoun rows via Evans; many adjectives, numbers via GPC |
| M2 — Grade B (Morphophonological Derivation) | ~15–25% of entries | Peripheral action verbs; some adjectives; items where GPC provides a MW-specific form derivable by rule |
| M3 — Grade C (cy Adoption) | ~20–25% of entries | All stable categories: days, months, seasons, conjunctions, greetings, time expressions |
The grade distribution confirms the methodological character of the MWRF: it is a corpus-attestation-first framework operating in conditions of abundance. The large proportion of Grade A entries reflects the richness of the Middle Welsh record as documented by Evans and GPC. The Grade C proportion reflects the stable categories where MW and ModW are effectively identical.
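The expected proportions can be converted into approximate row counts for the 307-entry dataset. The sketch below is plain arithmetic on the figures given above; the names are hypothetical, and the rounding convention (nearest whole row) is an assumption.

```python
TOTAL_ENTRIES = 307

# Source tables and their sizes, as stated in the text.
SOURCE_TABLES = {
    "Verbs & Sentence Elements": 206,
    "Numbers & Dates": 75,
    "Greetings & Introductions": 26,
}
# The three tables should account for every entry.
assert sum(SOURCE_TABLES.values()) == TOTAL_ENTRIES

# Expected percentage ranges per grade, from the table above.
EXPECTED_PCT = {"A": (55, 65), "B": (15, 25), "C": (20, 25)}


def pct_to_rows(pct: int) -> int:
    """Convert a whole percentage of the dataset to a rounded row count."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return (TOTAL_ENTRIES * pct + 50) // 100


for grade, (low, high) in EXPECTED_PCT.items():
    print(f"Grade {grade}: {pct_to_rows(low)} to {pct_to_rows(high)} rows")

# The ranges are compatible only if 100% lies between their summed bounds.
low_sum = sum(low for low, _ in EXPECTED_PCT.values())
high_sum = sum(high for _, high in EXPECTED_PCT.values())
print(low_sum <= 100 <= high_sum)
```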
| Domain | Expected grade | Basis |
|---|---|---|
| To be — present (6 rows) | A | Evans paradigm: wyf, wyt, yw, ydym, ydych, ydynt |
| To be — past/conditional (12 rows) | A | Evans imperfect and conditional paradigms fully documented |
| Pronouns (23 rows) | A | Evans §51: full paradigm directly attested |
| Adjectives — colours (10 rows) | A/B | GPC for most; a-affection alternants where documented |
| Cardinal numbers (31 rows) | A/B | MW number system largely stable; GPC confirms; vigesimal system documented |
| Days of week (7 rows) | C | Stable Latin-derived terms; MW = ModW |
| Greetings / introductions / polite (26 rows) | C | Post-medieval phrasebook register; cy adoption |
Despite the comparative abundance of the Middle Welsh corpus, the following limitations apply to all MWRF outputs:
The MWRF occupies the first position in the medial pipeline of the Brittonic Convergent Diachronic Revitalisation System (BCDRS), which is in turn the Brittonic implementation of the abstract Convergent Diachronic Revitalisation System (CDRS).
The full pipeline of the BCDRS is:
Modern Welsh (cy) → MWRF → Revitalised Middle Welsh (wlm) → OWRF → Revitalised Old Welsh (owl) → NBTRF → Revitalised Cumbric (xcb)
The MWRF is the entry point to this pipeline. It takes the living Modern Welsh language (cy) as its input — the only genuinely living language in the chain — and transforms it into a systematically documented Middle Welsh state. The justification for beginning with Modern Welsh is straightforward: Modern Welsh is the direct descendant of the entire Brittonic chain, and it is the language for which the most reliable contemporary reference data is available. Beginning from cy and working backwards diachronically is methodologically sounder than attempting to reconstruct directly from the sparse Old Welsh or Cumbric evidence.
The MWRF's output (wlm) feeds directly into the OWRF, which applies phonological regression to convert Middle Welsh forms into their Old Welsh antecedents. The MWRF therefore sets the quality of the entire downstream pipeline: if the wlm column is accurate, the OWRF has a reliable input; if the OWRF has a reliable input, the NBTRF's Old Welsh baseline (owl) is reliable; and if the owl baseline is reliable, the Revitalised Cumbric (xcb) forms derived from it are as well-grounded as the sparse Cumbric evidence permits.
The precedent for this kind of Brittonic revitalisation work is well established. The Cornish revival — discussed in §11 of the NBTRF dissertation — demonstrates that thin corpus evidence and systematic comparative methodology are sufficient to produce a viable revitalised language. The revitalisation of Manx similarly drew on manuscript sources and comparative Celtic linguistics. The MWRF's position is more comfortable than either: it is not revitalising Middle Welsh from nothing, but documenting it from an existing, rich corpus. The challenge is selection and standardisation, not reconstruction. The MWRF's scholarly contribution is to make that selection systematic, traceable, and open to revision.
It should be noted that the MWRF, OWRF, and NBTRF are not independent projects — they are stages in a single integrated system. A change to the MWRF (e.g., a correction to a wlm value based on new scholarly evidence) flows through to the OWRF and potentially to the NBTRF. This integration is by design: it means that improvements to any part of the chain improve the whole. The per-row trace files maintained by each framework ensure that such changes are documented and their effects traceable.
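How a correction propagates along the chain can be sketched by modelling the pipeline as an ordered list of stages. The `PIPELINE` structure and `downstream_columns` function are hypothetical and are not a description of the per-row trace files; the sketch only identifies which columns are candidates for review after a change.

```python
# Each stage: (framework, input column, output column), in pipeline order.
PIPELINE = [
    ("MWRF", "cy", "wlm"),
    ("OWRF", "wlm", "owl"),
    ("NBTRF", "owl", "xcb"),
]


def downstream_columns(changed_column: str):
    """List the columns that may need review after a column changes."""
    affected = []
    current = changed_column
    # Stages are ordered, so one pass follows the chain to its end.
    for _framework, source, target in PIPELINE:
        if source == current:
            affected.append(target)
            current = target
    return affected


print(downstream_columns("wlm"))
print(downstream_columns("xcb"))
```

Whether a given downstream value actually changes depends on the rules of the later frameworks; a corrected wlm value may leave the owl or xcb form untouched.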
The MWRF is designed as a living framework. As Middle Welsh scholarship advances — through new critical editions of manuscript texts, expanded digital corpora, refined grammatical descriptions, and new work in Brittonic historical phonology — the framework is positioned to incorporate that evidence and improve the accuracy of the wlm column.
The Cardiff Rhyddiaith corpus is itself an ongoing project; future expansions will bring more Middle Welsh prose into the searchable record and may provide M1 attestation for items currently at Grade B or C. New critical editions of key texts — the Mabinogi, the Welsh laws, the Arthurian romances — continue to be produced by scholars at Aberystwyth, Bangor, and Cardiff, and these editions may provide better manuscript evidence than the editions currently in use.
External contributions from qualified specialists in Middle Welsh linguistics are welcomed under the formal contribution protocol documented on the Contributors page. The protocol requires dissertation-format submissions evaluated against the MWRF specification, with accepted corrections entered into the Polyglot™ master dataset and flowing through to this site.
The conservative standard of the MWRF is permanent. Every grade assignment is an epistemological claim about the quality of available evidence, not merely a data value. Future revisions will improve grade accuracy but will not compromise the integrity of the grading system. The goal is not the maximum number of distinctive wlm forms, but the maximum accuracy of every form that appears, whether Grade A, B, or C.