
ANGUS McINTOSH, University of Edinburgh
M.L. SAMUELS, University of Glasgow
MICHAEL BENSKIN, University of Oslo
with the assistance of Margaret Laing and Keith Williamson
University of Edinburgh

INTRODUCTION TO LINGUISTIC PROFILES

1. Introductory

1.1 The linguistic profiles (hereafter ‘LPs’) are the completed copies of a standard survey questionnaire. Each LP is an inventory, for some specified sample of text, of the forms observed which correspond to the test-items on the questionnaire. Completion of the questionnaire is in effect the selective indexing of the sample text: the index attends only to those words, or parts of words, which are elicited by the questionnaire; and the points of text at which a given form is attested are not part of the final record. The making of such linguistic analyses is described in volume I of the Atlas, General Introduction, Chapter 2, 2.2.1–4; the principles on which the questionnaire is based, and the evaluation of the ensuing analyses, are treated in Chapters 2 and 3 of the General Introduction. The southern material appended to the County Dictionary in volume IV––the collection of ‘notable forms’––is not incorporated in the LPs.

1.2 This present introduction treats of the codification of the analyses as LPs in printed form, and explains the organisation of this volume.

2. Organisation of the volume

2.1 The LPs are grouped in sections, according to the county or other specified region to which their language belongs. Within each section, the LPs are ordered by key-number. The county sections are ordered alphabetically by county name; first printed are the counties of England, then of Wales, then of Scotland. The counties are the modern descendants of the mediaeval ones, substantially as they were just before their disruption by the local government reforms of 1974 (England and Wales) and 1975 (Scotland); but Ely, the Soke of Peterborough, Middlesex and London are separately treated. The ridings of Yorkshire are assigned separate sections, and so also is the city of York. Sections relating to areas that cannot be described by county names are printed after the county sections for the relevant country of origin (London and York are among the county sections): so, in the English part, ‘Northern’ appears after the last county, ‘Yorkshire, West Riding’.

2.2 London here means strictly the mediaeval city and Westminster: it is with that small area that the texts have been associated. However, largely for reasons of space required to accommodate the data, the LP locations (defined by grid reference) have been placed more widely, and London is represented on the maps in volume II by the boundaries of the old County of London. (For discussion of the problems of fitting texts associated with cities and cultural centres, see the General Introduction, 2.3.7, in volume I.)

2.3 All of the LPs contributory to the maps in volumes I and II are printed here. Additionally, there appear collections of LPs whose language cannot be localised with sufficient confidence to justify their inclusion in the maps, but for which placing within useful limits is still possible. The largest single body of such material belongs to northern England, north of the Humber-Lune line, and is designated ‘Northern’. These LPs are identifiable from the lack of a grid reference, and the phrase ‘Not entered on maps’ in the summary description at the head of the LP (see 6.1 below).

3. Scope of the linguistic profiles

3.1 In principle, each LP represents the language of one and only one scribe. In relatively few cases have the products of two or more hands knowingly been incorporated in a single LP; in one or two such, the fact of conflation was discovered only at a late stage in the survey, after detailed palaeographical work established what the linguistic analysis had failed to reveal. In the first instance, the policy of segregation was an obvious and necessary safeguard against treating as a single état de langue what in fact were two or more dialectally independent contributions to the same manuscript or text; the problems that would attend the incorporation of such an artificial conglomerate into the maps are self-evident. As the survey progressed, however, it came to be realised that the LPs themselves constituted a unique taxonomic resource: given the ‘fit’-technique (volume I, General Introduction, 2.3.3–7), the geographical organisation of the material afforded a systematic means of associating the dispersed writings of individual scribes and scriptoria, and hence of contributing on a broad front to the study of mediaeval literary culture (McIntosh 1974 and 1975; Benskin 1977 and 1981a; volume I, General Introduction, Chapter 4). At that stage was abandoned the occasional and legitimate economy of presenting a single LP for collaborating scribes whose usages, viewed conventionally, differed but trivially. The individual characteristics of each scribe recognised, regardless of their significance for regional dialectology, came increasingly into focus. (See especially McIntosh 1974 and 1975; Benskin 1982a.) Not all of the potential source-material for the Atlas, however, can well be treated so straightforwardly.

3.2 Enactments of municipal government, recorded in town books, assembly rolls, or other forms of register, are commonly difficult to incorporate in representative scribal profiles: the individual contribution to the text may be elusive. Typically, such records are fair copies made from drafts or miscellaneous papers: so much is clear from the occasional dittographies and haplographies, and still more so from their scripts, which are almost never consonant with the activities of a minutes secretary. In course of analysis of a single script, change of orthography from one set of enactments to another, or even from one capitulum to the next, is a familiar discovery: within the limits of the local range of variation, it appears, a clerk might set down in his fair copy the forms of the drafts from which he worked. (For discussion of this phenomenon, ‘constrained selection’, see volume I, General Introduction, Chapter 3, 3.4.1–4.) So, for example, on f. 22r of ‘The Great Red Book of Bristol’, the text of the assize of bread and the first three regulations for brewing establish that (passim), vpon nine times, and eny five times, as regular usage. The fourth regulation breaks the pattern, with þt and þat (once each, against that four times), any twice, and apon once, all these in a mere three lines. For ‘if’, yf contrasts with the previous form yef; although each appears once only on this folio, yef is established in the previous texts by twelve or more instances, and yf is not found. Again, in the last ten lines of f. 23v (‘Crafteholders’), ‘them’ is thaim thrice, ‘their’ is thair four times, har and their once. All of these forms are new: the preceding folio, in the same script, contains ham ten times, hem thrice, ther eleven times, and her nine. A scribal profile is here less a characterisation of an individual writer’s spontaneous usage than a statement of his linguistic tolerance. The range and relative frequency of variant forms, since they are text-determined, cannot be held to represent the usage of any one writer, but only of an indeterminate sample from the literate community to which he belonged. The profile is hence in some ways of more value for dialect mapping than a genuinely scribal profile: the individual contribution is only dimly discernible, but something of the urban heterogeneity appears.

3.3 The toleration of forms which are perhaps not the copyist’s own, but familiar from the usage of his contemporaries, is not the only contamination to be encountered: older documents may be rehearsed in the course of setting down new legislation, and forms old-fashioned or even archaic brought into the modern record. Occasionally, the use of antique forms seems to be deliberate affectation: one of the hands of the Liber Primus of Waterford uses euch for ‘each’ in documents of the 1540s, which spelling had been all but obsolete for the last hundred years. (The Waterford scribe, however, is outclassed by the writer of Morsbach’s document no. xix, whose resurrection of the letter wynn in the twenty-ninth year of Henry VI, a full hundred and fifty years after it had disappeared from the English alphabet, can hardly depend on other than a familiarity with writings already antique by his own day.)

4. Northern and Southern versions of the questionnaire

4.1 The survey of counties north of the Wash was the responsibility mainly of Professor McIntosh, that of the counties south of the Wash falling mainly to Professor Samuels. The two parts of the survey overlap in a belt running across the Midlands, and Norfolk was likewise jointly surveyed. The forms of questionnaire used for the two parts differ considerably, although there is a substantial common core (see further 6.3–6.7 below). The two corpora that are the results of this survey are designated NOR and SOU respectively. The Lincolnshire material is mostly the work of Dr. Laing, and forms part of the NOR corpus.

5. Linguistic profile numbers

5.1 Each LP is identifiable by a reference number, printed at the head of the LP and in bold type. Any such number in the range 5000–9999 indicates a LP belonging to the SOU corpus. NOR LPs have reference numbers from 1–2000. Any LP number in the series 4000–4999 is a conflation, the result of merging a codified SOU LP with a codified NOR LP (see further section 8 below). SOU numbers were assigned originally as an unbroken sequence for each county; but this geographical schema has been interrupted by the addition of some thirty later LPs, and by the removal of certain material since found to be unreliable. The NOR numbers have never been geographically ordered, and reflect merely the sequence in which material was incorporated in the maps; for present purposes they are arbitrary.
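
The numerical ranges just described may be restated as a simple lookup. What follows is an illustrative sketch only, not part of the Atlas apparatus: the function name `classify_lp_number` and the labels it returns are hypothetical, and numbers falling outside the ranges stated above, for which the text makes no provision, are merely reported as unclassified.

```python
from typing import Optional


def classify_lp_number(number: int) -> Optional[str]:
    """Return the corpus implied by an LP reference number.

    The ranges are those stated in 5.1; the labels are illustrative.
    """
    if 1 <= number <= 2000:
        return "NOR"
    if 4000 <= number <= 4999:
        return "conflation"  # merged NOR + SOU profile
    if 5000 <= number <= 9999:
        return "SOU"
    return None  # a range the text does not describe (e.g. 2001-3999)


if __name__ == "__main__":
    for n in (109, 4321, 5000, 3000):
        print(n, classify_lp_number(n))
```

The number alone says nothing of the county of origin: as noted above, NOR numbers are arbitrary in that respect, and the SOU sequence has been interrupted.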

5.2 For reference within the Atlas, the primary category is either (a) the county or other specified area of origin, or (b) the manuscript from which the LP is derived. The LP numbers themselves are not used as an independent system. On the maps, the LP numbers are used to identify the locations at which information from the corresponding LPs is entered. Suppose that the reader wishes to find out the manuscript source for such a location: since the starting point is the map, the county to which the LP belongs can be immediately ascertained; the LP may then be found in the relevant county section of this volume, and at the head of the LP will be found the required information. (Or, the reader may refer to the county section of the Index of Sources in volume I, where the same information will be found.) Only within the county section is the ordering of the LPs numerical, and within this ordering there are, of course, diverse gaps in the numerical sequence.

5.3 Again, suppose that the starting point is the manuscript, and that the reader wishes to know what LPs have been derived from it and where they are to be found. The key here is the repository section of the Index of Sources. In the entry for the manuscript in question––and it should be noted that there may be several successive entries for but a single manuscript in that index––there will be found a note of the LP number or numbers relevant, with grid references to their map locations, and the name of the county to which they are assigned. Once the county and LP number(s) are known, the LPs themselves may be consulted in this present volume.

5.4 Otherwise, except for cross-reference within sub-sections of the county lists, LP numbers are used only in the County Dictionary in volume IV. Since they are there printed in sequences introduced by the name of the county or other region to which they are assigned, cross-reference to this volume is straightforward.

6. General appearance of the linguistic profiles

6.1 As presented here, each LP is preceded by a summary description of the manuscript or other source from which the LP is derived. The summary descriptions are identical with the corresponding entries in the Index of Sources in volume I. They contain the following categories of information.

(1) The manuscript or other source from which the LP is derived. Reference to manuscripts is by present location and repository, followed by collection and peculiar designation. Sources known only from printed versions are introduced as ‘Printed:’, followed by the name of the work in which they are presented.

(2) The hand that is represented by the attached LP. If the manuscript is the work of only one scribe, the fact is duly noted. A scribe responsible for by far the bulk of a manuscript is ‘main hand’. If the hand represented by the LP is one of two or more making substantial contributions, normally the sigla ‘A, B, C,...’ are used to distinguish them. For a hand which contributes separate types of language to a single manuscript, a formula such as ‘Hand C, language 3’ is used.

(3) The parts of the manuscript for which the present scribe is responsible, usually identified by folio or page, but sometimes by text (so ‘the hand of ff. 27r-91v’, ‘the hand of the Prick of Conscience’).

(4) Such codicological or other information as may be relevant for dialectal inquiry.

(5) The particular sample of the scribe’s work that was analysed for incorporation into the LP, identified as in (3) above. ‘Analysis from original’ means that the analysis was made directly from manuscript, or (usually) from a facsimile (microfilm or photostat). If the analysis depends on a printed version, the edition is duly identified.

(6) The grid reference to the place at which the LP is entered on the maps.

For documents, as opposed to literary manuscripts, the repository details are normally followed by a brief statement of the character of the document (e.g. ‘lease’, ‘arbitration’, ‘marriage settlement’), the principal parties, the date, and declared place of origin. For some further account, reference should be made to the introduction to the Index of Sources in volume I. The key to abbreviations used in these descriptions is reprinted on p. 696 of this volume; on p. 700 will be found a table of regnal years.

6.2 The body of the LP is an inventory, printed down the page and in three columns. Each column consists of two parts. On the left appear the test-items of the questionnaire; with some exceptions (see below) these are modern English words printed in small capitals. On the right are printed the Middle English forms elicited by the test-items opposite; the forms are printed in lower case roman, with italics for expanded abbreviations (see further 14.10 below). Where for the same item two or more forms co-occur, as for example swilk and sich for SUCH, their relative frequencies are marked by parentheses: swilk (sich) indicates that sich is significantly less common than swilk, and swilk ((sich)) that sich is relatively rare (see further section 13 below).
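
The parenthesis convention can be read mechanically, as the following sketch suggests. It is offered under stated assumptions: that forms are separated by spaces or commas, and that the depth of enclosing parentheses (0, 1 or 2) corresponds to the three grades of relative frequency described above. The function name `parse_entry` is hypothetical, and the finer conventions of section 13 are not attempted.

```python
import re
from typing import List, Tuple


def parse_entry(entry: str) -> List[Tuple[str, int]]:
    """Split an entry such as 'swilk ((sich))' into (form, depth) pairs.

    depth 0 = dominant form, 1 = significantly less common,
    2 = relatively rare (assumed reading of the parentheses).
    """
    result: List[Tuple[str, int]] = []
    depth = 0
    # Tokens are single parentheses or runs of non-separator characters.
    for token in re.findall(r"\(|\)|[^()\s,]+", entry):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses in entry")
        else:
            result.append((token, depth))
    if depth != 0:
        raise ValueError("unbalanced parentheses in entry")
    return result


if __name__ == "__main__":
    print(parse_entry("swilk (sich)"))
    print(parse_entry("swilk ((sich))"))
```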

6.3 The full survey questionnaire, duly annotated, is printed in pp. xviii–xix below. Most LPs invoke only a sub-set of the full range of items, according as the northern or southern version of the questionnaire was appropriate to the text at issue. It is hardly to be expected that even for these sub-sets a given text will yield information in respect of all of the items sought: null ‘responses’ are common. In the LPs as printed, if an item is not found in the source text under consideration, then the item-name is suppressed. So, for example, if a text lacks the word ‘through’, then the test-item THROUGH will not appear in the LP derived from it. The alternative, which is to print the item-name at its due place in the list, with a blank opposite in the event of non-attestation, is in some respects preferable, not least because each item would then be found always at the same point in a fixed-format inventory; but the cost in blank paper, particularly for the many documents in which only a hundred or so items are attested, is prohibitive.
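
The difference between the two modes of presentation may be shown in miniature. The sketch below assumes a hypothetical three-item questionnaire and a hypothetical helper `printed_items`; it illustrates only the suppression of unattested item-names, not the typography of the printed LPs.

```python
from typing import Dict, List


def printed_items(questionnaire: List[str], attested: Dict[str, str]) -> List[str]:
    """Return the lines of an LP, omitting items with no attested form.

    The questionnaire order is kept, so the relative order of the
    surviving items is the same in every LP.
    """
    return [
        f"{item} {attested[item]}"
        for item in questionnaire
        if attested.get(item)  # missing or empty entries are suppressed
    ]


if __name__ == "__main__":
    questionnaire = ["SUCH", "THROUGH", "IT"]  # hypothetical extract
    attested = {"SUCH": "swilk (sich)", "IT": "hit ((yt))"}
    for line in printed_items(questionnaire, attested):
        print(line)
```

Here THROUGH is unattested and so does not appear; a fixed-format inventory would instead print it with a blank opposite.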

6.4 In the annotated version of the survey questionnaire presented below, the separate test-items are numbered, and in the following text reference to items is by number as well as item-name. The numbers are merely for convenience of exposition, however, and are not used in the LPs as printed in the main body of the text.

6.5 The body of the LP falls into two parts. The first part, items 1–64, conforms in effect to the original survey questionnaire. The division, and especially the order of the test-items in Part I, will doubtless seem perverse to many readers; a unified list, alphabetically ordered, has obvious attractions, particularly for casual consultation. Yet for linguistic purposes, alphabetical ordering is seldom desirable: rather, grouping by phonological or grammatical or semantic categories may be preferred. Done systematically, this naturally involves much repetition: a single word may contain several segments, each of which calls for the entry of that same word in a separate phonological category, and it may also be eligible for inclusion in grammatical and semantic categories as well. There are in fact some fairly obvious local coherences of this sort in Part I of the LP, but it must be admitted that its shape is largely accidental. The justification for presenting it in this form, leaving aside the pitfalls attendant on even the computer-aided re-organisation of such an archive, is chiefly its taxonomic value. Although a more powerful and economical set of test-items could undoubtedly be devised, for efficient preliminary classification this present list has worked remarkably well over its many years of application.

6.6 Most of the items in Part I were collected for both the NOR and SOU material. The second part is historically an expansion of the original questionnaire, and although it was codified at a fairly early stage, it reflects considerable experience of dialectal analysis. It was inevitable that contrasts and distributions not originally the object of inquiry would force themselves on the attention of the investigators, and among them were some of a wholly unexpected sort. Many items were seen at an early stage to be important for the dialectology of some particular region, but were less obviously relevant to areas beyond. The northern and southern versions of the questionnaire naturally diverged: to take but a single example there is evidently little point in attempting a systematic collection of Scandinavian loan words as a basis for the dialect geography of southern England.

6.7 Part II of the LP is too large for other than a more systematic ordering. Items 65–266 are complete words, listed alphabetically. Items 267–80 are phonological categories (267–71) or suffixes (272–80), and within this sub-section the order is again alphabetical. The suffixes might reasonably have been interspersed in the main, lexical, section, but the phonological categories could not well appear there: separation of 267 -ALD from 268 -AMB by 70 ALL, or of 268 -AMB from 269 -AND by 71 AMONG, would be merely irksome.

7. Sub-profiles

7.1 In the SOU corpus, there are several LPs to which are appended sub-profiles. These arise when two or more scribes contributing to the same manuscript, perhaps turn and turn about, write substantially the same type of language, for which language a single local origin can be assumed. The profile represents the usage of just one of the scribes, and, for his language, is comprehensive. The language of the other scribe(s) is represented only in so far as it differs from that of the main informant. Suppose, for example, that each of three collaborating scribes uses hit as the dominant form for ‘it’, with yt as a rare variant: hit ((yt)) appears in the main profile, and there is no separate indication of the usages of other scribes. Suppose, however, that in addition to the main informant’s whoch for ‘which’, the other scribes contribute whuch and woch respectively: the additional forms are entered in the appropriate sub-profiles, one for each scribe, as a supplement to the main profile. For dialect mapping the presentation is unobjectionable: the testimony of one informant is recorded in detail, and other informants belonging to the same place are reported only when their forms supplement that account. For most places, by contrast, only one informant is available. This mode of presentation, however, is obviously deficient as an account of the language of the subsidiary informants, because it cannot be assumed that all of the features recorded in the base profile are attested in their own writings. Their language, unless otherwise noted in the sub-profile, is a sub-set of the usage of the main informant, and their relative frequencies of co-variants may or may not conform.
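
The relation of sub-profile to main profile is, in effect, one of set difference, and may be sketched as follows. The representation of a profile as a mapping from item-name to a list of forms, and the name `sub_profile`, are assumptions made for illustration; relative frequencies are ignored.

```python
from typing import Dict, List

Profile = Dict[str, List[str]]


def sub_profile(main: Profile, other: Profile) -> Profile:
    """Return only those forms of `other` that the main profile lacks."""
    supplement: Profile = {}
    for item, forms in other.items():
        extra = [form for form in forms if form not in main.get(item, [])]
        if extra:
            supplement[item] = extra
    return supplement


if __name__ == "__main__":
    main = {"IT": ["hit", "yt"], "WHICH": ["whoch"]}
    second = {"IT": ["hit", "yt"], "WHICH": ["whuch"]}
    third = {"IT": ["hit", "yt"], "WHICH": ["woch"]}
    print(sub_profile(main, second))
    print(sub_profile(main, third))
```

The deficiency noted above is visible in the result: nothing in the supplement shows whether the second scribe also writes whoch, or uses yt at all.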

7.2 The sub-profiles are not printed as separate inventories, but incorporated in the main profile, as appendices to the individual entries for the relevant items. The figure 1, followed by a semi-colon, closes the record of the main informant’s contribution; the second contributor’s forms are terminated by ‘2’ and a semi-colon; the third contributor’s forms are closed by ‘3;’, and so on. The presence of the subsidiary informants is noted in the summary description at the head of the LP.
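
An entry so constructed can be divided among its informants by reference to the closing figures. The sketch below assumes, hypothetically, that each contribution is printed as its forms, a space, the informant's figure, and a semi-colon (for example `whoch 1; whuch 2; woch 3;`); the exact spacing in the printed LPs may differ, and the name `split_informants` is invented for the purpose.

```python
import re
from typing import Dict


def split_informants(entry: str) -> Dict[int, str]:
    """Map each informant's figure to the forms that it closes."""
    # Lazily capture the forms, then the closing figure and semi-colon.
    pairs = re.findall(r"(.*?)\s(\d+);", entry)
    return {int(figure): forms.strip() for forms, figure in pairs}


if __name__ == "__main__":
    print(split_informants("whoch 1; whuch 2; woch 3;"))
    print(split_informants("hit ((yt)) 1;"))
```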

7.3 In the County Dictionary in volume IV, LPs so affected are recognisable from the form of their identification numbers. These consist of four figures, but unlike other SOU LP numbers they do not end in 0. The main profile in such a set has a key number ending in 1; subsidiary profiles have key numbers ending in the range 2–9.
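
The convention may be summarised in code. This is a sketch only: it assumes the SOU range given in 5.1, and the function name `sou_profile_role` and its three labels are hypothetical.

```python
def sou_profile_role(number: int) -> str:
    """Classify a SOU key number by its final figure."""
    if not 5000 <= number <= 9999:
        raise ValueError("not a SOU LP number")
    last = number % 10
    if last == 0:
        return "ordinary"    # no sub-profiles attached
    if last == 1:
        return "main"        # main profile of a set
    return "subsidiary"      # final figure in the range 2-9


if __name__ == "__main__":
    for n in (5010, 5011, 5012):
        print(n, sou_profile_role(n))
```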

8. Conflations

8.1 As noted above, there are two different versions of the survey questionnaire, and which one is used for any given analysis depends on whether the text under scrutiny belongs to the NOR corpus or to SOU. Although the two parts of the survey were conducted largely independently, it was recognised from the beginning that a geographical overlap was essential: the final stages would require adjustments along their common border, and interlocking of the two parts, not merely their juxtaposition. The overlap is a belt across the Midlands, which broadens eastward to include all of Norfolk and the northern edge of Suffolk; within the belt fall over sixty points for which information is entered on the maps; they are identifiable by their key numbers, which lie in the range 4000–4999.

8.2 Accordingly, each of some sixty LPs rests upon two separate analyses, one of which yields a NOR LP, the other a SOU LP. The two LPs are codified separately, and then conflated to produce a LP in the 4000 series. Such a LP contains, in addition to the common core items, the items collected only from NOR texts plus the items collected only from SOU texts. Except for the common core items, therefore, conflation represents a simple extension of the questionnaire. Conflation in respect of common core items, however, is less straightforward. If precisely the same stretch of text had been analysed for NOR as for SOU, and if the performance of the two analysts were identical, then conflation would involve no more than the printing, item by item, of just one of the two identical records. In practice, however, the records are usually similar but not identical.

8.3 SOU analyses tend to be selective once a stable pattern of usage has been defined. Commonly, they are fairly complete records for part of the text, supplemented by scanning for items not found in the initial sample. NOR analyses, by contrast, tend to be full reports for more strictly delimited samples, and are relatively seldom amplified by scanning beyond.

8.4 For a lengthy text, therefore, the corresponding SOU and NOR LPs may differ not only in respect of the stretch of text reported, but also in respect of the sampling method. (It should be emphasised that ‘text’ here means ‘text written by just one scribe’.) A pair of such LPs can of course be expected to supplement each other: provided that they are not dialectally incongruous, occasional forms attested in one LP but not in the other can be incorporated in a more comprehensive account of a single état de langue. So, for example, that LP-SOU reports þai ((þei)) for ‘they’, whereas LP-NOR has only þai, would not of itself render their conflation suspect if the language of the text belonged to some area where, on the evidence of other texts, þai and þei co-occurred. A fortiori it would not be suspect if LP-SOU’s þai ((þei)) rested on fifteen attestations of ‘they’, whereas LP-NOR’s þai rested on only four. If, however, the incorporation of (say) a SOU LP into the NOR corpus involved the intrusion of unexpected or anomalous forms into the NOR area of survey, then the whole of the text whence the two LPs derived would be rescrutinised as a matter of course.

8.5 Suppose, for example, that LP-SOU’s þei were unknown in the relevant part of the NOR area: were þei regularly attested in the adjacent SOU area, then its occurrence in this peripheral LP would no longer be perceived as anomalous; indeed, its low relative frequency, ((þei)), is precisely what is to be expected towards the limit of its geographical range. The distribution of þai and þei through the text itself, however, would also have to be examined. A direction of change from (say) þai early in the text via þai ((þei)) to regular þei in the later folios, might well call into question the linguistic integrity of the text: apart from the possibility of translational drift (General Introduction, 3.3.2–4 and Appendix I) or constrained selection (General Introduction, 3.4.1–4), hitherto undetected changes of hand may also be at issue.

8.6 This last proved to be the case with the medical texts of British Library MS Royal 12 G. iv. The manuscript was put together under the supervision of the infirmarius of St. Mary’s Hospital, Coventry, and prima facie constitutes a solidly localised text. Yet conflation of the discrepant NOR and SOU LPs would here have proved especially misleading, for it now appears that not all of the hands write Coventry language, and some of them could be isolated only after close palaeographical analysis. Had these texts been initially unlocalised, then the discrepant NOR and SOU LPs would inevitably have been assigned to different parts of the dialect map; an attempt to reconcile the discrepant placings would then have followed automatically, and with it a closely controlled re-analysis of the text(s). In the event, re-analysis of the Royal manuscript was called for by the incompatibility of the NOR and SOU placings for certain other texts in that area, which placings were heavily influenced by the assumed character of the Coventry material.

8.7 If divergent NOR and SOU LPs are to be conflated, the most important condition to be satisfied is that the independent placings of the two LPs within, respectively, the NOR and SOU configurations, should coincide geographically. (Mutatis mutandis, this same condition applies to conflation into a single LP for analyses from diverse parts of a single text which falls exclusively within one or other of NOR and SOU.) If that condition is satisfied, then such discrepancies as appear between the LPs may be provisionally accounted for as accidents of sampling.

8.8 It has been noted that the conflations here printed were made not directly from the original analysis sheets, but from the versions codified as (respectively) NOR and SOU LPs. Since the constituent LPs were by that stage in machine-readable form, conflation could be automated, which procedure conferred various advantages. Apart from the obvious saving of labour, the computer here offers a degree of accuracy scarcely to be attained otherwise. Visual conflation of analysis sheets, especially when the two sets are in different formats, is trying in the extreme; inadvertent omission is almost inevitable, and likewise miscounting. The conflated version, moreover, must be rewritten from the original analyses; the final report is yet further from the mediaeval writer’s spellings, and with two separate source analyses to check, the detection of miscopyings is the less certain. These objections are not merely theoretical, but derive from long collective experience.

8.9 The one disadvantage of proceeding from the codified LPs is that relative frequencies are already stated impressionistically; they may rest on numerically disparate records anyway, and the application of parentheses itself is commonly a matter of judgement. A difference of one degree on the scale is here unlikely to be significant; if such differences constitute a clear trend throughout the record for the common core items, then the linguistic integrity of the text is unlikely to have escaped question at an earlier stage. In the conflated LP, forms are represented in the higher of the two relative frequencies contributed by the constituent LPs. This procedure is not ideal, but it serves. At the head of each LP, in the introductory text, there appears a statement of the NOR and SOU contributions to the conflation. These are marked by the initials of the persons responsible for the constituent analyses: ‘AM’ contributes NOR material, ‘MLS’ contributes SOU.
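
The rule of conflation stated above (the union of the items, with each form taking the higher of its two relative frequencies) can be sketched as follows. The representation is an assumption: each LP is a mapping from item-name to forms, and each form carries a parenthesis depth of 0, 1 or 2, so that the higher frequency is the smaller depth. The names `conflate` and `SOU-ONLY` are hypothetical, and this is not the procedure actually programmed for the Atlas.

```python
from typing import Dict

# item-name -> form -> parenthesis depth (0 = dominant, 2 = rare)
LP = Dict[str, Dict[str, int]]


def conflate(nor: LP, sou: LP) -> LP:
    """Merge two codified LPs, keeping the higher frequency of each form."""
    merged: LP = {}
    for lp in (nor, sou):
        for item, forms in lp.items():
            target = merged.setdefault(item, {})
            for form, depth in forms.items():
                # Smaller depth means higher relative frequency.
                target[form] = min(depth, target.get(form, depth))
    return merged


if __name__ == "__main__":
    nor = {"THEY": {"þai": 0}, "GAR": {"gar": 0}}
    sou = {"THEY": {"þai": 0, "þei": 2}, "SOU-ONLY": {"x": 0}}
    print(conflate(nor, sou))
```

Items collected for only one corpus pass through unchanged, which is the simple extension of the questionnaire described in 8.2; only the common core items are affected by the choice of frequency.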

9. Conflations for mapping

9.1 In some cases, two or more LPs are conflated as a single LP for presentation on the maps only. This expedient is adopted where there is insufficient space to enter a separate text-block for each of the LPs which belongs to the same place. Where five or more LPs are so localised, the procedure is automatically invoked: four text-blocks are the most that can be effectively disposed about a point. (For one such example, see Penrith in Cumberland (LP 109), for which ten LPs are entered.) The conflated LP is made in the same way as those in the 4000 series, in that each variant form attested is represented in the highest relative frequency recorded for any of the contributory LPs. The LP thus conflated for the maps only is not, however, printed as a separate entity in this volume: it is merely a notional, collective representation, of which the constituent parts are separately printed. It is necessary, however, to assign a single key-number to such a representation, again solely for cartographic purposes. In the county sections of the Index of Sources (volume I), the conflated LP number is duly listed in the first category for each county, among the ‘Sources mapped’; there follows then not a manuscript reference, but a list by key-number of the constituent LPs (‘Conflation of LPs...’). The constituent LPs are separately entered in the list; these entries, like those for any other LP, contain the relevant manuscript notes, but additionally they contain a cross-reference to the LP number for the conflation in which they are mapped (‘Conflated for mapping with LPs...as LP...’).
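
The automatic part of this procedure may be sketched thus. Only the rule for five or more LPs is modelled; conflation of fewer LPs for want of space is a matter of judgement and is not attempted. The data representation (item-name to form to parenthesis depth, with depth 0 the most frequent) and the names `MAX_TEXT_BLOCKS` and `blocks_for_location` are assumptions made for illustration.

```python
from typing import Dict, List

LP = Dict[str, Dict[str, int]]  # item -> form -> parenthesis depth

MAX_TEXT_BLOCKS = 4  # the most that can be disposed about one point


def blocks_for_location(lps: List[LP]) -> List[LP]:
    """Return the text-blocks to enter on the map for one location."""
    if len(lps) <= MAX_TEXT_BLOCKS:
        return list(lps)  # each LP keeps its own text-block
    conflated: LP = {}
    for lp in lps:
        for item, forms in lp.items():
            target = conflated.setdefault(item, {})
            for form, depth in forms.items():
                # Keep the highest frequency (smallest depth) recorded.
                target[form] = min(depth, target.get(form, depth))
    return [conflated]


if __name__ == "__main__":
    five = [{"THEY": {"þai": 0}}] * 4 + [{"THEY": {"þai": 1, "þei": 0}}]
    print(blocks_for_location(five))
```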

10. The linguistic scope of an item

10.1 Most items are in principle likely to be found in all texts, so that non-attestation is usually an accident of sampling rather than a fact about the language. For example, a text in which the word ‘work’ failed to appear would not be construed as evidence that its writer lacked such a word in his vocabulary. Similarly for the grammatical and phonological categories, like the suffix of the 3sg present indicative, or the reflex of OE ā. There are, however, some items whose mere presence in a text is itself a matter of linguistic significance: the point at issue is not the particular form the item takes, but whether it occurs at all. Relevant here are at as an infinitive marker (item 74), gar or ger ‘cause (to do)’ (134), and at as a relative participle (75); of these, the first two are of Scandinavian origin, and the third may commonly be so. They are features of obvious interest to historians of the language, and they are important taxonomic criteria for the dialectologist. There is, however, a certain cost in according such features the status of independent items, namely an indeterminacy in the detail of their distributions. That gar is absent from a given LP may reflect only that the idea ‘cause (to do)’ had no place in the sample text; it need not of itself indicate that gar was foreign to the scribal dialect in question. Ideally, of course, the item would have been framed not as GAR ~ GER but as CAUSE (TO DO): the absence of the word from a LP could then have been interpreted, either in terms of the exclusion by some equivalent expression, or as a deficiency in the source text. For at as a relative particle, the problem of indeterminacy –– which even for gar ~ger is not very serious––is fairly trivial: the relative ‘that’ is to be found in almost any text, even the shortest of documents, and a null entry for 75 AT can be taken as a fairly reliable indication that at is excluded by some variant of that. 
For the infinitive marker, a full collection of forms for the NOR corpus is presented, regular to and rare til entered as 27 TO +inf; at is entered diagnostically, as a separate item (74).

10.2 The verbs ‘make’ and ‘take’ have likewise been recorded only selectively. The contracted types mas ‘makes’ and tas ‘takes’, with tane ‘taken’, are the substance of items 175 and 229. Uncontracted forms are not entered, and no other parts of these verbs are systematically recorded. This is not to deny the possibility that variant forms of the uncontracted types have coherent regional distributions (consider mak-, maik-, mayk-, maak-, for example); but, like the variant spellings for ‘that’ (yat, þat, that, etc.), they held no promise of immediate dialectal interest at the time the survey questionnaire was codified. Operationally, their exclusion was undoubtedly justified: collection of several thousand instances of that or make, merely for the sake of a few hundred instances of at or a few score of mas, is no economical procedure.

11. Items and sub-items

11.1 Item-names, except for those which are abbreviated grammatical labels, are printed in small capitals. Many items admit sub-items; these are always introduced by an abbreviated grammatical label in lower case italics, as ‘pt-sg’, ‘sb’, and the like. Typically, sub-items are grammatical variants of the base form, as with ‘got’ in relation to ‘get’; or, they separate what may be the same word in different grammatical functions, as with adverbial ‘among’ versus prepositional ‘among’. This device was adopted not as a matter of linguistic theory or taxonomy, but simply as a means of reducing the volume of the computer corpus to more manageable proportions; the data structures are such that if each separately-recorded category has the status of a separate item, storage space is significantly expanded, with concomitant disadvantages for machine-processing.

11.2 The principle of presentation is best illustrated by example. For many verbs, including of course all strong verbs, past tense forms and past participles are treated independently of the present stem. These are normally entered as sub-items, and the item-name is the infinitive form of the verb. Thus ‘gave’ is entered as a sub-item of 137 GIVE, introduced by the qualifier ‘pt-sg’ or ‘pt-pl’, according as ‘gave’ is singular or plural; likewise, ‘given’ is found also under GIVE, and identified by the leading qualifier ‘ppl’. A full attestation for the verb would be recorded in the format:

GIVE:	yew ((yeue))
pt-sg:	yaw ((yaf))
pt-pl:	yawe
ppl:	yewyn

For the modal verbs, however, the past tense forms are treated as separate items: so 22 SHALL ~ 23 SHOULD, 24 WILL ~ 25 WOULD, 176 MAY ~ 53 MIGHT, 95 CAN ~ 99 COULD.

11.3 Ordinal numbers are entered as sub-items for the corresponding cardinals: thus ‘fifth’ is found under 126 FIVE (sub-item ‘ord’), not as an independent item preceding 121 FIGHT; ‘third’ is similarly appended to its cardinal 237 THREE, not placed independently before 232 THOU.

11.4 A full alphabetic index to the items and sub-items of the LP, in which all sub-items are identified verbally and assigned to the relevant item-names, appears below, pp. xxv–xxvi.

12. The sub-category ‘cf’

12.1 This sub-category can appear for any item. It contains elements that are in some way comparable with the item proper, but which cannot be regarded as occurrences of the item itself. So, for example, ‘sixty’ is sometimes reported in the cf sub-category for 218 SIX, especially if ‘six’ itself is not attested; the stem vowel is usually the same as that for ‘six’, but it cannot be relied upon to be so. Similarly, occurrences of an item in personal or place names, in so far as they are noted at all, are entered here; they may or may not reflect the form taken by that same item in independent contemporary usage (cf. Davis 1968, p. 270). These are duly annotated prs-n and p-n respectively. Material entered under cf is not included in the systematic index to the LPs, the County Dictionary in volume IV.

13. Relative frequencies of co-variants

13.1 Typically, in any lengthy text a given writer uses variant spellings for the same one word or morpheme. So, for example, slyk may alternate with swylk for ‘such’, -eþ with -eth for the suffix of the 3sg present indicative. Such co-variants may occur in widely differing relative frequencies: in one text, say, 27 swylk to 20 slyk, in another 12 slyk to 2 swylk. In the LPs, the relative frequencies of co-variants are represented by a system of parentheses. Variant forms not enclosed by parentheses stand in dominant frequency. Single parentheses enclose forms that occur about one third to two thirds as frequently as the dominant form. Double parentheses enclose forms that occur less than about one third as commonly as the dominant form. So a representation like swylk (sylk) ((slik)) implies occurrences in the ratio of roughly 9 : 4 : 2. The treatment is obviously crude, but for the material in question it works reasonably well: an overtly impressionistic rendering, that is immediately accessible to the reader, has been preferred to presentation of the raw analyses in quantitative and indigestible form. Otherwise, a complex and immensely time-consuming statistical operation would have been required, in order to establish a common basis for comparison of widely divergent sample sizes, not merely from text to text but also from item to item within the text. Such treatment would have far exceeded the resources available to us. (See further, Benskin, Cowham and Doyle 1985.)
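The system of parentheses may be illustrated by a short sketch in Python. It is offered only as an approximation under stated assumptions: the thresholds in 13.1 are expressly rough (‘about’ one third and two thirds), whereas the sketch applies them as hard cut-offs; it further assumes that any form more than two thirds as frequent as the commonest form is left without parentheses; and the function name profile is hypothetical.

```python
from typing import Dict


def profile(counts: Dict[str, int]) -> str:
    """Render raw counts of co-variants in parenthesis notation.

    Assumed cut-offs, relative to the commonest form:
      more than two thirds       -> no parentheses (dominant)
      one third to two thirds    -> single parentheses
      less than one third        -> double parentheses
    Expects at least one form with a positive count.
    """
    top = max(counts.values())

    def grade(n: int) -> int:
        # Integer arithmetic avoids rounding trouble at the boundaries.
        if 3 * n > 2 * top:
            return 0
        if 3 * n >= top:
            return 1
        return 2

    # Commonest form first; the sort is stable, so ties keep input order.
    ordered = sorted(counts.items(), key=lambda pair: -pair[1])
    return " ".join("(" * grade(n) + form + ")" * grade(n) for form, n in ordered)


print(profile({"swylk": 9, "sylk": 4, "slik": 2}))   # the 9 : 4 : 2 case of 13.1
print(profile({"slyk": 12, "swylk": 2}))
print(profile({"swylk": 27, "slyk": 20}))
```

On these assumptions the counts 27 swylk to 20 slyk yield two unparenthesised forms, since 20 is more than two thirds of 27; an analyst exercising judgement at the margin might of course decide otherwise.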

14. Editorial Practice

14.1 The representation of mediaeval spellings is in principle diplomatic; but, if comparison with the practice of phonetic transcription be allowed, the renderings are broad rather than narrow.

In the LPs, capitals and minuscules are not usually distinguished: the lower-case printed form implies either or both. Manuscript I ~ J, however, is reported always as I, never as i or j, unless the LP depends on an edition in which the mediaeval usage was incorrectly reported. In some of the SOU LPs, capital H-, and capital T- in the combination Th-, are distinguished.

H-. Late ME ‘it’ admits forms with and without initial ‘h-’. In some southerly writings, ‘h-’ forms are preferred for apparently stressed positions, the ‘h-’less forms for unstressed positions. Forms with ‘h-’ may also be preferred at the beginning of a sentence, clause or line of verse, and there written H-; in such positions, H- need not imply a stressed variant. In SOU LPs, therefore, sentence, clause or line initial H- forms are segregated: an entry like it (Hit) implies that the selection of ‘h-’ and ‘h-’less forms is independent of stress; whereas in it (hit, Hit), variation between it and hit may well be stress-conditioned. In a more refined analysis, of course, stressed and unstressed occurrences would be systematically segregated. In NOR LPs, H- and h- forms are recorded indifferently with h-.

In classical and mediaeval tradition, i and j are merely variant forms (figurae) of the same letter (littera): j is ‘i-longa’. In the LPs, however, minuscule i and j are regularly distinguished. It should be noted, however, that in some manuscripts the two forms are not sharply distinguished, and that in particular cases it may be hard to decide whether a form is i rather than j; elements of judgement cannot be excluded altogether from the report. The correspondents of modern capital I and J are not distinguished in the manuscripts, and are reported here as I.

u and v. As with i and j, these are merely alternative forms of what, in mediaeval tradition, was the same letter. In the LPs, however, u and v are distinguished throughout.

For reasons connected with the development of handwriting in the late 12th and 13th centuries, the letters ‘þ’ and ‘y’ came to be written identically in some modes of script. By the later Middle Ages, insular practice was regionally coherent: south of a line running roughly from the Mersey to the Wash, but excluding much of East Anglia, the distinction was regularly maintained; north of this line, and also over much of East Anglia, ‘þ’ and ‘y’ were represented by the same (usually y-like) symbol. (See further Benskin 1982a). In the LPs, the use of þ implies a systematic distinction between the two letters in the manuscript at issue, regardless of the letter-shapes used to effect it. If the letters are confused, then y is used throughout, regardless of whether the mediaeval symbol is þ-like or y-like. Renderings like mþkþll, corresponding to familiar mykyll, are hence not to be found, but appear with y; and in a manuscript where þ is so used, other þ- spellings are likewise reported as y (so þe ‘the’ appears as ye). The system of transcription, which attends to functional distinction rather than to form, is not ideal; but it is a practical means of reporting, in outline, an important facet of the written language, and it cuts through the taxonomic problems that would otherwise be presented by the many scripts in which þ-like, y-like, and indeterminately þ ~ y-like symbols are used interchangeably (see Benskin 1982a, p. 23).
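The convention for þ and y may be stated compactly as a sketch in Python. The function name transcribe and the flag distinguishes are hypothetical; the sketch assumes that a provisional reading of each form is already to hand, and that the analyst has decided, for the manuscript as a whole, whether the two letters are systematically distinguished.

```python
def transcribe(reading: str, distinguishes: bool) -> str:
    """Apply the functional convention for the letters þ and y.

    reading       -- provisional reading of a manuscript form, in which the
                     ambiguous symbol may have been set down as þ or as y
    distinguishes -- True if the manuscript keeps the two letters
                     systematically apart, by whatever letter-shapes
    """
    if distinguishes:
        # The distinction is functional in this manuscript: report it as read.
        return reading
    # The letters are confused: y is used throughout, whatever the shape.
    return reading.replace("þ", "y")


print(transcribe("þe", distinguishes=True))       # þ retained
print(transcribe("þe", distinguishes=False))      # reported with y
print(transcribe("mþkþll", distinguishes=False))  # reported with y throughout
```

The decision is thus made once for each manuscript rather than for each form, which is what allows the presence of þ in an LP to be read as a statement about the scribe's system.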

The letter ‘z’ is reported as z or as ȝ, according to manuscript usage. (Commonly, the ME letters ‘z’ and ‘ȝ’ are not distinguished, but both written ȝ.) When z is written for ‘ȝ’, it is so reported: for example, manuscript zet ‘yet’ is preserved, not altered to ȝet.

It should be noted that word-final z/ƶ/ȝ may have origins other than in the letters ‘z’ and ‘ȝ’: they may derive from the abbreviation for -et found in Latin and French usage. The sign z/ƶ/ȝ was here originally syllabic, but was reinterpreted as a simple consonant, equivalent to t, and written post-vocalically. (So habeȝ from earlier habȝ, for habet. English asset(s), from French assez ‘enough’, is a back-formation.) The usage was adopted in some ME writings where forms like habbeȝ ‘have’ may imply a suffix of the -eþ type (with t from þ), rather than -es inflexion.

In most cursive scripts save the formal varieties, the letters ‘n’ and ‘u’ are not distinguished in form. Normally this presents no difficulty, and n or u is printed according to lexical identity and etymology. In some cases, however, the sequence of four minims may be ambiguous: is nn or un intended? So, for example, -aund or -annd in ‘land’ and the suffix of the present participle; -oun- or -onn- in ‘hundred’ and ‘young’. Here, typographical representations are almost bound to be arbitrary, and should be regarded as such in the LPs. In cursive scripts, whether formal or informal, final ‘n’ is commonly written with the second minim recurved upwards over the whole letter; the recurve may be reinforced by being drawn rightward again, into a tilde. Historically, this is a sign of abbreviation; the form may be identical with that form of ‘u’ in which the recurved stroke abbreviates a following m or n. In many, perhaps most late mediaeval scripts, however, the mark of abbreviation is a mere flourish, and modern editors usually ignore it. In general, we have followed suit, but in some scripts the flourish has seemed not to be otiose. Here, expansion to -ne has been preferred to expansion as -nn; and so also -me has been preferred to -mm. In some cases, it is unclear whether the two minims and tilde should be read as -un or -ne (or -nn): so, e.g., soun or sone (or sonn?). Again, representation is in some degree arbitrary; in spite of efforts to interpret these spellings in terms of the script and orthography of the texts in which they occur, it can hardly be claimed that the practice of representation is impeccable.

Superscript letters. These are commonly abbreviations, as in þɩdde from þridde ‘third’, qan from quan ‘when’. For most items, such conventional abbreviations are expanded, and the letters implied by the abbreviation are italicised. So fam is represented as fram ‘from’, pei as prei ‘pray’, gow as grow ‘grow’. In some items, however, the superscript is always printed: whic and wch ‘which’, ic ‘I’, þu ‘thou’, wt ‘with’. In cases where the expansion is uncertain, the superscript is likewise preserved.

Superscripts that are not abbreviations, or only arguably so, are retained: ye ‘the’, þei ‘they’, yam ‘them’, boþe ‘both’, not ‘not’. In NOR, usually no attempt is made to distinguish between superscripts placed directly above an on-line letter, and those placed to the right. In SOU, the manuscript positioning is reflected by the typesetting. It should be noted, however, that the variation tends to be clinal, and that the binary classification is not always a sure guide.

Corrections entered by the writer of the mediaeval text are normally not distinguished in the LPs: insertions and corrected spellings are treated as running text. In the rare cases where a given correction may provide evidence for a copyist’s tolerance of the language of his exemplar, the correction is explicitly marked, either with an appended annotation (‘corrected form’), or (in the case of deleted letters) with the scribal sub punctum.

In those cases where a manuscript form is clearly erroneous –– that is, in cases where the scribe himself could be expected to have corrected the form, had it come to his notice –– the fact is noted by appending ‘error’ to the form in question. It should be noted that such annotation is used only sparingly: various forms which appeared on first acquaintance to be aberrant were later confirmed in the usage of other writings subsequently found to be from the same area.

In cases where interpretation of a manuscript form is doubtful, readings are prefixed by a question mark ‘?’. It may be assumed that all such instances are nonce-occurrences in the writings from which they are reported: a second occurrence normally resolves any doubt.

Abbreviations are conventionally expanded, and italicised. Expansions are conventional: except for the macron or tilde, and for ad hoc uses of the bar of contraction, a given sign of abbreviation is always expanded in the same way. Thus the abbreviation of the noun plural suffix appears always as italicised -es, never as -is or -ys. This involves departure from the traditional renderings of Scots texts, where the expansion is nearly always -is; but it would be wholly misleading to imply a difference between mediaeval English and Scots practice on this point, with -es giving way to -is north of the Border, when all that is at issue is the variant national practices of modern scholars.

The attempt to expand abbreviations to conform with the other spelling practice of the scribe in question we believe to be mistaken. Consider, for example, the writer who employs both þar and þ+abbreviation for ‘there’. It could be held that since the fully written form always has a, then the abbreviated form must be expanded þar: the writer ‘really meant’ þar. But equally it could be held that the writer ‘really meant’ þer and þar by turns; for since there is a historically regular abbreviation for -er, he could always save a little effort by using the abbreviation when he intended þer; whereas there is no distinctive abbreviation for ar, so that when he intended þar he had to write the word fully. The problem can be avoided altogether, however, by representing the form of the abbreviation rather than its supposed significance; and although letter sequences italicised have been used here, a case could be made for a strictly formal representation (e.g. by figures, in the absence of record type).

The bar through h and ll is disregarded, unless, as in some formal scripts, it is clearly an abbreviation and not a mere flourish. Similarly treated is the return-stroke from the ascender of final d.

Recurved final -r is printed ‘-re’; the flourish is generally otiose, but is of some taxonomic relevance in an inventory of ME scribes, and its expansion presents none of the problems that may arise with the other marks of (questionable) abbreviation. Note that its presence is in some degree determined by the mode of script: in textura it is rarely used as a mere flourish, whereas in some varieties of anglicana and secretary it is habitual.

Words containing an unstressed prefix commonly appear as two words in the manuscripts. So bi for ‘before’, a geyn ‘again’, wt out ‘without’. In all such cases, the space between the words is represented by a hyphen (LP to-gader for MS to gader, etc.) In manuscripts the hyphen is almost never so used, save at line-ends; in the LPs, all hyphens are editorial. It should be noted that spacing between words in mediaeval writing is usually clinal, and that the insertion of hyphens may therefore be a matter of judgement. Hyphens are also used to represent the letter-spaces between the separate elements of periphrastic constructions, answering to a single test-word on the survey questionnaire. So manuscript vn to the tyme is reported as vn-to-the-tyme (243 UNTIL), manuscript þeiȝ al as þeiȝ-al (32 THOUGH), and so on. Such renderings are merely editorial convention, designed chiefly to aid in machine-processing; that they are unfailingly explicit has seemed good reason to keep them in the final copy.

For most lexical items, the form of the stem only is the object of present inquiry; in so far as the inflexions are considered, they are treated as separate items, and without regard to their lexical adhesion. (So items 56 Sb pl, the noun plural suffix, to 64 Str ppl, the strong past participle suffix.) In general, therefore, inflexions have been suppressed in the lexical items, and replaced by trailing hyphens. Hence MS þenking ‘thinking’ is represented as þenk-, MS fryndis ‘friends’ as frynd-. (Such stems are accordingly not normally distinguishable in the LPs from similar stems abstracted from compounds: MS fryndschip would likewise yield frynd-). Relative frequencies are expressed in terms of the whole (undifferentiated) sg. and pl. entry.

Words ending in ‘-er’. When an inflexion of the ‘-es’ type is added, ‘e’ of ‘-er’ may either be retained in an ending of the full ‘-eres’ type, or syncopated so that the ending is ‘-res’. So, e.g., fader ~ faderes or fader ~ fadres, oþir ~ oþiris or oþir ~ oþris. Representations of the type fadr- and oþr- should therefore not be taken to imply fadre and oþre in the uninflected stem: they may well answer to stems like fadir and oþur instead.

In some of the most recently compiled LPs, the pl. and gen. sg. forms are assigned to the sub-category ‘pl’, regardless of their form, and the inflexions are preserved. Otherwise, the only regular use of the sub-category is occasioned by such inflected forms of 87 BROTHER and of 100 DAUGHTER as display a distinctively pl. stem vowel: plurals like breþer (sg. broþer) and deghter (sg. doghter) are always so treated.

In general, rhyming and alliterative usages are not reported in the LPs. Scribes who, in course of copying, regularly translate the language of an exemplar into (presumably) their own familiar form of ME, commonly transmit rhyming and alliterative forms unaltered: were these forms likewise translated, then the organising principles of the verse would be variously disrupted. (See further sections 3.3.5–7 of the General Introduction in volume I.) However, there may be reason to suppose that a copyist’s usage differed little from that of the authorial version of his text; or, a copyist’s usage may be in flagrant breach of the verse requirements, implying a degree of carelessness (and hence, perhaps, spontaneity), or plain intolerance. Such forms are obviously to be treated with caution: see General Introduction, 3.3.6. In the LPs, all such instances of rhyme are marked ‘rh’; forms occurring in alliteration have been excluded.

No attempt has been made to record manuscript punctuation, except in the case of 158 I (personal pronoun) where a number of instances of the forms I, i, y highlighted by points have been noted; so, e.g. .I, i., .y.

References

Benskin, M. (1977) ‘Local archives and Middle English dialects’, Journal of the Society of Archivists 5, pp. 500–514.

Benskin, M. (1981a) ‘The Middle English dialect atlas’ in M. Benskin and M.L. Samuels, ed., So meny people longages and tonges: philological essays in Scots and mediaeval English, presented to Angus McIntosh (Edinburgh: the Editors).

Benskin, M. (1982a) ‘The letters þ and y in later Middle English, and some related matters’, Journal of the Society of Archivists 7, pp. 13–30.

Benskin, M., R.H. Cowham and A. Doyle (1985) ‘A computer-aided system for printing variable text-blocks on crowded maps’, Association for Literary and Linguistic Computing Journal 5, pp. 1–24.

Davis, N. (1968) Review of G. Kristensson, A Survey of Middle English Dialects 1290–1350: the six northern counties and Lincolnshire, Notes and Queries NS XV, pp. 270–272.

McIntosh, A. (1974) ‘Towards an inventory of Middle English Scribes’, Neuphilologische Mitteilungen 75, pp. 602–624.

McIntosh, A. (1975) ‘Scribal profiles from Middle English texts’, Neuphilologische Mitteilungen 76, pp. 218–235.

McIntosh, A. (1983) ‘Present indicative plural forms in the later Middle English of the North Midlands’, in Middle English Studies presented to Norman Davis in honour of his seventieth birthday, ed. Douglas Gray and E.G. Stanley (Oxford: Clarendon Press), pp. 235–244.

GIVE:	yew ((yeue))
pt-sg:	yaw ((yaf))
pt-pl:	yawe
ppl:	yewyn