
Artificial Intelligence: A Warning for History

Recent years have seen an explosion in the use of artificial intelligence (A.I.) across the workplace and the classroom, most prominently Large Language Models such as ChatGPT and Microsoft Copilot. As with any innovation, many proponents have made exaggerated claims about the changes it will inevitably bring, such as A.I.’s potential to ‘literally rewrite history’. To say that such claims are exaggerated is not to deny that some types of A.I. may be beneficial to historians. For example, A.I. can help historians to decipher older texts that are damaged or faded. In one sense, this is part of a longer-term trend in the discipline: since at least the 1980s, historians have regularly used the latest computer technology to enhance their research. There is, therefore, nothing new in historians adopting new technologies.

I am a historian of the British Isles between 1300 and 1600. My research normally focuses on the political culture of the period, and I am often happy to use quantitative methods. As both a researcher and a teacher, I am frequently told about the potential value of artificial intelligence. This article is a personal reflection on the use of A.I. in research, specifically Microsoft Copilot, a readily available piece of software that seeks to summarise and analyse documents, and to produce visual materials that, in theory, can be used for teaching and research. I recently conducted an experiment with Copilot using a piece of research I had already undertaken. In sharing this experience, I aim to show how this particular A.I. tool does not fully grasp what historical analysis is, and why this should act as a warning for historians about the future of A.I. and historical research.

I recently decided to test Microsoft Copilot’s supposed ability to analyse files and produce relevant summaries. I used a Microsoft Excel spreadsheet that I had produced whilst researching an article on ‘Licenses to Retain in Tudor England’. The spreadsheet was based on entries in the patent rolls of the English crown which allowed recipients to bring into their service more individuals than were permitted under the existing retaining acts. The entries themselves were formulaic, enabling them to be recorded in a systematic way that allowed for some meaningful quantitative analysis (more on this later). The initial run was fairly basic: I asked Copilot to ‘analyse the following spreadsheet’, with my original spreadsheet dropped into the chat function. The initial analysis it gave me was a straightforward set of tables and statistics that could be gleaned from any database or spreadsheet programme. It did not feel particularly meaningful for a historian and was more a description of the spreadsheet’s contents. Nevertheless, it was a reasonable start for understanding these sources.

Copilot then suggested two ways forward: ‘Would you like a visual breakdown?’ and ‘What insights can be drawn from Edward VI’s reign?’ The first seemed pretty standard (and indicative of a constant desire to visualize everything), while the second seemed a better way to test Copilot’s analytical ability. The suggestion of Edward VI over his father and siblings can be explained by the fact that his reign had the highest number of licenses. Saying yes to both questions produced a further set of more refined tables. Copilot also presented me with the following sentences:

‘Knights overwhelmingly dominate the list, reflecting their central role in mid-Tudor military and administrative service.’

‘The Household and Servant roles were most common, indicating a strong emphasis on court service and domestic loyalty.’

‘The Duke of Somerset, Edward VI’s uncle and Lord Protector, stands out with two separate licenses of 200 retainers each—highlighting his exceptional political and military authority.’

There is nothing factually wrong with any of this, but there is also nothing particularly new or enlightening. Granted, in the days of index cards it would have taken a few hours to tally up the respective material; historians, however, have not worked with such methods for several decades. The statistics produced could easily have been generated in Microsoft Access (a program whose file type Copilot does not currently support) with only a little training required.

A colourful illustration depicting a lavishly dressed king sitting in a gold throne. Everything about the scene is slightly warped, for example, the king's face is slumped and he has several noses. He also has two hands at the ends of each of his arms.
‘Warped perception’ illustration by Jas Martin

Copilot’s response moved on to some comparisons between Edward VI and his siblings, before suggesting other medieval and early modern topics that could be discussed. To return the discussion to the initial source material, I asked Copilot what insights could be found about the mid-Tudor crisis. Once again, several general, but factually correct, points were made. However, amongst the bullet points were the following statements:

‘Edward’s licenses, by contrast, favoured reformist councillors and household men, aligning with the Protestant direction of his government’.

‘Mary’s licenses to Catholic loyalists and household officials reflect her efforts to reassert control and restore traditional authority after the Protestant reforms of Edward’s reign.’

At no point in my research for this article did I note the religious denomination of the recipients. Indeed, I remember a colleague joking with me that, by not including such information, I was not a ‘real’ Tudor historian. It is well known that sixteenth-century England was characterized in part by the religious changes of the Reformation. What these statements show is another example of A.I. telling you what you already know, something that has recently been aptly described as ‘history by confirmation bias’. After half an hour of queries on specific aspects of the spreadsheet, Copilot asked, ‘Would you like to turn these insights into a research note, teaching resource, or publication draft?’. This was unexpected, but I took the opportunity to test whether Copilot could write a journal article for me.

Copilot suggested The Journal of Economic History as a place for submission because of the quantitative nature of the evidence. It even produced a plausible, albeit bland, abstract, noting that the article would present new quantitative and qualitative data. In reality, I knew there was very little ‘economic’ history in my spreadsheet, so I asked Copilot for other possible journals. It came back with four other options: The Historical Journal, English Historical Review, Parliamentary History and History. Three of the four would have been plausible, but the material was only very tangentially related to parliament, making Parliamentary History an odd choice. For each suggested journal, Copilot provided a quick overview of its remit, along with a sentence on why my source material would be a good fit and what the angle of the article should be. None of this was inaccurate, and it concurred with the publishing advice often given to early career historians.

I decided to ask Copilot to prepare a publication for The Historical Journal. What it produced was laughable and, if submitted to an editor, would have been immediately rejected. I was presented with a PDF that was only eleven pages long, two of which were taken up entirely with the title and author’s details. Each main section was a short, nondescript paragraph, and the article as a whole was very brief, with no references except notes to three books, the most recent of which was published in 1991. There was no discussion of the historiography or any meaningful analysis of the sources in my spreadsheet.

A bright illustration depicting a battle scene with two soldiers facing one another on horseback. There is an elaborate border around the scene comprising mythological creatures and other soldiers, the features of which are all slightly warped.
‘The battle against A.I.’ illustration by Jas Martin

Ironically, the journal where I actually published my findings was not one of those suggested by Copilot, which is revealing about its limitations. I chose Continuity and Change because a key argument of my article was to link my earlier research on the fifteenth century to that on the sixteenth. I wanted to place the Tudor sources I was studying into a much longer historical context, hence submitting to a journal that emphasized change over time. Moreover, like any good history publication, my article was a collective endeavour, benefitting from the suggestions of peer reviewers and the editor. Broader comparisons are needed for good history. One reviewer suggested looking at comparative material on Habsburg Spain and at work on the ‘Military Revolution’, both of which helped in the framing of my argument. These suggestions for improvement came about organically, from reviewers taking the time to consider the wider implications of my research in ways that I had not. The ‘Military Revolution’ in particular is rarely associated with England, and it seems that A.I. is unable to break out of this ingrained narrative.

In light of this, it seems that we do not currently need to worry about A.I.-written publications. However, we cannot be complacent, as the technology may be moving faster than we think. In 1950, the mathematician and computer scientist Alan Turing proposed a test to consider whether machines could think like humans: if a person could not distinguish a conversation with a computer from one with a human, then the machine would have passed. Academics in 2025 may want to consider a similar test for A.I. publications: if the peer reviewer of a journal article cannot spot that it was written by A.I., and the A.I. can engage with their comments in a meaningful fashion, then such material could be published.

A vibrant illustration depicting a bunch of golden flowers, amongst which is a crushed tin can and other detritus.
‘Corrupted tapestry’ illustration by Jas Martin

The emergence of A.I. as the stuff of everyday life, rather than science fiction, has come at an awkward time for historians in the UK. This is particularly true for those working in higher education, which has been in crisis for several years now, with little sign of its financial worries abating any time soon. The Royal Historical Society is currently doing important work to advocate for the value of History, whilst the University and College Union is doing what it can to support the thousands of university staff facing the prospect of losing their jobs.

The current crisis in higher education in the UK is well known. The humanities have borne the brunt of it, not only in the UK but in places like the United States and Australia too. Students are often encouraged to take more ‘vocationally based’ subjects that are thought more likely to lead to jobs with higher salaries. In the UK, the consequence has been a seemingly never-ending tale of redundancies across the sector. These problems were not caused by the rise of A.I.; rather, they are the inevitable consequence of reforms to higher education funding enacted by the Cameron-Osborne government, followed by the lifting of student number caps. This is not just a crisis affecting university staff but also students, many of whom are working more hours than previous generations did.

In an era of squeezed budgets, excessive metrics and continued job losses, with fewer people being asked to do more, the dangers of A.I. become apparent. One recent STEM publication has suggested that Large Language Models (LLMs) such as Copilot are ideally suited to writing academic publications because writing them is a mechanical task, thus freeing academics up to do research. At the moment such thinking seems an outlier, but that may not be the case forever. There is always a danger that policy makers or university managers will start encouraging academics to use LLMs to write their publications as a time-saving measure. I am sceptical as to whether LLMs could write publications even in STEM, and I am certain this could not work for the humanities, where writing is an integral part of the research process.

My own experience demonstrates that writing a piece of history requires constant checking and rethinking, along with the benefit of insightful peer reviewers. Writing history is emphatically not simply reporting what a body of primary sources says. Historical analysis is fundamentally different from, and more complex than, producing the mass of visualizations and statistics that are the lifeblood of many A.I. programmes. Like any other piece of technology, A.I. is dependent on the assumptions made by its initial programmers, including their definition of an ‘analysis’. The response to my request to ‘analyse’ the spreadsheet was not quite what a historian would regard as an analysis. To put it another way: if this were an undergraduate assignment, it would fall in the 2:2 range, because it simply describes a source rather than producing any meaningful analysis.

It is clear that A.I. is going to become part of everyday life, and there are doubtless some areas where it will be of great use. Historians, however – particularly at a time of almost perpetual crisis for many of us – must be alert to the very real dangers of A.I. and its potential to simplify, and ultimately impoverish, the study of the past.
