Artificial Intelligence and Images: Investigative Tool or Evidence?

Artificial intelligence (AI) is in the news almost daily. It is being used in many contexts, some for salutary purposes and some for more insidious reasons. When considering the intersection of AI and image-based evidence in criminal investigations and prosecutions, it is important to draw a distinction between the use of AI for investigative purposes and its role as evidence in the courtroom. The value that AI may generate as a source of investigative leads is evaluated differently from its value as proof. This article discusses the value of AI in the preliminary evaluation of images and the caution that must be applied so that AI-generated investigative results are not conflated with evidential proof.

Though the use of AI applies in both the domestic and international criminal justice arenas, I will focus on its use in international criminal investigations and prosecutions. In this context, a considerable amount of image-based evidence is recorded and submitted to investigative authorities by a large and diverse pool of professional and non-professional photographers. To centralize the collection of such evidence, in May 2023 the Office of the Prosecutor (OTP) of the International Criminal Court (ICC) launched OTP Link, which it describes as an advanced evidence submission platform. It is intended to be used for the online and email-based submission of evidence by external stakeholders and witnesses. This repository is separate from the large volume of images that are posted to social media and other open source sites, some of which come to the attention of OTP investigators and analysts.

Though the OTP has some in-house capacity to receive, process, and evaluate submitted media via OTP Link, there are clear practical and legal challenges to overcome. In this article, I will address only one of those issues – image volume and the role of AI in identifying and assessing images. The jurisdiction of the ICC is considerable, and it is difficult to predict how much media will be submitted to an OTP repository rather than or in addition to social media and other open source sites. Given the willingness of many people to capture evidence of atrocities, especially in situations where most people have smartphones (e.g., the Russian invasion of Ukraine and the current Israel-Palestine conflict), the volume is reasonably expected to be high.

To cope with the volume of image-based evidence, the OTP must ensure that it has sufficient qualified personnel to administer OTP Link and must amass the requisite software and hardware needed to perform these functions. The OTP already employs investigative analysts and investigators who search the internet for relevant open source media and cause it to be downloaded by in-house cyber experts for examination. A dedicated repository such as OTP Link will accentuate the need to increase the resources currently available, though even with additional personnel the volume of images and recordings requiring assessment will likely be larger than human reviewers could ever manage. The potential for a veritable treasure trove of evidence is undeniable, and institutions such as the OTP are only beginning to grasp the task before them. The human labour and expense required to evaluate such a large collection of images would be prohibitive, to say nothing of the human cost incurred by the people required to watch images of horrific events.

Under these circumstances, there is a vital role for machine learning and AI to play in doing some of the preliminary work that humans would otherwise be tasked with doing. The solution is not necessarily to work harder but rather to work better and more efficiently. Machine learning is the science of programming computers so that, through exposure to relevant data, they can arrive at logical conclusions about the world. Algorithms, which are series of rules or processes performed sequentially, are used to sort through large amounts of data and search for targeted commonalities, e.g., vehicles, buildings, landforms, people, or events, often from multiple sources and perspectives. Being able to identify additional media sources and perspectives can help to ameliorate the epistemic challenges that may be faced when only sparse imagery is available and when its context may be equivocal. A single source may favour one party’s narrative, but additional sources may call that narrative into question. Context is often critical.

Machine learning occurs when computer scientists give the system training data from which it can learn patterns; the system then uses that newly acquired knowledge to examine questioned data. Used in conjunction with computer vision, which is the analysis of digital images to detect objects and related scenes (e.g., colour, shadow, and light analysis; geometrical analysis of curves and edges; and photogrammetry), machine learning assists with the identification of relevant image content and the generation of corresponding investigational leads. Computer scientists are currently developing machine learning and AI products that are designed to automate the evaluation and categorization of images for later review by a human analyst.

For example, researchers at Carnegie Mellon University in the US have developed a machine learning video analysis system called Event Labelling through Analytic Media Processing (E-LAMP), which is designed to detect specified objects, sounds, speech, and events in video recordings, potentially reducing to days work that might otherwise have taken years. E-LAMP first learns from training videos that depict the objects or events of interest to researchers and then examines a larger collection of non-training videos to look for examples of the targeted content. It returns a set of videos that it judges to match the targeted content to the E-LAMP operator, who confirms whether E-LAMP is correct or not, and using this feedback the computer examines more videos. The result is the development of a “classifier” or event kit, which can be visual, aural, semantic, or a combination of the three, and which is then used to search for targeted content in large video collections. It can also be used to detect duplicate or near-duplicate images.
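The operator feedback cycle described above can be illustrated with a short Python sketch. This is not E-LAMP's actual implementation: it assumes each video has already been reduced to a small numeric feature vector, it substitutes a simple nearest-centroid score for a real classifier, and every name in it (`feedback_loop`, `analyst_confirms`, and so on) is hypothetical.

```python
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(vector, positive_centre, negative_centre):
    """Higher when the vector sits closer to confirmed hits than to misses."""
    return (squared_distance(vector, negative_centre)
            - squared_distance(vector, positive_centre))


def feedback_loop(training_pos, training_neg, collection,
                  analyst_confirms, rounds=3, batch=2):
    """Toy human-in-the-loop search (hypothetical, for illustration only).

    collection maps a video id to its feature vector; analyst_confirms is a
    callable standing in for the human operator's yes/no decision.
    Returns the set of video ids the analyst confirmed.
    """
    positives = list(training_pos)
    negatives = list(training_neg)
    unreviewed = dict(collection)
    confirmed = set()
    for _ in range(rounds):
        if not unreviewed:
            break
        pos_c, neg_c = centroid(positives), centroid(negatives)
        # Rank the videos not yet reviewed, best machine match first.
        ranked = sorted(unreviewed,
                        key=lambda vid: score(unreviewed[vid], pos_c, neg_c),
                        reverse=True)
        for vid in ranked[:batch]:
            vector = unreviewed.pop(vid)
            # The human decision, not the machine score, settles each item,
            # and each decision refines the next round of ranking.
            if analyst_confirms(vid):
                confirmed.add(vid)
                positives.append(vector)
            else:
                negatives.append(vector)
    return confirmed
```

The point of the sketch is the division of labour: the computer only ranks candidates, while the operator's confirmation or rejection both decides each video and feeds back into the next round.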

E-LAMP has thus far proven most useful as a filter for removing irrelevant video from an analyst’s workflow and for identifying videos that bear further examination. It has not developed to the point where it can generate a collection of relevant videos that do not require significant analyst evaluation.[1] The algorithms’ results are probabilistic rather than conclusive. Thus, a human analyst would still be required to evaluate images identified by the computer to see if they bear any relationship to one another and are relevant to the investigation. The computer would be trained to look for certain types of image content and the analysts would be trained to evaluate the proffered images, with the primary goal being to reduce, though not eliminate, the need for human-centred manual review.

E-LAMP’s efficacy is dependent upon the quality of the video images selected for evaluation and the refinement of its algorithms. It will likely not replace human knowledge and judgment. Further, E-LAMP does not assess the authenticity of the images (that work must still be performed by an imagery analyst), nor does it compensate for poor-quality images or biased perspectives. The real value in machine learning is finding potential evidence that might otherwise be overlooked due to sheer volume. Nonetheless, analysts would need to guard against automation bias, which is the tendency to favour results generated by a computer over those generated otherwise, irrespective of the error rate.

One of the more common domestic uses for machine learning and computer vision is the detection of child pornography. It also plays a role in the detection of deepfakes. This technology is significant, and as it improves it should be of tremendous assistance to investigators, but it should be viewed as an investigative tool, not a method of proof. E-LAMP and similar tools are just that – tools. They do not generate new evidence in isolation; rather, they wade through vast swaths of media and identify images that require human analysis. The identification and evaluation of relevant image content through machine learning may form part of the case narrative that will require expert analysis and other corroborative testimony. Similarly, facial recognition software may identify similarities between questioned and known faces, but the computer-generated comparisons are not the evidence. Rather, they provide the basis upon which a facial recognition expert can evaluate the images and render an expert opinion, which can then be presented as evidence before the court.

Undertaking a robust machine learning approach to media repositories, which is contemplated by OTP Link, requires significant technical and human resources. AI plays a very important role in the preliminary identification and assessment of potential image-based evidence. It helps to separate images of potential value from a much wider collection of irrelevant or duplicative media, but AI is not and cannot be viewed as the final arbiter of what will become evidence. Analysts must review AI-generated results and identify images that are believed to be relevant to the investigation. Further, AI does not authenticate image-based evidence. Imagery experts must be used to undertake the often-complex task of determining whether images are authentic and thus worthy of consideration as evidence. This three-step approach – AI scanning, sifting, and isolation of images from large media repositories; analyst evaluation and confirmation; and imagery expert authentication via content-based and metadata analysis – is essential in the search for the truth in image-based evidence. AI is integral but is only one step in a much larger investigational and evidential process.
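The three-step approach can be sketched as a simple Python workflow. This is an illustrative assumption rather than a description of OTP Link or any existing system: the `MediaItem` record, the machine score, the 0.5 threshold, and the two decision callbacks are all hypothetical. The sketch also removes only byte-for-byte duplicates using a SHA-256 hash; detecting near-duplicates would require a different technique.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    item_id: str
    content: bytes
    machine_score: float  # assumed output of an upstream classifier, 0.0 to 1.0
    history: list = field(default_factory=list)  # record of what happened to it


def stage1_machine_filter(items, threshold=0.5):
    """Step 1: drop exact duplicates and low-scoring items (hypothetical)."""
    seen_digests = set()
    flagged = []
    for item in items:
        digest = hashlib.sha256(item.content).hexdigest()
        if digest in seen_digests:
            item.history.append("dropped: exact duplicate")
            continue
        seen_digests.add(digest)
        if item.machine_score < threshold:
            item.history.append("dropped: below machine threshold")
            continue
        item.history.append("flagged by machine")
        flagged.append(item)
    return flagged


def stage2_analyst_review(items, analyst_finds_relevant):
    """Step 2: a human analyst confirms or rejects each flagged item."""
    relevant = []
    for item in items:
        if analyst_finds_relevant(item):
            item.history.append("confirmed relevant by analyst")
            relevant.append(item)
        else:
            item.history.append("rejected by analyst")
    return relevant


def stage3_expert_authentication(items, expert_authenticates):
    """Step 3: an imagery expert decides whether each item is authentic."""
    authenticated = []
    for item in items:
        if expert_authenticates(item):
            item.history.append("authenticated by expert")
            authenticated.append(item)
        else:
            item.history.append("not authenticated by expert")
    return authenticated


def triage(items, analyst_finds_relevant, expert_authenticates, threshold=0.5):
    """Run the three steps in order; only items passing all three remain."""
    flagged = stage1_machine_filter(items, threshold)
    relevant = stage2_analyst_review(flagged, analyst_finds_relevant)
    return stage3_expert_authentication(relevant, expert_authenticates)
```

In this sketch a high machine score never suffices on its own: an item reaches the final list only after both human decisions, and the `history` field keeps a record of where every other item was set aside.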

Note – the role of AI in the identification and assessment of image-based evidence is one of many topics that are covered in much greater detail in my upcoming book Image-Based Evidence in International Criminal Prosecutions: Charting a Path Forward, being published by Oxford University Press, and expected to be released in March 2024.

[1] See Aronson, J.D. (2018). Computer Vision and Machine Learning for Human Rights Video Analysis: Case Studies, Possibilities, Concerns, and Limitations. Law and Social Inquiry, 43(4), 1188-1209.

Jonathan W. Hak KC PhD
