Forensic Data Science
Calibration and Validation of Forensic-Evaluation Systems
Selected Publications
- Morrison G.S. (2021). In the context of forensic casework, are there meaningful metrics of the degree of calibration? Forensic Science International: Synergy, 3, article 100157.
https://doi.org/10.1016/j.fsisyn.2021.100157
- Morrison G.S., Enzinger E., Hughes V., Jessen M., Meuwly D., Neumann C., Planting S., Thompson W.C., van der Vloed D., Ypma R.J.F., Zhang C., Anonymous A., Anonymous B. (2021). Consensus on validation of forensic voice comparison. Science & Justice, 61, 299–309.
https://doi.org/10.1016/j.scijus.2021.02.002
- Morrison G.S., Neumann C., Geoghegan P.H., Edmond G., Grant T., Ostrum R.B., Roberts P., Saks M., Syndercombe Court D., Thompson W.C., Zabell S. (2021). Reply to Response to Vacuous standards – subversion of the OSAC standards-development process. Forensic Science International: Synergy, 3, article 100149.
https://doi.org/10.1016/j.fsisyn.2021.100149
- Morrison G.S., Neumann C., Geoghegan P.H. (2020). Vacuous standards – subversion of the OSAC standards-development process. Forensic Science International: Synergy, 2, 206–209.
https://doi.org/10.1016/j.fsisyn.2020.06.005
- Morrison G.S., Ballantyne K., Geoghegan P.H. (2018). A response to Marquis et al. (2017) What is the error margin of your signature analysis? Forensic Science International, 287, e11–e12.
https://doi.org/10.1016/j.forsciint.2018.03.009
- Morrison G.S., Poh N. (2018). Avoiding overstating the strength of forensic evidence: Shrunk likelihood ratios / Bayes factors. Science & Justice, 58, 200–218.
https://doi.org/10.1016/j.scijus.2017.12.005
- Morrison G.S., Enzinger E. (2018). Score based procedures for the calculation of forensic likelihood ratios – Scores should take account of both similarity and typicality. Science & Justice, 58, 47–58.
https://doi.org/10.1016/j.scijus.2017.06.005
Consensus on validation of forensic-comparison systems in the context of casework – European Network of Forensic Science Institutes’ 2nd AFORE Webinar on The Validation of Analytical Methods in Forensic Science
Recording of presentation by Geoffrey Stewart Morrison originally live-streamed 2021-12-01
- Over a series of rounds of drafting and meetings in 2019–2020, a group of authors developed a consensus on validation of forensic voice comparison. Group members included individuals who had knowledge and experience of validating forensic-voice-comparison systems in research and/or casework contexts, and individuals who had actually presented validation results to courts. They also included individuals who could bring a legal perspective on these matters, and individuals with knowledge and experience of validation in forensic science more broadly. Although the scope was forensic voice comparison, with minor wording changes the resulting statement of consensus would be applicable to validating source-comparison systems in any branch of forensic science. The scope was validation for the purpose of demonstrating whether, in the context of specific cases, forensic-comparison systems that output likelihood ratios are (or are not) good enough for their output to be used in court. In this presentation, I provide an overview of the statement of consensus and underlying concepts. I also discuss my reflections on broader issues related to validation and standards/guidelines.
Symposium on Calibration in Forensic Science – Forensic Data Science Laboratory, Aston University
Recordings of symposium originally live-streamed 2021-06-03
- Abstract
In the first decade of the 2000s, procedures and statistical models were developed for calibrating the likelihood-ratio output of automatic-speaker-recognition systems. These procedures and models were quickly adopted for calibrating the likelihood-ratio output of human-supervised-automatic forensic-voice-comparison systems. Since at least the early 2010s, recommendations have been made to use the same calibration procedures and models in other branches of forensic science. Interest in doing this is now growing. Published examples can be found in the context of multiple branches of forensic science, including fingerprints, DNA, mRNA, glass fragments, and mobile telephone colocation. There are also published examples of the use of these procedures and models to calibrate human judgements. The 2021 Consensus on validation of forensic voice comparison and the Forensic Science Regulator of England & Wales’s 2021 Development of evaluative opinions both recommend/require the use of calibration.
This symposium brings together some of the leading researchers in the calibration of the likelihood-ratio output of automatic-speaker-recognition systems and of forensic-evaluation systems. They explain what calibration is and why it is important. They present algorithms used for calibrating likelihood-ratio systems, and metrics used for assessing the degree of calibration of likelihood-ratio systems. They discuss aspects of calibration on which there is consensus, aspects on which there is disagreement, and aspects requiring additional research. They also discuss how to encourage wider adoption of calibration of likelihood-ratio systems in forensic practice.
- Introduction
Roberto Puch-Solis
Forensic Data Science Laboratory, Aston University
Slides
https://www.youtube.com/watch?v=Z9EpvowD7Zc
- Calibration in forensic science
Geoffrey Stewart Morrison
Forensic Data Science Laboratory, Aston University
In the first decade of the 2000s, procedures and statistical models were developed for calibrating the likelihood-ratio output of automatic-speaker-recognition systems. These calibration procedures and models were quickly adopted for calibrating the likelihood-ratio output of human-supervised-automatic forensic-voice-comparison systems. They were adopted in both research and casework. The 2021 Consensus on validation of forensic voice comparison recommended that “In order for the forensic-voice-comparison system to answer the specific question formed by the propositions in the case, the output of the system should be well calibrated” and that “forensic-voice-comparison system should be calibrated using a statistical model that forms the final stage of the system”. Since at least the early 2010s, recommendations have been made to use the same calibration procedures and models in other branches of forensic science. Interest in doing this is now growing. Published examples can be found in the context of multiple branches of forensic science, including fingerprints, DNA, mRNA, glass fragments, and mobile telephone colocation. There are also published examples of the use of these procedures and models to calibrate human judgements. In this presentation I answer the questions: What is calibration? Why is it important? and How is it performed? I also discuss how this approach to calibration relates to the calibration requirements in the Forensic Science Regulator of England & Wales’s 2021 appendix to the Codes of Practice and Conduct: Development of evaluative opinions.
Slides
https://youtu.be/Z9EpvowD7Zc?t=371
- Calibration in automatic speaker recognition
Luciana Ferrer
Instituto de Ciencias de la Computación, Universidad de Buenos Aires – CONICET
Most modern speaker verification systems produce uncalibrated scores at their output. Although these scores contain valuable information to separate same-speaker from different-speaker trials, their values cannot be interpreted in absolute terms – they can only be interpreted in relative terms. A calibration stage is usually applied to convert scores to useful absolute measures that can be interpreted, and that can be reliably thresholded to make decisions. In this presentation, I review the definition of calibration and explain its relationship with Bayes decision theory. I then present ways to measure quality of calibration, discuss when and why we should care about it, and show different methods that can be used to fix calibration when necessary.
Slides
https://www.youtube.com/watch?v=KRLNBk4Q-dA
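The calibration stage described in these presentations is commonly implemented as logistic-regression calibration: an affine map from raw score to log likelihood ratio, fitted to same-source and different-source training scores. A minimal numpy sketch of that idea follows; the function names, the gradient-descent fitting, and the synthetic score distributions are illustrative assumptions, not taken from the presentations.

```python
import numpy as np

def fit_calibration(ss_scores, ds_scores, lr=0.3, n_iter=2000):
    """Logistic-regression calibration: fit (a, b) so that
    a * score + b approximates the natural-log likelihood ratio.
    The two classes are weighted equally so the fitted log-odds
    contain no prior term and can be read directly as log LRs."""
    s = np.concatenate([ss_scores, ds_scores])
    y = np.concatenate([np.ones(len(ss_scores)), np.zeros(len(ds_scores))])
    w = np.where(y == 1, 0.5 / len(ss_scores), 0.5 / len(ds_scores))
    a, b = 1.0, 0.0
    for _ in range(n_iter):  # gradient descent on weighted cross-entropy
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        g = (p - y) * w
        a -= lr * np.sum(g * s)
        b -= lr * np.sum(g)
    return a, b

def score_to_llr(score, a, b):
    """Map raw score(s) to calibrated natural-log likelihood ratio(s)."""
    return a * np.asarray(score, dtype=float) + b

# Synthetic example: same-source scores ~ N(2, 1), different-source ~ N(0, 1),
# for which the true mapping is llr(s) = 2s - 2.
rng = np.random.default_rng(0)
a, b = fit_calibration(rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))
```

After fitting, a score well above the same-source mean maps to a positive log LR (supporting the same-source proposition) and a score below the different-source mean maps to a negative log LR, which is what "interpretable in absolute terms" means in this context.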
- Calibration in forensic voice comparison
Daniel Ramos
AUDIAS Lab, Universidad Autónoma de Madrid
In this presentation, I describe the role of calibration in forensic voice comparison, focusing on the use of automatic systems in a Bayesian decision framework. I describe computation of calibrated likelihood ratios in the context of scenarios and recording conditions typically encountered in forensic casework. I present algorithms commonly used for calibration. I also discuss the importance of calibration in the process of validating forensic-voice-comparison systems, and discuss recommendations and guidelines published by the European Network of Forensic Science Institutes (ENFSI).
Slides
https://www.youtube.com/watch?v=xDAGZMFlxdo
- Measuring calibration of likelihood-ratio systems
Peter Vergeer
Netherlands Forensic Institute
In this presentation, I explain what constitutes well-calibrated probabilities and well-calibrated likelihood ratios. I briefly describe graphical representations for assessing degree of calibration. I then focus on several metrics designed to assess degree of calibration, and present the results of a study comparing the performance of different metrics. Three metrics are taken from the existing literature, and one is a novel metric. One existing metric is based on the expected value of different-source likelihood-ratio values and the expected value of the inverse of same-source likelihood-ratio values (after Good, 1985), another is based on the proportion of different-source likelihood ratios above 2 and the proportion of same-source likelihood ratios below 0.5 (after Royall, 1997), and the third is Cllr^cal (Brümmer & du Preez, 2006). The novel metric is devPAV (Vergeer et al., 2021).
Slides
https://www.youtube.com/watch?v=LQEif1XP3-o
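Two of the quantities mentioned in this abstract can be sketched briefly: Cllr, and its discrimination-only counterpart Cllr^min, obtained after optimal order-preserving recalibration by the pool-adjacent-violators (PAV) algorithm; the calibration loss Cllr^cal is then Cllr minus Cllr^min. The following is a minimal numpy sketch under stated assumptions (natural-log LR inputs, equal class weighting, and an eps clamp to keep recalibrated log LRs finite), not the presenters' reference implementation.

```python
import numpy as np

def cllr(ss_llr, ds_llr):
    """Log-likelihood-ratio cost (Brummer & du Preez, 2006), in bits.
    Inputs are natural-log LRs for same-source / different-source pairs.
    0 is perfect; 1 matches a system that always outputs LR = 1."""
    ss = np.asarray(ss_llr, dtype=float)
    ds = np.asarray(ds_llr, dtype=float)
    return 0.5 * (np.mean(np.log2(1.0 + np.exp(-ss)))
                  + np.mean(np.log2(1.0 + np.exp(ds))))

def pav(y, w):
    """Pool-adjacent-violators: weighted non-decreasing isotonic regression."""
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        m, s, c = yi, wi, 1
        while means and means[-1] >= m:  # merge blocks that violate monotonicity
            m = (means[-1] * weights[-1] + m * s) / (weights[-1] + s)
            s += weights.pop(); c += counts.pop(); means.pop()
        means.append(m); weights.append(s); counts.append(c)
    return np.concatenate([np.full(c, m) for m, c in zip(means, counts)])

def cllr_min(ss_llr, ds_llr, eps=1e-6):
    """Cllr after optimal order-preserving recalibration by PAV.
    The calibration loss is Cllr^cal = cllr(...) - cllr_min(...)."""
    llr = np.concatenate([ss_llr, ds_llr])
    lab = np.concatenate([np.ones(len(ss_llr)), np.zeros(len(ds_llr))])
    w = np.where(lab == 1, 0.5 / len(ss_llr), 0.5 / len(ds_llr))  # equal priors
    order = np.argsort(llr)
    p = np.clip(pav(lab[order], w[order]), eps, 1.0 - eps)
    opt = np.log(p / (1.0 - p))  # optimally recalibrated log LRs
    lab_sorted = lab[order]
    return cllr(opt[lab_sorted == 1], opt[lab_sorted == 0])
```

A system that always outputs LR = 1 (log LR = 0) scores exactly Cllr = 1, and for any validation set Cllr^min never exceeds Cllr, so Cllr^cal is non-negative.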
- Panel Discussion
Moderator: Rolf J.F. Ypma
Principal Scientist, Netherlands Forensic Institute
Forensic Data Science Laboratory, Aston University
The presenters discuss aspects of calibration on which there is consensus, aspects on which there is disagreement, and aspects requiring additional research. They also discuss how to encourage wider adoption of calibration of likelihood-ratio systems in forensic practice.
https://www.youtube.com/watch?v=y5v0HVRmbKw
http://forensic-data-science.net/calibration-and-validation/
This webpage is maintained by Geoffrey Stewart Morrison and was last updated 2022-01-23