Out of sorts —
“Willoughby-Hoye” scripts used OS call that caused incorrect measurements on Linux, Catalina.
In a paper published October 8, researchers at the University of Hawaii reported that a programming error in a set of Python scripts commonly used for computational analysis of chemistry data caused the scripts to return different results depending on the operating system they were run on, throwing doubt on the results of more than 150 published chemistry studies. While trying to analyze results from an experiment involving cyanobacteria, the researchers (Jayanti Bhandari Neupane, Ram Neupane, Yuheng Luo, Wesley Yoshida, Rui Sun, and Philip Williams) discovered significant variations in results computed from the same nuclear magnetic resonance (NMR) spectroscopy data.
The scripts, called the “Willoughby-Hoye” scripts after their authors—Patrick Willoughby and Thomas Hoye of the University of Minnesota—were found to return correct results on macOS Mavericks and Windows 10. But on macOS Mojave and Ubuntu, the results were off by nearly a full percent.
The reason for the variation was the scripts’ use of Python’s glob module, which searches for files matching a specific name pattern. The scripts generated a list of input files to read based on the glob results, but the module depends on the operating system for the order in which the files are returned, and the results of the scripts’ calculations are affected by the order in which the files are processed. Rui Sun and Philip Williams wrote corrected sorting code that fixes the problem, ensuring consistent results.
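The ordering problem described above can be illustrated with a minimal Python sketch. This is not the University of Hawaii team's actual fix; the function name `list_input_files` and the `conf_*.out` file names are hypothetical, chosen only for illustration. It shows the general remedy of sorting glob results before using them, so the processing order no longer depends on the operating system or filesystem.

```python
import glob
import os
import tempfile


def list_input_files(directory, pattern="*.out"):
    """Return matching paths in a deterministic (sorted) order.

    glob.glob() makes no promise about the order of its results, so the
    list is sorted explicitly instead of relying on the OS.
    The function name and default pattern are hypothetical.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demonstration with throwaway files (names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    # Create the files in a deliberately non-alphabetical order.
    for name in ("conf_b.out", "conf_a.out", "conf_c.out"):
        open(os.path.join(tmp, name), "w").close()

    names = [os.path.basename(p) for p in list_input_files(tmp)]
    print(names)  # ['conf_a.out', 'conf_b.out', 'conf_c.out']
```

A plain `sorted()` call orders names lexicographically, so `file10` would come before `file2`; scripts that rely on numeric suffixes would likely need a sort key that parses the number.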
Patrick Willoughby, now an assistant professor of chemistry at Ripon College, acknowledged the research’s findings and the correction made to the scripts by the University of Hawaii team in a post to Twitter:
Really great find by Rui and Prof. Williams. When I wrote the scripts 6 years ago, the OS was able to handle the sorting. Rui and Williams added the necessary sort code and added a function to ensure the calcs were properly aligned. Kudos!
— Patrick Willoughby (@pat_willoughby) October 8, 2019
Williams told The Register that he believes between 150 and 160 research projects may have been affected by the bug. While the variations the University of Hawaii team found in data didn’t impact the results of their work, they may have had a more significant impact on other studies. Williams said he hopes the paper will get scientists to pay more attention to the computational side of experiments in the future.