The WildList Organization International
How Scientific Naming Works
Joe Wells, President & CEO
There's a weed growing in my back yard. I call it Lamb's Quarters.
It's a very common weed, and is also known as Goose Foot, Pig Weed, Sow Bane, and a few other things. I don't know what it's called in South Africa, Hungary, India, or Taiwan, but I'm sure it has lots and lots of names.
The weed also has a "correct" scientific name. Somewhere out there, in some museum or university, there is a sample of this weed with a tag tied to its toe that has Chenopodium album written on it. That sample is the "official" reference sample. This is true for all species known to modern biology, because the science of biology uses a sample-based system of naming. Anyone doing biological research has access to that reference sample to verify the species.
I'm not a biological researcher. I'm just an end user. I yank the weeds and mow the lawn. I call it Lamb's Quarters and don't care what it's called in Hungary.
However, in scientific, biological research, precise sample identification can be very important. For example, Filovirus.Ebola.Zaire has an 80 percent chance of killing you horribly, but Filovirus.Ebola.Reston (which looks identical) might give you a runny nose.
Imagine you work for the Centers for Disease Control. One day someone at the World Health Organization calls and says, "We have an outbreak of Ebola. Send us a sample." I hope you ask exactly which Ebola sample they need, and I hope you send the correct one.
In the field of computer viruses, getting the correct sample of a virus that is spreading on users' systems is also important. But, unlike in the life sciences, naming computer virus samples on a scientific basis is a major problem.
A well-known incident comes to mind in which a popular antivirus program happened to detect two viruses by the same name. One virus was merely a minor nuisance, while the other was destructive. Unfortunately, the name used for both was the name commonly used by antivirus products for only the more benign virus. When the company released a warning about the dangerous virus under the name commonly used for the benign one, much confusion followed.
Many other examples could be cited, but the point is that sample identification is much easier where there is a scientific, sample-based naming standard like the one used in the biological sciences.
Naming Problems in the WildList
For some time now, much effort has gone into giving the WildList a scientific, sample-based foundation. The goal has been to provide a reference sample for each in-the-wild virus to qualified persons who have a valid need (i.e. product developers who have users to protect). The idea is to get a sample from the reporter, replicate it, identify it, and give samples to those with a valid need.
This would appear to be a fairly simple task. Instead, it has been something more akin to a nightmare. While there continues to be some difficulty in getting samples from the participants, the real problem lies in identification, specifically in naming the sample.
I recently realized that this is because of a major blunder on my part. The mistake is not in the method of sample identification. It is in my presumed basis for sample naming. From the beginning, the WildList has had a column with the heading 'CARO Name of Virus'. That phrase is the basis of the problem.
There is no CARO virus collection. Therefore, CARO naming is not scientifically sample-based. That is, there is no single reference collection that any qualified researcher can access to verify a species by looking at the tag on its toe.
Some may object to this and point out that, while there is no single CARO collection, different CARO members maintain their own reference collections and that names and samples can indeed be matched. However, in real-world terms such an approach still falls short of a universal, scientific, sample-based naming standard. Why?
It cannot be a true standard because there are many trustworthy, bona fide antivirus product developers who have no access to any of these CARO-member collections. Therefore, as a sample-based system, CARO naming is, at best, a CARO-centric standard and cannot be represented as an industry-wide standard. And while CARO naming is CARO-centric, the WildList is not. The WildList reports for the antivirus industry.
In light of this, trying to conform an industry-wide reporting mechanism to such a limited naming system was wrong. To imply in the WildList that CARO names were the "correct" names was wrong.
Indeed, with all the divergence in virus naming in the industry, it would be baseless presumption for any one researcher, developer, or company (or group of researchers, developers, or companies) to claim their virus names are 'correct' and others are not.
The WildList is an antivirus product. (Actually, just free monthly updates.) But unlike other products, it's produced cooperatively by many competitors. From its beginning, the virus name column in the WildList should simply have said 'WildList virus name' or just 'virus name.' That way, as with any other antivirus product, the WildList name for any virus would be just one of (sometimes) many names.
Are some products' virus names wrong? Is Scan wrong? Is NAV wrong? Is the WildList wrong? No. None are wrong. They're just different. A sad state of affairs, perhaps, but one we've all lived with peacefully for some years now (well, except for those poor people in tech support).
In the absence of a universally available, comprehensive, scientific, reference-collection-sample-based (and over-hyphenated) system of computer virus naming, the WildList need only be a list of sample-based toe tags. A call-it-what-you-will-this-virus-is-in-the-wild list. WildList names cannot be expected to be more 'correct' than those of any other antivirus product.
In the antivirus world, there is no 'Chenopodium album' reference sample. So Lamb's Quarters is as 'correct' as Goose Foot. 'It may not have a scientific name, but it's sure strangling my lawn!'
Less Focus on Naming Issues
Fortunately, even in the absence of a universal sample-based virus naming system, identification does not depend on any "correct" naming scheme visible only from some illusory ivory tower.
Assume there is a new virus. F-Prot calls it foo.a, IBMAV calls it FBAR, Scan calls it foobar.d, and Findvirus calls it foo.mp.b. Who's right? No one is. Different doesn't mean "wrong." The bottom line is that it's just one virus, regardless of what they call it.
Now assume they all report it to the WildList and provide a sample. It is the responsibility of the WildList Organization to spot this as a single virus, but it is also our job to tie a tag on its toe. Say it's a new virus. Most scanners don't detect it yet, and the ones that do use different names. Ian Whalley hasn't added it to VGrep yet. We've said there is no "correct" name, so what do we write on the tag?
Bear in mind three things: First, whatever name appears in the WildList, it cannot be considered the "correct" name. Second, it doesn't matter what the virus is called as long as you can hold up a disk and say, "This virus is in the wild." Third (and by far the most important), the primary purpose of the WildList is to benefit end users.
From the perspective of the end-user, whatever virus name flashes on the screen is correct. They have that virus. Now assume product A has 40 percent of the antivirus market and product B has 2 percent. A lot more users will see product A's name for the virus. A's name is more correct to more users. And if well-known products C and D also use A's name, that name is preferable to B's regardless of any naming standard. From a user's perspective, B's name is less correct.
As a real-world example, the WildList used to use the old CARO name Stoned.Michelangelo.A. But people would look in the M's and I got lots of complaints that Michelangelo wasn't on the list. (In fact, people still look in the C's and ask me why Concept isn't on the list.)
In the past, CARO naming was actually more of a guideline in producing the WildList. The truth is that, even though the CARO name column was still there, when Shane Coursen and I were doing the WildList, we often used more of a majority approach. We would check the identified virus sample using VGrep and use the name most used by different scanners. We have also often kept the name given to a virus by the person who first reported it, especially when there is little agreement.
It's not that sample-based naming doesn't exist. It exists all over the place. All scanners are (hopefully) based on a virus collection and have a name for each sample (with some degree of obvious variation). Some even try to reflect the CARO names precisely. So, why not use one scanner to name viruses?
Since even CARO members' scanners vary widely in naming, we have specifically chosen not to use any single scanner in naming. Using one scanner would not only be an extremely unreliable way of identifying and naming viruses but (more importantly) could easily be construed as some kind of product endorsement. (I can see the boxes rolling out: 'The only antivirus system that the WildList Organization International uses.')
In light of the foregoing, it should be clear that it is not the job of the WildList Organization International to attempt the impossible: to name viruses 'correctly' or 'authoritatively.' If you view the WildList as an antivirus product, it can be no more 'correct' in its virus naming than any other antivirus product.
The primary purpose of the WildList is to report exactly which viruses are spreading in the wild, to collect samples of those viruses, and to provide the viruses to bona fide antivirus researchers and developers who need to have them in order to protect end users.
Naming the viruses 'correctly' is (aside from being impossible) unimportant to accomplishing this purpose and attempting to do so is actually detrimental to the process (i.e. a waste of time trying to conform and re-conform names to anyone's pet standard).
The main goal of efficiently identifying and delivering samples and the minor goal of accurately naming each sample often collide head-on. They collide because naming so often slows the whole process to a crawl, and it does so because naming is not standardized. Without a standard, naming cannot be accurate.
Since naming can be shown to be inexact, naming involves only a perceived (not actual) accuracy. So we must choose between an important factor, efficiency, and a time-wasting factor, pseudo-accuracy. True accuracy is in the identification, not the naming, of which viruses are in the wild. Efficiency and accuracy must work together in the sample identification, not in what we write on the toe tag.
Our focus then must be less on naming issues and more on accuracy of sample collation and identification.
Official WildList Position
There are no "correct" virus names. We use one name to designate each virus. The WildList name should not be considered the "correct" name in the sense that other names are wrong. We do not "endorse" any product or organization's naming convention. Each name in the WildList will represent a specific virus sample. Each sample virus is in the wild. Developers should replicate it and add it to their product. We gave it a name, but we don't care what the developer calls it. We just care that the developer protects their users.
What is our exact procedure? For naming the samples, we currently try to use the CARO name if it can be quickly verified. However, because CARO naming has no independent, scientific, reference sample basis, this is often not possible within our time constraints. Where it is not expeditious to pursue a CARO name, we try to use a majority-naming scheme based on what most products call the virus. If there is little agreement among products, or if the virus is not detected by most products, we use the name that the person who first reported the virus used. Moreover, names in any WildList may vary from previous lists where a stronger naming basis exists (e.g. more products detect it, or products conform their names).
Thus the WildList name may or may not reflect the CARO name. And whether it does or not, the WildList name cannot be considered the officially "correct" name.
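The naming order described above (CARO name if quickly verified, then majority naming, then the first reporter's name) can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not the Organization's actual tooling: the function name choose_wildlist_name, its inputs, the reporter name "Foobar", and the 50 percent threshold standing in for "most products" and "little agreement" are all hypothetical, since the procedure gives no numbers.

```python
from collections import Counter
from typing import Dict, Optional


def choose_wildlist_name(
    caro_name: Optional[str],
    product_names: Dict[str, Optional[str]],
    reporter_name: str,
    threshold: float = 0.5,
) -> str:
    """Pick the name to write on the toe tag of one identified sample.

    caro_name      -- CARO name if it was verified quickly, else None
    product_names  -- product -> name it reports (None = not detected)
    reporter_name  -- name used by the person who first reported the virus
    threshold      -- hypothetical cut-off standing in for "most products"
                      and "little agreement"; the procedure gives no number
    """
    # 1. Prefer a CARO name, but only when it could be verified quickly.
    if caro_name:
        return caro_name

    # 2. Majority naming: requires that most products detect the sample...
    detected = [name for name in product_names.values() if name]
    if detected and len(detected) / len(product_names) > threshold:
        # most_common(1) breaks ties in favour of the name seen first.
        name, votes = Counter(detected).most_common(1)[0]
        # ...and that most of the detecting products agree on one name.
        if votes / len(detected) > threshold:
            return name

    # 3. Little agreement or poor detection: keep the first reporter's name.
    return reporter_name


if __name__ == "__main__":
    # The four names from the foo/FBAR example: no two products agree.
    names = {"F-Prot": "foo.a", "IBMAV": "FBAR",
             "Scan": "foobar.d", "Findvirus": "foo.mp.b"}
    print(choose_wildlist_name(None, names, "Foobar"))  # prints "Foobar"
```

The threshold is the main judgment call in this sketch: raising it sends more viruses to the reporter's name, while lowering it lets a bare plurality of products decide the toe tag.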
The WildList Organization International is a non-profit, public interest, scientific research organization. Its primary purpose is to identify, track, and report the computer virus threat.