Wednesday, January 4, 2012

PVS-Studio – the Greatest Trial of the 21st Century

The Installation

John Carmack says Visual C++ developers should try PVS-Studio – “painless demo download”. Yes, the demo download is indeed painless, and the installation is a simple click-through. Well, not exactly – first you have to close all running instances of Visual Studio, but after that it's a simple click-through. And then you get the reduced functionality.

The Trial Limitations

The functionality is not reduced as in “only the first episode of Quake until you pay, Sir” – it is done in a much more clever way: the analyzer output is slightly garbled.

While the analyzer scans through code it emits messages like “in file X.cpp on line Y there's this odd thing”. Some messages will look like that, but some will read “file X.cpp [AND I WON'T TELL YOU WHICH LINE – TRIAL RESTRICTION]” which looks kind of silly but is in fact a very good greediness-usability tradeoff.

Later versions added a bonus – messages that do contain ungarbled line numbers can be double-clicked, which opens the problematic file in the VS IDE editor and scrolls right to the problematic line. And with it came an extra layer of greediness – once you double-click, you're presented with a modal dialog showing a progress bar that runs for about 15 seconds, and until it finishes you're not allowed at the code.

That's a very cleverly engineered greediness-usability tradeoff. You can't ask 3.5K euros for a piece of software without showing it first, and you can't allow the full-blown version to be used right off the shelf without proof of purchase. With these limitations the program is still mostly usable, but its commercial use is effectively prevented. Think of running the analysis during a daily build and being told you have “a problem in file X.cpp at [WON'T TELL YOU WHICH LINE]” – you'd need to hire Hercule Poirot just to deduce where the warning belongs, because the file is three thousand lines long and you don't even know whether the warning reports an actual problem.

Now we get to actual (trial) use.

The Good

First of all, the amount of really weird subtle stuff the program can find in real code is amazing.

It will look through the mumbo-jumbo of some bitwise “or” of a dozen Win32 API flags and note that SHITTY_FLAG_PROHIBIT_DIRECTORIES is used twice in that bitwise “or”. That's not the kind of problem a human can reliably find, but software does it just fine.
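
A minimal sketch of the pattern, with invented flag names standing in for real Win32 constants:

    enum {
        SHITTY_FLAG_ALLOW_FILES          = 1 << 0,
        SHITTY_FLAG_FOLLOW_LINKS         = 1 << 1,
        SHITTY_FLAG_PROHIBIT_DIRECTORIES = 1 << 2,
        SHITTY_FLAG_CASE_SENSITIVE       = 1 << 3
    };

    const unsigned openFlags = SHITTY_FLAG_ALLOW_FILES
                             | SHITTY_FLAG_FOLLOW_LINKS
                             | SHITTY_FLAG_PROHIBIT_DIRECTORIES
                             | SHITTY_FLAG_CASE_SENSITIVE
                             | SHITTY_FLAG_PROHIBIT_DIRECTORIES; // same flag twice – harmless
                                                                 // here, but a sure sign the
                                                                 // author meant something else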

It will look at some very old code and see that you try to “delete” a smart pointer. Who would “delete” a smart pointer in the first place? Well, your code would, because that was a raw pointer before refactoring, and all but one occurrence were edited during the refactoring, and the problem wouldn't manifest itself because that code actually ran once a year, when a runtime error occurred during a full moon, and when that unlikely combination of events took place the program would crash nastily, but the affected user couldn't reliably reproduce it and so couldn't file a useful report – he'd just remember that your program crashed on him a couple of times.
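
A sketch of how that leftover looks. OldSmartPtr here is an invented stand-in for the kind of pre-C++11 smart pointer (with an implicit conversion to the raw pointer) that very old codebases tend to carry:

    template <typename T>
    class OldSmartPtr {
        T* p_;
        OldSmartPtr(const OldSmartPtr&);            // non-copyable, pre-C++11 style
        OldSmartPtr& operator=(const OldSmartPtr&);
    public:
        explicit OldSmartPtr(T* p = 0) : p_(p) {}
        ~OldSmartPtr() { delete p_; }
        operator T*() const { return p_; }          // implicit conversion – the enabler
    };

    struct Resource {};

    void onRareRuntimeError(OldSmartPtr<Resource>& res)
    {
        delete res; // compiles via the implicit conversion and frees the object now;
                    // the destructor frees it again later – a nasty double delete
    }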

It will note that you have two enumerations and a switch where the expression being switched on uses members of one enumeration while the case labels use values from the other. That code would work for years until you altered one enumeration but failed to alter the other.
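
A minimal sketch with invented enumerations:

    enum Color  { ColorRed, ColorGreen, ColorBlue };
    enum Status { StatusNew, StatusReady, StatusDone };

    int describe(Color c)
    {
        switch (c) {                // switching on a Color...
        case StatusNew:   return 1; // ...with Status labels; compiles, since both
        case StatusReady: return 2; // degrade to plain ints, and "works" exactly
        case StatusDone:  return 3; // as long as the values happen to line up
        default:          return 0;
        }
    }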

It will find numerous other very weird pieces of code that are completely legal C++ but for whatever reason don't make sense and often constitute an error. Look into the “General Analysis” section of the online manual – the list is pretty impressive, and most of those problems will indeed reside and stay dormant in commercial software that has been shipping for years.

Once analysis is complete it's not uncommon to think “How the F did this program ship with that many defects?”

The authors will often present such examples and use them as proof of the tool being essential – they claim it finds crap in code at very low cost. This is a very bold claim and needs careful verification.

Have you noticed that this post has been very excited and optimistic so far? Well, let's go to the dark side.

And Then It Goes Wrong

The key to any automated analysis tool is that it should fail as little as possible. This means exactly the following. Suppose the program can detect cases where you have a long bitwise “or” and two components of that “or” are the same.

The program

1. should emit a relevant warning for every occurrence of such a case, and
2. should not emit that warning anywhere else.

The truth is PVS-Studio is a program, and like every usable program it contains its fair share of bugs. Yes, a program for finding bugs can and often will contain bugs.

A short digression is needed here. Compilers also contain bugs (a lot of them, and they are nasty), and that leads to compilation unexpectedly failing or to the emitted program code not behaving as the language Standard prescribes. If you don't realize that – get out of the industry and go to a local McDonalds outlet right now; they often have “help needed” signs on display. Digression ends here.

So PVS-Studio contains bugs. Those bugs sometimes lead to warnings not being emitted. Like

1. you debug or review your program for an hour and
2. see that there's a bug caused by a situation the PVS-Studio documentation lists a warning for, but
3. PVS-Studio does not emit that warning when presented with the code.

That's the real world, where all programs have bugs.

This area (no warning where a warning should be) is quite problematic – to estimate how many warnings PVS-Studio fails to emit, one would have to somehow analyze the code by hand, and that's incredibly time-consuming and sometimes just impossible for any large codebase.

Again, the authors are not to be blamed here. They fix many reported bugs right away, and all users (trial users included) should report those bugs promptly.

Currently the weakest link is that templates are not fully supported, so warnings are not always emitted for suspicious pieces of code if those pieces are inside a template. This is not a minor problem.

Non-templated code is final – you can gather a dozen senior-dev wise owls and together read through the code and conclude that it is okay. Templated code is not final – it is not actual code until you fully parameterize it. And btw, templates can be parameterized with other templates. So you can have a five-layer templated apple pie (like a vector (a template class) storing some smart pointers (also a template class) and using some custom allocator (also a template class), with something else templated as well) and some really problematic joint between the layers that could lead to a singularity developing into a black hole – very hard to diagnose with wise owls, and exactly where an automated tool would be of great help. So not having full template support is not a minor problem at all.
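
A hedged sketch of the failure mode (the function is invented): the copy-pasted condition below is exactly what the analyzer catches in plain code, but until the template is instantiated it isn't really code yet, so a partially template-aware analyzer may never look at it:

    #include <cstddef>
    #include <vector>

    template <typename T>
    bool rangeIsValid(const std::vector<T>& v, std::size_t lo, std::size_t hi)
    {
        return lo < v.size() && lo < v.size(); // meant "hi < v.size()" – may go
                                               // undiagnosed inside an
                                               // uninstantiated template
    }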

Also, bugs in PVS-Studio will sometimes lead to a warning being emitted where there's no problem even formally – the program will look at something and say “you have a function parameter passed by copy” where the parameter is in fact passed by reference, or something equally irrelevant. Such bugs are usually fixed by the authors very fast, and users should of course report them promptly. This happens quite rarely and is not that much of a problem.

And Then No-one Knows Whether It Went Wrong

And finally, after “warnings not emitted where they should be” and “warnings emitted where they definitely shouldn't be”, there's a giant grey area where code looks suspicious and it's impossible to say whether it contains a problem without further analysis. For example, 4 is the number of bytes in a 32-bit “int” and 32 is the number of bits in a 32-bit “int”. So when you use either of those numbers it might be that you're manipulating a 32-bit number byte-wise or bit-wise, and then your code is unportable – technically it's impossible to know unless you analyze the surrounding code.
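
A hedged illustration (both functions invented) – only the surrounding code can tell whether 4 and 32 encode the size of a type or something entirely unrelated:

    #include <cstddef>

    std::size_t fieldOffset(std::size_t index)
    {
        return index * 4; // 4 == sizeof(int)? breaks once the field becomes 64-bit;
                          // 4 == bytes per RGBA pixel? fine forever
    }

    unsigned lowBitsMask(unsigned width)
    {
        return ~0u >> (32 - width); // 32 == bits of a fixed on-wire format? fine;
                                    // 32 == "bits in a machine word"? unportable
    }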

Such grey-area warnings are emitted very often for various portability cases (the “Viva64” warnings group). The program can definitely do much better.

For example, there will be a switch with different numbers being returned from different case labels – say, for computing a weight coefficient for each element depending on the element's state, like “for new elements return 1, for partially prepared return 2, for super prepared return 3, etc.” Now if the numbers 4 and 32 appear in that sequence, a warning is emitted about a “dangerous magic number used”.
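
A sketch of that perfectly uniform switch (the states are invented):

    enum ElementState { New, PartiallyPrepared, SuperPrepared, Cooked, Served };

    int weightOf(ElementState s)
    {
        switch (s) {
        case New:               return 1;
        case PartiallyPrepared: return 2;
        case SuperPrepared:     return 3;
        case Cooked:            return 4; // flagged: "dangerous magic number used"
        case Served:            return 5; // ...while 1, 2, 3 and 5 pass in silence
        default:                return 0;
        }
    }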

Wow, The Mighty Program. Surely when the numbers 1 through 32 are used in a uniform way, only 4 and 32 are dangerous and may be used by Chuck Norris alone, while the other numbers are not dangerous and can be used by anyone. People only get screwed by abusing 4 and 32 and never by abusing 13765 – Oprah says so in every show.

This can be improved – the program could identify this and other reasonable use-cases and not emit a warning for them. This is only one example, but there are many of those, and they all can be improved. The program contains a really impressive copy-paste detection technology, and that technology could be used to recognize legitimate cases in the grey area.

To be fair, PVS-Studio's rate of false warnings is not that high – try Visual C++ /analyze, which emits a separate warning for each time a header is included into a translation unit; that's what's called “barely usable”. Of course, comparing well to a nearly unusable tool doesn't automatically make PVS-Studio brilliant – both have to improve.

And the Evil Warning Suppression

You'd perhaps object that those grey area warnings are inevitable and that's what warning suppression is for. If you really believe that, the nearest McDonalds outlet is preparing a “help needed” sign for you right now.

I should have regressed to ALL CAPS here, because it's the single most important thing in the whole post.

Warning suppression is to be used as a last resort only, not as a casual thing. The reason is that it damages your code.

Suppose you have an imaginary warning V999 emitted on some line where there's actually no problem and you decide to suppress it. You have to add a “//-V999” comment on that line. Done.
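
The mechanics look innocent enough – a sketch with an invented function:

    void process(const int* data, int count); // invented for illustration

    void f(const int* data, int count)
    {
        process(data, count); //-V999 – V999 is now silenced on this line, forever
    }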

Now you're screwed.

Whenever you put a “//-V999” warning suppression comment onto a line of code, the V999 warning is no longer emitted on that line, no matter what. Sooo....

Each time you edit that line of code you have to reevaluate whether the same warning would now be emitted for some other reason. You have to drop the comment, re-verify the code, and likely put the comment back. Good luck if you have more than one warning suppressed on the same line. You won't die, but your life is no luxury anymore.

This means that you have to document precisely what each suppressed warning was about. Otherwise, with the next analyzer upgrade it may happen that the warning is no longer emitted for the original piece of code (the “warning where everything is okay” bug got fixed in the analyzer) but is now emitted for some other, unrelated problem (remember, weird stuff hides in codebases for ages). If you just glance, conclude “okay, only V999 is emitted, as before” and put the comment back, you effectively suppress another occurrence of the warning. Good luck with that too. Again, you won't die. Or maybe you will.

And of course, if a line contains both a problematic thing that should be diagnosed with V999 and a harmless thing that triggered a false V999, and you suppressed V999 for that line, then once the analyzer improves and can detect the real problem, there will still be no warning.
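
A sketch of that trap (both functions invented):

    int badThing(int);      // contains the real problem V999 should catch one day
    int harmlessThing(int); // triggered the false V999 that got suppressed

    int f(int x, int y)
    {
        return badThing(x) + harmlessThing(y); //-V999 placed for the false positive,
                                               //      it now also mutes the real one
    }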

So effectively you have to reevaluate all suppressed warnings after each analyzer upgrade so that you don't miss that billion-dollar bug being reported.

That's all for the technical part. The program is indeed very technically advanced – it detects really stupid things in real code where you least expect them, both in junior developers' crappy code and in senior developers' well-tested, long-shipping code. As Carmack says, you will find bugs.

Finally Computers Do What They Are Good At

It's worth noting that a lot of the defects in almost every real codebase are due to copy-paste. The same operand used twice around an operation in a long expression. The same code in both the “if” and the “else” branch. Different functions implemented identically. This actually happens in actual code that has been shipping for years.
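
Hedged sketches of those three shapes (all names invented):

    enum Mode { Fast, Slow };
    void runFast();
    void runSlow();

    bool inWindow(int x, int left, int right)
    {
        return x > left && x > left; // meant "x < right" – same operand twice
    }

    void run(Mode mode)
    {
        if (mode == Fast)
            runFast();
        else
            runFast();               // meant runSlow() – identical branches
    }

    int minOf(int a, int b) { return a < b ? a : b; }
    int maxOf(int a, int b) { return a < b ? a : b; } // copied from minOf, never edited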

Detecting such stuff is what computers can be very good at. “Dumb” robots just browse code and find patterns very reliably (unless there's a bug – see above) – not something a human can do at reasonable speed and with reasonable reliability.

This reminds me of the good old days of Windows 95.

If you're old enough to have actually used floppy disks (rectangular things storing 1.44 decimal megabytes) you might have also heard of ancient people using modems to connect their computers to the internet (btw, back then it was considered proper to start the word “internet” with a capital “I”). So Windows 95 contained an API and a minimalistic interface for that – you had to enter the ISP modem pool phone number, the username and the password, click “Connect” and wait a bit.

And then... If someone was talking over the same line (typical if you share the place you live in), or if there was no dial tone (typical on rather old phone lines), or if the ISP number was busy (typical at peak load hours), you would be shown an error message and the connection would fail. And if a connection was established and then closed for whatever reason, then again an error message would be displayed right in your face.

And the minimalistic interface would not make the slightest move to do something about it. A human would just re-dial, but the program showed the error message and stopped.

This could not last long. Numerous programs replacing that interface and reusing the built-in API emerged. They would (optionally) re-dial if the line was busy, they would (optionally) re-dial if the connection was lost, and they would try several numbers in turn (the original UI only had a place for one number). The world was saved.

Why is this long story here? Because PVS-Studio is not an example of software designed by a brain-dead person like the one described right above. It uses computer power for searching for and pinpointing crap in code, not for making people feel miserable with endless modal error messages. The technology is finally applied right. That's the most important achievement of the authors.

TL;DR; Do I License It Yet?

Now we come to the business applicability part. The truth is PVS-Studio will not find a gazillion bugs, and that might even disappoint you. Don't fall into despair too fast.

Also don't fall prey to the multiplication trick. People will say something like this:

“Suppose one bug discovered by static analysis would cost you X money if it got to the customers, so it takes N such bugs over the duration of the license for the license to pay off, and after that it even makes you more efficient.”

Number X is usually quite high and number N is usually quite low, so licensing a static analysis tool looks like a no-brainer.

Hold on.

A Brief Trial is a Failed Trial

Try the tool thoroughly (something like analyzing a million lines of code over the course of a month at a very slow pace – a small portion like 50K lines of code at a time). Evaluate each warning to find out whether it reports a bug, and then estimate what it takes to trigger that bug – the number of bugs that ever actually get triggered is very low, something like no more than five per million lines of code unless your code is really crappy. Most of the bugs reported by the program will not manifest themselves in code that has been shipping for as long as three years to hundreds of thousands of users and has been used to process millions of input datasets.
Meanwhile the same codebase will contain a much higher ratio of other problems – ones that automatic analysis will hardly diagnose even ten years from now, but that would be triggerable and would cause real problems to real users.

So do you have to license a tool that
1. costs a fortune,
2. doesn't find every occurrence of the situations it claims to find,
3. reports a lot of warnings that need further analysis, and
4. requires lots of discipline and a very developed technical culture?

You have to decide this for yourself. That's what the trial is for.

There's no silver bullet. It's not like you license the program and now magically your code is free of bugs. No. The tool will sometimes spot some problematic code, and someone on your team will have to deal with that – maybe fix the code, maybe suppress the warning, maybe write a bug report to the analyzer authors.

Brushing Teeth Twice a Day Sums Up to a Lot of Time

The inverse of the multiplication trick is the continuous integration trick. To make the most of the tool you have to use it at all times, not just twice a year.

The program even contains an “incremental analysis” option that runs analysis on modified files after each compilation, so that defects are reported as early as possible. This is just great – defects no longer slip even into the daily build. This also means they are never counted – how would one file a defect report for a piece of code that was fixed before it was even committed to version control? The improvement is great by itself, but its efficiency is very hard to measure, and so it also prevents making a fact-based decision on whether the tool is worth the money.

Of course, continuous integration also means that you have to address all the new warnings reported by the tool on every build, not twice a year. That's the other side of “continuous”.

The deeper the analysis goes, the easier it is to dismiss such an expensive tool. Yes, the tool finds crap really well. There is just not enough data to reliably back the claim that it finds crap at very low cost.

The best way to think of static code analysis is to compare it to version control. Version control won't work immediately after you install it – you have to teach people how to use it, when and what to commit, how to describe commits, how to properly tag and branch and merge – and once all that is set up your workflow improves, but estimating how much more efficient you became is not that easy.

Granted, there are high-quality version control systems under free licenses – not so for static code analysis.

The Outcome

So once again.

The program is just great and will likely become even better as development progresses.

The same program is not oxygen, and neither is it water. You decide whether it is actually useful for your business workflow and whether it is worth the time and money.