If you have followed the last developement in C/C++ static analysis
tools you must have heard of PVS-Studio. I heard of them through the
articles they publish on their site where they analyze open source
projects. They have analyzed quite big projects including the Linux
kernel, Qt, Unreal, … and they have always managed to find crazy
bugs that have been siting there for some time, undetected. Typos, bad
copy-paste, undefined behaviours, non-sense code, syntax errors that
miraculously still compile… As John Carmack said:
Everything that is syntactically legal that the compiler will accept
will eventually wind up in your codebase.
Unfortunately, the tool is advertized as Windows-only. The program
comes in the form of a Visual Studio plugin or a separate independent
program if you don't have the former. I first used it back in
2014 on a relatively large C++ codebase used internally in the
computer graphics department of my university in Lyon (LIRIS). We were
using Visual Studio (which I normally rarely use) so I thought I should
give it a try. I was really pleased with the results and kept checking
the PVS-Studio website for more articles.
Two years and several PVS-Studio articles later I started working on
Samba. The whole project is about 2 millions lines of C code and I
thought it would be a good candidate for PVS-Studio. A static analysis
tool shouldn't have too much platform-specific code so I started
thinking about it. The analyzer works on preprocessed code so it needs
to run the preprocessor on your sources and for that it needs all your
preprocessor flags, macros and includes path. Gathering this
automatically can be painful. For this step I wrote a strace-based
script that "spies" your build tool for compiler calls, that way it
should be build-tool agnostic. You can find the latest version of this
tool on github.
I sent the script to the PVS-Studio guys and after some back and
forth, I was given an experimental Linux build of PVS-Studio (thanks
again!). The script now covers all the analyzing process from
gathering compiler flags, to analyzing, displaying and filtering
the results.
Here's how you use it.
In order to not have to point to the license and binary at every use
you can set up env variables.
$ export PVS_LICENSE=~/prog/pvs/PVS-Studio.lic
$ export PVS_BIN=~/prog/pvs/PVS-Studio
Go to your project directory and generate a config file for your
C++11 project.
$ pvs-tool genconf -l C++11 pvs.cfg
If you need to configure the build before building, do it. Then trace
the actual build (your build command should go after the --
).
$ pvs-tool trace -- make -j8
This will output a "strace_out" file which have all the information we
need. The analyze step will process that file to extract all
compilation units and preprocessor flags, and run PVS-Studio on it.
$ pvs-tool analyze pvs.cfg
pvs-tool: deleting existing log pvs.log...
001/061 [ 0%] analyzing /hom../rtags/src/ClangIndexer.cpp...
002/061 [ 1%] analyzing /hom../rtags/src/CompilerManager.cpp...
003/061 [ 3%] analyzing /hom../rtags/src/CompletionThread.cpp...
004/061 [ 4%] analyzing /hom../rtags/src/DependenciesJob.cpp...
<...>
061/061 [98%] analyzing /hom../rtags/src/rp.cpp...
pvs-tool: analysis finished
pvs-tool: cleaning output...
pvs-tool: done (2M -> 0M)
The cleaning part removes duplicated lines and will drastically reduce
the file size of big results.
You can now view the results, grouped by files
$ pvs-tool view pvs.log
The output is similar to gcc/make so it works as-is in e.g. the Emacs
editor and I can use my usual builtin goto-error functions. You can
disable diagnostics e.g.
$ pvs-tool view -d V2006,V2008 pvs.log
By default it only shows level 1 errors but you can change it with -l.
You can look at the -h help messsage for more.
PVS-Studio found many problems in Samba. Most of them were false positives
but this is expected when you use any static analysis tool on large
codebase. The important thing is it also found real bugs. I'm going to
share the most interesting ones along with their fix, in the form of
diffs.
- if (memcmp(u0, _u0, sizeof(u0) != 0)) {
+ if (memcmp(u0, _u0, sizeof(*u0)) != 0) {
printf("USER_MODALS_INFO_0 struct has changed!!!!\n");
return -1;
}
Here, the closing parenthesis was misplaced. The result of the
sizeof
comparaison was used as the compared memory size (always 1
byte). Also, we want the size of the type u0
points to, not the size
of the pointer.
handle_main_input(regedit, key);
update_panels();
doupdate();
- } while (key != 'q' || key == 'Q');
+ } while (key != 'q' && key != 'Q');
Here, we want to exit the loop on any case of the letter 'q'.
uid = request->data.auth.uid;
- if (uid < 0) {
+ if (uid == (uid_t)-1) {
DEBUG(1,("invalid uid: '%u'\n", (unsigned int)uid));
return -1;
}
Here we tested the uid_t
type for negative values.
The sign of the uid_t
type is left unspecified by POSIX. It's defined as
an unsigned 32b int on Linux, therefore the < 0
check is always
false.
For unsigned version of uid_t
, in the comparaison uid == -1
the
compiler will implicitely cast -1 to unsigned making it a valid test
for both signed and unsigned version of uid_t
. I've made the cast
explicit because less magic is better in this case.
DEBUG(4,("smb_pam_auth: PAM: Authenticate User: %s\n", user));
- pam_error = pam_authenticate(pamh, PAM_SILENT | allow_null_passwords ? 0 : PAM_DISALLOW_NULL_AUTHTOK);
+ pam_error = pam_authenticate(pamh, PAM_SILENT | (allow_null_passwords ? 0 : PAM_DISALLOW_NULL_AUTHTOK));
switch( pam_error ){
case PAM_AUTH_ERR:
DEBUG(2, ("smb_pam_auth: PAM: Authentication Error for user %s\n", user));
Simple operator priority error.
gensec_init();
dump_args();
- if (check_arg_numeric("ibs") == 0 || check_arg_numeric("ibs") == 0) {
+ if (check_arg_numeric("ibs") == 0 || check_arg_numeric("obs") == 0) {
fprintf(stderr, "%s: block sizes must be greater that zero\n",
PROGNAME);
exit(SYNTAX_EXIT_CODE);
Here the test was doing the same thing twice.
if (!gss_oid_equal(&name1->gn_type, &name2->gn_type)) {
*name_equal = 0;
} else if (name1->gn_value.length != name2->gn_value.length ||
- memcmp(name1->gn_value.value, name1->gn_value.value,
+ memcmp(name1->gn_value.value, name2->gn_value.value,
name1->gn_value.length)) {
*name_equal = 0;
}
Here memcmp
was called with the same pointer, thus comparing the
same region of memory with itself.
ioctl_arg.fd = src_fd;
ioctl_arg.transid = 0;
ioctl_arg.flags = (rw == false) ? BTRFS_SUBVOL_RDONLY : 0;
- memset(ioctl_arg.unused, 0, ARRAY_SIZE(ioctl_arg.unused));
+ memset(ioctl_arg.unused, 0, sizeof(ioctl_arg.unused));
len = strlcpy(ioctl_arg.name, dest_subvolume,
ARRAY_SIZE(ioctl_arg.name));
if (len >= ARRAY_SIZE(ioctl_arg.name)) {
Here memset
was given the size as a number of elements instead of a byte size.
if (n + IDR_BITS < 31 &&
- ((id & ~(~0 << MAX_ID_SHIFT)) >> (n + IDR_BITS))) {
+ ((id & ~(~0U << MAX_ID_SHIFT)) >> (n + IDR_BITS))) {
return NULL;
}
Using negative values on the left-side of a left-shift operation is an
Undefined Behaviour in C.
if (cli_api(cli,
param, sizeof(param), 1024, /* Param, length, maxlen */
- data, soffset, sizeof(data), /* data, length, maxlen */
+ data, soffset, data_size, /* data, length, maxlen */
&rparam, &rprcnt, /* return params, length */
&rdata, &rdrcnt)) /* return data, length */
{
Here data
used to be a stack allocated array but was changed to a heap
allocated buffer without updating the sizeof
use.
goto query;
}
- if ((p->auth.auth_type != DCERPC_AUTH_TYPE_NTLMSSP) ||
- (p->auth.auth_type != DCERPC_AUTH_TYPE_KRB5) ||
- (p->auth.auth_type != DCERPC_AUTH_TYPE_SPNEGO)) {
+ if (!((p->auth.auth_type == DCERPC_AUTH_TYPE_NTLMSSP) ||
+ (p->auth.auth_type == DCERPC_AUTH_TYPE_KRB5) ||
+ (p->auth.auth_type == DCERPC_AUTH_TYPE_SPNEGO))) {
return NT_STATUS_ACCESS_DENIED;
}
Prior to this fix, the condition was always true and the function
always returned "access denied".
- Py_RETURN_NONE;
talloc_free(frame);
+ Py_RETURN_NONE;
}
Py_RETURN_NONE
is a macro that hides a return statement. In this
python binding many functions were returning before freeing heap
allocated memory. This problem was present in dozens of functions.
int i;
- for (i=0;ARRAY_SIZE(results);i++) {
+ for (i=0;i<ARRAY_SIZE(results);i++) {
if (results[i].res == res) return results[i].name;
}
return "*";
Here the for condition was always true.
int create_unlink_tmp(const char *dir)
{
+ if (!dir) {
+ dir = tmpdir();
+ }
+
size_t len = strlen(dir);
char fname[len+25];
int fd;
mode_t mask;
- if (!dir) {
- dir = tmpdir();
- }
-
Here the dir
pointer was used before the null-check.
Overall I'm really pleased with PVS-Studio and I would recommend
it. Unfortunately it's not officially available on Linux. Although you
can just contact them if you're interested it seems :)