Skip to content

Fix api documentation inconsistencies#1699

Open
val-ms wants to merge 9 commits intoCisco-Talos:mainfrom
val-ms:fix-API-documentation-inconsistencies
Open

Fix api documentation inconsistencies#1699
val-ms wants to merge 9 commits intoCisco-Talos:mainfrom
val-ms:fix-API-documentation-inconsistencies

Conversation

@val-ms
Copy link
Copy Markdown
Contributor

@val-ms val-ms commented Mar 31, 2026

Summary

Review and fix libclamav/clamav.h API/documentation inconsistencies by
comparing the public header against the prototypes and implementations.

This PR includes both real API/behavior fixes and header comment
corrections found during that audit.

Changes

  • Fix cl_fmap_set_hash() to accept a hash buffer pointer instead of a
    single char.
  • Fix cl_hash_file_fd_ex() so length == 0 hashes to EOF as
    documented.
  • Fix legacy cl_scanfile() and cl_scanfile_callback() wrappers to
    return scanned / CL_COUNT_PRECISION again.
  • Correct clamav.h docs for:
    • cl_cvdverify_ex() signature verification behavior
    • cl_fmap_get_name() return semantics
    • cl_scandesc_ex() / cl_scanfile_ex() cross-references
    • stale cl_fmap_get_file_hash() references
    • cl_scandesc_ex() parameter description
    • cl_fmap_get_path() CL_EACCES description
  • Update NEWS.md for the cl_fmap_set_hash() API fix.

val-ms added 9 commits March 31, 2026 17:09
cl_fmap_set_hash declared the hash input as a single char,
but the implementation passes that value to fmap_set_hash() as if
it were a full hash buffer. That causes the API contract to be
wrong and can lead to invalid memory reads.

Change the public API and implementation to accept a hash buffer
pointer, validate that the pointer is not NULL, and update NEWS.md
to match the corrected signature.
Fix cl_hash_file_fd_ex so a length of zero hashes the rest of
the file as documented. The old loop reduced the read size to
zero immediately and returned the digest of an empty input.

Keep the existing bounded-read path for explicit lengths, but
leave the block size unchanged when length is zero so the
function reads until EOF.
Fix the cl_cvdverify_ex documentation to match the current
implementation. The header said CL_DB_UNSIGNED disables CVD
signature verification, but .cvd files are still verified.

Clarify that .cvd verification occurs for .cvd files and drop
the unsupported CL_DB_UNSIGNED note from this API comment.
Fix the legacy cl_scanfile wrappers to preserve their
original scanned count contract. They should report
scanned / CL_COUNT_PRECISION, but after cl_scanfile_ex was
added they started returning exact byte counts.

Apply the same conversion used by cl_scanmap_callback() in
cl_scanfile() and cl_scanfile_callback(), including the
32-bit saturation check on the scaled value.
Fix the cl_fmap_get_name header comment to match the public
API. The comment claimed the function returns a const char *,
but the prototype and implementation return cl_error_t and
write the name through an output parameter.

Update the return documentation to describe the actual status
code contract.
Fix incorrect cross-references in the scan API header
comments. The cl_scandesc_ex() and cl_scanfile_ex()
documentation described them as extensions of
cl_scanmap_callback(), which is the wrong legacy API.

Update each comment to refer to the matching legacy wrapper so
the public API relationships are documented correctly.
Fix stale hash API references in the scanmap header comments.
The comments pointed users to cl_fmap_get_file_hash(), but
that public function does not exist.

Update both references to the actual public API,
cl_fmap_get_hash(), so the post-scan guidance matches the
header.
Fix the cl_scandesc_ex parameter documentation to match the
actual API. The comment said callers must provide a file
descriptor or a map, but this function only accepts a file
descriptor.

Remove the stale map reference so the parameter description
matches the public prototype.
Fix the cl_fmap_get_path return documentation to match the
implementation. The comment said CL_EACCES means the map has
no file descriptor, but this getter actually returns CL_EACCES
when the map has no stored path.

Update the error description so callers see the correct
failure condition.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR audits libclamav/clamav.h against implementations and fixes several API/documentation mismatches, including a couple of behavior/API corrections in libclamav itself.

Changes:

  • Fix cl_fmap_set_hash() to accept a hash buffer pointer (instead of a single char) and update related docs/NEWS.
  • Fix cl_hash_file_fd_ex() behavior so length == 0 hashes to EOF as documented.
  • Fix legacy cl_scanfile() / cl_scanfile_callback() wrappers to return scanned / CL_COUNT_PRECISION again, and correct multiple header comment inconsistencies/cross-references.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
NEWS.md Updates the published API signature listing for cl_fmap_set_hash().
libclamav/scanners.c Restores legacy wrapper semantics for scanned (divide by CL_COUNT_PRECISION) and adjusts overflow logic accordingly.
libclamav/fmap.c Implements the cl_fmap_set_hash() signature change and null-checks the new pointer parameter.
libclamav/crypto.c Fixes cl_hash_file_fd_ex() to read to EOF when length == 0.
libclamav/clamav.h Corrects multiple API docs/cross-references and updates the cl_fmap_set_hash() prototype/docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread libclamav/fmap.c
Comment on lines +1458 to 1464
extern cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, const char *hash)
{
cl_error_t status = CL_ERROR;
cli_hash_type_t type;

if (!map || !hash_alg) {
if (!map || !hash_alg || !hash) {
status = CL_ENULLARG;
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cl_fmap_set_hash() now takes const char *hash, but fmap_set_hash() treats it as a binary digest buffer of length cli_hash_len(type) (not a NUL-terminated string). To avoid ambiguity and make the API self-describing, consider changing the parameter type (and docs) to const uint8_t * (or const void *) and explicitly documenting the required buffer length per algorithm.

Copilot uses AI. Check for mistakes.
Comment thread libclamav/fmap.c
}

status = fmap_set_hash((fmap_t *)map, (uint8_t *)&hash, type);
status = fmap_set_hash((fmap_t *)map, (uint8_t *)hash, type);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast drops const ((uint8_t *)hash) even though the hash buffer is only read (copied via memcpy in fmap_set_hash). Prefer updating fmap_set_hash() to accept const uint8_t * (and/or casting to const uint8_t * in the call) to preserve const-correctness and avoid compiler warnings.

Suggested change
status = fmap_set_hash((fmap_t *)map, (uint8_t *)hash, type);
status = fmap_set_hash((fmap_t *)map, (const uint8_t *)hash, type);

Copilot uses AI. Check for mistakes.
Comment thread libclamav/clamav.h
Comment on lines 686 to 692
* @param map The file map to modify.
* @param hash_alg The hash algorithm to use (e.g., "md5", "sha1", "sha2-256").
* @param hash The hash value to set.
* @param hash Pointer to the hash value to set.
* @return cl_error_t CL_SUCCESS if the hash was successfully set.
*/
extern cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, char hash);
extern cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, const char *hash);

Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new cl_fmap_set_hash(..., const char *hash) prototype/documentation is ambiguous: the hash value is a binary digest buffer (may contain NUL bytes), not a C string. Consider using const uint8_t * (or const void *) and documenting that the buffer must be at least the digest length for hash_alg (e.g., 16 bytes for md5, 32 for sha2-256).

Copilot uses AI. Check for mistakes.
Comment thread libclamav/clamav.h
Comment on lines +1962 to 1967
* For `.cvd` files, this also verifies the CVD digital signature.
*
* @param file Filepath of CVD file.
* @param certs_directory Directory containing CA certificates required to verify the CVD digital signature.
* @param dboptions Bitmask of flags to modify behavior.
* Set CL_DB_FIPS_LIMITS to require the CVD to be signed with a FIPS-compliant external '.sign' file.
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cl_cvdverify_ex() still supports CL_DB_UNSIGNED (and other APIs in this header document it), but this doc block no longer mentions it. Please restore/update the documentation to reflect that CL_DB_UNSIGNED disables digital signature verification (and clarify how that interacts with the “also verifies signature” statement).

Suggested change
* For `.cvd` files, this also verifies the CVD digital signature.
*
* @param file Filepath of CVD file.
* @param certs_directory Directory containing CA certificates required to verify the CVD digital signature.
* @param dboptions Bitmask of flags to modify behavior.
* Set CL_DB_FIPS_LIMITS to require the CVD to be signed with a FIPS-compliant external '.sign' file.
* For `.cvd` files, this also verifies the CVD digital signature by default.
* Signature verification can be disabled by setting CL_DB_UNSIGNED in @p dboptions.
*
* @param file Filepath of CVD file.
* @param certs_directory Directory containing CA certificates required to verify the CVD digital signature.
* @param dboptions Bitmask of flags to modify behavior.
* Set CL_DB_FIPS_LIMITS to require the CVD to be signed with a FIPS-compliant external '.sign' file.
* Set CL_DB_UNSIGNED to disable digital signature verification, even for `.cvd` files
* (if CL_DB_UNSIGNED is set, no FIPS signature checks are performed).

Copilot uses AI. Check for mistakes.
Comment thread libclamav/crypto.c
Comment on lines +712 to +721
size_t read_size = blocksize;

if (length != 0) {
read_size = MIN(read_size, length - byte_read);
}

#ifdef _WIN32
nread = _read(fd, block, blocksize);
nread = _read(fd, block, read_size);
#else
nread = read(fd, block, blocksize);
nread = read(fd, block, read_size);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows, _read() takes an unsigned int byte count, but read_size is size_t. This can trigger warnings / truncation concerns. Consider making read_size an unsigned int under _WIN32 (or casting with an explicit bounds check) so the type matches _read’s signature.

Copilot uses AI. Check for mistakes.
Comment thread libclamav/scanners.c
if (SIZEOF_LONG == 4 && scanned_bytes > UINT32_MAX) {
if ((SIZEOF_LONG == 4) &&
(scanned_bytes / CL_COUNT_PRECISION > UINT32_MAX)) {
cli_warnmsg("cl_scanfile_callback: scanned_bytes exceeds UINT32_MAX, setting to UINT32_MAX\n");
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow check now compares (scanned_bytes / CL_COUNT_PRECISION) > UINT32_MAX, but the warning message still says scanned_bytes exceeds UINT32_MAX and is tagged cl_scanfile_callback even in cl_scanfile(). Please update the log text to reference the correct function and units (bytes/CL_COUNT_PRECISION) so the warning is accurate and actionable.

Suggested change
cli_warnmsg("cl_scanfile_callback: scanned_bytes exceeds UINT32_MAX, setting to UINT32_MAX\n");
cli_warnmsg("cl_scanfile: scanned_bytes/CL_COUNT_PRECISION exceeds UINT32_MAX, setting scanned to UINT32_MAX\n");

Copilot uses AI. Check for mistakes.
Comment thread NEWS.md
cl_error_t cl_fmap_get_fd(const cl_fmap_t *map, int *fd_out, size_t *offset_out, size_t *len_out);
cl_error_t cl_fmap_get_size(const cl_fmap_t *map, size_t *size_out);
cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, char hash);
cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, const char *hash);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the cl_fmap_set_hash() public prototype is adjusted (e.g., to const uint8_t * to indicate a binary digest buffer), please keep this NEWS.md API listing in sync so it accurately reflects the shipped signature.

Suggested change
cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, const char *hash);
cl_error_t cl_fmap_set_hash(const cl_fmap_t *map, const char *hash_alg, const uint8_t *hash);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants