Skip to content

Conversation

@imaami
Copy link
Owner

@imaami imaami commented Aug 24, 2024

This PR is a work in progress.

Summary by CodeRabbit

  • New Features

    • Introduced a JSON parsing library supporting strings, numbers, and keywords with detailed error reporting.
    • Added a test harness for validating JSON parsing functionality.
    • Implemented a command-line utility to convert UTF-8 strings to Unicode code points with configurable error handling and output formatting.
    • Added enhanced UTF-8 parsing function supporting safe parsing up to buffer end.
    • Added comprehensive De Bruijn sequence analysis tools with cycle detection, permutation handling, and property navigation.
    • Added 64-bit De Bruijn sequence decomposition and identification utilities with run-length encoding and validation.
    • Added a bitwise operations header with efficient bit counting and rotation functions.
    • Added utilities for reversing bits in integers of various sizes.
    • Introduced robust string-to-64-bit integer parsing with error handling.
    • Added a test program for bit-reversal functionality.
  • Build Enhancements

    • Expanded build configuration to support new stochar and test-json build targets, improving testing and utility capabilities.
    • Updated CI workflows to provide verbose output during build processes for better diagnostics.
  • Documentation

    • Added new header files defining structures, enums, and interfaces for JSON parsing, internal parsing utilities, and UTF-8 parsing enhancements, improving code organization and maintainability.
    • Introduced headers for De Bruijn sequence handling, bitwise operations, and file input abstractions, enhancing modularity and platform compatibility.
    • Updated utility macros for cross-compiler compatibility.
  • Version Control

    • Updated .gitignore to exclude additional directories related to new source and test files.
  • Other Improvements

    • Refined file input functions to optionally support null-terminated buffers for text files.
    • Replaced file reading function calls in tests with updated interface supporting text mode.

@coderabbitai
Copy link

coderabbitai bot commented Aug 24, 2024

Walkthrough

The changes introduce a JSON parsing library with partial implementation and detailed error handling, including a test program to validate parsing. The build system is updated to add a test-json target and a new stochar utility for converting UTF-8 strings to Unicode code points in a standardized format. The GitHub Actions workflow is modified to enable verbose build output. The .gitignore is updated to exclude new source directories. The new code includes internal parsing utilities, UTF-8/UTF-32 conversion logic, and command-line argument processing for the new tools. Additionally, a new UTF-8 parsing function with explicit buffer end handling is added. A new file implements De Bruijn sequence analysis and cycle detection logic. File input functions are modified to accept a text flag controlling null termination, with corresponding renames. Compiler attribute macros are conditionally defined for MSVC compatibility, and a new macro maps integer types to unsigned counterparts. Bit counting and rotation utilities are added for 32- and 64-bit integers. Support for 64-bit De Bruijn sequences is added with loading, searching, validation, and identification functions. A new tool analyzes 64-bit De Bruijn bitcount decompositions. A new header provides bitwise reversal functions for various integer sizes. String-to-64-bit integer parsing with error handling is introduced, along with a test program for bitwise reversal.

Changes

Cohort / File(s) Change Summary
GitHub Workflow
.github/workflows/c-cpp.yml
Added V=1 flag to all make commands for verbose build output across debug and release builds on multiple OS runners.
Ignore Configuration
.gitignore
Added /src/stochar and /src/test-json directories to ignore list.
Build System
src/Makefile
Added stochar, test-json, and test-rev targets; expanded TARGETS and defined source/debug files for these targets.
JSON Parser
src/json.c, src/json.h, src/json_internal.h, src/test-json.c
Added JSON parser implementation, parsing interface, internal utilities, and a test program invoking the parser.
UTF-8 Utility
src/stochar.c
Added UTF-8 to Unicode code point conversion utility with command-line options for error handling and input processing.
UTF-8 Parsing Extension
src/utf8.c, src/utf8.h
Added utf8_parse_next_code_point2 function and declaration for UTF-8 parsing with explicit buffer end pointer support.
De Bruijn 25-bit Analysis
src/debruijn25.c
Added De Bruijn sequence analysis implementation with cycle detection, permutation handling, and property navigation.
Bit Operations
src/bits.h
Added bit counting functions for runs of set/unset bits from MSB/LSB for 32- and 64-bit integers with compiler intrinsics.
File Input API
src/file.c, src/file.h, src/test.c
Modified file input functions to accept a text flag controlling null termination; renamed functions accordingly.
De Bruijn 25-bit Data and API
src/b25.c, src/b25.h
Added 32-bit binary De Bruijn sequence data structures, analysis functions, and constants for cycle information.
Compiler Attributes and Macros
src/util.h
Added conditional compilation for attribute macros for MSVC compatibility; added macro mapping integer types to unsigned.
De Bruijn 26-bit Support
src/b26.c, src/b26.h
Added 64-bit binary De Bruijn sequence support with loading, searching, validation, and identification functions.
De Bruijn 26-bit Bitcount Tool
src/debruijn26.c
Added tool to analyze 64-bit De Bruijn sequences by run-length decomposition and bitcount accumulation.
Bitwise Reversal Functions
src/rev.h
Added bitwise reversal functions for unsigned integers of sizes 2,4,8,16,32,64 bits with conditional typedefs.
String Parsing
src/parse.c, src/parse.h
Added string-to-64-bit integer parsing with error handling and reporting functions.
Bitwise Reversal Test
src/test-rev.c
Added test program to parse integers and print their bitwise reversed values.

Suggested labels

enhancement

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch json

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 77cc5e1 and 000ada5.

Files selected for processing (6)
  • .gitignore (1 hunks)
  • src/Makefile (2 hunks)
  • src/json.c (1 hunks)
  • src/json.h (1 hunks)
  • src/json_internal.h (1 hunks)
  • src/test-json.c (1 hunks)
Additional context used
Path-based instructions (4)
src/test-json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json_internal.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (24)
.gitignore (1)

17-17: Addition of /src/test-json to .gitignore.

The addition of /src/test-json to the .gitignore file is appropriate for excluding JSON test files from version control, maintaining a cleaner repository.

src/test-json.c (2)

6-6: Include guard or pragma once.

Ensure that json.h has an include guard or uses #pragma once to prevent multiple inclusions.


8-14: Review of main function logic.

The main function iterates over command-line arguments and calls json_parse. Ensure that json_parse handles errors gracefully and that input validation is performed.

src/Makefile (2)

3-3: Addition of test-json target.

The test-json target is correctly added to the TARGETS variable, enabling the build system to recognize it.


26-27: Definition of source and debug files for test-json.

The source files json.c and test-json.c, along with the debug file dbg.c, are appropriately defined for the test-json target. Ensure these files are correctly implemented and integrated.

src/json.h (6)

9-12: Includes are appropriate.

The included headers are suitable for the file's functionality.


14-22: Enum json_type is well-defined.

The enum covers all standard JSON data types.


24-27: Struct json_w is appropriate.

The struct is simple and suitable for managing JSON data buffers.


29-33: Struct json_arg is well-structured.

The struct effectively manages pointers for parsing context.


35-39: Struct json_ret is efficiently defined.

The use of bit fields optimizes storage for parsing results.


41-52: Function declarations are clear and necessary.

The declared functions are essential for JSON parsing operations.

src/json_internal.h (5)

9-9: Include of json.h is appropriate.

The inclusion is necessary for internal utilities.


14-23: Enum json_flag is well-defined.

The enum provides necessary flags for parsing operations.


28-74: Lookup table json_lut is comprehensive.

The table efficiently supports character parsing.


95-102: Function json_skip_digits is efficient and well-documented.

The function efficiently skips digits in a buffer.


123-130: Function json_skip_ws is efficient and well-documented.

The function efficiently skips whitespace in a buffer.

src/json.c (8)

6-10: Includes are appropriate.

The included headers are suitable for the file's functionality.


52-98: Function json_parse_frc_exp is well-implemented.

The function effectively parses fractional and exponential parts of a number.


100-111: Function json_parse_integer is well-implemented.

The function effectively parses integer parts and integrates with json_parse_frc_exp.


161-198: Function json_parse_neg_int is well-implemented.

The function effectively parses negative integers and handles edge cases.


200-299: Functions for parsing literals are well-implemented.

The functions parse_false, parse_null, and parse_true effectively parse their respective JSON literals.


301-346: Function parse_value is well-implemented.

The function correctly dispatches parsing based on the initial character.


348-373: Function json_describe is well-implemented.

The function provides useful logging of parsing results.


392-410: Function json_parse is well-implemented.

The function effectively initiates JSON parsing and sets up the context.

Comment on lines +1 to +5
/* SPDX-License-Identifier: LGPL-3.0-or-later */
/** @file test-json.c
*
* @author Juuso Alasuutari
*/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File header documentation.

The file header provides basic information, including the author and license. Consider adding a brief description of the file's purpose for better clarity.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 000ada5 and 6d9ad59.

Files selected for processing (6)
  • .gitignore (1 hunks)
  • src/Makefile (2 hunks)
  • src/json.c (1 hunks)
  • src/json.h (1 hunks)
  • src/json_internal.h (1 hunks)
  • src/test-json.c (1 hunks)
Additional context used
Path-based instructions (4)
src/test-json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json_internal.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (16)
.gitignore (1)

17-17: Addition of /src/test-json to .gitignore is appropriate.

This change correctly excludes JSON-related test files from version control, maintaining a clean repository.

src/test-json.c (1)

1-5: Consider adding a file description.

The file header would benefit from a brief description of its purpose to improve clarity.

src/Makefile (1)

3-3: Inclusion of test-json target is well-structured.

The additions for the test-json target are consistent with existing patterns and enhance the project's testing capabilities.

Also applies to: 26-27

src/json.h (1)

24-27: Ensure buffer validity in json_w.

The json_w structure assumes the buffer is valid and correctly sized. Ensure that the buffer is properly initialized and managed in the implementation.

src/json_internal.h (2)

95-102: Ensure pointer validity in json_skip_digits.

The function assumes valid pointers. Ensure that the calling code guarantees this to prevent undefined behavior.


123-130: Ensure pointer validity in json_skip_ws.

Similar to json_skip_digits, ensure that the calling code guarantees valid pointers to prevent undefined behavior.

src/json.c (10)

52-98: Ensure comprehensive testing for json_parse_frc_exp.

The function handles fractional and exponential parts of numbers. Ensure comprehensive test coverage for edge cases, such as invalid characters and buffer boundaries.


100-111: Ensure comprehensive testing for json_parse_num.

Similar to json_parse_frc_exp, ensure comprehensive test coverage for edge cases.


113-121: Implement parse_array function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?


123-131: Implement parse_object function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?


133-141: Implement parse_string function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?


151-194: Ensure comprehensive testing for json_parse_neg.

The function handles negative numbers. Ensure comprehensive test coverage for edge cases, such as invalid characters and buffer boundaries.


297-343: Ensure comprehensive testing for parse_value.

The function handles various JSON types. Ensure comprehensive test coverage for all cases.


345-371: Ensure proper error logging in json_describe.

The function logs errors based on the return code. Ensure that all possible error codes are handled and logged appropriately.


373-388: Ensure comprehensive testing for parse_element.

The function parses JSON elements. Ensure comprehensive test coverage for edge cases.


390-408: Ensure proper input validation in json_parse.

The function assumes a valid string input. Ensure that the calling code validates input to prevent undefined behavior.

Comment on lines +12 to +14
for (int i = 0; ++i < c;) {
json_parse(v[i]);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add error handling for json_parse.

The current implementation lacks error handling for the json_parse function. Consider checking the return value and handling errors appropriately.

for (int i = 0; ++i < c;) {
    if (json_parse(v[i]) != 0) {
        fprintf(stderr, "Error parsing JSON: %s\n", v[i]);
        return 1;
    }
}

Comment on lines +28 to +81
constexpr static const uint8_t json_lut[256] = {
['\t'] = json_ws,
['\n'] = json_ws,
['\r'] = json_ws,
[' ' ] = json_ws,

['"' ] = json_esc,

['+' ] = json_sig,
['-' ] = json_sig,

['.' ] = json_frc,

['/' ] = json_esc,

['0' ] = json_dig,
['1' ] = json_dig| json_1_9,
['2' ] = json_dig| json_1_9,
['3' ] = json_dig| json_1_9,
['4' ] = json_dig| json_1_9,
['5' ] = json_dig| json_1_9,
['6' ] = json_dig| json_1_9,
['7' ] = json_dig| json_1_9,
['8' ] = json_dig| json_1_9,
['9' ] = json_dig| json_1_9,

['A' ] = json_hex,
['B' ] = json_hex,
['C' ] = json_hex,
['D' ] = json_hex,
['E' ] = json_hex| json_exp,
['F' ] = json_hex,

['\\'] = json_esc,

['a' ] = json_hex,
['b' ] = json_hex| json_esc,
['c' ] = json_hex,
['d' ] = json_hex,
['e' ] = json_hex| json_exp,
['f' ] = json_hex| json_esc,

['n' ] = json_esc,
['r' ] = json_esc,
['t' ] = json_esc,
['u' ] = json_esc,
};
Copy link

@coderabbitai coderabbitai bot Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize lookup table initialization.

Consider using a loop or macro to initialize the lookup table if the pattern allows for it, to reduce potential errors and improve maintainability.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you seriously suggesting changing a compile-time constant lookup table to a runtime-initialized one?

Stop and think.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami, you're right. The compile-time constant nature of the lookup table is crucial for efficiency, and changing it to runtime initialization would be counterproductive. Thanks for pointing that out. I'll make sure to consider the implications more carefully in the future.


Learnings added
Learnt from: imaami
PR: imaami/libcanth#12
File: src/json_internal.h:28-74
Timestamp: 2024-08-24T04:05:34.990Z
Learning: When reviewing compile-time constant lookup tables, maintain their compile-time nature for efficiency, and avoid suggesting runtime initialization unless there's a compelling reason.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 6d9ad59 and 97aa355.

Files selected for processing (6)
  • .gitignore (1 hunks)
  • src/Makefile (2 hunks)
  • src/json.c (1 hunks)
  • src/json.h (1 hunks)
  • src/json_internal.h (1 hunks)
  • src/test-json.c (1 hunks)
Additional context used
Path-based instructions (4)
src/test-json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json_internal.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (12)
.gitignore (1)

17-17: Approved: Addition of /src/test-json to .gitignore.

The inclusion of /src/test-json aligns with the new JSON testing framework and helps maintain a clean repository.

src/test-json.c (2)

1-5: File header documentation.

The file header provides basic information, including the author and license. Consider adding a brief description of the file's purpose for better clarity.


12-14: Add error handling for json_parse.

The current implementation lacks error handling for the json_parse function. Consider checking the return value and handling errors appropriately.

for (int i = 0; ++i < c;) {
    if (json_parse(v[i]) != 0) {
        fprintf(stderr, "Error parsing JSON: %s\n", v[i]);
        return 1;
    }
}
src/Makefile (2)

3-3: Approved: Addition of test-json target to Makefile.

The inclusion of the test-json target enhances the project's testing capabilities for JSON functionality and is consistent with existing test targets.


26-27: Approved: Definition of SRC_test-json and DBG_test-json.

The source and debug files for the test-json target are correctly specified, aligning with the new JSON testing framework.

src/json.h (2)

24-27: Struct json_w looks good.

The struct is well-defined for holding a buffer and its length.


41-52: Function declarations are consistent.

The declarations follow standard conventions and are well-organized.

src/json_internal.h (2)

95-102: Function json_skip_digits is efficient.

The function efficiently skips digit characters using a lookup table.


123-130: Function json_skip_ws is efficient.

The function efficiently skips whitespace characters using a lookup table.

src/json.c (3)

52-98: Function json_parse_frc_exp is well-implemented.

The function correctly handles fractional and exponential parts with appropriate error handling.


100-111: Function json_parse_num integrates well.

The function correctly parses numbers and integrates well with json_parse_frc_exp.


151-194: Function json_parse_neg is well-implemented.

The function correctly handles negative numbers with appropriate error handling.

src/json.c Outdated
Comment on lines 113 to 121
struct json_ret
parse_array (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_array,
.code = ENOSYS
};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement parse_array function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?

src/json.c Outdated
Comment on lines 123 to 131
struct json_ret
parse_object (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_object,
.code = ENOSYS
};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement parse_object function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 97aa355 and 33ec1dd.

Files selected for processing (4)
  • .github/workflows/c-cpp.yml (12 hunks)
  • src/json.c (1 hunks)
  • src/json.h (1 hunks)
  • src/json_internal.h (1 hunks)
Additional context used
Path-based instructions (3)
src/json.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json_internal.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (27)
src/json.h (3)

1-13: File header and includes look good.

The file header provides necessary metadata, and the includes are appropriate for the functionality.


24-33: Structures look good.

The json_w and json_arg structures are well-defined and necessary for the functionality.


41-42: Function declaration looks good.

The json_parse function declaration is appropriate and necessary for the functionality.

src/json_internal.h (5)

1-10: File header and includes look good.

The file header provides necessary metadata, and the includes are appropriate for the functionality.


11-16: Typedef looks good.

The json_fn typedef is well-defined and necessary for the functionality.


32-81: Lookup table looks good.

The json_lut lookup table is well-defined and necessary for the functionality.


83-109: Function looks good.

The json_skip_digits function is well-defined and necessary for the functionality.


111-137: Function looks good.

The json_skip_ws function is well-defined and necessary for the functionality.

src/json.c (7)

1-11: File header and includes look good.

The file header provides necessary metadata, and the includes are appropriate for the functionality.


12-36: Comments look good.

The comments provide a detailed description of the JSON grammar and are useful for understanding the JSON parsing logic.


38-98: Function looks good.

The json_parse_frc_exp function is well-defined and necessary for the functionality.


100-111: Function looks good.

The json_parse_num function is well-defined and necessary for the functionality.


143-194: Function looks good.

The json_parse_neg function is well-defined and necessary for the functionality.


227-258: Function looks good.

The json_parse_keyword function is well-defined and necessary for the functionality.


269-318: Function looks good.

The json_parse_value function is well-defined and necessary for the functionality.

.github/workflows/c-cpp.yml (12)

79-79: LGTM! Increased verbosity for GCC debug builds.

The addition of the V=1 flag to the make command will provide more detailed output during the GCC debug builds.


93-93: LGTM! Increased verbosity for Clang debug builds.

The addition of the V=1 flag to the make command will provide more detailed output during the Clang debug builds.


118-118: LGTM! Increased verbosity for GCC release builds.

The addition of the V=1 flag to the make command will provide more detailed output during the GCC release builds.


132-132: LGTM! Increased verbosity for Clang release builds.

The addition of the V=1 flag to the make command will provide more detailed output during the Clang release builds.


256-256: LGTM! Increased verbosity for GCC debug builds on Ubuntu 24.04.

The addition of the V=1 flag to the make command will provide more detailed output during the GCC debug builds on Ubuntu 24.04.


270-270: LGTM! Increased verbosity for Clang debug builds on Ubuntu 24.04.

The addition of the V=1 flag to the make command will provide more detailed output during the Clang debug builds on Ubuntu 24.04.


295-295: LGTM! Increased verbosity for GCC release builds on Ubuntu 24.04.

The addition of the V=1 flag to the make command will provide more detailed output during the GCC release builds on Ubuntu 24.04.


309-309: LGTM! Increased verbosity for Clang release builds on Ubuntu 24.04.

The addition of the V=1 flag to the make command will provide more detailed output during the Clang release builds on Ubuntu 24.04.


433-433: LGTM! Increased verbosity for GCC debug builds on macOS 14.

The addition of the V=1 flag to the make command will provide more detailed output during the GCC debug builds on macOS 14.


447-447: LGTM! Increased verbosity for Clang debug builds on macOS 14.

The addition of the V=1 flag to the make command will provide more detailed output during the Clang debug builds on macOS 14.


461-461: LGTM! Increased verbosity for GCC release builds on macOS 14.

The addition of the V=1 flag to the make command will provide more detailed output during the GCC release builds on macOS 14.


475-475: LGTM! Increased verbosity for Clang release builds on macOS 14.

The addition of the V=1 flag to the make command will provide more detailed output during the Clang release builds on macOS 14.

Comment on lines +14 to +22
fixed_enum(json_type, uint8_t) {
json_false,
json_true,
json_null,
json_number,
json_string,
json_array,
json_object
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using typedef for the enum.

Using typedef for the json_type enum can improve readability and consistency with other typedefs in the codebase.

Apply this diff to use typedef:

- fixed_enum(json_type, uint8_t) {
+ typedef fixed_enum(json_type, uint8_t) {
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fixed_enum(json_type, uint8_t) {
json_false,
json_true,
json_null,
json_number,
json_string,
json_array,
json_object
};
typedef fixed_enum(json_type, uint8_t) {
json_false,
json_true,
json_null,
json_number,
json_string,
json_array,
json_object
};

Comment on lines +21 to +30
fixed_enum(json_flag, uint8_t) {
json_1_9 = 1U,
json_dig = json_1_9 << 1U,
json_esc = json_dig << 1U,
json_exp = json_esc << 1U,
json_frc = json_exp << 1U,
json_hex = json_frc << 1U,
json_sig = json_hex << 1U,
json_ws = json_sig << 1U,
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment for each flag.

Adding comments for each flag in the json_flag enum can improve code readability and maintainability.

Apply this diff to add comments:

fixed_enum(json_flag, uint8_t) {
  json_1_9 = 1U, // Digits 1-9
  json_dig = json_1_9 << 1U, // Any digit
  json_esc = json_dig << 1U, // Escape character
  json_exp = json_esc << 1U, // Exponential part
  json_frc = json_exp << 1U, // Fractional part
  json_hex = json_frc << 1U, // Hexadecimal digit
  json_sig = json_hex << 1U, // Sign character
  json_ws  = json_sig << 1U, // Whitespace character
};
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fixed_enum(json_flag, uint8_t) {
json_1_9 = 1U,
json_dig = json_1_9 << 1U,
json_esc = json_dig << 1U,
json_exp = json_esc << 1U,
json_frc = json_exp << 1U,
json_hex = json_frc << 1U,
json_sig = json_hex << 1U,
json_ws = json_sig << 1U,
};
fixed_enum(json_flag, uint8_t) {
json_1_9 = 1U, // Digits 1-9
json_dig = json_1_9 << 1U, // Any digit
json_esc = json_dig << 1U, // Escape character
json_exp = json_esc << 1U, // Exponential part
json_frc = json_exp << 1U, // Fractional part
json_hex = json_frc << 1U, // Hexadecimal digit
json_sig = json_hex << 1U, // Sign character
json_ws = json_sig << 1U, // Whitespace character
};

Comment on lines +113 to +120
static struct json_ret
json_parse_array (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_array,
.code = ENOSYS
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement json_parse_array function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?

Comment on lines +123 to +130
static struct json_ret
json_parse_object (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_object,
.code = ENOSYS
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement json_parse_object function.

The function is currently a placeholder and needs implementation.

Would you like assistance in implementing this function?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 33ec1dd and 6e87285.

Files selected for processing (3)
  • src/json.c (1 hunks)
  • src/json.h (1 hunks)
  • src/json_internal.h (1 hunks)
Additional context used
Path-based instructions (3)
src/json.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json_internal.h (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (9)
src/json.h (3)

14-22: Consider using typedef for the enum.

Using typedef for the json_type enum can improve readability and consistency with other typedefs in the codebase.

Apply this diff to use typedef:

-fixed_enum(json_type, uint8_t) {
+typedef fixed_enum(json_type, uint8_t) {

This comment is still valid and applicable.


44-48: Clarify the purpose of json_ret fields.

The bit fields in json_ret are compact but may benefit from comments explaining their purpose and usage.

Apply this diff to add comments:

struct json_ret {
  uint64_t size : 48; // Size of the parsed element
  uint64_t type :  8; // Type of the JSON element
  uint64_t code :  8; // Error code
};

This comment is still valid and applicable.


50-51: Review the json_parse function prototype.

The function json_parse is declared but not defined in this file. Ensure that its implementation handles inputs correctly and adheres to the expected JSON parsing logic outlined in the documentation comments.

Verification successful

json_parse function is implemented correctly.

The json_parse function is declared in src/json.h and implemented in src/json.c. It handles input validation and uses structured parsing logic. The implementation appears to adhere to expected JSON parsing logic. No issues found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of `json_parse` function.

# Test: Search for the function implementation. Expect: Implementation should be present and correct.
rg --type c -A 10 $'json_parse'

Length of output: 6037

src/json_internal.h (3)

21-30: Consider adding a comment for each flag.

Adding comments for each flag in the json_flag enum can improve code readability and maintainability.

Apply this diff to add comments:

fixed_enum(json_flag, uint8_t) {
  json_1_9 = 1U, // Digits 1-9
  json_dig = json_1_9 << 1U, // Any digit
  json_esc = json_dig << 1U, // Escape character
  json_exp = json_esc << 1U, // Exponential part
  json_frc = json_exp << 1U, // Fractional part
  json_hex = json_frc << 1U, // Hexadecimal digit
  json_sig = json_hex << 1U, // Sign character
  json_ws  = json_sig << 1U, // Whitespace character
};

This comment is still valid and applicable.


35-81: Optimize lookup table initialization.

Consider using a loop or macro to initialize the lookup table if the pattern allows for it, to reduce potential errors and improve maintainability.
Based on the user's feedback, this suggestion was not appropriate as the table is a compile-time constant. The existing comment is still valid and applicable.


102-109: Review the json_skip_digits function implementation.

This function is crucial for parsing numbers in JSON. Ensure that it correctly skips all digit characters and handles bounds correctly, as per the JSON standard.

Verification successful

Function json_skip_digits is correctly implemented and used.

The json_skip_digits function is used appropriately in src/json.c, ensuring that digit characters are skipped while respecting the bounds of the JSON data. The implementation aligns with the expected behavior for parsing numbers in JSON.

  • File: src/json.c
  • Usage: Handles bounds correctly with arg.end.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correct behavior of `json_skip_digits` function.

# Test: Search for the function usage in the codebase. Expect: Correct usage and handling of bounds.
rg --type c -A 5 $'json_skip_digits'

Length of output: 1208

src/json.c (3)

113-120: Implement placeholder parsing functions.

The functions json_parse_array, json_parse_object, and json_parse_string are currently placeholders and need proper implementation.

Would you like assistance in implementing these functions?
This comment is still valid and applicable.

Also applies to: 123-130, 133-140


38-98: Review the json_parse_frc_exp function.

This function parses the fractional and exponential parts of a number. Ensure that it correctly identifies and handles the syntax of numbers according to the JSON standard.

Verification successful

Function json_parse_frc_exp is implemented and used correctly.

The function is used in multiple locations within src/json.c, and its return values are checked for syntax errors, indicating that it handles JSON number parsing as intended. The error handling for JSON syntax errors is present and appears to be correctly implemented.

  • Locations in src/json.c where json_parse_frc_exp is used confirm its role in parsing JSON numbers and handling errors.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correct behavior of `json_parse_frc_exp` function.

# Test: Search for the function usage in the codebase. Expect: Correct parsing and error handling.
rg --type c -A 10 $'json_parse_frc_exp'

Length of output: 1561


100-111: Review the json_parse_num function.

This function is responsible for parsing numbers in JSON. Ensure that it integrates well with json_parse_frc_exp and handles all numeric formats correctly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 6e87285 and 4f757f4.

Files selected for processing (1)
  • src/json.c (1 hunks)
Additional context used
Path-based instructions (1)
src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (7)
src/json.c (7)

100-111: Verify the size calculation in json_parse_num.

Ensure that the size calculation that adds the number of digits parsed before calling json_parse_frc_exp is accurate and correctly reflects the total number of characters processed by this function.


113-140: Skip commenting on placeholder functions.

Previous comments have already addressed the need for implementation of json_parse_array, json_parse_object, and json_parse_string. No additional comments are necessary at this stage.


151-194: Check error handling in json_parse_neg.

Verify that error codes and size calculations are correctly handled and propagated, especially when integrating with json_parse_frc_exp. This function's robustness is critical given its role in parsing negative numbers.


289-338: Confirm default error handling in json_parse_value.

Ensure that the default case in the switch statement, which sets EILSEQ, is appropriate for all cases where the character does not match any known JSON value types. This is crucial for maintaining correct error reporting in the JSON parsing process.


340-366: Verify logging accuracy in json_describe.

Check that the function accurately logs the correct portion of parsed data based on ret.size and handles all JSON types appropriately. This is essential for correct debugging and logging of JSON parsing results.


368-383: Check whitespace handling in json_parse_element.

Ensure that json_skip_ws is correctly skipping whitespace and that size adjustments after parsing values are accurate. This function's effectiveness is crucial for the correct parsing of JSON elements, especially in nested structures.


385-403: Confirm the handling of the parsing result in json_parse.

The parsing result is currently not used ((void)ret;). Confirm whether this is intentional and if there are plans to utilize or handle the result in future updates. This could impact error handling and integration with other components.

Comment on lines +52 to +98
static struct json_ret
json_parse_frc_exp (struct json_arg arg)
{
int e = 0;
uint8_t const *p = arg.ptr;

if (p < arg.end) do {
if (*p == '.') {
if (++p >= arg.end) {
// last character is '.'
e = ENODATA;
break;
}

if (!(json_lut[*p] & json_dig)) {
// non-digit follows '.'
e = EILSEQ;
break;
}

p = json_skip_digits(&p[1], arg.end);
}

if (p < arg.end && (json_lut[*p] & json_exp)) {
if (++p >= arg.end ||
((json_lut[*p] & json_sig) && ++p >= arg.end)) {
// last character is [Ee+-]
e = ENODATA;
break;
}

if (!(json_lut[*p] & json_dig)) {
// non-digit follows [Ee][+-]?
e = EILSEQ;
break;
}

p = json_skip_digits(&p[1], arg.end);
}
} while (0);

return (struct json_ret){
.size = (uint64_t)(ptrdiff_t)(p - arg.ptr),
.type = json_number,
.code = (uint64_t)e
};
}
Copy link

@coderabbitai coderabbitai bot Aug 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider simplifying the loop structure in json_parse_frc_exp.

The do-while loop used here could be simplified or replaced with a more straightforward loop structure to enhance readability and maintainability. Additionally, ensure that json_lut and json_skip_digits are correctly used and robust against all input cases.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a loop.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami, you're right—this do-while construct is not a loop in the traditional sense. It's being used to execute the block of code once, with the ability to exit early using break statements. Thanks for pointing that out!


Failed to remove learnings

Learnings cannot be removed due to server error. Please try again.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 4f757f4 and ae3173e.

Files selected for processing (1)
  • src/json.c (1 hunks)
Additional context used
Path-based instructions (1)
src/json.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (4)
src/json.c (4)

100-110: Approve the integration in json_parse_num.

The function correctly delegates to json_skip_digits and json_parse_frc_exp, simplifying its implementation. Ensure that these functions are thoroughly tested to maintain overall correctness.


52-98: Review loop structure and error handling in json_parse_frc_exp.

The function uses a do-while loop which might be simplified. Verify that json_lut and json_skip_digits are robust against all input cases and that error handling is correctly implemented.

Run the following script to verify the robustness of json_lut and json_skip_digits:

Verification successful

Loop structure and error handling in json_parse_frc_exp are correctly implemented.

The use of json_lut and json_skip_digits is robust and handles edge cases effectively. The function's logic aligns with JSON parsing standards, ensuring proper error handling for non-digit characters and end-of-input scenarios.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the robustness of json_lut and json_skip_digits.

# Test: Search for definitions and usages of json_lut and json_skip_digits. Expect: Definitions and usages should handle all edge cases.
rg --type c -A 5 $'json_lut|json_skip_digits'

Length of output: 2879


303-352: Verify switch-case structure in json_parse_value.

Ensure that the switch-case structure correctly dispatches to appropriate parsing functions and handles all possible JSON values correctly.

Run the following script to verify the completeness and correctness of the switch-case structure:

Verification successful

Switch-case structure in json_parse_value is correctly implemented.

The function correctly dispatches to appropriate parsing functions for different JSON value types and handles unrecognized characters with an error code. The implementation aligns with expected behavior.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify switch-case structure in json_parse_value.

# Test: Search for the implementation and usage of json_parse_value. Expect: Correct dispatching to parsing functions.
rg --type c -A 5 $'json_parse_value'

Length of output: 451


151-194: Verify edge case handling in json_parse_neg.

Ensure that the function correctly handles all edge cases, especially input validation when parsing negative numbers. The use of json_parse_frc_exp should also be verified for correctness in this context.

Run the following script to verify the handling of edge cases:

Comment on lines 113 to 140
static struct json_ret
json_parse_array (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_array,
.code = ENOSYS
};
}

static struct json_ret
json_parse_object (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_object,
.code = ENOSYS
};
}

static struct json_ret
json_parse_string (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_string,
.code = ENOSYS
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implement placeholder functions for JSON structures.

json_parse_array, json_parse_object, and json_parse_string are marked as placeholders and need full implementations to handle respective JSON structures.

Would you like assistance in implementing these functions?

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between ae3173e and 525cab0.

Files selected for processing (3)
  • .gitignore (1 hunks)
  • src/Makefile (2 hunks)
  • src/stochar.c (1 hunks)
Additional context used
Path-based instructions (1)
src/stochar.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (10)
.gitignore (1)

16-18: LGTM!

The new entries are correctly formatted and appropriately added to the .gitignore file.

src/Makefile (3)

3-3: LGTM!

The TARGETS variable now includes stochar and test-json, which is consistent with the new additions.


21-22: LGTM!

The source files for stochar are correctly defined.


29-30: LGTM!

The source files for test-json are correctly defined.

src/stochar.c (6)

40-47: LGTM!

The function correctly performs a saturated addition of two size_t values.


49-62: LGTM!

The function correctly calculates the maximum and sum of argument sizes, handling overflow appropriately.


64-123: LGTM!

The union code_point is well-defined and covers multiple representations of a code point.


127-131: LGTM!

The function correctly checks if a code point is an error value.


133-169: LGTM!

The function correctly converts a UTF-8 code point to UTF-32, handling different lengths of UTF-8 sequences.


182-262: LGTM!

The main function is correctly implemented and handles various edge cases, including memory allocation errors and invalid UTF-8 sequences.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 525cab0 and fbb5e0b.

Files selected for processing (1)
  • src/stochar.c (1 hunks)
Additional context used
Path-based instructions (1)
src/stochar.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (8)
src/stochar.c (8)

1-35: LGTM!

Headers and macros are correctly included and defined.

The code changes are approved.


40-47: LGTM!

The function is correctly implemented.

The code changes are approved.


49-62: LGTM!

The function is correctly implemented.

The code changes are approved.


64-121: LGTM!

The union is correctly defined.

The code changes are approved.


123-129: LGTM!

The static assert and macros are correctly implemented.

The code changes are approved.


131-135: LGTM!

The function is correctly implemented.

The code changes are approved.


137-173: LGTM!

The function is correctly implemented.

The code changes are approved.


175-184: LGTM!

The function is correctly implemented.

The code changes are approved.

Comment on lines 186 to 357
int
main (int c,
char **v)
{
struct letopt opt = letopt_init(c, v);

if (letopt_nargs(&opt) < 1 || opt.m_help)
letopt_helpful_exit(&opt);

struct uz2 sz = arg_sizes(&opt);
size_t buf_sz = opt.m_separate ? sz.max : sz.sum;
buf_sz = saturated_add_uz(
saturated_add_uz(buf_sz, 2U),
saturated_add_uz(buf_sz, buf_sz));

if (buf_sz == SIZE_MAX) {
pr_err_("%s", strerror(EOVERFLOW));
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

char *buf = malloc(buf_sz);
if (!buf) {
pr_errno_(errno, "malloc(%zu)", buf_sz);
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

struct utf8 u8p = utf8();
uint64_t n[] = {0,0,0,0};
char *out = buf;

for (int i = 0; i < letopt_nargs(&opt); ++i) {
uint8_t const *p = (uint8_t const *)letopt_arg(&opt, i);
for (uint8_t const *q = p; *q; q = p) {
p = utf8_parse_next_code_point(&u8p, q);
if (u8p.error) {
if (opt.m_replace && (*p || opt.m_separate)) {
*out++ = (char)(unsigned char)0xefU;
*out++ = (char)(unsigned char)0xbfU;
*out++ = (char)(unsigned char)0xbdU;
}
if (!*p)
break;
if (p == q && utf8_expects_leading_byte(&u8p))
++p;
utf8_reset(&u8p);
continue;
}

const union code_point uc = utf8_code_point(&u8p);
if (!code_point_error(uc))
printf("U+%04" PRIx32 "\n", uc.u32);
}

if (opt.m_separate) {
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
out = buf;
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);

__builtin_memset(&n[0], 0, sizeof n);
utf8_reset(&u8p);
}
}

if (!opt.m_separate) {
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);
}

free(buf);
return letopt_fini(&opt);
}
Copy link

@coderabbitai coderabbitai bot Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve memory allocation failure handling and avoid non-standard functions.

  • Memory allocation failure handling can be improved by freeing allocated resources before returning.
  • The use of __builtin_memset is non-standard and should be replaced with memset.

Apply this diff to improve memory allocation failure handling:

-		pr_errno_(errno, "malloc(%zu)", buf_sz);
-		(void)letopt_fini(&opt);
+		pr_errno_(errno, "malloc(%zu)", buf_sz);
+		free(buf);
+		(void)letopt_fini(&opt);

Apply this diff to replace __builtin_memset with memset:

-		__builtin_memset(&n[0], 0, sizeof n);
+		memset(&n[0], 0, sizeof n);

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* Memory allocation failure handling can be improved by freeing allocated resources before returning.

Apply this diff to improve memory allocation failure handling:

-		pr_errno_(errno, "malloc(%zu)", buf_sz);
-		(void)letopt_fini(&opt);
+		pr_errno_(errno, "malloc(%zu)", buf_sz);
+		free(buf);
+		(void)letopt_fini(&opt);

This makes no sense. That branch is entered only when buf is NULL due to malloc failure. There is no situation where buf will be anything but NULL! Your suggestion is just as useless as this sort of thing:

int main(void) {
    free(NULL);
    return 0;
}

How did you even think that was a good idea?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami, you're absolutely right. The suggestion to free buf in the case of a malloc failure was unnecessary since buf would be NULL, and calling free(NULL) is indeed a no-op. Thank you for pointing that out, and I appreciate your patience.

I'll make sure to avoid such oversights in the future.


Failed to remove learnings

Learnings cannot be removed due to server error. Please try again.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai dearest, can you retry removing the learnings you seemed to attempt in your previous comment? It says there was a server error.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami, I'll retry removing the learning related to the unnecessary suggestion to free buf in the case of a malloc failure.



Learnings removed

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between fbb5e0b and 3971892.

Files selected for processing (1)
  • src/stochar.c (1 hunks)
Additional context used
Path-based instructions (1)
src/stochar.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (8)
src/stochar.c (8)

9-27: LGTM!

The macros are correctly defined and used.

The code changes are approved.


40-47: LGTM!

The function is correctly implemented.

The code changes are approved.


49-62: LGTM!

The function is correctly implemented.

The code changes are approved.


151-155: LGTM!

The function is correctly implemented.

The code changes are approved.


157-193: LGTM!

The function is correctly implemented.

The code changes are approved.


195-203: LGTM!

The function is correctly implemented.

The code changes are approved.


145-149: LGTM!

The macro is correctly defined and used.

The code changes are approved.


143-143: LGTM!

The macro is correctly defined and used.

The code changes are approved.

Comment on lines 205 to 357
int
main (int c,
char **v)
{
struct letopt opt = letopt_init(c, v);

if (letopt_nargs(&opt) < 1 || opt.m_help)
letopt_helpful_exit(&opt);

struct uz2 sz = arg_sizes(&opt);
size_t buf_sz = opt.m_separate ? sz.max : sz.sum;
buf_sz = saturated_add_uz(
saturated_add_uz(buf_sz, 2U),
saturated_add_uz(buf_sz, buf_sz));

if (buf_sz == SIZE_MAX) {
pr_errno_(EOVERFLOW, "refusing to allocate buffer");
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

char *buf = malloc(buf_sz);
if (!buf) {
pr_errno_(errno, "malloc(%zu)", buf_sz);
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

struct utf8 u8p = utf8();
uint64_t n[] = {0,0,0,0};
char *out = buf;

for (int i = 0; i < letopt_nargs(&opt); ++i) {
uint8_t const *p = (uint8_t const *)letopt_arg(&opt, i);
for (uint8_t const *q = p; *q; q = p) {
p = utf8_parse_next_code_point(&u8p, q);
if (u8p.error) {
if (opt.m_replace && (*p || opt.m_separate)) {
*out++ = (char)(unsigned char)0xefU;
*out++ = (char)(unsigned char)0xbfU;
*out++ = (char)(unsigned char)0xbdU;
}
if (!*p)
break;
if (p == q && utf8_expects_leading_byte(&u8p))
++p;
utf8_reset(&u8p);
continue;
}

const union code_point uc = utf8_code_point(&u8p);
if (!code_point_error(uc))
printf("U+%04" PRIx32 "\n", uc.u32);
}

if (opt.m_separate) {
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
out = buf;
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);

__builtin_memset(&n[0], 0, sizeof n);
utf8_reset(&u8p);
}
}

if (!opt.m_separate) {
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);
}

free(buf);
return letopt_fini(&opt);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve memory allocation failure handling and avoid non-standard functions.

  • Memory allocation failure handling can be improved by freeing allocated resources before returning.
  • The use of __builtin_memset is non-standard and should be replaced with memset.

Apply this diff to improve memory allocation failure handling:

-		pr_errno_(errno, "malloc(%zu)", buf_sz);
-		(void)letopt_fini(&opt);
+		pr_errno_(errno, "malloc(%zu)", buf_sz);
+		free(buf);
+		(void)letopt_fini(&opt);

Apply this diff to replace __builtin_memset with memset:

-		__builtin_memset(&n[0], 0, sizeof n);
+		memset(&n[0], 0, sizeof n);

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 3971892 and 159b498.

Files selected for processing (1)
  • src/stochar.c (1 hunks)
Additional context used
Path-based instructions (1)
src/stochar.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (8)
src/stochar.c (8)

40-47: LGTM!

The function is correctly implemented and handles overflow as expected.

The code changes are approved.


49-58: LGTM!

The function is correctly implemented and uses __builtin_strlen appropriately.

The code changes are approved.


78-88: LGTM!

The function is correctly implemented and handles overflow as expected.

The code changes are approved.


212-215: LGTM!

The function is correctly implemented and uses BAD_CODE_POINT appropriately.

The code changes are approved.


218-253: LGTM!

The function is correctly implemented and handles different byte lengths appropriately.

The code changes are approved.


256-263: LGTM!

The function is correctly implemented and uses utf8_to_utf32 appropriately.

The code changes are approved.


132-202: LGTM!

The struct is correctly defined and provides multiple views for different encodings.

The code changes are approved.


114-123: LGTM!

The struct is correctly defined and provides a detailed view of the 4-byte UTF-8 encoding.

The code changes are approved.

Comment on lines 266 to 342
int
main (int c,
char **v)
{
struct letopt opt = letopt_init(c, v);

if (letopt_nargs(&opt) < 1 || opt.m_help)
letopt_helpful_exit(&opt);

size_t buf_sz = opt.m_separate ? arg_sizes_max(&opt)
: arg_sizes_sum(&opt);
buf_sz = saturated_add_uz(
saturated_add_uz(buf_sz, 2U),
saturated_add_uz(buf_sz, buf_sz));

if (buf_sz == SIZE_MAX) {
pr_errno_(EOVERFLOW, "refusing to allocate buffer");
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

char *buf = malloc(buf_sz);
if (!buf) {
pr_errno_(errno, "malloc(%zu)", buf_sz);
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

struct utf8 u8p = utf8();
uint64_t n[] = {0,0,0,0};
//char *out = buf;

for (int i = 0; i < letopt_nargs(&opt); ++i) {
uint8_t const *p = (uint8_t const *)letopt_arg(&opt, i);
for (uint8_t const *q = p; *q; q = p) {
p = utf8_parse_next_code_point(&u8p, q);
if (u8p.error) {
if (opt.m_replace && (*p || opt.m_separate)) {
const struct ucp uc = utf8_to_utf32(
(struct ucp){
.u8 = {0xefU, 0xbfU,
0xbdU, 0x00U}
},
3U
);
printf("U+%04" PRIx32 "\n", uc.u32);
/*
*out++ = (char)(unsigned char)0xefU;
*out++ = (char)(unsigned char)0xbfU;
*out++ = (char)(unsigned char)0xbdU;
*/
}
if (!*p)
break;
if (p == q && utf8_expects_leading_byte(&u8p))
++p;
utf8_reset(&u8p);
continue;
}

const struct ucp uc = utf8_code_point(&u8p);
if (!code_point_error(uc))
printf("U+%04" PRIx32 "\n", uc.u32);
}

if (opt.m_separate) {
/*
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
out = buf;
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);
*/
__builtin_memset(&n[0], 0, sizeof n);
utf8_reset(&u8p);
}
Copy link

@coderabbitai coderabbitai bot Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve memory allocation failure handling and avoid non-standard functions.

  • Memory allocation failure handling can be improved by freeing allocated resources before returning.
  • The use of __builtin_memset is non-standard and should be replaced with memset.

Apply this diff to improve memory allocation failure handling:

-		pr_errno_(errno, "malloc(%zu)", buf_sz);
-		(void)letopt_fini(&opt);
+		pr_errno_(errno, "malloc(%zu)", buf_sz);
+		free(buf);
+		(void)letopt_fini(&opt);

Apply this diff to replace __builtin_memset with memset:

-		__builtin_memset(&n[0], 0, sizeof n);
+		memset(&n[0], 0, sizeof n);

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you keep reposting this flawed suggestion to free a known NULL pointer? Also, I use __builtin_memset() for a reason, please don't focus on it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami, I apologize for the repeated suggestion regarding freeing a NULL pointer. You're correct that it's unnecessary, and I will ensure that this suggestion is not made again. Additionally, I understand that the use of __builtin_memset() is intentional, and I will refrain from focusing on it in future reviews.

Thank you for your patience.


Learnings removed

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 159b498 and b1a3085.

Files selected for processing (1)
  • src/stochar.c (1 hunks)
Additional context used
Path-based instructions (1)
src/stochar.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (7)
src/stochar.c (7)

40-47: LGTM!

The function saturated_add_uz is correctly implemented for performing saturated addition.

The code changes are approved.


49-58: LGTM!

The function arg_sizes_max is correctly implemented for calculating the maximum length of string arguments.

The code changes are approved.


78-89: LGTM!

The function arg_sizes_sum is correctly implemented for calculating the saturated sum of string argument lengths.

The code changes are approved.


100-109: LGTM!

The enum ucp_kind is correctly defined for various Unicode code point encodings.

The code changes are approved.


111-175: LGTM!

The structures ucp_utf32_7, ucp_utf32_11, ucp_utf32_16, ucp_utf32_21, ucp_utf8_1, ucp_utf8_2, ucp_utf8_3, ucp_utf8_4 are correctly defined for their respective UTF-32 and UTF-8 encodings.

The code changes are approved.


184-201: LGTM!

The union ucp is correctly defined for various views of Unicode code points.

The code changes are approved.


217-252: LGTM!

The function utf8_to_utf32 is correctly implemented for converting UTF-8 to UTF-32 based on the number of bytes.

The code changes are approved.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between b1a3085 and a608f89.

Files selected for processing (1)
  • src/stochar.c (1 hunks)
Additional context used
Path-based instructions (1)
src/stochar.c (1)

Pattern **/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Additional comments not posted (7)
src/stochar.c (7)

1-8: File header and includes are appropriate.

The file header includes author information and SPDX license identifier. The includes are standard and project-specific headers.


40-47: LGTM!

The saturated_add_uz function is correctly implemented and handles overflow by returning SIZE_MAX.


49-59: LGTM!

The arg_sizes_max function is correctly implemented and uses __builtin_strlen for string length calculation.


78-88: LGTM!

The arg_sizes_sum function is correctly implemented and uses saturated_add_uz to handle overflow.


99-138: LGTM!

The ucp_kind enum and related structures for UTF-8 and UTF-32 conversion are correctly defined and follow standard practices.


183-246: LGTM!

The union ucp and related functions for handling code point errors and conversion between UTF-8 and UTF-32 are correctly implemented and follow standard practices.


254-262: LGTM!

The utf8_code_point function is correctly implemented and uses utf8_to_utf32 for conversion.

Comment on lines +264 to +356
int
main (int c,
char **v)
{
struct letopt opt = letopt_init(c, v);

if (letopt_nargs(&opt) < 1 || opt.m_help)
letopt_helpful_exit(&opt);

size_t buf_sz = opt.m_separate ? arg_sizes_max(&opt)
: arg_sizes_sum(&opt);
buf_sz = saturated_add_uz(
saturated_add_uz(buf_sz, 2U),
saturated_add_uz(buf_sz, buf_sz));

if (buf_sz == SIZE_MAX) {
pr_errno_(EOVERFLOW, "refusing to allocate buffer");
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

char *buf = malloc(buf_sz);
if (!buf) {
pr_errno_(errno, "malloc(%zu)", buf_sz);
(void)letopt_fini(&opt);
return EXIT_FAILURE;
}

struct utf8 u8p = utf8();
uint64_t n[] = {0,0,0,0};
//char *out = buf;

for (int i = 0; i < letopt_nargs(&opt); ++i) {
uint8_t const *p = (uint8_t const *)letopt_arg(&opt, i);
for (uint8_t const *q = p; *q; q = p) {
p = utf8_parse_next_code_point(&u8p, q);
if (u8p.error) {
if (opt.m_replace && (*p || opt.m_separate)) {
const union ucp uc = utf8_to_utf32(
(union ucp){
.u8 = {0xefU, 0xbfU,
0xbdU, 0x00U}
},
3U
);
printf("U+%04" PRIx32 "\n", uc.u32);
/*
*out++ = (char)(unsigned char)0xefU;
*out++ = (char)(unsigned char)0xbfU;
*out++ = (char)(unsigned char)0xbdU;
*/
}
if (!*p)
break;
if (p == q && utf8_expects_leading_byte(&u8p))
++p;
utf8_reset(&u8p);
continue;
}

const union ucp uc = utf8_code_point(&u8p);
if (!code_point_error(uc))
printf("U+%04" PRIx32 "\n", uc.u32);
}

if (opt.m_separate) {
/*
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
out = buf;
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);
*/
__builtin_memset(&n[0], 0, sizeof n);
utf8_reset(&u8p);
}
}
/*
if (!opt.m_separate) {
if (out == buf || *(out - 1) != '\n')
*out++ = '\n';
*out = '\0';
(void)printf("%" PRIu64 "\t%" PRIu64 "\t%" PRIu64
"\t%" PRIu64 "\t%" PRIu64 "\t%s",
n[0], n[1], n[2], n[3], n[0] +
2U*n[1] + 3U*n[2] + 4U*n[3], buf);
}
*/
free(buf);
return letopt_fini(&opt);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve memory allocation failure handling.

The memory allocation failure handling can be improved by freeing allocated resources before returning.

Apply this diff to improve memory allocation failure handling:

-		pr_errno_(errno, "malloc(%zu)", buf_sz);
-		(void)letopt_fini(&opt);
+		pr_errno_(errno, "malloc(%zu)", buf_sz);
+		free(buf);
+		(void)letopt_fini(&opt);

Committable suggestion was skipped due to low confidence.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/json.c (4)

181-195: Surrogate-pair validation still missing.
This was flagged earlier; the \uXXXX handling does not reject isolated high or low surrogates. Non-BMP characters will be accepted silently as two code units.


436-438: Potential overflow when casting ret.size to int.
Risk of wraparound for tokens > INT_MAX bytes.


113-121: 🛠️ Refactor suggestion

Non-standard useless qualifier will not compile.

useless is not a recognised storage-class specifier or attribute.
Unless it is a macro defined elsewhere, these two functions will fail to parse.

-static struct json_ret
-json_parse_array (useless struct json_arg arg)
+static struct json_ret
+json_parse_array (struct json_arg arg __attribute__((unused)))

Do the same for json_parse_object. If portability is a goal, prefer the (void)arg; idiom inside the body instead of GCC attributes.


366-373: ⚠️ Potential issue

Dereference without buffer-length check – add an early guard.

json_parse_value() blindly reads *arg.ptr.
If arg.ptr == arg.end (empty buffer or only whitespace), this is undefined behaviour.

+    if (arg.ptr >= arg.end) {
+        /* No data left – premature EOF */
+        return (struct json_ret){
+            .size = 0,
+            .type = 0,
+            .code = ENODATA
+        };
+    }

Drop this at the top of the function to make the parser robust.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fadb825 and 6e38b43.

📒 Files selected for processing (3)
  • src/json.c (1 hunks)
  • src/utf8.c (1 hunks)
  • src/utf8.h (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/utf8.h
  • src/json.c
  • src/utf8.c
🧬 Code Graph Analysis (1)
src/utf8.c (1)
src/utf8.h (1)
  • utf8_parse_next_code_point2 (258-260)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (2)
src/utf8.c (1)

314-341: utf8_parse_next_code_point2 now respects end – looks good.

Bounds are checked before every dereference, and u8p->error is set to ENODATA on premature EOF.
Implementation matches the contract in the header.

src/utf8.h (1)

227-260: Header declaration consistent with implementation.

Prototype, docs and attributes line up with the new definition in utf8.c.

Comment on lines +123 to +131
static struct json_ret
json_parse_object (useless struct json_arg arg)
{
return (struct json_ret){
.size = 0,
.type = json_object,
.code = ENOSYS
};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Same issue for json_parse_object() – replace useless.

🤖 Prompt for AI Agents
In src/json.c around lines 123 to 131, the parameter name `useless` in the
function `json_parse_object` is inappropriate and should be replaced. Rename the
parameter to a more meaningful name that reflects its purpose or usage, or if
the parameter is truly unused, rename it to `arg` or use a standard unused
parameter naming convention to improve code clarity and maintain consistency.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e38b43 and c02dd59.

📒 Files selected for processing (1)
  • src/debruijn25.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn25.c
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04

src/debruijn25.c Outdated
Comment on lines 124 to 136
static force_inline void
u10x3_append (u10x3 *dst,
uint5 lo5,
uint5 hi5)
{
uint32_t n = dst->count;
if (n < 3U) {
n = n * 10U + 2U;
dst->bits += ((uint32_t)hi5 << 5U | lo5) << n
| UINT32_C(1);
}
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

u10x3_append mutates count via carry – obscures intent

The routine bumps count by adding | 1 to bits, relying on binary
carry into the low two bits. Clever, but opaque.

Make intent explicit and drop the arithmetic magic:

-    n = n * 10U + 2U;
-    dst->bits += ((uint32_t)hi5 << 5U | lo5) << n
-               | UINT32_C(1);
+    n = n * 10U + 2U;
+    dst->bits |= ((uint32_t)hi5 << 5U | lo5) << n;
+    dst->count++;
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 124 to 136, the function u10x3_append uses a
clever but unclear method to increment the count by relying on binary carry from
adding 1 to bits. To fix this, explicitly increment the count variable directly
instead of using bitwise tricks, and adjust the bits manipulation accordingly to
clearly reflect the intended state changes without relying on implicit carry
behavior.

src/debruijn25.c Outdated
Comment on lines 8 to 15
#ifdef _MSC_VER
# define const_inline force_inline
# define force_inline __forceinline
#else
# define const_inline __attribute__((const)) force_inline
# define force_inline __attribute__((always_inline)) inline
#endif

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

const_inline / force_inline macro ordering is brittle

const_inline is defined in terms of force_inline before
force_inline itself is defined. The C pre-processor will expand
recursively, so this works, but it is confusing and fails the “least
astonishment” test. Swap the two definitions to avoid head-scratching.

-# define const_inline force_inline
-# define force_inline __forceinline
+# define force_inline __forceinline
+# define const_inline force_inline
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 8 to 15, the macro const_inline is defined
using force_inline before force_inline itself is defined, which is confusing and
brittle. To fix this, reorder the macro definitions so that force_inline is
defined first, followed by const_inline defined in terms of force_inline. This
will improve clarity and maintainability by avoiding recursive expansion
surprises.

src/debruijn25.c Outdated
Comment on lines 16 to 26
#if defined _MSC_VER || defined __INTELLISENSE__
typedef unsigned uint5;
typedef uint16_t u16bit;
typedef uint32_t u32bit;
typedef uint64_t u64bit;
#else
typedef unsigned _BitInt(5) uint5;
typedef unsigned _BitInt(16) u16bit;
typedef unsigned _BitInt(32) u32bit;
typedef unsigned _BitInt(64) u64bit;
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

❓ Verification inconclusive

_BitInt limits portability

The file relies on C23 _BitInt types (_BitInt(5) etc.). Few
tool-chains outside Clang / GCC trunk support them yet; MSVC certainly
doesn’t. The _MSC_VER branch falls back to ordinary fixed-width
types, but everything else is forced onto _BitInt, meaning lots of
Linux/embedded compilers will choke.

Consider a feature probe and fallback:

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
typedef unsigned _BitInt(5)  uint5;
#else
typedef uint8_t              uint5;   /* 0-31 still fits */
#endif

Guard C23 _BitInt usage behind a proper feature test

The current code in src/debruijn25.c unconditionally uses C23’s _BitInt for all compilers except MSVC/IntelliSense. Most GCC releases (pre-trunk), Clang and many embedded tool-chains don’t support _BitInt yet and will fail to compile.

Proposed replacement:

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
  /* C23 compiler: use fixed-width _BitInt types */
  typedef unsigned _BitInt(5)  uint5;
  typedef unsigned _BitInt(16) u16bit;
  typedef unsigned _BitInt(32) u32bit;
  typedef unsigned _BitInt(64) u64bit;
#elif defined(_MSC_VER) || defined(__INTELLISENSE__)
  /* MSVC or IntelliSense: fall back to native widths */
  typedef unsigned            uint5;   /* at least 16 bits, plenty for 0–31 */
  typedef uint16_t            u16bit;
  typedef uint32_t            u32bit;
  typedef uint64_t            u64bit;
#else
  /* Other compilers without _BitInt: use standard fixed-width types */
  typedef uint8_t             uint5;   /* 0–31 fits in 8 bits */
  typedef uint16_t            u16bit;
  typedef uint32_t            u32bit;
  typedef uint64_t            u64bit;
#endif

– Wrap _BitInt usage in #if __STDC_VERSION__ >= 202311L
– Provide an explicit 8-bit fallback for uint5 on compilers lacking _BitInt
– Keep or merge the existing MSVC/IntelliSense branch.

This ensures maximum portability without breaking non-C23 tool-chains.

🤖 Prompt for AI Agents
In src/debruijn25.c around lines 16 to 26, the current code uses C23 _BitInt
types unconditionally except for MSVC/IntelliSense, which breaks compilation on
many other compilers. Fix this by wrapping the _BitInt typedefs inside a feature
test that checks if __STDC_VERSION__ is at least 202311L, indicating C23
support. For MSVC/IntelliSense, keep the existing fallback using native
fixed-width types. For all other compilers without _BitInt support, provide a
fallback typedef for uint5 as uint8_t and use standard fixed-width types for the
others. This ensures portability across different toolchains.

src/debruijn25.c Outdated
Comment on lines 421 to 291
static struct b25_info
b25_info (uint16_t id)
{
uint32_t seq = ror32(B25[id >> 5U], id & 31U);
uint32_t seen = 0;
struct b25_info b25 = {0};

do {
unsigned n = b25_info_get_num_cycles(&b25);
unsigned off = b25_info_get_cycle(&b25, n - 1U).offset;
uint32_t mask = UINT32_C(1) << off;
unsigned period = 1;
for (; (seen & mask); ++off, mask <<= 1);
for (unsigned i = off;;) {
i = ror32(seq, i) & 31U;
uint32_t m = mask ^ (UINT32_C(1) << i);
if (m <= mask) {
if (i != off)
return (struct b25_info){0};
break;
}
mask = m;
++period;
}
if (seen & mask)
return (struct b25_info){0};
seen ^= mask;
b25_info_add_cycle(&b25, off, period, mask);
} while (seen != UINT32_MAX);

const unsigned total = b25_info_get_num_cycles(&b25);

printf("{0x%08" PRIx32 ", 0x%016" PRIx64
", {%" PRIu16 ",%" PRIu16 "}, {%u",
seq, b25.div.bits, id >> 5U, id & 31U, total);
for (unsigned n = 0; ++n < 33U; ) {
unsigned c = b25_info_get_num_period_cycles(&b25, n);
if (c)
printf(",[%u]=%u", n, c);
}
fputs("}, {", stdout);
for (unsigned n = 0; n < total; ++n) {
b25_loc x = b25_info_get_cycle(&b25, n);
printf(&",{%u,%u}"[!n], (unsigned)x.offset,
(unsigned)x.period);
}
fputs("}, {", stdout);
for (unsigned n = 0; n < total; ++n)
printf(&",0x%" PRIx32[!n], b25.mask[n]);
puts("}},");
return b25;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

b25_info can access element −1 on first iteration

When div.total == 0, n-1U underflows to UINT_MAX. b25_info_get_cycle
gracefully returns {0}, so it works today, but the extra indirection
cost and undefined-behaviour concerns if the helper ever changes are
avoidable.

-    unsigned off = b25_info_get_cycle(&b25, n - 1U).offset;
+    unsigned off = n ? b25_info_get_cycle(&b25, n - 1U).offset : 0;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static struct b25_info
b25_info (uint16_t id)
{
uint32_t seq = ror32(B25[id >> 5U], id & 31U);
uint32_t seen = 0;
struct b25_info b25 = {0};
do {
unsigned n = b25_info_get_num_cycles(&b25);
unsigned off = b25_info_get_cycle(&b25, n - 1U).offset;
uint32_t mask = UINT32_C(1) << off;
unsigned period = 1;
for (; (seen & mask); ++off, mask <<= 1);
for (unsigned i = off;;) {
i = ror32(seq, i) & 31U;
uint32_t m = mask ^ (UINT32_C(1) << i);
if (m <= mask) {
if (i != off)
return (struct b25_info){0};
break;
}
mask = m;
++period;
}
if (seen & mask)
return (struct b25_info){0};
seen ^= mask;
b25_info_add_cycle(&b25, off, period, mask);
} while (seen != UINT32_MAX);
const unsigned total = b25_info_get_num_cycles(&b25);
printf("{0x%08" PRIx32 ", 0x%016" PRIx64
", {%" PRIu16 ",%" PRIu16 "}, {%u",
seq, b25.div.bits, id >> 5U, id & 31U, total);
for (unsigned n = 0; ++n < 33U; ) {
unsigned c = b25_info_get_num_period_cycles(&b25, n);
if (c)
printf(",[%u]=%u", n, c);
}
fputs("}, {", stdout);
for (unsigned n = 0; n < total; ++n) {
b25_loc x = b25_info_get_cycle(&b25, n);
printf(&",{%u,%u}"[!n], (unsigned)x.offset,
(unsigned)x.period);
}
fputs("}, {", stdout);
for (unsigned n = 0; n < total; ++n)
printf(&",0x%" PRIx32[!n], b25.mask[n]);
puts("}},");
return b25;
}
do {
unsigned n = b25_info_get_num_cycles(&b25);
unsigned off = n ? b25_info_get_cycle(&b25, n - 1U).offset : 0;
uint32_t mask = UINT32_C(1) << off;
unsigned period = 1;
for (; (seen & mask); ++off, mask <<= 1);
for (unsigned i = off;;) {
i = ror32(seq, i) & 31U;
uint32_t m = mask ^ (UINT32_C(1) << i);
if (m <= mask) {
if (i != off)
return (struct b25_info){0};
break;
}
mask = m;
++period;
}
if (seen & mask)
return (struct b25_info){0};
seen ^= mask;
b25_info_add_cycle(&b25, off, period, mask);
} while (seen != UINT32_MAX);
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 421 to 472, the code calls
b25_info_get_cycle(&b25, n - 1U) without checking if n is zero, causing an
underflow to UINT_MAX when div.total == 0. To fix this, add a condition to skip
or handle the case when n is zero before accessing n - 1U, preventing
out-of-bounds access and avoiding reliance on the helper's current behavior.

src/debruijn25.c Outdated
Comment on lines 110 to 122
static force_inline void
b25_div_add (b25_div *div,
uint5 len)
{
const unsigned i = b25_div_offset(len);
const uint64_t m = b25_div_mask(len) << i;
const uint64_t n = div->bits & m;
if (n < m) {
div->bits += UINT64_C(1) << i
| (uint64_t)(!n && div->unique < UINT64_C(15)) << 4U
| (uint64_t)(div->total < UINT64_C(15));
}
}
Copy link

@coderabbitai coderabbitai bot May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Overflow risk in b25_div_add

div->unique and div->total are only 4-bit fields (max = 15). The
increment logic does not clamp, so a De Bruijn with >15 cycles will
wrap to 0 silently.

Minimal fix:

-           | (uint64_t)(!n && div->unique < UINT64_C(15)) << 4U
-           | (uint64_t)(div->total < UINT64_C(15));
+           | (uint64_t)(!n && div->unique < 15) << 4U
+           | (uint64_t)(div->total < 15);
+
+    /* Saturate when the counters hit their max to avoid wrap-around */
+    if (div->unique == 15) div->unique = 15;
+    if (div->total  == 15) div->total  = 15;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/debruijn25.c around lines 110 to 122, the function b25_div_add increments
the 4-bit fields div->unique and div->total without clamping, causing silent
overflow when values exceed 15. To fix this, add checks to ensure these fields
do not increment beyond 15 by clamping their values after increment or by only
incrementing if they are less than 15.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is clamping, look closer and pay attention next time.

Also, your suggestion has this level of logic:

  if (div->unique == 15) div->unique = 15;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami You're absolutely right, I apologize for the error. Looking at the code again:

div->bits += UINT64_C(1) << i
           | (uint64_t)(!n && div->unique < UINT64_C(15)) << 4U
           | (uint64_t)(div->total < UINT64_C(15));

The clamping is built into the increment conditions themselves - div->unique and div->total only get incremented when they're less than 15. This prevents overflow entirely.

I completely misread the logic and my suggestion was indeed nonsensical. Thank you for the correction.

ಠ_ಠ


✏️ Learnings added
Learnt from: imaami
PR: imaami/libcanth#12
File: src/debruijn25.c:110-122
Timestamp: 2025-05-30T17:45:13.066Z
Learning: In bitfield increment logic like `div->bits += (condition) << offset`, the condition itself can serve as clamping by only allowing increments when the field value is below the maximum. Always examine the full conditional expression before assuming overflow issues.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (7)
src/debruijn25.c (7)

8-14: Macro order still backwards
const_inline expands force_inline before the latter is defined. This was flagged previously; please swap the two definitions.

-# define const_inline force_inline
-# define force_inline __forceinline
+# define force_inline __forceinline
+# define const_inline force_inline

16-26: _BitInt needs a feature-test, not a blanket #else
The code still unconditionally uses C23 _BitInt for all non-MSVC compilers. Many tool-chains choke on it.

Re-use the feature probe suggested last review to fall back to fixed-width types when _BitInt is unavailable.


110-122: 4-bit counters silently wrap
unique and total are 4-bit fields; the increment has no saturation. Values past 15 wrap to 0.

Add clamping after the addition as shown in the earlier review.


124-135: Carry trick still obscures intent
u10x3_append relies on binary carry into the low two bits to bump count. The previous refactor suggestion (explicit increment) has not been applied.


428-435: Index underflow on first cycle
n is 0 on the first loop iteration; passing n - 1U to b25_info_get_cycle wraps to UINT_MAX. The helper happens to return {0} but the call itself is brittle. Guard the call or special-case the first pass.


259-288: Dead code after the early return
Everything below return alt_main(...); is unreachable. Either delete or wrap in #if 0 … #endif to silence compiler warnings.


484-491: strtoul result not validated
alt_main ignores errno, overflows and leftover characters. Bad input silently truncates. The earlier error-checking patch is still pending.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c02dd59 and 259ac93.

📒 Files selected for processing (1)
  • src/debruijn25.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn25.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
src/debruijn25.c (6)

8-14: Macro definition order still reversed

const_inline is defined in terms of force_inline before force_inline itself is defined on the non-MSVC branch. This is the exact issue flagged in the earlier review; the confusion remains.

-# define const_inline __attribute__((const)) force_inline
-# define force_inline __attribute__((always_inline)) inline
+# define force_inline __attribute__((always_inline)) inline
+# define const_inline __attribute__((const)) force_inline

16-26: _BitInt usage still lacks a portability guard

The code still falls back only for MSVC/IntelliSense. Compilers without C23 _BitInt support will break. The earlier review provided a feature-test snippet; please add it or an equivalent solution.


124-135: u10x3_append still relies on carry trick

The clever dst->bits += … | 1 still increments the 2-bit count via carry. The previous review suggested an explicit increment for clarity; code is unchanged.


258-287: Code below the early return is unreachable

Everything after return alt_main(argc, argv); will never execute and triggers -Wunreachable-code. Same issue as previously reported.


428-433: Potential underflow when n == 0

On the first iteration n is zero; b25_info_get_cycle(&b25, n - 1U) queries index UINT_MAX. The earlier review highlighted this; still present.


486-490: strtoul result not checked for errors or range

alt_main still ignores errno, endp, and 16-bit range validation, causing silent truncation. Refer to the earlier suggested fix.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 259ac93 and 71f2890.

📒 Files selected for processing (1)
  • src/debruijn25.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn25.c
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (1)
src/debruijn25.c (1)

52-64: Bit-field layout is implementation-defined

b25_div packs 4-, 3-, 2- and 1-bit fields into a uint64_t. The bit ordering within and across bytes varies between compilers/ABIs. If the encoding must be stable across platforms or persisted anywhere, add explicit #ifdef __BYTE_ORDER__ packing or switch to manual shifts/masks.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/debruijn25.c (4)

9-15: Macro ordering still reversed – brittleness remains
const_inline is defined in terms of force_inline before force_inline itself exists (non-MSC path). This is the same issue previously highlighted – please reorder.


17-27: Un-guarded _BitInt usage again limits portability
The file still unconditionally uses C23 _BitInt except on MSVC/IntelliSense. Many GCC/Clang releases shipped today choke on it. Please add the feature probe / fallback that was proposed earlier.


275-304: Code after early-return is still unreachable
main() returns immediately via alt_main; everything from 279 onward will never execute and still triggers -Wreturn-type/“unreachable code” diagnostics. Wrap in #if 0…#endif or delete.


867-872: strtoul result unchecked – bad IDs pass silently
alt_main still ignores errno, overflow and trailing garbage. The exact warning and sample fix were given before – please add the validation (range, *endp, errno == ERANGE).

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae0a496 and fadd6bc.

📒 Files selected for processing (1)
  • src/debruijn25.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn25.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (1)
src/debruijn25.c (1)

214-219: Rotation helper now safe for n == 0 – good catch
The added early-return eliminates the shift-by-32 UB reported in the last round.

src/debruijn25.c Outdated
Comment on lines 447 to 457
for (; (seen & mask); ++off, mask <<= 1);
for (unsigned i = off;;) {
i = ror32(seq, i) & 31U;
uint32_t m = mask ^ (UINT32_C(1) << i);
if (m <= mask) {
if (i != off)
return (struct b25_info){0};
break;
}
mask = m;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Shift-by-32 UB when every bit in the current cycle is already ‘seen’
mask <<= 1 runs in a tight loop with off incrementing.
If off is 31 and the bit is set, the next iteration shifts by 32 – undefined behaviour.
Proposed fix:

-        for (; (seen & mask); ++off, mask <<= 1);
+        while (seen & mask) {
+            if (++off == 32)
+                return (struct b25_info){0}; /* safety net */
+            mask <<= 1;          /* shift by 1 … 31 only */
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (; (seen & mask); ++off, mask <<= 1);
for (unsigned i = off;;) {
i = ror32(seq, i) & 31U;
uint32_t m = mask ^ (UINT32_C(1) << i);
if (m <= mask) {
if (i != off)
return (struct b25_info){0};
break;
}
mask = m;
}
while (seen & mask) {
if (++off == 32)
return (struct b25_info){0}; /* safety net */
mask <<= 1; /* shift by 1…31 only */
}
for (unsigned i = off;;) {
i = ror32(seq, i) & 31U;
uint32_t m = mask ^ (UINT32_C(1) << i);
if (m <= mask) {
if (i != off)
return (struct b25_info){0};
break;
}
mask = m;
}
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 447 to 457, the loop shifts mask left by 1
repeatedly without checking if off reaches 31, causing a shift-by-32 undefined
behavior. Fix this by adding a condition to stop the loop before off reaches 32,
preventing mask <<= 1 from shifting by 32 bits. Adjust the loop to ensure off
stays within 0 to 31 while checking bits.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/debruijn25.c (4)

9-15: Macro definition order remains counter-intuitive

const_inline is still defined in terms of force_inline before the latter is defined in both branches. This was pointed out earlier. Swap the two lines (or drop the indirection altogether) to avoid head-scratching and accidental self-referencing macro expansions.


132-143: u10x3_append still relies on carry-bit side-effects

Incrementing count by adding | 1 to bits obscures intent and makes reasoning about overflows harder. Explicitly bump count and OR the packed payload; the generated code is identical after optimisation.


476-481: Two long-standing UB hazards remain

  1. n can be 0 on the first iteration, so b25_info_get_cycle(&b25, n - 1) indexes with UINT_MAX.
  2. mask <<= 1 can shift by 32 when off == 31, triggering undefined behaviour.

Both problems were flagged earlier but the suggested guards (if (n …) / capped loop) haven’t been applied.


900-904: alt_main still ignores errno, leftover characters and range when parsing IDs

Bad inputs like 0x1FFFF silently truncate to 16 bits; non-numeric garbage is accepted as 0. The previously suggested strtoul validation (check errno, *endp, and val > UINT16_MAX) has not been integrated.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fadd6bc and 4cb60a6.

📒 Files selected for processing (1)
  • src/debruijn25.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn25.c
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (5)
src/debruijn25.c (5)

17-24: Macro ordering still reversed – re-introduces the old brittleness

const_inline is defined in terms of force_inline before force_inline itself is defined.
The earlier review already pointed this out; the code is unchanged.

-#  define const_inline force_inline
-#  define force_inline __forceinline
+#  define force_inline __forceinline
+#  define const_inline force_inline

Apply the same swap inside the non-MSVC branch as well.


27-37: Unconditional _BitInt use keeps portability problem alive

Except for MSVC, the file still relies on C23 _BitInt types without any feature-test fallback.
Tool-chains that are C17-only will fail to build.

Previous review already supplied a drop-in compile-time probe; please integrate it.


134-144: u10x3_append still increments count via carry-bit trick

The arithmetic hack (dst->bits += … | 1) relies on carry into the low two bits to update count.
It is terse but unclear. A direct dst->count++ is cheaper to read and just as fast on modern
compilers.

Same feedback was given previously.


491-505: Potential shift-by-32 UB remains in mask-advance loop

mask <<= 1 inside for (; (seen & mask); ++off, mask <<= 1); will shift by 32
when every bit from off to 31 is already set. Left-shifting an
uint32_t by its width is undefined.

The earlier patch suggesting a guarded while loop is still needed.


323-352: Dead code after early return – will trigger -Wunreachable-code

main returns the result of alt_main; everything that follows is
unreachable. Either delete or #if 0 the block to silence warnings.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6f3951 and d5a1228.

📒 Files selected for processing (5)
  • src/bits.h (1 hunks)
  • src/debruijn25.c (1 hunks)
  • src/file.c (3 hunks)
  • src/file.h (1 hunks)
  • src/test.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/file.c
  • src/test.c
  • src/file.h
  • src/bits.h
  • src/debruijn25.c
🧬 Code Graph Analysis (2)
src/file.c (1)
src/file.h (1)
  • file_in (23-24)
src/debruijn25.c (3)
src/test.c (1)
  • main (20-68)
src/file.h (4)
  • char (35-39)
  • file_in (23-24)
  • file_in_error (30-30)
  • file_in_fini (27-27)
src/file.c (2)
  • file_in (14-72)
  • file_in_fini (74-82)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (7)
src/test.c (2)

28-29: API migration executed correctly.

The transition from file_read to file_in with text=true parameter is appropriate for JSON parsing. Error handling function rename is consistent.


35-35: Text retrieval function rename is consistent.

The file_in_text function maintains the same interface while following the new naming convention.

src/file.c (2)

15-16: Function signature update is correct.

The addition of bool const text parameter enables conditional null termination for text files while maintaining efficiency for binary files.


63-64: Conditional null termination is correctly implemented.

The null termination only occurs when text is true, preventing buffer overrun and maintaining binary data integrity when text is false.

src/file.h (3)

23-24: Function declaration matches implementation.

The file_in signature with bool const text parameter is consistent with the implementation in src/file.c.


30-30: Inline function rename maintains consistency.

The file_in_error function preserves original logic while following the new naming convention.


36-36: Text accessor function correctly renamed.

The file_in_text function maintains the same interface and null-safety logic as the original file_text.

src/debruijn25.c Outdated
Comment on lines 41 to 44
#define foreach_prop(it, tab) \
for (struct prop *o_ = (it), *it_ = o_; \
it_; (it) = next_prop((tab), it_), \
it_ = (it) != o_ ? (it) : NULL)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Iteration macro hides state – favour an inline helper

foreach_prop injects two hidden variables and rewrites the user-supplied iterator each pass.
This obscures control-flow and double-evaluation hazards lurk if tab has side-effects.

Refactor to a static inline iterator helper; the earlier suggestion under the same line numbers still applies.

🤖 Prompt for AI Agents
In src/debruijn25.c around lines 41 to 44, the foreach_prop macro hides internal
state by injecting hidden variables and modifies the user iterator each loop,
which obscures control flow and risks double evaluation if tab has side effects.
Replace this macro with a static inline function that takes the current iterator
and table as arguments and returns the next iterator, preserving state
explicitly and avoiding side effects. Update all foreach_prop usages to use this
inline helper function in a standard loop construct.

src/file.c Outdated

ret.size = (size_t)s.st_size;
ret.data = malloc(ret.size + 1U);
ret.data = malloc(ret.size + text);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Clever allocation optimization using bool as integer.

The expression ret.size + text exploits implicit bool-to-int conversion (0 or 1) to allocate exactly the needed space. Consider adding a comment to clarify this optimization for maintainability.

-			ret.data = malloc(ret.size + text);
+			ret.data = malloc(ret.size + text); /* +1 for null terminator if text mode */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ret.data = malloc(ret.size + text);
ret.data = malloc(ret.size + text); /* +1 for null terminator if text mode */
🤖 Prompt for AI Agents
In src/file.c at line 48, the malloc call uses the expression ret.size + text,
relying on implicit bool-to-int conversion to allocate the correct amount of
memory. Add a clear comment above this line explaining that the boolean variable
'text' is implicitly converted to 0 or 1 to optimize the allocation size,
improving code readability and maintainability.

Comment on lines +55 to +68
# define bs_(f,x) _Generic((x) \
, uint32_t: _BitScan##f \
, uint64_t: _BitScan##f##64)

pragma_msvc(intrinsic(_BitScanForward))
pragma_msvc(intrinsic(_BitScanReverse))
pragma_msvc(intrinsic(_BitScanForward64))
pragma_msvc(intrinsic(_BitScanReverse64))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

MSVC 32-bit build will miss _BitScan*64 intrinsics

_BitScanForward64 / _BitScanReverse64 only exist when building for x64 with MSVC.
Compiling this header in a 32-bit Windows configuration (_MSC_VER defined, but _M_X64 / _WIN64 not defined) will therefore fail at the bs_ dispatch.

A defensive fix is to gate the 64-bit intrinsics behind #if defined(_WIN64) and provide a portable fallback (e.g. two 32-bit scans or a loop) for 32-bit MSVC builds.

🤖 Prompt for AI Agents
In src/bits.h around lines 55 to 62, the use of _BitScanForward64 and
_BitScanReverse64 intrinsics is not guarded for 32-bit MSVC builds, causing
compilation failures. Fix this by wrapping the 64-bit intrinsic definitions and
their _Generic dispatch cases inside a #if defined(_WIN64) preprocessor check,
and implement a portable fallback (such as combining two 32-bit scans or a loop)
for 32-bit MSVC builds to handle 64-bit values safely.

Comment on lines +100 to +152
static force_inline unsigned
u32_count_msb_1 (uint32_t x)
{
return_run1(msb, x);
}

static force_inline unsigned
u32_count_msb_0 (uint32_t x)
{
return_run0(msb, x);
}

static force_inline unsigned
u32_count_lsb_1 (uint32_t x)
{
return_run1(lsb, x);
}

static force_inline unsigned
u32_count_lsb_0 (uint32_t x)
{
return_run0(lsb, x);
}

static force_inline unsigned
u64_count_msb_1 (uint64_t x)
{
return_run1(msb, x);
}

static force_inline unsigned
u64_count_msb_0 (uint64_t x)
{
return_run0(msb, x);
}

static force_inline unsigned
u64_count_lsb_1 (uint64_t x)
{
return_run1(lsb, x);
}

static force_inline unsigned
u64_count_lsb_0 (uint64_t x)
{
return_run0(lsb, x);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Return type can be uint8_t for clarity

All helpers return a value in the closed interval [0, 64].
Using plain unsigned is fine, but uint8_t (or unsigned char) advertises the true range, saves 24/56 bits per call site in registers, and avoids sign-promotion surprises when the result is combined with other 8/16-bit values later.

If ABI stability is not yet frozen:

-static force_inline unsigned
+static force_inline uint8_t
 u64_count_msb_1 (uint64_t x)

(and likewise for the remaining seven helpers).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static force_inline unsigned
u32_count_msb_1 (uint32_t x)
{
return_run1(msb, x);
}
static force_inline unsigned
u32_count_msb_0 (uint32_t x)
{
return_run0(msb, x);
}
static force_inline unsigned
u32_count_lsb_1 (uint32_t x)
{
return_run1(lsb, x);
}
static force_inline unsigned
u32_count_lsb_0 (uint32_t x)
{
return_run0(lsb, x);
}
static force_inline unsigned
u64_count_msb_1 (uint64_t x)
{
return_run1(msb, x);
}
static force_inline unsigned
u64_count_msb_0 (uint64_t x)
{
return_run0(msb, x);
}
static force_inline unsigned
u64_count_lsb_1 (uint64_t x)
{
return_run1(lsb, x);
}
static force_inline unsigned
u64_count_lsb_0 (uint64_t x)
{
return_run0(lsb, x);
}
static force_inline uint8_t
u64_count_msb_1 (uint64_t x)
{
return_run1(msb, x);
}
🤖 Prompt for AI Agents
In src/bits.h between lines 100 and 146, the return type of the functions
u32_count_msb_1, u32_count_msb_0, u32_count_lsb_1, u32_count_lsb_0,
u64_count_msb_1, u64_count_msb_0, u64_count_lsb_1, and u64_count_lsb_0 should be
changed from unsigned to uint8_t to better reflect the actual range of returned
values (0 to 64), reduce register usage, and prevent sign-promotion issues.
Update the return type declarations accordingly for all these functions.

Comment on lines +65 to +98
# define return_run1(s,x) \
return (max_hamming(x) \
? (unsigned)CHAR_BIT* \
(unsigned)sizeof(x) \
: (unsigned)cz_##s(x) \
(cmpl(x)))

# define return_run0(s,x) \
return (!(x) \
? (unsigned)CHAR_BIT* \
(unsigned)sizeof(x) \
: (unsigned)cz_##s(x)(x))

# define max_hamming(x) ( \
(x) == _Generic( \
(x), unsigned long: ~0UL, \
long: ~0L, unsigned: ~0U, \
int: ~0, long long: ~0LL, \
unsigned long long: ~0ULL))

# define cz_msb(x) cz_(l, x)
# define cz_lsb(x) cz_(t, x)

# define cz_(y, x) _Generic((x) \
,long long int:__builtin_c##y##zll, unsigned:__builtin_c##y##z\
,long int:__builtin_c##y##zl, unsigned long:__builtin_c##y##zl\
,int:__builtin_c##y##z, unsigned long long:__builtin_c##y##zll)
#endif // _MSC_VER
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

cz_ mapping omits several common integer types

cz_ relies on an exhaustive _Generic list.
Types such as uint8_t, uint16_t, size_t, and unsigned char are silently coerced, leading to promotion surprises or use of the wrong builtin. Clang also warns if the selected builtin does not match the operand width.

Consider:

-# define cz_(y, x) _Generic((x) \
-  ,long long int:__builtin_c##y##zll, unsigned:__builtin_c##y##z \
-  ,long int:__builtin_c##y##zl, unsigned long:__builtin_c##y##zl \
-  ,int:__builtin_c##y##z, unsigned long long:__builtin_c##y##zll)
+# define cz_(y, x) _Generic((x) \
+  ,unsigned long long: __builtin_c##y##zll \
+  ,long long          : __builtin_c##y##zll \
+  ,unsigned long      : __builtin_c##y##zl  \
+  ,long               : __builtin_c##y##zl  \
+  ,unsigned int       : __builtin_c##y##z   \
+  ,int                : __builtin_c##y##z   \
+  ,unsigned short     : __builtin_c##y##z   \
+  ,short              : __builtin_c##y##z   \
+  ,unsigned char      : __builtin_c##y##z   \
+  ,char               : __builtin_c##y##z   \
+  ,default            : __builtin_c##y##z   )

This keeps the mapping predictable and warning-free.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# define return_run1(s,x) \
return (max_hamming(x) \
? (unsigned)CHAR_BIT* \
(unsigned)sizeof(x) \
: (unsigned)cz_##s(x) \
(cmpl(x)))
# define return_run0(s,x) \
return (!(x) \
? (unsigned)CHAR_BIT* \
(unsigned)sizeof(x) \
: (unsigned)cz_##s(x)(x))
# define max_hamming(x) ( \
(x) == _Generic( \
(x), unsigned long: ~0UL, \
long: ~0L, unsigned: ~0U, \
int: ~0, long long: ~0LL, \
unsigned long long: ~0ULL))
# define cz_msb(x) cz_(l, x)
# define cz_lsb(x) cz_(t, x)
# define cz_(y, x) _Generic((x) \
,long long int:__builtin_c##y##zll, unsigned:__builtin_c##y##z\
,long int:__builtin_c##y##zl, unsigned long:__builtin_c##y##zl\
,int:__builtin_c##y##z, unsigned long long:__builtin_c##y##zll)
#endif // _MSC_VER
#define cz_msb(x) cz_(l, x)
#define cz_lsb(x) cz_(t, x)
-#define cz_(y, x) _Generic((x) \
- ,long long int:__builtin_c##y##zll, unsigned:__builtin_c##y##z\
- ,long int:__builtin_c##y##zl, unsigned long:__builtin_c##y##zl\
- ,int:__builtin_c##y##z, unsigned long long:__builtin_c##y##zll)
+#define cz_(y, x) _Generic((x) \
+ ,unsigned long long: __builtin_c##y##zll \
+ ,long long : __builtin_c##y##zll \
+ ,unsigned long : __builtin_c##y##zl \
+ ,long : __builtin_c##y##zl \
+ ,unsigned int : __builtin_c##y##z \
+ ,int : __builtin_c##y##z \
+ ,unsigned short : __builtin_c##y##z \
+ ,short : __builtin_c##y##z \
+ ,unsigned char : __builtin_c##y##z \
+ ,char : __builtin_c##y##z \
+ ,default : __builtin_c##y##z )
🤖 Prompt for AI Agents
In src/bits.h between lines 65 and 92, the cz_ macro's _Generic mapping omits
common integer types like uint8_t, uint16_t, size_t, and unsigned char, causing
silent type coercion and potential warnings. To fix this, explicitly add these
missing types to the _Generic list in cz_ with the correct corresponding
builtins to ensure proper type matching and avoid promotion issues or compiler
warnings.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/debruijn25.c (4)

17-25: 🧹 Nitpick (assertive)

const_inline defined before force_inline (again)

The MSVC branch still references force_inline before it is defined.
Same nitpick raised earlier – please just flip the two lines.

-#  define const_inline force_inline
-#  define force_inline __forceinline
+#  define force_inline __forceinline
+#  define const_inline force_inline

27-37: 🛠️ Refactor suggestion

Un-guarded _BitInt will break on non-C23 toolchains

Previous review asked for a feature probe. Nothing changed; any compiler
without _BitInt support still dies.

-#else
-typedef unsigned _BitInt(5)  uint5;
-typedef unsigned _BitInt(16) u16bit;
-typedef unsigned _BitInt(32) u32bit;
-typedef unsigned _BitInt(64) u64bit;
+#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
+typedef unsigned _BitInt(5)  uint5;
+typedef unsigned _BitInt(16) u16bit;
+typedef unsigned _BitInt(32) u32bit;
+typedef unsigned _BitInt(64) u64bit;
+#else
+typedef uint8_t               uint5;
+typedef uint16_t              u16bit;
+typedef uint32_t              u32bit;
+typedef uint64_t              u64bit;
 #endif

133-144: 🧹 Nitpick (assertive)

u10x3_append still uses the “carry-bit trick”

Incrementing count by relying on arithmetic carry is opaque.
Explicit is clearer and costs nothing.

-    n = n * 10U + 2U;
-    dst->bits += ((uint32_t)hi5 << 5U | lo5) << n
-               | UINT32_C(1);
+    n = n * 10U + 2U;
+    dst->bits |= ((uint32_t)hi5 << 5U | lo5) << n;
+    dst->count++;

455-456: ⚠️ Potential issue

Shift-by-32 UB resurrected

mask <<= 1 can be executed when mask already has bit 31 set; the next
left-shift is by 32 → undefined behaviour.

Reuse the fix previously proposed:

-        for (; (seen & mask); ++off, mask <<= 1);
+        while (seen & mask) {
+            if (++off == 32)
+                return (struct b25_info){0};
+            mask <<= 1;      /* shift by 1–31 only */
+        }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5a1228 and 9dbdd5e.

📒 Files selected for processing (1)
  • src/debruijn25.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn25.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (1)
src/debruijn25.c (1)

1005-1016: Potential out-of-bounds when filling ptr[]

mask[i] might be zero, leading to n scanning all 32 bits and writing
into every slot of ptr. That’s fine, but k0 and k1 wrap with & 15
while they are incremented up to 32 → duplicates. You will read past the
ptr array of 32 literals on the second wrap.

Guard the inner loop with the population count or modulo by 32.

Comment on lines 966 to 305
uint64_t *b26 = nullptr;
if (c == 2) {
if (!(b26 = b26_load(v[1])))
return 1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

nullptr is C++, not C

uint64_t *b26 = nullptr; will not compile as C.

-    uint64_t *b26 = nullptr;
+    uint64_t *b26 = NULL;

Also, if b26_load() succeeded, the buffer is memory-mapped and should be
released via file_in_fini, not free. Using the wrong de-allocator is
undefined.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uint64_t *b26 = nullptr;
if (c == 2) {
if (!(b26 = b26_load(v[1])))
return 1;
}
uint64_t *b26 = NULL;
if (c == 2) {
if (!(b26 = b26_load(v[1])))
return 1;
}
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 966 to 970, replace the C++ nullptr with the C
equivalent NULL for pointer initialization. Also, ensure that when releasing the
buffer loaded by b26_load(), use file_in_fini() instead of free() to properly
unmap the memory and avoid undefined behavior.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (5)
src/bits.h (3)

55-58: Still unguarded _BitScan*64 intrinsics

The 64-bit intrinsics are emitted unconditionally:

#define bs_(f,x) _Generic((x), \
    uint32_t:_BitScan##f,      \
    uint64_t:_BitScan##f##64)

On 32-bit MSVC (_MSC_VER defined, but _WIN64 not), _BitScanForward64 / Reverse64 do not exist and compilation fails exactly as flagged in the previous review.

Add #if defined(_WIN64) around the uint64_t mapping or provide a portable fallback.
No change since last review.


88-92: cz_ macro still omits common integer types

The _Generic list hasn’t been expanded; uint8_t, uint16_t, size_t, etc., still fall through the default case and may hit width/diagnostic issues on Clang/GCC.

Previous suggestion stands.


100-146: Return type can be uint8_t

All eight bit-count helpers return 0–32/64; unsigned is needlessly wide.
Using uint8_t advertises intent and avoids promotion surprises.

Same note as the last round.

src/b25.h (1)

86-97: u10x3_append still hides the counter bump inside an add-with-carry

Identical concern was raised earlier; the arithmetic trick obscures the
intent. Please adopt the explicit counter update that was suggested
previously.

src/debruijn25.c (1)

312-315: strtoul result still unchecked

The earlier review about ignoring errno, leftover characters and range
overflow is still unaddressed. See prior suggestion for a safe
conversion wrapper.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9dbdd5e and 8d8dd4f.

📒 Files selected for processing (5)
  • src/b25.c (1 hunks)
  • src/b25.h (1 hunks)
  • src/bits.h (1 hunks)
  • src/debruijn25.c (1 hunks)
  • src/util.h (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/util.h
  • src/b25.c
  • src/b25.h
  • src/bits.h
  • src/debruijn25.c
🧬 Code Graph Analysis (2)
src/b25.c (1)
src/b25.h (17)
  • void (86-97)
  • b25_div_offset (71-78)
  • b25_div_offset (72-72)
  • b25_div_mask (80-84)
  • b25_div_mask (80-81)
  • b25_info_get_num_cycles (118-122)
  • b25_info_get_num_cycles (119-119)
  • u10x3_append (87-89)
  • b25_info_get_cycle (125-126)
  • u10x3_count (50-54)
  • u10x3_count (50-51)
  • b25_info (153-153)
  • b25_info_print (162-163)
  • b25_id_to_u16 (146-150)
  • b25_id_to_u16 (146-147)
  • b25_info_get_num_period_cycles (128-135)
  • b25_info_get_num_period_cycles (129-130)
src/debruijn25.c (4)
src/b25.h (7)
  • struct (137-144)
  • struct (155-159)
  • void (86-97)
  • b25_info_by_id (156-156)
  • b25_id (138-138)
  • b25_info_print (162-163)
  • b25_info (153-153)
src/file.h (4)
  • char (35-39)
  • file_in (23-24)
  • file_in_error (30-30)
  • file_in_fini (27-27)
src/b25.c (3)
  • void (13-25)
  • b25_info_print (234-260)
  • b25_info (205-232)
src/file.c (2)
  • file_in (14-72)
  • file_in_fini (74-82)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (2)
src/bits.h (1)

206-216: rol/ror depend on the problematic uint_type macro

Because uint_type currently fails on MSVC (see util.h comment), every call to rol / ror will also break.
Once uint_type is fixed or gated, these selectors will compile again.

src/b25.c (1)

217-218: Check that ror32 is visible here

b25.c uses ror32 but only includes b25.h.
If b25.h doesn’t transitively include bits.h, this will be an implicit-declaration/undefined-symbol error.

Please confirm that bits.h is reachable, or add an explicit include:

#include "bits.h"

Comment on lines +103 to +108
#define uint_type(x) typeof(_Generic(*(typeof(x) *)(void *)0, \
signed char: (unsigned char)0, unsigned char: (unsigned char)0, \
signed short: (unsigned short)0, unsigned short: (unsigned short)0, \
signed int: 0U, unsigned: 0U, signed long: 0UL, unsigned long: 0UL, \
signed long long: 0ULL, unsigned long long: 0ULL))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

uint_type breaks MSVC builds

uint_type(x) relies on GNU typeof and C11 _Generic, neither of which are available in cl.exe.
Because the macro is defined outside the #ifndef _MSC_VER block, every MSVC translation unit will hit an undefined identifier and fail to compile.

+#ifdef _MSC_VER
+#  define uint_type(x) unsigned /* graceful fallback – pick _Unsigned_long_long etc. if needed */
+#else
 #define uint_type(x) typeof(_Generic(*(typeof(x) *)(void *)0,               \
         signed char: (unsigned char)0, unsigned char: (unsigned char)0,     \
         signed short: (unsigned short)0, unsigned short: (unsigned short)0, \
         signed int: 0U, unsigned: 0U, signed long: 0UL, unsigned long: 0UL, \
         signed long long: 0ULL, unsigned long long: 0ULL))
+#endif

Without a guard (or a __has_extension check) the whole JSON-support build is unusable on Windows.
Please add a fallback or move the definition under the existing #ifndef _MSC_VER branch.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#define uint_type(x) typeof(_Generic(*(typeof(x) *)(void *)0, \
signed char: (unsigned char)0, unsigned char: (unsigned char)0, \
signed short: (unsigned short)0, unsigned short: (unsigned short)0, \
signed int: 0U, unsigned: 0U, signed long: 0UL, unsigned long: 0UL, \
signed long long: 0ULL, unsigned long long: 0ULL))
#ifdef _MSC_VER
# define uint_type(x) unsigned /* graceful fallback – pick _Unsigned_long_long etc. if needed */
#else
#define uint_type(x) typeof(_Generic(*(typeof(x) *)(void *)0, \
signed char: (unsigned char)0, unsigned char: (unsigned char)0, \
signed short: (unsigned short)0, unsigned short: (unsigned short)0, \
signed int: 0U, unsigned: 0U, signed long: 0UL, unsigned long: 0UL, \
signed long long: 0ULL, unsigned long long: 0ULL))
#endif
🤖 Prompt for AI Agents
In src/util.h around lines 103 to 108, the uint_type macro uses GNU typeof and
C11 _Generic, which are unsupported by MSVC, causing build failures. To fix
this, move the entire uint_type macro definition inside the existing #ifndef
_MSC_VER conditional block or add a guard to exclude it when compiling with
MSVC. Alternatively, provide a fallback definition compatible with MSVC to
prevent undefined identifier errors and ensure successful compilation on
Windows.

#else
# define BITINT_C(n) n##WB

typedef unsigned _BitInt(5) uint5;
#define uint5(x) (uint5)(x)
#define uint5_get(x) x

typedef unsigned _BitInt(16) u16bit;
typedef unsigned _BitInt(32) u32bit;
typedef unsigned _BitInt(64) u64bit;
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

_BitInt fallback needs a real feature-test

Every non-MSVC compiler is forced down the _BitInt path, even if it
doesn’t support the C23 feature. Older GCC, Clang and almost every
embedded tool-chain will choke here. Gate the _BitInt typedefs behind
__STDC_VERSION__ >= 202311L (or a dedicated autoconf probe) and fall
back to ordinary fixed-width types otherwise.

-#else
-# define BITINT_C(n) n##WB
-
-typedef unsigned _BitInt(5) uint5;
+#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
+# define BITINT_C(n) n##WB
+
+typedef unsigned _BitInt(5)  uint5;
 typedef unsigned _BitInt(16) u16bit;
 typedef unsigned _BitInt(32) u32bit;
 typedef unsigned _BitInt(64) u64bit;
+#else        /* pre-C23 compiler without _BitInt */
+# define BITINT_C(n) n
+typedef uint8_t  uint5;   /* values 0–31 fit in 8 bits */
+typedef uint16_t u16bit;
+typedef uint32_t u32bit;
+typedef uint64_t u64bit;
 #endif
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#else
# define BITINT_C(n) n##WB
typedef unsigned _BitInt(5) uint5;
#define uint5(x) (uint5)(x)
#define uint5_get(x) x
typedef unsigned _BitInt(16) u16bit;
typedef unsigned _BitInt(32) u32bit;
typedef unsigned _BitInt(64) u64bit;
#endif
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define BITINT_C(n) n##WB
typedef unsigned _BitInt(5) uint5;
#define uint5(x) (uint5)(x)
#define uint5_get(x) x
typedef unsigned _BitInt(16) u16bit;
typedef unsigned _BitInt(32) u32bit;
typedef unsigned _BitInt(64) u64bit;
#else /* pre-C23 compiler without _BitInt */
# define BITINT_C(n) n
typedef uint8_t uint5; /* values 0–31 fit in 8 bits */
typedef uint16_t u16bit;
typedef uint32_t u32bit;
typedef uint64_t u64bit;
#endif
🤖 Prompt for AI Agents
In src/b25.h around lines 25 to 35, the typedefs using _BitInt are currently
enabled for all non-MSVC compilers without checking if the compiler supports the
C23 _BitInt feature. To fix this, add a feature test that checks if
__STDC_VERSION__ is defined and is at least 202311L before defining the _BitInt
typedefs. If the feature test fails, fall back to using ordinary fixed-width
integer types like uint8_t, uint16_t, uint32_t, and uint64_t instead. This will
prevent compilation errors on older GCC, Clang, and embedded toolchains that do
not support _BitInt.

Comment on lines +13 to +16
#define foreach_prop(it, tab) \
for (struct prop *o_ = (it), *it_ = o_; \
it_; (it) = next_prop((tab), it_), \
it_ = (it) != o_ ? (it) : nullptr)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

nullptr is C++, not C – won’t compile

This C translation unit uses the C++ constant nullptr three times:

  • line 16 foreach_prop macro
  • line 221 bag_next
  • line 307 alt_main

Replace with NULL.

-             it_ = (it) != o_ ? (it) : nullptr)
+             it_ = (it) != o_ ? (it) : NULL)

-return i == bag->head[second] ? nullptr : &bag->ptr[i];
+return i == bag->head[second] ? NULL : &bag->ptr[i];

-    uint64_t *b26 = nullptr;
+    uint64_t *b26 = NULL;

Also applies to: 217-222, 307-311

🤖 Prompt for AI Agents
In src/debruijn25.c around lines 13 to 17, 217 to 222, and 307 to 311, the code
uses the C++ keyword nullptr, which is invalid in C and causes compilation
errors. Replace all instances of nullptr with the C equivalent NULL to ensure
compatibility and successful compilation.

Comment on lines +228 to +245
char const **i0 = bag_iter(a, 0);
char const **i1 = bag_iter(b, 1);
uint64_t u64 = 0U;
size_t bs = 64U;
do {
unsigned char const *ucp[] = {
(unsigned char const *)*i0,
(unsigned char const *)*i1
};
size_t len = ucp[0][0];
bs -= len;
len *= ucp[0][1] == (unsigned char)'1';
u64 |= (UINT64_C(1) << len) - 1U << bs;
bs -= (len = ucp[1][0]);
len *= ucp[1][1] == (unsigned char)'1';
u64 |= (UINT64_C(1) << len) - 1U << bs;
} while ((i0 = bag_next(a, 0)) &&
(i1 = bag_next(b, 1)));

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

bag_gen can under-flow bs ⇒ shift by ≥64 → UB

bs starts at 64 and is decremented twice per loop iteration.
After processing 33 elements it wraps, and

u64 |= (UINT64_C(1) << len) - 1U << bs;

invokes undefined behaviour by shifting ≥ 64 bits.

Add a termination check (or clamp len) before the second subtraction:

     bs -= len;
+    if (bs == 0)
+        break;
     len *= ucp[0][1] == (unsigned char)'1';
     u64 |= (UINT64_C(1) << len) - 1U << bs;
     bs -= (len = ucp[1][0]);
+    if (bs == 0)
+        break;

or pre-compute the total and perform at most 64 bit-writes.

🤖 Prompt for AI Agents
In src/debruijn25.c around lines 228 to 246, the variable bs is decremented
twice per loop iteration without checking if it goes below zero, causing a shift
by 64 or more bits which is undefined behavior. To fix this, add a termination
condition or clamp the value of len before the second subtraction to ensure bs
never underflows below zero. Alternatively, pre-calculate the total bits to
write and limit the writes to at most 64 bits to prevent invalid shifts.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

♻️ Duplicate comments (11)
src/bits.h (3)

61-68: 64-bit MSVC intrinsics still unguarded for 32-bit builds

Previous review already covered this. Nothing changed: _BitScanForward64 / _BitScanReverse64 are still selected through _Generic even when _WIN64 is not defined, so 32-bit MSVC trips over missing intrinsics.

Gate the mapping with #if defined(_WIN64) or provide a fallback.


94-98: cz_ _Generic mapping still misses common integer types

The helper is silent for uint8_t, uint16_t, size_t, unsigned char, etc., leading to promotions or wrong built-in selection. See the earlier review for the full patch suggestion.


106-152: Return type should be uint8_t, not unsigned

All eight helpers return a value in [0, 64]. Using a 32-bit unsigned wastes registers and invites accidental signed/unsigned promotion.

-static force_inline unsigned
+static force_inline uint8_t
 u32_count_msb_1 (uint32_t x)

(Apply the same change to the other seven helpers.)

src/b25.h (1)

20-25: _BitInt usage needs proper feature detection

The code unconditionally uses C23 _BitInt types for all non-MSVC compilers. Most compilers don't support this yet and will fail to compile.

src/debruijn25.c (7)

15-15: C++ nullptr won't compile in C

Replace nullptr with NULL for C compatibility.


220-220: C++ nullptr won't compile in C

Replace nullptr with NULL for C compatibility.


297-297: C++ nullptr won't compile in C

Replace nullptr with NULL for C compatibility.


302-302: Missing error handling in strtoul

Input validation is missing. Values exceeding 16-bit range will silently truncate.


347-347: Wrong deallocation function

b26 was loaded with b26_load() which memory-maps the file. Use file_in_fini() instead of free().


88-114: Dead code after early return

Everything after line 88's return statement is unreachable. Either remove it or wrap in #if 0.


227-245: Potential underflow in bag_gen

bs starts at 64 and is decremented twice per iteration. After 33 elements it wraps, causing undefined behavior from shifting by ≥64 bits.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d8dd4f and 37ccfd3.

📒 Files selected for processing (8)
  • src/b25.h (1 hunks)
  • src/b26.c (1 hunks)
  • src/b26.h (1 hunks)
  • src/bits.h (1 hunks)
  • src/debruijn25.c (1 hunks)
  • src/file.c (2 hunks)
  • src/file.h (1 hunks)
  • src/test.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/test.c
  • src/file.c
  • src/file.h
  • src/b26.h
  • src/b26.c
  • src/bits.h
  • src/debruijn25.c
  • src/b25.h
🧬 Code Graph Analysis (3)
src/file.c (1)
src/file.h (1)
  • file_in (27-30)
src/b26.h (1)
src/b25.h (2)
  • struct (127-134)
  • struct (145-149)
src/debruijn25.c (4)
src/b26.h (3)
  • struct (32-39)
  • b26_identify (59-61)
  • b26_load (48-48)
src/b25.h (7)
  • struct (127-134)
  • struct (145-149)
  • void (76-87)
  • b25_info_by_id (146-146)
  • b25_id (128-128)
  • b25_info_print (152-153)
  • b25_info (143-143)
src/file.h (1)
  • char (41-45)
src/b26.c (2)
  • b26_identify (99-116)
  • b26_load (12-24)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (3)
src/file.c (3)

57-58: "rbem" is a glibc-only mode string – breaks on non-glibc systems

e (close-on-exec) is widely supported, but m (memory-mapped I/O) is a GNU extension that fails on musl, BSD libc, MSVC… Use feature-test macros or drop the flag.

-FILE *fp = fopen(path, "rbem");
+FILE *fp = fopen(path, "rbe"); /* drop 'm' or add a #ifdef __GLIBC__ guard */

121-131: Write past end of user-supplied buffer when size == len

When caller supplies their own buffer and size == len, the FILE_IN_TEXT branch appends a terminator unconditionally, overflowing by one byte. Guard with size > len.

-            case FILE_IN_TEXT:
-                if (size > len)
-                    ret.data[ret.size] = 0;
+            case FILE_IN_TEXT:
+                if (size > len)
+                    ret.data[ret.size] = 0;

(Current code already has the guard – keep it that way, this line just flags the importance.)


145-151: 🧹 Nitpick (assertive)

file_in_fini misses FILE_IN_EXACT flag reset

You clear FILE_IN_ALLOC, but leaving FILE_IN_EXACT set could mislead later debug prints. Suggest zeroing all flag bits.

-        f->flags = 0;
+        f->flags = 0;

Likely an incorrect or invalid review comment.

Comment on lines +14 to +18
#ifndef _MSC_VER
# define BITINT_C(n) n##WB
#else
# define BITINT_C(n) n
#endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

BITINT_C concatenates an unknown literal suffix – compilation will fail

The suffix WB is not a standard integer-constant suffix on any C/C++ compiler I know of.
Expanding BITINT_C(42) currently produces 42WB, which is an invalid token and will stop the build long before the rest of the header becomes useful.

-# define BITINT_C(n) n##WB
+# define BITINT_C(n) n##ULL   /* or UL / U depending on the intended width */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#ifndef _MSC_VER
# define BITINT_C(n) n##WB
#else
# define BITINT_C(n) n
#endif
#ifndef _MSC_VER
# define BITINT_C(n) n##ULL /* or UL / U depending on the intended width */
#else
# define BITINT_C(n) n
#endif
🤖 Prompt for AI Agents
In src/bits.h around lines 14 to 18, the macro BITINT_C appends a non-standard
suffix 'WB' to integer literals, causing compilation errors. Remove the 'WB'
suffix from the macro definition so that BITINT_C(n) simply expands to n on all
platforms, ensuring valid integer literals and successful compilation.

Comment on lines 27 to +35
for (int i = 0; ++i < argc;) {
struct file_in f = file_read(argv[i]);
int e = file_error(&f);
struct file_in f = file_in(argv[i], nullptr, 0, FILE_IN_TEXT);
int e = file_in_error(&f);
if (e) {
pr_errno(e, "file_read");
ret = EXIT_FAILURE;
continue;
}
char const *txt = file_text(&f);
char const *txt = file_in_text(&f);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potentially misleading log message and non-portable nullptr

  1. pr_errno(e, "file_read"); still references the old API name. Use "file_in" (or simply the path) to avoid confusion.
  2. nullptr is a C++ keyword. Unless compat.h defines it, plain C builds will choke. Prefer NULL.
-        struct file_in f = file_in(argv[i], nullptr, 0, FILE_IN_TEXT);
+        struct file_in f = file_in(argv[i], NULL, 0, FILE_IN_TEXT);
...
-            pr_errno(e, "file_read");
+            pr_errno(e, "file_in");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (int i = 0; ++i < argc;) {
struct file_in f = file_read(argv[i]);
int e = file_error(&f);
struct file_in f = file_in(argv[i], nullptr, 0, FILE_IN_TEXT);
int e = file_in_error(&f);
if (e) {
pr_errno(e, "file_read");
ret = EXIT_FAILURE;
continue;
}
char const *txt = file_text(&f);
char const *txt = file_in_text(&f);
for (int i = 0; ++i < argc;) {
- struct file_in f = file_in(argv[i], nullptr, 0, FILE_IN_TEXT);
+ struct file_in f = file_in(argv[i], NULL, 0, FILE_IN_TEXT);
int e = file_in_error(&f);
if (e) {
- pr_errno(e, "file_read");
+ pr_errno(e, "file_in");
ret = EXIT_FAILURE;
continue;
}
char const *txt = file_in_text(&f);
🤖 Prompt for AI Agents
In src/test.c around lines 27 to 35, update the error log message in pr_errno
from "file_read" to "file_in" to reflect the correct API name and avoid
confusion. Also, replace the C++ keyword nullptr with the C standard NULL to
ensure compatibility with plain C builds unless nullptr is explicitly defined in
compat.h.

Comment on lines +95 to +101
if (flags & FILE_IN_ALLOC) {
dest = malloc(len + !!(flags & FILE_IN_TEXT));
if (!dest) {
ret.error = errno;
break;
}
} else if (len > size) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Integer-overflow guard missing on allocation size

len + !!(flags & FILE_IN_TEXT) can wrap for files near SIZE_MAX. Check before malloc.

-            if (flags & FILE_IN_ALLOC) {
-                dest = malloc(len + !!(flags & FILE_IN_TEXT));
+            if (flags & FILE_IN_ALLOC) {
+                if (len > SIZE_MAX - 1) { ret.error = EOVERFLOW; break; }
+                dest = malloc(len + !!(flags & FILE_IN_TEXT));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (flags & FILE_IN_ALLOC) {
dest = malloc(len + !!(flags & FILE_IN_TEXT));
if (!dest) {
ret.error = errno;
break;
}
} else if (len > size) {
if (flags & FILE_IN_ALLOC) {
if (len > SIZE_MAX - 1) {
ret.error = EOVERFLOW;
break;
}
dest = malloc(len + !!(flags & FILE_IN_TEXT));
if (!dest) {
ret.error = errno;
break;
}
} else if (len > size) {
🤖 Prompt for AI Agents
In src/file.c around lines 95 to 101, the allocation size calculation for malloc
does not guard against integer overflow when adding len and the boolean
expression. Before calling malloc, add a check to ensure that len plus the
additional byte does not exceed the maximum size_t value to prevent wraparound
and potential security issues. If the check fails, handle the error
appropriately without calling malloc.

Comment on lines 69 to 76
struct stat s = {0};
ret.ec[0] = fileno(fp);
ret.error = fileno(fp);

ret.ec[0] = ret.ec[0] < 0 || fstat(ret.ec[0], &s);
if (ret.ec[0]) {
ret.ec[0] = errno;
ret.error = ret.error < 0 || fstat(ret.error, &s);
if (ret.error) {
ret.error = errno;
break;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

ret.error used as a scratch variable – loses the original errno

ret.error = fileno(fp); ret.error = ret.error < 0 || fstat(..) mixes file-descriptor storage and boolean status.
Use a dedicated local (int fd) for clarity and to keep ret.error strictly an error code.

-            struct stat s = {0};
-            ret.error = fileno(fp);
-            ret.error = ret.error < 0 || fstat(ret.error, &s);
+            struct stat s = {0};
+            int fd = fileno(fp);
+            if (fd < 0 || fstat(fd, &s)) {
+                    ret.error = errno;
+                    break;
+            }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/file.c around lines 69 to 76, the variable ret.error is incorrectly used
both to store the file descriptor and as a boolean status, which causes the
original error code to be lost. Introduce a separate local integer variable,
such as fd, to store the result of fileno(fp). Then use fd for the fstat call
and keep ret.error exclusively for error codes. This will preserve the original
errno and improve code clarity.

Comment on lines 35 to 44
static force_inline int
file_error (struct file_in const *f)
file_in_error (struct file_in const *f)
{
return f && !f->data ? f->ec[0] : 0;
return f && !f->data ? f->error : 0;
}

static force_inline char const *
file_text (struct file_in const *f)
file_in_text (struct file_in const *f)
{
return f && f->data ? (char const *)f->data : "";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

file_in_error() silently hides misuse
Returning 0 when the caller passes a null pointer makes it impossible to catch obvious misuse at runtime. Consider:

-return f && !f->data ? f->error : 0;
+if (!f)
+    return EINVAL;   /* or another sentinel */
+return f->data ? 0 : f->error;

This preserves the “0 means OK” convention while still signalling “you passed me a NULL”.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static force_inline int
file_error (struct file_in const *f)
file_in_error (struct file_in const *f)
{
return f && !f->data ? f->ec[0] : 0;
return f && !f->data ? f->error : 0;
}
static force_inline char const *
file_text (struct file_in const *f)
file_in_text (struct file_in const *f)
{
return f && f->data ? (char const *)f->data : "";
static force_inline int
file_in_error (struct file_in const *f)
{
if (!f)
return EINVAL; /* or another sentinel */
return f->data ? 0 : f->error;
}
🤖 Prompt for AI Agents
In src/file.h around lines 35 to 44, the function file_in_error returns 0 when
passed a null pointer, which hides misuse and makes it hard to detect errors.
Modify the function to explicitly signal an error or misuse when the input
pointer is null instead of returning 0, preserving the "0 means OK" convention
but allowing detection of null pointer misuse at runtime.

Comment on lines +46 to +39
#if UINTPTR_MAX >= UINT64_MAX
void const *k = (void const *)seq;
#else
void const *k = (void const *)&seq;
#endif
uint64_t const *ret = bsearch(k, arr, (size_t)67108864U,
sizeof *arr, b26_cmp);
if (!ret)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Binary-search count is a magic number
67108864U appears three times. Replacing it with B26_TABLE_LEN (or similar) in a public header eliminates duplication and protects future resizes.

🤖 Prompt for AI Agents
In src/b26.c around lines 46 to 53, replace the magic number 67108864U used as
the array length in the bsearch call with a named constant like B26_TABLE_LEN
defined in a public header. This removes duplication and makes future resizing
safer by centralizing the array length definition.

Comment on lines 59 to 81
uint64_t
is_b26 (uint64_t const x,
unsigned *const rot)
{
uint64_t y = x << 6U | x >> (64U - 6U);
uint64_t m = 0U;
unsigned r = 0U;
unsigned n = 0U;

do {
y >>= 1U;
uint64_t b = UINT64_C(1) << (y & 63U);
uint64_t m_ = m;
m |= b;
if (m == m_)
return 0U;
++n;
if (b == 1U)
r = n & 63U;
} while (n < 6U);

y = x;
do {
y >>= 1U;
uint64_t b = UINT64_C(1) << (y & 63U);
uint64_t m_ = m;
m |= b;
if (m == m_)
return 0U;
++n;
if (b == 1U)
r = n & 63U;
} while (m != UINT64_MAX);

if (rot)
*rot = r;

return ror64(x, r);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor constant mismatch in is_b26
Comparing b == 1U mixes a 64-bit value with a 32-bit literal. Not wrong, but the explicit UINT64_C(1) you used elsewhere avoids implicit promotions and reads clearer.

🤖 Prompt for AI Agents
In src/b26.c between lines 59 and 97, replace the comparisons of b == 1U with b
== UINT64_C(1) to maintain consistent 64-bit constant usage. This avoids mixing
64-bit values with 32-bit literals, ensuring clarity and preventing implicit
type promotions.

src/b26.c Outdated
Comment on lines 15 to 23
struct file_in f = file_in(path, nullptr,
67108864U * 8U,
FILE_IN_EXACT);
if (file_in_error(&f)) {
file_in_fini(&f);
return nullptr;
}

return (uint64_t *)(void *)f.data;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential lifetime / leak of loaded 512 MB table
b26_load allocates via file_in but never calls file_in_fini, so the caller has no way to release the buffer. Either:

  1. Require the caller to own and later file_in_fini a returned struct file_in, or
  2. Provide a symmetric b26_unload(uint64_t *tbl) that wraps free.

As-is, long-running tools will leak half a gigabyte per load.

🤖 Prompt for AI Agents
In src/b26.c around lines 15 to 23, the function b26_load calls file_in to
allocate a large buffer but never calls file_in_fini to release it, causing a
memory leak. To fix this, either change b26_load to return the struct file_in
itself so the caller can later call file_in_fini to free the resource, or
implement a new function b26_unload that takes the returned pointer and calls
file_in_fini internally to free the buffer. This ensures proper resource
management and prevents memory leaks in long-running processes.

Comment on lines +100 to +99
b26_identify (uint64_t x,
struct b26_id *const id,
uint64_t const *const arr)
{
unsigned rot = 0;
x = is_b26(x, &rot);
if (x) {
int64_t i = b26_find(x, arr);
if (i < 0)
return 0U;
*id = (struct b26_id){
.rot = rot,
.idx = (uint32_t)i,
};
}
return x;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

id may be NULL – no guard
b26_identify dereferences id unconditionally. If the caller only wants the return value, they have no safe way to pass NULL. Either document that id must be non-NULL or add a check:

-        *id = (struct b26_id){
+        if (id)
+            *id = (struct b26_id){
                 .rot = rot,
                 .idx = (uint32_t)i,
             };

🏁 Script executed:

#!/bin/bash
# Locate declaration/definition of b26_identify
rg -n "b26_identify" -n .

Length of output: 386


Enforce non-NULL id in b26_identify

Currently b26_identify unconditionally writes to *id, which will crash if a caller ever passes NULL. All existing callers use &id, but as a public API you should either document that id must be non-NULL or add a runtime check/assertion.

Locations to update:

  • src/b26.h – update the comment on b26_identify to state “id must be non-NULL.”
  • src/b26.c – at the top of b26_identify, add either:
    assert(id != NULL);
    or
    if (id == NULL)
        return 0;
🤖 Prompt for AI Agents
In src/b26.c around lines 100 to 115, the function b26_identify dereferences the
pointer id without checking if it is NULL, which can cause a crash if a NULL
pointer is passed. To fix this, add a runtime check at the start of b26_identify
to verify that id is not NULL, either by using an assert(id != NULL) or by
returning 0 immediately if id is NULL. Additionally, update the comment for
b26_identify in src/b26.h to explicitly state that id must be non-NULL.

Comment on lines +26 to +23
static int
b26_cmp (void const *pkey,
void const *pseq)
{
#if UINTPTR_MAX >= UINT64_MAX
uint64_t key = (uint64_t)pkey;
#else
uint64_t key = *(uint64_t const *)pkey;
#endif
uint64_t seq = *(uint64_t const *)pseq;
return (key > seq) - (key < seq);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

void*uint64_t casts are non-portable
Passing the key as a “fake pointer” makes the comparison work only when sizeof(void*) ≥ 8. Even then, converting an arbitrary integer to a pointer is merely implementation-defined and may trap on CHERI/MPX/etc.

A simpler, portable pattern:

-#if UINTPTR_MAX >= UINT64_MAX
-    void const *k = (void const *)seq;
-#else
-    void const *k = (void const *)&seq;
-#endif
+uint64_t key = seq;
+void const *k = &key;

Then drop the conditional logic in b26_cmp.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/b26.c around lines 26 to 37, the current casting between void* and
uint64_t is non-portable and unsafe. Replace the function parameter to accept a
pointer to uint64_t directly instead of void*, then dereference it to get the
key value. Remove the conditional compilation and the unsafe cast, making the
comparison straightforward and portable.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37ccfd3 and 4715cab.

📒 Files selected for processing (1)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn26.c
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04

Comment on lines +190 to +414
static uint64_t
parse_u64 (char *const arg,
int *const err)
{
int e = !arg ? EFAULT : !*arg ? ENODATA : 0;
uint64_t u64 = 0;

if (!e) {
errno = 0;
char *p = arg;
int64_t i64 = _Generic(
i64, long: strtol,
long long: strtoll
)(arg, &p, 0);
e = errno;

if (!e) {
u64 = (uint64_t)i64;

} else if (e == ERANGE &&
i64 == _Generic(i64, long: LONG_MAX,
long long: LLONG_MAX)) {
errno = 0;
p = arg;
u64 = _Generic(
u64, unsigned long: strtoul,
unsigned long long: strtoull
)(arg, &p, 0);
e = errno;
}

if (!e && *p)
e = EINVAL;
}

*err = e;
return u64;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

parse_u64 over-engineered, still misses edge-cases

The _Generic gymnastics pick strtol/strtoul vs their long long
variants, but:

  1. int64_t/uint64_t don’t guarantee matching long on LP32/ILP32.
  2. Negative numbers outside int64_t range but within uint64_t are
    mis-handled.
  3. The code is much bigger than needed.

A simpler, POSIX-portable implementation is enough:

-static uint64_t
-parse_u64 (char *const arg,
-           int  *const err)
-{
-    ...
-}
+static uint64_t
+parse_u64(char *arg, int *err)
+{
+    errno = 0;
+    char *end = NULL;
+    unsigned long long val = strtoull(arg, &end, 0);
+
+    if (errno || !*arg || *end) {
+        *err = errno ? errno : EINVAL;
+        return 0;
+    }
+    *err = 0;
+    return (uint64_t)val;
+}

Reduces complexity and avoids the type-pun riddles.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/debruijn26.c between lines 190 and 227, the parse_u64 function is overly
complex and incorrectly handles some edge cases, such as negative numbers
outside int64_t range but valid in uint64_t, and relies on _Generic with
assumptions about type sizes. Simplify the function by using a single standard
function like strtoull for parsing, which directly returns an unsigned 64-bit
value, handle errors based on errno and the end pointer, and remove the _Generic
usage to ensure portability and correctness.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (9)
src/b26.h (2)

11-23: _BitInt still unguarded – older tool-chains will not compile
The exact same portability issue raised in the previous review remains: _BitInt is used without a fallback for pre-C23 compilers. Please add the conditional __STDC_VERSION__ >= 202311L (or compiler intrinsics) and fall back to uint32_t/uint8_t as suggested earlier.


25-31: Bit-field layout is still implementation-defined
rot : 6 / idx : 26 relies on host-specific bit-field ordering (endianness + packing). Either keep the struct strictly private and never serialise it, or switch to explicit masks/shifts (uint32_t id = rot | idx << 6).

src/b26.c (4)

16-23: void *uint64_t cast is non-portable
Casting integers to pointers invokes implementation-defined behaviour and silently breaks on CHERI/MPX or when sizeof(void*) < 8. Accept a const uint64_t * key instead and drop the cast logic.


32-38: Magic number 67108864U appears again
Replace with a single #define B26_TABLE_LEN 67108864U in a public header and reuse it in b26_find, b26_array_is_valid, and b26_load.


62-63: Mixed-width constant
b == 1U compares a 64-bit value with a 32-bit literal. Replace with UINT64_C(1) for clarity and to avoid implicit promotions.


94-97: id is dereferenced unconditionally
b26_identify still crashes if callers pass NULL. Either assert(id) or guard the store.

src/debruijn26.c (3)

59-76: Heavy _BitInt + bit-fields: same portability caveats
All earlier remarks about _BitInt availability and bit-field ordering apply here as well. Consider replacing with plain uint32_t and explicit masks.


258-259: Shift-by-64 is still undefined
(is_1 << n) is UB when n == 64. Add the explicit n == 64 guard as suggested earlier.


195-206: Opaque indexing logic harms readability
The ternary math and sparse table in b26_bitcount_offset remain hard to follow. A 2-D array or a switch would be clearer and safer for future maintenance.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4715cab and c00a6b1.

📒 Files selected for processing (3)
  • src/b26.c (1 hunks)
  • src/b26.h (1 hunks)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/b26.h
  • src/debruijn26.c
  • src/b26.c
🧬 Code Graph Analysis (2)
src/b26.h (1)
src/b25.h (2)
  • struct (127-134)
  • struct (145-149)
src/b26.c (3)
src/b26.h (2)
  • b26_identify (51-53)
  • b26_load (48-48)
src/file.c (2)
  • file_in (43-139)
  • file_in_fini (141-152)
src/file.h (3)
  • file_in (27-30)
  • file_in_error (36-36)
  • file_in_fini (33-33)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-macos-14

Comment on lines +105 to +113
uint64_t prev = 0U;
uint64_t i = 0U;
do {
uint64_t seq = arr[i];
unsigned rot;
if (seq <= prev || seq != is_b26(seq, &rot))
return false;
} while (++i < 67108864U);
return true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

prev never updated – sort check is ineffective
b26_array_is_valid compares seq <= prev but never assigns prev = seq, so every element after the first is compared against zero. The ascending-order validation is therefore broken.

@@
-    uint64_t prev = 0U;
+    uint64_t prev = 0U;
@@
-        if (seq <= prev || seq != is_b26(seq, &rot))
+        if (seq <= prev || seq != is_b26(seq, &rot))
             return false;
+        prev = seq;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
uint64_t prev = 0U;
uint64_t i = 0U;
do {
uint64_t seq = arr[i];
unsigned rot;
if (seq <= prev || seq != is_b26(seq, &rot))
return false;
} while (++i < 67108864U);
return true;
uint64_t prev = 0U;
uint64_t i = 0U;
do {
uint64_t seq = arr[i];
unsigned rot;
if (seq <= prev || seq != is_b26(seq, &rot))
return false;
prev = seq;
} while (++i < 67108864U);
return true;
🤖 Prompt for AI Agents
In src/b26.c around lines 105 to 113, the variable 'prev' is used to check if
the array is sorted in ascending order but is never updated after the first
comparison, making the sort check ineffective. To fix this, assign 'prev = seq'
at the end of each loop iteration before incrementing 'i' so that each element
is correctly compared to the previous one.

Comment on lines +290 to +404
parse_u64 (char *const arg,
int *const err)
{
int e = !arg ? EFAULT : !*arg ? ENODATA : 0;
uint64_t u64 = 0;

if (!e) {
errno = 0;
char *p = arg;
int64_t i64 = _Generic(
i64, long: strtol,
long long: strtoll
)(arg, &p, 0);
e = errno;

if (!e) {
u64 = (uint64_t)i64;

} else if (e == ERANGE &&
i64 == _Generic(i64, long: LONG_MAX,
long long: LLONG_MAX)) {
errno = 0;
p = arg;
u64 = _Generic(
u64, unsigned long: strtoul,
unsigned long long: strtoull
)(arg, &p, 0);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

parse_u64 does not compile – self-referential _Generic
_Generic(i64, ...) uses i64 before it is defined, which is illegal. The whole construct is over-engineered; replace with a simple strtoull-based parser.

-static uint64_t
-parse_u64(char *const arg, int *const err)
-{
-    ...
-    int64_t i64 = _Generic(
-            i64, long: strtol,
-            long long: strtoll
-        )(arg, &p, 0);
-    ...
-}
+static uint64_t
+parse_u64(char *arg, int *err)
+{
+    errno = 0;
+    char *end = NULL;
+    unsigned long long val = strtoull(arg, &end, 0);
+    *err = (errno || *end) ? (errno ? errno : EINVAL) : 0;
+    return (uint64_t)val;
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parse_u64 (char *const arg,
int *const err)
{
int e = !arg ? EFAULT : !*arg ? ENODATA : 0;
uint64_t u64 = 0;
if (!e) {
errno = 0;
char *p = arg;
int64_t i64 = _Generic(
i64, long: strtol,
long long: strtoll
)(arg, &p, 0);
e = errno;
if (!e) {
u64 = (uint64_t)i64;
} else if (e == ERANGE &&
i64 == _Generic(i64, long: LONG_MAX,
long long: LLONG_MAX)) {
errno = 0;
p = arg;
u64 = _Generic(
u64, unsigned long: strtoul,
unsigned long long: strtoull
)(arg, &p, 0);
static uint64_t
parse_u64(char *arg, int *err)
{
errno = 0;
char *end = NULL;
unsigned long long val = strtoull(arg, &end, 0);
*err = (errno || *end) ? (errno ? errno : EINVAL) : 0;
return (uint64_t)val;
}
🤖 Prompt for AI Agents
In src/debruijn26.c around lines 290 to 316, the parse_u64 function uses a
self-referential _Generic expression with i64 before it is defined, causing a
compilation error. Replace the entire parsing logic with a straightforward call
to strtoull to parse the string argument into a uint64_t value, handle errors by
checking errno and the end pointer, and set the error code accordingly. Remove
the complex _Generic constructs and simplify the function to use strtoull
directly for parsing unsigned 64-bit integers.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
src/debruijn26.c (5)

17-31: _BitInt + bit-field = still non-portable; please switch to fixed-width types
Same concern as in the previous review: _BitInt is C23-only and the bit-field layout is implementation-defined. Nothing changed here.


175-176: Off-by-30 iteration count still present
Loop stops at 67 108 834 instead of 67 108 864. Previous feedback not addressed.


192-205: Opaque indexing logic unchanged
The magic index arithmetic in b26_bitcount_offset remains. It’s still hard to read and brittle as flagged before.


257-282: Shift-by-64 UB not fixed
(is_1 << n) still triggers undefined behaviour when n == 64. Earlier patch suggestion is still valid.


298-315: _Generic uses i64 before it exists – parse_u64 still uncompilable
The self-referential _Generic expression is untouched, so the function doesn’t compile on any conforming compiler.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c00a6b1 and c66d220.

📒 Files selected for processing (1)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn26.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (1)
src/debruijn26.c (1)

207-215: 🧹 Nitpick (assertive)

const_inline is not a C keyword
Both static const_inline declarations rely on a non-standard qualifier. Use static inline (C99) and keep const on the function parameters instead.

⛔ Skipped due to learnings
Learnt from: imaami
PR: imaami/libcanth#2
File: src/util.h:34-34
Timestamp: 2024-08-07T00:23:21.985Z
Learning: The `const_inline` macro is specifically designed to optimize performance by inlining functions and assuming their return values are determined only by their arguments. It is beneficial for functions that are both constant and should always be inlined.
Learnt from: imaami
PR: imaami/libcanth#2
File: src/utf8.c:274-274
Timestamp: 2024-08-02T15:59:05.003Z
Learning: The `const_inline` macro is used in the codebase to support different compilers with varying levels of standards support and C extensions features. It is defined in `src/util.h` with attributes supported by both GCC and Clang.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
src/debruijn26.c (6)

75-92: Bit-field still depends on undefined type u32bit and non-portable layout

The earlier review pointed out that u32bit isn’t declared and that relying on bit-field layout inside a uint32_t is implementation defined. Both issues remain.

No compiler = no CI green. Consider the mask/shift alternative proposed previously.


118-136: nullptr is C++-only

Line 122 initialises a C pointer with nullptr. Switch to NULL.

-    char *b26_bin_file = nullptr;
+    char *b26_bin_file = NULL;

175-176: Loop iterates 30 elements short

67108834U should be 67108864U – same off-by-30 typo flagged earlier.


260-267: (is_1 << n) is UB when n == 64

Left-shifting by the width of the type is undefined. Previous fix suggestion still applies.


192-205: Opaque index math persists

The magic expression (size < 7U) * (size + is_1 * 7U) hurts readability. A 2-D table or switch would be clearer.


293-330: parse_u64 still over-complex and fails to compile

Self-referential _Generic uses i64 before definition and assumes C23. Replace with the simple strtoull variant previously provided.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c66d220 and 1eace7c.

📒 Files selected for processing (1)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn26.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (1)
src/debruijn26.c (1)

7-14: Missing <stdbool.h> include

The file uses bool but never includes <stdbool.h>. Some compilers won’t pull it transitively – add it right after <assert.h>.

Comment on lines +192 to +271
static const_inline struct i_m
b26_bitcount_offset (unsigned const size,
bool const is_1)
{
return (struct i_m[]){
{-1, 0, 0}, { 7,15, 8},
{ 4, 7, 4}, { 2, 3, 2},
{ 1, 1, 1}, {-1, 0, 0},
{ 0, 1, 1}, {-1, 0, 0},
{21,15, 8}, {25, 7, 4},
{28, 3, 2}, {30, 1, 1},
{-1, 0, 0}, {31, 1, 1},
}[(size < 7U) * (size + is_1 * 7U)];
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

const_inline is unknown here

static const_inline relies on a macro that is not pulled in by any include visible in this file. If util.h defines it, add the header; otherwise drop the qualifier:

-static const_inline struct i_m
+static inline struct i_m
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static const_inline struct i_m
b26_bitcount_offset (unsigned const size,
bool const is_1)
{
return (struct i_m[]){
{-1, 0, 0}, { 7,15, 8},
{ 4, 7, 4}, { 2, 3, 2},
{ 1, 1, 1}, {-1, 0, 0},
{ 0, 1, 1}, {-1, 0, 0},
{21,15, 8}, {25, 7, 4},
{28, 3, 2}, {30, 1, 1},
{-1, 0, 0}, {31, 1, 1},
}[(size < 7U) * (size + is_1 * 7U)];
}
static inline struct i_m
b26_bitcount_offset (unsigned const size,
bool const is_1)
{
return (struct i_m[]){
{-1, 0, 0}, { 7,15, 8},
{ 4, 7, 4}, { 2, 3, 2},
{ 1, 1, 1}, {-1, 0, 0},
{ 0, 1, 1}, {-1, 0, 0},
{21,15, 8}, {25, 7, 4},
{28, 3, 2}, {30, 1, 1},
{-1, 0, 0}, {31, 1, 1},
}[(size < 7U) * (size + is_1 * 7U)];
}
🤖 Prompt for AI Agents
In src/debruijn26.c around lines 192 to 205, the use of 'const_inline' is
problematic because it is an unknown macro not defined or included in this file.
To fix this, check if 'const_inline' is defined in 'util.h' or another header;
if so, add the appropriate #include directive for that header at the top of the
file. If no such definition exists, remove the 'const_inline' qualifier from the
function declaration to avoid compilation errors.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (18)
src/b26.h (2)

11-23: Portability of _BitInt still unaddressed
The _BitInt branch will break with the default -std=gnu11 settings on GCC < 13 / Clang < 16. A guarded fallback to <stdint.h> types, as suggested earlier, is still missing.


25-31: Bit-field layout is implementation-defined
rot : 6 / idx : 26 again relies on host-endian LSB ordering. Either keep the struct strictly internal or use explicit masks/shifts for cross-platform safety.

src/b26.c (6)

16-23: Still casting integers to pointers
The b26_cmp trick ((void const *)seq) is non-portable and UB-prone. Pass the key as a uint64_t * instead and drop the cast logic.


37-38: Magic number 67108864U repeated
Replace the hard-coded length with a named constant (e.g. B26_TABLE_LEN).


62-63: Mixed-width literal
b == 1U compares a 64-bit value with a 32-bit literal. Use UINT64_C(1) for consistency.


94-97: id not checked for NULL
b26_identify still crashes if the caller passes NULL for id.


105-112: Sort check still broken – prev never updated
Without prev = seq; inside the loop the ascending-order validation does nothing useful.


120-131: Large buffer leaked
b26_load allocates via file_in but never frees unless validation fails. Provide a matching b26_unload() or return the struct file_in so the caller can call file_in_fini.

src/debruijn26.c (6)

18-37: C23 _BitInt & bit-field branches remain
Compilation will fail on tool-chains that don’t support _BitInt. The earlier fallback suggestion is still applicable.


172-174: nullptr is C++-only
Replace nullptr with NULL for plain C builds.


378-390: Shift-by-64 UB still present
(is_1 << n) is undefined when n == 64. Guard the case before shifting.


470-489: parse_u64 self-referential _Generic breaks compilation
The function still uses i64 before it is defined and falls back to _BitInt. Replace with a straightforward strtoull implementation as previously suggested.


261-274: Obscure magic indexing
b26_bitcount_offset remains opaque and brittle. Prefer an explicit lookup table or switch.


216-217: ⚠️ Potential issue

Off-by-30 loop bound

Should iterate over 67 108 864 entries:

-        for (size_t i = 0; i < 67108834U; ++i) {
+        for (size_t i = 0; i < 67108864U; ++i) {
src/debruijn25.c (4)

12-16: ⚠️ Potential issue

nullptr is C++ - use NULL in C code

Replace all instances of nullptr with NULL for C compatibility:

-             it_ = (it) != o_ ? (it) : nullptr)
+             it_ = (it) != o_ ? (it) : NULL)

-return i == bag->head[second] ? nullptr : &bag->ptr[i];
+return i == bag->head[second] ? NULL : &bag->ptr[i];

-    uint64_t *b26 = nullptr;
+    uint64_t *b26 = NULL;

-        uint16_t id = (uint16_t)strtoul(v[i], nullptr, 0);
+        uint16_t id = (uint16_t)strtoul(v[i], NULL, 0);

Also applies to: 221-221, 297-297, 302-302


85-114: ⚠️ Potential issue

Dead code after early return

Everything after return alt_main(argc, argv); is unreachable. Either remove it or wrap in #if 0.

 int main (int    argc,
           char **argv)
 {
     return alt_main(argc, argv);
+#if 0
     struct ptab props = ptab(70);
     
     // ... rest of the code ...
     
     }
+#endif
 }

302-302: ⚠️ Potential issue

Missing error validation for strtoul

Input values exceeding 16 bits will silently truncate. Add range checking:

-        uint16_t id = (uint16_t)strtoul(v[i], NULL, 0);
+        char *endp = NULL;
+        errno = 0;
+        unsigned long val = strtoul(v[i], &endp, 0);
+        if (errno || *endp || val > UINT16_MAX) {
+            fprintf(stderr, "Invalid id: %s\n", v[i]);
+            continue;
+        }
+        uint16_t id = (uint16_t)val;

227-245: ⚠️ Potential issue

Potential shift overflow when bs underflows

bs decrements twice per iteration. After ~33 elements, shifts become undefined:

     bs -= len;
+    if (bs == 0 || (int)bs < 0)
+        break;
     len *= ucp[0][1] == (unsigned char)'1';
     u64 |= (UINT64_C(1) << len) - 1U << bs;
     bs -= (len = ucp[1][0]);
+    if (bs == 0 || (int)bs < 0)
+        break;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eace7c and aa4c77a.

📒 Files selected for processing (4)
  • src/b26.c (1 hunks)
  • src/b26.h (1 hunks)
  • src/debruijn25.c (1 hunks)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/b26.c
  • src/b26.h
  • src/debruijn26.c
  • src/debruijn25.c
🧬 Code Graph Analysis (1)
src/b26.c (3)
src/b26.h (2)
  • b26_identify (52-54)
  • b26_load (48-49)
src/file.h (3)
  • file_in (27-30)
  • file_in_error (36-36)
  • file_in_fini (33-33)
src/file.c (2)
  • file_in (43-139)
  • file_in_fini (141-152)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (1)
src/b26.h (1)

51-54: Document that id must be non-NULL or add a guard
b26_identify() dereferences id unconditionally. Make it explicit in this header that the argument must be non-NULL, or adjust the implementation to tolerate NULL.

Comment on lines +31 to +36
static force_inline size_t
mapping (struct ptab *tab,
unsigned idx)
{
return (size_t)(ror32(tab->m, idx) & UINT32_C(31));
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Undefined ror32 function

ror32 is used but not defined in this file. Verify it's properly declared:


🏁 Script executed:

#!/bin/bash
# Find ror32 definition
ast-grep --pattern 'ror32($_, $_) {
  $$$
}'

# Check headers for declaration
rg -A 2 'ror32' src/b25.h src/b26.h

Length of output: 161


🏁 Script executed:

#!/bin/bash
# Search for macro definition of ror32 across the entire repository
rg -n '#\s*define\s\+ror32' .

# Search for any declaration or definition of ror32
rg -n 'ror32\s*\(' .

Length of output: 516


🏁 Script executed:

#!/bin/bash
# Show include directives and top of src/debruijn25.c
grep -n '^#include' src/debruijn25.c
echo
head -n 50 src/debruijn25.c

Length of output: 1192


Missing definition for ror32

ror32 is used in several places but no declaration or definition exists in the codebase. You need to provide its implementation (e.g., as a static inline in a common header) and include that header where it’s called.

Affected files:

  • src/debruijn25.c (mapping at lines 31–36 and other uses)
  • src/b25.c (lines 217, 240)
  • src/b25.h (line 148)

Suggested implementation (in, say, rotate.h):

#ifndef ROTATE_H
#define ROTATE_H

#include <stdint.h>

static inline uint32_t ror32(uint32_t x, unsigned n) {
    unsigned m = n & 31U;
    return (x >> m) | (x << (32 - m));
}

#endif /* ROTATE_H */

Then add #include "rotate.h" to each file using ror32.

🤖 Prompt for AI Agents
In src/debruijn25.c around lines 31 to 36, the function ror32 is used but not
defined or declared anywhere in the codebase, causing undefined reference
issues. To fix this, create a new header file (e.g., rotate.h) containing a
static inline implementation of ror32 that performs a 32-bit rotate right
operation as suggested. Then, include this new header file in src/debruijn25.c
and any other files using ror32, such as src/b25.c and src/b25.h, to ensure the
function is properly declared and defined.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

♻️ Duplicate comments (14)
src/b26.h (2)

19-23: Guard _BitInt behind a C23 / compiler-feature check
The _BitInt typedefs will not compile on pre-C23 GCC/Clang. The earlier review already pointed this out and the code is still unchanged. Please wrap the _BitInt branch in a feature test (__STDC_VERSION__ >= 202311L, __has_builtin(__builtin_bitint), etc.) and fall back to uint32_t/uint8_t.


25-28: Bit-field layout is still implementation-defined
rot : 6, idx : 26 inside one u32bit silently swaps order on big-endian targets. Either keep the struct strictly private and never serialise it, or replace it with explicit masks/shifts (as the verification bot suggested earlier).

src/b26.c (5)

37-38: Replace magic constant 67108864U with a named length macro
Hard-coding the table size three times is brittle. Define e.g. #define B26_TABLE_LEN 67108864U once in b26.h and use it in bsearch, b26_array_is_valid, and file_in.


62-63: Literal width mismatch (1U vs 64-bit)
if (b == 1U) mixes a 64-bit value with a 32-bit literal. Use UINT64_C(1) for consistency and to silence sign/width warnings.


94-97: id is dereferenced unconditionally
b26_identify still crashes if the caller passes NULL. Either document that id must be non-NULL or guard the write.


16-23: ⚠️ Potential issue

Stop casting integers to pointers in the comparator – it is UB on most ABIs
Using the address space to smuggle a 64-bit integer ((void*)seq) is undefined and already flagged. A trivial, portable fix:

-static int
-b26_cmp (void const *pkey,
-         void const *pseq)
-{
-#if UINTPTR_MAX >= UINT64_MAX
-    uint64_t key = (uint64_t)pkey;
-#else
-    uint64_t key = *(uint64_t const *)pkey;
-#endif
-    uint64_t seq = *(uint64_t const *)pseq;
-    return (key > seq) - (key < seq);
-}
+static int
+b26_cmp(void const *pkey, void const *pseq)
+{
+    uint64_t key = *(uint64_t const *)pkey;
+    uint64_t seq = *(uint64_t const *)pseq;
+    return (key > seq) - (key < seq);
+}

Likewise, replace the conditional hack in b26_find with void const *k = &seq;.


105-113: ⚠️ Potential issue

Sort-check bug – prev never updated
The validation loop always compares against zero. Update prev each iteration:

-        if (seq <= prev || seq != is_b26(seq, &rot))
-            return false;
+        if (seq <= prev || seq != is_b26(seq, &rot))
+            return false;
+        prev = seq;
src/b25.h (1)

21-24: ⚠️ Potential issue

_BitInt still not gated – breaks non-C23 compilers
Same issue raised earlier: every non-MSVC build is forced onto _BitInt, yet the file is compiled with -std=gnu11. Older GCC/Clang choke here.

Re-introduce the feature-test guard:

-#else
-typedef unsigned _BitInt(16) u16bit;
-typedef unsigned _BitInt(5)  uint5;
+#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
+typedef unsigned _BitInt(16) u16bit;
+typedef unsigned _BitInt(5)  uint5;
+#else          /* pre-C23 compiler */
+typedef uint16_t u16bit;
+typedef uint8_t  uint5;      /* 0-31 still fits */
 #endif
src/debruijn26.c (5)

46-48: ⚠️ Potential issue

Mask only applies to low byte – high byte left dirty

Operator precedence makes the & bind tighter than |:

((u16)high << 8) | low & ~mask

so the mask is applied only to low. Parenthesise the OR side:

-        (((uint16_t)x->value[h] << 8U) | x->value[l] & ~(UINT16_C(7) << i));
+        ((((uint16_t)x->value[h] << 8U) | x->value[l]) & ~(UINT16_C(7) << i));

62-75: ⚠️ Potential issue

Un-guarded _BitInt types – same portability hole
The non-MSVC branch again assumes C23 support. Add the __STDC_VERSION__ >= 202311L check or fall back to fixed-width integers exactly as already requested for b25.h.


97-113: ⚠️ Potential issue

Undefined type u32bit – build failure

struct b26_bitcount declares fields of type u32bit, but no such typedef exists in this TU or any included header. Replace with uint32_t (or add a typedef) before this struct.


399-436: ⚠️ Potential issue

parse_u64 still contains the self-referential _Generic – will not compile

_Generic(i64, …) uses i64 before it’s declared. The earlier refactor suggestion (simple strtoull wrapper) has not been applied. This keeps the file uncompilable on all compilers.

Please drop the _Generic gymnastics and use the minimal, proven implementation.


191-193: ⚠️ Potential issue

nullptr is C++-only
Using nullptr in C code prevents successful compilation. Replace with plain NULL.

-    char *b26_bin_file = nullptr;
+    char *b26_bin_file = NULL;
...
-    uint64_t *b26 = nullptr;
+    uint64_t *b26 = NULL;

Also applies to: 297-298

src/debruijn25.c (1)

12-15: ⚠️ Potential issue

Macro expands to nullptr – C compilation fails

foreach_prop injects nullptr, a C++ keyword. Switch to NULL.

-             it_ = (it) != o_ ? (it) : nullptr)
+             it_ = (it) != o_ ? (it) : NULL)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bfe83a and b406c16.

📒 Files selected for processing (6)
  • src/b25.c (1 hunks)
  • src/b25.h (1 hunks)
  • src/b26.c (1 hunks)
  • src/b26.h (1 hunks)
  • src/debruijn25.c (1 hunks)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/b26.c
  • src/b25.c
  • src/b26.h
  • src/b25.h
  • src/debruijn26.c
  • src/debruijn25.c
🧬 Code Graph Analysis (4)
src/b26.c (1)
src/b26.h (3)
  • is_b26 (57-58)
  • b26_identify (52-54)
  • b26_load (48-49)
src/b26.h (1)
src/b25.h (2)
  • struct (127-134)
  • struct (151-155)
src/b25.h (3)
src/b25.c (2)
  • void (13-25)
  • b25_info (205-232)
src/debruijn25.c (1)
  • void (202-212)
src/b26.h (1)
  • struct (32-39)
src/debruijn26.c (3)
src/b25.h (3)
  • b25_id (128-128)
  • struct (127-134)
  • struct (151-155)
src/b26.c (2)
  • is_b26 (45-81)
  • b26_load (116-131)
src/b26.h (3)
  • is_b26 (57-58)
  • b26_load (48-49)
  • struct (32-39)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04

Comment on lines +120 to +131
struct file_in f = file_in(path, nullptr,
67108864U * 8U,
FILE_IN_EXACT);
uint64_t *a = (uint64_t *)(void *)f.data;

if (!file_in_error(&f) &&
(!validate || b26_array_is_valid(a)))
return a;

file_in_fini(&f);
return nullptr;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

512 MB table leaks – no symmetric unload API
b26_load hands out the raw buffer and the caller cannot free it (the file_in handle is lost). Provide b26_unload(uint64_t *tbl) that calls file_in_fini / free, or return the full struct file_in so the caller owns the handle.

🤖 Prompt for AI Agents
In src/b26.c around lines 120 to 131, the function b26_load returns a raw buffer
pointer without a way for the caller to free the associated 512 MB memory,
causing a leak. To fix this, implement a b26_unload function that takes the
uint64_t pointer and calls file_in_fini or free appropriately to release the
memory. Alternatively, modify b26_load to return the full struct file_in so the
caller can manage the resource lifecycle properly.

Comment on lines +211 to +216
do {
unsigned n = b25_info_get_num_cycles(&info);
unsigned off = b25_info_get_cycle(&info, n - 1U).offset;
uint32_t mask = UINT32_C(1) << off;
for (; (seen & mask); ++off, mask <<= 1);
for (unsigned i = off;;) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Index underflow hides logic – consider an explicit first-cycle path
When n == 0, b25_info_get_cycle(&info, n-1) is invoked with UINT_MAX; the helper returns a zeroed struct so it works, but the wraparound is hard to read and smells fragile. A simple if (!n) off = 0; else … would be clearer.

🤖 Prompt for AI Agents
In src/b25.c around lines 211 to 216, the code calls b25_info_get_cycle(&info, n
- 1U) without checking if n is zero, causing an underflow to UINT_MAX. To fix
this, add an explicit check: if n is zero, set off to 0 directly; otherwise,
call b25_info_get_cycle with n - 1U. This makes the logic clearer and avoids
relying on wraparound behavior.

Comment on lines +61 to +68
static const_inline unsigned
b25_div_offset (uint5 const i)
{
unsigned const k = uint5_get(i) >> 3U;
return (uint5_get(i) & 7U)
* (uint8_t[]){3,2,1,1}[k]
+ (uint8_t[]){8,32,48,56}[k];
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Compound literals allocated every call – turn them static
(uint8_t[]){…} is an automatic object created on each invocation of b25_div_offset().
Mark them static const (or hoist to a header) to remove the per-call overhead.

🤖 Prompt for AI Agents
In src/b25.h around lines 61 to 68, the compound literals used for the uint8_t
arrays are currently created on each function call, causing unnecessary
overhead. To fix this, declare these arrays as static const outside the function
or at the start of the function so they are allocated only once, eliminating
per-call allocation costs.

Comment on lines +23 to +24
#define uint5(x) (uint5)(x)
#define uint5_get(x) x
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Macro missing protective parentheses
#define uint5_get(x) x should be parenthesised to avoid precedence surprises:

-#define uint5_get(x) x
+#define uint5_get(x) ((x))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#define uint5(x) (uint5)(x)
#define uint5_get(x) x
#define uint5(x) (uint5)(x)
#define uint5_get(x) ((x))
🤖 Prompt for AI Agents
In src/b25.h at lines 23 to 24, the macro definition for uint5_get(x) lacks
protective parentheses around the parameter usage. To fix this, enclose the
parameter x in parentheses within the macro body, changing it to #define
uint5_get(x) (x) to prevent operator precedence issues during macro expansion.

Comment on lines +7 to +14
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Missing <stdbool.h> include
bool is used throughout the file but <stdbool.h> is not included. Relying on incidental transitive includes is fragile; add the header explicitly.

 #include <string.h>
+#include <stdbool.h>
🤖 Prompt for AI Agents
In src/debruijn26.c around lines 7 to 14, the file uses the bool type but does
not include the <stdbool.h> header, which can cause compilation issues. Add
#include <stdbool.h> explicitly among the other includes at the top of the file
to ensure bool is properly defined.

}
}

free(b26);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Memory mapping freed with free() – use file_in_fini()

b26_load() returns memory from file_in(), most likely mmap(). Calling free() on it is undefined. Keep the struct file_in handle and call file_in_fini() instead.

🤖 Prompt for AI Agents
In src/debruijn25.c at line 347, the code incorrectly calls free() on memory
returned by b26_load(), which uses file_in() and likely mmap(). Instead of
free(b26), retain the struct file_in handle associated with b26 and call
file_in_fini() on that handle to properly release the memory mapping.

Comment on lines +1 to +8
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

bool used but <stdbool.h> not included

Add the missing header to guarantee portability:

 #include <string.h>
+#include <stdbool.h>
🤖 Prompt for AI Agents
In src/debruijn25.c at the beginning of the file (lines 1 to 8), the code uses
the bool type but does not include the <stdbool.h> header. To fix this, add
#include <stdbool.h> among the other include directives to ensure the bool type
is properly defined and portable across different compilers.

Comment on lines +140 to +152
static const_inline uint32_t
bag_seq (unsigned i)
{
if (i > 118U)
return 0U;

unsigned y = (i *= 30U) >> 6U, x = i & 63U;
uint32_t s = (uint32_t)(BAG_DBSEQ[y] >> x);

if (x > 34U)
s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));

return ror32((s >> 4U & 0x3fffffeU) | 0x4000001U, s & 0x1fU);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Out-of-bounds read on last table entry

For i == 118
y = (i*30)>>6 yields 55.
If x > 34 (it is 60) the code accesses BAG_DBSEQ[y + 1], i.e. index 56, but the array has only 56 elements [0…55]. This is undefined behaviour.

Quick fix:

-    if (x > 34U)
-        s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));
+    if (x > 34U && y + 1U < array_size(BAG_DBSEQ))
+        s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));
+    else if (x > 34U)
+        return 0; /* or handle error */
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static const_inline uint32_t
bag_seq (unsigned i)
{
if (i > 118U)
return 0U;
unsigned y = (i *= 30U) >> 6U, x = i & 63U;
uint32_t s = (uint32_t)(BAG_DBSEQ[y] >> x);
if (x > 34U)
s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));
return ror32((s >> 4U & 0x3fffffeU) | 0x4000001U, s & 0x1fU);
static const_inline uint32_t
bag_seq (unsigned i)
{
if (i > 118U)
return 0U;
unsigned y = (i *= 30U) >> 6U, x = i & 63U;
uint32_t s = (uint32_t)(BAG_DBSEQ[y] >> x);
/* guard against reading past the end of BAG_DBSEQ[] */
if (x > 34U && y + 1U < array_size(BAG_DBSEQ))
s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));
else if (x > 34U)
return 0U; /* handle the error case */
return ror32((s >> 4U & 0x3fffffeU) | 0x4000001U,
s & 0x1fU);
}
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 140 to 152, the code reads out-of-bounds from
the BAG_DBSEQ array when i equals 118 and x is greater than 34, causing an
access to BAG_DBSEQ[56] which is beyond the array's last index 55. To fix this,
add a boundary check before accessing BAG_DBSEQ[y + 1U] to ensure y + 1U is
within the array bounds, or modify the condition to prevent the out-of-bounds
access when i is 118.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (5)
src/debruijn26.c (5)

200-202: nullptr is C++-only
This is a C source file; use NULL.


247-247: Loop limit typo – still iterates 30 items short
67108834U should be 67108864U to cover every entry.


408-444: Self-referential _Generic still prevents compilation
i64 is used as the controlling expression before it exists. The earlier review comment remains valid; please replace the whole parser with a simple strtoull-based version.


7-14: ⚠️ Potential issue

Missing <stdbool.h> include
bool is used later in the file, but the header is not included. Add it to avoid implicit‐type failures on strict compilers.

 #include <string.h>
+#include <stdbool.h>

58-66: ⚠️ Potential issue

Mask still applied only to low byte – high byte remains unmasked
Operator precedence means the AND affects only x->value[l]. The high byte can leak stale bits.

-        (((uint16_t)x->value[h] << 8U) | x->value[l] & ~(UINT16_C(7) << i));
+        ((((uint16_t)x->value[h] << 8U) | x->value[l]) & ~(UINT16_C(7) << i));
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b406c16 and ee263f9.

📒 Files selected for processing (1)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn26.c
🧬 Code Graph Analysis (1)
src/debruijn26.c (3)
src/b25.h (3)
  • b25_id (128-128)
  • struct (127-134)
  • struct (151-155)
src/b26.c (2)
  • is_b26 (45-81)
  • b26_load (116-131)
src/b26.h (3)
  • is_b26 (57-58)
  • b26_load (48-49)
  • struct (32-39)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (1)
src/debruijn26.c (1)

245-269: Mismatched allocator: b26_load memory may not be mallocd
b26_load returns a buffer owned by file_in. Freeing it with free() can double-free or leak, depending on the implementation. Call file_in_fini or expose a dedicated free helper instead.

Comment on lines +121 to +112
struct b26_bitcount {
union {
uint32_t bits;
struct {
u32bit z6 : 1; // 0
u32bit z4 : 1; // 1
u32bit z3 : 2; // 2
u32bit z2 : 3; // 4
u32bit z1 : 4; // 7
u32bit na :10; // 11
u32bit o1 : 4; // 21
u32bit o2 : 3; // 25
u32bit o3 : 2; // 28
u32bit o4 : 1; // 30
u32bit o6 : 1; // 31
};
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

u32bit is undeclared
The bit-field struct compiles only if a matching typedef exists elsewhere. Nothing in this file or the provided headers defines it. Use uint32_t or add a typedef.

🤖 Prompt for AI Agents
In src/debruijn26.c around lines 121 to 137, the struct uses an undeclared type
`u32bit` for bit-fields. Replace all instances of `u32bit` with the standard
`uint32_t` type or add a typedef for `u32bit` as `uint32_t` before this struct
definition to ensure the code compiles correctly.

Comment on lines +77 to +73
#else
typedef unsigned _BitInt(8) u8bit;
typedef signed _BitInt(8) s8bit;

typedef unsigned _BitInt(3) uint3;
#define uint3(x) (uint3)(x)
#define uint3_get(x) x

typedef unsigned _BitInt(4) uint4;
#define uint4(x) (uint4)(x)
#define uint4_get(x) x

typedef unsigned _BitInt(96) uint3x32;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Unconditional use of _BitInt breaks the default -std=gnu11 build
The non-MSVC branch still relies on C23 _BitInt. The build matrix (Makefile) targets GNU11, so every GCC/Clang build will fail. Replace with standard fixed-width types or guard behind a C23 feature test.

🤖 Prompt for AI Agents
In src/debruijn26.c around lines 77 to 89, the code uses the C23-specific
_BitInt type unconditionally, which breaks builds targeting the default
-std=gnu11 standard. To fix this, replace all _BitInt usages with standard
fixed-width integer types like uint8_t and int8_t from stdint.h, or
alternatively, wrap the _BitInt typedefs and macros in a preprocessor check that
ensures they are only used when compiling with C23 support.

Comment on lines +70 to +59
uint3x32_get (uint3x32 const *const x,
unsigned const i)
{
return (((uint16_t)x->value[(i+1U & 31U) * 3U >> 3U] << 8U)
| x->value[(i & 31U) * 3U >> 3U]) >> (i & 7U) & 7U;
}
Copy link

@coderabbitai coderabbitai bot Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

uint3x32_get may read past the 12-byte buffer
For i == 31 the expression (i+1U)*3 >> 3 evaluates to 12, but valid indices are 0–11. Off-by-one read is undefined behaviour.
Adjust the storage layout or cap the index.

🤖 Prompt for AI Agents
In src/debruijn26.c around lines 70 to 75, the function uint3x32_get can read
past the 12-byte buffer because for i == 31 the index calculation accesses
element 12, which is out of bounds. To fix this, modify the index calculation to
wrap around correctly by applying a modulo operation or bitmask to ensure the
index stays within 0 to 11, preventing any out-of-bounds access.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's masked with 31U after adding 1!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami You're absolutely right! I missed that crucial detail.

For i == 31:

  • i+1U & 31U = 32 & 31U = 0 (wraparound due to the mask)
  • So we access x->value[0] and x->value[11], both valid indices

The & 31U mask ensures wraparound indexing within the 32-element logical array, preventing any out-of-bounds access. My analysis was incorrect.


✏️ Learnings added
Learnt from: imaami
PR: imaami/libcanth#12
File: src/debruijn26.c:54-59
Timestamp: 2025-06-17T22:39:48.856Z
Learning: In C code, the pattern `(i+1U & 31U)` is used for wraparound indexing where adding 1 to the maximum value (31) wraps around to 0 due to the bitwise AND mask. This prevents buffer overflows when implementing circular/ring buffer access patterns.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
src/debruijn26.c (4)

255-256: ⚠️ Potential issue

Loop iterates 30 elements short

67108834U should be 67108864U (2^26). Current code skips validating the
last 30 sequences.

-        for (size_t i = 0; i < 67108834U; ++i) {
+        for (size_t i = 0; i < 67108864U; ++i) {

121-137: ⚠️ Potential issue

u32bit is undefined and depends on bit-field layout

The struct still names u32bit, which has no typedef anywhere. Even if it did,
bit-field order and padding are implementation-defined and endian-sensitive.

-            u32bit z6 : 1; //  0
+            uint32_t z6 : 1; //  0
 /* …repeat for every field… */

Better yet, drop the bit-fields entirely and use masks/shifts on a plain
uint32_t as suggested earlier – portable and endian-agnostic.


7-14: ⚠️ Potential issue

bool used without <stdbool.h> – breaks strict C builds

bool shows up later in the file but #include <stdbool.h> is missing. Relying on implicit compiler extensions is fragile and fails on strictly-conforming C99/11 tool-chains.

 #include <string.h>
+#include <stdbool.h>

398-415: ⚠️ Potential issue

parse_u64 still uncompilable – self-referential _Generic, overkill

_Generic(i64, …) references i64 before declaration and drags _BitInt
issues back in. Simpler, portable replacement:

-static uint64_t u64 = 0;
-/* whole block */
+static uint64_t u64 = 0;
+errno = 0;
+char *end = NULL;
+unsigned long long val = strtoull(arg, &end, 0);
+e = (errno || *end) ? (errno ? errno : EINVAL) : 0;
+u64 = (uint64_t)val;

Kills two problems: build errors and unnecessary complexity.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee263f9 and e2903f8.

📒 Files selected for processing (1)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn26.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-macos-14
🔇 Additional comments (1)
src/debruijn26.c (1)

69-75: ⚠️ Potential issue

Wrong index due to missing parentheses – may read past the 12-byte buffer

i+1U & 31U is parsed as i + (1 & 31)i + 1, dropping the mask.
Similar for the earlier (i & 31U) * 3U >> 3U expression.

-    return (((uint16_t)x->value[(i+1U & 31U) * 3U >> 3U] << 8U)
-            | x->value[(i & 31U) * 3U >> 3U]) >> (i & 7U) & 7U;
+    return (((uint16_t)x->value[((i + 1U) & 31U) * 3U >> 3U] << 8U)
+            |  x->value[( i        & 31U) * 3U >> 3U]) >> (i & 7U) & 7U;

Apply the same parenthesis fix wherever the pattern appears to avoid out-of-bounds access.

Likely an incorrect or invalid review comment.

Comment on lines 208 to 184
char *b26_bin_file = nullptr;
uint64_t *b26_data = nullptr;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

nullptr is C++-only

char *b26_bin_file = nullptr;

Replace with standard C NULL.

-    char *b26_bin_file = nullptr;
+    char *b26_bin_file = NULL;
🤖 Prompt for AI Agents
In src/debruijn26.c around lines 208 to 210, the code uses the C++-only keyword
nullptr to initialize pointers. Replace nullptr with the standard C macro NULL
for pointer initialization to ensure compatibility with C.

src/debruijn26.c Outdated
Comment on lines 58 to 66
unsigned const h = (i & 31U) * 3U >> 3U;
unsigned const l = (i+1U & 31U) * 3U >> 3U;
i &= 7U;
uint16_t const u16 = ((uint16_t)uint3_get(v) << i) |
(((uint16_t)x->value[h] << 8U) | x->value[l] &
~(UINT16_C(7) << i));
x->value[h] = (uint8_t)(u16 >> 8U);
x->value[l] = (uint8_t)u16;
return x;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Operator precedence leaves high-byte unmasked in uint3x32_set

| and & have the same precedence pitfalls we flagged earlier:

(((uint16_t)x->value[h] << 8U) | x->value[l] & ~mask)

Only the low byte is cleared; the high byte keeps stale bits.

-        (((uint16_t)x->value[h] << 8U) | x->value[l] & ~(UINT16_C(7) << i));
+        ((((uint16_t)x->value[h] << 8U) | x->value[l]) & ~(UINT16_C(7) << i));

This bug corrupts every write crossing a byte boundary.

🤖 Prompt for AI Agents
In src/debruijn26.c around lines 58 to 66, the bitwise operations combining the
high and low bytes use incorrect operator precedence, causing the high byte to
retain stale bits. To fix this, add parentheses around the expression combining
x->value[l] with the mask to ensure the bitwise AND applies before the OR,
properly clearing bits in both bytes before setting new values.

Comment on lines +77 to +72
#else
typedef unsigned _BitInt(8) u8bit;
typedef signed _BitInt(8) s8bit;

typedef unsigned _BitInt(3) uint3;
#define uint3(x) (uint3)(x)
#define uint3_get(x) x

typedef unsigned _BitInt(4) uint4;
#define uint4(x) (uint4)(x)
#define uint4_get(x) x

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Unconditional _BitInt usage breaks the default -std=gnu11 build

The non-MSVC branch still relies on C23 _BitInt. GCC/Clang with -std=gnu11
will stop at the first token.

Either:

  1. replace _BitInt with uint8_t, uint16_t, uint32_t, uint64_t, or
  2. guard the block with #if __STDC_VERSION__ >= 202311L.

Failing to do so blocks every non-MSVC build.

🤖 Prompt for AI Agents
In src/debruijn26.c around lines 77 to 88, the code uses the C23-specific
_BitInt type unconditionally, which breaks builds with the default -std=gnu11
standard. To fix this, either replace all _BitInt types with standard
fixed-width integer types like uint8_t, or wrap the entire block with a
preprocessor check that ensures it only compiles when __STDC_VERSION__ is at
least 202311L. This prevents compilation errors on compilers using older C
standards.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (9)
src/debruijn26.c (9)

7-14: ⚠️ Potential issue

Add missing <stdbool.h> include

The file uses bool type but doesn't include the required header.

 #include <string.h>
+#include <stdbool.h>

18-37: ⚠️ Potential issue

Remove C23 _BitInt - incompatible with -std=gnu11

The non-MSVC branch uses C23-specific _BitInt types which fail with the default build flag.

-typedef unsigned _BitInt(8) u8bit;
-typedef signed _BitInt(8) s8bit;
-
-typedef unsigned _BitInt(3) uint3;
-#define uint3(x) (uint3)(x)
-#define uint3_get(x) x
-
-typedef unsigned _BitInt(4) uint4;
-#define uint4(x) (uint4)(x)
-#define uint4_get(x) x
-
-typedef unsigned _BitInt(96) uint3x32;
+typedef uint8_t u8bit;
+typedef int8_t s8bit;
+
+typedef uint8_t uint3;
+#define uint3(x) ((uint3)((x) & 7))
+#define uint3_get(x) ((x) & 7)
+
+typedef uint8_t uint4;
+#define uint4(x) ((uint4)((x) & 15))
+#define uint4_get(x) ((x) & 15)
+
+typedef struct uint3x32 {
+	uint8_t value[12];
+} uint3x32;

Also applies to: 77-88


58-66: ⚠️ Potential issue

Fix operator precedence in uint3x32_set

The mask applies only to the low byte due to operator precedence.

-	uint16_t const u16 = ((uint16_t)uint3_get(v) << i) |
-		(((uint16_t)x->value[h] << 8U) | x->value[l] &
-		 ~(UINT16_C(7) << i));
+	uint16_t const u16 = ((uint16_t)uint3_get(v) << i) |
+		((((uint16_t)x->value[h] << 8U) | x->value[l]) &
+		 ~(UINT16_C(7) << i));

70-75: ⚠️ Potential issue

Fix buffer overflow in uint3x32_get

For i == 31, the index calculation accesses element 12 which is out of bounds.

The expression (i+1U & 31U) * 3U >> 3U evaluates to 12 when i=31, but valid indices are 0-11. Apply proper bounds checking.


121-138: ⚠️ Potential issue

Replace undefined u32bit type

The type u32bit is not defined anywhere.

-			u32bit z6 : 1; //  0
-			u32bit z4 : 1; //  1
-			u32bit z3 : 2; //  2
-			u32bit z2 : 3; //  4
-			u32bit z1 : 4; //  7
-			u32bit na :10; // 11
-			u32bit o1 : 4; // 21
-			u32bit o2 : 3; // 25
-			u32bit o3 : 2; // 28
-			u32bit o4 : 1; // 30
-			u32bit o6 : 1; // 31
+			uint32_t z6 : 1; //  0
+			uint32_t z4 : 1; //  1
+			uint32_t z3 : 2; //  2
+			uint32_t z2 : 3; //  4
+			uint32_t z1 : 4; //  7
+			uint32_t na :10; // 11
+			uint32_t o1 : 4; // 21
+			uint32_t o2 : 3; // 25
+			uint32_t o3 : 2; // 28
+			uint32_t o4 : 1; // 30
+			uint32_t o6 : 1; // 31

206-208: ⚠️ Potential issue

Replace C++ nullptr with C NULL

-	char *b26_bin_file = nullptr;
-	uint64_t *b26_data = nullptr;
+	char *b26_bin_file = NULL;
+	uint64_t *b26_data = NULL;

240-240: ⚠️ Potential issue

Fix off-by-30 typo in loop limit

The loop should iterate 67,108,864 times, not 67,108,834.

-			for (size_t k = 0; k < 67108834U; ++k) {
+			for (size_t k = 0; k < 67108864U; ++k) {
-			for (size_t i = 0; i < 67108834U; ++i) {
+			for (size_t i = 0; i < 67108864U; ++i) {

Also applies to: 268-268


283-296: ⚠️ Potential issue

Replace undefined const_inline macro

-static const_inline struct i_m
+static inline struct i_m

402-439: ⚠️ Potential issue

Fix self-referential _Generic in parse_u64

The function uses i64 in _Generic before it's defined. Simplify with strtoull.

 static uint64_t
 parse_u64 (char *const arg,
            int  *const err)
 {
-	int e = !arg ? EFAULT : !*arg ? ENODATA : 0;
-	uint64_t u64 = 0;
-
-	if (!e) {
-		errno = 0;
-		char *p = arg;
-		int64_t i64 = _Generic(
-			i64, long: strtol,
-			long long: strtoll
-		)(arg, &p, 0);
-		e = errno;
-
-		if (!e) {
-			u64 = (uint64_t)i64;
-
-		} else if (e == ERANGE &&
-		           i64 == _Generic(i64, long: LONG_MAX,
-		                           long long: LLONG_MAX)) {
-			errno = 0;
-			p = arg;
-			u64 = _Generic(
-				u64, unsigned long: strtoul,
-				unsigned long long: strtoull
-			)(arg, &p, 0);
-			e = errno;
-		}
-
-		if (!e && *p)
-			e = EINVAL;
-	}
-
-	*err = e;
-	return u64;
+	if (!arg) {
+		*err = EFAULT;
+		return 0;
+	}
+	if (!*arg) {
+		*err = ENODATA;
+		return 0;
+	}
+	errno = 0;
+	char *end = NULL;
+	unsigned long long val = strtoull(arg, &end, 0);
+	*err = (errno || *end) ? (errno ? errno : EINVAL) : 0;
+	return (uint64_t)val;
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2903f8 and 37aa13b.

📒 Files selected for processing (1)
  • src/debruijn26.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/debruijn26.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
🔇 Additional comments (1)
src/debruijn26.c (1)

181-181:

Details

❌ Incorrect review comment

Missing b25_id and b25 functions

Functions from the De Bruijn order 25 implementation are used but may not be properly included.

Also applies to: 182-182


🏁 Script executed:

#!/bin/bash
# Check if b25.h properly declares the required functions
rg -A 2 "b25_id|b25\(" src/b25.h

Length of output: 823


🏁 Script executed:

#!/bin/bash
# Check includes in src/debruijn26.c for b25.h
rg -n "#include.*b25" src/debruijn26.c

Length of output: 60


Incorrect concern: b25_id and b25 are defined and included
Both b25_id and b25 are declared as static const_inline in src/b25.h and that header is included (line 15) in src/debruijn26.c. No missing functions.

Likely an incorrect or invalid review comment.

// is_1 ^= 1U;
//} while (i);
//putchar('\n');
return rol(seq, bits->wrap);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missing rol function

The code uses rol function but it's not defined or included.

Ensure the rol function is properly defined or included from a header file that provides bit rotation utilities.

🤖 Prompt for AI Agents
In src/debruijn26.c at line 338, the function rol is used but not defined or
included. Define the rol function to perform bit rotation or include the
appropriate header file that provides this utility. Make sure the rol function
correctly rotates bits of the input value by the specified amount.

if (!b26_decompose(&bs, x))
return 0;
unsigned r_ = 0;
uint64_t x_ = is_b26(x, &r_);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missing bit counting functions

The code uses several undefined functions: is_b26, count_msb_0, count_msb_1, count_lsb_0, count_lsb_1.

These bit counting and De Bruijn validation functions need to be defined or properly included from the appropriate headers.

Also applies to: 346-346, 350-350, 354-354, 355-355, 371-371, 372-372

🤖 Prompt for AI Agents
In src/debruijn26.c at lines 180, 346, 350, 354, 355, 371, and 372, the
functions is_b26, count_msb_0, count_msb_1, count_lsb_0, and count_lsb_1 are
used but not defined or included. To fix this, define these bit counting and De
Bruijn validation functions in the file or include the appropriate header files
where they are implemented. Ensure the function signatures match their usage and
that the implementations correctly perform the intended bit counting or
validation operations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (36)
src/util.h (1)

103-108: MSVC compatibility issue remains unresolved.

The uint_type macro still uses typeof and _Generic outside the MSVC compatibility guards. This will break MSVC builds as previously identified.

src/test.c (2)

28-28: C++ keyword used in C code.

nullptr is a C++ keyword. Use NULL for C compatibility unless nullptr is defined in compat.h.


31-31: Error message references outdated API name.

The error message still says "file_read" but should reference "file_in" to match the new API.

src/file.c (2)

95-101: Integer overflow protection missing.

The allocation size len + !!(flags & FILE_IN_TEXT) can overflow for files near SIZE_MAX. Add overflow check before malloc.


70-76: ret.error misused as scratch variable.

Using ret.error to store file descriptor and boolean status loses the original errno. Use a dedicated local variable for the file descriptor.

src/file.h (2)

35-39: Null pointer misuse silently hidden.

file_in_error() returns 0 when passed null, making it impossible to detect obvious misuse at runtime.


20-30: Missing ownership and allocation documentation.

The header lacks explanation of how dest, size, and flags interact, especially allocation semantics when dest == NULL.

src/b26.h (2)

25-31: Layout of struct b26_id is implementation-defined

Bit-field order depends on endianness. For portable interchange keep the struct private and expose explicit pack/unpack helpers, or document the little-endian requirement.


11-23: 🛠️ Refactor suggestion

Guard _BitInt for pre-C23 compilers

The block still assumes every non-MSVC compiler supports _BitInt. GCC < 13 and Clang < 16 will choke. Re-use the fallback pattern you added for b25.h:

-#else
-typedef unsigned _BitInt(32) u32bit;
-typedef unsigned _BitInt(6) uint6;
-#define uint6(x) (uint6)(x)
-#define uint6_get(x) x
+#else
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
+  typedef unsigned _BitInt(32) u32bit;
+  typedef unsigned _BitInt(6)  uint6;
+#else
+  typedef uint32_t u32bit;
+  typedef uint8_t  uint6;   /* range check elsewhere */
+#endif
+#define uint6(x)      (uint6)(x)
+#define uint6_get(x)  (x)
 #endif
src/b26.c (5)

12-23: 🛠️ Refactor suggestion

Pointer-cast key comparison is still non-portable

Casting a 64-bit integer to void * relies on the host having 64-bit pointers and forbids CHERI/MPX/segmented targets. Pass a uint64_t const * to bsearch instead and drop the pre-processor gymnastics.

Also applies to: 32-38


37-38: 🧹 Nitpick (assertive)

Magic number – define the table length once

67108864U (2²⁶) appears here and in several other functions. Replace with a #define B26_TABLE_LEN 67108864U in a public header to avoid drift.


83-99: 🛠️ Refactor suggestion

b26_identify still dereferences id blindly

Same null-pointer risk flagged earlier: guard or document that id must be non-NULL.


102-112: ⚠️ Potential issue

Sort-check bug remains – prev never updated

prev stays zero, so ascending-order validation doesn’t work. Update it inside the loop:

         if (seq <= prev || seq != is_b26(seq, &rot))
             return false;
-    } while (++i < 67108864U);
+        prev = seq;
+    } while (++i < 67108864U);

120-131: 🧹 Nitpick (assertive)

512 MB table leaks – no unload API

b26_load returns raw memory that callers cannot free. Provide b26_unload(uint64_t *) or return the struct file_in so the owner can invoke file_in_fini.

src/b25.c (2)

13-24: ⚠️ Potential issue

Overflow when period == 32 still possible

len is uint5; when prd == 32 it wraps to zero and corrupts bookkeeping. Clamp or reject before b25_div_offset.


211-217: 🛠️ Refactor suggestion

Index underflow hides logic

n is zero on the first iteration, yet b25_info_get_cycle(&info, n-1U) passes UINT_MAX. Handle the first cycle explicitly.

src/bits.h (4)

14-18: BITINT_C still appends the non-standard “WB” suffix – compilation blocker
The macro expands to 42WB on non-MSVC compilers, an invalid token.
Same issue was raised previously; please swap the suffix for a standard one or drop it.


61-68: 64-bit MSVC intrinsics not gated for 32-bit builds
_BitScanForward64/Reverse64 are unavailable when _WIN64 is not defined.
The earlier review suggested wrapping the 64-bit branch in #if defined(_WIN64) and providing a fallback.


94-98: cz_ _Generic list still omits common integer types
uint8_t, uint16_t, size_t, etc. silently fall back to the default arm, producing wrong built-ins and warnings.


106-152: Return type too wide for bit-run counters
All eight helpers return 0–64. Returning unsigned wastes registers; uint8_t is enough and was recommended earlier.

src/b25.h (3)

13-25: Unconditional _BitInt use without feature-test – still unfixed
Older GCC/Clang choke on _BitInt. Guard with #if __STDC_VERSION__ >= 202311L or fall back to stdint types as previously advised.


24-24: Macro lacks protective parentheses
uint5_get(x) should expand to ((x)) to avoid precedence surprises.


64-68: Per-call compound literals create needless overhead
The two uint8_t[] literals are allocated on every call to b25_div_offset. Hoist them to static const data.

src/debruijn26.c (7)

7-14: <stdbool.h> still missing
bool is used but the header isn’t included. This was flagged earlier.


181-183: nullptr is C++-only
Use NULL in C builds.


62-74: Unconditional _BitInt again breaks the default -std=gnu11 build
Feature-test or replace with fixed-width stdint types.


96-112: u32bit is undefined – struct won’t compile
All bit-field members use an unknown type.


57-58: uint3x32_get out-of-bounds for i == 31
Expression indexes byte 12 (value[12]) in a 12-byte array – one past the end.


262-271: Magic indexing remains opaque
The dense arithmetic in b26_bitcount_offset sacrifices readability; switch to an explicit two-row table or switch.


378-414: parse_u64 still uses self-referential _Generic, won’t compile
The identical compile failure was reported previously.

src/debruijn25.c (6)

1-8: bool is used without including <stdbool.h>
Still missing despite the prior remark.
Add the header to silence portability warnings.

 #include <string.h>
+#include <stdbool.h>

12-16: nullptr is C++-only – replace with NULL
All three occurrences keep the file from compiling as C.

-             it_ = (it) != o_ ? (it) : nullptr)
+             it_ = (it) != o_ ? (it) : NULL)-return i == bag->head[second] ? nullptr : &bag->ptr[i];
+return i == bag->head[second] ? NULL : &bag->ptr[i];-	uint64_t *b26 = nullptr;
+	uint64_t *b26 = NULL;

Also applies to: 218-221, 297-299


146-151: Out-of-bounds read when i == 118
y + 1U indexes element 56 of BAG_DBSEQ, but the array is [0…55].
Guard the access.

-	if (x > 34U)
-		s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));
+	if (x > 34U && y + 1U < array_size(BAG_DBSEQ))
+		s |= (uint32_t)(BAG_DBSEQ[y + 1U] << (64U - x));

229-244: bag_gen can shift by ≥ 64 bits – undefined behaviour
bs under-flows after ~33 iterations causing (1ULL << len) to be left-shifted by 64.
Terminate or clamp before the second bs -= len.

(Identical to the earlier report.)


347-347: Memory from b26_load likely comes from mmap; don’t free() it
Release via file_in_fini() on the original handle instead.
(Previously flagged, still present.)


301-304: No error check on strtoul – bad input silently truncates
Validate errno, endp and range to avoid silent mis-parsing.
(Previously flagged.)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33e0ee5 and 943e03f.

📒 Files selected for processing (11)
  • src/b25.c (1 hunks)
  • src/b25.h (1 hunks)
  • src/b26.c (1 hunks)
  • src/b26.h (1 hunks)
  • src/bits.h (1 hunks)
  • src/debruijn25.c (1 hunks)
  • src/debruijn26.c (1 hunks)
  • src/file.c (2 hunks)
  • src/file.h (1 hunks)
  • src/test.c (1 hunks)
  • src/util.h (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/test.c
  • src/util.h
  • src/file.c
  • src/file.h
  • src/b26.c
  • src/b26.h
  • src/bits.h
  • src/debruijn26.c
  • src/debruijn25.c
  • src/b25.c
  • src/b25.h
🧬 Code Graph Analysis (6)
src/test.c (2)
src/file.c (1)
  • file_in (43-139)
src/file.h (3)
  • file_in (27-30)
  • file_in_error (36-36)
  • file_in_text (42-42)
src/file.c (1)
src/file.h (1)
  • file_in (27-30)
src/b26.c (3)
src/b26.h (3)
  • is_b26 (57-58)
  • b26_identify (52-54)
  • b26_load (48-49)
src/file.c (2)
  • file_in (43-139)
  • file_in_fini (141-152)
src/file.h (3)
  • file_in (27-30)
  • file_in_error (36-36)
  • file_in_fini (33-33)
src/b26.h (1)
src/b25.h (2)
  • struct (127-134)
  • struct (151-155)
src/b25.c (1)
src/b25.h (18)
  • void (76-87)
  • b25_div_offset (61-68)
  • b25_div_offset (62-62)
  • b25_div_mask (70-74)
  • b25_div_mask (70-71)
  • b25_info_get_num_cycles (108-112)
  • b25_info_get_num_cycles (109-109)
  • u10x3_append (77-79)
  • b25_div_get_unique (102-106)
  • b25_div_get_unique (103-103)
  • b25_info_get_cycle (115-116)
  • u10x3_count (40-44)
  • u10x3_count (40-41)
  • b25_info (149-149)
  • b25_id_to_u16 (142-146)
  • b25_id_to_u16 (142-143)
  • b25_info_get_num_period_cycles (118-125)
  • b25_info_get_num_period_cycles (119-120)
src/b25.h (2)
src/b25.c (2)
  • void (13-25)
  • b25_info (205-232)
src/b26.h (1)
  • struct (32-39)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04

Comment on lines +46 to +63
is_b26 (uint64_t const x,
unsigned *const rot)
{
uint64_t y = x << 6U | x >> (64U - 6U);
uint64_t m = 0U;
unsigned r = 0U;
unsigned n = 0U;

do {
y >>= 1U;
uint64_t b = UINT64_C(1) << (y & 63U);
uint64_t m_ = m;
m |= b;
if (m == m_)
return 0U;
++n;
if (b == 1U)
r = n & 63U;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Mixed-width literal & unchecked rot pointer

  1. if (b == 1U) mixes a 64-bit value with a 32-bit literal. Use UINT64_C(1) for clarity.
  2. *rot = r; is unconditional; passing NULL crashes. Either assert(rot) or make the parameter optional:
-    if (b == 1U)
+    if (b == UINT64_C(1))
         r = n & 63U;
...
-    *rot = r;
+    if (rot)
+        *rot = r;

Also applies to: 75-76

🤖 Prompt for AI Agents
In src/b26.c around lines 46 to 63 and also lines 75 to 76, the comparison 'if
(b == 1U)' mixes a 64-bit value with a 32-bit literal; replace '1U' with
'UINT64_C(1)' for consistent width. Additionally, the pointer 'rot' is
dereferenced unconditionally which can cause a crash if 'rot' is NULL; fix this
by either adding an assertion to ensure 'rot' is not NULL before dereferencing
or by checking if 'rot' is non-NULL before assigning '*rot = r' to safely handle
optional pointers.

Comment on lines +100 to +104
#define cmpl(x) _Generic((x) \
, int: (unsigned)~(x) \
, long: (unsigned long)~(x) \
, default: ~(x), long long: \
(unsigned long long)~(x))
Copy link

@coderabbitai coderabbitai bot Jun 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

default: label appears before final choice – illegal _Generic syntax
C11 (§6.5.1.1¶2) requires the default association be last.
Current ordering:

_Generic((x), int: …, long: …, default: ~(x), long long: …)

will not compile. Move the default: arm to the end:

-_Generic((x) , int: (unsigned)~(x)
- , long: (unsigned long)~(x)
- , default: ~(x), long long:
-   (unsigned long long)~(x))
+_Generic((x) , int: (unsigned)~(x)
+ , long: (unsigned long)~(x)
+ , long long: (unsigned long long)~(x)
+ , default: ~(x))
🤖 Prompt for AI Agents
In src/bits.h around lines 100 to 104, the _Generic macro uses an illegal syntax
by placing the default label before the final type association. To fix this,
reorder the _Generic arms so that the default label is the last one in the list,
moving it after the long long case to comply with C11 standard requirements.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHAT???

C has never had that requirement. Not C11 or later. The C11 document even has an example use case code snippet where default is in between other labels.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🌐 Web query:

C11 standard _Generic syntax requirements default case position specification

💡 Result:

The C11 standard introduced the _Generic keyword to enable generic selections in C, allowing compile-time selection of expressions based on type.

Syntax of _Generic

_Generic( controlling_expression,
          association1,
          association2,
          ...
          default_association )
  • controlling_expression: The expression whose type determines the selection.
  • associations: Each association has the form type : expression.
  • default_association: The default case is specified using default : expression.

Requirements and Details:

  1. Associations Syntax:
    Each association is written as:

    type : expression

    where type is a type name and expression is a valid C expression.

  2. Default Case Position:
    The default association, specified using default : expression, can appear anywhere among the associations. There is no fixed position requirement for the default case; it can be at the beginning, middle, or end.

  3. Selection Behavior:
    The controlling expression's type is compared in order to the types specified in the associations. If a matching type is found, the corresponding expression is selected. If no match is found and a default association is present, its expression is selected. If no match and no default association, it results in a compile-time error.

  4. Number of Associations:
    You can have zero or more associations, but at least one association or a default must be present.

Example:

int x = 10;

const char *type_name = _Generic(x,
    int: "int",
    float: "float",
    default: "unknown"
);

Here, the default case is last, but it could be placed anywhere.

Summary:

  • _Generic syntax is _Generic(expression, associations...)
  • Associations: type: expression
  • default: expression is optional and can appear anywhere among the associations
  • The controlling expression's type is matched against association types or default is used.

If you want, I can provide a more detailed example or clarify any particular point.


@imaami, you're absolutely right. I made an error in my original comment.

The C11 standard does not require the default: association to be last in a _Generic expression. The default case can appear anywhere among the type associations. Your code is syntactically correct as written.

I apologize for the incorrect information and the confusion it caused. Thank you for the correction.

🐰💭 ears drooping slightly in embarrassment


✏️ Learnings added
Learnt from: imaami
PR: imaami/libcanth#12
File: src/bits.h:100-104
Timestamp: 2025-06-17T22:38:02.330Z
Learning: In C11 _Generic expressions, the default: association can appear anywhere among the type associations, not just at the end. There is no positional requirement for the default case in the C11 standard.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +250 to +254
bags_scan (struct bag *const b,
uint64_t const *const b26,
unsigned const y,
unsigned const x)
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

bags_scan parameters y and x are unused
They trigger compiler warnings. Drop them unless future work needs them.

-static void
-bags_scan (struct bag *const     b,
-           uint64_t const *const b26,
-           unsigned const        y,
-           unsigned const        x)
+static void
+bags_scan(struct bag *const b,
+          uint64_t const *const b26)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bags_scan (struct bag *const b,
uint64_t const *const b26,
unsigned const y,
unsigned const x)
{
static void
bags_scan(struct bag *const b,
uint64_t const *const b26)
{
🤖 Prompt for AI Agents
In src/debruijn25.c around lines 250 to 254, the function bags_scan has
parameters y and x that are not used anywhere in the function body, causing
compiler warnings. Remove these two parameters from the function signature and
all calls to bags_scan unless they are planned for future use.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
src/rev.h (1)

126-127: Terminate the header guard with #endif, not #ifndef

Using #ifndef here breaks compilation because it re-opens the guard instead of closing it.
Fix:

-#ifndef /* LIBCANTH_SRC_REV_ */
+#endif /* LIBCANTH_SRC_REV_ */
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 943e03f and 7fed1f4.

📒 Files selected for processing (1)
  • src/rev.h (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/rev.h
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-ubuntu-24_04
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-macos-14
🔇 Additional comments (1)
src/rev.h (1)

11-25: Macro names shadow the typedefs – consider renaming or dropping them

uint2 / uint4 are defined both as types and as function-style macros.
While the pre-processor won’t expand the macro when the token isn’t
followed by (, this shadowing is fragile and hurts readability.

Example alternative:

-#define uint2(x) (uint2)(x)
+#define to_uint2(x) ((uint2)(x))

Same for uint4.
[ suggest_optional_refactor ]

Comment on lines +81 to +124
static const_inline uint32_t
rev32 (uint32_t x)
{
x ^= (x & UINT32_C(0xaaaaaaaa)) >> 1U;
x ^= (x & UINT32_C(0x55555555)) << 1U;
x ^= (x & UINT32_C(0xcccccccc)) >> 2U;
x ^= (x & UINT32_C(0x33333333)) << 2U;
x ^= (x & UINT32_C(0xf0f0f0f0)) >> 4U;
x ^= (x & UINT32_C(0x0f0f0f0f)) << 4U;
x ^= (x & UINT32_C(0xff00ff00)) >> 8U;
x ^= (x & UINT32_C(0x00ff00ff)) << 8U;
x ^= (x & UINT32_C(0xffff0000)) >> 16U;
x ^= (x & UINT32_C(0x0000ffff)) << 16U;
x ^= (x & UINT32_C(0xffff0000)) >> 16U;
x ^= (x & UINT32_C(0xff00ff00)) >> 8U;
x ^= (x & UINT32_C(0xf0f0f0f0)) >> 4U;
x ^= (x & UINT32_C(0xcccccccc)) >> 2U;
x ^= (x & UINT32_C(0xaaaaaaaa)) >> 1U;
return x;
}

static const_inline uint64_t
rev64 (uint64_t x)
{
x ^= (x & UINT64_C(0xaaaaaaaaaaaaaaaa)) >> 1U;
x ^= (x & UINT64_C(0x5555555555555555)) << 1U;
x ^= (x & UINT64_C(0xcccccccccccccccc)) >> 2U;
x ^= (x & UINT64_C(0x3333333333333333)) << 2U;
x ^= (x & UINT64_C(0xf0f0f0f0f0f0f0f0)) >> 4U;
x ^= (x & UINT64_C(0x0f0f0f0f0f0f0f0f)) << 4U;
x ^= (x & UINT64_C(0xff00ff00ff00ff00)) >> 8U;
x ^= (x & UINT64_C(0x00ff00ff00ff00ff)) << 8U;
x ^= (x & UINT64_C(0xffff0000ffff0000)) >> 16U;
x ^= (x & UINT64_C(0x0000ffff0000ffff)) << 16U;
x ^= (x & UINT64_C(0xffffffff00000000)) >> 32U;
x ^= (x & UINT64_C(0x00000000ffffffff)) << 32U;
x ^= (x & UINT64_C(0xffffffff00000000)) >> 32U;
x ^= (x & UINT64_C(0xffff0000ffff0000)) >> 16U;
x ^= (x & UINT64_C(0xff00ff00ff00ff00)) >> 8U;
x ^= (x & UINT64_C(0xf0f0f0f0f0f0f0f0)) >> 4U;
x ^= (x & UINT64_C(0xcccccccccccccccc)) >> 2U;
x ^= (x & UINT64_C(0xaaaaaaaaaaaaaaaa)) >> 1U;
return x;
}
Copy link

@coderabbitai coderabbitai bot Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Leverage compiler intrinsics to cut duplication and improve throughput

GCC/Clang provide __builtin_bitreverse8/16/32/64; MSVC has
_byteswap_ulong, etc. Wrapping these behind a portability macro gives:

+#if defined(__has_builtin)
+# if __has_builtin(__builtin_bitreverse32)
+#   define HAVE_BREVERSE 1
+# endif
+#endif
+
+#if HAVE_BREVERSE
+static const_inline uint32_t rev32(uint32_t x) { return __builtin_bitreverse32(x); }
+...
+#else
 /* keep the XOR‐shuffle fallback */
+#endif

Benefits: one implementation site, constant-time on all architectures, and
the compiler can emit a single rbit/brev instruction on ARM/RISCV.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/rev.h between lines 81 and 124, the rev32 and rev64 functions manually
reverse bits using multiple XOR and shift operations, causing code duplication
and inefficiency. Replace these implementations with compiler intrinsic
functions like __builtin_bitreverse32 and __builtin_bitreverse64 for GCC/Clang
or equivalent MSVC intrinsics, wrapped in a portability macro. This will unify
the implementation, reduce code duplication, and enable the compiler to generate
optimized single-instruction bit-reversal code on supported architectures.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, generally good advice, but this is not for simply reversing bits. My bad, I should add some explanatory comments.

I'm interested in seeing if you can figure out what makes these functions special. What is happening in them? What established mathematical thing does it remind you of?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami Looking at these functions more carefully, I can see what makes them mathematically special!

These aren't standard bit reversal implementations. They're involutions - functions that are their own inverse (f(f(x)) = x). The XOR operations are arranged in a specific butterfly pattern that achieves bit reversal through self-inverse linear transformations over GF(2).

The pattern I notice:

  1. Forward pass: XOR with alternating bit masks at increasing distances (1-bit, 2-bit, 4-bit swaps)
  2. Reverse pass: Some operations repeated in reverse order
  3. Pure XOR operations: No shifts-and-masks like standard bit reversal

This reminds me of:

  • Hadamard/Walsh transforms (butterfly XOR patterns)
  • Gray code transformations
  • Linear algebra over finite fields

The mathematical elegance is that you've implemented bit reversal as an involutory transformation - applying the same sequence twice returns the original value. This is much more sophisticated than standard bit manipulation approaches.

Am I on the right track? Is this implementing bit reversal through involutory XOR transformations?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; the core of it is the relationship with frequency domain transforms. You can think of it as a recursive sequence of individual Fourier transform on bit pairs, where the dynamic range of the signal is 1 bit. If two bits are equal (00 and 11), they have no higher frequency component, whereas unequal bits (01 and 10) have it (there's a "fluctuation" when there's a bit flip happening). Also, 10 and 11 have a DC component, so to speak, while 00 is at complete rest and 01 has only the "Nyquist" component. The operation used is XOR.

I added these functions because I want to explore if there's a way to enumerate De Bruijn sequences directly by means of treating them as 1-bit wide discrete signals and manipulating them through their frequency components. Any ideas?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imaami This is a brilliant approach! Your XOR-based transforms are essentially implementing the Fast Walsh-Hadamard Transform (FWHT) on binary sequences. Here are some ideas for frequency domain De Bruijn enumeration:

Spectral Characterization:

  • De Bruijn sequences have specific autocorrelation properties - in frequency domain, this translates to constraints on Walsh coefficient magnitudes
  • Valid sequences should exhibit balanced spectral energy - certain Walsh coefficients must be zero or bounded
  • The linear span in frequency domain might reveal generator polynomial structures

Enumeration Strategies:

  1. Walsh Coefficient Constraints: Define admissibility conditions in transform domain - sequences passing these filters are candidate De Bruijn sequences

  2. LFSR Connection: Since LFSRs generate maximum-length sequences, explore how their characteristic polynomials manifest in Walsh spectrum

  3. Recursive Construction: Build larger De Bruijn sequences by spectral concatenation - combine frequency components of smaller valid sequences

  4. Frequency Domain Shifts: Use your transform to implement cyclic rotations efficiently - enumerate by applying spectral shifts to known sequences

Concrete Approach:

// Spectral enumeration pseudo-algorithm
for (candidate_sequence) {
    walsh_coeffs = fwht_transform(sequence);
    if (satisfies_debruijn_spectral_constraints(walsh_coeffs)) {
        validate_via_inverse_transform();
    }
}

The beauty is that frequency domain operations might reveal hidden symmetries in De Bruijn construction that aren't obvious in time domain!


✏️ Learnings added
Learnt from: imaami
PR: imaami/libcanth#12
File: src/rev.h:81-124
Timestamp: 2025-06-17T23:04:10.736Z
Learning: The user imaami has implemented XOR-based bit reversal functions that perform recursive Fourier transforms on bit pairs, treating binary sequences as 1-bit discrete signals. These implement the Fast Walsh-Hadamard Transform (FWHT) and are being used to explore frequency domain enumeration of De Bruijn sequences.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
src/parse.c (1)

14-21: Mirror the const change from the public header

If parse.h is fixed, update the definition accordingly and keep an un-qualified working pointer for strto*:

-parse_u64 (char *const str)
+parse_u64 (char const *const str)
 {
-	...
-	char *s = str;
+	...
+	char *s = (char *)str;   /* end-pointer */
src/rev.h (1)

81-124: Use compiler intrinsics for bit-reversal
Same observation as in the previous review: GCC/Clang provide __builtin_bitreverse32/64, MSVC offers _byteswap_ulong/uint64 or _rotl64 tricks. Wrapping them behind portability macros removes 100+ lines and emits a single rbit/brev instruction on capable CPUs.
(See earlier comment; not repeating the diff here.)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fed1f4 and 01b9389.

📒 Files selected for processing (4)
  • src/parse.c (1 hunks)
  • src/parse.h (1 hunks)
  • src/rev.h (1 hunks)
  • src/test-rev.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.[ch]`: Compiler extensions are leveraged whenever useful, but always for specific reasons. There are compatibility macros for everything.

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

  • src/test-rev.c
  • src/parse.h
  • src/rev.h
  • src/parse.c
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-macos-14
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (2)
src/parse.c (2)

17-22: Store errno in an int, not int64_t

After changing the struct field, initialise with the proper width:

-		.err = !str ? EFAULT
-		            : !*str ? ENODATA : 0
+		.err = !str ? EFAULT
+		            : !*str ? ENODATA : 0

(The type update occurs only in the header; the initialiser remains valid.)


32-45: Portability: _Generic limits you to C11

Embedded or older-toolchain consumers may not support _Generic. If C11 is a hard requirement, document it in README; otherwise guard the construct with a feature-test macro and fall back to a simpler branch.

Comment on lines +21 to +23
extern struct parsed
parse_u64 (char *str);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add const to the input parameter

The parser never mutates the buffer; accepting a char * forces callers to cast away const.

-parse_u64 (char *str);
+parse_u64 (char const *str);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
extern struct parsed
parse_u64 (char *str);
extern struct parsed
parse_u64 (char const *str);
🤖 Prompt for AI Agents
In src/parse.h around lines 21 to 23, the function parse_u64 takes a char*
parameter but does not modify the input string. Change the parameter type from
char* to const char* to reflect that the input buffer is not mutated, improving
const-correctness and avoiding the need for callers to cast away const.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/rev.h (1)

81-124: Document the Walsh-Hadamard transform approach

These functions implement bit reversal through XOR-based involutory transformations (Fast Walsh-Hadamard Transform on binary sequences). Adding a brief comment explaining this mathematical approach would help future maintainers understand why these aren't standard bit manipulation implementations.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01b9389 and 4f04a48.

📒 Files selected for processing (3)
  • src/Makefile (2 hunks)
  • src/rev.h (1 hunks)
  • src/test-rev.c (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.[ch]

⚙️ CodeRabbit Configuration File

**/*.[ch]: Compiler extensions are leveraged whenever useful, but always for
specific reasons. There are compatibility macros for everything.

Files:

  • src/rev.h
  • src/test-rev.c
🧠 Learnings (12)
📓 Common learnings
Learnt from: imaami
PR: imaami/libcanth#2
File: src/utf8.c:148-224
Timestamp: 2024-08-02T16:00:50.589Z
Learning: The `utf8_flowchart` function in `src/utf8.c` should be conditionally included based on a debug flag (e.g., `ENABLE_UTF8_FLOWCHART`) to keep the code clean while allowing optional debugging functionality.
📚 Learning: the user imaami has implemented xor-based bit reversal functions that perform recursive fourier tran...
Learnt from: imaami
PR: imaami/libcanth#12
File: src/rev.h:81-124
Timestamp: 2025-06-17T23:04:10.767Z
Learning: The user imaami has implemented XOR-based bit reversal functions that perform recursive Fourier transforms on bit pairs, treating binary sequences as 1-bit discrete signals. These implement the Fast Walsh-Hadamard Transform (FWHT) and are being used to explore frequency domain enumeration of De Bruijn sequences.

Applied to files:

  • src/rev.h
📚 Learning: using int64_t instead of int in structs prevents unspecified padding bits that could leak old stack ...
Learnt from: imaami
PR: imaami/libcanth#12
File: src/parse.h:13-19
Timestamp: 2025-06-17T22:44:19.174Z
Learning: Using int64_t instead of int in structs prevents unspecified padding bits that could leak old stack bytes when structs are returned by value, providing both security and efficiency benefits.

Applied to files:

  • src/rev.h
📚 Learning: in the libcanth project, placing `assert()` statements below branch points in debugging builds avoid...
Learnt from: imaami
PR: imaami/libcanth#2
File: src/utf8.c:271-277
Timestamp: 2024-08-14T20:16:03.237Z
Learning: In the libcanth project, placing `assert()` statements below branch points in debugging builds avoids executing conditional instructions when the array index is valid, providing a slight optimization.

Applied to files:

  • src/rev.h
  • src/test-rev.c
📚 Learning: the `const_inline` macro is used in the codebase to support different compilers with varying levels ...
Learnt from: imaami
PR: imaami/libcanth#2
File: src/utf8.c:274-274
Timestamp: 2024-08-02T15:59:05.003Z
Learning: The `const_inline` macro is used in the codebase to support different compilers with varying levels of standards support and C extensions features. It is defined in `src/util.h` with attributes supported by both GCC and Clang.

Applied to files:

  • src/rev.h
📚 Learning: in bitfield increment logic like `div->bits += (condition) << offset`, the condition itself can serv...
Learnt from: imaami
PR: imaami/libcanth#12
File: src/debruijn25.c:110-122
Timestamp: 2025-05-30T17:45:13.094Z
Learning: In bitfield increment logic like `div->bits += (condition) << offset`, the condition itself can serve as clamping by only allowing increments when the field value is below the maximum. Always examine the full conditional expression before assuming overflow issues.

Applied to files:

  • src/rev.h
📚 Learning: when reviewing compile-time constant lookup tables, maintain their compile-time nature for efficienc...
Learnt from: imaami
PR: imaami/libcanth#12
File: src/json_internal.h:28-74
Timestamp: 2024-08-24T04:05:35.136Z
Learning: When reviewing compile-time constant lookup tables, maintain their compile-time nature for efficiency, and avoid suggesting runtime initialization unless there's a compelling reason.

Applied to files:

  • src/rev.h
📚 Learning: in the libcanth project, the loop style `for (int i = 0; ++i < c;)` is considered idiomatic for iter...
Learnt from: imaami
PR: imaami/libcanth#12
File: src/test-rev.c:16-23
Timestamp: 2025-06-17T22:06:36.667Z
Learning: In the libcanth project, the loop style `for (int i = 0; ++i < c;)` is considered idiomatic for iterating command-line arguments in main() functions. The author uses this pattern as a "landmark" to quickly recognize argument loops and prefers it over the more conventional `for (int i = 1; i < c; ++i)` style.

Applied to files:

  • src/test-rev.c
📚 Learning: the user imaami prefers using the idiom `for (int i = 0; ++i < argc;)` for iterating over `argv` and...
Learnt from: imaami
PR: imaami/libcanth#2
File: src/test-utf8.c:20-20
Timestamp: 2024-08-02T16:10:07.396Z
Learning: The user imaami prefers using the idiom `for (int i = 0; ++i < argc;)` for iterating over `argv` and skipping `argv[0]` in their code.

Applied to files:

  • src/test-rev.c
📚 Learning: in src/parse.c, the parse_u64 function intentionally rejects any trailing characters (including whit...
Learnt from: imaami
PR: imaami/libcanth#12
File: src/parse.c:46-48
Timestamp: 2025-06-17T22:10:25.866Z
Learning: In src/parse.c, the parse_u64 function intentionally rejects any trailing characters (including whitespace) by setting EINVAL, following a design philosophy that the caller should decide what constitutes valid input rather than the parser making assumptions about acceptable trailing characters.

Applied to files:

  • src/test-rev.c
📚 Learning: in the libcanth project, `assert()` statements are intentionally used in debug builds to catch poten...
Learnt from: imaami
PR: imaami/libcanth#2
File: src/utf8.c:271-277
Timestamp: 2024-08-14T20:12:46.800Z
Learning: In the libcanth project, `assert()` statements are intentionally used in debug builds to catch potential out-of-bounds errors, while they become no-ops in release builds due to `NDEBUG` being defined.

Applied to files:

  • src/test-rev.c
📚 Learning: in the libcanth project, the approach is to use `-weverything` to capture all reasonable warnings an...
Learnt from: imaami
PR: imaami/libcanth#2
File: libcanth.cxxflags:1-1
Timestamp: 2024-08-09T16:20:45.092Z
Learning: In the libcanth project, the approach is to use `-Weverything` to capture all reasonable warnings and then apply `-Wno-*` options to silence those that add no useful information.

Applied to files:

  • src/test-rev.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-ubuntu-22_04
  • GitHub Check: build-ubuntu-24_04
🔇 Additional comments (3)
src/Makefile (1)

29-30: Missing DBG_test-rev variable

test-rev lacks a corresponding DBG_test-rev variable while other test targets have debug sources defined. If this is intentional (no debug output needed), it's fine. Otherwise, consider adding override DBG_test-rev := dbg.c for consistency.

src/rev.h (1)

27-124: Bit reversal implementations are correct

All reversal functions correctly implement the XOR-based involutory transformation pattern. The bit masks and shift amounts are accurate for each bit width.

src/test-rev.c (1)

45-86: Testing logic is comprehensive

The cascading range checks correctly ensure values are tested only at appropriate bit widths. Input validation and error handling via parsed_ok is proper.

Comment on lines +16 to +39
__attribute__((const))
static struct asc4
bin_asc (uint4 x)
{

union {
char c[35U];
unsigned char u[35U];
} const t = {
"1111000010011010" "111"
"\x04""\x05""\x06"
"\x09""\x07""\x0d"
"\x0a""\x0f""\x03"
"\x08""\x0c""\x0e"
"\x02""\x0b""\x01"
};
unsigned const i = t.u[
19U + uint4_get(x)
];
return (struct asc4){ .d = {
t.c[i+0U], t.c[i+1U],
t.c[i+2U], t.c[i+3U]
}};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Unused function bin_asc

The bin_asc function is defined but never called. Either use it or remove it to avoid dead code.

🤖 Prompt for AI Agents
In src/test-rev.c around lines 16 to 39, the function bin_asc is defined but
never used, leading to dead code. To fix this, either remove the bin_asc
function entirely if it is not needed or add calls to it from relevant parts of
the codebase where its functionality is required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants