Adding support of transient federates #358

Open
ChadliaJerad wants to merge 245 commits into main from transient-fed

@ChadliaJerad ChadliaJerad commented Feb 20, 2024

Collaborator

This PR replaces #192, and supports the latest major refactoring of the RTI.

It implements the transient feature in the federation execution. Details of the implementation are documented in Discussion lf-lang/lingua-franca#2212.

This should be merged together with lf-lang/lingua-franca#2213.


Merged #574

This PR builds on the centralized transient support from PR #358 (the transient-fed branch) and extends the reactor-c runtime to support transient federates under decentralized coordination, where all connections are P2P and the RTI forwards no messages. It also adds support for physical connections involving transient federates under centralized coordination, since those connections are also P2P.

lf-lang/lingua-franca#2609

Protocol extensions for decentralized transients

Connection

  • lf_connect_to_federate(), called when executing the preamble, is updated to accept an is_transient flag; transient outbounds are queried once (no retry loop) and connected only when the RTI sends MSG_TYPE_OUTBOUND_CONNECTED. Non-transient outbounds still use the original pattern and keep retrying every ADDRESS_QUERY_RETRY_INTERVAL until connected.
  • The RTI keeps track of which transients are outbound of each federate in the outbound_transients array and updates number_of_outbound_transients accordingly.
  • MSG_TYPE_OUTBOUND_CONNECTED and MSG_TYPE_OUTBOUND_DISCONNECTED are new messages. The RTI sends these to the federate whose downstream peers have connected or disconnected. When an outbound transient connects, the federate queries its address from the RTI. When it disconnects, the federate skips sending messages, thus avoiding an error writing to a broken pipe.
  • MSG_TYPE_ADDRESS_QUERY is extended with an is_transient byte so the RTI can register the querying federate's outbound-transient relationships.
  • outbound_p2p_connection_is_transient[NUMBER_OF_FEDERATES] and inbound_p2p_connection_is_transient[NUMBER_OF_FEDERATES] are added to federate_instance_t.
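
The connection bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `sketch_` names, the `outbound_connected` field, and the numeric message values are assumptions made for the example, and the real handler would also query the peer's address from the RTI and open the socket. Only `outbound_p2p_connection_is_transient` and `NUMBER_OF_FEDERATES` are names taken from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUMBER_OF_FEDERATES 4 // Assumed value; normally supplied by the code generator.

// Hypothetical stand-ins for MSG_TYPE_OUTBOUND_CONNECTED / _DISCONNECTED.
enum { SKETCH_MSG_OUTBOUND_CONNECTED = 1, SKETCH_MSG_OUTBOUND_DISCONNECTED = 2 };

typedef struct {
    // Mirrors the array added to federate_instance_t.
    bool outbound_p2p_connection_is_transient[NUMBER_OF_FEDERATES];
    // Hypothetical: whether the P2P socket to each outbound transient is usable.
    bool outbound_connected[NUMBER_OF_FEDERATES];
} sketch_federate_t;

// React to an RTI notification about an outbound federate.
void sketch_handle_rti_notification(sketch_federate_t* fed, int msg_type, uint16_t fed_id) {
    assert(fed_id < NUMBER_OF_FEDERATES);
    if (!fed->outbound_p2p_connection_is_transient[fed_id]) {
        return; // Non-transient outbounds are connected in the preamble retry loop.
    }
    if (msg_type == SKETCH_MSG_OUTBOUND_CONNECTED) {
        // Real code would query the address and connect here.
        fed->outbound_connected[fed_id] = true;
    } else if (msg_type == SKETCH_MSG_OUTBOUND_DISCONNECTED) {
        fed->outbound_connected[fed_id] = false;
    }
}

// Decide whether a message to fed_id should be written to its P2P socket.
bool sketch_should_send(const sketch_federate_t* fed, uint16_t fed_id) {
    assert(fed_id < NUMBER_OF_FEDERATES);
    if (!fed->outbound_p2p_connection_is_transient[fed_id]) {
        return true; // Assumed to be connected since startup.
    }
    return fed->outbound_connected[fed_id];
}
```

Guarding every send with a check like `sketch_should_send()` is what lets the federate skip writes to a departed transient instead of hitting a broken pipe.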

Adaptations

  • mark_inputs_known_absent(): for inbound transients, the tag is set to env->current_tag instead of FOREVER_TAG when the P2P socket closes. FOREVER_TAG permanently blocks ports from being updated after a transient rejoins, causing spurious Attempt to update to earlier tag errors and outbound STP violations. env->current_tag is sufficient to unblock the scheduler while leaving the port open to future updates.
  • Deferred P2P connection in get_start_time_from_rti(): MSG_TYPE_OUTBOUND_CONNECTED arriving during the start-time handshake is now deferred — the federate ID is drained from the socket immediately, but lf_connect_to_federate() (which itself reads from the RTI socket) is called only after MSG_TYPE_TIMESTAMP is received. This eliminates a race condition where the address-query reply consumed the timestamp bytes, leading to Unexpected reply of type 2.
  • notify_federate_disconnected(): now calls send_outbound_disconnected_locked() in addition to the existing send_upstream_disconnected_locked() calls, so inbound federates of a departing transient close their outbound P2P sockets.
  • RTI shutdown: suppressed the spurious WARNING: Failed to accept the socket. Invalid argument that fired because accept() returns EINVAL when unblocked by the intentional shutdown_socket() call at end of execution.
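
The deferral in get_start_time_from_rti() can be illustrated with a small, self-contained sketch. It reads from an in-memory byte buffer instead of the RTI socket; the `sketch_` names, the message type values, and the little-endian wire layout (a 2-byte federate ID, an 8-byte timestamp) are assumptions made for the example, not the actual protocol definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PENDING 8

// Hypothetical stand-ins for MSG_TYPE_TIMESTAMP / MSG_TYPE_OUTBOUND_CONNECTED.
enum { SKETCH_MSG_TIMESTAMP = 2, SKETCH_MSG_OUTBOUND_CONNECTED = 100 };

typedef struct { const uint8_t* data; size_t len; size_t pos; } sketch_stream_t;
typedef struct { uint16_t ids[SKETCH_MAX_PENDING]; size_t count; } sketch_pending_t;

// Copy n bytes from the stream; return -1 if fewer than n remain.
static int sketch_read(sketch_stream_t* s, void* out, size_t n) {
    if (s->len - s->pos < n) return -1;
    memcpy(out, s->data + s->pos, n);
    s->pos += n;
    return 0;
}

// Read messages until the timestamp arrives. Federate IDs carried by
// OUTBOUND_CONNECTED are drained immediately but only recorded, so that
// nothing else reads from the stream before the timestamp is consumed.
// Returns 0 on success, -1 on a truncated or unexpected message.
int sketch_wait_for_timestamp(sketch_stream_t* s, sketch_pending_t* pending, int64_t* timestamp) {
    assert(s != NULL && pending != NULL && timestamp != NULL);
    for (;;) {
        uint8_t type;
        if (sketch_read(s, &type, 1) != 0) return -1;
        if (type == SKETCH_MSG_OUTBOUND_CONNECTED) {
            uint8_t raw[2];
            if (sketch_read(s, raw, 2) != 0) return -1;
            if (pending->count == SKETCH_MAX_PENDING) return -1;
            pending->ids[pending->count++] = (uint16_t)(raw[0] | (raw[1] << 8));
        } else if (type == SKETCH_MSG_TIMESTAMP) {
            uint8_t raw[8];
            if (sketch_read(s, raw, 8) != 0) return -1;
            uint64_t value = 0;
            for (int i = 7; i >= 0; i--) value = (value << 8) | raw[i];
            *timestamp = (int64_t)value;
            return 0;
        } else {
            return -1;
        }
    }
}
```

After `sketch_wait_for_timestamp()` returns, the caller would walk `pending->ids` and connect to each federate, which is the point at which the address query can safely use the RTI socket again.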

Tracing and visualization (fedsd)

  • send_OUTBOUND_CONNECTED / receive_OUTBOUND_CONNECTED and send_OUTBOUND_DISCONNECTED / receive_OUTBOUND_DISCONNECTED trace events are added to trace_types.h and fedsd.py.
  • P2P_MSG arrows now connect sender and receiver correctly in fedsd: matching is done by physical-time ordering rather than partner_id (which is -1 on both sides for direct P2P tracepoints).
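
One plausible reading of matching by physical-time ordering is to pair the k-th earliest send with the k-th earliest receive for a given sender/receiver pair. The sketch below shows that idea in C for consistency with the runtime; fedsd itself is a Python tool, and the `sketch_` names, the `row` field, and the pairing rule are assumptions rather than a description of what fedsd.py actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t physical_time; // Physical time of the tracepoint.
    int row;               // Hypothetical: row index of the event in the trace table.
} sketch_event_t;

// qsort comparator: ascending physical time, written to avoid overflow.
static int sketch_by_time(const void* a, const void* b) {
    int64_t ta = ((const sketch_event_t*)a)->physical_time;
    int64_t tb = ((const sketch_event_t*)b)->physical_time;
    return (ta > tb) - (ta < tb);
}

// Sort both event lists by physical time and pair them index by index.
// Writes the matched row indices to send_rows/recv_rows and returns the
// number of pairs; surplus events on either side stay unmatched.
size_t sketch_match_p2p(sketch_event_t* sends, size_t n_sends,
                        sketch_event_t* recvs, size_t n_recvs,
                        int* send_rows, int* recv_rows) {
    assert(send_rows != NULL && recv_rows != NULL);
    qsort(sends, n_sends, sizeof *sends, sketch_by_time);
    qsort(recvs, n_recvs, sizeof *recvs, sketch_by_time);
    size_t n = n_sends < n_recvs ? n_sends : n_recvs;
    for (size_t i = 0; i < n; i++) {
        send_rows[i] = sends[i].row;
        recv_rows[i] = recvs[i].row;
    }
    return n;
}
```

This pairing relies on messages between two federates arriving in the order they were sent, which holds for a single TCP connection but is stated here as an assumption.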

@edwardalee edwardalee left a comment

Contributor

This looks like a great start, but I have some doubts about the concurrency handling as indicated in the comments. I will have to finish reviewing later, but I think there is enough to work with to go ahead and submit this provisional review.

Comment thread core/federated/RTI/main.c Outdated
Comment thread core/federated/RTI/main.c Outdated
Comment thread core/federated/RTI/main.c Outdated
Comment thread core/federated/RTI/main.c Outdated
Comment thread core/federated/RTI/rti_common.c Outdated
Comment thread core/federated/RTI/rti_remote.c Outdated
Comment thread core/federated/RTI/rti_remote.c Outdated
Comment thread core/federated/RTI/rti_remote.c Outdated
Comment thread core/federated/RTI/rti_remote.c Outdated
Comment thread core/federated/RTI/rti_remote.c Outdated
cmnrd previously requested changes Feb 22, 2024
Comment thread include/core/federated/federate.h Outdated
Comment thread include/core/federated/federate.h Outdated
Comment thread include/core/federated/network/net_common.h Outdated
cmnrd commented Feb 23, 2024

Contributor

In the "files changed" view, you can add suggestions to a batch and commit them all at once. This way, we can avoid having lots of commits without a descriptive commit message.

lhstrh commented Feb 23, 2024

Member

> In the "files changed" view, you can add suggestions to a batch and commit them all at once. This way, we can avoid having lots of commits without a descriptive commit message.

You probably want to use an interactive rebase to squash or even fixup all these commits...

@edwardalee edwardalee marked this pull request as draft February 27, 2024 14:53
Comment thread core/federated/federate.c Outdated
coderabbitai Bot commented Jul 10, 2024

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

…r the effective start tag and the port and IP address of the peer federate
…start tag, otherwise it stays forever. This case happens whenever an outbound transient has connected while waiting for the timestamp
…sient + keep the current tag of an outbound transient"
7 participants