-
Notifications
You must be signed in to change notification settings - Fork 340
mcp: improve http transports error handling and make transport work with any size message #734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
findleyr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this change in #726. I'm not sure we shouldn't just remove the buffering.
91f93af to
ee3d418
Compare
ee3d418 to
85a4340
Compare
|
@findleyr current proposal (no fixed buffer size) approach is inline with most sdks. Can we get this merged? |
|
@alexbumbacea sorry for the delay, will review tomorrow. |
mcp/event.go
Outdated
| yield(Event{}, fmt.Errorf("context done: %w", ctx.Err())) | ||
| return | ||
| } else { | ||
| yield(Event{}, fmt.Errorf("error reading event: %w", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to expose these errors in the API: we can just format them with %v.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear why we don't wat to wrap it and instead we just want to print it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wrap an error, then callers can extract the error. That means the error type becomes part of your effective API. Which means you can't change it.
If you just include the error text, you don't have that problem. If later, someone needs to distinguish particular errors, we can design the proper API for them.
mcp/event.go
Outdated
| const maxTokenSize = 1 * 1024 * 1024 // 1 MiB max line size | ||
| scanner.Buffer(nil, maxTokenSize) | ||
| func scanEvents(ctx context.Context, r io.Reader) iter.Seq2[Event, error] { | ||
| scanner := bufio.NewReader(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'reader'
mcp/event.go
Outdated
| scanner := bufio.NewScanner(r) | ||
| const maxTokenSize = 1 * 1024 * 1024 // 1 MiB max line size | ||
| scanner.Buffer(nil, maxTokenSize) | ||
| func scanEvents(ctx context.Context, r io.Reader) iter.Seq2[Event, error] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding a ctx parameter? Does not appear to be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used on L117 to stop the iteration on cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, ok... but is that necessary? Wouldn't a context cancellation close the response body?
mcp/event.go
Outdated
| yield(Event{}, fmt.Errorf("context done: %w", ctx.Err())) | ||
| return | ||
| } else { | ||
| yield(Event{}, fmt.Errorf("error reading event: %w", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wrap an error, then callers can extract the error. That means the error type becomes part of your effective API. Which means you can't change it.
If you just include the error text, you don't have that problem. If later, someone needs to distinguish particular errors, we can design the proper API for them.
mcp/event.go
Outdated
| return true | ||
| } | ||
| for { | ||
| line, err := scanner.ReadBytes('\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we think this is a potential DOS attack vector? True, the client is reading from the server, which it presumably trusts in many other ways. But a malicious server could make the client crash by feeding it a large event.
What about a high and configurable upper limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was already discussed in #726 that having this kind of limit would not protect against this in the event the server would present that as a multi "data" event.
Looking into other implementations of the MCP client (other languages and even other written in go) they do not impose such a limitation.
mcp/event.go
Outdated
| scanner := bufio.NewScanner(r) | ||
| const maxTokenSize = 1 * 1024 * 1024 // 1 MiB max line size | ||
| scanner.Buffer(nil, maxTokenSize) | ||
| func scanEvents(ctx context.Context, r io.Reader) iter.Seq2[Event, error] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used on L117 to stop the iteration on cancellation.
85a4340 to
7ba1c9a
Compare
|
@findleyr all comments have been either replied to or addressed. are there any other concerns? |
mcp/event.go
Outdated
| if !bytes.Equal(before, dataKey) { | ||
| flushData() | ||
| // If we're starting a new event field and we already have an event, emit it first | ||
| if bytes.Equal(before, eventKey) && !evt.Empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right: the event: field does not need to be the first field in the event. Maybe add a failing test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed and added tests the scenario you mentioned
mcp/event.go
Outdated
| dataBuf *bytes.Buffer // if non-nil, preceding field was also data | ||
| ) | ||
| flushData := func() { | ||
| emitEvent := func() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/emitEvent/yieldEvent
mcp/event.go
Outdated
| scanner := bufio.NewScanner(r) | ||
| const maxTokenSize = 1 * 1024 * 1024 // 1 MiB max line size | ||
| scanner.Buffer(nil, maxTokenSize) | ||
| func scanEvents(ctx context.Context, r io.Reader) iter.Seq2[Event, error] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, ok... but is that necessary? Wouldn't a context cancellation close the response body?
3577e96 to
377a108
Compare
377a108 to
8f634ce
Compare
|
@findleyr is there anything else? |
8f634ce to
9fc710b
Compare
9fc710b to
9792dd9
Compare
|
@findleyr can we get this merged? |
|
@alexbumbacea very sorry, I thought I had merged this. Total oversight on my part. @maciej-kisiel we may want to cherry-pick this into the release, or do a second prerelease. |
|
|
||
| case data := <-c.incoming: | ||
| case m := <-c.incoming: | ||
| if m.err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but returning (nil, nil) here is not correct. Previously the code would simply not receive the message: now it's reading a nil message, which is almost certain to cause a nil pointer exception.
I think we should simply not surface this error to the JSON-RPC layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the return here, just a TODO comment to have track of the idea that it needs to be handled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not right: we'd proceed to process the nil message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not surfacing the error at all, would require even more extensive refactoring. the best I can think of it's to wrap the error in a jsonrpc.Response
| // Network errors during reading should trigger reconnection, not permanent failure. | ||
| // Return from processStream so handleSSE can attempt to reconnect. | ||
| c.logger.Debug(fmt.Sprintf("%s: stream read error (will attempt reconnect): %v", requestSummary, err)) | ||
| return lastEventID, reconnectDelay, false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I think we actually need to be more subtle than this: a malformed event is a hard error, and different from a reader error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did some adaptaptation. let me know WDYT
|
Apologies again for this getting lost in the weeds: I think there are a few things that need to be addressed before merging. |
b434d37 to
da1ac52
Compare
|
|
||
| case data := <-c.incoming: | ||
| case m := <-c.incoming: | ||
| if m.err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not right: we'd proceed to process the nil message.
da1ac52 to
d534f57
Compare
We have a use case for messages larger than 1MB, hence making this configurable would help.
In order to discover the source of our problems we had to spend some time investigating, as errors were silently discarded. So we also included changes to improving surfacing errors .