Add tracing for physical connection open#6091
Conversation
| _connectionLogger = dataSourceConfig.LoggingConfiguration.ConnectionLogger; | ||
| _commandLogger = dataSourceConfig.LoggingConfiguration.CommandLogger; | ||
|
|
||
| // Make sure we do not flow AsyncLocals like Activity.Current |
There was a problem hiding this comment.
👍
Though a part of me thinks we should maybe start the activity only after the data source has been set up - that really isn't part of physical connection open. What do you think?
| // At no point should we ever have an activity here | ||
| Debug.Assert(Activity.Current is null); | ||
| // Set current activity as the one from the command | ||
| // So child activities from physical open are bound to it |
There was a problem hiding this comment.
Good thinking here.
Though at some point we should have the conversation about whether we want to keep multiplexing, given that it's known to be non-scalable (e.g. and is no longer used in the TE benchmarks).
| LogMessages.OpeningPhysicalConnection(ConnectionLogger, Host, Port, Database, UserFacingConnectionString); | ||
| var startOpenTimestamp = Stopwatch.GetTimestamp(); | ||
|
|
||
| var activity = NpgsqlActivitySource.ConnectionOpen(this); |
There was a problem hiding this comment.
As written below, there may be a case for only starting the activity after the data source is created and bootstrapped. Otherwise, the first "physical connection" is longer because it contains e.g. type loading and stuff like that; at minimum that skews stuff like average time calculations for physical open.
If we really wanted to surface startup costs like type loading properly, we could have another activity for that - though that seems like a bit much.
There was a problem hiding this comment.
As written below, there may be a case for only starting the activity after the data source is created and bootstrapped. Otherwise, the first "physical connection" is longer because it contains e.g. type loading and stuff like that; at minimum that skews stuff like average time calculations for physical open.
I do not think anyone really calculates average time for any sort of traces (too much stuff to store), there are metrics specifically for stuff like this (for example, in Npgslq we have db.client.commands.duration). Beside, right now when we log physical open we pass the total time, including type loading.
If we really wanted to surface startup costs like type loading properly, we could have another activity for that - though that seems like a bit much.
This might actually be not a bad idea, since this will also allow us to trace NpgsqlConnection.ReloadTypes.
There was a problem hiding this comment.
Right, makes sense. If it's not much work/changes, I'd try to exclude type loading from the activity span you're introducing here (simply because it really shouldn't be part of it). We can consider introducing a new activity for it, though I don't think that's very important.
There was a problem hiding this comment.
Right, makes sense. If it's not much work/changes, I'd try to exclude type loading from the activity span you're introducing here (simply because it really shouldn't be part of it).
The main problem is that we do consider it a part of physical open. That is, if we get an exception during bootstrap or from DataSourceConnectionInitializer, we'll break the physical connection. In that case, should we pass that exception to activity? It would be really weird if we don't (since we'll throw the exception but the trace will be green, even though the connection is already closed). But then, that means that we can't close the activity until we're absolutely sure the connection is stable.
There was a problem hiding this comment.
Or to be more specific, we can specify Activity.Duration via Activity.SetEndTime, so while it's still be considered as active, the duration will remains as is, but IDK
There was a problem hiding this comment.
I do not think anyone really calculates average time for any sort of traces (too much stuff to store), there are metrics specifically for stuff like this (for example, in Npgslq we have db.client.commands.duration). Beside, right now when we log physical open we pass the total time, including type loading.
FWIW calculating aggregate metrics from tracing data is a pretty standard thing to do, and indeed we could also not include the type loading time when logging.
But good point about the exception. I still think that fundamentally, type loading really shouldn't be counted as part of physical connection open; physical connections get opened all the time, whereas type loading is a one-time startup query that we happen to need (unless it's disabled by the user). It seems to just weirdly skew things to look at it as part of connection open.
But I'm OK with setting this aside for now and moving on - we can always change this in the future if we want.
Closes #4136