All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Calvin Wan <calvinwan@google.com>
Cc: git@vger.kernel.org, jonathantanmy@google.com,
	philipoakley@iee.email, johncai86@gmail.com, me@ttaylorr.com
Subject: Re: [PATCH v4 5/8] transport: add client side capability to request object-info
Date: Mon, 02 May 2022 17:54:38 -0700	[thread overview]
Message-ID: <xmqqilqnsaep.fsf@gitster.g> (raw)
In-Reply-To: <20220502170904.2770649-6-calvinwan@google.com> (Calvin Wan's message of "Mon, 2 May 2022 17:09:01 +0000")

Calvin Wan <calvinwan@google.com> writes:

> The question of version mismatch often comes up with client/server
> relationships. There are two cases to consider here (assuming either
> client or server functionality for object-info changes between the
> different versions):
>
>  1. client version > server version
>  	- client can request additional attributes from server
>  2. server version > client version
>  	- server can return additional info to the client
>
> The second case is a non-issue since the client would never be able to
> request additional info from the server.

That assumes that we can never remove attributes once we start
supporting them, so it is not a non-issue exactly.  The way we make
it a non-issue is to make sure that the server returns no more than
what the client has asked, i.e. the same way we deal with the first
case.

> In order to solve the first
> case, recall an earlier patch where the server sends back the attributes
> even if no object ids are sent by the client. This allows the client to
> first check whether the server can accept the requested attributes
> before sending the entire request.

It requires two round-trips when the server does not know something
the client has asked, one without any object names, and then a real
request with the list of object names?

I think in practice, because the way the previous step is
structured, you can send a full request and in the non-error
codepath the server side will understand everything and give full
response back without an extra round-trip.  Only in the error case,
the client will see the set of acceptable attributes and no object
data (and implicitly treat that as an indication of an error), but
then what would the client do after that happens?

 - The client will just give up; the attributes it wanted but the
   server couldn't supply are so fundamentally necessary to perform
   whatever the client wanted to do.

 - The client will retry the request by asking a known subset of
   attributes for the same set of objects.  This requires an extra
   round-trip.  If an earlier step didn't make .unknown case bypass
   the response loop, this extra round-trip would have been totally
   unnecessary.

> index 8c7752fc82..c27018d48c 100644
> --- a/fetch-pack.h
> +++ b/fetch-pack.h
> @@ -69,6 +69,12 @@ struct fetch_pack_args {
>  	unsigned connectivity_checked:1;
>  };
>  
> +struct object_info_args {
> +	const struct string_list *object_info_options;
> +	const struct string_list *server_options;
> +	const struct oid_array *oids;
> +};

These are const pointers because these lists are not owned by this
structure and are borrowed from somebody else?

> diff --git a/transport-helper.c b/transport-helper.c
> index b4dbbabb0c..093946f7fd 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -686,13 +686,16 @@ static int fetch_refs(struct transport *transport,
>  
>  	/*
>  	 * If we reach here, then the server, the client, and/or the transport
> -	 * helper does not support protocol v2. --negotiate-only requires
> -	 * protocol v2.
> +	 * helper does not support protocol v2. --negotiate-only and --object-info
> +	 * require protocol v2.
>  	 */

OK.

>  	if (data->transport_options.acked_commits) {
>  		warning(_("--negotiate-only requires protocol v2"));
>  		return -1;
>  	}
> +	if (transport->smart_options->object_info) {
> +		die(_("--object-info requires protocol v2"));
> +	}

OK.

> diff --git a/transport.c b/transport.c
> index 3d64a43ab3..08c505e1d0 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -353,6 +353,79 @@ static struct ref *handshake(struct transport *transport, int for_push,
>  	return refs;
>  }
>  
> +static int fetch_object_info(struct transport *transport, struct object_info **object_info_data)
> +{
> +	size_t i;
> +	int size_index = -1;
> +	struct git_transport_data *data = transport->data;
> +	struct object_info_args args;
> +	struct packet_reader reader;
> +	struct string_list server_attributes = STRING_LIST_INIT_DUP;
> +
> +	memset(&args, 0, sizeof(args));
> +	args.server_options = transport->server_options;
> +	args.object_info_options = transport->smart_options->object_info_options;

It is unclear who fills smart_options->object_info_options and from
what.  Are we referring to something that is not yet used, or worse
yet, code that is not yet written?

It seems that the result of applying 1-5/8 does not compile.

> +	connect_setup(transport, 0);
> +	packet_reader_init(&reader, data->fd[0], NULL, 0,
> +			PACKET_READ_CHOMP_NEWLINE |
> +			PACKET_READ_DIE_ON_ERR_PACKET);
> +	data->version = discover_version(&reader);
> +
> +	transport->hash_algo = reader.hash_algo;
> +
> +	switch (data->version) {
> +	case protocol_v2:
> +		if (!server_supports_v2("object-info", 0))
> +			return -1;
> +		/**
> +		 * Send a request with only attributes first. If server can return all
> +		 * of the requested attributes, then send request with object ids
> +		 */

Doesn't it force the most pessimistic behaviour, i.e. always do a
likely-to-be-useless (when the feature becomes widely avaiable)
probe round-trip before making a real request?

> +		send_object_info_request(data->fd[1], &args);
> +		if (packet_reader_read(&reader) != PACKET_READ_NORMAL) {
> +			check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> +			return -1;
> +		}

All these check_stateless_delimiter() calls in this patch are on
overly long lines.

> +		string_list_split(&server_attributes, reader.line, ' ', -1);
> +		packet_reader_read(&reader);
> +		check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> +		if (server_attributes.nr != args.object_info_options->nr)
> +			return -1;

So, I am guessing (without having the code that prepares the right
hand side) that we are comparing the number of attributes we asked
(on the right hand side) and the number of attributes the other side
said it supports in response to our request (on the left hand
side).  

This does not protect against a server that responds with attribute
names duplicated or any somewhat odd third-party servers that are
not identical to what you wrote here.  I would rather see us to loop
through object_info_options array and see if server_attributes list
says all of them are supported more explicitly.  That can be done in
the loop below, where you'd figure out the "foo_index" for each
attribute "foo" you are supporting in the code.

> +		for (i = 0; i < server_attributes.nr; i++) {
> +			if (!strcmp(server_attributes.items[i].string, "size"))
> +				size_index = i + 1;
> +		}

size_index was initialized to -1 and I was expecting we can tell the
attribute is not used by checking (size_index < 0), but this seems
to make size_index 1 based, not 0 based.  Intended?

I think this is to skip the object name that comes as the first item
in the response, but it would be more "pure" to keep foo_index
0-based and add the offset by whatever constant number of things
that come in front (currently, 1 for object name) near a lot closer
to where we parse and read the data, i.e. in the while() loop below.

> +		args.oids = transport->smart_options->object_info_oids;
> +		send_object_info_request(data->fd[1], &args);

And this is the second round-trip.

> +		break;
> +	case protocol_v1:
> +	case protocol_v0:
> +		die(_("wrong protocol version. expected v2"));
> +	case protocol_unknown_version:
> +		BUG("unknown protocol version");
> +	}
> +
> +	if (packet_reader_read(&reader) != PACKET_READ_NORMAL)
> +		die(_("error reading object info header"));
> +	i = 0;
> +	while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
> +		struct string_list object_info_values = STRING_LIST_INIT_DUP;
> +
> +		string_list_split(&object_info_values, reader.line, ' ', -1);
> +		if (size_index > 0) {
> +			if (!strcmp(object_info_values.items[size_index].string, ""))
> +				die("object-info: not our ref %s",
> +					object_info_values.items[0].string);
> +			*(*object_info_data)[i].sizep = strtoul(object_info_values.items[size_index].string, NULL, 10);
> +		}
> +		i++;
> +	}
> +	check_stateless_delimiter(transport->stateless_rpc, &reader, "stateless delimiter expected");
> +
> +	return 0;
> +}

All of the above look written for the best case scenario (i.e. it
assumes there won't be any unexpected type of response), which makes
a good fuzz target.  A malicious server side can throw non-number in
the "size" slot and strtoul() does not check for a malformatted
number, or it can return more response records than it was
requested, and overflow object_info_data[] array the caller of this
function prepared.

smart_options->object_info_oids may know how many response records
we should expect, so at least this while() loop should be able to
protect against an object_info_data[] overflow attack using it as
the upper limit of i.

>  static struct ref *get_refs_via_connect(struct transport *transport, int for_push,
>  					struct transport_ls_refs_options *options)
>  {
> @@ -392,8 +465,14 @@ static int fetch_refs_via_pack(struct transport *transport,
>  	args.server_options = transport->server_options;
>  	args.negotiation_tips = data->options.negotiation_tips;
>  	args.reject_shallow_remote = transport->smart_options->reject_shallow;
> +	args.object_info = transport->smart_options->object_info;
>  
> -	if (!data->got_remote_heads) {
> +	if (transport->smart_options && transport->smart_options->object_info) {
> +		if (!fetch_object_info(transport, data->options.object_info_data))
> +			goto cleanup;
> +		ret = -1;
> +		goto cleanup;
> +	} else if (!data->got_remote_heads) {
>  		int i;
>  		int must_list_refs = 0;
>  		for (i = 0; i < nr_heads; i++) {
> diff --git a/transport.h b/transport.h
> index 12bc08fc33..dbf60bb710 100644
> --- a/transport.h
> +++ b/transport.h
> @@ -6,6 +6,7 @@
>  #include "remote.h"
>  #include "list-objects-filter-options.h"
>  #include "string-list.h"
> +#include "object-store.h"
>  
>  struct git_transport_options {
>  	unsigned thin : 1;
> @@ -31,6 +32,12 @@ struct git_transport_options {
>  	 */
>  	unsigned connectivity_checked:1;
>  
> +	/*
> +	 * Transport will attempt to pull only object-info. Fallbacks
> +	 * to pulling entire object if object-info is not supported
> +	 */
> +	unsigned object_info : 1;
> +
>  	int depth;
>  	const char *deepen_since;
>  	const struct string_list *deepen_not;
> @@ -54,6 +61,10 @@ struct git_transport_options {
>  	 * common commits to this oidset instead of fetching any packfiles.
>  	 */
>  	struct oidset *acked_commits;
> +
> +	struct oid_array *object_info_oids;
> +	struct object_info **object_info_data;
> +	struct string_list *object_info_options;
>  };
>  
>  enum transport_family {

  reply	other threads:[~2022-05-03  1:16 UTC|newest]

Thread overview: 103+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 23:19 [PATCH] fetch —object-info-format: client option for object-info Calvin Wan
2022-02-08 23:56 ` [PATCH v2] " Calvin Wan
2022-02-09 12:48   ` Philip Oakley
2022-02-10 22:32     ` Calvin Wan
2022-02-09 20:41   ` [PATCH v2] fetch object-info-format: " Jonathan Tan
2022-02-10 22:58     ` Calvin Wan
2022-03-28 19:11   ` [PATCH v3 0/3] object-info: add option for retrieving object info Calvin Wan
2022-03-28 19:11     ` [PATCH v3 1/3] fetch-pack: refactor packet writing and fetch options Calvin Wan
2022-03-29 22:54       ` Junio C Hamano
2022-03-29 23:01       ` Taylor Blau
2022-03-30 21:55       ` Jonathan Tan
2022-03-28 19:11     ` [PATCH v3 2/3] transfer.advertiseObjectInfo: add object-info config Calvin Wan
2022-03-29 22:34       ` Junio C Hamano
2022-03-29 22:48         ` Calvin Wan
2022-03-29 23:09           ` Taylor Blau
2022-03-28 19:11     ` [PATCH v3 3/3] object-info: add option for retrieving object info Calvin Wan
2022-03-29 19:57       ` Junio C Hamano
2022-03-29 22:54       ` Junio C Hamano
2022-03-30  0:49         ` Junio C Hamano
2022-03-30 22:31           ` Calvin Wan
2022-03-30 22:43         ` Jonathan Tan
2022-03-30 23:42           ` Junio C Hamano
2022-03-29 23:19       ` Taylor Blau
2022-03-30 22:47         ` Calvin Wan
2022-03-30 22:06       ` John Cai
2022-03-31 19:56         ` Calvin Wan
2022-04-01 16:16           ` Junio C Hamano
2022-03-30 22:07       ` Jonathan Tan
2022-03-30 22:12       ` Josh Steadmon
2022-03-30 22:46         ` Calvin Wan
2022-03-29 20:35     ` [PATCH v3 0/3] " Junio C Hamano
2022-03-29 22:40       ` Calvin Wan
2022-03-31  1:50     ` Junio C Hamano
2022-05-02 17:08     ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command Calvin Wan
2022-05-02 17:08       ` [PATCH v4 1/8] fetch-pack: refactor packet writing Calvin Wan
2022-05-02 17:08       ` [PATCH v4 2/8] fetch-pack: move fetch default settings Calvin Wan
2022-05-02 22:58         ` Junio C Hamano
2022-05-03 23:06           ` Jonathan Tan
2022-05-05 18:13             ` Calvin Wan
2022-05-02 17:08       ` [PATCH v4 3/8] object-store: add function to free object_info contents Calvin Wan
2022-05-02 23:23         ` Junio C Hamano
2022-05-04 19:09           ` Junio C Hamano
2022-05-05  0:15           ` Junio C Hamano
2022-05-05 16:47             ` Calvin Wan
2022-05-05 17:01               ` Junio C Hamano
2022-05-02 17:09       ` [PATCH v4 4/8] object-info: send attribute packet regardless of object ids Calvin Wan
2022-05-03  0:05         ` Junio C Hamano
2022-05-03 19:21           ` Calvin Wan
2022-05-03 23:11         ` Jonathan Tan
2022-05-02 17:09       ` [PATCH v4 5/8] transport: add client side capability to request object-info Calvin Wan
2022-05-03  0:54         ` Junio C Hamano [this message]
2022-05-03 18:58           ` Calvin Wan
2022-05-04 15:42             ` Junio C Hamano
2022-05-03 23:15         ` Jonathan Tan
2022-05-04 15:50           ` Junio C Hamano
2022-05-02 17:09       ` [PATCH v4 6/8] transport: add object-info fallback to fetch Calvin Wan
2022-05-03 23:27         ` Jonathan Tan
2022-05-02 17:09       ` [PATCH v4 7/8] cat-file: move parse_cmd and DEFAULT_FORMAT up Calvin Wan
2022-05-02 17:09       ` [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command Calvin Wan
2022-05-04 21:27         ` Jonathan Tan
2022-05-05 18:13           ` Calvin Wan
2022-05-05 18:44             ` Junio C Hamano
2022-05-05 19:09               ` Junio C Hamano
2022-05-05 19:19                 ` Calvin Wan
2022-07-31 15:24         ` ZheNing Hu
2022-08-08 17:43           ` Calvin Wan
2022-07-28 23:02       ` [PATCH v5 0/6] " Calvin Wan
2022-07-28 23:56         ` Junio C Hamano
2022-07-29  0:02           ` Junio C Hamano
2022-07-31  8:41         ` Phillip Wood
2022-08-04 22:57           ` Calvin Wan
2022-09-30 23:23         ` Junio C Hamano
2022-07-28 23:02       ` [PATCH v5 1/6] fetch-pack: refactor packet writing Calvin Wan
2022-07-28 23:02       ` [PATCH v5 2/6] fetch-pack: move fetch initialization Calvin Wan
2022-07-28 23:02       ` [PATCH v5 3/6] protocol-caps: initialization bug fix Calvin Wan
2022-07-29 17:51         ` Junio C Hamano
2022-07-28 23:02       ` [PATCH v5 4/6] serve: advertise object-info feature Calvin Wan
2022-07-29 17:57         ` Junio C Hamano
2022-08-01 18:28           ` Calvin Wan
2022-08-01 18:44             ` Ævar Arnfjörð Bjarmason
2022-08-01 18:47             ` Junio C Hamano
2022-08-01 18:58               ` Calvin Wan
2022-07-28 23:02       ` [PATCH v5 5/6] transport: add client support for object-info Calvin Wan
2022-07-29 18:06         ` Junio C Hamano
2022-08-04 20:28           ` Calvin Wan
2022-08-01 13:30         ` Phillip Wood
2022-08-04 22:20           ` Calvin Wan
2022-08-08 10:07             ` Phillip Wood
2022-08-01 14:26         ` Phillip Wood
2022-08-08  9:16         ` Phillip Wood
2022-07-28 23:02       ` [PATCH v5 6/6] cat-file: add remote-object-info to batch-command Calvin Wan
2022-07-29  6:25         ` Ævar Arnfjörð Bjarmason
2022-08-07  5:50           ` ZheNing Hu
2022-08-08 18:07           ` Calvin Wan
2022-08-11 10:58             ` Ævar Arnfjörð Bjarmason
2022-07-29 18:21         ` Junio C Hamano
2022-08-08 18:37           ` Calvin Wan
2022-09-28 13:12         ` Ævar Arnfjörð Bjarmason
2022-07-31 15:02       ` [PATCH v4 0/8] cat-file: add --batch-command remote-object-info command ZheNing Hu
2022-08-08 17:32         ` Calvin Wan
2022-08-13 22:17       ` Junio C Hamano
2022-02-09 19:09 ` [PATCH] fetch —object-info-format: client option for object-info John Cai
2022-02-10 22:49   ` Calvin Wan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqilqnsaep.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=calvinwan@google.com \
    --cc=git@vger.kernel.org \
    --cc=johncai86@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=philipoakley@iee.email \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.