From: ZheNing Hu <adlternative@gmail.com>
To: Calvin Wan <calvinwan@google.com>
Cc: Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Philip Oakley <philipoakley@iee.email>,
	johncai86@gmail.com, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v4 8/8] cat-file: add --batch-command remote-object-info command
Date: Sun, 31 Jul 2022 23:24:07 +0800
Message-ID: <CAOLTT8RyDCokvJBaC1CCoT2+3AE1kOn7evjkTMK_V6KWoaH30g@mail.gmail.com>
In-Reply-To: <20220502170904.2770649-9-calvinwan@google.com>

On Tue, May 3, 2022 at 06:33, Calvin Wan <calvinwan@google.com> wrote:
>
> Since the `info` command in cat-file --batch-command prints object info
> for a given object, it is natural to add another command in cat-file
> --batch-command to print object info for a given object from a remote.
> Add `remote-object-info` to cat-file --batch-command.
>
> While `info` takes object ids one at a time, this creates overhead when
> making requests to a server so `remote-object-info` instead can take
> multiple object ids at once.
>
> cat-file --batch-command is generally implemented in the following
> manner:
>
>  - Receive and parse input from user
>  - Call respective function attached to command
>  - Set batch mode state, get object info, print object info
>
> In --buffer mode, this changes to:
>
>  - Receive and parse input from user
>  - Store respective function attached to command in a queue
>  - After flush, loop through commands in queue
>     - Call respective function attached to command
>     - Set batch mode state, get object info, print object info
>
> Notice how the getting and printing of object info is accomplished one
> at a time. As described above, this creates a problem for making
> requests to a server. Therefore, `remote-object-info` is implemented in
> the following manner:
>
>  - Receive and parse input from user
>  If command is `remote-object-info`:
>     - Get object info from remote
>     - Loop through object info
>         - Call respective function attached to `info`
>         - Set batch mode state, use passed in object info, print object
>           info
>  Else:
>     - Call respective function attached to command
>     - Parse input, get object info, print object info
>
> And finally for --buffer mode `remote-object-info`:
>  - Receive and parse input from user
>  - Store respective function attached to command in a queue
>  - After flush, loop through commands in queue:
>     If command is `remote-object-info`:
>         - Get object info from remote
>         - Loop through object info
>             - Call respective function attached to `info`
>             - Set batch mode state, use passed in object info, print
>               object info
>     Else:
>         - Call respective function attached to command
>         - Set batch mode state, get object info, print object info
>
> To summarize, `remote-object-info` gets object info from the remote and
> then generates multiple `info` commands with the object info passed in.
> It cannot be implemented similarly to `info` and `content` because of
> the overhead with remote communication.
>
> ---
>  Documentation/git-cat-file.txt |  16 +-
>  builtin/cat-file.c             | 200 +++++++++++++++----
>  t/t1006-cat-file.sh            | 347 +++++++++++++++++++++++++++++++++
>  3 files changed, 522 insertions(+), 41 deletions(-)
>
> diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
> index 24a811f0ef..0d9e8e6214 100644
> --- a/Documentation/git-cat-file.txt
> +++ b/Documentation/git-cat-file.txt
> @@ -115,6 +115,10 @@ info <object>::
>         Print object info for object reference `<object>`. This corresponds to the
>         output of `--batch-check`.
>
> +remote-object-info <remote> [<object>]::
> +       Print object info for object references `<object>` at specified <remote>.
> +       This command may only be combined with `--buffer`.
> +
>  flush::
>         Used with `--buffer` to execute all preceding commands that were issued
>         since the beginning or since the last flush was issued. When `--buffer`
> @@ -245,7 +249,8 @@ newline. The available atoms are:
>         The full hex representation of the object name.
>
>  `objecttype`::
> -       The type of the object (the same as `cat-file -t` reports).
> +       The type of the object (the same as `cat-file -t` reports). See
> +       `CAVEATS` below.
>
>  `objectsize`::
>         The size, in bytes, of the object (the same as `cat-file -s`
> @@ -253,13 +258,14 @@ newline. The available atoms are:
>
>  `objectsize:disk`::
>         The size, in bytes, that the object takes up on disk. See the
> -       note about on-disk sizes in the `CAVEATS` section below.
> +       note about on-disk sizes in the `CAVEATS` section below. Not
> +       supported by `remote-object-info`.
>
>  `deltabase`::
>         If the object is stored as a delta on-disk, this expands to the
>         full hex representation of the delta base object name.
>         Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
> -       below.
> +       below. Not supported by `remote-object-info`.
>
>  `rest`::
>         If this atom is used in the output string, input lines are split

Why not forbid %(rest) here too?

> +static int get_remote_info(struct batch_options *opt, int argc, const char **argv)
> +{
> +       int retval = 0;
> +       size_t i;
> +       struct remote *remote = NULL;
> +       struct object_id oid;
> +       struct string_list object_info_options = STRING_LIST_INIT_NODUP;
> +       static struct transport *gtransport;
> +       char *format = DEFAULT_FORMAT;
> +
> +       if (opt->format)
> +               format = xstrdup(opt->format);
> +
> +       remote = remote_get(argv[0]);
> +       if (!remote)
> +               die(_("must supply valid remote when using --object-info"));
> +       oid_array_clear(&object_info_oids);
> +       for (i = 1; i < argc; i++) {
> +               if (get_oid(argv[i], &oid))
> +                       die(_("malformed object id '%s'"), argv[i]);
> +               oid_array_append(&object_info_oids, &oid);
> +       }
> +
> +       gtransport = transport_get(remote, NULL);
> +       if (gtransport->smart_options) {
> +               size_t j;
> +               int include_size = 0, include_type = 0;
> +
> +               remote_object_info = xcalloc(object_info_oids.nr, sizeof(struct object_info));
> +               gtransport->smart_options->object_info = 1;
> +               gtransport->smart_options->object_info_oids = &object_info_oids;
> +               /**
> +                * 'size' is the only option currently supported.
> +                * Other options that are passed in the format will default to a
> +                * standard fetch request rather than object-info.
> +                */
> +               if (strstr(format, "%(objectsize)")) {
> +                       string_list_append(&object_info_options, "size");
> +                       include_size = 1;
> +               }
> +               if (strstr(format, "%(objecttype)")) {
> +                       string_list_append(&object_info_options, "type");
> +                       include_type = 1;
> +               }
> +               if (strstr(format, "%(objectsize:disk)"))
> +                       die(_("objectsize:disk is currently not supported with remote-object-info"));
> +               if (strstr(format, "%(deltabase)"))
> +                       die(_("deltabase is currently not supported with remote-object-info"));

%(rest) too?
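
A minimal sketch of how the rejected atoms could be checked from one table, so that adding %(rest) is a one-line change. The helper name `find_unsupported_atom` and the table are hypothetical and not part of the patch; listing %(rest) assumes it should be rejected like the other two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical table, not part of the patch: atoms that the remote
 * object-info path cannot fill in. "%(rest)" is listed on the
 * assumption that it should be rejected like the other two.
 */
static const char *const unsupported_atoms[] = {
	"%(objectsize:disk)",
	"%(deltabase)",
	"%(rest)",
};

/*
 * Return the first table entry that occurs in format, or NULL if the
 * format uses none of them. "%(objectsize)" does not match
 * "%(objectsize:disk)" because the closing parenthesis differs.
 */
static const char *find_unsupported_atom(const char *format)
{
	size_t i;
	size_t nr = sizeof(unsupported_atoms) / sizeof(unsupported_atoms[0]);

	assert(format != NULL);
	for (i = 0; i < nr; i++)
		if (strstr(format, unsupported_atoms[i]))
			return unsupported_atoms[i];
	return NULL;
}
```

A caller could then report the returned atom in a single error message instead of repeating one strstr() check per atom.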

> +               if (object_info_options.nr > 0) {
> +                       gtransport->smart_options->object_info_options = &object_info_options;
> +                       for (j = 0; j < object_info_oids.nr; j++) {
> +                               if (include_size)
> +                                       remote_object_info[j].sizep = xcalloc(1, sizeof(long));
> +                               if (include_type)
> +                                       remote_object_info[j].typep = xcalloc(1, sizeof(enum object_type));
> +                       }
> +                       gtransport->smart_options->object_info_data = &remote_object_info;
> +                       retval = transport_fetch_refs(gtransport, NULL);
> +               }
> +       } else {
> +               retval = -1;
> +       }
> +       return retval;
> +}

Maybe an early return would read better here?

        if (!gtransport->smart_options) {
                return -1;
        }
        ...
        return transport_fetch_refs(gtransport, NULL);

> +
>  struct object_cb_data {
>         struct batch_options *opt;
>         struct expand_data *expand;
> @@ -538,6 +611,7 @@ typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
>  struct queued_cmd {
>         parse_cmd_fn_t fn;
>         char *line;
> +       const char *name;
>  };
>
>  static void parse_cmd_contents(struct batch_options *opt,
> @@ -565,9 +639,49 @@ static const struct parse_cmd {
>  } commands[] = {
>         { "contents", parse_cmd_contents, 1},
>         { "info", parse_cmd_info, 1},
> +       { "remote-object-info", parse_cmd_info, 1},
>         { "flush", NULL, 0},
>  };
>
> +static void parse_remote_info(struct batch_options *opt,
> +                          char *line,
> +                          struct strbuf *output,
> +                          struct expand_data *data,
> +                          const struct parse_cmd *p_cmd,
> +                          struct queued_cmd *q_cmd)
> +{
> +       int count;
> +       size_t i;
> +       const char **argv;
> +
> +       count = split_cmdline(line, &argv);
> +       if (get_remote_info(opt, count, argv))
> +               goto cleanup;
> +       opt->use_remote_info = 1;
> +       data->skip_object_info = 1;
> +       data->mark_query = 0;
> +       for (i = 0; i < object_info_oids.nr; i++) {
> +               if (remote_object_info[i].sizep)
> +                       data->size = *remote_object_info[i].sizep;
> +               if (remote_object_info[i].typep)
> +                       data->type = *remote_object_info[i].typep;
> +
> +               data->oid = object_info_oids.oid[i];
> +               if (p_cmd)
> +                       p_cmd->fn(opt, argv[i+1], output, data);
> +               else
> +                       q_cmd->fn(opt, argv[i+1], output, data);
> +       }
> +       opt->use_remote_info = 0;
> +       data->skip_object_info = 0;
> +       data->mark_query = 1;
> +
> +cleanup:
> +       for (i = 0; i < object_info_oids.nr; i++)
> +               free_object_info_contents(&remote_object_info[i]);
> +       free(remote_object_info);
> +}
> +
>  static void dispatch_calls(struct batch_options *opt,
>                 struct strbuf *output,
>                 struct expand_data *data,
> @@ -579,8 +693,12 @@ static void dispatch_calls(struct batch_options *opt,
>         if (!opt->buffer_output)
>                 die(_("flush is only for --buffer mode"));
>
> -       for (i = 0; i < nr; i++)
> -               cmd[i].fn(opt, cmd[i].line, output, data);
> +       for (i = 0; i < nr; i++) {
> +               if (!strcmp(cmd[i].name, "remote-object-info"))
> +                       parse_remote_info(opt, cmd[i].line, output, data, NULL, &cmd[i]);
> +               else
> +                       cmd[i].fn(opt, cmd[i].line, output, data);
> +       }
>
>         fflush(stdout);
>  }
> @@ -640,11 +758,17 @@ static void batch_objects_command(struct batch_options *opt,
>                         dispatch_calls(opt, output, data, queued_cmd, nr);
>                         free_cmds(queued_cmd, &nr);
>                 } else if (!opt->buffer_output) {
> -                       cmd->fn(opt, p, output, data);
> +                       if (!strcmp(cmd->name, "remote-object-info")) {
> +                               char *line = xstrdup_or_null(p);
> +                               parse_remote_info(opt, line, output, data, cmd, NULL);

Memory leak: "line" is never freed after parse_remote_info() returns.
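
A minimal sketch of the ownership pattern that would avoid the leak: duplicate the input, hand the copy to the handler, and free it once the handler returns. All names here (`dup_or_null`, `call_with_private_copy`, `record_length`, the two-argument handler type) are stand-ins for illustration, not the functions in builtin/cat-file.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the handler type; the real one takes more arguments. */
typedef void (*line_handler_fn)(char *line, void *cb_data);

/* Stand-in for git's xstrdup_or_null(): duplicate s, or return NULL. */
static char *dup_or_null(const char *s)
{
	size_t len;
	char *copy;

	if (!s)
		return NULL;
	len = strlen(s) + 1;
	copy = malloc(len);
	assert(copy != NULL);
	memcpy(copy, s, len);
	return copy;
}

/* Example handler: records the length of the line it was given. */
static void record_length(char *line, void *cb_data)
{
	*(size_t *)cb_data = line ? strlen(line) : 0;
}

/*
 * Give the handler a private, writable copy of p and release it once
 * the handler has returned, so the copy does not leak on this path.
 */
static void call_with_private_copy(const char *p, line_handler_fn fn,
				   void *cb_data)
{
	char *line = dup_or_null(p);

	fn(line, cb_data);
	free(line);	/* free(NULL) is a no-op */
}
```

In the patch this would amount to a free(line) after the parse_remote_info() call in the non-buffered branch, assuming nothing keeps a pointer into "line" after that call.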

> +                       } else {
> +                               cmd->fn(opt, p, output, data);
> +                       }
>                 } else {
>                         ALLOC_GROW(queued_cmd, nr + 1, alloc);
>                         call.fn = cmd->fn;
>                         call.line = xstrdup_or_null(p);
> +                       call.name = cmd->name;
>                         queued_cmd[nr++] = call;
>                 }
>         }
> +
> +test_expect_success 'remote-object-info fails on unspported filter option (objectsize:disk)' '
> +       (
> +               cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +
> +               test_must_fail git cat-file --batch-command="%(objectsize:disk)" 2>err <<-EOF &&
> +               remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
> +               EOF
> +               test_i18ngrep "objectsize:disk is currently not supported with remote-object-info" err
> +       )
> +'
> +
> +test_expect_success 'remote-object-info fails on unspported filter option (deltabase)' '
> +       (
> +               cd "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +
> +               test_must_fail git cat-file --batch-command="%(deltabase)" 2>err <<-EOF &&
> +               remote-object-info "$HTTPD_URL/smart/http_parent" $hello_sha1
> +               EOF
> +               test_i18ngrep "deltabase is currently not supported with remote-object-info" err
> +       )
> +'
> +

%(rest) too?

> +set_transport_variables "server"
> +
> +test_expect_success 'batch-command remote-object-info file://' '
> +       (
> +               cd server &&
> +
> +               echo "$hello_sha1 $hello_size" >expect &&
> +               echo "$tree_sha1 $tree_size" >>expect &&
> +               echo "$commit_sha1 $commit_size" >>expect &&
> +               echo "$tag_sha1 $tag_size" >>expect &&
> +               git cat-file --batch-command="%(objectname) %(objectsize)" >actual <<-EOF &&
> +               remote-object-info "file://$(pwd)" $hello_sha1
> +               remote-object-info "file://$(pwd)" $tree_sha1
> +               remote-object-info "file://$(pwd)" $commit_sha1
> +               remote-object-info "file://$(pwd)" $tag_sha1
> +               EOF
> +               test_cmp expect actual
> +       )

Can we support <rev> instead of only <oid> here?

$ git cat-file --batch-check
HEAD
28583b8d8ca72730d7c9e0ea50861ad431a6dea4 commit 3038
master
ab336e8f1c8009c8b1aab8deb592148e69217085 commit 281
origin/master
23b219f8e3f2adfb0441e135f0a880e6124f766c commit 282
origin/master:git.c
e5d62fa5a92e95e1ede041ebf913d841744c31f8 blob 28398

So cat-file --batch-check can support it.

$ git cat-file --batch-command
remote-object-info "file://$(pwd)" master:git.c

I guess this does not work yet, right?
