From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
To: kernelci@lists.linux.dev, Dmitry Vyukov <dvyukov@google.com>,
Cristian Marussi <cristian.marussi@arm.com>,
Alice Ferrazzi <alicef@gentoo.org>,
Philip Li <philip.li@intel.com>,
Vishal Bhoj <vishal.bhoj@linaro.org>,
automated-testing@lists.yoctoproject.org,
Tim Bird <Tim.Bird@sony.com>, CKI <cki-project@redhat.com>,
Mark Brown <broonie@kernel.org>,
Johnson George <Johnson.George@microsoft.com>,
Sachin Sant <sachinp@linux.ibm.com>,
Aditya Nagesh <adityanagesh@microsoft.com>
Subject: KCIDB: Add timestamp metadata
Date: Mon, 23 Oct 2023 14:14:53 +0300
Message-ID: <df217387-5e45-4b37-957f-5474a34af1db@redhat.com>
Hello everyone involved with, or interested in, KCIDB,
As we work on moving away from BigQuery as our main database and on lowering
our growing costs, we have to implement data retention policies in order to
maintain the performance of PostgreSQL, its replacement.
For that purpose I'd like to introduce the concept of "metadata" to the KCIDB
I/O schema, specifically a timestamp field for each object. The change will
bump the latest schema version to v4.3.
"Metadata" is any field with a name starting with an underscore ("_").
Such fields are always discarded on submission and, by default, are neither
loaded into nor fetched from the database. The API and the tools will have an
option to load them, however, e.g. for transferring raw data between
databases.
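To illustrate the "discarded on submission" rule, here is a minimal sketch in
Python. It is not the kcidb-io implementation: the function name
strip_metadata() and the sample object are hypothetical, and it assumes that
only the top-level fields of an object are checked.

```python
def strip_metadata(obj):
    """Return a copy of an object without its metadata fields.

    Hypothetical helper: a "metadata" field is any top-level field
    whose name starts with an underscore.
    """
    return {
        name: value
        for name, value in obj.items()
        if not name.startswith("_")
    }


# A made-up checkout object carrying a metadata field
submitted = {
    "id": "origin:1",
    "origin": "origin",
    "_timestamp": "2023-10-23T11:14:53+00:00",
}
stored = strip_metadata(submitted)
```

The input object is left unmodified, so a tool that wants to keep the
metadata (e.g. for a raw transfer) can simply skip the call.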
The timestamp field will be called "_timestamp", and added to the schema of
each object. It will be impossible to submit it through normal means, and it
will be generated/updated by the database automatically on each submission.
The new field will allow us to implement a de-duplication deadline in
PostgreSQL, after which object updates won't be allowed, and the data could be
transferred into BigQuery for long-term storage and public access for
analysis. Allowing only de-duplicated data in BigQuery would let us partition
the dataset by timestamp, reducing query costs for public access.
The new field would also let us delete from PostgreSQL, after a while, the
data already archived in BigQuery, and thus help us maintain PostgreSQL
performance for notification generation and dashboards.
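The lifecycle described above could be sketched as follows. This is only an
illustration: the two periods are made-up values (no actual deadlines have
been decided here), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical periods, chosen only for the sake of the example
DEDUP_PERIOD = timedelta(days=14)
RETENTION_PERIOD = timedelta(days=90)


def is_update_allowed(timestamp, now):
    """Check if an object is still before its de-duplication deadline."""
    return now - timestamp < DEDUP_PERIOD


def is_archivable(timestamp, now):
    """Check if an object is final and can be transferred to BigQuery."""
    return not is_update_allowed(timestamp, now)


def is_deletable(timestamp, now):
    """Check if an (already archived) object can leave PostgreSQL."""
    return now - timestamp >= RETENTION_PERIOD


now = datetime(2023, 10, 23, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(days=1)
settled = now - timedelta(days=30)
old = now - timedelta(days=120)
```

Here "timestamp" stands for the value of the "_timestamp" field, i.e. the
time of the last submission for the object.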
Note that the "_timestamp" field will be different from the existing
"start_time" fields. The former will be automatically generated by the
database, while the latter are submitter-supplied and optional. However, when
the database schema is upgraded to support the "_timestamp" field, each
existing object will have it set to the value of its "start_time", where that
field exists and is specified. Otherwise, the object will receive the time of
the upgrade as its "_timestamp".
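The upgrade rule for existing objects amounts to something like the sketch
below. The function name backfill_timestamp() is hypothetical, the objects are
made up, and it assumes "start_time" is an ISO 8601 string with an explicit
UTC offset; the real upgrade would be done inside the database.

```python
from datetime import datetime, timezone


def backfill_timestamp(obj, upgrade_time):
    """Pick the initial "_timestamp" for an object existing before upgrade.

    Hypothetical helper: use "start_time" if the object has one,
    otherwise fall back to the time of the schema upgrade.
    """
    start_time = obj.get("start_time")
    if start_time is not None:
        return datetime.fromisoformat(start_time)
    return upgrade_time


upgrade_time = datetime(2023, 10, 23, 12, 0, tzinfo=timezone.utc)
with_start = {"id": "origin:1", "start_time": "2023-10-01T08:00:00+00:00"}
without_start = {"id": "origin:2"}
```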
Please respond with your comments here, or in the corresponding PR for
kcidb-io:
https://github.com/kernelci/kcidb-io/pull/76
Thank you.
Nick