Linux maintainer tooling and workflows
From: Maxime Ripard <mripard@redhat.com>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: users@linux.kernel.org, tools@linux.kernel.org
Subject: Fetching an mbox from lore
Date: Sat, 22 Jul 2023 19:12:33 +0200
Message-ID: <7pcrdb6rgkmnfk5nukr4q7brpdmzrjon5zjmc66r2xvqan7kyc@oe3ip5kmtuiy>

Hi,

I've been trying to fetch an mbox from lore with an arbitrary search request.

I could fetch it fine using curl with the following example:

curl -XPOST -H "Content-Length:0" -OJ "http://lore.kernel.org/linux-clk/?q=d:1.week.ago..&x=m"

This returns a gzip'd mbox, everything's fine.

However, for some reason I can't reproduce this with Python's requests
library: it looks like I get redirected back and forth between HTTPS
and HTTP when I try to connect with the following script:

#!/usr/bin/env python3

from urllib.parse import urlparse

from requests import Request, Session

LORE_URL = "https://lore.kernel.org/linux-clk"

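def try_url_redirect(url):
    """POST the search to url once; return (status, redirect target without its query)."""
    headers = {"Content-Length": "0"}
    params = {"q": "d:1.week.ago..", "x": "m"}

    s = Session()

    req = Request('POST', url, headers=headers, params=params)
    p = req.prepare()

    print("Trying to connect to %s" % p.url)

    # Don't follow redirects, so each hop can be inspected by hand.
    resp = s.send(p, allow_redirects=False)

    if resp.status_code == 301:
        print("Redirecting to %s" % resp.headers['location'])

        # Drop the query string; params are added again on the next call.
        url = urlparse(resp.headers['location'])
        url = url._replace(query="")

        return (resp.status_code, url.geturl())

    # Not a 301: there is nothing to follow.
    return (resp.status_code, None)

if __name__ == '__main__':
    code, url = try_url_redirect(LORE_URL)
    if url is not None:
        try_url_redirect(url)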

The output is:

Trying to connect to https://lore.kernel.org/linux-clk?q=d%3A1.week.ago..&x=m
Redirecting to http://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m
Trying to connect to http://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m
Redirecting to https://lore.kernel.org/linux-clk/?q=d%3A1.week.ago..&x=m

If I do allow redirects, requests follows the 301 by issuing a GET on
the new location, and I end up with the HTML page for that search
instead of the mbox.

Am I trying to do something not supported here, or is it supposed to
work and my script is wrong for some reason?

Thanks!
Maxime
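
One thing the output above suggests, though it does not prove it, is that
the first 301 comes from the missing trailing slash after "linux-clk" and
the second from the HTTP-to-HTTPS upgrade. The following is a minimal
sketch under that assumption: it POSTs straight to the HTTPS URL with the
trailing slash, so no redirect (and no POST-to-GET downgrade) should be
involved. It uses the standard library's urllib rather than requests, only
to stay dependency-free; build_mbox_request() and fetch_mbox() are
hypothetical helper names, and the gzip handling relies on the curl
observation that the reply is a gzip'd mbox.

```python
#!/usr/bin/env python3

import gzip
import urllib.request
from urllib.parse import urlencode

LORE_BASE = "https://lore.kernel.org"  # HTTPS from the start


def build_mbox_request(list_name, query):
    """Build (but do not send) a POST request for a search-result mbox."""
    # Assumption: the slash after the list name is what avoids the 301.
    params = urlencode({"q": query, "x": "m"})
    url = "%s/%s/?%s" % (LORE_BASE, list_name.strip("/"), params)
    # An empty body should make urllib send "Content-Length: 0",
    # which is what the curl command sets explicitly.
    return urllib.request.Request(url, data=b"", method="POST")


def fetch_mbox(list_name, query):
    """Send the request and return the decompressed mbox as bytes."""
    req = build_mbox_request(list_name, query)
    with urllib.request.urlopen(req) as resp:
        # Assumption: the body is gzip-compressed, as seen with curl.
        return gzip.decompress(resp.read())


if __name__ == '__main__':
    mbox = fetch_mbox("linux-clk", "d:1.week.ago..")
    print("Fetched %d bytes of mbox data" % len(mbox))
```

If the server still answers with a redirect, the assumption about the
trailing slash is wrong and the hop-by-hop script above remains the better
way to see what is going on.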
