Discussion:
no-https: a plain-HTTP to HTTPS proxy
Ivan Shmakov
2018-09-16 07:07:35 UTC
[Cross-posting to news:comp.misc as the issue of plain-HTTP
unavailability was recently discussed there.]

It took me about a day to write a crude but apparently (more or
less) working HTTP to HTTPS proxy. (That I hope to beat into
shape and release via news:alt.sources around next Wednesday
or so. FTR, the code is currently under 600 LoC long, or 431 LoC
excluding comments and empty lines.) Some design notes are below.


Basics

The basic algorithm is as follows (a rough sketch in code follows the list):

1. receive a request header from the client; we only allow
GET and HEAD requests for now, as we do not support request
/bodies/ as of yet;

2. decide the server and connect there;

3. send the header to the server;

4. receive the response header;

5. if that's an https: redirect:

5.1. connect over TLS, alter the request (Host:, "request target")
accordingly, go to step 3;

6. strip certain headers (such as Strict-Transport-Security: and
Upgrade:, but also Set-Cookie:) off the response and send the
result to the client;

7. copy up to Content-Length: octets from the server to the
client -- or all the remaining data if no Content-Length:
is given; (somewhat surprisingly, this seems to also work with
the "chunked" coding not otherwise considered in the code);

8. close the connection to the server and repeat from step 1
so long as the client connection remains active.
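
In rough code, the loop reads about as follows. (A sketch only,
here rendered in Perl: blocking I/O, no error handling; read_header (),
send_request (), read_response (), send_response () and copy_body ()
are hypothetical helpers, not part of the actual code.)

    use strict;
    use warnings;
    use IO::Socket::INET ();
    use IO::Socket::SSL ();

    sub handle_client {
        my ($client) = @_;
        ## step 1: read the request header; GET and HEAD only
        while (my ($method, $target, $header) = read_header ($client)) {
            return unless ($method eq "GET" || $method eq "HEAD");
            ## step 2: pick the server from Host: and connect there
            my $server = IO::Socket::INET->new
                (PeerAddr => $header->{"host"}, PeerPort => "http(80)");
            ## steps 3 and 4: forward the request, read the response
            send_request ($server, $method, $target, $header);
            my ($code, $rhead) = read_response ($server);
            ## steps 5, 5.1: retry https: redirects over TLS
            if ($code =~ /\A30[1278]\z/
                && ($rhead->{"location"} || "") =~ m{\Ahttps://}) {
                ## connect with IO::Socket::SSL, rewrite Host: and
                ## the request target, and resend (back to step 3)
            }
            ## step 6: strip STS, Upgrade: and Set-Cookie:, relay
            delete (@$rhead{qw (strict-transport-security
                                upgrade set-cookie)});
            send_response ($client, $code, $rhead);
            ## step 7: copy Content-Length: octets, or all the rest
            copy_body ($server, $client, $rhead->{"content-length"});
            ## step 8: close and loop while the client persists
            close ($server);
        }
    }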

The server uses select(2) so that socket reads do not block and
supports an arbitrary number (up to the system-enforced limits)
of concurrent connections. For simplicity, socket writes /are/
allowed to block. (Hopefully not a problem for proxy-to-server
connections most of the time, and even less so for proxy-to-client
ones; assuming no malicious intent on the part of either,
obviously. The latter case may be mitigated by using a "proper"
HTTP proxy, such as Polipo, in front of this one.)
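
For illustration, the read side amounts to a conventional IO::Select
loop, roughly as follows (a sketch; on_readable () is a hypothetical
per-connection callback, and $listener a listening socket created
elsewhere):

    use IO::Select ();

    my $sel = IO::Select->new ($listener);
    while (1) {
        for my $fh ($sel->can_read ()) {
            if ($fh == $listener) {
                ## a new client connection
                $sel->add (scalar ($listener->accept ()));
            } else {
                my $buf;
                if (sysread ($fh, $buf, 8192)) {
                    on_readable ($fh, $buf);
                } else {
                    ## EOF or error: drop the connection
                    $sel->remove ($fh);
                    close ($fh);
                }
            }
        }
    }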


Dealing with the https: references

There was an idea of transparently replacing https: references
in HTML and XML attributes with scheme-relative ones (like, e. g.,
https://example.com/ to //example.com/.) So far, that fails
more often than it works, for two primary reasons: compression
(although that can be solved by forcing Accept-Encoding: identity
in requests) -- and the fact that by the time such filtering can
take place, we've already sent the Content-Length: (if any) for
the original (unaltered) body to the client!
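
(The substitution itself is simple enough; a sketch over a single
buffer, handling double-quoted href= / src= attributes only, and
assuming Accept-Encoding: identity has been forced:

    ## rewrite https: in double-quoted href= / src= attribute values
    ## to scheme-relative form; single-quoted and unquoted attributes
    ## are ignored in this sketch.
    $buf =~ s{ \b ( (?:href|src) \s* = \s* " ) https: (?= // ) }{$1}gix;

The hard part, as described below, is the framing, not the pattern.)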

Also, as the code does not currently handle the "chunked" coding,
references split across chunks will not be handled. (The code
should handle references split across bufferfuls of data, though.)

Two possible ways to solve that would be, for desired
Content-Type: values, either to retrieve the whole response in full
before altering and forwarding it to the client, /or/ to implement
support for "chunked" coding and force its use there (stripping
Content-Length: off the original response, if any.)

I suppose both approaches can be implemented, with the first
used, say, when Content-Length: is below a configured limit,
although that increases the complexity of the code, which is
something I'd rather avoid.

That said, I don't think the https: references /should/ be an
issue in practice, as most of the links ought to be relative
in the first place, such as:

<p ><a href="page2.html" >Continue reading this article</a>,
or <a href="/" >go back to the top page.</a></p>

However, I suspect that images and such may be a common
exception in practice, like:

<img src="https://static.example.com/useless-stock-photo.jpeg" />

Which of course would've worked just as well (and required no
specific action on the part of this proxy) had it been written as:

<img src="//static.example.com/useless-stock-photo.jpeg" />


Making responses even better

Other possible response alterations may include removing <link />
elements and Link: HTTP headers pointing to JavaScript code
(running arbitrary software from the Web is a bad idea, and
doing so while forgoing the meager TLS protection isn't making
it better) /and/ also <script /> elements. The latter, in turn,
will probably either require rather complex state tracking --
or getting the server response in full before the alterations
can take place.


Thoughts?
--
FSF associate member #7257 np. Nine Lives -- Slaygon
Eli the Bearded
2018-09-16 20:52:00 UTC
Post by Ivan Shmakov
It took me about a day to write a crude but apparently (more or
less) working HTTP to HTTPS proxy. (That I hope to beat into
shape and release via news:alt.sources around next Wednesday
or so. FTR, the code is currently under 600 LoC long, or 431 LoC
excluding comments and empty lines.) Some design notes are below.
What language?
Post by Ivan Shmakov
1. receive a request header from the client; we only allow
GET and HEAD requests for now, as we do not support request
/bodies/ as of yet;
No POST requests will stop a lot of forms. HEAD is an easy case, but
largely unused.
Post by Ivan Shmakov
2. decide the server and connect there;
3. send the header to the server;
4. receive the response header;
5.1. connect over TLS, alter the request (Host:, "request target")
accordingly, go to step 3;
6. strip certain headers (such as Strict-Transport-Security: and
Upgrade:, but also Set-Cookie:) off the response and send the
result to the client;
That probably covers it. If you change HTTP/1.1 to HTTP/1.0 on the
requests, then 1% of servers will have issues and 50% fewer servers will
send chunked responses. (Numbers made up, based on my experiences.) You
can also drop Accept-Encoding: if you want to avoid dealing with
compressed responses.
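
The version change itself is a one-line substitution, e.g. (a sketch;
assumes the request line's trailing CRLF has been stripped):

    ## advertise HTTP/1.0 on the outgoing request line
    $request_line =~ s{ HTTP/1\.1\z}{ HTTP/1.0};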
Post by Ivan Shmakov
7. copy up to Content-Length: octets from the server to the
client -- or all the remaining data if no Content-Length:
is given; (somewhat surprisingly, this seems to also work with
the "chunked" coding not otherwise considered in the code);
Yup, that works in my experience, too.
Post by Ivan Shmakov
Dealing with the https: references
There was an idea of transparently replacing https: references
in HTML and XML attributes with scheme-relative ones (like, e. g.,
https://example.com/ to //example.com/.) So far, that fails
more often than it works, for two primary reasons: compression
(although that can be solved by forcing Accept-Encoding: identity
No accept-encoding header == no compression.
Post by Ivan Shmakov
in requests) -- and the fact that by the time such filtering can
take place, we've already sent the Content-Length: (if any) for
the original (unaltered) body to the client!
You can fix that with whitespace padding.

<img src="https://qaz.wtf/tmp/chree.png" ...>
<img src="//qaz.wtf/tmp/chree.png" ...>

Beware of parsing issues. Real world HTML usually looks like one of the
first two but may sometimes look like one of the second two of these:

<img src="https://qaz.wtf/tmp/chree.png" ...>
<img src='https://qaz.wtf/tmp/chree.png' ...>
<img src=https://qaz.wtf/tmp/chree.png ...>
<img src = "https://qaz.wtf/tmp/chree.png" ...>

(And that's ignoring case.)
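
A single pattern tolerant of all four forms might look like this
(a sketch; still no substitute for a real parser):

    ## match src= in double-quoted, single-quoted and unquoted form,
    ## spaces around the equals sign included; a sketch only.
    my $src_re = qr{
        \b src \s* = \s*
        (?: " ([^"]*) "      # double-quoted value
          | ' ([^']*) '      # single-quoted value
          | ([^\s>'"]+)      # unquoted value
        )
    }ix;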
Post by Ivan Shmakov
That said, I don't think the https: references /should/ be an
issue in practice, as most of the links ought to be relative
Hahaha. There are so many different ways it is done in the real world.
Post by Ivan Shmakov
Thoughts?
Are you going to fix Referer: headers to use the https: version when
communicating with an https site? I think you probably should.
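
Something along these lines, say (a sketch; %header is assumed to be
a lowercased header hash and $over_tls a flag the proxy already
tracks):

    ## upgrade Referer: to match the scheme used upstream
    $header{"referer"} =~ s{\Ahttp:}{https:}
        if ($over_tls && defined ($header{"referer"}));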

Elijah
------
only forces https on his site for the areas that require login
Ivan Shmakov
2018-09-18 13:10:44 UTC
Post by Eli the Bearded
Post by Ivan Shmakov
It took me about a day to write a crude but apparently (more or
less) working HTTP to HTTPS proxy. (That I hope to beat into shape
and release via news:alt.sources around next Wednesday or so.
FTR, the code is currently under 600 LoC long, or 431 LoC excluding
comments and empty lines.) Some design notes are below.
What language?
Perl 5. It appears the most apt for the task of the five general
purpose languages I'm using regularly these days. (The others
being Emacs Lisp, Shell, Awk; and C, though that's mostly limited
to occasional embedded programming.)
Post by Eli the Bearded
Post by Ivan Shmakov
1. receive a request header from the client; we only allow GET and
HEAD requests for now, as we do not support request /bodies/ as of yet;
No POST requests will stop a lot of forms.
My intent was to support Web /reading/ over plain HTTP specifically
-- which is something that shouldn't involve forms IMO. That said,
I suppose there can be any number of resources that use POST for
/search/ forms, which is something that may be worth supporting.
Post by Eli the Bearded
HEAD is an easy case, but largely unused.
Easy, indeed, and I do use it myself, so the question of whether
to implement its handling or not wasn't really considered.

[...]
Post by Eli the Bearded
Post by Ivan Shmakov
6. strip certain headers (such as Strict-Transport-Security: and
Upgrade:, but also Set-Cookie:) off the response and send the result
to the client;
That probably covers it. If you change HTTP/1.1 to HTTP/1.0 on the
requests, then 1% of servers will have issues and 50% fewer servers
will send chunked requests. (Numbers made up, based on my experiences.)
The idea was to require the barest minimum of mangling in the
code, so as to leave the most choices up to the user. As such,
HTTP/1.1 and chunked encoding appear worth supporting.
Post by Eli the Bearded
You can also drop Accept-Encoding: if you want to avoid dealing with
compressed responses.
Per RFC 7231, Accept-Encoding: identity communicates the client's
preference for "no encoding." Omitting the header, OTOH, means
"no preference":

5.3.4. Accept-Encoding

[...]

A request without an Accept-Encoding header field implies that the
user agent has no preferences regarding content-codings. Although
this allows the server to use any content-coding in a response, it
does not imply that the user agent will be able to correctly process
all encodings.

That said, I do wish for the user to have the choice of having
/both/ compression and transformations available. And while I'm
not constrained much by bandwidth, some of the future users of
this code may be.

[...]
Post by Eli the Bearded
Post by Ivan Shmakov
There was an idea of transparently replacing https: references in
HTML and XML attributes with scheme-relative ones (like, e. g.,
https://example.com/ to //example.com/.) So far, that fails more
often than it works, for two primary reasons: compression (although
that can be solved by forcing Accept-Encoding: identity in requests)
-- and the fact that by the time such filtering can take place,
we've already sent the Content-Length: (if any) for the original
(unaltered) body to the client!
You can fix that with whitespace padding.
<img src="https://qaz.wtf/tmp/chree.png" ...>
<img src="//qaz.wtf/tmp/chree.png" ...>
Yes, I've tried it (alongside Accept-Encoding: identity), and it
worked, but I don't like it for the lack of generality.
Post by Eli the Bearded
Beware of parsing issues.
Other than those shown in the examples below?
Post by Eli the Bearded
Real world HTML usually looks like one of the first two but may
<img src="https://qaz.wtf/tmp/chree.png" ...>
<img src='https://qaz.wtf/tmp/chree.png' ...>
<img src=https://qaz.wtf/tmp/chree.png ...>
<img src = "https://qaz.wtf/tmp/chree.png" ...>
(And that's ignoring case.)
Indeed; and case and lack of quotes will require special-casing
for HTML (I aim to support XML applications as well, which
fortunately are somewhat simpler in this respect.)

OTOH, I don't think I've ever seen the " = " form; do the blanks
around the equals sign even conform to any HTML version?

[...]
Post by Eli the Bearded
Post by Ivan Shmakov
Thoughts?
Are you going to fix Referer: headers to use the https: version when
communicating with an https site? I think you probably should.
I guess I'll leave it up to the user. Per my experience (with
copying Web pages using Wget), resources requiring Referer: are
more the exception than the rule, but still.
Post by Eli the Bearded
Elijah ------ only forces https on his site for the areas that
require login
And that's a sensible approach.
--
FSF associate member #7257 http://am-1.org/~ivan/
Rich
2018-09-18 16:36:51 UTC
Post by Ivan Shmakov
Post by Eli the Bearded
Real world HTML usually looks like one of the first two but may
<img src="https://qaz.wtf/tmp/chree.png" ...>
<img src='https://qaz.wtf/tmp/chree.png' ...>
<img src=https://qaz.wtf/tmp/chree.png ...>
<img src = "https://qaz.wtf/tmp/chree.png" ...>
(And that's ignoring case.)
Indeed; and case and lack of quotes will require specialcasing
for HTML (I aim to support XML applications as well, which
fortunately are somewhat simpler in this respect.)
OTOH, I don't think I've ever seen the " = " form; do the
blanks around the equals sign even conform to any HTML
version?
The HTML spec does not appear to explicitly exclude use of spaces
around the equals sign. So unless there is an explicit exclusion
somewhere that I've missed, it would be legal to add spaces around the
equals.

The fact is, even ignoring the spaced-equals item, that HTML is
"flexible" enough that once you get to the point of wanting to do
rewriting/editing, you'll have way fewer "pull your hair out" issues
if you make use of an HTML parser to parse the HTML instead of trying
to do anything by string or regex search/replace on it. Anything
string/regex-search based on HTML will appear to work ok until the day
it hits a legal bit of HTML it was not designed to handle, and then it
will break badly.

I.e., the "edge conditions" are so numerous that you are better off
using a parser that has already been designed to handle those edge
conditions.
Ivan Shmakov
2018-09-18 17:05:35 UTC
[...]
Post by Rich
Post by Ivan Shmakov
OTOH, I don't think I've ever seen the " = " form; do the blanks
around the equals sign even conform to any HTML version?
The HTML spec does not appear to explicitly exclude use of spaces
around the equals sign. So unless there is an explicit exclusion
somewhere that I've missed, it would be legal to add spaces around
the equals.
Does it explicitly allow spaces?
Post by Rich
The fact is, even ignoring the spaced-equals item, that HTML is
"flexible" enough that once you get to the point of wanting to do
rewriting/editing, you'll have way fewer "pull your hair out" issues
if you make use of an HTML parser to parse the HTML instead of trying
to do anything by string or regex search/replace on it. Anything
string/regex-search based on HTML will appear to work ok until the day
it hits a legal bit of HTML it was not designed to handle, and then it
will break badly.
I. e., the "edge conditions" are so numerous that you are better off
using a parser that has already been designed to handle those edge
conditions.
I tend to agree with the above for the general case: where I'd
expect the code to /fail/ if it encounters something it does not
understand.

In this case, something that the code does not understand
ought to be left untouched, and I'm unsure if I can readily get
an HTTP parser that does that.
--
FSF associate member #7257 http://am-1.org/~ivan/
Andy Burns
2018-09-18 17:32:19 UTC
Post by Ivan Shmakov
Post by Rich
The HTML spec does not appear to explicitly exclude use of spaces
around the equals sign.
Does it explicitly allow spaces?
The W3C validity checker doesn't warn if spaces are included.
Rich
2018-09-18 18:56:52 UTC
Post by Ivan Shmakov
[...]
Post by Rich
Post by Ivan Shmakov
OTOH, I don't think I've ever seen the " = " form; do the blanks
around the equals sign even conform to any HTML version?
The HTML spec does not appear to explicitly exclude use of spaces
around the equals sign. So unless there is an explicit exclusion
somewhere that I've missed, it would be legal to add spaces around
the equals.
Does it explicitly allow spaces?
It is fully silent. It shows examples without the spaces, but is
silent otherwise as to their allowance (or disallowance) around the
equals. Given the silence, it is very possible that examples with
spaces may exist in the wild, and possible (although I have not tested)
that browsers accept HTML with spaces present.
Post by Ivan Shmakov
Post by Rich
The fact is, even ignoring the spaced-equals item, that HTML is
"flexible" enough that once you get to the point of wanting to do
rewriting/editing, you'll have way fewer "pull your hair out"
issues if you make use of an HTML parser to parse the HTML instead
of trying to do anything by string or regex search/replace on it.
Anything string/regex-search based on HTML will appear to work ok
until the day it hits a legal bit of HTML it was not designed to
handle, and then it will break badly.
I. e., the "edge conditions" are so numerous that you are better
off using a parser that has already been designed to handle those
edge conditions.
I tend to agree with the above for the general case: where I'd
expect the code to /fail/ if it encounters something it does
not understand.
In this case, something that the code does not understand
ought to be left untouched, and I'm unsure if I can readily
get an HTTP parser that does that.
That is, of course, always the final 'out' for something so broken that
the 'content modification' module fails.

The difference is that you'll significantly reduce the number of
failure instances by using a parser to handle the parsing of the
incoming HTML, then passing the parse tree off to the 'content
modification' module vs. trying to do content modification with string
matching and/or regex matching (both of which are essentially creating
weak 'parsers' that only handle a small subset of the full
possibilities allowed).

But you can't possibly reduce the potential for failure to zero, no
matter what you do, because it is always possible to retrieve
something that claims to be HTML but is so broken that it simply
can't be handled (or is simply mis-identified, i.e., someone
sending a JPEG image but
mime-typing it in the header as text/html).
Ivan Shmakov
2018-09-19 05:15:57 UTC
[...]
Post by Rich
Post by Ivan Shmakov
Post by Rich
I. e., the "edge conditions" are so numerous that you are better
off using a parser that has already been designed to handle those
edge conditions.
I tend to agree with the above for the general case: where I'd
expect the code to /fail/ if it encounters something it does not
understand.
In this case, something that the code does not understand ought
to be left untouched, and I'm unsure if I can readily get an HTTP
s/HTTP/HTML/, obviously.
Post by Rich
Post by Ivan Shmakov
parser that does that.
That is, of course, always the final 'out' for something so broken
that the 'content modification' module fails.
The difference is that you'll significantly reduce the number of
failure instances by using a parser to handle the parsing of the
incoming HTML, then passing the parse tree off to the 'content
modification' module vs. trying to do content modification with
string matching and/or regex matching (both of which are essentially
creating weak 'parsers' that only handle a small subset of the full
possibilities allowed).
I also consider the possibility of running no-https as a public
service. As such, considerations like CPU and memory consumption,
including the ability to run in more or less constant space (per
connection, with the number of concurrent connections possibly
also limited) take priority. Creating a full DOM for the
possibly multi-MiB document, OTOH, is not an option.

(That said, if there's an HTML parser for Perl that /can/ be used
for running in constant space, I'd be curious to consider the
examples.)

If you want these alterations to take place for every
possible document supported by your browser -- implement them as
a browser extension. For instance, user JavaScript run with
Greasemonkey for Firefox has (AIUI) full access to the DOM and
can walk that and consistently strip "https:" off attribute
values, regardless of the HTML document's syntax specifics.

[...]
--
FSF associate member #7257 http://am-1.org/~ivan/
Marko Rauhamaa
2018-09-18 19:02:22 UTC
Post by Rich
The HTML spec does not appear to explicitly exclude use of spaces
around the equals sign. So unless there is an explicit exclusion
somewhere that I've missed, it would be legal to add spaces around the
equals.
No need to guess or improvise. The W3 consortium has provided an
explicit pseudocode implementation of an HTML parser:

<URL: https://www.w3.org/TR/html52/syntax.html#syntax>

In fact, I happened to implement the lexical analysis of HTML based on
this specification just a couple of weeks ago. It was about 3,000 lines
of code.

The specification is careful to address the proper behavior of a parser
when illegal HTML is encountered.


Marko
Rich
2018-09-18 19:08:48 UTC
Post by Marko Rauhamaa
Post by Rich
The HTML spec does not appear to explicitly exclude use of spaces
around the equals sign. So unless there is an explicit exclusion
somewhere that I've missed, it would be legal to add spaces around the
equals.
No need to guess or improvise. The W3 consortium has provided an
<URL: https://www.w3.org/TR/html52/syntax.html#syntax>
Thanks for that reference. Looking through it, one finds this for
attributes:

The attribute name, followed by zero or more space characters,
followed by a single U+003D EQUALS SIGN character, followed by zero
or more space characters, followed by the attribute value,

So spaces around the equals sign are actually allowed per that syntax
page.
Andy Burns
2018-09-18 19:16:37 UTC
Post by Ivan Shmakov
I don't think I've ever seen the " = " form; do the blanks
around the equals sign even conform to any HTML version?
yes, e.g.

"The attribute name, followed by zero or more space characters, followed
by a single U+003D EQUALS SIGN character, followed by zero or more space
characters, followed by the attribute value, which, in addition to the
requirements given above for attribute values, must not contain any
literal space characters, any U+0022 QUOTATION MARK characters ("),
U+0027 APOSTROPHE characters ('), U+003D EQUALS SIGN characters (=),
U+003C LESS-THAN SIGN characters (<), U+003E GREATER-THAN SIGN
characters (>), or U+0060 GRAVE ACCENT characters (`), and must not be
the empty string"

<https://www.w3.org/TR/html5/syntax.html#attribute-names>
Computer Nerd Kev
2018-09-16 22:52:54 UTC
Post by Ivan Shmakov
It took me about a day to write a crude but apparently (more or
less) working HTTP to HTTPS proxy. (That I hope to beat into
shape and release via news:alt.sources around next Wednesday
or so. FTR, the code is currently under 600 LoC long, or 431 LoC
excluding comments and empty lines.) Some design notes are below.
Sounds like a great start. I'm looking forward to trying it out.
--
__ __
#_ < |\| |< _#
Mike Spencer
2018-09-19 20:27:44 UTC
Post by Computer Nerd Kev
Post by Ivan Shmakov
It took me about a day to write a crude but apparently (more or
less) working HTTP to HTTPS proxy. (That I hope to beat into
shape and release via news:alt.sources around next Wednesday
or so. FTR, the code is currently under 600 LoC long, or 431 LoC
excluding comments and empty lines.) Some design notes are below.
Sounds like a great start. I'm looking forward to trying it out.
Same. As the guy who (possibly) triggered this thread, I'm archiving
all posts. Weather is heavenly but will soon turn less salubrious, my
winter's firewood is all under cover and I'll be spending more time
hunched over the keyboard, trying to keep up with the evolution of
the web on my own terms.

Tnx for discussion; checking alt.sources periodically.
--
Mike Spencer Nova Scotia, Canada

Grumpy old geezer
Ivan Shmakov
2018-09-25 18:39:32 UTC
It took me about a day to write a crude but apparently (more or less)
working HTTP to HTTPS proxy. (That I hope to beat into shape and
release via news:alt.sources around next Wednesday or so. FTR, the
code is currently under 600 LoC long, or 431 LoC excluding comments
and empty lines.) Some design notes are below.
It took much longer (of course), and the code has by now expanded
about threefold. The HTTP/1 support is much improved, however;
for instance, request bodies and chunked coding should now be
fully supported. Moreover, the relevant code was split off into
a separate HTTP1::MessageStream push-mode parser module (or about
a third of the overall code currently), allowing it to be used
in other applications.

The no-https.perl code proper still needs some clean-up after
all the modifications it got.

The command-line interface is roughly as follows. (Not all the
options are as of yet thoroughly tested, though.)

Usage:
$ no-https
[-d|--[no-]debug] [--listen=BIND|-l BIND] [--mangle=MANGLE]
[--connect=COMMAND] [--ssl-connect=COMMAND]
$ no-https {-h|--help}

BIND is either [HOST:]PORT or, if it includes a /, a file name for a
Unix socket to create and listen on. The default is 8080.

COMMAND will have %%, %h, %p replaced with a literal %, target host
and TCP port, respectively. Also, %s and %t are replaced respectively
with a space and a TAB.

MANGLE can be minimal, header, or a name of an App::NoHTTPS::Mangle::
package to require and use. If not specified, default is tried
first, falling back to (internally-implemented) header.

The --connect= and --ssl-connect= should make it possible to
utilize a parent proxy, including a SOCKS one, such as that
provided by Tor, like: --connect="socat STDIO
SOCKS4:localhost:%h:%p,socksport=9050". For --ssl-connect=,
a tsocks(1)-wrapped gnutls-cli(1) may be an option.
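
For instance, to bind the default port to localhost explicitly and
route all server connections through a local Tor SOCKS port (an
illustrative invocation, untested):

    $ no-https --listen=localhost:8080 \
        --connect="socat STDIO SOCKS4:localhost:%h:%p,socksport=9050"

The browser's plain-HTTP proxy would then be pointed at
localhost:8080.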
Basics
1. receive a request header from the client; we only allow GET and
HEAD requests for now, as we do not support request /bodies/ as of yet;
RFC 7230 section 3.3 actually provides simple criteria for
determining whether the request has a body:

The presence of a message body in a request is signaled by a
Content-Length or Transfer-Encoding header field. Request message
framing is independent of method semantics, even if the method does
not define any use for a message body.

As such, and given that message passing was "symmetrized," any
request method except CONNECT is now allowed by the code.
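
(In code, the test is a one-liner; a sketch, assuming a %header hash
with lowercased field names:

    ## RFC 7230, sec. 3.3: a message has a body iff either of these
    ## fields is present, regardless of the request method.
    my $has_body = (defined ($header{"content-length"})
                    || defined ($header{"transfer-encoding"}));
)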
2. decide the server and connect there;
3. send the header to the server;
Preceded by the request line, obviously. (It was considered
a part of the header in the original version of the code.)
4. receive the response header;
(Same here, for the status line.)

We also pass any number of "100 Continue" messages here from
server to client before the "payload" response.
5.1. connect over TLS, alter the request (Host:, "request target")
accordingly, go to step 3;
A Host: header is prepended to the request header if the
original has none.
6. strip certain headers (such as Strict-Transport-Security: and
Upgrade:, but also Set-Cookie:) off the response and send the result
to the client;
Both the decision whether to "eat up" the redirect and how to
alter the header and body of the messages (requests and responses
alike) are left to the "mangler" object. The object ought to
implement the following methods.

$ma->message_mangler (PARSER, URI)
Return a new mangler object for the given HTTP1::MessageStream
parser state (either request or response) and request URI.

Alternatively, return the URI of a resource to transparently
request instead of the given one.

Return undef if this mangler has nothing to do with the
given parser state and URI.

$ma->parser ([PARSER]), $ma->uri ([URI]),
$ma->start_line ([START-LINE]), $ma->header ([HEADER])
Get or set the HTTP1::MessageStream object, URI, HTTP/1
start line and HTTP/1 header, respectively, associated with
the particular request.

$ma->chunked_p ()
Return a true value if the body ought to be transmitted
to the remote using chunked coding. (The associated header
is set up accordingly.)

$ma->get_mangled_body_part ()
Return the next part of the (possibly modified) HTTP/1
message body. This will typically involve a call to the
parser object to interpret the portion of the message
currently in its own buffer.

There're currently two such classes implemented: "minimal" and
"header," and I believe that the above interface can be used to
implement rather arbitrary HTTP message filters.
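
For illustration, a do-nothing mangler conforming to this interface
might look as follows. (A sketch only, not part of the released
code; in particular, the parser's get_body_part () method is an
assumption here.)

    package App::NoHTTPS::Mangle::null;
    use strict;
    use warnings;

    sub message_mangler {
        my ($class, $parser, $uri) = @_;
        bless ({ "parser" => $parser, "uri" => $uri }, $class);
    }

    ## generate the trivial get-or-set accessors
    for my $field (qw (parser uri start_line header)) {
        no strict "refs";
        *{$field} = sub {
            my $ma = shift (@_);
            $ma->{$field} = shift (@_) if (@_);
            $ma->{$field};
        };
    }

    ## keep the original framing; never switch to chunked
    sub chunked_p { 0; }

    ## pass the body through unaltered
    sub get_mangled_body_part {
        my ($ma) = @_;
        $ma->{"parser"}->get_body_part ();
    }

    1;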

The "minimal" class removes Upgrade and Proxy-Connection headers
from the messages (requests and responses alike) and causes the
calling code to transparently replace all the https: redirects
with the requested resources.

The "header" class also filters Strict-Transport-Security and
Set-Cookie off the responses. (Although the former should have
no effect anyway.)

There's a minor issue with the handling of https: redirects.
When http://example.com/ redirects to https://example.com/foo/bar,
for instance, the links in the latter document will become
relative to the former URI (unless the 'base' URI is explicitly
given in the document); thus <a href="baz" /> will point to
/baz -- instead of the intended /foo/baz. A likely solution
is to only eat up http:SAME to https:SAME redirects, rewriting
http:SAME to https:OTHER redirects instead to point to http:OTHER (which
will then likely result in a redirect to https:OTHER, in turn
eaten up by the mangler.)
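
In code, the distinction might be drawn roughly as follows (a sketch,
using the URI module; $request_uri and $rhead are assumed from
context):

    ## follow a redirect transparently only when the scheme alone
    ## changes; otherwise point the client at the http: variant.
    use URI ();

    my $old = URI->new ($request_uri);
    my $new = URI->new ($rhead->{"location"});
    if (($new->scheme () || "") eq "https") {
        my $cmp = $new->clone ();
        $cmp->scheme ("http");
        if ($cmp->eq ($old)) {
            ## http:SAME -> https:SAME: re-request over TLS
        } else {
            ## http:SAME -> https:OTHER: rewrite Location: back
            $rhead->{"location"} = $cmp->as_string ();
        }
    }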
7. copy up to Content-Length: octets from the server to the client --
or all the remaining data if no Content-Length: is given; (somewhat
surprisingly, this seems to also work with the "chunked" coding not
otherwise considered in the code);
Both the chunked coding and client-to-server body passing
should now be supported (although POST requests remain untested.)
8. close the connection to the server and repeat from step 1 so long
as the client connection remains active.
[...]
--
FSF associate member #7257 http://am-1.org/~ivan/
Eli the Bearded
2018-09-25 22:29:27 UTC
Post by Ivan Shmakov
It took me about a day to write a crude but apparently (more or less)
working HTTP to HTTPS proxy. (That I hope to beat into shape and
release via news:alt.sources around next Wednesday or so. FTR, the
code is currently under 600 LoC long, or 431 LoC excluding comments
and empty lines.) Some design notes are below.
It took much longer (of course), and the code has by now expanded
about threefold. The HTTP/1 support is much improved, however;
for instance, request bodies and chunked coding should now be
fully supported. Moreover, the relevant code was split off into
a separate HTTP1::MessageStream push-mode parser module (or about
a third of the overall code currently), allowing it to be used
in other applications.
Sounds interesting. I don't see it in alt.sources here (nor did you
include a message ID, as I know you have done in the past for such
things). When do you expect to have a version someone can try out?

(Will you be posting the code to CPAN?)

Elijah
------
recalls Ivan dislikes github
Ivan Shmakov
2018-09-26 01:05:15 UTC
[...]
Post by Ivan Shmakov
It took much longer (of course), and the code has by now expanded
about threefold. The HTTP/1 support is much improved, however;
for instance, request bodies and chunked coding should now be fully
supported. Moreover, the relevant code was split off into a
separate HTTP1::MessageStream push-mode parser module (or about
a third of the overall code currently), allowing it to be used in
other applications.
Sounds interesting. I don't see it in alt.sources here (nor did you
include a message ID, as I know you have done in the past for such
things). When do you expect to have a version someone can try out?
Hopefully within this week; I'm still testing the proxy code
proper, and have yet to write the READMEs. (Though by now you should
be well aware that my estimates can be overly optimistic.)
(Will you be posting the code to CPAN?)
One of the later versions; as a dependency, HTTP1::MessageStream
will take priority here, but no-https.perl will likely follow.
Elijah ------ recalls Ivan dislikes github
I by no means single out GitHub here; rather, I dislike any
platform that requires the user to run proprietary software,
such as proprietary JavaScript, to operate. Hence GitLab or
Savannah sound like much better choices.
--
FSF associate member #7257 http://am-1.org/~ivan/
Ivan Shmakov
2018-10-04 20:07:49 UTC
While I've yet to make a proper announcement, I'm glad to inform
anyone interested that the first public version of no-https.perl
is available from news:alt.sources: news:***@siamics.net.
--
FSF associate member #7257 http://am-1.org/~ivan/
Computer Nerd Kev
2018-10-05 00:11:40 UTC
Post by Ivan Shmakov
While I've yet to make a proper announcement, I'm glad to inform
anyone interested that the first public version of no-https.perl
Great! I'm looking forward to trying it out once I get the time.
--
__ __
#_ < |\| |< _#