masinter / multipart-form-data
update to RFC 2388 definition of multipart/form-data
appendix A contains several bits of ill-considered normative advice, remove them.
See https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909
And here is my proposed reply, with some comments:
RFC 2388 was clear:
Field names originally in non-ASCII character sets may be encoded
within the value of the "name" parameter using the standard method
described in RFC 2047.
For reasons I don't understand, browsers did different, incompatible
things.
I think the main advice is:
What should the browsers migrate to?
http://www.rfc-editor.org/rfc/rfc5987.txt
seems like a more recent proposal and possibly implemented in HTTP anyway.
Sites that use non-ASCII field names and want to work with multiple browsers already have to do fuzzy matching.
The problem is that the fuzzy matchers already deployed might not recognize any NEW encodings.
So I suppose having a name* value would be necessary.
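To make the name* idea concrete, here is a minimal Python sketch of an RFC 5987-style extended parameter. The helper name ext_param is hypothetical, and nothing in this thread establishes that browsers or servers accept name* in multipart/form-data; it only illustrates the syntax under discussion.

```python
from urllib.parse import quote


def ext_param(param: str, value: str) -> str:
    """Format an RFC 5987-style extended parameter (hypothetical helper)."""
    # Percent-encode the UTF-8 bytes of the value. safe="" encodes more
    # characters than RFC 5987 strictly requires, which is still valid.
    return f"{param}*=UTF-8''{quote(value, safe='')}"


print(ext_param("name", "naïve"))
```

A sender following this approach would presumably also keep a plain name="..." parameter, so that fuzzy matchers already deployed still have something to match.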
--AaB03x
content-disposition: form-data; name="field1"
content-type: text/plain;charset=windows-1250
content-transfer-encoding: quoted-printable
Joe owes =80100.
--AaB03x
You need an empty line before the "Joe owes" (unfortunately, this was turned into a page break in RFC 2388).
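As a rough illustration of the corrected fragment, the Python sketch below assembles one part with the empty line between the headers and the body. The function name build_part is an assumption made for this example; the header lines and body are taken from the RFC 2388 example quoted above.

```python
def build_part(boundary: str, headers: list, body: bytes) -> bytes:
    """Serialize one MIME part; an empty line separates headers from body."""
    head = "\r\n".join(headers).encode("ascii")
    # The b"\r\n\r\n" ends the last header line and adds the empty line.
    return b"--" + boundary.encode("ascii") + b"\r\n" + head + b"\r\n\r\n" + body + b"\r\n"


part = build_part(
    "AaB03x",
    [
        'content-disposition: form-data; name="field1"',
        "content-type: text/plain;charset=windows-1250",
        "content-transfer-encoding: quoted-printable",
    ],
    b"Joe owes =80100.",
)
print(part.decode("ascii"))
```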
Carsten Bormann writes:
I would kill Appendix B; RFC 2388 is not going away.
HTML is not the only producer of multipart/form-data. Cleaning up encodings is very important, and citing examples from HTML5 in the RFC is illuminating and can help adoption, but please don't give readers the impression that multipart/form-data is useful only for HTML. Box.net, for example, uses it for their REST-based API, and people struggle with the Google HTTP Client code library because it improperly implements multipart/form-data. A clearer RFC that lays out the responsibility to implement might help adoption.
In XForms we also used multipart/related with an XML body for the first part, containing all instance data (what could or was bound to input etc. controls) and separate parts, referenced by URI from within the first part, for the uploaded-file components. Using JSON for the first part is a trivial change.
There are two HTML4 references, why?
What should the HTML5 reference be? It's non-normative, just to point to where this is used.
I ran some simple tests using http://software.hixie.ch/utilities/js/live-dom-viewer/ and http://software.hixie.ch/utilities/cgi/test-tools/echo and I'm wondering why this is taking so long.
This format seems pretty straightforward. As far as I can tell @dthaler is correct. The way things like name="..."'s value are encoded is not, it's just the bytes coming out of the encoder (which depends on the encoding of the <form>).
Text entries have no Content-Type header and the others do. multipart/mixed is not to be used.
All we need here is an algorithm that takes a set of form entries and serializes them, and an algorithm that does the reverse. Ideally soon, as we need both of these algorithms in browsers due to service workers (a sort of proxy server).
I'm somewhat tempted to just inline these algorithms and define this format together with its API, just as we already did for application/x-www-form-urlencoded.
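The serializing half of that pair might look roughly like the Python sketch below. It is not the algorithm any specification defines: the entry shape, the name serialize_form, and the fixed boundary are assumptions, and escaping of quotes or line breaks in names and filenames is left out. It follows the points made above: names are the raw bytes from the form's encoder, text entries carry no Content-Type header, and multipart/mixed is never produced.

```python
from typing import List, Tuple, Union

# A text entry value is a str; a file entry value is (filename, content_type, data).
FileValue = Tuple[str, str, bytes]
Entry = Tuple[str, Union[str, FileValue]]


def serialize_form(entries: List[Entry], boundary: str, encoding: str = "utf-8") -> bytes:
    """Serialize form entries as a multipart/form-data body (sketch only)."""
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for name, value in entries:
        out.append(delimiter + b"\r\n")
        # The name is written as the bytes produced by the form's encoder.
        disposition = b'Content-Disposition: form-data; name="' + name.encode(encoding) + b'"'
        if isinstance(value, str):
            # Text entry: no Content-Type header.
            out.append(disposition + b"\r\n\r\n" + value.encode(encoding))
        else:
            filename, content_type, data = value
            disposition += b'; filename="' + filename.encode(encoding) + b'"'
            out.append(disposition + b"\r\n")
            out.append(b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + data)
        out.append(b"\r\n")
    out.append(delimiter + b"--\r\n")
    return b"".join(out)


body = serialize_form(
    [
        ("field1", "Joe"),
        ("files", ("a.txt", "text/plain", b"A")),
        ("files", ("b.txt", "text/plain", b"B")),
    ],
    "AaB03x",
)
print(body.decode("utf-8"))
```

Each file becomes its own part with the same name, which is why no nested multipart/mixed body is needed.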
The text for section 4.7 currently reads:
Previously, it was recommended that senders use a "Content-Transfer-
Encoding" encoding (such ss quoted-printable) for each non-ASCII part
of a multipart/form-data body. This recommendation is "deprecated":
senders MUST NOT send any parts with a content-transfer-encoding
header. No deployed implementations that send such bodies have been
discovered.
But this precludes the use of multipart/form-data on 7-bit only transports, such as default SMTP, wherever non-ASCII form data has to be sent. This requirement, though, makes sense on transports that allow binary data, such as HTTP. Note also that the previous version of this draft was less strict in this respect. Moreover, as senders include forwarding proxies, this requirement unnecessarily applies to proxies. A better word would be "generate", rather than send.
I would prefer text like the following:
This recommendation is "deprecated": senders MUST NOT generate any parts
with a content-transfer-encoding header field unless the part is being sent via a
7-bit only transport, in which case it may be necessary to use a
transfer encoding such as "base64" or "quoted-printable". No deployed [...]
Note that HTTP is not a 7-bit only transport.
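A minimal Python sketch of the proposed rule: a generator adds a Content-Transfer-Encoding header field only when the transport is 7-bit only and the part actually contains non-ASCII bytes. The function name encode_body and the seven_bit_only flag are assumptions made for illustration; line wrapping of the base64 output and other 7-bit constraints (such as line-length limits) are ignored here.

```python
import base64
from typing import Dict, Tuple


def encode_body(data: bytes, seven_bit_only: bool) -> Tuple[Dict[str, str], bytes]:
    """Return (extra header fields, body bytes) for one part (hypothetical helper)."""
    has_non_ascii = any(b >= 0x80 for b in data)
    if seven_bit_only and has_non_ascii:
        # Only a 7-bit transport such as default SMTP needs a transfer encoding.
        return {"Content-Transfer-Encoding": "base64"}, base64.b64encode(data)
    # HTTP and other 8-bit-clean transports: send the bytes unchanged.
    return {}, data


euro = "Joe owes €100.".encode("windows-1250")
print(encode_body(euro, seven_bit_only=False))
print(encode_body(euro, seven_bit_only=True))
```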
Note:
Change "such ss quoted-printable" to "such as quoted-printable".
See comments in marked up copy at
http://research.microsoft.com/~dthaler/draft-ietf-appsawg-multipart-form-data-01.pdf
L: * those creating HTML forms SHOULD use ASCII field names,
since deployed HTML processors vary,
and field names shouldn't be visible to the user anyway.
Clarify that this is "to maximize interoperability", i.e., it's not a conformance requirement. Maybe a "MAY" instead of a "SHOULD".
Hi Larry,
Some comments after reviewing your draft:
In Section 2:
As with all multipart MIME types, each part has an optional "Content-
Type", which defaults to "text/plain". If the contents of a file are
returned via filling out a form, then the file input is identified as
the appropriate media type, if known, or "application/octet-stream".
The inclusion of multiple files returned for a single file input
result in multiple parts, one for each file, with the same name.
I would insert references to where various mentioned media types are
defined.
As with all multipart MIME types, each part has an optional "Content-
Type", which defaults to "text/plain". If the contents of a file are
returned via filling out a form, then the file input is identified as
the appropriate media type, if known, or "application/octet-stream".
The inclusion of multiple files returned for a single file input
result in multiple parts, one for each file, with the same name.
It took me multiple passes to understand the last sentence. I am not
sure I got it. Can you insert an example or qualify the various uses of "file"?
3.5. Charset of text in form data
For example, a form with a text field in which a user typed 'Joe owes
<eu>100' where <eu> is the Euro symbol might have form data returned
as:
--AaB03x
content-disposition: form-data; name="field1"
content-type: text/plain;charset=windows-1250
content-transfer-encoding: quoted-printable
Joe owes =80100.
--AaB03x
Are you missing an empty line after "content-transfer-encoding:"? This
doesn't look like a proper MIME fragment.
Media Type name:
multipart
Media subtype name:
form-data
Required parameters:
none
Optional parameters:
none
This doesn't look correct. What about "boundary"?
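To show why the parameter matters, the Python sketch below reads "boundary" from a Content-Type value using the standard library's email package; without it a receiver has no delimiter to split the body on. The helper name boundary_of is an assumption for this example.

```python
from email.message import Message
from typing import Optional


def boundary_of(content_type: str) -> Optional[str]:
    """Extract the boundary parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_boundary() returns None when the parameter is absent.
    return msg.get_boundary()


print(boundary_of("multipart/form-data; boundary=AaB03x"))
print(boundary_of("multipart/form-data"))
```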
An example of multipart/form-data would be useful. I had to search the web to
find some.
Mozilla bug database seems to indicate "can find no documentation" on how this is to work.
In particular, this means that multiple files submitted as part of a single element will result in each file having its own field; the "sets of files" feature ("multipart/mixed") of RFC 2388 is not used.
My view is that this use of multipart/mixed now qualifies for a NOT RECOMMENDED.
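On the receiving side, the HTML5 behavior means a server sees several parts with the same name rather than one nested multipart/mixed part. The Python sketch below groups such parts using the standard library's MIME parser; the function name files_by_field and the sample body are made up for illustration, and a production parser would need more care with 8-bit data and malformed input.

```python
from collections import defaultdict
from email import message_from_bytes


def files_by_field(content_type: str, body: bytes) -> dict:
    """Group uploaded files by field name (sketch built on the stdlib MIME parser)."""
    # Prepend the Content-Type header so the parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    msg = message_from_bytes(raw)
    grouped = defaultdict(list)
    for part in msg.get_payload():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        if filename is not None:
            grouped[name].append((filename, part.get_payload(decode=True)))
    return dict(grouped)


sample = (
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="files"; filename="a.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\nA\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="files"; filename="b.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\nB\r\n"
    b"--AaB03x--\r\n"
)
print(files_by_field("multipart/form-data; boundary=AaB03x", sample))
```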
L: "Those developing server infrastructure to read multipart/form-data uploads
SHOULD be aware of the varying behavior of the browsers in translating
non-ASCII field names, and look for any of the variants (if they're
expecting non-ASCII field names)."
H: If the servers have to look for variants, we should define those variants.
Agree, need to find current behavior of deployed browsers
-----Original Message-----
From: Ian Hickson [mailto:[email protected]]
Sent: Wednesday, February 13, 2013 7:18 AM
To: Larry Masinter
Subject: RFC 2388 (multipart/form-data)
Hey Larry,
Do you know if there is anyone working on fixing RFC2388? People keep
asking me to update the HTML spec to just define it all inline rather than
deferring to the RFC since the RFC leaves a lot of stuff underdefined, but
I don't have the bandwidth to spec all that myself at this point.
e.g.:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909
https://www.w3.org/Bugs/Public/show_bug.cgi?id=19879
Other feedback:
http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Oct/0204.html
http://lists.w3.org/Archives/Public/public-whatwg-archive/2012Jul/0037.html
http://lists.w3.org/Archives/Public/public-whatwg-archive/2012May/0003.html
Cheers,
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
I think some clarification of the "can" in 3.2 is required. Maybe say something like "the handling of file input fields that allow multiple files to be specified varies between browsers. Some send these as sets of files (wrapping all the parts for the files in one multipart/mixed), some just send multiple form-data parts with the same "name" attribute. HTML5 specifies the latter behavior."