"Carriage Return Line Feed (CRLF) is obsolete and should be abolished!" A public appeal from the father of SQLite has sparked heated discussion

Compiled by | Su Mi

Produced by | CSDN (ID: CSDNnews)

As a programmer, you're no stranger to CRLF.

CRLF stands for Carriage Return Line Feed. It consists of two characters: CR (\r, carriage return) and LF (\n, line feed). A carriage return moves the cursor to the far left of the current line, and a line feed moves the cursor down one line. A related concept is the New Line (NL), which moves the cursor down one line and also to the far left of that line.

CRLF survives today mainly for compatibility with the text file formats of different operating systems. Windows generally uses CRLF as its line ending, while Unix/Linux and macOS use LF alone.
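
To make the difference concrete, here is a minimal Python sketch that counts which line endings a block of bytes contains and converts CRLF to LF. The function names `count_line_endings` and `to_lf` are made up for this illustration and do not come from Hipp's post.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LFs, and bare CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def to_lf(data: bytes) -> bytes:
    """Convert Windows-style CRLF endings to Unix-style LF endings."""
    return data.replace(b"\r\n", b"\n")


mixed = b"first\r\nsecond\nthird\r\n"
print(count_line_endings(mixed))  # {'CRLF': 2, 'LF': 1, 'CR': 0}
print(to_lf(mixed))               # b'first\nsecond\nthird\n'
```

A file that reports more than one non-zero count has mixed line endings, which is the situation that tends to cause noisy diffs in version control.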

In practice, however, the differences between CRLF and LF often cause headaches and conflicts when development teams work with files. More and more developers argue that CRLF is outdated and should be scrapped.

The argument has stirred up plenty of controversy, but it has also prompted a rethink: do we really need to keep supporting CRLF in a modern development environment?

The father of SQLite launches an appeal

The appeal came from American software developer D. Richard Hipp, creator of the open-source embedded relational database SQLite, as well as the distributed version control system Fossil and the web server Althttpd.

According to D. Richard Hipp, both "Carriage Return" and "New Line" are useful control characters. NL (new line) is the most common operation: it starts a new line and continues writing from the beginning of that line. A standalone CR is sometimes useful, especially when you want to overwrite text that has already been written. LF (line feed), by contrast, is basically useless. No one wants to stop in the middle of a line, move down one line, and continue writing from the same column. No real program does that.
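
The three operations Hipp distinguishes can be illustrated with a toy terminal model. The sketch below is an assumption-laden simplification, not how any real terminal is implemented: `render` and its `newline_mode` flag are hypothetical names. With `newline_mode=False`, U+000a behaves as a pure line feed (down one row, same column); with `newline_mode=True`, it behaves as a new line (down one row, back to column 0).

```python
def render(text: str, newline_mode: bool) -> list:
    """Draw text on a toy screen and return its rows as strings."""
    rows = [[]]
    row = col = 0
    for ch in text:
        if ch == "\r":            # CR: back to the far left, same row
            col = 0
        elif ch == "\n":          # U+000a: pure LF or NL, depending on mode
            row += 1
            if newline_mode:
                col = 0
            if row == len(rows):
                rows.append([])
        else:                     # printable character: write, then advance
            line = rows[row]
            while len(line) <= col:
                line.append(" ")
            line[col] = ch
            col += 1
    return ["".join(r) for r in rows]


print(render("ab\ncd", newline_mode=False))         # ['ab', '  cd']
print(render("ab\ncd", newline_mode=True))          # ['ab', 'cd']
print(render("ab\r\ncd", newline_mode=False))       # ['ab', 'cd']
print(render("loading\rdone", newline_mode=False))  # ['doneing']
```

The first call shows the "staircase" effect of a pure line feed, and the last shows a standalone CR overwriting text that was already on the line.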

LF is a holdover from the days when computer terminals were teleprinters.

According to D. Richard Hipp, LF was born about 70 years ago, in the era of mechanical teletypewriters. Built without transistors, the teletypewriters of the time consisted entirely of gears, cams, motors, relays, and servos. They were marvelous machines that converted binary codes arriving over two copper wires into printed text on paper.

Teletypewriters worked much like regular typewriters, printing about 5 characters per second. The print head was a cylinder or oval ball bearing the letters, with an ink-soaked cloth ribbon between it and the paper. To print a character, the print head rotated to the correct position and then struck forward, so that ink from the ribbon left the shape of the character on the paper. After each character, the entire print head mechanism (ball, ink ribbon, and assorted control cams and gears) moved one character position to the right. All of this happened five times a second. The machines were loud and vibrated noticeably.

At the end of a line of text, the print head had to return to the far left. It moved quickly, but the trip still took time. The machines had no memory, so the print head had to be all the way back at the left before the next character arrived.

To achieve this, the NL (new line) operation was split into two sub-operations: CR (carriage return) and LF (line feed). The CR went first and started the print head moving left. While the print head was still in motion, the LF arrived and scrolled the paper up one line. This extra LF character bought the print head enough time to reach the far left before the next printable character arrived.

In other words, the tradition of ending lines of text with CRLF dates back to the mechanical limitations of teletypewriters in the 1950s. It is a prime example of low-level implementation details leaking into the user interface.

By the time of Multics and Unix in the late 1960s and early 1970s, most people had realized that it made no sense to use CRLF as NL (new line).

As a result, the job of sending separate CR and LF characters was handed to the teletypewriter's device driver, on the principle that workarounds for hardware limitations belong at the driver level. The computer only needed to store a single NL (new line) character. Because a true LF had no practical use, its numeric code was reused to represent NL.
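
The driver-level translation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `driver_write` is a hypothetical name, and the stored text is assumed to contain only bare U+000a newlines. POSIX terminal drivers offer a comparable output setting (the `ONLCR` flag), but the code below does not use it.

```python
def driver_write(stored_text: str) -> bytes:
    """Expand each stored NL into the CR LF pair the printer needs.

    Assumes stored_text uses bare U+000a newlines only (no CRs).
    """
    return stored_text.replace("\n", "\r\n").encode("ascii")


document = "HELLO\nWORLD\n"          # what the computer stores: 12 characters
on_the_wire = driver_write(document)
print(on_the_wire)                   # b'HELLO\r\nWORLD\r\n'
print(len(document), len(on_the_wire))  # 12 14
```

Storage and application code see a single newline character, while the extra CR exists only on the way to the device.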

Today, CR is the Unicode code point U+000d, while LF and NL share the code point U+000a. Almost all modern machines use U+000a alone to mean NL, and this meaning is built into most programming languages, usually as the backslash escape \n.
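
These code points are easy to inspect from Python. The short sketch below prints the code points behind the two escapes and the bytes that an LF-terminated and a CRLF-terminated line encode to; `hex_of` is a helper name invented for this example.

```python
CR = "\r"  # U+000d
NL = "\n"  # U+000a


def hex_of(text: str) -> str:
    """Return the UTF-8 bytes of text as a hex string."""
    return text.encode("utf-8").hex()


print(f"U+{ord(CR):04x}", f"U+{ord(NL):04x}")  # U+000d U+000a
print(hex_of("hi" + NL))                       # 68690a
print(hex_of("hi" + CR + NL))                  # 68690d0a
```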

Still, a handful of machines insist on sending CR before NL, and the official Unicode name for U+000a is still LF. In addition, some protocols and formats (such as HTTP, SMTP, and CSV) still nominally "require" that each line end in CRLF. Today, almost all software accepts a bare NL character (without a preceding CR) as the end of a line. You have to look very hard to find a device or application that actually interprets U+000a as a true line feed.

D. Richard Hipp put it bluntly:

"This tradition of sending a useless CR (carriage return) before every NL (new line) originated in the era of rotary dial telephones, before integrated circuits had even been invented. There is no reason for the practice to continue in our modern world. The extra CR serves no practical purpose; it only causes unnecessary trouble for programmers and wastes bandwidth."

CRLF is obsolete!

Against this background, D. Richard Hipp issued an appeal: "All those who seek simplicity and peace and wish to promote human prosperity, please join me in opposing the use of CRLF and help it quickly become a relic of history."

To this end, he made four suggestions:

  1. Stop using "linefeed" (LF) as the name of the U+000a code point. Most technology built in the last two decades, and much of what was built in the last half-century, treats U+000a as a "newline" rather than a "linefeed". "Linefeed" may be its historical name, but what does that matter? In almost all practical applications it means "new line", so please call it "newline".
  2. Stop sending unnecessary CRs (carriage returns). Use CR only when you really need to overwrite the current line with new content. Adding CR before NL (new line) is a complete waste of bandwidth. Unless you have to communicate with some stubborn system that insists on living in the 1950s, don't put CR before NL.
  3. Even where existing protocols (e.g., HTTP, SMTP, CSV, FTP) technically require CRLF as the end of line, do not comply. Send NL only. Although technically incorrect, this works because almost all implementations of these protocols accept a bare NL as an end-of-line marker. Don't give in to CRLF.
  4. Fix software that misbehaves or reports an error when it receives NL without a preceding CR. All modern software should accept a bare U+000a character as a valid end-of-line marker. Systems may accept CR followed by NL for backward compatibility, but software that requires a CR before NL is flawed.
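
The fourth suggestion, accepting a bare NL while still tolerating CRLF, might look like the following Python sketch. It is a minimal illustration rather than any real protocol parser, and `split_lines` is a hypothetical helper name.

```python
def split_lines(data: bytes) -> list:
    """Split on NL, dropping one optional CR before each NL.

    Accepts both bare-NL and CRLF line endings, even mixed in one input.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # the final NL terminates the last line; it does not start a new one
    return [ln[:-1] if ln.endswith(b"\r") else ln for ln in lines]


request = b"GET / HTTP/1.1\r\nHost: example.com\n\n"
print(split_lines(request))
# [b'GET / HTTP/1.1', b'Host: example.com', b'']
```

A reader written this way never needs to know which convention the sender followed, which is the behavior the suggestion asks for.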

In D. Richard Hipp's view, CRLF has been outdated for a long time and its retirement is long overdue.

Reaction

As soon as the appeal was published, it struck a chord with many programmers, though plenty of others disagreed.

One developer said, "I couldn't agree more. This causes endless confusion, especially in cross-platform text files, not to mention when parsing them programmatically."

However, Hacker News user forrestthewoods said:

I strongly disagree with this view.

To put it simply: don't complain, deal with it yourself. Handling different or mixed line endings is a minor hassle, but it is neither complicated nor difficult. Don't push unnecessary trouble onto others just to make things easier for yourself. Accept it and move on.

@Animats argued that "instead of making appeals, it would be better to convince Microsoft, because it is the DOS legacy that keeps this going."

@bmitc said: "Who is bothered by this kind of problem, other than poorly designed Unix tools and Git? To accommodate Linux, I configured my editor to use LF on every operating system and made sure Git no longer mangles line endings. I've never had a problem dealing with serial protocols."

As the controversy unfolded, D. Richard Hipp was forced to update his statement today, saying:

It looks like (1) mainstream software today relies on outdated CRLF line endings more than I originally expected;

(2) a lot of people don't share my passion for creating a CRLF-free world.

Alas, this disappoints me a little, but that is the reality. Thanks to everyone who was willing to give the idea a try. It almost worked! I hereby withdraw this proposal and have reverted all of my systems to generate CRLFs where the specification requires them. It's a pity.

The experiment did have one unexpected benefit, however: I found and fixed some issues in Fossil and althttpd, which previously required CRLF and did not accept a bare NL as an alternative.

So, have you ever had a CRLF issue in development?

Source:

https://fossil-scm.org/home/ext/crlf-harmful.md

https://news.ycombinator.com/item?id=41830717