Creating a csv file. What is the CSV file extension? How to open CSV in Excel - Import CSV file into Excel

CSV (Comma-Separated Values) is a text format designed to represent tabular data. Each line of the file is one row of the table. Values of individual columns are separated by a delimiter character, the comma (,). However, most programs interpret the CSV standard loosely and allow other characters as the delimiter. In particular, in locales where the decimal separator is a comma, the field delimiter is usually a semicolon. Values containing reserved characters (comma, semicolon, line break) are enclosed in double quotes ("); a double quote inside a value is written in the file as two double quotes in a row. Lines are separated by the character pair CR LF (0x0D 0x0A), the line ending used by DOS and Windows. However, specific implementations may use other common line separators, such as LF (0x0A) on UNIX.

Although an RFC exists, in practice CSV today usually means a set of values separated by any delimiter, in any encoding, with any line endings. This greatly complicates moving data from one program to another, even though CSV support is easy to implement.

Example

Original text:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded ",4799.00

Result table:
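As a rough sketch of how the three records above split into cells, they can be read with the csv module from Python's standard library. The variable names are illustrative, and the records are assumed to be joined with CRLF as described earlier.

```python
import csv
import io

# The three sample records, joined with CRLF line endings.
SAMPLE = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\r\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\r\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded ",4799.00\r\n'
)

# newline="" keeps line endings untouched so the csv module sees them as written.
rows = list(csv.reader(io.StringIO(SAMPLE, newline="")))

for row in rows:
    print(row)
```

Each record yields five cells: the commas inside the quoted fourth field stay part of the value, and the doubled quotes collapse into single ones.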

For the Russian-localized Microsoft Excel (on systems where the decimal separator is a comma, so the list separator is a semicolon), the source text will look like this:

1965;Pixel;E240 – formaldehyde (dangerous preservative)!;"red, green, broken";3000.00
;Keyboard shortcuts;"MUST USE! Ctrl, Alt, Shift";4799.00

Result table:

Programs for editing CSV files: Microsoft Excel, Numbers, TablePro, CSVed, OpenOffice.org Calc, KSpread, Google Docs. Many engineering packages, such as ANSYS and LabVIEW, can import and export CSV files. Nokia PC Suite also creates CSV files when copying SMS messages from a mobile phone to a computer.

Links

  • CSV-1203 (English)
  • RFC 4180 Specification

Wikimedia Foundation. 2010.

CSV is the de facto standard for connecting heterogeneous systems and for transferring and processing bulk data with a rigid tabular structure. Many scripting languages have built-in tools for parsing and generating it, it is well understood by both programmers and ordinary users, and problems in the data are easy to spot by eye.

The format is at least 30 years old. Yet even now, in the era of ubiquitous XML, CSV is still used to upload and download large volumes of data. And although the format is described quite well in an RFC, everyone interprets it in their own way.

In this article I will try to summarize what is known about this format, point out typical mistakes, and illustrate the problems using the clumsy import-export implementation in Microsoft Office 2007 as an example. I will also show how to work around these problems (including Excel's automatic conversion of values to DATETIME and NUMBER types) when opening .csv files.

To begin with, the name CSV actually covers three different text formats that differ in their delimiter: CSV proper (comma-separated values), TSV (tab-separated values) and SCSV (semicolon-separated values). In practice all three may be called CSV; at best the delimiter can be chosen during export or import, and more often it is simply hard-coded. This causes a great deal of confusion.
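When the delimiter is not known in advance, one possible approach is to try each candidate and keep the one that splits every record into the same number of fields. The sketch below is only an illustration of that idea; the function name guess_delimiter and the candidate list are assumptions, not part of any standard.

```python
import csv
import io
from typing import Optional

CANDIDATES = [",", ";", "\t"]  # CSV, SCSV and TSV delimiters


def guess_delimiter(sample: str) -> Optional[str]:
    """Return the candidate that gives a consistent field count greater than 1."""
    best, best_width = None, 1
    for delim in CANDIDATES:
        reader = csv.reader(io.StringIO(sample, newline=""), delimiter=delim)
        rows = [row for row in reader if row]  # skip blank lines
        widths = {len(row) for row in rows}
        if len(widths) == 1:                   # every record has the same width
            width = widths.pop()
            if width > best_width:
                best, best_width = delim, width
    return best


print(repr(guess_delimiter('a;b;c\r\n1;"x,y";3\r\n')))
```

A heuristic like this can be fooled by short or irregular files, so letting the user confirm the guess is the safer design.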

As an illustration, let's take a seemingly trivial task: to import data into Microsoft Outlook from a table in Microsoft Excel.

Microsoft Excel has a CSV export facility and Microsoft Outlook has a corresponding import facility. What could be simpler: create the file, feed it to the mail program, and the job is done? Not quite.

Let's create a test table in Excel:

... and try to export it to three text formats:

What do we conclude from this? What Microsoft calls "CSV (comma delimited)" here is actually a semicolon-delimited format. The file is written strictly in Windows-1251, so any Unicode characters outside that code page come out as question marks in the CSV. Line breaks are always the two-character pair CRLF, and Excel bluntly quotes every value in which it sees a semicolon. Unicode is supported only as UTF-16, not as UTF-8, which would make much more sense; on the other hand, if the data contains no Unicode characters at all, the single-byte format saves file size.
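The effect of the encoding choice on file size and on characters that do not fit can be illustrated with a few lines of Python. This is a small sketch using Python's built-in codecs, not a reproduction of what Excel does internally; the sample strings are arbitrary.

```python
text = "Привет"   # six Cyrillic letters, all present in Windows-1251
foreign = "日本"   # characters with no Windows-1251 equivalent

# Size in bytes of the same text in three encodings.
sizes = {enc: len(text.encode(enc)) for enc in ("cp1251", "utf-8", "utf-16")}
print(sizes)

# With errors="replace", characters that do not fit become question marks.
lossy = foreign.encode("cp1251", errors="replace")
print(lossy)
```

Here Windows-1251 needs 6 bytes, UTF-8 needs 12, and UTF-16 needs 14 including the byte order mark, while the two characters outside the code page are written as "??".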

Now let's see how Outlook handles these files. Let's try to import them, specifying the same formats. Outlook 2007: File -> Import and Export… -> Import from another program or file. Next, select the data format: "Comma Separated Values (Windows)" and "Tab Separated Values (Windows)".

Two Microsoft products do not understand each other; they are completely unable to exchange structured data through a text file. Getting it to work takes some "dancing with a tambourine" on the programmer's part.

Recall that Microsoft Excel can work with text files and import data from CSV, but the 2007 version does so in a very odd way. For example, if you simply open a file through the menu, it opens without any format recognition, as a plain text file placed entirely in the first column. If you double-click the CSV file, Excel receives a different command and imports the CSV properly without asking any questions. The third option is to insert the file into the current sheet. That interface lets you configure the separators and immediately see the result. There is one catch: it does not work well. For example, Excel does not understand quoted line breaks inside fields.

Moreover, the same save-to-CSV function behaves differently when called through the interface and through a macro. The macro variant ignores the regional settings entirely.

Unfortunately, there is no CSV standard as such, but there is a so-called memo: RFC 4180 from 2005, which describes everything quite sensibly. In the absence of anything else, it makes sense to follow at least the RFC. For compatibility with Excel, however, you should also take its quirks into account.

  • records are separated by a CRLF line break [in my opinion, the rule should not have been limited to the two-byte sequence: both CRLF (0x0D 0x0A) and a bare CR (0x0D) should be accepted],
  • fields are separated by commas, and there must be no comma at the end of a line,
  • the CRLF after the last line is optional,
  • the first line may be a header line (it is not marked in any way),
  • spaces surrounding the delimiter comma are ignored (strictly speaking, RFC 4180 says that spaces are part of a field and should not be ignored, so treat this as a leniency rather than a requirement),
  • if a value contains CRLF, CR, LF (line separator characters), a double quote or a comma (the field separator), it must be enclosed in double quotes; otherwise quoting is optional,
  • that is, line breaks inside a field are allowed, but such values must be quoted,
  • double quotes inside a quoted value are escaped the CSV way, by doubling them.
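The quoting rules above can be seen in action with the writer from Python's standard csv module, whose default dialect uses a comma delimiter, CRLF line endings and quotes only where needed. This is a minimal sketch; the sample values are made up for illustration.

```python
import csv
import io

buf = io.StringIO(newline="")   # newline="" so the CRLF is stored as written
writer = csv.writer(buf)        # default dialect: comma, CRLF, minimal quoting

# One plain value, one with a comma, one with quotes, one with a line break.
writer.writerow(["plain", "a,b", 'say "hi"', "line1\r\nline2"])

encoded = buf.getvalue()
print(repr(encoded))
```

Only the three values that contain a comma, a quote or a line break are enclosed in quotes, and the embedded quotes are doubled.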

Here is the description of the format in ABNF notation:

file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D
DQUOTE = %x22
LF = %x0A
CRLF = CR LF
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
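To make the grammar more concrete, here is a small character-by-character parser written as a sketch. It is more lenient than the grammar in two ways, both of which are assumptions of this example: it accepts a bare CR or LF as a record separator, and it takes a quote in the middle of an unquoted field literally. The name parse_csv is hypothetical.

```python
from typing import List


def parse_csv(text: str, delimiter: str = ",") -> List[List[str]]:
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    in_quotes = False     # currently inside an "escaped" field
    record_open = False   # something was read since the last record end
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')   # 2DQUOTE stands for one literal quote
                i += 1
            elif ch == '"':
                in_quotes = False   # closing DQUOTE
            else:
                field.append(ch)    # TEXTDATA, COMMA, CR or LF inside quotes
        elif ch == '"' and not field:
            in_quotes = True        # opening DQUOTE at the start of a field
            record_open = True
        elif ch == delimiter:
            row.append("".join(field))
            field = []
            record_open = True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1              # CRLF counts as one record separator
            row.append("".join(field))
            rows.append(row)
            row, field, record_open = [], [], False
        else:
            field.append(ch)
            record_open = True
        i += 1
    if record_open:                 # last record without a trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows


print(parse_csv('a,"b ""q"", c","x\r\ny"\r\n1,,\r\n'))
```

For production use, a well-tested library is usually the better choice; the sketch is meant only to show how the escaped and non-escaped rules translate into states.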

Also, when implementing the format, remember that nothing indicates the number or type of the columns and a header is not required, so there are conventions you must not forget:

  • a string of digits not enclosed in quotes may be interpreted by a program as a number, which can lose information such as leading zeros,
  • the number of values may differ from row to row, and this must be handled properly. In some situations the user should be warned; in others, extra columns should be created and filled with empty values. You can decide that the header sets the number of columns, or add columns dynamically during import,
  • escaping quotes with a backslash is not part of the standard; do not do it,
  • since fields are untyped, nothing constrains their contents. The decimal separator differs from country to country, so the same CSV generated by an application is understood by one Excel installation but not by another, because Microsoft Office follows the Windows regional settings, which can contain anything. In Russia they specify a comma as the decimal separator,
  • if a CSV file is opened directly rather than through the "Data" menu, Excel asks no questions and does what it thinks is right. For example, by default it reads a field with the value 1.24 as "January 24",
  • Excel drops leading zeros and converts types even when the value is in quotes. It should not; this is a bug. To get around it, you can use a small hack: start the value with an equals sign, then put in quotation marks whatever must be passed through unchanged,
  • Excel treats the equals sign in CSV as a formula marker. That is, if =2+3 appears in a CSV file, Excel adds two and three and writes the result into the cell. By default it should not.
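Two of the conventions above, keeping values as strings and coping with rows of different length, can be sketched as follows. The policy chosen here (the header sets the width, short rows are padded, longer rows are rejected) is only one of the options the list mentions, and the helper name pad_rows is hypothetical.

```python
from typing import List


def pad_rows(rows: List[List[str]], width: int) -> List[List[str]]:
    """Pad short rows with empty strings; refuse rows wider than the header."""
    fixed = []
    for number, row in enumerate(rows, start=1):
        if len(row) > width:
            raise ValueError(
                f"row {number} has {len(row)} values, expected at most {width}"
            )
        fixed.append(row + [""] * (width - len(row)))
    return fixed


header = ["Surname", "First Name", "Zip Code"]
data = [["Ivanov", "Ivan", "08075"], ["Sergey"]]

# Values stay strings, so the leading zero in "08075" is preserved.
table = pad_rows(data, len(header))
print(table)
```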

An example of a valid CSV that can be used for tests:

Surname, First Name, Address, City/State, Zip Code, just a string
Ivanov, Ivan, Lenina 20, Moscow, 08075, "1/3"
Tyler, John,110 terrace, PA,20121, "1.24"
"Petrov ""Kul""", Petya,120 Hambling St., NJ,08075, "1,24"
Smirnov,Vasya,"7452 Street ""Near the Square"" road", York, 91234, "3-01"
,Misha,,Leningrad, 00123, "03-01"
"John ""Black Head"", Claude",Rock,"", Miami Beach,00111, "0000"
Sergey,,

Exactly the same SCSV:

Surname; First Name; Address; City/State; Zip Code; just a string
Ivanov; Ivan; Lenina 20; Moscow; 08075;"1/3"
Tyler; John;110 terrace; PA; 20121;"1.24"
"Petrov ""Kul"""; Petya;120 Hambling St.; NJ;08075;"1,24"
Smirnov;Vasya;"7452 Street ""Near the Square"" road"; York; 91234; "3-01"
;Misha;; Leningrad; 00123;"03-01"
"John ""Black Head""; Claude";Rock;""; Miami Beach;00111; "0000"
Sergey;;

The first file, which really is comma-separated, is not recognized by Excel at all when saved as .csv.

The second file, which follows the SCSV logic, is recognized by Excel and comes out like this:

Excel errors on import:

  1. Spaces surrounding the delimiters were kept as part of the values.
  2. The last column was not really recognized as text at all, even though the data is in quotation marks. The exception is the "Petrov" row, where 1,24 was recognized correctly.
  3. Excel dropped the leading zeros in the zip code field.
  4. In the rightmost field of the last line, the spaces before the quotation marks stopped the quotes from being treated as special characters.

If you use the import feature (Data -> From file) and mark all fields as text during import, you get the following picture:

Type conversion is now handled, but line breaks are not processed properly, and there are problems with leading zeros, quotes and extra spaces. Besides, opening CSV files this way is extremely inconvenient for users.

There is an effective way to stop Excel from converting types when we do not want it to, but the result is a CSV "made specially for Excel". It is done by placing an "=" sign in front of the quotation marks wherever there is a potential problem with types. At the same time, we remove the extra spaces.

Surname;First Name;Address;City/State;Zip Code;just a string
Ivanov;Ivan;Lenina 20;Moscow;="08075";="1/3"
Tyler;John;110 terrace;PA;="20121";="1.24"
"Petrov ""Kul""";Petya;120 Hambling St.;NJ;="08075";="1,24"
Smirnov;Vasya;"7452 Street ""Near the Square"" road";York;="91234";="3-01"
;Misha;;Leningrad;="00123";="03-01"
"John ""Black Head"";Claude";Rock;"";Miami Beach;="00111";="0000"
Sergey;;
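The "=" trick can be sketched as a small helper that wraps only those values Excel is likely to reinterpret. The pattern below is a hypothetical heuristic (leading zeros, or digits joined by a period, comma, slash or hyphen) and the name excel_safe is an assumption; the sample fields contain no delimiters or line breaks, so a plain join is enough for this illustration.

```python
import re

# Hypothetical heuristic: leading zeros, or digits joined by . , / or -
LOOKS_NUMERIC = re.compile(r"^(0\d+|\d+[.,/-]\d+)$")


def excel_safe(value: str) -> str:
    """Wrap number-like or date-like text as ="..." so Excel keeps it as text."""
    if LOOKS_NUMERIC.match(value):
        return '="' + value + '"'
    return value


fields = ["Ivanov", "Ivan", "Lenina 20", "Moscow", "08075", "1/3"]
line = ";".join(excel_safe(f) for f in fields)
print(line)
```

Files produced this way are meant for Excel only; other CSV consumers will see the equals sign and the quotes as part of the data.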

And this is what happens if we open this file in Excel:

To summarize.

To generate a usable CSV, the user needs to be able to configure the following before export:

  1. choose the encoding. As a rule, UTF-8, UTF-16, Windows-1251 and KOI8-R matter; other options are rarely needed. One of them should be the default. If the data contains characters that have no equivalent in the target encoding, warn the user that the data will be damaged;
  2. choose the field separator. The options are tab, comma and semicolon; the default is a semicolon. Keep in mind that if the separator is typed into a text box, entering a tab will be very difficult, since it is a non-printable character;
  3. choose the line separator (CRLF 0x0D 0x0A or CR 0x0D);
  4. choose the decimal separator for numeric data (period or comma);
  5. choose whether to output the header row;
  6. choose how to escape special characters (especially line breaks and quotes). In principle you can deviate from the standard and write them as \n and \", but then you must also escape any literal \n that occurs in the data and offer the same option on both export and import. Compatibility will suffer, though, because any RFC-compliant parser will treat the construct ...,"abc\"",… as an error;
  7. ideally, offer a "for Excel" checkbox that accounts for the non-standard behavior Microsoft has introduced, for example by replacing numeric values that look like dates with the construct ="<field value>";
  8. decide whether to keep the "tail" of empty delimiters when one is generated. For example, of 20 fields only the first contains data and the rest are empty. The line can then either carry 19 separators after the first value or omit them. For large volumes of data, omitting them can save milliseconds of processing and reduce the file size.
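Several of the export settings listed above map directly onto parameters of Python's csv writer. The function below is a sketch under that assumption; export_csv and its parameter names are hypothetical, and quoting follows the standard doubling rule rather than backslash escapes.

```python
import csv
import io


def export_csv(rows, header=None, *, delimiter=";", lineterminator="\r\n",
               decimal_sep=".", encoding="utf-8") -> bytes:
    """Serialize rows to CSV bytes using the chosen export settings."""

    def fmt(value):
        if isinstance(value, float):
            # Apply the user's decimal separator to fractional numbers.
            return repr(value).replace(".", decimal_sep)
        return "" if value is None else str(value)

    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter, lineterminator=lineterminator)
    if header is not None:          # the header row is optional
        writer.writerow(header)
    for row in rows:
        writer.writerow([fmt(v) for v in row])
    # Strict encoding raises UnicodeEncodeError for characters that do not fit,
    # which is the point where the user should be warned.
    return buf.getvalue().encode(encoding)


data = export_csv([["Ivanov", 3.5, "a;b"]],
                  header=["name", "score", "note"], decimal_sep=",")
print(data)
```

Raising an error on unencodable characters, instead of silently replacing them, leaves the decision about data loss to the caller.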

To build a good and user-friendly CSV importer, you need to remember the following:

  1. parse the file token by token according to the grammar above, or use a well-established ready-made library (Excel evidently works differently, hence its import problems);
  2. let the user choose the encoding (the top four are enough);
  3. let the user choose the field separator (comma, tab and semicolon are enough);
  4. let the user choose the line separator, but besides the CR and CRLF options also provide "CR or CRLF". The reason is that Excel, for example, when exporting a table with line breaks inside cells, writes those line breaks as CR while separating the records with CRLF. On import, however, Excel does not care whether it sees CR or CRLF;
  5. let the user choose the decimal separator (comma or period);
  6. decide on the parsing method: either read everything into memory and then process it, or process the file line by line. The first needs more memory; with the second, an error in the middle leaves only a partial import, which can cause problems. The first option is preferable.
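An importer along these lines might look like the following sketch, which reads the whole file into memory and relies on Python's csv reader, which accepts both CR and CRLF as line endings. The names import_csv and to_float are hypothetical, and the sample data is made up.

```python
import csv
import io
from typing import List


def import_csv(data: bytes, *, encoding: str = "utf-8",
               delimiter: str = ";") -> List[List[str]]:
    """Decode the whole file, then parse it in memory."""
    text = data.decode(encoding)
    # newline="" passes line endings through unchanged, so a bare CR inside
    # a quoted field survives as part of the value.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


def to_float(value: str, decimal_sep: str = ",") -> float:
    """Convert a number written with the user's decimal separator."""
    return float(value.replace(decimal_sep, "."))


raw = 'name;note;price\r\nIvanov;"line1\rline2";1,5\r\n'.encode("cp1251")
rows = import_csv(raw, encoding="cp1251", delimiter=";")
print(rows)
print(to_float(rows[1][2], ","))
```

Parsing everything first means that a malformed record can be reported before any data is committed, which matches the preference for the in-memory option.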

Rauf Aliyev,
Deputy CTO of Mail.Ru Group

If your computer has an antivirus program, it can scan all files on the computer, or each file individually. You can scan any file by right-clicking it and selecting the option to scan the file for viruses.

For example, to scan the file my-file.csv shown in the figure, right-click the file and select "Scan with AVG" from the menu. This opens AVG Antivirus, which checks the file for viruses.


Sometimes an error results from an incorrect software installation, caused by a problem that occurred during the installation process. This can prevent your operating system from linking your CSV file to the correct application, affecting the so-called "file extension associations".

Sometimes simply reinstalling Microsoft Excel solves the problem by properly associating CSV files with Microsoft Excel. In other cases, file association problems may stem from poor programming by the software developer, and you may need to contact the developer for further assistance.


Tip: Try updating Microsoft Excel to the latest version to make sure the latest patches and updates are installed.


This may seem too obvious, but often the CSV file itself is the cause of the problem. If you received the file as an email attachment or downloaded it from a website and the download was interrupted (for example, by a power outage), the file may be corrupted. If possible, get a fresh copy of the CSV file and try to open it again.


Caution: A corrupted file may be collateral damage from a previous or existing malware infection on your PC, so it is very important to keep an up-to-date antivirus running on your computer at all times.


If your CSV file is associated with hardware on your computer, you may need to update the device drivers for that hardware in order to open the file.

This problem is usually associated with media file types, which depend on hardware inside the computer, such as a sound card or video card, to open successfully. For example, if you cannot open an audio file, you may need to update the sound card drivers.


Tip: If you get a .SYS file-related error message when you try to open a CSV file, the problem is probably caused by corrupted or outdated device drivers that need to be updated. Driver update software such as DriverDoc can make this process easier.


If the steps above did not solve the problem and you still have trouble opening CSV files, the cause may be a lack of available system resources. Some CSV files may require a significant amount of resources (e.g. memory/RAM, processing power) to open properly on your computer. This problem is quite common if you are using fairly old computer hardware together with a much newer operating system.

This problem can occur when the computer struggles to complete a task because the operating system (and other services running in the background) consumes too many resources for the CSV file to open. Try closing all applications on your PC before opening the Comma Separated Values file. Freeing up the available resources on your computer gives you the best conditions for opening the CSV file.


If you have completed all the steps above and your CSV file still won't open, you may need a hardware upgrade. In most cases, even older hardware has more than enough processing power for most user applications (unless you do a lot of CPU-intensive work such as 3D rendering, financial or scientific modeling, or media-intensive work). It is therefore likely that your computer does not have enough memory (commonly called RAM, random-access memory) to open the file.

CSV (comma-separated values) files are a special type of file that can be created and edited in Excel. Instead of storing data in columns, CSV files store it separated by commas. Text and numbers saved in a CSV file can easily be moved from one program to another. For example, you can export contacts from Google to a CSV file and then import them into Outlook.

For information on how to import a list of calendar entries into Outlook, see Import and export email, contacts, and Outlook calendar.

Create a .csv file from another program or email service provider

When exporting contacts from another program, such as Gmail, you can usually choose from several formats. Gmail offers a choice of Google CSV file, Outlook CSV file, and vCard files. When you export data from an Outlook profile, you can select a .csv file or an Outlook data file (.pst) for later import into another profile.

Download and open a sample CSV file to import contacts into Outlook

You can create a CSV file manually in one of two ways.

Create an Excel file and save it as a CSV

If contact information is stored in a program from which it cannot be exported, you can enter it manually.

Download the CSV template

If you'd like to start with an empty CSV file, you can download the sample below.


There are a few things to keep in mind when working with this CSV file.

Editing the CSV file with contacts to be imported into Outlook

Let's say you want to edit a CSV file exported from Outlook and then import it again into that application or another email service. You can easily do this with Excel.

When modifying a CSV file, keep the following points in mind.

    The column headings must remain on the first row.

    When you save the file in Excel, you will be prompted several times with a message like "Are you sure you want to save the file in CSV format?" Always choose "Yes". If you choose "No", the file will be saved in Excel's native format (XLSX) and cannot be imported into Outlook.

Issue: All data is displayed in the first column

This could happen for several reasons, so there are several solutions that you can try.


Ministry of Education and Science of the Russian Federation

State budgetary institution of higher professional education

Novosibirsk State Technical University

Department of SIT

Settlement and graphic work

by discipline

"Network Information Technologies"

The CSV data format

Group: AVT-909

Completed by: Gogoli A.G.

Teacher:

Khairetdinov M.S.

Novosibirsk, 2013

Exercise

1. Introduction

2. General information

3. Data structure in the file

3.1. Entries

3.2. Fields (columns)

3.2 Separators

3.3 End-of-record marker

4. Header record

5. Data field protection

5.1 Double quotes for protection

5.2 Double double quotes

6. Implementation example

7. Libraries for working with the format

8. Test program

Literature


Exercise

1. Learn and write an overview of the CSV format.

2. Write a review of freely distributed libraries available online that implement reading/writing data in the specified format or description language, or data transfer over the specified protocol or I/O interface, and that can be interfaced with modules in C/C++.

3. Write a procedure for reading data in the specified format.

4. Write a procedure for writing data in the specified format.

5. Write a function for receiving/transmitting data using the specified protocol or I/O interface.

6. Compile test data sets to test all kinds of data elements used in the specified format.


1. Introduction

CSV (from the English Comma-Separated Values) is a text format designed to represent tabular data. Each line of the file is one row of the table. Values of individual columns are separated by a delimiter character, the comma (,).

You can think of a CSV file as a store for data between being written by the producer application and being read by the consumer application. Its main function is to store textual data; it is not intended for binary data.


2. General information

A CSV file consists of two types of data: payload and markers. The payload is what the producer application writes and the consumer application reads. Markers are used to organize the payload within the CSV file.

The following rules apply to all CSV files:

1 The file extension must be *.csv, regardless of the type of markers.

This ensures that the file is read correctly along with the markers. Three common 8-bit encodings are Windows-1252, ISO/IEC 8859-1, and UTF-8.

3 Other than the markers, no ASCII control characters may be written to the file.

The CSV file is not designed to store binary data. This rule prohibits the use of most ASCII control characters.

4 The CSV file must contain at least one record.

The CSV file must not be empty (zero length) or contain only a logical end-of-file marker. The minimum a CSV file should contain is one record, the header, followed by zero or more data records.
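The rules about control characters and the minimum file content can be checked mechanically. The sketch below assumes that CR, LF and TAB are the only control characters allowed as markers, which is an assumption of this example rather than something stated above; the name check_csv_text is hypothetical.

```python
from typing import List

# Assumption: only these control characters may appear, as markers.
ALLOWED_CONTROLS = {"\r", "\n", "\t"}


def check_csv_text(text: str) -> List[str]:
    """Return a list of rule violations found in the decoded file text."""
    problems = []
    if not text.strip("\r\n"):
        problems.append("file contains no records")
    for pos, ch in enumerate(text):
        is_control = ord(ch) < 0x20 or ord(ch) == 0x7F
        if is_control and ch not in ALLOWED_CONTROLS:
            problems.append(f"control character 0x{ord(ch):02X} at offset {pos}")
    return problems


print(check_csv_text("name\r\nIvanov\r\n"))
print(check_csv_text("a\x00b\r\n"))
```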


3. Structure of the data in the file

3.1. Entries.

A record in a CSV file consists of two parts: the delimited payload data and an end-of-record marker (Fig. 1).

Figure 1. The structure of records in a CSV file.

3.2. Fields (columns)

CSV is generally used to store homogeneous tabular data. When viewed in a table, the data inside the CSV is visually organized into multiple rows (records) and columns (fields). Hence the term column field.

With a fixed field length, the relative location of each field within a record must be fixed. However, CSV is a file format that allows variable length records. This saves significant space compared to fixed-length formats. To implement this approach, a payload separation marker is used, which indicates the transition from one field to another. The field separator is one character.

There is also a header entry. Therefore, it is very important that the fields in the record follow the given order.

Figure 2 shows where separator characters (SEP) are used in a record. A record may consist entirely of delimiters.

Figure 2. Record format in CSV file.

3.2 Separators

Although the format's name, Comma Separated Values, implies a comma as the field separator, some applications use other characters.

The following rules apply for the delimiter:

1 The field separator must be a single character.

2 Once a character is chosen, the same character must be used throughout the entire file.

3 The producer application must use a comma (ASCII 0x2C) as the field separator.

This rule raises one of the most difficult issues developers face when implementing code to process a CSV file: delimiters embedded in payloads.
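The difficulty with embedded delimiters is easy to demonstrate: splitting a record on every comma breaks a quoted value apart, while a CSV-aware reader keeps it whole. The sketch below uses Python's standard csv module and a made-up record.

```python
import csv

record = '1997,Ford,"ac, abs, moon",3000.00'

# Naive approach: split on every comma, ignoring the quotes.
naive = record.split(",")

# CSV-aware approach: quotes protect the commas inside the third field.
parsed = next(csv.reader([record]))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split produces six pieces, while the reader returns the intended four fields.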

