
Reading formatted data

Posted: Tue Jun 21, 2011 2:42 pm
by muuh-gnu
Hello Lispforum,

what is the closest Lisp equivalent for reading formatted data? (Like C's scanf and Fortran's read).

I have a file with measurement data (doubles) saved in columns, and I want to read those columns
into arrays for further processing.

What is the easiest way to do this?

I can read the file line by line with "read-line", but this gives me a string of the whole line, which I
can't use without further processing.

"read" requires Lisp input, so I can't use it for a sequence of doubles.

I am surprised that processing formatted measurement data seems not to be done in Lisp at all,
so that no standard routines for it exist.

Any suggestions? Thanks.

Re: Reading formatted data

Posted: Tue Jun 21, 2011 10:07 pm
by ramarren
muuh-gnu wrote: I am surprised that processing formatted measurement data seems not to be done in Lisp at all,
so that no standard routines for it exist.
The Common Lisp standard was created by an ANSI committee in a rather time-consuming and expensive process, so the topics it covers are necessarily limited. The only formatted input the standard covers is s-expressions. Other input formats can be read using libraries. That something is not covered by the standard doesn't mean it is not done, any more than the fact that ANSI C doesn't describe sockets means that no networking is done in C.

In the case of numeric column data this is easily done using READ-LINE combined with PARSE-NUMBER and SPLIT-SEQUENCE, both of which are available through Quicklisp. For more complicated input formats you would use either regular expressions with CL-PPCRE, or a parser generator like CL-YACC or PARSER-COMBINATORS or the like.
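
A minimal sketch of the READ-LINE / SPLIT-SEQUENCE / PARSE-NUMBER combination described above. It assumes Quicklisp is installed and that columns are separated by spaces or tabs; the function names are made up for illustration. It also assumes that PARSE-NUMBER follows *READ-DEFAULT-FLOAT-FORMAT* when it builds floats, which is worth checking against the library version in use.

```lisp
;; Assumption: Quicklisp is available and can load both systems.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '("split-sequence" "parse-number") :silent t))

(defun whitespacep (char)
  "True for the column separators assumed here: space and tab."
  (member char '(#\Space #\Tab)))

(defun parse-numeric-line (line)
  "Split LINE on runs of spaces/tabs and parse every field as a number."
  (mapcar #'parse-number:parse-number
          (split-sequence:split-sequence-if #'whitespacep line
                                            :remove-empty-subseqs t)))

(defun read-numeric-rows (path)
  "Return a list of rows, each a list of numbers, read from the file at PATH."
  ;; Assumption: PARSE-NUMBER honours this variable for decimal fields.
  (let ((*read-default-float-format* 'double-float))
    (with-open-file (in path :direction :input)
      (loop for line = (read-line in nil nil)
            while line
            collect (parse-numeric-line line)))))
```

A blank line yields an empty row, so callers may want to filter those out before further processing.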

Re: Reading formatted data

Posted: Tue Jun 21, 2011 11:04 pm
by JamesF
You're correct that there's no "standard" system or library for doing this in Common Lisp, which suggests that it's not a problem to which the language is often applied.

I'm afraid the best I can suggest is to use read-line, then split each line into its columns with something like cl-ppcre:split, and then insert each value into its respective array. That's pretty much what a library function would do anyway - either that, or it would employ a parser to read the file character by character, which you'd probably only want to do once the first approach showed serious performance problems.
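
The read-line / cl-ppcre:split / insert-into-arrays approach could look roughly like the sketch below. It assumes Quicklisp can load CL-PPCRE and that the number of columns is known in advance. PARSE-FIELD and READ-COLUMNS are hypothetical names, and using the Lisp reader to convert each field is just one possible choice.

```lisp
;; Assumption: Quicklisp is available to load CL-PPCRE.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-ppcre" :silent t))

(defun parse-field (field)
  "Read one field with the Lisp reader and insist that it is a real number."
  (let* ((*read-eval* nil)                            ; no #. evaluation
         (*read-default-float-format* 'double-float)  ; 1.5 reads as a double
         (value (read-from-string field)))
    (unless (realp value)
      (error "Not a number: ~S" field))
    value))

(defun read-columns (stream column-count)
  "Return a vector of COLUMN-COUNT growable vectors, one per data column."
  (let ((columns (make-array column-count)))
    (dotimes (i column-count)
      (setf (aref columns i) (make-array 0 :adjustable t :fill-pointer 0)))
    (loop for line = (read-line stream nil nil)
          while line
          do (let ((trimmed (string-trim '(#\Space #\Tab #\Return) line)))
               (unless (string= trimmed "")           ; skip blank lines
                 (loop for field in (cl-ppcre:split "\\s+" trimmed)
                       for column across columns
                       do (vector-push-extend (parse-field field) column)))))
    columns))
```

Called as (with-open-file (in "data.txt") (read-columns in 3)), where the file name is a placeholder, this fills three arrays in one pass. Fields beyond COLUMN-COUNT on a line are silently ignored in this sketch.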

This page may also prove useful for parsing numbers from strings.

Re: Reading formatted data

Posted: Wed Jun 22, 2011 12:11 am
by marcoxa
What's wrong with (untested):


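(defun read-tabulated-numbers-from-file (p)
  ;; Returns a list of rows, one vector of numbers per line of the file.
  (with-open-file (f p :direction :input)
    (loop for line = (read-line f nil nil)
          while line
          collect (parse-line line))))


(defun parse-line (line)
  ;; READ-FROM-STRING also returns the index where it stopped reading.
  ;; Passing it back as :START advances through the line; without it the
  ;; first number would be read over and over.
  (let ((*read-eval* nil))              ; never evaluate #. forms found in data
    (loop with start = 0
          for n = (multiple-value-bind (object next)
                      (read-from-string line nil nil :start start)
                    (setf start next)
                    object)
          while n
          collect n into result
          finally (return (coerce result 'vector)))))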
This will give you a list of vectors, one per row (the vectors are not necessary: each row could just as well be a list). Your columns are one transpose away.
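
The transpose mentioned above can be sketched in standard Common Lisp as follows. ROWS->COLUMNS is a hypothetical name, and the sketch assumes every row has the same length as the first one.

```lisp
(defun rows->columns (rows)
  "Turn a list of equal-length row vectors into a list of column vectors."
  (when rows
    (let ((row-count (length rows))
          (column-count (length (first rows))))
      (loop for j below column-count
            collect (let ((column (make-array row-count)))
                      ;; Copy element J of every row into column J.
                      (loop for row in rows
                            for i from 0
                            do (setf (aref column i) (aref row j)))
                      column)))))

;; (rows->columns (list (vector 1 2 3) (vector 4 5 6)))
;; => (#(1 4) #(2 5) #(3 6))
```

An explicit loop is used instead of (apply #'map ...) so that the number of rows is not limited by CALL-ARGUMENTS-LIMIT.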

Cheers
--
MA

Re: Reading formatted data

Posted: Fri Jun 24, 2011 8:49 am
by muuh-gnu
Thanks everybody.

I hoped there would be some "easy" way to read in numeric data files.

I am not able to use Quicklisp behind a proxy and I don't know how to install
all those libraries manually, so I'll give it up for now.

Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's
point of view, or there should be a "batteries included" implementation delivering
all those "standard" functionalities out of the box like Python does.

Re: Reading formatted data

Posted: Fri Jun 24, 2011 3:50 pm
by Paul
muuh-gnu wrote: Thanks everybody.

I hoped there would be some "easy" way to read in numeric data files.

I am not able to use Quicklisp behind a proxy and I don't know how to install
all those libraries manually, so I'll give it up for now.

Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's
point of view, or there should be a "batteries included" implementation delivering
all those "standard" functionalities out of the box like Python does.
I don't understand what you're asking for. In what way is READ not what you want?

Re: Reading formatted data

Posted: Sat Jun 25, 2011 1:42 am
by marcoxa
muuh-gnu wrote: Thanks everybody.

I hoped there would be some "easy" way to read in numeric data files.

I am not able to use Quicklisp behind a proxy and I don't know how to install
all those libraries manually, so I'll give it up for now.

Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's
point of view, or there should be a "batteries included" implementation delivering
all those "standard" functionalities out of the box like Python does.
Well, you said "here is the file I have" and you got the answer: use READ (or READ-FROM-STRING). If you want something "like in Python", you should point to the Python functionality you are using; it can then either be reproduced or, more likely, found out there...

Cheers

Re: Reading formatted data

Posted: Sat Jun 25, 2011 1:49 am
by edgar-rft
muuh-gnu wrote: Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.
Reading, identifying and interpreting floating-point numbers is not a "simple task" [see the examples below], and I seriously doubt that Python can do this, at least not with the "batteries included" libraries.

The main "problem" in this case is that Common Lisp is defined as a hardware-independent programming language and the data type "double" is strongly hardware-dependent (and software-dependent, too).

So the question must be: What is wrong with "doubles"?

The trouble begins with the fact that there are several specifications for data types called "double":
  • A "double" can be a 16-bit (double byte), 32-bit (double 16-bit register) or 64-bit (double 32-bit register) integer. All these formats have been used in the past on various hardware platforms. Probably somewhere in the future there will be double 64-bit integers, double 128-bit integers, and so on.
  • IEEE 754 defines a floating-point data type named "double" that is, from the math point of view, not even a number; instead it is an inexact approximation of a number because of several hardware and software limitations.
Summary: the data type "double" tells virtually nothing about the data. It tells neither whether the number is an integer or a floating-point number, nor whether the data is a number at all. IEEE 754 defines e.g. "NaN" as "not a number".

For the rest of the discussion I assume that we are talking about IEEE 754 "double" floating-point numbers.

The problem with IEEE 754 floating-point numbers is that only the memory format is defined, not the exact printed representation. As a consequence, there are several locale-dependent print formats for IEEE 754 floating-point numbers:

100.0 = "one hundred dot zero" in English/American notation
100,0 = "one hundred comma zero" often used e.g. in Europe

In America the comma is often used as a "thousands" separator, while in Europe the dot is used for this purpose. This leads to an even worse "one million" mess:

1,000,000.0 = "one million dot zero" using commas as "thousands" separator
1.000.000,0 = "one million comma zero" using dots as "thousands" separator

And there are also various "e" formats for "standard", "scientific" and whatever exponentiation.
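
If the print format of a file is known, the separators can be normalized before the number is handed to the Lisp reader. The sketch below only illustrates that idea under stated assumptions: PARSE-LOCALIZED-NUMBER is a hypothetical helper, the caller must already know which character is the decimal mark and which is the thousands separator, and the result is always returned as a double-float.

```lisp
(defun parse-localized-number (string &key (decimal #\.) (thousands #\,))
  "Parse STRING as a double-float, given its DECIMAL and THOUSANDS characters."
  (let* ((digits (remove thousands string))            ; drop grouping marks
         (normalized (substitute #\. decimal digits))  ; use a dot as decimal mark
         (*read-eval* nil)                             ; no #. evaluation
         (*read-default-float-format* 'double-float)
         (value (read-from-string normalized)))
    (unless (realp value)
      (error "~S is not a number in the assumed format" string))
    (float value 1d0)))

;; (parse-localized-number "1,000,000.0")                               ; => 1000000.0d0
;; (parse-localized-number "1.000.000,0" :decimal #\, :thousands #\.)   ; => 1000000.0d0
```

This does not attempt to guess the format: a string such as "1,000" is ambiguous without the separator arguments, which is exactly the difficulty described above.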

Now the contest: Whoever can tell me a library in any programming language that can reliably read, identify and interpret all the myriads of "double" printed integer and floating-point formats wins a big piece of pie.

I myself don't know a single library in any programming language that can do this.

The "rft" in "edgar-rft" is the German abbreviation of "Radio/Television Broadcast Technician". My job is to work with all sorts of hardware and software measurement equipment every day, and I can tell you that printed measurement data in floating-point "double" format can be considered completely useless as long as nothing is known about the exact print format of the "doubles" and the floating-point errors of the machine that produced the data.

To me the question is: what do you want to achieve with measurement data represented in an unreliable print format?

- edgar

P.S.: I will try to write a Common Lisp function to read your data into an array if you can tell me more about the print format of the "doubles".

Re: Reading formatted data

Posted: Sat Jun 25, 2011 2:38 am
by Paul
edgar-rft wrote:
muuh-gnu wrote: Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.
Reading, identifying and interpreting floating-point numbers is not a "simple task" [see the examples below], and I seriously doubt that Python can do this, at least not with the "batteries included" libraries.

The main "problem" in this case is that Common Lisp is defined as a hardware-independent programming language and the data type "double" is strongly hardware-dependent (and software-dependent, too).
The OP said scanf(3) can do it, so he's talking about ASCII decimal representations, not binary format. In which case, READ will read them.
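
A small sketch of what using READ directly could look like, assuming the data are plain whitespace-separated decimal numbers. READ-ALL-NUMBERS is a hypothetical name. Binding *READ-DEFAULT-FLOAT-FORMAT* makes unmarked decimals such as 1.5 come back as double-floats rather than single-floats, and binding *READ-EVAL* to NIL keeps the reader from evaluating anything embedded in the data.

```lisp
(defun read-all-numbers (stream)
  "Read whitespace-separated numbers from STREAM until end of file."
  (let ((*read-eval* nil)
        (*read-default-float-format* 'double-float))
    (loop for value = (read stream nil nil)
          while value
          collect value)))

;; Works the same on a file stream opened with WITH-OPEN-FILE.
(with-input-from-string (in (format nil "1.5 2.25~%3.0e2 4~%"))
  (read-all-numbers in))
;; => (1.5d0 2.25d0 300.0d0 4)
```

READ ignores line breaks, so the row structure is lost; with a known column count, the column of the value at index i is (mod i column-count).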

  • IEEE 754 defines a floating-point data type named "double" that is, from the math point of view, not even a number; instead it is an inexact approximation of a number because of several hardware and software limitations.
Can you give an example of an "inexact approximation of a number"?
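
For reference, the point under discussion can be checked at the REPL. The forms below assume an implementation whose double-float is an IEEE 754 binary64 value, which is the common case but not something the standard guarantees.

```lisp
;; Every double-float is itself an exact rational number...
(rational 0.1d0)            ; => 3602879701896397/36028797018963968
;; ...but it is not necessarily the decimal number that was written down.
(= (rational 0.1d0) 1/10)   ; => NIL
(= (+ 0.1d0 0.2d0) 0.3d0)   ; => NIL
```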

Re: Reading formatted data

Posted: Sat Jun 25, 2011 2:56 am
by marcoxa
Guys... we (should) all have read Goldberg's paper, but that has little bearing on the OP's question... :ugeek:

Cheers