Reading formatted data

Discussion of Common Lisp
muuh-gnu
Posts: 2
Joined: Sat Nov 28, 2009 7:26 am

Reading formatted data

Post by muuh-gnu » Tue Jun 21, 2011 2:42 pm

Hello Lispforum,

What is the closest Lisp equivalent for reading formatted data (like C's scanf and Fortran's read)?

I have a file with measurement data (doubles) saved in columns, and I want to read those columns
into arrays for further processing.

What is the easiest way to do this?

I can read the file line by line with "read-line", but this gives me a string of the whole line, which I
can't use without further processing.

"read" requires Lisp input, so I can't use it for a sequence of doubles.

I am surprised that processing formatted measurement data seems to be something that is not
done in Lisp at all, so that no standard routines for it exist.

Any suggestions? Thanks.

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: Reading formatted data

Post by ramarren » Tue Jun 21, 2011 10:07 pm

muuh-gnu wrote: I am surprised that processing formatted measurement data seems to be something that is not
done in Lisp at all, so that no standard routines for it exist.

The Common Lisp standard was created by an ANSI committee in a rather time-consuming and expensive process, so the topics it covers are necessarily limited. The only formatted input the standard covers is s-expressions. Other input formats can be read using libraries. That something is not covered by the standard doesn't mean that it is not done, any more than the fact that ANSI C doesn't describe sockets means that no networking is done in C.

In the case of numeric column data this is trivially achieved using READ-LINE combined with PARSE-NUMBER and SPLIT-SEQUENCE, both of which are available through Quicklisp. For more complicated input formats you would use either regular expressions with CL-PPCRE, or a parser generator like CL-YACC or PARSER-COMBINATORS or the like.

JamesF
Posts: 98
Joined: Thu Jul 10, 2008 7:14 pm

Re: Reading formatted data

Post by JamesF » Tue Jun 21, 2011 11:04 pm

You're correct that there's no "standard" system or library for doing this in Common Lisp, which suggests that it's not a problem to which the language is often applied.

I'm afraid that the best way I can suggest is to use read-line, then split the lines into their columns with something like cl-ppcre:split, and then insert each value into its respective array. That's pretty much what a library function would do anyway. The alternative would be a parser that reads the file character by character, which you'd probably only want to write after the first approach exhibited serious performance problems.
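
A minimal sketch of this read-line/split/insert approach, using only standard Common Lisp so that no library has to be installed. It swaps cl-ppcre:split for a small hand-written splitter. The names WHITESPACE-P, SPLIT-FIELDS, PARSE-DOUBLE and READ-COLUMNS are made up for this illustration, and the code assumes whitespace-separated columns written with a decimal point.

```lisp
(defun whitespace-p (char)
  "True if CHAR separates fields (assumption: space, tab or carriage return)."
  (member char '(#\Space #\Tab #\Return)))

(defun split-fields (line)
  "Return the whitespace-separated fields of LINE as a list of strings."
  (let ((fields '())
        (start 0))
    (loop
      (let ((begin (position-if-not #'whitespace-p line :start start)))
        ;; No more non-blank characters: we are done.
        (unless begin (return (nreverse fields)))
        (let ((end (or (position-if #'whitespace-p line :start begin)
                       (length line))))
          (push (subseq line begin end) fields)
          (setf start end))))))

(defun parse-double (field)
  "Parse FIELD with the Lisp reader and return it as a DOUBLE-FLOAT."
  (let ((*read-eval* nil)                            ; refuse #. in data files
        (*read-default-float-format* 'double-float)) ; "1.5" reads as a double
    (let ((value (read-from-string field)))
      (unless (realp value)
        (error "Not a number: ~S" field))
      (coerce value 'double-float))))

(defun read-columns (stream)
  "Read rows of numbers from STREAM; return a list of vectors, one per column.
The number of columns is taken from the first non-empty line."
  (let ((columns nil))
    (loop for line = (read-line stream nil nil)
          while line
          do (let ((numbers (mapcar #'parse-double (split-fields line))))
               (when numbers
                 (unless columns
                   (setf columns
                         (loop repeat (length numbers)
                               collect (make-array 0 :element-type 'double-float
                                                     :adjustable t
                                                     :fill-pointer 0))))
                 ;; Extra fields on a longer row are silently ignored.
                 (loop for number in numbers
                       for column in columns
                       do (vector-push-extend number column)))))
    columns))
```

With a hypothetical file name, usage would be (with-open-file (in "data.txt") (read-columns in)). Adjustable vectors with fill pointers are used here because the number of rows is not known in advance.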

This page may also prove useful for parsing numbers from strings.

marcoxa
Posts: 85
Joined: Thu Aug 14, 2008 6:31 pm

Re: Reading formatted data

Post by marcoxa » Wed Jun 22, 2011 12:11 am

What's wrong with (untested):

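Code:

(defun parse-line (line)
  ;; READ-FROM-STRING returns the position after the object it read as its
  ;; second value. Without passing that back as :START, every call would
  ;; re-read the first number and the loop would never end.
  (let ((*read-eval* nil)                            ; no #. evaluation in data
        (*read-default-float-format* 'double-float)) ; read "1.5" as a double
    (loop with start = 0
          for (n next) = (multiple-value-list
                          (read-from-string line nil nil :start start))
          while n
          do (setf start next)
          collect n into result
          finally (return (coerce result 'vector)))))

(defun read-tabulated-numbers-from-file (p)
  ;; Return a list with one vector of numbers per line of the file P.
  (with-open-file (f p :direction :input)
    (loop for line = (read-line f nil nil)
          while line
          collect (parse-line line))))

This gives you a list of vectors, one per row (the rows need not be vectors; lists would do as well). Your columns are one transpose away.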
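
The transpose could be sketched as follows. ROWS->COLUMNS is a made-up name, and the sketch assumes every row vector has the same length as the first one.

```lisp
(defun rows->columns (rows)
  "ROWS is a list of equal-length vectors; return a list of column vectors."
  (when rows
    (let ((ncols (length (first rows))))
      (loop for j below ncols
            ;; Column J collects element J of every row, in row order.
            collect (map 'vector (lambda (row) (aref row j)) rows)))))
```

For example, (rows->columns (list #(1 2) #(3 4) #(5 6))) returns a list of two vectors, #(1 3 5) and #(2 4 6).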

Cheers
--
MA
Marco Antoniotti

muuh-gnu
Posts: 2
Joined: Sat Nov 28, 2009 7:26 am

Re: Reading formatted data

Post by muuh-gnu » Fri Jun 24, 2011 8:49 am

Thanks everybody.

I hoped there would be some "easy" way to read in numeric data files.

I am not able to use Quicklisp behind a proxy and I don't know how to install
all those libraries manually, so I'll give it up for now.

Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's
point of view, or there should be a "batteries included" implementation delivering
all those "standard" functionalities out of the box like Python does.

Paul
Posts: 106
Joined: Tue Jun 02, 2009 6:00 am

Re: Reading formatted data

Post by Paul » Fri Jun 24, 2011 3:50 pm

I don't understand what you're asking for. In what way is READ not what you want?
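
For whitespace-separated decimal numbers, calling READ directly on the stream could look like the sketch below. READ-NUMBERS is a made-up name. Binding *READ-DEFAULT-FLOAT-FORMAT* is an assumption about wanting doubles, since by default a token such as 1.5 is read as a SINGLE-FLOAT, and binding *READ-EVAL* to NIL is a precaution against #. forms in files you do not control.

```lisp
(defun read-numbers (stream)
  "Return all objects READ can find on STREAM as one flat list."
  (let ((*read-eval* nil)
        (*read-default-float-format* 'double-float))
    ;; :EOF is a sentinel that READ returns at end of input instead of
    ;; signalling an error.
    (loop for x = (read stream nil :eof)
          until (eq x :eof)
          collect x)))
```

Note that READ skips line breaks like any other whitespace, so the row structure is lost; with a known number of columns the flat list can be cut into rows afterwards.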

marcoxa
Posts: 85
Joined: Thu Aug 14, 2008 6:31 pm

Re: Reading formatted data

Post by marcoxa » Sat Jun 25, 2011 1:42 am

Well; you said "here is the file I have" and you got the answer: use READ (or READ-FROM-STRING). If you want something "like in Python" you should point to the Python functionalities you are using and then they can either be reproduced, or, more likely, found out there...

Cheers
Marco Antoniotti

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: Reading formatted data

Post by edgar-rft » Sat Jun 25, 2011 1:49 am

muuh-gnu wrote: Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.

Reading, identifying and interpreting floating-point numbers is not a "simple task" [see the examples below], and I seriously doubt that Python can do this, at least not with the "batteries included" libraries.

The main "problem" in this case is that Common Lisp is defined as a hardware independent programming language and the data type "double" is strongly hardware dependent (and software dependent, too).

So the question must be: What is wrong with "doubles"?

The trouble begins with the fact that there are several specifications for data types called "double":
  • A "double" can be a 16-bit (double byte), 32-bit (double 16-bit register) or 64-bit (double 32-bit register) integer. All these formats have been used in the past on various hardware platforms. Somewhere in the future there will probably be double 64-bit integers, double 128-bit integers, and so on.
  • IEEE 754 defines a floating-point data type named "double" that, from a mathematical point of view, is not even a number; it is an inexact approximation of a number because of several hardware and software limitations.
Summary: the data type "double" tells you virtually nothing about the data. It tells you neither whether the number is an integer or a floating-point number, nor whether the data is a number at all. IEEE 754 defines, for example, "NaN" as "not a number".
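
To make "inexact approximation" concrete, here is a small sketch. It assumes that the implementation's DOUBLE-FLOAT is an IEEE 754 64-bit double, which is common but not required by the Common Lisp standard. Every finite double is itself an exact rational, which RATIONAL exposes; the approximation happens when a decimal literal such as 0.1 is mapped to the nearest such rational. REPRESENTATION-ERROR is a made-up helper name.

```lisp
(defun representation-error (x exact)
  "Return the exact rational difference between the float X and EXACT."
  (- (rational x) exact))

;; The double nearest to 1/10 is not 1/10 itself, so this is not zero:
(representation-error 0.1d0 1/10)

;; A familiar consequence: this comparison is false on IEEE 754 doubles.
(= (+ 0.1d0 0.2d0) 0.3d0)
```

The error here comes from the binary representation, not from the reader or the printed format.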

For the rest of this discussion I assume that we are talking about IEEE 754 "double" floating-point numbers.

The problem with IEEE 754 floating-point numbers is that only the memory format is defined, not the exact printed representation, with the consequence that several locale-dependent print formats for IEEE 754 floating-point numbers exist:

100.0 = "one hundred dot zero" in English/American notation
100,0 = "one hundred comma zero", often used e.g. in Europe

In America the comma is often used as a "thousands" separator, while in Europe the dot is used for this purpose. This leads to an even worse "one million" mess:

1,000,000.0 = "one million dot zero" using commas as "thousands" separator
1.000.000,0 = "one million comma zero" using dots as "thousands" separator

And there are also various "e" formats for "standard", "scientific" and whatever exponentiation.
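
If the convention used by a particular file is known in advance, one way to cope is to normalize each field before parsing it. The sketch below is only that: NORMALIZE-DECIMAL and PARSE-LOCALIZED-DOUBLE are made-up names, the defaults assume comma as decimal mark and dot as thousands separator, and nothing here can guess the convention from the data.

```lisp
(defun normalize-decimal (field &key (decimal #\,) (grouping #\.))
  "Remove GROUPING characters from FIELD and turn DECIMAL into a dot."
  ;; Grouping characters must be removed first, otherwise a grouping dot
  ;; could not be told apart from the dot that replaces the decimal mark.
  (substitute #\. decimal (remove grouping field)))

(defun parse-localized-double (field &key (decimal #\,) (grouping #\.))
  "Parse FIELD, written with the given separators, as a DOUBLE-FLOAT."
  (let ((*read-eval* nil)
        (*read-default-float-format* 'double-float))
    (let ((value (read-from-string
                  (normalize-decimal field :decimal decimal
                                           :grouping grouping))))
      (unless (realp value)
        (error "Not a number: ~S" field))
      (coerce value 'double-float))))
```

For the American convention the same functions can be called with :decimal #\. :grouping #\, instead of the defaults.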

Now the contest: Whoever can tell me a library in any programming language that can reliably read, identify and interpret all the myriads of "double" printed integer and floating-point formats wins a big piece of pie.

I myself don't know a single library in any programming language that can do this.

The "rft" in "edgar-rft" is the German abbreviation of "Radio/Television Broadcast Technician". My job is to work with all sorts of hardware and software measurement equipment every day, and I can tell you that printed measurement data in floating-point "double" format can be considered completely useless as long as nothing is known about the exact print format of the "doubles" and the floating-point errors of the machine that produced the data.

To me the question is: what do you want to achieve with measurement data represented in an unreliable print format?

- edgar

P.S.: I will try to write a Common Lisp function to read your data into an array if you can tell me more about the print format of the "doubles".

Paul
Posts: 106
Joined: Tue Jun 02, 2009 6:00 am

Re: Reading formatted data

Post by Paul » Sat Jun 25, 2011 2:38 am

edgar-rft wrote:
muuh-gnu wrote: Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.
Reading, identifying and interpreting floating-point numbers is not a "simple task" [see the examples below], and I seriously doubt that Python can do this, at least not with the "batteries included" libraries.

The main "problem" in this case is that Common Lisp is defined as a hardware independent programming language and the data type "double" is strongly hardware dependent (and software dependent, too).

The OP said scanf(3) can do it, so he's talking about ASCII decimal representations, not a binary format. In that case, READ will read them.

  • IEEE 754 defines a floating-point data type named "double" that, from a mathematical point of view, is not even a number; it is an inexact approximation of a number because of several hardware and software limitations.

Can you give an example of an "inexact approximation of a number"?

marcoxa
Posts: 85
Joined: Thu Aug 14, 2008 6:31 pm

Re: Reading formatted data

Post by marcoxa » Sat Jun 25, 2011 2:56 am

Guys... we (should) all have read Goldberg's paper, but that has little bearing on the OP's question...

Cheers
Marco Antoniotti
