Reading formatted data
Hello Lispforum,
what is the closest Lisp equivalent for reading formatted data (like C's scanf and Fortran's read)?
I have a file with measurement data (doubles) saved in columns, and I want to read those columns
into arrays for further processing.
What is the easiest way to do this?
I can read the file line by line with "read-line", but this gives me a string of the whole line, which I
can't use without further processing.
"read" requires Lisp input, so I can't use it for a sequence of doubles.
I am surprised that processing formatted measurement data seems to be something that is not
done in Lisp at all, so no standard routines for it exist.
Any suggestions? Thanks.
Re: Reading formatted data
muuh-gnu wrote: I am surprised that processing formatted measurement data seems to be something that is not done in Lisp at all, so no standard routines for it exist.
The Common Lisp standard was created by an ANSI committee in a rather time-consuming and expensive process, so the topics it covers are necessarily limited. The formatted input the standard covers is s-expressions. Other input formats can be read using libraries. Just because something is not covered by the standard doesn't mean that it is not done, any more than the fact that ANSI C doesn't describe sockets means that no networking is done in C.
In the case of numeric column data this is trivially achieved using READ-LINE combined with PARSE-NUMBER and SPLIT-SEQUENCE, both of which are available through Quicklisp. For more complicated input formats you would use either regular expressions with CL-PPCRE, or a parser generator like CL-YACC or PARSER-COMBINATORS or the like.
Re: Reading formatted data
You're correct that there's no "standard" system or library for doing this in Common Lisp, which suggests that it's not a problem to which the language is often applied.
I'm afraid the best way I can suggest is to use read-line, then split each line into its columns with something like cl-ppcre:split, and then insert each value into its respective array. That's pretty much what a library function would do anyway - either that, or it would employ a parser to read the file character by character, which you'd probably only want to do if the first approach exhibited serious performance problems.
This page may also prove useful for parsing numbers from strings.
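For readers who cannot install libraries, here is a minimal sketch of the same split-then-parse idea using only standard Common Lisp. It swaps cl-ppcre:split for a hand-written whitespace splitter and uses the Lisp reader to parse each field. The names WHITESPACE-P, SPLIT-ON-WHITESPACE and PARSE-DOUBLE are hypothetical helpers, not functions from any library, and the sketch assumes fields are separated by spaces or tabs and use a decimal point.

```lisp
;; Sketch only: all three function names are hypothetical helpers.

(defun whitespace-p (char)
  "True if CHAR is a space or a tab (assumed field separators)."
  (member char '(#\Space #\Tab)))

(defun split-on-whitespace (line)
  "Return the whitespace-separated fields of LINE as a list of strings."
  (let ((fields '())
        (start 0))
    (loop
      ;; Find where the next field begins; stop when none is left.
      (let ((begin (position-if-not #'whitespace-p line :start start)))
        (when (null begin)
          (return (nreverse fields)))
        ;; The field runs up to the next separator or the end of LINE.
        (let ((end (or (position-if #'whitespace-p line :start begin)
                       (length line))))
          (push (subseq line begin end) fields)
          (setf start end))))))

(defun parse-double (field)
  "Read FIELD with the Lisp reader and coerce the result to DOUBLE-FLOAT."
  (let ((*read-eval* nil)                            ; never evaluate #. forms
        (*read-default-float-format* 'double-float)) ; read 1.5 as a double
    (let ((value (read-from-string field)))
      (if (realp value)
          (coerce value 'double-float)
          (error "Not a number: ~S" field)))))

(defun parse-fields (line)
  "Parse every whitespace-separated field of LINE as a double."
  (mapcar #'parse-double (split-on-whitespace line)))
```

Binding *READ-DEFAULT-FLOAT-FORMAT* matters here: with the default setting the reader would return single-floats for input such as 1.5.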
Re: Reading formatted data
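What's wrong with (untested):
Code:
(defun read-tabulated-numbers-from-file (p)
  ;; Read file P line by line; return one vector of numbers per line.
  (with-open-file (f p :direction :input)
    (loop for line = (read-line f nil nil)
          while line
          collect (parse-line line))))

(defun parse-line (line)
  ;; READ-FROM-STRING returns the object read and the index where reading
  ;; stopped; feed that index back as :START so each call reads the next number.
  (loop with start = 0
        for (n end) = (multiple-value-list
                       (read-from-string line nil nil :start start))
        while n
        collect n into result
        do (setf start end)
        finally (return (coerce result 'vector))))
This will give you a list of vectors organized in rows (vectors are not required; each row could just as well be a list). Your columns are one transpose away.
Cheers
--
MA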
Marco Antoniotti
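The "one transpose away" step can be sketched as follows. This is only an illustration under stated assumptions: ROWS->COLUMNS is a hypothetical name, every row is assumed to have the same length, and every element is assumed to be a real number that can be coerced to DOUBLE-FLOAT.

```lisp
;; Sketch only: ROWS->COLUMNS is a hypothetical helper name.

(defun rows->columns (rows)
  "Turn a list of equal-length row vectors into a list of column vectors.
Each column is a specialized array of DOUBLE-FLOAT."
  (when rows
    (let ((n-rows (length rows))
          (n-cols (length (first rows))))
      (loop for c below n-cols
            collect (let ((column (make-array n-rows
                                              :element-type 'double-float)))
                      ;; Copy element C of every row into this column.
                      (loop for row in rows
                            for r from 0
                            do (setf (aref column r)
                                     (coerce (aref row c) 'double-float)))
                      column)))))
```

Applied to the list of row vectors returned by a reader function, this yields one array per column of the file, which is the shape the original question asked for.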
Re: Reading formatted data
Thanks everybody.
I hoped there would be some "easy" way to read in numeric data files.
I am not able to use Quicklisp behind a proxy and I don't know how to install
all those libraries manually, so I'll give up for now.
Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's
point of view, or there should be a "batteries included" implementation delivering
all those "standard" functionalities out of the box like Python does.
Re: Reading formatted data
muuh-gnu wrote: Thanks everybody. I hoped there would be some "easy" way to read in numeric data files. I am not able to use Quicklisp behind a proxy and I don't know how to install all those libraries manually, so I'll give up for now. Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.
I don't understand what you're asking for. In what way is READ not what you want?
Re: Reading formatted data
muuh-gnu wrote: Thanks everybody. I hoped there would be some "easy" way to read in numeric data files. I am not able to use Quicklisp behind a proxy and I don't know how to install all those libraries manually, so I'll give up for now. Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.
Well, you said "here is the file I have" and you got the answer: use READ (or READ-FROM-STRING). If you want something "like in Python", you should point to the Python functionality you are using, and then it can either be reproduced or, more likely, found out there...
Cheers
Marco Antoniotti
Re: Reading formatted data
muuh-gnu wrote: Such simple tasks shouldn't be so non-obvious to do in Lisp, from a beginner's point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like Python does.
Reading, identifying and interpreting floating-point numbers is not a "simple task" [see the examples below], and I seriously doubt that Python can do this, at least not with the "batteries included" libraries.
The main "problem" in this case is that Common Lisp is defined as a hardware-independent programming language and the data type "double" is strongly hardware-dependent (and software-dependent, too).
So the question must be: What is wrong with "doubles"?
The trouble begins with the fact that there are several specifications for data types called "double":
- A "double" can be a 16-bit (double byte), 32-bit (double 16-bit register) or 64-bit (double 32-bit register) integer. All these formats have been used in the past on various hardware platforms. Probably somewhere in the future there will be double 64-bit, double 128-bit, and so on integers.
- IEEE 754 defines a floating-point data type named "double" that is, from the math point of view, not even a number; instead it is an inexact approximation of a number because of several hardware and software limitations.
For the further discussion I assume that we are talking about IEEE 754 "double" floating-point numbers.
The problem with IEEE 754 floating-point numbers is that only the memory format is defined but not the exact printed representation, with the consequence that there exist several locale-dependent print formats for IEEE 754 floating-point numbers:
100.0 = "one hundred dot zero" in English/American notation
100,0 = "one hundred comma zero" often used e.g. in Europe
In America the comma is often used as a "thousands" separator, while in Europe the dot is used for this purpose. This leads to an even worse "one million" mess:
1,000,000.0 = "one million dot zero" using commas as "thousands" separator
1.000.000,0 = "one million comma zero" using dots as "thousands" separator
And there are also various "e" formats for "standard", "scientific" and whatever exponentiation.
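To illustrate why the print format has to be known before anything can be parsed, here is a minimal sketch that reads a field written with a decimal comma. It assumes the data contains no thousands separators at all; under that assumption the comma can simply be replaced by a dot before the field is handed to the Lisp reader. PARSE-DECIMAL-COMMA is a hypothetical name, not a library function.

```lisp
;; Sketch only: PARSE-DECIMAL-COMMA is a hypothetical helper name.
;; Assumption: the field uses "," as decimal mark and has no thousands
;; separators, so "1.000.000,0" style input is NOT handled.

(defun parse-decimal-comma (field)
  "Read FIELD such as \"100,0\" as a DOUBLE-FLOAT."
  (let ((*read-eval* nil)                            ; never evaluate #. forms
        (*read-default-float-format* 'double-float)) ; read floats as doubles
    (read-from-string (substitute #\. #\, field))))
```

The same call applied to "1,000,000.0" would produce garbage or an error, which is exactly the ambiguity described above: the code has to be told which convention the file uses.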
Now the contest: Whoever can tell me a library in any programming language that can reliably read, identify and interpret all the myriads of "double" printed integer and floating-point formats wins a big piece of pie.
I myself don't know a single library in any programming language that can do this.
The "rft" in "edgar-rft" is the German abbreviation for "Radio/Television Broadcast Technician". My job is to work with all sorts of hardware and software measurement equipment every day, and I can tell you that printed measurement data in floating-point "double" format can be considered completely useless as long as nothing is known about the exact print format of the "doubles" and the floating-point errors of the machine that produced the data.
To me the question is: what do you want to achieve with measurement data represented in an unreliable print format?
- edgar
P.S: I will try to write a Common Lisp function to read your data into an array if you can tell more about the "doubles" print-format.
Re: Reading formatted data
muuh-gnu wrote: Such simple tasks shouldnt be so non-obvious to do in Lisp, from a beginners point of view, or there should be a "batteries included" implementation delivering all those "standard" functionalities out of the box like python does.
edgar-rft wrote: Reading, identifying and interpreting floating-point numbers is not a "simple task" [see the examples below], and I'm in severe doubt that Python can do this, at least not with the "batteries included" libraries. The main "problem" in this case is that Common Lisp is defined as a hardware independent programming language and the data type "double" is strongly hardware dependent (and software dependent, too).
The OP said scanf(3) can do it, so he's talking about ASCII decimal representations, not binary format. In which case, READ will read them.
edgar-rft wrote: - The IEEE_754 defines a floating-point data type named "double" that is from the math point of view not even a number, instead it is an inexact approximation of a number because of several hardware and software limitations.
Can you give an example of an "inexact approximation of a number"?
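A minimal sketch of using READ directly on plain decimal text, with no libraries. READ-ALL-NUMBERS is a hypothetical name; the sketch assumes the input contains only number tokens separated by whitespace and does no validation beyond that.

```lisp
;; Sketch only: READ-ALL-NUMBERS is a hypothetical helper name.

(defun read-all-numbers (stream)
  "Collect every object READ returns from STREAM until end of file.
Assumes the stream contains only whitespace-separated numbers."
  (let ((*read-eval* nil)                            ; never evaluate #. forms
        (*read-default-float-format* 'double-float)  ; 1.5 and 2.0e3 read as doubles
        (eof (gensym "EOF")))                        ; unique end-of-file marker
    (loop for value = (read stream nil eof)
          until (eq value eof)
          collect value)))

;; Usage on an in-memory string; a file works the same way via WITH-OPEN-FILE.
(with-input-from-string (in (format nil "1.5 -2.0e3 42~%0.25 1d-2"))
  (read-all-numbers in))
```

Line breaks are just whitespace to READ, so this returns one flat list; to keep the row structure, read line by line and parse each line separately.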
Re: Reading formatted data
Guys... we (should) all have read Goldberg's paper, but that has little bearing on the OP's question...
Cheers
Marco Antoniotti