Equivalent of shlex.split of Python in Common Lisp
Posted: Tue Aug 31, 2010 3:16 am
by sh4r4d
Hi All,
Does anybody know the equivalent of shlex.split from Python in Common Lisp?
Basically, I am looking for a way to split a string at whitespace but ignore text in quotes.
--
Regards
-sharad
Re: Equivalent of shlex.split of Python in Common Lisp
Posted: Tue Aug 31, 2010 3:06 pm
by gugamilare
That function does not come with standard Common Lisp. You may use split-sequence for that.
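For context, a delimiter-only split treats quote characters like any other character. A minimal sketch in portable Common Lisp, where naive-split is a hypothetical stand-in written for illustration and not part of split-sequence:

```lisp
;; Hypothetical helper, not part of SPLIT-SEQUENCE: split on spaces only.
(defun naive-split (string)
  "Split STRING at spaces, dropping empty pieces. Quotes get no special treatment."
  (loop with start = 0
        ;; END is the index of the next space, or NIL when none is left.
        for end = (position #\Space string :start start)
        for piece = (subseq string start end)
        unless (string= piece "") collect piece
        while end
        do (setf start (1+ end))))

(naive-split "a \"b c\" d")
```

On this input the quoted text is broken at its inner space, so the call yields four pieces instead of three.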
Re: Equivalent of shlex.split of Python in Common Lisp
Posted: Tue Aug 31, 2010 4:07 pm
by edgar-rft
I think the question is how a string like:
Code: Select all
"split this \"but do not split this\" in this string"
can be split into a result like:
Code: Select all
("split" "this" "but do not split this" "in" "this" "string")
AFAIK this cannot be done with SPLIT-SEQUENCE, or can it and I just don't know how?
sh4r4d wrote: Basically, I am looking for a way to split a string at whitespace but ignore text in quotes.
Question: what exactly do you want to split? In Common Lisp, a string is always "text inside quotes", so your question sounds self-contradictory to me. A string that is ignored cannot be split. Did you mean what I wrote in the example above, or are you looking for something different?
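If the example above is the intended behavior, it can be sketched in portable Common Lisp without any library. The function name split-respecting-quotes is hypothetical, and this is only a rough approximation of shlex.split: it assumes double quotes only and does not handle backslash escapes or single quotes.

```lisp
;; Hypothetical helper: split at space/tab/newline, keeping "..." runs whole.
(defun split-respecting-quotes (string)
  "Return the words of STRING; text between double quotes stays in one word."
  (let ((words '())
        (current (make-string-output-stream))
        (in-word nil)     ; true once the current word has started
        (in-quotes nil))  ; true while inside a "..." section
    (flet ((finish-word ()
             (when in-word
               ;; GET-OUTPUT-STREAM-STRING also clears the stream for reuse.
               (push (get-output-stream-string current) words)
               (setf in-word nil))))
      (loop for ch across string
            do (cond ((char= ch #\")
                      ;; A quote toggles quoting and is not copied to the output.
                      (setf in-quotes (not in-quotes)
                            in-word t))
                     ((and (not in-quotes)
                           (member ch '(#\Space #\Tab #\Newline)))
                      (finish-word))
                     (t
                      (write-char ch current)
                      (setf in-word t))))
      (finish-word))
    (nreverse words)))

(split-respecting-quotes "split this \"but do not split this\" in this string")
```

With the example string, this gives the six-element list shown above. Quoted text that touches a neighbouring word is joined to it, which is a design choice of this sketch.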
Re: Equivalent of shlex.split of Python in Common Lisp
Posted: Tue Aug 31, 2010 5:28 pm
by Tom
I'd solve this using META-SEXP.
Code: Select all
(defrule quoted? (&aux (quoted (make-char-accum))) ()
  (:* (:type white-space?))
  #\" (:+ (:not #\") (:char-push quoted)) #\"
  (:return quoted))

(defrule word? (&aux (word (make-char-accum))) ()
  (:* (:type (or white-space? newline?)))
  (:+ (:not (:type (or white-space? newline?)))
      (:char-push word))
  (:return word))

(defrule quoted-or-word? () ()
  (:rule (or quoted? word?)))

(defun shlex-split (string)
  "Return a list of words respecting escaped quotes."
  (loop with ctx = (create-parser-context string)
        for next = (quoted-or-word? ctx)
        while next collect next))
I'm starting to sound like a broken record.
Cheers,
~ Tom
Re: Equivalent of shlex.split of Python in Common Lisp
Posted: Wed Sep 01, 2010 1:17 am
by ramarren
I would use my parser-combinators library, but that's mostly because I wrote it and kind of like it; it is definitely less tested and optimized than META-SEXP.
Code: Select all
(defun whitespace* ()
  (gather-if-not* (complement
                   (alexandria:rcurry #'member '(#\Space #\Newline #\Tab)))
                  :result-type 'string
                  :accept-end t
                  :accept-empty t))

(defun word* ()
  (gather-if-not* (complement #'alphanumericp)
                  :result-type 'string
                  :accept-end t
                  :accept-empty nil))

(defun quoted* ()
  (named-seq* (<- c1 (context?))
              #\"
              ;; handle quotation marks escaped with backslash
              (many*
               (choice1 (gather-if-not* (alexandria:rcurry #'member '(#\" #\\))
                                        :result-type nil)
                        "\\\""))
              #\"
              (<- c2 (context?))
              (context-interval c1 c2)))

(defun quoted-or-word* ()
  (named-seq* (whitespace*)
              (<- datum (choice1 (quoted*) (word*)))
              datum))

(defun shlex-split* ()
  (named-seq* (<- data (many* (quoted-or-word*)))
              (whitespace*) ;discard terminating whitespace
              data))
Writing this made me notice a couple of bugs too... I haven't used the library that much, especially the semi-optimized gather-if-not* function.
Re: Equivalent of shlex.split of Python in Common Lisp
Posted: Wed Sep 01, 2010 11:59 pm
by Warren Wilkinson
I actually wrote a blog post on writing parsers:
http://formlis.wordpress.com/2010/07/07 ... t-regexps/.
This operation is non-trivial. There are three states: Reading Text, Skipping Whitespace, and Handling Quotes. There are also three different character classes: Regular Characters, Spaces, and Quotes. A state machine is a matrix of functions, with one function per combination of state and character class. The string is processed one character at a time: the character class is determined by the input character, while the machine state is a variable that changes throughout the computation.
This idea is embodied in the code I've provided. The only trick is that I've separated the "Action to Run" from the "State to Transition To".
Code: Select all
;; Character classes select the column of the state table.
(defun char-class (char)
  (cond ((char-equal char #\Space) 0)
        ((char-equal char #\") 1)
        (t 2)))

;; Machine states select the row of the state table.
(defconstant +white-mode+ 0)
(defconstant +read-mode+ 1)
(defconstant +quote-mode+ 2)

;; List of (start end) index pairs, most recent first.
(defvar *collected*)

;; Actions: each receives the position of the current character.
(defun skip (pos) (declare (ignore pos)))
(defun collect (pos) (declare (ignore pos)))
(defun startq (pos) (push (list (1+ pos) (1+ pos)) *collected*))
(defun startw (pos) (push (list pos pos) *collected*))
(defun finish (pos) (setf (second (car *collected*)) pos))
(defun reopen (pos)
  (setf (second (car *collected*)) pos)
  (push (list (1+ pos) (1+ pos)) *collected*))

;; Each cell is an action followed by the state to transition to.
(defvar *sm*
  (make-array 18 :initial-contents
              ;;    SPACE                   QUOTE                  OTHER
              (list #'skip    +white-mode+  #'startq +quote-mode+  #'startw  +read-mode+      ; WHITEMODE
                    #'finish  +white-mode+  #'reopen +quote-mode+  #'collect +read-mode+      ; READMODE
                    #'collect +quote-mode+  #'finish +white-mode+  #'collect +quote-mode+)))  ; QUOTEMODE

(defun shlex-split (string)
  (setf *collected* nil)
  (loop for i upfrom 0
        for c across string
        with state = +white-mode+
        ;; Six entries per row, two entries (action, next state) per cell.
        do (let ((offset (+ (* state 6) (* 2 (char-class c)))))
             (funcall (svref *sm* offset) i)
             (setf state (svref *sm* (1+ offset))))
        ;; Close a word or quote that is still open at the end of the string.
        finally (unless (= state +white-mode+) (finish (length string))))
  (mapcar #'(lambda (a) (apply #'subseq string a)) (nreverse *collected*)))
(shlex-split "independent single\"compound word\" alone selfish friendless \"buddy words\"")
;; Results: ("independent" "single" "compound word" "alone" "selfish" "friendless" "buddy words")
This may not be easy to read, but the operation you have described is complex. My blog entry, and its link to the original paper describing this technique in Forth, may help you understand how this parser works, as well as teach you an important programming technique.
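The state transitions alone can be traced with an explicit two-dimensional table, which may make the matrix idea easier to see. This is a sketch with hypothetical names (trace-char-class, *next-state*, state-index, trace-states); it uses keywords instead of integer constants, records only the state entered after each character, and collects no words.

```lisp
;; Hypothetical names throughout; this only traces states, it collects nothing.
(defun trace-char-class (char)
  "Column index: 0 for space, 1 for double quote, 2 for anything else."
  (cond ((char= char #\Space) 0)
        ((char= char #\") 1)
        (t 2)))

;; Rows are the current state, columns the character class.
(defparameter *next-state*
  ;;   SPACE  QUOTE  OTHER
  #2A((:white :quote :read)     ; from :white
      (:white :quote :read)     ; from :read
      (:quote :white :quote)))  ; from :quote

(defun state-index (state)
  "Row index of STATE in *NEXT-STATE*."
  (position state '(:white :read :quote)))

(defun trace-states (string)
  "Return the list of states entered after each character of STRING."
  (loop with state = :white
        for ch across string
        do (setf state (aref *next-state*
                             (state-index state)
                             (trace-char-class ch)))
        collect state))

(trace-states "a \"b c\"")
;; => (:READ :WHITE :QUOTE :QUOTE :QUOTE :QUOTE :WHITE)
```

The space inside the quotes leaves the machine in the quote state, which is why the quoted text is not split there.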