
Equivalent of Python's shlex.split in Common Lisp

Posted: Tue Aug 31, 2010 3:16 am
by sh4r4d
Hi All,
Does anybody know the equivalent of
Python's shlex.split in Common Lisp?

Basically, I am looking for a way to split a string at whitespace, but ignore text in
quotes.

--
Regards
-sharad

Re: Equivalent of Python's shlex.split in Common Lisp

Posted: Tue Aug 31, 2010 3:06 pm
by gugamilare
That function is not part of standard Common Lisp. You could use the split-sequence library for that.

Re: Equivalent of Python's shlex.split in Common Lisp

Posted: Tue Aug 31, 2010 4:07 pm
by edgar-rft
I think the question is how a string like this:

"split this \"but do not split this\" in this string"

can be split into a result like this:

("split" "this" "but do not split this" "in" "this" "string")

AFAIK this cannot be done with SPLIT-SEQUENCE alone, or can it and I just don't know how?
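To make the difference concrete, here is a minimal sketch of plain space splitting in portable Common Lisp, roughly what SPLIT-SEQUENCE gives you when empty pieces are removed. The names space-char-p and split-on-spaces are made up for this illustration. Applied to the string above, it tears the quoted phrase apart:

```lisp
;; Hypothetical helpers, for illustration only.
(defun space-char-p (c)
  (char= c #\Space))

(defun split-on-spaces (string)
  "Split STRING at spaces, dropping empty pieces. Quotes get no special treatment."
  (loop for start = (position-if-not #'space-char-p string)
          then (position-if-not #'space-char-p string :start end)
        for end = (and start
                       (or (position-if #'space-char-p string :start start)
                           (length string)))
        while start
        collect (subseq string start end)))

(split-on-spaces "split this \"but do not split this\" in this string")
;; => ("split" "this" "\"but" "do" "not" "split" "this\"" "in" "this" "string")
```

Getting the desired result needs something that remembers whether it is currently inside quotes.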
sh4r4d wrote: Basically, I am looking for a way to split a string at whitespace, but ignore text in quotes.

Question: what exactly do you want to split? In Common Lisp, a string is always "text inside quotes", so your question sounds self-contradictory to me: a string that is ignored cannot be split. Did you mean what I wrote in the example above, or are you looking for something different?

Re: Equivalent of Python's shlex.split in Common Lisp

Posted: Tue Aug 31, 2010 5:28 pm
by Tom
I'd solve this using META-SEXP.

(defrule quoted? (&aux (quoted (make-char-accum))) ()
  (:* (:type white-space?))
  #\" (:+ (:not #\") (:char-push quoted)) #\"
  (:return quoted))

(defrule word? (&aux (word (make-char-accum))) ()
  (:* (:type (or white-space? newline?)))
  (:+ (:not (:type (or white-space? newline?)))
      (:char-push word))
  (:return word))

(defrule quoted-or-word? () ()
  (:rule (or quoted? word?)))

(defun shlex-split (string)
  "Return a list of words, treating double-quoted text as a single word."
  (loop with ctx = (create-parser-context string)
        for next = (quoted-or-word? ctx)
        while next collect next))

I'm starting to sound like a broken record.
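For anyone without META-SEXP at hand, the same grammar (skip whitespace, then match either a quoted phrase or a bare word) can be sketched as hand-written recursive-descent functions in standard Common Lisp. This is not META-SEXP's API; all function names below are hypothetical. Each matcher returns the matched text plus the index just past it, or NIL on failure:

```lisp
;; Hypothetical, library-free sketch of the grammar above.
(defun whitespace-char-p (c)
  (member c '(#\Space #\Tab #\Newline)))

(defun skip-whitespace (string pos)
  "Return the index of the first non-whitespace character at or after POS."
  (or (position-if-not #'whitespace-char-p string :start pos)
      (length string)))

(defun match-quoted (string pos)
  "Match a double-quoted phrase at POS; return the text between the quotes."
  (when (and (< pos (length string)) (char= (char string pos) #\"))
    (let ((close (position #\" string :start (1+ pos))))
      (when close
        (values (subseq string (1+ pos) close) (1+ close))))))

(defun match-word (string pos)
  "Match a non-empty run of non-whitespace characters at POS."
  (let ((end (or (position-if #'whitespace-char-p string :start pos)
                 (length string))))
    (when (> end pos)
      (values (subseq string pos end) end))))

(defun match-quoted-or-word (string pos)
  "Skip whitespace, then try a quoted phrase first and a bare word second."
  (let ((pos (skip-whitespace string pos)))
    (multiple-value-bind (text next) (match-quoted string pos)
      (if text
          (values text next)
          (match-word string pos)))))

(defun shlex-split-rd (string)
  (loop with pos = 0
        for (text next) = (multiple-value-list (match-quoted-or-word string pos))
        while text
        collect text
        do (setf pos next)))

(shlex-split-rd "split this \"but do not split this\" in this string")
;; => ("split" "this" "but do not split this" "in" "this" "string")
```

As with the rules above, a quote in the middle of a word is not treated specially, and an unterminated quote falls back to being read as part of a word.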

Cheers,

~ Tom

Re: Equivalent of Python's shlex.split in Common Lisp

Posted: Wed Sep 01, 2010 1:17 am
by ramarren
I would use my parser-combinators library, but that's mostly because I wrote it and kind of like it; it is definitely less tested and optimized than META-SEXP.

(defun whitespace* ()
  (gather-if-not* (complement
                   (alexandria:rcurry #'member '(#\Space #\Newline #\Tab)))
                  :result-type 'string
                  :accept-end t
                  :accept-empty t))

(defun word* ()
  (gather-if-not* (complement #'alphanumericp) :result-type 'string :accept-end t :accept-empty nil))

(defun quoted* ()
  (named-seq* (<- c1 (context?))
              #\"
              ;; handle quotation marks escaped with backslash
              (many*
               (choice1 (gather-if-not* (alexandria:rcurry #'member '(#\" #\\))
                                       :result-type nil)
                        "\\\""))
              #\"
              (<- c2 (context?))
              (context-interval c1 c2)))

(defun quoted-or-word* ()
  (named-seq* (whitespace*)
              (<- datum (choice1 (quoted*) (word*)))
              datum))

(defun shlex-split* ()
  (named-seq* (<- data (many* (quoted-or-word*)))
              (whitespace*) ;discard terminating whitespace
              data))

Writing this made me notice a couple of bugs in the library, too; I haven't used it that much, especially the semi-optimized gather-if-not* function.
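The idea behind parser combinators can be shown without any library: a parser is just a function, and combinators build bigger parsers out of smaller ones. The following is a minimal sketch under that assumption; the pc- names and the token grammar are hypothetical and are not the parser-combinators library's API. Unlike quoted* above, it does not handle backslash-escaped quotes, and it simply stops at an unmatched quote:

```lisp
;; A parser is a function of (STRING POS) that returns (RESULT . NEW-POS)
;; on success or NIL on failure. All names here are hypothetical.

(defun pc-chars-while (pred &key (min 0))
  "Parser: consume characters satisfying PRED; fail if fewer than MIN match."
  (lambda (string pos)
    (let ((end (or (position-if-not pred string :start pos) (length string))))
      (when (>= (- end pos) min)
        (cons (subseq string pos end) end)))))

(defun pc-char (ch)
  "Parser: match exactly the character CH."
  (lambda (string pos)
    (when (and (< pos (length string)) (char= (char string pos) ch))
      (cons ch (1+ pos)))))

(defun pc-seq (&rest parsers)
  "Parser: run PARSERS in order; succeed with the list of their results."
  (lambda (string pos)
    (let ((results '()))
      (dolist (p parsers (cons (nreverse results) pos))
        (let ((r (funcall p string pos)))
          (unless r (return nil))        ; any failure fails the sequence
          (push (car r) results)
          (setf pos (cdr r)))))))

(defun pc-choice (&rest parsers)
  "Parser: return the result of the first parser in PARSERS that succeeds."
  (lambda (string pos)
    (some (lambda (p) (funcall p string pos)) parsers)))

(defun pc-many (parser)
  "Parser: apply PARSER zero or more times and collect the results."
  (lambda (string pos)
    (loop for r = (funcall parser string pos)
          while (and r (> (cdr r) pos))  ; stop on failure or no progress
          collect (car r) into results
          do (setf pos (cdr r))
          finally (return (cons results pos)))))

(defun pc-map (fn parser)
  "Parser: transform PARSER's result with FN."
  (lambda (string pos)
    (let ((r (funcall parser string pos)))
      (when r (cons (funcall fn (car r)) (cdr r))))))

(defun ws-p (c) (member c '(#\Space #\Tab #\Newline)))

(defparameter *spaces* (pc-chars-while #'ws-p))
(defparameter *word*
  (pc-chars-while (lambda (c) (not (or (ws-p c) (char= c #\")))) :min 1))
(defparameter *quoted*                   ; keep only the text between the quotes
  (pc-map #'second (pc-seq (pc-char #\")
                           (pc-chars-while (lambda (c) (char/= c #\")))
                           (pc-char #\"))))
(defparameter *token*                    ; leading whitespace, then quoted or word
  (pc-map #'second (pc-seq *spaces* (pc-choice *quoted* *word*))))

(defun shlex-split-pc (string)
  (car (funcall (pc-many *token*) string 0)))

(shlex-split-pc "split this \"but do not split this\" in this string")
;; => ("split" "this" "but do not split this" "in" "this" "string")
```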

Re: Equivalent of Python's shlex.split in Common Lisp

Posted: Wed Sep 01, 2010 11:59 pm
by Warren Wilkinson
I actually wrote a blog post on writing parsers: http://formlis.wordpress.com/2010/07/07 ... t-regexps/.

This operation is non-trivial; there are three states: Reading Text, Skipping Whitespace, and Handling Quotes. There are also three different character classes: regular characters, spaces, and quotes. A state machine can be represented as a matrix of functions, with one function per combination of state and character class. The string is processed one character at a time: the character class is determined by the input character, while the machine state is a variable that changes throughout the computation.

This idea is embodied in the code I've provided. The only trick is that I've separated the "Action to Run" from the "State to Transition To", so each table entry is an action followed by the next state.

(defun char-class (char)
  "Return 0 for a space, 1 for a double quote and 2 for any other character."
  (cond ((char= char #\Space) 0)
        ((char= char #\") 1)
        (t 2)))

(defconstant +white-mode+ 0)
(defconstant +read-mode+ 1)
(defconstant +quote-mode+ 2)

;; Each collected token is a list (START END) of indices into the input string.
(defvar *collected*)
(defun skip (pos) (declare (ignore pos)))     ; ignore this character
(defun collect (pos) (declare (ignore pos)))  ; character belongs to the open token
(defun startq (pos) (push (list (1+ pos) (1+ pos)) *collected*)) ; token begins after the quote
(defun startw (pos) (push (list pos pos) *collected*))           ; token begins at this character
(defun finish (pos) (setf (second (car *collected*)) pos))       ; close the open token
(defun reopen (pos)     ; quote right after a word:
  (finish pos)          ; close the word,
  (startq pos))         ; then open a quoted token

;; Transition table: 3 states x 3 character classes, each cell holding
;; the action to run followed by the state to transition to.
(defvar *sm* (make-array 18 :initial-contents
  ;;     SPACE                   QUOTE                   OTHER
  (list #'skip    +white-mode+  #'startq +quote-mode+  #'startw  +read-mode+    ; WHITEMODE
        #'finish  +white-mode+  #'reopen +quote-mode+  #'collect +read-mode+    ; READMODE
        #'collect +quote-mode+  #'finish +white-mode+  #'collect +quote-mode+))) ; QUOTEMODE

(defun shlex-split (string)
  (let ((*collected* nil))  ; fresh token list for each call
    (loop with state = +white-mode+
          for i upfrom 0
          for c across string
          do (let ((offset (+ (* state 6) (* 2 (char-class c)))))
               (funcall (svref *sm* offset) i)           ; run the action
               (setf state (svref *sm* (1+ offset))))    ; switch state
          ;; close a word or quote still open at the end of the string
          finally (unless (= state +white-mode+) (finish (length string))))
    (mapcar (lambda (a) (apply #'subseq string a)) (nreverse *collected*))))

(shlex-split "independent single\"compound word\" alone selfish friendless \"buddy words\"")
;; Results: ("independent" "single" "compound word" "alone" "selfish" "friendless" "buddy words")

This may not be easy to read, but the operation you described is complex. My blog entry, and its link to the original paper on this technique in Forth, may help you understand how this parser works, as well as teach you an important programming technique.
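If the function table is hard to follow, the same three-state machine can be written with explicit CASE dispatch on the current state. This is a sketch of an alternative presentation, not the method from the blog post; like the table version, it treats only #\Space as a separator and lets #\" switch into and out of quote mode. The name shlex-split-case is hypothetical:

```lisp
;; Hypothetical CASE-based version of the same three-state machine.
;; The current token is accumulated in an adjustable string buffer.
(defun shlex-split-case (string)
  (let ((state :white)
        (tokens '())
        (buffer (make-array 0 :element-type 'character
                              :adjustable t :fill-pointer 0)))
    (flet ((emit ()                       ; close the current token
             (push (copy-seq buffer) tokens)
             (setf (fill-pointer buffer) 0)))
      (loop for c across string
            do (ecase state
                 (:white (case c
                           (#\Space)                   ; keep skipping spaces
                           (#\" (setf state :quote))   ; open a quoted token
                           (t (vector-push-extend c buffer)
                              (setf state :read))))    ; start a word
                 (:read (case c
                          (#\Space (emit) (setf state :white))
                          (#\" (emit) (setf state :quote)) ; word then quote
                          (t (vector-push-extend c buffer))))
                 (:quote (case c
                           (#\" (emit) (setf state :white))
                           (t (vector-push-extend c buffer))))))
      ;; A word or an unterminated quote may still be open at the end.
      (unless (eq state :white) (emit)))
    (nreverse tokens)))

(shlex-split-case "independent single\"compound word\" alone selfish friendless \"buddy words\"")
;; => ("independent" "single" "compound word" "alone" "selfish" "friendless" "buddy words")
```

The trade-off is that adding a new state or character class means editing the CASE forms rather than extending a table.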