Equivalent of Python's shlex.split in Common Lisp

sh4r4d
Posts: 1
Joined: Tue Aug 31, 2010 3:09 am

Equivalent of Python's shlex.split in Common Lisp

Post by sh4r4d » Tue Aug 31, 2010 3:16 am

Hi all,
Does anybody know the equivalent of
shlex.split from Python in Common Lisp?

Basically, I am looking for a way to split a string at whitespace, but ignore text in
quotes.

--
Regards
-sharad

gugamilare
Posts: 406
Joined: Sat Mar 07, 2009 6:17 pm
Location: Brazil

Re: Equivalent of Python's shlex.split in Common Lisp

Post by gugamilare » Tue Aug 31, 2010 3:06 pm

That function does not come with standard Common Lisp. You may use split-sequence for that.

edgar-rft
Posts: 226
Joined: Fri Aug 06, 2010 6:34 am
Location: Germany

Re: Equivalent of Python's shlex.split in Common Lisp

Post by edgar-rft » Tue Aug 31, 2010 4:07 pm

I think the question is how a string like:

Code:

"split this \"but do not split this\" in this string"

can be split into a result like:

Code:

("split" "this" "but do not split this" "in" "this" "string")

AFAIK this cannot be done with SPLIT-SEQUENCE, or can it, and I just don't know how?

sh4r4d wrote: Basically, I am looking for a way to split a string at whitespace, but ignore text in quotes.

Question: what exactly do you want to split? In Common Lisp, a string is always "text inside quotes", so your question sounds self-contradictory to me. A string that is ignored cannot be split. Did you mean something like the example above, or are you looking for something different?
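
For reference, here is a minimal sketch of that behaviour in plain Common Lisp, without any library. The function name split-respecting-quotes is made up for this example. It assumes that only double quotes group text and that backslash escapes and single quotes need no handling, so it covers just a small part of what shlex.split does.

```lisp
;; Hypothetical helper, not taken from any library: split STRING at
;; whitespace, but keep text between double quotes together as one token.
;; Assumption: no backslash escapes and no single-quote handling.
(defun split-respecting-quotes (string)
  (let ((tokens '())
        (current (make-string-output-stream))
        (in-token nil)     ; true once the current token has started
        (in-quotes nil))   ; true while inside a "..." section
    (flet ((finish-token ()
             ;; GET-OUTPUT-STREAM-STRING returns the collected characters
             ;; and resets the stream for the next token.
             (when in-token
               (push (get-output-stream-string current) tokens)
               (setf in-token nil))))
      (loop for char across string
            do (cond ((char= char #\")
                      ;; A quote toggles quoted mode and is not copied.
                      (setf in-quotes (not in-quotes)
                            in-token t))
                     ((and (not in-quotes)
                           (member char '(#\Space #\Tab #\Newline)))
                      (finish-token))
                     (t
                      (write-char char current)
                      (setf in-token t))))
      ;; Flush the last token, including an unterminated quoted one.
      (finish-token))
    (nreverse tokens)))
```

Called on the example string above, it should return the six-element list shown there. Note that a word directly followed by a quoted section, as in a"b c", ends up as a single token in this sketch, which as far as I know is also what shlex.split does.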

Tom
Posts: 22
Joined: Sat Jun 28, 2008 12:52 pm
Location: Wichita, KS

Re: Equivalent of Python's shlex.split in Common Lisp

Post by Tom » Tue Aug 31, 2010 5:28 pm

I'd solve this using META-SEXP.

Code:

(defrule quoted? (&aux (quoted (make-char-accum))) ()
  (:* (:type white-space?))
  #\" (:+ (:not #\") (:char-push quoted)) #\"
  (:return quoted))

(defrule word? (&aux (word (make-char-accum))) ()
  (:* (:type (or white-space? newline?)))
  (:+ (:not (:type (or white-space? newline?)))
      (:char-push word))
  (:return word))

(defrule quoted-or-word? () ()
  (:rule (or quoted? word?)))

(defun shlex-split (string)
  "Return a list of words respecting escaped quotes."
  (loop with ctx = (create-parser-context string)
        for next = (quoted-or-word? ctx)
        while next collect next))

I'm starting to sound like a broken record.

Cheers,

~ Tom

ramarren
Posts: 613
Joined: Sun Jun 29, 2008 4:02 am
Location: Warsaw, Poland

Re: Equivalent of Python's shlex.split in Common Lisp

Post by ramarren » Wed Sep 01, 2010 1:17 am

I would use my parser-combinators library, but that's mostly because I wrote it and kind of like it. It is definitely less tested and optimized than META-SEXP.

Code:

(defun whitespace* ()
  (gather-if-not* (complement
                   (alexandria:rcurry #'member '(#\Space #\Newline #\Tab)))
                  :result-type 'string
                  :accept-end t
                  :accept-empty t))

(defun word* ()
  (gather-if-not* (complement #'alphanumericp) :result-type 'string :accept-end t :accept-empty nil))

(defun quoted* ()
  (named-seq* (<- c1 (context?))
              #\"
              ;; handle quotation marks escaped with backslash
              (many*
               (choice1 (gather-if-not* (alexandria:rcurry #'member '(#\" #\\))
                                       :result-type nil)
                        "\\\""))
              #\"
              (<- c2 (context?))
              (context-interval c1 c2)))

(defun quoted-or-word* ()
  (named-seq* (whitespace*)
              (<- datum (choice1 (quoted*) (word*)))
              datum))

(defun shlex-split* ()
  (named-seq* (<- data (many* (quoted-or-word*)))
              (whitespace*) ;discard terminating whitespace
              data))

Writing this made me notice a couple of bugs, too... I haven't used the library that much, especially the semi-optimized gather-if-not* function.

Warren Wilkinson
Posts: 117
Joined: Tue Aug 10, 2010 11:24 pm
Location: Calgary, Alberta

Re: Equivalent of Python's shlex.split in Common Lisp

Post by Warren Wilkinson » Wed Sep 01, 2010 11:59 pm

I actually wrote a blog post on writing parsers: http://formlis.wordpress.com/2010/07/07 ... t-regexps/.

This operation is non-trivial; there are three states: Reading Text, Skipping Whitespace, and Handling Quotes. There are also three different character classes: Regular characters, Spaces, and Quotes. A state machine is a matrix of functions, with one function per combination of state and character class. The string is processed one character at a time: the character class is determined by the input character, while the machine state is a variable that changes throughout the computation.

This idea is embodied in the code I've provided. The only trick is that I've separated the "Action to Run" from the "State to Transition To".

Code:

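;; Map a character to its class: 0 = space, 1 = double quote, 2 = anything else.
(defun char-class (char)
  (cond ((char= char #\Space) 0)
        ((char= char #\") 1)
        (t 2)))

(defconstant +white-mode+ 0)  ; skipping whitespace
(defconstant +read-mode+ 1)   ; inside an unquoted word
(defconstant +quote-mode+ 2)  ; inside a quoted section

;; Token boundaries, newest first, as (start end) lists of string indices.
(defvar *collected*)

;; Actions: each one receives the position of the current character.
(defun skip (pos) (declare (ignore pos)))
(defun collect (pos) (declare (ignore pos)))
;; Start a quoted token just after the opening quote.
(defun startq (pos) (push (list (1+ pos) (1+ pos)) *collected*))
;; Start an unquoted word at the current character.
(defun startw (pos) (push (list pos pos) *collected*))
;; Close the current token at POS (exclusive end).
(defun finish (pos) (setf (second (car *collected*)) pos))
;; A quote directly after a word: close the word, then start a quoted token.
(defun reopen (pos)
  (setf (second (car *collected*)) pos)
  (push (list (1+ pos) (1+ pos)) *collected*))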

(defvar *sm* (make-array 18 :initial-contents
     ;;        SPACE                     QUOTE                   OTHER
   (list #'skip    +white-mode+    #'startq +quote-mode+    #'startw  +read-mode+    ;; WHITEMODE
         #'finish  +white-mode+    #'reopen +quote-mode+    #'collect +read-mode+    ;; READMODE
         #'collect +quote-mode+    #'finish +white-mode+    #'collect +quote-mode+)));; QUOTEMODE

(defun shlex-split (string)
  (setf *collected* nil)
  (loop for i upfrom 0
        for c across string
        with state = +white-mode+
        do (let ((offset (+ (* state 6) (* 2 (char-class c)))))
             (funcall (svref *sm* offset) i)
             (setf state (svref *sm* (1+ offset))))
        finally (unless (= state +white-mode+) (finish (length string))))
  (mapcar #'(lambda (a) (apply #'subseq string a)) (nreverse *collected*)))

(shlex-split "independent single\"compound word\" alone selfish friendless \"buddy words\"")
;; Results: ("independent" "single" "compound word" "alone" "selfish" "friendless" "buddy words")

This may not be easy to read, but the operation you have described is complex. My blog entry, and its link to the original paper on this technique in Forth, may help you understand how this parser works, as well as teach you an important programming technique.
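
To make the "matrix" idea easier to see, here is a small sketch that stores only the transitions in a real 3x3 array, indexed by state and character class. The names *next-state*, state-index, class-index and trace-states are invented for this illustration, and the actions are left out on purpose, so this traces the machine rather than splitting anything.

```lisp
;; Hypothetical sketch: the transitions from the table above as a 3x3
;; array indexed by (state, character class). Only the next state is
;; stored; the actions are omitted to keep the lookup visible.
(defparameter *next-state*
  (make-array '(3 3)
              :initial-contents
              ;;  SPACE   QUOTE   OTHER
              '((:white :quote :read)      ; from :white
                (:white :quote :read)      ; from :read
                (:quote :white :quote))))  ; from :quote

;; Row index of a state keyword.
(defun state-index (state)
  (position state '(:white :read :quote)))

;; Column index of a character: 0 = space, 1 = double quote, 2 = other.
(defun class-index (char)
  (cond ((char= char #\Space) 0)
        ((char= char #\") 1)
        (t 2)))

(defun trace-states (string)
  "Return the list of states entered after each character of STRING."
  (loop with state = :white
        for char across string
        do (setf state (aref *next-state*
                             (state-index state)
                             (class-index char)))
        collect state))

;; (trace-states "a \"b c\"")
;; => (:READ :WHITE :QUOTE :QUOTE :QUOTE :QUOTE :WHITE)
```

The flat 18-element vector in my code is the same table with an action stored next to each state, which is why the offset there is computed as state times 6 plus 2 times the character class.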
