Equivalent of shlex.split of Python in Common Lisp
Hi All,
Does anybody know the equivalent of
shlex.split from Python in Common Lisp?
Basically I am looking for a way to split a string at whitespace, but ignore text in
quotes.
--
Regards
-sharad
Re: Equivalent of shlex.split of Python in Common Lisp
That function does not come with standard Common Lisp. You may use split-sequence for that.
Re: Equivalent of shlex.split of Python in Common Lisp
I think the question was how a string like:
Code: Select all
"split this \"but do not split this\" in this string"
can be split into a result like:
Code: Select all
("split" "this" "but do not split this" "in" "this" "string")
AFAIK this cannot be done with SPLIT-SEQUENCE, or can it and I just don't know?
sh4r4d wrote: Basically I am looking for a way to split a string at whitespace, but ignore text in quotes.
Question: what exactly do you want to split? In Common Lisp, a string is always "text inside quotes", so your question sounds self-contradictory to me. A string that is ignored cannot be split. Did you mean something like my example above, or are you looking for something different?
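A minimal sketch in plain Common Lisp, assuming only double quotes matter and no backslash escapes are needed, could look like the following. The name split-respecting-quotes is made up for illustration; it does not come from any library, and it is not a full port of shlex.split.

```lisp
;; Hypothetical helper (not from any library): split STRING at whitespace,
;; keeping text between double quotes together. Quotes themselves are
;; dropped; backslash escapes are NOT handled.
(defun split-respecting-quotes (string)
  (let ((tokens '())
        (current (make-string-output-stream))
        (in-quotes nil)
        (have-token nil))
    (flet ((flush ()
             ;; Emit the token collected so far, if any.
             (when have-token
               (push (get-output-stream-string current) tokens)
               (setf have-token nil))))
      (loop for c across string
            do (cond ((char= c #\")
                      ;; A quote toggles quoted mode; "" yields an empty token.
                      (setf in-quotes (not in-quotes)
                            have-token t))
                     ((and (not in-quotes)
                           (member c '(#\Space #\Tab #\Newline)))
                      ;; Unquoted whitespace ends the current token.
                      (flush))
                     (t
                      (write-char c current)
                      (setf have-token t))))
      (flush))
    (nreverse tokens)))
```

Calling (split-respecting-quotes "split this \"but do not split this\" in this string") gives the list from the example above. Note that in this sketch a quoted part directly attached to a word, as in single"compound word", stays part of the same token.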
Re: Equivalent of shlex.split of Python in Common Lisp
I'd solve this using META-SEXP.
I'm starting to sound like a broken record.
Code: Select all
(defrule quoted? (&aux (quoted (make-char-accum))) ()
  (:* (:type white-space?))
  #\" (:+ (:not #\") (:char-push quoted)) #\"
  (:return quoted))

(defrule word? (&aux (word (make-char-accum))) ()
  (:* (:type (or white-space? newline?)))
  (:+ (:not (:type (or white-space? newline?)))
      (:char-push word))
  (:return word))

(defrule quoted-or-word? () ()
  (:rule (or quoted? word?)))

(defun shlex-split (string)
  "Return a list of words respecting escaped quotes."
  (loop with ctx = (create-parser-context string)
        for next = (quoted-or-word? ctx)
        while next collect next))
Cheers,
~ Tom
Thomas M. Hermann
Odonata Research LLC
http://www.odonata-research.com/
http://www.linkedin.com/in/thomasmhermann
Re: Equivalent of shlex.split of Python in Common Lisp
I would use my parser-combinators library, but that's mostly because I wrote it and kind of like it. It is definitely less tested and optimized than META-SEXP.
Writing this made me notice a couple of bugs, too. I haven't used the library that much, especially the semi-optimized gather-if-not* function.
Code: Select all
(defun whitespace* ()
  (gather-if-not* (complement
                   (alexandria:rcurry #'member '(#\Space #\Newline #\Tab)))
                  :result-type 'string
                  :accept-end t
                  :accept-empty t))

(defun word* ()
  (gather-if-not* (complement #'alphanumericp) :result-type 'string :accept-end t :accept-empty nil))

(defun quoted* ()
  (named-seq* (<- c1 (context?))
              #\"
              ;; handle quotation marks escaped with backslash
              (many*
               (choice1 (gather-if-not* (alexandria:rcurry #'member '(#\" #\\))
                                        :result-type nil)
                        "\\\""))
              #\"
              (<- c2 (context?))
              (context-interval c1 c2)))

(defun quoted-or-word* ()
  (named-seq* (whitespace*)
              (<- datum (choice1 (quoted*) (word*)))
              datum))

(defun shlex-split* ()
  (named-seq* (<- data (many* (quoted-or-word*)))
              (whitespace*) ;discard terminating whitespace
              data))
Re: Equivalent of shlex.split of Python in Common Lisp
I actually wrote a blog post on writing parsers: http://formlis.wordpress.com/2010/07/07 ... t-regexps/.
This operation is non-trivial; there are three states: Reading Text, Skipping Whitespace, and Handling Quotes. There are also three different character classes: Regular characters, Spaces, and Quotes. A state machine is a matrix of functions, with one function per combination of state and character class. The string is processed one character at a time: the character class is determined by the input character, while the machine state is a variable that changes throughout the computation.
This idea is embodied in the code I've provided. The only trick is that I've separated the "Action to Run" from the "State to Transition To".
This may not be easy to read, but the operation you have described is complex. My blog entry, and its link to the original paper on this technique in Forth, may help you understand how this parser works, as well as teach you an important programming technique.
Code: Select all
(defun char-class (char) (if (char-equal char #\Space) 0 (if (char-equal char #\") 1 2)))

(defconstant +white-mode+ 0)
(defconstant +read-mode+ 1)
(defconstant +quote-mode+ 2)

(defvar *collected*)

(defun skip (pos) (declare (ignore pos)))
(defun collect (pos) (declare (ignore pos)))
(defun startq (pos) (push (list (1+ pos) (1+ pos)) *collected*))
(defun startw (pos) (push (list pos pos) *collected*))
(defun finish (pos) (setf (second (car *collected*)) pos))
(defun reopen (pos)
  (setf (second (car *collected*)) pos)
  (push (list (1+ pos) (1+ pos)) *collected*))

(defvar *sm* (make-array 18 :initial-contents
  ;;         SPACE                  QUOTE                   OTHER
  (list #'skip    +white-mode+  #'startq +quote-mode+  #'startw  +read-mode+     ;; WHITEMODE
        #'finish  +white-mode+  #'reopen +quote-mode+  #'collect +read-mode+     ;; READMODE
        #'collect +quote-mode+  #'finish +white-mode+  #'collect +quote-mode+))) ;; QUOTEMODE

(defun shlex-split (string)
  (setf *collected* nil)
  (loop for i upfrom 0
        for c across string
        with state = +white-mode+
        do (let ((offset (+ (* state 6) (* 2 (char-class c)))))
             (funcall (svref *sm* offset) i)
             (setf state (svref *sm* (1+ offset))))
        finally (unless (= state +white-mode+) (finish (length string))))
  (mapcar #'(lambda (a) (apply #'subseq string a)) (nreverse *collected*)))

(shlex-split "independent single\"compound word\" alone selfish friendless \"buddy words\"")
;; Results: ("independent" "single" "compound word" "alone" "selfish" "friendless" "buddy words")
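To make the "matrix" idea a bit more concrete, here is a small sketch that stores the same transitions in a two-dimensional array and only records which action would fire for each character. The names classify, *table*, and trace-actions are hypothetical, and keywords stand in for the real action functions; it only illustrates the lookup and is not meant as a replacement for the code above.

```lisp
;; Sketch only: these names are made up for illustration.
(defun classify (char)
  "Return the column index for CHAR: 0 space, 1 quote, 2 anything else."
  (cond ((char= char #\Space) 0)
        ((char= char #\") 1)
        (t 2)))

;; Rows are states (0 white, 1 read, 2 quote); columns are character classes.
;; Each cell is (ACTION . NEXT-STATE), mirroring the flat table above.
(defparameter *table*
  (make-array '(3 3)
              :initial-contents
              '(((:skip . 0)    (:startq . 2) (:startw . 1))
                ((:finish . 0)  (:reopen . 2) (:collect . 1))
                ((:collect . 2) (:finish . 0) (:collect . 2)))))

(defun trace-actions (string)
  "Return the list of action keywords selected while reading STRING."
  (let ((state 0))
    (loop for c across string
          for cell = (aref *table* state (classify c))
          do (setf state (cdr cell))   ; move to the next state
          collect (car cell))))        ; remember which action was chosen
```

For the input "a \"b\"" (the characters a, space, quote, b, quote) this returns (:STARTW :FINISH :STARTQ :COLLECT :FINISH): one cell is looked up per character, and the row used for the next lookup is whatever state the previous cell named.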
Need an online wiki database? My Lisp startup http://www.formlis.com combines a wiki with forms and reports.