diff --git a/ExampleStructure.png b/ExampleStructure.png
deleted file mode 100644
index eaff47f..0000000
Binary files a/ExampleStructure.png and /dev/null differ
diff --git a/LICENSE b/LICENSE
deleted file mode 100644
index 0e259d4..0000000
--- a/LICENSE
+++ /dev/null
@@ -1,121 +0,0 @@
-Creative Commons Legal Code
-
-CC0 1.0 Universal
-
- CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
- LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
- ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
- INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
- REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
- PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
- THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
- HEREUNDER.
-
-Statement of Purpose
-
-The laws of most jurisdictions throughout the world automatically confer
-exclusive Copyright and Related Rights (defined below) upon the creator
-and subsequent owner(s) (each and all, an "owner") of an original work of
-authorship and/or a database (each, a "Work").
-
-Certain owners wish to permanently relinquish those rights to a Work for
-the purpose of contributing to a commons of creative, cultural and
-scientific works ("Commons") that the public can reliably and without fear
-of later claims of infringement build upon, modify, incorporate in other
-works, reuse and redistribute as freely as possible in any form whatsoever
-and for any purposes, including without limitation commercial purposes.
-These owners may contribute to the Commons to promote the ideal of a free
-culture and the further production of creative, cultural and scientific
-works, or to gain reputation or greater distribution for their Work in
-part through the use and efforts of others.
-
-For these and/or other purposes and motivations, and without any
-expectation of additional consideration or compensation, the person
-associating CC0 with a Work (the "Affirmer"), to the extent that he or she
-is an owner of Copyright and Related Rights in the Work, voluntarily
-elects to apply CC0 to the Work and publicly distribute the Work under its
-terms, with knowledge of his or her Copyright and Related Rights in the
-Work and the meaning and intended legal effect of CC0 on those rights.
-
-1. Copyright and Related Rights. A Work made available under CC0 may be
-protected by copyright and related or neighboring rights ("Copyright and
-Related Rights"). Copyright and Related Rights include, but are not
-limited to, the following:
-
- i. the right to reproduce, adapt, distribute, perform, display,
- communicate, and translate a Work;
- ii. moral rights retained by the original author(s) and/or performer(s);
-iii. publicity and privacy rights pertaining to a person's image or
- likeness depicted in a Work;
- iv. rights protecting against unfair competition in regards to a Work,
- subject to the limitations in paragraph 4(a), below;
- v. rights protecting the extraction, dissemination, use and reuse of data
- in a Work;
- vi. database rights (such as those arising under Directive 96/9/EC of the
- European Parliament and of the Council of 11 March 1996 on the legal
- protection of databases, and under any national implementation
- thereof, including any amended or successor version of such
- directive); and
-vii. other similar, equivalent or corresponding rights throughout the
- world based on applicable law or treaty, and any national
- implementations thereof.
-
-2. Waiver. To the greatest extent permitted by, but not in contravention
-of, applicable law, Affirmer hereby overtly, fully, permanently,
-irrevocably and unconditionally waives, abandons, and surrenders all of
-Affirmer's Copyright and Related Rights and associated claims and causes
-of action, whether now known or unknown (including existing as well as
-future claims and causes of action), in the Work (i) in all territories
-worldwide, (ii) for the maximum duration provided by applicable law or
-treaty (including future time extensions), (iii) in any current or future
-medium and for any number of copies, and (iv) for any purpose whatsoever,
-including without limitation commercial, advertising or promotional
-purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
-member of the public at large and to the detriment of Affirmer's heirs and
-successors, fully intending that such Waiver shall not be subject to
-revocation, rescission, cancellation, termination, or any other legal or
-equitable action to disrupt the quiet enjoyment of the Work by the public
-as contemplated by Affirmer's express Statement of Purpose.
-
-3. Public License Fallback. Should any part of the Waiver for any reason
-be judged legally invalid or ineffective under applicable law, then the
-Waiver shall be preserved to the maximum extent permitted taking into
-account Affirmer's express Statement of Purpose. In addition, to the
-extent the Waiver is so judged Affirmer hereby grants to each affected
-person a royalty-free, non transferable, non sublicensable, non exclusive,
-irrevocable and unconditional license to exercise Affirmer's Copyright and
-Related Rights in the Work (i) in all territories worldwide, (ii) for the
-maximum duration provided by applicable law or treaty (including future
-time extensions), (iii) in any current or future medium and for any number
-of copies, and (iv) for any purpose whatsoever, including without
-limitation commercial, advertising or promotional purposes (the
-"License"). The License shall be deemed effective as of the date CC0 was
-applied by Affirmer to the Work. Should any part of the License for any
-reason be judged legally invalid or ineffective under applicable law, such
-partial invalidity or ineffectiveness shall not invalidate the remainder
-of the License, and in such case Affirmer hereby affirms that he or she
-will not (i) exercise any of his or her remaining Copyright and Related
-Rights in the Work or (ii) assert any associated claims and causes of
-action with respect to the Work, in either case contrary to Affirmer's
-express Statement of Purpose.
-
-4. Limitations and Disclaimers.
-
- a. No trademark or patent rights held by Affirmer are waived, abandoned,
- surrendered, licensed or otherwise affected by this document.
- b. Affirmer offers the Work as-is and makes no representations or
- warranties of any kind concerning the Work, express, implied,
- statutory or otherwise, including without limitation warranties of
- title, merchantability, fitness for a particular purpose, non
- infringement, or the absence of latent or other defects, accuracy, or
- the present or absence of errors, whether or not discoverable, all to
- the greatest extent permissible under applicable law.
- c. Affirmer disclaims responsibility for clearing rights of other persons
- that may apply to the Work or any use thereof, including without
- limitation any person's Copyright and Related Rights in the Work.
- Further, Affirmer disclaims responsibility for obtaining any necessary
- consents, permissions or other rights required for any use of the
- Work.
- d. Affirmer understands and acknowledges that Creative Commons is not a
- party to this document and has no duty or obligation with respect to
- this CC0 or use of the Work.
diff --git a/figure/ExampleStructure.pages b/figure/ExampleStructure.pages
deleted file mode 100755
index 1586a4a..0000000
Binary files a/figure/ExampleStructure.pages and /dev/null differ
diff --git a/figure/ExampleStructure.pdf b/figure/ExampleStructure.pdf
deleted file mode 100644
index 5c67bf0..0000000
Binary files a/figure/ExampleStructure.pdf and /dev/null differ
diff --git a/readme conversions/langsci-gb4e.sty b/readme conversions/langsci-gb4e.sty
deleted file mode 100644
index 54bceb4..0000000
--- a/readme conversions/langsci-gb4e.sty
+++ /dev/null
@@ -1,828 +0,0 @@
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% File: langsci-gb4e.sty
-%% Author: Language Science Press (http://langsci-press.org)
-%% Date: 2020-03-17 13:12 UTC
-%% Purpose: This file contains an adapted version of the gb4e package
-%% for typetting linguistic examples. It also includes
-%% adapted versions of the cgloss and jambox packages
-%% Language: LaTeX
-%% Licence:
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\ProvidesPackage{langsci-gb4e}[2020/01/01]
-
-\usepackage{etoolbox}
-
-\newtoggle{cgloss}
-\toggletrue{cgloss}
-\newtoggle{jambox}
-\toggletrue{jambox}
-\DeclareOption{nocgloss}{\togglefalse{cgloss}}
-\DeclareOption{nojambox}{\togglefalse{jambox}}
-\DeclareOption*{\PackageWarning{examplepackage}{Unknown option ‘\CurrentOption’}}
-\ProcessOptions\relax
-
-% \def\gbVersion{4e}
-
-%%%%%%%%%%%%%%%%%%%%%%%%
-% Format of examples: %
-%%%%%%%%%%%%%%%%%%%%%%%%
-% \begin{exe} or \exbegin
-% (arab.)
-% \begin{xlist} or \xlist
-% (1st embedding, alph.)
-% \begin{xlisti} or \xlisti
-% (2st embedding, rom.)
-% \end{xlisti} or \endxlisti
-%
-% \end{xlist} or \endxlist
-%
-% \end{exe} or \exend
-%
-% Other sublist-styles: xlistA (Alph.), xlistI (Rom.), xlistn (arab)
-%
-% \ex (produces Number)
-% \ex (numbered example)
-% \ex[jdgmt]{sentence} (numbered example with judgement)
-%
-% \exi{ident} (produces identifier)
-% \exi{ident} (example numbered with identifier)
-% \exi{ident}[jdgmt]{sentence} (dito with judgement)
-% (\exr, \exp and \sn are defined in terms of \exi)
-%
-% \exr{label} (produces cross-referenced Num.)
-% \exr{label} (cross-referenced example)
-% \exr{label}[jdgmt]{sentence} (cross-referenced example with judgement)
-%
-% \exp{label} (same as
-% \exp{label} \exr but
-% \exp{label}[jdgmt]{sentence} with prime)
-%
-% \sn (unnumbered example)
-% \sn[jdgmt]{sentence} (unnumbered example with judgement)
-%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% For my own lazyness (HANDLE WITH CARE---this works only
-% in boringly normal cases....):
-%
-% \ea works like \begin{exe}\ex or \begin{xlist}\ex,
-% depending on context
-% \z works like \end{exe} or \end{xlist}, dep on context
-%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-%CGLOSS META
-% Modified version of cgloss4e.sty. Hacked and renamed cgloss.sty
-% by Alexis Dimitriadis (alexis@babel.ling.upenn.edu). Integrated into
-% langsci-gb4e.sty by Sebastian Nordhoff
-% EnD CGLOSS META
-
-
-
-\@ifundefined{new@fontshape}{\def\reset@font{}\let\mathrm\rm\let\mathit\mit}{}
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-% %%
-% Font Specifications %%
-% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-% Define commands for fonts to be used:
-%
-% 1) regular
-% a. example line
-\newcommand{\exfont}{\normalsize\upshape}
-% b. glossing line
-\newcommand{\glossfont}{\normalsize\upshape}
-% c. translation font
-\newcommand{\transfont}{\normalsize\upshape}
-% d. example number
-\newcommand{\exnrfont}{\exfont\upshape}
-%
-% 2) in footnote
-% a. example line
-\newcommand{\fnexfont}{\footnotesize\upshape}
-% b. glossing line
-\newcommand{\fnglossfont}{\footnotesize\upshape}
-% c. translation font
-\newcommand{\fntransfont}{\footnotesize\upshape}
-% d. example number
-\newcommand{\fnexnrfont}{\fnexfont\upshape}
-
-\newcommand{\examplesroman}{
- \let\eachwordone=\upshape
- \exfont{\upshape}
-}
-\newcommand{\examplesitalics}{
- \let\eachwordone=\itshape
- \exfont{\itshape}
-}
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% %%
-%% Macros for examples, roughly following Linguistic Inquiry style. %%
-%% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\def\qlist{\begin{list}{\Alph{xnum}.}{\usecounter{xnum}%
-\setlength{\rightmargin}{\leftmargin}}}
-\def\endqlist{\end{list}}
-
-\newif\if@noftnote\@noftnotetrue
-\newif\if@xrec\@xrecfalse
-\@definecounter{fnx}
-
-% set a flag that we are in footnotes now and change the size of example fonts
-\let\oldFootnotetext\@footnotetext
-
-\renewcommand\@footnotetext[1]{%
- \@noftnotefalse\setcounter{fnx}{0}%
-\begingroup%
-\let\exfont\fnexfont%
-\let\glossfont\fnglossfont%
-\let\transfont\fntransfont%
-\let\exnrfont\fnexnrfont%
- \oldFootnotetext{#1}%
-\endgroup%
-\@noftnotetrue}
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% %%
-%% counters %%
-%% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-% start counters with 1
-\newcount\@xnumdepth \@xnumdepth = 0
-
-% define four levels of indentation
-\@definecounter{xnumi}
-\@definecounter{xnumii}
-\@definecounter{xnumiii}
-\@definecounter{xnumiv}
-
-
-% use (1) on page, but (i) in footnotes
-\def\thexnumi
-{\if@noftnote%
-\@arabic\@xsi{xnumi}%
-\else%
-\@roman\@xsi{xnumi}%
-\fi%
-}
-\def\thexnumii{\@xsii{xnumii}}
-\def\thexnumiii{\@xsiii{xnumiii}}
-\def\thexnumiv{\@xsiv{xnumiv}}
-\def\p@xnumii{\thexnumi%
-\if@noftnote%
-\else%
-.%
-\fi}
-\def\p@xnumiii{\thexnumi\thexnumii-}
-\def\p@xnumiv{\thexnumi\thexnumii-\thexnumiii-}
-
-\def\xs@default#1{\csname @@xs#1\endcsname}
-\def\@@xsi{\let\@xsi\arabic}
-\def\@@xsii{\let\@xsii\alph}
-\def\@@xsiii{\let\@xsiii\roman}
-\def\@@xsiv{\let\@xsi\arabic}
-
-\@definecounter{rxnumi}
-\@definecounter{rxnumii}
-\@definecounter{rxnumiii}
-\@definecounter{rxnumiv}
-
-\def\save@counters{%
-\setcounter{rxnumi}{\value{xnumi}}%
-\setcounter{rxnumii}{\value{xnumii}}%
-\setcounter{rxnumiii}{\value{xnumiii}}%
-\setcounter{rxnumiv}{\value{xnumiv}}}%
-
-\def\reset@counters{%
-\setcounter{xnumi}{\value{rxnumi}}%
-\setcounter{xnumii}{\value{rxnumii}}%
-\setcounter{xnumiii}{\value{rxnumiii}}%
-\setcounter{xnumiv}{\value{rxnumiv}}}%
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% %%
-%% widths %%
-%% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-% Control the width of example identifiers
-\def\exewidth#1{\def\@exwidth{#1}}
-
-\newcommand{\twodigitexamples}{\exewidth{(23)}}
-\newcommand{\threedigitexamples}{\exewidth{(234)}}
-\newcommand{\fourdigitexamples}{\exewidth{(2345)}}
-
-\def\gblabelsep#1{\def\@gblabelsep{#1}}
-\gblabelsep{1em}
-
-\def\subexsep#1{\def\@subexsep{#1}}
-\subexsep{1.5ex}
-
-% set initial sizes of example number and judgement sizes
-\exewidth{\exnrfont (35)}
-
-% how much should examples in footnotes be indented?
-\newlength{\footexindent}
-\setlength{\footexindent}{0pt}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% %%
-%% example lists %%
-%% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\def\exe{%
- %\ifnum\value{equation}>9 \exewidth{(23)}\else\fi%
- %inserted by LangSci, for large example numbers
- \ifnum\value{equation}>98 \exewidth{(235)}\else\fi%
- \@ifnextchar [{\@exe}{\@exe[\@exwidth]}}
-
-\def\@exe[#1]{\ifnum \@xnumdepth >0%
- \if@xrec\@exrecwarn\fi%
- \if@noftnote\@exrecwarn\fi%
- \@xnumdepth0\@listdepth0\@xrectrue%
- \save@counters%
- \fi%
- \advance\@xnumdepth \@ne \@@xsi%
- \if@noftnote%
- \begin{list}{(\thexnumi)}%
- {\usecounter{xnumi}\@subex{#1}{\@gblabelsep}{0em}%
- \setcounter{xnumi}{\value{equation}}
- \nopagebreak}%
- \else%
- \begin{list}{(\roman{xnumi})}%
- {\usecounter{xnumi}\@subex{(iiv)}{\@gblabelsep}{\footexindent}%
- \setcounter{xnumi}{\value{fnx}}}%
- \fi}
-
-
-\def\endexe{\if@noftnote\setcounter{equation}{\value{xnumi}}%
- \else\setcounter{fnx}{\value{xnumi}}%
- \reset@counters\@xrecfalse\fi\end{list}}
-
-\def\@exrecwarn{\typeout{*** Recursion on "exe"---your
- example numbering will probably be screwed up!}}
-
-\def\xlist{\@ifnextchar [{\@xlist{}}{\@xlist{}[iv.]}}
-\def\xlista{\@ifnextchar [{\@xlist{\alph}}{\@xlist{\alph}[m.]}}
-\def\xlistabr{\@ifnextchar [{\@xlist{(\alph)}}{\@xlist{(\alph)}[m.]}}
-\def\xlisti{\@ifnextchar [{\@xlist{\roman}}{\@xlist{\roman}[iv.]}}
-\def\xlistn{\@ifnextchar [{\@xlist{\arabic}}{\@xlist{\arabic}[9.]}}
-\def\xlistA{\@ifnextchar [{\@xlist{\Alph}}{\@xlist{\Alph}[M.]}}
-\def\xlistI{\@ifnextchar [{\@xlist{\Roman}}{\@xlist{\Roman}[IV.]}}
-
-\def\endxlist{\end{list}}
-\def\endxlista{\end{list}}
-\def\endxlistabr{\end{list}}
-\def\endxlistn{\end{list}}
-\def\endxlistA{\end{list}}
-\def\endxlistI{\end{list}}
-\def\endxlisti{\end{list}}
-
-
-
-
-%%% a generic sublist-styler
-\def\@xlist#1[#2]{\ifnum \@xnumdepth >3 \@toodeep\else%
- \advance\@xnumdepth \@ne%
- \edef\@xnumctr{xnum\romannumeral\the\@xnumdepth}%
- \def\@bla{#1}
- \ifx\@bla\empty\xs@default{\romannumeral\the\@xnumdepth}\else%
- \expandafter\let\csname @xs\romannumeral\the\@xnumdepth\endcsname#1\fi
- \begin{list}{\csname the\@xnumctr\endcsname.}%
- {\usecounter{\@xnumctr}\@subex{#2}{\@subexsep}{0em}}\fi}
-
-%% Added third argument to be able to add some more space to leftmargin
-%% for footnotes that have bigger indentation.
-%% St. Mü. 07.01.2007
-\def\@subex#1#2#3{\settowidth{\labelwidth}{#1}\itemindent\z@\labelsep#2%
- \ifnum\the\@xnumdepth=1%
- \topsep 7\p@ plus2\p@ minus3\p@\itemsep3\p@ plus2\p@\else%
- \topsep1.5\p@ plus\p@\itemsep1.5\p@ plus\p@\fi%
- \parsep\p@ plus.5\p@ minus.5\p@%
- \leftmargin\labelwidth\advance\leftmargin#2\advance\leftmargin#3\relax}
-
-%%% the example-items
-\def\ex{\@ifnextchar [{\@ex}{\item}}
-\def\@ex[#1]#2{\item\@exj[#1]{#2}}
-\def\@exj[#1]#2{\@exjbg{#1} #2 \end{list}\nopagebreak}
-\def\exi#1{\item[#1]\@ifnextchar [{\@exj}{}}
-\def\judgewidth#1{\def\@jwidth{#1}}
-\judgewidth{??}
-\judgewidth{*} % if wider judgements are needed, enlarge within papers
-\def\@exjbg#1{\begin{list}{#1}{\@subex{\@jwidth}{.5ex}{0em}}\item}
-\def\exr#1{\exi{{(\ref{#1})}}}
-\def\exp#1{\exi{{(\ref{#1}$'$)}}}
-\def\sn{\exi{}}
-
-
-\def\ex{\@ifnextchar [{\exnrfont\@ex}{\exnrfont\item\exfont}}
-\def\@ex[#1]#2{\item\@exj[#1]{\exfont#2}}
-
-\def\@exjbg#1{\begin{list}{{\exnrfont#1}}{\@subex{\@jwidth}{.5ex}{0em}}\item}
-\def\exi#1{\item[{\exnrfont#1}]\@ifnextchar [{\exnrfont\@exj}{}}
-
-\def\ea{\ifnum\@xnumdepth=0\begin{exe}\else\begin{xlist}[iv.]\fi\raggedright\ex}
-\def\eal{\begin{exe}\exnrfont\ex\begin{xlist}[iv.]\raggedright}
-\def\eas{\ifnum\@xnumdepth=0\begin{exe}[(34)]\else\begin{xlist}[iv.]\fi\ex\begin{tabular}[t]{@{}p{\linewidth}@{}}}
-
-% allow hyphenation and justification
-\def\eanoraggedright{\ifnum\@xnumdepth=0\begin{exe}\else\begin{xlist}[iv.]\fi\ex}
-\def\ealnoraggedright{\begin{exe}\exnrfont\ex\begin{xlist}[iv.]}
-
-
-
-\def\z{\ifnum\@xnumdepth=1\end{exe}\else\end{xlist}\fi}
-\def\zl{\end{xlist}\end{exe}}
-\def\zs{\end{tabular}\ifnum\@xnumdepth=1\end{exe}\else\end{xlist}\fi}
-\def\zllast{\end{xlist}\end{exe}\removelastskip}
-
-% Control vertical space for examples in footnotes
-\def\zlast{\z\vspace{-\baselineskip}}
-\def\eafirst{\vspace{-1.5\baselineskip}\ea}
-
-%%%%%% control the alignment of exampleno. and (picture-)example
-%%%%%% (by Lex Holt ).
-\def\attop#1{\leavevmode\vtop{\strut\vskip-\baselineskip\vbox{#1}}}
-\def\atcenter#1{$\vcenter{#1}$}
-%%%%%%
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% %%
-%% several examples in one line %%
-%% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\newcommand{\xbox}[2]{\noindent\parbox[t]{#1}{#2}\noindent}
-\newcommand{\nobreakbox}[1]{\xbox{\linewidth}{#1}}
-\newcommand{\xref}[1]{(\ref{#1})}
-\newcommand{\xxref}[2]{(\ref{#1}--\ref{#2})}
-
-
-\iftoggle{cgloss}{
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% %%
-%% CGLOSS starts here %%
-%% %%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-\let\@gsingle=1
-\def\singlegloss{\let\@gsingle=1}
-\def\nosinglegloss{\let\@gsingle=0}
-\@ifundefined{new@fontshape}%
- {\def\@selfnt{\ifx\@currsize\normalsize\@normalsize\else\@currsize\fi}}
- {\def\@selfnt{\selectfont}}
-
-\def\gll% % Introduces 2-line text-and-gloss.
- {\raggedright%
- \bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \twosent
-}
-
-\def\glll% % Introduces 3-line text-and-gloss.
- {\bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \threesent
-}
-
-
-\def\gllll% % Introduces 4-line text-and-gloss.
- {\bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \foursent
-}
-
-
-\def\glllll% % Introduces 5-line text-and-gloss.
- {\bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \fivesent
-}
-
-
-\def\gllllll% % Introduces 6-line text-and-gloss.
- {\bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \sixsent
-}
-
-
-\def\glllllll% % Introduces 7-line text-and-gloss.
- {\bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \sevensent
-}
-
-
-\def\gllllllll% % Introduces 8-line text-and-gloss.
- {\bgroup %\begin{flushleft}
- \ifx\@gsingle1%
- \def\baselinestretch{1}\@selfnt\fi
- \bgroup
- \eightsent
-}
-
-
-\newlength{\gltoffset}
-\setlength{\gltoffset}{.17\baselineskip}
-\newcommand{\nogltOffset}{\setlength{\gltoffset}{0pt}}
-\newcommand{\resetgltOffset}{\setlength{\gltoffset}{.17\baselineskip}}
-\def\glt{\ifhmode\\*[\gltoffset]\else\nobreak\vskip\gltoffset\nobreak\fi\transfont}
-
-
-% Introduces a translation
-\let\trans\glt
-
-% \def\gln{\relax}
-% % Ends the gloss environment.
-
-% The following TeX code is adapted, with permission, from:
-% gloss.tex: Macros for vertically aligning words in consecutive sentences.
-% Version: 1.0 release: 26 November 1990
-% Copyright (c) 1991 Marcel R. van der Goot (marcel@cs.caltech.edu).
-
-\newbox\lineone % boxes with words from first line
-\newbox\linetwo
-\newbox\linethree
-\newbox\linefour
-\newbox\linefive
-\newbox\linesix
-\newbox\lineseven
-\newbox\lineeight
-\newbox\wordone % a word from the first line (hbox)
-\newbox\wordtwo
-\newbox\wordthree
-\newbox\wordfour
-\newbox\wordfive
-\newbox\wordsix
-\newbox\wordseven
-\newbox\wordeight
-\newbox\gline % the constructed double line (hbox)
-\newskip\glossglue % extra glue between glossed pairs or tuples
-\glossglue = 0pt plus 2pt minus 1pt % allow stretch/shrink between words
-%\glossglue = 5pt plus 2pt minus 1pt % allow stretch/shrink between words
-\newif\ifnotdone
-
-\@ifundefined{eachwordone}{\let\eachwordone=\upshape}{\relax}
-\@ifundefined{eachwordtwo}{\let\eachwordtwo=\upshape}{\relax}
-\@ifundefined{eachwordthree}{\let\eachwordthree=\upshape}{\relax}
-\@ifundefined{eachwordfour}{\let\eachwordfour=\upshape}{\relax}
-\@ifundefined{eachwordfive}{\let\eachwordfive=\upshape}{\relax}
-\@ifundefined{eachwordsix}{\let\eachwordsix=\upshape}{\relax}
-\@ifundefined{eachwordseven}{\let\eachwordseven=\upshape}{\relax}
-\@ifundefined{eachwordeight}{\let\eachwordeight=\upshape}{\relax}
-
-\def\lastword#1#2#3% #1 = \each, #2 = line box, #3 = word box
- {\setbox#2=\vbox{\unvbox#2%
- \global\setbox#3=\lastbox
- }%
- \ifvoid#3\global\setbox#3=\hbox{#1\strut{} }\fi
- % extra space following \strut in case #1 needs a space
- }
-
-\def\testdone
- {\ifdim\ht\lineone=0pt
- \ifdim\ht\linetwo=0pt \notdonefalse % tricky space after pt
- \else\notdonetrue
- \fi
- \else\notdonetrue
- \fi
- }
-
-\gdef\getwords(#1,#2)#3 #4\\% #1=linebox, #2=\each, #3=1st word, #4=remainder
- {\setbox#1=\vbox{\hbox{#2\strut#3{} }% adds space, the {} is needed for CJK otherwise the space
- % would be ignored
- \unvbox#1%
- }%
- \def\more{#4}%
- \ifx\more\empty\let\more=\donewords
- \else\let\more=\getwords
- \fi
- \more(#1,#2)#4\\%
- }
-
-\gdef\donewords(#1,#2)\\{}%
-
-\gdef\twosent#1\\ #2\\{% #1 = first line, #2 = second line
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-\gdef\threesent#1\\ #2\\ #3\\{% #1 = first line, #2 = second line, #3 = third
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \getwords(\linethree,\eachwordthree)#3 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \lastword{\eachwordthree}{\linethree}{\wordthree}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- \nointerlineskip
- \box\wordthree
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-
-
-\gdef\foursent#1\\ #2\\ #3\\ #4\\{% #1 = first line, #2 = second line, #3 = third etc
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \getwords(\linethree,\eachwordthree)#3 \\%
- \getwords(\linefour,\eachwordfour)#4 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \lastword{\eachwordthree}{\linethree}{\wordthree}%
- \lastword{\eachwordfour}{\linefour}{\wordfour}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- \nointerlineskip
- \box\wordthree
- \nointerlineskip
- \box\wordfour
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-
-
-\gdef\fivesent#1\\ #2\\ #3\\ #4\\ #5\\{% #1 = first line, #2 = second line, #3 = third etc
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \getwords(\linethree,\eachwordthree)#3 \\%
- \getwords(\linefour,\eachwordfour)#4 \\%
- \getwords(\linefive,\eachwordfive)#5 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \lastword{\eachwordthree}{\linethree}{\wordthree}%
- \lastword{\eachwordfour}{\linefour}{\wordfour}%
- \lastword{\eachwordfive}{\linefive}{\wordfive}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- \nointerlineskip
- \box\wordthree
- \nointerlineskip
- \box\wordfour
- \nointerlineskip
- \box\wordfive
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-
-
-\gdef\sixsent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\{% #1 = first line, #2 = second line, #3 = third etc
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \getwords(\linethree,\eachwordthree)#3 \\%
- \getwords(\linefour,\eachwordfour)#4 \\%
- \getwords(\linefive,\eachwordfive)#5 \\%
- \getwords(\linesix,\eachwordsix)#6 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \lastword{\eachwordthree}{\linethree}{\wordthree}%
- \lastword{\eachwordfour}{\linefour}{\wordfour}%
- \lastword{\eachwordfive}{\linefive}{\wordfive}%
- \lastword{\eachwordsix}{\linesix}{\wordsix}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- \nointerlineskip
- \box\wordthree
- \nointerlineskip
- \box\wordfour
- \nointerlineskip
- \box\wordfive
- \nointerlineskip
- \box\wordsix
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-
-
-\gdef\sevensent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\ #7\\{% #1 = first line, #2 = second line, #3 = third etc
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \getwords(\linethree,\eachwordthree)#3 \\%
- \getwords(\linefour,\eachwordfour)#4 \\%
- \getwords(\linefive,\eachwordfive)#5 \\%
- \getwords(\linesix,\eachwordsix)#6 \\%
- \getwords(\lineseven,\eachwordseven)#7 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \lastword{\eachwordthree}{\linethree}{\wordthree}%
- \lastword{\eachwordfour}{\linefour}{\wordfour}%
- \lastword{\eachwordfive}{\linefive}{\wordfive}%
- \lastword{\eachwordsix}{\linesix}{\wordsix}%
- \lastword{\eachwordseven}{\lineseven}{\wordseven}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- \nointerlineskip
- \box\wordthree
- \nointerlineskip
- \box\wordfour
- \nointerlineskip
- \box\wordfive
- \nointerlineskip
- \box\wordsix
- \nointerlineskip
- \box\wordseven
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-
-
-\gdef\eightsent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\ #7\\ #8\\{% #1 = first line, #2 = second line, #3 = third etc
- \getwords(\lineone,\eachwordone)#1 \\%
- \getwords(\linetwo,\eachwordtwo)#2 \\%
- \getwords(\linethree,\eachwordthree)#3 \\%
- \getwords(\linefour,\eachwordfour)#4 \\%
- \getwords(\linefive,\eachwordfive)#5 \\%
- \getwords(\linesix,\eachwordsix)#6 \\%
- \getwords(\lineseven,\eachwordseven)#7 \\%
- \getwords(\lineeight,\eachwordeight)#8 \\%
- \loop\lastword{\eachwordone}{\lineone}{\wordone}%
- \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
- \lastword{\eachwordthree}{\linethree}{\wordthree}%
- \lastword{\eachwordfour}{\linefour}{\wordfour}%
- \lastword{\eachwordfive}{\linefive}{\wordfive}%
- \lastword{\eachwordsix}{\linesix}{\wordsix}%
- \lastword{\eachwordseven}{\lineseven}{\wordseven}%
- \lastword{\eachwordeight}{\lineeight}{\wordeight}%
- \global\setbox\gline=\hbox{\unhbox\gline
- \hskip\glossglue
- \vtop{\box\wordone % vtop was vbox
- \nointerlineskip
- \box\wordtwo
- \nointerlineskip
- \box\wordthree
- \nointerlineskip
- \box\wordfour
- \nointerlineskip
- \box\wordfive
- \nointerlineskip
- \box\wordsix
- \nointerlineskip
- \box\wordseven
- \nointerlineskip
- \box\wordeight
- }%
- }%
- \testdone
- \ifnotdone
- \repeat
- \egroup % matches \bgroup in \gloss
- \gl@stop}
-
-%\def\gl@stop{{\hskip -\glossglue}\unhbox\gline\end{flushleft}}
-
-% \leavevmode puts us back in horizontal mode, so that a \\ will work
-\def\gl@stop{{\hskip -\glossglue}\unhbox\gline\leavevmode \egroup}
-}{} %end toggle cgloss
-
-\iftoggle{jambox}{
-%BeGIN Jambox
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%
-% Alexis Dimitriadis
-%
-% This is version 0.3 (informal release, Nov. 2003).
-%
-% Line up material a fixed distance from the right margin. For annotating
-% example sentences, usually with a short note in parentheses.
-% May overflow to the left or right, or line up on the next line as necessary.
-%
-% \jambox[width]{text} Align 'text' starting 'width' distance from the
-% right margin (default \the\jamwidth).
-% \jam(something) Align a note delimited by parentheses (which are
-% retained). No optional argument.
-% \jambox*{text} Set \jamwidth to the width of 'text', then align it.
-% (\jamwidth stays set for the rest of the environment).
-%
-% Notes:
-%
-% Distance from the right margin can be set to an explicit amount, or to the
-% width of some piece of text, as follows:
-%
-% \jamwidth=2in\relax Or
-% \settowidth\jamwidth {(``annotation'')}
-%
-% \jamwidth is locally scoped, so it can be set globally or inside an example
-% environment.
-%
-% BUG: Not compatible with ragged-right mode.
-%
-% Incompatibilities: Not useful with the vanilla cgloss4e.sty, which ends
-% glossed lines prematurely.
-% I do have a suitably modified file, cgloss.sty. With it you can do the
-% following:
-% \gll To kimeno. \\
-% the text \\ \jambox{(Greek)}
-% \trans `The text.'
-
-
-\newdimen\jamwidth \jamwidth=2in
-\def\jambox{\@ifnextchar[{\@jambox}
- {\@ifnextchar*{\@jamsetbox}{\@jambox[\the\jamwidth]}}}
-
-% Set width AND display the argument.
-% The star is read and ignored; the argument #1 is boxed, used to set
-% \jamwidth, then passed to \@jambox (which also puts it in \@tempboxa!)
-%
-\def\@jamsetbox*#1{\setbox\@tempboxa\hbox{#1}\jamwidth=\wd\@tempboxa
- \@jambox[\the\jamwidth]{\box\@tempboxa}}
-
-\def\@jambox[#1]#2{{\setbox\@tempboxa\hbox {#2}%
- \ifdim \wd\@tempboxa<#1\relax % if label fits in the alloted space:
- \@tempdima=#1\relax \advance\@tempdima by-\wd\@tempboxa % remaining \hspace
- \unskip\nobreak\hfill\penalty250 % break line here if necessary
- \hskip 1.2em minus 1.2em % used when the line extends past the margin
- \hbox{}\nobreak\hfill\box\@tempboxa\nobreak
- \hskip\@tempdima minus \@tempdima\hbox{}%
- \else % the label is too wide: just right-align it
- \hfill\penalty50\hbox{}\nobreak\hfill\box\@tempboxa
- \fi
- % suppress closing glue:
- \parfillskip=0pt \finalhyphendemerits=0 \par}}
-% The penalty enables a break, taken only if the line cannot fit.
-% The \hbox{} ensures the next line does not begin with \hfill, which would
-% be discarded if initial.
-% (\vadjust inserts an empty element at the beginning of the next line, so
-% that COULD be used instead of \hbox{}).
-% Algorithm adapted from The TeXBook. -% -% The closing \par could be a problem if there is a \parskip... -}{} -\endinput diff --git a/readme conversions/pandoc-ling-old.lua b/readme conversions/pandoc-ling-old.lua deleted file mode 100755 index 1b92afa..0000000 --- a/readme conversions/pandoc-ling-old.lua +++ /dev/null @@ -1,908 +0,0 @@ ---[[ -pandoc-linguex: make interlinear glossing with pandoc - -Copyright © 2021 Michael Cysouw - -Permission to use, copy, modify, and/or distribute this software for any -purpose with or without fee is hereby granted, provided that the above -copyright notice and this permission notice appear in all copies. - -THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES -WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR -ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN -ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF -OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. -]] - -PANDOC_VERSION:must_be_at_least '2.10' - ---------------------- --- 'global' variables ---------------------- - -local counter = 0 -- actual numbering of examples -local chapter = 1 -- numbering of chapters (for unknown reasons this starts at 1, not 0) -local counterInChapter = 0 -- counter reset for each chapter -local indexEx = {} -- global lookup for example IDs -local orderInText = 0 -- order of references for resolving "Next"-style references -local indexRef = {} -- key/value: order in text = refID/exID -local rev_indexRef = {} -- "reversed" indexRef, i.e. 
key/value: refID/exID = order-number in text - ------------------------------------- --- User Settings with default values ------------------------------------- - -local formatGloss = false -- format interlinear examples -local xrefSuffixSep = " " --   separator to be inserted after number in example references -local restartAtChapter = false -- restart numbering at highest header without adding local chapternumbers -local addChapterNumber = false -- add chapternumbers to counting and restart at highest header -local latexPackage = "linguex" -local topDivision = "section" - -function getUserSettings (meta) - if meta.formatGloss ~= nil then - formatGloss = meta.formatGloss - end - if meta.xrefSuffixSep ~= nil then - xrefSuffixSep = pandoc.utils.stringify(meta.xrefSuffixSep) - end - if meta.restartAtChapter ~= nil then - restartAtChapter = meta.restartAtChapter - end - if meta.addChapterNumber ~= nil then - addChapterNumber = meta.addChapterNumber - end - if meta.latexPackage ~= nil then - latexPackage = pandoc.utils.stringify(meta.latexPackage) - end - if meta["top-level-division"] ~= nil then - topDivision = pandoc.utils.stringify(meta["top-level-division"]) - end -end - ------------------------------------------- --- add latex dependencies: langsci-gb4e is not on CTAN! 
--- restarting of counters is not working right for gb4e ------------------------------------------- - -function addFormatting (meta) - local tmp = meta['header-includes'] or pandoc.MetaList{meta['header-includes']} - - if FORMAT:match "html" then - -- add specific CSS for layout of examples - -- building on classes set in this filter - -- local f = io.open("pandoc-ling.css") - -- local css = f:read("*a") - -- f:close() - local css = [[ - - ]] - tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("html", css)) - - meta['header-includes'] = tmp - end - - if FORMAT:match "latex" then - - local function add (s) - tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("tex", s)) - end - - if latexPackage == "linguex" then - add("\\usepackage{linguex}") - -- no brackets - add("\\renewcommand{\\theExLBr}{}") - add("\\renewcommand{\\theExRBr}{}") - --add("\\renewcommand{\\firstrefdash}{}") - add("\\usepackage{chngcntr}") - if addChapterNumber then - add("\\counterwithin{ExNo}{"..topDivision.."}") - add("\\renewcommand{\\Exarabic}{\\the"..topDivision..".\\arabic}") - elseif restartAtChapter then - add("\\counterwithin*{ExNo}{"..topDivision.."}") - end - - elseif latexPackage:match "gb4e" then - add("\\usepackage{"..latexPackage.."}") - -- nnext package does not work with added top level number - add("\\usepackage[noparens]{nnext}") - add("\\usepackage{chngcntr}") - if addChapterNumber then - add("\\counterwithin{xnumi}{"..topDivision.."}") - elseif restartAtChapter then - add("\\counterwithin*{xnumi}{"..topDivision.."}") - end - - elseif latexPackage == "expex" then - add("\\usepackage{expex}") - add("\\lingset{belowglpreambleskip=-1.5ex, aboveglftskip=-1.5ex, exskip=0ex, interpartskip=-0.5ex, belowpreambleskip=-1ex}") - if addChapterNumber then - add("\\lingset{exnotype=chapter.arabic}") - end - if restartAtChapter then - --add("\\usepackage{epltxchapno}") - add("\\usepackage{etoolbox}") - add("\\pretocmd{\\"..topDivision.."}{\\excnt=1}{}{}") - end - - end - meta['header-includes'] 
= tmp - end - return meta -end - ------------------------------------------- --- add invisible numbering to section ------------------------------------------- - -function addSectionNumbering (doc) - local sections = pandoc.utils.make_sections(true, nil, doc.blocks) - return pandoc.Pandoc(sections, doc.meta) -end - ---------------------------- --- help function for format ---------------------------- - -function splitPara (p) - -- remove quotes, they interfere with the layout - if p[1].tag == "Quoted" then - p = p[1].content - end - -- split paragraph in subtables at Space - -- to insert paragraph into pandoc.Table - -- Is there a better way to do this in Pandoc-Lua? - local start = 1 - local result = {} - for i=1,#p do - if p[i].tag == "Space" then - local chunk = table.move(p, start, i-1, 1, {}) - table.insert(result, {pandoc.Plain(chunk)} ) - start = i + 1 - end - end - if start <= #p then - local chunk = table.move(p, start, #p, 1, {}) - table.insert(result, {pandoc.Plain(chunk)} ) - end - return result -end - -function turnIntoTable (rowContent, nCols, extraCols) - -- turn examples into Tables for alignment - -- use simpleTable for construction - local caption = {} - local headers = {} - local aligns = {} - for i=1,nCols do aligns[i] = "AlignLeft" end - aligns[extraCols + 1] = "AlignRight" -- Column for grammaticality judgements - local widths = {} - for i=1,nCols do widths[i] = 0 end - local rows = rowContent - - local result = pandoc.SimpleTable( - caption, - aligns, - widths, - headers, - rows - ) - -- turn into fancy new tables - result = pandoc.utils.from_simple_table(result) - - -- set class of table to "example" for styling via CSS - result.attr = {class = "linguistic-example"} - -- set class of judgment columns to "judgment" for styling via CSS - for i=1,#result.bodies[1].body do - result.bodies[1].body[i][2][extraCols+1].attr = pandoc.Attr(nil, {"linguistic-judgement"}) - end - - return result -end - -function splitForSmallCaps (s) - -- turn uppercase 
in gloss into small caps - local split = {} - for lower,upper in string.gmatch(s, "(.-)([%u%d][%u%d]+)") do - if lower ~= "" then - lower = pandoc.Str(lower) - table.insert(split, lower) - end - upper = pandoc.SmallCaps(pandoc.text.lower(upper)) - table.insert(split, upper) - end - for leftover in string.gmatch(s, "[%u%d][%u%d]+(.-[^%u%s])$") do - leftover = pandoc.Str(leftover) - table.insert(split, leftover) - end - if #split == 0 then - if s == "~" then s = "   " end -- sequence "space-nobreakspace-space" - table.insert(split, pandoc.Str(s)) - end - - return split -end - -function splitJudgement (line) - local judgement = "" - local first = pandoc.utils.stringify(line[1]) - if first == "^" then - judgement = line[2] - table.remove(line, 1) - table.remove(line, 1) - table.remove(line, 1) - elseif string.sub(first, 1, 1) == "^" then - judgement = pandoc.Str(string.sub(first, 2)) - table.remove(line, 1) - table.remove(line, 1) - end - return judgement, line -end - ------------------------- --- make markup in Pandoc ------------------------- - -function pandocMakeSingle (single, extraCols) - -- Make just a single-line example - local judge, data = splitJudgement(single) - local line = { {pandoc.Plain(judge)}, {pandoc.Plain(data)} } - - -- add extra columns before - -- either one (nummer) or two (nummer, letter) - if extraCols > 0 then - for i=1,extraCols do - table.insert(line, 1, {} ) - end - end - - -- turn into Table - local nCols = #line - local rowContent = { line } - local exampleSingle = turnIntoTable(rowContent, nCols, extraCols) - return exampleSingle -end - -function pandocMakeInterlinear (block, extraCols, formatOverride) - -- Make interlinear gloss 4-liner from LineBlock input - -- override format per example - local globalFormatGloss = formatGloss - if formatOverride ~= nil then - formatGloss = (formatOverride == "true") - end - - -- the four lines are: header, source, gloss and trans(lation) - local header = { { pandoc.Plain(block[1]) } } - 
table.insert(header, 1, {} ) - - local judgeSource, source = splitJudgement(block[2]) - source = splitPara(source) - if formatGloss then - -- remove format at make emph throughout - for i=1,#source do - local string = pandoc.utils.stringify(source[i]) - source[i] = { pandoc.Plain(pandoc.Emph(string)) } - end - end - table.insert(source, 1, { pandoc.Plain(judgeSource) } ) - - local gloss = splitPara(block[3]) - if formatGloss then - -- remove format and turn capital-sequences into smallcaps - for i=1,#gloss do - local string = pandoc.utils.stringify(gloss[i]) - gloss[i] = { pandoc.Plain(splitForSmallCaps(string)) } - end - end - table.insert(gloss, 1, {} ) - - local trans = block[#block] - if formatGloss then - -- remove quotes and add singlequote througout - if trans[1].tag == "Quoted" then - trans = trans[1].content - end - trans = {{ pandoc.Plain(pandoc.Quoted("SingleQuote", trans)) }} - else - trans = {{ pandoc.Plain(trans) }} - end - table.insert(trans, 1, {} ) - - -- return to global setting - if formatOverride ~= nil then - formatGloss = globalFormatGloss - end - - -- add extra columns before, either one or two - for i=1,extraCols do - table.insert(header, 1, {} ) - table.insert(source, 1, {} ) - table.insert(gloss, 1, {} ) - table.insert(trans, 1, {} ) - end - - -- turn into Table - local nCols = math.max(#source, #gloss) - local rowContent = {header, source, gloss, trans} - local interlinear = turnIntoTable(rowContent, nCols, extraCols) - - -- make header and trans long cells - interlinear.bodies[1].body[1][2][extraCols+2].col_span = nCols - extraCols - 1 - interlinear.bodies[1].body[#block][2][extraCols+2].col_span = nCols - extraCols - 1 - - -- shift upwards when header is empty - if next(block[1]) == nil then - table.remove(interlinear.bodies[1].body, 1) - end - - return interlinear -end - --- When multiple interlinears are combined, separate Tables are needed --- also make separate Tables when single examples are mixed with interlinears - -function 
pandocMakeList(data, number, formatOverride) - -- make a list of tables - local example = {} - -- go through all items of the list - for i=1,#data do - - if data[i][1].tag ~= "LineBlock" then - example[i] = pandocMakeSingle(data[i][1].content, 2) - -- add letter for sub-example in second column - example[i].bodies[1].body[1][2][2].contents[1] = - pandoc.Plain(string.char(96+i)..".") - - if i>1 and data[i-1][1].tag ~= "LineBlock" then - -- add tablerow to previous if also Plain/Para - table.insert(example[i-1].bodies[1].body, example[i].bodies[1].body[1]) - -- exchange tables - example[i] = example[i-1] - example[i-1] = "ignore" - end - - elseif data[i][1].tag == "LineBlock" then - example[i] = pandocMakeInterlinear(data[i][1].content, 2, formatOverride) - -- add letter for sub-example in second column - example[i].bodies[1].body[1][2][2].contents[1] = - pandoc.Plain(string.char(96+i)..".") - end - end - - -- remove empty tables. Work around for `table.remove` - local exampleList = {} - for i=1,#example do - if example[i] ~= "ignore" then - table.insert(exampleList,example[i]) - end - end - - -- keep track of judgements for better alignment - local judgeSize = 0 - for i=1,#exampleList do - for j=1,#exampleList[i].bodies[1].body do - if exampleList[i].bodies[1].body[j][2][3].contents[1] ~= nil then - local judge = pandoc.utils.stringify(exampleList[i].bodies[1].body[j][2][3].contents[1]) - judgeSize = math.max(judgeSize, utf8.len(judge)) - end - end - end - - -- rough approximations - local spaceForNumber = string.rep(" ", 2*(string.len(number)+2)) - local spaceForLabel = tostring(15 + 5*judgeSize) - if judgeSize == 0 then spaceForLabel = 0 end - - for i=1,#exampleList do - -- For better alignment with example number, add invisibles in first column - -- not nice solution, but portable across formats - exampleList[i].bodies[1].body[1][2][1].contents[1] = pandoc.Plain(spaceForNumber) - -- For better alignment, add column-width to judgement column - -- note: this is not 
portable outside html - exampleList[i].bodies[1].body[1][2][3].attr = - pandoc.Attr(nil, { "linguistic-judgement" }, { width = spaceForLabel.."px"} ) - end - - return exampleList -end - -function pandocMakeExample (data, number, formatOverride) - -- make the examples as list of tables - local example = {} - local preamble = nil - - if #data == 2 then - -- first part is assumed to be preamble - preamble = data[1].content - -- go on with second part - data = { data[2] } - end - - if data[1].tag == "Para" then - -- make one-line example - example[1] = pandocMakeSingle(data[1].content, 1) - elseif data[1].tag == "LineBlock" then - -- make one interlinear example - example[1] = pandocMakeInterlinear(data[1].content, 1, formatOverride) - elseif data[1].tag == "OrderedList" then - -- make list of examples - example = pandocMakeList(data[1].content, number, formatOverride) - end - - if preamble ~= nil then - -- How many positions should preamble be shifted to the left? - local shift = 1 - if data[1].tag == "OrderedList" then shift = 0 end - -- insert preamble as first row in example - preamble = pandocMakeSingle(preamble, shift) - table.insert(example[1].bodies[1].body, 1, preamble.bodies[1].body[1]) - -- make preamble multi-column - local range = #example[1].colspecs - shift - 1 - example[1].bodies[1].body[1][2][2].col_span = range - end - - -- Add example number to top left of first table - local numberParen = pandoc.Plain( "("..number..")" ) - example[1].bodies[1].body[1][2][1].contents[1] = numberParen - - return example -end - --------------------------- --- make markup in Latex --- using langsci-gb4e --------------------------- - --- convenience functions for Latex -function texFront (tex, pdoc) - return table.insert(pdoc, 1, pandoc.RawInline("tex", tex)) -end - -function texEnd (tex, pdoc) - return table.insert(pdoc, pandoc.RawInline("tex", tex)) -end - --- this is not ideal. 
It is too complex to really get judgement layout to work -function texSplitJudgement (line) - local judge, text = splitJudgement(line) - if judge ~= "" then - if latexPackage == "expex" then - judge = pandoc.utils.stringify(judge) - texFront("\\ljudge{"..judge.."} ", text) - else - table.insert(text, 1, judge) - end - end - return text -end - --- different kinds of examples: single line, interlinear, list -function texMakeSingle (line) - local example = texSplitJudgement(line) - texFront("\n ", example) - return example -end - -function texMakeInterlinear (block, exID, label, level, formatOverride ) - -- make one interlinear - - --check for local override of formatting - local globalFormatGloss = formatGloss - if formatOverride ~= nil then - formatGloss = (formatOverride == "true") - end - - -- the four lines are: header, source, gloss and trans(lation) - local header = block[1] - if level == 1 then label = "" end - if latexPackage == "expex" then - if #header > 1 then - texFront(" "..label.."\n \\begingl\n \\glpreamble ", header) - texEnd("//", header) - else - texFront("\n "..label.."\n \\begingl", header) - end - else - --if level == 1 then - -- texFront("\n ", header) - --else - texFront("\n "..label.." 
", header) - --end - -- langsci-gb4e behaves here different from gb4e - if latexPackage == "langsci-gb4e" then - if #header > 1 then - texEnd("\\\\", header) - end - end - end - - local source = texSplitJudgement (block[2]) - if formatGloss then - for i=1,#source do - if source[i].tag ~= "Space" then - local string = pandoc.utils.stringify(source[i]) - source[i] = pandoc.Emph(string) - end - end - end - -- add latex - if latexPackage == "expex" then - texFront("\n \\gla ", source) - texEnd("//", source) - else - texFront("\n \\gll ", source) - texEnd("\\\\", source) - end - - - local gloss = block[3] - if formatGloss then - local result = pandoc.List() - for i=1,#gloss do - local string = pandoc.utils.stringify(gloss[i]) - result:extend(splitForSmallCaps(string)) - end - gloss = result - end - -- add latex - if latexPackage == "expex" then - texFront("\n \\glb ", gloss) - texEnd("//", gloss) - else - texFront("\n ",gloss) - texEnd("\\\\", gloss) - end - - local trans = block[4] - if formatGloss then - if trans[1].tag == "Quoted" then - trans = trans[1].content - texFront("`", trans) - texEnd("'", trans) - end - end - -- add latex - if latexPackage == "expex" then - texFront("\n \\glft ", trans) - texEnd("//\n \\endgl", trans) - else - texFront("\n \\glt ", trans) - end - - -- return to global setting - if formatOverride ~= nil then - formatGloss = globalFormatGloss - end - - -- combine for output - local interlinear = header - interlinear:extend(source) - interlinear:extend(gloss) - interlinear:extend(trans) - return interlinear -end - -function texMakeList (list, exID, formatOverride) - local example = pandoc.List() - local labeltwo = "" - - for i=1,#list do - - if latexPackage == "linguex" then - if i == 1 then labeltwo = "\\a." else labeltwo = "\\b." 
end - elseif latexPackage:match "gb4e" then - if i == 1 then labeltwo = "\\ea" else labeltwo = "\\ex" end - elseif latexPackage == "expex" then - labeltwo = "\\a" - end - - if list[i][1].tag ~= "LineBlock" then - local line = texSplitJudgement( list[i][1].content ) - texFront("\n "..labeltwo.." ", line) - example:extend(line) - - elseif list[i][1].tag == "LineBlock" then - local line = texMakeInterlinear(list[i][1].content, exID, labeltwo, 2, formatOverride) - if latexPackage:match "gb4e" then - texFront("\n", line) - texEnd("\n", line) - end - example:extend(line) - end - end - return example -end - -function texMakeExample (data, exID, formatOverride) - local example = pandoc.List() - - -- different labeling for tex packages - local labelone = "" - if latexPackage == "linguex" then labelone = "\\ex." - elseif latexPackage == "expex" then labelone = "\\ex" - elseif latexPackage:match "gb4e" then labelone = "\\ea" - end - - if #data == 2 then - -- assume first part is header - example = data[1].content - -- and then proceed with second part - data = { data[2] } - end - - if data[1].tag == "Para" then - -- example beginning - if #example > 0 then texEnd("\\\\", example) end - if latexPackage == "expex" then - texFront(labelone.." <"..exID.."> ", example) - else - texFront(labelone.." \\label{"..exID.."} ", example) - end - -- add one-line example - local line = texMakeSingle(data[1].content) - example:extend(line) - -- example ending - if latexPackage:match "gb4e" then - texEnd("\n\\z", example) - elseif latexPackage == "expex" then - texEnd("\n\\xe", example) - end - - elseif data[1].tag == "LineBlock" then - -- example beginning - if latexPackage == "expex" then - texFront(labelone.." <"..exID.."> ", example) - else - texFront(labelone.." 
\\label{"..exID.."} ", example) - end - -- add interlinear - local interlinear = texMakeInterlinear(data[1].content, exID, labelone, 1, formatOverride) - example:extend(interlinear) - -- example ending - if latexPackage:match "gb4e" then - texEnd("\n \\z", example) - elseif latexPackage == "expex" then - texEnd("\n\\xe", example) - end - - elseif data[1].tag == "OrderedList" then - -- example beginning - if latexPackage == "expex" then - texFront("\\pex <"..exID.."> ", example) - else - texFront(labelone.." \\label{"..exID.."} ", example) - end - -- add list of examples - local list = texMakeList(data[1].content, exID, formatOverride) - example:extend(list) - -- example ending - if latexPackage:match "gb4e" then - texEnd("\n \\z", example) - texEnd("\n\\z", example) - elseif latexPackage == "expex" then - texEnd("\n\\xe", example) - end - end - - return pandoc.Plain(example) -end - --------------------------- --- format example from div --------------------------- - -function makeExample (div) - - -- keep track of chapters (primary sections) - if div.classes[1] == "section" then - if div.attributes.number ~= nil and string.len(div.attributes.number) == 1 then - chapter = chapter + 1 - counterInChapter = 0 - end - end - - -- only do formatting for divs with class "ex" - if div.classes[1] == "ex" then - - -- keep count of examples - counter = counter + 1 - counterInChapter = counterInChapter + 1 - - -- format the numbering - local number = counter - if addChapterNumber then - number = chapter.."."..counterInChapter - elseif restartAtChapter then - number = counterInChapter - end - - -- make identifier for example - -- or keep user-provided identifier - local exID = "" - if div.identifier == "" then - exID = "ling-ex:"..chapter.."."..counterInChapter - else - exID = div.identifier - end - - -- keep global index of ids/numbers for crossreference - indexEx[exID] = number - - -- check format override per example - local formatOverride = div.attributes['formatGloss'] - - 
-- make different format for latex - if FORMAT:match "latex" then - return texMakeExample(div.content, exID, formatOverride) - else - local example = pandocMakeExample(div.content, number, formatOverride) - -- add temporary Cite to resolve "Next"-type references in pandoc - -- will be removed after cross-references are in place - local tmpCite = pandoc.Cite({pandoc.Str("@Target")},{pandoc.Citation(exID,"NormalCitation")}) - - return { - pandoc.Plain(tmpCite), - pandoc.Div(example, pandoc.Attr(exID) ) - } - end - end -end - -------------------------- --- format crossreferences -------------------------- - -function uniqueNextrefs (cite) - - -- to resolve "Next"-style references give them all an unique ID - -- make indices to check in which order they occur - local nameN = string.match(cite.content[1].text, "([N]+)ext") - local nameL = string.match(cite.content[1].text, "([L]+)ast") - local target = string.match(cite.content[1].text, "@Target") - - -- use random ID to make unique - if nameN ~= nil or nameL ~= nil then - cite.citations[1].id = tostring(math.random(99999)) - end - - -- make indices - if nameN ~= nil or nameL ~= nil or target ~= nil then - orderInText = orderInText + 1 - indexRef[orderInText] = cite.citations[1].id - rev_indexRef[cite.citations[1].id] = orderInText - end - - return(cite) -end - -function resolveNextrefs (cite) - - -- assume Next-style refs have numeric id (from uniqueNextrefs) - -- assume Example-IDs are not numeric (user should not use them!) 
- local id = cite.citations[1].id - local order = rev_indexRef[id] - - local distN = 0 - local sequenceN = string.match(cite.content[1].text, "([N]+)ext") - if sequenceN ~= nil then distN = string.len(sequenceN) end - - if distN > 0 then - for i=order,#indexRef do - if tonumber(indexRef[i]) == nil then - distN = distN - 1 - if distN == 0 then - cite.citations[1].id = indexRef[i] - end - end - end - end - - local distL = 0 - local sequenceL = string.match(cite.content[1].text, "([L]+)ast") - if sequenceL ~= nil then distL= string.len(sequenceL) end - - if distL > 0 then - for i=order,1,-1 do - if tonumber(indexRef[i]) == nil then - distL = distL - 1 - if distL == 0 then - cite.citations[1].id = indexRef[i] - end - end - end - end - - return(cite) -end - -function removeTmpTargetrefs (cite) - -- remove temporary cites for resolving Next-style reference - if cite.content[1].text == "@Target" then - return pandoc.Plain({}) - end -end - -function makeCrossrefs (cite) - - local id = cite.citations[1].id - local name = string.gsub(cite.content[1].text, "[%[%]@]", "") - local suffix = "" - local expexName = {Next = "nextx", NNext = "anextx", Last = "lastx", LLast = "blastx"} - - -- prevent Latex error when user sets xrefSuffixSep to space or nothing - if FORMAT:match "latex" then - if xrefSuffixSep == "" or xrefSuffixSep == " " or xrefSuffixSep == " " then - xrefSuffixSep = "\\," - end - end - - -- only make suffix if there is something there - if #cite.citations[1].suffix > 0 then - suffix = pandoc.utils.stringify(cite.citations[1].suffix[2]) - suffix = xrefSuffixSep..suffix - end - - -- make the cross-references - if FORMAT:match "latex" then - if latexPackage == "expex" then - if string.match("@Next@NNext@Last@LLast", name) ~= nil then - return pandoc.RawInline("latex", "({\\"..expexName[name].."}"..suffix..")") - elseif indexEx[id] ~= nil then - -- ignore other "cite" elements - return pandoc.RawInline("latex", "(\\getref{"..id.."}"..suffix..")") - end - else - if 
string.match("@Next@NNext@Last@LLast", name) ~= nil then - -- let latex handle these - return pandoc.RawInline("latex", "({\\"..name.."}"..suffix..")") - elseif indexEx[id] ~= nil then - -- ignore other "cite" elements - return pandoc.RawInline("latex", "(\\ref{"..id.."}"..suffix..")") - end - end - elseif indexEx[id] ~= nil then - -- ignore other "cite" elements - return pandoc.Link("("..indexEx[id]..suffix..")", "#"..id) - end - -end - ------------------------------------------- --- Pandoc trick to cycle through documents ------------------------------------------- - -return { - -- preparations - { Pandoc = addSectionNumbering }, - { Meta = getUserSettings }, - { Meta = addFormatting }, - -- formatting linguistic examples as tables - { Div = makeExample }, - -- three passes necessary to resolve NNext-style references - { Cite = uniqueNextrefs }, - { Cite = resolveNextrefs }, - { Cite = removeTmpTargetrefs }, - -- now finally all cross-references can be set - { Cite = makeCrossrefs } -} diff --git a/readme conversions/processVerbatim.lua b/readme conversions/processVerbatim.lua deleted file mode 100644 index 63657ea..0000000 --- a/readme conversions/processVerbatim.lua +++ /dev/null @@ -1,7 +0,0 @@ -function addRealCopy (code) - return { code, pandoc.RawBlock("markdown", code.text) } -end - -return { - { CodeBlock = addRealCopy } -} diff --git a/readme conversions/readme.docx b/readme conversions/readme.docx deleted file mode 100644 index bcf9aea..0000000 Binary files a/readme conversions/readme.docx and /dev/null differ diff --git a/readme conversions/readme.epub b/readme conversions/readme.epub deleted file mode 100644 index c339392..0000000 Binary files a/readme conversions/readme.epub and /dev/null differ diff --git a/readme conversions/readme.html b/readme conversions/readme.html deleted file mode 100644 index 78d02a9..0000000 --- a/readme conversions/readme.html +++ /dev/null @@ -1,694 +0,0 @@ - - - - - - - - Using pandoc-ling - - - - - - - -
-

Using pandoc-ling

-

Michael Cysouw

-
- -

1 pandoc-ling

-

Michael Cysouw <>

-

A Pandoc filter for linguistic examples

-

tl;dr

-
  • Easily write linguistic examples, including basic interlinear glossing.
  • Let numbering and cross-referencing be done for you.
  • Export to (almost) any format you wish for final polishing.

2 Rationale

-

Linguistics has an outspoken tradition of formatting example sentences in research papers in a very specific way, and getting such example sentences to look just right is a perennial problem. Within Latex, there are numerous packages to deal with this problem (e.g. covington, linguex, gb4e and expex), and depending on your needs there is some Latex solution for almost everyone. However, these Latex solutions are often cumbersome to type, and they are not portable to other formats, even though transfer between Latex, HTML, docx, odt and epub would be highly desirable. Such transfer is the hallmark of Pandoc, a tool by John MacFarlane that converts between these (and many more) formats.

-

Any such conversion between text-formats naturally never works perfectly: every text-format has specific features that are not transferable to other formats. A central goal of Pandoc (at least in my interpretation) is to define a set of shared concepts for text-structure (a ‘common denominator’ if you will, but surely not ‘least’!) that can then be mapped to other formats. In many ways, Pandoc tries (again) to define a set of logical concepts for text structure (‘semantic markup’), which can then be formatted by your favourite typesetter. As long as you stay inside the realm of this ‘common denominator’ (in practice that means Pandoc’s extended version of Markdown/CommonMark), conversion works reasonably well (think 90%-plus).

-

Following John Gruber’s Markdown philosophy, the aim here is to restrain oneself while writing and to keep the number of layout possibilities to a minimum. In this spirit, with pandoc-ling I propose a Markdown structure for linguistic examples that is simple, easy to type, easy to read, and portable throughout the Pandoc universe by way of one of Pandoc's extension mechanisms, a ‘Pandoc Lua Filter’. This extension will not magically allow you to write every conceivable linguistic example, but my guess is that in practice the present proposal covers the majority of situations in linguistic publications (think 90%-plus). As an example (and test case), I have included automatic conversions into various formats in this repository; check them out to get an idea of the strengths and weaknesses of this approach.

-

3 The basic structure of a linguistic example

-

Basically, a linguistic example consists of six possible building blocks, of which only the number and at least one example line are necessary. The space between the building blocks is kept as small as possible without becoming cramped. When optional building blocks are not included, the other blocks shift left and up (the only exception: a preamble without labels is not shifted completely to the left, but is left-aligned with the example, not with the judgment).
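To make the rule "only the number and at least one example line are necessary" concrete, here is a minimal sketch in Lua, the language pandoc-ling is written in. The table layout and the `isValidExample` function are hypothetical illustrations, not pandoc-ling's internal representation (the filter itself works on Pandoc's document tree).

```lua
-- Hypothetical representation of one numbered example; the field names are
-- illustrative only and are not pandoc-ling's internal data structure.
local example = {
  number = "7.26",                 -- obligatory
  preamble = "Word order",         -- optional
  items = {                        -- at least one example line is obligatory
    { label = "a", judgment = "",  line = "The dog sleeps." },
    { label = "b", judgment = "*", line = "Dog the sleeps." },
  },
}

-- Checks the two obligatory building blocks: a non-empty number and at
-- least one item carrying either a single line or an interlinear example.
local function isValidExample(ex)
  if type(ex.number) ~= "string" or ex.number == "" then
    return false
  end
  if type(ex.items) ~= "table" then
    return false
  end
  for _, item in ipairs(ex.items) do
    if item.line ~= nil or item.interlinear ~= nil then
      return true
    end
  end
  return false
end

print(isValidExample(example))  -- prints: true
```

Every other field (preamble, label, judgment) can be left out, which is what lets the remaining blocks shift left and up.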

-
  • Number: A running tally of all examples in the work, possibly restarting at chapters or other major headings. Typically set in round brackets, possibly preceded by a chapter number in long works, e.g. example (7.26). Aligned top-left, typically flush with the left margin of the main text.
  • Preamble: Optional information about the content or kind of example. Aligned top-left: at the top with the number, and at the left with the (optional) label. When there is no label, the preamble is aligned with the example, not with the judgment.
  • Label: Indices for sub-examples, present only when more than one example is grouped together inside one numbered entity. Sub-example labels typically use Latin letters followed by a full stop. They are left-aligned with the preamble, and each label is top-aligned with the first line of the corresponding example (important for longer, line-wrapped examples).
  • Judgment: Examples can optionally carry grammaticality judgments, typically symbols like *, ? or !, sometimes superscripted relative to the corresponding example. Judgments are right-aligned with each other, with only minimal space before the left-aligned examples.
  • Line example: A minimal linguistic example has at least one line example, i.e. an utterance of interest. In general, building blocks shift left and up when other (optional) building blocks are absent; minimally, this results in a number with one line example.
  • Interlinear example: A complex structure typically used for examples from languages unknown to most readers. It consists of three or four left-aligned lines:
    • Header: An optional header, typically used to display information about the language of the example, including literature references. When it is absent, all other lines of the interlinear example shift upwards.
    • Source: The actual utterance in the language, often typeset in italics. This line is split at spaces, and each sub-block is left-aligned with the corresponding sub-block of the gloss.
    • Gloss: An explanation of the meaning of the source, often using abbreviations in small caps. This line is split at spaces, and each block is left-aligned with the corresponding block of the source.
    • Translation: A free translation of the source, typically quoted. It is not split into blocks but extends freely to the right, left-aligned with the other lines of the interlinear example.
-The structure of a linguistic example. -
-

There are, of course, many more possibilities to extend the structure of a linguistic example, like third or fourth subdivisions of labels (often using small roman numerals as a third level) or multiple glossing lines in the interlinear example. Also, the content of the header is sometimes found right-aligned to the right of the interlinear example (language information at the top, reference at the bottom). None of these options is currently supported by pandoc-ling.

Under the hood, pandoc-ling prepares this structure as a table. Tables are reasonably well transcoded to different document formats. Specific layout considerations mostly have to be set manually, but alignment of the text should work in most exports. Some CSS styling is proposed by pandoc-ling, but it can of course be overruled.

4 Introducing pandoc-ling


4.1 Editing linguistic examples


To include a linguistic example in Markdown, pandoc-ling uses the div structure, which is indicated in Pandoc Markdown by typing three colons at the start and three colons at the end. To indicate the class of this div, the letters ‘ex’ (for ‘example’) should be added after the top colons (with or without a space in between). This ‘ex’ class is the signal for pandoc-ling to process such a div. The numbering of these examples is inserted by pandoc-ling.


Empty lines can be added inside the div for visual pleasure, as they mostly do not have an influence on the output. Exception: do not use empty lines between unlabelled line examples. Multiple lines of text can be used (without empty lines in between), but they will simply be interpreted as one sequential paragraph.

::: ex
This is the most basic structure of a linguistic example.
:::

(4.1) This is the most basic structure of a linguistic example.

Alternatively, the class can be put in curly brackets (and then a leading full stop is necessary before ex). Inside these brackets more attributes can be added (separated by spaces), for example an id, using a hash, or any attribute=value pairs that should apply to this example. Currently only a few attributes are implemented (formatGloss, and noFormat described below), but in principle it is possible to add more attributes that can be used to fine-tune the typesetting of the example.

::: {#id .ex formatGloss=false}

This is a multi-line example.
But that does not mean anything for the result.
All these lines are simply treated as one paragraph.
They will become one example with one number.

:::

(4.2) This is a multi-line example. But that does not mean anything for the result. All these lines are simply treated as one paragraph. They will become one example with one number.
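To make the two ways of opening an example concrete, here is a minimal Python sketch of what the opening line of a div contains. It is not pandoc-ling's own Lua code (in practice Pandoc itself parses these attributes and hands them to the filter), and the function names parse_div_opening and is_example are hypothetical.

```python
import re

def parse_div_opening(line):
    """Split the opening fence of a Pandoc fenced div into (id, classes, attrs).

    Covers the two forms shown above: ':::ex' / '::: ex' (a bare class name)
    and '::: {#id .ex key=value}' (an attribute block). Quoted values that
    contain spaces are not handled in this sketch.
    """
    m = re.match(r"^:{3,}\s*(.*?)\s*$", line)
    if m is None:
        raise ValueError(f"not a fenced div opening: {line!r}")
    spec = m.group(1)
    if not spec.startswith("{"):
        # A bare word after the colons is taken as the single class.
        return None, [spec], {}
    ident, classes, attrs = None, [], {}
    for token in spec.strip("{}").split():
        if token.startswith("#"):
            ident = token[1:]
        elif token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs[key] = value
    return ident, classes, attrs

def is_example(line):
    """True when the div carries the 'ex' class that triggers processing."""
    return "ex" in parse_div_opening(line)[1]
```

Note that attribute values stay strings in this sketch ("false"), so a boolean option such as formatGloss would still have to be interpreted afterwards.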

A preamble can be added by inserting an empty line between preamble and example. The same considerations about multiple text-lines apply.

:::ex
Preamble

This is an example with a preamble.
:::

(4.3) Preamble
      This is an example with a preamble.

Sub-examples with labels are entered by starting each sub-example with a small latin letter and a full stop. Empty lines between labels are allowed. Subsequent lines without labels are treated as one paragraph. Empty lines not followed by a label with a full stop will result in errors.

:::ex
a. This is the first example.
b. This is the second.
a. The actual letters are not important, `pandoc-ling` will put them in order.

e. Empty lines are allowed between labelled lines
Subsequent lines are again treated as one sequential paragraph.
:::

(4.4) a. This is the first example.
      b. This is the second.
      c. The actual letters are not important, pandoc-ling will put them in order.
      d. Empty lines are allowed between labelled lines Subsequent lines are again treated as one sequential paragraph.
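The labelling rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not pandoc-ling's implementation: it handles line examples only (a bare label followed by an interlinear block is out of scope), and the names LABEL and collect_labelled are hypothetical.

```python
import re
import string

# A label is a single lower-case letter followed by a full stop.
LABEL = re.compile(r"^([a-z])\.\s*(.*)$")

def collect_labelled(lines):
    """Group the lines of a labelled example and relabel them a., b., c., ...

    A line starting with a label opens a new sub-example; a following
    unlabelled line is appended to it (one paragraph). An empty line must
    be followed by a new label, otherwise ValueError is raised. The letters
    typed by the author are ignored and replaced in order (up to 26).
    """
    items = []
    after_blank = False
    for raw in lines:
        line = raw.strip()
        if not line:
            after_blank = True
            continue
        m = LABEL.match(line)
        if m:
            items.append(m.group(2))
        elif items and not after_blank:
            items[-1] = f"{items[-1]} {line}".strip()
        else:
            raise ValueError(f"unlabelled line in an unexpected place: {line!r}")
        after_blank = False
    return [(letter + ".", text) for letter, text in zip(string.ascii_lowercase, items)]
```

Fed the body of example (4.4), this returns the labels a. to d., with the last two input lines joined into one paragraph.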

A labelled list can be combined with a preamble.

:::ex
Any nice description here

a. one example sentence.
b. two
c. three
:::

(4.5) Any nice description here
      a. one example sentence.
      b. two
      c. three

Grammaticality judgements should be added before an example, and after an optional label, separated from both by spaces (though four spaces in a row should be avoided, as that could lead to layout errors). To indicate that a sequence of symbols is a judgement, prepend it with a caret ^. Alignment will be figured out by pandoc-ling.

:::ex
Throwing in a preamble for good measure

a. ^* This traditionally signals ungrammaticality.
b. ^? Question-marks indicate questionable grammaticality.
c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
d. However, such long sequences sometimes lead to undesirable effects in the layout.
:::

(4.6) Throwing in a preamble for good measure
      a.        *  This traditionally signals ungrammaticality.
      b.        ?  Question-marks indicate questionable grammaticality.
      c.  whynot?  But in principle any sequence can be used (here even in superscript).
      d.           However, such long sequences sometimes lead to undesirable effects in the layout.
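A minimal Python sketch of the caret convention, assuming the judgement is the first whitespace-free token after the caret; it ignores the four-spaces caveat mentioned above. The names JUDGEMENT, split_judgement and align_judgements are hypothetical, and the right-alignment is a plain-text stand-in for what pandoc-ling does with table cells.

```python
import re

# A caret, then a judgement token without spaces, then the example text.
# In '^^whynot?^' the token is '^whynot?^', Pandoc's superscript syntax.
JUDGEMENT = re.compile(r"^\^(\S+)\s+(.*)$")

def split_judgement(text):
    """Return (judgement, example); the judgement is '' when there is no caret."""
    m = JUDGEMENT.match(text)
    if m is None:
        return "", text
    return m.group(1), m.group(2)

def align_judgements(judgements):
    """Right-align judgements to a common width (plain-text illustration)."""
    width = max((len(j) for j in judgements), default=0)
    return [j.rjust(width) for j in judgements]
```

Splitting the four lines of example (4.6) this way yields the judgements *, ?, ^whynot?^ and an empty string, which are then right-aligned to each other.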

A minor detail is the alignment of a single example with a preamble and grammaticality judgements. In this case it looks better for the preamble to be left aligned with the example and not with the judgement.

:::ex
Here is a special case with a preamble

^^???^ With a single questionable example.
Note the alignment! Especially with this very long example
that should go over various lines in the output.
:::

(4.7)      Here is a special case with a preamble
      ???  With a single questionable example. Note the alignment! Especially with this very long example that should go over various lines in the output.

4.2 Interlinear examples

For interlinear examples with aligned source and gloss, the structure of a lineblock is used, starting the lines with a vertical line |. There should always be four vertical lines (for header, source, gloss and translation, respectively), although the content after the first vertical line can be empty. The source and gloss lines are separated at spaces, and the resulting parts are left-aligned with each other. If you want a space that does not separate blocks, you have to ‘protect’ it, either by putting a backslash before the space, or by inserting a non-breaking space instead of a normal space (either type &nbsp; or insert an actual non-breaking space, i.e. Unicode character U+00A0).

:::ex
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::

(4.8) Dutch (Germanic)
      Deze  zin       is   in  het  nederlands.
      DEM   sentence  AUX  in  DET  dutch.
      This sentence is dutch.
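The splitting and alignment of source and gloss can be sketched in Python as follows, assuming monospaced plain-text output (pandoc-ling itself places each block in a table cell instead). The names split_protected and align_columns are hypothetical.

```python
NBSP = "\u00a0"

def split_protected(line):
    """Split a source or gloss line at spaces, keeping protected spaces.

    A backslash-escaped space, the entity '&nbsp;' and a literal U+00A0 all
    protect a space; inside the resulting block it becomes a normal space.
    str.split() without arguments would also split at U+00A0, so we split
    at ' ' explicitly and drop the empty strings.
    """
    line = line.replace("\\ ", NBSP).replace("&nbsp;", NBSP)
    return [block.replace(NBSP, " ") for block in line.split(" ") if block]

def align_columns(source, gloss):
    """Pad source and gloss blocks so that each pair starts in the same column."""
    src, gls = split_protected(source), split_protected(gloss)
    n = max(len(src), len(gls))
    src += [""] * (n - len(src))
    gls += [""] * (n - len(gls))
    widths = [max(len(s), len(g)) for s, g in zip(src, gls)]

    def render(blocks):
        # Left-align every block in its column, two spaces between columns.
        return "  ".join(b.ljust(w) for b, w in zip(blocks, widths)).rstrip()

    return render(src), render(gls)
```

For example (4.8) this produces the two aligned lines shown above; the protected sequence (dit\ is&nbsp;test) stays a single block.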

An attempt is made to format interlinear examples when the option formatGloss=true is added. This will:

  • remove formatting from the source and set everything in italics,
  • remove formatting from the gloss and set sequences (>1) of capitals and numbers in small caps (note that the positioning of small caps on web pages is highly complex),
  • treat a tilde ~ between spaces in the gloss as a shortcut for an empty gloss (internally, the sequence space-tilde-space is replaced by space-space-nonBreakingSpace-space-space),
  • consistently put translations in single quotes, possibly removing other quotes.

::: {.ex formatGloss=true}
| Dutch (Germanic)
| Deze zin is in het nederlands.
| DEM sentence AUX in DET dutch.
| This sentence is dutch.
:::

(4.9) Dutch (Germanic)
      Deze  zin       is   in  het  nederlands.
      dem   sentence  aux  in  det  dutch.
      ‘This sentence is dutch.’
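The gloss and translation rules of formatGloss can be approximated in Python. This is a sketch under assumptions, not the filter's code: small caps are expressed with Pandoc's bracketed-span syntax [text]{.smallcaps}, the capitals are lower-cased (as in the output of (4.9)), italicising the source is left out, and the names format_gloss and format_translation are hypothetical.

```python
import re

NBSP = "\u00a0"

def format_gloss(gloss):
    """Approximate the gloss part of formatGloss=true.

    A tilde between spaces becomes an empty gloss (space-space-NBSP-space-
    space), and every run of two or more capitals/digits is lower-cased and
    wrapped in Pandoc's small-caps span syntax [text]{.smallcaps}.
    """
    gloss = gloss.replace(" ~ ", "  " + NBSP + "  ")
    return re.sub(r"[A-Z0-9]{2,}",
                  lambda m: "[" + m.group(0).lower() + "]{.smallcaps}",
                  gloss)

def format_translation(text):
    """Put a free translation in single curly quotes, dropping outer quotes."""
    text = text.strip().strip("\"'‘’“”")
    return "‘" + text + "’"
```

Because the digit 3 in have.3SG.PRES is part of a run of capitals and numbers, it ends up inside the small-caps span together with sg.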

The results of such formatting will not always work, but it seems to be quite robust in my testing. The next example brings everything together:

  • a preamble,
  • labels, both for single lines and for interlinear examples,
  • interlinear examples start on a new line immediately after the letter-label,
  • grammaticality judgements with proper alignment,
  • when the header of an interlinear example is left out, everything is shifted up,
  • the formatting of the interlinear examples is harmonised.

::: {.ex formatGloss=true}
Completely superfluous preamble, but it works ...

a. Mixing single line examples with interlinear examples.
a. This is of course highly unusual.
Just for this example, let's add some extra material in this example.

a.
| Dutch (Germanic) Note the grammaticality judgement!
| ^^:-)^ Deze zin is (dit\ is&nbsp;test) nederlands.
| DEM sentence AUX ~ dutch.
| This sentence is dutch.

b.
|
| Deze tweede zin heeft geen header.
| DEM second sentence have.3SG.PRES no header.
| This second sentence does not have a header.
:::

(4.10) Completely superfluous preamble, but it works …
       a.     Mixing single line examples with interlinear examples.
       b.     This is of course highly unusual. Just for this example, let’s add some extra material in this example.
       c.     Dutch (Germanic) Note the grammaticality judgement!
          :-) Deze  zin       is   (dit is test)  nederlands.
              dem   sentence  aux                 dutch.
              ‘This sentence is dutch.’
       d.     Deze  tweede  zin       heeft          geen  header.
              dem   second  sentence  have.3sg.pres  no    header.
              ‘This second sentence does not have a header.’

4.3 Cross-referencing examples


The examples are automatically numbered by pandoc-ling. Cross-references to examples can be made by using the [@ID] format (used by Pandoc for citations). When an example has an explicit identifier (like #test in the next example), then a reference can be made to this example with [@test], leading to (4.11) when formatted.

::: {#test .ex}
This is a test
:::

(4.11) This is a test

Inspired by the linguex approach, you can also use the keywords Next or Last to refer to the next or the last example, e.g. [@Last] will be formatted as (4.11). By doubling the capitals to NNext or LLast, a reference to the next-but-one or last-but-one example can be made. Actually, the number of starting capitals can be repeated at will in pandoc-ling, so something like [@LLLLLLLLast] will also work. It will be formatted as (4.4) after processing by pandoc-ling. Needless to say, in such a situation an explicit identifier would be a better choice.

Referring to sub-examples can be done by manually adding a suffix to the cross-reference, separated from the identifier by a space. For example, [@LLast c] will refer to the third sub-example of the last-but-one example. Formatted, this will look like this: (4.10 c), smile! However, note that the “c” has to be determined manually. It is simply a literal suffix that will be copied into the cross-reference. Something like [@LLast Ha1l0] will also work, leading to (4.10 Ha1l0) when formatted (which is of course nonsensical).
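The counting behind Next and Last, and the literal suffix, can be sketched in Python. This is an illustration under assumptions rather than pandoc-ling's code: examples_so_far stands for the number of examples preceding the reference, the sep default mirrors the no-break-space default of xrefSuffixSep, and the names resolve and format_xref are hypothetical.

```python
import re

KEYWORD = re.compile(r"^(N+)ext$|^(L+)ast$")

def resolve(key, examples_so_far):
    """Turn a Next/Last keyword into an example number.

    examples_so_far is the number of examples before the reference. Each extra
    leading capital moves one example further away. Returns None for anything
    that is not such a keyword (e.g. an explicit identifier like 'test').
    """
    m = KEYWORD.match(key)
    if m is None:
        return None
    if m.group(1):
        return examples_so_far + len(m.group(1))
    return examples_so_far - len(m.group(2)) + 1

def format_xref(number, suffix="", sep="\u00a0"):
    """Format a cross-reference; sep plays the role of xrefSuffixSep."""
    return "(" + number + (sep + suffix if suffix else "") + ")"
```

With eleven examples before the reference, Last resolves to 11, LLast to 10, and eight L's to 4, matching the numbers given above.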

4.4 Options of pandoc-ling

4.4.1 Global options

The following global options are available with pandoc-ling. These can be added to the Pandoc metadata. An example of such metadata can be found at the bottom of this readme in the form of a YAML-block. Pandoc allows for various methods to provide metadata (see the link above).

  • formatGloss (boolean, default false): should all interlinear examples be consistently formatted? If you use this option, you can simply use capital letters for abbreviations in the gloss, and they will be changed to small caps. The source line is set in italics, and the translation is put into single quotes.
  • xrefSuffixSep (string, defaults to no-break-space): when cross-references have a suffix, how should the separator be formatted? The default ‘no-break-space’ is a safe option. I personally prefer a ‘thin space’ (Unicode U+2009), but this symbol does not work with many fonts and might lead to errors. For Latex typesetting, all space-like symbols are converted to a Latex thin space \,.
  • restartAtChapter (boolean, default false): should the counting restart for each chapter? Actually, when true, this setting will restart the counting at the highest heading level, which for various output formats can be set by the Pandoc option top-level-division. Depending on your Latex setup, an explicit entry top-level-division: chapter might be necessary in your metadata.
  • addChapterNumber (boolean, default false): should the chapter (= highest heading level) number be added to the number of the example? In most formats this automatically implies restartAtChapter: true. In most Latex situations this only works in combination with documentclass: book.
  • latexPackage (one of: linguex, gb4e, langsci-gb4e, expex; default linguex): various options for converting examples to Latex packages that typeset linguistic examples. None of the conversions works perfectly, though they should work in most normal situations (think 90%-plus). It might be necessary to first convert to Latex and then typeset. Using the direct option inside Pandoc might also work in many situations.
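How restartAtChapter and addChapterNumber interact can be sketched in Python by walking through a document in order. This is a simplified model, not the filter's code: the document is reduced to a hypothetical list of "chapter" (a highest-level heading) and "example" events, addChapterNumber is assumed to imply restartAtChapter as described above, and number_examples is a hypothetical name.

```python
def number_examples(events, restart_at_chapter=False, add_chapter_number=False):
    """Assign example numbers while walking a document in order.

    events is a sequence of "chapter" (a top-level heading) and "example".
    add_chapter_number implies restart_at_chapter in this sketch.
    """
    if add_chapter_number:
        restart_at_chapter = True
    chapter, count, numbers = 0, 0, []
    for event in events:
        if event == "chapter":
            chapter += 1
            if restart_at_chapter:
                count = 0
        elif event == "example":
            count += 1
            numbers.append(f"{chapter}.{count}" if add_chapter_number else str(count))
    return numbers
```

For a document with two examples in the first chapter and one in the second, the three settings give 1, 2, 3 (default), 1, 2, 1 (restartAtChapter) and 1.1, 1.2, 2.1 (addChapterNumber).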

4.4.2 Local options


Local options are options that can be set for each individual example. The formatGloss option can be used to have an individual example be formatted differently from the global setting. For example, when the global setting is formatGloss: true in the metadata, then adding formatGloss=false in the curly brackets of a specific example will block the formatting. This is especially useful when the automatic formatting does not give the desired result.

If you want to add something else (not a linguistic example) in a numbered example, there is the local option noFormat=true. An attempt will be made to produce a reasonable layout. Multiple paragraphs will simply be taken as is, and the number will be put in front. In HTML the number will be centred. This is useful, for example, for an incidental mathematical formula.

::: {.ex noFormat=true}
$$\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}$$
:::

(4.12)  $$\sum_{x=1}^{n} x = \frac{n^2+n}{2}$$

4.5 Issues with pandoc-ling

  • Manually provided identifiers for examples should not be purely numerical (so do not use e.g. #5789). In some situations this interferes with the setting of the cross-references.
  • Because the cross-references use the same structure as citations in Pandoc, the processing of citations (by citeproc) should be performed after the processing by pandoc-ling. Further, pandoc-crossref, another Pandoc extension for numbering figures and other captions, also uses the same system. From experience, it seems safer to put pandoc-crossref before pandoc-ling in the order of processing (though I have no idea why).
  • Interlinear examples will not wrap at the end of the page. There is no solution yet for examples that are wider than the page.
  • When exporting to docx, there is a problem because paragraphs are inserted after tables, which adds space in lists with multiple interlinear examples. This is by design. The official solution is to set the font size of this paragraph to 1 inside MS Word.
  • Multi-column cells are crucial for pandoc-ling to work properly. These were only introduced with the new table format in Pandoc 2.10 (so older Pandoc versions are not supported). Also note that these structures are not yet exported to all formats, e.g. they will not be displayed correctly in docx. However, this is currently an area of active development.
  • langsci-gb4e is only available as part of the langsci package. You have to make it available to Pandoc, e.g. by adding it to the same directory as the pandoc-ling.lua filter. I have added a recent version of langsci-gb4e here for convenience, but it might become outdated at some point in the future.

4.6 A note on Latex conversion

Originally, I decided to write this filter as a two-pronged conversion, making a Markdown version myself, but using a mapping to one of the many Latex libraries for linguistic examples as a quick fix. I assumed that such a mapping would be the easy part. However, the mapping to Latex turned out to be much more difficult than I anticipated. Basically, the ‘common denominator’ that I was aiming for was not necessarily the ‘common denominator’ provided by the Latex packages. I worked on mappings to various packages (linguex, gb4e, langsci-gb4e and expex) with growing dismay. This approach resulted in a first version. However, after this version was (more or less) finished, I realised that it would be better to first define the ‘common denominator’ more clearly (as done here), and then implement it purely in Pandoc. From that basis I have then made attempts to map it to the various Latex packages.

4.7 A note on implementation

The basic structure of the examples is transformed into Pandoc tables. Tables are reasonably safe for conversion to other formats. Care has been taken to add classes to all elements of the tables (e.g. the preamble has the class linguistic-example-preamble). When exported formats are aware of these classes, they can be used to fine-tune the formatting. I have added a few such fine-tunings to the HTML output of this filter in the form of a few CSS style statements. The naming of the classes is quite transparent, using the form linguistic-example-....

- - diff --git a/readme conversions/readme_expex.pdf b/readme conversions/readme_expex.pdf deleted file mode 100644 index b3c728f..0000000 Binary files a/readme conversions/readme_expex.pdf and /dev/null differ diff --git a/readme conversions/readme_expex.tex b/readme conversions/readme_expex.tex deleted file mode 100644 index 27ff260..0000000 --- a/readme conversions/readme_expex.tex +++ /dev/null @@ -1,754 +0,0 @@ -% Options for packages loaded elsewhere -\PassOptionsToPackage{unicode}{hyperref} -\PassOptionsToPackage{hyphens}{url} -% -\documentclass[ -]{article} -\usepackage{lmodern} -\usepackage{amsmath} -\usepackage{ifxetex,ifluatex} -\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex - \usepackage[T1]{fontenc} - \usepackage[utf8]{inputenc} - \usepackage{textcomp} % provide euro and other symbols - \usepackage{amssymb} -\else % if luatex or xetex - \usepackage{unicode-math} - \defaultfontfeatures{Scale=MatchLowercase} - \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} -\fi -% Use upquote if available, for straight quotes in verbatim environments -\IfFileExists{upquote.sty}{\usepackage{upquote}}{} -\IfFileExists{microtype.sty}{% use microtype if available - \usepackage[]{microtype} - \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts -}{} -\makeatletter -\@ifundefined{KOMAClassName}{% if non-KOMA class - \IfFileExists{parskip.sty}{% - \usepackage{parskip} - }{% else - \setlength{\parindent}{0pt} - \setlength{\parskip}{6pt plus 2pt minus 1pt}} -}{% if KOMA class - \KOMAoptions{parskip=half}} -\makeatother -\usepackage{xcolor} -\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available -\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}} -\hypersetup{ - pdftitle={Using pandoc-ling}, - pdfauthor={Michael Cysouw}, - hidelinks, - pdfcreator={LaTeX via pandoc}} -\urlstyle{same} % disable monospaced font for URLs -\usepackage{graphicx} -\makeatletter 
-\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi} -\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi} -\makeatother -% Scale images if necessary, so that they will not overflow the page -% margins by default, and it is still possible to overwrite the defaults -% using explicit options in \includegraphics[width, height, ...]{} -\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio} -% Set default figure placement to htbp -\makeatletter -\def\fps@figure{htbp} -\makeatother -\setlength{\emergencystretch}{3em} % prevent overfull lines -\providecommand{\tightlist}{% - \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} -\setcounter{secnumdepth}{5} -\usepackage{expex} -\lingset{ - belowglpreambleskip = -1.5ex, - aboveglftskip = -1.5ex, - exskip = 0ex, - interpartskip = -0.5ex, - belowpreambleskip = -2ex - } -\ifluatex - \usepackage{selnolig} % disable illegal ligatures -\fi - -\title{Using pandoc-ling} -\author{Michael Cysouw} -\date{} - -\begin{document} -\maketitle - -{ -\setcounter{tocdepth}{3} -\tableofcontents -} -\hypertarget{pandoc-ling}{% -\section{pandoc-ling}\label{pandoc-ling}} - -\emph{Michael Cysouw} -\textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} - -A Pandoc filter for linguistic examples - -tl;dr - -\begin{itemize} -\tightlist -\item - Easily write linguistic examples including basic interlinear glossing. -\item - Let numbering and cross-referencing be done for you. -\item - Export to (almost) any format of your wishes for final polishing. -\end{itemize} - -\hypertarget{rationale}{% -\section{Rationale}\label{rationale}} - -In the field of linguistics there is an outspoken tradition to format -example sentences in research papers in a very specific way. In the -field, it is a perennial problem to get such example sentences to look -just right. 
Within Latex, there are numerous packages to deal with this -problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your -needs, there is some Latex solution for almost everyone. However, these -solutions in Latex are often cumbersome to type, and they are not -portable to other formats. Specifically, transfer between Latex, html, -docx, odt or epub would actually be highly desirable. Such transfer is -the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John -MacFarlane that provides conversion between these (and many more) -formats. - -Any such conversion between text-formats naturally never works -perfectly: every text-format has specific features that are not -transferable to other formats. A central goal of Pandoc (at least in my -interpretation) is to define a set of shared concepts for text-structure -(a `common denominator' if you will, but surely not `least'!) that can -then be mapped to other formats. In many ways, Pandoc tries (again) to -define a set of logical concepts for text structure (`semantic markup'), -which can then be formatted by your favourite typesetter. As long as you -stay inside the realm of this `common denominator' (in practice that -means Pandoc's extended version of Markdown/CommonMark), conversion -works reasonably well (think 90\%-plus). - -Building on John Gruber's -\href{https://daringfireball.net/projects/markdown/syntax}{Markdown -philosophy}, there is a strong urge here to learn to restrain oneself -while writing, and try to restrict the number of layout-possibilities to -a minimum. In this sense, with \texttt{pandoc-ling} I propose a -Markdown-structure for linguistic examples that is simple, easy to type, -easy to read, and portable through the Pandoc universe by way of an -extension mechanism of Pandoc, called a `Pandoc Lua Filter'. 
This -extension will not magically allow you to write every linguistic example -thinkable, but my guess is that in practice the present proposal covers -the majority of situations in linguistic publications (think 90\%-plus). -As an example (and test case) I have included automatic conversions into -various formats in this repository (chech them out to get an idea of the -strengths and weaknesses of this approach). - -\hypertarget{the-basic-structure-of-a-linguistic-example}{% -\section{The basic structure of a linguistic -example}\label{the-basic-structure-of-a-linguistic-example}} - -Basically, a linguistic examples consists of 6 possible building blocks, -of which only the number and at least one example line are necessary. -The space between the building blocks is kept as minimal as possible -without becoming cramped. When (optional) building blocks are not -included, then the other blocks shift left and up (only exception: a -preamble without labels is not shifted left completely, but left-aligned -with the example, not with the judgement). - -\begin{itemize} -\tightlist -\item - \textbf{Number}: Running tally of all examples in the work, possibly - restarting at chapters or other major headings. Typically between - round brackets, possibly with a chapter number added before in long - works, e.g.~example (7.26). Aligned top-left, typically left-aligned - to main text margin. -\item - \textbf{Preamble}: Optional information about the content/kind of - example. Aligned top-left: to the top with the number, to the left - with the (optional) label. When there is no label, then preamble is - aligned with the example, not with the judgment. -\item - \textbf{Label}: Indices for sub-examples. Only present when there are - more than one example grouped together inside one numbered entity. - Typically these sub-example labels use latin letters followed by a - full stop. 
They are left-aligned with the preamble, and each label is - top-aligned with the top-line of the corresponding example (important - for longer line-wrapped examples). -\item - \textbf{Judgment}: Examples can optionally have grammaticality - judgments, typically symbols like **?!* sometimes in superscript - relative to the corresponding example. judgements are right-aligned to - each other, typically with only minimal space to the left-aligned - examples. -\item - \textbf{Line example}: A minimal linguistic example has at least one - line example, i.e.~an utterance of interest. Building blocks in - general shift left and up when other (optional) building blocks are - not present. Minimally, this results in a number with one line - example. -\item - \textbf{Interlinear example}: A complex structure typically used for - examples from languages unknown to most readers. Consist of three or - four lines that are left-aligned: - - \begin{itemize} - \tightlist - \item - \textbf{Header}: An optional header is typically used to display - information about the language of the example, including literature - references. When not present, then all other lines from the - interlinear example shift upwards. - \item - \textbf{Source}: The actual language utterance, often typeset in - italics. This line is internally separated at spaces, and each - sub-block is left-aligned with the corresponding sub-blocks of the - gloss. - \item - \textbf{Gloss}: Explanation of the meaning of the source, often - using abbreviations in small caps. This line is internally separated - at spaces, and each block is left-aligned with the block from - source. - \item - \textbf{Translation}: Free translation of the source, typically - quoted. Not separated in blocks, but freely extending to the right. - Left-aligned with the other lines from the interlinear example. 
- \end{itemize} -\end{itemize} - -\begin{figure} -\centering -\includegraphics{figure/ExampleStructure.png} -\caption{The structure of a linguistic example.} -\end{figure} - -There are of course much more possibilities to extend the structure of a -linguistic examples, like third or fourth subdivisions of labels (often -using small roman numerals as a third level) or multiple glossing lines -in the interlinear example. Also, the content of the header is sometimes -found right-aligned to the right of the interlinear example (language -into to the top, reference to the bottom). All such options are -currently not supported by \texttt{pandoc-ling}. - -Under the hood, this structure is prepared by \texttt{pandoc-ling} as a -table. Tables are reasonably well transcoded to different document -formats. Specific layout considerations mostly have to be set manually. -Alignment of the text should work in most exports. Some \texttt{CSS} -styling is proposed by \texttt{pandoc-ling}, but can of course be -overruled. - -\hypertarget{introducing-pandoc-ling}{% -\section{\texorpdfstring{Introducing -\texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}} - -\hypertarget{editing-linguistic-examples}{% -\subsection{Editing linguistic -examples}\label{editing-linguistic-examples}} - -To include a linguistic example in Markdown \texttt{pandoc-ling} uses -the \texttt{div} structure, which is indicated in Pandoc-Markdown by -typing three colons at the start and three colons at the end. To -indicate the \texttt{class} of this \texttt{div} the letters `ex' (for -`example') should be added after the top colons (with or without space -in between). This `ex'-class is the signal for \texttt{pandoc-ling} to -start processing such a \texttt{div}. The numbering of these examples -will be inserted by \texttt{pandoc-ling}. - -Empty lines can be added inside the \texttt{div} for visual pleasure, as -they mostly do not have an influence on the output. 
Exception: do -\emph{not} use empty lines between unlabelled line examples. Multiple -lines of text can be used (without empty lines in between), but they -will simply be interpreted as one sequential paragraph. - -\begin{verbatim} -::: ex -This is the most basic structure of a linguistic example. -::: -\end{verbatim} - -\ex - This is the most basic structure of a linguistic example. -\xe - -Alternatively, the \texttt{class} can be put in curled brackets (and -then a leading full stop is necessary before \texttt{ex}). Inside these -brackets more attributes can be added (separated by space), for example -an id, using a hash, or any attribute=value pairs that should apply to -this example. Currently there is only one attribute implemented -(\texttt{formatGloss}), but in principle it is possible to add more -attributes that can be used to fine-tune the typesetting of the example. - -\begin{verbatim} -::: {#id .ex formatGloss=false} - -This is a multi-line example. -But that does not mean anything for the result -All these lines are simply treated as one paragraph. -They will become one example with one number. - -::: -\end{verbatim} - -\ex - This is a multi-line example. But that does not mean anything for the -result All these lines are simply treated as one paragraph. They will -become one example with one number. -\xe - -A preamble can be added by inserting an empty line between preamble and -example. The same considerations about multiple text-lines apply. - -\begin{verbatim} -:::ex -Preamble - -This is an example with a preamble. -::: -\end{verbatim} - -\ex Preamble\\ - This is an example with a preamble. -\xe - -Sub-examples with labels are entered by starting each sub-example with a -small latin letter and a full stop. Empty lines between labels are -allowed. Subsequent lines without labels are treated as one paragraph. -Empty lines \emph{not} followed by a label with a full stop will result -in errors. - -\begin{verbatim} -:::ex -a. This is the first example. 
-b. This is the second. -a. The actual letters are not important, `pandoc-ling` will put them in order. - -e. Empty lines are allowed between labelled lines -Subsequent lines are again treated as one sequential paragraph. -::: -\end{verbatim} - -\pex[*=] - \a This is the first example. - \a This is the second. - \a The actual letters are not important, \texttt{pandoc-ling} will put -them in order. - \a Empty lines are allowed between labelled lines Subsequent lines are -again treated as one sequential paragraph. -\xe - -A labelled list can be combined with a preamble. - -\begin{verbatim} -:::ex -Any nice description here - -a. one example sentence. -b. two -c. three -::: -\end{verbatim} - -\pex[*=] Any nice description here\\ - \a one example sentence. - \a two - \a three -\xe - -Grammaticality judgements should be added before an example, and after -an optional label, separated from both by spaces (though four spaces in -a row should be avoided, that could lead to layout errors). To indicate -that any sequence of symbols is a judgements, prepend the judgement with -a caret \texttt{\^{}}. Alignment will be figured out by -\texttt{pandoc-ling}. - -\begin{verbatim} -:::ex -Throwing in a preamble for good measure - -a. ^* This traditionally signals ungrammaticality. -b. ^? Question-marks indicate questionable grammaticality. -c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). -d. However, such long sequences sometimes lead to undesirable effects in the layout. -::: -\end{verbatim} - -\pex[*=whynot?] Throwing in a preamble for good measure\\ - \a \ljudge{*}This traditionally signals ungrammaticality. - \a \ljudge{?}Question-marks indicate questionable grammaticality. - \a \ljudge{\textsuperscript{whynot?}}But in principle any sequence can -be used (here even in superscript). - \a However, such long sequences sometimes lead to undesirable effects -in the layout. 
-\xe - -A minor detail is the alignment of a single example with a preamble and -grammaticality judgements. In this case it looks better for the preamble -to be left-aligned with the example and not with the judgement. - -\begin{verbatim} -:::ex -Here is a special case with a preamble - -^^???^ With a single questionable example. -Note the alignment! Especially with this very long example -that should go over various lines in the output. -::: -\end{verbatim} - -\ex Here is a special case with a preamble\\ - - \judge{\textsuperscript{???}} With a single questionable example. Note -the alignment! Especially with this very long example that should go -over various lines in the output. -\xe - -\hypertarget{interlinear-examples}{% -\subsection{Interlinear examples}\label{interlinear-examples}} - -For interlinear examples with aligned source and gloss, the structure of -a \texttt{lineblock} is used, starting the lines with a vertical line -\texttt{\textbar{}}. There should always be four vertical lines (for -header, source, gloss and translation, respectively), although the -content after the first vertical line can be empty. The source and gloss -lines are separated at spaces, and all parts are left-aligned. If you -want to have a space that is not separated, you will have to `protect' -the space, either by putting a backslash before the space, or by -inserting a non-breaking space instead of a normal space (either type -\texttt{\ } or insert an actual non-breaking space, i.e.~unicode -character \texttt{U+00A0}). - -\begin{verbatim} -:::ex -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\ex[*=] - \begingl - \glpreamble Dutch (Germanic)// - \gla Deze zin is in het nederlands. // - \glb DEM sentence AUX in DET dutch. // - \glft This sentence is dutch.// - \endgl -\xe - -An attempt is made to format interlinear examples when the option -\texttt{formatGloss=true} is added.
This will: - -\begin{itemize} -\tightlist -\item - remove formatting from the source and set everything in italics, -\item - remove formatting from the gloss and set sequences (\textgreater1) of - capitals and numbers into small caps (note that the positioning of - small caps on web pages is - \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly - complex}), -\item - a tilde \texttt{\textasciitilde{}} between spaces in the gloss is - treated as a shortcut for an empty gloss (internally, the sequence - \texttt{space-tilde-space} is replaced by - \texttt{space-space-nonBreakingSpace-space-space}), -\item - consistently put translations in single quotes, possibly removing - other quotes. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\ex[*=] - \begingl - \glpreamble Dutch (Germanic)// - \gla \emph{Deze} \emph{zin} \emph{is} \emph{in} \emph{het} -\emph{nederlands.} // - \glb \textsc{dem} sentence \textsc{aux} in \textsc{det} dutch. // - \glft `This sentence is dutch.'// - \endgl -\xe - -This automatic formatting will not always work, but it seems to be -quite robust in my testing. The next example brings everything together: - -\begin{itemize} -\tightlist -\item - a preamble, -\item - labels, both for single lines and for interlinear examples, -\item - interlinear examples start on a new line immediately after the - letter-label, -\item - grammaticality judgements with proper alignment, -\item - when the header of an interlinear example is left out, everything is - shifted up, -\item - the formatting of the interlinear examples is harmonised. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -Completely superfluous preamble, but it works ... - -a. Mixing single line examples with interlinear examples. -a. This is of course highly unusual.
-Just for this example, let's add some extra material in this example. - -a. -| Dutch (Germanic) Note the grammaticality judgement! -| ^^:-)^ Deze zin is (dit\ is test) nederlands. -| DEM sentence AUX ~ dutch. -| This sentence is dutch. - -b. -| -| Deze tweede zin heeft geen header. -| DEM second sentence have.3SG.PRES no header. -| This second sentence does not have a header. -::: -\end{verbatim} - -\pex[*=:-)] Completely superfluous preamble, but it works -\ldots{}\\ - \a Mixing single line examples with interlinear examples. - \a This is of course highly unusual. Just for this example, let's add -some extra material in this example. - \a - \begingl - \glpreamble Dutch (Germanic) Note the grammaticality judgement!// - \gla \ljudge{\textsuperscript{:-)}}\emph{Deze} \emph{zin} \emph{is} -\emph{(dit~is~test)} \emph{nederlands.} // - \glb \textsc{dem} sentence \textsc{aux} ~ dutch. // - \glft `This sentence is dutch.'// - \endgl - \a - \begingl - \gla \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} -\emph{header.} // - \glb \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no -header. // - \glft `This second sentence does not have a header.'// - \endgl -\xe - -\hypertarget{cross-referencing-examples}{% -\subsection{Cross-referencing -examples}\label{cross-referencing-examples}} - -The examples are automatically numbered by \texttt{pandoc-ling}. -Cross-references to examples can be made by using the \texttt{{[}@ID{]}} -format (used by Pandoc for citations). When an example has an explicit -identifier (like \texttt{\#test} in the next example), a reference -can be made to this example with \texttt{{[}@test{]}}, leading to -(\getref{test}) when formatted.
- -\begin{verbatim} -::: {#test .ex} -This is a test -::: -\end{verbatim} - -\ex - This is a test -\xe - -Inspired by the \texttt{linguex}-approach, you can also use the keywords -\texttt{Next} or \texttt{Last} to refer to the next or the last example, -e.g.~\texttt{{[}@Last{]}} will be formatted as (\getref{test}). By -doubling the capitals to \texttt{NNext} or \texttt{LLast} reference to -the next/last-but-one can be made. Actually, the number of starting -capitals can be repeated at will in \texttt{pandoc-ling}, so something -like \texttt{{[}@LLLLLLLLast{]}} will also work. It will be formatted as -(\getref{ling-ex:4.4}) after the processing of \texttt{pandoc-ling}. -Needless to say that in such a situation an explicit identifier would be -a better choice. - -Referring to sub-examples can be done by manually adding a suffix into -the cross reference, simply separated from the identifier by a space. -For example, \texttt{{[}@LLast~c{]}} will refer to the third sub-example -of the last-but-one example. Formatted this will look like this: -(\getref{ling-ex:4.10}\,c), smile! However, note that the ``c'' has to -be manually determined. It is simply a literal suffix that will be -copied into the cross-reference. Something like -\texttt{{[}@LLast\ Ha1l0{]}} will work also, leading to -(\getref{ling-ex:4.10}\,Ha1l0) when formatted (which is of course -nonsensical). - -\hypertarget{options-of-pandoc-ling}{% -\subsection{\texorpdfstring{Options of -\texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}} - -\hypertarget{global-options}{% -\subsubsection{Global options}\label{global-options}} - -The following global options are available with \texttt{pandoc-ling}. -These can be added to the -\href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. -An example of such metadata can be found at the bottom of this -\texttt{readme} in the form of a YAML-block. Pandoc allows for various -methods to provide metadata (see the link above). 
- -\begin{itemize} -\tightlist -\item - \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): - should all interlinear examples be consistently formatted? If you use - this option, you can simply use capital letters for abbreviations in - the gloss, and they will be changed to small caps. The source line is - set to italics, and the translation is put into single quotes. -\item - \textbf{\texttt{xrefSuffixSep}} (string, default no-break-space): - when cross-references have a suffix, how should the separator be - formatted? The default `no-break-space' is a safe option. I - personally prefer a `thin space' (Unicode \texttt{U+2009}), but this - symbol does not work with many fonts, and might lead to errors. For - Latex typesetting, all space-like symbols are converted to a Latex - thin space \texttt{\textbackslash{},}. -\item - \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): - should the counting restart for each chapter? Actually, when - \texttt{true} this setting will restart the counting at the highest - heading level, which for various output formats can be set by the - Pandoc option \texttt{top-level-division}. Depending on your Latex - setup, an explicit entry \texttt{top-level-division:\ chapter} might - be necessary in your metadata. -\item - \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): - should the chapter (= highest heading level) number be added to the - number of the example? In most formats this automatically implies - \texttt{restartAtChapter:\ true}. In most Latex situations this only - works in combination with a \texttt{documentclass:\ book}. -\item - \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, - \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default - \texttt{linguex}): Various options for converting examples to Latex - packages that typeset linguistic examples.
None of the conversions - works perfectly, though it should work in most normal situations - (think 90\%-plus). It might be necessary to first convert to - \texttt{Latex} and then typeset. Using the direct option inside - Pandoc might also work in many situations. -\end{itemize} - -\hypertarget{local-options}{% -\subsubsection{Local options}\label{local-options}} - -Local options are options that can be set for each individual example. -The \texttt{formatGloss} option can be used to have an individual -example be formatted differently from the global setting. For example, -when the global setting is \texttt{formatGloss:\ true} in the metadata, -then adding \texttt{formatGloss=false} in the curly brackets of a -specific example will block the formatting. This is especially useful -when the automatic formatting does not give the desired result. - -If you want to add something else (not a linguistic example) in a -numbered example, you can use the local option \texttt{noFormat=true}. -An attempt will be made to produce a reasonable layout. Multiple -paragraphs will simply be taken as is, and the number will be put in -front. In HTML the number will be centred. This is useful for an -incidental mathematical formula. - -\begin{verbatim} -::: {.ex noFormat=true} -$$\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}$$ -::: -\end{verbatim} - -\ex - \[\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}\]\\ - -\xe - -\hypertarget{issues-with-pandoc-ling}{% -\subsection{\texorpdfstring{Issues with -\texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}} - -\begin{itemize} -\tightlist -\item - Manually provided identifiers for examples should not be purely - numerical (so do not use e.g.~\texttt{\#5789}). In some situations this - interferes with the setting of the cross-references.
-\item - Because the cross-references use the same structure as citations in - Pandoc, the processing of citations (by \texttt{citeproc}) should be - performed \textbf{after} the processing by \texttt{pandoc-ling}. - Further, - \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, - another Pandoc extension for numbering figures and other captions, - also uses the same system. From experience, it seems safer to put - \texttt{pandoc-crossref} \textbf{before} \texttt{pandoc-ling} in the - order of processing (though I have no idea why). -\item - Interlinear examples will not wrap at the end of the page. There - is no solution yet for examples that are longer than the - width of the page. -\item - When exporting to \texttt{docx}, paragraphs are inserted after - tables, which adds space in lists with - multiple interlinear examples. This is - \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by - design}. The official solution is to set the font size of this - paragraph to 1 inside MS Word. -\item - Multi-column cells are crucial for \texttt{pandoc-ling} to work - properly. These were only introduced with the new table format in Pandoc - 2.10 (so older Pandoc versions are not supported). Also note that these - structures are not yet exported to all formats, e.g.~they will not be - displayed correctly in \texttt{docx}. However, this is currently an - area of active development. -\item - \texttt{langsci-gb4e} is only available as part of the - \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. - You have to make it available to Pandoc, e.g.~by adding it to the - same directory as the pandoc-ling.lua filter. I have added a recent - version of \texttt{langsci-gb4e} here for convenience, but this one - might become outdated in the future.
-\end{itemize} - -\hypertarget{a-note-on-latex-conversion}{% -\subsection{A note on Latex -conversion}\label{a-note-on-latex-conversion}} - -Originally, I decided to write this filter as a two-pronged conversion, -making a markdown version myself, but using a mapping to one of the many -latex libraries for linguistic examples as a quick fix. I assumed that -such a mapping would be the easy part. However, the -mapping to latex turned out to be much more difficult than I anticipated. Basically, -the `common denominator' that I was aiming for was -not necessarily the `common denominator' provided by the latex packages. -I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and -expex) with growing dismay. This approach resulted in a first version. -However, after this version was (more or less) finished, I realised that -it would be better to first define the `common denominator' more clearly -(as done here), and then implement this purely in Pandoc. From that -basis I then attempted to map the examples to the various latex -packages. - -\hypertarget{a-note-on-implementation}{% -\subsection{A note on implementation}\label{a-note-on-implementation}} - -The basic structure of the examples is transformed into Pandoc tables. -Tables are reasonably safe to convert into other formats. Care has -been taken to add \texttt{classes} to all elements of the tables -(e.g.~the preamble has the class \texttt{linguistic-example-preamble}). -When exported formats are aware of these classes, they can be used to -fine-tune the formatting. I have added a few such fine-tunings to the -html output of this filter in the form of a few CSS-style statements. The -naming of the classes is quite transparent, using the form -\texttt{linguistic-example-...}.
- -\end{document} diff --git a/readme conversions/readme_gb4e.pdf b/readme conversions/readme_gb4e.pdf deleted file mode 100644 index 82de228..0000000 Binary files a/readme conversions/readme_gb4e.pdf and /dev/null differ diff --git a/readme conversions/readme_gb4e.tex b/readme conversions/readme_gb4e.tex deleted file mode 100644 index b43566d..0000000 --- a/readme conversions/readme_gb4e.tex +++ /dev/null @@ -1,755 +0,0 @@ -% Options for packages loaded elsewhere -\PassOptionsToPackage{unicode}{hyperref} -\PassOptionsToPackage{hyphens}{url} -% -\documentclass[ -]{article} -\usepackage{lmodern} -\usepackage{amsmath} -\usepackage{ifxetex,ifluatex} -\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex - \usepackage[T1]{fontenc} - \usepackage[utf8]{inputenc} - \usepackage{textcomp} % provide euro and other symbols - \usepackage{amssymb} -\else % if luatex or xetex - \usepackage{unicode-math} - \defaultfontfeatures{Scale=MatchLowercase} - \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} -\fi -% Use upquote if available, for straight quotes in verbatim environments -\IfFileExists{upquote.sty}{\usepackage{upquote}}{} -\IfFileExists{microtype.sty}{% use microtype if available - \usepackage[]{microtype} - \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts -}{} -\makeatletter -\@ifundefined{KOMAClassName}{% if non-KOMA class - \IfFileExists{parskip.sty}{% - \usepackage{parskip} - }{% else - \setlength{\parindent}{0pt} - \setlength{\parskip}{6pt plus 2pt minus 1pt}} -}{% if KOMA class - \KOMAoptions{parskip=half}} -\makeatother -\usepackage{xcolor} -\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available -\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}} -\hypersetup{ - pdftitle={Using pandoc-ling}, - pdfauthor={Michael Cysouw}, - hidelinks, - pdfcreator={LaTeX via pandoc}} -\urlstyle{same} % disable monospaced font for URLs -\usepackage{graphicx} -\makeatletter 
-\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi} -\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi} -\makeatother -% Scale images if necessary, so that they will not overflow the page -% margins by default, and it is still possible to overwrite the defaults -% using explicit options in \includegraphics[width, height, ...]{} -\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio} -% Set default figure placement to htbp -\makeatletter -\def\fps@figure{htbp} -\makeatother -\setlength{\emergencystretch}{3em} % prevent overfull lines -\providecommand{\tightlist}{% - \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} -\setcounter{secnumdepth}{5} -\usepackage{gb4e} -\usepackage[noparens]{nnext} -\usepackage{chngcntr} -\counterwithin{xnumi}{section} -\ifluatex - \usepackage{selnolig} % disable illegal ligatures -\fi - -\title{Using pandoc-ling} -\author{Michael Cysouw} -\date{} - -\begin{document} -\maketitle - -{ -\setcounter{tocdepth}{3} -\tableofcontents -} -\hypertarget{pandoc-ling}{% -\section{pandoc-ling}\label{pandoc-ling}} - -\emph{Michael Cysouw} -\textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} - -A Pandoc filter for linguistic examples - -tl;dr - -\begin{itemize} -\tightlist -\item - Easily write linguistic examples including basic interlinear glossing. -\item - Let numbering and cross-referencing be done for you. -\item - Export to (almost) any format of your wishes for final polishing. -\end{itemize} - -\hypertarget{rationale}{% -\section{Rationale}\label{rationale}} - -In the field of linguistics there is an outspoken tradition to format -example sentences in research papers in a very specific way. In the -field, it is a perennial problem to get such example sentences to look -just right. Within Latex, there are numerous packages to deal with this -problem (e.g.~covington, linguex, gb4e, expex, etc.). 
Depending on your -needs, there is some Latex solution for almost everyone. However, these -solutions in Latex are often cumbersome to type, and they are not -portable to other formats. Specifically, transfer between Latex, html, -docx, odt or epub would actually be highly desirable. Such transfer is -the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John -MacFarlane that provides conversion between these (and many more) -formats. - -Any such conversion between text-formats naturally never works -perfectly: every text-format has specific features that are not -transferable to other formats. A central goal of Pandoc (at least in my -interpretation) is to define a set of shared concepts for text-structure -(a `common denominator' if you will, but surely not `least'!) that can -then be mapped to other formats. In many ways, Pandoc tries (again) to -define a set of logical concepts for text structure (`semantic markup'), -which can then be formatted by your favourite typesetter. As long as you -stay inside the realm of this `common denominator' (in practice that -means Pandoc's extended version of Markdown/CommonMark), conversion -works reasonably well (think 90\%-plus). - -Building on John Gruber's -\href{https://daringfireball.net/projects/markdown/syntax}{Markdown -philosophy}, there is a strong urge here to learn to restrain oneself -while writing, and try to restrict the number of layout-possibilities to -a minimum. In this sense, with \texttt{pandoc-ling} I propose a -Markdown-structure for linguistic examples that is simple, easy to type, -easy to read, and portable through the Pandoc universe by way of an -extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This -extension will not magically allow you to write every linguistic example -thinkable, but my guess is that in practice the present proposal covers -the majority of situations in linguistic publications (think 90\%-plus). 
-As an example (and test case) I have included automatic conversions into -various formats in this repository (check them out to get an idea of the -strengths and weaknesses of this approach). - -\hypertarget{the-basic-structure-of-a-linguistic-example}{% -\section{The basic structure of a linguistic -example}\label{the-basic-structure-of-a-linguistic-example}} - -Basically, a linguistic example consists of six possible building blocks, -of which only the number and at least one example line are necessary. -The space between the building blocks is kept as small as possible -without becoming cramped. When (optional) building blocks are not -included, the other blocks shift left and up (only exception: a -preamble without labels is not shifted left completely, but left-aligned -with the example, not with the judgement). - -\begin{itemize} -\tightlist -\item - \textbf{Number}: Running tally of all examples in the work, possibly - restarting at chapters or other major headings. Typically between - round brackets, possibly with a chapter number added before in long - works, e.g.~example (7.26). Aligned top-left, typically left-aligned - to the main text margin. -\item - \textbf{Preamble}: Optional information about the content/kind of - example. Aligned top-left: to the top with the number, to the left - with the (optional) label. When there is no label, the preamble is - aligned with the example, not with the judgment. -\item - \textbf{Label}: Indices for sub-examples. Only present when there is - more than one example grouped together inside one numbered entity. - Typically these sub-example labels use latin letters followed by a - full stop. They are left-aligned with the preamble, and each label is - top-aligned with the top-line of the corresponding example (important - for longer line-wrapped examples).
-\item - \textbf{Judgment}: Examples can optionally have grammaticality - judgments, typically symbols like *, ? or ! (sometimes in superscript - relative to the corresponding example). Judgements are right-aligned to - each other, typically with only minimal space to the left-aligned - examples. -\item - \textbf{Line example}: A minimal linguistic example has at least one - line example, i.e.~an utterance of interest. Building blocks in - general shift left and up when other (optional) building blocks are - not present. Minimally, this results in a number with one line - example. -\item - \textbf{Interlinear example}: A complex structure typically used for - examples from languages unknown to most readers. Consists of three or - four lines that are left-aligned: - - \begin{itemize} - \tightlist - \item - \textbf{Header}: An optional header is typically used to display - information about the language of the example, including literature - references. When it is not present, all other lines from the - interlinear example shift upwards. - \item - \textbf{Source}: The actual language utterance, often typeset in - italics. This line is internally separated at spaces, and each - sub-block is left-aligned with the corresponding sub-blocks of the - gloss. - \item - \textbf{Gloss}: Explanation of the meaning of the source, often - using abbreviations in small caps. This line is internally separated - at spaces, and each block is left-aligned with the corresponding block - of the source. - \item - \textbf{Translation}: Free translation of the source, typically - quoted. Not separated in blocks, but freely extending to the right. - Left-aligned with the other lines from the interlinear example.
- \end{itemize} -\end{itemize} - -\begin{figure} -\centering -\includegraphics{figure/ExampleStructure.png} -\caption{The structure of a linguistic example.} -\end{figure} - -There are of course many more possibilities to extend the structure of a -linguistic example, like third or fourth subdivisions of labels (often -using small roman numerals as a third level) or multiple glossing lines -in the interlinear example. Also, the content of the header is sometimes -found right-aligned to the right of the interlinear example (language -information at the top, reference at the bottom). None of these options are -currently supported by \texttt{pandoc-ling}. - -Under the hood, this structure is prepared by \texttt{pandoc-ling} as a -table. Tables are reasonably well transcoded to different document -formats. Specific layout considerations mostly have to be set manually. -Alignment of the text should work in most exports. Some \texttt{CSS} -styling is proposed by \texttt{pandoc-ling}, but can of course be -overruled. - -\hypertarget{introducing-pandoc-ling}{% -\section{\texorpdfstring{Introducing -\texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}} - -\hypertarget{editing-linguistic-examples}{% -\subsection{Editing linguistic -examples}\label{editing-linguistic-examples}} - -To include a linguistic example in Markdown, \texttt{pandoc-ling} uses -the \texttt{div} structure, which is indicated in Pandoc-Markdown by -typing three colons at the start and three colons at the end. To -indicate the \texttt{class} of this \texttt{div}, the letters `ex' (for -`example') should be added after the top colons (with or without space -in between). This `ex'-class is the signal for \texttt{pandoc-ling} to -start processing such a \texttt{div}. The numbering of these examples -will be inserted by \texttt{pandoc-ling}. - -Empty lines can be added inside the \texttt{div} for visual pleasure, as -they mostly do not have an influence on the output.
Exception: do -\emph{not} use empty lines between unlabelled line examples. Multiple -lines of text can be used (without empty lines in between), but they -will simply be interpreted as one sequential paragraph. - -\begin{verbatim} -::: ex -This is the most basic structure of a linguistic example. -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.1} - \ex [] { This is the most basic structure of a linguistic example. } -\end{exe} - -Alternatively, the \texttt{class} can be put in curled brackets (and -then a leading full stop is necessary before \texttt{ex}). Inside these -brackets more attributes can be added (separated by space), for example -an id, using a hash, or any attribute=value pairs that should apply to -this example. Currently there is only one attribute implemented -(\texttt{formatGloss}), but in principle it is possible to add more -attributes that can be used to fine-tune the typesetting of the example. - -\begin{verbatim} -::: {#id .ex formatGloss=false} - -This is a multi-line example. -But that does not mean anything for the result -All these lines are simply treated as one paragraph. -They will become one example with one number. - -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{id} - \ex [] { This is a multi-line example. But that does not mean anything -for the result All these lines are simply treated as one paragraph. They -will become one example with one number. } -\end{exe} - -A preamble can be added by inserting an empty line between preamble and -example. The same considerations about multiple text-lines apply. - -\begin{verbatim} -:::ex -Preamble - -This is an example with a preamble. -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.3} - \ex Preamble - \sn [] { This is an example with a preamble. } -\end{exe} - -Sub-examples with labels are entered by starting each sub-example with a -small latin letter and a full stop. Empty lines between labels are -allowed. 
Subsequent lines without labels are treated as one paragraph. -Empty lines \emph{not} followed by a label with a full stop will result -in errors. - -\begin{verbatim} -:::ex -a. This is the first example. -b. This is the second. -a. The actual letters are not important, `pandoc-ling` will put them in order. - -e. Empty lines are allowed between labelled lines -Subsequent lines are again treated as one sequential paragraph. -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.4} - \ex - \begin{xlist} - \ex [] { This is the first example. } - \ex [] { This is the second. } - \ex [] { The actual letters are not important, \texttt{pandoc-ling} -will put them in order. } - \ex [] { Empty lines are allowed between labelled lines Subsequent -lines are again treated as one sequential paragraph. } - \end{xlist} -\end{exe} - -A labelled list can be combined with a preamble. - -\begin{verbatim} -:::ex -Any nice description here - -a. one example sentence. -b. two -c. three -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.5} - \ex Any nice description here - \begin{xlist} - \ex [] { one example sentence. } - \ex [] { two } - \ex [] { three } - \end{xlist} -\end{exe} - -Grammaticality judgements should be added before an example, and after -an optional label, separated from both by spaces (though four spaces in -a row should be avoided, as that could lead to layout errors). To indicate -that any sequence of symbols is a judgement, prepend the judgement with -a caret \texttt{\^{}}. Alignment will be figured out by -\texttt{pandoc-ling}. - -\begin{verbatim} -:::ex -Throwing in a preamble for good measure - -a. ^* This traditionally signals ungrammaticality. -b. ^? Question-marks indicate questionable grammaticality. -c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). -d. However, such long sequences sometimes lead to undesirable effects in the layout.
-::: -\end{verbatim} - -\begin{exe} \judgewidth{whynot?} \label{ling-ex:4.6} - \ex Throwing in a preamble for good measure - \begin{xlist} - \ex [*] { This traditionally signals ungrammaticality. } - \ex [?] { Question-marks indicate questionable grammaticality. } - \ex [\textsuperscript{whynot?}] { But in principle any sequence can be -used (here even in superscript). } - \ex [] { However, such long sequences sometimes lead to undesirable -effects in the layout. } - \end{xlist} -\end{exe} - -A minor detail is the alignment of a single example with a preamble and -grammaticality judgements. In this case it looks better for the preamble -to be left-aligned with the example and not with the judgement. - -\begin{verbatim} -:::ex -Here is a special case with a preamble - -^^???^ With a single questionable example. -Note the alignment! Especially with this very long example -that should go over various lines in the output. -::: -\end{verbatim} - -\begin{exe} \judgewidth{???} \label{ling-ex:4.7} - \ex Here is a special case with a preamble - \sn [\textsuperscript{???}] { With a single questionable example. Note -the alignment! Especially with this very long example that should go -over various lines in the output. } -\end{exe} - -\hypertarget{interlinear-examples}{% -\subsection{Interlinear examples}\label{interlinear-examples}} - -For interlinear examples with aligned source and gloss, the structure of -a \texttt{lineblock} is used, starting the lines with a vertical line -\texttt{\textbar{}}. There should always be four vertical lines (for -header, source, gloss and translation, respectively), although the -content after the first vertical line can be empty. The source and gloss -lines are separated at spaces, and all parts are left-aligned.
If you -want to have a space that is not separated, you will have to `protect' -the space, either by putting a backslash before the space, or by -inserting a non-breaking space instead of a normal space (either type -\texttt{\ } or insert an actual non-breaking space, i.e.~unicode -character \texttt{U+00A0}). - -\begin{verbatim} -:::ex -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.8} - \ex [] { - Dutch (Germanic) - \gll Deze zin is in het nederlands. \\ - DEM sentence AUX in DET dutch. \\ - \glt This sentence is dutch. } -\end{exe} - -An attempt is made to format interlinear examples when the option -\texttt{formatGloss=true} is added. This will: - -\begin{itemize} -\tightlist -\item - remove formatting from the source and set everything in italics, -\item - remove formatting from the gloss and set sequences (\textgreater1) of - capitals and numbers into small caps (note that the positioning of - small caps on web pages is - \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly - complex}), -\item - a tilde \texttt{\textasciitilde{}} between spaces in the gloss is - treated as a shortcut for an empty gloss (internally, the sequence - \texttt{space-tilde-space} is replaced by - \texttt{space-space-nonBreakingSpace-space-space}), -\item - consistently put translations in single quotes, possibly removing - other quotes. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.9} - \ex [] { - Dutch (Germanic) - \gll \emph{Deze} \emph{zin} \emph{is} \emph{in} \emph{het} -\emph{nederlands.} \\ - \textsc{dem} sentence \textsc{aux} in \textsc{det} dutch. \\ - \glt `This sentence is dutch.' 
} -\end{exe} - -The results of such formatting will not always work, but it seems to be -quite robust in my testing. The next example brings everything together: - -\begin{itemize} -\tightlist -\item - a preamble, -\item - labels, both for single lines and for interlinear examples, -\item - interlinear examples start on a new line immediately after the - letter-label, -\item - grammaticality judgements with proper alignment, -\item - when the header of an interlinear example is left out, everything is - shifted up, -\item - The formatting of the interlinear is harmonised. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -Completely superfluous preamble, but it works ... - -a. Mixing single line examples with interlinear examples. -a. This is of course highly unusal. -Just for this example, let's add some extra material in this example. - -a. -| Dutch (Germanic) Note the grammaticality judgement! -| ^^:-)^ Deze zin is (dit\ is test) nederlands. -| DEM sentence AUX ~ dutch. -| This sentence is dutch. - -b. -| -| Deze tweede zin heeft geen header. -| DEM second sentence have.3SG.PRES no header. -| This second sentence does not have a header. -::: -\end{verbatim} - -\begin{exe} \judgewidth{:-)} \label{ling-ex:4.10} - \ex Completely superfluous preamble, but it works \ldots{} - \begin{xlist} - \ex [] { Mixing single line examples with interlinear examples. } - \ex [] { This is of course highly unusal. Just for this example, let's -add some extra material in this example. } - \ex [\textsuperscript{:-)}] { - Dutch (Germanic) Note the grammaticality judgement! - \gll \emph{Deze} \emph{zin} \emph{is} \emph{(dit~is~test)} -\emph{nederlands.} \\ - \textsc{dem} sentence \textsc{aux} ~ dutch. \\ - \glt `This sentence is dutch.' } - \ex [] { - \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} -\emph{header.} \\ - \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no -header. \\ - \glt `This second sentence does not have a header.' 
} - \end{xlist} -\end{exe} - -\hypertarget{cross-referencing-examples}{% -\subsection{Cross-referencing -examples}\label{cross-referencing-examples}} - -The examples are automatically numbered by \texttt{pandoc-ling}. -Cross-references to examples can be made by using the \texttt{{[}@ID{]}} -format (used by Pandoc for citations). When an example has an explicit -identifier (like \texttt{\#test} in the next example), a reference -can be made to this example with \texttt{{[}@test{]}}, leading to -(\ref{test}) when formatted. - -\begin{verbatim} -::: {#test .ex} -This is a test -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{test} - \ex [] { This is a test } -\end{exe} - -Inspired by the \texttt{linguex}-approach, you can also use the keywords -\texttt{Next} or \texttt{Last} to refer to the next or the last example, -e.g.~\texttt{{[}@Last{]}} will be formatted as (\ref{test}). By doubling -the capitals to \texttt{NNext} or \texttt{LLast}, you can refer to the -next-but-one or last-but-one example. In fact, the initial capital -can be repeated at will in \texttt{pandoc-ling}, so something like -\texttt{{[}@LLLLLLLLast{]}} will also work. It will be formatted as -(\ref{ling-ex:4.4}) after the processing of \texttt{pandoc-ling}. -Needless to say, in such a situation an explicit identifier would be -a better choice. - -Referring to sub-examples can be done by manually adding a suffix into -the cross reference, simply separated from the identifier by a space. -For example, \texttt{{[}@LLast~c{]}} will refer to the third sub-example -of the last-but-one example. When formatted, this looks like this: -(\ref{ling-ex:4.10}\,c), smile! However, note that the ``c'' has to be -manually determined. It is simply a literal suffix that will be copied -into the cross-reference. Something like \texttt{{[}@LLast\ Ha1l0{]}} -will also work, leading to (\ref{ling-ex:4.10}\,Ha1l0) when formatted -(which is of course nonsensical).
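The keyword arithmetic above can be made concrete with a small sketch. The following Python code is only an illustration under stated assumptions, not the Lua code of \texttt{pandoc-ling}: the function names \texttt{resolve\_keyword} and \texttt{split\_reference} are hypothetical, and the sketch assumes that the number of the most recent example before the reference is already known. The number of repeated capitals is used as the offset, and everything after the first space (a no-break space included) is kept as a literal suffix.

```python
import re
from typing import Optional, Tuple

# Hypothetical sketch, not pandoc-ling's Lua implementation.
# "Next", "NNext", ... count forwards; "Last", "LLast", ... count backwards.
# The number of repeated initial capitals is the offset.
KEYWORD = re.compile(r"^(N+)ext$|^(L+)ast$")


def resolve_keyword(ident: str, last_number: int) -> Optional[int]:
    """Return the example number that a Next/Last keyword points to.

    `last_number` is the number of the most recent example before the
    reference. Returns None for explicit identifiers such as "test".
    """
    match = KEYWORD.match(ident)
    if match is None:
        return None
    n_caps, l_caps = match.groups()
    if n_caps:
        # "Next" is the example after the most recent one, "NNext" the one after that.
        return last_number + len(n_caps)
    # "Last" is the most recent example itself, "LLast" the one before it.
    return last_number - len(l_caps) + 1


def split_reference(citation: str) -> Tuple[str, str]:
    """Split "[@LLast c]" into ("LLast", "c"); the suffix is copied literally."""
    inner = citation.strip().strip("[]").lstrip("@")
    # str.split() without arguments also splits on a no-break space (U+00A0).
    parts = inner.split(None, 1)
    ident = parts[0]
    suffix = parts[1] if len(parts) > 1 else ""
    return ident, suffix


if __name__ == "__main__":
    # Assuming the most recent example is number 11:
    print(resolve_keyword("LLLLLLLLast", 11))  # -> 4
    print(split_reference("[@LLast c]"))       # -> ('LLast', 'c')
```

Assuming the most recent example has number 11, \texttt{LLLLLLLLast} resolves to example 4 of the section, which agrees with the (\ref{ling-ex:4.4}) given above. As in the text, the suffix is never checked against the actual sub-examples.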
- -\hypertarget{options-of-pandoc-ling}{% -\subsection{\texorpdfstring{Options of -\texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}} - -\hypertarget{global-options}{% -\subsubsection{Global options}\label{global-options}} - -The following global options are available with \texttt{pandoc-ling}. -These can be added to the -\href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. -An example of such metadata can be found at the bottom of this -\texttt{readme} in the form of a YAML-block. Pandoc offers various -ways to provide metadata (see the link above). - -\begin{itemize} -\tightlist -\item - \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): - should all interlinear examples be consistently formatted? If you use - this option, you can simply use capital letters for abbreviations in - the gloss, and they will be changed to small caps. The source line is - set in italics, and the translation is put into single quotes. -\item - \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space): - When cross references have a suffix, how should the separator be - formatted? The default `no-break-space' is a safe option. I - personally prefer a `thin space' (Unicode \texttt{U+2009}), but this - symbol does not work with many fonts and might lead to errors. For - Latex typesetting, all space-like symbols are converted to a Latex - thin space \texttt{\textbackslash{},}. -\item - \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): - should the counting restart for each chapter? More precisely, when - \texttt{true} this setting restarts the counting at the highest - heading level, which for various output formats can be set by the - Pandoc option \texttt{top-level-division}. Depending on your Latex - setup, an explicit entry \texttt{top-level-division:\ chapter} might - be necessary in your metadata.
-\item - \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): - should the chapter (= highest heading level) number be added to the - number of the example? In most formats this automatically implies - \texttt{restartAtChapter:\ true}. In most Latex situations this only - works in combination with a \texttt{documentclass:\ book}. -\item - \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, - \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default - \texttt{linguex}): Various options for converting examples to Latex - packages that typeset linguistic examples. None of the conversions - works perfectly, though they should work in most normal situations - (think 90\%-plus). It might be necessary to first convert to - \texttt{Latex} and then typeset. Using the direct option inside - Pandoc might also work in many situations. -\end{itemize} - -\hypertarget{local-options}{% -\subsubsection{Local options}\label{local-options}} - -Local options are options that can be set for each individual example. -The \texttt{formatGloss} option can be used to format an individual -example differently from the global setting. For example, -when the global setting is \texttt{formatGloss:\ true} in the metadata, -then adding \texttt{formatGloss=false} in the curly brackets of a -specific example will block the formatting. This is especially useful -when the automatic formatting does not give the desired result. - -To put something other than a linguistic example into a -numbered example, use the local option \texttt{noFormat=true}. -\texttt{pandoc-ling} will then attempt a reasonable layout. Multiple -paragraphs will simply be taken as is, and the number will be put in -front. In HTML the number will be centred. This can be used, for example, for an -incidental mathematical formula.
- -\begin{verbatim} -::: {.ex noFormat=true} -$$\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}$$ -::: -\end{verbatim} - -\begin{exe} \judgewidth{} \label{ling-ex:4.12} - \ex [] { \[\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}\]\\ - } -\end{exe} - -\hypertarget{issues-with-pandoc-ling}{% -\subsection{\texorpdfstring{Issues with -\texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}} - -\begin{itemize} -\tightlist -\item - Manually provided identifiers for examples should not be purely - numerical (so do not use e.g.~\texttt{\#5789}). In some situations this - interferes with the setting of the cross-references. -\item - Because the cross-references use the same structure as citations in - Pandoc, the processing of citations (by \texttt{citeproc}) should be - performed \textbf{after} the processing by \texttt{pandoc-ling}. - Further, - \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, - another Pandoc extension for numbering figures and other captions, - also uses the same system. From experience, it seems safer to put - \texttt{pandoc-crossref} \textbf{before} \texttt{pandoc-ling} in the - order of processing (though I have no idea why). -\item - Interlinear examples will not wrap at the end of the page. There - is no solution yet for examples that are longer than the page. -\item - When exporting to \texttt{docx}, a paragraph is inserted after - each table, which adds space in lists with - multiple interlinear examples. This is - \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by - design}. The official solution is to set font-size to 1 for this - paragraph inside MS Word. -\item - Multi-column cells are crucial for \texttt{pandoc-ling} to work - properly.
These were only introduced with the new table format in Pandoc - 2.10 (so older Pandoc versions are not supported). Also note that these - structures are not yet exported to all formats, e.g.~they will not be - displayed correctly in \texttt{docx}. However, this is currently an - area of active development. -\item - \texttt{langsci-gb4e} is only available as part of the - \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. - You have to make it available to Pandoc, e.g.~by adding it to the - same directory as the pandoc-ling.lua filter. I have added a recent - version of \texttt{langsci-gb4e} here for convenience, but this one - might become outdated in the future. -\end{itemize} - -\hypertarget{a-note-on-latex-conversion}{% -\subsection{A note on Latex -conversion}\label{a-note-on-latex-conversion}} - -Originally, I decided to write this filter as a two-pronged conversion, -making a markdown version myself, but using a mapping to one of the many -latex libraries for linguistic examples as a quick fix. I assumed that -such a mapping would be the easy part. However, the -mapping to latex turned out to be much more difficult than I anticipated. Basically, -the `common denominator' that I was aiming for was -not necessarily the `common denominator' provided by the latex packages. -I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and -expex) with growing dismay. This approach resulted in a first version. -However, after this version was (more or less) finished, I realised that -it would be better to first define the `common denominator' more clearly -(as done here), and then implement this purely in Pandoc. From that -basis I then attempted to map it to the various latex -packages. - -\hypertarget{a-note-on-implementation}{% -\subsection{A note on implementation}\label{a-note-on-implementation}} - -The basic structure of the examples is transformed into Pandoc tables.
-Tables are reasonably safe for converting in other formats. Care has -been taken to add \texttt{classes} to all elements of the tables -(e.g.~the preamble has the class \texttt{linguistic-example-preamble}). -When exported formats are aware of these classes, they can be used to -fine-tune the formatting. I have used a few such fine-tunings into the -html output of this filter by adding a few CSS-style statements. The -naming of the classes is quite transparent, using the form -\texttt{linguistic-example-...}. - -\end{document} diff --git a/readme conversions/readme_langsci-gb4e.pdf b/readme conversions/readme_langsci-gb4e.pdf deleted file mode 100644 index 65968e0..0000000 Binary files a/readme conversions/readme_langsci-gb4e.pdf and /dev/null differ diff --git a/readme conversions/readme_langsci-gb4e.tex b/readme conversions/readme_langsci-gb4e.tex deleted file mode 100644 index b6fa461..0000000 --- a/readme conversions/readme_langsci-gb4e.tex +++ /dev/null @@ -1,746 +0,0 @@ -% Options for packages loaded elsewhere -\PassOptionsToPackage{unicode}{hyperref} -\PassOptionsToPackage{hyphens}{url} -% -\documentclass[ -]{article} -\usepackage{lmodern} -\usepackage{amsmath} -\usepackage{ifxetex,ifluatex} -\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex - \usepackage[T1]{fontenc} - \usepackage[utf8]{inputenc} - \usepackage{textcomp} % provide euro and other symbols - \usepackage{amssymb} -\else % if luatex or xetex - \usepackage{unicode-math} - \defaultfontfeatures{Scale=MatchLowercase} - \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} -\fi -% Use upquote if available, for straight quotes in verbatim environments -\IfFileExists{upquote.sty}{\usepackage{upquote}}{} -\IfFileExists{microtype.sty}{% use microtype if available - \usepackage[]{microtype} - \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts -}{} -\makeatletter -\@ifundefined{KOMAClassName}{% if non-KOMA class - \IfFileExists{parskip.sty}{% - \usepackage{parskip} - }{% else - 
\setlength{\parindent}{0pt} - \setlength{\parskip}{6pt plus 2pt minus 1pt}} -}{% if KOMA class - \KOMAoptions{parskip=half}} -\makeatother -\usepackage{xcolor} -\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available -\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}} -\hypersetup{ - pdftitle={Using pandoc-ling}, - pdfauthor={Michael Cysouw}, - hidelinks, - pdfcreator={LaTeX via pandoc}} -\urlstyle{same} % disable monospaced font for URLs -\usepackage{graphicx} -\makeatletter -\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi} -\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi} -\makeatother -% Scale images if necessary, so that they will not overflow the page -% margins by default, and it is still possible to overwrite the defaults -% using explicit options in \includegraphics[width, height, ...]{} -\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio} -% Set default figure placement to htbp -\makeatletter -\def\fps@figure{htbp} -\makeatother -\setlength{\emergencystretch}{3em} % prevent overfull lines -\providecommand{\tightlist}{% - \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} -\setcounter{secnumdepth}{5} -\usepackage{langsci-gb4e} -\usepackage[noparens]{nnext} -\usepackage{chngcntr} -\counterwithin{xnumi}{section} -\ifluatex - \usepackage{selnolig} % disable illegal ligatures -\fi - -\title{Using pandoc-ling} -\author{Michael Cysouw} -\date{} - -\begin{document} -\maketitle - -{ -\setcounter{tocdepth}{3} -\tableofcontents -} -\hypertarget{pandoc-ling}{% -\section{pandoc-ling}\label{pandoc-ling}} - -\emph{Michael Cysouw} -\textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} - -A Pandoc filter for linguistic examples - -tl;dr - -\begin{itemize} -\tightlist -\item - Easily write linguistic examples including basic interlinear glossing. -\item - Let numbering and cross-referencing be done for you. 
-\item - Export to (almost) any format of your wishes for final polishing. -\end{itemize} - -\hypertarget{rationale}{% -\section{Rationale}\label{rationale}} - -In the field of linguistics there is an outspoken tradition to format -example sentences in research papers in a very specific way. In the -field, it is a perennial problem to get such example sentences to look -just right. Within Latex, there are numerous packages to deal with this -problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your -needs, there is some Latex solution for almost everyone. However, these -solutions in Latex are often cumbersome to type, and they are not -portable to other formats. Specifically, transfer between Latex, html, -docx, odt or epub would actually be highly desirable. Such transfer is -the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John -MacFarlane that provides conversion between these (and many more) -formats. - -Any such conversion between text-formats naturally never works -perfectly: every text-format has specific features that are not -transferable to other formats. A central goal of Pandoc (at least in my -interpretation) is to define a set of shared concepts for text-structure -(a `common denominator' if you will, but surely not `least'!) that can -then be mapped to other formats. In many ways, Pandoc tries (again) to -define a set of logical concepts for text structure (`semantic markup'), -which can then be formatted by your favourite typesetter. As long as you -stay inside the realm of this `common denominator' (in practice that -means Pandoc's extended version of Markdown/CommonMark), conversion -works reasonably well (think 90\%-plus). - -Building on John Gruber's -\href{https://daringfireball.net/projects/markdown/syntax}{Markdown -philosophy}, there is a strong urge here to learn to restrain oneself -while writing, and try to restrict the number of layout-possibilities to -a minimum. 
In this sense, with \texttt{pandoc-ling} I propose a -Markdown-structure for linguistic examples that is simple, easy to type, -easy to read, and portable through the Pandoc universe by way of an -extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This -extension will not magically allow you to write every conceivable linguistic -example, but my guess is that in practice the present proposal covers -the majority of situations in linguistic publications (think 90\%-plus). -As an example (and test case) I have included automatic conversions into -various formats in this repository (check them out to get an idea of the -strengths and weaknesses of this approach). - -\hypertarget{the-basic-structure-of-a-linguistic-example}{% -\section{The basic structure of a linguistic -example}\label{the-basic-structure-of-a-linguistic-example}} - -Basically, a linguistic example consists of six possible building blocks, -of which only the number and at least one example line are necessary. -The space between the building blocks is kept as small as possible -without becoming cramped. When (optional) building blocks are not -included, the other blocks shift left and up (the only exception: a -preamble without labels is not shifted left completely, but left-aligned -with the example, not with the judgement). - -\begin{itemize} -\tightlist -\item - \textbf{Number}: Running tally of all examples in the work, possibly - restarting at chapters or other major headings. Typically between - round brackets, possibly with a chapter number added before in long - works, e.g.~example (7.26). Aligned top-left, typically left-aligned - to the main text margin. -\item - \textbf{Preamble}: Optional information about the content/kind of - example. Aligned top-left: to the top with the number, to the left - with the (optional) label. When there is no label, the preamble is - aligned with the example, not with the judgment. -\item - \textbf{Label}: Indices for sub-examples.
Only present when more than one example is - grouped together inside one numbered entity. - Typically these sub-example labels use Latin letters followed by a - full stop. They are left-aligned with the preamble, and each label is - top-aligned with the top-line of the corresponding example (important - for longer line-wrapped examples). -\item - \textbf{Judgment}: Examples can optionally have grammaticality - judgments, typically symbols like *, ? or !, sometimes in superscript - relative to the corresponding example. Judgments are right-aligned to - each other, typically with only minimal space to the left-aligned - examples. -\item - \textbf{Line example}: A minimal linguistic example has at least one - line example, i.e.~an utterance of interest. Building blocks in - general shift left and up when other (optional) building blocks are - not present. Minimally, this results in a number with one line - example. -\item - \textbf{Interlinear example}: A complex structure typically used for - examples from languages unknown to most readers. Consists of three or - four lines that are left-aligned: - - \begin{itemize} - \tightlist - \item - \textbf{Header}: An optional header is typically used to display - information about the language of the example, including literature - references. When it is not present, all other lines of the - interlinear example shift upwards. - \item - \textbf{Source}: The actual language utterance, often typeset in - italics. This line is internally separated at spaces, and each - sub-block is left-aligned with the corresponding sub-block of the - gloss. - \item - \textbf{Gloss}: Explanation of the meaning of the source, often - using abbreviations in small caps. This line is internally separated - at spaces, and each block is left-aligned with the corresponding block of the - source. - \item - \textbf{Translation}: Free translation of the source, typically - quoted. Not separated in blocks, but freely extending to the right.
- Left-aligned with the other lines from the interlinear example. - \end{itemize} -\end{itemize} - -\begin{figure} -\centering -\includegraphics{figure/ExampleStructure.png} -\caption{The structure of a linguistic example.} -\end{figure} - -There are of course many more possibilities to extend the structure of a -linguistic example, like third or fourth subdivisions of labels (often -using small roman numerals as a third level) or multiple glossing lines -in the interlinear example. Also, the content of the header is sometimes -found right-aligned to the right of the interlinear example (language -information at the top, reference at the bottom). None of these options are -currently supported by \texttt{pandoc-ling}. - -Under the hood, this structure is prepared by \texttt{pandoc-ling} as a -table. Tables are reasonably well transcoded to different document -formats. Specific layout considerations mostly have to be set manually. -Alignment of the text should work in most exports. Some \texttt{CSS} -styling is proposed by \texttt{pandoc-ling}, but can of course be -overridden. - -\hypertarget{introducing-pandoc-ling}{% -\section{\texorpdfstring{Introducing -\texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}} - -\hypertarget{editing-linguistic-examples}{% -\subsection{Editing linguistic -examples}\label{editing-linguistic-examples}} - -To include a linguistic example in Markdown, \texttt{pandoc-ling} uses -the \texttt{div} structure, which is indicated in Pandoc-Markdown by -typing three colons at the start and three colons at the end. To -indicate the \texttt{class} of this \texttt{div}, the letters `ex' (for -`example') should be added after the top colons (with or without a space -in between). This `ex'-class is the signal for \texttt{pandoc-ling} to -start processing such a \texttt{div}. The numbering of these examples -will be inserted by \texttt{pandoc-ling}.
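As a rough illustration of which opening lines mark an example, the following Python sketch classifies a single line of Markdown. It only mimics the surface syntax and is a sketch under stated assumptions, not how \texttt{pandoc-ling} works: as a Lua filter, \texttt{pandoc-ling} receives \texttt{div} elements that Pandoc has already parsed. The function name \texttt{is\_example\_opener} is hypothetical. Besides the bare \texttt{ex} form described above, it also accepts the braced attribute form \texttt{\{.ex ...\}}, which is introduced below.

```python
def is_example_opener(line: str) -> bool:
    """Rough check whether a Markdown line opens a pandoc-ling example div.

    Hypothetical helper: accepts "::: ex", ":::ex" and braced attribute
    lists such as "::: {#id .ex formatGloss=false}".
    """
    stripped = line.strip()
    if not stripped.startswith(":::"):
        return False
    # Drop the leading colons and any optional trailing colons.
    attrs = stripped.lstrip(":").strip().rstrip(":").strip()
    if attrs.startswith("{") and attrs.endswith("}"):
        # Inside braces the class needs a leading full stop: ".ex".
        return ".ex" in attrs[1:-1].split()
    # Bare form: the single word "ex", with or without a space after the colons.
    return attrs == "ex"


if __name__ == "__main__":
    for candidate in ["::: ex", ":::ex", "::: {#id .ex formatGloss=false}", ":::", "::: note"]:
        print(repr(candidate), is_example_opener(candidate))
```

A real implementation would rely on Pandoc's own parser instead, since fenced \texttt{div}s can nest and can carry quoted attribute values that a simple whitespace split does not handle.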
- -Empty lines can be added inside the \texttt{div} for visual pleasure, as -they mostly do not have an influence on the output. Exception: do -\emph{not} use empty lines between unlabelled line examples. Multiple -lines of text can be used (without empty lines in between), but they -will simply be interpreted as one sequential paragraph. - -\begin{verbatim} -::: ex -This is the most basic structure of a linguistic example. -::: -\end{verbatim} - -\ea \judgewidth{} \label{ling-ex:4.1} - This is the most basic structure of a linguistic example. -\z - -Alternatively, the \texttt{class} can be put in curled brackets (and -then a leading full stop is necessary before \texttt{ex}). Inside these -brackets more attributes can be added (separated by space), for example -an id, using a hash, or any attribute=value pairs that should apply to -this example. Currently there is only one attribute implemented -(\texttt{formatGloss}), but in principle it is possible to add more -attributes that can be used to fine-tune the typesetting of the example. - -\begin{verbatim} -::: {#id .ex formatGloss=false} - -This is a multi-line example. -But that does not mean anything for the result -All these lines are simply treated as one paragraph. -They will become one example with one number. - -::: -\end{verbatim} - -\ea \judgewidth{} \label{id} - This is a multi-line example. But that does not mean anything for the -result All these lines are simply treated as one paragraph. They will -become one example with one number. -\z - -A preamble can be added by inserting an empty line between preamble and -example. The same considerations about multiple text-lines apply. - -\begin{verbatim} -:::ex -Preamble - -This is an example with a preamble. -::: -\end{verbatim} - -\ea \judgewidth{} \label{ling-ex:4.3} Preamble\\ - This is an example with a preamble. -\z - -Sub-examples with labels are entered by starting each sub-example with a -small latin letter and a full stop. 
Empty lines between labels are -allowed. Subsequent lines without labels are treated as one paragraph. -Empty lines \emph{not} followed by a label with a full stop will result -in errors. - -\begin{verbatim} -:::ex -a. This is the first example. -b. This is the second. -a. The actual letters are not important, `pandoc-ling` will put them in order. - -e. Empty lines are allowed between labelled lines -Subsequent lines are again treated as one sequential paragraph. -::: -\end{verbatim} - -\ea \judgewidth{} \label{ling-ex:4.4} - \ea [] { This is the first example. } - \ex [] { This is the second. } - \ex [] { The actual letters are not important, \texttt{pandoc-ling} -will put them in order. } - \ex [] { Empty lines are allowed between labelled lines Subsequent -lines are again treated as one sequential paragraph. } - \z -\z - -A labelled list can be combined with a preamble. - -\begin{verbatim} -:::ex -Any nice description here - -a. one example sentence. -b. two -c. three -::: -\end{verbatim} - -\ea \judgewidth{} \label{ling-ex:4.5} Any nice description here - \ea [] { one example sentence. } - \ex [] { two } - \ex [] { three } - \z -\z - -Grammaticality judgements should be added before an example, and after -an optional label, separated from both by spaces (though four spaces in -a row should be avoided, as that could lead to layout errors). To indicate -that a sequence of symbols is a judgement, prepend it with -a caret \texttt{\^{}}. Alignment will be figured out by -\texttt{pandoc-ling}. - -\begin{verbatim} -:::ex -Throwing in a preamble for good measure - -a. ^* This traditionally signals ungrammaticality. -b. ^? Question-marks indicate questionable grammaticality. -c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). -d. However, such long sequences sometimes lead to undesirable effects in the layout.
-::: -\end{verbatim} - -\ea \judgewidth{whynot?} \label{ling-ex:4.6} Throwing in a preamble for -good measure - \ea [*] { This traditionally signals ungrammaticality. } - \ex [?] { Question-marks indicate questionable grammaticality. } - \ex [\textsuperscript{whynot?}] { But in principle any sequence can be -used (here even in superscript). } - \ex [] { However, such long sequences sometimes lead to undesirable -effects in the layout. } - \z -\z - -A minor detail is the alignment of a single example with a preamble and -grammaticality judgements. In this case it looks better for the preamble -to be left aligned with the example and not with the judgement. - -\begin{verbatim} -:::ex -Here is a special case with a preamble - -^^???^ With a singly questionably example. -Note the alignment! Especially with this very long example -that should go over various lines in the output. -::: -\end{verbatim} - -\ea \judgewidth{???} \label{ling-ex:4.7} Here is a special case with a -preamble\\ - \textsuperscript{???}With a singly questionably example. Note the -alignment! Especially with this very long example that should go over -various lines in the output. -\z - -\hypertarget{interlinear-examples}{% -\subsection{Interlinear examples}\label{interlinear-examples}} - -For interlinear examples with aligned source and gloss, the structure of -a \texttt{lineblock} is used, starting the lines with a vertical line -\texttt{\textbar{}}. There should always be four vertical lines (for -header, source, gloss and translation, respectively), although the -content after the first vertical line can be empty. The source and gloss -lines are separated at spaces, and all parts are right-aligned. 
If you -want to have a space that is not separated, you will have to `protect' -the space, either by putting a backslash before the space, or by -inserting a non-breaking space instead of a normal space (either type -\texttt{\ } or insert an actual non-breaking space, i.e.~unicode -character \texttt{U+00A0}). - -\begin{verbatim} -:::ex -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\ea [] { \judgewidth{} \label{ling-ex:4.8} - Dutch (Germanic)\\ - \gll Deze zin is in het nederlands. \\ - DEM sentence AUX in DET dutch. \\ - \glt This sentence is dutch. } -\z - -An attempt is made to format interlinear examples when the option -\texttt{formatGloss=true} is added. This will: - -\begin{itemize} -\tightlist -\item - remove formatting from the source and set everything in italics, -\item - remove formatting from the gloss and set sequences (\textgreater1) of - capitals and numbers into small caps (note that the positioning of - small caps on web pages is - \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly - complex}), -\item - a tilde \texttt{\textasciitilde{}} between spaces in the gloss is - treated as a shortcut for an empty gloss (internally, the sequence - \texttt{space-tilde-space} is replaced by - \texttt{space-space-nonBreakingSpace-space-space}), -\item - consistently put translations in single quotes, possibly removing - other quotes. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\ea [] { \judgewidth{} \label{ling-ex:4.9} - Dutch (Germanic)\\ - \gll \emph{Deze} \emph{zin} \emph{is} \emph{in} \emph{het} -\emph{nederlands.} \\ - \textsc{dem} sentence \textsc{aux} in \textsc{det} dutch. \\ - \glt `This sentence is dutch.' 
} -\z - -Such formatting will not always give the desired result, but it has proved -quite robust in my testing. The next example brings everything together: - -\begin{itemize} -\tightlist -\item -  a preamble, -\item -  labels, both for single lines and for interlinear examples, -\item -  interlinear examples start on a new line immediately after the -  letter-label, -\item -  grammaticality judgements with proper alignment, -\item -  when the header of an interlinear example is left out, everything is -  shifted up, -\item -  the formatting of the interlinear examples is harmonised. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -Completely superfluous preamble, but it works ... - -a. Mixing single line examples with interlinear examples. -a. This is of course highly unusual. -Just for this example, let's add some extra material in this example. - -a. -| Dutch (Germanic) Note the grammaticality judgement! -| ^^:-)^ Deze zin is (dit\ is test) nederlands. -| DEM sentence AUX ~ dutch. -| This sentence is dutch. - -b. -| -| Deze tweede zin heeft geen header. -| DEM second sentence have.3SG.PRES no header. -| This second sentence does not have a header. -::: -\end{verbatim} - -\ea \judgewidth{:-)} \label{ling-ex:4.10} Completely superfluous -preamble, but it works \ldots{} -  \ea [] { Mixing single line examples with interlinear examples. } -  \ex [] { This is of course highly unusual. Just for this example, let's -add some extra material in this example. } -  \ex [\textsuperscript{:-)}] { -  Dutch (Germanic) Note the grammaticality judgement!\\ -  \gll \emph{Deze} \emph{zin} \emph{is} \emph{(dit~is~test)} -\emph{nederlands.} \\ -  \textsc{dem} sentence \textsc{aux} ~ dutch. \\ -  \glt `This sentence is dutch.' } -  \ex [] { -  \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} -\emph{header.} \\ -  \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no -header. \\ -  \glt `This second sentence does not have a header.'
} -  \z -\z - -\hypertarget{cross-referencing-examples}{% -\subsection{Cross-referencing -examples}\label{cross-referencing-examples}} - -The examples are automatically numbered by \texttt{pandoc-ling}. -Cross-references to examples can be made by using the \texttt{{[}@ID{]}} -format (used by Pandoc for citations). When an example has an explicit -identifier (like \texttt{\#test} in the next example), a reference -can be made to this example with \texttt{{[}@test{]}}, leading to -(\ref{test}) when formatted. - -\begin{verbatim} -::: {#test .ex} -This is a test -::: -\end{verbatim} - -\ea \judgewidth{} \label{test} -  This is a test -\z - -Inspired by the \texttt{linguex}-approach, you can also use the keywords -\texttt{Next} or \texttt{Last} to refer to the next or the last example, -e.g.~\texttt{{[}@Last{]}} will be formatted as (\ref{test}). By doubling -the capital to \texttt{NNext} or \texttt{LLast}, a reference to the -next-but-one or last-but-one example can be made. In fact, the starting -capital can be repeated at will in \texttt{pandoc-ling}, so something like -\texttt{{[}@LLLLLLLLast{]}} will also work. It will be formatted as -(\ref{ling-ex:4.4}) after the processing of \texttt{pandoc-ling}. -Needless to say, in such a situation an explicit identifier would be -a better choice. - -Referring to sub-examples can be done by manually adding a suffix to -the cross-reference, separated from the identifier by a space. -For example, \texttt{{[}@LLast~c{]}} will refer to the third sub-example -of the last-but-one example. When formatted, this looks like this: -(\ref{ling-ex:4.10}\,c), smile! However, note that the ``c'' has to be -determined manually. It is simply a literal suffix that will be copied -into the cross-reference. Something like \texttt{{[}@LLast\ Ha1l0{]}} -also works, leading to (\ref{ling-ex:4.10}\,Ha1l0) when formatted -(which is of course nonsensical).
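The resolution of the relative keywords can be illustrated with a small Python sketch. It is a hypothetical illustration, not the Lua code of \texttt{pandoc-ling}: the names \texttt{resolve} and \texttt{format\_xref} are invented here, example numbers are simplified to plain integers (without a chapter or section prefix), and the reference is assumed to follow example number 11, as \texttt{{[}@LLLLLLLLast{]}} does in the text above.

\begin{verbatim}
import re

# Hypothetical sketch, not pandoc-ling's actual Lua implementation.
# Relative keywords: "Last", "LLast", ... and "Next", "NNext", ...
RELATIVE = re.compile(r"^(L+)ast$|^(N+)ext$")


def resolve(keyword: str, last_example: int) -> int:
    """Map a relative keyword to an example number.

    last_example is the number of the most recent example that
    precedes the cross-reference in the text.
    """
    match = RELATIVE.match(keyword)
    if match is None:
        raise ValueError(f"not a relative keyword: {keyword!r}")
    lasts, nexts = match.groups()
    if lasts:
        # "Last" is the previous example; each extra "L" goes one further back.
        return last_example - (len(lasts) - 1)
    # "Next" is the following example; each extra "N" goes one further ahead.
    return last_example + len(nexts)


def format_xref(number: int, suffix: str = "", sep: str = "\u00a0") -> str:
    """Put a reference in round brackets, appending a literal suffix if given."""
    if suffix:
        return f"({number}{sep}{suffix})"
    return f"({number})"


# Usage, assuming the last example before the reference is number 11:
print(resolve("Last", 11))         # 11
print(resolve("LLLLLLLLast", 11))  # 4
print(format_xref(resolve("LLast", 11), "c"))
\end{verbatim}

The separator before the literal suffix defaults here to a no-break space, an assumption that mirrors the documented default of the \texttt{xrefSuffixSep} option.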
- -\hypertarget{options-of-pandoc-ling}{% -\subsection{\texorpdfstring{Options of -\texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}} - -\hypertarget{global-options}{% -\subsubsection{Global options}\label{global-options}} - -The following global options are available with \texttt{pandoc-ling}. -These can be added to the -\href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. -An example of such metadata can be found at the bottom of this -\texttt{readme} in the form of a YAML-block. Pandoc allows for various -methods to provide metadata (see the link above). - -\begin{itemize} -\tightlist -\item -  \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): -  should all interlinear examples be consistently formatted? If you use -  this option, you can simply use capital letters for abbreviations in -  the gloss, and they will be changed to small caps. The source line is -  set in italics, and the translation is put into single quotes. -\item -  \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break space): -  when cross-references have a suffix, how should the separator be -  formatted? The default `no-break space' is a safe option. I -  personally prefer a `thin space' (Unicode \texttt{U+2009}), but this -  symbol does not work with many fonts, and might lead to errors. For -  Latex typesetting, all space-like symbols are converted to a Latex -  thin space \texttt{\textbackslash{},}. -\item -  \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): -  should the counting restart for each chapter? More precisely, when set -  to \texttt{true}, the counting restarts at the highest -  heading level, which for various output formats can be set by the -  Pandoc option \texttt{top-level-division}. Depending on your Latex -  setup, an explicit entry \texttt{top-level-division:\ chapter} might -  be necessary in your metadata.
-\item -  \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): -  should the chapter (= highest heading level) number be added to the -  number of the example? In most formats this automatically implies -  \texttt{restartAtChapter:\ true}. In most Latex situations this only -  works in combination with \texttt{documentclass:\ book}. -\item -  \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, -  \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default -  \texttt{linguex}): the Latex package for typesetting linguistic -  examples that the conversion should target. None of the conversions -  works perfectly, though they should work in most normal situations -  (think 90\%-plus). It might be necessary to first convert to -  \texttt{Latex} and then typeset. Using the direct option inside -  Pandoc might also work in many situations. -\end{itemize} - -\hypertarget{local-options}{% -\subsubsection{Local options}\label{local-options}} - -Local options are options that can be set for each individual example. -The \texttt{formatGloss} option can be used to have an individual -example be formatted differently from the global setting. For example, -when the global setting is \texttt{formatGloss:\ true} in the metadata, -then adding \texttt{formatGloss=false} in the curly brackets of a -specific example will block the formatting. This is especially useful -when the automatic formatting does not give the desired result. - -If you want to add something else (not a linguistic example) in a -numbered example, then there is the local option \texttt{noFormat=true}. -The filter will then attempt a reasonable layout. Multiple -paragraphs will simply be taken as is, and the number will be put in -front. In HTML the number will be centred. This is useful, for example, -for an occasional mathematical formula.
- -\begin{verbatim} -::: {.ex noFormat=true} -$$\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}$$ -::: -\end{verbatim} - -\ea \judgewidth{} \label{ling-ex:4.12} -  \[\sum_{x=1}^{n}{x}=\frac{n^2+n}{2}\]\\ - -\z - -\hypertarget{issues-with-pandoc-ling}{% -\subsection{\texorpdfstring{Issues with -\texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}} - -\begin{itemize} -\tightlist -\item -  Manually provided identifiers for examples should not be purely -  numerical (so do not use e.g.~\texttt{\#5789}). In some situations this -  interferes with the setting of the cross-references. -\item -  Because the cross-references use the same structure as citations in -  Pandoc, the processing of citations (by \texttt{citeproc}) should be -  performed \textbf{after} the processing by \texttt{pandoc-ling}. -  Further, -  \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, -  another Pandoc extension for numbering figures and other captions, -  also uses the same system. From experience, it seems safer to put -  \texttt{pandoc-crossref} \textbf{before} \texttt{pandoc-ling} in the -  order of processing (though I have no idea why). -\item -  Interlinear examples will not wrap at the end of the page. There -  is no solution yet for examples that are wider than the page. -\item -  When exporting to \texttt{docx}, there is a problem because -  paragraphs are inserted after tables, which adds space in lists with -  multiple interlinear examples. This is -  \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by -  design}. The official solution is to set the font size of this -  paragraph to 1 inside MS Word. -\item -  Multi-column cells are crucial for \texttt{pandoc-ling} to work -  properly. These were only introduced with the new table format in Pandoc -  2.10 (so older Pandoc versions are not supported).
Also note that these -  structures are not yet exported to all formats, e.g.~they will not be -  displayed correctly in \texttt{docx}. However, this is currently an -  area of active development. -\item -  \texttt{langsci-gb4e} is only available as part of the -  \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. -  You have to make it available to Pandoc, e.g.~by adding it to the -  same directory as the pandoc-ling.lua filter. I have added a recent -  version of \texttt{langsci-gb4e} here for convenience, but this one -  might become outdated in the future. -\end{itemize} - -\hypertarget{a-note-on-latex-conversion}{% -\subsection{A note on Latex -conversion}\label{a-note-on-latex-conversion}} - -Originally, I decided to write this filter as a two-pronged conversion, -making a markdown version myself, but using a mapping to one of the many -latex libraries for linguistic examples as a quick fix. I assumed that -such a mapping would be the easy part. However, the -mapping to latex was much more difficult than I anticipated. Basically, -it turned out that the `common denominator' that I was aiming for was -not necessarily the `common denominator' provided by the latex packages. -I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and -expex) with growing dismay. This approach resulted in a first version. -However, after this version was (more or less) finished, I realised that -it would be better to first define the `common denominator' more clearly -(as done here), and then implement this purely in Pandoc. From that -basis, I then attempted to map the examples to the various latex -packages. - -\hypertarget{a-note-on-implementation}{% -\subsection{A note on implementation}\label{a-note-on-implementation}} - -The basic structure of the examples is transformed into Pandoc tables. -Tables are reasonably safe for conversion into other formats.
Care has -been taken to add \texttt{classes} to all elements of the tables -(e.g.~the preamble has the class \texttt{linguistic-example-preamble}). -When exported formats are aware of these classes, they can be used to -fine-tune the formatting. I have added a few such fine-tunings to the -HTML output of this filter in the form of a few CSS-style statements. The -naming of the classes is quite transparent, using the form -\texttt{linguistic-example-...}. - -\end{document} diff --git a/readme conversions/readme_linguex.pdf b/readme conversions/readme_linguex.pdf deleted file mode 100644 index 2eb132e..0000000 Binary files a/readme conversions/readme_linguex.pdf and /dev/null differ diff --git a/readme conversions/readme_linguex.tex b/readme conversions/readme_linguex.tex deleted file mode 100644 index 5896dcd..0000000 --- a/readme conversions/readme_linguex.tex +++ /dev/null @@ -1,730 +0,0 @@ -% Options for packages loaded elsewhere -\PassOptionsToPackage{unicode}{hyperref} -\PassOptionsToPackage{hyphens}{url} -% -\documentclass[ -]{article} -\usepackage{lmodern} -\usepackage{amsmath} -\usepackage{ifxetex,ifluatex} -\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex -  \usepackage[T1]{fontenc} -  \usepackage[utf8]{inputenc} -  \usepackage{textcomp} % provide euro and other symbols -  \usepackage{amssymb} -\else % if luatex or xetex -  \usepackage{unicode-math} -  \defaultfontfeatures{Scale=MatchLowercase} -  \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} -\fi -% Use upquote if available, for straight quotes in verbatim environments -\IfFileExists{upquote.sty}{\usepackage{upquote}}{} -\IfFileExists{microtype.sty}{% use microtype if available -  \usepackage[]{microtype} -  \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts -}{} -\makeatletter -\@ifundefined{KOMAClassName}{% if non-KOMA class -  \IfFileExists{parskip.sty}{% -    \usepackage{parskip} -  }{% else -    \setlength{\parindent}{0pt} -    \setlength{\parskip}{6pt plus 2pt minus 1pt}} -}{% if KOMA class
- \KOMAoptions{parskip=half}} -\makeatother -\usepackage{xcolor} -\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available -\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}} -\hypersetup{ - pdftitle={Using pandoc-ling}, - pdfauthor={Michael Cysouw}, - hidelinks, - pdfcreator={LaTeX via pandoc}} -\urlstyle{same} % disable monospaced font for URLs -\usepackage{graphicx} -\makeatletter -\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi} -\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi} -\makeatother -% Scale images if necessary, so that they will not overflow the page -% margins by default, and it is still possible to overwrite the defaults -% using explicit options in \includegraphics[width, height, ...]{} -\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio} -% Set default figure placement to htbp -\makeatletter -\def\fps@figure{htbp} -\makeatother -\setlength{\emergencystretch}{3em} % prevent overfull lines -\providecommand{\tightlist}{% - \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} -\setcounter{secnumdepth}{5} -\usepackage{linguex} -\renewcommand{\theExLBr}{} -\renewcommand{\theExRBr}{} -\newcommand{\jdg}[1]{\makebox[0.4em][r]{\normalfont#1\ignorespaces}} -\usepackage{chngcntr} -\counterwithin{ExNo}{section} -\renewcommand{\Exarabic}{\thesection.\arabic} -\ifluatex - \usepackage{selnolig} % disable illegal ligatures -\fi - -\title{Using pandoc-ling} -\author{Michael Cysouw} -\date{} - -\begin{document} -\maketitle - -{ -\setcounter{tocdepth}{3} -\tableofcontents -} -\hypertarget{pandoc-ling}{% -\section{pandoc-ling}\label{pandoc-ling}} - -\emph{Michael Cysouw} -\textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} - -A Pandoc filter for linguistic examples - -tl;dr - -\begin{itemize} -\tightlist -\item - Easily write linguistic examples including basic interlinear glossing. 
-\item - Let numbering and cross-referencing be done for you. -\item - Export to (almost) any format of your wishes for final polishing. -\end{itemize} - -\hypertarget{rationale}{% -\section{Rationale}\label{rationale}} - -In the field of linguistics there is an outspoken tradition to format -example sentences in research papers in a very specific way. In the -field, it is a perennial problem to get such example sentences to look -just right. Within Latex, there are numerous packages to deal with this -problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your -needs, there is some Latex solution for almost everyone. However, these -solutions in Latex are often cumbersome to type, and they are not -portable to other formats. Specifically, transfer between Latex, html, -docx, odt or epub would actually be highly desirable. Such transfer is -the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John -MacFarlane that provides conversion between these (and many more) -formats. - -Any such conversion between text-formats naturally never works -perfectly: every text-format has specific features that are not -transferable to other formats. A central goal of Pandoc (at least in my -interpretation) is to define a set of shared concepts for text-structure -(a `common denominator' if you will, but surely not `least'!) that can -then be mapped to other formats. In many ways, Pandoc tries (again) to -define a set of logical concepts for text structure (`semantic markup'), -which can then be formatted by your favourite typesetter. As long as you -stay inside the realm of this `common denominator' (in practice that -means Pandoc's extended version of Markdown/CommonMark), conversion -works reasonably well (think 90\%-plus). 
- -Building on John Gruber's -\href{https://daringfireball.net/projects/markdown/syntax}{Markdown -philosophy}, there is a strong urge here to learn to restrain oneself -while writing, and try to restrict the number of layout-possibilities to -a minimum. In this sense, with \texttt{pandoc-ling} I propose a -Markdown-structure for linguistic examples that is simple, easy to type, -easy to read, and portable through the Pandoc universe by way of an -extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This -extension will not magically allow you to write every conceivable -linguistic example, but my guess is that in practice the present proposal covers -the majority of situations in linguistic publications (think 90\%-plus). -As an example (and test case) I have included automatic conversions into -various formats in this repository (check them out to get an idea of the -strengths and weaknesses of this approach). - -\hypertarget{the-basic-structure-of-a-linguistic-example}{% -\section{The basic structure of a linguistic -example}\label{the-basic-structure-of-a-linguistic-example}} - -Basically, a linguistic example consists of six possible building blocks, -of which only the number and at least one example line are necessary. -The space between the building blocks is kept as minimal as possible -without becoming cramped. When (optional) building blocks are not -included, the other blocks shift left and up (only exception: a -preamble without labels is not shifted left completely, but left-aligned -with the example, not with the judgement). - -\begin{itemize} -\tightlist -\item -  \textbf{Number}: Running tally of all examples in the work, possibly -  restarting at chapters or other major headings. Typically between -  round brackets, possibly with a chapter number added before in long -  works, e.g.~example (7.26). Aligned top-left, typically left-aligned -  to the main text margin.
-\item -  \textbf{Preamble}: Optional information about the content/kind of -  example. Aligned top-left: to the top with the number, to the left -  with the (optional) label. When there is no label, the preamble is -  aligned with the example, not with the judgment. -\item -  \textbf{Label}: Indices for sub-examples. Only present when there is -  more than one example grouped together inside one numbered entity. -  Typically these sub-example labels use Latin letters followed by a -  full stop. They are left-aligned with the preamble, and each label is -  top-aligned with the top-line of the corresponding example (important -  for longer line-wrapped examples). -\item -  \textbf{Judgment}: Examples can optionally have grammaticality -  judgments, typically symbols like **?!* sometimes in superscript -  relative to the corresponding example. Judgements are right-aligned to -  each other, typically with only minimal space to the left-aligned -  examples. -\item -  \textbf{Line example}: A minimal linguistic example has at least one -  line example, i.e.~an utterance of interest. Building blocks in -  general shift left and up when other (optional) building blocks are -  not present. Minimally, this results in a number with one line -  example. -\item -  \textbf{Interlinear example}: A complex structure typically used for -  examples from languages unknown to most readers. Consists of three or -  four lines that are left-aligned: - -  \begin{itemize} -  \tightlist -  \item -    \textbf{Header}: An optional header is typically used to display -    information about the language of the example, including literature -    references. When it is not present, all other lines of the -    interlinear example shift upwards. -  \item -    \textbf{Source}: The actual language utterance, often typeset in -    italics. This line is internally separated at spaces, and each -    sub-block is left-aligned with the corresponding sub-block of the -    gloss.
-  \item -    \textbf{Gloss}: Explanation of the meaning of the source, often -    using abbreviations in small caps. This line is internally separated -    at spaces, and each block is left-aligned with the corresponding block -    of the source. -  \item -    \textbf{Translation}: Free translation of the source, typically -    quoted. Not separated in blocks, but freely extending to the right. -    Left-aligned with the other lines from the interlinear example. -  \end{itemize} -\end{itemize} - -\begin{figure} -\centering -\includegraphics{figure/ExampleStructure.png} -\caption{The structure of a linguistic example.} -\end{figure} - -There are of course many more possibilities to extend the structure of a -linguistic example, like third or fourth subdivisions of labels (often -using small roman numerals as a third level) or multiple glossing lines -in the interlinear example. Also, the content of the header is sometimes -found right-aligned to the right of the interlinear example (language -information at the top, reference at the bottom). None of these options are -currently supported by \texttt{pandoc-ling}. - -Under the hood, this structure is prepared by \texttt{pandoc-ling} as a -table. Tables are reasonably well transcoded to different document -formats. Specific layout considerations mostly have to be set manually. -Alignment of the text should work in most exports. Some \texttt{CSS} -styling is proposed by \texttt{pandoc-ling}, but can of course be -overruled. - -\hypertarget{introducing-pandoc-ling}{% -\section{\texorpdfstring{Introducing -\texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}} - -\hypertarget{editing-linguistic-examples}{% -\subsection{Editing linguistic -examples}\label{editing-linguistic-examples}} - -To include a linguistic example in Markdown, \texttt{pandoc-ling} uses -the \texttt{div} structure, which is indicated in Pandoc-Markdown by -typing three colons at the start and three colons at the end.
To -indicate the \texttt{class} of this \texttt{div}, the letters `ex' (for -`example') should be added after the top colons (with or without space -in between). This `ex'-class is the signal for \texttt{pandoc-ling} to -start processing such a \texttt{div}. The numbering of these examples -will be inserted by \texttt{pandoc-ling}. - -Empty lines can be added inside the \texttt{div} for visual pleasure, as -they mostly do not have an influence on the output. Exception: do -\emph{not} use empty lines between unlabelled line examples. Multiple -lines of text can be used (without empty lines in between), but they -will simply be interpreted as one sequential paragraph. - -\begin{verbatim} -::: ex -This is the most basic structure of a linguistic example. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.1} -  This is the most basic structure of a linguistic example. - -Alternatively, the \texttt{class} can be put in curly brackets (and -then a leading full stop is necessary before \texttt{ex}). Inside these -brackets more attributes can be added (separated by spaces), for example -an id, using a hash, or any attribute=value pairs that should apply to -this example. Currently there is only one attribute implemented -(\texttt{formatGloss}), but in principle it is possible to add more -attributes that can be used to fine-tune the typesetting of the example. - -\begin{verbatim} -::: {#id .ex formatGloss=false} - -This is a multi-line example. -But that does not mean anything for the result -All these lines are simply treated as one paragraph. -They will become one example with one number. - -::: -\end{verbatim} - -\ex. \label{id} -  This is a multi-line example. But that does not mean anything for the -result All these lines are simply treated as one paragraph. They will -become one example with one number. - -A preamble can be added by inserting an empty line between preamble and -example. The same considerations about multiple text-lines apply.
- -\begin{verbatim} -:::ex -Preamble - -This is an example with a preamble. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.3} Preamble\\ -  This is an example with a preamble. - -Sub-examples with labels are entered by starting each sub-example with a -lowercase Latin letter and a full stop. Empty lines between labels are -allowed. Subsequent lines without labels are treated as one paragraph. -Empty lines \emph{not} followed by a label with a full stop will result -in errors. - -\begin{verbatim} -:::ex -a. This is the first example. -b. This is the second. -a. The actual letters are not important, `pandoc-ling` will put them in order. - -e. Empty lines are allowed between labelled lines -Subsequent lines are again treated as one sequential paragraph. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.4} -  \a. This is the first example. -  \b. This is the second. -  \b. The actual letters are not important, \texttt{pandoc-ling} will -put them in order. -  \b. Empty lines are allowed between labelled lines Subsequent lines -are again treated as one sequential paragraph. - -A labelled list can be combined with a preamble. - -\begin{verbatim} -:::ex -Any nice description here - -a. one example sentence. -b. two -c. three -::: -\end{verbatim} - -\ex. \label{ling-ex:4.5} Any nice description here -  \a. one example sentence. -  \b. two -  \b. three - -Grammaticality judgements should be added before an example, and after -an optional label, separated from both by spaces (though four spaces in -a row should be avoided, as that could lead to layout errors). To indicate -that any sequence of symbols is a judgement, prepend the judgement with -a caret \texttt{\^{}}. Alignment will be figured out by -\texttt{pandoc-ling}. - -\begin{verbatim} -:::ex -Throwing in a preamble for good measure - -a. ^* This traditionally signals ungrammaticality. -b. ^? Question-marks indicate questionable grammaticality. -c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). -d.
However, such long sequences sometimes lead to undesirable effects in the layout. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.6} Throwing in a preamble for good measure -  \a. \jdg{*}This traditionally signals ungrammaticality. -  \b. \jdg{?}Question-marks indicate questionable grammaticality. -  \b. \jdg{\textsuperscript{whynot?}}But in principle any sequence can -be used (here even in superscript). -  \b. However, such long sequences sometimes lead to undesirable effects -in the layout. - -A minor detail is the alignment of a single example with a preamble and -grammaticality judgements. In this case it looks better for the preamble -to be left-aligned with the example and not with the judgement. - -\begin{verbatim} -:::ex -Here is a special case with a preamble - -^^???^ With a single questionable example. -Note the alignment! Especially with this very long example -that should go over various lines in the output. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.7} Here is a special case with a preamble\\ -  \jdg{\textsuperscript{???}}With a single questionable example. Note -the alignment! Especially with this very long example that should go -over various lines in the output. - -\hypertarget{interlinear-examples}{% -\subsection{Interlinear examples}\label{interlinear-examples}} - -For interlinear examples with aligned source and gloss, the structure of -a \texttt{lineblock} is used, starting the lines with a vertical line -\texttt{\textbar{}}. There should always be four vertical lines (for -header, source, gloss and translation, respectively), although the -content after the first vertical line can be empty. The source and gloss -lines are separated at spaces, and all parts are left-aligned.
If you -want to have a space that is not separated, you will have to `protect' -the space, either by putting a backslash before the space, or by -inserting a non-breaking space instead of a normal space (either type -\texttt{\ } or insert an actual non-breaking space, i.e.~Unicode -character \texttt{U+00A0}). - -\begin{verbatim} -:::ex -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.8} -  Dutch (Germanic) -  \gll Deze zin is in het nederlands. \\ -  DEM sentence AUX in DET dutch. \\ -  \glt This sentence is dutch. - -An attempt is made to format interlinear examples when the option -\texttt{formatGloss=true} is added. This will: - -\begin{itemize} -\tightlist -\item -  remove formatting from the source and set everything in italics, -\item -  remove formatting from the gloss and set sequences (\textgreater1) of -  capitals and numbers in small caps (note that the positioning of -  small caps on web pages is -  \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly -  complex}), -\item -  treat a tilde \texttt{\textasciitilde{}} between spaces in the gloss -  as a shortcut for an empty gloss (internally, the sequence -  \texttt{space-tilde-space} is replaced by -  \texttt{space-space-nonBreakingSpace-space-space}), -\item -  consistently put translations in single quotes, possibly removing -  other quotes. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.9} -  Dutch (Germanic) -  \gll \emph{Deze} \emph{zin} \emph{is} \emph{in} \emph{het} -\emph{nederlands.} \\ -  \textsc{dem} sentence \textsc{aux} in \textsc{det} dutch. \\ -  \glt `This sentence is dutch.' - -Such formatting will not always give the desired result, but it has proved -quite robust in my testing.
The next example brings everything together: - -\begin{itemize} -\tightlist -\item -  a preamble, -\item -  labels, both for single lines and for interlinear examples, -\item -  interlinear examples start on a new line immediately after the -  letter-label, -\item -  grammaticality judgements with proper alignment, -\item -  when the header of an interlinear example is left out, everything is -  shifted up, -\item -  the formatting of the interlinear examples is harmonised. -\end{itemize} - -\begin{verbatim} -::: {.ex formatGloss=true} -Completely superfluous preamble, but it works ... - -a. Mixing single line examples with interlinear examples. -a. This is of course highly unusual. -Just for this example, let's add some extra material in this example. - -a. -| Dutch (Germanic) Note the grammaticality judgement! -| ^^:-)^ Deze zin is (dit\ is test) nederlands. -| DEM sentence AUX ~ dutch. -| This sentence is dutch. - -b. -| -| Deze tweede zin heeft geen header. -| DEM second sentence have.3SG.PRES no header. -| This second sentence does not have a header. -::: -\end{verbatim} - -\ex. \label{ling-ex:4.10} Completely superfluous preamble, but it works -\ldots{} -  \a. Mixing single line examples with interlinear examples. -  \b. This is of course highly unusual. Just for this example, let's add -some extra material in this example. -  \b. Dutch (Germanic) Note the grammaticality judgement! -  \gll \jdg{\textsuperscript{:-)}}\emph{Deze} \emph{zin} \emph{is} -\emph{(dit~is~test)} \emph{nederlands.} \\ -  \textsc{dem} sentence \textsc{aux} ~ dutch. \\ -  \glt `This sentence is dutch.' -  \b. -  \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} -\emph{header.} \\ -  \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no -header. \\ -  \glt `This second sentence does not have a header.' - -\hypertarget{cross-referencing-examples}{% -\subsection{Cross-referencing -examples}\label{cross-referencing-examples}} - -The examples are automatically numbered by \texttt{pandoc-ling}.
-Cross-references to examples can be made by using the \texttt{{[}@ID{]}} -format (used by Pandoc for citations). When an example has an explicit -identifier (like \texttt{\#test} in the next example), then a reference -can be made to this example with \texttt{{[}@test{]}}, leading to -(\ref{test}) when formatted. - -\begin{verbatim} -::: {#test .ex} -This is a test -::: -\end{verbatim} - -\ex. \label{test} - This is a test - -Inspired by the \texttt{linguex}-approach, you can also use the keywords -\texttt{Next} or \texttt{Last} to refer to the next or the last example, -e.g.~\texttt{{[}@Last{]}} will be formatted as (\ref{test}). By doubling -the capitals to \texttt{NNext} or \texttt{LLast} reference to the -next/last-but-one can be made. Actually, the number of starting capitals -can be repeated at will in \texttt{pandoc-ling}, so something like -\texttt{{[}@LLLLLLLLast{]}} will also work. It will be formatted as -(\ref{ling-ex:4.4}) after the processing of \texttt{pandoc-ling}. -Needless to say that in such a situation an explicit identifier would be -a better choice. - -Referring to sub-examples can be done by manually adding a suffix into -the cross reference, simply separated from the identifier by a space. -For example, \texttt{{[}@LLast~c{]}} will refer to the third sub-example -of the last-but-one example. Formatted this will look like this: -(\ref{ling-ex:4.10}\,c), smile! However, note that the ``c'' has to be -manually determined. It is simply a literal suffix that will be copied -into the cross-reference. Something like \texttt{{[}@LLast\ Ha1l0{]}} -will work also, leading to (\ref{ling-ex:4.10}\,Ha1l0) when formatted -(which is of course nonsensical). 
- -\hypertarget{options-of-pandoc-ling}{% -\subsection{\texorpdfstring{Options of -\texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}} - -\hypertarget{global-options}{% -\subsubsection{Global options}\label{global-options}} - -The following global options are available with \texttt{pandoc-ling}. -These can be added to the -\href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. -An example of such metadata can be found at the bottom of this -\texttt{readme} in the form of a YAML-block. Pandoc allows for various -methods to provide metadata (see the link above). - -\begin{itemize} -\tightlist -\item - \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): - should all interlinear examples be consistently formatted? If you use - this option, you can simply use capital letters for abbreviations in - the gloss, and they will be changed to small caps. The source line is - set to italics, and the translations is put into single quotes. -\item - \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space): - When cross references have a suffix, how should the separator be - formatted? The defaults `no-break-space' is a safe options, but I - personally like a `thin space' better (Unicode \texttt{U+2009}), but - symbol does not work with many fonts, and might lead to errors. For - Latex typesetting, all space-like symbols are converted to a Latex - thin space \texttt{\textbackslash{},}. -\item - \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): - should the counting restart for each chapter? Actually, when - \texttt{true} this setting will restart the counting at the highest - heading level, which for various output formats can be set by the - Pandoc option \texttt{top-level-division}. Depending on your Latex - setup, an explicit entry \texttt{top-level-division:\ chapter} might - be necessary in your metadata. 
-\item - \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): - should the chapter (= highest heading level) number be added to the - number of the example? In most formats this automatically implies - \texttt{restartAtChapter:\ true}. In most Latex situations this only - works in combination with a \texttt{documentclass:\ book}. -\item - \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, - \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default - \texttt{linguex}): Various options for converting examples to Latex - packages that typeset linguistic examples. None of the conversions - works perfectly, though in should work in most normal situations - (think 90\%-plus). It might be necessary to first convert to - \texttt{Latex} and then typeset. Using the direct option insider - Pandoc might also work in many situations. -\end{itemize} - -\hypertarget{local-options}{% -\subsubsection{Local options}\label{local-options}} - -Local options are options that can be set for each individual example. -The \texttt{formatGloss} option can be used to have an individual -example be formatted differently from the global setting. For example, -when the global setting is \texttt{formatGloss:\ true} in the metadata, -then adding \texttt{formatGloss=false} in the curly brackets of a -specific example will block the formatting. This is especially useful -when the automatic formatting does not give the desired result. - -If you want to add something else (not a linguistic example) in a -numbered example, then there is the local option \texttt{noFormat=true}. -An attempt will be made to try and do a reasonable layout. Multiple -paragraphs will simply we taken as is, and the number will be put in -front. In HTML the number will be centred. It is usable for an -incidental mathematical formula. - -\begin{verbatim} -::: {.ex noFormat=true} -$$\sum_{x=1}^{n}{x}=\frac{x^2-x}{2}$$ -::: -\end{verbatim} - -\ex. 
\label{ling-ex:4.12} - \[\sum_{x=1}^{n}{x}=\frac{x^2-x}{2}\]\\ - - -\hypertarget{issues-with-pandoc-ling}{% -\subsection{\texorpdfstring{Issues with -\texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}} - -\begin{itemize} -\tightlist -\item - Manually provided identifiers for examples should not be purely - numerical (so do not use e.g.~\texttt{\#5789}). In some situation this - interferes with the setting of the cross-references. -\item - Because the cross-references use the same structure as citations in - Pandoc, the processing of citations (by \texttt{citeproc}) should be - performed \textbf{after} the processing by \texttt{pandoc-ling}. - Further, - \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, - another Pandoc extension for numbering figures and other captions, - also uses the same system. From experience, it seems safer to put - \texttt{pandoc-crossref} \textbf{before} \texttt{pandoc-ling} in the - order of processing (though I have no idea why). -\item - Interlinear examples will will not wrap at the end of the page. There - is no solution yet for longer examples that are longer than the size - of the page. -\item - When exporting to \texttt{docx} there is a problem because there are - paragraphs inserted after tables, which adds space in lists with - multiple interlinear examples. This is - \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by - design}. The official solution is to set font-size to 1 for this - paragraph inside MS Word. -\item - Multi-column cells are crucial for \texttt{pandoc-ling} to work - properly. These are only introduced in new table format with Pandoc - 2.10 (so older Pandoc version are not supported). Also note that these - structures are not yet exported to all formats, e.g.~it will not be - displayed correctly in \texttt{docx}. 
However, this is currently an - area of active development -\item - \texttt{langsci-gb4e} is only available as part of the - \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. - You have to make it available to Pandoc, e.g.~by adding it into the - same directory as the pandoc-ling.lua filter. I have added a recent - version of \texttt{langsci-gb4e} here for convenience, but this one - might be outdated at some time in the future. -\end{itemize} - -\hypertarget{a-note-on-latex-conversion}{% -\subsection{A note on Latex -conversion}\label{a-note-on-latex-conversion}} - -Originally, I decided to write this filter as a two-pronged conversion, -making a markdown version myself, but using a mapping to one of the many -latex libraries for linguistics examples as a quick fix. I assumed that -such a mapping would be the easy part. However, it turned out that the -mapping to latex was much more difficult that I anticipated. Basically, -it turned out that the `common denominator' that I was aiming for was -not necessarily the `common denominator' provided by the latex packages. -I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and -expex) with growing dismay. This approach resulted in a first version. -However, after this version was (more or less) finished, I realised that -it would be better to first define the `common denominator' more clearly -(as done here), and then implement this purely in Pandoc. From that -basis I have then made attempts to map them to the various latex -packages. - -\hypertarget{a-note-on-implementation}{% -\subsection{A note on implementation}\label{a-note-on-implementation}} - -The basic structure of the examples are transformed into Pandoc tables. -Tables are reasonably safe for converting in other formats. Care has -been taken to add \texttt{classes} to all elements of the tables -(e.g.~the preamble has the class \texttt{linguistic-example-preamble}). 
-When exported formats are aware of these classes, they can be used to -fine-tune the formatting. I have used a few such fine-tunings into the -html output of this filter by adding a few CSS-style statements. The -naming of the classes is quite transparent, using the form -\texttt{linguistic-example-...}. - -\end{document} diff --git a/readme conversions/test.sh b/readme conversions/test.sh deleted file mode 100755 index 3574b1e..0000000 --- a/readme conversions/test.sh +++ /dev/null @@ -1,41 +0,0 @@ -#!/usr/bin/env bash - -# produces the readme in various formats -# the filter processVerbatim.lua add the verbatim examples as real markdown - -# assumes Pandoc and a full Latex install -# langsci-gb4e.sty is made available here - -# note that there are various errors in the output -# they show current limitations - -# basic formats - -for format in html docx epub -do - pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ - pandoc -t $format -o readme.$format -L ../pandoc-ling.lua -s -N --toc --mathml -done - -# various latex variants, both tex and pdf - -for package in linguex gb4e langsci-gb4e -do - pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ - pandoc -t latex -o readme_$package.tex -L ../pandoc-ling.lua -s -N --toc \ - --metadata latexPackage="$package" - - pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ - pandoc -o readme_$package.pdf -L ../pandoc-ling.lua -N --toc \ - --metadata latexPackage="$package" --pdf-engine=xelatex -done - -# special settings for expex, errors with xelatex and chapternumbers - -pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ -pandoc -t latex -o readme_expex.tex -L ../pandoc-ling.lua -s -N --toc \ ---metadata latexPackage="expex" --metadata addChapterNumber="false" - -pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ -pandoc -o readme_expex.pdf -L ../pandoc-ling.lua -N --toc \ ---metadata latexPackage="expex" --pdf-engine=pdflatex --metadata addChapterNumber="false" diff 
--git a/readme.md b/readme.md deleted file mode 100644 index 062dced..0000000 --- a/readme.md +++ /dev/null @@ -1,255 +0,0 @@ -# pandoc-ling - -*Michael Cysouw* <> - -A Pandoc filter for linguistic examples - -tl;dr - -- Easily write linguistic examples including basic interlinear glossing. -- Let numbering and cross-referencing be done for you. -- Export to (almost) any format of your wishes for final polishing. -- As an example, check out this readme in [HTML](https://gitcdn.link/repo/cysouw/pandoc-ling/main/readme%20conversions/readme.html) or [Latex](https://gitcdn.link/repo/cysouw/pandoc-ling/main/readme%20conversions/readme_linguex.pdf). - -# Rationale - -In the field of linguistics there is an outspoken tradition to format example sentences in research papers in a very specific way. In the field, it is a perennial problem to get such example sentences to look just right. Within Latex, there are numerous packages to deal with this problem (e.g. covington, linguex, gb4e, expex, etc.). Depending on your needs, there is some Latex solution for almost everyone. However, these solutions in Latex are often cumbersome to type, and they are not portable to other formats. Specifically, transfer between Latex, html, docx, odt or epub would actually be highly desirable. Such transfer is the hallmark of [Pandoc](https://pandoc.org), a tool by John MacFarlane that provides conversion between these (and many more) formats. - -Any such conversion between text-formats naturally never works perfectly: every text-format has specific features that are not transferable to other formats. A central goal of Pandoc (at least in my interpretation) is to define a set of shared concepts for text-structure (a 'common denominator' if you will, but surely not 'least'!) that can then be mapped to other formats. In many ways, Pandoc tries (again) to define a set of logical concepts for text structure ('semantic markup'), which can then be formatted by your favourite typesetter. 
As long as you stay inside the realm of this 'common denominator' (in practice that means Pandoc's extended version of Markdown/CommonMark), conversion works reasonably well (think 90%-plus). - -Building on John Gruber's [Markdown philosophy](https://daringfireball.net/projects/markdown/syntax), there is a strong urge here to learn to restrain oneself while writing, and try to restrict the number of layout-possibilities to a minimum. In this sense, with `pandoc-ling` I propose a Markdown-structure for linguistic examples that is simple, easy to type, easy to read, and portable through the Pandoc universe by way of an extension mechanism of Pandoc, called a 'Pandoc Lua Filter'. This extension will not magically allow you to write every linguistic example thinkable, but my guess is that in practice the present proposal covers the majority of situations in linguistic publications (think 90%-plus). As an example (and test case) I have included automatic conversions into various formats in this repository (chech them out to get an idea of the strengths and weaknesses of this approach). - -# The basic structure of a linguistic example - -Basically, a linguistic examples consists of 6 possible building blocks, of which only the number and at least one example line are necessary. The space between the building blocks is kept as minimal as possible without becoming cramped. When (optional) building blocks are not included, then the other blocks shift left and up (only exception: a preamble without labels is not shifted left completely, but left-aligned with the example, not with the judgement). - -- **Number**: Running tally of all examples in the work, possibly restarting at chapters or other major headings. Typically between round brackets, possibly with a chapter number added before in long works, e.g. example (7.26). Aligned top-left, typically left-aligned to main text margin. -- **Preamble**: Optional information about the content/kind of example. 
Aligned top-left: to the top with the number, to the left with the (optional) label. When there is no label, then preamble is aligned with the example, not with the judgment. -- **Label**: Indices for sub-examples. Only present when there are more than one example grouped together inside one numbered entity. Typically these sub-example labels use latin letters followed by a full stop. They are left-aligned with the preamble, and each label is top-aligned with the top-line of the corresponding example (important for longer line-wrapped examples). -- **Judgment**: Examples can optionally have grammaticality judgments, typically symbols like **?!* sometimes in superscript relative to the corresponding example. judgements are right-aligned to each other, typically with only minimal space to the left-aligned examples. -- **Line example**: A minimal linguistic example has at least one line example, i.e. an utterance of interest. Building blocks in general shift left and up when other (optional) building blocks are not present. Minimally, this results in a number with one line example. -- **Interlinear example**: A complex structure typically used for examples from languages unknown to most readers. Consist of three or four lines that are left-aligned: - * **Header**: An optional header is typically used to display information about the language of the example, including literature references. When not present, then all other lines from the interlinear example shift upwards. - * **Source**: The actual language utterance, often typeset in italics. This line is internally separated at spaces, and each sub-block is left-aligned with the corresponding sub-blocks of the gloss. - * **Gloss**: Explanation of the meaning of the source, often using abbreviations in small caps. This line is internally separated at spaces, and each block is left-aligned with the block from source. - * **Translation**: Free translation of the source, typically quoted. 
Not separated in blocks, but freely extending to the right. Left-aligned with the other lines from the interlinear example. - -![The structure of a linguistic example.](ExampleStructure.png) - -There are of course much more possibilities to extend the structure of a linguistic examples, like third or fourth subdivisions of labels (often using small roman numerals as a third level) or multiple glossing lines in the interlinear example. Also, the content of the header is sometimes found right-aligned to the right of the interlinear example (language into to the top, reference to the bottom). All such options are currently not supported by `pandoc-ling`. - -Under the hood, this structure is prepared by `pandoc-ling` as a table. Tables are reasonably well transcoded to different document formats. Specific layout considerations mostly have to be set manually. Alignment of the text should work in most exports. Some `CSS` styling is proposed by `pandoc-ling`, but can of course be overruled. For latex (and beamer) special output is prepared using various available latex packages (see options, below). - -# Introducing `pandoc-ling` - -## Editing linguistic examples - -To include a linguistic example in Markdown `pandoc-ling` uses the `div` structure, which is indicated in Pandoc-Markdown by typing three colons at the start and three colons at the end. To indicate the `class` of this `div` the letters 'ex' (for 'example') should be added after the top colons (with or without space in between). This 'ex'-class is the signal for `pandoc-ling` to start processing such a `div`. The numbering of these examples will be inserted by `pandoc-ling`. - -Empty lines can be added inside the `div` for visual pleasure, as they mostly do not have an influence on the output. Exception: do *not* use empty lines between unlabelled line examples. Multiple lines of text can be used (without empty lines in between), but they will simply be interpreted as one sequential paragraph. 
- -``` -::: ex -This is the most basic structure of a linguistic example. -::: -``` - -Alternatively, the `class` can be put in curled brackets (and then a leading full stop is necessary before `ex`). Inside these brackets more attributes can be added (separated by space), for example an id, using a hash, or any attribute=value pairs that should apply to this example. Currently there is only one attribute implemented (`formatGloss`), but in principle it is possible to add more attributes that can be used to fine-tune the typesetting of the example. - -``` -::: {#id .ex formatGloss=false} - -This is a multi-line example. -But that does not mean anything for the result -All these lines are simply treated as one paragraph. -They will become one example with one number. - -::: -``` - -A preamble can be added by inserting an empty line between preamble and example. The same considerations about multiple text-lines apply. - -``` -:::ex -Preamble - -This is an example with a preamble. -::: -``` - -Sub-examples with labels are entered by starting each sub-example with a small latin letter and a full stop. Empty lines between labels are allowed. Subsequent lines without labels are treated as one paragraph. Empty lines *not* followed by a label with a full stop will result in errors. - -``` -:::ex -a. This is the first example. -b. This is the second. -a. The actual letters are not important, `pandoc-ling` will put them in order. - -e. Empty lines are allowed between labelled lines -Subsequent lines are again treated as one sequential paragraph. -::: -``` - -A labelled list can be combined with a preamble. - -``` -:::ex -Any nice description here - -a. one example sentence. -b. two -c. three -::: -``` - -Grammaticality judgements should be added before an example, and after an optional label, separated from both by spaces (though four spaces in a row should be avoided, that could lead to layout errors). 
To indicate that any sequence of symbols is a judgements, prepend the judgement with a caret `^`. Alignment will be figured out by `pandoc-ling`. - -``` -:::ex -Throwing in a preamble for good measure - -a. ^* This traditionally signals ungrammaticality. -b. ^? Question-marks indicate questionable grammaticality. -c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). -d. However, such long sequences sometimes lead to undesirable effects in the layout. -::: -``` - -A minor detail is the alignment of a single example with a preamble and grammaticality judgements. In this case it looks better for the preamble to be left aligned with the example and not with the judgement. - -``` -:::ex -Here is a special case with a preamble - -^^???^ With a singly questionably example. -Note the alignment! Especially with this very long example -that should go over various lines in the output. -::: -``` - -## Interlinear examples - -For interlinear examples with aligned source and gloss, the structure of a `lineblock` is used, starting the lines with a vertical line `|`. There should always be four vertical lines (for header, source, gloss and translation, respectively), although the content after the first vertical line can be empty. The source and gloss lines are separated at spaces, and all parts are right-aligned. If you want to have a space that is not separated, you will have to 'protect' the space, either by putting a backslash before the space, or by inserting a non-breaking space instead of a normal space (either type ` ` or insert an actual non-breaking space, i.e. unicode character `U+00A0`). - -``` -:::ex -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -``` - -An attempt is made to format interlinear examples when the option `formatGloss=true` is added. 
This will: - -- remove formatting from the source and set everything in italics, -- remove formatting from the gloss and set sequences (>1) of capitals and numbers into small caps (note that the positioning of small caps on web pages is [highly complex](https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align)), -- a tilde `~` between spaces in the gloss is treated as a shortcut for an empty gloss (internally, the sequence `space-tilde-space` is replaced by `space-space-nonBreakingSpace-space-space`), -- consistently put translations in single quotes, possibly removing other quotes. - -``` -::: {.ex formatGloss=true} -| Dutch (Germanic) -| Deze zin is in het nederlands. -| DEM sentence AUX in DET dutch. -| This sentence is dutch. -::: -``` - -The results of such formatting will not always work, but it seems to be quite robust in my testing. The next example brings everything together: - -- a preamble, -- labels, both for single lines and for interlinear examples, -- interlinear examples start on a new line immediately after the letter-label, -- grammaticality judgements with proper alignment, -- when the header of an interlinear example is left out, everything is shifted up, -- The formatting of the interlinear is harmonised. - -``` -::: {.ex formatGloss=true} -Completely superfluous preamble, but it works ... - -a. Mixing single line examples with interlinear examples. -a. This is of course highly unusal. -Just for this example, let's add some extra material in this example. - -a. -| Dutch (Germanic) Note the grammaticality judgement! -| ^^:-)^ Deze zin is (dit\ is test) nederlands. -| DEM sentence AUX ~ dutch. -| This sentence is dutch. - -b. -| -| Deze tweede zin heeft geen header. -| DEM second sentence have.3SG.PRES no header. -| This second sentence does not have a header. -::: -``` - -## Cross-referencing examples - -The examples are automatically numbered by `pandoc-ling`. 
Cross-references to examples can be made by using the `[@ID]` format (used by Pandoc for citations). When an example has an explicit identifier (like `#test` in the next example), then a reference can be made to this example with `[@test]`, leading to [@test] when formatted. - -``` -::: {#test .ex} -This is a test -::: -``` - -Inspired by the `linguex`-approach, you can also use the keywords `Next` or `Last` to refer to the next or the last example, e.g. `[@Last]` will be formatted as [@Last]. By doubling the capitals to `NNext` or `LLast` reference to the next/last-but-one can be made. Actually, the number of starting capitals can be repeated at will in `pandoc-ling`, so something like `[@LLLLLLLLast]` will also work. It will be formatted as [@LLLLLLLLast] after the processing of `pandoc-ling`. Needless to say that in such a situation an explicit identifier would be a better choice. - -Referring to sub-examples can be done by manually adding a suffix into the cross reference, simply separated from the identifier by a space. For example, `[@LLast c]` will refer to the third sub-example of the last-but-one example. Formatted this will look like this: [@LLast c], smile! However, note that the "c" has to be manually determined. It is simply a literal suffix that will be copied into the cross-reference. Something like `[@LLast Ha1l0]` will work also, leading to [@LLast Ha1l0] when formatted (which is of course nonsensical). - -## Options of `pandoc-ling` - -### Global options - -The following global options are available with `pandoc-ling`. These can be added to the [Pandoc metadata](https://pandoc.org/MANUAL.html#metadata-blocks). An example of such metadata can be found at the bottom of this `readme` in the form of a YAML-block. Pandoc allows for various methods to provide metadata (see the link above). - -- **`formatGloss`** (boolean, default `false`): should all interlinear examples be consistently formatted? 
If you use this option, you can simply use capital letters for abbreviations in the gloss, and they will be changed to small caps. The source line is set to italics, and the translations is put into single quotes. -- **`xrefSuffixSep`** (string, defaults to no-break-space): When cross references have a suffix, how should the separator be formatted? The defaults 'no-break-space' is a safe options, but I personally like a 'thin space' better (Unicode `U+2009`), but symbol does not work with many fonts, and might lead to errors. For Latex typesetting, all space-like symbols are converted to a Latex thin space `\,`. -- **`restartAtChapter`** (boolean, default `false`): should the counting restart for each chapter? Actually, when `true` this setting will restart the counting at the highest heading level, which for various output formats can be set by the Pandoc option `top-level-division`. Depending on your Latex setup, an explicit entry `top-level-division: chapter` might be necessary in your metadata. -- **`addChapterNumber`** (boolean, default `false`): should the chapter (= highest heading level) number be added to the number of the example? In most formats this automatically implies `restartAtChapter: true`. In most Latex situations this only works in combination with a `documentclass: book`. -- **`latexPackage`** (one of: `linguex`, `gb4e`, `langsci-gb4e`, `expex`, default `linguex`): Various options for converting examples to Latex packages that typeset linguistic examples. None of the conversions works perfectly, though in should work in most normal situations (think 90%-plus). It might be necessary to first convert to `Latex`, correct the output, and then typeset separately with a latex compiler like `xelatex`. Using the direct option insider Pandoc might also work in many situations. Export to `beamer` seems to work reasonably well with the `gb4e` package. All others have artefacts or errors. 
- -### Local options - -Local options are options that can be set for each individual example. The `formatGloss` option can be used to have an individual example be formatted differently from the global setting. For example, when the global setting is `formatGloss: true` in the metadata, then adding `formatGloss=false` in the curly brackets of a specific example will block the formatting. This is especially useful when the automatic formatting does not give the desired result. - -If you want to add something else (not a linguistic example) in a numbered example, then there is the local option `noFormat=true`. An attempt will be made to try and do a reasonable layout. Multiple paragraphs will simply we taken as is, and the number will be put in front. In HTML the number will be centred. It is usable for an incidental mathematical formula. - -``` -::: {.ex noFormat=true} -$$\sum_{x=1}^{n}{x}=\frac{x^2-x}{2}$$ -::: -``` - -## Issues with `pandoc-ling` - -- Manually provided identifiers for examples should not be purely numerical (so do not use e.g. `#5789`). In some situation this interferes with the setting of the cross-references. -- Because the cross-references use the same structure as citations in Pandoc, the processing of citations (by `citeproc`) should be performed **after** the processing by `pandoc-ling`. Further, [`pandoc-crossref`](https://github.com/lierdakil/pandoc-crossref), another Pandoc extension for numbering figures and other captions, also uses the same system. From experience, it seems safer to put `pandoc-crossref` **before** `pandoc-ling` in the order of processing (though I have no idea why). -- Interlinear examples will will not wrap at the end of the page. There is no solution yet for longer examples that are longer than the size of the page. -- When exporting to `docx` there is a problem because there are paragraphs inserted after tables, which adds space in lists with multiple interlinear examples. 
This is [by design](https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262). The official solution is to set font-size to 1 for this paragraph inside MS Word. -- Multi-column cells are crucial for `pandoc-ling` to work properly. These are only introduced in new table format with Pandoc 2.10 (so older Pandoc version are not supported). Also note that these structures are not yet exported to all formats, e.g. it will not be displayed correctly in `docx`. However, this is currently an area of active development -- `langsci-gb4e` is only available as part of the [`langsci` package](https://ctan.org/pkg/langsci?lang=en). You have to make it available to Pandoc, e.g. by adding it into the same directory as the pandoc-ling.lua filter. I have added a recent version of `langsci-gb4e` here for convenience, but this one might be outdated at some time in the future. -- `beamer` output seems to work best with `latexPackage: gb4e`. - -## A note on Latex conversion - -Originally, I decided to write this filter as a two-pronged conversion, making a markdown version myself, but using a mapping to one of the many latex libraries for linguistics examples as a quick fix. I assumed that such a mapping would be the easy part. However, it turned out that the mapping to latex was much more difficult that I anticipated. Basically, it turned out that the 'common denominator' that I was aiming for was not necessarily the 'common denominator' provided by the latex packages. I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and expex) with growing dismay. This approach resulted in a first version. However, after this version was (more or less) finished, I realised that it would be better to first define the 'common denominator' more clearly (as done here), and then implement this purely in Pandoc. 
From that basis I have then made attempts to map them to the various latex packages. - -## A note on implementation - -The basic structure of the examples are transformed into Pandoc tables. Tables are reasonably safe for converting in other formats. Care has been taken to add `classes` to all elements of the tables (e.g. the preamble has the class `linguistic-example-preamble`). When exported formats are aware of these classes, they can be used to fine-tune the formatting. I have used a few such fine-tunings into the html output of this filter by adding a few CSS-style statements. The naming of the classes is quite transparent, using the form `linguistic-example-...`. - ---- -author: Michael Cysouw -title: Using pandoc-ling - -formatGloss: false -xrefSuffixSep: " " -restartAtChapter: false -addChapterNumber: true -latexPackage: linguex -...
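The readme gives two pieces of advice on processing order: run `pandoc-crossref` before `pandoc-ling`, and run citation processing after it. Both can be combined with the `--metadata` style used in `test.sh` in a single command. The following is a minimal sketch and is not part of the repository. It assumes:

- Pandoc 2.11 or later, where `--citeproc` is applied in command-line order together with other filters;
- `pandoc-crossref` is on the `PATH`;
- `pandoc-ling.lua` is in the working directory.

The function name `run_pandoc_ling` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pandoc-ling itself.
# Assumes: pandoc >= 2.11, pandoc-crossref on PATH, pandoc-ling.lua in the
# current directory. Input/output names are whatever the caller passes in.

# Filters and --citeproc are applied in the order they appear below.
run_pandoc_ling() {
  local input=$1 output=$2
  local args=(
    "$input"
    --filter pandoc-crossref        # before pandoc-ling, as the readme advises
    --lua-filter pandoc-ling.lua    # numbers and formats the .ex divs
    --citeproc                      # after pandoc-ling, as the readme advises
    --metadata formatGloss=true     # global pandoc-ling options, passed as in test.sh
    --metadata latexPackage=gb4e
    --standalone --output "$output"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command one argument per line instead of running it.
    printf '%s\n' pandoc "${args[@]}"
  else
    pandoc "${args[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_pandoc_ling readme.md readme.html` prints the assembled command without running it. Omit `DRY_RUN` to actually call Pandoc. Collecting the arguments in a bash array keeps file names that contain spaces intact.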