Mentor Talk:
WL Document Processing
for Wolfram Summer Program 2024/Jack Heseltine (Mentor)
Background
Me
I am a Consultant Software Engineer for Wolfram Research (Cloud Project) and work from Austria, where I am also completing a Master's in AI at the Machine Learning Institute of Johannes Kepler University in Linz.
GitHub: https://github.com/heseltime
Website: https://heseltime.github.io
LinkedIn: https://www.linkedin.com/in/heselt-in-e/
Happy to stay in touch!
Documents
I first started working with documents in a software development context while building an Enterprise Content (read: Documents) Management (ECM) system in a team for the Red Cross, during COVID. One of the remarkable things about good ECM is how knowledge becomes accessible and processes/workflows are enabled, making for more productive (non-profit, in my case) organizations. Often, it comes down to appropriate, readable documents.
My AI Master's thesis project is also about documents, specifically how to use LLM tooling to make PDF documents accessible for people using screen readers, in a fully automated fashion.
In this talk, we will look at Mathematica Notebooks as a type of document.
Of interest is document transformation, i.e. turning a document in a source format into a target format with the same content.
Concepts (& Code)
To understand this document processing topic in Wolfram Language (WL), we need just a bit of conceptual background that can be looked up as needed.
The focus will be engineering, though: if the following is not so interesting to you, there is still time to switch to another talk [if applicable].
Propositional Logic
Theorema leans heavily on this category of logic in how it expresses itself.
LaTeX
Used as the intermediate language from which the PDF document is compiled.
Code
Other than this, the focus is WL/Mathematica documents and engineering a project/pipeline in this context, with code samples that might help you with what you want to do in your own project.
Unless indicated otherwise, code will be available at this GitHub repo as well: https://github.com/heseltime/Tma2TeX
(Feel free to hold me to it if something is missing!)
The Project: Tma2TeX (Theorema)
Theorema: Automated Theorem Prover
“A System for Automated Reasoning (Theorem Proving) and Automated Theory Exploration based on Mathematica”
Institutional Context: Johannes Kepler University in Linz (Hagenberg), Research Institute Symbolic Computation
What this Project Comes Down to: Theorema Notebooks are Mathematica Notebooks are Wolfram Language Expressions
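To make the "notebooks are expressions" point concrete, here is a minimal sketch. The notebook below is a hand-written, hypothetical stand-in (it is not the FirstTour notebook), but it has the same shape as any notebook on disk: a Notebook head wrapping a list of Cell expressions, which ordinary pattern matching can take apart.

```wolfram
(* A hypothetical three-cell notebook, written out as a plain expression *)
nb = Notebook[{
    Cell["A First Tour", "Title"],
    Cell["Some explanatory text.", "Text"],
    Cell[BoxData[RowBox[{"a", "\[And]", "b"}]], "Input"]
  }];

Head[nb]
(* Notebook *)

(* Collect the style of every cell, at any depth *)
Cases[nb, Cell[_, style_String, ___] :> style, Infinity]
(* {"Title", "Text", "Input"} *)

(* Pull out only the plain-text content of "Text" cells *)
Cases[nb, Cell[text_String, "Text", ___] :> text, Infinity]
(* {"Some explanatory text."} *)
```

Everything a transformation needs (cell style, content, box structure) is reachable this way, without going through the front end.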
Project Motivation: While Theorema and Mathematica are fine as a programming environment, the institute needs LaTeX and PDF, mainly for publication purposes.
Project Goal: A fairly automated system that extends Theorema with transformation functionality, or a prototype thereof.
Project Overview Link
BTW: For anyone interested in study abroad in Austria ...
SideNote and SideCode cells will not show during presentation.
Theorema & Tma2TeX Demo
Note on the IDE used: Eclipse with Wolfram Workbench
(1) Open project in Eclipse, open FirstTour Theorema NB in same kernel session: show commander.
(2) Run the tma2tex.nb code, explaining that it uses the package code developed in tma2tex.wl. Show how the internal representation is loaded by executing FirstTour, then running relevant parts in tma2tex.nb again.
(3) Do a transformation, show PDF output.
The up-to-date repo is: https://github.com/heseltime/Tma2TeX/tree/master/tma2tex
As of June 13th 2024, the structure (with the above files) is:
The Main Approach: Recursive Descent
We are now talking about the WL code in the tma2tex.wl package:
There are two recursions, parseNbContent[] and parseTmaData[]. The first descends through the notebook in general, the second through the Theorema expressions specifically. The Theorema expressions are tagged and indexed, and a helper function, getTmaData[], establishes the connection to the Theorema-internal representation via an ID.
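As a rough illustration of the recursive-descent idea, here is a minimal sketch with two mutually supporting recursions: one over cells, one over the expression inside a cell. All names (cellToTeX, boxToTeX) and the sample notebook are hypothetical and are not the functions in tma2tex.wl. In particular, the second recursion here walks box expressions as a stand-in, whereas the project goes through the Theorema-internal representation, and there is no LaTeX escaping.

```wolfram
ClearAll[cellToTeX, boxToTeX];

(* First recursion: descend through the notebook and its cell groups *)
cellToTeX[Notebook[cells_List, ___]] := StringRiffle[cellToTeX /@ cells, "\n"];
cellToTeX[Cell[CellGroupData[cells_List, ___], ___]] :=
  StringRiffle[cellToTeX /@ cells, "\n"];
cellToTeX[Cell[text_String, "Title", ___]] := "\\section*{" <> text <> "}";
cellToTeX[Cell[text_String, "Text", ___]] := text;
(* Formula cells hand over to the second recursion *)
cellToTeX[Cell[BoxData[boxes_], ___]] := "\\[ " <> boxToTeX[boxes] <> " \\]";
cellToTeX[_] := "";  (* anything unrecognised is skipped in this sketch *)

(* Second recursion: descend through the expression inside a cell *)
boxToTeX[RowBox[items_List]] := StringRiffle[boxToTeX /@ items, " "];
boxToTeX["\[And]"] := "\\land";
boxToTeX["\[Or]"] := "\\lor";
boxToTeX[s_String] := s;
boxToTeX[_] := "";

(* Hypothetical input: a small notebook expression *)
nb = Notebook[{
    Cell["A First Tour", "Title"],
    Cell["Some explanatory text.", "Text"],
    Cell[BoxData[RowBox[{"a", "\[And]", "b"}]], "Input"]
  }];

tex = cellToTeX[nb];
Print[tex]
```

The design choice worth noting is that each rule handles exactly one shape of expression and delegates its children, so supporting a new cell style or a new logical symbol means adding one definition rather than editing a central loop.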
Should we look at some Code?
parseNbContent[]
getTmaData[]
parseTmaData[]
More Code: Main Client Functions
Answer questions here; show in the demo (Eclipse with Wolfram Workbench) as needed.
Thanks!
SW: WL as Computational Language, something to knit documents together with perhaps?
Bruno Buchberger (Research Institute Symbolic Computation): on Rewriting (in a math context, originally) -