Software

The corpus analysis software under development for this project is available for anyone to use. This software connects a modnlp-based concordance browser to the most recent update of the SHE Corpus.

To download and install a copy of the tool, first uninstall any version already installed on your computer, and then click one of the following links:

Download installation package (Mac version) 

Download installation package (Windows version)

Windows users: To complete the installation process, unzip the downloaded folder and run the installer by double-clicking the .exe file.

Once the tool has been installed, you should be able to launch it in subsequent sessions by selecting modNLP from the list of programmes installed on your machine.

Mac users: To install and run modNLP on a Mac, you may first need to modify your security permissions. Please note that this may not be possible without admin permissions on your computer.

1) Download the installer by following the link above. Try to install modNLP by double-clicking on the modnlp-1.0.dmg file. If this works, skip to step 6.

2) If you get an error message instead, open a Terminal window.

– To do this, press cmd+space to open Spotlight and type Terminal.

3) Copy or type out the following command and hit enter. You may be required to type in your password:

sudo spctl --master-disable

4) Next, copy or type out the following command. DO NOT HIT ENTER YET. Drag the modNLP installer icon into the terminal window (alternatively type out the path to the installer at the end of the command)

sudo chmod -R 755

For example:

sudo chmod -R 755 /Users/me/Downloads/modNLP-1.0.dmg

5) The installer should now run. Drag the modNLP app into the Applications folder.

6) Open the Applications folder and click on the installed modNLP app. A box may pop up saying the software is damaged. Don’t worry: the application just needs further permissions.

7) Open a terminal (see step 2) and copy or type out the following command. DO NOT HIT ENTER YET. Drag the modNLP app from the Applications folder into the terminal window (alternatively type out the path to the modNLP.app at the end of the command)

xattr -cr  

For example:

xattr -cr /Applications/modNLP.app

8) The app should now run.

9) Pin the downloaded app to your launcher for easy access in subsequent sessions.
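
For reference, the Terminal commands from steps 3, 4 and 7 can be collected in one place. The following is only a minimal sketch: the function name print_modnlp_commands is hypothetical, the default paths are the example paths shown in the steps above, and the function only prints the commands rather than running them, so nothing on your machine is changed until you copy a line into Terminal yourself.

```shell
# Dry-run helper: prints the commands from steps 3, 4 and 7 without running them.
# print_modnlp_commands is a hypothetical name used for illustration only.
# Argument 1: path to the downloaded installer (defaults to the example path in step 4).
# Argument 2: path to the installed app (defaults to the example path in step 7).
print_modnlp_commands() {
    installer="${1:-/Users/me/Downloads/modNLP-1.0.dmg}"
    app="${2:-/Applications/modNLP.app}"

    # Step 3: relax the security setting (asks for your password when run).
    printf '%s\n' 'sudo spctl --master-disable'

    # Step 4: set permissions on the installer; the path is quoted in case it contains spaces.
    printf 'sudo chmod -R 755 "%s"\n' "$installer"

    # Step 7: clear the extended attributes on the installed app.
    printf 'xattr -cr "%s"\n' "$app"
}

# With no arguments, the example paths from the steps above are used.
print_modnlp_commands "$@"
```

If your installer or app is somewhere else, pass the paths as arguments, for example: print_modnlp_commands ~/Downloads/modNLP-1.0.dmg /Applications/modNLP.app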

SourceForge

Additionally, the software code and plugins are available for download at: https://sourceforge.net/projects/modnlp/

Should you encounter any software bugs or other technical problems when using these tools, please create a ticket detailing the nature of the issue on our SourceForge project page: https://sourceforge.net/p/modnlp/tickets/

MODNLP: Modular Suite of NLP Tools

modnlp aims to provide a modular architecture and tools for natural language processing written (mainly) in Java. These tools are being developed in connection with the Genealogies of Knowledge project.

The following modnlp modules are currently available:

  • idx: an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with (XML-based) handling of meta-data.
  • tc: an API and tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, two sample classification tools, and evaluation modules.
  • tec-tools (v2): consists of tec-server, a corpus indexer and server for corpus access and analysis over the web, and tec-client, a corpus analysis client. Unlike the (now obsolete) version 1 of these tools, which was originally developed for the TEC project and written in Perl, C (server side) and Java, the version on this site (v2) is written entirely in Java.

This new version of the tools forms the basis of software support for text analysis and visualisation in the SHE Corpus and the Genealogies of Knowledge projects.

The modnlp/tec tools have also been used by the European Parliamentary Comparable and Parallel Corpora project (ECPC), coordinated by Dr. Calzada Pérez (Universitat Jaume I, Spain), and by the Translational English Corpus, which has been collected and maintained under Prof Mona Baker’s supervision at the University of Manchester. That corpus has been made available on the Internet through the Genealogies of Knowledge and SHE Corpus project websites, in a collaboration between the University of Edinburgh, the University of Manchester, and Oslo University.

Also available is the documentation of the modnlp suite (for developers).