Diane M. Napolitano

M.S. Computer Science,
State University of New York at Stony Brook, December 2008

B.S. Computer Science (minor in History),
State University of New York at Binghamton, May 2006

Hi! I'm Diane and I am a Research Engineer in the Natural Language Processing and Speech group in Research and Development at Educational Testing Service in Princeton, New Jersey.


My research interests are Machine Learning, Natural Language Processing, and Information Retrieval and Extraction. In particular, I am interested in the application of these to problems in the Social Sciences and Humanities.

At ETS, I am the lead developer on the TextEvaluator (formerly SourceRater) project. I have also made significant contributions to e-rater and fairly small contributions to SpeechRater.

At Stony Brook, I worked with Professor Amanda Stent in the HCI Lab. For my thesis, I developed a Java-based NLP/Machine Learning tool that aims to help students, both native and non-native speakers of English, improve their writing. You can read my thesis, if you'd like.

Over the summer of 2008, I contributed my IR/IE/ML/NLP skills to the PLOG project, as overseen by Professors Stent and Rob Johnson, and Mike Hart. I made recommendations on NLP tools for the project and worked on a section known as Affinity-Based Access Control (ABAC), in which blog entries are only shared with others who have a common interest in the entry's topic.

In the summer of 2007, I participated in the Data Sciences Summer Institute at the University of Illinois at Urbana-Champaign, where I worked on a project that explored the Virtual Web as a finite-state graph. A presentation on my work can be found below.

Publications, Presentations, etc.

I also serve as an occasional reviewer for the CALICO Journal.

Yoon, S., Cho, Y., and Napolitano, D. (2016, June). Spoken Text Difficulty Estimation Using Linguistic Features. To be presented at the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) workshop on Building Educational Applications, San Diego, CA.

Sheehan, K.M., Flor, M., Napolitano, D. and Ramineni, C. Using TextEvaluator to Quantify Sources of Linguistic Complexity in Textbooks Targeted at First-Grade Readers Over the Past Half Century. ETS Research Report Series. Vol. 2015, No. 2 (December 2015), pp. 1-17. (link)

Bhat, S., Yoon, S., and Napolitano, D. (2015, September). Automatic Detection of Grammatical Structures from Non-native Speech. Proceedings of the Sixth Workshop on Speech and Language Technology in Education (SLaTE), Leipzig, Germany. (link)

Napolitano, D., Sheehan, K.M., and Mundkowsky, R. (2015, June). Online Readability and Text Complexity Analysis with TextEvaluator. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) System Demonstrations Session, Denver, CO. (link)

Sheehan, K.M., Flor, M., Napolitano, D., and Ramineni, C. (2015, April). Using TextEvaluator to Better Understand the Comprehension Challenges Presented Within Textbooks Targeted at First Grade Readers. Presented at the annual meeting of the American Educational Research Association (AERA), Chicago, IL.

Sheehan, K.M. and Napolitano, D. (2014). Differences between the 2011 SourceRater engine and the current TextEvaluator engine. Princeton, NJ: Educational Testing Service.

Sheehan, K.M., Kostin, I., Napolitano, D. & Flor, M. The TextEvaluator Tool: Helping teachers and test developers select texts for use in instruction and assessment. The Elementary School Journal. Vol. 115, No. 2 (December 2014), pp. 184-209. (link)

In July 2014, I provided an introduction to NLP to many members of the Data Management and Analytics department at Mathematica Policy Research.

Cho, Y., Yoon, S., Napolitano, D. (2014, May). An Automated Spoken Text Difficulty Evaluation System. Presented at the Computer Assisted Language Instruction Consortium (CALICO) Conference, Athens, OH.

Sheehan, K.M. and Napolitano, D. (2014, April). Measuring the Difficulty of Inferring Connections Across Sentences. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Philadelphia, PA.

Derrick Higgins, Chris Brew, Michael Heilman, Ramon Ziai, Lei Chen, Aoife Cahill, Michael Flor, Nitin Madnani, Joel Tetreault, Daniel Blanchard, Diane Napolitano, Chong Min Lee, John Blackmore. (2014, March). Is getting the right answer just about choosing the right words? The role of syntactically-informed features in short answer scoring. arXiv:1403.0801v2. (link)

Sheehan, K.M., Kostin, I., Napolitano, D., and Flor, M. (2013, December). Helping teachers and test developers determine the difficulty of text for instruction and assessment. Paper presented at: Addressing the Three Legs of the Text Complexity Triangle: Quantitative, Qualitative, and Reader-Task Systems. Proceedings of the 63rd Annual Conference of the Literacy Research Association, Dallas, TX.

Cahill, A., Madnani, N., Tetreault, J., and Napolitano, D. (2013, June). Robust Systems for Preposition Error Correction Using Wikipedia Revisions. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Atlanta, GA. (link)

Sheehan, K.M., Flor, M., and Napolitano, D. (2013, June). A Two-Stage Approach for Generating Unbiased Estimates of Text Complexity. Proceedings of the 2nd Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA), Atlanta, GA. (link)

Sheehan, K.M., Kostin, I., and Napolitano, D. (2012, April). SourceRater: An automated approach for generating text complexity classifications aligned with the Common Core Standards. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Vancouver, BC.

Sheehan, K.M., Kostin, I., and Napolitano, D. (2012, April). SourceRater: Helping Teachers and Test Developers Determine the Difficulty of Text for Instruction and Assessment. Paper presented at the annual meeting of the National Council on Measurement in Education (NCME), Vancouver, BC.

Napolitano, Diane M. and Amanda Stent, "TechWriter: An Evolving System for Writing Assistance for Advanced Learners of English". CALICO Journal 26, no. 3 (May 2009): 611-625. (link) (E-mail me for full version)

Diane M. Napolitano and Amanda Stent. TechWriter: An individualized approach to writing assistance and improvement. Computer Assisted Language Instruction Consortium (CALICO) workshop, "Automatic Analysis of Learner Language". 2008. (abstract) (poster)

Here is the presentation on my work at the University of Illinois that I gave to my reading group.


I was an Adjunct Instructor at the State University of New York College at Old Westbury for three academic years. The courses I taught were:

While at Stony Brook, I had the pleasure of being a TA for the following courses (which, luckily, match up exactly with my teaching interests):

Activities and Affiliations

I am a member of the CSA market share at Z Food Farm and an ARPA-level member of the host of this website, a.k.a. the "PBS of the Internet". I drive a Volvo and I'm probably much more excited about it than I should be. ;)

I was the founding Vice-President and Webmaster of Women in Computer Science at Stony Brook, and I used to regularly attend both LUGSB and SBCS meetings. When I was an undergrad at Binghamton, I was a representative on the Student Assembly and was on the Rules Committee as both Vice-Chair and Chair.

My primary OS since 1999 has been some variant of UNIX: first Slackware for about nine years, then Debian for four, and now I use Mac OS. At work I use Mac OS and Red Hat Enterprise Linux.

I probably spend most of my free time listening to music, and you can get a good feel for my (excellent :) ) musical taste from my last.fm account. When possible, I spend time outside the house playing board games, adventuring around New Jersey and the five boroughs, or going on hikes.