It's likely now a motivational understatement to say that "Deceptive agents and content, and the adversarial utilization of [social] media platforms by institutions in information warfare campaigns throughout the 2010s will have lasting geopolitical impacts that stack up over time." One could go further: advances in generative artificial intelligence (AI) in the 2020s will likely make the media-driven information wars of the 2010s feel like little more than tremors in the social fabric, presaging what will likely become even more formidable misuses of information technologies. This sounds sensationalist, and its accuracy highlights the ease with which one can provoke this period's public interest in AI. As defined by the zeitgeist, AI's not all bad, maybe scary; but it gets worse if we accept the words of preeminent AI experts explaining their need to operate for profit and to keep the source of their technologies closed. To advance AI, there are still lots of tabletop experiments left to conduct. So, alongside work investigating malicious (and, increasingly, malfeasant) uses of social data and information technologies, my research more than ever focuses on leveraging insights from quantitative linguistics to improve the design and scrutability of linguistic foundation models. This research tends to direct mathematical analysis and software development towards improving the precision with which neural network architectures learn, i.e., matching or improving learning capabilities while using less data and computational power. Where possible, this work organizes insights from its process into general knowledge and theories intended to build understanding of what these algorithms learn, as well as what high-level functions might emerge within them (and from their use).


I'm a natural scientist trained in physics, mathematics, and scientific programming, and I develop and teach data science coursework at a variety of levels at Drexel's College of Computing and Informatics. My recent work has focused on establishing undergraduate and graduate core curricula and is transitioning into the development of specialized electives, particularly those focused on social computing applications, bias and data, and advanced social data processing methods. To see samples, please reach out via email.

  • Foundations of Data Science (INFO 825). Develops foundations for research practice in data science (DS) through guided and collaborative literature review activities and light research reproduction efforts. Students will gain knowledge about current and emerging trends in DS research methodologies and disciplinary applications. Discusses how critical works and student-selected publications spanning DS research areas interact with different DS-related publishing venues, as well as how to align writing and presentation styles to meet diverse norms and standards of different venues. Specific readings and topics will be selected by students, who must identify a DS-related research subject whose literature they wish to master.

  • Data Acquisition and Pre-Processing (DSCI 511). Introduces the breadth of data science through a project life cycle perspective. Covers early-stage data life cycle activities in depth for the development and dissemination of data sets. Provides technical experience with data harvesting, acquisition, pre-processing, and curation. Concludes with an open-ended term project where students explore data availability, scale, variability, and reliability.

  • Data Analysis and Interpretation (DSCI 521). Introduces methods for data analysis and their quantitative foundations in application to pre-processed data. Covers reproducibility and interpretation for project life cycle activities, including data exploration, hypothesis generation and testing, pattern recognition, and task automation. Provides experience with analysis methods for data science from a variety of quantitative disciplines. Concludes with an open-ended term project focused on the application of data exploration and analysis methods with interpretation via statistical, algorithmic, and mathematical reasoning.

  • Natural Language Processing with Deep Learning (DSCI 691). Natural Language Processing (NLP) is one of the most important technologies of the information age and is a critical component of AI. Recently, deep learning approaches have come to dominate the field. This course explores the basis of these neural models with a heavy emphasis on research.

  • Introduction to Data Science (INFO 103). A first course in data science. Introduces data science as a field and describes the roles that various members of the community play, the services they provide, and the life cycle of data science projects. Provides an overview of common types of data, where they come from, and the challenges that practitioners face in the modern world of “Big Data.” Provides an introduction to the interdisciplinary mixture of skills that the practice requires.

  • Text Processing Working Group (TPWG). Would you like to learn how to make a computer understand English? Are you starting out in Natural Language Processing and need teams to work with? The Drexel Data Science Club’s (DSC’s) Text Processing Working Group (TPWG) features interactive demos, projects, tutorials, and discussions about text processing. We will start from the basics (TF-IDF, cosine similarity scores, regular expressions, topic modeling, stemming, tokenization, and lemmatization) and then explore more advanced topics.


The CODED lab’s research mission in data science focuses on social information and language processing, with the perspective that technological systems that facilitate public communication and knowledge sharing can be designed with open data for meta-purposes that benefit both participants and organizations. The lab engages with collaborators from a broad range of areas, including mathematics, computer science, physics, chemical engineering, psychology, political science, linguistics, sociology, communications, and health and clinical informatics. Advisees most often pursue technical degrees at CCI, though many collaborations begin with ad hoc conversations at CCI, often through our interdisciplinary data science cohorts. New collaborations are always welcome; please inquire via email :).

  • Danielle Boccelli. Ph.D. in Information Science (current). Advising: research and graduate curriculum.

  • Jennifer Bochenek. Ph.D. in Information Science (current). Advising: research and graduate curriculum.

  • Elizabeth Sheffield. Ph.D. in Information Science (current). Advising: research and graduate curriculum.

  • Elizabeth Campbell. Ph.D. in Information Science. Advised: dissertation and graduate curriculum.

  • Munif Mujib. Ph.D. in Information Science. Advised: research and graduate curriculum.

  • Giovanni Santia. Graduate student in Information Science. Advised: research and graduate curriculum.

  • Zeyu (Andrew) Chen. M.S. Data Science, AI & ML. Advised: research and graduate curriculum.

  • Colin Murphy. M.S. Data Science. Advised: research and graduate curriculum.

  • Meghan Colosimo. Ph.D., Clinical Psychology. Advised: research and graduate data science specialization.

  • Jacob Hunsberger. M.S. Chemical Engineering. Advised: research and graduate data science specialization.