
Post History

Q&A Is there any tool to fix cases in references (LaTeX + BibTeX)?


posted 5d ago by Franck Dernoncourt · edited 5d ago by Franck Dernoncourt

Answer
#2: Post edited by Franck Dernoncourt · 2025-05-10T21:46:11Z (5 days ago)
#1: Initial revision by Franck Dernoncourt · 2025-05-10T21:45:44Z (5 days ago)
I end up using a Python script that relies on the `bibtexparser` Python package:


```python
import re
import bibtexparser

def wrap_mixed_case_words(title):
    def replacer(match):
        word = match.group(0)
        # Already wrapped? (rare, but check)
        if word.startswith('{') and word.endswith('}'):
            return word
        # All-lowercase? Skip
        if word.islower():
            return word
        # Titlecase (single leading capital)? Skip
        if word[0].isupper() and word[1:].islower():
            return word
        # Otherwise (mixed case or all caps): wrap the entire word
        return "{" + word + "}"

    # Match words of at least two alphanumeric characters, skipping words already wrapped in braces
    pattern = r'(?<!{)\b[A-Za-z0-9]{2,}\b(?!})'
    return re.sub(pattern, replacer, title)


with open('input.bib', 'r', encoding='utf8') as infile:
    bib_database = bibtexparser.load(infile)

for entry in bib_database.entries:
    if 'title' in entry:
        entry['title'] = wrap_mixed_case_words(entry['title'])

with open('output.bib', 'w', encoding='utf8') as outfile:
    bibtexparser.dump(bib_database, outfile)
```
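
For a quick sanity check before running it over a whole `.bib` file, the wrapping function can be tried on a single title string. This is a minimal sketch that assumes the definitions above have already been executed; the expected result matches the `UnifiedMLLM` entry in the output example further down.

```python
# Quick check of wrap_mixed_case_words() on one title string
# (assumes the function definition above has been run).
title = ("UnifiedMLLM: LLM Enabling Unified Representation for "
         "Multi-modal Multi-tasks With Large Language Model")
print(wrap_mixed_case_words(title))
# -> {UnifiedMLLM}: {LLM} Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
```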

Requirements:

```
pip install bibtexparser
```
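
The script relies on the `bibtexparser` 1.x interface (`bibtexparser.load` / `bibtexparser.dump`), and it was reportedly tested with Python 3.12.8 and `bibtexparser==1.4.3`. If a newer major release of the package behaves differently, pinning that version is a reasonable fallback:

```
pip install bibtexparser==1.4.3
```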

---


Example:

Input:

```
@inproceedings{li-etal-2025-unifiedmllm,
    title = "UnifiedMLLM: LLM Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model",
    author = "Li, Zhaowei  and
      Wang, Wei  and
      Cai, YiQing  and
      Xu, Qi  and
      Wang, Pengyu  and
      Zhang, Dong  and
      Song, Hang  and
      Jiang, Botian  and
      Huang, Zhida  and
      Wang, Tao",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.19/",
    pages = "334--344",
    ISBN = "979-8-89176-195-7",
    abstract = "Significant advancements has recently been achieved in the field of multi-modal large language models (MLLMs), demonstrating their remarkable capabilities in understanding and reasoning across diverse tasks. However, these models are often trained for specific tasks and rely on task-specific input-output formats, limiting their applicability to a broader range of tasks. This raises a fundamental question: Can we develop a unified approach to represent and handle different multi-modal tasks to maximize the generalizability of MLLMs? In this paper, we propose UnifiedMLLM, a comprehensive model designed to represent various tasks using a unified representation. Our model exhibits strong capabilities in comprehending the implicit intent of user instructions and preforming reasoning. In addition to generating textual responses, our model also outputs task tokens and grounding tokens, serving as indicators of task types and task granularity. These outputs are subsequently routed through the task router and directed to specific expert models for task completion. To train our model, we construct a task-specific dataset and an 100k multi-task dataset encompassing complex scenarios. Employing a three-stage training strategy, we equip our model with robust reasoning and task processing capabilities while preserving its generalization capacity and knowledge reservoir. Extensive experiments showcase the impressive performance of our unified representation approach across various tasks, surpassing existing methodologies. Furthermore, our approach exhibits exceptional scalability and generality."
}

@inproceedings{li-etal-2025-chatcrs,
    title = "ChatCRS: Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems",
    author = "Li, Chuang  and
      Deng, Yang  and
      Hu, Hengchang  and
      Kan, Min-Yen  and
      Li, Haizhou",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.17/",
    pages = "295--312",
    ISBN = "979-8-89176-195-7",
    abstract = "This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work, we first analyze those limitations through a comprehensive evaluation, showing the necessity of external knowledge and goal guidance which contribute significantly to the recommendation accuracy and language quality. In light of this finding, we propose a novel ChatCRS framework to decompose the complex CRS task into several sub-tasks through the implementation of 1) a knowledge retrieval agent using a tool-augmented approach to reason over external Knowledge Bases and 2) a goal-planning agent for dialogue goal prediction. Experimental results on two multi-goal CRS datasets reveal that ChatCRS sets new state-of-the-art benchmarks, improving language quality of informativeness by 17{\%} and proactivity by 27{\%}, and achieving a tenfold enhancement in recommendation accuracy."
}

@inproceedings{lin-etal-2025-pemv,
    title = "PEMV: Improving Spatial Distribution for Emotion Recognition in Conversations Using Proximal Emotion Mean Vectors",
    author = "Lin, Chen  and
      Li, Fei  and
      Ji, Donghong  and
      Teng, Chong",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.20/",
    pages = "345--357",
    ISBN = "979-8-89176-195-7",
    abstract = "Emotion Recognition in Conversation (ERC) aims to identify the emotions expressed in each utterance within a dialogue. Existing research primarily focuses on the analysis of contextual structure in dialogue and the interactions between different emotions. Nonetheless, ERC datasets often contain difficult-to-classify samples and suffer from imbalanced label distributions, which pose challenges to the spatial distribution of dialogue features. To tackle this issue, we propose a method that generates Proximal Emotion Mean Vectors (PEMV) based on emotion feature queues to optimize the spatial representation of text features. We design a Center Loss based on PEMVs to pull hard-to-classify samples closer to their respective category centers and employ Angle Loss to maximize the angular separation between different PEMVs. Furthermore, we utilize PEMV as a classifier to better adapt to the spatial structure of dialogue features. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves state-of-the-art performance and validates its effectiveness in optimizing feature space representations."
}
@inproceedings{huang-etal-2025-logrules,
    title = "LogRules: Enhancing Log Analysis Capability of Large Language Models through Rules",
    author = "Huang, Xin  and
      Zhang, Ting  and
      Zhao, Wen",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.28/",
    pages = "452--470",
    ISBN = "979-8-89176-195-7",
    abstract = "Currently, large language models (LLMs) have achieved impressive performance in natural language processing tasks. However, LLMs still exhibit many hallucinations when analyzing system logs, which is due to the implicit knowledge and rules in logs that LLMs cannot capture. Based on this, we propose LogRules, a lightweight log analysis framework that generates and utilizes rules through LLMs. LogRules consists of three stages: an induction stage, an alignment stage, and a reasoning stage. Firstly, in the induction stage, an strong LLM (e.g., GPT-4o-mini) is tasked with generating a series of rules related to logs, which are then validated on the training set. When the rules are confirmed to produce correct reasoning results, they are added to a rule repository. Secondly, considering that the LLMs with small size ($\approx$8B parameters) still face challenges in utilizing rules, we design an alignment method based on rule-case contrastive preference optimization (CPO) to effectively enhance the rule reasoning capabilities of these LLMs. Finally, in the reasoning stage, the LLM constructs prompt using the rule repository and performs log analysis on the test set. Experiments show that LogRules outperforms LLM-based methods in log parsing and anomaly detection tasks, and achieves better performance compared to case-based methods."
}
@inproceedings{li-etal-2025-instructany2pix,
    title = "InstructAny2Pix: Image Editing with Multi-Modal Prompts",
    author = "Li, Shufan  and
      Singh, Harkanwar  and
      Grover, Aditya",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2025",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-naacl.36/",
    pages = "594--619",
    ISBN = "979-8-89176-195-7",
    abstract = "Image Editing has made incredible progress in recent years. Earliest work only supported caption-guided editing. Recently, free-form text instructions and reference images are incorporated to allow more flexibility. However, existing methods still struggle with complicated editing instructions involving multiple objects or reference images. We present InstructAny2Pix, a novel image editing model that leverages a multi-modal LLM to execute complicated edit instructions. Compared with previous, works, InstructAny2Pix extends the flexibility of edit instructions in three ways: First, it can perform complex instructions involving multiple object edits; Second, it supports interleaving text instructions with multiple reference images; Third, it supports audio and music inputs as part of edit prompts, unlocking many creative applications, such as album cover generation and music-inspired merchandise design. To evaluate the effectiveness of InstructAny2Pix, we propose two new benchmark datasets MM-Inst and Dream-booth++ consisting of human written, multi-modal prompts. InstructAny2Pix outperforms baselines in these two proposed multi-modal benchmarks, as well as conventional image editing benchmarks such as InstructPix2Pix."
}
```

Output (the mixed-case words in each `title` are now brace-protected; `bibtexparser` also re-serializes the file, sorting entries by citation key, alphabetizing fields, and normalizing values such as `month = apr` to `month = {April}`):

```
@inproceedings{huang-etal-2025-logrules,
 abstract = {Currently, large language models (LLMs) have achieved impressive performance in natural language processing tasks. However, LLMs still exhibit many hallucinations when analyzing system logs, which is due to the implicit knowledge and rules in logs that LLMs cannot capture. Based on this, we propose LogRules, a lightweight log analysis framework that generates and utilizes rules through LLMs. LogRules consists of three stages: an induction stage, an alignment stage, and a reasoning stage. Firstly, in the induction stage, an strong LLM (e.g., GPT-4o-mini) is tasked with generating a series of rules related to logs, which are then validated on the training set. When the rules are confirmed to produce correct reasoning results, they are added to a rule repository. Secondly, considering that the LLMs with small size ($\approx$8B parameters) still face challenges in utilizing rules, we design an alignment method based on rule-case contrastive preference optimization (CPO) to effectively enhance the rule reasoning capabilities of these LLMs. Finally, in the reasoning stage, the LLM constructs prompt using the rule repository and performs log analysis on the test set. Experiments show that LogRules outperforms LLM-based methods in log parsing and anomaly detection tasks, and achieves better performance compared to case-based methods.},
 address = {Albuquerque, New Mexico},
 author = {Huang, Xin  and
Zhang, Ting  and
Zhao, Wen},
 booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
 editor = {Chiruzzo, Luis  and
Ritter, Alan  and
Wang, Lu},
 isbn = {979-8-89176-195-7},
 month = {April},
 pages = {452--470},
 publisher = {Association for Computational Linguistics},
 title = {{LogRules}: Enhancing Log Analysis Capability of Large Language Models through Rules},
 url = {https://aclanthology.org/2025.findings-naacl.28/},
 year = {2025}
}

@inproceedings{li-etal-2025-chatcrs,
 abstract = {This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work, we first analyze those limitations through a comprehensive evaluation, showing the necessity of external knowledge and goal guidance which contribute significantly to the recommendation accuracy and language quality. In light of this finding, we propose a novel ChatCRS framework to decompose the complex CRS task into several sub-tasks through the implementation of 1) a knowledge retrieval agent using a tool-augmented approach to reason over external Knowledge Bases and 2) a goal-planning agent for dialogue goal prediction. Experimental results on two multi-goal CRS datasets reveal that ChatCRS sets new state-of-the-art benchmarks, improving language quality of informativeness by 17{\%} and proactivity by 27{\%}, and achieving a tenfold enhancement in recommendation accuracy.},
 address = {Albuquerque, New Mexico},
 author = {Li, Chuang  and
Deng, Yang  and
Hu, Hengchang  and
Kan, Min-Yen  and
Li, Haizhou},
 booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
 editor = {Chiruzzo, Luis  and
Ritter, Alan  and
Wang, Lu},
 isbn = {979-8-89176-195-7},
 month = {April},
 pages = {295--312},
 publisher = {Association for Computational Linguistics},
 title = {{ChatCRS}: Incorporating External Knowledge and Goal Guidance for {LLM}-based Conversational Recommender Systems},
 url = {https://aclanthology.org/2025.findings-naacl.17/},
 year = {2025}
}

@inproceedings{li-etal-2025-instructany2pix,
 abstract = {Image Editing has made incredible progress in recent years. Earliest work only supported caption-guided editing. Recently, free-form text instructions and reference images are incorporated to allow more flexibility. However, existing methods still struggle with complicated editing instructions involving multiple objects or reference images. We present InstructAny2Pix, a novel image editing model that leverages a multi-modal LLM to execute complicated edit instructions. Compared with previous, works, InstructAny2Pix extends the flexibility of edit instructions in three ways: First, it can perform complex instructions involving multiple object edits; Second, it supports interleaving text instructions with multiple reference images; Third, it supports audio and music inputs as part of edit prompts, unlocking many creative applications, such as album cover generation and music-inspired merchandise design. To evaluate the effectiveness of InstructAny2Pix, we propose two new benchmark datasets MM-Inst and Dream-booth++ consisting of human written, multi-modal prompts. InstructAny2Pix outperforms baselines in these two proposed multi-modal benchmarks, as well as conventional image editing benchmarks such as InstructPix2Pix.},
 address = {Albuquerque, New Mexico},
 author = {Li, Shufan  and
Singh, Harkanwar  and
Grover, Aditya},
 booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
 editor = {Chiruzzo, Luis  and
Ritter, Alan  and
Wang, Lu},
 isbn = {979-8-89176-195-7},
 month = {April},
 pages = {594--619},
 publisher = {Association for Computational Linguistics},
 title = {{InstructAny2Pix}: Image Editing with Multi-Modal Prompts},
 url = {https://aclanthology.org/2025.findings-naacl.36/},
 year = {2025}
}

@inproceedings{li-etal-2025-unifiedmllm,
 abstract = {Significant advancements has recently been achieved in the field of multi-modal large language models (MLLMs), demonstrating their remarkable capabilities in understanding and reasoning across diverse tasks. However, these models are often trained for specific tasks and rely on task-specific input-output formats, limiting their applicability to a broader range of tasks. This raises a fundamental question: Can we develop a unified approach to represent and handle different multi-modal tasks to maximize the generalizability of MLLMs? In this paper, we propose UnifiedMLLM, a comprehensive model designed to represent various tasks using a unified representation. Our model exhibits strong capabilities in comprehending the implicit intent of user instructions and preforming reasoning. In addition to generating textual responses, our model also outputs task tokens and grounding tokens, serving as indicators of task types and task granularity. These outputs are subsequently routed through the task router and directed to specific expert models for task completion. To train our model, we construct a task-specific dataset and an 100k multi-task dataset encompassing complex scenarios. Employing a three-stage training strategy, we equip our model with robust reasoning and task processing capabilities while preserving its generalization capacity and knowledge reservoir. Extensive experiments showcase the impressive performance of our unified representation approach across various tasks, surpassing existing methodologies. Furthermore, our approach exhibits exceptional scalability and generality.},
 address = {Albuquerque, New Mexico},
 author = {Li, Zhaowei  and
Wang, Wei  and
Cai, YiQing  and
Xu, Qi  and
Wang, Pengyu  and
Zhang, Dong  and
Song, Hang  and
Jiang, Botian  and
Huang, Zhida  and
Wang, Tao},
 booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
 editor = {Chiruzzo, Luis  and
Ritter, Alan  and
Wang, Lu},
 isbn = {979-8-89176-195-7},
 month = {April},
 pages = {334--344},
 publisher = {Association for Computational Linguistics},
 title = {{UnifiedMLLM}: {LLM} Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model},
 url = {https://aclanthology.org/2025.findings-naacl.19/},
 year = {2025}
}

@inproceedings{lin-etal-2025-pemv,
 abstract = {Emotion Recognition in Conversation (ERC) aims to identify the emotions expressed in each utterance within a dialogue. Existing research primarily focuses on the analysis of contextual structure in dialogue and the interactions between different emotions. Nonetheless, ERC datasets often contain difficult-to-classify samples and suffer from imbalanced label distributions, which pose challenges to the spatial distribution of dialogue features. To tackle this issue, we propose a method that generates Proximal Emotion Mean Vectors (PEMV) based on emotion feature queues to optimize the spatial representation of text features. We design a Center Loss based on PEMVs to pull hard-to-classify samples closer to their respective category centers and employ Angle Loss to maximize the angular separation between different PEMVs. Furthermore, we utilize PEMV as a classifier to better adapt to the spatial structure of dialogue features. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves state-of-the-art performance and validates its effectiveness in optimizing feature space representations.},
 address = {Albuquerque, New Mexico},
 author = {Lin, Chen  and
Li, Fei  and
Ji, Donghong  and
Teng, Chong},
 booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
 editor = {Chiruzzo, Luis  and
Ritter, Alan  and
Wang, Lu},
 isbn = {979-8-89176-195-7},
 month = {April},
 pages = {345--357},
 publisher = {Association for Computational Linguistics},
 title = {{PEMV}: Improving Spatial Distribution for Emotion Recognition in Conversations Using Proximal Emotion Mean Vectors},
 url = {https://aclanthology.org/2025.findings-naacl.20/},
 year = {2025}
}
```

Tested with Python 3.12.8 and `bibtexparser==1.4.3` on Windows 11 24H2 Pro.