main.utils

Classes

CheckAcronymAbbrAndFullDict

CheckAcronymAbbrAndFullDict(
    names_abbr="names_abbr", names_full="names_full"
)

Checker for acronym, abbreviation and full form dictionaries.

Validates and processes dictionary data containing acronyms with their corresponding abbreviations and full forms.

Attributes:

Name        Type  Description
names_abbr  str   Key name for abbreviations in the dictionary.
names_full  str   Key name for full forms in the dictionary.

Initializes the checker with field names.

Parameters:

Name        Type  Description                                            Default
names_abbr  str   Key name for abbreviations, defaults to "names_abbr".  'names_abbr'
names_full  str   Key name for full forms, defaults to "names_full".     'names_full'

Source code in pybibtexer/main/utils.py
def __init__(self, names_abbr: str = "names_abbr", names_full: str = "names_full") -> None:
    """Initializes the checker with field names.

    Args:
        names_abbr: Key name for abbreviations, defaults to "names_abbr".
        names_full: Key name for full forms, defaults to "names_full".
    """
    self.names_abbr = names_abbr
    self.names_full = names_full

Functions

compare_and_return_only_in_new
compare_and_return_only_in_new(json_old, json_new)

Compares old and new JSON data to find newly added items.

Parameters:

Name      Type  Description                   Default
json_old  dict  Old JSON data as dictionary.  required
json_new  dict  New JSON data as dictionary.  required

Returns:

Name  Type  Description
dict  dict  Dictionary containing keys that only exist in new data.

Source code in pybibtexer/main/utils.py
def compare_and_return_only_in_new(self, json_old: dict, json_new: dict) -> dict:
    """Compares old and new JSON data to find newly added items.

    Args:
        json_old: Old JSON data as dictionary.
        json_new: New JSON data as dictionary.

    Returns:
        dict: Dictionary containing keys that only exist in new data.
    """
    # Find keys that only exist in new JSON
    keys_only_in_new = sorted(set(json_new.keys()) - set(json_old.keys()))
    new_only_data = {key: json_new[key] for key in keys_only_in_new}

    # Find common keys between old and new JSON
    common_keys = set(json_old.keys()) & set(json_new.keys())

    # Check each common key for new items using pattern matching
    for key in sorted(common_keys):
        for flag in [self.names_full]:
            old_items = [item.lower() for item in json_old[key][flag]]
            old_items = [item.replace("(", "").replace(")", "") for item in old_items]

            new_items = [item.lower() for item in json_new[key][flag]]
            new_items = [item.replace("(", "").replace(")", "") for item in new_items]

            self._old_match_new(json_old, key, flag, old_items, new_items)

    # Return keys that only exist in new data
    return new_only_data
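
The key-difference step at the top of this method can be tried in isolation. The following is a minimal sketch rather than the library's code: only_in_new is a hypothetical standalone helper, and the sample entries are made up for illustration.

```python
def only_in_new(json_old: dict, json_new: dict) -> dict:
    """Return the entries of json_new whose keys are absent from json_old."""
    # Same set difference as in the method above, sorted for a stable order.
    keys_only_in_new = sorted(set(json_new) - set(json_old))
    return {key: json_new[key] for key in keys_only_in_new}


# Illustrative data (made up), using the default field names.
old = {
    "ICML": {
        "names_abbr": ["ICML"],
        "names_full": ["International Conference on Machine Learning"],
    },
}
new = {
    "ICML": {
        "names_abbr": ["ICML"],
        "names_full": ["International Conference on Machine Learning"],
    },
    "AAAI": {
        "names_abbr": ["AAAI"],
        "names_full": ["AAAI Conference on Artificial Intelligence"],
    },
}

added = only_in_new(old, new)
print(list(added))  # ['AAAI']
```

Keys present in both inputs never appear in the returned dictionary; in the method above they are only handed to the private _old_match_new helper.
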
length_dupicate_match
length_dupicate_match(dict_data)

Performs comprehensive validation on dictionary data.

Executes three validation steps: length validation, duplicate checking, and mutual pattern matching.

Parameters:

Name       Type                             Description                                                            Default
dict_data  dict[str, dict[str, list[str]]]  Dictionary containing acronym data with abbreviations and full forms.  required

Returns:

Name   Type                                               Description
tuple  tuple[dict[str, dict[str, list[str]]], list[str]]  Validated dictionary and list of acronyms with matches.

Source code in pybibtexer/main/utils.py
def length_dupicate_match(
    self, dict_data: dict[str, dict[str, list[str]]]
) -> tuple[dict[str, dict[str, list[str]]], list[str]]:
    """Performs comprehensive validation on dictionary data.

    Executes three validation steps: length validation, duplicate checking,
    and mutual pattern matching.

    Args:
        dict_data: Dictionary containing acronym data with abbreviations
                  and full forms.

    Returns:
        tuple: Validated dictionary and list of acronyms with matches.
    """
    dict_data = self._validate_length(dict_data)
    dict_data = self._check_duplicate(dict_data)

    # Check for matching patterns in both abbreviations and full forms
    dict_data, abbr_matches = self._mutually_check_match(dict_data, self.names_abbr)
    dict_data, full_matches = self._mutually_check_match(dict_data, self.names_full)
    matches = sorted(set(abbr_matches).union(full_matches))
    return dict_data, matches
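
The private helpers called here are not shown on this page, so only the final merge step can be illustrated with confidence. A small sketch, assuming two hypothetical match lists returned by the abbreviation pass and the full-form pass:

```python
# Hypothetical results of the two _mutually_check_match passes.
abbr_matches = ["ICML", "AAAI"]
full_matches = ["AAAI", "CVPR"]

# Same expression as in the method: de-duplicate, then sort alphabetically.
matches = sorted(set(abbr_matches).union(full_matches))
print(matches)  # ['AAAI', 'CVPR', 'ICML']
```

An acronym flagged by both passes therefore appears once in the returned list.
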

StrictOrderedDict

StrictOrderedDict(data)

A dictionary that strictly maintains insertion order.

This implementation guarantees that keys, values, and items will always be returned in the exact order they were inserted, regardless of Python version or internal dictionary implementation changes.

Attributes:

Name   Type  Description
_keys        List maintaining the order of key insertion.
_data        Dictionary storing the actual key-value pairs.

Initializes the StrictOrderedDict with optional initial data.

Parameters:

Name  Type  Description  Default
data  dict  Initial key-value pairs, inserted in the dictionary's iteration order. Pass an empty dict for an empty instance; the constructor calls data.items(), so a list of (key, value) pairs is not accepted.  required

Example

sod = StrictOrderedDict({})
sod = StrictOrderedDict({'a': 1, 'b': 2})

Source code in pybibtexer/main/utils.py
def __init__(self, data: dict) -> None:
    """Initializes the StrictOrderedDict with optional initial data.

    Args:
        data: Dictionary of initial key-value pairs, inserted in the
              dictionary's iteration order. Pass an empty dict for an
              empty instance; a list of (key, value) pairs is not accepted
              because the constructor calls data.items().

    Example:
        >>> sod = StrictOrderedDict({})
        >>> sod = StrictOrderedDict({'a': 1, 'b': 2})
    """
    self._keys = []  # Maintains insertion order of keys
    self._data = {}  # Stores the actual key-value mappings

    if data:
        for k, v in data.items():
            self[k] = v

Functions

__contains__
__contains__(key)

Support key in dict syntax.

Parameters:

Name  Type  Description                      Default
key   str   The key to check for existence.  required

Returns:

Type  Description
bool  True if key exists in the dictionary, False otherwise.

Source code in pybibtexer/main/utils.py
def __contains__(self, key: str) -> bool:
    """Support key in dict syntax.

    Args:
        key: The key to check for existence.

    Returns:
        True if key exists in the dictionary, False otherwise.
    """
    return key in self._data
__getitem__
__getitem__(key)

Retrieves the value associated with the given key.

Parameters:

Name  Type  Description          Default
key   str   The key to look up.  required

Returns:

Type  Description
Any   The value associated with the key.

Raises:

Type      Description
KeyError  If the key is not found in the dictionary.

Source code in pybibtexer/main/utils.py
def __getitem__(self, key: str) -> Any:
    """Retrieves the value associated with the given key.

    Args:
        key: The key to look up.

    Returns:
        The value associated with the key.

    Raises:
        KeyError: If the key is not found in the dictionary.
    """
    return self._data[key]
__len__
__len__()

Support len(dict) syntax.

Returns:

Type  Description
int   The number of items in the dictionary.

Source code in pybibtexer/main/utils.py
def __len__(self) -> int:
    """Support len(dict) syntax.

    Returns:
        The number of items in the dictionary.
    """
    return len(self._data)
__repr__
__repr__()

Returns a string representation of the dictionary.

Returns:

Type  Description
str   A string representation showing all key-value pairs in insertion order, formatted like a standard Python dictionary.

Example

sod = StrictOrderedDict({'x': 10, 'y': 20})
print(sod)

Source code in pybibtexer/main/utils.py
def __repr__(self) -> str:
    """Returns a string representation of the dictionary.

    Returns:
        A string representation showing all key-value pairs in insertion order,
        formatted like a standard Python dictionary.

    Example:
        >>> sod = StrictOrderedDict({'x': 10, 'y': 20})
        >>> print(sod)
        {'x': 10, 'y': 20}
    """
    items = [f"'{k}': {v}" for k, v in self.items()]
    return "{" + ", ".join(items) + "}"
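
The formatting rule used by __repr__ can be checked without the class. This is a minimal sketch: format_items is a hypothetical helper that applies the same f-string to a plain list of pairs.

```python
def format_items(pairs: list[tuple[str, object]]) -> str:
    """Apply the same formatting as __repr__ to a list of (key, value) pairs."""
    # Keys are always wrapped in single quotes; values are rendered with
    # str(), not repr(), because the f-string uses a plain {v}.
    items = [f"'{k}': {v}" for k, v in pairs]
    return "{" + ", ".join(items) + "}"


print(format_items([("x", 10), ("y", 20)]))  # {'x': 10, 'y': 20}
print(format_items([("name", "ICML")]))      # {'name': ICML}
```

The second call shows where the output departs from a built-in dict: string values are printed without quotes, so the result is meant for display rather than for evaluation.
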
__setitem__
__setitem__(key, value)

Sets a key-value pair, maintaining insertion order for new keys.

Parameters:

Name   Type  Description                           Default
key    str   The key to set or update.             required
value  Any   The value to associate with the key.  required
Note

If the key is new, it is added to the end of the insertion order. If the key exists, its value is updated but its position remains unchanged.

Source code in pybibtexer/main/utils.py
def __setitem__(self, key: str, value: Any) -> None:
    """Sets a key-value pair, maintaining insertion order for new keys.

    Args:
        key: The key to set or update.
        value: The value to associate with the key.

    Note:
        If the key is new, it is added to the end of the insertion order.
        If the key exists, its value is updated but its position remains unchanged.
    """
    if key not in self._data:
        self._keys.append(key)  # Only add new keys to maintain order

    self._data[key] = value
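
The ordering rule in the note can be seen with a trimmed-down stand-in. The sketch below assumes a hypothetical MiniOrderedDict that copies only __init__, __setitem__ and items from the listings on this page; it is not the library class itself.

```python
from typing import Any


class MiniOrderedDict:
    """Hypothetical, trimmed-down stand-in for StrictOrderedDict."""

    def __init__(self, data: dict) -> None:
        self._keys: list[str] = []       # insertion order of keys
        self._data: dict[str, Any] = {}  # actual key-value mappings
        for k, v in data.items():
            self[k] = v

    def __setitem__(self, key: str, value: Any) -> None:
        if key not in self._data:
            self._keys.append(key)  # only new keys extend the order
        self._data[key] = value

    def items(self) -> list[tuple[str, Any]]:
        return [(k, self._data[k]) for k in self._keys]


d = MiniOrderedDict({"a": 1, "b": 2})
d["a"] = 100  # existing key: value changes, position stays first
d["c"] = 3    # new key: appended at the end
print(d.items())  # [('a', 100), ('b', 2), ('c', 3)]
```
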
get
get(key, default=None)

Safely get a value by key, returning default if key doesn't exist.

Parameters:

Name     Type  Description                                             Default
key      str   The key to look up.                                     required
default  Any   Value to return if key is not found. Defaults to None.  None

Returns:

Type  Description
Any   The value associated with the key, or default if key doesn't exist.

Source code in pybibtexer/main/utils.py
def get(self, key: str, default: Any = None) -> Any:
    """Safely get a value by key, returning default if key doesn't exist.

    Args:
        key: The key to look up.
        default: Value to return if key is not found. Defaults to None.

    Returns:
        The value associated with the key, or default if key doesn't exist.
    """
    return self._data.get(key, default)
items
items()

Returns all key-value pairs in insertion order.

Returns:

Type                   Description
list[tuple[str, Any]]  A list of (key, value) tuples in the order they were inserted.

Source code in pybibtexer/main/utils.py
def items(self) -> list[tuple[str, Any]]:
    """Returns all key-value pairs in insertion order.

    Returns:
        A list of (key, value) tuples in the order they were inserted.
    """
    return [(k, self._data[k]) for k in self._keys]
keys
keys()

Returns all keys in insertion order.

Returns:

Type       Description
list[str]  A copy of the list containing all keys in the order they were inserted.

Source code in pybibtexer/main/utils.py
def keys(self) -> list[str]:
    """Returns all keys in insertion order.

    Returns:
        A copy of the list containing all keys in the order they were inserted.
    """
    return self._keys.copy()
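
Because keys() returns a copy rather than a live view, changes to the returned list do not reach the stored order. A short sketch, assuming a hypothetical KeyStore class that keeps only the ordered key list:

```python
class KeyStore:
    """Hypothetical stand-in that holds only the ordered key list."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = list(keys)

    def keys(self) -> list[str]:
        # Same approach as above: hand out a copy, keep the original private.
        return self._keys.copy()


store = KeyStore(["a", "b"])
snapshot = store.keys()
snapshot.append("c")  # modifies the copy only
print(store.keys())   # ['a', 'b']
```

The flip side is that a list obtained earlier does not pick up keys inserted later, unlike the view returned by the built-in dict.keys().
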
values
values()

Returns all values in key insertion order.

Returns:

Type       Description
list[Any]  A list of values in the same order as their corresponding keys were inserted.

Source code in pybibtexer/main/utils.py
def values(self) -> list[Any]:
    """Returns all values in key insertion order.

    Returns:
        A list of values in the same order as their corresponding keys were inserted.
    """
    return [self._data[k] for k in self._keys]

Functions

parse_bibtex_file

parse_bibtex_file(full_biblatex, entry_type='article')

Parse BibTeX file and extract conference or journal data.

Parameters:

Name           Type  Description                                             Default
full_biblatex  str   Path to the BibLaTeX file.                              required
entry_type     str   Type of entry to parse - 'article' or 'inproceedings'.  'article'

Returns:

Type                             Description
dict[str, dict[str, list[str]]]  Dictionary containing parsed conference or journal data.

Raises:

Type        Description
ValueError  If entry_type is not 'article' or 'inproceedings'.

Source code in pybibtexer/main/utils.py
def parse_bibtex_file(full_biblatex: str, entry_type: str = "article") -> dict[str, dict[str, list[str]]]:
    """Parse BibTeX file and extract conference or journal data.

    Args:
        full_biblatex: Path to the BibLaTeX file.
        entry_type: Type of entry to parse - 'article' or 'inproceedings'.

    Returns:
        Dictionary containing parsed conference or journal data.

    Raises:
        ValueError: If entry_type is not 'article' or 'inproceedings'.
    """
    if entry_type not in ["article", "inproceedings"]:
        raise ValueError("entry_type must be 'article' or 'inproceedings'")

    config = {
        "article": {
            "prefix": "J_",
            "pattern": r"@article\{(.*?),\s*([^@]*)\}",
            "full_field": "journaltitle",
            "abbr_field": "shortjournal",
        },
        "inproceedings": {
            "prefix": "C_",
            "pattern": r"@inproceedings\{(.*?),\s*([^@]*)\}",
            "full_field": "booktitle",
            "abbr_field": "eventtitle",
        },
    }

    cfg = config[entry_type]
    content = read_str(full_biblatex)
    entries = re.findall(cfg["pattern"], content, re.DOTALL)

    result_dict = {}
    for cite_key, entry_content in entries:
        # Process only entries with the specified prefix
        if not cite_key.startswith(cfg["prefix"]):
            continue

        # Extract full and abbreviation fields
        full_match = re.search(cfg["full_field"] + r"\s*=\s*\{(.*)\}", entry_content)
        abbr_match = re.search(cfg["abbr_field"] + r"\s*=\s*\{(.*)\}", entry_content)

        if not full_match:
            continue

        full = full_match.group(1).strip()
        abbr = abbr_match.group(1).strip() if abbr_match else full

        # Remove case-protection
        full = full.replace("{", "").replace("}", "")
        abbr = abbr.replace("{", "").replace("}", "")

        parts = cite_key.split("_")
        if len(parts) >= 3:
            key = parts[1]

            # Check if key already exists
            if key in result_dict:
                existing_entry = result_dict[key]

                # Only add if full name is not already present
                if full not in existing_entry["names_full"]:
                    existing_entry["names_abbr"].append(abbr)
                    existing_entry["names_full"].append(full)
            else:
                # New key - add to dictionary
                result_dict[key] = {"names_abbr": [abbr], "names_full": [full]}

    return result_dict
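
To make the extraction rules concrete, the sketch below applies the same regular expressions to an in-memory string instead of a file. Everything named here is an assumption for illustration: parse_articles is a hypothetical helper covering only the 'article' case, and the sample entries and the cite key J_TPAMI_2020_example are made up (the source only requires the J_ prefix and at least three underscore-separated parts).

```python
import re

SAMPLE = """\
@article{J_TPAMI_2020_example,
  journaltitle = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence},
  shortjournal = {{IEEE} TPAMI},
}

@article{smith2020,
  journaltitle = {Some Other Journal},
}
"""


def parse_articles(content: str) -> dict[str, dict[str, list[str]]]:
    """Collect journal names from @article entries whose cite key starts with J_."""
    result: dict[str, dict[str, list[str]]] = {}
    entries = re.findall(r"@article\{(.*?),\s*([^@]*)\}", content, re.DOTALL)
    for cite_key, body in entries:
        if not cite_key.startswith("J_"):
            continue  # entries without the prefix are ignored
        full_match = re.search(r"journaltitle\s*=\s*\{(.*)\}", body)
        abbr_match = re.search(r"shortjournal\s*=\s*\{(.*)\}", body)
        if not full_match:
            continue
        full = full_match.group(1).strip()
        abbr = abbr_match.group(1).strip() if abbr_match else full
        # Remove case-protection braces such as {IEEE}.
        full = full.replace("{", "").replace("}", "")
        abbr = abbr.replace("{", "").replace("}", "")
        parts = cite_key.split("_")
        if len(parts) >= 3:
            # The second underscore-separated part of the cite key is the acronym.
            entry = result.setdefault(parts[1], {"names_abbr": [], "names_full": []})
            if full not in entry["names_full"]:
                entry["names_abbr"].append(abbr)
                entry["names_full"].append(full)
    return result


parsed = parse_articles(SAMPLE)
print(parsed)
```

Only the first entry is kept, under the key TPAMI; the second is skipped because its cite key lacks the J_ prefix.
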

process_user_conferences_journals_json

process_user_conferences_journals_json(
    full_json_c, full_json_j
)

Process user-defined conferences and journals JSON files.

Notes

The structure of full_json_c follows the format {"publisher": {"conferences": {"abbr": {"names_abbr": [], "names_full": []}}}}, while full_json_j adheres to the format {"publisher": {"journals": {"abbr": {"names_abbr": [], "names_full": []}}}}.

Source code in pybibtexer/main/utils.py
def process_user_conferences_journals_json(full_json_c: str, full_json_j: str) -> tuple[dict, dict]:
    """Process user-defined conferences and journals JSON files.

    Notes:
        The structure of full_json_c follows the format
            {"publisher": {"conferences": {"abbr": {"names_abbr": [], "names_full": []}}}},
        while full_json_j adheres to the format
            {"publisher": {"journals": {"abbr": {"names_abbr": [], "names_full": []}}}}.
    """
    # Process user conferences JSON file
    json_dict = load_json_file(full_json_c)
    full_abbr_inproceedings_dict = {}

    # Try different possible keys for conferences section in JSON structure
    for flag in ["conferences", "Conferences", "CONFERENCES", "conference", "Conference", "CONFERENCE"]:
        full_abbr_inproceedings_dict = {p: json_dict[p].get(flag, {}) for p in json_dict}
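        # The mapping has one entry per publisher even when none of them
        # contains this section, so test the values rather than the mapping.
        if any(full_abbr_inproceedings_dict.values()):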
            break

    # Flatten the nested dictionary structure to {abbr: value} format
    # Convert from {publisher: {abbr: data}} to {abbr: data}
    full_abbr_inproceedings_dict = {abbr: v[abbr] for v in full_abbr_inproceedings_dict.values() for abbr in v}
    # Standardize the structure to ensure consistent format
    # Extract only useful information ("names_full" and "names_abbr")
    full_abbr_inproceedings_dict = {
        k: {"names_full": v.get("names_full", []), "names_abbr": v.get("names_abbr", [])}
        for k, v in full_abbr_inproceedings_dict.items()
    }

    # Process user journals JSON file
    json_dict = load_json_file(full_json_j)
    full_abbr_article_dict = {}

    # Try different possible keys for journals section in JSON structure
    for flag in ["journals", "Journals", "JOURNALS", "journal", "Journal", "JOURNAL"]:
        full_abbr_article_dict = {p: json_dict[p].get(flag, {}) for p in json_dict}
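        # The mapping has one entry per publisher even when none of them
        # contains this section, so test the values rather than the mapping.
        if any(full_abbr_article_dict.values()):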
            break

    # Flatten the nested dictionary structure to {abbr: value} format
    # Convert from {publisher: {abbr: data}} to {abbr: data}
    full_abbr_article_dict = {abbr: v[abbr] for v in full_abbr_article_dict.values() for abbr in v}
    # Standardize the structure to ensure consistent format
    # Extract only useful information ("names_full" and "names_abbr")
    full_abbr_article_dict = {
        k: {"names_full": v.get("names_full", []), "names_abbr": v.get("names_abbr", [])}
        for k, v in full_abbr_article_dict.items()
    }

    # Return both processed dictionaries
    return full_abbr_inproceedings_dict, full_abbr_article_dict
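
The flattening and standardizing steps can be followed on a small in-memory example. This is a sketch under stated assumptions: flatten_section is a hypothetical helper, the publisher and conference names are made up, and the input is a dictionary that has already been loaded rather than a JSON file path.

```python
# Made-up input in the documented shape:
# {"publisher": {"conferences": {"abbr": {"names_abbr": [], "names_full": []}}}}
user_conferences = {
    "PublisherA": {
        "conferences": {
            "CONF1": {
                "names_abbr": ["Conf. One"],
                "names_full": ["Conference One"],
                "note": "extra fields are dropped",
            },
        },
    },
    "PublisherB": {
        "conferences": {
            "CONF2": {"names_full": ["Conference Two"]},
        },
    },
}


def flatten_section(json_dict: dict, section: str) -> dict:
    """Flatten {publisher: {section: {abbr: data}}} into {abbr: data}."""
    per_publisher = {p: json_dict[p].get(section, {}) for p in json_dict}
    flat = {abbr: v[abbr] for v in per_publisher.values() for abbr in v}
    # Keep only the two name lists, defaulting to empty lists when missing.
    return {
        k: {"names_full": v.get("names_full", []), "names_abbr": v.get("names_abbr", [])}
        for k, v in flat.items()
    }


flat_conferences = flatten_section(user_conferences, "conferences")
print(flat_conferences)
```

Publisher names are discarded in the result, so if two publishers define the same abbreviation, the later one overwrites the earlier one.
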

read_str

read_str(full_file)

Read file content as string.

Parameters:

Name       Type  Description                Default
full_file  str   Path to the file to read.  required

Returns:

Type  Description
str   Content of the file as string.

Source code in pybibtexer/main/utils.py
def read_str(full_file: str) -> str:
    """Read file content as string.

    Args:
        full_file: Path to the file to read.

    Returns:
        Content of the file as string.
    """
    with open(full_file, encoding="utf-8", newline="\n") as file:
        content = file.read()
    return content
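
The newline="\n" argument affects how line endings are returned: with it, Python's open() leaves them untranslated, whereas the default (newline=None) converts "\r\n" to "\n" on read. The sketch below shows the difference on a temporary file; the file name and contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bib")
    # Write raw bytes so the file contains one Windows-style line ending.
    with open(path, "wb") as fh:
        fh.write(b"line one\r\nline two\n")

    # Same arguments as read_str: line endings come back untranslated.
    with open(path, encoding="utf-8", newline="\n") as fh:
        untranslated = fh.read()

    # Default universal-newlines mode: "\r\n" is converted to "\n".
    with open(path, encoding="utf-8") as fh:
        translated = fh.read()

print(repr(untranslated))  # 'line one\r\nline two\n'
print(repr(translated))    # 'line one\nline two\n'
```

Content read through read_str may therefore still contain "\r" characters when the file was saved with Windows line endings.
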