main.utils¶

Classes¶

CheckAcronymAbbrAndFullDict ¶

CheckAcronymAbbrAndFullDict(
    names_abbr="names_abbr", names_full="names_full"
)

Checker for acronym, abbreviation and full form dictionaries.

Validates and processes dictionary data containing acronyms with their corresponding abbreviations and full forms.

Attributes:

Name	Type	Description
`names_abbr`	`str`	Key name for abbreviations in the dictionary.
`names_full`	`str`	Key name for full forms in the dictionary.

Initializes the checker with field names.

Parameters:

Name	Type	Description	Default
`names_abbr`	`str`	Key name for abbreviations, defaults to "names_abbr".	`'names_abbr'`
`names_full`	`str`	Key name for full forms, defaults to "names_full".	`'names_full'`

Source code in pybibtexer/main/utils.py

def __init__(self, names_abbr: str = "names_abbr", names_full: str = "names_full") -> None:
    """Initializes the checker with field names.

    Args:
        names_abbr: Key name for abbreviations, defaults to "names_abbr".
        names_full: Key name for full forms, defaults to "names_full".
    """
    self.names_abbr = names_abbr
    self.names_full = names_full

Functions¶

compare_and_return_only_in_new ¶

compare_and_return_only_in_new(json_old, json_new)

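Compares old and new JSON data and returns the entries that were added in the new data. For keys present in both inputs, the full-form names are lowercased, stripped of parentheses, and passed to `_old_match_new` for a match check.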

Parameters:

Name	Type	Description	Default
`json_old`	`dict`	Old JSON data as dictionary.	required
`json_new`	`dict`	New JSON data as dictionary.	required

Returns:

Name	Type	Description
`dict`	`dict`	Dictionary of the entries whose keys exist only in the new data, built in sorted key order.

Source code in pybibtexer/main/utils.py

def compare_and_return_only_in_new(self, json_old: dict, json_new: dict) -> dict:
    """Compares old and new JSON data to find newly added items.

    Args:
        json_old: Old JSON data as dictionary.
        json_new: New JSON data as dictionary.

    Returns:
        dict: Dictionary containing keys that only exist in new data.
    """
    # Find keys that only exist in new JSON
    keys_only_in_new = sorted(set(json_new.keys()) - set(json_old.keys()))
    new_only_data = {key: json_new[key] for key in keys_only_in_new}

    # Find common keys between old and new JSON
    common_keys = set(json_old.keys()) & set(json_new.keys())

    # Check each common key for new items using pattern matching
    for key in sorted(common_keys):
        for flag in [self.names_full]:
            old_items = [item.lower() for item in json_old[key][flag]]
            old_items = [item.replace("(", "").replace(")", "") for item in old_items]

            new_items = [item.lower() for item in json_new[key][flag]]
            new_items = [item.replace("(", "").replace(")", "") for item in new_items]

            self._old_match_new(json_old, key, flag, old_items, new_items)

    # Return keys that only exist in new data
    return new_only_data
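
The key-difference step can be tried in isolation. The following is a minimal sketch, not the library's method: `only_in_new` and `normalize` are hypothetical standalone helpers that mirror the set difference and the name normalization seen in the source above. The `_old_match_new` comparison is not shown in this section, so it is left out.

```python
def normalize(names: list[str]) -> list[str]:
    """Lowercase each name and drop parentheses, as done before matching."""
    return [name.lower().replace("(", "").replace(")", "") for name in names]


def only_in_new(json_old: dict, json_new: dict) -> dict:
    """Return the entries of json_new whose keys are absent from json_old."""
    keys = sorted(set(json_new) - set(json_old))
    return {key: json_new[key] for key in keys}


# Made-up sample data in the {"names_abbr": [...], "names_full": [...]} shape.
old = {
    "ICML": {
        "names_abbr": ["ICML"],
        "names_full": ["International Conference on Machine Learning"],
    }
}
new = {
    "ICML": old["ICML"],
    "CVPR": {
        "names_abbr": ["CVPR"],
        "names_full": ["Conference on Computer Vision and Pattern Recognition (CVPR)"],
    },
}

added = only_in_new(old, new)
print(list(added))  # ['CVPR']
print(normalize(new["CVPR"]["names_full"]))
```

Sorting the key difference keeps the result deterministic, since plain set iteration order is not guaranteed.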

length_dupicate_match ¶

length_dupicate_match(dict_data)

Performs comprehensive validation on dictionary data.

Executes three validation steps: length validation, duplicate checking, and mutual pattern matching.

Parameters:

Name	Type	Description	Default
`dict_data`	`dict[str, dict[str, list[str]]]`	Dictionary containing acronym data with abbreviations and full forms.	required

Returns:

Name	Type	Description
`tuple`	`tuple[dict[str, dict[str, list[str]]], list[str]]`	Validated dictionary and list of acronyms with matches.

Source code in pybibtexer/main/utils.py

def length_dupicate_match(
    self, dict_data: dict[str, dict[str, list[str]]]
) -> tuple[dict[str, dict[str, list[str]]], list[str]]:
    """Performs comprehensive validation on dictionary data.

    Executes three validation steps: length validation, duplicate checking,
    and mutual pattern matching.

    Args:
        dict_data: Dictionary containing acronym data with abbreviations
                  and full forms.

    Returns:
        tuple: Validated dictionary and list of acronyms with matches.
    """
    dict_data = self._validate_length(dict_data)
    dict_data = self._check_duplicate(dict_data)

    # Check for matching patterns in both abbreviations and full forms
    dict_data, abbr_matches = self._mutually_check_match(dict_data, self.names_abbr)
    dict_data, full_matches = self._mutually_check_match(dict_data, self.names_full)
    matches = sorted(set(abbr_matches).union(full_matches))
    return dict_data, matches

StrictOrderedDict ¶

StrictOrderedDict(data)

A dictionary that strictly maintains insertion order.

This implementation guarantees that keys, values, and items will always be returned in the exact order they were inserted, regardless of Python version or internal dictionary implementation changes.

Attributes:

Name	Type	Description
`_keys`		List maintaining the order of key insertion.
`_data`		Dictionary storing the actual key-value pairs.

Initializes the StrictOrderedDict with initial data.

Parameters:

Name	Type	Description	Default
`data`	`dict`	Mapping of initial key-value pairs, inserted in the mapping's iteration order. The argument is required; pass an empty dict to create an empty instance.	required

Example

>>> sod = StrictOrderedDict({})
>>> sod = StrictOrderedDict({'a': 1, 'b': 2})

Source code in pybibtexer/main/utils.py

def __init__(self, data: dict) -> None:
    """Initializes the StrictOrderedDict with initial data.

    Args:
        data: Mapping of initial key-value pairs, inserted in the mapping's
              iteration order. Pass an empty dict to create an empty instance.

    Example:
        >>> sod = StrictOrderedDict({})
        >>> sod = StrictOrderedDict({'a': 1, 'b': 2})
    """
    self._keys = []  # Maintains insertion order of keys
    self._data = {}  # Stores the actual key-value mappings

    if data:
        for k, v in data.items():
            self[k] = v

Functions¶

__contains__ ¶

__contains__(key)

Supports the `key in dict` syntax.

Parameters:

Name	Type	Description	Default
`key`	`str`	The key to check for existence.	required

Returns:

Type	Description
`bool`	True if key exists in the dictionary, False otherwise.

Source code in pybibtexer/main/utils.py

def __contains__(self, key: str) -> bool:
    """Supports the `key in dict` syntax.

    Args:
        key: The key to check for existence.

    Returns:
        True if key exists in the dictionary, False otherwise.
    """
    return key in self._data

__getitem__ ¶

__getitem__(key)

Retrieves the value associated with the given key.

Parameters:

Name	Type	Description	Default
`key`	`str`	The key to look up.	required

Returns:

Type	Description
`Any`	The value associated with the key.

Raises:

Type	Description
`KeyError`	If the key is not found in the dictionary.

Source code in pybibtexer/main/utils.py

def __getitem__(self, key: str) -> Any:
    """Retrieves the value associated with the given key.

    Args:
        key: The key to look up.

    Returns:
        The value associated with the key.

    Raises:
        KeyError: If the key is not found in the dictionary.
    """
    return self._data[key]

__len__ ¶

__len__()

Supports the `len(dict)` syntax.

Returns:

Type	Description
`int`	The number of items in the dictionary.

Source code in pybibtexer/main/utils.py

def __len__(self) -> int:
    """Supports the `len(dict)` syntax.

    Returns:
        The number of items in the dictionary.
    """
    return len(self._data)

__repr__ ¶

__repr__()

Returns a string representation of the dictionary.

Returns:

Type	Description
`str`	A string representation showing all key-value pairs in insertion order, formatted like a standard Python dictionary.

Example

>>> sod = StrictOrderedDict({'x': 10, 'y': 20})
>>> print(sod)
{'x': 10, 'y': 20}

Source code in pybibtexer/main/utils.py

def __repr__(self) -> str:
    """Returns a string representation of the dictionary.

    Returns:
        A string representation showing all key-value pairs in insertion order,
        formatted like a standard Python dictionary.

    Example:
        >>> sod = StrictOrderedDict({'x': 10, 'y': 20})
        >>> print(sod)
        {'x': 10, 'y': 20}
    """
    items = [f"'{k}': {v}" for k, v in self.items()]
    return "{" + ", ".join(items) + "}"
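
One consequence of the f-string in the source above is worth seeing directly: values are rendered with `str()` rather than `repr()`, so string values appear without quotes, while every key is wrapped in single quotes. This is a minimal sketch; `format_items` is a hypothetical standalone function that reproduces only the formatting expression.

```python
def format_items(items: list[tuple[str, object]]) -> str:
    """Format (key, value) pairs with the same expression as the listed __repr__."""
    parts = [f"'{k}': {v}" for k, v in items]
    return "{" + ", ".join(parts) + "}"


print(format_items([("x", 10), ("y", 20)]))  # {'x': 10, 'y': 20}
print(format_items([("name", "ICML")]))      # {'name': ICML}
```

The second line of output differs from what a built-in `dict` would print (`{'name': 'ICML'}`), so the result is only dictionary-like for values whose `str()` and `repr()` agree.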

__setitem__ ¶

__setitem__(key, value)

Sets a key-value pair, maintaining insertion order for new keys.

Parameters:

Name	Type	Description	Default
`key`	`str`	The key to set or update.	required
`value`	`Any`	The value to associate with the key.	required

Note

If the key is new, it is added to the end of the insertion order. If the key exists, its value is updated but its position remains unchanged.

Source code in pybibtexer/main/utils.py

def __setitem__(self, key: str, value: Any) -> None:
    """Sets a key-value pair, maintaining insertion order for new keys.

    Args:
        key: The key to set or update.
        value: The value to associate with the key.

    Note:
        If the key is new, it is added to the end of the insertion order.
        If the key exists, its value is updated but its position remains unchanged.
    """
    if key not in self._data:
        self._keys.append(key)  # Only add new keys to maintain order

    self._data[key] = value
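
The ordering rule in the note can be checked with a small stand-in. This is a sketch under stated assumptions: `OrderedSketch` is a hypothetical class that copies only the `__setitem__` and `items` logic listed on this page, not the full `StrictOrderedDict`.

```python
from typing import Any


class OrderedSketch:
    """Hypothetical stand-in mirroring the listed __setitem__ and items logic."""

    def __init__(self) -> None:
        self._keys: list[str] = []       # insertion order of keys
        self._data: dict[str, Any] = {}  # actual key-value mappings

    def __setitem__(self, key: str, value: Any) -> None:
        if key not in self._data:
            self._keys.append(key)  # a new key goes to the end
        self._data[key] = value     # an existing key keeps its position

    def items(self) -> list[tuple[str, Any]]:
        return [(k, self._data[k]) for k in self._keys]


d = OrderedSketch()
d["a"] = 1
d["b"] = 2
d["a"] = 99  # update: the value changes, the position does not
print(d.items())  # [('a', 99), ('b', 2)]
```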

get ¶

get(key, default=None)

Safely get a value by key, returning default if key doesn't exist.

Parameters:

Name	Type	Description	Default
`key`	`str`	The key to look up.	required
`default`	`Any`	Value to return if key is not found. Defaults to None.	`None`

Returns:

Type	Description
`Any`	The value associated with the key, or default if key doesn't exist.

Source code in pybibtexer/main/utils.py

def get(self, key: str, default: Any = None) -> Any:
    """Safely get a value by key, returning default if key doesn't exist.

    Args:
        key: The key to look up.
        default: Value to return if key is not found. Defaults to None.

    Returns:
        The value associated with the key, or default if key doesn't exist.
    """
    return self._data.get(key, default)

items ¶

items()

Returns all key-value pairs in insertion order.

Returns:

Type	Description
`list[tuple[str, Any]]`	A list of (key, value) tuples in the order they were inserted.

Source code in pybibtexer/main/utils.py

def items(self) -> list[tuple[str, Any]]:
    """Returns all key-value pairs in insertion order.

    Returns:
        A list of (key, value) tuples in the order they were inserted.
    """
    return [(k, self._data[k]) for k in self._keys]

keys ¶

keys()

Returns all keys in insertion order.

Returns:

Type	Description
`list[str]`	A copy of the list containing all keys in the order they were inserted.

Source code in pybibtexer/main/utils.py

def keys(self) -> list[str]:
    """Returns all keys in insertion order.

    Returns:
        A copy of the list containing all keys in the order they were inserted.
    """
    return self._keys.copy()

values ¶

values()

Returns all values in key insertion order.

Returns:

Type	Description
`list[Any]`	A list of values in the same order as their corresponding keys were inserted.

Source code in pybibtexer/main/utils.py

def values(self) -> list[Any]:
    """Returns all values in key insertion order.

    Returns:
        A list of values in the same order as their corresponding keys were inserted.
    """
    return [self._data[k] for k in self._keys]

Functions¶

parse_bibtex_file ¶

parse_bibtex_file(full_biblatex, entry_type='article')

Parse BibTeX file and extract conference or journal data.

Parameters:

Name	Type	Description	Default
`full_biblatex`	`str`	Path to the BibLaTeX file.	required
`entry_type`	`str`	Type of entry to parse - 'article' or 'inproceedings'.	`'article'`

Returns:

Type	Description
`dict[str, dict[str, list[str]]]`	Dictionary containing parsed conference or journal data.

Raises:

Type	Description
`ValueError`	If entry_type is not 'article' or 'inproceedings'.

Source code in pybibtexer/main/utils.py

def parse_bibtex_file(full_biblatex: str, entry_type: str = "article") -> dict[str, dict[str, list[str]]]:
    """Parse BibTeX file and extract conference or journal data.

    Args:
        full_biblatex: Path to the BibLaTeX file.
        entry_type: Type of entry to parse - 'article' or 'inproceedings'.

    Returns:
        Dictionary containing parsed conference or journal data.

    Raises:
        ValueError: If entry_type is not 'article' or 'inproceedings'.
    """
    if entry_type not in ["article", "inproceedings"]:
        raise ValueError("entry_type must be 'article' or 'inproceedings'")

    config = {
        "article": {
            "prefix": "J_",
            "pattern": r"@article\{(.*?),\s*([^@]*)\}",
            "full_field": "journaltitle",
            "abbr_field": "shortjournal",
        },
        "inproceedings": {
            "prefix": "C_",
            "pattern": r"@inproceedings\{(.*?),\s*([^@]*)\}",
            "full_field": "booktitle",
            "abbr_field": "eventtitle",
        },
    }

    cfg = config[entry_type]
    content = read_str(full_biblatex)
    entries = re.findall(cfg["pattern"], content, re.DOTALL)

    result_dict = {}
    for cite_key, entry_content in entries:
        # Process only entries with the specified prefix
        if not cite_key.startswith(cfg["prefix"]):
            continue

        # Extract full and abbreviation fields
        full_match = re.search(cfg['full_field'] + r"\s*=\s*{" + r"(.*)" + "}", entry_content)
        abbr_match = re.search(cfg['abbr_field'] + r"\s*=\s*{" + r"(.*)" + "}", entry_content)

        if not full_match:
            continue

        full = full_match.group(1).strip()
        abbr = abbr_match.group(1).strip() if abbr_match else full

        # Remove case-protection
        full = full.replace("{", "").replace("}", "")
        abbr = abbr.replace("{", "").replace("}", "")

        parts = cite_key.split("_")
        if len(parts) >= 3:
            key = parts[1]

            # Check if key already exists
            if key in result_dict:
                existing_entry = result_dict[key]

                # Only add if full name is not already present
                if full not in existing_entry["names_full"]:
                    existing_entry["names_abbr"].append(abbr)
                    existing_entry["names_full"].append(full)
            else:
                # New key - add to dictionary
                result_dict[key] = {"names_abbr": [abbr], "names_full": [full]}

    return result_dict
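
To see what the regular expressions pick up, the same steps can be run on an inline string instead of a file. This is a simplified sketch for the `'article'` case only: `extract_journals` and `field` are hypothetical helpers, and the sample entries and cite keys are made up to match the `J_<key>_<suffix>` shape the source expects.

```python
import re
from typing import Optional

SAMPLE = """\
@article{J_TPAMI_2020,
  journaltitle = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence},
  shortjournal = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
}
@article{smith2021,
  journaltitle = {Some Other Journal},
}
"""

# Same entry pattern as the "article" configuration in the source.
ENTRY = re.compile(r"@article\{(.*?),\s*([^@]*)\}", re.DOTALL)


def field(name: str, body: str) -> Optional[str]:
    """Return the value of one field with case-protection braces removed."""
    match = re.search(name + r"\s*=\s*\{(.*)\}", body)
    if match is None:
        return None
    return match.group(1).strip().replace("{", "").replace("}", "")


def extract_journals(content: str) -> dict[str, dict[str, list[str]]]:
    result: dict[str, dict[str, list[str]]] = {}
    for cite_key, body in ENTRY.findall(content):
        if not cite_key.startswith("J_"):
            continue  # only keys with the journal prefix are processed
        full = field("journaltitle", body)
        if full is None:
            continue
        abbr = field("shortjournal", body)
        if abbr is None:
            abbr = full  # fall back to the full name
        parts = cite_key.split("_")
        if len(parts) < 3:
            continue
        entry = result.setdefault(parts[1], {"names_abbr": [], "names_full": []})
        if full not in entry["names_full"]:
            entry["names_abbr"].append(abbr)
            entry["names_full"].append(full)
    return result


print(extract_journals(SAMPLE))
```

The second sample entry is skipped because its cite key lacks the `J_` prefix. Note that the entry pattern relies on `[^@]*`, so a field value containing `@` would cut an entry short.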

process_user_conferences_journals_json ¶

process_user_conferences_journals_json(
    full_json_c, full_json_j
)

Process user-defined conferences and journals JSON files.

Notes

The structure of full_json_c follows the format {"publisher": {"conferences": {"abbr": {"names_abbr": [], "names_full": []}}}}, while full_json_j adheres to the format {"publisher": {"journals": {"abbr": {"names_abbr": [], "names_full": []}}}}.

Source code in pybibtexer/main/utils.py

def process_user_conferences_journals_json(full_json_c: str, full_json_j: str) -> tuple[dict, dict]:
    """Process user-defined conferences and journals JSON files.

    Notes:
        The structure of full_json_c follows the format
            {"publisher": {"conferences": {"abbr": {"names_abbr": [], "names_full": []}}}},
        while full_json_j adheres to the format
            {"publisher": {"journals": {"abbr": {"names_abbr": [], "names_full": []}}}}.
    """
    # Process user conferences JSON file
    json_dict = load_json_file(full_json_c)
    full_abbr_inproceedings_dict = {}

    # Try different possible keys for conferences section in JSON structure
    for flag in ["conferences", "Conferences", "CONFERENCES", "conference", "Conference", "CONFERENCE"]:
        full_abbr_inproceedings_dict = {p: json_dict[p].get(flag, {}) for p in json_dict}
        if any(full_abbr_inproceedings_dict.values()):  # stop once some publisher has this section
            break

    # Flatten the nested dictionary structure to {abbr: value} format
    # Convert from {publisher: {abbr: data}} to {abbr: data}
    full_abbr_inproceedings_dict = {abbr: v[abbr] for v in full_abbr_inproceedings_dict.values() for abbr in v}
    # Standardize the structure to ensure consistent format
    # Extract only useful information ("names_full" and "names_abbr")
    full_abbr_inproceedings_dict = {
        k: {"names_full": v.get("names_full", []), "names_abbr": v.get("names_abbr", [])}
        for k, v in full_abbr_inproceedings_dict.items()
    }

    # Process user journals JSON file
    json_dict = load_json_file(full_json_j)
    full_abbr_article_dict = {}

    # Try different possible keys for journals section in JSON structure
    for flag in ["journals", "Journals", "JOURNALS", "journal", "Journal", "JOURNAL"]:
        full_abbr_article_dict = {p: json_dict[p].get(flag, {}) for p in json_dict}
        if any(full_abbr_article_dict.values()):  # stop once some publisher has this section
            break

    # Flatten the nested dictionary structure to {abbr: value} format
    # Convert from {publisher: {abbr: data}} to {abbr: data}
    full_abbr_article_dict = {abbr: v[abbr] for v in full_abbr_article_dict.values() for abbr in v}
    # Standardize the structure to ensure consistent format
    # Extract only useful information ("names_full" and "names_abbr")
    full_abbr_article_dict = {
        k: {"names_full": v.get("names_full", []), "names_abbr": v.get("names_abbr", [])}
        for k, v in full_abbr_article_dict.items()
    }

    # Return both processed dictionaries
    return full_abbr_inproceedings_dict, full_abbr_article_dict
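
The flattening from `{publisher: {section: {abbr: data}}}` to `{abbr: data}` can be sketched on an in-memory dictionary. `flatten_section` is a hypothetical helper, the publisher and conference names are made up, and the sketch assumes the loop should stop at the first section spelling for which at least one publisher has a non-empty section.

```python
def flatten_section(json_dict: dict, flags: list[str]) -> dict:
    """Collect one section from every publisher and flatten it to {abbr: data}."""
    per_publisher: dict = {}
    for flag in flags:
        per_publisher = {p: json_dict[p].get(flag, {}) for p in json_dict}
        if any(per_publisher.values()):  # assumption: stop at the first spelling that matches
            break

    # {publisher: {abbr: data}} -> {abbr: data}
    flat = {abbr: v[abbr] for v in per_publisher.values() for abbr in v}

    # Keep only the two name lists, defaulting to empty lists
    return {
        k: {"names_full": v.get("names_full", []), "names_abbr": v.get("names_abbr", [])}
        for k, v in flat.items()
    }


data = {
    "PublisherA": {
        "Conferences": {
            "CONF1": {"names_abbr": ["Conf. One"], "names_full": ["Conference One"], "url": "ignored"}
        }
    },
    "PublisherB": {"Conferences": {"CONF2": {"names_full": ["Conference Two"]}}},
}

print(flatten_section(data, ["conferences", "Conferences"]))
```

If two publishers define the same abbreviation, the later one overwrites the earlier one during flattening, because the result is keyed by abbreviation alone.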

read_str ¶

read_str(full_file)

Read file content as string.

Parameters:

Name	Type	Description	Default
`full_file`	`str`	Path to the file to read.	required

Returns:

Type	Description
`str`	Content of the file as string.

Source code in pybibtexer/main/utils.py

def read_str(full_file: str) -> str:
    """Read file content as string.

    Args:
        full_file: Path to the file to read.

    Returns:
        Content of the file as string.
    """
    with open(full_file, encoding="utf-8", newline="\n") as file:
        content = file.read()
    return content
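
The `newline="\n"` argument affects what the returned string contains: when reading, Python then treats only `\n` as a line terminator and returns line endings untranslated, so a `\r\n` sequence in the file is preserved. The sketch below compares this with the default mode on a temporary file; the file name is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bib")
    with open(path, "wb") as fh:
        fh.write(b"line one\r\nline two\n")  # mixed line endings

    # Same arguments as read_str: line endings are returned untranslated
    with open(path, encoding="utf-8", newline="\n") as fh:
        kept = fh.read()

    # Default (newline=None): universal newlines, "\r\n" becomes "\n"
    with open(path, encoding="utf-8") as fh:
        translated = fh.read()

print(repr(kept))        # 'line one\r\nline two\n'
print(repr(translated))  # 'line one\nline two\n'
```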
