tools.generate_dict

Classes

GenerateDataDict

GenerateDataDict(
    conferences_or_journals,
    inproceedings_or_article,
    json_dict,
    for_vue=True,
    path_spidered_conferences_or_journals=None,
)

Generate data dictionaries from JSON input for conferences and journals.

This class processes JSON data containing conference or journal information and generates structured dictionaries for markdown table generation, including publisher metadata, keyword-based indexing, and Mermaid diagram data.

Attributes:

cj (str): Type of publication ('conferences' or 'journals').
ia (str): Publication type ('inproceedings' or 'article').
json_dict (dict): Input JSON data containing publication information.
path_spidered_cj (Optional[str]): Path to spidered conference/journal data.
for_vue (bool): Whether to generate Vue.js-compatible format.

Example

>>> generator = GenerateDataDict(
...     conferences_or_journals="conferences",
...     inproceedings_or_article="inproceedings",
...     json_dict=publication_data,
...     for_vue=True
... )
>>> publisher_meta, publisher_abbr, keyword_abbr = generator.generate()
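The example above passes publication_data without showing its shape. The sketch below is one plausible layout, inferred only from the keys that the generate() source on this page reads. The publisher name, URLs, texts, and the `publisher_links` helper are hypothetical, and the per-venue details are left empty because their keys are not shown in the source. The helper mirrors how the publisher-level markdown links are built.

```python
# Hypothetical input; only the key names are taken from the generate() source.
publication_data = {
    "ExamplePress": {
        "names_full": ["Example Press Society"],
        "urls_homepage": ["https://example.org"],
        "urls_conferences": ["https://example.org/conferences"],
        "urls_about": [],
        "txt_abouts": ["A made-up publisher.", "  "],
        "txt_remarks": [],
        "conferences": {"EXCONF": {}},  # per-venue details omitted here
    }
}


def publisher_links(publisher: str, entry: dict) -> tuple[str, str]:
    """Mirror how generate() builds the publisher and full-name links."""
    names_full = entry.get("names_full", [])
    urls_homepage = entry.get("urls_homepage", [])
    # Link the publisher name only when a homepage URL is available
    publisher_url = f"[{publisher}]({urls_homepage[0]})" if urls_homepage else publisher
    if names_full:
        full_url = f"[{names_full[0]}]({urls_homepage[0]})" if urls_homepage else names_full[0]
    else:
        full_url = publisher
    return publisher_url, full_url


entry = publication_data["ExamplePress"]
publisher_url, full_url = publisher_links("ExamplePress", entry)
# Whitespace-only texts are dropped, as in generate()
abouts = [p for p in entry["txt_abouts"] if p.strip()]
print(publisher_url)
print(full_url)
print(abouts)
```

When no homepage URL is present, both links fall back to plain text, so downstream table cells stay valid markdown either way.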

Initialize the GenerateDataDict instance.

Parameters:

conferences_or_journals (str, required): Type of publication ('conferences' or 'journals').
inproceedings_or_article (str, required): Publication type ('inproceedings' or 'article').
json_dict (dict, required): Input JSON data containing publication information.
for_vue (bool, optional): Whether to generate Vue.js-compatible format. Defaults to True.
path_spidered_conferences_or_journals (Optional[str], optional): Path to spidered conference/journal data. Defaults to None.

Source code in pyformatjson/tools/generate_dict.py
def __init__(
    self,
    conferences_or_journals: str,
    inproceedings_or_article: str,
    json_dict: dict,
    for_vue: bool = True,
    path_spidered_conferences_or_journals: str | None = None,
) -> None:
    """Initialize the GenerateDataDict instance.

    Args:
        conferences_or_journals (str): Type of publication ('conferences' or 'journals').
        inproceedings_or_article (str): Publication type ('inproceedings' or 'article').
        json_dict (dict): Input JSON data containing publication information.
        for_vue (bool, optional): Whether to generate Vue.js-compatible format.
            Defaults to True.
        path_spidered_conferences_or_journals (Optional[str], optional): Path to
            spidered conference/journal data. Defaults to None.
    """
    self.cj = conferences_or_journals
    self.ia = inproceedings_or_article
    self.json_dict = json_dict

    self.path_spidered_cj = path_spidered_conferences_or_journals
    self.for_vue = for_vue

Functions

conference_or_journal
conference_or_journal(publisher_url, abbr, abbr_dict)

Process conference or journal data and generate formatted information.

This method processes individual conference or journal data, validates name lengths, extracts information, formats URLs, and generates table row data for markdown output.

Parameters:

publisher_url (str, required): Publisher's URL for markdown linking.
abbr (str, required): Abbreviation identifier for the publication.
abbr_dict (dict, required): Dictionary containing publication details including names, URLs, dates, scores, and keywords.

Returns:

tuple[dict[str, Any], list[str]]: A tuple containing:

- dict: Formatted about text, remarks, and table row data, stored under the keys "txt_abouts", "txt_remarks", and "row_inf".
- list: Sorted list of keywords for the publication.

Raises:

ValueError: If full and abbreviated names have mismatched lengths.

Example

>>> result = generator.conference_or_journal(
...     "https://publisher.com", "ICML", conf_data
... )
>>> abouts, keywords = result

Source code in pyformatjson/tools/generate_dict.py
def conference_or_journal(self, publisher_url: str, abbr: str, abbr_dict: dict) -> tuple[dict[str, Any], list[str]]:
    """Process conference or journal data and generate formatted information.

    This method processes individual conference or journal data, validates
    name lengths, extracts information, formats URLs, and generates table
    row data for markdown output.

    Args:
        publisher_url (str): Publisher's URL for markdown linking.
        abbr (str): Abbreviation identifier for the publication.
        abbr_dict (dict): Dictionary containing publication details including
            names, URLs, dates, scores, and keywords.

    Returns:
        tuple: A tuple containing:
            - dict: Contains formatted about text, remarks, and table row data
            - list: Sorted list of keywords for the publication

    Raises:
        ValueError: If full and abbreviated names have mismatched lengths.

    Example:
        >>> result = generator.conference_or_journal(
        ...     "https://publisher.com", "ICML", conf_data
        ... )
        >>> abouts, keywords = result
    """
    # Validate full and abbreviated names match in length
    self._validate_name_lengths(abbr_dict)

    # Extract basic information
    full_name, abbr_name = self._extract_full_abbr_names(abbr_dict)
    url_home = self._extract_homepage_url(abbr_dict)
    period = self._format_period_with_dblp(abbr_dict)

    # Extract text content
    abouts = self._extract_text_content(abbr_dict, "txt_abouts")
    remarks = self._extract_text_content(abbr_dict, "txt_remarks")
    url_about = self._extract_first_url(abbr_dict, "urls_about")

    # Process keywords with Google search links
    keywords, keywords_url = self._process_keywords(abbr_dict)

    # Format top score with early access link if available
    top = self._format_top_score(abbr_dict)

    # Generate appropriate table row based on type
    row_inf = self._generate_table_row(
        publisher_url, full_name, abbr_name, url_home, url_about, period, top, keywords_url, abbr, abbr_dict
    )

    return {"txt_abouts": abouts, "txt_remarks": remarks, "row_inf": row_inf}, keywords
generate
generate()

Generate publisher metadata and keyword-based publication information.

This method processes the JSON data to create three main dictionaries:

1. Publisher metadata with URLs and descriptions
2. Publisher abbreviation metadata with detailed publication info
3. Keyword-based metadata for easy searching and categorization

Returns:

tuple[dict, dict, dict]: A tuple containing three dictionaries:

- publisher_meta_dict: Publisher metadata including URLs and descriptions
- publisher_abbr_meta_dict: Publication details indexed by publisher and abbreviation
- keyword_abbr_meta_dict: Publication details indexed by keywords

Example

>>> generator = GenerateDataDict(...)
>>> pub_meta, pub_abbr, keyword_abbr = generator.generate()

Source code in pyformatjson/tools/generate_dict.py
def generate(self) -> tuple[dict, dict, dict]:
    """Generate publisher metadata and keyword-based publication information.

    This method processes the JSON data to create three main dictionaries:
    1. Publisher metadata with URLs and descriptions
    2. Publisher abbreviation metadata with detailed publication info
    3. Keyword-based metadata for easy searching and categorization

    Returns:
        tuple: A tuple containing three dictionaries:
            - publisher_meta_dict: Publisher metadata including URLs and descriptions
            - publisher_abbr_meta_dict: Publication details indexed by publisher and abbreviation
            - keyword_abbr_meta_dict: Publication details indexed by keywords

    Example:
        >>> generator = GenerateDataDict(...)
        >>> pub_meta, pub_abbr, keyword_abbr = generator.generate()
    """
    publisher_meta_dict, keyword_abbr_meta_dict, publisher_abbr_meta_dict = {}, {}, {}

    for publisher in self.json_dict:
        # Extract and clean about texts
        abouts = [p for p in self.json_dict[publisher].get("txt_abouts", []) if p.strip()]

        # Extract and clean about URLs
        urls_about = [p.strip() for p in self.json_dict[publisher].get("urls_about", []) if p.strip()]

        # Get full names
        names_full = self.json_dict[publisher].get("names_full", [])

        # Get homepage URLs
        urls_homepage = self.json_dict[publisher].get("urls_homepage", [])

        # Extract and clean conference/journal URLs
        urls_cj = [url.strip() for url in self.json_dict[publisher].get(f"urls_{self.cj}", []) if url.strip()]

        # Create publisher URL with markdown formatting if homepage exists
        publisher_url = f"[{publisher}]({urls_homepage[0]})" if urls_homepage else publisher

        # Create full name URL with markdown formatting if available
        if names_full:
            full_url = f"[{names_full[0]}]({urls_homepage[0]})" if urls_homepage else names_full[0]
        else:
            full_url = publisher

        # Extract and clean remarks
        remarks = [p for p in self.json_dict[publisher].get("txt_remarks", []) if p.strip()]

        # Update publisher metadata
        publisher_meta_dict.setdefault(publisher, {}).update(
            {
                "full_name_url": full_url,
                "txt_abouts": abouts,
                "txt_remarks": remarks,
                "urls_about": urls_about,
                "url_conferences_or_journals": f"[{self.cj.title()}]({urls_cj[0]})" if urls_cj else "",
            }
        )

        # Process each abbreviation (conference/journal)
        for abbr in self.json_dict[publisher][self.cj]:
            abbr_dict = self.json_dict[publisher][self.cj][abbr]

            # Get conference/journal info and keywords
            temp_dict, keywords = self.conference_or_journal(publisher_url, abbr, abbr_dict)

            # Generate mermaid diagram data
            mermaid = self.generate_mermaid_data(publisher, abbr, self.ia)

            publisher_abbr_meta_dict.setdefault(publisher, {}).setdefault(abbr, {}).update(temp_dict)
            publisher_abbr_meta_dict.setdefault(publisher, {}).setdefault(abbr, {}).update({"statistics": mermaid})

            # Index by keywords for quick lookup
            for keyword in keywords:
                keyword_abbr_meta_dict.setdefault(keyword, {}).setdefault(abbr, {}).update(temp_dict)
                keyword_abbr_meta_dict.setdefault(keyword, {}).setdefault(abbr, {}).update({"statistics": mermaid})

    return publisher_meta_dict, publisher_abbr_meta_dict, keyword_abbr_meta_dict
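
The double indexing at the end of generate() can be illustrated in isolation. This is a minimal sketch: the `index_venue` helper and all sample values (including the placeholder `row_inf` string) are hypothetical, and it reproduces only the nested `setdefault(...).update(...)` pattern, not the surrounding processing.

```python
def index_venue(
    by_publisher: dict,
    by_keyword: dict,
    publisher: str,
    abbr: str,
    info: dict,
    keywords: list[str],
    statistics: list[str],
) -> None:
    """Store one venue under its publisher and under each of its keywords."""
    # publisher -> abbr -> details
    entry = by_publisher.setdefault(publisher, {}).setdefault(abbr, {})
    entry.update(info)
    entry.update({"statistics": statistics})
    # keyword -> abbr -> details (the publisher is not part of this key)
    for keyword in keywords:
        kw_entry = by_keyword.setdefault(keyword, {}).setdefault(abbr, {})
        kw_entry.update(info)
        kw_entry.update({"statistics": statistics})


by_publisher, by_keyword = {}, {}
index_venue(
    by_publisher, by_keyword, "ExamplePress", "EXCONF",
    {"row_inf": "|...|"}, ["learning", "vision"], [],
)
print(sorted(by_keyword))
print(by_publisher["ExamplePress"]["EXCONF"])
```

Because the keyword index is keyed by keyword and abbreviation only, two publishers that share both an abbreviation and a keyword would write into the same entry.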
generate_mermaid_data
generate_mermaid_data(
    publisher, abbr, inproceedings_or_article
)

Generate Mermaid diagram data from spidered README files.

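This method reads README.md from the directory path_spidered_cj/publisher/abbr/inproceedings_or_article, sums the paper counts of its table rows per year, and generates a Mermaid xychart-beta configuration that plots the yearly totals. Entries before the first year at or after 2000 are dropped from the chart.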

Parameters:

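publisher (str, required): Publisher name; first directory component below path_spidered_cj.
abbr (str, required): Publication abbreviation; also used as the chart title.
inproceedings_or_article (str, required): Publication type ('inproceedings' or 'article'); last directory component of the README path.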

Returns:

list[str]: Mermaid chart configuration lines, or an empty list if the README file does not exist or contains no matching table rows.
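
As a rough illustration of the parsing step, the sketch below applies the same regular expression as the source to a few table rows and sums the counts per year. Only the first row is the sample from the source comment; the other rows and the `papers_per_year` name are made up for the example.

```python
import re

# Same pattern as in generate_mermaid_data: captures the year and paper-count columns
ROW = re.compile(r"\|.*\|([0-9]+)\|([0-9]+)\|.*\|")


def papers_per_year(lines: list[str]) -> dict[str, int]:
    """Sum the paper counts of all rows that share a year."""
    totals: dict[str, int] = {}
    for line in lines:
        if mch := ROW.search(line):
            totals[mch.group(1)] = totals.get(mch.group(1), 0) + int(mch.group(2))
    return totals


rows = [
    "|AAAI|1980|95|Proceedings of the First National Conference on Artificial Intelligence|\n",
    "|AAAI|2001|120|Hypothetical main track|\n",
    "|AAAI|2001|30|Hypothetical workshop track|\n",
    "|Abbr|Year|Papers|Title|\n",  # header row: no numeric columns, so it is skipped
]
print(papers_per_year(rows))
```

Rows without two adjacent all-digit columns, such as header and delimiter rows, do not match and are ignored.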

Source code in pyformatjson/tools/generate_dict.py
def generate_mermaid_data(self, publisher: str, abbr: str, inproceedings_or_article: str) -> list[str]:
    """Generate Mermaid diagram data from spidered README files.

    This method reads spidered data from README files and generates
    Mermaid chart configuration for visualizing publication statistics.

    Args:
        publisher (str): Publisher name.
        abbr (str): Publication abbreviation.
        inproceedings_or_article (str): Publication type.

    Returns:
        list[str]: Mermaid chart configuration lines, or empty list if no data found.
    """
    path_spidered_cj = self.path_spidered_cj if self.path_spidered_cj else ""
    path_readme = os.path.join(path_spidered_cj, publisher, abbr, inproceedings_or_article)
    full_readme = os.path.expanduser(os.path.join(path_readme, "README.md"))
    if not os.path.exists(full_readme):
        return []

    mermaid, data_dict = [], {}
    # |AAAI|1980|95|Proceedings of the First National Conference on Artificial Intelligence|
    regex = re.compile(r"\|.*\|([0-9]+)\|([0-9]+)\|.*\|")
    with open(full_readme, encoding="utf-8", newline="\n") as file:
        data_list = file.readlines()
    for line in data_list:
        if mch := regex.search(line):
            data_dict.setdefault(mch.group(1), []).append(mch.group(2))
    data_dict = {year: sum([int(n) for n in data_dict[year]]) for year in data_dict}

    # Mermaid
    if len(data_dict) != 0:
        mermaid = ["```mermaid\n"]
        mermaid.extend(
            [
                "---\n",
                "config:\n",
                "    xyChart:\n",
                "        width: 1200\n",
                "        height: 600\n",
                "    themeVariables:\n",
                "        xyChart:\n",
                '            titleColor: "#ff0000"\n',
                "---\n",
            ]
        )
        mermaid.extend(["xychart-beta\n", f'    title "{abbr}"\n'])

        x_axis, bar, line = [], [], []
        # Sort years numerically so the 2000 cutoff below holds for any README row order
        for year in sorted(data_dict, key=int):
            x_axis.append(int(year))
            bar.append(data_dict[year])
            line.append(data_dict[year])

        idx = next((i for i, year in enumerate(x_axis) if year >= 2000), len(x_axis))
        x_axis, bar, line = x_axis[idx:], bar[idx:], line[idx:]

        mermaid.append(f"    x-axis {x_axis}\n")
        mermaid.append('    y-axis "Number of Papers"\n')
        mermaid.append(f"    bar {bar}\n")
        mermaid.append(f"    line {line}\n")
        mermaid.append("```\n")

    return mermaid
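
The chart-building step can be sketched separately from file handling. This is an assumption-laden simplification, not the method above: the `chart_lines` helper and its `first_year` parameter are hypothetical, the config front matter and the surrounding mermaid fence are omitted, and years are sorted and filtered instead of sliced at the first year at or after 2000.

```python
def chart_lines(title: str, totals: dict[int, int], first_year: int = 2000) -> list[str]:
    """Build xychart-beta lines for the years at or after first_year."""
    years = [y for y in sorted(totals) if y >= first_year]
    counts = [totals[y] for y in years]
    return [
        "xychart-beta\n",
        f'    title "{title}"\n',
        f"    x-axis {years}\n",  # a Python list prints as [2001, 2002]
        '    y-axis "Number of Papers"\n',
        f"    bar {counts}\n",
        f"    line {counts}\n",
    ]


# Made-up yearly totals; 1980 falls before the cutoff and is left out
print("".join(chart_lines("AAAI", {1980: 95, 2001: 150, 2002: 170})), end="")
```

The same counts feed both the bar and the line series, matching the source, so the line simply traces the tops of the bars.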

Functions

conference_journal_header

conference_journal_header()

Generate markdown table headers for conferences and journals.

This function creates the appropriate markdown table headers for displaying conference and journal information in tabular format.

Returns:

tuple[list[str], list[str]]: A tuple containing two lists:

- conference_header: Markdown table headers for conferences
- journal_header: Markdown table headers for journals

Example

>>> conf_header, journal_header = conference_journal_header()
>>> print(conf_header[0])
|Publishers|Full/Homepage|Abbr/About|Acronym/Archive|Period/DBLP|...

Source code in pyformatjson/tools/generate_dict.py
def conference_journal_header() -> tuple[list[str], list[str]]:
    """Generate markdown table headers for conferences and journals.

    This function creates the appropriate markdown table headers for displaying
    conference and journal information in tabular format.

    Returns:
        tuple: A tuple containing two lists:
            - conference_header: Markdown table headers for conferences
            - journal_header: Markdown table headers for journals

    Example:
        >>> conf_header, journal_header = conference_journal_header()
        >>> print(conf_header[0])
        |Publishers|Full/Homepage|Abbr/About|Acronym/Archive|Period/DBLP|...
    """
    o = "|Publishers|Full/Homepage|Abbr/About|"
    t = "|-         |-            |-         |"
    conference_header = [
        f"{o}Acronym/Archive|Period/DBLP|Top|CCF|Submission|Days Left|Main Conf.|Days Left|Location|Keywords/Google|\n",
        f"{t}-              |-          |-  |-  |-         |-        |-         |-        |-       |-              |\n",
    ]
    journal_header = [
        f"{o}Acronym/Issues|Period/DBLP|Top/Early|CCF|CAS|JCR|IF|Keywords/Google|\n",
        f"{t}-             |-          |-        |-  |-  |-  |- |-              |\n",
    ]
    return conference_header, journal_header
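
A quick way to sanity-check headers like these is to confirm that the header row, the delimiter row, and each data row have the same number of cells. The sketch below copies the journal header from the source above; the `cell_count` helper and the placeholder data row are hypothetical.

```python
def cell_count(row: str) -> int:
    """Count the cells of a pipe-delimited markdown table row."""
    # Drop the newline and the outer pipes, then split on the inner pipes
    return len(row.strip()[1:-1].split("|"))


# Journal header as listed in the source above
o = "|Publishers|Full/Homepage|Abbr/About|"
t = "|-         |-            |-         |"
journal_header = [
    f"{o}Acronym/Issues|Period/DBLP|Top/Early|CCF|CAS|JCR|IF|Keywords/Google|\n",
    f"{t}-             |-          |-        |-  |-  |-  |- |-              |\n",
]

# Hypothetical data row with one placeholder value per column
row = "|" + "|".join(["x"] * cell_count(journal_header[0])) + "|\n"
table = "".join(journal_header) + row
print(cell_count(journal_header[0]), cell_count(journal_header[1]), cell_count(row))
```

A mismatch between these counts usually means a column was added to the header without updating the delimiter row or the row generator.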