tools.generate_dict

Classes

GenerateDataDict

GenerateDataDict(
    conferences_or_journals,
    inproceedings_or_article,
    json_dict,
    for_vue=True,
    path_spidered_conferences_or_journals=None,
)

Generate data dictionaries from JSON input for conferences and journals.

This class takes parsed JSON describing publishers and their conferences or journals, and builds the dictionaries used to render markdown tables: publisher metadata, per-venue details indexed by publisher and abbreviation, a keyword index, and Mermaid chart data for publication statistics.

Attributes:

Name	Type	Description
`cj`	`str`	Venue category: 'conferences' or 'journals'.
`ia`	`str`	Entry type: 'inproceedings' or 'article'.
`json_dict`	`dict`	Parsed JSON data, keyed by publisher, containing publication information.
`path_spidered_cj`	`Optional[str]`	Root directory of the spidered conference/journal data, used to locate the README files behind the publication statistics.
`for_vue`	`bool`	Whether to generate Vue.js-compatible format.

Example

>>> generator = GenerateDataDict(
...     conferences_or_journals="conferences",
...     inproceedings_or_article="inproceedings",
...     json_dict=publication_data,
...     for_vue=True,
... )
>>> publisher_meta, publisher_abbr, keyword_abbr = generator.generate()

Initialize the GenerateDataDict instance.

Parameters:

Name	Type	Description	Default
`conferences_or_journals`	`str`	Venue category: 'conferences' or 'journals'.	required
`inproceedings_or_article`	`str`	Entry type: 'inproceedings' or 'article'.	required
`json_dict`	`dict`	Parsed JSON data, keyed by publisher, containing publication information.	required
`for_vue`	`bool`	Whether to generate Vue.js-compatible format.	`True`
`path_spidered_conferences_or_journals`	`Optional[str]`	Root directory of the spidered conference/journal data, used to locate the README files behind the publication statistics.	`None`
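Judging from the keys that `generate()` reads, `json_dict` appears to be nested as publisher → publisher-level metadata plus a sub-dictionary named after `conferences_or_journals`, which maps each venue abbreviation to its own dictionary. The sketch below builds a minimal input of that shape and checks it with a hypothetical helper, `check_json_dict`, which is not part of `pyformatjson`. The publisher name and URLs are placeholders, and the per-venue keys consumed by the private helpers are not documented here, so the inner venue dictionary is left empty.

```python
from typing import Any

# Hypothetical minimal input. Publisher-level keys mirror the ones generate() reads;
# "ExamplePub" and the URLs are placeholders, not real data.
sample: dict[str, Any] = {
    "ExamplePub": {
        "names_full": ["Example Publisher"],
        "urls_homepage": ["https://example.org"],
        "urls_conferences": ["https://example.org/conferences"],
        "txt_abouts": ["About text."],
        "txt_remarks": [],
        "urls_about": [],
        # One entry per venue abbreviation. The per-venue keys used by the
        # private helpers are not documented here, so this one is left empty.
        "conferences": {"EXC": {}},
    }
}


def check_json_dict(json_dict: dict[str, Any], cj: str) -> list[str]:
    """Hypothetical helper: list structural problems that would break generate()."""
    problems: list[str] = []
    for publisher, entry in json_dict.items():
        if not isinstance(entry, dict):
            problems.append(f"{publisher}: entry is not a dict")
        elif cj not in entry:
            # generate() reads json_dict[publisher][cj] without .get(), so this raises KeyError.
            problems.append(f"{publisher}: missing '{cj}' key")
        elif not isinstance(entry[cj], dict):
            problems.append(f"{publisher}: '{cj}' is not a dict of abbreviations")
    return problems


print(check_json_dict(sample, "conferences"))  # []
print(check_json_dict(sample, "journals"))  # ["ExamplePub: missing 'journals' key"]
```

Because `generate()` uses `.get()` for the publisher-level lists but indexes the venue sub-dictionary directly, only the missing-venue-key case is fatal; the other keys may simply be absent.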

Source code in pyformatjson/tools/generate_dict.py

def __init__(
    self,
    conferences_or_journals: str,
    inproceedings_or_article: str,
    json_dict: dict,
    for_vue: bool = True,
    path_spidered_conferences_or_journals: str | None = None,
) -> None:
    """Initialize the GenerateDataDict instance.

    Args:
        conferences_or_journals (str): Type of publication ('conferences' or 'journals').
        inproceedings_or_article (str): Publication type ('inproceedings' or 'article').
        json_dict (dict): Input JSON data containing publication information.
        for_vue (bool, optional): Whether to generate Vue.js-compatible format.
            Defaults to True.
        path_spidered_conferences_or_journals (Optional[str], optional): Path to
            spidered conference/journal data. Defaults to None.
    """
    self.cj = conferences_or_journals
    self.ia = inproceedings_or_article
    self.json_dict = json_dict

    self.path_spidered_cj = path_spidered_conferences_or_journals
    self.for_vue = for_vue

Methods

conference_or_journal

conference_or_journal(publisher_url, abbr, abbr_dict)

Process conference or journal data and generate formatted information.

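For a single conference or journal, this method first checks that the full and abbreviated name lists have matching lengths. It then extracts the names, homepage URL, publication period (formatted with DBLP), about text, remarks, and first about URL; turns the keywords into Google search links; formats the top score (with an early-access link when available); and builds the markdown table row appropriate to the venue type.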

Parameters:

Name	Type	Description	Default
`publisher_url`	`str`	Publisher's URL for markdown linking.	required
`abbr`	`str`	Abbreviation identifier for the publication.	required
`abbr_dict`	`dict`	Dictionary containing publication details including names, URLs, dates, scores, and keywords.	required

Returns:

Name	Type	Description
`tuple`	`tuple[dict[str, Any], list[str]]`	A tuple `(info, keywords)`: `info` is a dict with the keys `txt_abouts`, `txt_remarks`, and `row_inf` (the formatted about text, remarks, and table row data); `keywords` is the sorted list of keywords for the publication.

Raises:

Type	Description
`ValueError`	If full and abbreviated names have mismatched lengths.
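The private `_validate_name_lengths` helper is not shown on this page. From the docstring, the check appears to require that the full-name and abbreviated-name lists have the same length, presumably so the i-th full name pairs with the i-th abbreviation. A minimal sketch under that assumption, with two parallel lists passed in directly; `validate_name_lengths` and `pair_names` are hypothetical, and the real key names inside `abbr_dict` are not documented here.

```python
def validate_name_lengths(names_full: list[str], names_abbr: list[str]) -> None:
    """Hypothetical stand-in for the private check: the lists must pair up one-to-one."""
    if len(names_full) != len(names_abbr):
        raise ValueError(
            f"{len(names_full)} full name(s) but {len(names_abbr)} abbreviated name(s)"
        )


def pair_names(names_full: list[str], names_abbr: list[str]) -> list[tuple[str, str]]:
    """Pair each full name with its abbreviation once the lengths are known to match."""
    validate_name_lengths(names_full, names_abbr)
    return list(zip(names_full, names_abbr))


print(pair_names(["International Conference on Machine Learning"], ["ICML"]))
# [('International Conference on Machine Learning', 'ICML')]
```

Validating up front means a malformed entry fails loudly instead of `zip` silently dropping the unmatched names.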

Example

>>> result = generator.conference_or_journal(
...     "https://publisher.com", "ICML", conf_data
... )
>>> abouts, keywords = result

Source code in pyformatjson/tools/generate_dict.py

def conference_or_journal(self, publisher_url: str, abbr: str, abbr_dict: dict) -> tuple[dict[str, Any], list[str]]:
    """Process conference or journal data and generate formatted information.

    This method processes individual conference or journal data, validates
    name lengths, extracts information, formats URLs, and generates table
    row data for markdown output.

    Args:
        publisher_url (str): Publisher's URL for markdown linking.
        abbr (str): Abbreviation identifier for the publication.
        abbr_dict (dict): Dictionary containing publication details including
            names, URLs, dates, scores, and keywords.

    Returns:
        tuple: A tuple containing:
            - dict: Contains formatted about text, remarks, and table row data
            - list: Sorted list of keywords for the publication

    Raises:
        ValueError: If full and abbreviated names have mismatched lengths.

    Example:
        >>> result = generator.conference_or_journal(
        ...     "https://publisher.com", "ICML", conf_data
        ... )
        >>> abouts, keywords = result
    """
    # Validate full and abbreviated names match in length
    self._validate_name_lengths(abbr_dict)

    # Extract basic information
    full_name, abbr_name = self._extract_full_abbr_names(abbr_dict)
    url_home = self._extract_homepage_url(abbr_dict)
    period = self._format_period_with_dblp(abbr_dict)

    # Extract text content
    abouts = self._extract_text_content(abbr_dict, "txt_abouts")
    remarks = self._extract_text_content(abbr_dict, "txt_remarks")
    url_about = self._extract_first_url(abbr_dict, "urls_about")

    # Process keywords with Google search links
    keywords, keywords_url = self._process_keywords(abbr_dict)

    # Format top score with early access link if available
    top = self._format_top_score(abbr_dict)

    # Generate appropriate table row based on type
    row_inf = self._generate_table_row(
        publisher_url, full_name, abbr_name, url_home, url_about, period, top, keywords_url, abbr, abbr_dict
    )

    return {"txt_abouts": abouts, "txt_remarks": remarks, "row_inf": row_inf}, keywords
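The keyword step above is delegated to `_process_keywords`, which is not shown. A minimal sketch of what it might do, assuming raw keyword strings as input, Google search URLs of the form `https://www.google.com/search?q=...`, and `"; "` as the separator inside the table cell; `process_keywords` is a hypothetical name, and the real helper's deduplication and separator may differ.

```python
from urllib.parse import quote_plus

SEARCH = "https://www.google.com/search?q="  # assumed link target


def process_keywords(raw_keywords: list[str]) -> tuple[list[str], str]:
    """Hypothetical sketch: sorted keywords plus a table cell of Google search links."""
    # Drop blanks and duplicates, then sort (the docstring promises a sorted list).
    keywords = sorted({k.strip() for k in raw_keywords if k.strip()})
    # One markdown link per keyword; quote_plus encodes spaces as '+'.
    links = [f"[{k}]({SEARCH}{quote_plus(k)})" for k in keywords]
    return keywords, "; ".join(links)


kws, cell = process_keywords(["Machine Learning", "AI", " ", "AI"])
print(kws)  # ['AI', 'Machine Learning']
print(cell)  # [AI](https://www.google.com/search?q=AI); [Machine Learning](https://www.google.com/search?q=Machine+Learning)
```

A separator other than `|` is needed because a pipe inside the cell would split the markdown table row.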

generate

generate()

Generate publisher metadata and keyword-based publication information.

This method walks the JSON data and builds three dictionaries:

1. Publisher metadata with URLs and descriptions.
2. Publication details indexed by publisher and abbreviation.
3. Publication details indexed by keyword, for searching and categorization.

Returns:

Name	Type	Description
`tuple`	`tuple[dict, dict, dict]`	A tuple `(publisher_meta_dict, publisher_abbr_meta_dict, keyword_abbr_meta_dict)`: publisher metadata including URLs and descriptions; publication details indexed by publisher, then abbreviation; and the same details indexed by keyword, then abbreviation. Each publication entry holds `txt_abouts`, `txt_remarks`, `row_inf`, and `statistics` (Mermaid chart lines).

Example

>>> generator = GenerateDataDict(...)
>>> pub_meta, pub_abbr, keyword_abbr = generator.generate()
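The keyword index (the third dictionary) is an inversion of the per-venue results: every venue is filed under each of its keywords. The sketch below isolates that step with the same nested `setdefault(...).update(...)` pattern the source uses; `index_by_keyword` is a hypothetical name, and the venues, keywords, and `row_inf` values are placeholders.

```python
from typing import Any

Info = dict[str, Any]


def index_by_keyword(venues: dict[str, tuple[Info, list[str]]]) -> dict[str, dict[str, Info]]:
    """Invert abbr -> (info, keywords) into keyword -> abbr -> info."""
    by_keyword: dict[str, dict[str, Info]] = {}
    for abbr, (info, keywords) in venues.items():
        for keyword in keywords:
            # Same nested setdefault pattern as generate(): each keyword gets its own
            # shallow copy of the venue's info dict.
            by_keyword.setdefault(keyword, {}).setdefault(abbr, {}).update(info)
    return by_keyword


# Placeholder venues and keywords, for illustration only.
venues = {
    "ICML": ({"row_inf": "ICML row"}, ["AI", "Machine Learning"]),
    "KDD": ({"row_inf": "KDD row"}, ["Data Mining", "Machine Learning"]),
}
index = index_by_keyword(venues)
print(sorted(index["Machine Learning"]))  # ['ICML', 'KDD']
print(list(index["AI"]))  # ['ICML']
```

Because `update` copies keys into a fresh dict per keyword, the entries are distinct dicts, but any list values (such as `txt_abouts`) are still shared between them.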

Source code in pyformatjson/tools/generate_dict.py

def generate(self) -> tuple[dict, dict, dict]:
    """Generate publisher metadata and keyword-based publication information.

    This method processes the JSON data to create three main dictionaries:
    1. Publisher metadata with URLs and descriptions
    2. Publisher abbreviation metadata with detailed publication info
    3. Keyword-based metadata for easy searching and categorization

    Returns:
        tuple: A tuple containing three dictionaries:
            - publisher_meta_dict: Publisher metadata including URLs and descriptions
            - publisher_abbr_meta_dict: Publication details indexed by publisher and abbreviation
            - keyword_abbr_meta_dict: Publication details indexed by keywords

    Example:
        >>> generator = GenerateDataDict(...)
        >>> pub_meta, pub_abbr, keyword_abbr = generator.generate()
    """
    publisher_meta_dict, keyword_abbr_meta_dict, publisher_abbr_meta_dict = {}, {}, {}

    for publisher in self.json_dict:
        # Extract and clean about texts
        abouts = [p for p in self.json_dict[publisher].get("txt_abouts", []) if p.strip()]

        # Extract and clean about URLs
        urls_about = [p.strip() for p in self.json_dict[publisher].get("urls_about", []) if p.strip()]

        # Get full names
        names_full = self.json_dict[publisher].get("names_full", [])

        # Get homepage URLs
        urls_homepage = self.json_dict[publisher].get("urls_homepage", [])

        # Extract and clean conference/journal URLs
        urls_cj = [url.strip() for url in self.json_dict[publisher].get(f"urls_{self.cj}", []) if url.strip()]

        # Create publisher URL with markdown formatting if homepage exists
        publisher_url = f"[{publisher}]({urls_homepage[0]})" if urls_homepage else publisher

        # Create full name URL with markdown formatting if available
        if names_full:
            full_url = f"[{names_full[0]}]({urls_homepage[0]})" if urls_homepage else names_full[0]
        else:
            full_url = publisher

        # Extract and clean remarks
        remarks = [p for p in self.json_dict[publisher].get("txt_remarks", []) if p.strip()]

        # Update publisher metadata
        publisher_meta_dict.setdefault(publisher, {}).update(
            {
                "full_name_url": full_url,
                "txt_abouts": abouts,
                "txt_remarks": remarks,
                "urls_about": urls_about,
                "url_conferences_or_journals": f"[{self.cj.title()}]({urls_cj[0]})" if urls_cj else "",
            }
        )

        # Process each abbreviation (conference/journal)
        for abbr in self.json_dict[publisher][self.cj]:
            abbr_dict = self.json_dict[publisher][self.cj][abbr]

            # Get conference/journal info and keywords
            temp_dict, keywords = self.conference_or_journal(publisher_url, abbr, abbr_dict)

            # Generate mermaid diagram data
            mermaid = self.generate_mermaid_data(publisher, abbr, self.ia)

            publisher_abbr_meta_dict.setdefault(publisher, {}).setdefault(abbr, {}).update(temp_dict)
            publisher_abbr_meta_dict.setdefault(publisher, {}).setdefault(abbr, {}).update({"statistics": mermaid})

            # Index by keywords for quick lookup
            for keyword in keywords:
                keyword_abbr_meta_dict.setdefault(keyword, {}).setdefault(abbr, {}).update(temp_dict)
                keyword_abbr_meta_dict.setdefault(keyword, {}).setdefault(abbr, {}).update({"statistics": mermaid})

    return publisher_meta_dict, publisher_abbr_meta_dict, keyword_abbr_meta_dict
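The per-publisher block at the top of the loop can be read as a pure function of one publisher entry. The following restatement (hypothetical names `md_link` and `publisher_meta`, placeholder data) makes the fallbacks explicit: a missing homepage yields plain text instead of a link, and a missing full name falls back to the bare publisher key.

```python
from typing import Any


def md_link(text: str, urls: list[str]) -> str:
    """Markdown link to the first URL, or the bare text when there is none."""
    return f"[{text}]({urls[0]})" if urls else text


def publisher_meta(publisher: str, entry: dict[str, Any], cj: str) -> dict[str, Any]:
    """Restate the per-publisher block of generate() as a standalone function."""
    homepages = entry.get("urls_homepage", [])
    names_full = entry.get("names_full", [])
    urls_cj = [u.strip() for u in entry.get(f"urls_{cj}", []) if u.strip()]
    return {
        # Without a full name, generate() falls back to the bare publisher key (no link).
        "full_name_url": md_link(names_full[0], homepages) if names_full else publisher,
        # Blank strings are dropped; about/remark text is not stripped, URLs are.
        "txt_abouts": [p for p in entry.get("txt_abouts", []) if p.strip()],
        "txt_remarks": [p for p in entry.get("txt_remarks", []) if p.strip()],
        "urls_about": [p.strip() for p in entry.get("urls_about", []) if p.strip()],
        "url_conferences_or_journals": f"[{cj.title()}]({urls_cj[0]})" if urls_cj else "",
    }


meta = publisher_meta(
    "ExamplePub",  # placeholder publisher
    {
        "names_full": ["Example Publisher"],
        "urls_homepage": ["https://example.org"],
        "urls_conferences": [" https://example.org/conferences "],
        "txt_abouts": ["", "About text."],
    },
    "conferences",
)
print(meta["full_name_url"])  # [Example Publisher](https://example.org)
print(meta["url_conferences_or_journals"])  # [Conferences](https://example.org/conferences)
```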

generate_mermaid_data

generate_mermaid_data(
    publisher, abbr, inproceedings_or_article
)

Generate Mermaid diagram data from spidered README files.

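This method looks for `README.md` under `<path_spidered_conferences_or_journals>/<publisher>/<abbr>/<inproceedings_or_article>/`, sums the paper counts per year found in its markdown table rows, and returns a Mermaid `xychart-beta` block (a bar and a line series of papers per year), starting from the first year that is 2000 or later. If no spider path was given, the README path is resolved relative to the current working directory.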

Parameters:

Name	Type	Description	Default
`publisher`	`str`	Publisher name.	required
`abbr`	`str`	Publication abbreviation.	required
`inproceedings_or_article`	`str`	Entry type ('inproceedings' or 'article'); used as the last directory component of the README path.	required

Returns:

Type	Description
`list[str]`	Lines of a fenced Mermaid `xychart-beta` block, each ending in a newline, or an empty list if the README file is missing or contains no matching table rows.
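To see which README lines contribute to the statistics, here is the parsing and per-year summation step pulled out into a function that takes the README text directly instead of a path. It reuses the method's regular expression; `papers_per_year` is a hypothetical name, the AAAI row comes from the comment in the source, and the `EXC` rows are made up for illustration.

```python
import re

# Same pattern as generate_mermaid_data(): a table row whose cells include a year
# followed by a paper count, e.g. |AAAI|1980|95|Proceedings of ...|
ROW = re.compile(r"\|.*\|([0-9]+)\|([0-9]+)\|.*\|")


def papers_per_year(readme_text: str) -> dict[str, int]:
    """Sum paper counts per year; years stay strings, in first-seen order."""
    counts: dict[str, int] = {}
    for line in readme_text.splitlines():
        if match := ROW.search(line):
            year, n = match.group(1), int(match.group(2))
            counts[year] = counts.get(year, 0) + n
    return counts


readme = (
    "|Acronym|Year|Papers|Title|\n"
    "|-|-|-|-|\n"
    "|AAAI|1980|95|Proceedings of the First National Conference on Artificial Intelligence|\n"
    "|EXC|2021|10|Main track|\n"  # hypothetical rows
    "|EXC|2021|5|Workshop track|\n"
)
print(papers_per_year(readme))  # {'1980': 95, '2021': 15}
```

Header and delimiter rows contain no pair of adjacent numeric cells, so the pattern skips them; several rows for the same year are summed.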

Source code in pyformatjson/tools/generate_dict.py

def generate_mermaid_data(self, publisher: str, abbr: str, inproceedings_or_article: str) -> list[str]:
    """Generate Mermaid diagram data from spidered README files.

    This method reads spidered data from README files and generates
    Mermaid chart configuration for visualizing publication statistics.

    Args:
        publisher (str): Publisher name.
        abbr (str): Publication abbreviation.
        inproceedings_or_article (str): Publication type.

    Returns:
        list[str]: Mermaid chart configuration lines, or empty list if no data found.
    """
    path_spidered_cj = self.path_spidered_cj if self.path_spidered_cj else ""
    path_readme = os.path.join(path_spidered_cj, publisher, abbr, inproceedings_or_article)
    full_readme = os.path.expanduser(os.path.join(path_readme, "README.md"))
    if not os.path.exists(full_readme):
        return []

    mermaid, data_dict = [], {}
    # |AAAI|1980|95|Proceedings of the First National Conference on Artificial Intelligence|
    regex = re.compile(r"\|.*\|([0-9]+)\|([0-9]+)\|.*\|")
    with open(full_readme, encoding="utf-8", newline="\n") as file:
        data_list = file.readlines()
    for line in data_list:
        if mch := regex.search(line):
            data_dict.setdefault(mch.group(1), []).append(mch.group(2))
    data_dict = {year: sum([int(n) for n in data_dict[year]]) for year in data_dict}

    # Mermaid
    if len(data_dict) != 0:
        mermaid = ["```mermaid\n"]
        mermaid.extend(
            [
                "---\n",
                "config:\n",
                "    xyChart:\n",
                "        width: 1200\n",
                "        height: 600\n",
                "    themeVariables:\n",
                "        xyChart:\n",
                '            titleColor: "#ff0000"\n',
                "---\n",
            ]
        )
        mermaid.extend(["xychart-beta\n", f'    title "{abbr}"\n'])

        x_axis, bar, line = [], [], []
        for year in data_dict:
            x_axis.append(int(year))
            bar.append(data_dict[year])
            line.append(data_dict[year])

        idx = next((i for i, year in enumerate(x_axis) if year >= 2000), len(x_axis))
        x_axis, bar, line = x_axis[idx:], bar[idx:], line[idx:]

        mermaid.append(f"    x-axis {x_axis}\n")
        mermaid.append('    y-axis "Number of Papers"\n')
        mermaid.append(f"    bar {bar}\n")
        mermaid.append(f"    line {line}\n")
        mermaid.append("```\n")

    return mermaid
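The chart-building half of the method reduces to the following sketch, which keeps the original's cut at the first year of 2000 or later but omits the front-matter `config` block (width, height, title colour) for brevity. `xychart_lines` and its `min_year` parameter are hypothetical.

```python
FENCE = "`" * 3  # three backticks, i.e. a markdown code fence


def xychart_lines(title: str, counts: dict[str, int], min_year: int = 2000) -> list[str]:
    """Build a fenced Mermaid xychart-beta block from per-year paper counts."""
    if not counts:
        return []
    years = [int(year) for year in counts]
    values = list(counts.values())
    # As in the original, drop everything before the first year >= min_year;
    # this only means "years from min_year on" if the input is in ascending order.
    idx = next((i for i, year in enumerate(years) if year >= min_year), len(years))
    years, values = years[idx:], values[idx:]
    return [
        f"{FENCE}mermaid\n",
        "xychart-beta\n",
        f'    title "{title}"\n',
        f"    x-axis {years}\n",
        '    y-axis "Number of Papers"\n',
        f"    bar {values}\n",
        f"    line {values}\n",
        f"{FENCE}\n",
    ]


print("".join(xychart_lines("EXC", {"1998": 4, "2021": 15, "2022": 20})), end="")
```

For the counts above, the 1998 entry is dropped and the chart gets `x-axis [2021, 2022]` with `bar [15, 20]`. As in the original, if every year predates 2000 the block is still emitted, with empty axes; returning an empty list in that case would be a reasonable alternative.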

Functions

conference_journal_header

conference_journal_header()

Generate markdown table headers for conferences and journals.

This function creates the appropriate markdown table headers for displaying conference and journal information in tabular format.

Returns:

Name	Type	Description
`tuple`	`tuple[list[str], list[str]]`	A tuple `(conference_header, journal_header)`; each is a two-element list holding the markdown header row and its delimiter row, for conferences and journals respectively.

Example

>>> conf_header, journal_header = conference_journal_header()
>>> print(conf_header[0])
|Publishers|Full/Homepage|Abbr/About|Acronym/Archive|Period/DBLP|...

Source code in pyformatjson/tools/generate_dict.py

def conference_journal_header() -> tuple[list[str], list[str]]:
    """Generate markdown table headers for conferences and journals.

    This function creates the appropriate markdown table headers for displaying
    conference and journal information in tabular format.

    Returns:
        tuple: A tuple containing two lists:
            - conference_header: Markdown table headers for conferences
            - journal_header: Markdown table headers for journals

    Example:
        >>> conf_header, journal_header = conference_journal_header()
        >>> print(conf_header[0])
        |Publishers|Full/Homepage|Abbr/About|Acronym/Archive|Period/DBLP|...
    """
    o = "|Publishers|Full/Homepage|Abbr/About|"
    t = "|-         |-            |-         |"
    conference_header = [
        f"{o}Acronym/Archive|Period/DBLP|Top|CCF|Submission|Days Left|Main Conf.|Days Left|Location|Keywords/Google|\n",
        f"{t}-              |-          |-  |-  |-         |-        |-         |-        |-       |-              |\n",
    ]
    journal_header = [
        f"{o}Acronym/Issues|Period/DBLP|Top/Early|CCF|CAS|JCR|IF|Keywords/Google|\n",
        f"{t}-             |-          |-        |-  |-  |-  |- |-              |\n",
    ]
    return conference_header, journal_header
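Because these header strings are aligned by hand, it is easy to leave a delimiter cell without a hyphen, and GitHub Flavored Markdown then does not recognize the table at all. A hypothetical helper like `delimiter_row_ok` (not part of `pyformatjson`) can check each delimiter row; the sketch applies it to the journal delimiter row copied from the function above.

```python
import re

# GFM: each delimiter cell is hyphens, optionally with a leading and/or trailing colon.
DELIM_CELL = re.compile(r":?-+:?")


def delimiter_row_ok(row: str) -> bool:
    """Hypothetical check: True if every cell of a markdown delimiter row is valid under GFM."""
    cells = row.strip().strip("|").split("|")
    return all(DELIM_CELL.fullmatch(cell.strip()) for cell in cells)


# The journal delimiter row as produced by conference_journal_header().
journal_delim = (
    "|-         |-            |-         |"
    "-             |-          |-        |-  |-  |-  |- |-              |\n"
)
print(delimiter_row_ok(journal_delim))  # True
print(delimiter_row_ok("|-   |      |-   |"))  # False: the middle cell has no hyphen
```

The check ignores the padding spaces used for alignment, so the hand-aligned layout can stay as it is.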