Revolutionising business data management with Nearform's Generative AI: Introducing the pgRAG AI accelerator


By Cristian Barlutiu

Discover how Nearform's Gen AI innovations are revolutionising business data management and automation

At Nearform, we engage with the latest technological trends and innovations. A key focus for us is on Generative AI (GenAI), a technology that's transforming how businesses create and handle information. We're actively exploring its potential, particularly in using machine learning to create new content, such as text, images and even code.

This technology is already revolutionising how businesses handle information, and we've been at the forefront of its application. We're constantly exploring new ways to leverage it to empower our clients and unlock the hidden potential within their data.

Generative AI utilises advanced algorithms to create new content, translating to a wealth of benefits for companies. Imagine instantly summarising complex documents, extracting key insights from vast datasets, or generating creative content tailored to specific needs. This opens doors to improvements in efficiency, data-driven decision-making and a whole new level of automation.

Our solution: The pgRAG AI accelerator library

With pgRAG, Nearform is pushing the boundaries of Generative AI, solving common challenges faced by developers and streamlining AI implementation. We've seen GenAI’s transformative power firsthand across numerous client projects. However, we noticed recurring challenges – developers often needed to reinvent the wheel for common GenAI use cases. This sparked the creation of pgRAG, our in-house AI accelerator library.

How does pgRAG work?

The pgRAG data flow chart. All the data is stored in a Postgres database.

The pgRAG library excels in text analysis and summarisation, supporting document formats such as PDFs, Word documents and PowerPoint presentations. It uses natural language processing techniques to perform semantic searches and extract key information from these documents.

An example of how pgRAG leverages advanced natural language processing techniques to perform semantic searches and extract key information from
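To make the idea of semantic search more concrete, here is a minimal sketch of ranking text chunks by cosine similarity between embedding vectors. It is not pgRAG's actual implementation: the function names `cosineSimilarity` and `rankChunks` are hypothetical, and the three-dimensional vectors are hand-written stand-ins for the embeddings a real model would produce.

```javascript
// Toy semantic search: rank text chunks by cosine similarity to a query vector.
// Assumption: the vectors are hand-written stand-ins for real embeddings.

function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every chunk and keep the topK best matches.
function rankChunks(queryVector, chunks, topK = 2) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const chunks = [
  { text: 'Glucose has the molecular formula C6H12O6.', vector: [0.9, 0.1, 0.0] },
  { text: 'Quarterly revenue grew in the third quarter.', vector: [0.0, 0.2, 0.9] },
  { text: 'Monosaccharides are simple sugars.', vector: [0.8, 0.3, 0.1] },
];
const queryVector = [1.0, 0.2, 0.0]; // stand-in for the embedded question

const results = rankChunks(queryVector, chunks);
console.log(results.map((r) => r.text));
```

In a real pipeline the vectors would come from an embeddings model and the ranking would typically run inside the database rather than in application code.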

One of the core features is the ability to generate concise summaries that capture the essence of the document's content. To ensure accuracy, the library employs carefully designed prompts during the summarisation process, mitigating the risk of hallucinations or factual inconsistencies that can sometimes occur with Generative AI models.
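As an illustration of what a carefully designed summarisation prompt might look like, the sketch below builds a prompt that tells the model to rely only on the supplied text. The helper `buildSummaryPrompt`, its wording and the `<document>` delimiters are assumptions made for this example, not the prompts pgRAG uses.

```javascript
// Hypothetical helper: builds a summarisation prompt that restricts the model
// to the source text. The instructions are illustrative, not pgRAG's own.
function buildSummaryPrompt(documentText, maxSentences = 3) {
  return [
    'You are summarising a document for a business reader.',
    `Write at most ${maxSentences} sentences.`,
    'Use only facts stated in the text between the <document> tags.',
    'If the text does not contain enough information, say so instead of guessing.',
    '<document>',
    documentText.trim(),
    '</document>',
  ].join('\n');
}

const summaryPrompt = buildSummaryPrompt(
  'Glucose is a monosaccharide with the formula C6H12O6.'
);
console.log(summaryPrompt);
```

Delimiting the source text and giving the model an explicit way to decline are common ways to reduce the chance of invented details, although neither removes that risk entirely.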

What sets pgRAG apart is its capability to analyse and interpret visual elements within documents, such as charts, graphs and images. The AI can extract insights from these visual representations and incorporate them into the overall document summary, providing a more comprehensive understanding of the content.

For instance, when processing a complex financial report, pgRAG can analyse both the textual data and accompanying visualisations like charts and graphs. It then generates a concise summary that captures the key points from the text and insights from the visual elements, presenting decision-makers with a holistic overview of the report.


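// imports: fs and pg are used below; the packages that provide OpenAI,
// OpenAIEmbeddings and PgRag are not named in this post, so they are omitted
import fs from 'node:fs';
import pg from 'pg';

// configuration
const model = new OpenAI({...});
const embeddings = new OpenAIEmbeddings({...});
const dbPool = new pg.Pool({...});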

// read the document file
const myDocument = fs.readFileSync('research_paper.pdf');

// initialize pgRag
const pgRagInstance = await PgRag.init({
  embeddings,
  model,
  dbPool
});

// process the document 
const jobId = await pgRagInstance.processDocument({
  data: myDocument, 
  name: 'research_paper.pdf'
});

// document searching
const response = await pgRagInstance.rag({
  prompt: 'Tell me about molecular structure of glucose'
});
console.log('Search response', response);

/* Response example
{
    "response": "C6H12O6 is the molecular formula for glucose. Glucose is a monosaccharide, or simple sugar, that is made from 6 carbon atoms, 12 hydrogen atoms, and 6 oxygen atoms.",
    "documents": [
        {
          "name": "research_paper.pdf",
          "raw_content": "...",
          "content": "...",
          "summary": "...",
          "metadata": "{...}"
        },
        {
          "name": "chemistry_101.pdf",
          "raw_content": "...",
          "content": "...",
          "summary": "...",
          "metadata": "{...}"
        }
    ]    
}
*/
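
Given the response shape in the example above, a caller might want to show which documents an answer drew on. The following sketch assumes only the `response` and `documents[].name` fields shown in that example; `listSources` is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper: collect the unique document names from a rag() response.
// Assumes the response shape shown in the example (documents[].name).
function listSources(ragResponse) {
  const names = (ragResponse.documents ?? []).map((doc) => doc.name);
  return [...new Set(names)]; // drop duplicates, keep first-seen order
}

const exampleResponse = {
  response: 'C6H12O6 is the molecular formula for glucose.',
  documents: [
    { name: 'research_paper.pdf', summary: '...' },
    { name: 'chemistry_101.pdf', summary: '...' },
    { name: 'research_paper.pdf', summary: '...' },
  ],
};

console.log(listSources(exampleResponse));
```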

Data privacy and security

At Nearform, data privacy and security are paramount. To ensure the confidentiality and integrity of data processed by the library, we protect sensitive information throughout the entire processing pipeline with several measures:

  1. Data encryption: All data, including documents and their summaries, are encrypted both in transit and at rest. This prevents unauthorised access and ensures that data is secure throughout the processing pipeline.

  2. Anonymisation techniques: When processing documents, any personally identifiable information (PII) is anonymised to protect individual privacy.

  3. Regular audits: The library undergoes regular security audits to identify and mitigate potential vulnerabilities, ensuring continuous improvement in data protection practices.

  4. Prompt design: As mentioned, the library uses carefully crafted prompts to prevent hallucinations, which also plays a role in avoiding the generation of sensitive information that isn't present in the source documents.

By integrating these practices, Nearform demonstrates its commitment to safeguarding user data, ensuring that the pgRAG library is not only powerful and efficient but also trustworthy and secure.
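
The anonymisation step in the list above could be sketched roughly as follows. This is a deliberately simple, pattern-based example under stated assumptions: the `redactPii` helper and its two regular expressions are hypothetical, they cover only email addresses and phone-like numbers, and they say nothing about the techniques pgRAG applies internally.

```javascript
// Hypothetical pattern-based redaction. Only emails and phone-like numbers are
// covered; names, addresses and other PII would need additional handling.
const PII_PATTERNS = [
  { label: 'EMAIL', regex: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: 'PHONE', regex: /\+?\d[\d\s().-]{7,}\d/g },
];

// Replace every match of each pattern with a bracketed placeholder.
function redactPii(text) {
  return PII_PATTERNS.reduce(
    (current, { label, regex }) => current.replace(regex, `[${label}]`),
    text
  );
}

const redacted = redactPii('Contact Jane at jane.doe@example.com or +44 20 7946 0958.');
console.log(redacted); // Contact Jane at [EMAIL] or [PHONE].
```

Note that the name "Jane" survives redaction here, which is why production systems usually combine patterns like these with entity recognition rather than relying on regular expressions alone.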

Conclusion

By harnessing the transformative power of Generative AI, Nearform is not just empowering businesses – we're revolutionising the way they utilise their data. Our AI accelerator library, pgRAG, simplifies the integration process, making this cutting-edge technology accessible to a wider audience. Leveraging state-of-the-art natural language processing (NLP) techniques and large language models (LLMs), pgRAG offers powerful text analysis, summarisation, and generation capabilities across various document formats.

Currently, pgRAG is an internal tool within Nearform, but we are actively exploring the possibility of releasing it as an open-source project in the future. This would allow the broader developer community to contribute to its development and leverage its capabilities for their own applications.

At Nearform, we are committed to the responsible and ethical use of AI technologies, ensuring that our solutions prioritise data privacy, security and accuracy.

We're excited to see how GenAI technology continues to evolve and transform the way we work. Our team is actively researching and developing new techniques to improve the accuracy, efficiency and scalability of our AI accelerator library.

Stay tuned for more updates from the Nearform team as we continue to explore and push the boundaries of AI technology. Together, we're shaping the future of business innovation.